Re: [L2tpext] WG last call for draft-ietf-l2tpext-failover-05.txt

Carlos,

Thanks for your comments. My responses are inline.

> This looks like a very good document to me, please find a couple of
> comments/queries:
> 
> 1.
>       2.2.1 Recovery tunnel establishment
> 
>          corresponding old tunnel.  An endpoint SHOULD not send any
>          control message on this tunnel, other than the messages to
>          establish and tear down the tunnel itself.
> 
> ***CP: I know this was updated with the "and tear down" to clarify about
> ***CP: StopCCN. Maybe just a nit, I wonder however if "establish,
> ***CP: keepalive and tear down" is more complete to include ZLB as well.
> ***CP: Or possibly simpler enumerate the allowed control messages: SCCRQ,
> ***CP: SCCRP, SCCCN, StopCCN, ZLB Ack and Explicit-Ack (for L2TPv3
> ***CP: only).
I think the intention is obvious, but if interpreted strictly it can lead to
some confusion. Perhaps the following conveys it more clearly:
"messages other than those required to manage the life of the recovery tunnel"

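To make the intent concrete, here is a rough sketch of such a check. It is
only an illustration: the message set is the one Carlos enumerates above, I am
reading "(for L2TPv3 only)" as applying to Explicit-Ack, and the function name
and string labels are hypothetical, not something the draft defines.

```python
# Control messages that manage the life of a recovery tunnel, taken from
# the enumeration in comment 1. The string labels are illustrative only.
RECOVERY_TUNNEL_MESSAGES = frozenset({"SCCRQ", "SCCRP", "SCCCN", "StopCCN", "ZLB"})

# Assumption: "(for L2TPv3 only)" in the comment applies to Explicit-Ack.
L2TPV3_ONLY_MESSAGES = frozenset({"Explicit-Ack"})


def allowed_on_recovery_tunnel(message: str, l2tp_version: int) -> bool:
    """Return True if this control message may be sent on a recovery tunnel."""
    if message in RECOVERY_TUNNEL_MESSAGES:
        return True
    return l2tp_version == 3 and message in L2TPV3_ONLY_MESSAGES
```

With this reading, an ICRQ on a recovery tunnel is rejected for both versions,
and Explicit-Ack is accepted only for L2TPv3.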
> 2.
>          Tunnel Recovery AVP for L2TPv3 tunnels:
> [snip]
>        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>        |                        Recover Tunnel Id                      |
>        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>        |                     Recover Remote Tunnel Id                  |
>        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> 
> ***CP: s/Tunnel Id/Control Connection ID/
> ***CP: Same clarification between tunnel id and control connection id in
> ***CP: the following 2 or 3 para (only mentions tunnel id).
The protocol defines 'Recovery Tunnel' as a standard term, so we thought
'Recover Tunnel Id' was more appropriate. Some tunnel definitions do refer to
themselves as control connections.

> 3.
>          Id and responds with an SCCRP. It MUST terminate the tunnel if:
>          - Recover Tunnel Id or Remote Recover Tunnel Id is unknown.
>          - Non failed endpoint did not indicate it was failover capable.
>          - The L2TP version of recovery tunnel is different from the
> 
> ***CP: What if the _failed_ endpoint did not indicate it was failover
> ***CP: capable when established the "old tunnel"? In such case:
> ***CP: 1. non-failed would not know peer's Recovery time, and any
> ***CP: assumption for it could result in extended downtime.
> ***CP: 2. is seems could open the door for a malicious party
> ***CP: impersonating a failed party?
> ***CP: So probably should/must terminate the recovery tunnel of the
> ***CP: failed endpoint did not indicate was failover capable?
If the failed endpoint did not indicate it was failover capable, we'd like to
keep the behavior as is. The behavior described in the failover protocol
applies only to tunnels that are failover capable. In that respect, yes, it
would extend the downtime, but how do we know whether the remote endpoint
really failed? Answering "how do you know the remote endpoint failed if it
didn't initiate a failover tunnel?" is tricky. Do you deduce it from another
tunnel to the same peer restarting?

> 4.
>          tunnel. If, for any reason, the failed endpoint could not
>          establish the recovery tunnel then it MUST silently clear the
>          recovered tunnel and sessions within, assuming the recovery
>          process has failed.
> 
>          Any control packet received on the recovered tunnel, before
>          control channel reset, MUST be silently discarded.
> 
> ***CP: The "recovered tunnel" in those 2 sentences is in fact the "old
> ***CP: tunnel" at this point, right? If the recovery tunnel
> ***CP: establishment fails then the old tunnel never makes it to
> ***CP: recovered, correct?
Yes, 'recovered tunnel' is indeed the 'old tunnel'. Will change in the text.

> 5.
>          An endpoint MUST use tie breaker AVP (section 4.4.3 [L2TPv2]
>          and section 5.4.3 [L2TPv3]) in the setup of the recovery tunnel
> 
> ***CP: the "tie breaker AVP" would be the "Control Connection Tie
> ***CP: Breaker AVP" for L2TPv3; some other parts of the document make
> ***CP: the distinction between different named v2/v3 AVPs.
Yes, that's the right terminology as per L2TPv3. I'll change that.

> 6.
>       2.2.1 Recovery tunnel establishment
> 
> ***CP: What are the guidelines for including Failover Capability AVP in
> ***CP: the Recovery Tunnel SCCRQ/SCCRP establishment? MUST NOT be used?
> ***CP: Or use the values included in the Failover Capability AVP
> ***CP: for the Recovery Tunnel as applicable for the recovered tunnel,
> ***CP: to be used in a subsequent failure?
I think using the Failover Capability AVP in the recovery tunnel would not be
of much use. If the system fails while establishing the recovery tunnel, it
will establish another recovery tunnel in order to recover the 'old tunnel'.
I don't see a need to explicitly recommend it. Do you think we should?

> 7.
>    2.1 Pre Failover Operation
> 
>       The D bit, when set indicates that an endpoint is capable of
>       resetting Nr value based on received Ns value(s) from one or more
>       'out of order but in sequence' packets from the peer.  This bit is
>       applicable only for the sessions using sequence numbers on the
>       data channel i.e. data channel failure on the system not
> 
>       2.2.2 Control and Data Channel Reset
> 
>          numbers and if data channel has failed over. Failed endpoint
>          resets its Ns value to zero, where as non failed endpoint could
>          continue to use the Ns values it was using previously. To reset
>          Nr values during failover, if an endpoint receives 'n' out of
>          order but in sequence packets then it MUST set the Nr value
>          based on the Ns value of the incoming packets, as suggested in
>          Appendix C [L2TPv3]. The value of 'n' should be configurable.
> 
> ***CP: Nit comment: I wonder if these paragraphs should say "expected
> ***CP: sequence number" instead of Nr and Sequence Number instead of Ns
> ***CP: for data channel, to make it clear is not the control connection
> ***CP: Nr/Ns for L2TPv3. Or note what Nr/Ns are referring to in L2Tpv3,
> ***CP: as sequencing fields do not have that name.
Will change it to 'expected sequence number' and 'sequence number'.
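
For what it's worth, the reset rule in the quoted text can be sketched as
below. This is a minimal illustration, not the draft's algorithm: the class
name, the 16-bit default modulus, and the choice to accept the n-th packet of
the run (rather than only later ones) are all my assumptions.

```python
class DataSeqTracker:
    """Tracks the expected sequence number on a session's data channel."""

    def __init__(self, n: int, modulus: int = 1 << 16):
        self.n = n                # configurable run length ('n' in the draft)
        self.modulus = modulus    # assumption: 16-bit sequence space
        self.expected = 0         # expected sequence number
        self.run_last = None      # last sequence number of the current run
        self.run_len = 0          # length of the out-of-order run so far

    def receive(self, seq: int) -> bool:
        """Return True if the packet is accepted, False if it is discarded."""
        if seq == self.expected:
            self.expected = (seq + 1) % self.modulus
            self.run_last, self.run_len = None, 0
            return True
        # Out of order: extend the run only if it continues in sequence.
        if self.run_last is not None and seq == (self.run_last + 1) % self.modulus:
            self.run_len += 1
        else:
            self.run_len = 1
        self.run_last = seq
        if self.run_len >= self.n:
            # 'n' out-of-order but in-sequence packets seen: resynchronize.
            self.expected = (seq + 1) % self.modulus
            self.run_last, self.run_len = None, 0
            return True
        return False
```

With n = 3 and an expected value of 1, packets 100 and 101 are discarded,
packet 102 triggers the reset, and 103 is then accepted as in order.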

> 8.
>    2.3 Session State Synchronization
>       Step2: Both endpoints SHOULD identify the sessions that might have
>       been in inconsistent states, perhaps based on data channel
>       inactivity.
> 
> ***CP: Over what period is data channel inactivity measured? In any case
> ***CP: it may not be an accurate indicator of inconsistency, a silent
> ***CP: session may be consistent and a session receiving data packets
> ***CP: may be inconsistent. Shouldn't FSS for all sessions instead?
The inactivity period was chosen with the protocol 'echo/hello' period in
mind. Sending FSS for all sessions is the sure way. We can leave that to the
implementation.

Keyur suggested putting the following text at the beginning of section 2.3:
"Two new messages FSQ/FSR have been introduced to synchronize session state at
any given point during the life of a session between the two endpoints. These
messages are used when one endpoint determines or suspects, in an
implementation specific manner, that the session state between it and its peer
is inconsistent. One way to make this determination may be based on data
activity for the session."
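
One possible implementation-specific heuristic, sketched purely as an example:
pick the sessions that have seen no data for at least some idle threshold
(for instance one tied to the hello period). The function name, the timestamp
map, and the threshold are hypothetical, and as Carlos notes, inactivity is
not a reliable indicator, so an implementation may simply query all sessions.

```python
from typing import Dict, List


def sessions_to_query(last_data_seen: Dict[int, float],
                      now: float,
                      idle_threshold: float) -> List[int]:
    """Return session ids whose data channel has been idle for at least
    idle_threshold seconds; these are candidates for an FSQ."""
    return sorted(
        session_id
        for session_id, seen in last_data_seen.items()
        if now - seen >= idle_threshold
    )
```

Setting idle_threshold to 0 selects every session, which corresponds to the
"query everything" approach.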

> 9.
>       Step3: An endpoint sends Failover Session Query (FSQ) message,
>       message type 21, to query the state of stale sessions on its peer.
>       An FSQ message MUST include at least one Failover Session State
>       (FSS) AVPs.  An endpoint MAY send another FSQ message on the
> 
> ***CP: It may be useful to make explicit what AVPs MAY/MUST in FSQ. MUST
> ***CP: Message Type, MAY Message Digest and MUST one or more Failover
> ***CP: Session State AVPs. Right? Additionally, in the FSS description,
> ***CP: it would be useful to specify that FSS can only be used in FSQ
> ***CP: and FSR messages.
It is a good idea to list the AVPs explicitly. In that regard:
FSQ and FSR messages MUST include the 'Message Type AVP' and the 'FSS AVP'.
They MAY include the 'Random Vector AVP' and, in L2TPv3, the 'Message Digest
AVP'.

Similarly, it could state that 'FSS MUST be used only in FSQ and FSR messages'.
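
A rough sketch of how a receiver might check these rules follows. The AVP
labels and the function name are made up for illustration, and flagging any
AVP outside the MUST/MAY lists as "unexpected" is my assumption; the proposed
text itself does not say how other AVPs are to be treated.

```python
from typing import Iterable, List, Tuple

REQUIRED_AVPS = frozenset({"Message Type", "Failover Session State"})
OPTIONAL_AVPS = frozenset({"Random Vector", "Message Digest"})
L2TPV3_ONLY_AVPS = frozenset({"Message Digest"})


def check_fsq_fsr_avps(avps: Iterable[str],
                       l2tp_version: int) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) AVP names for an FSQ or FSR message."""
    present = set(avps)
    allowed = REQUIRED_AVPS | OPTIONAL_AVPS
    if l2tp_version != 3:
        # Message Digest is listed as optional for L2TPv3 only.
        allowed = allowed - L2TPV3_ONLY_AVPS
    missing = sorted(REQUIRED_AVPS - present)
    unexpected = sorted(present - allowed)
    return missing, unexpected
```

A message passes when both returned lists are empty.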

> 
> 10.
>       Before all sessions are synchronized using FSQ/FSR mechanism, if
>       an endpoint receives an ICRQ for a session it believe is already
>       in established state, it MUST respond to such ICRQ with a CDN,
>       setting Assigned/Local Session ID AVP ([L2TPv2] section 4.4.4,
>       [L2TPv3] section 5.4.4) to its local session id, and clear the
> 
> ***CP: The first line seems to be the first mention of FSR (the name
> ***CP: "Failover Session Response" is first introduced further down). It
> ***CP: would be useful to expand the name before and detail which AVPs
> ***CP: MAY/MUST in FSRs: MUST Message Type, MAY Message Digest and MUST
> ***CP: one or more Failover Session State AVPs. Correct?
Yes. Will change that.

> 11.
> 4.0 Security Considerations
>    The failover mechanism described here leaves a some room (1 in 2^32)
>    for an intruder to discover the old tunnel id of an existing tunnel
> 
> ***CP: This probability number seems only for L2TPv2. More importantly
> ***CP: though, this section should include what to can be done to
> ***CP: minimize the exposure. At least, control message security
> ***CP: mechanisms must be considered, specifically for mutual endpoint
> ***CP: authentication (for tunnel establishment for v2 and also for
> ***CP: control message auth for v3). Additionally, an impersonator may
> ***CP: try to create a recovery tunnel clearing C and D bits to drop the
> ***CP: old tunnel/sessions.
Actually, the text above should have read '1 in 2^32 for an intruder to
discover the old tunnel and session id of an existing tunnel/session', and
that applied only to L2TPv2.
The new text should look something like:
The failover mechanism described here leaves some room (1 in 2^16 for L2TPv2
and 1 in 2^32 for L2TPv3) for an intruder to discover the old tunnel id, which
could be misused to fake a failover and cause a complete shutdown of an
existing tunnel. To avoid this, control channel authentication as indicated in
section 2.2.1 should be used. L2TPv3 tunnels should also use the 'Digest AVP'
to make it secure. Protecting L2TP with IPsec would also help secure the
tunnels for failover.
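
The numbers follow directly from the id widths: a 16-bit Tunnel ID in L2TPv2
and a 32-bit Control Connection ID in L2TPv3. As a quick check of the
arithmetic (assuming, for simplicity, that ids are picked uniformly at random
and the intruder gets a single guess; the names below are illustrative):

```python
from fractions import Fraction

# Width in bits of the tunnel / control connection id per protocol version.
TUNNEL_ID_BITS = {"L2TPv2": 16, "L2TPv3": 32}


def single_guess_probability(version: str) -> Fraction:
    """Chance that one blind guess hits a given old tunnel id."""
    return Fraction(1, 2 ** TUNNEL_ID_BITS[version])
```

This gives 1/65536 for L2TPv2 and 1/4294967296 for L2TPv3. The earlier
'1 in 2^32' figure for L2TPv2 came from guessing a 16-bit tunnel id and a
16-bit session id together.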

> 12.
> Appendix A
> 
> ***CP: The 4 Appendices are most useful !!!
> 
> 13.
>       -  The mechanism should be backward compatible; i.e. it should not
>       redefine existing behavior of [L2TP] compliant systems.
> 
> ***CP: There's no [L2TP] reference, is that v2 only or both?
I think both, since this standard comes after L2TPv2 and L2TPv3.

> 14.
> Appendix B
>    from recovering multiple tunnels in parallel. It also allows an
>    endpoint from sending multiple FSQs to recover quickly.
> 
> ***CP: In addition, from including multiple FSSs in FSQ and FSR to
> ***CP: recover quickly.
Yes. Will add that.

> 
> I hope these help !

Yes. Thanks for a thorough review.

-- vipin

_______________________________________________
L2tpext mailing list
L2tpext@ietf.org
https://www1.ietf.org/mailman/listinfo/l2tpext