draft-aruns-ccamp-rsvp-restart-ext-01.txt
"Adrian Farrel" <adrian@olddog.co.uk> Wed, 28 July 2004 00:18 UTC
Message-ID: <001601c47436$5ab25870$6f849ed9@Puppy>
Reply-To: Adrian Farrel <adrian@olddog.co.uk>
From: Adrian Farrel <adrian@olddog.co.uk>
To: aruns@movaz.com
Cc: lb@movaz.com, Dimitri Papadimitriou <Dimitri.Papadimitriou@alcatel.be>, Dimitri Papadimitriou <dpapadimitriou@psg.com>, Reshad Rahman <rrahman@cisco.com>, 'Anca Zamfir' <ancaz@cisco.com>, jisrar@cisco.com, ccamp@ops.ietf.org
Subject: draft-aruns-ccamp-rsvp-restart-ext-01.txt
Date: Tue, 27 Jul 2004 20:05:48 +0100
Hi,

Thanks for coalescing your ideas into a single draft.
Here are some random thoughts and questions for you to parse and then discard :-)

Cheers,
Adrian

What if the restarted node was doing PHP? Does the egress (i.e. the downstream node) have any record of the LSP? Presumably so, and it will be able to send back a Resv saying implicit null.

Do we need to describe procedures for multiple node restart?

Although the procedures build on RFC3473, I detect some pressures to integrate this work into MPLS implementations. That is: RFC3209-based implementations intend to take the Hello processing and nodal restart function from RFC3473 and also the new processing described in this draft. How do we feel about this? Is this pick-and-mix approach OK, or should we say that it is time for packet-based solutions to cut over to GMPLS PSC?

Section 2.3
   In the sender descriptor, the Recovery Label object MUST be included, with the label value copied from the label value in the Label object in the most recent associated Resv message sent to the restarted node, for the LSP being recovered.
Arguably you are trying to carry this single piece of Resv state on the RecoveryPath message. I guess you might argue that this is information that will be added to the first Path message sent by the restarted node, but this is not true. You must make a clear case for needing this information in advance of the Resv that you will receive in due course. I don't think that section 2.4.2 does this.

Section 2.3
I think you need to exclude <MESSAGE_ID_ACK> and <MESSAGE_ID_NACK> as copied objects and allow them as defined by RFC2961.

Section 2.3
   All other objects from the most recent received Path message MUST be included in the RecoveryPath message.
I think you need to future-proof this by saying that the definition of new objects MAY specify that those objects MUST be omitted from the RecoveryPath message.
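
To make the three Section 2.3 points above concrete, here is a minimal sketch of the copy rule as I read it. It is an illustration only, not text from the draft: the function name, the (class name, value) tuple representation, and the omit_classes parameter are all hypothetical, and the position of the Recovery Label within the sender descriptor is not modelled.

```python
# Illustrative sketch only. Objects are modelled as (class_name, value)
# tuples; that representation is an assumption made for this example.

# Objects that should not be blindly copied from the stored Path message
# (they remain usable in their own right as defined by RFC2961).
NEVER_COPIED = frozenset({"MESSAGE_ID_ACK", "MESSAGE_ID_NACK"})


def build_recovery_path(last_path_objects, resv_label, omit_classes=frozenset()):
    """Build the object list for a RecoveryPath message.

    last_path_objects: objects from the most recent Path message received
                       from the restarted node.
    resv_label:        label value from the Label object in the most recent
                       Resv sent to the restarted node for this LSP.
    omit_classes:      hypothetical hook for future object definitions that
                       say "MUST be omitted from the RecoveryPath message".
    """
    skip = NEVER_COPIED | frozenset(omit_classes)
    copied = [(name, value) for name, value in last_path_objects
              if name not in skip]
    # The Recovery Label carries the one piece of Resv state being returned.
    copied.append(("RECOVERY_LABEL", resv_label))
    return copied
```

The omit_classes hook is the future-proofing point: an implementation that hard-codes "copy everything" has no way to honour an omission rule that a later object definition might add.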

Section 2.3
   After sending a RecoveryPath message and during the Recovery Period, the node SHOULD periodically re-send the RecoveryPath message until it receives a corresponding response. A corresponding response is a Message ID acknowledgment or a Path message matching the RecoveryPath message.
- Need to define whether it should continue to re-send for the whole period.
  Compare with the relatively short duration implied in MsgID retransmission.
- Need to define "periodically".
  Compare with the relatively rapid retransmission in MsgID retransmission.
- Need to define "matching". Is it enough that the received Path is for the same LSP, or does it need to match the RecoveryPath in more detail?

2.4. Procedures for the Restarting Node
   These procedures apply during the "state recovery process" and "Recovery Period" as defined in Section 9.5.2 in [RFC3473]. Any RecoveryPath message received after the Recovery Period has expired MUST be discarded. A node MAY send a PathTear message downstream matching the discarded message.
This is somewhat ambiguous. After all, a node MAY send a PathTear downstream at any time. If you are trying to say something more specific, please say it (e.g. "if there is no matching local LSP state").

2.4.2. Re-Synchronization Procedures
   After receipt of the RecoveryPath message and, for non-ingress LSPs, the corresponding Path message with a Recovery Label object, the restarting node SHOULD
Although it may be obvious, you should say how a node determines that it is the ingress for this LSP.

2.4.2 needs to describe what to do if the RecoveryPath (and/or recovery Path) can be matched to an LSP for which state is known, but does not completely match the record that the restarting node has. The most pressing example is what to do when the control plane state recovered through RFC3473 and these extensions does not match the data plane state in the restarting node.
There may be a judgment call here since the upstream and downstream neighbors clearly know what they are talking about, yet the data plane may be carrying active traffic.

Is it worth noting that when moving from 3473 to include these extensions, it may be necessary to increase the recovery period as there is more processing to be done?

2.4.3
Is it the case that we may receive a Path message with Recovery_Label from upstream and not match state? If so, we wait to receive a RecoveryPath message. If we do not receive one in the Recovery Period, we treat the Path message as if we were processing according to RFC3473. BUT, if we are processing according to RFC3473 and we have not responded to a Path message received with Recovery_Label in the Recovery Period, isn't the LSP abandoned? In other words, we will not send Resv for such an LSP until after the end of the Recovery Period.
This is worse in section 2.5 if the downstream node does not support these extensions, when we will send no Resv for any recovered LSPs until after the Recovery Period.

I would like to be able to globally de-select RecoveryPath messages (if I have retained full state). Ideally this would be the default position so that RecoveryPath messages are not sent to a legacy node. I think Hello Capabilities should be used to select the willingness to receive RecoveryPath. (This would also ease the previous issue.)

Section 3
I think you have one more message exchange than you need. Imagine you have just one LSP. In the normal case you have just one message sent (RecoverySrefresh). In the non-recovered case you have three messages (RecoverySrefresh, Ack[Nack], RecoveryPath). *However* if the RecoverySrefresh was sent by the restarting node you would still have one message in the main case, and could drop to two messages in the non-recovered case (RecoverySrefresh, RecoveryPath).
This would also make the RecoverySrefresh identical to the Srefresh. Further, since we know that this is used when only some of the state has been retained, it cuts down on the size of the RecoverySrefresh.

Section 3
There is a slight issue with the Nack. We need to distinguish a Nack to an Srefresh (uses Message ID from a previous Resv) and a Nack to a RecoverySrefresh (uses Message ID from a previous Path). This is admittedly only a rare problem, but might occur with clashes of epoch and Message ID. This may be what you are trying to resolve using the new bit in the various Message ID objects (see below), but the reasoning is not clear from the text.
Hint: if you use my proposal immediately above, this issue goes away and the new flag simplifies as below...

3.1. MESSAGE_ID ACK/NACK and MESSAGE_ID LIST Objects
The trouble with defining an additional bit like this is that we have to define the meaning of the bit on *any* Message_ID. Since (presumably) ordinary Srefresh messages may (might?) be interspersed with RecoverySrefresh, why don't we have a way of distinguishing the messages rather than the contents of the object?
Actually, I would argue that it is only the List Object that needs to be distinguished (with the caveat of the previous point).

3.2. Capability Object
One of the lessons of the Restart_Cap Object is that we should be careful with the specification of capabilities objects. So, I am concerned that your new object is defined as a fixed length object with space for another 31 bits of information. How about TLVs?

3.2.2. Compatibility
You missed forwards compatibility. That is: reserved bits MUST be set to zero on transmission and MUST be ignored on receipt.

Nits
===
Need to expand citations in the Abstract.

The Abstract could probably be usefully made shorter.

Section 3, second para.
I don't think we need the description of Srefresh in normal processing.

Section 9. IANA
Could you beef up this section please.
The ideal is to show the names and characteristics of new messages/objects in this section so that IANA does not have to ask any further questions. You might like to reference draft-kompella-zinin-early-allocation-02.txt and draft-kompella-rsvp-change-02.txt to sort out values to use for pre-RFC work.
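
Going back to the forwards-compatibility point under 3.2.2, the rule can be sketched as below. This is a hedged illustration under stated assumptions: it models only a 32-bit flags word (not the RSVP object header), the bit position chosen for the one defined capability is hypothetical, and the names are mine rather than the draft's.

```python
import struct

# Hypothetical bit assignment for the single capability defined so far.
# The other 31 bits are treated as reserved.
RECOVERY_PATH_CAPABLE = 0x00000001
KNOWN_BITS = RECOVERY_PATH_CAPABLE


def encode_capability_flags(flags):
    """Pack the flags word for transmission.

    Reserved bits are forced to zero, whatever the caller passed in.
    """
    return struct.pack("!I", flags & KNOWN_BITS)


def decode_capability_flags(data):
    """Unpack a received flags word.

    Reserved bits are ignored, so a sender that implements a later
    revision and sets extra bits does not cause an error here.
    """
    (raw,) = struct.unpack("!I", data)
    return raw & KNOWN_BITS
```

For example, decode_capability_flags(b"\x80\x00\x00\x01") returns only the known bit, so a peer that sets a bit defined later is still interoperable. A TLV encoding would need the analogous rule for unknown types.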