draft-aruns-ccamp-rsvp-restart-ext-01.txt

"Adrian Farrel" <adrian@olddog.co.uk> Wed, 28 July 2004 00:18 UTC

Message-ID: <001601c47436$5ab25870$6f849ed9@Puppy>
Reply-To: Adrian Farrel <adrian@olddog.co.uk>
From: Adrian Farrel <adrian@olddog.co.uk>
To: aruns@movaz.com
Cc: lb@movaz.com, Dimitri Papadimitriou <Dimitri.Papadimitriou@alcatel.be>, Dimitri Papadimitriou <dpapadimitriou@psg.com>, Reshad Rahman <rrahman@cisco.com>, 'Anca Zamfir' <ancaz@cisco.com>, jisrar@cisco.com, ccamp@ops.ietf.org
Subject: draft-aruns-ccamp-rsvp-restart-ext-01.txt
Date: Tue, 27 Jul 2004 20:05:48 +0100

Hi,

Thanks for coalescing your ideas into a single draft.

Here are some random thoughts and questions for you to parse and then discard :-)

Cheers,
Adrian


What if the restarted node was doing PHP? Does the egress (i.e. the downstream node) have
any record of the LSP? I presume so, and that it will be able to send back a Resv saying
implicit null.

Need to describe procedures for multiple node restart?

Although the procedures build on RFC3473, I detect some pressures to integrate this work
into MPLS implementations. That is: RFC3209-based implementations intend to take the Hello
processing and nodal restart function from RFC3473 and also the new processing described
in this draft. How do we feel about this? Is this pick-and-mix approach OK or should we
say that it is time for packet-based solutions to cut over to GMPLS PSC?

Section 2.3
      In the sender descriptor, the Recovery Label object MUST be
      included, with the label value copied from the label value in the
      Label object in the most recent associated Resv message sent to
      the restarted node, for the LSP being recovered.
Arguably you are trying to carry this single piece of Resv state on the RecoveryPath
message.
I guess you might argue that this is information that will be added to the first Path message
sent by the restarted node, but this is not true. You must make a clear case for needing
this information in advance of the Resv that you will receive in due course. I don't think
that section 2.4.2 does this.

Section 2.3
I think you need to exclude <MESSAGE_ID_ACK> and <MESSAGE_ID_NACK> as copied objects and
allow them as defined by RFC2961.

Section 2.3
   All other objects from the most recent received Path message MUST be
   included in the RecoveryPath message.
I think you need to future-proof this by saying that the definition of new objects MAY
specify that those objects MUST be omitted from the RecoveryPath message.
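
As a rough sketch of how the last two Section 2.3 comments could combine: copy the objects from the most recent Path, leave out the Message ID acknowledgment objects (which stay governed by RFC2961), and leave out anything whose own specification says it must be omitted. The string object names, the OMIT_FROM_RECOVERY_PATH registry and the function name are illustrative assumptions, not text from the draft.

```python
# Hop-by-hop acknowledgment objects (RFC 2961); not copied from the Path.
NOT_COPIED = {"MESSAGE_ID_ACK", "MESSAGE_ID_NACK"}

# Hypothetical registry of objects whose own definition says they MUST be
# omitted from the RecoveryPath message. Empty until such an object exists.
OMIT_FROM_RECOVERY_PATH = set()


def recovery_path_objects(last_path_objects):
    """Return the (name, body) pairs to copy from the last received Path."""
    skip = NOT_COPIED | OMIT_FROM_RECOVERY_PATH
    return [(name, body) for name, body in last_path_objects if name not in skip]
```

The point of keeping the second set separate is that a future object can opt out without the base procedure changing.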

Section 2.3
   After sending a RecoveryPath message and during the Recovery Period,
   the node SHOULD periodically re-send the RecoveryPath message until
   it receives a corresponding response.  A corresponding response is a
   Message ID acknowledgment or a Path message matching the RecoveryPath
   message.
- Need to define whether it should continue to re-send for the whole period.
  Compare with the relatively short duration implied in MsgID retransmission
- Need to define "periodically"
  Compare with the relatively rapid retransmission in MsgID retransmission
- Need to define "matching". Is it enough that the received Path is for the
   same LSP, or does it need to match the RecoveryPath in more detail?
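
To make the comparison with Message ID retransmission concrete, here is a minimal sketch of the RFC2961 rapid retransmission back-off. The defaults are the RFC2961 suggested values as I recall them (Rf = 500 ms, Rl = 3, Delta = 1); treat them, and the function name, as assumptions to be checked against the RFC.

```python
def rapid_retransmit_schedule(rf=0.5, rl=3, delta=1.0):
    """Return (send_times, give_up_time) in seconds from the first send.

    rf    -- initial retransmission interval
    rl    -- maximum number of transmissions without an acknowledgment
    delta -- back-off increment; each interval is multiplied by (1 + delta)
    """
    times, t, interval = [], 0.0, rf
    for _ in range(rl):
        times.append(t)          # transmit at time t
        t += interval            # wait for the current interval
        interval *= 1 + delta    # back off before the next attempt
    return times, t
```

With those defaults the message is sent at 0, 0.5 and 1.5 seconds and retransmission stops at 3.5 seconds, which is short compared with any Recovery Period measured in tens of seconds. That is why the text needs to say whether RecoveryPath re-sending follows this schedule or continues for the whole period.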

2.4. Procedures for the Restarting Node
   These procedures apply during the "state recovery process" and
   "Recovery Period" as defined in Section 9.5.2 in [RFC3473].  Any
   RecoveryPath message received after the Recovery Period has expired
   MUST be discarded.  A node MAY send a PathTear message downstream
   matching the discarded message.
This is somewhat ambiguous. After all, a node MAY send a PathTear
downstream at any time.
If you are trying to say something more specific, please say it (e.g. "if
there is no matching local LSP state").

2.4.2. Re-Synchronization Procedures
   After receipt of the RecoveryPath message and, for non-ingress LSPs,
   the corresponding Path message with a Recovery Label object, the
   restarting node SHOULD
Although it may be obvious, you should say how a node determines that it is the ingress
for this LSP.

2.4.2 needs to describe what to do if the RecoveryPath (and/or recovery Path) can be
matched to an LSP for which state is known, but does not completely match the record that
the restarting node has. The most pressing example is what to do when the control plane
state recovered through RFC3473 and these extensions does not match the data plane state
in the restarting node. There may be a judgment call here since the upstream and
downstream neighbors clearly know what they are talking about, yet the data plane may be
carrying active traffic.

Is it worth noting that when moving from 3473 to include these extensions, it may be
necessary to increase the recovery period as there is more processing to be done?

2.4.3
Is it the case that we may receive a Path message with Recovery_Label from upstream and
not match state? If so, we wait to receive a RecoveryPath message. If we do not receive
one in the Recovery Period, we treat the Path message as if we were processing according
to RFC3473.
BUT, if we are processing according to RFC3473 and we have not responded to a Path message
received with Recovery_Label in the Recovery Period, isn't the LSP abandoned?
In other words, we will not send Resv for such an LSP until after the end of the Recovery
Period.
This is worse in section 2.5 if the downstream node does not support these extensions,
when we will send no Resv for any recovered LSPs until after the Recovery Period.

I would like to be able to globally de-select RecoveryPath messages (if I have retained
full state). Ideally this would be the default position so that RecoveryPath messages are
not sent to a legacy node. I think Hello Capabilities should be used to select the
willingness to receive RecoveryPath. (This would also ease the previous issue).

Section 3
I think you have one more message exchange than you need.
Imagine you have just one LSP.
In the normal case you have just one message sent (RecoverySrefresh).
In the non-recovered case you have three messages (RecoverySrefresh, Ack[Nack],
RecoveryPath).
*However* if the RecoverySrefresh was sent by the restarting node you would still have one
message in the main case, and could drop to two messages in the non-recovered case
(RecoverySrefresh, RecoveryPath).
This would also make the RecoverySrefresh identical to the Srefresh.
Further, since we know that this is used when only some of the state has been retained, it
cuts down on the size of the RecoverySrefresh.

Section 3
There is a slight issue with the Nack. We need to distinguish a Nack to an Srefresh (uses
Message ID from a previous Resv) and a Nack to a RecoverySrefresh (uses Message ID from a
previous Path). This is admittedly only a rare problem, but might occur with clashes of
epoch and Message ID. This may be what you are trying to resolve using the new bit in the
various Message ID objects (see below) but the reasoning is not clear from the text.
Hint: if you use my proposal immediately above, this issue goes away and the new flag
simplifies as below...

3.1. MESSAGE_ID ACK/NACK and MESSAGE_ID LIST Objects
The trouble with defining an additional bit like this is we have to define the meaning of
the bit on *any* Message_ID.
Since (presumably) ordinary Srefresh messages may (might?) be interspersed with
RecoverySrefresh why don't we have a way of distinguishing the messages rather than the
contents of the object?
Actually, I would argue that it is only the List Object that needs to be distinguished
(with the caveat of the previous point).

3.2. Capability Object
One of the lessons of the Restart_Cap Object is that we should be careful with the
specification of capabilities objects.
So, I am concerned that your new object is defined as a fixed length object with space for
another 31 bits of information.
How about TLVs?
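
To illustrate what a TLV-based capability object might buy, here is a minimal sketch. The layout (16-bit type, 16-bit value length, value padded to a 4-byte boundary) and the function names are assumptions for illustration only; the draft defines no such encoding.

```python
import struct


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV: 16-bit type, 16-bit value length, value, zero padding."""
    pad = (-len(value)) % 4
    return struct.pack("!HH", tlv_type, len(value)) + value + b"\x00" * pad


def parse_tlvs(body: bytes) -> dict:
    """Parse an object body into {type: value}; unknown types are kept, not rejected."""
    tlvs, offset = {}, 0
    while offset + 4 <= len(body):
        tlv_type, length = struct.unpack_from("!HH", body, offset)
        start = offset + 4
        end = start + length
        if end > len(body):
            raise ValueError("truncated TLV")
        tlvs[tlv_type] = body[start:end]
        offset = end + (-length) % 4  # skip padding to the next 4-byte boundary
    return tlvs
```

A receiver that only understands some types can step over the rest by length, so new capabilities need not fit into a fixed 32-bit field.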

3.2.2. Compatibility
You missed forwards compatibility. That is: reserved bits MUST be set to zero on
transmission and MUST be ignored on receipt.
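
A minimal sketch of that rule for a 32-bit flags field with a single defined bit. The bit position and all names here are hypothetical; only the "zero on transmission, ignore on receipt" behaviour is the point.

```python
import struct

RECOVERY_PATH_BIT = 0x00000001  # hypothetical position of the one defined flag
KNOWN_BITS = RECOVERY_PATH_BIT  # everything else is reserved


def encode_flags(flags: int) -> bytes:
    """Reserved bits are forced to zero on transmission."""
    return struct.pack("!I", flags & KNOWN_BITS)


def decode_flags(data: bytes) -> int:
    """Reserved bits are ignored on receipt, whatever the sender set."""
    (raw,) = struct.unpack("!I", data)
    return raw & KNOWN_BITS
```

An old receiver written this way keeps working when a newer sender starts using one of the other 31 bits.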

Nits
===

Need to expand citations in the Abstract.

The Abstract could probably be usefully made shorter.

Section 3, second para. I don't think we need the description of Srefresh in normal
processing.

Section 9. IANA
Could you beef up this section, please?
The ideal is to show the names and characteristics of new messages/objects in this section
so that IANA does not have to ask any further questions.
You might like to reference draft-kompella-zinin-early-allocation-02.txt and
draft-kompella-rsvp-change-02.txt to sort out values to use for pre-RFC work.