Re: [Gen-art] Gen-art telechat review of draft-ietf-mpls-tp-psc-itu-03.txt

Elwyn Davies <elwynd@folly.org.uk> Thu, 27 March 2014 13:35 UTC

From: Elwyn Davies <elwynd@folly.org.uk>
To: Jari Arkko <jari.arkko@piuha.net>
In-Reply-To: <E959A52E-0835-498E-800E-3BD8AC769925@piuha.net>
References: <1395511257.15324.480.camel@mightyatom> <61819e57d85f408d880d97dacc78fb6e@BLUPR05MB151.namprd05.prod.outlook.com> <0af301cf4613$aae41130$00ac3390$@olddog.co.uk> <1395859711.15324.605.camel@mightyatom> <08fc01cf4933$86a713d0$93f53b70$@olddog.co.uk> <E959A52E-0835-498E-800E-3BD8AC769925@piuha.net>
Content-Type: text/plain
Organization: Folly Consulting
Date: Thu, 27 Mar 2014 13:35:10 +0000
Message-Id: <1395927310.15324.609.camel@mightyatom>
Mime-Version: 1.0
X-Mailer: Evolution 2.26.3
Content-Transfer-Encoding: 7bit
Archived-At: http://mailarchive.ietf.org/arch/msg/gen-art/dwL36dKz0G7gTKRzm8iIgiutp2o
Cc: adrian@olddog.co.uk, 'General Area Review Team' <gen-art@ietf.org>, draft-ietf-mpls-tp-psc-itu.all@tools.ietf.org
Subject: Re: [Gen-art] Gen-art telechat review of draft-ietf-mpls-tp-psc-itu-03.txt

Hi Jari and Adrian,

Yup!  This looks like a good way forwards.

I haven't seen the proposed updates for the editorials, but I am sure they
will be fine.

Thanks,
Elwyn

On Thu, 2014-03-27 at 10:37 +0200, Jari Arkko wrote:
> Elwyn,
> 
> Thanks for the review & raising this important issue. Adrian's text suggestion looks like a good way to address the main concern (to me at least, and without going through the details myself). Your thoughts?
> 
> Jari
> 
> On Mar 26, 2014, at 10:39 PM, Adrian Farrel <adrian@olddog.co.uk> wrote:
> 
> > Hi Elwyn,
> > 
> >>> I understand from the document editor that there is a revision in waiting to be
> >>> posted that clears up some remaining nits from your review.
> >> I haven't seen this as yet, nor the psc-updates new version.
> > 
> > PSC-updates just posted (got stuck in a tools snafu).
> > Update to this I-D is pending until after IESG evaluation completes.
> > 
> >> I was trying to capture essentially two points:
> >> - As it stands this doc both introduces capabilities/modes and provides
> >> a definition of one new mode as well as redefining RFC 6378 as providing
> >> a basic mode.  In its introductory function I felt it needed to be
> >> explicit that if you in future wanted a new combination of capabilities
> >> maybe using some additional extra capabilities then you need to specify
> >> some extra mode in another doc that tells you the combination of bits in
> >> the TLV for the new mode.  OK, there might never be any other modes but
> >> I was told that wasn't the original plan, but I don't understand why it
> >> isn't helpful to be explicit here.
> > 
> > OK. I was distracted by the words in your proposed text :-)
> >>>>   Only combinations of capabilities specified by modes will be
> >>>>   supported by implementations.
> > I read this to mean that implementations *of*this*specification* would reject
> > combinations of capabilities that were not those specified in the two modes
> > described here.
> > My reaction was to say: but the text describes this in great detail.
> > 
> > Now I see what you are saying with respect to capabilities, so we could update
> > the last three paragraphs of the Introduction:
> > OLD
> >   This document introduces capabilities and modes.  A capability is an
> >   individual behavior.  The capabilities of a node are advertised using
> >   the method given in this document.  A mode is a particular
> >   combination of capabilities.  Two modes are defined in this document:
> >   PSC mode and Automatic Protection Switching (APS) mode.
> > 
> >   This document describes the behavior, the priority logic, and the
> >   state machine of the PSC protocol when all the capabilities
> >   associated with the APS mode are enabled.  The PSC protocol behavior
> >   for the PSC mode is as defined in [RFC6378].
> > 
> >   This document updates [RFC6378] by adding a capability advertisement
> >   mechanism.  It is recommended that existing implementations of the
> >   PSC protocol be updated to support this capability.  Backward
> >   compatibility with existing implementations is described in
> >   Section 9.2.1.
> > NEW
> >   This document introduces capabilities and modes.  A capability is an
> >   individual behavior.  The capabilities of a node are advertised using
> >   the method given in this document.  A mode is a particular
> >   combination of capabilities.  Two modes are defined in this document:
> >   PSC mode and Automatic Protection Switching (APS) mode.  Other modes
> >   may be defined as new combinations of the capabilities defined in
> >   this document or through the definition of additional capabilities. 
> >   In either case, the specification defining a new mode will be 
> >   responsible for documenting the behavior, the priority logic, and the
> >   state machine of the PSC protocol when the set of capabilities in the
> >   new mode are enabled.
> > 
> >   This document describes the behavior, the priority logic, and the
> >   state machine of the PSC protocol when all the capabilities
> >   associated with the APS mode are enabled.  The PSC protocol behavior
> >   for the PSC mode is as defined in [RFC6378].
> > 
> >   This document updates [RFC6378] by adding a capability advertisement
> >   mechanism.  It is recommended that existing implementations of the
> >   PSC protocol be updated to support this mechanism.  Backward
> >   compatibility with existing implementations that do not support this
> >   mechanism is described in Section 9.2.1.
> > 
> >   Implementations are expected to be configured to support a specific
> >   set of capabilities (a mode) and to reject messages that indicate the
> >   use of a different set of capabilities (a different mode). Thus, the
> >   capabilities advertisement is not a negotiation, but a verification
> >   that peers are using the same mode.
> > END
> > 
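The verification described in the last NEW paragraph can be sketched roughly as follows. This is only an illustration: the 32-bit flags word, the placement of the five capability bits, the idea that PSC mode corresponds to no capability bits set, and the names `CAP_BITS`, `MODES` and `verify_peer` are all assumptions made for the example, not values taken from the draft.

```python
# Assumption: a 32-bit flags word with five capability bits in the
# high-order positions. These are placeholder values, not the draft's.
CAP_BITS = [1 << (31 - i) for i in range(5)]

# Assumption: PSC mode sets no capability bits, APS mode sets all five.
MODES = {
    "PSC": 0,
    "APS": sum(CAP_BITS),
}


def verify_peer(configured_mode: str, received_flags: int) -> bool:
    """Return True only if the peer advertises exactly the flags of the
    locally configured mode. There is no negotiation step: any other
    value is a mismatch."""
    return received_flags == MODES[configured_mode]
```

A node configured for the hypothetical "APS" entry would accept `verify_peer("APS", sum(CAP_BITS))` and reject every other flags value, including the "PSC" value.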
> >> - I had understood from an email exchange with Huub that the authors had
> >> the concept that PSC messages with duff values (such as wrong lengths)
> >> would be picked off as 'invalid messages' and never make it to the main
> >> protocol engine (I assume that this will be addressed in the
> >> psc-updates).  Huub seemed to imply that a message with a set of
> >> capability bits that did not match a mode understood by the node would
> >> be treated as an invalid message rather than triggering the operator
> >> intervention.  This seemed sensible so that the alarm would only be
> >> triggered if an operator accidentally reconfigured a different mode.
> > 
> > Implementations can fiddle with their alarm thresholds. It is likely that an
> > implementation that has never talked will raise a flag at once; that an
> > implementation might soak an individual event; and that an implementation could
> > track soak-avoiding flip-flops. However, I don't believe in telling people how
> > to write good code if it doesn't change the bits on the wire. (Well, I do
> > believe in telling them, but I also believe in being paid to tell them :-)
> > 
> > The latest revision of the PSC-updates draft does not mention behaviour on
> > receiving a "malformed" or "unknown" TLV. We planned to raise this point as a
> > last call comment to ensure it is properly discussed, and I have done so just
> > now that last call has started.
> > 
> >> I think Stephen Farrell has picked up on the DoS aspect of this in his
> >> tracker comment.
> > 
> > I don't think Stephen was talking about DoS. I think he referred to fat fingers,
> > and my response was that this is not something dynamic in the configuration
> > within a network. You'll pick one mode for all your nodes, or you'll pick the/an
> > other mode. So if you get it wrong on a new box/interface it just won't come up
> > properly and you'll fix it.
> > 
> > The random splat is covered by sensible implementation and rarity.
> > A subverted node has far better things to do with its subversion.
> > A MitM attacker could tamper with these bits, but again, they could do far more
> > interesting things to packets if they are able to catch and modify them and
> > reconcile the lower layer, link-level security.
> > 
> > Cheers,
> > Adrian
> > 
> >> 
> >> Regards,
> >> Elwyn
> >>> 
> >>>> Summary:
> >>>> Almost ready.  There are a couple of points which I raised at Last Call
> >>>> and discussed with the authors and others both by email and f2f in
> >>>> London that are not resolved.  These points revolve around being rigorous
> >>>> about wire encoding, clarifying error behaviour and being definite that
> >>>> implementations support modes as specific combinations of capabilities so
> >>>> that arbitrary capability combinations are not allowed and result in
> >>>> invalid protocol messages.
> >>>> 
> >>>> Major Issues:
> >>>> 
> >>>> Minor Issues:
> >>>> s1: From my discussions with the authors and others associated with this
> >>>> document, it is my understanding that the intention is that only
> >>>> combinations of capabilities specified by modes should be legal and
> >>>> hence that implementations would support modes rather than arbitrary
> >>>> sets of capabilities. I think it would be worth being more explicit about
> >>>> this.  This would answer my comments at Last Call that it was unclear
> >>>> whether other combinations were allowed and would make it clear that a
> >>>> message that arrived with a corrupted bit in the flags field was
> >>>> definitely malformed.  I suggest adding the following text to para 16 of
> >>>> s1 (starts "This document introduces capabilities and modes.") before
> >>>> the last sentence:
> >>>>   Only combinations of capabilities specified by modes will be
> >>>>   supported by implementations.
> >>> 
> >>> While this is true, it is also not helpful!
> >>> Any combination of capabilities (these five and any of the future
> >>> nearly-infinite number of capabilities that can be represented in the bit field)
> >>> could be specified as supported (i.e. as a mode) in the future.
> >>> There are two points of note:
> >>> 1. Only two modes are currently defined
> >>> 2. Any future mode must be specified in combination with the state machines
> >>>    for the mode.
> >>> 
> >>> A message that is received containing a set of capabilities (i.e. a mode) not
> >>> supported by the receiver would be rejected. See Section 9.1.1. That is, this is
> >>> not a negotiation. This is a verification that both speakers are operating in
> >>> the same mode.
> >>> 
> >>> For future compatibility, there is no distinction between a corrupted set of
> >>> capability bits and an unknown mode.
> >>> 
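A small sketch of why the receiver cannot tell the two cases apart. The flags value `APS_FLAGS` and the function name `check_flags` are placeholders assumed for this example; the actual bit assignments are whatever the draft specifies.

```python
# Assumption: placeholder flags value for the locally configured mode.
APS_FLAGS = 0xF8000000


def check_flags(configured: int, received: int) -> str:
    """The receiver only sees the received bits, so its verdict depends
    on nothing but an exact comparison with its own configuration."""
    return "match" if received == configured else "mismatch"


# Case 1: one capability bit is flipped in transit, undetected below.
corrupted = APS_FLAGS ^ (1 << 29)

# Case 2: the sender really was configured without that capability.
reconfigured = 0xD8000000

# Both cases put identical bits on the wire.
same_on_wire = corrupted == reconfigured
```

Here `same_on_wire` is true and `check_flags` reports a mismatch in both cases, which is the point made above: the receiver has no basis for treating them differently.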
> >>>> Nits/Editorial Comments:
> >>>> 
> >>>> s4.4, para 1:
> >>>> OLD:
> >>>> When the modified priorities specified in this document is in use,..
> >>>> NEW:
> >>>> When the modified priorities specified in this document are in use,..
> >>>> (or maybe better:)
> >>>> When the modified priority order specified in this document is in use,..
> >>>> 
> >>>> s7.3 et seq: The term "selector bridge" is introduced without
> >>>> definition.  I suspect it is a piece of jargon I am supposed to know but
> >>>> I think a reference would help.
> >>> 
> >>> Yes, it is a piece of standard terminology in protection switching. I'm sure the
> >>> authors can find a reference.
> >>> 
> >>>> s9.1: RFC 6378 doesn't define the encoding of the TLV type and TLV
> >>>> length fields, so it needs to be done here (Unsigned integers). It also
> >>>> doesn't define encoding of the overall TLV length field in
> >>>> the PSC header.  This may be thought to be 'obvious' but there is no
> >>>> default specified in IETF documents.
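To make the encoding concern concrete, here is a minimal sketch of what an explicit definition would pin down. Everything specific in it is an assumption for illustration: 16-bit unsigned Type and Length fields, network byte order, a Length that counts only the value octets, and the helper names `encode_tlv` and `decode_tlv`. None of these choices is stated in this thread, which is exactly why the text needs to state them.

```python
import struct

# Assumption: Type and Length are 16-bit unsigned integers in network
# byte order, and Length counts the value octets only.
_HEADER = struct.Struct("!HH")


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Serialise one TLV as Type, Length, Value."""
    return _HEADER.pack(tlv_type, len(value)) + value


def decode_tlv(data: bytes) -> tuple[int, bytes]:
    """Parse one TLV, rejecting input that is shorter than its own
    header or shorter than its declared length."""
    if len(data) < _HEADER.size:
        raise ValueError("TLV header truncated")
    tlv_type, length = _HEADER.unpack_from(data)
    end = _HEADER.size + length
    if len(data) < end:
        raise ValueError("TLV value truncated")
    return tlv_type, data[_HEADER.size:end]
```

With a hypothetical type code of 1 and a 4-octet flags value, `encode_tlv(1, struct.pack("!I", 0xF8000000))` yields the octets `00 01 00 04 f8 00 00 00`. A different byte order or field width would yield different octets, so two implementations that guess differently would not interoperate.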
> >>> 
> >>> This is being fixed in draft-ietf-mpls-psc-updates that updates RFC 6378. New
> >>> revision about to be posted before IETF last call.
> >>> 
> >>>> s9.1: Both RFC6378 and this document are incomplete as regards
> >>>> specifying what constitutes an invalid protocol message.  In particular
> >>>> there is no discussion of behaviour if correctly formed but unrecognized
> >>>> TLVs are received.  Do these make the message invalid or should they be
> >>>> ignored?
> >>> 
> >>> This should be included in draft-ietf-mpls-psc-updates as well.
> >>> 
> >>>> s9.1.1 and s12:
> >>>> In s12 it is stated (similar wording in s9.1.1):
> >>>>>   o  If the Capabilities TLV mismatches, the node MUST alert the
> >>>>>      operator and MUST NOT perform any protection switching until the
> >>>>>      operator resolves the mismatch in the Capabilities TLV.
> >>>> Having discussed the situation with the authors and others, I understand
> >>>> that there are circumstances, depending on the underlying transport,
> >>>> that bit errors might not be detected and hence that there is a small
> >>>> probability that corrupt PSC messages may be propagated up to the
> >>>> protocol machine.  At present there is no explicit statement that a
> >>>> corrupted flag word would be trapped as an invalid protocol message
> >>>> (this seems to be the intention) rather than triggering this operator
> >>>> alert.  I think that the best that can be done is specify that a PSC
> >>>> protocol message MUST have the flags for a recognized mode set exactly
> >>>> and otherwise it will be treated as an invalid message.  The wording in
> >>>> s9.1.1 and s12 would then catch an inadvertent reconfiguration.  I
> >>>> suggest adding the following to s9.1.1:
> >>>>   Any PSC message that has a combination of capability bits set that
> >>>>   does not correspond to a defined mode will be treated as an invalid
> >>>>   message and ignored.
> >>> 
> >>> This is plain wrong!
> >>> The receiving device is set to operate in a single mode.
> >>> If the received message is not identical to that mode, it cannot operate.
> >>> Section 9.1.1 already explains how this is handled.
> >>> To restate: this is not a negotiation.
> >>> It is an announcement.
> >>> 
> >>> The possibility of a corrupt message does exist. Neutrinos are remarkably
> >>> unpredictable beasts. And it is remotely possible that the error will arise
> >>> without the underlying transport detecting it. And it is further possible that
> >>> the error will take out a single bit in the capabilities. The result is
> >>> indistinguishable from the sender deciding to tweak its capabilities. That would
> >>> cause a mode mismatch and the process in 9.1.1 would be invoked. Given that it
> >>> is indistinguishable, why would this be a cause for any different behaviour?
> >>> 
> >>> BTW, Stewart was asking some time back whether there was any record anywhere of
> >>> an MPLS packet that had been misdelivered because the label had had a corruption
> >>> event on the wire. We didn't come up with anything and the general feeling was
> >>> that hardware memory was far more vulnerable.
> >>> 
> >>> _______________________________________________
> >>> Gen-art mailing list
> >>> Gen-art@ietf.org
> >>> https://www.ietf.org/mailman/listinfo/gen-art