Re: Routing Directorate comments on draft-ietf-ccamp-automesh-01

"Adrian Farrel" <adrian@olddog.co.uk> Thu, 21 September 2006 17:00 UTC

Reply-To: Adrian Farrel <adrian@olddog.co.uk>
From: Adrian Farrel <adrian@olddog.co.uk>
To: JP Vasseur <jvasseur@cisco.com>
Cc: ccamp@ops.ietf.org, Ross Callon <rcallon@juniper.net>, rtg-dir@cisco.com
References: <0ae101c6ce93$8e36fac0$89849ed9@your029b8cecfe> <A2912454-4C2A-439E-8053-C247B6FDA987@cisco.com>
Subject: Re: Routing Directorate comments on draft-ietf-ccamp-automesh-01
Date: Thu, 21 Sep 2006 17:48:53 +0100
Organization: Old Dog Consulting

Hi JP,

Thanks for addressing the comments. I have forwarded these to the Routing 
Directorate and copied them on this email to let them respond if they want. 
But here are my comments:

>> 1) The Tail-end name field facilitates LSP identification. Is this
>> a new form of LSP identification?
>> If it is not new, then there should be a reference to RFC3209 and a
>> statement of which RFC3209 fields are mapped to this IGP field.
>> If it is new then there is a significant concern that a new
>> identification is being introduced when it is not needed.
>
> As indicated in the document the string refers to a "Tail-end" name,
> not a TE LSP name: thus it does not replace the session name of the
> SESSION-ATTRIBUTE object defined in RFC3209.

Hmmm, yes it is not an LSP name, but recall that the LSP is identified by a 
combination of Session and Sender Template, and that the Session includes 
the destination IP address. In Section 3.2 I see:
   - A Tail-end name: string used to ease the TE-LSP naming.
and in Section 4.1:
   - A Tail-end name: a variable length field used to facilitate the TE
   LSP identification.

These definitions seem to imply that the tail-end name is used as an 
identifier for the LSP. The question that will be asked is: How does this 
identification of an LSP differ from the conventional identification of the 
LSP?  Given that you also have:
   - A Tail-end address: an IPv4 or IPv6 IP address to be used as a
   tail-end TE LSP address by other LSRs belonging to the same mesh-
   group
it appears that the tail-end name is superfluous information.

So, perhaps the name is present for diagnostic purposes? Perhaps it is there 
to ease OAM? But it does not seem to play any role in the protocol 
procedures as it is not explicitly mentioned later in the I-D (e.g. Section 
5).

How would a node behave if it received a mesh group advertisement that 
indicated a tail-end address that did not appear to match its record of the 
tail-end name?

>> 2) The document mentions that the number of mesh groups is limited
>> but potentially (depending on encoding) provides for binary
>> encoding for 2^32-1 groups (although this might be constrained by
>> OSPF's limit of a TLV size to 2^16 bytes).
>> The document (and the authors) state that scaling of these
>> extensions is not an issue because only a small number of mesh
>> groups are likely to be in existence in a network, and any one
>> router is unlikely to participate in more than a very few.
>> There are two concerns:
>> a) Whenever we say that something in the Internet is limited,
>> history usually proves us wrong.
>
> And that's undoubtedly good news :-)
>
>> Indeed, there is already a
>> proposal (draft-leroux-mpls-p2mp-te-autoleaf-01.txt) that uses a
>> similar mechanism for a problem that would have far more groups.
>
> Two comments:
> - Mesh groups are used to set up TE LSP meshes. If we consider, let's
> say, 10 meshes comprising 100 routers each, that gives us 99,000 TE
> LSPs. One can easily see that the number of meshes is unlikely to
> explode in the foreseeable future. If it turns out to be the case,
> we'll have other scalability issues to fix before any potential issue
> with the IGP.

What about 100 meshes comprising 10 routers each?
I make that only 9,000 TE LSPs.

So clearly the scaling of MPLS-TE is not directly related to the scaling of 
automesh.
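
For anyone who wants to check the two figures above, here is a minimal sketch of the arithmetic. It assumes each mesh is a full mesh of unidirectional TE LSPs (every member signals one LSP to every other member) and that the meshes are counted independently; the function name is made up for illustration and is not from the I-D.

```python
def full_mesh_lsps(meshes: int, routers_per_mesh: int) -> int:
    """Count unidirectional TE LSPs in `meshes` independent full meshes.

    Assumption: each of the n routers in a mesh sets up one LSP to each
    of the other n - 1 routers, giving n * (n - 1) LSPs per mesh.
    """
    return meshes * routers_per_mesh * (routers_per_mesh - 1)


# The two cases discussed in this thread.
for meshes, routers in [(10, 100), (100, 10)]:
    total = full_mesh_lsps(meshes, routers)
    print(f"{meshes} meshes x {routers} routers each: {total} TE LSPs")
```

The LSP count grows with the square of the mesh size but only linearly with the number of meshes, which is why the number of advertised mesh groups can rise tenfold while the LSP count falls.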

What this comes down to is your statement about how automesh will be used. I 
think we can all accept that this is the problem space that you intend to 
deploy in, and that is great. But the original point from the Routing 
Directorate was that there is nothing in the I-D that imposes this 
restriction. So how can we say that the protocol extensions will scale?

> - More importantly, the dynamics of joining a TE mesh are such that
> IGP updates are used to advertise TE mesh group membership changes
> (join or prune), which are indeed expected to be very infrequent.

Again, the concern raised is that the problem space you intend to deploy in 
is, indeed, limited in this way. All good. But how can we say whether the 
protocol extensions will be used differently in the future? What controls 
are there over constructing a mesh where joins and prunes are frequent?

>> b) The I-D does not itself impose any reasonable limits on the
>> number of groups with the potential for a single router (by
>> misconfiguration, design, or malice) advertising a very large
>> number of groups.
>> Thus, it appears that the scaling concerns are not properly
>> addressed in this I-D.
>
>Not sure I see the point here. If indeed a large number of
>TE-MESH-GROUPs were advertised, this would not impact the other LSRs
>since they would not create any new TE LSPs trying to join the new
>TE-MESH-GROUP. In terms of the amount of flooded information, this
>should not be a concern either (handled by routing). We clarified
>this in the security section.

The impact on the other LSRs is exactly the flooding question. Covering that in 
the security section is fine for the misconfiguration and malice cases.

>> 3) The document mentions that "The TE-MESH-GROUP TLV is OPTIONAL
>> and must at most appear once in a OSPF Router Information LSA or
>> ISIS Router Capability TLV." but for addition/removal it mentions
>> "conversely, if the LSR leaves a mesh-group the corresponding entry
>> will be removed from the TE-MESH-GROUP TLV."
>> What are these "entries" referring to - that there is a top-level
>> TE-MESH-GROUP TLV with multiple sub-TLVs (but the document mentions
>> "No sub-TLV is currently defined for the TE-mesh-group TLV") ?
>>
>> AF>> My comment on this is that the definition of the TLVs seems
>> AF>> unclear.
>> AF>> From figure 2, it appears that some additional information can be
>> AF>> present in the TLV after the fields listed, and (reading
>> AF>> between the lines) it would appear that this additional
>> AF>> information is a series of repeats of the set of fields to
>> AF>> define multiple mesh groups.
>> AF>> This could usefully be clarified considerably.
>
>
> You're absolutely right. The figures have been modified:
>
> (example shown below):

[SNIP]
Looks good to me.

>> AF>> But it is now unclear to me whether a single router can be a
>> AF>> member of IPv4 and IPv6 mesh groups. It would seem that
>> AF>> these cannot be mixed within a single TLV, and multiple
>> AF>> TLVs (one IPv4 and one IPv6) are prohibited.
>
> OK the text requires some clarification. What is prohibited is to
> have two IPv4 sub-TLV or two IPv6 sub-TLV but one of each is
> permitted. New proposed text to clarify:
>
> The TE-MESH-GROUP TLV is OPTIONAL and at most one IPv4 instance and
> one IPv6 instance MUST appear in a OSPF Router Information LSA or
> ISIS Router Capability TLV. If the OSPF TE-MESH-GROUP TLV (IPv4 or
> IPv6) occurs more than once within the OSPF Router Information LSA,
> only the first instance is processed, subsequent TLV(s) will be
> silently ignored. Similarly, If the ISIS TE-MESH-GROUP sub-TLV (IPv4
> or IPv6) occurs more than once within the ISIS Router capability TLV,
> only the first instance is processed, subsequent TLV(s) will be
> silently ignored.

OK. That's fine.
I think you want to make a couple of changes:
- "at most one instance MUST appear" is ambiguous since it will
  be confused with "an instance MUST appear". I suggest you
  reword as "MUST NOT include more than one of each of"
- "If the OSPF TE-MESH-GROUP TLV (IPv4 or IPv6) occurs
  more than once" should really be phrased as "If either the
  IPv4 or IPv6 OSPF TE-MESH-GROUP TLV occurs more
  than once".  Ditto for the IS-IS sub-TLV.
- Two instances of "will be silently ignored" should read "SHOULD
  be silently ignored"
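
To make the intended receiver behaviour concrete, here is a minimal sketch of the rule as reworded above: at most one IPv4 and one IPv6 TE-MESH-GROUP instance is processed, and any repeat of a type already seen is silently ignored. The type labels, the (type, value) tuple representation, and the function name are all assumptions for illustration; they are not code points or structures taken from the I-D.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical labels standing in for the real TLV/sub-TLV code points.
IPV4_MESH_GROUP = "te-mesh-group-ipv4"
IPV6_MESH_GROUP = "te-mesh-group-ipv6"
MESH_GROUP_TYPES = (IPV4_MESH_GROUP, IPV6_MESH_GROUP)


def select_mesh_group_tlvs(tlvs: Iterable[Tuple[str, bytes]]) -> Dict[str, bytes]:
    """Return the first IPv4 and first IPv6 TE-MESH-GROUP instance seen.

    Later instances of a type that has already been accepted are
    skipped without raising an error (i.e. silently ignored).
    """
    selected: Dict[str, bytes] = {}
    for tlv_type, value in tlvs:
        if tlv_type not in MESH_GROUP_TYPES:
            continue  # unrelated TLVs are outside the scope of this sketch
        if tlv_type in selected:
            continue  # duplicate instance: silently ignored
        selected[tlv_type] = value
    return selected


# Example: the second IPv4 instance is dropped, one of each type remains.
received = [
    (IPV4_MESH_GROUP, b"first-v4"),
    (IPV6_MESH_GROUP, b"first-v6"),
    (IPV4_MESH_GROUP, b"second-v4"),
]
print(select_mesh_group_tlvs(received))
```

Note that this treats IPv4 and IPv6 duplicates independently, which is the reading the second suggested rewording is meant to force.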

>> 4) Small terminology issue in section 5.1 it says: "Note that both
>> operations can be performed in the context of a single refresh."
>> This is not a refresh. It is a trigger/update. A better term for
>> OSPF would be "LSA origination".
>
>OK fixed (I used the term "Update"), thanks.

OK

>> 5) Please state the applicability to OSPF v2 and/or v3. Note that
>> the Router_Cap document covers both v2 and v3
>
>Indeed, Thanks for the comments.  The OSPFv3 aspects have been
>incorporated. Here is the new text:

[SNIP]
OK

>> 6) The term "fairly static" at the end of section 5.1 is
>> meaningless without some relative context.
>> Presumably this relates to the number times an LSR joins or leaves
>> a mesh group over time.
>> Is it intended to be relative to the IGP refresh period?
>> Please clarify in an objective rather than a subjective way.
>
>
> Right, this requires clarification. Here is the new text: Moreover,
> TE mesh-group membership should not change frequently: each time an
> LSR joins or leaves a new TE mesh-group.

I could live with this, personally. We'll see whether we get any more 
comments.
I think the nub will be:
1. whether your "should not" can be "SHOULD NOT"
2. what does "frequently" mean?
3. what is there in this I-D to say that an LSR does not join/leave a
   TE mesh-group very often?

> I guess that this is sufficiently explicit: it is a well-known fact
> that LSRs are infrequently added to or removed from a TE mesh.

:-) Very well known. In fact, my mother was commenting on it to me only the 
other day ;-)

Consider the case where PE membership of an automesh is dependent on whether 
there are C-nodes subscribed to some service.

Perhaps this well known fact could be noted in the Introduction to this I-D 
which is AFAIK the only IETF document on the subject of automesh.

>> 7) The security section (section 8) is inadequate and will
>> undoubtedly be rejected by the security ADs. At the very least, the
>> I-D needs a paragraph (i.e. more than one or two lines) explaining
>> why there are no new security considerations. But what would be the
>> impact of adding false mesh groups to a TLV? Is there anything
>> (dangerous) that can be learned about the network by inspecting
>> mesh group TLVs?
>
> The following section has been added:

[SNIP]
OK. Let's run with that and see how much we get beaten up by the Security 
experts.

Cheers,
Adrian