Re: [bess] Comments on draft-ietf-bess-ir

<thomas.morin@orange.com> Tue, 29 September 2015 11:17 UTC

From: thomas.morin@orange.com
To: Eric C Rosen <erosen@juniper.net>, draft-ietf-bess-ir@ietf.org, BESS <bess@ietf.org>
References: <20534_1443086368_5603C020_20534_2608_3_5603C01F.70607@orange.com> <56094630.40606@juniper.net>
Organization: Orange
Message-ID: <5536_1443525428_560A7334_5536_15342_1_560A7332.8020003@orange.com>
Date: Tue, 29 Sep 2015 13:17:06 +0200
In-Reply-To: <56094630.40606@juniper.net>
Archived-At: <http://mailarchive.ietf.org/arch/msg/bess/CwWujDrxiXavp0ZStWqWb9E7hvA>
Cc: "Jeffrey (Zhaohui) Zhang" <zzhang@juniper.net>, Karthik Subramanian <kartsubr@cisco.com>
Subject: Re: [bess] Comments on draft-ietf-bess-ir
List-Id: BGP-Enabled ServiceS working group discussion list <bess.ietf.org>

Hi Eric,


2015-09-28, Eric C Rosen:
>  From the draft:
>
>      "This document does not provide any new protocol elements or
> procedures"
>
> I think we can agree that it does not specify any new protocol elements.
>
>  > [Thomas] Sections 3, 4.1.1 and 9, at least, introduce what I think
> can fairly be considered new procedures.
>
> I don't see anything in section 3 or 4.1.1 that I would call "new
> procedures".
>
> However, your point is well-taken about section 9, as RFC6514 does not
> really address the use of timers to achieve "make before break"
> functionality.  On the other hand, RFC 6513 section 7 does specify the
> use of timers when switching a flow from one P-tunnel to another, so the
> use of timers is not a new addition.
>
> When we started implementing ingress replication, we found that it
> wasn't always very clear how to apply the procedures of RFC6514 when
> ingress replication is being used.  The purpose of this draft is to pull
> together into one place all the procedures relevant to ingress
> replication, and to explain clearly how ingress replication is done
> using the procedures of RFC6514.  The focus is on getting it clear
> enough to increase the likelihood of multi-vendor interoperability.  We
> really tried hard to avoid creating any new IR-specific procedures,
> though section 9 may be an exception.

And I fully agree that the specs do fit this intention, but one 
exception is enough to make the assertion wrong.

I would suggest distinguishing intent from strict truth, e.g. by
replacing the quoted sentence with "To bring the required clarifications,
this document updates the behavior specified by RFC6514, but does so
without introducing new protocol elements or any fundamentally new
procedures". Or something along these lines.


>  From the draft:
>
>      "4.1. Advertised P-tunnels The procedures in this section apply
> when the P-tunnel to be joined has been advertised in an S-PMSI A-D
> route, an Inter-AS I-PMSI A-D route, or an Intra-AS I-PMSI A-D route."
>
>  > For sake of clarity and avoid any misinterpretation, can you please
> add ", and the PMSI Tunnel Attribute is of type Ingress Replication"
>
> Well, section 4 is called "How to Join an IR P-tunnel", and the entire
> draft is exclusively about IR P-tunnels.  If you think that is not
> clear, perhaps the sentence above should just say "when the IR P-tunnel
> to be joined has been ..."

Yes, that would be just fine.


>  From the draft:
>
>      "Note that if a set of IR P-tunnels is joined in this manner, the
> "discard from the wrong PE" procedures of [RFC6513] section 9.1.1 cannot
> be applied to that P-tunnel.  Thus duplicate prevention on such IR
> P-tunnels requires the use of either Single Forwarder Selection
> ([RFC6513] section 9.1.2) or native PIM procedures ([RFC6513] section
> 9.1.3).
>
> [Thomas] I would suggest rewording with "Note that, in the general case,
> ..."  and "...unless the tunneling technique relies on an IP transport,
> which may allow the identification of the PE sourcing the traffic".
>
> It is certainly true in theory that one could use an IP encapsulation in
> this way, but in practice it creates a couple of complications:
>
> - I think it presupposes that the IP source address field of the
> tunneled packets contains the same IP address that the ingress PE puts
> in the Global Administrator field of the VRF Route Import EC that it
> attaches to the unicast routes that it distributes.

(I guess it could use a different one and be made to advertise which one 
to expect in a BGP attribute.)

> - All the egress PEs need to implement this IP address check in the data
> plane forwarding path.

Yes, and this is already true in RFC6513.

> While using the IP encapsulation in this way is a possible option, it
> has never seemed like a very attractive option, and as far as I know, no
> one has implemented it.
>
> To avoid the need for an option like this, I always recommend that if
> one wants to use IR by default, one should advertise the IR P-tunnels in
> a (C-*,C-*) S-PMSI A-D route rather than in an Intra-AS I-PMSI A-D
> route.  One can still use IP tunnels if one wants, but the "discard from
> the wrong PE" procedures would be based on the MPLS label that is
> carried by the IP payload.

I would tend to agree that the choice made makes sense.

It is however better not to make it look like the only possible design
choice, as the wording "'discard from the wrong PE' procedures of
[RFC6513] section 9.1.1 cannot be applied to that P-tunnel" does, to
avoid misleading future readers.

I think that at least "[procedure xyz] cannot be applied to that 
P-tunnel, in the general case," would be better.


> Another problem with using the IP header to apply the "discard from the
> wrong PE" procedure is that it will not easily generalize to the case of
> extranet.  (Still another problem would be that it is just one more
> unnecessary option.)
>
> I could add some text explaining this, and explaining why it is not
> recommended to use the IP header to apply the "discard from the wrong
> PE" procedure.

Yes, this would be useful to document in one or two paragraphs in an 
Appendix for instance.

> Now, regarding the use of timers when switching UMH ...
>
> [Thomas] I understand -- even if that is a bit implicit -- that the NLRI
> for the Leaf A-D route to the old UMH is the same as the NLRI for the
> Leaf A-D route to the new UMH.
>
> Correct.

See below: a lot is left implicit in the sentence as currently
written, too much for me to understand it correctly on a first reading.

>
> [Thomas] But I don't in fact understand why this has to be the case...
>
> Leaf A-D routes are originated in response to I/S-PMSI A-D routes, and
> the rules for creating the NLRI of a Leaf A-D route, as specified in
> RFC6514, are independent of the tunnel type.

I agree with that.

> [Thomas] One has to ignore the procedures to build a Leaf A-D route of
> RFC6514 since this document specifies new ones for IR in section 4.1.1
>
> I don't understand why you say that.  The 4.1.1 rules for generating the
> NLRI of a Leaf A-D route follow the RFC6514 procedures.

(see below)

>
> [Thomas] section 4.1.1 says that the Key field of the Leaf A-D route
> contains the "tunnel identifier" defined in section 3
>
> Yes; the tunnel identifier defined in section 3 is the NLRI of the
> corresponding I/S-PMSI A-D route, which is exactly that RFC6514
> specifies for the route key.

(see below)

> [Thomas] section 3 says that (when the "Leaf info required" bit is set,
> which is the case for section 4.1.1) the tunnel identifier is
> RECOMMENDED to be a routable address of the router that built the PTA
>
> No; section 3 says that the "tunnel identifier" field of the PTA is
> recommended to be a routable address of the router that built the PTA.
> But section 3 also tries to make it clear that the identifier of the IR
> P-tunnel does not appear in the tunnel identifier field of the PTA.

I have re-read section 3 and now see why I had initially misunderstood
section 4.1.1. Section 3 does in fact say that ''the identifier of an IR
P-tunnel is not the "Tunnel Identifier" of the PTA'', which is pretty
close to "the tunnel identifier is not the tunnel identifier".

When you read Section 4.1.1, the phrase "MUST contain the tunnel
identifier (as defined in Section 3 above)" might be misunderstood,
especially because this time "IR P-tunnel identifier" has become just
"tunnel identifier" (which might be read as Tunnel Identifier with the
uppercase missing). The misreading is even more likely because one tends
to assume that "MANDATORY" wording relates to new things that one has to
be careful about, rather than to a mere repeat of an existing spec.

I would suggest the following wording:

Current text:
    Once the UMH is determined, the router joining the IR P-tunnel
    originates a Leaf A-D route.  The NLRI of the Leaf A-D route MUST
    contain the tunnel identifier (as defined in Section 3 above) as its
    "route key".

Proposal:
    Once the UMH is determined, the router joining the IR P-tunnel
    originates a Leaf A-D route following the procedures in RFC6514;
    i.e. the NLRI of the Leaf A-D route MUST be set to the NLRI of
    the route triggering the join (which happens to be the IR P-tunnel
    identifier, as defined in section 3, and distinct from the PTA
    Tunnel Identifier field).
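
To make the distinction in this proposal concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class and function names are hypothetical (they come from neither the draft nor RFC6514), routes are reduced to a few opaque fields, and the Route Target is simplified to the UMH address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PmsiADRoute:
    """Hypothetical, simplified I/S-PMSI A-D route."""
    nlri: bytes                  # NLRI of the route; this is the IR P-tunnel identifier
    pta_tunnel_identifier: str   # "Tunnel Identifier" field of the PTA (a different thing)


@dataclass(frozen=True)
class LeafADRoute:
    """Hypothetical, simplified Leaf A-D route."""
    route_key: bytes    # copied from the NLRI of the triggering route
    originator: str     # address of the router joining the tunnel
    route_target: str   # simplified here: identifies the chosen UMH


def originate_leaf_ad(trigger: PmsiADRoute, umh: str, local_address: str) -> LeafADRoute:
    # The route key is the NLRI of the route triggering the join,
    # never the PTA Tunnel Identifier field.
    return LeafADRoute(route_key=trigger.nlri,
                       originator=local_address,
                       route_target=umh)


# Example values are made up (documentation address range).
trigger = PmsiADRoute(nlri=b"\x03example-s-pmsi-nlri",
                      pta_tunnel_identifier="192.0.2.1")
leaf = originate_leaf_ad(trigger, umh="192.0.2.1", local_address="192.0.2.7")
```

The only point of the sketch is that `route_key` is taken from `trigger.nlri`, so the PTA field plays no role in building the Leaf A-D route NLRI.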


> [Thomas] Anyhow, it seems to me that ensuring that the Key changes when
> the UMH changes, would simplify the make before break procedure:
> everything is at the hand of the downstream PE which can advertise both
> routes for as long as it wishes,
>
> That does not seem to me to be a simplification.  The specified
> procedure is pretty simple:
>
> - To change parents, only a single control plane operation is needed: a
> change in the RT of the Leaf A-D route.

Note that I haven't implied anywhere that re-originating a new route
would be problematically complex.

After a thorough re-reading of section 3, I only now understand why I
initially misunderstood the statement that "only a change in the RT of
the Leaf A-D route is needed".

Let me suggest a rewording that may keep other readers from getting
lost as I was...

Current text:
    Suppose a child node has joined a particular IR P-tunnel via a
    particular UMH, and it now determines (for whatever reason) that it
    needs to change its UMH on that P-tunnel.

A lot is in fact left implicit in this sentence: "joined ... via"
and "a particular P-tunnel"/"that P-tunnel" refer to the particulars in
sections 3 and 4.1.1.

Proposal:
    Suppose a child node has joined a particular IR P-tunnel via a
    particular UMH (following procedures in section 4), and it now
    determines (for whatever reason) that it needs to change its UMH
    on that P-tunnel (same tunnel identifier as defined in Section 3).
    This can for instance arise on a change of UMH for an intermediate
    node in a deployment where segmented trees are used.
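
As an illustration of what "same tunnel identifier" means for the control plane, the following Python sketch models a UMH change as an update of the Route Target only, with the NLRI left untouched. This is a simplified reading of the discussion above, not text from the draft: the types and names are hypothetical, and the Route Target is reduced to the UMH address.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LeafADRoute:
    """Hypothetical, simplified Leaf A-D route."""
    route_key: bytes    # IR P-tunnel identifier (NLRI of the triggering route)
    originator: str     # address of the child node
    route_target: str   # simplified here: identifies the current UMH

    @property
    def nlri(self):
        # In this sketch the NLRI is the route key plus the originator.
        return (self.route_key, self.originator)


def change_umh(leaf: LeafADRoute, new_umh: str) -> LeafADRoute:
    # Single control-plane operation: same NLRI, different Route Target.
    return replace(leaf, route_target=new_umh)


# Example values are made up (documentation address range).
old = LeafADRoute(b"ir-p-tunnel-id", "192.0.2.7", "192.0.2.1")
new = change_umh(old, "192.0.2.2")
```

Because `new.nlri` equals `old.nlri`, the update is seen as a modification of the existing route rather than as a second route, which is why the switchover has to be smoothed with timers rather than by advertising both routes.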


> - In both the upstream and the downstream node, the to-be-deleted data
> plane state is timed out.
>
> - There are no data-driven state changes. (Note that to avoid
> data-driven state changes, the downstream node really needs to run a
> timer in order to decide when to modify its data plane state.)
>
> - The timers do not need to be very precisely tuned, and certainly do
> not need to be tuned on a per-peer basis.
>
> - We retain the RFC6514 principle of keeping the NLRI independent of the
> tunnel type.  Thus we minimize the chances of creating unintended
> side-effects or new corner cases that need to be thought out.  That is,
> we minimize the chances of breaking existing MVPN implementations in
> unanticipated ways.

The above is a very precise refutation of issues that I hadn't even
raised. If PETA were taking care of strawmen, I would certainly alert
them at once ;)

You have left uncommented the one reason I had given to illustrate the 
complexity of this solution: with the specs as they are, somebody will 
have to write code to make these two timers tunable, somebody will have 
to test these new settings, somebody will have to map that into a YANG
model (or similar), and somebody will have to support that in an OSS 
tool and use it to force consistent values on all PEs/A(S)BRs.

After getting a better understanding of the procedures, I agree they are 
useful, under the condition that a reasonable default for each of the 
two timers is standardized in the specs (so that they can be implemented 
viably even before all the actions described above happen).

I would propose:

    An implementation of these specs SHOULD offer means to configure
    the values of timers 1 and 2. An implementation of these specs MUST
    have a default value for timer 1 of at least [T1] seconds and a
    value of timer 2 of at most [T2] seconds.

T1 and T2 are then left to be determined, with [T2] < [T1].
The target is to have T2 large enough to make it likely that the new UMH 
has received and processed the route.

I would offer T2=60s and T1=120s.
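
For what it is worth, here is a minimal Python sketch of how an implementation might carry these two timers as configuration with the defaults offered above. It is an illustration only: the names are hypothetical, and applying the "[T2] < [T1]" relation to the configured values themselves is an assumption of this sketch, not something the draft states.

```python
from dataclasses import dataclass

# Defaults offered in this message; the draft does not define them.
DEFAULT_TIMER1_S = 120
DEFAULT_TIMER2_S = 60


@dataclass(frozen=True)
class IrSwitchTimers:
    """Hypothetical configuration holder for timers 1 and 2."""
    timer1_s: int = DEFAULT_TIMER1_S
    timer2_s: int = DEFAULT_TIMER2_S

    def __post_init__(self):
        if self.timer1_s <= 0 or self.timer2_s <= 0:
            raise ValueError("timers must be positive")
        # Assumption of this sketch: keep timer 2 strictly below timer 1.
        if self.timer2_s >= self.timer1_s:
            raise ValueError("timer 2 must be shorter than timer 1")


defaults = IrSwitchTimers()                        # works with no configuration at all
tuned = IrSwitchTimers(timer1_s=30, timer2_s=10)   # operator-tuned values
```

With standardized defaults, `IrSwitchTimers()` is usable before any configuration model or OSS support exists, which is the point of the proposal.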

Of course, setups that want finer tuning to optimize bandwidth will
typically use the tuning knobs to change the timers.

Comments?

-Thomas

