Re: [Idr] Open issues with Comments on draft-vandevelde-idr-remote-next-hop-08

Eric Rosen <erosen@juniper.net> Thu, 23 October 2014 14:09 UTC

From: Eric Rosen <erosen@juniper.net>
To: Susan Hares <shares@ndzh.com>, idr wg <idr@ietf.org>
Date: Thu, 23 Oct 2014 14:09:23 +0000
Message-ID: <d63e824a7984483a8f5cfb59ba7f7520@DM2PR0501MB1104.namprd05.prod.outlook.com>
References: <013001cfe8d8$a8535fd0$f8fa1f70$@ndzh.com>
In-Reply-To: <013001cfe8d8$a8535fd0$f8fa1f70$@ndzh.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/idr/vvbCBYseySUnEKYaNZRF3ehhA6w
Cc: "'John G. Scudder'" <jgs@bgp.nu>

I think Sue has done a good job summarizing the problems with this draft; I
think it still needs a lot of work before it can be accepted.

The number one question is whether the approach can be made architecturally
sound.  The draft significantly alters the role played by the "next hop" in
BGP.  In doing so, it leaves a lot of unanswered questions that
will lead to lots of future interoperability problems.  I understand that
there are already deployments of some of these mechanisms in particular
environments.  The question though is how the mechanisms will work outside
the environments that the authors happen to have in mind today.

The authors compare their approach to that taken by RFC5512, but I don't
think they focus on the essential differences.  To my mind, the essential
feature of the RFC5512 approach is that there is no change at all to the
next hop semantics.  A BGP Update associates an NLRI with a next hop, and
information can then be provided to tell an ingress node how to forward
traffic through a tunnel to that next hop.  If there are multiple tunnel
types that can be used to reach the next hop, information can be specified
for each of them.  Particular NLRIs can also be bound to particular next
hops.

It is true that RFC5512 proposes to use a new AFI/SAFI to distribute the
encapsulation information, but this is really a secondary issue.  One could
certainly put the encapsulation SAFI attribute on NLRIs of AFI/SAFI 1/1 or
1/2.  The real difference between "remote next hop" and RFC5512 is that
"remote next hop" does not require the tunnel endpoint to be the same as the
next hop.  The primary issue is whether the use of the "next hop" concept in
BGP is being changed in a well-defined way.

In the remote-next-hop draft, every NLRI can be associated with not one next
hop, but with a whole set of them.  This is the part I find problematic.
The BGP specifications have many rules that involve next hops, and we now
have to find them all, look at each one, and figure out how to change it so
that it works when there are multiple next hops.  (And it does not appear to
me that this exercise has been done.)

For instance, suppose an update associates NLRI N with Next Hop H and
with Remote Next Hops R1, R2, and R3.

We could decide that this update is equivalent to the following three
conventional BGP routes:

           <NLRI=N, NH=R1>
           <NLRI=N, NH=R2>
           <NLRI=N, NH=R3>

distributed using add-path.  Then the semantics would at least be
well-defined.  But this doesn't seem to be at all what the authors
intend.   In fact, I can't tell from the draft just how the authors
really do intend such an update to be interpreted.
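
To make that well-defined reading concrete, here is a minimal sketch of
the expansion.  It is only an illustration: the Route type, the expand()
helper, and the locally assigned path identifiers are hypothetical and
do not come from the draft or from the add-path specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Route:
    """A conventional BGP route: one NLRI bound to exactly one next hop."""
    nlri: str
    next_hop: str
    path_id: int  # hypothetical add-path identifier, assigned locally here


def expand(nlri: str, remote_next_hops: List[str]) -> List[Route]:
    """Turn one update carrying several remote next hops into one
    conventional route per remote next hop."""
    return [
        Route(nlri, rnh, path_id)
        for path_id, rnh in enumerate(remote_next_hops, start=1)
    ]


routes = expand("N", ["R1", "R2", "R3"])
for r in routes:
    print(f"<NLRI={r.nlri}, NH={r.next_hop}> path-id={r.path_id}")
```

Under this reading every existing rule that mentions "the next hop"
applies unchanged to each expanded route, which is exactly what makes
the semantics well-defined.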

The draft does say that when choosing the best path for N, if the "IGP
metric" tie breaker is used, the IGP metric should be the largest one from among
the set of remote next hops.  Well, at least that's well-defined.  But is it
a sensible strategy?  Will it result in the proper route being chosen?
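
A small sketch shows why the question matters.  It takes the rule as
paraphrased above (use the largest IGP metric among the remote next
hops) and applies it to two candidate routes.  The metric values and
the helper name are invented for illustration.

```python
from typing import Dict


def tie_break_metric(igp_metrics: Dict[str, int]) -> int:
    """IGP metric used in the tie breaker: the largest metric among the
    route's remote next hops, per the rule as paraphrased above."""
    return max(igp_metrics.values())


# Hypothetical IGP metrics from the local router to each remote next hop.
route_a = {"R1": 10, "R2": 100}   # one very close endpoint, one far away
route_b = {"R3": 50}              # a single endpoint at medium distance

candidates = [("A", route_a), ("B", route_b)]

# Lower metric wins the tie breaker, as in the ordinary decision process.
preferred = min(candidates, key=lambda c: tie_break_metric(c[1]))[0]
print(preferred)
```

With these numbers route B is preferred, even though route A offers the
closest tunnel endpoint (metric 10).  Whether that is the "proper"
route depends on which of A's endpoints would actually be used, which
the rule does not take into account.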

If some or all of the remote next hops are unreachable, but the
"ordinary" next hop is reachable, is the route discarded, or do we just
ignore the unreachable next hops?  What if the ordinary next hop is
unreachable, but some or all of the remote next hops are reachable?  Is
the route discarded altogether, even though the tunnels it binds to the NLRI
could actually be used?
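
The ambiguity can be made explicit by writing down a few candidate
rules and checking where they disagree.  None of the three policies
below is taken from the draft; they are hypothetical readings, named
only for this sketch.

```python
from typing import List


def route_usable(nh_reachable: bool, rnh_reachable: List[bool], policy: str) -> bool:
    """Decide whether a route is usable under one hypothetical policy."""
    if policy == "strict":
        # The next hop and every remote next hop must be reachable.
        return nh_reachable and all(rnh_reachable)
    if policy == "nh_only":
        # Only the ordinary next hop counts; unreachable remote next
        # hops are ignored.
        return nh_reachable
    if policy == "any_tunnel":
        # The route is usable if at least one tunnel endpoint is reachable.
        return any(rnh_reachable)
    raise ValueError(f"unknown policy: {policy}")


policies = ["strict", "nh_only", "any_tunnel"]

# Case 1: ordinary next hop reachable, all remote next hops unreachable.
case1 = {p: route_usable(True, [False, False, False], p) for p in policies}

# Case 2: ordinary next hop unreachable, some remote next hops reachable.
case2 = {p: route_usable(False, [True, False, True], p) for p in policies}

print(case1)
print(case2)
```

Two implementations that pick different policies would make different
decisions about the same update, which is the kind of interoperability
problem at issue here.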

What if a route is distributed to another AS, and the next hop changes, but
the remote next hops do not change?  Now if the BGP decision process reaches
the "IGP metric" tie breaker, only the IGP metric to the next hop itself can
be considered.  (The remote next hops are in a different AS, so the IGP
metric to them is not known.)  But the next hop might not even be on the
path to the remote next hop.

If there are multiple remote next hop endpoints, are they all equally
preferable?  Or is each one going to have its own LOCAL_PREF and its
own MED?

If the NLRI is a labeled route, and there are multiple remote next hops,
does it need to carry a label for each of the remote next hops and a label
for the ordinary next hop as well?  (Presumably the remote next hops are not
all using the same label space.)

Various specifications say that labels must be changed when the next
hop is changed.  What happens if the next hop is changed, but some or
all of the remote next hops are not changed?  Or is it supposed to be
illegal to use remote next hop on labeled routes?

If a route carries an AIGP attribute, does the AIGP value apply to some
or all of the remote next hops?  When the AIGP attribute is modified,
are you supposed to add in the IGP metric to the furthest of the remote
next hops?

I know that there are deployments of this feature in specific
environments where these questions don't necessarily come up.  That
doesn't mean that the feature is well-defined, or that it is the best
solution.  Section 6 is nowhere near satisfactory.

I appreciate that the problematic "multi-homed IPv6" use case has been
removed, but none of the other use cases is described in enough detail to be
evaluated.  And no attempt is made to characterize those cases where the
mechanism can be used without problems, and to distinguish those from cases
where the mechanism may give unintended results.

As currently specified, this just doesn't seem to me to be an
architecturally sound solution.

It's also worth noting that, by default, the remote next hops leak
across administrative domain boundaries, which doesn't seem like the
right thing.

A few other more minor points:

   "If a device does not support this attribute, and receives this
   attribute, then it follows the normal NLRI processing and BGP best
   path selection, and the resulting forwarding decision is used, as the
   attribute is optional."

Won't that lead to the packets going to the wrong place?

   "While [RFC5512] allows multiple tunnel endpoints and multiple tunnel
   types to be carried within a BGP Encaps SAFI, the correlation of
   Tunnel information with other SAFIs is done using the color extended
   community which is also non-trivial."

I suppose one could complain that adding a level of indirection is
"non-trivial".  But indirection is not exactly unheard of in computer
science, and could also be useful.  By using the color EC, the originator of
the route could say "always use a red tunnel for packets to this NLRI".  The
red tunnel would not need to be the same tunnel in each AS through which the
packet passes.  Without this level of indirection, the tunnel encapsulation
attribute might have to be "swapped" at AS boundaries.
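
The indirection can be sketched as a per-AS lookup keyed by color.  This
is only an illustration of the idea: the AS numbers, tunnel names, and
table layout are hypothetical and are not taken from RFC5512 or from
the draft.

```python
from typing import Dict

# Each AS keeps its own mapping from color to a locally meaningful tunnel.
tunnels_by_as: Dict[int, Dict[str, str]] = {
    64500: {"red": "gre-to-pe1", "blue": "mpls-to-pe2"},
    64501: {"red": "l2tpv3-to-asbr7"},
}

# The originator only says "use a red tunnel"; it names no endpoint.
route = {"nlri": "N", "color": "red"}


def select_tunnel(local_as: int, route: Dict[str, str]) -> str:
    """Resolve the route's color to whatever tunnel the local AS has
    bound to that color."""
    return tunnels_by_as[local_as][route["color"]]


for asn in (64500, 64501):
    print(asn, select_tunnel(asn, route))
```

The route itself is identical in both ASes; only the local table
differs, so nothing on the route has to be rewritten at the AS
boundary.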