Re: [Int-area] I-D Action: draft-ietf-intarea-tunnels-05.txt

"Templin, Fred L" <Fred.L.Templin@boeing.com> Wed, 03 May 2017 20:10 UTC

From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
To: Joe Touch <touch@isi.edu>
CC: "int-area@ietf.org" <int-area@ietf.org>
Date: Wed, 3 May 2017 20:07:44 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/-jX_AppbodSMcQGC9H49Fo1Ddhw>

Hi Joe,

> -----Original Message-----
> From: Joe Touch [mailto:touch@isi.edu]
> Sent: Wednesday, May 03, 2017 11:47 AM
> To: Templin, Fred L <Fred.L.Templin@boeing.com>
> Cc: int-area@ietf.org
> Subject: Re: [Int-area] I-D Action: draft-ietf-intarea-tunnels-05.txt
> 
> Hi, Fred,
> 
> Your response keeps raising the same issues that I think we agree upon:
> 
>     - PMTUD always has the possibility of creating black holes (whether
> in the presence of multipath or not)
> 
>     - PLPMTUD can generate cases where the MIN isn't actually detected
> 
> and some claims with which I disagree:
> 
>     - that PLPMTUD probes travel different paths than the data

I think I have already said this many times, but let me try again. The
tunnel ingress is not the source of the transit packet; it is only the
source of the tunnel packet. And, the ingress may have to tunnel
a wide variety of transit packets sourced by many original sources.
So, if ECMP somewhere within the tunnel peeks at the transit
packet, then the packet could take a very different path within
the tunnel than the ingress' PLPMTUD probes took. 
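
A minimal sketch of this effect, assuming an ECMP router inside the tunnel
that hashes the transit (inner) packet's 5-tuple to pick one of several
equal-cost paths. The hash function, the addresses, the port numbers, and the
path count are all hypothetical; real routers use vendor-specific hash inputs.

```python
import hashlib

def ecmp_path(src, dst, proto, sport, dport, n_paths):
    """Map a 5-tuple to a path index (hypothetical ECMP hash)."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths

def paths_used(flows, n_paths):
    """Return the set of path indices that the given flows hash onto."""
    return {ecmp_path(*flow, n_paths) for flow in flows}

N_PATHS = 4  # assumed number of equal-cost paths inside the tunnel

# A probe originated by the tunnel ingress carries the ingress's own 5-tuple.
probe = ("192.0.2.1", "192.0.2.2", 17, 40000, 40000)

# Transit packets come from many original sources with their own 5-tuples.
transit_flows = [
    (f"198.51.100.{i}", "203.0.113.7", 6, 50000 + i, 443)
    for i in range(1, 65)
]

probed = paths_used([probe], N_PATHS)          # paths the probe exercised
used = paths_used(transit_flows, N_PATHS)      # paths the data actually takes
untested = used - probed                       # paths carrying data but never probed

print("probed paths:  ", sorted(probed))
print("data paths:    ", sorted(used))
print("untested paths:", sorted(untested))
```

A single probe always lands on exactly one path, while the transit flows
spread over several, so any path in the "untested" set carries data whose MTU
the ingress never measured.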

> However, PLPMTUD probes are generated by transport or higher layers,
> using the same transport protocol and port pairs as the data (they ARE
> the data).

Maybe we are talking past each other - I thought the document was
asking the tunnel ingress to act as a packetization layer and send
PLPMTUD probes as if it were a transport or higher layer?

> Those probes should be used to determine an MTU only for a
> given flow; the potential hazards of sharing that information across
> flows is already discussed in that RFC.

I don't think RFC4821 does enough to take ECMP into account when it
talks about storing and sharing discovered MTUs.
 
> The entire point of PLPMTUD is
> that the probes travel the same path as the data - and yes, any
> multipath system could give false information, but should eventually
> converge on a useful minimum as long as data packets get through.

If the probes look exactly like the data, then the probes will travel the
same ECMP path as the data and will discover a valid MTU, yes. The
trouble with tunnels is that there is no way for the ingress to send a
probe for each and every transit packet flow - and, if a flow arrives
that has not been probed it can black hole.

>     - that tunnels behave differently than unicast links in this regard

You won't get me to make this argument - I am convinced that tunnels
are links.

> I don't yet see any explanation as to why this would be true.
> 
> So I'm left with the following, which I propose as the way forward:
> 
>     - the text will be clear about the potential issues for multipath
> potentially taking a long time to converge

It's not that it could take a long time to converge - it is that it might
never converge if some paths of the multipath block ICMPs.
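
A minimal sketch of that failure mode, assuming the ingress merges ICMP
Packet Too Big feedback by taking the MIN of the reported values. The function
name, the path list, and the MTU values are hypothetical and only illustrate
the argument.

```python
def converge(initial_mtu, paths):
    """Simulate MIN-merging of ICMP Packet Too Big reports.

    paths: list of (path_mtu, delivers_icmp) tuples, one per ECMP path.
    Returns (estimate, black_holes), where black_holes lists the MTUs of
    paths that still drop packets of size `estimate` without reporting it.
    """
    estimate = initial_mtu
    while True:
        # Only paths that are too small AND deliver ICMPs produce feedback.
        reports = [mtu for mtu, icmp_ok in paths if mtu < estimate and icmp_ok]
        if not reports:
            break
        estimate = min(reports)
    # Paths smaller than the final estimate drop packets silently.
    black_holes = [mtu for mtu, _ in paths if mtu < estimate]
    return estimate, black_holes

# All paths deliver ICMPs: the estimate reaches the true minimum.
all_report = [(1500, True), (1400, True), (1280, True)]
print(converge(1500, all_report))    # (1280, [])

# The smallest path blocks ICMPs: the estimate stalls above the true minimum.
one_silent = [(1500, True), (1400, True), (1280, False)]
print(converge(1500, one_silent))    # (1400, [1280])
```

In the second case the loop terminates, but only because no further feedback
arrives; the estimate of 1400 never reaches the true minimum of 1280, and
traffic hashed onto that path is silently dropped.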

>     - the text will be clear about the potential issues for black-holing
> 
> However, in both cases, I don't see a good reason to flag how a tunnel
> behaves as unique.
> 
> Does that work?
> 
> If not, why not?

The tunnel ingress is only the source of the tunnel packets; it is not the
source of the transit packets. The only way for PLPMTUD to function
properly in the presence of ECMP would be for the source of the transit
packets to do the RFC4821 probing - not the tunnel ingress.

Thanks - Fred
fred.l.templin@boeing.com

> Joe
> 
> 
> 
> On 5/3/2017 10:48 AM, Templin, Fred L wrote:
> > Hi Joe,
> >
> > Sorry for the extended delay - see below for responses:
> >
> > Thanks - Fred
> >
> >> -----Original Message-----
> >> From: Joe Touch [mailto:touch@isi.edu]
> >> Sent: Wednesday, March 29, 2017 3:06 PM
> >> To: Templin, Fred L <Fred.L.Templin@boeing.com>
> >> Cc: int-area@ietf.org
> >> Subject: Re: [Int-area] I-D Action: draft-ietf-intarea-tunnels-05.txt
> >>
> >> Hi, Fred,
> >>
> >> I think we're agreed except for PMTUD and PLPMTUD. See below about the
> >> latter (AFAICT, if it black holes, then the PLPMTUD would detect a
> >> failure to make forward progress). Black-holing in PMTUD is known and
> >> not something this (or any other doc) is fixing, but also a known standard.
> >>
> >> Joe
> >>
> >>
> >> On 3/29/2017 2:18 PM, Templin, Fred L wrote:
> >>> ...
> >>>>> 10) Section 4.2.1, bulleted list toward bottom of section, tunnel
> >>>>>    "atom" is a very strange word to me. Tunnel "cell"?
> >>>> The concept of an atomic packet was defined in RFC 6864. This is derived
> >>>> from that. Cell would be introducing a new term, one that is overloaded
> >>>> with ATM, which we want to avoid.
> >>> OK, but then please find a way to call it an "atomic packet".
> >> Agreed.
> > OK.
> >
> >>> ...
> >>>>> 13) Section 4.2.2, final sentence is incorrect. RFC4821-style MTU
> >>>>>     probing cannot be used by tunnels due to ECMP because the probe
> >>>>>     packets may take a different path than the data packets. That
> >>>>>     is why AERO no longer uses RFC4821 probing.
> >>>> Regarding ECMP issues, I think we need to wrap this issue up. Here's
> >>>> what I propose:
> >>>>
> >>>> - point out that ECMP causes problems with PMTUD (and can cause problems
> >>>> with PLPMTUD).
> >>>> - an interface has two choices:
> >>>>     - keep track of PMTU based on other packet context (flowID or next
> >>>> header info)
> >>>>     - merge PMTU feedback, taking the MIN of reported values
> >>> The problem is that some paths in the multipath may fail to deliver
> >>> the ICMPs. So there is no way to know whether the MIN has been
> >>> determined.
> >> That's no different from the multicast case, though.
> > In terms of PMTUD, you are right that in a certain sense unicast destinations
> > with ECMP are like multicast. In other words, a single destination with multiple
> > paths - some of which may not deliver ICMPs.
> >
> > But (unlike unicast) the Internet community seems to be OK with the fact
> > that some multicast group members might not receive multicasts due to
> > an MTU black hole. For unicast destinations, it can be a real problem if
> > there is an MTU black hole along one or more paths of the multipath.
> >
> >>>  Also the ingress may be handling transit packets sourced
> >>> by a very large number of original sources each of which produce
> >>> a very large number of distinct flows. So, there is no way for the
> >>> ingress to cache all of the flow information it handles.
> >> Min requires maintaining the same information as any interface would keep.
> > I was referring to PLPMTUD here. For PLPMTUD, the probes sent by the
> > tunnel ingress may take a very different path of the multipath than most
> > data packets will take. So, there is no way for the ingress to definitely
> > determine the Min MTU.
> >
> >>>> There's no magic here. It's a lot like multicast - either keep track in
> >>>> a way that you *think* correlates to the different PMTU feedback or take
> >>>> the MIN.
> >>> It only works if all paths on the multipath can be counted on to deliver
> >>> the ICMPs. If any paths in the multipath fail to deliver the ICMPs, it
> >>> black holes. And, this is a known problem.
> >> Again, same as multicast, and frankly also the same as unicast when the
> >> ICMPs are blocked. That's a known problem with PMTUD.
> > As above, in some sense it is like multicast. But, the Internet community
> > has deemed it OK for multicast to black hole for some group members.
> > For unicast, the behavior has to be deterministic and neither PMTUD
> > nor PLPMTUD can guarantee that for tunnels.
> >
> >>>> The current doc does need a scrub to make this point clearly and
> >>>> consistently.
> >>> It doesn't work, regardless of the amount of scrubbing.
> >> If your point is that PMTUD doesn't work and should never be used,
> >> that's clearly not accurate and unlikely to get WG consensus. You're
> >> welcome to try, though. However, 1981bis is on its way to increased
> >> standards maturity as we speak.
> > Multipath really is problematic for both PMTUD and PLPMTUD *for
> > tunnels* - that is aside from any considerations for promoting
> > RFC1981bis to standards-track in the more general sense.
> >
> >>> ...
> >>>>> 17) Section 4.2.3 cites RFC4821, but PLPMTUD cannot be used by
> >>>>>     tunnels due to ECMP.
> >>>> I disagree; it can, but the system needs to either take the MIN or have
> >>>> a way to decouple discovered PMTUs in a way that can be trusted to
> >>>> reasonably correspond to the ECMP splitting.
> >>> It doesn't work in the generalized case. The ECMP might split into a
> >>> multitude of distinct paths, and there is no way for the ingress to
> >>> know which of the paths have been tested. And, all it takes is
> >>> one un-tested path in the multipath and there is potential for a
> >>> black hole.
> >> If PLPMTUD thinks the protocol is making forward progress, then it is
> >> not a black hole.
> > "Making forward progress" over some paths of the multipath does not
> > guarantee progress over all paths - some paths might black hole.
> >
> >>>>> 20) Section 4.3.3, fourth paragraph, "A multipoint tunnel MUST
> >>>>>     have support for broadcast and multicast" - I think this
> >>>>>     would be better as a "SHOULD". RFC2529 and AERO support
> >>>>>     multicast, but RFC5214 does not, yet it is widely deployed.
> >>>> Multicast or its equivalent. Otherwise, you can't support IPv6
> >>>> multicast, which is a required capability of IPv6.
> >>> Large NBMA links can connect many nodes - thousands or more.
> >>> So, for link-scoped multicast, serialized multicast (i.e., multicast
> >>> via iterative unicast) would not scale.
> >> Serial multicast is not the only equivalent. LANE pushed broadcast ARPs
> >> to a unicast ARP server.
> > Yes, and NBMA had the MARS proposal.
> >
> >> And yes, that won't scale to millions of links, but then if/when it
> >> doesn't scale, then you cease to be able to claim this is a valid IP
> >> link. Multicast is not an optional protocol for IPv6 in particular.
> > RFC5214 and AERO work fine with IPv6.
> >
> >>> That is why some large NBMA links (e.g., RFC5214, AERO) use unicast
> >>> NS/NA/RS/RA instead of link-scoped multicast, as permitted by RFC4861.
> >>> Link-scoped multicast service discovery (e.g., DHCPv6 discovery) is
> >>> supported via multicast mapping to a unicast link-layer address.
> >> Essentially like LANE.
> > Unlike LANE, AERO links are link-local-only (i.e., there are no on-link
> > subnets). AERO links connect only routers and/or hosts that act like
> > routers.
> >
> >>>>> 23) Section 5.1, first sub-bullet under "Tunnels must obey core IP
> >>>>>     requirements", Are you meaning to talk about IPv4 DF=1?
> >>>> Yes, and that should be made more explicit. Also honoring the EMTU_R
> >>>> limits until told otherwise.
> >>> OK.
> >>>
> >>> One other comment. I agree with figures 12 and 13 but (and I think this is
> >>> a crucial point) I think they need a supporting sentence or two explaining
> >>> why the procedure is "fragment then encapsulate" and not "encapsulate
> >>> then fragment".
> >> Agreed.
> > Good.
> >
> >>> This is the difference between tunnel fragmentation
> >>> and ordinary outer fragmentation, where your document is correctly
> >>> advocating tunnel fragmentation. To the best of my knowledge, this was
> >>> first documented in Section 3.1.7 of RFC2764 and should be cited as such.
> >>> At least, that is what Bob B. suggested to me about 10yrs ago.
> >> I'll check that...
> > Please see the final paragraph of Section 3.12 of the AERO spec for example
> > text.
> >
> > Thanks - Fred
> > fred.l.templin@boeing.com
> >
> >> ----
> >
>