Re: [Int-area] I-D Action: draft-ietf-intarea-tunnels-05.txt

"Templin, Fred L" <> Wed, 03 May 2017 17:51 UTC

From: "Templin, Fred L" <>
To: Joe Touch <>
CC: "" <>
Date: Wed, 3 May 2017 17:48:46 +0000
Subject: Re: [Int-area] I-D Action: draft-ietf-intarea-tunnels-05.txt

Hi Joe,

Sorry for the extended delay - see below for responses:

Thanks - Fred

> -----Original Message-----
> From: Joe Touch []
> Sent: Wednesday, March 29, 2017 3:06 PM
> To: Templin, Fred L <>
> Cc:
> Subject: Re: [Int-area] I-D Action: draft-ietf-intarea-tunnels-05.txt
> Hi, Fred,
> I think we're agreed except for PMTUD and PLPMTUD. See below about the
> latter (AFAICT, if it black holes, then the PLMTUD would detect a
> failure to make forward progress). Black-holing in PMTUD is known and
> not something this (or any other doc) is fixing, but also a known standard.
> Joe
> On 3/29/2017 2:18 PM, Templin, Fred L wrote:
> > ...
> >>> 10) Section 4.2.1, bulleted list toward bottom of section, tunnel
> >>>    "atom" is a very strange word to me. Tunnel "cell"?
> >> The concept of an atomic packet was defined in RFC 6864. This is derived
> >> from that. Cell would be introducing a new term, one that is overloaded
> >> with ATM, which we want to avoid.
> > OK, but then please find a way to call it an "atomic packet".
> Agreed.
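For reference, RFC 6864 defines an atomic datagram as an IPv4 datagram that has not been fragmented (MF=0, fragment offset 0) and that cannot be fragmented further (DF=1). Below is a minimal Python sketch of that test applied to a raw IPv4 header. The function names are hypothetical, and the header bytes are made up for illustration.

```python
import struct


def is_atomic(ipv4_header: bytes) -> bool:
    """RFC 6864 atomic datagram test: DF=1, MF=0 and fragment offset 0."""
    # Bytes 6-7: 3 flag bits (reserved, DF, MF) then a 13-bit fragment offset.
    (flags_frag,) = struct.unpack_from("!H", ipv4_header, 6)
    df = bool(flags_frag & 0x4000)
    mf = bool(flags_frag & 0x2000)
    frag_offset = flags_frag & 0x1FFF
    return df and not mf and frag_offset == 0


def make_header(flags_frag: int) -> bytes:
    """Build a hypothetical 20-byte IPv4 header with the given flags/offset word."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0,          # version/IHL, TOS, total length, identification
        flags_frag, 64, 17, 0,   # flags+offset, TTL, protocol (UDP), checksum
        bytes([192, 0, 2, 1]), bytes([198, 51, 100, 1]),
    )


print(is_atomic(make_header(0x4000)))  # True  (DF=1 only)
print(is_atomic(make_header(0x2000)))  # False (MF=1: a first fragment)
```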


> > ...
> >>> 13) Section 4.2.2, final sentence is incorrect. RFC4821-style MTU
> >>>     probing cannot be used by tunnels due to ECMP because the probe
> >>>     packets may take a different path than the data packets. That
> >>>     is why AERO no longer uses RFC4821 probing.
> >> Regarding ECMP issues, I think we need to wrap this issue up. Here's
> >> what I propose:
> >>
> >> - point out that ECMP causes problems with PMTUD (and can cause problems
> >> with PLPMTUD).
> >> - an interface has two choices:
> >>     - keep track of PMTU based on other packet context (flowID or next
> >> header info)
> >>     - merge PMTU feedback, taking the MIN of reported values
> > The problem is that some paths in the multipath may fail to deliver
> > the ICMPs. So there is no way to know whether the MIN has been
> > determined.
> That's no different from the multicast case, though.

In terms of PMTUD, you are right that in a certain sense unicast destinations
with ECMP are like multicast. In other words, a single destination with multiple
paths - some of which may not deliver ICMPs.

But (unlike unicast) the Internet community seems to be OK with the fact
that some multicast group members might not receive multicasts due to
an MTU black hole. For unicast destinations, it can be a real problem  if
there is an MTU black hole along one or more paths of the multipath. 
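The MIN concern can be made concrete with a minimal Python sketch. It assumes three hypothetical ECMP paths toward one destination, where the ICMP Packet Too Big messages from one path never reach the ingress. Merging reports by MIN converges on the smallest *reported* MTU, not the smallest actual one, so the silent path keeps dropping packets. All names and values are illustrative assumptions.

```python
from typing import Dict


class MinPmtuCache:
    """Hypothetical per-destination PMTU cache that merges feedback by MIN."""

    def __init__(self, initial_mtu: int) -> None:
        self.initial_mtu = initial_mtu
        self.pmtu: Dict[str, int] = {}

    def get(self, dst: str) -> int:
        return self.pmtu.get(dst, self.initial_mtu)

    def on_packet_too_big(self, dst: str, reported_mtu: int) -> None:
        self.pmtu[dst] = min(self.get(dst), reported_mtu)


# Hypothetical multipath toward one destination; path "C" filters ICMP.
PATH_MTUS = {"A": 1400, "B": 1300, "C": 1280}
ICMP_FILTERED = {"C"}
DST = "192.0.2.1"


def simulate(rounds: int = 3) -> int:
    cache = MinPmtuCache(initial_mtu=1500)
    for _ in range(rounds):
        for path, mtu in PATH_MTUS.items():
            size = cache.get(DST)
            if size > mtu and path not in ICMP_FILTERED:
                cache.on_packet_too_big(DST, mtu)  # a PTB reaches the ingress
            # otherwise: delivered, or silently dropped on the filtered path
    return cache.get(DST)


learned = simulate()
black_holed = [p for p, mtu in PATH_MTUS.items() if learned > mtu]
print(learned, black_holed)  # 1300 ['C']
```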

> >  Also the ingress may be handling transit packets sourced
> > by a very large number of original sources each of which produce
> > a very large number of distinct flows. So, there is no way for the
> > ingress to cache all of the flow information it handles.
> Min requires maintaining the same information as any interface would keep.

I was referring to PLPMTUD here. For PLPMTUD, the probes sent by the
tunnel ingress may take a very different path of the multipath than most
data packets will take. So, there is no way for the ingress to definitively
determine the Min MTU.
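The following hedged illustration shows why a probe can miss the data paths. ECMP routers typically choose a next hop by hashing packet header fields, so a probe with its own outer flow identifiers exercises only the path its hash selects. The CRC-32 hash, the addresses and ports, and the helper names are assumptions for illustration. Real routers use vendor-specific hash functions and field selections.

```python
import zlib
from typing import Iterable, Set, Tuple

FiveTuple = Tuple[str, str, int, int, int]  # src, dst, sport, dport, proto


def ecmp_path(flow: FiveTuple, n_paths: int) -> int:
    """Hypothetical ECMP selector: hash the 5-tuple, pick a path index.
    CRC-32 is only a stand-in for whatever hash a real router uses."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n_paths


def untested_paths(probe_flows: Iterable[FiveTuple],
                   data_flows: Iterable[FiveTuple], n_paths: int) -> Set[int]:
    """Paths that carry data packets but were never exercised by a probe."""
    probed = {ecmp_path(f, n_paths) for f in probe_flows}
    used = {ecmp_path(f, n_paths) for f in data_flows}
    return used - probed


N_PATHS = 4
# Hypothetical outer flows: one probe flow, and data flows whose outer
# source port varies per inner flow (destination port 5000 is made up).
PROBE = [("192.0.2.1", "198.51.100.1", 40000, 5000, 17)]
DATA = [("192.0.2.1", "198.51.100.1", sport, 5000, 17)
        for sport in range(49152, 49252)]

print(untested_paths(PROBE, DATA, N_PATHS))  # paths the probe never exercised
```

A path that appears in the printed set has never been probed, so the ingress cannot know its MTU from the probe results alone.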

> >> There's no magic here. It's a lot like multicast - either keep track in
> >> a way that you *think* correlates to the different PMTU feedback or take
> >> the MIN.
> > It only works if all paths on the multipath can be counted on to deliver
> > the ICMPs. If any paths in the multipath fail to deliver the ICMPs, it
> > black holes. And, this is a known problem.
> Again, same as multicast, and frankly also the same as unicast when the
> ICMPs are blocked. That's a known problem with PMTUD.

As above, in some sense it is like multicast. But, the Internet community
has deemed it OK for multicast to black hole for some group members.
For unicast, the behavior has to be deterministic and neither PMTUD
nor PLPMTUD can guarantee that for tunnels.

> >> The current doc does need a scrub to make this point clearly and
> >> consistently.
> > It doesn't work, regardless of the amount of scrubbing.
> If your point is that PMTUD doesn't work and should never be used,
> that's clearly not accurate and unlikely to get WG consensus. You're
> welcome to try, though. However, 1981bis is on its way to increased
> standards maturity as we speak.

Multipath really is problematic for both PMTUD and PLPMTUD *for
tunnels* - that is aside from any considerations for promoting
RFC1981bis to standards-track in the more general sense.

> > ...
> >>> 17) Section 4.2.3 cites RFC4821, but PLPMTUD cannot be used by
> >>>     tunnels due to ECMP.
> >> I disagree; it can, but the system needs to either take the MIN or have
> >> a way to decouple discovered PMTUs in a way that can be trusted to
> >> reasonably correspond to the ECMP splitting.
> > It doesn't work in the generalized case. The ECMP might split into a
> > multitude of distinct paths, and there is no way for the ingress to
> > know which of the paths have been tested. And, all it takes is
> > one un-tested path in the multipath and there is potential for a
> > black hole.
> If PLPMTUD thinks the protocol is making forward progress, then it is
> not a black hole.

"Making forward progress" over some paths of the multipath does not
guarantee progress over all paths - some paths might black hole.
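The difference between aggregate and per-path progress fits in a small Python sketch. The flow-to-path assignments and per-path MTUs are hypothetical. At the chosen packet size, most flows get through, so the tunnel as a whole appears to make forward progress, but every flow hashed onto the smaller-MTU path is black-holed.

```python
from typing import Dict

# Hypothetical per-path MTUs and ECMP flow-to-path assignment.
PATH_MTUS: Dict[int, int] = {0: 1500, 1: 1500, 2: 1400}
FLOW_PATHS: Dict[str, int] = {"flow-1": 0, "flow-2": 1, "flow-3": 2}


def delivered(size: int, flow_paths: Dict[str, int],
              path_mtus: Dict[int, int]) -> Dict[str, bool]:
    """For each flow, whether a packet of `size` bytes fits its path's MTU."""
    return {flow: size <= path_mtus[path] for flow, path in flow_paths.items()}


result = delivered(1450, FLOW_PATHS, PATH_MTUS)
print(any(result.values()))  # True: the tunnel as a whole makes progress
print(all(result.values()))  # False: not every flow gets through
print([f for f, ok in result.items() if not ok])  # ['flow-3']
```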

> >
> >>> 20) Section 4.3.3, fourth paragraph, "A multipoint tunnel MUST
> >>>     have support for broadcast and multicast" - I think this
> >>>     would be better as a "SHOULD". RFC2529 and AERO support
> >>>     multicast, but RFC5214 does not, yet it is widely deployed.
> >> Multicast or its equivalent. Otherwise, you can't support IPv6
> >> multicast, which is a required capability of IPv6.
> > Large NBMA links can connect many nodes - thousands or more.
> > So, for link-scoped multicast, serialized multicast (i.e., multicast
> > via iterative unicast) would not scale.
> Serial multicast is not the only equivalent. LANE pushed broadcast ARPs
> to a unicast ARP server.

Yes, and NBMA had the MARS proposal.

> And yes, that won't scale to millions of links, but then if/when it
> doesn't scale, then you cease to be able to claim this is a valid IP
> link. Multicast is not an optional protocol for IPv6 in particular.

RFC5214 and AERO work fine with IPv6.
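The scaling argument in this exchange can be illustrated roughly. Emulating link-scoped multicast by iterative unicast costs one copy per member. Mapping a multicast group to a single unicast link-layer address, similar in spirit to the LANE and MARS approaches mentioned here, costs one copy. The mapping table, the addresses, and the function names are hypothetical. ff02::1:2 is the standard All_DHCP_Relay_Agents_and_Servers group used for DHCPv6 discovery.

```python
from typing import Dict, List

# Hypothetical table: IPv6 multicast group -> unicast link-layer address
# (e.g., an underlying IPv4 address) of a node that serves that group.
MCAST_TO_UNICAST_LL: Dict[str, str] = {
    "ff02::1:2": "192.0.2.10",  # All_DHCP_Relay_Agents_and_Servers -> hypothetical server
}


def serial_multicast(members: List[str]) -> List[str]:
    """Multicast via iterative unicast: one link-layer copy per member."""
    return list(members)


def mapped_multicast(group: str) -> List[str]:
    """Multicast mapped to one unicast link-layer address (empty if unmapped)."""
    ll = MCAST_TO_UNICAST_LL.get(group)
    return [ll] if ll is not None else []


members = [f"10.0.{i // 256}.{i % 256}" for i in range(5000)]
print(len(serial_multicast(members)))  # 5000
print(mapped_multicast("ff02::1:2"))   # ['192.0.2.10']
```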

> > That is why some large NBMA links (e.g., RFC5214, AERO) use unicast
> > NS/NA/RS/RA instead of link-scoped multicast, as permitted by RFC4861.
> > Link-scoped multicast service discovery (e.g., DHCPv6 discovery) is
> > supported via multicast mapping to a unicast link-layer address.
> Essentially like LANE.

Unlike LANE, AERO links are link-local-only (i.e., there are no on-link
subnets). AERO links connect only routers and/or hosts that act like

> >>> 23) Section 5.1, first sub-bullet under "Tunnels must obey core IP
> >>>     requirements", Are you meaning to talk about IPv4 DF=1?
> >> Yes, and that should be made more explicit. Also honoring the EMTU_R
> >> limits until told otherwise.
> > OK.
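For the DF=1 point, here is a minimal Python sketch of the decision a tunnel ingress makes for an IPv4 packet that exceeds the MTU the tunnel presents to the inner layer. The `tunnel_mtu` parameter and the function and action names are assumptions. The behavior follows core IPv4 rules. A DF=1 packet is never fragmented; it is dropped and reported with an ICMP "fragmentation needed" message (type 3, code 4). A DF=0 packet may be fragmented.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str   # hypothetical labels: "forward", "fragment", "drop_send_ptb"
    mtu: int = 0  # MTU to report in the ICMP message when dropping


def ingress_ipv4(packet_len: int, df: bool, tunnel_mtu: int) -> Decision:
    if packet_len <= tunnel_mtu:
        return Decision("forward")
    if df:
        # DF=1: must not fragment; drop and report the MTU to the source
        # via ICMPv4 type 3, code 4 ("fragmentation needed and DF set").
        return Decision("drop_send_ptb", tunnel_mtu)
    # DF=0: IPv4 permits fragmenting the packet to fit.
    return Decision("fragment")


print(ingress_ipv4(1500, df=True, tunnel_mtu=1450))
```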
> >
> > One other comment. I agree with figures 12 and 13 but (and I think this is
> > a crucial point) I think they need a supporting sentence or two explaining
> > why the procedure is "fragment then encapsulate" and not "encapsulate
> > then fragment".
> Agreed.


> > This is the difference between tunnel fragmentation
> > and ordinary outer fragmentation, where your document is correctly
> > advocating tunnel fragmentation. To the best of my knowledge, this was
> > first documented in Section 3.1.7 of RFC2764 and should be cited as such.
> > At least, that is what Bob B. suggested to me about 10yrs ago.
> I'll check that...

Please see the final paragraph of Section 3.12 of the AERO spec, for example.
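The "fragment then encapsulate" ordering can be sketched as follows. The ingress splits the inner packet into pieces sized so that each piece plus the encapsulation and fragment headers fits within the outer path MTU. It then encapsulates every piece, so the network never needs to fragment the outer packets. The header sizes and the `OuterPacket` layout are hypothetical simplifications. They do not represent the draft's or AERO's actual format.

```python
from typing import List, NamedTuple

OUTER_HDR_LEN = 20  # assumed outer IPv4 header, no options
FRAG_HDR_LEN = 8    # assumed per-piece header carrying offset / more-fragments


class OuterPacket(NamedTuple):
    offset: int   # byte offset of this piece within the inner packet
    more: bool    # True for every piece except the last
    data: bytes

    def size(self) -> int:
        return OUTER_HDR_LEN + FRAG_HDR_LEN + len(self.data)


def fragment_then_encapsulate(inner: bytes, outer_mtu: int) -> List[OuterPacket]:
    """Fragment the inner packet first, then encapsulate each piece."""
    room = outer_mtu - OUTER_HDR_LEN - FRAG_HDR_LEN
    room -= room % 8  # keep non-final pieces 8-byte aligned, like IP offsets
    if room <= 0:
        raise ValueError("outer MTU too small to carry any payload")
    return [OuterPacket(off, off + room < len(inner), inner[off:off + room])
            for off in range(0, len(inner), room)]


def reassemble(packets: List[OuterPacket]) -> bytes:
    """Egress side: reorder pieces by offset and rejoin the inner packet."""
    return b"".join(p.data for p in sorted(packets, key=lambda p: p.offset))


inner = bytes(range(256)) * 6  # a 1536-byte hypothetical inner packet
pkts = fragment_then_encapsulate(inner, outer_mtu=1280)
print([p.size() for p in pkts])  # [1276, 316]
```

With "encapsulate then fragment", the oversized outer packet would instead be fragmented by the outer layer, and reassembly would happen on outer fragments.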

Thanks - Fred

> ----