Re: [Int-area] I-D Action: draft-ietf-intarea-tunnels-05.txt

Joe Touch <touch@isi.edu> Wed, 29 March 2017 22:06 UTC

To: "Templin, Fred L" <Fred.L.Templin@boeing.com>
References: <149062888196.30638.8369941985115982808@ietfa.amsl.com> <f5ab0422-fd49-9082-147b-8312e974de7e@isi.edu> <4d2a86f4948c4dc49ab3b0729743d028@XCH15-06-08.nw.nos.boeing.com> <583e59d2-f846-6cd6-8e15-f3a0888889ac@isi.edu> <6ede932f07ca4b8ebd17f82e17eb4cf4@XCH15-06-08.nw.nos.boeing.com>
Cc: "int-area@ietf.org" <int-area@ietf.org>
From: Joe Touch <touch@isi.edu>
Message-ID: <340d81c0-8af9-b353-44ec-f40c722745f5@isi.edu>
Date: Wed, 29 Mar 2017 15:06:10 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/Kj9-ZvoH7bHwGn-BCsGZZ_MUaeI>
Subject: Re: [Int-area] I-D Action: draft-ietf-intarea-tunnels-05.txt

Hi, Fred,

I think we're agreed except for PMTUD and PLMTUD. See below about the
latter (AFAICT, if it black holes, then the PLMTUD would detect a
failure to make forward progress). Black-holing in PMTUD is a known
problem that neither this doc nor any other is fixing, but PMTUD itself
is also an established standard.

Joe


On 3/29/2017 2:18 PM, Templin, Fred L wrote:
> ...
>>> 10) Section 4.2.1, bulleted list toward bottom of section, tunnel
>>>    "atom" is a very strange word to me. Tunnel "cell"?
>> The concept of an atomic packet was defined in RFC 6864. This is derived
>> from that. Cell would be introducing a new term, one that is overloaded
>> with ATM, which we want to avoid.
> OK, but then please find a way to call it an "atomic packet".

Agreed.
> ...
>>> 13) Section 4.2.2, final sentence is incorrect. RFC4821-style MTU
>>>     probing cannot be used by tunnels due to ECMP because the probe
>>>     packets may take a different path than the data packets. That
>>>     is why AERO no longer uses RFC4821 probing.
>> Regarding ECMP issues, I think we need to wrap this issue up. Here's
>> what I propose:
>>
>> - point out that ECMP causes problems with PMTUD (and can cause problems
>> with PLMTUD).
>> - an interface has two choices:
>>     - keep track of PMTU based on other packet context (flowID or next
>> header info)
>>     - merge PMTU feedback, taking the MIN of reported values
> The problem is that some paths in the multipath may fail to deliver
> the ICMPs. So there is no way to know whether the MIN has been
> determined.

That's no different from the multicast case, though.

>  Also the ingress may be handling transit packets sourced
> by a very large number of original sources each of which produce
> a very large number of distinct flows. So, there is no way for the
> ingress to cache all of the flow information it handles.
Min requires maintaining the same information as any interface would keep.
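
To make the two choices above concrete, here is a minimal sketch of both
caches. It is illustrative only: the class names, the string destination
key, and the integer flow key are assumptions made for this example, not
anything taken from the draft or from a real stack.

```python
from typing import Dict, Hashable, Tuple


class MinPmtuCache:
    """Merge all PMTU feedback per destination by taking the MIN.

    State is one entry per destination, which is what an ordinary
    interface would keep anyway (hypothetical model).
    """

    def __init__(self, link_mtu: int) -> None:
        self.link_mtu = link_mtu
        self.pmtu: Dict[str, int] = {}

    def on_ptb(self, dst: str, reported_mtu: int) -> None:
        # Packet Too Big feedback can only lower the estimate.
        current = self.pmtu.get(dst, self.link_mtu)
        self.pmtu[dst] = min(current, reported_mtu)

    def lookup(self, dst: str) -> int:
        return self.pmtu.get(dst, self.link_mtu)


class PerFlowPmtuCache:
    """Track PMTU per (destination, flow context).

    The flow key stands in for a flow ID or next-header info; state
    grows with the number of flows rather than destinations.
    """

    def __init__(self, link_mtu: int) -> None:
        self.link_mtu = link_mtu
        self.pmtu: Dict[Tuple[str, Hashable], int] = {}

    def on_ptb(self, dst: str, flow: Hashable, reported_mtu: int) -> None:
        key = (dst, flow)
        self.pmtu[key] = min(self.pmtu.get(key, self.link_mtu), reported_mtu)

    def lookup(self, dst: str, flow: Hashable) -> int:
        return self.pmtu.get((dst, flow), self.link_mtu)
```

Neither variant helps if the ICMPs never arrive; both only process the
feedback that does get through.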


>
>> There's no magic here. It's a lot like multicast - either keep track in
>> a way that you *think* correlates to the different PMTU feedback or take
>> the MIN.
> It only works if all paths on the multipath can be counted on to deliver
> the ICMPs. If any paths in the multipath fail to deliver the ICMPs, it
> black holes. And, this is a known problem.
Again, same as multicast, and frankly also the same as unicast when the
ICMPs are blocked. That's a known problem with PMTUD.

>
>> The current doc does need a scrub to make this point clearly and
>> consistently.
> It doesn't work, regardless of the amount of scrubbing.

If your point is that PMTUD doesn't work and should never be used,
that's clearly not accurate and unlikely to get WG consensus. You're
welcome to try, though. However, 1981bis is on its way to increased
standards maturity as we speak.

> ...
>>> 17) Section 4.2.3 cites RFC4821, but PLPMTUD cannot be used by
>>>     tunnels due to ECMP.
>> I disagree; it can, but the system needs to either take the MIN or have
>> a way to decouple discovered PMTUs in a way that can be trusted to
>> reasonably correspond to the ECMP splitting.
> It doesn't work in the generalized case. The ECMP might split into a
> multitude of distinct paths, and there is no way for the ingress to
> know which of the paths have been tested. And, all it takes is
> one un-tested path in the multipath and there is potential for a
> black hole. 
If PLPMTUD thinks the protocol is making forward progress, then it is
not a black hole.
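
As a rough illustration of what "forward progress" means here, the
sketch below pairs a probe search with a loss counter that falls back to
a base size when nothing is getting through. It is a simplified model,
not RFC 4821 itself: the `probe_ok` callback, the class name, and the
loss threshold of 3 are all assumptions for this example.

```python
from typing import Callable


def search_pmtu(probe_ok: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest size in [low, high] whose probe succeeds.

    Assumes `low` is known to work and that success is monotonic in
    size on the path the probes actually take.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe_ok(mid):
            low = mid
        else:
            high = mid - 1
    return low


class ForwardProgressMonitor:
    """Fall back to a base size after repeated unacknowledged sends."""

    def __init__(self, current: int, base: int, max_losses: int = 3) -> None:
        self.current = current
        self.base = base
        self.max_losses = max_losses
        self.losses = 0

    def on_ack(self) -> None:
        # Any acknowledged data counts as forward progress.
        self.losses = 0

    def on_loss(self) -> None:
        self.losses += 1
        if self.losses >= self.max_losses:
            # No forward progress: treat as a suspected black hole.
            self.current = self.base
            self.losses = 0
```

For example, `search_pmtu(lambda size: size <= 1400, 1280, 1500)` settles
on 1400. The monotonicity assumption is exactly what ECMP can break,
which is why the monitor, not the search, is what catches a black hole.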

>
>>> 20) Section 4.3.3, fourth paragraph, "A multipoint tunnel MUST
>>>     have support for broadcast and multicast" - I think this
>>>     would be better as a "SHOULD". RFC2529 and AERO support
>>>     multicast, but RFC5214 does not, yet it is widely deployed.
>> Multicast or its equivalent. Otherwise, you can't support IPv6
>> multicast, which is a required capability of IPv6.
> Large NBMA links can connect many nodes - thousands or more.
> So, for link-scoped multicast, serialized multicast (i.e., multicast
> via iterative unicast) would not scale.

Serial multicast is not the only equivalent. LANE pushed broadcast ARPs
to a unicast ARP server.

And yes, that won't scale to millions of links, but if/when it doesn't
scale, you can no longer claim this is a valid IP link. Multicast is
not an optional protocol for IPv6 in particular.

> That is why some large NBMA links (e.g., RFC5214, AERO) use unicast
> NS/NA/RS/RA instead of link-scoped multicast, as permitted by RFC4861.
> Link-scoped multicast service discovery (e.g., DHCPv6 discovery) is
> supported via multicast mapping to a unicast link-layer address. 
Essentially like LANE.

>
>>> 23) Section 5.1, first sub-bullet under "Tunnels must obey core IP
>>>     requirements", Are you meaning to talk about IPv4 DF=1?
>> Yes, and that should be made more explicit. Also honoring the EMTU_R
>> limits until told otherwise.
> OK.
>
> One other comment. I agree with figures 12 and 13 but (and I think this is
> a crucial point) I think they need a supporting sentence or two explaining
> why the procedure is "fragment then encapsulate" and not "encapsulate
> then fragment". 
Agreed.

> This is the difference between tunnel fragmentation
> and ordinary outer fragmentation, where your document is correctly
> advocating tunnel fragmentation. To the best of my knowledge, this was
> first documented in Section 3.1.7 of RFC2764 and should be cited as such.
> At least, that is what Bob B. suggested to me about 10yrs ago.
I'll check that...
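
For the supporting sentence, a toy byte-level sketch of the two orderings
may help show the difference. It models sizes only: there are no real
fragment headers, offsets, or IDs, and `encap_hdr` is a hypothetical
opaque encapsulation header, so treat this as an illustration rather
than what the draft specifies.

```python
from typing import List


def fragment_then_encapsulate(inner: bytes, encap_hdr: bytes,
                              path_mtu: int) -> List[bytes]:
    """Split the inner packet first, then wrap each piece separately.

    Every output packet carries its own copy of the encapsulation
    header and fits within path_mtu.
    """
    room = path_mtu - len(encap_hdr)
    if room <= 0:
        raise ValueError("path MTU too small for the encapsulation header")
    pieces = [inner[i:i + room] for i in range(0, len(inner), room)] or [b""]
    return [encap_hdr + piece for piece in pieces]


def encapsulate_then_fragment(inner: bytes, encap_hdr: bytes,
                              path_mtu: int) -> List[bytes]:
    """Wrap the whole inner packet once, then cut the result up.

    Only the first piece contains the encapsulation header.
    """
    if path_mtu <= 0:
        raise ValueError("path MTU must be positive")
    whole = encap_hdr + inner
    return [whole[i:i + path_mtu] for i in range(0, len(whole), path_mtu)]
```

In the first ordering each output is a complete encapsulated packet; in
the second, later pieces are meaningless until the whole is reassembled.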

----