[Int-area] Comments for draft-ietf-intarea-tunnels-02

Vincent Roca <vincent.roca@inria.fr> Wed, 08 June 2016 14:44 UTC

From: Vincent Roca <vincent.roca@inria.fr>
Date: Wed, 08 Jun 2016 16:44:44 +0200
Message-Id: <DC49264E-9590-4B63-B1B6-E87F486114C2@inria.fr>
To: Joe Touch <touch@isi.edu>, townsley@cisco.com
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/aDw2dV3m8rjjG_OtV0rNEhw6tg4>
Cc: int-area@ietf.org
Subject: [Int-area] Comments for draft-ietf-intarea-tunnels-02
List-Id: IETF Internet Area Mailing List <int-area.ietf.org>

Hello everybody,


First of all, I must say that I found the draft-ietf-intarea-tunnels-02 I-D extremely useful,
and I wish I had found it earlier.

We exchanged a few private emails with Joe last week and I promised to send detailed
comments on the list. Here they are... Sorry if this is a bit long.


** Section 2.2. Terminology, Link definition: it is said:
        "Link: a communication device that transfers messages between network devices..."
The notion of network device is undefined. I guess you mean "network nodes" defined above.


** In Ingress and Egress definitions: it is said:
        "Ingress: a network node that..."
The Ingress (resp. Egress) is not a network node. Section 3.4 has a better definition
IMHO. So:

OLD:
   o  Ingress: a network node that receives messages, encapsulates them
      according to the tunnel protocol, and transmits them into the
      tunnel. Note that the ingress and source can be co-located.

NEW:
   o  Ingress: a tunnel entry endpoint at the network node that the tunnel
      interconnects. It is typically described as a "network interface".
      The Ingress receives messages, encapsulates them according to the tunnel
      protocol, and transmits them into the tunnel. Note that the ingress and
      source can be co-located.

(similar changes for the Egress definition)


** In Egress definition: "The egress decapsulates datagrams..."
Maybe "IP datagrams" to be coherent with other uses of this term.


** I suggest adding the min LinkMTU values as they are considered throughout this I-D.
I also noticed that "LinkMTU" is used in the I-D to denote this value, not "LMTU".

OLD:
   o  Link MTU (LMTU): the largest message that can transit a link. Note
      that this need not be the native size of messages on the link.

NEW:
   o  Link MTU (LinkMTU): the largest message that can transit a link. LinkMTU is
      at least 68 octets with IPv4 [RFC791], or 1280 octets with IPv6 [RFC2460].
      Note that this need not be the native size of messages on the link.


** In RMTU definition:
"receiver" is ambiguous. I understand that definitions can be generic; however,
throughout this I-D we only consider the Egress Reassembly MTU, so it's worth
specializing.
I also clarified how the encapsulation header size is deducted.

OLD:
   o  Reassembly MTU (RMTU): the largest message that can be reassembled
      by a receiver, and is not directly related to the link or path
      MTU. Sometimes also referred to as "receiver MTU".

NEW:
   o  Reassembly MTU: the largest message that can be reassembled
      by a receiver, i.e., either the Egress or the Destination. It is not
      directly related to the link or path MTU. Sometimes also referred to
      as "receiver MTU".

   o  Egress Reassembly MTU (EgressRMTU): the largest message that can be reassembled
      by the Egress. The minimum EgressRMTU value is (576 - encapsulation header size)
      octets with IPv4 [RFC791] and (1500 - encapsulation header size) octets with
      IPv6 [RFC2460].
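The minimum values proposed in the LinkMTU and EgressRMTU definitions above can be
summarized in a short Python sketch. The function names and the encap_hdr_size
parameter are hypothetical; only the constants come from [RFC791] and [RFC2460].

```python
# Minimum values quoted from RFC 791 (IPv4) and RFC 2460 (IPv6).
MIN_LINK_MTU = {4: 68, 6: 1280}     # smallest link MTU for each IP version
MIN_REASSEMBLY = {4: 576, 6: 1500}  # smallest packet a destination must reassemble


def min_link_mtu(ip_version: int) -> int:
    """Minimum LinkMTU of a link (here, the tunnel) carrying this IP version."""
    return MIN_LINK_MTU[ip_version]


def min_egress_rmtu(outer_version: int, encap_hdr_size: int) -> int:
    """Minimum EgressRMTU: the outer protocol's reassembly minimum minus the
    encapsulation header, which the egress must reassemble as well."""
    if encap_hdr_size < 0:
        raise ValueError("encapsulation header size cannot be negative")
    return MIN_REASSEMBLY[outer_version] - encap_hdr_size


if __name__ == "__main__":
    print(min_link_mtu(4), min_link_mtu(6))  # 68 1280
    print(min_egress_rmtu(4, 20))            # 556 (IPv4 header without options)
    print(min_egress_rmtu(6, 40))            # 1460 (IPv6 header without extensions)
```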


** Path MTU definition: I think it's worth clarifying that we focus on the tunnel PMTU in this I-D.
I also explain that the encapsulation header size has already been subtracted from the
Tunnel PMTU.

OLD:
   o  Path MTU (PMTU): the largest message that can transit a path.
      Typically, this is the minimum of the link MTUs of the links of
      the path.

NEW:
   o  Path MTU: the largest message that can transit a path.
      Typically, this is the minimum of the link MTUs of the links of
      the path.

   o  Tunnel Path MTU (TunnelPMTU): the largest Tunnel Transit Packet (or fragment)
      that can transit inside the tunnel's path. The encapsulation header size is already
      subtracted from TunnelPMTU, and, unlike the Tunnel MTU, TunnelPMTU is not visible
      from outside the tunnel.


** In tunnel MTU definition:
Say explicitly that the encapsulation header size has already been subtracted from
the Tunnel MTU, since encapsulation headers are considered "headers from lower
protocols" and are therefore excluded.

OLD:
   o  Tunnel MTU (TMTU): the largest message that can transit a tunnel.
      Typically, this is limited by the egress reassembly MTU.

NEW:
   o  Tunnel MTU (TunnelMTU): the largest message that can transit a tunnel.
      The encapsulation header size is already subtracted from TunnelMTU.
      Typically, the TunnelMTU is limited by the EgressRMTU.


** I think a figure like this one (70 characters wide ;-) could be useful. At least it
helped me a lot!

Tunnel Transit
Packet (TTP) or                Tunnel Link Packet
"tunneled pkt"  +------------+      (TLP) or       +------------+
--------------->|Network Node|   "tunnel packet"   |Network Node| --->
                |    +-------|  ________________   |-------+    |
                |    |Ingress|-|________________|->| Egress|    |
                +----+-------+                     +-------+----+
                   Tunnel MTU   Tunnel Path MTU     Egress Reassembly
                  or Link MTU                       MTU
with:
Link MTU == Tunnel MTU == Egress RMTU
(the maximum encapsulation header size is already deducted)
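The relations in the figure can be written down as a small Python sketch. Names such
as from_outer_values, outer_path_mtu and egress_reassembly (the egress reassembly
capability before the tunnel header is deducted) are hypothetical; they only show
where the encapsulation header is subtracted, once, to obtain the tunnel-level values.

```python
from dataclasses import dataclass


@dataclass
class TunnelMtus:
    """Tunnel-level MTUs, all with the encapsulation header already deducted."""
    tunnel_path_mtu: int  # largest TTP (or fragment) that fits the tunnel's path
    egress_rmtu: int      # largest TTP the egress can reassemble

    @property
    def tunnel_mtu(self) -> int:
        return self.egress_rmtu  # Tunnel MTU == Egress RMTU

    @property
    def link_mtu(self) -> int:
        return self.tunnel_mtu   # Link MTU seen by the attached node == Tunnel MTU


def from_outer_values(outer_path_mtu: int, egress_reassembly: int,
                      encap_hdr_size: int) -> TunnelMtus:
    """Deduct the (maximum) encapsulation header once from the outer values."""
    return TunnelMtus(tunnel_path_mtu=outer_path_mtu - encap_hdr_size,
                      egress_rmtu=egress_reassembly - encap_hdr_size)


if __name__ == "__main__":
    # Hypothetical IPv6 tunnel: 1280-byte outer path, egress reassembling the
    # RFC 2460 minimum of 1500 bytes, 40-byte header without extension headers.
    m = from_outer_values(1280, 1500, 40)
    print(m.tunnel_path_mtu, m.tunnel_mtu, m.link_mtu)  # 1240 1460 1460
```

In this example the Tunnel Path MTU (1240) is smaller than the Tunnel MTU (1460),
which is exactly the case where the ingress has to fragment.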


** In the doc there is a misspelling, the correct acronym is PLPMTUD (two P's),
not PLMTUD (5 occurrences).


** Section 3.1: this is a detail. It is said:
        "... the tunnel serves as a link to the devices it connects (here, Ra and Rb)."
Instead of "device", which is not defined in section 2.2, I think "network nodes" would
be more appropriate.


** Section 4.1, Fragmentation:
I suggest changing the algorithms and their introductory text as follows:


   // VR: added a reminder that LinkMTU == TunnelMTU
   These rules apply at the host/router where the tunnel is attached (recall that the
   Link MTU is equal to the Tunnel MTU):

      // VR: I've changed the test order to follow the same logic in the two algorithms
      // VR: TTPsize is the official name, so I've removed TTP and sizeof(TTP)
      if (TTPsize <= linkMTU) then
         send TTP into the tunnel "interface" (i.e., ingress)
      else
         if (TTP can be fragmented, e.g., IPv4 DF=0) then
            // VR: as we test against linkMTU, it's better to mention linkMTU below,
            //     not TunMTU, even if both are equal
            split TTP into fragments of linkMTU size
            and send each fragment into the tunnel ingress
         else
            // VR: it's important to detail how the MTU field of the ICMP PTB is initialized
            drop TTP and send ICMP "too big" to TTP source that
            advertises Next-Hop MTU = linkMTU
         endif
      endif
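A minimal Python sketch of these first rules, assuming the TTP is a plain byte string
and that the hypothetical can_fragment flag stands for the "TTP can be fragmented"
test (e.g., IPv4 DF=0). Fragmentation is deliberately simplified, as noted in the
comments.

```python
from typing import List, Tuple


def split(data: bytes, size: int) -> List[bytes]:
    """Cut data into consecutive chunks of at most `size` bytes.
    Simplification: real IP fragmentation also replicates the IP header in
    every fragment and keeps fragment offsets 8-octet aligned; both omitted."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def node_rules(ttp: bytes, link_mtu: int, can_fragment: bool) -> Tuple[str, object]:
    """Rules at the host/router where the tunnel is attached (LinkMTU == TunnelMTU).

    Returns ("send", [ttp]), ("fragment", [chunks...]) or ("drop", next_hop_mtu),
    where next_hop_mtu is the value advertised in the ICMP "too big" message.
    """
    if len(ttp) <= link_mtu:
        return ("send", [ttp])                       # into the tunnel ingress as is
    if can_fragment:                                 # e.g., IPv4 with DF=0
        return ("fragment", split(ttp, link_mtu))    # fragments of linkMTU size
    return ("drop", link_mtu)                        # PTB with Next-Hop MTU = linkMTU


if __name__ == "__main__":
    print(node_rules(bytes(1000), 1460, can_fragment=False)[0])  # send
    print(node_rules(bytes(3000), 1460, can_fragment=False))     # ('drop', 1460)
    kind, frags = node_rules(bytes(3000), 1460, can_fragment=True)
    print(kind, [len(f) for f in frags])                         # fragment [1460, 1460, 80]
```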

   // VR: added a reminder that TunnelMTU == EgressRMTU
   These rules apply at the tunnel ingress of the host/router where the
   tunnel is attached (recall that the Tunnel MTU is equal to the Egress
   Reassembly MTU):

      // VR: as said above, TTPsize, not sizeof() operator
      if (TTPsize <= TunnelPathMTU) then
         encapsulate TTP as received and emit
      else
         // VR: this test is sufficient and IMHO more readable
         // VR: I prefer to use TunnelMTU since we are at the ingress
         if (TTPsize <= TunnelMTU) then
            // VR: there was a mistake below that mentioned TunnelMTU (!) chunks.
            fragment TTP into chunks of size TunnelPathMTU
            // VR: there was a mistake below that mentioned TTP instead of TTP chunks
            encapsulate and emit each TTP chunk
         else
            {never happens; host/router already dropped by now}
         endif
      endif
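The ingress rules can be sketched in Python in the same way. The encapsulate helper
and the placeholder outer header are hypothetical; the point is that both MTUs already
have the encapsulation header deducted, so the tests use the TTP size before
encapsulation.

```python
from typing import List


def split(data: bytes, size: int) -> List[bytes]:
    """Cut data into consecutive chunks of at most `size` bytes (simplified:
    no per-fragment headers, no 8-octet alignment)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def encapsulate(chunk: bytes, outer_header: bytes) -> bytes:
    """Hypothetical encapsulation: prepend the outer (tunnel) header."""
    return outer_header + chunk


def ingress_rules(ttp: bytes, tunnel_path_mtu: int, tunnel_mtu: int,
                  outer_header: bytes) -> List[bytes]:
    """Rules at the tunnel ingress (TunnelMTU == EgressRMTU)."""
    if len(ttp) <= tunnel_path_mtu:
        return [encapsulate(ttp, outer_header)]  # encapsulate as received and emit
    if len(ttp) <= tunnel_mtu:
        # fragment into chunks of size TunnelPathMTU, encapsulate and emit each
        return [encapsulate(c, outer_header) for c in split(ttp, tunnel_path_mtu)]
    # "never happens": the host/router should already have dropped this TTP
    raise RuntimeError("TTP larger than TunnelMTU reached the ingress")


if __name__ == "__main__":
    hdr = bytes(40)  # placeholder 40-byte outer header
    print([len(p) for p in ingress_rules(bytes(1000), 1240, 1460, hdr)])  # [1040]
    print([len(p) for p in ingress_rules(bytes(1400), 1240, 1460, hdr)])  # [1280, 200]
```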

NB: I personally prefer:
        if
        else if
        else
        endif
to nested ifs, but this is not a big deal.


** Same section 4.1
The 3rd algorithm should be introduced better. For instance, it is not clear
that it is just an example of algorithm 2 using minimum values.
Also, reading this paragraph, it is not clear whether:
        option 1: the tunnel Path MTU must be >= (1280 - 40 - TOptSz), or
        option 2: the tunnel Path MTU must be >= 1280, to which one needs to
                subtract (40 + TOptSz) because of encapsulation
This is option 1 if we compare with the previous algorithm, but it's far from obvious...
It follows from the fact that we compare with the TTP size before encapsulation. If
we compared with the size of the encapsulated TTP, it would be different.

Overall, deciding when to count or ignore encapsulation headers in MTUs is extremely
subtle and error prone. This I-D must avoid any ambiguity of this kind.
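To make option 1 concrete, here is a hedged Python sketch computing the minimum values
for IPv4- or IPv6-over-IPv6 tunneling, where every value is compared with the TTP size
before encapsulation. The function name and constants are illustrative; TOptSz is the
size of the encapsulation header's options, as in the draft.

```python
IPV6_HDR = 40               # fixed IPv6 encapsulation header (bytes)
IPV6_MIN_LINK_MTU = 1280    # RFC 2460 minimum link MTU
IPV6_MIN_REASSEMBLY = 1500  # RFC 2460 minimum reassembly size


def ipv6_tunnel_minimums(topt_sz: int) -> dict:
    """Option 1 reading: the encapsulation header (40 + TOptSz) is deducted
    from the outer minimums, so values compare with the TTP before encapsulation."""
    encap = IPV6_HDR + topt_sz
    return {
        "TunnelMTU": IPV6_MIN_LINK_MTU,              # as seen from outside the tunnel
        "EgressRMTU": IPV6_MIN_REASSEMBLY - encap,   # 1500 - (40 + TOptSz)
        "TunnelPathMTU": IPV6_MIN_LINK_MTU - encap,  # 1280 - (40 + TOptSz)
    }


if __name__ == "__main__":
    print(ipv6_tunnel_minimums(0))  # {'TunnelMTU': 1280, 'EgressRMTU': 1460, 'TunnelPathMTU': 1240}
    print(ipv6_tunnel_minimums(8))  # {'TunnelMTU': 1280, 'EgressRMTU': 1452, 'TunnelPathMTU': 1232}
```

One arithmetic consequence, if TunnelMTU == EgressRMTU and the egress offers no more
than the RFC 2460 reassembly minimum: EgressRMTU stays at or above 1280 only while
TOptSz <= 180 bytes (1500 - 40 - 180 = 1280).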

Here is a proposal:

OLD:
   For IPv4 or IPv6 over IPv6, the tunnel path MTU is a minimum of 1280
   minus the encapsulation header (40 bytes) with its options (TOptSz)
   and the egress reassembly MTU is 1500 minus the same amount:

NEW:
   As an example, let us consider IPv4 over IPv6, or IPv6 over IPv6 tunneling,
   where IPv6 encapsulation adds a 40-byte fixed header plus options (i.e.,
   header extensions) of total size TOptSz. From [RFC2460] it follows that the
   Tunnel MTU must be at least 1280 bytes and the Egress Reassembly MTU must
   be at least 1500 - (40 + TOptSz) bytes. The Tunnel Path MTU must be a minimum
   of 1280 - (40 + TOptSz) bytes. Considering these minimum values, the previous
   algorithm becomes:


** Still section 4.1: we cannot say that 1500 is the minimum EgressRMTU, as
we first need to remove the encapsulation header size, as detailed above.
I also tried to explain why... Not easy! Do you have a better explanation?

NEW:
   When using IP directly over IP, the minimum Egress Reassembly MTU
   equals (576 - encapsulation header size) bytes for IPv4 [RFC791] and (1500 -
   encapsulation header size) bytes for IPv6 [RFC2460].
   Note that the encapsulation header size must be deducted from the 576 (resp. 1500)
   value because [RFC791] (resp. [RFC2460]) requires the destination (here the egress)
   to "be able to accept a fragmented packet that, after reassembly, is as large as 576
   (resp. 1500) octets". In our case the reassembled packet corresponds to the encapsulated
   TTP (i.e., packets (a) and (c) of Fig. 10) that was fragmented by the Ingress.


** Section 5.2: I fully agree with:
   "Detect when the egress MTU drops below the required minimum and shut
   down the tunnel if that happens - configuring the tunnel down and
   issuing a hard error may be the only way to detect this anomaly, and
   it's sufficiently important that the tunnel SHOULD be disabled."

However, this is not only an implementation aspect. It is also required for security
reasons and should therefore be discussed in the Security Considerations section
as well. E.g., we observed the opposite behavior in an off-the-shelf IPsec
implementation and discussed the consequences in our (now expired)
draft-roca-ipsecme-ptb-pts-attack-00.

[WG: we already discussed this aspect in private emails with Joe last week.]


** Section 5.4.4: it is said, for the "consistent with this doc" part, that:
        "Shuts the tunnel down if the tunnel path MTU isn't => 1280."
This contradicts the example of section 4.1 where TunnelPMTU == 1280-40-TOptSz
is valid...
Do you mean "Shuts the tunnel down if the Tunnel MTU, seen from the outside network,
isn't >= 1280"?


** Section 6, Security considerations
As already discussed privately with Joe, I may have other items but still need
to think it over.


** Section A.1: the following sentence is ambiguous:
OLD:
        "When the encapsulated packet exceeds the MTU of the tunnel, the
        packet needs to be fragmented."
This should be the Tunnel Path MTU, not the "MTU of the tunnel", which can be
understood as a synonym for the Tunnel MTU.

NEW:
        "When the packet (iH, iD) size exceeds the Tunnel Path MTU, the encapsulated packet
         needs to be fragmented."


** Section A.1/ A.2:
Discussion in section 4.1 assumes an Outer Fragmentation scheme. The notion of
Egress Reassembly also assumes an Outer Fragmentation. In fact it seems to be
the model assumed throughout this I-D but this is not clearly stated unless I missed
something. Am I right?

It's clearly more universal, as it does not make any assumption about the inner packet
(whether or not it can be fragmented on path, as explained in A.2).
I think the I-D should state from the beginning which assumption is made (Outer vs.
Inner fragmentation), as it impacts the discussion. In other words, move Appendix A
into Section 3, for instance.
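To illustrate why Outer fragmentation makes no assumption about the inner packet, here
is a hedged Python sketch contrasting the two schemes at the ingress. The Scheme enum
and ingress_can_forward are hypothetical names, and the model only decides whether a
packet can be carried, not how fragments are built.

```python
from enum import Enum


class Scheme(Enum):
    OUTER = "outer"  # fragment the encapsulated packet; the egress reassembles
    INNER = "inner"  # fragment the inner packet, then encapsulate each fragment


def ingress_can_forward(inner_size: int, inner_may_fragment: bool,
                        tunnel_path_mtu: int, tunnel_mtu: int,
                        scheme: Scheme) -> bool:
    """Whether the ingress can carry an inner packet of `inner_size` bytes.

    inner_may_fragment: True if the inner packet may be fragmented on path
    (e.g., IPv4 with DF=0); False for IPv4 with DF=1 and for any IPv6 packet.
    """
    if inner_size <= tunnel_path_mtu:
        return True  # fits as is: no fragmentation under either scheme
    if scheme is Scheme.OUTER:
        # the inner packet is never inspected; only the egress reassembly limit matters
        return inner_size <= tunnel_mtu
    # inner fragmentation is only possible if the inner packet allows it
    return inner_may_fragment


if __name__ == "__main__":
    # 1400-byte IPv6 inner packet, TunnelPathMTU 1240, TunnelMTU 1460
    print(ingress_can_forward(1400, False, 1240, 1460, Scheme.OUTER))  # True
    print(ingress_can_forward(1400, False, 1240, 1460, Scheme.INNER))  # False
```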


Cheers,

  Vincent