Re: [trill] Tsvart early review of draft-ietf-trill-over-ip-10

"Susan Hares" <shares@ndzh.com> Fri, 16 June 2017 01:45 UTC

From: Susan Hares <shares@ndzh.com>
To: 'Magnus Westerlund' <magnus.westerlund@ericsson.com>, tsv-art@ietf.org
Cc: trill@ietf.org, ietf@ietf.org, draft-ietf-trill-over-ip.all@ietf.org, 'Alia Atlas' <akatlas@gmail.com>, 'Jon Hudson' <jon.hudson@gmail.com>
References: <149754795560.13109.17521244075940607817@ietfa.amsl.com>
In-Reply-To: <149754795560.13109.17521244075940607817@ietfa.amsl.com>
Date: Thu, 15 Jun 2017 21:39:10 -0400
Message-ID: <00a301d2e641$5a6cfb00$0f46f100$@ndzh.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/trill/ObVEzOqu8m28eAC7Lf-nBHULeJU>
Subject: Re: [trill] Tsvart early review of draft-ietf-trill-over-ip-10
Precedence: list

Magnus: 

Thank you for the very careful review.  Please let me chat with the authors regarding this document.  It may take some time, but the authors will respond to each of your points.

Sue Hares 

-----Original Message-----
From: Magnus Westerlund [mailto:magnus.westerlund@ericsson.com] 
Sent: Thursday, June 15, 2017 1:33 PM
To: tsv-art@ietf.org
Cc: trill@ietf.org; ietf@ietf.org; draft-ietf-trill-over-ip.all@ietf.org
Subject: Tsvart early review of draft-ietf-trill-over-ip-10

Reviewer: Magnus Westerlund
Review result: Not Ready

Early review of draft-ietf-trill-over-ip-10

TSV-ART review comments:

I have set this to Not Ready because there are several issues, some of them significant enough to affect how the protocol is realized. Some may stem from my missing things in TRILL: I was not that familiar with it before this review, and I have only looked things up rather than reading the earlier specifications in full. So don't hesitate to push back and provide pointers to things that can resolve issues. The authors and the WG have clearly thought about a lot of issues and dealt with many already.

Diffserv usage
--------------

Section 4.3:

   TRILL over IP implementations MUST support setting the DSCP value in
   the outer IP Header of TRILL packets they send by mapping the TRILL
   priority and DEI to the DSCP. They MAY support, for a TRILL Data
   packet where the native frame payload is an IP packet, mapping the
   DSCP in this inner IP packet to the outer IP Header with the default
   for that mapping being to copy the DSCP without change.

I think it is fine to require that implementations are capable of setting DSCP values on the outer IP header. However, I fail to see any discussion of the potential issues with actually setting the DSCP values. It is one thing to do this in an IP backbone use case where one knows and controls the PHB that each DSCP value maps to. Over the general Internet, the behavior is not that predictable: one can easily be subject to policers or remapping. The actual DSCP code point usage is also domain specific, which makes this difficult. Priority reversal is likely the least of the problems this can run into over the general Internet.

Section 4.3:

   The default TRILL priority and DEI to DSCP mapping, which may be
   configured per TRILL over IP port, is an follows. Note that the DEI
   value does not affect the default mapping and, to provide a
   potentially lower priority service than the default priority 0,
   priority 1 is considered lower priority than 0. So the priority
   sequence from lower to higher priority is 1, 0, 2, 3, 4, 5, 6, 7.

      TRILL Priority  DEI  DSCP Field (Binary/decimal)
      --------------  ---  -----------------------------
                  0   0/1  001000 / 8
                  1   0/1  000000 / 0
                  2   0/1  010000 / 16
                  3   0/1  011000 / 24
                  4   0/1  100000 / 32
                  5   0/1  101000 / 40
                  6   0/1  110000 / 48
                  7   0/1  111000 / 56

This appears to be a problematic mapping, at least for priorities 0 and 1. As priority 0 is intended to be higher than priority 1, it is interesting that priority 0 is mapped to CS1, which to quote
https://datatracker.ietf.org/doc/rfc7657/:

CS1 ('001000') was subsequently designated as the recommended
      codepoint for the Lower Effort (LE) PHB [RFC3662].

So in a network using the default mapping, what is proposed can result in priority 0 being treated as lower priority than priority 1. In addition, in some networks this can result in strange remapping that gives CS1 a different PHB than expected.
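The concern can be checked mechanically against the quoted table. The following is a minimal sketch, not anything from the draft: the names `DEFAULT_DSCP`, `INTENDED_ORDER`, `priorities_mapped_to_cs1` and `dscp_binary` are made up for illustration, and the only facts used are the table values and the CS1 codepoint quoted above.

```python
# Default TRILL priority -> DSCP mapping, copied from the quoted table.
# (The DEI bit does not affect the default mapping, so it is omitted.)
DEFAULT_DSCP = {0: 8, 1: 0, 2: 16, 3: 24, 4: 32, 5: 40, 6: 48, 7: 56}

# CS1, the codepoint the RFC 7657 quote associates with Lower Effort.
CS1 = 0b001000

# Priority sequence from lower to higher, as stated in the quoted draft text.
INTENDED_ORDER = [1, 0, 2, 3, 4, 5, 6, 7]


def priorities_mapped_to_cs1(mapping):
    """Return the TRILL priorities whose DSCP equals CS1."""
    return sorted(p for p, dscp in mapping.items() if dscp == CS1)


def dscp_binary(dscp):
    """Format a DSCP value as the 6-bit binary string used in the table."""
    return format(dscp, "06b")


if __name__ == "__main__":
    for prio in INTENDED_ORDER:
        dscp = DEFAULT_DSCP[prio]
        print(prio, dscp_binary(dscp), dscp)
    print("mapped to CS1:", priorities_mapped_to_cs1(DEFAULT_DSCP))
```

In this sketch the only priority landing on CS1 is priority 0, which the draft ranks above priority 1, so a domain that treats CS1 as Lower Effort would invert the intended order of the two.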

MTU and Fragmentation
---------------------

I think there are two main issues here. The first one is discovery of the actual IP path MTU between the ports. That will be needed to prevent a lot of traffic going into MTU black holes, especially as TRILL requires 1470-byte support, which is likely above the MTU of a lot of paths.

Section 8.4:

   Path MTU discovery [RFC4821] should be useful
   in determining the IP MTU between a pair of RBridge ports with IP
   connectivity.

The issue with RFC4821 is that it places requirements on the packetization layer. TRILL appears to have several components that are useful here. However, the procedure will need to be specified for this to result in a useful tool.

Section 8.4:

   TRILL IS-IS MTU PDUs, as specified in Section 5 of [RFC6325] and in
   [RFC7177], can be used to obtain added assurance of the MTU of a
   link.

Yes, that can confirm working MTUs that are at 1470 or above, but it appears to be prevented from working below 1470?

Thus, there appears to be no mechanism here to actually get a valid and functional MTU from TRILL in cases where the path MTU is below 1470. If I am wrong, good, but I think this is an important piece of how to handle the next main issue.

UDP encapsulation and IP fragments.
I see it as a big issue that UDP encapsulation is the native one and that it relies on IP fragmentation, despite the need for reliable fragmentation. With the requirement to support a 1470 MTU at the TRILL level, some packets will be fragmented in many environments. That will lead to a lot of losses and, as discussed below, a very big problem with middleboxes. The main problem is that if one tries to rely on IP fragments, packets will end up in black holes, and the problems differ between IPv4 and IPv6. IPv6 is likely the lesser problem, assuming that one has working PMTUD.

There are several ways out of this.

1. Detect issues and use TCP encapsulation with a correctly set MSS to avoid IP fragments.
2. Determine the MTU and implement a fragmentation mechanism on top of UDP.
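The arithmetic behind both options can be sketched as follows. This is an illustration under stated assumptions, not the draft's procedure: it treats the 1470 bytes as the size of the UDP payload, uses the minimum IPv4, IPv6, UDP and TCP header sizes with no options or extension headers, and ignores any further encapsulation bytes. All function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, no options (assumption)
IPV6_HEADER = 40  # bytes, no extension headers (assumption)
UDP_HEADER = 8
TCP_HEADER = 20   # bytes, no options (assumption)
TRILL_REQUIRED = 1470  # size the review says TRILL must support


def udp_datagram_size(payload, ip_header=IPV4_HEADER):
    """Total IP packet size when the payload is carried directly over UDP."""
    return ip_header + UDP_HEADER + payload


def needs_ip_fragmentation(payload, path_mtu, ip_header=IPV4_HEADER):
    """True if the UDP-encapsulated payload exceeds the path MTU."""
    return udp_datagram_size(payload, ip_header) > path_mtu


def tcp_mss(path_mtu, ip_header=IPV4_HEADER):
    """Largest TCP segment payload that fits in one unfragmented IP packet."""
    return path_mtu - ip_header - TCP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1400, 1280):
        print(mtu,
              needs_ip_fragmentation(TRILL_REQUIRED, mtu),
              needs_ip_fragmentation(TRILL_REQUIRED, mtu, IPV6_HEADER),
              tcp_mss(mtu))
```

Under these assumptions a 1470-byte payload fits a 1500-byte path over IPv4 with only two bytes to spare, and does not fit at all once the IPv6 header or any tunnel overhead is added, which is why the path MTU has to be known before either option can work.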

Zero Checksum:
--------------

Section 5.4:

UDP Checksum - as specified in [RFC0768]

Considering the desire for fast-path encapsulation, I am surprised not to see any mention of the zero checksum here. I think it would be good to raise the zero checksum here, with a forward reference.

And then Section 8.5:

   The requirements for the usage of the zero UDP Checksum in a UDP
   tunnel protocol are detailed in [RFC6936]. These requirements apply
   to the UDP based TRILL over IP encapsulations specified herein
   (native and VXLAN), which are applications of UDP tunnel.

If you actually intend to allow zero checksum, then you should document that TRILL fulfills the requirements that the applicability statement raises. I have not analyzed how well it meets these requirements.

Please review Section 6.2 of RFC 8086 for an example of how that can be done.

TCP Encapsulation issue
-----------------------

Section 5.6:

The TCP encapsulation appears to be missing a delimiter format that allows each individual TRILL packet/payload to be read out of TCP's byte stream. In other words, a normal implementation has no way of ensuring that a TCP payload starts at the start of a new TRILL payload. Multiple small TRILL payloads may be included in the same TCP payload, as may only parts of one, since TCP is one way of dealing with TRILL packets larger than the IP+Encapsulation MTU that actually will work.

This comment is based on there appearing to be no length field in the TRILL header. The most straightforward delimiter is a 2-byte length field for the TRILL payload to be encapsulated.
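A 2-byte length delimiter of the kind suggested here could look like the sketch below. It is only an illustration of length-prefixed framing over a byte stream; the names `frame` and `Deframer` and the choice of network byte order are assumptions, not anything specified in the draft.

```python
import struct

MAX_FRAME = 0xFFFF  # largest length a 2-byte field can express


def frame(payload: bytes) -> bytes:
    """Prefix one payload with its length as a 2-byte big-endian integer."""
    if len(payload) > MAX_FRAME:
        raise ValueError("payload too large for a 2-byte length field")
    return struct.pack("!H", len(payload)) + payload


class Deframer:
    """Reassemble payloads from arbitrary chunks of a TCP byte stream."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data: bytes):
        """Add bytes read from TCP; return the payloads completed so far."""
        self._buf += data
        out = []
        while len(self._buf) >= 2:
            (length,) = struct.unpack_from("!H", self._buf, 0)
            if len(self._buf) < 2 + length:
                break  # wait for the rest of this payload
            out.append(bytes(self._buf[2:2 + length]))
            del self._buf[:2 + length]
        return out


if __name__ == "__main__":
    stream = frame(b"abc") + frame(b"defgh")
    d = Deframer()
    print(d.feed(stream[:4]))   # first payload still incomplete
    print(d.feed(stream[4:]))   # both payloads now available
```

The receiver never depends on where TCP segment boundaries fall: a read may contain several small payloads or only part of one, and the length field is what lets it find the start of the next payload.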

Section 5.6:

TCP endpoint requirements. I do wonder if an application like TRILL actually needs to discuss implementation choices or limitations that impact performance, for example the use of the Nagle algorithm, or the requirements on buffer sizes in relation to bandwidth-delay products, as buffer memory in an RBridge will impact performance.
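To make the two examples concrete, here is a small sketch of what such implementation choices look like on an ordinary sockets API. It is not a recommendation from the draft: the helper names are hypothetical, and whether an RBridge should disable Nagle at all is exactly the kind of question the text would need to discuss.

```python
import socket


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes for a given rate and round-trip time."""
    return bandwidth_bps * rtt_ms // 8000


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled (TCP_NODELAY)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


if __name__ == "__main__":
    # Example figures only: a 1 Gbit/s link with a 50 ms round-trip time.
    print(bdp_bytes(1_000_000_000, 50), "bytes in flight to fill the path")
    s = make_nodelay_socket()
    print("TCP_NODELAY set:",
          s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```

A buffer much smaller than the bandwidth-delay product caps throughput below the link rate, which is how the buffer memory available in an RBridge ends up limiting performance.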

Congestion Control
------------------

First, thanks for the effort here.

8.1.2 In Other Environments

   Where UDP based encapsulation headers are used in TRILL over IP in
   environments other than those discussed in Section 8.1.1, specific
   congestion control mechanisms are commonly needed.  However, if the
   traffic being carried by the TRILL over IP link is already congestion
   controlled and the size and volatility of the TRILL IS-IS link state
   database is limited, then specific congestion control may not be
   needed. See [RFC8085] Section 3.1.11 for further guidance.

This is correct. However, my question is whether the RBridges have any way of knowing which traffic is actually congestion controlled, considering that TRILL provides a layer 2 abstraction. I wonder if there should be some type of whitelist of the layer 2 payload types that can be assumed to be congestion controlled, and thus okay to forward over IP paths? I am worried that, without any recommendation to prevent uncontrolled traffic from being forwarded, this can lead to congestion issues.

The other issue I think may exist is the one created by serial unicast emulation of broadcast/multicast. As this amplifies the outgoing packet rate by a factor equal to the number of addresses configured for serial unicast, the traffic expansion can be significant. Thus, I think additional considerations are needed here, and maybe rate limiting of the amount of traffic to be multicast.
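The amplification is simple multiplication, sketched below with hypothetical function names and example numbers that are not taken from the draft.

```python
def serial_unicast_rate(multicast_pps: int, peer_count: int) -> int:
    """Outgoing packet rate when each multicast packet is sent once per peer."""
    return multicast_pps * peer_count


def max_multicast_rate(outgoing_budget_pps: int, peer_count: int) -> int:
    """Highest multicast input rate that stays within an outgoing budget."""
    if peer_count <= 0:
        raise ValueError("peer_count must be positive")
    return outgoing_budget_pps // peer_count


if __name__ == "__main__":
    # Example figures only: 1000 multicast packets/s replicated to 20 peers.
    print(serial_unicast_rate(1000, 20))
    # Input rate a limiter would have to enforce for a 5000 packets/s budget.
    print(max_multicast_rate(5000, 20))
```

A rate limit expressed on the outgoing side therefore translates into a much tighter limit on the multicast traffic accepted for replication, and that limit shrinks as more serial unicast addresses are configured.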

Flow and ECMP
-------------

Section 8.3:

For example, for TRILL
   Data, this entropy field could be based on some hash of the
   Inner.MacDA, Inner.MacSA, and Inner.VLAN or Inner.FGL.

I would appreciate clearer references to what these fields are.

If I understand this correctly, the idea here is to look into the inner layer 2 frames, take the flow equivalents that exist at that level, and hash them into a value that maps the flows onto the source port range.

I think this text should include a summary of the principle and note the important requirement that what is considered a flow in the inner frames must not be striped over multiple source ports, as this may lead to reordering due to packets taking different paths.
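The principle can be sketched as a deterministic hash of the inner fields onto a source port range. Everything specific here is an assumption made for illustration: the function name, the use of CRC-32 as the hash, the 2-byte VLAN encoding, and the 49152-65535 port range are not taken from the draft.

```python
import zlib

# Assumed source port range (the dynamic/private port range).
PORT_MIN = 49152
PORT_MAX = 65535


def entropy_source_port(mac_da: bytes, mac_sa: bytes, vlan: int) -> int:
    """Map one inner flow (MAC DA, MAC SA, VLAN) to a single UDP source port."""
    key = mac_da + mac_sa + vlan.to_bytes(2, "big")
    span = PORT_MAX - PORT_MIN + 1
    return PORT_MIN + zlib.crc32(key) % span


if __name__ == "__main__":
    da = bytes.fromhex("020000000001")
    sa = bytes.fromhex("020000000002")
    # Every packet of the same inner flow yields the same source port.
    print(entropy_source_port(da, sa, 100))
    print(entropy_source_port(da, sa, 100))
    # A different VLAN is a different flow and may map to another port.
    print(entropy_source_port(da, sa, 200))
```

Because the port is a pure function of the flow fields, all packets of one inner flow take the same ECMP path and cannot be reordered against each other by the path choice, while distinct flows are spread across the range.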

NAT and TRILL over IP
---------------------

Section 8.5:

If one would like to use TRILL over IP through a NAT, then some very important considerations are missing. First, there is the need for static binding configurations, or the need to determine one's external address(es) and be able to communicate them to the peer RBridges, and in addition to ensure that one has keep-alives so that the NAT binding never times out.

Next is the issue that there is almost zero chance of getting an IP/UDP encapsulated TRILL payload through the NAT if it results in IP fragmentation, as NATs don't defragment and then refragment on the internal side, and a non-initial IP fragment lacks the UDP ports and thus can't be matched to a binding.

Also, if you would like to run IP/ESP through a NAT, then you most likely need the IP/UDP/ESP encapsulation (https://tools.ietf.org/html/rfc3948). Note that this will restrict the MTU even further and thus ensure that the 1470 requirement cannot be fulfilled over a 1500-byte MTU Ethernet infrastructure, even without additional tunnels.

I would note that firewalls are also likely to have issues with IP fragments for the same reason: they require a significant amount of state to verify whether fragments should be let through.

In general, I think you should create a configuration that has a chance of working through most middleboxes, but I think you should require static bindings. I think that configuration is, and don't laugh now, IP/UDP/ESP/TCP/TRILL; otherwise you will not be able to have both security and reliable fragmentation of TRILL packets.

Cheers

Magnus Westerlund
