Re: [trill] [Tsv-art] Tsvart early review of draft-ietf-trill-over-ip-10 - ECN & DSCP considerations

"Black, David" <David.Black@dell.com> Sat, 01 July 2017 14:53 UTC

From: "Black, David" <David.Black@dell.com>
To: Donald Eastlake <d3e3e3@gmail.com>
CC: Magnus Westerlund <magnus.westerlund@ericsson.com>, "tsv-art@ietf.org" <tsv-art@ietf.org>, "draft-ietf-trill-over-ip.all@ietf.org" <draft-ietf-trill-over-ip.all@ietf.org>, IETF Discussion <ietf@ietf.org>, "trill@ietf.org" <trill@ietf.org>
Date: Sat, 01 Jul 2017 14:53:15 +0000
Message-ID: <CE03DB3D7B45C245BCA0D243277949362FB59E00@MX307CL04.corp.emc.com>
References: <CE03DB3D7B45C245BCA0D243277949362FB4DF88@MX307CL04.corp.emc.com> <CAF4+nEFUmB=eMO6rvK6-YiL+3Adp_t1QVqgMDBHZFn73_JxEfw@mail.gmail.com>
In-Reply-To: <CAF4+nEFUmB=eMO6rvK6-YiL+3Adp_t1QVqgMDBHZFn73_JxEfw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/trill/6Fy9lug-aV_sUmbOb3Sw-kIN_lM>
Subject: Re: [trill] [Tsv-art] Tsvart early review of draft-ietf-trill-over-ip-10 - ECN & DSCP considerations

Also, as noted earlier in this discussion, RFC 7657 explicitly discourages use of multiple DSCPs in a single TCP connection.  That needs to be reflected in the TCP encapsulation text in the trill-over-ip draft - in particular, the current text in Section 4.3 on mapping to DSCPs from TRILL priority and DEI does not appear to be consistent with RFC 7657 for TCP-based encapsulation.
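
To illustrate the concern, here is a minimal sketch (not text from the draft, and not a recommendation). It takes the default priority-to-DSCP table quoted from Section 4.3 and collapses it so that every packet on a given TCP connection carries one DSCP. The handling-class names, the grouping of priorities, and the helper functions are all hypothetical; which DSCPs are actually appropriate is exactly what remains to be worked out starting from RFC 7657.

```python
# Default TRILL priority -> DSCP table as quoted from Section 4.3 of
# draft-ietf-trill-over-ip-10 (DEI does not affect the default mapping).
DEFAULT_DSCP = {
    0: 0b001000, 1: 0b000000, 2: 0b010000, 3: 0b011000,
    4: 0b100000, 5: 0b101000, 6: 0b110000, 7: 0b111000,
}

# HYPOTHETICAL grouping of the 8 priorities into handling classes, with one
# TCP connection per class. The first priority listed in each class is the
# representative whose DSCP is used for every packet on that connection.
HANDLING_CLASSES = {
    "low": (1, 0),
    "mid": (2, 3),
    "high": (4, 5),
    "control": (6, 7),
}


def class_for_priority(priority):
    """Return the (hypothetical) handling class that carries this priority."""
    for name, members in HANDLING_CLASSES.items():
        if priority in members:
            return name
    raise ValueError(f"TRILL priority must be 0..7, got {priority!r}")


def connection_dscp(priority):
    """Single DSCP used by the TCP connection that carries this priority."""
    representative = HANDLING_CLASSES[class_for_priority(priority)][0]
    return DEFAULT_DSCP[representative]


def tos_byte(dscp, ecn=0):
    """Build the former TOS octet: DSCP in the upper six bits, ECN in the lower two."""
    return (dscp << 2) | ecn
```

With UDP encapsulation the per-packet table could be applied directly. With TCP, the value from connection_dscp() would be set once per connection, for example via socket.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp)) on platforms that expose IP_TOS, so that segments of one connection are not reordered by differing per-hop treatment.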

Thanks, --David

> -----Original Message-----
> From: Donald Eastlake [mailto:d3e3e3@gmail.com]
> Sent: Friday, June 30, 2017 9:43 PM
> To: Black, David <david.black@emc.com>
> Cc: Magnus Westerlund <magnus.westerlund@ericsson.com>;
> tsv-art@ietf.org; draft-ietf-trill-over-ip.all@ietf.org; IETF Discussion
> <ietf@ietf.org>; trill@ietf.org
> Subject: Re: [Tsv-art] Tsvart early review of draft-ietf-trill-over-ip-10 - ECN &
> DSCP considerations
> 
> Hi David,
> 
> On Mon, Jun 26, 2017 at 3:04 PM, Black, David <David.Black@dell.com>
> wrote:
> > Adding some comments on ECN and DSCP ...
> >
> >> > Section 4.3:
> >> >
> >> >    TRILL over IP implementations MUST support setting the DSCP value in
> >> >    the outer IP Header of TRILL packets they send by mapping the TRILL
> >> >    priority and DEI to the DSCP. They MAY support, for a TRILL Data
> >> >    packet where the native frame payload is an IP packet, mapping the
> >> >    DSCP in this inner IP packet to the outer IP Header with the default
> >> >    for that mapping being to copy the DSCP without change.
> >> >
> >> > I think it is fine to require that implementations are capable of setting
> >> > DSCP values on the outer IP header. However, I fail to see any discussion of
> >> > the potential issues with actually setting the DSCP values. It is one thing to
> >> > do this in an IP backbone use case where one can know and have control
> >> > over the PHB that the DSCP values map to. But otherwise, over the general Internet, the
> >> > behavior is not that predictable. One can easily be subject to policers or
> >> > remapping. Also, as the actual DSCP codepoint usage is domain-specific, this is
> >> > difficult. Priority reversal is likely the least of the problems that this can
> >> > run into over the general Internet.
> >>
> >> It sounds like appropriate discussion and warnings about these issues
> >> would resolve the above comment.
> >
> > For ECN, see RFC 6040 and draft-ietf-tsvwg-rfc6040update-shim.  In particular,
> > copying the inner ECN codepoint to the outer IP header encapsulation without
> > requiring decapsulation processing as specified in RFC 6040 or the 6040update-shim
> > draft can lose congestion indications from the network and hence is wrong
> > (it's also wrong wrt RFC 3168, but RFC 6040 and the 6040update-shim drafts are
> > better and more current references).
> 
> That's a good point.
> 
> > For DSCPs, start with RFC 2983 - thinking about the validity (or likely validity)
> > of the outer DSCP at the decapsulator may help in choosing whether to
> > recommend a uniform model (e.g., copy DSCP out at ingress, copy back in at
> > egress) or a pipe model (e.g., do something reasonable for outer DSCP at
> > ingress, ignore it on egress) as the implementation default.
> 
> I believe the default behavior in the current draft is the best
> default. That sets DSCP based on the same TRILL Header indicia that
> controls default QoS on non-IP links.
> 
> > -- DSCP mapping to/from TRILL/Ethernet priorities
> >
> >> The intent in the draft is to reflect the default relative priority of
> >> the different priority code points in IEEE Std 802.1Q where priority 1
> >> is lower than priority 0. At a quick look, it appears to me that RFC
> >> 2474 requires that '001000' (binary) be handled as being of a priority not
> >> lower than the priority with which '000000' is handled. Yet RFC 3662,
> >> which you point to, seems to suggest using '001000' as a lower
> >> priority code point than '000000'. Given that 3662 not only does not
> >> update 2474 but is only Informational while 2474 is Standards Track, I
> >> would say that 2474 dominates and that this draft makes the best
> >> assumptions it can about default behavior...
> >
> > Well ... that's a discussion about text in RFCs that are well over a decade
> > old, and in an area (less-than-best-effort service) where the aspirations
> > of at least RFC 3662 weren't realized ... but that RFC is not safe to ignore,
> > either.
> >
> > In practice, the specification of CS1 for less-than-best-effort service has
> > been promulgated by RFC 4594 rather than RFC 3662, and RFC 4594 has
> > had significant "running code" impact on network design and operation.
> >
> > As Magnus mentioned RFC 7657, I strongly suggest starting from the
> > RFC 7657 discussion of this topic in order to figure out what to do.  I'm
> > not sure what to recommend, but I do think that starting from
> > RFC 7657 (rather than RFC 2474 and RFC 3662) is the better approach.
> 
> OK.
> 
> > FWIW, the TSVWG WG is in the process of figuring out which DSCP
> > to recommend for less-than-best-effort-service in place of CS1 - that's
> > likely to be an active topic of discussion in Prague.
> 
> I'll try to attend that session.
> 
> Thanks,
> Donald
> ===============================
>  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>  155 Beaver Street, Milford, MA 01757 USA
>  d3e3e3@gmail.com
> 
> > Thanks, --David
> >
> >> -----Original Message-----
> >> From: Tsv-art [mailto:tsv-art-bounces@ietf.org] On Behalf Of Donald
> >> Eastlake
> >> Sent: Sunday, June 25, 2017 8:07 PM
> >> To: Magnus Westerlund <magnus.westerlund@ericsson.com>
> >> Cc: tsv-art@ietf.org; draft-ietf-trill-over-ip.all@ietf.org; IETF Discussion
> >> <ietf@ietf.org>; trill@ietf.org
> >> Subject: Re: [Tsv-art] Tsvart early review of draft-ietf-trill-over-ip-10
> >>
> >> Hi Magnus,
> >>
> >> Thanks for the extensive review. See my responses below.
> >>
> >> On Thu, Jun 15, 2017 at 1:32 PM, Magnus Westerlund
> >> <magnus.westerlund@ericsson.com> wrote:
> >> >
> >> > Reviewer: Magnus Westerlund
> >> > Review result: Not Ready
> >> >
> >> > Early review of draft-ietf-trill-over-ip-10
> >> >
> >> > TSV-ART review comments:
> >> >
> >> > I have set this to not ready as there are several issues, some significant,
> >> > that could affect the protocol realization significantly. Some may be me
> >> > missing things in TRILL; I was not that familiar with it before this review and I have
> >> > only tried looking up things, not reading the whole earlier specifications.
> >> > So don't hesitate to push back and provide pointers to things that can
> >> > resolve issues. The authors and the WG clearly have thought about a lot of issues
> >> > and dealt with much already.
> >>
> >> OK. Hopefully we can resolve these one way or the other.
> >>
> >> > Diffserv usage
> >> > --------------
> >> >
> >> > Section 4.3:
> >> >
> >> >    TRILL over IP implementations MUST support setting the DSCP value in
> >> >    the outer IP Header of TRILL packets they send by mapping the TRILL
> >> >    priority and DEI to the DSCP. They MAY support, for a TRILL Data
> >> >    packet where the native frame payload is an IP packet, mapping the
> >> >    DSCP in this inner IP packet to the outer IP Header with the default
> >> >    for that mapping being to copy the DSCP without change.
> >> >
> >> > I think it is fine to require that implementations are capable of setting
> >> > DSCP values on the outer IP header. However, I fail to see any discussion of
> >> > the potential issues with actually setting the DSCP values. It is one thing to
> >> > do this in an IP backbone use case where one can know and have control over
> >> > the PHB that the DSCP values map to. But otherwise, over the general Internet, the
> >> > behavior is not that predictable. One can easily be subject to policers or
> >> > remapping. Also, as the actual DSCP codepoint usage is domain-specific, this is
> >> > difficult. Priority reversal is likely the least of the problems that this can
> >> > run into over the general Internet.
> >>
> >> It sounds like appropriate discussion and warnings about these issues
> >> would resolve the above comment.
> >>
> >> > Section 4.3:
> >> >
> >> >    The default TRILL priority and DEI to DSCP mapping, which may be
> >> >    configured per TRILL over IP port, is as follows. Note that the DEI
> >> >    value does not affect the default mapping and, to provide a
> >> >    potentially lower priority service than the default priority 0,
> >> >    priority 1 is considered lower priority than 0. So the priority
> >> >    sequence from lower to higher priority is 1, 0, 2, 3, 4, 5, 6, 7.
> >> >
> >> >       TRILL Priority  DEI  DSCP Field (Binary/decimal)
> >> >       --------------  ---  -----------------------------
> >> >                   0   0/1  001000 / 8
> >> >                   1   0/1  000000 / 0
> >> >                   2   0/1  010000 / 16
> >> >                   3   0/1  011000 / 24
> >> >                   4   0/1  100000 / 32
> >> >                   5   0/1  101000 / 40
> >> >                   6   0/1  110000 / 48
> >> >                   7   0/1  111000 / 56
> >> >
> >> > This appears to be a problematic mapping, at least for priorities 0 and 1. As
> >> > priority 1 appears to be intended to be higher than priority 0, it is
> >> > interesting that it is mapped to CS1, which to quote
> >> > https://datatracker.ietf.org/doc/rfc7657/:
> >> >
> >> > CS1 ('001000') was subsequently designated as the recommended
> >> >       codepoint for the Lower Effort (LE) PHB [RFC3662].
> >> >
> >> > So what is proposed can, in a network using the default mapping, result in
> >> > priority 0 being lower priority than 1. In addition, in some networks this can
> >> > also result in strange remapping that results in a different PHB for CS1
> >> > than.
> >>
> >> The intent in the draft is to reflect the default relative priority of
> >> the different priority code points in IEEE Std 802.1Q where priority 1
> >> is lower than priority 0. At a quick look, it appears to me that RFC
> >> 2474 requires that '001000' (binary) be handled as being of a priority not
> >> lower than the priority with which '000000' is handled. Yet RFC 3662,
> >> which you point to, seems to suggest using '001000' as a lower
> >> priority code point than '000000'. Given that 3662 not only does not
> >> update 2474 but is only Informational while 2474 is Standards Track, I
> >> would say that 2474 dominates and that this draft makes the best
> >> assumptions it can about default behavior...
> >>
> >> > MTU and Fragmentation
> >> > ---------------------
> >> >
> >> > I think there are two main issues here. The first one is discovery
> >> > of the actual IP path MTU between the ports. That will be needed to
> >> > prevent a lot of traffic going into MTU black holes, especially as TRILL requires
> >> > 1470-byte support, which is likely above the MTU of a lot of paths.
> >>
> >> Seems like it would depend on the environments where TRILL was used.
> >> For example, I do not think 1470 would be a problem in most Data
> >> Center or Internet Exchange Point uses. Data Centers
> >> sometimes support 9K jumbo frames and the like.
> >>
> >> In fact, it is probably bad to focus too much on 1470 -- that is a
> >> required minimum to be sure that reasonable size link state PDUs can
> >> be successfully flooded through the TRILL campus so that routing will
> >> work. However, it would commonly be the case that, for the TRILL
> >> campus to be useful in a particular case, links need to be able to
> >> carry the expected size TRILL Data packets. For example, if there were
> >> two parts of a TRILL campus connected by one or a few TRILL over IP
> >> links and the end stations in each part were assuming they could use
> >> 1500 byte Ethernet packets, then the TRILL over IP links would need to
> >> support an MTU based on 1500 + TRILL Header + IP and TRILL over IP
> >> encapsulation. And more if security was being used or there were any
> >> other reasons for additional headers/encapsulation...
> >>
> >> > Section 8.4:
> >> >
> >> >    Path MTU discovery [RFC4821] should be useful
> >> >    in determining the IP MTU between a pair of RBridge ports with IP
> >> >    connectivity.
> >> >
> >> > The issue with RFC4821 is that it has requirements on the packetization
> >> > layer. TRILL appears to have several components that are useful. However, it will
> >> > require a specification of the procedure to result in a useful tool.
> >>
> >> See below.
> >>
> >> > Section 8.4:
> >> >
> >> >    TRILL IS-IS MTU PDUs, as specified in Section 5 of [RFC6325] and in
> >> >    [RFC7177], can be used to obtain added assurance of the MTU of a
> >> >    link.
> >> >
> >> > Yes, that can confirm working MTUs that are at 1470 or above, but
> >> > appears prevented from working below 1470?
> >>
> >> While there is a minimum size for TRILL IS-IS MTU PDUs, determined by
> >> header size, it is well below 1470, probably (depending on whether
> >> security is in use, etc.) below 150 bytes.
> >>
> >> > Thus, it appears that there is a lack of mechanism here to actually get a
> >> > valid and functional MTU from TRILL in the cases where the Path MTU is below
> >> > 1470. If I am wrong, good, but I think this is an important piece for how to handle
> >> > the next main issue.
> >>
> >> How about referencing Section 3 of
> >> https://tools.ietf.org/html/draft-ietf-trill-mtu-negotiation-05
> >> which is currently in IETF Last Call? (The wording of that section is
> >> probably going to be improved based on an OPS review by Brian
> >> Carpenter.)
> >>
> >> > UDP encapsulation and IP fragments.
> >> > ----------------------------------
> >> > I see it as a big issue that UDP encapsulation is the native one, and that it
> >> > relies on IP fragmentation despite the need for reliable fragmentation.
> >> > With the setup of having to support 1470 MTU on TRILL level some packets will
> >> > be fragmented in many environments. That will lead to a lot of losses, and as
> >> > discussed below a very big problem with middleboxes. The main problem
> >> > here is that if one tries to rely on IP fragments one will have issues with packets
> >> > ending up in black holes. And different problems depending on IPv4 or
> >> > IPv6. IPv6 is likely the lesser problem assuming that one has working
> >> > PMTUD.
> >> >
> >> > There are several ways out of this.
> >> >
> >> > 1. Detect issues and use TCP encapsulation with correctly set MSS to not
> >> >    get IP fragments.
> >> > 2. Determine MTU and implement a fragmentation mechanism on top of
> >> >    UDP.
> >>
> >> So, I don't see that much problem with UDP being the general default
> >> consistent with the TRILL philosophy of defaulting to need zero or
> >> minimal configuration. The default should be to use multicast Hellos
> >> for discovery of neighbors which sure points at UDP to me. Having to
> >> traverse a NAT should be a rare case. Since, in the NAT case, you have
> >> to configure things related to the static binding and the IP
> >> address(es) of peer(s) anyway you can also configure to use a
> >> different encapsulation than UDP, such as TCP, at the same time. I
> >> don't see it as much of a problem if, by default, TRILL won't operate
> >> through a NAT. If you are using UDP and it fragments and fragments are
> >> dropped at a NAT, probably you can't exchange Hellos so you will not
> >> form an adjacency and anything on the other side of the NAT will not
> >> be visible.
> >>
> >> > Zero Checksum:
> >> > --------------
> >> >
> >> > Section 5.4:
> >> >
> >> > UDP Checksum - as specified in [RFC0768]
> >> >
> >> > Considering the fast path encapsulation desire, I am surprised to not see
> >> > any mention of use of zero checksum here. Raising the zero checksum and a
> >> > forward reference would be good I think.
> >> >
> >> > And then Section 8.5:
> >> >
> >> >    The requirements for the usage of the zero UDP Checksum in a UDP
> >> >    tunnel protocol are detailed in [RFC6936]. These requirements apply
> >> >    to the UDP based TRILL over IP encapsulations specified herein
> >> >    (native and VXLAN), which are applications of UDP tunnel.
> >> >
> >> > If you actually intended to allow zero checksum, then you actually should
> >> > document that TRILL fulfills the requirements that the applicability
> >> > statement raises. I have not analyzed how well it meets these requirements.
> >> >
> >> > Please review Section 6.2 of RFC 8086 for an example of how that can be done.
> >>
> >> OK. We'll look into it.
> >>
> >> > TCP Encapsulation issue
> >> > -----------------------
> >> >
> >> > Section 5.6:
> >> >
> >> > The TCP encapsulation appears to be missing a delimiter format allowing
> >> > each individual TRILL packet/payload to be read out of TCP's byte stream.
> >> > In other words, a normal implementation has no way of ensuring that the
> >> > TCP payload starts with the start of a new TRILL payload. Multiple small TRILL
> >> > payloads may be included in the same TCP payload, and also only parts, as
> >> > TCP is one way of dealing with TRILL packets that are larger than the
> >> > IP+Encapsulation MTU that actually will work.
> >> >
> >> > This comment is based on there appearing to be no length fields
> >> > included in the TRILL header. The most straightforward delimiter is a 2-byte length
> >> > field for the TRILL payload to be encapsulated.
> >>
> >> Right. It might also be useful to include some sort of check field, as
> >> is done in BGP, to detect if you are out of sync in parsing the TCP
> >> stream.
> >>
> >> Another point is that, while with UDP it seems fine to send packets
> >> with assorted QoS, you don't want to encourage re-ordering of TCP
> >> packets in a stream. So if TCP encapsulation is being used, you want
> >> to use the same DSCP value for the packets in a particular TCP stream.
> >> So, generally, you need to have a TCP connection per priority handling
> >> category. Mapping the 8 priority levels into a smaller number of
> >> handling categories is a normal thing to do so you certainly don't
> >> necessarily need 8 TCP connections. Adding material on this should not
> >> be too hard.
> >>
> >> > Section 5.6:
> >> >
> >> > TCP endpoint requirements. I do wonder if an application like TRILL actually
> >> > would need to discuss performance-impacting implementation choices or
> >> > limitations. For example, use of Nagle, the requirements on buffer sizes
> >> > in relation to bandwidth-delay products, as buffer memory in an RBridge will
> >> > impact performance.
> >>
> >> Well, I'm not sure how deeply this document should get into such
> >> performance issues. What about just saying something about
> >> consideration being given to tuning TCP for performance and pointing
> >> to one or a few other RFCs that talk about this?
> >>
> >> > Congestion Control
> >> > ------------------
> >> > First thanks for the effort here.
> >>
> >> You're welcome.
> >>
> >> > 8.1.2 In Other Environments
> >> >
> >> >    Where UDP based encapsulation headers are used in TRILL over IP in
> >> >    environments other than those discussed in Section 8.1.1, specific
> >> >    congestion control mechanisms are commonly needed.  However, if the
> >> >    traffic being carried by the TRILL over IP link is already congestion
> >> >    controlled and the size and volatility of the TRILL IS-IS link state
> >> >    database is limited, then specific congestion control may not be
> >> >    needed. See [RFC8085] Section 3.1.11 for further guidance.
> >> >
> >> > This is correct, however my question is if the RBridges have any way of
> >> > knowing which traffic is actually congestion controlled, considering that TRILL
> >> > provides a layer 2 abstraction. I wonder if there should be any type of white list of
> >> > the types of layer 2 payloads that can be assumed to be congestion
> >> > controlled, and thus okay to forward over IP paths? I am worried that the lack of any
> >> > recommendation to prevent traffic that is not controlled from being forwarded
> >> > can lead to congestion issues.
> >> >
> >> > The other issue I think may exist is the issue that serial unicast emulation of
> >> > broadcast/multicast creates. As this amplifies the outgoing packet rate
> >> > by a factor of how many addresses are configured for serial unicast, this can
> >> > be significant traffic expansion. Thus, I think additional considerations are
> >> > needed here, and maybe rate limiting of the amount of traffic to be
> >> > multicast.
> >>
> >> OK. We can think about those issues.
> >>
> >> > Flow and ECMP
> >> > -------------
> >> >
> >> > Section 8.3:
> >> >
> >> > For example, for TRILL
> >> >    Data, this entropy field could be based on some hash of the
> >> >    Inner.MacDA, Inner.MacSA, and Inner.VLAN or Inner.FGL.
> >> >
> >> > I would appreciate clearer references to what these fields are.
> >>
> >> In a TRILL Data packet, the payload after the TRILL Header looks like
> >> an Ethernet frame except that there is always either a VLAN tag or,
> >> alternatively, where the VLAN tag would be, a Fine Grained Label
> >> [RFC7172]. (The preceding is the view in the TRILL RFCs, but there is
> >> an equivalent and equally valid view in which all the fields through
> >> and including the VLAN or FGL tag are part of the TRILL Header.) The
> >> TRILL base protocol specification focuses on Ethernet as a link
> >> technology between TRILL switches, in which case there will be a link
> >> header including an Outer.MacDA and Outer.MacSA fields and possibly an
> >> Outer.VLAN, all before the TRILL Header. See Figure 1 and Figure 2 in
> >> RFC 7172.
> >>
> >> Some of the above could be added to the draft for clarity.
> >>
> >> > If I understand this correctly, the idea here is to look into the inner
> >> > layer 2 frames, and use the flow equivalents that exist on that level and
> >> > hash that into a value that maps the flows onto the source port range.
> >>
> >> Yes.
> >>
> >> > I think this text should include a summary of the principle and be sure to
> >> > note the important requirement that what is considered a flow in the
> >> > inner frame must not end up striped over multiple source ports, as this may
> >> > lead to reordering issues due to packets taking different paths.
> >>
> >> Well, we can add some text. But when would the relative ordering
> >> matter for two TRILL Data packets where the two inner native payloads
> >> have different values for any one or more of these three fields
> >> (Inner.MacDA, Inner.MacSA, and inner VLAN/FGL tag) ? If any of those
> >> fields are different, you are talking about different streams.
> >>
> >> > NAT and TRILL over IP:
> >> > Section 8.5:
> >> >
> >> > If one would like to use TRILL over IP through a NAT, then there are some very
> >> > important considerations that are missing. First, the need for static
> >> > binding configurations or the need for determining one's external address(es) and
> >> > being able to communicate that to the peer RBridges, and in addition to ensure
> >> > that one has keep-alives so that the NAT binding never times out.
> >>
> >> I think those are good points. There is an additional problem that
> >> TRILL Hellos detect neighbors with which they have 2-way connectivity
> >> by indicating, inside the Hellos that are sent, from what neighbors
> >> Hellos have been received on that port. If a NAT is involved, these
> >> neighbor addresses inside Hellos need to be mapped.
> >>
> >> > Next is the issue that there is almost zero chance of getting an IP/UDP
> >> > encapsulated TRILL payload through the NAT if it results in IP
> >> > fragmentation, as NATs don't defragment and refragment on the internal side, and
> >> > an IP fragment lacks the UDP port and thus can't be matched to a binding.
> >>
> >> So perhaps the recommendation should be to configure the port to use
> >> TCP if there will be fragmentation.
> >>
> >> > Also if you would like to run IP/ESP through a NAT, then you most likely need the
> >> > IP/UDP/ESP encapsulation (https://tools.ietf.org/html/rfc3948). Note
> >> > that this will restrict the MTU even further and thus ensure that the 1470
> >> > requirement cannot be fulfilled even without additional tunnels over a 1500-byte
> >> > MTU Ethernet infrastructure.
> >> >
> >> > I would note that firewalls also likely have issues with IP fragments for
> >> > the same reason: they require a significant amount of state to verify whether
> >> > they should be let through.
> >> >
> >> > In general I think you should create a configuration that has a chance to
> >> > work through most middleboxes, but I think you should require static bindings.
> >> > I think that configuration is, and don't laugh now, IP/UDP/ESP/TCP/TRILL;
> >> > otherwise you will not be able to have both security and reliable
> >> > fragmentation of TRILL packets.
> >>
> >> OK. Thanks again for this review. It has pointed out a number of
> >> problems and in thinking about those, I believe a couple of further
> >> problems have come to mind that I mentioned above. We'll work on a
> >> revised draft.
> >>
> >> Thanks,
> >> Donald
> >> ===============================
> >>  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
> >>  155 Beaver Street, Milford, MA 01757 USA
> >>  d3e3e3@gmail.com
> >>
> >> > Cheers
> >> >
> >> > Magnus Westerlund
> >>