Re: [trill] Adam Roach's Discuss on draft-ietf-trill-ecn-support-05: (with DISCUSS and COMMENT)

Adam Roach <> Tue, 20 February 2018 22:23 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id D862C12422F; Tue, 20 Feb 2018 14:23:01 -0800 (PST)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -1.888
X-Spam-Status: No, score=-1.888 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, T_RP_MATCHES_RCVD=-0.01, T_SPF_HELO_PERMERROR=0.01, T_SPF_PERMERROR=0.01, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id I8Y8y8Uyt9Il; Tue, 20 Feb 2018 14:22:58 -0800 (PST)
Received: from ( [IPv6:2001:470:d:1130::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by (Postfix) with ESMTPS id EB8071200B9; Tue, 20 Feb 2018 14:22:57 -0800 (PST)
Received: from Svantevit.local ( []) (authenticated bits=0) by (8.15.2/8.15.2) with ESMTPSA id w1KMMpxf091516 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 20 Feb 2018 16:22:54 -0600 (CST) (envelope-from
X-Authentication-Warning: Host [] claimed to be Svantevit.local
To: Donald Eastlake <>
Cc:,, The IESG <>, Susan Hares <>,
References: <> <>
From: Adam Roach <>
Message-ID: <>
Date: Tue, 20 Feb 2018 16:22:46 -0600
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.6.0
MIME-Version: 1.0
In-Reply-To: <>
Content-Type: multipart/alternative; boundary="------------F37A8D911CB121B840F4EAD6"
Content-Language: en-US
Archived-At: <>
Subject: Re: [trill] Adam Roach's Discuss on draft-ietf-trill-ecn-support-05: (with DISCUSS and COMMENT)
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: "Developing a hybrid router/bridge." <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Tue, 20 Feb 2018 22:23:02 -0000

Bob --

Thanks for taking the time to explain the history and rationale of Table 
3 to me. I'm going to clear my DISCUSS based on this explanation, since 
it seems that the technical solution in the document is correct, if a 
bit non-obvious. Some additional comments inline.

[For those of you who did not see Bob's response: it appears that my 
original DISCUSS email did not reach him,  so he replied to a reduced 
audience. I have restored the traditional IESG review distribution to 
this email. For this reason, I am eliding less of Bob's response than I 
usually would]

On 2/20/18 12:51 PM, Bob Briscoe wrote:
>>>>         On Thu, Feb 8, 2018 at 2:15 PM, Adam Roach
>>>>         < <>> wrote:
>>>>             On 2/8/18 11:53 AM, Donald Eastlake wrote:
>>>>>             Hi Adam,
>>>>>             On Wed, Feb 7, 2018 at 8:12 PM, Adam Roach
>>>>>             < <>> wrote:
>>>>>             > Adam Roach has entered the following ballot position for
>>>>>             > draft-ietf-trill-ecn-support-05: Discuss
>>>>>             >
>>>>>             > ...
>>>>>             >
>>>>>             >
>>>>>             ----------------------------------------------------------------------
>>>>>             > DISCUSS:
>>>>>             >
>>>>>             ----------------------------------------------------------------------
>>>>>             >
>>>>>             > Thanks to the authors, chairs, shepherd, and working
>>>>>             group for the effort that
>>>>>             > has been put into this document.
>>>>>             >
>>>>>             > I have concerns about some ambiguity and/or
>>>>>             self-contradiction in this
>>>>>             > specification, but I suspect these should be easy to
>>>>>             fix. In particular, the
>>>>>             > behavior defined in Table 3 doesn't seem to be
>>>>>             consistent with the behavior
>>>>>             > described in the prose.
>>>>>             >
>>>>>             > For easy reference, I've copied Table 3 here:
>>>>>             >
>>>>>             >>
>>>>>             +---------+----------------------------------------------+
>>>>>             >>       | Inner   |  Arriving TRILL 3-bit ECN
>>>>>             Codepoint Name |
>>>>>             >>       | Native
>>>>>              +---------+------------+------------+----------+
>>>>>             >>       | Header  | Not-ECT | ECT(0)     | ECT(1)    
>>>>>             | CE   |
>>>>>             >>
>>>>>             +---------+---------+------------+------------+----------+
>>>>>             >>       | Not-ECT | Not-ECT | Not-ECT(*) | Not-ECT(*)
>>>>>             |  <drop>  |
>>>>>             >>       |  ECT(0) |  ECT(0) |  ECT(0)    |  ECT(1)  
>>>>>              | CE   |
>>>>>             >>       |  ECT(1) |  ECT(1) |  ECT(1)(*) |  ECT(1)  
>>>>>              | CE   |
>>>>>             >>       |  CE   |      CE |      CE    |  CE(*) |    
>>>>>             CE |
>>>>>             >>
>>>>>             +---------+---------+------------+------------+----------+
>>>>>             >>
>>>>>             >>            Table 3. Egress ECN Behavior
>>>>>             >>
>>>>>             >>  An asterisk in the above table indicates a
>>>>>             currently unused
>>>>>             >>  combination that SHOULD be logged. In contrast to
>>>>>             [RFC6040], in TRILL
>>>>>             >>  the drop condition is the result of a valid
>>>>>             combination of events and
>>>>>             >>  need not be logged.
>>>>>             >
>>>>>             > The prose in this document indicates:
>>>>>             >
>>>>>             >  1. Ingress gateway either copies the native header
>>>>>             value to the TRILL ECN
>>>>>             >     codepoint (resulting in any of the four values
>>>>>             above) or doesn't insert
>>>>>             >     any ECN information in the TRILL header.
>>>>>             >
>>>>>             >  2. Intermediate gateways can set the CCE flag,
>>>>>             resulting in "CE" in the
>>>>>             >     table above.
>>>>>             >
>>>>>             > Based on the above, a packet arriving at an egress
>>>>>             gateway can only be in one of
>>>>>             > the following states:
>>>>>             >
>>>>>             >  A. TRILL header is Not-ECT because no TRILL node
>>>>>             inserted ECN information.
>>>>>             >
>>>>>             >  B. TRILL header value == Native header value
>>>>>             because the ingress gateway
>>>>>             >     copied it from native to TRILL.
>>>>>             >
>>>>>             >  C. TRILL header is "CE" because an intermediate
>>>>>             node indicated congestion.
>>>>>             Sort of... But note that the TRILL header ECN bit s
>>>>>             can indicate non-ECT while the CCE bit is set making
>>>>>             the overall TRILL Header "CE".
>>>>             Right. That's part of case C. My states above assume
>>>>             application of Table 2 has already taken place.
>>>>>             > If that's correct, I would think that any state
>>>>>             other than those three needs
>>>>>             > to be marked with an (*). In particular, these two
>>>>>             states fall into that
>>>>>             > classification, and seem to require an asterisk:
> [BB] These states are not tagged (*) cos they can arise with variants 
> of ECN marking behaviour referred to in Section 4 (explained beneath 
> each bullet below). By design (explained further below), a consistent 
> encap and decap behaviour is intended to support all variants of ECN, 
> because subnet ingress and egress nodes will very probably be deployed 
> independently of each other and of intermediate nodes and/or L4 endpoints.
> Nonetheless, you're right that this is completely non-obvious to a 
> reader coming to this new. So let's change the text:
>     An asterisk in the above table indicates a currently unused
>     combination that SHOULD be logged. In contrast to [RFC6040 <>], in TRILL
>     the drop condition is the result of a valid combination of events and
>     need not be logged.
>     An asterisk in the above table indicates a combination that is currently
>     unused in all variants of ECN marking (see Section 4) and therefore
>     SHOULD be logged. In contrast to [RFC6040 <>], in TRILL
>     the drop condition is the result of a valid combination of events and
>     need not be logged.

Thanks. That definitely helps; but see my comments below.

>>>>>             >
>>>>>             >   - Native==CE && TRILL==ECT(0)
> [BB] You have indeed highlighted an awkward wrinkle here - we could 
> have tagged this combination with '(*)', cos it is strictly currently 
> unused by trill. However, I'm going to argue against a '(*)'...
> My explanation starts with some history: Back in 2001 when ECN in IP 
> was first defined [RFC3168], an IP-in-IP tunnel ingress set the outer 
> to ECT(0) if ECN in the native header was anything but Not-ECT (0b00). 
> This was intended to close off a perceived covert channel in IPsec. 
> However, since then the security area agreed that this added 
> over-paranoid complexity. So, since [RFC6040] an IP-in-IP tunnel 
> ingress now just copies the native ECN to the outer, and TRILL ECN 
> follows this new simpler model.
> Altho this spec does not allow a TRILL ingress RBrIdge to exhibit the 
> old IP-in-IP legacy behaviour, I think it best not to tag this 
> combination as 'currently unused'. Because, wherever possible, we want 
> trill's ECN encap and decap  to mimic IP-in-IP (a principle explained 
> further down this email).
> A more specific reason: there is already a congestion monitoring 
> technique being proposed that fakes the legacy IP-in-IP behaviour to 
> measure congestion across a tunnel 
> (draft-ietf-tsvwg-tunnel-congestion-feedback). Before it becomes an 
> RFC, I wouldn't be surprised if that draft gets extended from IP-in-IP 
> to IP-in-TRILL and other L2 protocols as well. If that likely event 
> happened, we would have to update this trill-ecn spec to remove the 
> '(*)' again. In the mean time, there's little harm in not logging a 
> combination that is not harmful, even tho it is stricty 'currently 
> unused'.
> (sry about the triple negative!)

This is a great explanation. I understand the risks of putting 
forward-looking speculation in RFCs, but could I convince you to at 
least explain in the document that the reason for this state not being 
considered out of the ordinary is to mimic legacy IP-in-IP ECN behavior? 
Any acknowledgement of why this case is being treated the way it is 
would be sufficient to prevent implementors from deciding that the table 
is wrong.

>>>>>             >
>>>>>             >   - Native==ECT(0) && TRILL==ECT(1)
> [BB] The standards track PCN variant of ECN [RFC6660] uses ECT(1) as a 
> lower severity of congestion notification than CE. Hence the asymmetry 
> where a native ECT(0) header can have an ECT(1) outer, but not vice versa.

As with the case above, it would be good (inasmuch as it would serve to 
un-confuse implementors) to include an explanation that this case makes 
sense due to RFC 6660's handling of ECT(0) versus ECT(1). Such a comment 
would also address my concern quoted below.

>>>>>             I would defer to my co-author Bob Briscoe but it looks
>>>>>             to me like you have a good point.
>>>>             Thanks; I'll wait to hear from Bob.
>>>>>             > In addition, the behavior this table defines for Native==ECT(0) &&
>>>>>             TRILL==ECT(1)
>>>>>             > is somewhat perplexing: for this case, the value in
>>>>>             the TRILL header takes
>>>>>             > precedence; however, when Native==ECT(1) &&
>>>>>             TRILL==ECT(0) the Native header
>>>>>             > takes precedence. Or, put another way, this table
>>>>>             defines ECT(1) to always
>>>>>             > override ECT(0). I don't find any prose in here to
>>>>>             indicate why this needs to be
>>>>>             > treated differentially, so I'm left to conclude that
>>>>>             this is a typographical
>>>>>             > error. If that's not the case, please add motivating
>>>>>             text to Table 3 explaining
>>>>>             > why ECT(1) is treated differently than ECT(0) for
>>>>>             baseline ECN behavior.
>>>>>             As noted in Section 4.1, there is an ECN variation
>>>>>             where ECT(0) just indicates ECT while ECT(1) indicates
>>>>>             congestion of a lesser severity level has been
>>>>>             encountered than that indicated by CE. I believe the
>>>>>             dominance of ECT(1) over ECT(0) is designed to not
>>>>>             interfere with this variation.
>>>>             Yes, and section 4 opens with the explanation that
>>>>             "Section 3 specifies interworking between TRILL and the
>>>>             original standardized form of ECN in IP [RFC3168]."
>>>>             Beyond that, I wouldn't expect any of the text in
>>>>             non-normative section 4 or its subsections to have
>>>>             bearing on the normative table in Section 3.
> [BB]
> * PCN is standards track, hence PCN-specific combinations are not 
> tagged 'currently unused' in (normative) Section 3.
> * L4S is experimental track, but it doesn't use any different 
> combinations of ECN than standard ECN, so it doesn't alter the CU 
> asterisks in section 3.
> Section 4 is informative, only because it doesn't define the protocol 
> spec; it merely explains how the normative spec in Section 3 allows 
> for ECN variants (whether stds track or experimental).

Thanks. The answer to this that I find compelling is your explanation 
below about RFC 6040's relative ordering of ECN markings, which I think 
should be at least mentioned in passing here.

>>>>             My reading of RFC 8311 is that it contemplates a series
>>>>             of experiments beyond those currently under
>>>>             development. It may well be that the current
>>>>             experiments consider ECT(1) to have a higher severity
>>>>             than ECT(0), but that this may not make sense for
>>>>             future experiments. Even if it does, I don't see
>>>>             guidance in RFC 8311 (or any other update to RFC 3168)
>>>>             that suggests such a relationship between ECT(0) and
>>>>             ECT(1) exists.
> As above, the possibility that ECT(1) is a higher severity than ECT(0) 
> is already on the standards track (PCN [RFC6660]).
> As far as possible, we want trill subnet endpoints to follow the same 
> ECN behaviour as IP-in-IP tunnel endpoints [RFC6040]. The explicitly 
> stated philosophy of RFC6040 and [ECNencapGuide] (which is referred to 
> from the Intro to this trill spec), is for respectively all tunnel 
> endpoints and subnet endpoints to have the same mechanical, dumb ECN 
> behaviour, whatever the variant of ECN in use. Because in the 
> incrementally deployed public Internet, one cannot know what tunnels 
> and subnets might be encountered across different paths.
>>>>             On top of this (as implied by the existence of section
>>>>             4), the TRILL handling for ECN will need to vary from
>>>>             experiment to experiment. It seems reasonable that part
>>>>             of this variability would include different mapping of
>>>>             ECN bits by the egress gateway.
> Nope. By design, all variants of ECN use the same encap and decap 
> behaviours, otherwise they would be undeployable to all intents and 
> purposes. Perhaps we should make that clearer in this sentence 
> introducing Section 4:
>     The ECN wire protocol for TRILL (Section 2 
> <>) has been designed to
>     support the other known variants of ECN,
>     The ECN wire protocol for TRILL (Section 2 
> <>) and the ingress (Section 3.1)
>     and egress (Section 3.3) ECN behaviours have been designed to
>     support the other known variants of ECN,

This sounds good. It might be worth mentioning that this behavior is 
intended to also be forward-compatible with future ECN variants, as long 
as they conform to RFC 6040's notion of relative codepoint priorities.

>>>>             Basically, I see two ways this can be resolved,
>>>>             although I'm happy to hear alternatives so long as they
>>>>             end up with the ECN and TRILL documents being in a
>>>>             consistent and future-proof state:
>>>>             Approach 1: Revise Table 3 so that ECT(0) and ECT(1)
>>>>             are treated the same (i.e., pick one of "TRILL header
>>>>             wins" or "Native header wins" -- I would suggest
>>>>             "Native header wins," for maximal compatibility with
>>>>             older ECN clients but I'm not dogmatic on this point),
>>>>             and then include a normative statement that allows RFC
>>>>             8311 experiments to override this mapping as makes
>>>>             sense for their design.
>>>>             Approach 2: Leave the table as is, but add an
>>>>             explanation of why ECT(1) is given preferential
>>>>             treatment over ECT(0).
> [BB] I hope you're happy with the additional text I've suggested for 
> why Table 3 was like it was. In summary, consistency with IP-in-IP is 
> paramount wherever possible.

That's exactly what the additional text should say. :)  I don't see that 
in your current proposal.

>>>>             Add a normative dependency from this document to a new
>>>>             document that updates RFC 8311 to add a requirement
>>>>             that any experiments that treat ECT(0) differently than
>>>>             ECT(1) MUST be designed such ECT(0) always indicates a
>>>>             lower severity of congestion than ECT(1). I presume
>>>>             that this new document would need to be done in
>>>>             coordination with TSVWG.
> [BB] That's already catered for in [RFC6040] (stds track) for 
> IP-in-IP, quoting:
>     o  In all other cases where the inner supports ECN, the decapsulator
>        MUST set the outgoing ECN field to the more severe marking of the
>        outer and inner ECN fields, where the ranking of severity from
>        highest to lowest is CE, ECT(1), ECT(0), Not-ECT.  This in no way
>        precludes cases where ECT(1) and ECT(0) have the same severity;
> and in [ECNencapGuide] (intended BCP) for IP-in-any-L2, quoting:
>     4.  Congestion indications may be encoded by a severity level.  For
>         instance increasing levels of congestion might be encoded by
>         numerically increasing indications, e.g. pre-congestion
>         notification (PCN) can be encoded in each PDU at three severity
>         levels in IP or MPLS [RFC6660 <>].
>         If the arriving inner header is an ECN-PDU, where the inner and
>         outer headers carry indications of congestion of different
>         severity, the more severe indication SHOULD be forwarded in
>         preference to the less severe.
>>>>             I think Approach 1 is more straightforward, but if
>>>>             there's a feeling in the working group that we need
>>>>             default egress behavior that is forwards-compatible
>>>>             with yet-undesigned experiments, then I think Approach
>>>>             2 is what you need.
>>>>             /a
> The way ECN encap & decap has been designed already follows your 
> 'approach 2'; where the same consistent behaviour at all encaps and 
> decaps can be exploited in both backward and forward compatible ways.

Given your explanation above, I agree that such is the case. I think 
including enough of that explanation that implementors don't find 
themselves double-guessing the specification in Table 3 would ultimately 
aid in interoperability.

> Indeed, here's proof that this forward compatibility is not just 
> theoretical...
> L4S came along after all this, and required drop probability to be the 
> square of the marking probability, but only for one of the ECN 
> codepoints. We managed to contrive the marking behaviour of an 
> intermediate trill RBridge that supports L4S so that any trill egress 
> will achieve this precise squaring, even an egress without any ECN or 
> L4S logic, and no squaring logic either (see Appendix A).
> Cheers, and thx again for noticing all this.
> Bob