Re: [aqm] ECT(1)

John Leslie <john@jlc.net> Tue, 04 August 2015 04:08 UTC

Date: Tue, 04 Aug 2015 00:08:24 -0400
From: John Leslie <john@jlc.net>
To: Bob Briscoe <research@bobbriscoe.net>
Message-ID: <20150804040824.GS96964@verdi>
References: <ba3b6f6b4d3d453d887c451fbca412fa@hioexcmbx05-prd.hq.netapp.com> <CAA93jw5WrT0Azcew_gic5H-tJtBo62m-f4fBB0=qQp01uf3VuQ@mail.gmail.com> <8a1ed5a975d44a7bad88dc573971ded5@hioexcmbx05-prd.hq.netapp.com> <20150728145036.GK96964@verdi> <55BFF7EC.1010608@bobbriscoe.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <55BFF7EC.1010608@bobbriscoe.net>
User-Agent: Mutt/1.4.1i
Archived-At: <http://mailarchive.ietf.org/arch/msg/aqm/a56IG6slln62ihfY7qEV3AJtltk>
Cc: "Scheffenegger, Richard" <rs@netapp.com>, Dave Taht <dave.taht@gmail.com>, "aqm@ietf.org" <aqm@ietf.org>
Subject: Re: [aqm] ECT(1)
Precedence: list

Bob Briscoe <research@bobbriscoe.net> wrote:
> 
> I do not believe an IP (v4) option or a v6 extension would be necessary. 
> If ECT(1) were used that would surely be sufficient alone.

   Alas, we're facing a political question, not just a technical one.

   I think folks are ready to deprecate ECN Nonce; but I'm not
optimistic that folks are ready to embrace "Low-Latency, Low-Loss and
Scalable" (L4S) service, as introduced in draft-briscoe-aqm-dualq-coupled
(Has this draft been posted?).

   (IMHO, this work promises to be very valuable for the _many_ uses
that are latency-sensitive; but adoption is going to be a major
challenge!)

   We may indeed eventually get to where Bob is thinking today; but
I don't see a clear path to IETF-wide consensus yet. Even getting an
Experimental RFC approved which re-purposes ECT(1) strikes me as a
very significant challenge. :^(

> I believe the main criteria for an identifier for this new service are:
> 1. preferably orthogonal to Diffserv classes.

   IMHO, Diffserv classes are poison!

   There are a number of good folks pursuing a Less-than-Best-Effort
diffserv class. I wish them luck! But I'd be amazed if they succeed.

   Diffserv classes are the private preserve of a _large_ number of
network-service-providers. Best-Effort is the only one with universal
agreement what it means.

   All the others are subject to non-documented shuffling at the
boundaries between providers (and "bleaching" to Best-Effort at many
points within providers' networks).
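
   To make "bleaching" concrete, here is a minimal sketch. It assumes
the usual layout of the IPv4 TOS / IPv6 Traffic Class octet (upper six
bits DSCP, lower two bits ECN) and a hypothetical border policy that
resets only the DSCP. The function names are mine; real equipment may
behave differently (some boxes rewrite the whole octet).

```python
from typing import Tuple

# Assumed layout of the TOS / Traffic Class octet (RFC 2474, RFC 3168):
# upper six bits carry the DSCP, lower two bits carry the ECN field.
DSCP_SHIFT = 2
ECN_MASK = 0b11


def split_tos(tos: int) -> Tuple[int, int]:
    """Return (dscp, ecn) for an 8-bit TOS / Traffic Class value."""
    return tos >> DSCP_SHIFT, tos & ECN_MASK


def bleach(tos: int) -> int:
    """Hypothetical border policy: reset DSCP to Best-Effort (0), keep ECN."""
    return tos & ECN_MASK


# Example packet: DSCP 8 with ECT(1) (binary 01) in the ECN field.
tos = (8 << DSCP_SHIFT) | 0b01
print(split_tos(tos), "->", split_tos(bleach(tos)))
```

Under these assumptions the DSCP is lost at the boundary while the ECN
field survives, which is why an ECN codepoint looks more end-to-end
than any DSCP.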

> 2. preferably end-to-end in scope

   I'm really not sure how we can make L4S useful if it lacks an
End-to-End meaning. The signal must enter at the sender and mostly
survive all the way to the receiver in order that the receiver
(by whatever magic) can tell the sender about any congestion.

> 3. preferably classic (RFC3168) ECN and 'L4S' ECN would not permanently 
>    burn two codepoints, since it seems that 'L4S' could eventually
>    subsume classic ECN (a fork would not be needed, because classic
>    ECN doesn't seem to do anything that L4S cannot do).
> 

   This is "nice to have", I suppose; but it seems too optimistic
to take seriously. Deployment of L4S will take at least five years;
and nobody's crystal-ball is good enough to see beyond that.

   Furthermore, I don't see how we can _ever_ entirely eradicate the
RFC 3168 behavior of "same as drop". Moreover, L4S _can't_
eliminate packet drops; and IMHO a packet-drop in an L4S stream
must be treated _differently_ than an L4S congestion mark.
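
   A rough sketch of what "treated differently" could mean at the
sender. Everything here is an assumption for illustration: halving
stands in for the RFC 3168 "same as drop" response (Reno-style), and
the L4S response is a DCTCP-style reduction proportional to the
fraction of marked packets. A real sender would smooth that fraction
over time; this sketch does not.

```python
MIN_CWND = 1.0  # assumed floor, in packets


def on_drop(cwnd: float) -> float:
    """Packet loss: assumed Reno-style halving of the congestion window."""
    return max(MIN_CWND, cwnd / 2)


def on_classic_ce(cwnd: float) -> float:
    """Classic ECN: a CE mark gets the same response as a drop."""
    return on_drop(cwnd)


def on_l4s_marks(cwnd: float, marked_fraction: float) -> float:
    """Assumed DCTCP-style response: reduce in proportion to marking."""
    if not 0.0 <= marked_fraction <= 1.0:
        raise ValueError("marked_fraction must be within [0, 1]")
    return max(MIN_CWND, cwnd * (1 - marked_fraction / 2))


# Loss in an L4S flow still gets the drop response; marks get the
# gentler, proportional one.
print(on_drop(100.0), on_l4s_marks(100.0, 0.5))
```

With these assumed rules a lightly marked L4S flow backs off only
slightly, while a drop in the same flow still halves the window.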

> *ECT(1)*
> Seems a good identifier, but it has the following problems:
> 
> a) L4S traffic would need to be distinguished from classic ECN both
>    when unmarked (ECT0 vs ECT1) and when marked (CE vs CE???).
>    Ie. congestion experienced (CE) would have to be shared between
>    the classes.

   Actually, there are _two_ ways ECT(1) could be used:
- ECT(1) could be set to request L4S forwarding rules marking CE
  to indicate L4S congestion; or
- L4S forwarders could change ECT(1) to ECT(0) (or vice-versa?),
  to mark L4S congestion.
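
   The two options can be sketched as follows. The codepoint values
are those of RFC 3168; the function names and the "congested" flag are
hypothetical, and neither function is taken from any draft.

```python
from enum import IntEnum


class ECN(IntEnum):
    """Two-bit ECN field values as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def mark_with_ce(ecn: ECN, congested: bool) -> ECN:
    """First option: ECT(1) requests L4S rules; congestion is marked CE."""
    if ecn == ECN.ECT1 and congested:
        return ECN.CE
    return ecn


def mark_by_rewrite(ecn: ECN, congested: bool) -> ECN:
    """Second option: congestion is marked by rewriting ECT(1) to ECT(0)."""
    if ecn == ECN.ECT1 and congested:
        return ECN.ECT0
    return ecn


for mark in (mark_with_ce, mark_by_rewrite):
    print(mark.__name__, mark(ECN.ECT1, True).name)
```

In this sketch the first option leaves a CE packet that no longer
shows whether it started as ECT(0) or ECT(1); the second keeps CE
untouched, but the receiver must know the sender used only ECT(1) to
read ECT(0) as a mark.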

>    It would not be so problematic if all queues classified all CE
>    packets as the lowest latency class (L4S); CE packets from classic
>    flows would then be delivered early out of order, requiring some
>    buffering, but probably no more buffering than is already needed
>    for retransmissions, and at least they would never be late out of
>    order. See also {Note 1}.

   I'm trying to follow this...

   What exactly does Bob mean by "all queues"? Mostly we think of
queues as part of the forwarding action. But some forwarders choose
their action upon packet entry to the queue; others at packet exit.
And, AFAIR, no forwarder takes an action based upon the packet being
CE-marked when it arrives.

> b) ECT(1) is the last available ECN codepoint (for both v4 & v6).
>    Using ECT(1) for L4S and ECT(0) for Classic ECN would burn the last
>    codepoint just for migration purposes (contravening my criterion
>    #3). If we could predict that migration might one day finish, we
>    could foresee a time when ECT(0) might become available again.
>    But that's a long shot.

   This is a political problem, more than a technical one.

   We've painted ourselves into a corner, where there aren't spare
bits -- and the "spare bits" in IPv6 turn out to be unusable. (We
seem to have done this quite deliberately -- I don't understand why!)

   Nonetheless, we have a major need to mark incipient congestion, so
that we can avoid over-filling buffers at forwarding nodes. The fact
that we have only half-a-bit left to do this is the inevitable result
of our refusal to allocate enough bits in the first place (or if you
prefer, our insistence on using six bits for DSCP, defined in such a
way as to prevent end-to-end meaning of them).

   (Personally, I'd love to reclaim a few bits from DSCP; but to propose
this would label me a clueless kook, so I won't.)

   ECT(1) is there! It's allocated for ECN use. Refusing to define it
with an ECN meaning is simply irrational.

   Furthermore: there _is_ another bit! See RFC 3514. ;^)

> c) For the record, the following uses of ECT(1) have been proposed by 
>    the IETF and by researchers:
>  * receiver cheat detection (the ECN nonce [RFC3540] - experimental)
>  * ECN path testing (ECN for RTP [RFC6679] - standards track)
>  * various intermediate congestion level proposals (including PCN 
>    [RFC6660] - standards track)
>  * various fast-start proposals (in research, e.g. VCP)

   IMHO, only RFCs count as "proposals".

   RFC 3540 is ripe for deprecation, IMHO.

   RFC 6679 covers "ECN for RTP over UDP". Somehow I missed it coming
out in 2012 (though I must have been listening to the IESG telechat
where it was approved). Mea culpa!

   It's not an easy read (58 pages, heavy with RTP details)! At first
blush, I don't see what it's trying to do with ECT(1). It references
RFC 3168 for the meaning of ECT(1); it keeps separate counters for
ECT(0) and ECT(1); and it has a "random" mode (not RECOMMENDED) which
is supposed to randomize whether ECT(0) or ECT(1) is sent.

   The overall impression is that it tries to define feedback for all
possible ECN cases: thus supporting ECN Nonce use as well as all other
uses known at the time it was written.

   To deprecate ECN Nonce, we'd need to UPDATE RFC 6679 as well as
RFC 3168; but I don't see any new issues introduced by 6679 (and the
features of it are already appropriate for L4S).

> d) PCN is defined for a controlled environment, so that's not a problem. 
>    The wording of RTP-ECN does not mandate the use of ECT(1), but it is
>    not always clear that it is optional either.

   Clearly, keeping separate counters for ECT(1) and ECT(0) is required;
but sending ECT(1) vs ECT(0) is not specified within RFC 6679.
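
   As an illustration of per-codepoint counting at a receiver, here
is a minimal sketch. It is not the RFC 6679 report format; the names
are hypothetical and it only tallies the two ECN bits of each received
TOS / Traffic Class octet.

```python
from collections import Counter
from typing import Iterable

# ECN field values per RFC 3168, mapped to display names.
ECN_NAMES = {0b00: "not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def tally_ecn(tos_values: Iterable[int]) -> Counter:
    """Count received packets per ECN codepoint (hypothetical helper)."""
    counts = Counter({name: 0 for name in ECN_NAMES.values()})
    for tos in tos_values:
        counts[ECN_NAMES[tos & 0b11]] += 1
    return counts


# Five packets: one ECT(1), two ECT(0), one CE, one not-ECT.
counts = tally_ecn([0x01, 0x02, 0x02, 0x03, 0x00])
print(dict(counts))
```

Because ECT(0) and ECT(1) are counted separately, the same feedback
would serve whichever meaning ECT(1) ends up with.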

>    So I am trying to find out whether any implementations have used
>    ECT(1).

   At first blush, it would appear that the only _current_ use of ECT(1)
would be for ECN Nonce. But of course, RFC 6679 says nothing to prevent
its use for L4S.

>    Even if none of the IETF uses of ECT(1) are problematic in practice,
>    we should think very carefully before burning ECT(1) for L4S,
>    because there do appear to be new uses being proposed for it that
>    address a new potentially important class of problems: getting up to
>    speed fast.

   Some citations, please...

   (BTW, I think L4S could be _very_ helpful for "speeding up" slow-start.)

> *DSCP*
> It might be better to distinguish L4S ECN from Classic ECN by using 
> only ECT(0) and CE, but also using a distinctive DS codepoint for L4S. 
> L4S could start off local-network only (e.g. for a network operator's 
> premium services), or a global DSCP could be burned so that hosts could 
> set it without needing to be configured for the network they happen to 
> be connected to at any one time.

   I don't see L4S as useful in "local-network-only" mode.

   Granted, there _are_ many cases where the benefit of L4S would be
greatest at the first hop (DOCSIS box). But the expected "bleaching"
could be very confusing as to the meaning of CE marking that could be
generated farther along the path. There is no such thing as a condition
where _only_ the first hop can experience congestion.

> Then, assuming all Classic ECN might eventually migrate to L4S ECN,
> a DSCP would no longer be needed as well as ECT(0) to identify L4S.
> Then the ECN field alone could represent L4S end-to-end.

   This is overly optimistic.

> We all know that DSCP has the following problems:
> a) Diffserv is not orthogonal to Diffserv (obviously), so multiple DSCPs 
>    might be needed for L4S in each DS class

   That seems fatal...

> b) DS is not end-to-end
> c) few global DSCPs left, altho certainly there are more DS codepoints 
>    than ECN codepoints left.

   Network operators don't believe in "global DSCPs". They bleach anyway.

   (I would tend to support carving out part of the "Experimental"
subset of DSCPs as "must propagate if not understood" -- and possibly
in ten years there might be enough equipment out there that respected
that... but for now, it _all_ gets bleached.)

> *Summary*
> Combining ECT(0) and CE with a globally assigned DSCP solely during 
> initial deployment of L4S seems the least worst choice.

   We certainly could Experiment with that; but I'm very pessimistic.

   OTOH, Experimenting with ECT(1) seems likely to work. IMHO...

--
John Leslie <john@jlc.net>
