Re: [tsvwg] path forward on L4S issue #16

"Scharf, Michael" <Michael.Scharf@hs-esslingen.de> Mon, 22 June 2020 07:22 UTC

From: "Scharf, Michael" <Michael.Scharf@hs-esslingen.de>
To: "Ruediger.Geib@telekom.de" <Ruediger.Geib@telekom.de>, "jholland=40akamai.com@dmarc.ietf.org" <jholland=40akamai.com@dmarc.ietf.org>
CC: "tsvwg@ietf.org" <tsvwg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/_0_ih0cuUPQ5d4oFciiqjs-59wU>

Regarding ALTO, I recommend having a look at RFC 7285 "Application-Layer Traffic Optimization (ALTO) Protocol" and RFC 7971 "Application-Layer Traffic Optimization (ALTO) Deployment Considerations".

A whitelist could possibly be realized by the ALTO Map Service with a new cost type.

As an alternative to a public list, ALTO also defines on-demand query services (Endpoint Cost Service or Endpoint Property Service). That solution does not require the network operator to disclose properties of the network topology. There are privacy tradeoffs between the ALTO Map Service and the ALTO Endpoint Cost Service.
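
To make the map-based variant concrete, here is a minimal sketch of how an endpoint might check an address against a network map. The map has the general shape of an RFC 7285 "network-map" (PIDs containing IP prefixes), and addresses are assigned to PIDs by longest-prefix match. Everything else is an assumption for illustration only: the PID names, the prefixes (documentation ranges), and the per-PID `L4S_CAPABLE` flag. The flag is a stand-in, since this message does not say how a new cost type would encode the whitelist.

```python
import ipaddress

# Hypothetical network map in the shape of an RFC 7285 "network-map".
# PID names and prefixes are invented for illustration.
NETWORK_MAP = {
    "pid-dualq-a": {"ipv4": ["192.0.2.0/24"], "ipv6": ["2001:db8:1::/48"]},
    "pid-classic": {"ipv4": ["198.51.100.0/24"]},
    "pid-default": {"ipv4": ["0.0.0.0/0"], "ipv6": ["::/0"]},
}

# Hypothetical per-PID flag; not defined by any ALTO RFC.
L4S_CAPABLE = {"pid-dualq-a": True, "pid-classic": False, "pid-default": False}


def pid_for(address, network_map):
    """Return the PID whose prefix is the longest match for the address."""
    addr = ipaddress.ip_address(address)
    family = "ipv4" if addr.version == 4 else "ipv6"
    best_pid, best_len = None, -1
    for pid, families in network_map.items():
        for prefix in families.get(family, []):
            net = ipaddress.ip_network(prefix)
            if addr in net and net.prefixlen > best_len:
                best_pid, best_len = pid, net.prefixlen
    return best_pid


def l4s_allowed(address):
    """True only if the address maps to a PID flagged as L4S capable."""
    return L4S_CAPABLE.get(pid_for(address, NETWORK_MAP), False)
```

With an on-demand query service, the endpoint would instead send the address to the server and never see the prefixes, which is where the privacy tradeoff mentioned above comes from.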

ALTO was developed before YANG took off in the IETF. One could possibly realize a network service similar to ALTO using RESTCONF/YANG. But then one would probably have to solve again quite a few of the problems that are already well understood and solved in ALTO.

Michael
(as co-author of RFC 7971)


> -----Original Message-----
> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of
> Ruediger.Geib@telekom.de
> Sent: Monday, June 22, 2020 8:11 AM
> To: jholland=40akamai.com@dmarc.ietf.org
> Cc: tsvwg@ietf.org
> Subject: Re: [tsvwg] path forward on L4S issue #16
> 
> Hi Jake,
> 
> thanks for your ongoing, constructive and (to me) substantial effort on ECN
> deployment.
> 
> Professionally, I'm not involved in access and application related queue- and
> congestion management. So it's hard for me to contribute. As an individual, I
> like your address proposal though. It sounds as if one could also use "alto"
> mechanisms (you mention an API, and I only have a very rough idea of what
> alto does, so I may be wrong).
> 
> Regards,
> 
> Ruediger
> 
> 
> -----Original Message-----
> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Holland, Jake
> Sent: Saturday, June 20, 2020 03:33
> To: wes@mti-systems.com; tsvwg@ietf.org
> Subject: Re: [tsvwg] path forward on L4S issue #16
> 
> Hi Wes and tsvwg,
> 
> On 6/4/20, 1:54 PM, "Wesley Eddy" <wes@mti-systems.com> wrote:
> > I think we should discuss the path forward on L4S issue #16 and what
> > people are working on, planning to do, or expecting to see in this regard.
> >
> > This is the issue on interaction with RFC 3168 ECN AQMs in the network.
> >
> > I think this is one of the more important ones in many recent
> > discussions, so would like to make sure we're agreeing on what it will
> > take to complete or what success will look like.
> >
> > The classic bottleneck detection work is a key part of this.
> ...(snip)...
> 
> I think a few other ideas have also been floated, so I'd ask to include those
> proposals as invited parts of the discussion as well.
> 
> This thread seemed like it had a lot of branches and was kind of hard to
> follow in places, so I thought I'd compile a list of the proposed paths forward
> that I've seen.  I'm not sure I got everything (please respond if anybody
> knows another suggestion that was left out).
> 
> So in hopes it's useful, here's my list (in no particular order):
> 
> 1. a robust classic bottleneck detection mechanism
> 
> 2. Changing L4S to use a 2-signal approach, using ECT(1)->ECT(0) for
>    the 1/p signal and ECT(1|0)->CE as a 1/sqrt(p) signal.
> 
> 3. a flag day to deprecate ECT(1)->CE marking by classic queues (instead
>    treating ECT(1) as NECT if no non-3168 meaning is implemented).
> 
> 4. operational considerations to recommend changing ECT(1) to NECT at
>    ingress to networks that have marking classic queues deployed
> 
> 5. operational considerations to recommend policing strategies that can
>    solve the general case of non-compliant traffic that does not respond
>    with the expected backoff to AQM congestion signaling.
> 
> I'll make another new suggestion now:
> 
> 6. An experiment-linked public whitelist of participant-registered IP
>    ranges that have a L4S compatible dualq in their reachability path at
>    the likely bottleneck, which would be checked by endpoints before
>    negotiating L4S.
> 
> 
> To add some of my own color commentary on these:
> 
> 1. a robust classic bottleneck detection mechanism
> a. AFAICT this is still the method preferred by the L4S authors.
> b. There's been some discussion about how technically feasible this
>    goal is, and also what level of "robust" is necessary.
> c. I'll respectfully suggest that we as a WG should ideally have at least
>    one solid backup plan, in case consensus on this approach proves hard
>    to reach.
> 
> 2. Changing L4S to use a 2-signal approach, using ECT(1)->ECT(0) for
>    the 1/p signal and ECT(1|0)->CE as a 1/sqrt(p) signal.
> a. a few people seemed interested in considering this, but several
>    objections were raised, chiefly (IIRC):
>    i.   late out of order packets from chained loaded dualqs, and the
>         corresponding spurious retransmissions (with a side note that
>         this problem worsens as dualq deployment increases)
>    ii.  loss of the 1/p signal with RFC 6040 tunnel decapsulation, and
>         a corresponding limited scope initially, with a long slow path
>         (including standards actions) to get to ubiquitous deployment
>    iii. discards the long-term L4S goal of reclaiming ECT(0) for other
>         purposes
>    iv.  doesn't match the desired timeline for experimental deployment
>         by those who have engaged with the L4S work.  (I'm not sure any
>         of the proposed paths forward will satisfy this objection, but
>         IIRC this one was specifically raised in response to the 2-signal
>         proposal.)
> 
> 3. a flag day to deprecate ECT(1)->CE marking by classic queues (instead
>    treating ECT(1) as NECT if no non-3168 meaning is implemented).
> a. Interestingly, the scope of this approach seems to track on the same
>    questions as the "how important is a robust classic queue detection"
>    question in #1.b, because they both depend on the current and in-flight
>    deployment footprint for CE-marking classic queues.  If robustness is
>    not very important, then it's because classic queue deployment is low,
>    so a flag day would be a low-touch event.  And if a flag day would be
>    a major lift with a lot of work for operators, then good robustness
>    would likewise be very important if not doing a flag day.
> b. However, this presumably requires an update to RFC 3168, and I'm not
>    sure whether there's a well-established process for organizing an
>    event like this.  Obviously the outreach effort is much higher than
>    if e.g. option #1 or 2 turned out to be feasible to get working well
>    enough to satisfy rough consensus.
> 
> 4. operational considerations to recommend changing ECT(1) to NECT at
>    ingress to networks that have marking classic queues deployed.
> a. Note that these considerations are for networks NOT participating in the
>    L4S experiment, so they can't easily be folded into L4S-specific
>    operational considerations.
> b. This seems to me most appropriate as an add-on or a fallback position
>    during the #3 (flag day), where classic queues can't be reconfigured to
>    treat ECT(1) as NECT.  But I left it as a separate point because it was
>    proposed independently, and maybe there's an argument that only networks
>    that experience a problem would need to do this and could do it as a
>    post-hoc fix when problems are encountered, rather than pre-arranging
>    it with a flag day and the associated proactive outreach it would need.
>    (to be clear: I am currently against that position, but I acknowledge I
>    might be in the rough when consensus is checked)
> 
> 5. operational considerations to recommend policing strategies that can
>    solve the general case of non-compliant traffic that does not respond
>    with the expected backoff to AQM congestion signaling.
> a. Arguably this should be done regardless of L4S, because it seems like
>    an underdeveloped piece of the puzzle for the general problem of active
>    queue management in network devices.  However, solving this well and
>    getting solutions widely deployed or at least available (or somehow
>    deployed in conjunction with L4S endpoint enablement) could also
>    potentially address issue 16.
> b. There are many possible strategies here, so outlining some of the
>    known ones maybe seems worthwhile.  A few examples that spring to
>    mind:
>    i.   FQ
>    ii.  PPV (http://ppv.elte.hu/), as we saw in ICCRG 104 from Szilveszter
>         Nadas, seems to have a lot of promise here
>    iii. Likewise the work trying to solve a similar problem written up in
>         "Fair Resource Sharing for Stateless-Core Packet-Switched Networks
>         With Prioritization" by Michael Menth and Nikolas Zeitler
>         (https://ieeexplore.ieee.org/document/8419697)
>    iv.  There are also some good insights along these lines in the "Rationale"
>         section of the docsis queue protection scheme doc:
> https://tools.ietf.org/html/draft-briscoe-docsis-q-protection-00#section-5
>         It says the same approach could be used in scenarios beyond dualq,
>         and I think there's some applicability to codel or pie, with or
>         without ECN.
>    v.   Perhaps some generic guidelines that capture what many of these
>         have in common--in general a policing response could be based on
>         sampled monitoring (not necessarily integrated closely with the
>         queues) that maintains stats on the top current and recent senders,
>         and blacklists or downgrades their traffic for some time in response
>         to a large enough standing queue, or overflow of an AQM.  (On the
>         grounds that someone here is non-compliant since it exceeded
>         expected operational bounds, and thus at least the highest volume
>         recent senders have presumably failed to back off appropriately.)
>    A BCP that lists these (and maybe other options) and captures a more
>    generalized version of the advice from docsis-q-protection seems likely
>    helpful here.
> c. Also worth noting: the need for some kind of isolation for non-compliant
>    senders is not inherently an ECN-related (nor L4S-related) problem--there's
>    no forced reason that a sender necessarily has to respond to loss either...
> 
> 6. An experiment-linked public whitelist of participant-registered IP
>    ranges that have a L4S compatible dualq in their reachability path at
>    the likely bottleneck, which would be checked by endpoints before
>    negotiating L4S.
> a. to flesh the idea out a bit, I'm imagining this as a web API with a
>    known URL and a database attached, which is documented and maintained
>    as part of the L4S experiment, where experiment participants who have
>    deployed and enabled dualq-capable devices register the applicable
>    IP ranges, and L4S-capable endpoints query the web API before opening
>    connections, so that they avoid negotiating L4S support except when
>    either inside a network with a registered dualq, or when connecting to
>    a remote endpoint that's inside such a network, caching the answers
>    for ~10-30 minutes (or according to http headers in the web api or
>    something)
> b. This would let innocent bystander 3168 traffic operate unmolested at
>    access bottlenecks while gaining live operational experience with L4S.
> c. If the rapid ubiquitous rollout goes as planned according to the L4S
>    project intent, once the dualq devices are far more prevalent than classic
>    ECN queues and widely available on all the bandwidth-shaping access
>    technologies and the non-experiment participants are predominantly not
>    doing any classic marking, the whitelist could be gradually retired, or
>    turned into a blacklist of L4S on IPs that are known to have problems with
>    legacy systems that haven't been upgraded (which can be discovered by
>    gradually enabling L4S on non-participant paths, perhaps with A/B tests,
>    and following up with those networks to ask afterward whether it caused
>    problems)
> d. This approach is of course operationally awkward and not a good long-term
>    solution, and also comes with potential privacy concerns as the deployment
>    grows, but would allow for forward progress on L4S without fixing the
>    underlying incompatibility problem with the CE ambiguity, so that time and
>    an ongoing outreach effort could have a chance to resolve the classic ECN
>    deployment footprint questions.  This also allows for some amount of
>    targeted experimentation with the classic queue detection work.
> 
> 
> (Of these, it's maybe interesting to note that only 1, 2, and 6 do not seem to
> require a standards or BCP action, which IIRC was originally meant to be out
> of scope for the L4S work, outside of 8311.)
> 
> 
> As far as the original "working on, planning to do, or expecting to see"
> question:
> 
> I guess I'm expecting to see at some point the results from what the L4S
> team is doing on the detection mechanisms.  But I remain not very hopeful
> they will address all the concerns that have been raised, so I'm hoping it'll be
> coupled with a credible outreach effort that seems likely to reach all the
> networks that have deployed or are in-process of deploying shared classic
> queues, at the very least.
> 
> I'm not sure about "expecting", but I'd also be very much in favor of seeing
> some sort of approval-style poll conducted (maybe with a preference
> weighting or rank ordering or something) of what the WG members think of
> the technical viability of the different proposed approaches, so that people
> can have a better idea of what others think sound like promising directions.
> 
> Outside that, I'd personally love to see further discussion on these or other
> ideas, but maybe forking the thread into different threads for the different
> proposals, to better avoid the confusion I've been feeling trying to find the
> prior links to the preceding messages on this monster (I eventually gave up).
> 
> 
> I regret that I don't currently have much to offer on the "working on" or
> "planning to do" front, being rather busy right now with a few other
> challenging problems.
> 
> But I am supportive of efforts to improve internet latency.  Especially those
> aimed at increasing the deployment (and enablement) of the currently
> available 3168 AQMs and the increased default use of 3168 ECN by more
> endpoint stacks, since that would have a useful impact on application level
> latency in TCP connections right away.
> 
> When I see good opportunities and I'm able, I'll aim to make minor
> contributions to latency-related efforts or do things like minor testing
> support when possible.  So anyone engaging in that kind of thing, please do
> keep the wg (or at least me) posted on progress and any opportunities to
> provide useful low-effort support.  I can spend a day on this kind of thing
> every once in a while (tho sadly, not necessarily every time it might be
> useful), but I can't spend weeks at a time.  That's probably true for the near
> foreseeable future.
> 
> Best regards,
> Jake
>
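
The endpoint-side check described in item 6.a of the quoted message (query before connecting, cache the answer for roughly 10-30 minutes, negotiate L4S only if either end is inside a registered range) could be sketched as follows. This is only an illustration under stated assumptions: `WhitelistCache`, `static_lookup`, and `should_negotiate_l4s` are hypothetical names, the web API itself is replaced by a local stand-in lookup, and the 600-second default TTL is simply the low end of the range suggested in the message.

```python
import ipaddress
import time


def static_lookup(ranges):
    """Stand-in for the registry query: checks a fixed list of CIDR ranges.

    A real client would query the experiment's web API here (hypothetical).
    """
    nets = [ipaddress.ip_network(r) for r in ranges]

    def lookup(address):
        addr = ipaddress.ip_address(address)
        return any(addr.version == n.version and addr in n for n in nets)

    return lookup


class WhitelistCache:
    """Caches per-address answers for ttl_seconds (assumed default: 10 min)."""

    def __init__(self, lookup, ttl_seconds=600, clock=time.monotonic):
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # address -> (answer, expiry time)

    def is_registered(self, address):
        now = self._clock()
        hit = self._entries.get(address)
        if hit is not None and hit[1] > now:
            return hit[0]  # still fresh, no new query
        answer = bool(self._lookup(address))
        self._entries[address] = (answer, now + self._ttl)
        return answer


def should_negotiate_l4s(cache, local_addr, remote_addr):
    """Negotiate L4S only if either end is inside a registered dualq range."""
    return cache.is_registered(local_addr) or cache.is_registered(remote_addr)
```

For example, `should_negotiate_l4s(WhitelistCache(static_lookup(["192.0.2.0/24"])), "203.0.113.1", "192.0.2.1")` evaluates to `True` because the remote address is in a registered range. Negative answers are cached as well, so an unregistered network is not re-queried on every connection.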