Re: [tsvwg] Neal Cardwell's rationale for supporting ECT(1) as an input/L4S signal

Neal Cardwell <ncardwell@google.com> Tue, 12 May 2020 21:31 UTC

From: Neal Cardwell <ncardwell@google.com>
Date: Tue, 12 May 2020 17:30:48 -0400
Message-ID: <CADVnQykBXW5Y-+on1CQpN1vg_umV3DKqE+grKS9kvVP1y9NC3g@mail.gmail.com>
To: Sebastian Moeller <moeller0@gmx.de>
Cc: tsvwg IETF list <tsvwg@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/jq8Aaw93Y29f7NRpcHMZWMj6cIY>

Hi Sebastian,

Some thoughts in-line below...

On Sat, May 9, 2020 at 6:53 AM Sebastian Moeller <moeller0@gmx.de> wrote:
>
> Hi Neal,
>
>
> > On May 8, 2020, at 17:19, Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org> wrote:
> > [...]
> >
> > - SCE seems to involve an ecosystem with a more complex and more
> >   experimental CC (with two different kinds of ECN signal) and little
> >   real-world/production experience yet. L4S seems to involve an ecosystem
> >   that provides a queue that is basically a single-threshold,
> >   shallow-threshold, DCTCP-style, ECN ecosystem, which is simpler and for
> >   which the world has a lot of accumulated academic research and
> >   real-world/production experience over the last decade.
>
> [SM] Interestingly, I take it as a considerable downside that in a decade of
> work L4S has not managed to come up with robust and reliable solutions to
> its challenges. "Too little, too late", comes to mind as much as "robust
> solution after years of diligent engineering", but which one it is is still
> an open question.
>
> One more downside of the long-winding development is that the change of
> reference protocol from DCTCP to TCP Prague basically devalues the old DCTCP
> measurements as proof of safety.
>
> My point is, it seems odd, using indirect measures like accumulated
> development time and magnitude of conducted tests as proxies for the quality
> of L4S instead of actually looking closely into the RFCs and comparing their
> claims with the existing data. I am not saying that my assessment of L4S'
> implementation not being close to its promises is the only conclusion one
> can come to, but I would hope that everybody chiming into this consensus
> question actually takes the time to look at that closely for themselves. It
> is easy to promise the sky, delivery & execution however...

Both L4S and SCE have algorithms and implementations that are works in
progress, and not set in stone at this point. Since they are works in
progress, I think it's worthwhile to focus on the core question we are
facing here, which is about the interpretation of the ECT(1) code
point. I think it's useful to distinguish what is inherent in
the interpretation of the code point from what is incidental in the
current algorithms/implementations on either side.
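
To make the code point in question concrete, here is a minimal
illustrative sketch (not code from either proposal) of the two-bit ECN
field from RFC 3168, together with a one-line summary of how each
proposal would interpret ECT(1). The names ecn_codepoint and
ECT1_INTERPRETATION are hypothetical, and the summaries are simplified.

```python
# The ECN field is the two low-order bits of the IPv4 TOS /
# IPv6 Traffic Class byte (RFC 3168).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}

# Simplified summary of the competing interpretations of ECT(1).
# Hypothetical table for illustration only.
ECT1_INTERPRETATION = {
    "L4S": "input: set by the sender to identify an L4S-capable flow",
    "SCE": "output: set by the network to signal mild congestion",
}


def ecn_codepoint(tos_byte: int) -> str:
    """Return the RFC 3168 name of the ECN codepoint in a TOS byte."""
    return ECN_CODEPOINTS[tos_byte & 0b11]


if __name__ == "__main__":
    print(ecn_codepoint(0x01))
    for proposal, meaning in ECT1_INTERPRETATION.items():
        print(proposal, "->", meaning)
```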

> > - L4S flows potentially causing unfairness in RFC3168 ECN bottlenecks has
> >   been mentioned as a potential concern. However, a robust RFC3168 ECN
> >   bottleneck should already have a mechanism to avoid unfairness caused by
> >   flows that are marked as ECT(0|1) and yet not performing RFC3168
> >   responses.
>
> [SM] That essentially declares all non-FQ AQMs to be fair game, no?

No, there are ways to deal with abusive flows that do not require fair queuing.

>  Because
>  if they wanted better isolation they could get it (at a cost). That seems at
>  odds with the extra mile L4S goes to avoid using FQ solutions even for a
>  problem that is exceptionally well suited for FQ. Because that can easily be
>  turned around, why not demand the same level of robustness from L4S instead,
>  it being the newcomer and all? Say, require L4S to monitor flow behavior and
>  make its classification based on observed behavior instead of a simple
>  assertion by the sender (ECT(1) is nothing more than that, it is at best a
>  classification on intent, while the thing that should be classified is
>  behavior.) In the context of another thread it seems clear that pure intent
>  signaling is actually expected to be abused:
...
> While I do not fully agree that every sender rightfully should try to abuse
> the network at all costs, I accept that the potential is there and solutions
> need to take this into account in their threat modeling (and IMHO L4S has not
> done so sufficiently, simply claiming without supporting evidence that ECT(1)
> can not be abused is either naively optimistic or intentionally misguided).

L4S does not claim that ECT(1) cannot be abused. Rather, it has a
fairly well-developed story for detecting and dealing with abuse of
the ECT(1) code point with queue protection algorithms. Please see:

  https://tools.ietf.org/html/draft-ietf-tsvwg-l4s-arch-04#section-8.2
  https://tools.ietf.org/html/draft-briscoe-docsis-q-protection-00
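
As a rough illustration of classifying on observed behavior rather than
on the sender's assertion, here is a toy per-flow scoring sketch. It is
NOT the algorithm specified in the drafts above: the class name, the
parameters (score_limit, decay_per_sec), and the scoring rule are all
hypothetical and chosen only to show the general shape of the idea.

```python
from collections import defaultdict


class QueueProtectionSketch:
    """Toy per-flow scoring; not the algorithm from the cited drafts."""

    def __init__(self, score_limit: float, decay_per_sec: float):
        self.score_limit = score_limit      # hypothetical threshold (bytes)
        self.decay_per_sec = decay_per_sec  # hypothetical aging rate (bytes/s)
        self.score = defaultdict(float)     # per-flow accumulated "blame"
        self.last_seen = {}                 # per-flow time of last packet

    def on_packet(self, flow_id, size_bytes, congestion_level, now):
        """Return the queue for this packet: "low_latency" or "classic".

        congestion_level is in [0, 1] and stands in for how congested the
        low-latency queue currently is (an assumption of this sketch).
        """
        # Age the flow's score by the time elapsed since its last packet.
        elapsed = now - self.last_seen.get(flow_id, now)
        self.last_seen[flow_id] = now
        aged = max(0.0, self.score[flow_id] - elapsed * self.decay_per_sec)
        # Blame the flow in proportion to bytes sent while congested.
        self.score[flow_id] = aged + size_bytes * congestion_level
        if self.score[flow_id] > self.score_limit:
            return "classic"
        return "low_latency"
```

In this sketch a flow that keeps sending while the low-latency queue is
congested accumulates score and is redirected, while a flow that backs
off sees its score age away and regains low-latency treatment.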

> > In particular, many of the large sources of known deployments of RFC3168 --
> > Linux fq_codel and cake -- are already deployed with fair queueing. In such
> > bottlenecks L4S traffic should not cause harm to other non-L4S flows.
>
> [SM] Mmmh, that requires active defenses by the existing network to
> accommodate a newcomer...

It's not perfect, but we can't let the perfect be the enemy of the
good, and need to evaluate all the trade-offs of the alternatives
holistically.

Best regards,
neal