Re: [tsvwg] Neal Cardwell's rationale for supporting ECT(1) as an input/L4S signal

Neal Cardwell <ncardwell@google.com> Tue, 12 May 2020 19:47 UTC


On Fri, May 8, 2020 at 1:11 PM Jeremy Harris <jgh@wizmail.org> wrote:
>
> On 08/05/2020 16:19, Neal Cardwell wrote:
> > - SCE is basically an extension of RFC3168 ECN: senders mark their packets
> > ECT(0), and respond to CE like RFC 3168 says they should, by halving cwnd.
> > This would not work well for large sites with a substantial installed base
> > of shallow-threshold (DCTCP-style) ECN: if these sites marked their traffic
> > with ECT(0) to try to use SCE, their local switches configured for
> > shallow-threshold ECN would make CE marks at low queue occupancies, which
> would cause large RFC 3168-style reductions in cwnd, which would cause
> > underutilization. To the best of my knowledge this applies to several large
> > Internet services.
>
> Wait, that doesn't match my understanding.  In such an example site
> using SCE I would expect ECT(0) marking only at "large" queue
> occupancies (and resulting in cwnd halving when seen), but more
> prevalently ECT(1) marking at "small" queue occupancies - resulting
> in some smaller percentage (than 50) decrease in cwnd when seen.

Sorry if my scenario was not clearly expressed. In the scenario I'm
talking about here, there is a substantial installed base of
shallow-threshold (DCTCP-style) ECN switches in the datacenter in
which a server/sender sits. Those switches have hardware that only
knows how to mark with CE marks, and the ECN marking thresholds are
set to be shallow, because that's what allows nice, low latency for
DCTCP-style algorithms, so that's how they've been running for years.
(And the switch hardware is unable to mark with ECT(1) as a congestion
signal.)

If a server/sender in such a context attempts to participate in an SCE
conversation with a client/receiver on the public Internet, it is
likely that even shallow queues at the local datacenter bottlenecks
will cause the local switches to set CE marks. The receiver will
reflect those marks, and the sender will interpret them as requiring a
maximum-sized cwnd cut (50% for Reno, 30% for CUBIC). That would cause
poor throughput.

That's the basic concern for this scenario.
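
As a rough illustration of the arithmetic behind this concern, the
sketch below compares a single per-RTT window reduction under an RFC
3168-style response (multiplicative factor 0.5 for Reno, 0.7 for CUBIC,
matching the 50% and 30% cuts above) with a DCTCP-style response as
described in RFC 8257, where the cut scales with the estimated marked
fraction alpha. The function names, the starting cwnd of 100 packets,
and alpha = 0.1 are assumptions chosen only for illustration, not
measurements from any deployment.

```python
# Illustrative sketch only. Function names and example numbers are
# assumptions, not taken from any real stack or deployment.

def rfc3168_style_cut(cwnd: float, beta: float) -> float:
    """Classic response to CE: multiply cwnd by a fixed factor beta."""
    return cwnd * beta


def dctcp_style_cut(cwnd: float, alpha: float) -> float:
    """DCTCP-style response (RFC 8257): cwnd * (1 - alpha / 2),
    where alpha estimates the fraction of bytes that were CE-marked."""
    return cwnd * (1.0 - alpha / 2.0)


cwnd = 100.0   # packets; arbitrary starting window (assumption)
alpha = 0.1    # assumed marked fraction at a shallow-threshold switch

reno = rfc3168_style_cut(cwnd, beta=0.5)    # 50% cut
cubic = rfc3168_style_cut(cwnd, beta=0.7)   # 30% cut
dctcp = dctcp_style_cut(cwnd, alpha)        # cut of alpha/2

print(f"Reno after CE:  {reno:.1f} packets")
print(f"CUBIC after CE: {cubic:.1f} packets")
print(f"DCTCP after CE: {dctcp:.1f} packets")
```

With a shallow threshold that marks often, the first two senders take
the large cut on each marked round trip, while the DCTCP-style sender
gives up only a small fraction of its window. That gap is the
underutilization described above.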

best,
neal