Re: [tcpm] Linux doesn’t implement RFC3465

Yuchung Cheng <ycheng@google.com> Wed, 28 July 2021 23:20 UTC

References: <78EF3761-7CAF-459E-A4C0-57CDEAFEA8EE@apple.com> <CADVnQynkBxTdapXN0rWOuWO3KXQ2qb6x=xhB35XrMU38JkX2DQ@mail.gmail.com> <601D9D4F-A82C-475A-98CC-383C1F876C44@apple.com> <54699CC9-C8F5-4CA3-8815-F7A21AE10429@icsi.berkeley.edu> <DF5EF1C7-0940-478A-9518-62185A79A288@apple.com>
In-Reply-To: <DF5EF1C7-0940-478A-9518-62185A79A288@apple.com>
From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 28 Jul 2021 16:20:14 -0700
Message-ID: <CAK6E8=fb1xioSzhj0kXPrkRz+cqkJbkC=2s643uHgiR8bERp1g@mail.gmail.com>
To: Vidhi Goel <vidhi_goel=40apple.com@dmarc.ietf.org>
Cc: Mark Allman <mallman@icir.org>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>, Neal Cardwell <ncardwell@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/rs5AM2LAyEr5nmMOswlfPQQJuog>

Thank you Vidhi and Mark for supporting an ABC update.

To give this a start, my recommended update is:

1) if the sender uses some form of pacing to send data packets, L is
RECOMMENDED to be the effective window's worth of packets. Pacing here
refers to spreading packet transmissions at a rate derived from the
congestion window and the round-trip time. Additionally, if SACK is
supported, the same L applies in slow start after an RTO.

2) otherwise, L is RECOMMENDED to be 8

Thoughts?
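
To make the two cases above concrete, here is a minimal Python sketch of how L might be chosen and applied to a single ACK in slow start. The function names are hypothetical. It assumes that "effective window" means the current congestion window in bytes and that "8" means 8 full-sized segments; neither is stated in the proposal. The post-RTO/SACK case is not modeled.

```python
def abc_limit(paced: bool, cwnd_bytes: int, smss: int) -> int:
    """Return the per-ACK increase limit L, in bytes.

    Assumptions (not from the proposal text): "effective window" is taken
    to be the current cwnd, and "8" is taken to mean 8 full-sized segments.
    """
    return cwnd_bytes if paced else 8 * smss


def slow_start_increase(bytes_acked: int, paced: bool,
                        cwnd_bytes: int, smss: int) -> int:
    """ABC-style slow start: grow cwnd by the bytes ACKed, capped at L."""
    return min(bytes_acked, abc_limit(paced, cwnd_bytes, smss))


SMSS = 1460
cwnd = 20 * SMSS
stretch_ack = 16 * SMSS  # one ACK covering 16 segments, e.g. after GRO

paced_increase = slow_start_increase(stretch_ack, True, cwnd, SMSS)
unpaced_increase = slow_start_increase(stretch_ack, False, cwnd, SMSS)
```

Under these assumptions the paced sender is credited for all 16 segments, while the unpaced sender's increase is capped at 8 segments.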


On Wed, Jul 28, 2021 at 2:13 PM Vidhi Goel <vidhi_goel=40apple.com@dmarc.ietf.org> wrote:

> Resurrecting the 3465 thread.
>
> In the TCPM meeting at IETF 111, we discussed the L=2 limit, which is a
> MUST in RFC 3465. This is a very strict requirement, and stacks like
> Linux already don’t follow it.
>
> The basic principle in the Linux code is that the sender should use the
> ACKs to learn about the capacity of the path (both in volumetric and rate
> dimensions), and should not ignore that information. This allows the sender
> to quickly grow and achieve high throughput, even in the presence of
> stretch ACKs, which are pervasive, due to TSO/GSO, GRO/LRO, etc.
>
> Considering bursts is important, but that can be tackled as an orthogonal
> issue. Bursts are avoided in the Linux TCP ecosystem by the combination of
> TSO autosizing, pacing, TSQ, and fair queueing.
>
>
> As Neal described, the congestion controller should use the information in
> stretch ACKs to increase its congestion window so that it correctly adjusts
> the *cwnd* based on available link capacity.
> Burstiness is an orthogonal issue which can be solved by pacing.
>
> QUIC loss recovery (RFC 9002) also follows this approach.
>
> *Mark*,
> As a lot of transport and congestion control drafts reference RFC 3465, do
> you think we should update this RFC to reflect the current deployment? This
> would also be useful for someone who is just starting with a new
> implementation.
>
> Thanks,
> Vidhi
>
> On Nov 27, 2019, at 5:14 AM, Mark Allman <mallman@icir.org> wrote:
>
>
> +Mark Allman
>
>
> Just to clear it up, I *was* at BBN long ago when the ABC document
> was written.  It's a cool place to work.  I recommend it.  But, I
> now hang out at ICSI.
>
> I believe that ABC was written to fix ACK counting, which misbehaving
> receivers can exploit, by counting the number of bytes acknowledged
> instead. Limiting the increase to 2*MSS was a good solution to avoid
> bursts at the time.
>
>
> The main motivation behind ABC was to counteract delayed ACKs.  The
> common approach at the time was to just bump cwnd by one MSS every
> time an ACK rolled in.  So, if an ACK covered two segments because
> the receiver was delaying the ACKs then the growth rate during slow
> start was 1.5x per RTT instead of the 2x that was really
> envisioned.  During congestion avoidance the growth was 1 MSS every
> 2 RTTs instead of the envisioned every RTT.
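
The slow-start growth rates described above can be reproduced with a small Python sketch. It is a simplified model, not any stack's actual code: cwnd is counted in whole segments, the receiver ACKs every second segment (a leftover odd segment gets its own ACK), and nothing is lost. The function name and the starting window of 4 segments are assumptions made for illustration.

```python
def slow_start_round(cwnd: int, byte_counting: bool, limit: int = 2) -> int:
    """Advance cwnd (in segments) by one RTT of slow start.

    The receiver delays ACKs, so each ACK covers two segments; an odd
    leftover segment is ACKed on its own.
    """
    full_acks, leftover = divmod(cwnd, 2)
    acks = [2] * full_acks + ([1] if leftover else [])
    for covered in acks:
        if byte_counting:
            cwnd += min(covered, limit)  # ABC: credit segments covered, up to L
        else:
            cwnd += 1                    # ACK counting: one segment per ACK
    return cwnd


ack_counting = [4]
abc = [4]
for _ in range(3):
    ack_counting.append(slow_start_round(ack_counting[-1], byte_counting=False))
    abc.append(slow_start_round(abc[-1], byte_counting=True))
```

In this model the ACK-counting window grows 4, 6, 9, 14 (about 1.5x per RTT), while the byte-counting window with L=2 grows 4, 8, 16, 32.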
>
> A secondary motivation was to counteract these ACK division attacks
> that Savage taught us about.  I.e., we could ACK an MSS-sized packet
> one byte at a time and the sender would then increase the cwnd by
> MSS*MSS bytes in the prevalent ACK counting scheme (i.e., cwnd would
> get bumped by MSS bytes for every ACK).
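
The arithmetic behind the ACK division case can be written out as a short Python sketch. The function name is hypothetical and the model is deliberately minimal: one MSS-sized segment is acknowledged one byte at a time, and the total cwnd growth is summed under each scheme.

```python
def growth_from_divided_acks(mss: int, byte_counting: bool) -> int:
    """Total cwnd growth (bytes) when one segment is ACKed byte by byte."""
    growth = 0
    for _ in range(mss):  # mss separate ACKs, each covering a single byte
        if byte_counting:
            growth += 1    # credit only the byte actually acknowledged
        else:
            growth += mss  # ACK counting: a full MSS for every ACK
    return growth


MSS = 1460
ack_counting_growth = growth_from_divided_acks(MSS, byte_counting=False)
byte_counting_growth = growth_from_divided_acks(MSS, byte_counting=True)
```

With ACK counting the growth is MSS*MSS bytes; with byte counting it is MSS bytes, the same as a single honest ACK for the whole segment.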
>
> The limit has two roots ...
>
> (1) The limit is important in slow starts that follow an RTO.  As
>    the RFC discusses, in this case we might retransmit a single
>    packet and this will cause the receiver's window to slide a
>    great deal.  Therefore, an ACK may indicate that a ton of data
>    has left the network, but that isn't really the case.  So, we
>    don't want to increase the cwnd based on all the new bytes
>    ACKed.
>
>    I have since mostly decided that this use of L is crude.
>    Probably there is a more elegant way to do it by using the
>    scoreboard and the SACK information to get a better
>    understanding of what left the network and when.  That said, in
>    this case L is simple and probably about right most of the
>    time.
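
The post-RTO situation in (1) can be illustrated with a small Python sketch. It is a toy model with hypothetical names: segments are numbered from 0, segment 0 was lost, segments 1 through 9 are already sitting in the receiver's buffer, and a single retransmission of segment 0 makes the cumulative ACK jump. The segment size of 1460 bytes is an assumption.

```python
SMSS = 1460


def next_expected(received: set) -> int:
    """Cumulative ACK point: the lowest segment number not yet received."""
    n = 0
    while n in received:
        n += 1
    return n


def limited_increase(bytes_acked: int, limit_segments: int,
                     smss: int = SMSS) -> int:
    """cwnd increase for one ACK, capped at L = limit_segments * smss."""
    return min(bytes_acked, limit_segments * smss)


received = set(range(1, 10))          # segments 1..9 arrived earlier
ack_before = next_expected(received)  # still waiting for segment 0
received.add(0)                       # the single RTO retransmission arrives
ack_after = next_expected(received)   # window slides past everything buffered

bytes_acked = (ack_after - ack_before) * SMSS
```

Here one retransmitted packet produces an ACK for ten segments' worth of bytes, even though only one segment just left the network. `limited_increase(bytes_acked, 1)` credits a single segment, whereas crediting `bytes_acked` in full would grow cwnd by ten.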
>
> (2) I think there was some general conservativeness about bursts, and
>    using L everywhere quelled some of the worry.  Here L=2 was used
>    to exactly offset delayed ACKs.
>
> I agree that increasing the congestion window and controlling the
> burst rate are orthogonal issues.
>
>
> Yes.  In fact, we did subsequent research on mitigating bursts
> because we never really thought of ABC as somehow the way to control
> bursts (research papers available, but never went into RFCs).
>
> And, in a world that leverages stretch ACKs as a routine I think
> Linux's approach of not using an L may well be correct.  Documenting
> that and the reasoning behind it in modern networks seems useful to
> me.
>
> allman