Re: [tcpm] Linux doesn’t implement RFC3465

Neal Cardwell <> Thu, 29 July 2021 15:19 UTC

From: Neal Cardwell <>
Date: Thu, 29 Jul 2021 11:19:16 -0400
Message-ID: <>
To: Mark Allman <>
Cc: Vidhi Goel <>, Extensions <>

On Thu, Jul 29, 2021 at 10:06 AM Mark Allman <> wrote:

> >>     (b) If there is no burst mitigation then we have to figure out
> >>         if L is still useful for this purpose and whether we want to
> >>         retain it.  Seems like perhaps L=2 is sensible here.  L was
> >>         never meant to be some general burst mitigator.  However,
> >>         ABC clearly *can* aggravate bursting and so perhaps it makes
> >>         sense to have it also try to limit the impact of the
> >>         aggravation (in the absence of some general mechanism).
> >
> > Even if recommending a static L value, IMHO L=2 is a bit
> > conservative.
>
> Well, perhaps.  L=2 was designed to exactly counteract delayed ACKs.
> So, it isn't exactly a new magic number.  We could wave our hands
> and say "5 seems OK" or "10 seems OK" or whatever.  And, I am sure
> we could come up with something that folks felt was fine.  However,
> my feeling is that if we want to worry about bursts then let's worry
> about bursts in some generic way.  And, if you have some way to deal
> with bursts then L isn't needed.  And, if you don't have a way to
> deal with bursts then a conservative L seems fine.  But, perhaps
> putting the effort into a generic mechanism instead of cooking yet
> another magic number we need to periodically refresh is probably a
> better way to spend effort.

Yes, I very much agree that "putting the effort into a generic mechanism
instead of cooking yet another magic number we need to periodically refresh
is probably a better way to spend effort."
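
For readers following the thread, here is a minimal sketch of the ABC
slow-start rule under discussion, in which cwnd grows by the bytes newly
acknowledged, capped at L segments per ACK. The function name, the SMSS
value of 1460 bytes, and the example window sizes are illustrative
assumptions, not taken from any implementation.

```python
SMSS = 1460  # assumed sender maximum segment size, in bytes


def abc_slow_start_cwnd(cwnd, bytes_acked, limit_segments=2, smss=SMSS):
    """Return the new cwnd (in bytes) after one ACK during slow start.

    cwnd grows by the number of newly acknowledged bytes, but by no
    more than limit_segments * smss for a single ACK (the limit "L").
    """
    return cwnd + min(bytes_acked, limit_segments * smss)


start = 10 * SMSS

# Delayed ACK covering 2 segments: L=2 allows the full increase.
print(abc_slow_start_cwnd(start, 2 * SMSS) // SMSS)   # 12

# Stretch ACK covering 10 segments: L=2 caps the increase at 2 segments.
print(abc_slow_start_cwnd(start, 10 * SMSS) // SMSS)  # 12

# Same stretch ACK with a looser, hypothetical L=10: full increase.
print(abc_slow_start_cwnd(start, 10 * SMSS, limit_segments=10) // SMSS)  # 20
```

With L=2 the delayed ACK and the stretch ACK produce the same increase,
which is the sense in which L=2 exactly counteracts delayed ACKs and
nothing more.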

> >>   - During slow starts that follow RTOs there is a general
> >>     problem that just because the window slides by X bytes
> >>     doesn't say anything about the *network*, as that sliding can
> >>     happen because much of the data was likely queued for the
> >>     application on the receiver.  So, e.g., you can RTO and send
> >>     one packet and get an ACK back that slides the window 10
> >>     packets.  That doesn't mean 10 packets left.  It means one
> >>     packet left the network and nine packets are eligible to be
> >>     sent to the application.  So, it is not OK to set the cwnd to
> >>     1+10 = 11 packets in response to this ACK.  Here L should
> >>     exist and be 1.
> >
> > AFAICT this argument only applies to non-SACK connections. For
> > connections with SACK (the vast majority of connections over the
> > public Internet and in datacenters), it is quite feasible to
> > determine how many packets really left the network (and Linux TCP
> > does this; see below).
>
> If you have an accurate way to figure out how many of the ACKed
> bytes left the network and how many were just buffered at the
> receiver then I see no problem with increasing based on byte count
> as you do in the initial slow start.
>
> (I don't remember what the paper you cite says, but my guess is it's
> often the case that L=1 is a reasonable substitute for something
> complicated here.  But, perhaps I am running the simulation in my
> head wrong ... it has been a while, admittedly!)
>
> > Yes, offload mechanisms are so pervasive in practice,
> I am trying to build a mental model here.  How pervasive would you
> guess these are?  And, where in the network?  I have assumed that
> they are for sure pervasive in data centers and server farms, but
> not for the vast majority of Internet-connected devices.
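
The post-RTO example quoted above (one retransmission, then an ACK that
slides the window by 10 packets) can be worked through numerically. This
is only a sketch in units of packets. The `newly_delivered` parameter is
a hypothetical stand-in for a sender that can tell, for instance from
SACK information, how many packets actually just left the network. It is
not a description of how any particular stack computes that value.

```python
def cwnd_after_ack(cwnd, newly_acked, limit=None, newly_delivered=None):
    """Return cwnd (in packets) after one ACK during slow start.

    newly_acked:     packets by which the cumulative ACK advanced.
    limit:           optional per-ACK cap on the increase (the "L").
    newly_delivered: optional count of packets known to have just left
                     the network; when given, it is used instead of
                     newly_acked as the basis for the increase.
    """
    increase = newly_acked if newly_delivered is None else newly_delivered
    if limit is not None:
        increase = min(increase, limit)
    return cwnd + increase


# After the RTO, cwnd is 1 packet and the ACK slides the window by 10.
print(cwnd_after_ack(1, 10))                     # 11: counts buffered data
print(cwnd_after_ack(1, 10, limit=1))            # 2: L=1 as a safe proxy
print(cwnd_after_ack(1, 10, newly_delivered=1))  # 2: accurate accounting
```

In this scenario L=1 and accurate delivery accounting give the same
result, which matches the guess in the quoted text that L=1 is often a
reasonable substitute for something more complicated.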

From my impression looking at public Internet traces, aggregation
mechanisms that produce TCP ACKs covering more than 2 segments are very
common. I suspect that's because the majority of public Internet traffic
these days has a bottleneck that is either wifi, cellular, or DOCSIS, and
all of these have a shared medium with a large latency overhead for L2 MAC
control of who gets to speak next. So a lot of batching happens, both in
big batches of data that arrive at the client in the same L2 medium time
slot, and big batches of ACKs that accumulate while the client waits (often
several milliseconds, sometimes even tens of milliseconds) for its chance
to send a big stretch ACK or batch of ACKs.

This brings up a related point: even if there is some ABC-style per-ACK L
limit on cwnd increases, the time structure of most public Internet ACK
streams is massively bursty because of these aggregation mechanisms
inherent in L2 behavior on most public Internet bottlenecks (wifi,
cellular, DOCSIS). So even if a limit L keeps the per-ACK cwnd increase
smooth, if there is no pacing of data segments then the data transmit time
structure will still be bursty because the ACK arrivals these days are
very bursty.
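
To illustrate the point about pacing, here is a small sketch comparing a
purely ACK-clocked sender with a paced one when ACKs arrive in batches.
The ACK timestamps, the 2.5 ms pacing interval, and the simplification
that each ACK releases exactly one segment are all assumptions chosen for
the example, not measurements.

```python
from collections import Counter


def ack_clocked_sends(ack_times_us):
    """Without pacing, each ACK releases one segment immediately."""
    return list(ack_times_us)


def paced_sends(ack_times_us, interval_us):
    """With pacing, each segment waits until the pacing timer allows it."""
    sends = []
    next_allowed = 0
    for ack_time in ack_times_us:
        send_at = max(ack_time, next_allowed)
        sends.append(send_at)
        next_allowed = send_at + interval_us
    return sends


def largest_burst(send_times_us):
    """Largest number of segments sent at the same instant."""
    return max(Counter(send_times_us).values())


# Two batches of 4 ACKs, 10 ms apart (times in microseconds).
acks = [0, 0, 0, 0, 10_000, 10_000, 10_000, 10_000]

print(largest_burst(ack_clocked_sends(acks)))  # 4
print(paced_sends(acks, 2_500))
# [0, 2500, 5000, 7500, 10000, 12500, 15000, 17500]
print(largest_burst(paced_sends(acks, 2_500)))  # 1
```

A per-ACK limit such as L changes how much cwnd grows per ACK, but in
this sketch only the pacing step changes when segments leave the sender.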

best regards,