Re: [tcpm] Linux doesn’t implement RFC3465

Vidhi Goel <vidhi_goel@apple.com> Thu, 29 July 2021 20:19 UTC

From: Vidhi Goel <vidhi_goel@apple.com>
Message-id: <11FE4818-87E7-4FD8-8F45-E19CD9A3366A@apple.com>
Date: Thu, 29 Jul 2021 13:19:23 -0700
In-reply-to: <CADVnQykM8p-bVz_oPrje1yNh9_7_isAUL+wnQWDoY9Gs18sLPQ@mail.gmail.com>
Cc: Extensions <tcpm@ietf.org>
To: Neal Cardwell <ncardwell@google.com>, Mark Allman <mallman@icir.org>
References: <78EF3761-7CAF-459E-A4C0-57CDEAFEA8EE@apple.com> <CADVnQynkBxTdapXN0rWOuWO3KXQ2qb6x=xhB35XrMU38JkX2DQ@mail.gmail.com> <601D9D4F-A82C-475A-98CC-383C1F876C44@apple.com> <54699CC9-C8F5-4CA3-8815-F7A21AE10429@icsi.berkeley.edu> <DF5EF1C7-0940-478A-9518-62185A79A288@apple.com> <E150D881-4AB3-4AEA-BE0C-1D4B47B2C531@icir.org> <CADVnQynjE+D-OSvdOVROjT3y1cnHHWqdNQSmphLAJ+HsBTUAJQ@mail.gmail.com> <A1B50403-2405-4348-9626-025D255DEAE7@icir.org> <CADVnQykM8p-bVz_oPrje1yNh9_7_isAUL+wnQWDoY9Gs18sLPQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/NakG895cJmzW1KsiscRNqeiP5dM>
Subject: Re: [tcpm] Linux doesn’t implement RFC3465
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>

> Well, perhaps.  L=2 was designed to exactly counteract delayed ACKs.
> So, it isn't exactly a new magic number.  We could wave our hands
> and say "5 seems OK" or "10 seems OK" or whatever.  And, I am sure
> we could come up with something that folks felt was fine.  However,
> my feeling is that if we want to worry about bursts then let's worry
> about bursts in some generic way.  And, if you have some way to deal
> with bursts then L isn't needed.  And, if you don't have a way to
> deal with bursts then a conservative L seems fine.  But, perhaps
> putting the effort into a generic mechanism instead of cooking yet
> another magic number we need to periodically refresh is probably a
> better way to spend effort.
> 
> Yes, I very much agree that "putting the effort into a generic mechanism instead of cooking yet another magic number we need to periodically refresh is probably a better way to spend effort."

I agree that defining such a number doesn’t fully solve the problem, but it at least gives a recommendation to implementations that don’t do pacing. So a somewhat less restrictive value for L (5 or 10) would serve as a last resort for implementations that don’t pace.
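
For readers following along, here is a minimal sketch of the per-ACK limit under discussion, based on the slow-start rule in RFC 3465 (grow cwnd by the newly acknowledged bytes, capped at L*SMSS). The function name, the 1460-byte SMSS, and the example window sizes are assumptions made for illustration; this is not taken from any particular stack.

```python
SMSS = 1460  # assumed sender maximum segment size, in bytes


def abc_slow_start_cwnd(cwnd, bytes_acked, limit_segments=2, smss=SMSS):
    """Return the new cwnd (in bytes) after one ACK during slow start.

    RFC 3465 style increase: add the bytes newly acknowledged by this ACK,
    but never more than L * SMSS, where L is `limit_segments`.
    """
    return cwnd + min(bytes_acked, limit_segments * smss)


if __name__ == "__main__":
    stretch_ack = 10 * SMSS  # one ACK covering 10 full-sized segments

    # Initial slow start with cwnd = 10 segments, comparing L = 2, 5, 10.
    for limit in (2, 5, 10):
        new_cwnd = abc_slow_start_cwnd(10 * SMSS, stretch_ack, limit)
        print(f"L={limit}: cwnd grows to {new_cwnd // SMSS} segments")

    # Slow start after an RTO: cwnd = 1 segment, and the ACK slides the
    # window by 10 segments. With L = 1 the window becomes 2, not 11.
    after_rto = abc_slow_start_cwnd(1 * SMSS, stretch_ack, limit_segments=1)
    print(f"after RTO, L=1: cwnd grows to {after_rto // SMSS} segments")
```

With these assumed numbers, the post-RTO case matches the example quoted further down: the ACK covers 10 packets, but L=1 holds the increase to a single segment.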

Thanks,
Vidhi



> On Jul 29, 2021, at 8:19 AM, Neal Cardwell <ncardwell@google.com> wrote:
> 
> 
> 
> On Thu, Jul 29, 2021 at 10:06 AM Mark Allman <mallman@icir.org> wrote:
> 
> >>     (b) If there is no burst mitigation then we have to figure out
> >>         if L is still useful for this purpose and whether we want to
> >>         retain it.  Seems like perhaps L=2 is sensible here.  L was
> >>         never meant to be some general burst mitigator.  However,
> >>         ABC clearly *can* aggravate bursting and so perhaps it makes
> >>         sense to have it also try to limit the impact of the
> >>         aggravation (in the absence of some general mechanism).
> >
> > Even if recommending a static L value, IMHO L=2 is a bit
> > conservative.
> 
> Well, perhaps.  L=2 was designed to exactly counteract delayed ACKs.
> So, it isn't exactly a new magic number.  We could wave our hands
> and say "5 seems OK" or "10 seems OK" or whatever.  And, I am sure
> we could come up with something that folks felt was fine.  However,
> my feeling is that if we want to worry about bursts then let's worry
> about bursts in some generic way.  And, if you have some way to deal
> with bursts then L isn't needed.  And, if you don't have a way to
> deal with bursts then a conservative L seems fine.  But, perhaps
> putting the effort into a generic mechanism instead of cooking yet
> another magic number we need to periodically refresh is probably a
> better way to spend effort.
> 
> Yes, I very much agree that "putting the effort into a generic mechanism instead of cooking yet another magic number we need to periodically refresh is probably a better way to spend effort." 
> 
> >>   - During slow starts that follow RTOs there is a general
> >>     problem that just because the window slides by X bytes
> >>     doesn't say anything about the *network*, as that sliding can
> >>     happen because much of the data was likely queued for the
> >>     application on the receiver.  So, e.g., you can RTO and send
> >>     one packet and get an ACK back that slides the window 10
> >>     packets.  That doesn't mean 10 packets left.  It means one
> >>     packet left the network and nine packets are eligible to be
> >>     sent to the application.  So, it is not OK to set the cwnd to
> >>     1+10 = 11 packets in response to this ACK.  Here L should
> >>     exist and be 1.
> >
> > AFAICT this argument only applies to non-SACK connections. For
> > connections with SACK (the vast majority of connections over the
> > public Internet and in datacenters), it is quite feasible to
> > determine how many packets really left the network (and Linux TCP
> > does this; see below).
> 
> If you have an accurate way to figure out how many of the ACKed
> bytes left the network and how many were just buffered at the
> receiver then I see no problem with increasing based on byte count
> as you do in the initial slow start.
> 
> (I don't remember what the paper you cite says, but my guess is it's
> often the case that L=1 is a reasonable substitute for something
> complicated here.  But, perhaps I am running the simulation in my
> head wrong ... it has been a while, admittedly!)
> 
> > Yes, offload mechanisms are so pervasive in practice,
> 
> I am trying to build a mental model here.  How pervasive would you
> guess these are?  And, where in the network?  I have assumed that
> they are for sure pervasive in data centers and server farms, but
> not for the vast majority of Internet-connected devices.
> 
> My impression from looking at public Internet traces is that aggregation mechanisms that cause TCP ACKs for more than 2 segments are very common. I suspect that's because the majority of public Internet traffic these days has a bottleneck that is either wifi, cellular, or DOCSIS, and all of these have a shared medium with a large latency overhead for L2 MAC control of who gets to speak next. So a lot of batching happens, both in big batches of data that arrive at the client in the same L2 medium time slot, and big batches of ACKs that accumulate while the client waits (often several milliseconds, sometimes even tens of milliseconds) for its chance to send a big stretch ACK or batch of ACKs.
> 
> This brings up a related point: even if there is some ABC-style per-ACK L limit on cwnd increases, the time structure of most public Internet ACK streams is massively bursty because of these aggregation mechanisms inherent in L2 behavior on most public Internet bottlenecks (wifi, cellular, DOCSIS). So even if there is a limit L that limits the per-ACK behavior to be smooth, if there is no pacing of data segments then the data transmit time structure will still be bursty because the ACK arrivals these days are very bursty. 
> 
> best regards,
> neal