Re: [tcpm] Linux doesn’t implement RFC3465

Yuchung Cheng <ycheng@google.com> Thu, 29 July 2021 20:48 UTC

From: Yuchung Cheng <ycheng@google.com>
Date: Thu, 29 Jul 2021 13:47:42 -0700
Message-ID: <CAK6E8=fFWAE_NSr45i2mdh6NmYDusUFW3GYGtuo-FcL07sox9A@mail.gmail.com>
To: Vidhi Goel <vidhi_goel=40apple.com@dmarc.ietf.org>
Cc: Neal Cardwell <ncardwell@google.com>, Mark Allman <mallman@icir.org>, Extensions <tcpm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/OXIJzBPWEQ0S3NYxqBCcdTuM84M>

On Thu, Jul 29, 2021 at 1:19 PM Vidhi Goel <vidhi_goel=40apple.com@dmarc.ietf.org> wrote:

>> Well, perhaps.  L=2 was designed to exactly counteract delayed ACKs.
>> So, it isn't exactly a new magic number.  We could wave our hands
>> and say "5 seems OK" or "10 seems OK" or whatever.  And, I am sure
>> we could come up with something that folks felt was fine.  However,
>> my feeling is that if we want to worry about bursts then let's worry
>> about bursts in some generic way.  And, if you have some way to deal
>> with bursts then L isn't needed.  And, if you don't have a way to
>> deal with bursts then a conservative L seems fine.  But, perhaps
>> putting the effort into a generic mechanism instead of cooking yet
>> another magic number we need to periodically refresh is probably a
>> better way to spend effort.
>>
>
> Yes, I very much agree that "putting the effort into a generic mechanism
> instead of cooking yet another magic number we need to periodically refresh
> is probably a better way to spend effort."
>
>
> I agree that defining such a number doesn’t fully solve the problem but it
> gives some recommendation for implementations that don’t do pacing. So,
> defining a somewhat less restrictive value for L (5 or 10) would be a last
> resort for implementations that don’t pace.
>
How about recommending L=10, and also listing the rationales to follow when
deciding on a higher or lower value? It's never one-size-fits-all.
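
For readers following along, here is a minimal sketch of how the choice of L
changes the slow-start response to a stretch ACK. It assumes the RFC 3465 rule
cwnd += min(N, L*SMSS), where N is the number of newly acknowledged bytes. The
function name, the 1460-byte SMSS, and the example window are illustrative
assumptions, not taken from any particular stack.

```python
SMSS = 1460  # assumed sender maximum segment size, in bytes


def abc_slow_start_increase(cwnd: int, bytes_acked: int, limit_l: int,
                            smss: int = SMSS) -> int:
    """Return the new cwnd in bytes after one ACK during slow start.

    Applies the ABC rule cwnd += min(N, L * SMSS), where N is the number
    of previously unacknowledged bytes covered by this ACK.
    """
    return cwnd + min(bytes_acked, limit_l * smss)


if __name__ == "__main__":
    cwnd = 10 * SMSS          # hypothetical current window: 10 segments
    stretch_ack = 8 * SMSS    # one stretch ACK covering 8 full-sized segments
    for limit_l in (1, 2, 10):
        new_cwnd = abc_slow_start_increase(cwnd, stretch_ack, limit_l)
        print(f"L={limit_l:2d}: cwnd {cwnd // SMSS} -> {new_cwnd // SMSS} segments")
```

With L=2 the 8-segment stretch ACK is credited as only 2 segments, while L=10
credits all 8. That gap is the trade-off being discussed here.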

Also I believe it's time to move ABC into the standards track, in the era
of (bigger and bigger) stretch ACKs.


> Thanks,
> Vidhi
>
>
>
> On Jul 29, 2021, at 8:19 AM, Neal Cardwell <ncardwell@google.com> wrote:
>
>
>
> On Thu, Jul 29, 2021 at 10:06 AM Mark Allman <mallman@icir.org> wrote:
>
>>
>> >>     (b) If there is no burst mitigation then we have to figure out
>> >>         if L is still useful for this purpose and whether we want to
>> >>         retain it.  Seems like perhaps L=2 is sensible here.  L was
>> >>         never meant to be some general burst mitigator.  However,
>> >>         ABC clearly *can* aggravate bursting and so perhaps it makes
>> >>         sense to have it also try to limit the impact of the
>> >>         aggravation (in the absence of some general mechanism).
>> >
>> > Even if recommending a static L value, IMHO L=2 is a bit
>> > conservative.
>>
>> Well, perhaps.  L=2 was designed to exactly counteract delayed ACKs.
>> So, it isn't exactly a new magic number.  We could wave our hands
>> and say "5 seems OK" or "10 seems OK" or whatever.  And, I am sure
>> we could come up with something that folks felt was fine.  However,
>> my feeling is that if we want to worry about bursts then let's worry
>> about bursts in some generic way.  And, if you have some way to deal
>> with bursts then L isn't needed.  And, if you don't have a way to
>> deal with bursts then a conservative L seems fine.  But, perhaps
>> putting the effort into a generic mechanism instead of cooking yet
>> another magic number we need to periodically refresh is probably a
>> better way to spend effort.
>>
>
> Yes, I very much agree that "putting the effort into a generic mechanism
> instead of cooking yet another magic number we need to periodically refresh
> is probably a better way to spend effort."
>
>>
>> >>   - During slow starts that follow RTOs there is a general
>> >>     problem that just because the window slides by X bytes
>> >>     doesn't say anything about the *network*, as that sliding can
>> >>     happen because much of the data was likely queued for the
>> >>     application on the receiver.  So, e.g., you can RTO and send
>> >>     one packet and get an ACK back that slides the window 10
>> >>     packets.  That doesn't mean 10 packets left.  It means one
>> >>     packet left the network and nine packets are eligible to be
>> >>     sent to the application.  So, it is not OK to set the cwnd to
>> >>     1+10 = 11 packets in response to this ACK.  Here L should
>> >>     exist and be 1.
>> >
>> > AFAICT this argument only applies to non-SACK connections. For
>> > connections with SACK (the vast majority of connections over the
>> > public Internet and in datacenters), it is quite feasible to
>> > determine how many packets really left the network (and Linux TCP
>> > does this; see below).
>>
>> If you have an accurate way to figure out how many of the ACKed
>> bytes left the network and how many were just buffered at the
>> receiver then I see no problem with increasing based on byte count
>> as you do in the initial slow start.
>>
>> (I don't remember what the paper you cite says, but my guess is it's
>> often the case that L=1 is a reasonable substitute for something
>> complicated here.  But, perhaps I am running the simulation in my
>> head wrong ... it has been a while, admittedly!)
>>
>> > Yes, offload mechanisms are so pervasive in practice,
>>
>> I am trying to build a mental model here.  How pervasive would you
>> guess these are?  And, where in the network?  I have assumed that
>> they are for sure pervasive in data centers and server farms, but
>> not for the vast majority of Internet-connected devices.
>>
>
> From my impression looking at public Internet traces, aggregation
> mechanisms that cause TCP ACKs for more than 2 segments are very common. I
> suspect that's because the majority of public Internet traffic these days
> has a bottleneck that is either wifi, cellular, or DOCSIS, and all of these
> have a shared medium with a large latency overhead for L2 MAC control of
> who gets to speak next. So a lot of batching happens, both in big batches of
> data that arrive at the client in the same L2 medium time slot, and big
> batches of ACKs that accumulate while the client waits (often several
> milliseconds, sometimes even tens of milliseconds) for its chance to send a
> big stretch ACK or batch of ACKs.
>
> This brings up a related point: even if there is some ABC-style per-ACK L
> limit on cwnd increases, the time structure of most public Internet ACK
> streams is massively bursty because of these aggregation mechanisms
> inherent in L2 behavior on most public Internet bottlenecks (wifi,
> cellular, DOCSIS). So even if there is a limit L that limits the per-ACK
> behavior to be smooth, if there is no pacing of data segments then the data
> transmit time structure will still be bursty because the ACK arrivals these
> days are very bursty.
>
> best regards,
> neal
>
>
>
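
Neal's point quoted above is that a per-ACK limit does not smooth
transmissions when the ACKs themselves arrive in batches. The toy model below
illustrates this; it is a sketch under stated assumptions, not a model of any
real stack. Times are integer microseconds. segs_per_ack is a hypothetical
stand-in for however many segments the per-ACK rule releases, and the batch
sizes and pacing interval are made up for illustration.

```python
def ack_clocked(ack_times, segs_per_ack):
    """Unpaced sender: each ACK releases its segments at the ACK arrival time."""
    return [t for t in ack_times for _ in range(segs_per_ack)]


def paced(ack_times, segs_per_ack, interval):
    """Paced sender: same segments, sent at least `interval` microseconds apart."""
    sends, next_free = [], 0
    for t in ack_times:
        for _ in range(segs_per_ack):
            send = max(t, next_free)
            sends.append(send)
            next_free = send + interval
    return sends


def max_burst(send_times, window):
    """Largest number of sends whose times span at most `window` microseconds."""
    times = sorted(send_times)
    best, start = 0, 0
    for end, t in enumerate(times):
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


if __name__ == "__main__":
    # Assumed L2 aggregation: 10 ACKs delivered together every 10 ms.
    acks = [batch * 10_000 for batch in range(3) for _ in range(10)]
    print("unpaced:", max_burst(ack_clocked(acks, 2), 1_000))
    print("paced:  ", max_burst(paced(acks, 2, 500), 1_000))
```

In this model the unpaced sender puts 20 segments on the wire at the same
instant even though each ACK releases only 2, while the paced sender never
exceeds 3 segments in any 1 ms span.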