Re: [tcpm] Linux doesn’t implement RFC3465

A GitHub repo sounds good to me.

On Fri, Jul 30, 2021 at 1:26 PM Vidhi Goel <vidhi_goel=
40apple.com@dmarc.ietf.org> wrote:

> Hi Mark, Yuchung, Neal,
>
> Given that we have some ideas and suggested text for improving RFC 3465, how do
> you think we should proceed?
> Mark, do you want to start a GitHub repo or similar with the changes
> suggested so far, so that others can review and contribute to the
> bis draft?
>
> Thanks,
> Vidhi
>
> On Jul 30, 2021, at 11:04 AM, Yuchung Cheng <
> ycheng=40google.com@dmarc.ietf.org> wrote:
>
>
>
> On Thu, Jul 29, 2021 at 6:03 PM Vidhi Goel <vidhi_goel=
> 40apple.com@dmarc.ietf.org> wrote:
>
>>>> Well, perhaps.  L=2 was designed to exactly counteract delayed ACKs.
>>>> So, it isn't exactly a new magic number.  We could wave our hands
>>>> and say "5 seems OK" or "10 seems OK" or whatever.  And, I am sure
>>>> we could come up with something that folks felt was fine.  However,
>>>> my feeling is that if we want to worry about bursts then let's worry
>>>> about bursts in some generic way.  And, if you have some way to deal
>>>> with bursts then L isn't needed.  And, if you don't have a way to
>>>> deal with bursts then a conservative L seems fine.  But, perhaps
>>>> putting the effort into a generic mechanism instead of cooking yet
>>>> another magic number we need to periodically refresh is probably a
>>>> better way to spend effort.
>>>>
>>>
>>> Yes, I very much agree that "putting the effort into a generic mechanism
>>> instead of cooking yet another magic number we need to periodically refresh
>>> is probably a better way to spend effort."
>>>
>>>
>>> I agree that defining such a number doesn’t fully solve the problem but
>>> it gives some recommendation for implementations that don’t do pacing. So,
>>> defining a somewhat less restrictive value for L (5 or 10) would be a last
>>> resort for implementations that don’t pace.
>>>
>> How about recommending 10, and also giving the rationale to follow
>> when choosing a higher or lower value? It's never one-size-fits-all.
>>
>>
>> That sounds great. Something along the lines of:
>>
>>  "This document RECOMMENDS using mechanisms like pacing to control how
>> many bytes are sent to the network at any point in time. But if it is not
>> possible to implement pacing, an implementation MAY implicitly pace its
>> traffic by applying a limit L to the increase in congestion window per ACK
>> during slow start. In modern stacks, acknowledgments are aggregated for
>> various reasons, such as CPU optimization and reducing network load. Hence
>> it is common for a sender to receive an aggregated ACK that acknowledges
>> more than 2 segments. For example, a stack that implements GRO could
>> aggregate packets up to 64 Kbytes, or ~44 segments, before passing them on
>> to the TCP layer, and this would result in a single ACK being generated by
>> the TCP stack. Given that an initial window of 10 packets has been working
>> fine in current deployments, the draft recommends setting L=10 during slow
>> start. This would mean that with every ACK, we are probing for new
>> capacity by sending up to 10 packets in addition to the previously
>> discovered capacity. Implementations MAY choose to set a lower limit if
>> they believe an increase of 10 is too aggressive."
>>
>> Does this sound like what we would like to say?
>>
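To make the effect of the proposed limit concrete, here is a minimal sketch of the per-ACK slow-start increase, min(bytes ACKed, L*SMSS). The function name and the SMSS of 1460 bytes are assumptions for illustration only; the 44-segment stretch ACK is taken from the GRO example in the text above.

```python
SMSS = 1460  # assumed sender maximum segment size, in bytes


def slow_start_cwnd(cwnd_bytes, bytes_acked, limit_segments, smss=SMSS):
    """Return cwnd after one ACK in slow start, capping the increase at L*SMSS."""
    return cwnd_bytes + min(bytes_acked, limit_segments * smss)


cwnd = 10 * SMSS          # start from an initial window of 10 segments
stretch_ack = 44 * SMSS   # a single ACK covering ~44 segments

for limit in (2, 10):
    new_cwnd = slow_start_cwnd(cwnd, stretch_ack, limit)
    print(f"L={limit}: cwnd grows from {cwnd // SMSS} to {new_cwnd // SMSS} segments")
```

With L=2 the stretch ACK opens the window by only 2 segments, while L=10 allows 10; an ACK covering fewer bytes than the limit is counted in full either way.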
> Thanks for taking a shot. I would add more description of pacing to ensure
> better implementations. How about:
> "Pacing here refers to spreading packet transmissions at a rate based
> on the congestion window and round-trip time." with a citation of
> https://datatracker.ietf.org/doc/html/rfc7661#section-4.4.2
>
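As a rough illustration of that definition, the sketch below derives a pacing rate from cwnd and a smoothed RTT and turns it into a gap between full-sized segments. It assumes the simplest form, rate = cwnd / RTT, with no scaling factor; the function name, the 1460-byte segment size, and the example numbers are hypothetical.

```python
def pacing_interval(cwnd_bytes, srtt_seconds, smss=1460):
    """Return (rate in bytes/s, gap in seconds between full-sized segments)."""
    if cwnd_bytes <= 0 or srtt_seconds <= 0:
        raise ValueError("cwnd and RTT must be positive")
    rate = cwnd_bytes / srtt_seconds  # spread one cwnd of data over one RTT
    return rate, smss / rate


# Example: a cwnd of 20 segments and a 40 ms smoothed RTT.
rate, gap = pacing_interval(cwnd_bytes=20 * 1460, srtt_seconds=0.040)
print(f"rate = {rate:.0f} bytes/s, gap = {gap * 1000:.1f} ms per segment")
```

Real stacks may scale this rate (for example to leave headroom during slow start); that choice is outside what this sketch shows.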
>
>
> I would also refer to the IW RFC (RFC 6928), in case the initial window
> gets increased or updated a few years later.
> Hmm, maybe we should also move RFC 6928 to the standards track :-)
>
>
>> -
>> Vidhi
>>
>> On Jul 29, 2021, at 1:47 PM, Yuchung Cheng <
>> ycheng=40google.com@dmarc.ietf.org> wrote:
>>
>>
>>
>> On Thu, Jul 29, 2021 at 1:19 PM Vidhi Goel <vidhi_goel=
>> 40apple.com@dmarc.ietf.org> wrote:
>>
>>>> Well, perhaps.  L=2 was designed to exactly counteract delayed ACKs.
>>>> So, it isn't exactly a new magic number.  We could wave our hands
>>>> and say "5 seems OK" or "10 seems OK" or whatever.  And, I am sure
>>>> we could come up with something that folks felt was fine.  However,
>>>> my feeling is that if we want to worry about bursts then let's worry
>>>> about bursts in some generic way.  And, if you have some way to deal
>>>> with bursts then L isn't needed.  And, if you don't have a way to
>>>> deal with bursts then a conservative L seems fine.  But, perhaps
>>>> putting the effort into a generic mechanism instead of cooking yet
>>>> another magic number we need to periodically refresh is probably a
>>>> better way to spend effort.
>>>>
>>>
>>> Yes, I very much agree that "putting the effort into a generic mechanism
>>> instead of cooking yet another magic number we need to periodically refresh
>>> is probably a better way to spend effort."
>>>
>>>
>>> I agree that defining such a number doesn’t fully solve the problem but
>>> it gives some recommendation for implementations that don’t do pacing. So,
>>> defining a somewhat less restrictive value for L (5 or 10) would be a last
>>> resort for implementations that don’t pace.
>>>
>> How about recommending 10, and also giving the rationale to follow
>> when choosing a higher or lower value? It's never one-size-fits-all.
>>
>> Also I believe it's time to move ABC into the standards track, in the era
>> of (bigger and bigger) stretch ACKs.
>>
>>
>>> Thanks,
>>> Vidhi
>>>
>>>
>>>
>>> On Jul 29, 2021, at 8:19 AM, Neal Cardwell <ncardwell@google.com> wrote:
>>>
>>>
>>>
>>> On Thu, Jul 29, 2021 at 10:06 AM Mark Allman <mallman@icir.org> wrote:
>>>
>>>>
>>>> >>     (b) If there is no burst mitigation then we have to figure out
>>>> >>         if L is still useful for this purpose and whether we want to
>>>> >>         retain it.  Seems like perhaps L=2 is sensible here.  L was
>>>> >>         never meant to be some general burst mitigator.  However,
>>>> >>         ABC clearly *can* aggravate bursting and so perhaps it makes
>>>> >>         sense to have it also try to limit the impact of the
>>>> >>         aggravation (in the absence of some general mechanism).
>>>> >
>>>> > Even if recommending a static L value, IMHO L=2 is a bit
>>>> > conservative.
>>>>
>>>> Well, perhaps.  L=2 was designed to exactly counteract delayed ACKs.
>>>> So, it isn't exactly a new magic number.  We could wave our hands
>>>> and say "5 seems OK" or "10 seems OK" or whatever.  And, I am sure
>>>> we could come up with something that folks felt was fine.  However,
>>>> my feeling is that if we want to worry about bursts then let's worry
>>>> about bursts in some generic way.  And, if you have some way to deal
>>>> with bursts then L isn't needed.  And, if you don't have a way to
>>>> deal with bursts then a conservative L seems fine.  But, perhaps
>>>> putting the effort into a generic mechanism instead of cooking yet
>>>> another magic number we need to periodically refresh is probably a
>>>> better way to spend effort.
>>>>
>>>
>>> Yes, I very much agree that "putting the effort into a generic mechanism
>>> instead of cooking yet another magic number we need to periodically refresh
>>> is probably a better way to spend effort."
>>>
>>>>
>>>> >>   - During slow starts that follow RTOs there is a general
>>>> >>     problem that just because the window slides by X bytes
>>>> >>     doesn't say anything about the *network*, as that sliding can
>>>> >>     happen because much of the data was likely queued for the
>>>> >>     application on the receiver.  So, e.g., you can RTO and send
>>>> >>     one packet and get an ACK back that slides the window 10
>>>> >>     packets.  That doesn't mean 10 packets left.  It means one
>>>> >>     packet left the network and nine packets are eligible to be
>>>> >>     sent to the application.  So, it is not OK to set the cwnd to
>>>> >>     1+10 = 11 packets in response to this ACK.  Here L should
>>>> >>     exist and be 1.
>>>> >
>>>> > AFAICT this argument only applies to non-SACK connections. For
>>>> > connections with SACK (the vast majority of connections over the
>>>> > public Internet and in datacenters), it is quite feasible to
>>>> > determine how many packets really left the network (and Linux TCP
>>>> > does this; see below).
>>>>
>>>> If you have an accurate way to figure out how many of the ACKed
>>>> bytes left the network and how many were just buffered at the
>>>> receiver then I see no problem with increasing based on byte count
>>>> as you do in the initial slow start.
>>>>
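The post-RTO example in this exchange can be worked through in a few lines. This is only a sketch of the arithmetic, counting in whole segments: the function name is hypothetical, and the "ideal" value assumes the sender somehow knows that exactly one packet left the network, as in the example.

```python
def cwnd_after_ack(cwnd_segments, segments_acked, limit_segments=None):
    """Grow cwnd by the ACKed segments, optionally capped at L segments."""
    if limit_segments is None:
        increase = segments_acked
    else:
        increase = min(segments_acked, limit_segments)
    return cwnd_segments + increase


cwnd = 1          # after the RTO, one packet is retransmitted
acked = 10        # the returning ACK slides the window by 10 packets
left_network = 1  # but only the retransmission actually left the network

uncapped = cwnd_after_ack(cwnd, acked)                 # plain byte counting
capped = cwnd_after_ack(cwnd, acked, limit_segments=1)  # L = 1
ideal = cwnd + left_network                            # accurate accounting

print(uncapped, capped, ideal)
```

In this case L=1 gives the same cwnd as accurate accounting would, while uncapped byte counting jumps to 11.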
>>>> (I don't remember what the paper you cite says, but my guess is it's
>>>> often the case that L=1 is a reasonable substitute for something
>>>> complicated here.  But, perhaps I am running the simulation in my
>>>> head wrong ... it has been a while, admittedly!)
>>>>
>>>> > Yes, offload mechanisms are so pervasive in practice,
>>>>
>>>> I am trying to build a mental model here.  How pervasive would you
>>>> guess these are?  And, where in the network?  I have assumed that
>>>> they are for sure pervasive in data centers and server farms, but
>>>> not for the vast majority of Internet-connected devices.
>>>>
>>>
>>> From my impression looking at public Internet traces, aggregation
>>> mechanisms that cause TCP ACKs for more than 2 segments are very common. I
>>> suspect that's because the majority of public Internet traffic these days
>>> has a bottleneck that is either wifi, cellular, or DOCSIS, and all of these
>>> have a shared medium with a large latency overhead for L2 MAC control of
>>> who gets to speak next. So a lot of batching happens, both in big batches of
>>> data that arrive at the client in the same L2 medium time slot, and big
>>> batches of ACKs that accumulate while the client waits (often several
>>> milliseconds, sometimes even tens of milliseconds) for its chance to send a
>>> big stretch ACK or batch of ACKs.
>>>
>>> This brings up a related point: even if there is some ABC-style per-ACK
>>> L limit on cwnd increases, the time structure of most public Internet ACK
>>> streams is massively bursty because of these aggregation mechanisms
>>> inherent in L2 behavior on most public Internet bottlenecks (wifi,
>>> cellular, DOCSIS). So even if there is a limit L that limits the per-ACK
>>> behavior to be smooth, if there is no pacing of data segments then the data
>>> transmit time structure will still be bursty because the ACK arrivals these
>>> days are very bursty.
>>>
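A toy model of the point above: even when each ACK releases only a small, fixed number of segments, ACKs that arrive in batches produce batched transmissions unless the sender paces. Everything here is an assumption chosen for illustration (times in microseconds, 5 ACKs per L2 batch, 2 segments released per ACK, a 1 ms pacing gap); it is not a model of any real stack.

```python
from collections import Counter


def send_times(ack_times_us, segments_per_ack, pacing_gap_us=None):
    """Return the transmit time (in microseconds) of each released segment.

    Without pacing, segments leave the instant their ACK arrives.
    With pacing, consecutive segments are at least pacing_gap_us apart.
    """
    times = []
    next_allowed = 0
    for ack_time in ack_times_us:
        for _ in range(segments_per_ack):
            if pacing_gap_us is None:
                t = ack_time
            else:
                t = max(ack_time, next_allowed)
                next_allowed = t + pacing_gap_us
            times.append(t)
    return times


def largest_burst(times):
    """Largest number of segments transmitted at the same instant."""
    return max(Counter(times).values())


# Two L2 time slots, 10 ms apart, each delivering a batch of 5 ACKs at once.
acks = [0] * 5 + [10_000] * 5

unpaced = send_times(acks, segments_per_ack=2)
paced = send_times(acks, segments_per_ack=2, pacing_gap_us=1_000)

print("largest burst without pacing:", largest_burst(unpaced))
print("largest burst with pacing:   ", largest_burst(paced))
```

The per-ACK limit bounds how much each ACK releases, but only the pacing gap changes how the resulting segments are spread in time.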
>>> best regards,
>>> neal
>>>
>>>
>>> _______________________________________________
>>> tcpm mailing list
>>> tcpm@ietf.org
>>> https://www.ietf.org/mailman/listinfo/tcpm
>>
>>
>