Re: [iccrg] draft-briscoe-iccrg-prague-congestion-control: CE-marked bytes or packets?

Hi Sebastian,

Thanks for raising that. I'm afraid I won't have time to read and
comment on the draft for ECN encapsulation
(https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-encap-guidelines-17).

best regards,
neal

On Thu, Aug 11, 2022 at 1:36 PM Sebastian Moeller <moeller0@gmx.de> wrote:
>
> Dear Neal,
>
> Over in the tsvwg we have a draft for ECN encapsulation (https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-encap-guidelines-17) where section 4.6 touches on the same issue (especially goal #2, whose example implementation aims to propagate CE marks in a way that roughly conserves the number of marked bytes*). It would be great if you could have a look at that section and perhaps post comments on the tsvwg list?
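For illustration only, a generic credit-based way to roughly conserve marked bytes when reframing might look like the Python sketch below. It is a hypothetical example (the class `CeByteCredit` and its methods are invented here), not the method in section 4.6 of the draft, and it assumes all incoming frames are seen before the outgoing packets that carry their bytes.

```python
class CeByteCredit:
    """Hypothetical CE propagation across reframing that roughly conserves marked bytes.

    Bytes of CE-marked incoming frames accumulate as credit; an outgoing
    packet is CE-marked only when the credit covers its whole size.
    """

    def __init__(self) -> None:
        self.credit = 0  # marked bytes received but not yet re-marked

    def on_incoming(self, size: int, ce: bool) -> None:
        if ce:
            self.credit += size

    def on_outgoing(self, size: int) -> bool:
        """Return True if the outgoing packet of `size` bytes should be CE-marked."""
        if self.credit >= size:
            self.credit -= size
            return True
        return False


# Four 500-byte frames, two CE-marked (1000 marked bytes in total),
# reframed into two 1000-byte packets.
c = CeByteCredit()
for ce in (True, False, True, False):
    c.on_incoming(500, ce)
marks = [c.on_outgoing(1000) for _ in range(2)]
print(marks)  # [True, False]
```

Leftover credit smaller than an outgoing packet is simply carried forward here, which is one of the details a complete specification would have to pin down.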
>
> Thanks in advance
>         Sebastian
>
> *) The proposed method is clearly incomplete and reads as if it was devised in a text editor rather than describing implemented and working code, which fits with your observation that even TCP Prague uses packet-marking rather than byte-marking logic. My personal preference (for what it is worth) would be not to include in an RFC incomplete methods and examples that have never been put to a real test.
>
>
>
> > On Aug 11, 2022, at 16:43, Neal Cardwell <ncardwell=40google.com@dmarc.ietf.org> wrote:
> >
> > Another argument against byte-counting ECN responses:
> >
> > (4) The fact that Prague updates its EWMA alpha once per round trip seems to suggest that it is trying to maintain a time-weighted marking probability, since it is effectively taking one "frac" sample per round trip. If byte-based weighting were really preferable, then presumably the EWMA alpha should be updated per equal volume of data delivered, e.g. once per 100 KBytes of data delivered. Since Prague updates the EWMA alpha once per round trip, the EWMA will diverge wildly from a byte-weighted average across round trips if the volume of data (in bytes) varies widely from one round trip to the next, which is quite common for web or RPC traffic.
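To make point (4) concrete, here is a minimal Python sketch contrasting a once-per-round-trip EWMA with one updated per fixed volume of delivered data. The gain g = 1/16, the 100 KByte update quantum, and the function names are assumptions chosen for illustration, not taken from the Prague code.

```python
G = 1 / 16  # assumed EWMA gain


def ewma_per_round(rounds, alpha=0.0, g=G):
    """One update per round trip; each round is (marked_bytes, total_bytes)."""
    for marked, total in rounds:
        frac = marked / total
        alpha = (1 - g) * alpha + g * frac
    return alpha


def ewma_per_volume(rounds, quantum=100_000, alpha=0.0, g=G):
    """One update per `quantum` bytes delivered, so large rounds weigh more."""
    carry = 0
    for marked, total in rounds:
        frac = marked / total
        carry += total
        while carry >= quantum:
            alpha = (1 - g) * alpha + g * frac
            carry -= quantum
    return alpha


# A small, fully marked round trip followed by a large, unmarked one.
rounds = [(10_000, 10_000), (0, 1_000_000)]
print(ewma_per_round(rounds))   # 0.05859375
print(ewma_per_volume(rounds))  # 0.0
```

The per-round-trip EWMA gives the small marked round trip as much weight as the large unmarked one, while the per-volume EWMA barely registers it (here not at all, since the small round trip never fills a quantum).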
> >
> > best regards,
> > neal
> >
> >
> >
> > On Wed, Aug 10, 2022 at 6:11 PM Neal Cardwell <ncardwell@google.com> wrote:
> > Re:
> >   https://datatracker.ietf.org/doc/html/draft-briscoe-iccrg-prague-congestion-control-01
> >
> > and the passages:
> >
> > "2.3.2.  Moving Average of ECN Feedback
> > ...it measures the fraction, frac, of ACKed bytes that carried ECN
> > feedback over the previous round trip. ...
> >
> > 2.4.3.  Additive Increase and ECN Feedback
> > ...a Prague CC applies additive increase irrespective of its CWR
> > state, but only for bytes that have been ACK'd without ECN feedback.
> > ...  This approach reduces additive increase as the marking
> > probability increases..."
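One way to read the quoted additive-increase rule is sketched below in Python: a Reno-style increase of one MSS per round trip, scaled down by the fraction of newly ACKed bytes that carried ECN feedback. The per-ACK formula and the function name are assumptions for illustration, not the draft's exact pseudocode.

```python
def additive_increase(cwnd: float, acked_bytes: int, ce_acked_bytes: int, mss: int) -> float:
    """Return the cwnd increase (bytes) for one ACK under the assumed rule.

    Only bytes ACKed without ECN feedback contribute, so the increase
    shrinks as the marking probability grows.
    """
    unmarked = acked_bytes - ce_acked_bytes
    return mss * unmarked / cwnd


MSS = 1448
cwnd = 10 * MSS
# A full window ACKed with no marks grows cwnd by one MSS...
print(additive_increase(cwnd, 10 * MSS, 0, MSS))        # 1448.0
# ...while a window with half its bytes marked grows it by half that.
print(additive_increase(cwnd, 10 * MSS, 5 * MSS, MSS))  # 724.0
```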
> >
> > I was curious about the design choice to specify that the algorithm
> > reacts to the fraction of *bytes* that have been CE-marked instead of
> > the fraction of *packets*. IMHO it would be useful for the document to
> > outline the motivation.
> >
> > Apologies if I have missed this in previous e-mail discussions or
> > presentations. I may well have. :-)
> >
> > I can imagine a number of potential reasons why it could be
> > advantageous to react to the fraction of packets CE-marked rather than
> > the fraction of bytes CE-marked:
> >
> > (1) AFAICT byte counters distort the path's ECN marking probability
> > more than using packet counters. For example, suppose we have a round
> > trip with 100 packets sent at roughly uniform intervals across the
> > round trip time:
> >
> > o  99 packets of 1 byte each, all CE-marked
> > o 1 packet of 1000 bytes that was not CE-marked
> >
> > Then the byte-based Prague "frac" ("the fraction, frac, of ACKed bytes
> > that carried ECN feedback over the previous round trip") is:
> >
> >   99 bytes / 1099 bytes ~= .09
> >
> > Whereas the fraction of ACKed packets that carried ECN feedback is:
> >
> >    99 packets / 100 packets = .99
> >
> > So in this toy example there is a >10x difference in the CE "frac"
> > signal depending on whether bytes or packets are counted.
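The arithmetic of this toy example can be checked with a few lines of Python; `frac_bytes` and `frac_packets` are hypothetical helper names, and each packet is represented as an assumed (size_in_bytes, ce_marked) pair.

```python
def frac_bytes(pkts):
    """Fraction of ACKed bytes that carried ECN feedback."""
    marked = sum(size for size, ce in pkts if ce)
    return marked / sum(size for size, _ in pkts)


def frac_packets(pkts):
    """Fraction of ACKed packets that carried ECN feedback."""
    return sum(1 for _, ce in pkts if ce) / len(pkts)


# 99 one-byte CE-marked packets plus one unmarked 1000-byte packet.
round_trip = [(1, True)] * 99 + [(1000, False)]
fb = frac_bytes(round_trip)    # 99 / 1099, about 0.09
fp = frac_packets(round_trip)  # 99 / 100 = 0.99
print(f"{fb:.2f} {fp:.2f} {fp / fb:.1f}x")  # 0.09 0.99 11.0x
```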
> >
> > And given that these packets were spaced uniformly across the round
> > trip, the bottleneck had excess queuing 99% of the time. This 99%
> > figure is well reflected in a packet-based "frac", which suggests
> > that the byte-based "frac" approach dramatically underestimates the
> > probability that a packet will encounter excessive queuing, aka the
> > packet CE marking probability.
> >
> > The Prague draft in section 1 mentions:
> >
> > " The Prague CC is a particular instance of a scalable congestion control. ...
> > For a scalable congestion control B=1, so its response function takes
> > the form cwnd = K/p. ...
> > p:  Steady-state probability of drop or marking"
> >
> > So Prague is defined as a scalable congestion control, which has a
> > response function that is a function of the probability of ECN
> > marking. But AFAICT the "frac" mentioned in the Prague spec is a
> > byte-weighted number, and by contrast the fraction of *packets*
> > CE-marked is a much better estimate of the probability of a packet
> > being CE-marked (which is my interpretation of the somewhat ambiguous
> > "probability of drop or marking").
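Plugging the two toy-example estimates of p into the scalable response function cwnd = K/p shows how far apart the resulting steady-state windows would be. The constant K = 2 (in segments) and the function name are arbitrary assumptions; only the ratio matters here.

```python
K = 2.0  # assumed constant, in segments; the ratio below does not depend on it


def scalable_cwnd(p: float, k: float = K) -> float:
    """Steady-state cwnd of a scalable congestion control: cwnd = K / p."""
    return k / p


p_bytes = 99 / 1099   # byte-weighted frac from the toy example
p_packets = 99 / 100  # packet-weighted frac from the toy example
ratio = scalable_cwnd(p_bytes) / scalable_cwnd(p_packets)
print(round(ratio, 2))  # 10.99
```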
> >
> > (2) The current Linux TCP reference implementation of TCP Prague does
> > not actually use bytes; it uses packets. Likewise, DCTCP and BBRv2 use
> > packets rather than bytes. So AFAIK the real-world deployment
> > experience with shallow-threshold ECN thus far is almost entirely with
> > packet-based algorithms rather than byte-based algorithms. It seems
> > risky to specify Prague with a byte-based approach that has not been
> > tested, especially given that the byte-based and packet-based
> > algorithms can measure massively different signals in some cases (see
> > (1) above).
> >
> > (3) AFAIK byte counters are not available when relying on the AccECN
> > ACE field if there is ACK loss, since the CE marks counted in the ACE
> > field cannot be properly matched against the size of segments that
> > were already ACKed and freed. So in environments where only the ACE
> > field is available then this would imply that TCP Prague cannot be
> > used (since Prague is specified only in bytes). This would seem to
> > significantly limit the utility of the ACE field and/or byte-based
> > Prague in such scenarios. If Prague were defined in terms of packets,
> > then perhaps it would be more likely to be useful on paths that only
> > support the ACE field and strip out the AccECN option?
> >
> >
> > In summary, if byte counting is considered preferable, IMHO it would
> > be good to document in this draft why this is so, change the Linux TCP
> > Prague code to use the byte-based approach, and then update the
> > definition of "p" in the draft to specify that it means the
> > probability that a payload "byte" is CE-marked, rather than leaving
> > the bytes/packets distinction ambiguous.
> >
> > best regards,
> > neal
> > _______________________________________________
> > iccrg mailing list
> > iccrg@irtf.org
> > https://www.irtf.org/mailman/listinfo/iccrg
>