[iccrg] draft-briscoe-iccrg-prague-congestion-control: CE-marked bytes or packets?

Neal Cardwell <ncardwell@google.com> Wed, 10 August 2022 22:11 UTC

From: Neal Cardwell <ncardwell@google.com>
Date: Wed, 10 Aug 2022 18:11:02 -0400
Message-ID: <CADVnQykxwaqZTGXR-ZMYLEem0rKfAcT7KkHYgsF4dBdWvi2k4w@mail.gmail.com>
To: Bob Briscoe <ietf@bobbriscoe.net>, "Tilmans, Olivier (Nokia - BE/Antwerp)" <olivier.tilmans@nokia-bell-labs.com>, "De Schepper, Koen (Koen)" <koen.de_schepper@nokia.com>
Cc: iccrg IRTF list <iccrg@irtf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/iccrg/ePpX9rC9mTc_RPOZrIaa2i2KRWk>
Subject: [iccrg] draft-briscoe-iccrg-prague-congestion-control: CE-marked bytes or packets?

Re:
  https://datatracker.ietf.org/doc/html/draft-briscoe-iccrg-prague-congestion-control-01

and the passages:

"2.3.2.  Moving Average of ECN Feedback
...it measures the fraction, frac, of ACKed bytes that carried ECN
feedback over the previous round trip. ...

2.4.3.  Additive Increase and ECN Feedback
...a Prague CC applies additive increase irrespective of its CWR
state, but only for bytes that have been ACK'd without ECN feedback.
...  This approach reduces additive increase as the marking
probability increases..."

I was curious about the design choice to specify that the algorithm
reacts to the fraction of *bytes* that have been CE-marked instead of
the fraction of *packets*. IMHO it would be useful for the document to
outline the motivation.

Apologies if I have missed this in previous e-mail discussions or
presentations. I may well have. :-)

I can imagine a number of potential reasons why it could be
advantageous to react to the fraction of packets CE-marked rather than
the fraction of bytes CE-marked:

(1) AFAICT byte counters distort the path's ECN marking probability
more than packet counters do. For example, suppose we have a round
trip with 100 packets sent at roughly uniform intervals across the
round trip time:

o  99 packets of 1 byte each, all CE-marked
o  1 packet of 1000 bytes that was not CE-marked

Then the byte-based Prague "frac" ("the fraction, frac, of ACKed bytes
that carried ECN feedback over the previous round trip") is:

  99 bytes / 1099 bytes ~= 0.09

Whereas the fraction of ACKed packets that carried ECN feedback is:

  99 packets / 100 packets = 0.99

So in this toy example there is a >10x difference in the CE "frac"
signal depending on whether bytes or packets are counted.
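
The arithmetic in this toy example can be reproduced with a short
Python sketch. The function name ce_fractions and the
(size_bytes, ce_marked) tuple layout are illustrative assumptions;
they are not taken from the draft or from the Linux code.

```python
def ce_fractions(packets):
    """Return (byte_frac, packet_frac) for a list of (size_bytes, ce_marked) tuples."""
    total_bytes = sum(size for size, _ in packets)
    ce_bytes = sum(size for size, ce in packets if ce)
    total_pkts = len(packets)
    ce_pkts = sum(1 for _, ce in packets if ce)
    return ce_bytes / total_bytes, ce_pkts / total_pkts


# The round trip from the example: 99 one-byte CE-marked packets
# plus one unmarked 1000-byte packet.
round_trip = [(1, True)] * 99 + [(1000, False)]
byte_frac, pkt_frac = ce_fractions(round_trip)
print(f"byte-based frac:   {byte_frac:.2f}")  # 0.09
print(f"packet-based frac: {pkt_frac:.2f}")   # 0.99
```

The same per-round-trip ACK data yields both numbers; only the
weighting (bytes vs. packets) differs.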

And given that these packets were spaced uniformly across the round
trip, the bottleneck had excess queuing roughly 99% of the time. This
99% figure is well reflected in a packet-based "frac", whereas the
byte-based "frac" seems to dramatically underestimate the probability
that a packet will encounter excessive queuing, aka the packet CE
marking probability.

The Prague draft in section 1 mentions:

" The Prague CC is a particular instance of a scalable congestion control. ...
For a scalable congestion control B=1, so its response function takes
the form cwnd = K/p. ...
p:  Steady-state probability of drop or marking"
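
To make the quoted response function concrete, here is a minimal
Python sketch that evaluates cwnd = K/p for the two "frac" values
from the toy example in (1), treating each as if it were an estimate
of p. The value K = 2.0 is an arbitrary placeholder chosen only for
illustration (the constant cancels out of the ratio), and the function
name is hypothetical.

```python
K = 2.0  # arbitrary illustrative constant, NOT a value from the draft


def steady_state_cwnd(p, k=K):
    """Scalable response function with B = 1: cwnd = K / p."""
    return k / p


p_bytes = 99 / 1099   # byte-weighted "frac" from the toy example
p_packets = 99 / 100  # packet-weighted "frac" from the toy example

cwnd_bytes = steady_state_cwnd(p_bytes)
cwnd_packets = steady_state_cwnd(p_packets)
ratio = cwnd_bytes / cwnd_packets
print(f"cwnd implied by byte-based p is {ratio:.2f}x the packet-based one")
```

Because cwnd is inversely proportional to p, any gap between the two
estimates of p shows up as the same multiplicative gap in the implied
cwnd.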

So Prague is defined as a scalable congestion control, which has a
response function that is a function of the probability of ECN
marking. But AFAICT the "frac" mentioned in the Prague spec is a
byte-weighted number, and by contrast the fraction of *packets*
CE-marked is a much better estimate of the probability of a packet
being CE-marked (which is my interpretation of the somewhat ambiguous
"probability of drop or marking").

(2) The current Linux TCP reference implementation of TCP Prague does
not actually use bytes; it uses packets. Likewise, DCTCP and BBRv2 use
packets rather than bytes. So AFAIK the real-world deployment
experience with shallow-threshold ECN thus far is almost entirely with
packet-based algorithms rather than byte-based algorithms. It seems
risky to specify Prague with a byte-based approach that has not been
tested, especially given that the byte-based and packet-based
algorithms can measure massively different signals in some cases (see
(1) above).

(3) AFAIK byte counters are not available when relying on the AccECN
ACE field if there is ACK loss, since the CE marks counted in the ACE
field cannot be properly matched against the sizes of segments that
were already ACKed and freed. So in environments where only the ACE
field is available, this would imply that TCP Prague cannot be used
(since Prague is specified only in bytes). This would seem to
significantly limit the utility of the ACE field and/or byte-based
Prague in such scenarios. If Prague were defined in terms of packets,
then perhaps it would be more useful on paths that support only the
ACE field and strip out the AccECN option?
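
A minimal Python sketch of the point in (3), assuming the ACE field
acts as a 3-bit counter of CE-marked packets that wraps modulo 8. The
function name and the sample values are hypothetical. The delta between
two ACE values is a packet count only; it carries no information about
which segments, or how many bytes, were marked.

```python
ACE_MODULUS = 8  # assumption: ACE is a 3-bit counter, so it wraps every 8 CE marks


def ce_packets_since(prev_ace, new_ace):
    """Smallest number of newly CE-marked packets consistent with two ACE values."""
    return (new_ace - prev_ace) % ACE_MODULUS


# Two ACKs seen by the sender: the counter went from 6 to 1, wrapping past 7.
newly_marked = ce_packets_since(6, 1)
print(newly_marked)  # 3 packets; the marked byte count is not recoverable from ACE alone
```

If intermediate ACKs are lost, the true count could also be this value
plus a multiple of 8, which is a further source of ambiguity on top of
the missing byte information.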


In summary, if byte counting is considered preferable, IMHO it would
be good to document in this draft why this is so, change the Linux TCP
Prague code to use the byte-based approach, and have the definition of
"p" in the draft specify that it means the probability that a payload
"byte" is CE-marked, rather than leaving the bytes/packets distinction
ambiguous.

best regards,
neal