Re: [iccrg] Updates to draft-briscoe-iccrg-prague-congestion-control-03

Marten Seemann <martenseemann@gmail.com> Wed, 01 November 2023 05:58 UTC

From: Marten Seemann <martenseemann@gmail.com>
Date: Wed, 01 Nov 2023 12:57:43 +0700
Message-ID: <CAOYVs2qWYSsEnx7ZgtU-NJvLnudW8VCMP7OGW01B4W_TCtpVng@mail.gmail.com>
To: Bob Briscoe <ietf@bobbriscoe.net>
Cc: iccrg IRTF list <iccrg@irtf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/iccrg/em9Kodd2h_uoB7518gvVLFNaENk>
Subject: Re: [iccrg] Updates to draft-briscoe-iccrg-prague-congestion-control-03

Thanks to everyone for the responses here. I didn't mean to start a
discussion about whether L4S's congestion feedback is sufficient; all I'm
trying to understand is how best to feed this signal into a Prague
implementation for a QUIC stack.

> Section 3.1.3 goes on to summarize the brief tech report I posted on
> arXiv that explains and defines an alternative approach that removes all
> this lag, cited as [PerAckEWMA]:
>    Removing the Clock Machinery Lag from DCTCP/Prague
> <https://arxiv.org/abs/2101.07727>

Interesting. I must have missed this. I'll read your paper later.

> [BB] Yes. It has to be, given the current QUIC protocol.
>
> Despite having the opportunity to fully integrate ECN into the design of
> QUIC, it seems it was still a bit of an afterthought (I shouldn't complain,
> 'cos I volunteered to help with adding ECN, but then didn't get involved
> 'cos of other pressing work at the time).

I think I'd agree with that statement. I'm wondering if we can easily
define a new ACK frame that provides richer signaling. I wrote up a quick
draft (I can't submit it to the datatracker at this time since the draft
deadline for 118 has passed):
https://github.com/marten-seemann/draft-seemann-quic-accurate-ack-ecn. I
haven't implemented it, but it should be pretty straightforward to do, if
anyone wants to experiment with it.
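
With the ACK frame as it exists today, the approximation from the quoted
exchange (treat every newly acknowledged packet as having the average size)
could look roughly like the Go sketch below. It is only an illustration: the
names are invented, and capping the number of new marks at the number of
newly acknowledged packets is a choice made here, not something the draft
or RFC 9000 prescribes.

```go
package main

import "fmt"

// ecnTracker remembers the last cumulative ECN-CE count reported by the peer.
// Hypothetical helper, not part of any real QUIC stack.
type ecnTracker struct {
	lastCE uint64
}

// approxEceDelta estimates how many newly acknowledged bytes were CE-marked.
// ceCount is the cumulative ECN-CE count carried in the ACK frame;
// newlyAckedPkts and newlyAckedBytes describe the packets that this frame
// acknowledges for the first time.
func (t *ecnTracker) approxEceDelta(ceCount, newlyAckedPkts, newlyAckedBytes uint64) uint64 {
	if ceCount <= t.lastCE || newlyAckedPkts == 0 {
		return 0 // no new marks, or nothing newly acknowledged to attribute them to
	}
	newCE := ceCount - t.lastCE
	t.lastCE = ceCount
	if newCE > newlyAckedPkts {
		newCE = newlyAckedPkts // assumption: never report more marked than acked
	}
	// Assume all newly acknowledged packets have the same (average) size.
	return newCE * newlyAckedBytes / newlyAckedPkts
}

func main() {
	tr := &ecnTracker{}
	// 20 packets (24000 bytes) newly acknowledged, CE count went from 0 to 1.
	fmt.Println(tr.approxEceDelta(1, 20, 24000))
}
```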

> Assuming the QUIC RFC is not just badly written, it doesn't seem right.
> Once a flow has got going, a time threshold is generally considered more
> robust than a packet-threshold. So, even if the packet-threshold is adapted
> in parallel to the time-threshold, using the logical OR of them both will
> often override the time-threshold with the less robust packet threshold.

We discussed this when we were working on the QUIC RFCs. It was not an
oversight, but a compromise to keep RFC 9002 easy (easier) to implement,
and to allow for future experimentation. See
https://github.com/quicwg/base-drafts/issues/3571 for more context. Do we
have more data now than we had back in 2020? Would it make sense to write a
followup document? Or should the Prague draft explicitly require
implementations to deviate from what's defined in RFC 9002?

> [BB] I think the equation you want is already in §2.4.4, altho I agree
> that the text could be clearer, because it states rules you might think are
> right then explains why they're wrong, rather than saying what is right
> first, then explaining why:
>
> Therefore, the increase in cwnd per packet has to be (1/M^2) * (1/cwnd).
>
> This gives the increase per packet, not per RTT, but that's what is
> needed in an implementation isn't it? Or am I misunderstanding why you want
> the increase per RTT in particular?

I'm probably missing something obvious here. Section 2.4.3 defines the
window increase per ACK, not per (acknowledged) packet, so this equation
doesn't seem to be directly applicable.
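
One way to bridge the two, sketched in Go: when an ACK newly acknowledges n
packets, apply the quoted per-packet increase (1/M^2) * (1/cwnd) once for
each of them. This assumes cwnd is counted in packets and takes M as an
input (how M is derived from rtt_virt is left to section 2.4.4); the
function name is made up, and whether this matches the intent of section
2.4.3 is exactly the open question.

```go
package main

import "fmt"

// increaseForAck applies the per-packet increase (1/M^2) * (1/cwnd) once for
// each of the n packets newly acknowledged by a single ACK.
// cwnd is in packets; m is the scaling factor M from the draft.
func increaseForAck(cwnd, m float64, n int) float64 {
	for i := 0; i < n; i++ {
		cwnd += 1 / (m * m * cwnd)
	}
	return cwnd
}

func main() {
	// One ACK covering a full window of 8 packets, with M = 1 and M = 2.
	fmt.Println(increaseForAck(8, 1, 8), increaseForAck(8, 2, 8))
}
```

With M = 1, a full window of ACKed packets grows cwnd by slightly less than
one packet, because each step divides by the already increased cwnd.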

On Tue, 31 Oct 2023 at 23:33, Bob Briscoe <ietf@bobbriscoe.net> wrote:

> Marten,
>
> On 31/10/2023 10:50, Marten Seemann wrote:
>
> I read the draft and I'm trying to figure out how I'd implement Prague in
> my QUIC stack. There are a couple of things I've noticed:
>
>    1.
>
>    Section 2.3.2: It’s unclear to me when exactly *alpha* is updated. I
>    assume that once I receive the first ACK, I save the timestamp. When I
>    receive a new ACK, there are two code paths: if it’s received within one
>    *rtt_virt*, just accumulate the counters used to calculate *frac*. If
>    it’s received after *rtt_virt*, update *alpha* according to the
>    equation given in this section, reset the counters for *frac* and save
>    the timestamp as the beginning of the next *rtt_virt* epoch. However,
>    this would mean that the *alpha* value used for multiplicative
>    decrease (section 2.4.2) would always be slightly outdated, which seems
>    suboptimal for an immediate response to a growing queue. Is there a better
>    way?
>
> [BB] Indeed. It's actually a lot worse than "slightly outdated". As it says
> at the end of the section you refer to:
>
> However, another approach is being investigated because these per-RTT
> updates introduce 1--2 rounds of delay into the congestion response on top
> of the inherent round of feedback delay (see Section 3.1.3
> <https://datatracker.ietf.org/doc/html/draft-briscoe-iccrg-prague-congestion-control-03#pracc_faster_response>
> in the section on variants and future work).
>
>
> Section 3.1.3 goes on to summarize the brief tech report I posted on arXiv
> that explains and defines an alternative approach that removes all this
> lag, cited as [PerAckEWMA]:
>     Removing the Clock Machinery Lag from DCTCP/Prague
> <https://arxiv.org/abs/2101.07727>
>
> Joakim Misund was evaluating it about a year ago, when he decided to get a
> proper job ;) instead of doing his PhD with me. I am just getting round to
> working on it myself again. But if you wanted to try it yourself as well, we
> could certainly compare notes.
>
>
>    2.
>
>    Section 2.3.3: QUIC uses both packet- and time-threshold loss
>    detection (see sections 6.1.1 and 6.1.2 of RFC 9002). I’m not sure what
>    exactly the recommendation of this draft is.
>
> [BB] I hadn't appreciated that QUIC deems there's been a loss if /either/
> of these conditions is met [RFC9002; §6.1]:
>
> The packet was sent kPacketThreshold packets before an acknowledged packet
> (Section 6.1.1
> <https://datatracker.ietf.org/doc/html/rfc9002#packet-threshold>), or it
> was sent long enough in the past (Section 6.1.2
> <https://datatracker.ietf.org/doc/html/rfc9002#time-threshold>).
>
>
> Whereas TCP RACK [RFC8985], only uses DupThresh at the start of a flow
> (when the RTT is likely to be inaccurate) until a decent reordering window
> is established (if reordering is detected at all):
>
> if some reordering has been observed, then RACK does not trigger fast
> recovery based on DupThresh.
>
> Assuming the QUIC RFC is not just badly written, it doesn't seem right.
> Once a flow has got going, a time threshold is generally considered more
> robust than a packet-threshold. So, even if the packet-threshold is adapted
> in parallel to the time-threshold, using the logical OR of them both will
> often override the time-threshold with the less robust packet threshold.
>
>
>    2.
>
>    [cont] It would certainly be possible to turn off packet-threshold
>    loss detection, and rely on time-threshold altogether. Is that what QUIC
>    implementations should do?
>
>
> [BB] Well, that's certainly what I thought RACK was meant to do.
> This might end up requiring an erratum to RFC9002.
>
>
>    3.
>
>    Section 2.4.2: Is the suppression of further decreases after one
>    ECN-triggered decrease for one *srtt*, or is it one *rtt_virt*?
>    Reading section 2.4.4 it sounds like it’s *rtt_virt*, but this could
>    probably be clarified in this section.
>
>
> [BB] Yes. I will check that the Linux code does that though.
>
> We should have caught that mistake in the draft recently when we went
> through the draft checking all the places where it had said 'RTT' before we
> introduced rtt_virt. I've added this to my list of edits to make for the
> next rev.
>
>
>
>    4.
>
>    Section 2.4.3: The QUIC ACK frame acknowledges (multiple) ranges of
>    packets at the same time, together with cumulative ECN counts. It’s
>    therefore not possible to tell which packet was ECN-marked. This means that
>    a QUIC stack will be able to determine *acked_sacked*, but not
>    *ece_delta*. Is it valid to approximate it by assuming that all
>    packets had the same average size? Either way, this is pretty awkward to
>    fit into the pseudo-code given in appendix B.5 of RFC 9002.
>
>
> [BB] Yes. It has to be, given the current QUIC protocol.
>
> Despite having the opportunity to fully integrate ECN into the design of
> QUIC, it seems it was still a bit of an afterthought (I shouldn't complain,
> 'cos I volunteered to help with adding ECN, but then didn't get involved
> 'cos of other pressing work at the time).
>
>
>
>    5.
>
>    Section 2.4.3: Similarly, what's the correct order to process an ACK
>    that reports an ECN marking: For example, an ACK might acknowledge 20 new
>    packets, and report one ECN marking. I think the correct order would be
>    applying the additive increase for 19 packets first, and then applying the
>    multiplicative decrease afterwards. This is because receiving a CE-marked
>    packet would elicit an immediate ACK frame from a QUIC receiver (RFC 9000,
>    section 13.2.1). The draft should probably be explicit about this.
>
>
> [BB] Good point.
> I agree with your logic, and I've added this to the list of edits too.
> However, I think I'll word it as a SHOULD, 'cos it makes sense when CE
> triggers feedback, but the implementer might have better info in some
> scenarios.
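
A rough Go sketch of the processing order agreed here: additive increase for
the unmarked packets first, then one multiplicative decrease. It assumes cwnd
in packets, a DCTCP-style decrease of cwnd * alpha / 2, and it ignores the
rtt_virt scaling and the suppression of repeated decreases. The function
name is hypothetical.

```go
package main

import "fmt"

// onAckWithCE processes one ACK that newly acknowledges newlyAcked packets,
// of which newlyMarked were CE-marked. Unmarked packets are credited first,
// then a single multiplicative decrease is applied.
func onAckWithCE(cwnd, alpha float64, newlyAcked, newlyMarked int) float64 {
	for i := 0; i < newlyAcked-newlyMarked; i++ {
		cwnd += 1 / cwnd // additive increase, one step per unmarked packet
	}
	if newlyMarked > 0 {
		cwnd -= cwnd * alpha / 2 // assumed DCTCP-style decrease
	}
	return cwnd
}

func main() {
	// The example from the thread: 20 newly acknowledged packets, one CE mark.
	fmt.Println(onAckWithCE(10, 0.5, 20, 1))
}
```
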
>
>
>    6.
>
>    Section 2.4.4: I'm struggling to follow how exactly cwnd is supposed
>    to change for small RTTs. Most important from an implementation
>    perspective: section 2.4.3 says that *ai_per_rtt* will have a
>    different value for small RTTs. It would be helpful if section 2.4.4 would
>    contain an equation for *ai_per_rtt*.
>
>
> [BB] I think the equation you want is already in §2.4.4, altho I agree
> that the text could be clearer, because it states rules you might think are
> right then explains why they're wrong, rather than saying what is right
> first, then explaining why:
>
> Therefore, the increase in cwnd per packet has to be (1/M^2) * (1/cwnd).
>
> This gives the increase per packet, not per RTT, but that's what is needed
> in an implementation isn't it? Or am I misunderstanding why you want the
> increase per RTT in particular?
>
> Thanks for all these useful comments and questions.
>
>
>
> Bob
>
>
>
> On Sat, 14 Oct 2023 at 19:45, Bob Briscoe <ietf=
> 40bobbriscoe.net@dmarc.ietf.org> wrote:
>
>> iccrg,
>>
>> We've just posted an update to prague-congestion-control.
>> Links to diffs are quoted below.
>> The main technical changes:
>>
>>    - the Apple implementation falls back to CUBIC behaviour on loss
>>    (both the reduction and the subsequent increase). Currently the Linux
>>    implementation still falls back to Reno on loss, but that is being changed.
>>    - how the Apple implementation over QUIC behaves when the path or the
>>    remote peer fails to support ECN properly
>>    - the items already discussed on this list in response to Neal's
>>    review, some of which were editorial, but others were technical, e.g.
>>       - pseudocode for removing integer rounding bias
>>       - clarifying the RTT-independence approach
>>
>> Cheers
>>
>>
>> Bob & co-authors
>>
>>
>>
>> -------- Forwarded Message --------
>> Subject: New Version Notification for
>> draft-briscoe-iccrg-prague-congestion-control-03.txt
>> Date: Sat, 14 Oct 2023 05:07:58 -0700
>> From: internet-drafts@ietf.org
>> To: Bob Briscoe <ietf@bobbriscoe.net>, Koen De Schepper
>> <koen.de_schepper@nokia.com>, Olivier Tilmans
>> <olivier.tilmans@nokia-bell-labs.com>, Vidhi Goel <vidhi_goel@apple.com>
>>
>> A new version of Internet-Draft
>> draft-briscoe-iccrg-prague-congestion-control-03.txt has been successfully
>> submitted by Bob Briscoe and posted to the
>> IETF repository.
>>
>> Name: draft-briscoe-iccrg-prague-congestion-control
>> Revision: 03
>> Title: Prague Congestion Control
>> Date: 2023-10-14
>> Group: Individual Submission
>> Pages: 34
>> URL:
>> https://www.ietf.org/archive/id/draft-briscoe-iccrg-prague-congestion-control-03.txt
>> Status:
>> https://datatracker.ietf.org/doc/draft-briscoe-iccrg-prague-congestion-control/
>> HTML:
>> https://www.ietf.org/archive/id/draft-briscoe-iccrg-prague-congestion-control-03.html
>> HTMLized:
>> https://datatracker.ietf.org/doc/html/draft-briscoe-iccrg-prague-congestion-control
>> Diff:
>> https://author-tools.ietf.org/iddiff?url2=draft-briscoe-iccrg-prague-congestion-control-03
>>
>> Abstract:
>>
>> This specification defines the Prague congestion control scheme,
>> which is derived from DCTCP and adapted for Internet traffic by
>> implementing the Prague L4S requirements. Over paths with L4S
>> support at the bottleneck, it adapts the DCTCP mechanisms to achieve
>> consistently low latency and full throughput. It is defined
>> independently of any particular transport protocol or operating
>> system, but notes are added that highlight issues specific to certain
>> transports and OSs. It is mainly based on experience with the
>> reference Linux implementation of TCP Prague and the Apple
>> implementation over QUIC, but it includes experience from other
>> implementations where available.
>>
>> The implementation does not satisfy all the Prague requirements (yet)
>> and the IETF might decide that certain requirements need to be
>> relaxed as an outcome of the process of trying to satisfy them all.
>> Future plans that have typically only been implemented as proof-of-
>> concept code are outlined in a separate section.
>>
>>
>>
>> The IETF Secretariat
>>
>>
>>
>
> --
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>
>