[tsvwg] some feedback on draft-ietf-tsvwg-careful-resume-12
Neal Cardwell <ncardwell@google.com> Sun, 19 January 2025 19:51 UTC
From: Neal Cardwell <ncardwell@google.com>
Date: Sun, 19 Jan 2025 14:51:06 -0500
Message-ID: <CADVnQym1cez-6eqLR+dAUpBd1rFBDnU3fQagiLOb27xjJFmbbw@mail.gmail.com>
To: draft-ietf-tsvwg-careful-resume@ietf.org, tsvwg IETF list <tsvwg@ietf.org>
List-Id: Transport Area Working Group <tsvwg.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/42DCwTKJZaDWZ6yp-QTWSzr_nYE>
Hi,

Here's some feedback on a partial reading of draft-ietf-tsvwg-careful-resume-12, "Convergence of Congestion Control from Retained State". Sorry for the delayed feedback. A long 3-day weekend in the US gave me some extra time. :-)

My main concern was the handling of application-limited behavior in the "Unvalidated Phase"; AFAICT this handling is buggy (see below for details)...

Notes:

---

Re:

"It introduces an alternative mechanism to select initial CC parameters, that seek",

this seems to have a typo. Perhaps consider:

"It introduces an alternative mechanism to select initial CC parameters that seeks"

---

Re:

"Successful validation can further increase the CWND, resulting in a CWND after validating that the used rate did not result in congestion."

What does "resulting in a CWND" mean? This reads as if there is a typo that accidentally omitted some words? Consider perhaps:

"Successful validation can allow further increases of the CWND; after validating that the used rate did not result in congestion, the sender can quickly increase CWND to saved_cwnd."

---

Re:

"Suppose, for example a connection using the classic TCP congestion control. In "slow start" mode, a Reno congestion control would normally converge on a "slow start threshold" set to half the volume of data in flight"

The term "classic" is sometimes used to refer to both Reno and CUBIC, and "converge" in this context IMHO is a little confusing/misleading. Consider perhaps:

"Suppose, for example a connection using Reno TCP congestion control. When exiting "slow start" mode due to loss, Reno would normally set CWND to a "slow start threshold" set to half the volume of data in flight"

---

Figure 1 includes a state labelled "Observing (Normal)" and a second state labelled "Normal". It was unclear to me, upon first reading the figure, whether these were supposed to represent the same state.

---

In, "3.2. Reconnaissance Phase", re:

"Reconnaissance Phase (Confirming the RTT): During this phase, a sender MUST record the minimum RTT for the current connection as the current_rtt. ... When a path is not confirmed, Careful Resume is not used and the sender enters the Normal Phase with a CWND that was not modified by Careful Resume."

That text makes it sound like "Confirming the RTT" consists merely of recording the minimum RTT for the current connection as the current_rtt. But "4.2.1. Confirming the Path" describes a more sensible approach of comparing current_rtt with saved_RTT. Instead of

"Reconnaissance Phase (Confirming the RTT): During this phase, a sender MUST record the minimum RTT for the current connection as the current_rtt."

I would suggest something more like:

"Reconnaissance Phase (Recording the current_rtt): During this phase, a sender MUST record the minimum RTT for the current connection as the current_rtt."

---

In, "3.2. Reconnaissance Phase", re:

"When a sender has confirmed the RTT",

I would suggest a forward reference to what it means to "confirm the RTT", like:

"When a sender has confirmed the path (see section 4.2.1. 'Confirming the Path')".

---

In, "3.2. Reconnaissance Phase", re:

"When a sender has confirmed the RTT and also has received an acknowledgement for the initial data without reported congestion, it MAY then enter the Unvalidated Phase. This transition occurs when more data is sent than normally permitted by the congestion control algorithm."

This seems contradictory, AFAICT. The first sentence seems to say you "MAY" enter the "Unvalidated" phase without sending "more data is sent than normally permitted by the congestion control algorithm":

"When a sender has confirmed the RTT and also has received an acknowledgement for the initial data without reported congestion, it MAY then enter the Unvalidated Phase."

The second sentence seems to contradict that:

"This transition occurs when more data is sent than normally permitted by the congestion control algorithm.".

---

In section "3.3. Unvalidated Phase", Re:

"If the current_rtt is greater than or equal to (saved_rtt / 2) or the current_rtt is less than or equal to (saved_rtt x 10) (see Section 4.2.1), the sender MUST enter the Normal Phase"...

it seems like there are typos there that invert the sense of the semantics. If current_rtt is exactly equal to saved_rtt, it will fail those checks and force an entry of "Normal Phase". AFAICT the intended text was something like:

"If the current_rtt is *less* than or equal to (saved_rtt / 2) or the current_rtt is *greater than* (saved_rtt x 10) (see Section 4.2.1), the sender MUST enter the Normal Phase".

---

In section "3.3. Unvalidated Phase", Re:

"The calculation of a sending rate from a saved_cwnd is directly impacted by the RTT, therefore a significant change in the RTT is a strong indication that the previously observed CC parameters are not be valid for the current path."

There's a typo in the phrase: "are not be valid".

More importantly: even if the CC is something like BBR, with a separate saved max_bw that can prevent changes in RTT from distorting the sending rate, I would argue that this kind of significant change in the RTT is a signal that likely the current path conditions (often the bottleneck radio link conditions or bottleneck cross-traffic) are very different from the conditions at the time the saved_cwnd or other parameters were saved, and so the flow should not reuse the saved CC parameters. Consider perhaps something like:

"This kind of significant change in the RTT is a signal that likely the current path conditions (e.g., the route, or bottleneck radio link conditions, or bottleneck cross-traffic) are significantly different from the conditions at the time the CC parameters were saved, and so the flow should not reuse the saved CC parameters. Furthermore, for window-based CCs, the calculation of a sending rate from a saved_cwnd is directly impacted by the RTT, so large changes in RTT may produce unsafe changes in sending rates."

---

In section "3.3. Unvalidated Phase", Re:

"Unvalidated Phase (Completed sending all unvalidated packets): The sender enters the Validating Phase when the flight_size equals the CWND."

This seems to specify unsafe behavior for connections that are application-limited in the "Unvalidated Phase".

Suppose a connection is application-limited in the "Unvalidated Phase". It may increase CWND to a large jump_cwnd of, say 100,000 packets. But because the flow is application-limited, its flight_size may only reach 99,999 packets. That's a very large flight of data, far beyond the defaults for any CC I know of, so ideally after sending the 99,999 packets, the flow should enter the "Validating Phase", so that it can check "whether all packets sent in the Unvalidated Phase were received without inducing congestion", and enter Safe Retreat if needed. Instead, because flight_size is always less than CWND, the algorithm has the flow stay in the "Unvalidated Phase" forever, increasing PipeSize beyond any quantity that corresponds to anything happening in the system, and never validating the impact of its massive cwnd. :-)

---

In section "3.4. Validating Phase", Re:

"The Validating Phase checks whether all packets sent in the Unvalidated Phase were received without inducing congestion."

The term "all" seems like a strong claim. Unless there is reordering, the flow will only have access to ACKs from the last round trip of the Unvalidated Phase, not "all packets sent in the Unvalidated Phase".

---

In section "3.4. Validating Phase", Re:

"When using BBR Section 4.5, validation is performed using the regular BBR rules for exiting Startup. The measured delivery rate will reflect the actual capacity of the network."

The measured delivery rate will typically not reflect the actual capacity of the network if the flow is application-limited. And the measured delivery rate reflects the flow's share of the capacity of the bottleneck link, not the full capacity of the network. And "capacity" is ambiguous, since it can refer to volumetric capacity (number of packets that will fit in the path without loss) or bandwidth capacity (the rate at which the path can forward packets). So I'd suggest rephrasing this as something like:

"When using BBR (see Section 4.5), validation is performed using the regular BBR rules for exiting Startup. Upon exiting Startup, the connection estimates that the measured delivery rate will reflect the flow's share of the actual bottleneck bandwidth."

---

In section "A.2. Example with No Loss, Rate-Limited", Re:

"the sender will still not have fully-used the CWND. It then enters the Validating Phase".

This passage seems to contradict section "3.3. Unvalidated Phase", which says:

"The sender enters the Validating Phase when the flight_size equals the CWND."

---

best regards,
neal
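
---

Illustrative sketch of two of the points above (the RTT check in "3.3. Unvalidated Phase" and the application-limited transition). This is only a minimal Python rendering of the quoted draft text and of the suggested rewording; the function names are hypothetical and do not come from the draft, and RTTs are in milliseconds purely for convenience.

```python
def draft_forces_normal(current_rtt: float, saved_rtt: float) -> bool:
    """RTT condition as quoted from Section 3.3 of the -12 draft."""
    return current_rtt >= saved_rtt / 2 or current_rtt <= saved_rtt * 10


def suggested_forces_normal(current_rtt: float, saved_rtt: float) -> bool:
    """RTT condition with the rewording suggested in the feedback."""
    return current_rtt <= saved_rtt / 2 or current_rtt > saved_rtt * 10


def enters_validating(flight_size: int, cwnd: int) -> bool:
    """Transition rule as quoted from Section 3.3: flight_size equals CWND."""
    return flight_size == cwnd


if __name__ == "__main__":
    saved_rtt = 50.0  # ms, hypothetical saved value

    # An unchanged RTT trips the quoted condition but not the suggested one.
    print(draft_forces_normal(50.0, saved_rtt))      # True
    print(suggested_forces_normal(50.0, saved_rtt))  # False

    # An application-limited flow that tops out just below CWND
    # never satisfies the quoted transition rule.
    print(enters_validating(99_999, 100_000))        # False
```

With the quoted condition, every positive current_rtt satisfies at least one of the two comparisons, so the check would always force the Normal Phase; the reworded condition only fires when the RTT has halved or grown by more than 10x.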