[tcpm] Re: using SACK info for RTTM?

Yuchung Cheng <ycheng@google.com> Wed, 05 June 2024 20:49 UTC

From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 05 Jun 2024 13:49:11 -0700
To: Neal Cardwell <ncardwell@google.com>
CC: "tcpm@ietf.org Extensions" <tcpm@ietf.org>
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/SKQtSDXbRucU61sHyzK_ExG3zns>

On Wed, Jun 5, 2024 at 11:57 AM Neal Cardwell <ncardwell@google.com> wrote:

> On Wed, Jun 5, 2024 at 2:21 PM Yoshifumi Nishida <nsd.ietf@gmail.com>
> wrote:
>
>> Hi Yuchung, Neal, thank you so much. It’s interesting.
>>
>> So, it might be a good time to revive
>> https://datatracker.ietf.org/doc/draft-yang-tcpm-ets/ ?
>>
>
> Yes, reviving the work for some kind of standardization of more precise
> TCP timestamps is something we would like to do when time permits.
>
>
>> OTOH, I’m wondering why DSACK is not sufficient here.
>>
>
> DSACK can work sometimes, however it is not as good as timestamp undo, for
> at least a few reasons:
>
> (1) DSACK undo is slower: DSACK undo takes about two round trips longer
> than timestamp undo. With DSACK undo the data sender has one extra round
> trip to make a full set of spurious retransmissions for apparent holes in
> the sequence space, and then a second round trip to receive all the DSACKs
> for the spurious retransmissions. Only then can the data sender undo the
> congestion control response. With timestamp-based undo, usually a data
> sender will receive an ACK very shortly (say, O(1ms)) after its spurious
> retransmit that covers the spuriously retransmitted sequence and has a TS
> ECR from before the retransmit, allowing the undo to happen immediately.
>
> (2) DSACK undo is unreliable: if even a single ACK packet containing a
> DSACK is lost, then the data sender cannot undo the congestion control
> response. For example, if a flow is fully utilizing a 10Gbps * 100ms path,
> and thus (1500B MTU) its cwnd is at least 82,562, then the loss rate in the
> direction of returning ACKs needs to be zero in that round trip, or on
> average less than 1/82,562, or < 0.0012%. Not impossible, but a stringent
> requirement. With timestamp-based undo every incoming ACK has a TS ECR
> value that allows undo, so ACK loss is not a problem.
>

(3) DSACK undo is also more complicated to implement (especially the
corner cases).
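
To make the timestamp-based undo in point (1) concrete, here is a minimal sketch in the spirit of the Eifel detection of RFC 3522. It is not Linux code: the class, its fields, and the halving and restoring of cwnd are illustrative assumptions, and timestamp wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EifelSender:
    """Hypothetical sender state for timestamp-based undo (illustration only)."""
    cwnd: int = 10
    prior_cwnd: Optional[int] = None     # cwnd saved before the loss response
    retransmit_ts: Optional[int] = None  # TSval carried by the first retransmit

    def on_loss_detected(self, now_ts: int) -> None:
        # Save state for a possible undo, then apply a (simplified) response.
        self.prior_cwnd = self.cwnd
        self.cwnd = max(self.cwnd // 2, 1)
        # The retransmit is sent with TSval = now_ts; remember that value.
        self.retransmit_ts = now_ts

    def on_ack(self, ts_ecr: int) -> bool:
        """Check the first ACK after the retransmit; True means spurious."""
        if self.retransmit_ts is None:
            return False
        # A TS ECR older than the retransmit's TSval means the ACK was
        # triggered by the original transmission, so nothing was lost.
        spurious = ts_ecr < self.retransmit_ts
        if spurious:
            self.cwnd = self.prior_cwnd  # undo the congestion response
        self.retransmit_ts = None
        self.prior_cwnd = None
        return spurious


sender = EifelSender(cwnd=100)
sender.on_loss_detected(now_ts=5000)  # cwnd drops to 50
print(sender.on_ack(ts_ecr=4990))     # ACK echoes a pre-retransmit TSval
print(sender.cwnd)
```

The single comparison on the first returning ACK is what lets the undo happen right away, without waiting for a round of DSACKs.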

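The numbers in point (2) can be reproduced with a few lines of arithmetic. The 1514-byte packet size is my assumption (1500B MTU plus a 14B Ethernet header); it is the value that yields the 82,562 figure, whereas dividing by 1500 would give 83,333.

```python
LINK_RATE_BPS = 10_000_000_000  # 10 Gbps, from the example above
RTT_MS = 100                    # 100 ms, from the example above
PACKET_BYTES = 1514             # assumption: 1500B MTU + 14B Ethernet header

# Bandwidth-delay product in bytes, using integer arithmetic throughout.
bdp_bytes = LINK_RATE_BPS * RTT_MS // 1000 // 8

# Packets in flight needed to fill the path, i.e. a lower bound on cwnd.
cwnd_packets = bdp_bytes // PACKET_BYTES

# Losing even one DSACK-carrying ACK in the round defeats the undo, so the
# average ACK loss rate has to stay below one in cwnd_packets.
max_ack_loss_rate = 1 / cwnd_packets

print(cwnd_packets)                # 82562
print(f"{max_ack_loss_rate:.4%}")  # 0.0012%
```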

> neal
>
>
>
>> Thanks,
>> —
>> Yoshi
>>
>> On Wed, Jun 5, 2024 at 10:06 AM Yuchung Cheng <ycheng@google.com> wrote:
>>
>>> Also, TCP timestamps really need to move to usec granularity for today's
>>> data-center networks; Eric Dumazet finally upstreamed that feature
>>> (as opt-in). Anything beyond 10us can't be used in Eifel.
>>>
>>> On Wed, Jun 5, 2024 at 6:41 AM Neal Cardwell <ncardwell@google.com>
>>> wrote:
>>>
>>>> IMHO by far the biggest benefit of TCP timestamps is not in RTT
>>>> measurement or PAWS, but in using them for "Eifel" undo (a la RFC 3522, RFC
>>>> 4015): quickly detecting spurious loss detection events due to reordering,
>>>> and quickly undoing the spurious congestion control slow-down response.
>>>> This is important since reordering is increasingly common, driven by the
>>>> spread of many network mechanisms: link-layer retransmissions for
>>>> wifi/cellular links, traffic engineering, multipathing and ECMP/WCMP
>>>> load-balancing, protective load balancing (SIGCOMM 2022), protective
>>>> reroute (SIGCOMM 2023), multi-queue NICs, etc. Those factors make the 12
>>>> bytes of TCP option space overwhelmingly worth it.
>>>>
>>>> best regards,
>>>> neal
>>>>
>>>>
>>>> On Wed, Jun 5, 2024 at 3:03 AM Yoshifumi Nishida <nsd.ietf@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Yuchung,
>>>>>
>>>>> Thanks for the explanation.
>>>>> I thought a bit about the trade-off between using 12 bytes of option
>>>>> space and giving up measuring RTTs for retransmitted packets.
>>>>> But I am inclined to prefer measuring RTTs for now.
>>>>>
>>>>> --
>>>>> Yoshi
>>>>>
>>>>> On Mon, Jun 3, 2024 at 1:57 PM Yuchung Cheng <ycheng@google.com>
>>>>> wrote:
>>>>>
>>>>>> hi Yoshifumi,
>>>>>>
>>>>>> Linux only uses TS-opts if needed to disambiguate on RTT samples
>>>>>> covering sequences that have been retransmitted. This applies to SACK or
>>>>>> non-SACK. In other words, if an S/ACK covers a sequence range that has
>>>>>> never been retransmitted, Linux does not use timestamp options.
>>>>>>
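
The selection described just above can be sketched as a small function. This only illustrates the stated behaviour and is not the Linux implementation; the function name, the microsecond units, and the assumption that TSval and the local clock share one timebase are all hypothetical.

```python
from typing import Optional


def pick_rtt_sample(now_us: int, tx_time_us: int, retransmitted: bool,
                    ts_ecr_us: Optional[int]) -> Optional[int]:
    """Choose an RTT sample for a newly S/ACKed sequence range."""
    if not retransmitted:
        # Unambiguous: only one transmission could have produced this ACK,
        # so the recorded transmit time is used and TS-opts are not needed.
        return now_us - tx_time_us
    if ts_ecr_us is not None:
        # Ambiguous: the echoed timestamp identifies which transmission
        # the ACK responds to.
        return now_us - ts_ecr_us
    # Retransmitted and no timestamp option available: no sample.
    return None


# Never retransmitted: sample from the recorded transmit time.
print(pick_rtt_sample(1500, 1000, False, None))
# Retransmitted at t=1300; the ACK echoes 1300, so the sample is 200.
print(pick_rtt_sample(1500, 1000, True, 1300))
```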
>>>>>> On Mon, Jun 3, 2024 at 1:29 PM Yoshifumi Nishida <nsd.ietf@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Neal, thank you so much for the comments.
>>>>>>>
>>>>>>> The Linux algorithm you've described makes sense to me, and it seems
>>>>>>> the scheme doesn't require timestamp options.
>>>>>>> However, as far as I can tell from the Linux code, Linux still
>>>>>>> uses timestamp options for RTT measurement to some extent.
>>>>>>> I'm curious why Linux mixes two schemes for RTTM.
>>>>>>> --
>>>>>>> Yoshi
>>>>>>>
>>>>>>> On Mon, Jun 3, 2024 at 8:57 AM Neal Cardwell <ncardwell@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 3, 2024 at 11:02 AM Yoshifumi Nishida <
>>>>>>>> nsd.ietf@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> While I was checking RFC7323, I found the following sentence.
>>>>>>>>>
>>>>>>>>> RTTM update processing explicitly excludes segments not updating
>>>>>>>>> SND.UNA.  The original text could be interpreted to allow taking
>>>>>>>>> RTT samples when SACK acknowledges some new, non-continuous
>>>>>>>>> data.
>>>>>>>>>
>>>>>>>>> I am a bit curious about the rationale of this sentence.
>>>>>>>>> It seems to me that we cannot measure RTT when we have a gap in packet sequence with this rule.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes, that rule forbids using RFC7323 timestamps for calculating RTT
>>>>>>>> samples for SACKed sequence ranges.
>>>>>>>>
>>>>>>>> The rationale: AFAIK this rule is a necessary consequence of the
>>>>>>>> conditions under which TS.Recent is updated.
>>>>>>>>
>>>>>>>> The rules for updating TS.Recent are in sec 4.3, "Which Timestamp
>>>>>>>> to Echo":
>>>>>>>>   https://datatracker.ietf.org/doc/html/rfc7323#section-4.3
>>>>>>>> Rule (2) in sec 4.3 says:
>>>>>>>>   If:
>>>>>>>>     SEG.TSval >= TS.Recent and SEG.SEQ <= Last.ACK.sent
>>>>>>>>   then SEG.TSval is copied to TS.Recent; otherwise, it is ignored.
>>>>>>>>
>>>>>>>> Since out-of-order sequence ranges that are SACKed will fail the
>>>>>>>> SEG.SEQ <= Last.ACK.sent check, SACKed sequence ranges will not update
>>>>>>>> TS.Recent. So using TS.Recent to calculate an RTT sample for a SACKed
>>>>>>>> sequence range could, in general, give a vastly overestimated RTT sample.
>>>>>>>> So that's why it's forbidden by the RFC.
>>>>>>>>
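
A small receiver model shows why SACKed ranges leave TS.Recent stale under rule (2). This is a simplified sketch: the class and method names are made up, sequence and timestamp wraparound are ignored, and out-of-order data is not buffered.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    """Toy receiver applying RFC 7323 sec 4.3 rule (2) (illustration only)."""
    ts_recent: int = 0
    last_ack_sent: int = 0  # cumulative ACK point, i.e. next expected SEG.SEQ

    def on_segment(self, seg_seq: int, seg_len: int, seg_tsval: int) -> bool:
        """Process one segment; return True if TS.Recent was updated."""
        updated = False
        if seg_tsval >= self.ts_recent and seg_seq <= self.last_ack_sent:
            self.ts_recent = seg_tsval
            updated = True
        # Only in-order data advances the cumulative ACK point; an
        # out-of-order segment would merely be reported in a SACK block.
        if seg_seq == self.last_ack_sent:
            self.last_ack_sent += seg_len
        return updated


rx = Receiver(ts_recent=100, last_ack_sent=1000)
# The segment at seq 1000 is missing; later segments arrive out of order.
print(rx.on_segment(seg_seq=2000, seg_len=1000, seg_tsval=110))  # False
print(rx.on_segment(seg_seq=3000, seg_len=1000, seg_tsval=120))  # False
print(rx.ts_recent)  # 100
```

Every ACK sent while the hole is open echoes the old value 100, so an RTT computed from that echo for the SACKed data would include the whole time since TSval 100 was sent.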
>>>>>>>> However, in practice usually this does not need to be a big deal.
>>>>>>>> For example, Linux TCP still obtains an RTT sample for every
>>>>>>>> non-retransmitted SACKed sequence range, by:
>>>>>>>>
>>>>>>>> (a) recording the transmit time of every sequence range
>>>>>>>> (b) recording whether that sequence range was retransmitted, and
>>>>>>>> then
>>>>>>>> (c) using those two pieces of information when that sequence range
>>>>>>>> is cumulatively or selectively ACKed, to calculate an RTT sample
>>>>>>>> (rtt_sample = now - transmit_timestamp) if the sequence range was never
>>>>>>>> retransmitted.
>>>>>>>>
>>>>>>>> So, in Linux TCP, SACKed sequence ranges fail to generate an RTT
>>>>>>>> sample only when they were previously retransmitted.
>>>>>>>>
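
Steps (a) to (c) above can be sketched as follows. This is an illustration of the idea rather than the Linux data structures: the class names, the keying by (start, end) sequence range, and the integer microsecond clock are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SentRange:
    tx_time_us: int              # (a) transmit time of the sequence range
    retransmitted: bool = False  # (b) whether it was ever retransmitted


class RttSampler:
    """Hypothetical per-range bookkeeping for timestamp-free RTT samples."""

    def __init__(self) -> None:
        self.outstanding: Dict[Tuple[int, int], SentRange] = {}

    def on_transmit(self, start: int, end: int, now_us: int) -> None:
        rng = self.outstanding.get((start, end))
        if rng is None:
            self.outstanding[(start, end)] = SentRange(tx_time_us=now_us)
        else:
            # Sending the same range again marks it as ambiguous.
            rng.retransmitted = True
            rng.tx_time_us = now_us

    def on_acked(self, start: int, end: int, now_us: int) -> Optional[int]:
        """(c) Called when the range is cumulatively or selectively ACKed."""
        rng = self.outstanding.pop((start, end), None)
        if rng is None or rng.retransmitted:
            return None  # no sample for retransmitted (ambiguous) ranges
        return now_us - rng.tx_time_us


sampler = RttSampler()
sampler.on_transmit(1000, 2000, now_us=0)
sampler.on_transmit(2000, 3000, now_us=10)
sampler.on_transmit(1000, 2000, now_us=300)          # retransmission
print(sampler.on_acked(2000, 3000, now_us=210))      # SACKed: 200
print(sampler.on_acked(1000, 2000, now_us=400))      # None
```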
>>>>>>>> best regards,
>>>>>>>> neal
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> --
>>>>>>>>> Yoshi
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> tcpm mailing list -- tcpm@ietf.org
>>>>>>>>> To unsubscribe send an email to tcpm-leave@ietf.org
>>>>>>>>>