[tcpm] Re: using SACK info for RTTM?
Yuchung Cheng <ycheng@google.com> Wed, 05 June 2024 20:49 UTC
From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 05 Jun 2024 13:49:11 -0700
To: Neal Cardwell <ncardwell@google.com>
CC: "tcpm@ietf.org Extensions" <tcpm@ietf.org>
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/SKQtSDXbRucU61sHyzK_ExG3zns>

On Wed, Jun 5, 2024 at 11:57 AM Neal Cardwell <ncardwell@google.com> wrote:

> On Wed, Jun 5, 2024 at 2:21 PM Yoshifumi Nishida <nsd.ietf@gmail.com>
> wrote:
>
>> Hi Yuchung, Neal, thank you so much. It’s interesting.
>>
>> So, it might be a good time to revive
>> https://datatracker.ietf.org/doc/draft-yang-tcpm-ets/ ?
>>
>
> Yes, reviving the work for some kind of standardization of more precise
> TCP timestamps is something we would like to do when time permits.
>
>
>> OTOH, I’m thinking why DSACK is not sufficient here.
>>
>
> DSACK can work sometimes, however it is not as good as timestamp undo, for
> at least a few reasons:
>
> (1) DSACK undo is slower: DSACK undo takes about two round trips longer
> than timestamp undo. With DSACK undo the data sender has one extra round
> trip to make a full set of spurious retransmissions for apparent holes in
> the sequence space, and then a second round trip to receive all the DSACKs
> for the spurious retransmissions. Only then can the data sender undo the
> congestion control response. With timestamp-based undo, usually a data
> sender will receive an ACK very shortly (say, O(1ms)) after its spurious
> retransmit that covers the spuriously retransmitted sequence and has a TS
> ECR from before the retransmit, allowing the undo to happen immediately.
>
> (2) DSACK undo is unreliable: if even a single ACK packet containing a
> DSACK is lost, then the data sender cannot undo the congestion control
> response. For example, if a flow is fully utilizing a 10Gbps * 100ms path,
> and thus (1500B MTU) its cwnd is at least 82,562, then the loss rate in the
> direction of returning ACKs needs to be zero in that round trip, or on
> average less than 1/82,562, or < 0.0012%. Not impossible, but a stringent
> requirement. With timestamp-based undo every incoming ACK has a TS ECR
> value that allows undo, so ACK loss is not a problem.
>
(3) DSACK undo is also more complicated to implement (esp. corner cases)

> neal
>
>
>
>> Thanks,
>> --
>> Yoshi
>>
>> On Wed, Jun 5, 2024 at 10:06 AM Yuchung Cheng <ycheng@google.com> wrote:
>>
>>> Also, TCP timestamps really need to move to usec level for today's
>>> data-center networks, a feature that Eric Dumazet finally upstreamed
>>> (as opt-in). Anything beyond 10us can't be used in Eifel.
>>>
>>> On Wed, Jun 5, 2024 at 6:41 AM Neal Cardwell <ncardwell@google.com>
>>> wrote:
>>>
>>>> IMHO by far the biggest benefit of TCP timestamps is not in RTT
>>>> measurement or PAWS, but in using them for "Eifel" undo (a la RFC 3522, RFC
>>>> 4015): quickly detecting spurious loss detection events due to reordering,
>>>> and quickly undoing the spurious congestion control slow-down response.
>>>> This is important since reordering is increasingly common due to many
>>>> increasingly common network mechanisms: link-layer retransmissions for
>>>> wifi/cellular links, traffic engineering, multipathing and ECMP/WCMP
>>>> load-balancing, protective load balancing (SIGCOMM 2022), protective
>>>> reroute (SIGCOMM 2023), multi-queue NICs, etc. Those factors make the 12
>>>> bytes of TCP option space overwhelmingly worth it.
>>>>
>>>> best regards,
>>>> neal
>>>>
>>>>
>>>> On Wed, Jun 5, 2024 at 3:03 AM Yoshifumi Nishida <nsd.ietf@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Yuchung,
>>>>>
>>>>> Thanks for the explanation.
>>>>> I thought a bit about the trade-off between using 12 bytes of option
>>>>> space and giving up measuring RTTs for retransmitted packets.
>>>>> But I am inclined to prefer measuring RTTs for now.
>>>>>
>>>>> --
>>>>> Yoshi
>>>>>
>>>>> On Mon, Jun 3, 2024 at 1:57 PM Yuchung Cheng <ycheng@google.com>
>>>>> wrote:
>>>>>
>>>>>> hi Yoshifumi,
>>>>>>
>>>>>> Linux only uses TS-opts if needed to disambiguate on RTT samples
>>>>>> covering sequences that have been retransmitted. This applies to SACK or
>>>>>> non-SACK.
>>>>>> In other words, if an S/ACK covers a sequence range that has
>>>>>> never been retransmitted, Linux does not use timestamp options.
>>>>>>
>>>>>> On Mon, Jun 3, 2024 at 1:29 PM Yoshifumi Nishida <nsd.ietf@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Neal, thank you so much for the comments.
>>>>>>>
>>>>>>> The linux algorithm you've described makes sense to me and it seems
>>>>>>> the scheme doesn't require timestamp options.
>>>>>>> However, as far as I've read linux code, it seems that linux still
>>>>>>> uses timestamp options for RTT measurement to some extent.
>>>>>>> I'm curious why linux is mixing two schemes for RTTM.
>>>>>>> --
>>>>>>> Yoshi
>>>>>>>
>>>>>>> On Mon, Jun 3, 2024 at 8:57 AM Neal Cardwell <ncardwell@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 3, 2024 at 11:02 AM Yoshifumi Nishida <
>>>>>>>> nsd.ietf@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> While I was checking RFC7323, I found the following sentence.
>>>>>>>>>
>>>>>>>>>    RTTM update processing explicitly excludes segments not updating
>>>>>>>>>    SND.UNA. The original text could be interpreted to allow taking
>>>>>>>>>    RTT samples when SACK acknowledges some new, non-continuous
>>>>>>>>>    data.
>>>>>>>>>
>>>>>>>>> I am a bit curious about the rationale of this sentence.
>>>>>>>>> It seems to me that we cannot measure RTT when we have a gap in packet sequence with this rule.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes, that rule forbids using RFC7323 timestamps for calculating RTT
>>>>>>>> samples for SACKed sequence ranges.
>>>>>>>>
>>>>>>>> The rationale: AFAIK this rule is a necessary consequence of the
>>>>>>>> conditions under which TS.Recent is updated.
>>>>>>>>
>>>>>>>> The rules for updating TS.Recent are in sec 4.3, "Which Timestamp
>>>>>>>> to Echo":
>>>>>>>> https://datatracker.ietf.org/doc/html/rfc7323#section-4.3
>>>>>>>> Rule (2) in sec 4.3 says:
>>>>>>>>    If:
>>>>>>>>        SEG.TSval >= TS.Recent and SEG.SEQ <= Last.ACK.sent
>>>>>>>>    then SEG.TSval is copied to TS.Recent; otherwise, it is ignored.
>>>>>>>>
>>>>>>>> Since out-of-order sequence ranges that are SACKed will fail the
>>>>>>>> SEG.SEQ <= Last.ACK.sent check, SACKed sequence ranges will not update
>>>>>>>> TS.Recent. So using TS.Recent to calculate an RTT sample for a SACKed
>>>>>>>> sequence range could, in general, give a vastly overestimated RTT sample.
>>>>>>>> So that's why it's forbidden by the RFC.
>>>>>>>>
>>>>>>>> However, in practice usually this does not need to be a big deal.
>>>>>>>> For example, Linux TCP still obtains an RTT sample for every
>>>>>>>> non-retransmitted SACKed sequence range, by:
>>>>>>>>
>>>>>>>> (a) recording the transmit time of every sequence range
>>>>>>>> (b) recording whether that sequence range was retransmitted, and
>>>>>>>> then
>>>>>>>> (c) using those two pieces of information when that sequence range
>>>>>>>> is cumulatively or selectively ACKed, to calculate an RTT sample
>>>>>>>> (rtt_sample = now - transmit_timestamp) if the sequence range was never
>>>>>>>> retransmitted.
>>>>>>>>
>>>>>>>> So, in Linux TCP, SACKed sequence ranges fail to generate an RTT
>>>>>>>> sample only when they were previously retransmitted.
>>>>>>>>
>>>>>>>> best regards,
>>>>>>>> neal
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> --
>>>>>>>>> Yoshi
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> tcpm mailing list -- tcpm@ietf.org
>>>>>>>>> To unsubscribe send an email to tcpm-leave@ietf.org
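
For readers following the thread, here is a minimal sketch of the two mechanisms quoted above: rule (2) of RFC 7323 sec 4.3 for updating TS.Recent, and the per-range RTT sampling Neal describes in steps (a) to (c). It is not Linux code. The names SentRange, on_retransmit, rtt_sample_on_ack and update_ts_recent are hypothetical, and the comparisons use plain integers, ignoring 32-bit sequence and timestamp wraparound.

```python
from dataclasses import dataclass
from typing import Optional


def update_ts_recent(ts_recent: int, last_ack_sent: int,
                     seg_tsval: int, seg_seq: int) -> int:
    """Receiver side: RFC 7323 sec 4.3 rule (2), without wraparound handling."""
    if seg_tsval >= ts_recent and seg_seq <= last_ack_sent:
        return seg_tsval   # in-order segment: TSval is copied to TS.Recent
    return ts_recent       # e.g. out-of-order (SACKed) segment: TSval ignored


@dataclass
class SentRange:
    """Sender-side state for one sequence range (hypothetical structure)."""
    start_seq: int
    end_seq: int
    transmit_time: float         # (a) time the range was transmitted, in seconds
    retransmitted: bool = False  # (b) set once the range is ever retransmitted


def on_retransmit(rng: SentRange) -> None:
    """Mark the range so that later ACKs for it are treated as ambiguous."""
    rng.retransmitted = True


def rtt_sample_on_ack(rng: SentRange, now: float) -> Optional[float]:
    """(c) Run when the range is cumulatively or selectively ACKed.

    Returns now - transmit_time if the range was never retransmitted,
    otherwise None (no sample from this method).
    """
    if rng.retransmitted:
        return None
    return now - rng.transmit_time
```

With these definitions, a segment arriving above Last.ACK.sent leaves TS.Recent unchanged, which is why the echoed timestamp cannot be trusted for a SACKed range, while the sender-side bookkeeping still yields a sample for any SACKed range that was sent only once.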