Re: [quicwg/base-drafts] QUIC PTO is too conservative, causing a measurable regression in tail latency (#3526)

Martin Thomson <notifications@github.com> Tue, 17 March 2020 00:28 UTC

[RFC 6398](https://tools.ietf.org/html/rfc6398#section-5) is entitled "IP Router Alert Considerations and Usage".  Are you referring to (5.1) in [RFC 6298](https://tools.ietf.org/html/rfc6298#section-5)?

> (5.1) Every time a packet containing data is sent (including a retransmission), if the timer is not running, start it running so that it will expire after RTO seconds (for the current value of RTO).

That reads to me like a left edge condition, based on "timer is not running".  Whereas [we say](https://quicwg.org/base-drafts/draft-ietf-quic-recovery.html#name-computing-pto):

> When an ack-eliciting packet is transmitted, the sender schedules a timer for the PTO period as follows:
> `PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay`
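To make the left-edge/right-edge contrast concrete, here is a minimal Python sketch. It assumes times in milliseconds and a kGranularity of 1 ms (the recovery draft's recommended value). The class and function names are hypothetical, and the same period is used for both timers purely to isolate the arming rule: RFC 6298 (5.1) only starts the timer when it is not already running, while the quoted QUIC text reschedules on every ack-eliciting send.

```python
K_GRANULARITY_MS = 1.0  # assumed: the recovery draft's recommended 1 ms


def pto_period(smoothed_rtt, rttvar, max_ack_delay):
    """PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay."""
    return smoothed_rtt + max(4 * rttvar, K_GRANULARITY_MS) + max_ack_delay


class LeftEdgeTimer:
    """RFC 6298 (5.1) style: start the timer only if it is not running."""

    def __init__(self):
        self.deadline = None

    def on_send(self, now, period):
        if self.deadline is None:
            self.deadline = now + period


class RightEdgeTimer:
    """Quoted QUIC text: reschedule on every ack-eliciting send."""

    def __init__(self):
        self.deadline = None

    def on_send(self, now, period):
        self.deadline = now + period


pto = pto_period(smoothed_rtt=100.0, rttvar=50.0, max_ack_delay=25.0)
left, right = LeftEdgeTimer(), RightEdgeTimer()
for send_time in (0.0, 10.0, 20.0):  # three ack-eliciting packets
    left.on_send(send_time, pto)
    right.on_send(send_time, pto)
print(pto, left.deadline, right.deadline)  # 325.0 325.0 345.0
```

The left-edge deadline stays anchored to the first outstanding packet, whereas the right-edge deadline moves with the most recent send.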

I can see one 1.5*SRTT factor in [the TLP draft](https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01#section-2.1), but that is inflated quite a bit, so I'm not sure exactly what you think motivates this 1.5 value.

I do see the reason for your caution here.  You want to probe based on the first unacknowledged packet, but leave a buffer relative to the last unacknowledged packet (assuming all of them elicit acknowledgments, of course).  This is because every outstanding packet will cause an acknowledgment if it arrives.  So this seems sensible, if a little more aggressive than we have been.

I don't know what the usual relationship between RTT and RTTvar is; I'm guessing that, like everything, it varies greatly.  I'm inferring that the buffer you want is based on allowing for variance as much as on anything else.  The PTO is chosen to correspond roughly to 4-sigma, which is 99.994% and probably too conservative.  That conservatism is probably what dominates your numbers.  It also suggests that, in the cases that drive your numbers, 4*rttvar is greater than 0.5*rtt.  I'm guessing that it is a LOT greater.
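The sigma arithmetic can be checked with a short Python sketch. It treats rttvar as if it were a standard deviation and assumes normally distributed RTT samples; both are simplifications (RFC 6298's RTTVAR is a smoothed mean deviation). The helper names are hypothetical.

```python
import math


def two_sided_coverage(k):
    """Fraction of a normal distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


def variance_term_dominates(smoothed_rtt, rttvar):
    """True when 4*rttvar exceeds 0.5*smoothed_rtt, i.e. rttvar > smoothed_rtt/8."""
    return 4 * rttvar > 0.5 * smoothed_rtt


for k in (2, 4):
    print(k, round(two_sided_coverage(k) * 100, 3))
# 2 95.45
# 4 99.994
```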

Possible alternatives would be to cap rttvar to a proportion of the RTT estimate, maybe just for the purposes of calculating the PTO.  Capping at 12.5% would get you almost exactly what you describe without changing the PTO calculation.  It seems to me that basing the PTO on 2-sigma and capping rttvar to RTT/4 is probably better.  That's 2-sigma/95% confidence without the cap, and exactly your numbers if the cap has to be used.  Either way, that recognizes that one probe timeout is primarily dependent on the estimated RTT and is less susceptible to variation in that estimate.
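A minimal Python sketch of the two capped variants follows, assuming milliseconds, a 1 ms kGranularity, and that the cap is taken against smoothed_rtt and applied only inside the PTO computation. The helper `capped_pto` and its parameters are hypothetical. With a large rttvar, both variants bound the PTO at 1.5*smoothed_rtt + max_ack_delay.

```python
K_GRANULARITY_MS = 1.0  # assumed: the recovery draft's recommended 1 ms


def capped_pto(smoothed_rtt, rttvar, max_ack_delay, multiplier, cap_fraction):
    """PTO with rttvar capped to cap_fraction * smoothed_rtt.

    Hypothetical helper: the cap applies only to this PTO computation and
    leaves the stored rttvar estimate unchanged.
    """
    capped_var = min(rttvar, cap_fraction * smoothed_rtt)
    return smoothed_rtt + max(multiplier * capped_var, K_GRANULARITY_MS) + max_ack_delay


srtt, var, mad = 100.0, 50.0, 25.0
current = capped_pto(srtt, var, mad, multiplier=4, cap_fraction=float("inf"))
four_sigma_capped = capped_pto(srtt, var, mad, multiplier=4, cap_fraction=1 / 8)
two_sigma_capped = capped_pto(srtt, var, mad, multiplier=2, cap_fraction=1 / 4)
print(current, four_sigma_capped, two_sigma_capped)  # 325.0 175.0 175.0
```

When rttvar is small, neither cap binds and only the multiplier matters: with rttvar = 10 ms, the 4-sigma variant gives 165 ms and the 2-sigma variant gives 145 ms.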

I would like to request that you open a separate issue for the packet number skipping thing.  That seems separable.


https://github.com/quicwg/base-drafts/issues/3526#issuecomment-599818004