Re: [quicwg/base-drafts] QUIC PTO is too conservative, causing a measurable regression in tail latency (#3526)

mjoras <notifications@github.com> Fri, 29 May 2020 19:29 UTC


> This probably shouldn't make me nervous, but it still does. But I still greatly appreciate the data.

@martinthomson For what it's worth, I did extensive testing of this change on a smaller scale first, e.g. A/B tests on groups of individual hosts in different locales, to ensure that nothing was fundamentally problematic. But indeed, there is good reason to be nervous when experimenting with recovery on the Internet 🙂

> can you confirm that you track both observed loss rates AND the rate of spurious retransmission? It strikes me that there would be high correlation between improvements in latency and real loss, so the loss rates probably don't change except where the faster transmission coincides with true congestion (which I would expect to be undetectable), and the same is likely also true for spurious retransmission, but I just wanted to confirm.

One thing I'll note here is that the vast majority of the newly fired PTOs in this test (i.e. PTOs that previously would have fired later) were not retransmissions at all but rather probes carrying fresh application data. As such, the retransmission rate, spurious or otherwise, was pretty much constant compared to the control.
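To illustrate why an earlier PTO need not raise the retransmission rate, here is a minimal sketch of a sender that prefers fresh application data for its probe packet and only falls back to retransmitting when nothing new is queued. The `Sender` class, its fields, and `on_pto_fired` are hypothetical names made up for this illustration; they are not taken from any particular QUIC implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Sender:
    # Hypothetical sender state, for illustration only.
    fresh_data: deque = field(default_factory=deque)  # app data never sent
    unacked: deque = field(default_factory=deque)     # sent, not yet acked
    probes_with_fresh_data: int = 0
    probes_with_retransmission: int = 0

    def on_pto_fired(self):
        """Return (kind, payload) for the probe sent when the PTO fires."""
        if self.fresh_data:
            # Prefer new data: the probe makes forward progress and is
            # not counted as a retransmission.
            payload = self.fresh_data.popleft()
            self.unacked.append(payload)
            self.probes_with_fresh_data += 1
            return ("fresh", payload)
        if self.unacked:
            # Nothing new to send: resend the oldest outstanding data.
            self.probes_with_retransmission += 1
            return ("retransmit", self.unacked[0])
        # Nothing at all to send: fall back to a bare ack-eliciting probe.
        return ("ping", None)
```

With counters like these, the split between probes carrying fresh data and probes carrying retransmissions can be compared between the test and control groups.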

I think your intuition about the loss rate is totally warranted; in this test, though, there was no statistically significant change in the observed loss rate, either in the average or in the p95+ or p99 tail.
