Re: [quicwg/base-drafts] Desirable behavior when it takes time to derive the traffic keys for the next PN space (#3821)

Jana Iyengar <notifications@github.com> Wed, 15 July 2020 02:25 UTC

Summarizing my thoughts after discussing this with @ianswett and thinking about this with @kazuho. Apologies for being verbose, but I'm trying to summarize from the top.

**Problem:**

At a high level, the problem we are discussing here is that an endpoint buffers packets because it cannot yet read them (it has not finished deriving the keys) and therefore cannot acknowledge them either, and it might take a while to drain that buffer. As a result, the ACKs can be sent much later than the packets were received, so the reported ack_delay can be much higher than max_ack_delay. This inflates smoothed_rtt and rtt_variance, which in turn inflates the PTO and can cause handshakes to fail (if a "handshake timeout" period is employed).
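A minimal sketch of the buffering described above, assuming a hypothetical per-packet-number-space buffer (`PendingKeysBuffer` is illustrative, not from any real QUIC stack). It shows why the eventual ack_delay includes the whole wait for keys: the delay is measured from when the largest acknowledged packet arrived, not from when it could finally be decrypted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BufferedPacket:
    # Packet number shown for illustration only; a real stack learns it
    # only after removing header protection, which also needs the keys.
    pn: int
    received_at_ms: float  # arrival time, before the packet could be read


@dataclass
class PendingKeysBuffer:
    """Hypothetical buffer for packets that arrive before their keys exist."""
    packets: List[BufferedPacket] = field(default_factory=list)

    def buffer(self, pn: int, now_ms: float) -> None:
        # Cannot decrypt yet, so cannot acknowledge yet: remember arrival time.
        self.packets.append(BufferedPacket(pn, now_ms))

    def drain(self, now_ms: float) -> Optional[Tuple[int, float]]:
        """Called once keys are available; returns (largest_pn, ack_delay_ms)."""
        if not self.packets:
            return None
        largest = max(self.packets, key=lambda p: p.pn)
        self.packets.clear()
        # ack_delay runs from receipt of the largest acknowledged packet until
        # now, so it includes all the time spent waiting for keys.
        return largest.pn, now_ms - largest.received_at_ms


buf = PendingKeysBuffer()
buf.buffer(pn=0, now_ms=100.0)
buf.buffer(pn=1, now_ms=101.0)
result = buf.drain(now_ms=400.0)  # keys become available at t = 400 ms
```

Here `result` is `(1, 299.0)`: a 299 ms ack_delay, far above QUIC's default max_ack_delay of 25 ms.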

The underlying issue is that the receiving endpoint can take arbitrarily long to respond. The ack_delay field is meant to signal precisely this (we tout it as an important distinction from TCP), and it is useful here, except that it is capped at max_ack_delay in the various RTT estimations. This was done deliberately, so that any additional delay becomes part of the path delay, on the expectation that such delays are outside the control of the QUIC endpoint and are likely to recur. While that remains true in general, the case we are discussing here is an extreme one that occurs during the handshake and is not expected to recur.
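The estimator below sketches how one such delayed ACK feeds into the RTT estimate when the reported ack_delay is capped at max_ack_delay. It is modeled on the recovery draft's smoothed_rtt/rttvar update (`rttvar` there corresponds to rtt_variance above); the class name and the numbers (a 50 ms path, 300 ms spent waiting for keys, max_ack_delay of 25 ms) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

K_GRANULARITY_MS = 1.0  # timer granularity term in the PTO formula


@dataclass
class RttEstimator:
    smoothed_rtt: Optional[float] = None
    rttvar: float = 0.0
    min_rtt: float = float("inf")

    def on_sample(self, latest_rtt: float, ack_delay: float,
                  max_ack_delay: float) -> None:
        if self.smoothed_rtt is None:
            # The first sample initializes the estimator; ack_delay is unused.
            self.min_rtt = latest_rtt
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
            return
        self.min_rtt = min(self.min_rtt, latest_rtt)
        # Capped behaviour: any reported delay beyond max_ack_delay is
        # treated as path delay and stays inside the sample.
        ack_delay = min(ack_delay, max_ack_delay)
        adjusted = latest_rtt
        if latest_rtt >= self.min_rtt + ack_delay:
            adjusted = latest_rtt - ack_delay
        # rttvar is updated first, using the previous smoothed_rtt.
        self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.smoothed_rtt - adjusted)
        self.smoothed_rtt = 0.875 * self.smoothed_rtt + 0.125 * adjusted

    def pto(self, max_ack_delay: float) -> float:
        return self.smoothed_rtt + max(4 * self.rttvar, K_GRANULARITY_MS) + max_ack_delay


est = RttEstimator()
est.on_sample(latest_rtt=50.0, ack_delay=0.0, max_ack_delay=25.0)
pto_before = est.pto(max_ack_delay=25.0)
# The peer waited 300 ms for keys, so the sample is 350 ms with ack_delay = 300 ms.
est.on_sample(latest_rtt=350.0, ack_delay=300.0, max_ack_delay=25.0)
pto_after = est.pto(max_ack_delay=25.0)
```

With these numbers the PTO grows from 175 ms to about 459 ms after a single delayed ACK, even though the path itself never changed.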

**Proposed Solution:**

We use the ack delay signal during the handshake, and accept that ack delays reported during the handshake period may need to be treated as _slightly_ special. Specifically, we can ignore max_ack_delay (that is, not cap the reported ack_delay), but we still never let the adjusted RTT sample fall below min_rtt. An endpoint does this until the handshake is _confirmed_, that is, until the endpoint is certain that the peer has 1-RTT read and write keys and therefore no longer has to buffer packets for lack of keys.
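One way to express the proposed rule, as a hypothetical helper (`adjusted_rtt_sample` is illustrative, not from any real stack): ignore max_ack_delay until the handshake is confirmed, and keep the recovery draft's existing min_rtt guard, which subtracts ack_delay only when the result stays at or above min_rtt.

```python
def adjusted_rtt_sample(latest_rtt: float, ack_delay: float, min_rtt: float,
                        max_ack_delay: float, handshake_confirmed: bool) -> float:
    """Return the RTT sample to feed into smoothed_rtt/rttvar (all in ms)."""
    if handshake_confirmed:
        # Steady state: delay beyond max_ack_delay is treated as path delay.
        ack_delay = min(ack_delay, max_ack_delay)
    # Before confirmation the full reported delay is trusted, but the
    # adjusted sample is never allowed to drop below min_rtt.
    if latest_rtt >= min_rtt + ack_delay:
        return latest_rtt - ack_delay
    return latest_rtt


during_handshake = adjusted_rtt_sample(350.0, 300.0, 50.0, 25.0, handshake_confirmed=False)
after_confirmation = adjusted_rtt_sample(350.0, 300.0, 50.0, 25.0, handshake_confirmed=True)
```

For the same 350 ms sample with 300 ms of reported delay, the sketch yields 50 ms before confirmation (the buffering time is removed) versus 325 ms under the capped rule. If subtracting the reported delay would undercut min_rtt, the unadjusted sample is used, which errs toward a larger rather than a smaller RTT estimate.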

**Also:**

Separately, the recovery spec currently says not to arm the PTO on ApplicationData until `handshake complete`. This should be `handshake confirmed`. The reason is that the handshake completes at a client when it sends its CFIN, after which the client can send 1-RTT data. However, the CFIN might get lost, and the server might then buffer the 1-RTT data until it receives a retransmission of the CFIN. Changing this to `handshake confirmed` ensures that no PTO is armed on ApplicationData until there is no Initial or Handshake data left in flight.
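A minimal sketch of the proposed arming condition, assuming a hypothetical `may_arm_pto` predicate evaluated per packet number space (a real implementation also selects the earliest PTO across spaces, which is omitted here). The usage models the lost-CFIN case: the client's handshake is complete, so it has 1-RTT data in flight, but the handshake is not yet confirmed.

```python
from enum import Enum


class PnSpace(Enum):
    INITIAL = "Initial"
    HANDSHAKE = "Handshake"
    APPLICATION_DATA = "ApplicationData"


def may_arm_pto(space: PnSpace, ack_eliciting_in_flight: bool,
                handshake_confirmed: bool) -> bool:
    """Hypothetical check: may a PTO be armed for `space` right now?"""
    if not ack_eliciting_in_flight:
        return False
    if space is PnSpace.APPLICATION_DATA:
        # Proposed rule: gate on handshake *confirmed*, not merely complete,
        # so a lost CFIN is repaired by the Handshake-space PTO first.
        return handshake_confirmed
    return True


# Client has sent CFIN (handshake complete) plus some 1-RTT data, not yet confirmed.
app_pto = may_arm_pto(PnSpace.APPLICATION_DATA, ack_eliciting_in_flight=True,
                      handshake_confirmed=False)
hs_pto = may_arm_pto(PnSpace.HANDSHAKE, ack_eliciting_in_flight=True,
                     handshake_confirmed=False)
```

Here `app_pto` is `False` and `hs_pto` is `True`: only the Handshake space, which carries the CFIN, drives retransmission until the handshake is confirmed.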

-- 
https://github.com/quicwg/base-drafts/issues/3821#issuecomment-658507628