Re: Questions about recovery draft 29

Jana Iyengar <jri.ietf@gmail.com> Sat, 18 July 2020 03:13 UTC

From: Jana Iyengar <jri.ietf@gmail.com>
Date: Fri, 17 Jul 2020 20:13:19 -0700
Message-ID: <CACpbDcd5gSz9rpj+WzvzUZGCquYWtSAbPpjYOUe7m+voPge_gQ@mail.gmail.com>
Subject: Re: Questions about recovery draft 29
To: Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com>
Cc: Matt Joras <matt.joras@gmail.com>, IETF QUIC WG <quic@ietf.org>, Jonas Reynders <jonas.reynders@student.uhasselt.be>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/pTx10oO17chq39odBm84XKH5_xY>

Jonas,

I agree with Matt's response overall. I'll add a couple of specific
responses to your questions.

On RACK: RACK's key insight is to use timestamps (and time) instead of TCP
sequence numbers for loss detection. RACK makes time-ordering the basis for
loss detection, instead of sequence number ordering. This difference is
important when thinking about retransmissions, detection of spurious
retransmissions, and such. A big but subtle difference between TCP and QUIC
is that QUIC does not _reuse_ its packet numbers. This was a deliberate
design choice, at least in part to ensure that QUIC's packet numbers
represent time-ordering. As a result, using packet numbers for loss
detection in QUIC is already different than base (non-RACK) TCP.
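
As a rough illustration of the two ack-based checks discussed in this
thread, here is a minimal Python sketch of a packet threshold and a time
threshold applied to QUIC packet numbers. It is a sketch under stated
assumptions, not the draft's pseudocode: the names (SentPacket,
detect_lost_packets) are hypothetical, the constants 3 and 9/8 are the
ones quoted in the questions below, and the 1 ms timer granularity floor
is an assumption based on the draft's recommended kGranularity.

```python
from dataclasses import dataclass

K_PACKET_THRESHOLD = 3    # reordering threshold in packets (from the thread)
K_TIME_THRESHOLD = 9 / 8  # reordering threshold as an RTT multiplier (from the thread)
K_GRANULARITY = 0.001     # assumed timer granularity floor, in seconds


@dataclass
class SentPacket:
    packet_number: int  # never reused, so it increases with send time
    time_sent: float    # seconds


def detect_lost_packets(unacked, largest_acked, now, smoothed_rtt, latest_rtt):
    """Return the packet numbers in `unacked` that are declared lost."""
    # Time threshold: a packet sent this long before `now` is too old.
    loss_delay = max(K_TIME_THRESHOLD * max(smoothed_rtt, latest_rtt),
                     K_GRANULARITY)
    lost_send_time = now - loss_delay

    lost = []
    for pkt in unacked:
        # Only packets sent before an acknowledged packet are candidates.
        if pkt.packet_number > largest_acked:
            continue
        too_old = pkt.time_sent <= lost_send_time
        too_far_behind = largest_acked >= pkt.packet_number + K_PACKET_THRESHOLD
        if too_old or too_far_behind:
            lost.append(pkt.packet_number)
    return lost
```

Because packet numbers are never reused, "sent before largest_acked" can
be read directly off the packet number, so both checks operate on the
same time-ordered sequence.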

The rest of it is around making constants adaptive, and as Matt notes, it's
not fully clear yet what the "right" answer here is, and we wanted to leave
room for experimentation here. As we learn more, we entirely expect the
mechanisms in the draft to evolve, first in implementations, and then in
specifications.

- jana


On Mon, Jul 13, 2020 at 7:40 AM Spencer Dawkins at IETF <
spencerdawkins.ietf@gmail.com> wrote:

> Just to chime in here, so top-posting.
>
> Matt is correct (that specifications != implementations), but I note that
> the transport area has produced, and revised,
> https://datatracker.ietf.org/doc/rfc7414/ ("A Roadmap for Transmission
> Control Protocol (TCP) Specification Documents"), which goes through what
> Matt characterized as the 'myriad RFCs that loosely define "IETF TCP"'.
>
> The TCP community used Informational RFCs to document what's a good idea
> and what's a bad idea at the per-specification level ("really important to
> implement this; that is a really bad idea, so don't implement it"), because
> RFCs were the tools we had.
>
> It might be worth thinking about the form a "QUIC Roadmap" might take,
> given that we now have wikis, GitHub, and other tools that could allow the
> QUIC roadmap and implementations to be more aware of each other, especially
> if this was more granular than "this document".
>
> We talked about this briefly at IETF 106 in TSVAREA. From the minutes at
> https://datatracker.ietf.org/meeting/106/materials/minutes-106-tsvarea-02
>  ...
>
>    - Ted: Lots of creativity right now. People need to understand what
>    they are doing *to* the protocol, not just *with* it. We should write "the
>    hitchhiker's guide to QUIC". Intended for all audiences.
>    - Mirja: That's what applicability and mgmt docs are for.
>    - Ted: That's not what I'm looking for.
>    - Spencer: Imagine if TCP roadmap had started before there were 150
>    RFCs!
>
> Did people have thoughts about that?
>
> Best,
>
> Spencer
>
>
> On Thu, Jul 9, 2020 at 2:10 PM Matt Joras <matt.joras@gmail.com> wrote:
>
>> I won't offer specific answers to all your questions as Ian or Jana
>> likely have better ones, however I'd like to make a general point
>> about this line of questioning: Linux TCP != IETF TCP, or, said
>> another way, Implementations and Specifications are not the same
>> thing. Recall that QUIC is a series of specifications being developed
>> by the IETF, similar to the myriad RFCs that loosely define "IETF
>> TCP". Linux's TCP stack is one such interpretation/implementation of
>> TCP, and while it can inform the decisions of IETF working groups, it should
>> not be an undue determiner of the development of said specifications.
>> Similarly we have an even wider and more diverse collection of QUIC
>> interpretations/implementations in various states of maturity and
>> deployment. Many diverge, sometimes heavily, from the specifications
>> especially insofar as recovery is concerned. Chrome and us (mvfst) use
>> BBR as the default congestion controller, others use Cubic or HSTCP or
>> who knows what else. Chrome, AFAIK, has a notion of adaptive reordering
>> similar to Linux TCP and modifications to the draft-suggested time
>> thresholds and PTO strategy.
>>
>> Whether some of these changes should be codified in the QUIC
>> specifications is a good question. My hope is that these best
>> practices will gain deployment experience and drive consensus at the
>> IETF in future iterations of QUIC recovery and encounter fewer
>> roadblocks than innovations based on TCP have historically
>> experienced. As it is now though, QUIC's charter, by design, largely
>> limits us to the adaptation of recovery and congestion control work
>> that has already gained consensus via other IETF RFCs. That being
>> said, what we have in the recovery specs may not be the "state of the
>> art" exactly, but it is still quite good and I would argue has other
>> significant advantages over what is commonly deployed in TCP, even
>> Linux TCP.
>>
>> Matt Joras
>>
>> On Thu, Jul 9, 2020 at 9:22 AM Jonas Reynders
>> <jonas.reynders@student.uhasselt.be> wrote:
>> >
>> > I have been looking at the recovery draft 29 of QUIC and comparing it
>> to TCP, specifically the linux implementation in kernel v5.4. I have a
>> couple of questions about the design choices that differ from TCP in linux.
>> >
>> > The first question is in regards to ack-based loss detection where QUIC
>> uses both a packet-threshold and time-threshold method. Looking at the
>> linux implementation (kernel v5.4), TCP uses only a time-threshold method
>> (RACK) as default. From my understanding both methods were used in linux
>> some time ago, around the time it was clarified to use both methods for
>> QUIC as well. However, since then TCP changed to RACK only as the default;
>> from what I found, this was due to tests at Google detecting a 10%
>> reduction in recovery latency (reference:
>> https://patchwork.ozlabs.org/project/netdev/cover/20180516234017.172775-1-ycheng@google.com/).
>> My question is why QUIC still recommends using both methods as default? I
>> can understand that in a datacenter context, where RTT values are really
>> low, RACK can struggle due to timers not being fine grained.
>> >
>> > My second question is in regards to the time threshold
>> calculation (Section 6.1.2 of the recovery draft). QUIC uses a constant value (9/8) for
>> the time threshold, whilst in linux an adaptive threshold is used in the
>> form of a reordering window variable. In the recovery draft it is mentioned
>> that adaptive thresholds can be used, however it’s not mandatory. Is there
>> a reason why QUIC doesn’t use the linux method for an adaptive threshold?
>> The draft also mentions that reordering can be more common in QUIC and that
>> adaptive thresholds have the advantage of being more aggressive when there
>> is no reordering. The same question for the packet threshold, QUIC uses a
>> constant threshold of 3 packets whilst in linux there is an adaptive
>> threshold.
>> >
>> > My last question is that of merging the TLP and RTO into PTO. If I
>> understand correctly, this was possible due to probe packets being
>> ack-eliciting. And, that the time-threshold method would detect loss in a
>> similar manner to RTO, with the RTO also causing a congestion collapse if
>> no acks were received for consecutive RTOs. I assume that keeping RTO would
>> unnecessarily complicate things, is that correct?
>>
>>