Re: Proposal to replace ACK block count with ACK length

Kazuho Oku <kazuhooku@gmail.com> Sun, 17 June 2018 13:48 UTC

In-Reply-To: <CAOYVs2oE6yawW04MVH1ApewSJ+0g9g2oMxCj+CU+butfiAe8kA@mail.gmail.com>
References: <1F436ED13A22A246A59CA374CBC543998B832414@ORSMSX111.amr.corp.intel.com> <20180611154244.GA27622@ubuntu-dmitri> <CACpbDcdxzRxeiN93kKoj__vo2TERm4QZKqaesL=jr4wQUN1gXA@mail.gmail.com> <1F436ED13A22A246A59CA374CBC543998B833B91@ORSMSX111.amr.corp.intel.com> <CABcZeBOjjRrX+AsXdgcUKpL=ciL8U_U1+WVAhQv-ZjwGxkQxYw@mail.gmail.com> <MWHPR21MB0638068EFA850328793E55F6B67C0@MWHPR21MB0638.namprd21.prod.outlook.com> <CACpbDcdbTKKEh8dcshWM6-7vq2hBFJC1myL1+H6etpMMjth+wg@mail.gmail.com> <CABkgnnV_thWcAi=AdwV+Za5rXywiUvtOYpsNNp1y7=RvL2MvWA@mail.gmail.com> <CAOYVs2qE=Tw_7eax9HwaESaQPMh7k3BSVV112d+pPeSfZ09EjQ@mail.gmail.com> <CABcZeBOCRHAuh44CrMH02UZ3Ar_2sa5M1c3LG_A-RPzXX+H+Yw@mail.gmail.com> <CAKcm_gOeZHR-BGJiqK=zQKqbgq=briQuH+fzHrkUYbhQx3B_sw@mail.gmail.com> <CANatvzyKv8EGVR-Z5WMDKbeuKHP791OynsTqX=+HriKBxFnafA@mail.gmail.com> <CAOYVs2oE6yawW04MVH1ApewSJ+0g9g2oMxCj+CU+butfiAe8kA@mail.gmail.com>
From: Kazuho Oku <kazuhooku@gmail.com>
Date: Sun, 17 Jun 2018 22:48:23 +0900
Message-ID: <CANatvzxniU0AUEi5tuKzmX45uTUV6-y0JbqcdKTpu1J4WQR7JA@mail.gmail.com>
Subject: Re: Proposal to replace ACK block count with ACK length
To: Marten Seemann <martenseemann@gmail.com>
Cc: Ian Swett <ianswett=40google.com@dmarc.ietf.org>, Eric Rescorla <ekr@rtfm.com>, Jana Iyengar <jri.ietf@gmail.com>, Praveen Balasubramanian <pravb@microsoft.com>, IETF QUIC WG <quic@ietf.org>, Martin Thomson <martin.thomson@gmail.com>, "Deval, Manasi" <manasi.deval@intel.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/qhJl2p4IyNNkcEO9HBGaCXtnbTI>

2018-06-17 22:36 GMT+09:00 Marten Seemann <martenseemann@gmail.com>:
> At least for my implementation, parsing doesn't become easier, it becomes
> more complex with this proposal. My varint-parser always consumes as many
> bytes as the varint requires, so after parsing a varint, I'd have to
> introduce an additional check that this didn't overflow the ACK length (e.g.
> consider that I parsed the ACK frame so far that only 2 bytes are remaining
> according to ACK length field, but the next varint is 4 bytes long).

Isn't your varint parser checking that it has not (or will not) run
across the end of the packet payload for every ACK block it parses?
I'd assume that you would be doing that, because I think that is
necessary to avoid buffer overrun.

What I am saying is that that check could be converted to an overrun
check against the end of the "frame payload", and that checking the
remaining block count becomes unnecessary, if we replace ACK Block
Count with ACK Frame Length.
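
As a rough sketch of that check (not taken from any real
implementation; the names read_varint, parse_ack_blocks and
FrameOverrun are made up for illustration, and the block section is
simplified to plain gap/block pairs), the only bound the varint reader
needs is the end of the frame payload:

```python
class FrameOverrun(Exception):
    """Raised when a varint would extend past the end of the frame payload."""


def read_varint(buf, pos, end):
    """Decode one QUIC varint from buf[pos:end]; return (value, new_pos)."""
    if pos >= end:
        raise FrameOverrun("no bytes left in frame payload")
    length = 1 << (buf[pos] >> 6)      # top two bits select 1, 2, 4 or 8 bytes
    if pos + length > end:             # the single overrun check
        raise FrameOverrun("varint crosses end of frame payload")
    value = buf[pos] & 0x3F
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


def parse_ack_blocks(buf, pos, frame_end):
    """Read (gap, block) pairs until the frame payload is exhausted.

    There is no block counter to track: the loop ends when pos reaches
    frame_end, and any varint straddling frame_end raises FrameOverrun.
    """
    blocks = []
    while pos < frame_end:
        gap, pos = read_varint(buf, pos, frame_end)
        block, pos = read_varint(buf, pos, frame_end)
        blocks.append((gap, block))
    return blocks
```

With this shape, the case you describe (2 bytes remaining, next varint
4 bytes long) is rejected by the same comparison that would otherwise
guard against running off the end of the packet.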

>
> In general, we've been moving the wire image towards making every encodable
> value valid. This proposal moves us away from that principle:
> * some small values are always invalid (the length can never be between 0
> and 3)
> * a lot of intermediate values are invalid (if the boundary falls inside a
> varint, as described above)
> Both these cases can't occur with the current ACK frame format.
>
> On Sun, Jun 17, 2018 at 7:54 PM Kazuho Oku <kazuhooku@gmail.com> wrote:
>>
>> 2018-06-17 8:34 GMT+09:00 Ian Swett
>> <ianswett=40google.com@dmarc.ietf.org>:
>> > I'm not a fan of this proposal, because I think it is impractical to
>> > drop the number of ack blocks, because with the ECN proposal it
>> > becomes impractically complex to parse.
>>
>> For the ECN proposal, as Christian has suggested, we can move the ECN
>> counters before the ACK blocks. Then, it would not be complex to
>> parse.
>>
>> And my view is that parsing becomes easier if we replace ACK Block
>> Count with ACK Frame Length.
>>
>> Now, with ACK Block Count, we need to check the remaining number of
>> blocks and the remaining space in the packet payload for every block
>> that we parse. Failing to check either leads to a bug or a security
>> issue.
>>
>> If we switch to ACK Frame Length, we need to only check the remaining
>> space in the frame.
>>
>> I think that this is the biggest benefit of replacing ACK Block Count
>> with ACK Frame Length. OTOH the downside is that you need one to two
>> extra bits (one if the size of block / gap is expected to be below 65,
>> two if they are expected to be above that) for encoding ACK Frame
>> Length compared to ACK Block Count.
>>
>>
>>
>> Having said that, I honestly wonder if all the frames could have their
>> length encoded (either explicitly or as a signal that says "to the
>> end of the packet"). Consider something like below:
>>
>> |0| frame-type (7) | frame-payload-length (i) | frame-payload (*) |
>>  or
>> |1| frame-type (7) | frame-payload (*) |
>>
>> When the MSB of the first octet is set to zero, the length of the
>> frame payload is designated by the varint that immediately follows
>> the frame type.
>> When the MSB of the first octet is set to one, the frame payload
>> extends to the end of the packet.
>>
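
A minimal sketch of the two encodings above, assuming 7-bit frame
types and QUIC varints for the length. The names encode_frame and
split_frames are hypothetical and not part of any draft:

```python
def encode_varint(value):
    """Encode an integer as a QUIC varint (1, 2, 4 or 8 bytes)."""
    if value < 1 << 6:
        return bytes([value])
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x80000000).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("value too large for a varint")


def read_varint(buf, pos, end):
    """Decode one varint from buf[pos:end]; return (value, new_pos)."""
    if pos >= end:
        raise ValueError("truncated varint")
    length = 1 << (buf[pos] >> 6)
    if pos + length > end:
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:pos + length], "big")
    return raw & ((1 << (8 * length - 2)) - 1), pos + length


def encode_frame(frame_type, payload, last_in_packet):
    """MSB=1: payload runs to the end of the packet. MSB=0: explicit length."""
    if not 0 <= frame_type < 0x80:
        raise ValueError("frame type must fit in 7 bits")
    if last_in_packet:
        return bytes([0x80 | frame_type]) + payload
    return bytes([frame_type]) + encode_varint(len(payload)) + payload


def split_frames(packet_payload):
    """Split a packet payload into (frame_type, frame_payload) pairs
    without knowing how any individual frame type is laid out."""
    frames = []
    pos, end = 0, len(packet_payload)
    while pos < end:
        first = packet_payload[pos]
        pos += 1
        if first & 0x80:
            length = end - pos
        else:
            length, pos = read_varint(packet_payload, pos, end)
            if length > end - pos:
                raise ValueError("frame payload crosses end of packet")
        frames.append((first & 0x7F, packet_payload[pos:pos + length]))
        pos += length
    return frames
```

Only the last frame in a packet can use the MSB=1 form, which is why
the encoder takes last_in_packet as an explicit argument.
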
>> In this encoding, we can always omit the Length field of a STREAM
>> frame. So the overhead for carrying stream data will not differ in
>> practice.
>>
>> For the ACK frame, we can omit the ACK Block Count field. And the
>> overhead will be one to two bits if the ACK frame is sent in the
>> middle of the packet (thereby using the encoding with explicit frame
>> payload length), or one octet or more shorter if ACK is the last frame
>> of the packet.
>>
>> We are likely to see an increase in overhead for most of the other
>> types of frames, but I do not think that would be an issue considering
>> that they will be seen far less often than STREAMs and ACKs.
>>
>> To summarize, my anticipation is that we can make all the frames
>> self-contained (i.e. the length can be determined without the
>> knowledge of how each frame is encoded) without any overhead, if we
>> agree on making the frame type space 1 bit smaller.
>>
>> Finally, the biggest benefit of using a self-contained encoding of
>> frames is that we would have the ability to introduce new optional
>> frames without negotiation. By making the frames self-contained, QUIC
>> endpoints will have the freedom of ignoring the frames that they do
>> not understand.
>>
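
To illustrate the point about ignoring unknown frames, here is a small
sketch of a receiver under the self-contained encoding proposed above.
The frame type numbers, iter_frames and process_packet are all
hypothetical; the only assumption is that every frame carries either an
explicit varint length (MSB=0) or extends to the end of the packet
(MSB=1):

```python
def read_varint(buf, pos, end):
    """Decode one QUIC varint from buf[pos:end]; return (value, new_pos)."""
    if pos >= end:
        raise ValueError("truncated varint")
    length = 1 << (buf[pos] >> 6)
    if pos + length > end:
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:pos + length], "big")
    return raw & ((1 << (8 * length - 2)) - 1), pos + length


def iter_frames(payload):
    """Yield (frame_type, body) for each frame in a packet payload."""
    pos, end = 0, len(payload)
    while pos < end:
        first = payload[pos]
        pos += 1
        if first & 0x80:                 # payload runs to end of packet
            length = end - pos
        else:                            # explicit frame-payload-length
            length, pos = read_varint(payload, pos, end)
            if length > end - pos:
                raise ValueError("frame payload crosses end of packet")
        yield first & 0x7F, payload[pos:pos + length]
        pos += length


def process_packet(payload, handlers):
    """Dispatch known frames; skip unknown ones. Returns the skipped types."""
    ignored = []
    for frame_type, body in iter_frames(payload):
        handler = handlers.get(frame_type)
        if handler is None:
            ignored.append(frame_type)   # unknown extension frame: ignore it
            continue
        handler(body)
    return ignored
```

The receiver never needs to know the internal layout of a frame it
does not understand; it only needs the length to step over it.
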
>> Being able to send QUIC frames defined in extensions without
>> negotiating using Transport Parameters will be a win in terms of both
>> security (because clients' TP is sent in clear) and flexibility
>> (because we will be able to send the extensions before we figure
>> out whether the peer supports that extension).
>>
>> > If we don't remove the number of ack blocks, then the ack frame is
>> > larger, but I don't think the extra size field is useful for most
>> > implementations. Also, it means the length can disagree with the
>> > actual length, which adds complexity and the possibility of writing
>> > error-prone code.  The idea of someone offloading ack processing and
>> > then proceeding to trust the length seems like something someone
>> > could get wrong and cause some concerning issues.
>> >
>> > My experience is multithreaded packet processing is more cost and
>> > work than it's worth.  Sure you can't fill a 100G NIC with one
>> > connection, but that seems like an academic problem, not one for
>> > workloads I've seen. Typically the extra cost of multithreading
>> > outweighs its value.
>> >
>> > To be clear, I don't think this is an awful idea, but I also don't
>> > see the value and it adds complexity.  I read Manasi's email, but I
>> > don't think I understand why any of those matter in practice.
>> >
>> > On Sat, Jun 16, 2018 at 4:13 PM Eric Rescorla <ekr@rtfm.com> wrote:
>> >>
>> >> On Fri, Jun 15, 2018 at 6:46 PM, Marten Seemann
>> >> <martenseemann@gmail.com>
>> >> wrote:
>> >>>
>> >>> This proposal increases the size of the ACK frame by 1 byte in the
>> >>> common case (less than 63 ACK ranges), since the ACK length field
>> >>> here always consumes 2 bytes, whereas the ACK Block Count is a
>> >>> variable-length integer. Considering how much work we put into
>> >>> minimising the size of the frames, this feels like a step in the
>> >>> wrong direction.
>> >>>
>> >>> Regarding the processing cost, I agree with Dmitri. Handling an ACK
>> >>> frame requires looping over and making changes to a data structure
>> >>> that keeps track of sent packets. This is much more expensive than
>> >>> simply parsing a bunch of varints in the ACK frame. It seems
>> >>> unlikely that a multi-threaded packet parser would offer any
>> >>> real-world performance benefits.
>> >>
>> >>
>> >> I don't want to overstate the benefit here, but my point isn't that
>> >> parsing is expensive but that if you want to have a multithreaded
>> >> packet processing system, then it's nice to have a simpler data
>> >> structure (the unparsed ACK block) to hand to the ACK processing
>> >> thread.
>> >>
>> >> -Ekr
>> >>
>> >>
>> >>>
>> >>> On Sat, Jun 16, 2018 at 6:19 AM Martin Thomson
>> >>> <martin.thomson@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> When we discussed this before, some people observed that this creates
>> >>>> a need to encode in two passes.  That's the trade-off here.  (Not
>> >>>> expressing an opinion.)
>> >>>> On Fri, Jun 15, 2018 at 3:51 PM Jana Iyengar <jri.ietf@gmail.com>
>> >>>> wrote:
>> >>>> >
>> >>>> > I don't have a strong opinion on this. I'm certainly not opposed to
>> >>>> > it.
>> >>>> > Does anyone have a strong opposition?
>> >>>> >
>> >>>> > On Fri, Jun 15, 2018 at 3:10 PM Praveen Balasubramanian
>> >>>> > <pravb@microsoft.com> wrote:
>> >>>> >>
>> >>>> >> I agree as well since this can help reduce per packet processing
>> >>>> >> overhead. ACKs are going to be the second most common frame type
>> >>>> >> so no
>> >>>> >> objections to special casing.
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> From: QUIC [mailto:quic-bounces@ietf.org] On Behalf Of Eric
>> >>>> >> Rescorla
>> >>>> >> Sent: Friday, June 15, 2018 9:11 AM
>> >>>> >> To: Deval, Manasi <manasi.deval@intel.com>
>> >>>> >> Cc: Jana Iyengar <jri.ietf@gmail.com>; QUIC WG <quic@ietf.org>
>> >>>> >> Subject: Re: Proposal to replace ACK block count with ACK length
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> I agree with Manasi here. This change would allow ack frame
>> >>>> >> parsing
>> >>>> >> to be more self-contained, which is an advantage for the parser
>> >>>> >> and also
>> >>>> >> potentially for parallelism (because you can quickly find the
>> >>>> >> frame and then
>> >>>> >> process it in parallel).
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> -Ekr
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> On Mon, Jun 11, 2018 at 5:22 PM, Deval, Manasi
>> >>>> >> <manasi.deval@intel.com> wrote:
>> >>>> >>
>> >>>> >> In general, varints require some specific logic for parsing. To
>> >>>> >> skip
>> >>>> >> over any header, I have to read every single varint. As the code
>> >>>> >> sees Stream
>> >>>> >> and ACK headers most frequently, that is my focus.  The Stream
>> >>>> >> frame has a
>> >>>> >> length in its third field.
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> ACK parsing, however, needs 6 + 2*num_blocks reads to identify
>> >>>> >> length. There are two reads each for ‘largest acknowledged’,
>> >>>> >> ‘ACK delay’ and ‘ACK block count’. The pain point is the total
>> >>>> >> number of cycles to parse an ACK. If I am processing 10M pps,
>> >>>> >> where 10% - 30% of the packets have a piggybacked ACK, these
>> >>>> >> cycles become a significant bottleneck.
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> Thanks,
>> >>>> >>
>> >>>> >> Manasi
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> From: QUIC [mailto:quic-bounces@ietf.org] On Behalf Of Jana
>> >>>> >> Iyengar
>> >>>> >> Sent: Monday, June 11, 2018 3:11 PM
>> >>>> >> To: Deval, Manasi <manasi.deval@intel.com>; QUIC WG
>> >>>> >> <quic@ietf.org>
>> >>>> >> Subject: Re: Proposal to replace ACK block count with ACK length
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> You're right that we no longer have the ability to skip an ACK
>> >>>> >> frame,
>> >>>> >> and this crept in when we moved to varints.
>> >>>> >>
>> >>>> >> I believe your problem though is generally true of most frames,
>> >>>> >> not just ACKs, since ids, packet numbers, and numbers in all
>> >>>> >> frames are now all varints. To skip any frame, you'll need to
>> >>>> >> parse the varint fields in those frames. If you have logic to
>> >>>> >> process and skip varints, then skipping the ack block section
>> >>>> >> is merely repeating this operation (2*num_block+1) times. Do
>> >>>> >> you see specific value in skipping ACK frames over the other
>> >>>> >> control frames?
>> >>>> >>
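
For reference, the skip operation described above can be sketched as
follows. This assumes the ACK layout discussed in this thread (Largest
Acknowledged, ACK Delay, ACK Block Count, First ACK Block, then one
Gap and one ACK Block per counted entry, all varints); varint_len and
skip_ack_frame are illustrative names only:

```python
def varint_len(first_byte):
    """Length in bytes of a QUIC varint, taken from its top two bits."""
    return 1 << (first_byte >> 6)


def read_varint(buf, pos):
    """Decode one varint at buf[pos]; return (value, new_pos)."""
    length = varint_len(buf[pos])
    raw = int.from_bytes(buf[pos:pos + length], "big")
    return raw & ((1 << (8 * length - 2)) - 1), pos + length


def skip_ack_frame(buf, pos):
    """Return the offset just past an ACK frame body starting at pos.

    pos points at the first byte after the frame type. Every field has
    to be touched because each one is a variable-length integer.
    """
    pos += varint_len(buf[pos])          # Largest Acknowledged
    pos += varint_len(buf[pos])          # ACK Delay
    count, pos = read_varint(buf, pos)   # ACK Block Count
    for _ in range(2 * count + 1):       # First ACK Block + count * (Gap, Block)
        pos += varint_len(buf[pos])
    if pos > len(buf):
        raise ValueError("ACK frame crosses end of buffer")
    return pos
```

The loop is the (2*num_block+1) repetition mentioned above; with a
length field up front the whole function would reduce to one read and
one addition.
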
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> On Mon, Jun 11, 2018 at 8:43 AM Dmitri Tikhonov
>> >>>> >> <dtikhonov@litespeedtech.com> wrote:
>> >>>> >>
>> >>>> >> On Mon, Jun 11, 2018 at 03:33:35PM +0000, Deval, Manasi wrote:
>> >>>> >> > -        Moving the ACK length to the front of the ACK allows
>> >>>> >> > the
>> >>>> >> >          flexibility of either reading the entire ACK or reading
>> >>>> >> > the
>> >>>> >> >          first 16 bits and skipping over the length. This is a
>> >>>> >> > useful
>> >>>> >> >          feature for the case where ACK processing is split into
>> >>>> >> >          multiple layers. Depending on the processor this is run
>> >>>> >> > on,
>> >>>> >> >          there are different advantages -
>> >>>> >>
>> >>>> >> Just a note.  In my experience, the cost of parsing an ACK frame
>> >>>> >> is
>> >>>> >> negligible compared to the cost of processing an ACK frame: that
>> >>>> >> is,
>> >>>> >> poking at various memory locations to discard newly ACKed packets.
>> >>>> >>
>> >>>> >>   - Dmitri.
>> >>>> >>
>> >>>> >>
>> >>>>
>> >>
>> >
>>
>>
>>
>> --
>> Kazuho Oku



-- 
Kazuho Oku