Re: Proposal to replace ACK block count with ACK length

Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch> Wed, 01 August 2018 12:38 UTC

Cc: Subodh Iyengar <subodh@fb.com>, Eric Rescorla <ekr@rtfm.com>, Jana Iyengar <jri.ietf@gmail.com>, Praveen Balasubramanian <pravb@microsoft.com>, Ian Swett <ianswett@google.com>, Marten Seemann <martenseemann@gmail.com>, Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>, IETF QUIC WG <quic@ietf.org>, Martin Thomson <martin.thomson@gmail.com>, Kazuho Oku <kazuhooku@gmail.com>
To: "Deval, Manasi" <manasi.deval@intel.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/nFh8biJZ1fBTZ9-kmb43Q6O97mE>

Hi Manasi,

This idea sounds really interesting. However, I believe that in most cases the segmentation would still need to be done by the higher layer, because the data frames need to be "small" enough to fit in a segment, and I believe that in many cases the frame size would be chosen to exactly fill the segment size. Thus there are not many options for segmenting differently in the kernel. Or do you have another scenario in mind?

Mirja



> On 15 Jul 2018, at 21:14, Deval, Manasi <manasi.deval@intel.com> wrote:
> 
> I think if ACKs are sent every 20 to 40 packets, they require the segmentation code to care about them. 
> 
> Also, from talking to various folks, I heard different things. Some implementations like to piggyback the ACK with data while others keep ACKs separate. In my understanding, the piggybacked ACK is used for bandwidth-restricted cases where they avoid padding. 
> 
> This is also an excellent question for the implementers and deployers on this list.
> 
> Thanks,
> Manasi
> 
> 
> On Jul 15, 2018, at 1:15 PM, Subodh Iyengar <subodh@fb.com> wrote:
> 
>> Hi Manasi,
>> Doesn't the segmentation make more sense for stream frames than ACK frames? ACK frames are not sent very often and at least in production what we've seen is that most of the time ACKs naturally end up getting sent in their own pure ACK packets. 
>> 
>> Not arguing for or against the length prefix design, but trying to understand why having a length prefix in the ACK frame helps segmentation for stream frames even on the recv side.
>> 
>> Subodh
>> From: QUIC <quic-bounces@ietf.org> on behalf of Deval, Manasi <manasi.deval@intel.com>
>> Sent: Saturday, July 14, 2018 8:35:14 PM
>> To: 'Mikkel Fahnøe Jørgensen'; Eric Rescorla; Kazuho Oku
>> Cc: Jana Iyengar; Praveen Balasubramanian; Ian Swett; Marten Seemann; IETF QUIC WG; Martin Thomson
>> Subject: RE: Proposal to replace ACK block count with ACK length
>>  
>> Hi All,
>>  
>> I would like to revisit this discussion in the context of distributed processing. The distributed processing doesn’t only apply to multi-threading but also to layered processing.
>>  
>> One example of layered processing is segmentation. The Chromium implementation currently creates the segments and passes them as a batch into the Linux kernel. There is a GSO patch that would chop at the segment boundary and encapsulate in UDP headers. We can further improve performance by plugging in a smarter segmentation algorithm, such that the segment processing can account for the type of each frame and its size, and make intelligent boundary and padding decisions at a lower layer or in an offload. To this end, the algorithm needs to identify the size of each of the frames. Having an ACK length that can be determined in a single read is much more efficient than having to do a series of reads. 
>>  
>> Thanks,
>> Manasi
>>  
>> From: Mikkel Fahnøe Jørgensen [mailto:mikkelfj@gmail.com] 
>> Sent: Friday, June 22, 2018 9:15 AM
>> To: Eric Rescorla <ekr@rtfm.com>; Kazuho Oku <kazuhooku@gmail.com>
>> Cc: Jana Iyengar <jri.ietf@gmail.com>; IETF QUIC WG <quic@ietf.org>; Deval, Manasi <manasi.deval@intel.com>; Marten Seemann <martenseemann@gmail.com>; Martin Thomson <martin.thomson@gmail.com>; Ian Swett <ianswett@google.com>; Praveen Balasubramanian <pravb@microsoft.com>
>> Subject: Re: Proposal to replace ACK block count with ACK length
>>  
>> Another option is to write out frames as is without length except as needed for varints. When a packet is nearly done, a run-length encoded frame index is placed in the packet tail. It could also include the frame types. A reader can now directly find or skip ACK frames and the writer does not have to compute lengths before writing. Of course, the reader needs to be able to locate the index at the tail which can be tricky with coalesced packets, but not impossible. Isn’t this largely how zip-files work?
>>  
>> Mikkel
>>  
>> On 22 June 2018 at 17.00.52, Mikkel Fahnøe Jørgensen (mikkelfj@gmail.com) wrote:
>> 
>> As I said earlier, I also favour a length prefix. But, Ian does have a point:
>>  
>> Writing data is generally more expensive than reading data. Especially if you have to traverse long data structures to find the length before you can start writing and/or you may have to conservatively reserve extra space for a length field.
>>  
>> So before deciding one way or the other, the cost of writing needs to be well understood, also in scenarios with large MTUs.
>>  
>> In the flatbuffers space that I’m involved with, streaming has turned out to be an issue because data cannot be transmitted in parts without index data on a separate channel to recombine the fragments. A simple, but non-standard, change in the format would fix this. JSON and CoAP allow streaming, but many other formats do not, including later versions of protocol buffers, as I understand.
>>  
>> An odd consequence of making both reads and writes efficient is that lengths work better if stored at the end, with packets read backwards. This requires a single total length prefix, but this is stored in the datagram header. This is probably too odd-ball, but still a consideration worth noting.
>>  
>> The second best alternative may therefore be to allow writes to be fast and reads to be decent.
>>  
>> Yet, I still like length prefixes if they could be made write efficient because high performance low-latency processes care zero about ACK and can consume a packet directly while a background process handles all the latency insensitive ACK and retransmission logic.
>>  
>> The question is also, where is the pressure: an IoT aggregator might be massively read intensive while a web cache would be very write intensive.
>>  
>>  
>> Kind Regards,
>> Mikkel Fahnøe Jørgensen
>> 
>>  
>> On 22 June 2018 at 15.12.09, Eric Rescorla (ekr@rtfm.com) wrote:
>> 
>> It seems like there are two questions at hand here:
>>  
>> 1. Would it be architecturally better to have frames have a consistent self-contained
>> representation?
>> 2. Is it enough better that we should do so now?
>>  
>> I agree with Kazuho that (a) we don't have that representation now and (b) it would
>> be a better design to do so. I'm perhaps somewhat more positive on (2) than he
>> is. I don't think it's critically important that we make the change, but if we were
>> to hold a consensus call, I think I would be in favor. I'd certainly be interested
>> in looking at a proposal if someone else were to make one.
>>  
>> -Ekr
>>  
>>  
>>  
>> On Thu, Jun 21, 2018 at 8:45 PM, Kazuho Oku <kazuhooku@gmail.com> wrote:
>> 2018-06-22 8:08 GMT+09:00 Deval, Manasi <manasi.deval@intel.com>:
>> > I feel that the requirement to have every value valid is somewhat academic.
>> 
>> I think that Marten is correct in pointing out that making the ACK
>> frame self-containing (by having a field that represents the number of
>> octets being consumed by the frame) would be an exception from the
>> design pattern we have.
>> 
>> Look at the STREAM frame. The frame is not self-contained. Instead, it has
>> a Length field for the Stream Data, which is a leaf. The same goes for the
>> NEW_CONNECTION_ID frame (which has a length field for the Connection ID
>> field, which is also a leaf) and CONNECTION_CLOSE (length field for the
>> Reason Phrase).
>> 
>> I agree with Marten, Mikkel (and possibly others as well) that having
>> consistency is important.
>> 
>> Therefore, I agree with Mikkel that we should consider making every
>> frame self-contained or keeping every frame as-is (i.e. not
>> self-contained).
>> 
>> FWIW, as described in the latter half of
>> https://www.ietf.org/mail-archive/web/quic/current/msg04287.html, it
>> is possible to make every frame self-contained *and* also make ACK
>> frames smaller than the current draft. Making every frame
>> self-contained gives us the possibility to send new extension frames
>> without negotiation.
>> 
>> I do not think that I would push for making every frame self-contained
>> by myself (because I do not think it meets the high bar to have a
>> change at such a late moment of standardization), but as stated, my
>> preference goes to seeing every frame made self-contained or none of
>> them made as such.
>> 
>> > The length value provides much more value than the block count and the fact
>> > that certain values can never be achieved is an inherent property of the
>> > length.
>> >
>> >
>> >
>> > One interesting observation is that this property is not limited to length.
>> > One can even make a similar argument about ACK block count. The maximum
>> > number of ACK blocks that can be encoded will not always be a meaningful
>> > value. In our examples, 0, 1, 2 and 3 are all valid. If I set the two
>> > length bits of the ACK block count varint to 11, the maximum number of ACK
>> > blocks is 4611686018427387903. This is the same as the maximum value of Largest
>> > Acknowledged, so if the ACK block count were set to this value, it would still
>> > be meaningless.
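
For reference, a minimal sketch of the variable-length integer encoding this observation relies on, assuming the scheme in the QUIC transport draft where the two most significant bits of the first byte select a 1-, 2-, 4-, or 8-byte encoding. The function names are illustrative and not taken from any implementation discussed in this thread.

```python
from typing import Tuple

# Assumption: QUIC-style varint, where the top two bits of the first byte
# give the total encoded size (00 -> 1, 01 -> 2, 10 -> 4, 11 -> 8 bytes).


def encode_varint(value: int) -> bytes:
    """Encode value using the shortest encoding that fits."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for prefix, size in enumerate((1, 2, 4, 8)):
        if value < 1 << (8 * size - 2):
            raw = value.to_bytes(size, "big")
            # Stamp the two length bits onto the first byte.
            return bytes([raw[0] | (prefix << 6)]) + raw[1:]
    raise ValueError("value does not fit in 62 bits")


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    if not buf:
        raise ValueError("empty buffer")
    size = 1 << (buf[0] >> 6)
    if len(buf) < size:
        raise ValueError("truncated varint")
    mask = (1 << (8 * size - 2)) - 1  # strip the two length bits
    return int.from_bytes(buf[:size], "big") & mask, size


# Largest value an 8-byte varint can carry: 2**62 - 1.
MAX_VARINT = (1 << 62) - 1
```

With these helpers, `encode_varint(3)` is a single byte, while `encode_varint(MAX_VARINT)` is eight bytes and decodes to 4611686018427387903, the figure quoted above.
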
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Manasi
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > From: Marten Seemann [mailto:martenseemann@gmail.com]
>> > Sent: Tuesday, June 19, 2018 6:44 PM
>> > To: Deval, Manasi <manasi.deval@intel.com>
>> > Cc: Ian Swett <ianswett@google.com>; Kazuho Oku <kazuhooku@gmail.com>; Eric
>> > Rescorla <ekr@rtfm.com>; Jana Iyengar <jri.ietf@gmail.com>; Praveen
>> > Balasubramanian <pravb@microsoft.com>; IETF QUIC WG <quic@ietf.org>; Martin
>> > Thomson <martin.thomson@gmail.com>
>> >
>> >
>> > Subject: Re: Proposal to replace ACK block count with ACK length
>> >
>> >
>> >
>> > Hi Manasi,
>> >
>> >
>> >
>> >> The risk of disagreement between ack blocks and ack block count is same as
>> >> the risk of disagreement between ack blocks and ack length. Either way this
>> >> needs to be counted up while creating the ACK and counted down while parsing
>> >> it. The possibility of error is the same. Getting the ack block count wrong
>> >> is as problematic as getting the ack length wrong. Do you agree?
>> >
>> >
>> >
>> > I disagree. Let's take an example of an ACK frame with one ACK range, that
>> > needs a 2 byte varint to represent the First ACK Block and another 2 byte
>> > varint to represent the Gap.
>> >
>> > With your proposal:
>> >
>> > The values 0 and 1 are invalid, since the length field itself is included
>> > in the length.
>> > The values 2, 3, ..., (2 + len(LargestAcknowledged) + len(AckDelay)) - 1 are
>> > invalid, since the length needs to include the Largest Acknowledged and the
>> > Ack Delay.
>> > The value 2 + len(LargestAcknowledged) + len(AckDelay) would be the first
>> > valid value, and correspond to an ACK frame with no blocks.
>> > The value 2 + len(LargestAcknowledged) + len(AckDelay) + 1 is invalid, since
>> > it would cut the varint for the First ACK Block
>> > The value 2 + len(LargestAcknowledged) + len(AckDelay) + 2 is invalid, since
>> > it would cut the frame after the First ACK Block (but every block must be
>> > followed by a gap length)
>> > The value 2 + len(LargestAcknowledged) + len(AckDelay) + 3 is invalid, since
>> > it would cut the varint for the Gap
>> > Finally, the value 2 + len(LargestAcknowledged) + len(AckDelay) + 4 is valid
>> >
>> > There are *a lot* of invalid values that you can encode into the ACK length
>> > field. More importantly, *none* of these error cases exists with the current
>> > frame format.
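
The enumeration above can be checked with a short sketch. It assumes the layout from the example (a 2-byte length field that counts itself, then Largest Acknowledged and Ack Delay, then a 2-byte First ACK Block and a 2-byte Gap). The sizes passed for Largest Acknowledged and Ack Delay in the usage note are arbitrary illustrative values, and the function names are hypothetical.

```python
def valid_ack_lengths(la_len, ad_len, length_field=2, block_len=2, gap_len=2):
    """Lengths that end on a legal boundary in the one-range example.

    la_len / ad_len are the encoded sizes of Largest Acknowledged and
    Ack Delay. Only two boundaries are legal in the example: right after
    Ack Delay (no blocks) and right after the Gap (one block plus gap).
    """
    base = length_field + la_len + ad_len
    return [base, base + block_len + gap_len]


def invalid_ack_lengths(la_len, ad_len):
    """Every length from 0 up to the full frame that is not valid."""
    valid = valid_ack_lengths(la_len, ad_len)
    return [n for n in range(valid[-1] + 1) if n not in valid]
```

For example, with a 2-byte Largest Acknowledged and a 1-byte Ack Delay, `valid_ack_lengths(2, 1)` gives `[5, 9]`, so eight of the ten values from 0 to 9 are invalid.
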
>> >
>> > The *only* error case that can occur with our current format is that the
>> > packet is too short for the number of ACK blocks that are supposed to be
>> > contained in the frame. This can occur with your proposal as well (in
>> > addition to the error cases listed above).
>> >
>> >
>> >
>> > My concern is not that it's impossible or even particularly hard to catch
>> > these errors, but I dislike the property that some (in fact, most) encodable
>> > values are invalid.
>> >
>> >
>> >
>> > Best,
>> >
>> > Marten
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Jun 20, 2018 at 3:21 AM Deval, Manasi <manasi.deval@intel.com>
>> > wrote:
>> >
>> > Hi Ian,
>> >
>> >
>> >
>> > Here is another attempt to solve the objections you raised:
>> >
>> >
>> >
>> >>I'm not a fan of this proposal, because I think it is impractical to drop
>> >> the number of ack blocks, because with the ECN proposal it becomes
>> >> impractically complex to parse.
>> >
>> > Is there a reason the proposal from Christian does not solve this problem?
>> >
>> >
>> >
>> >>If we don't remove the number of ack blocks, then the ack frame is larger,
>> >> but I don't think the extra size field is useful for most implementations.
>> >> Also, it means the length can disagree with the actual length, which adds
>> >> complexity and the possibility of writing error-prone code.  The idea of
>> >> someone offloading ack processing and then proceeding to trust the length
>> >> seems like someone could get wrong and cause some concerning issues.
>> >
>> > The risk of disagreement between ack blocks and ack block count is same as
>> > the risk of disagreement between ack blocks and ack length. Either way this
>> > needs to be counted up while creating the ACK and counted down while parsing
>> > it. The possibility of error is the same. Getting the ack block count wrong
>> > is as problematic as getting the ack length wrong. Do you agree?
>> >
>> >
>> >
>> >>My experience is multithreaded packet processing is more cost and work than
>> >> it's worth.  Sure you can't fill a 100G NIC with one connection, but that
>> >> seems like an academic problem, not one for workloads I've seen.  Typically
>> >> the extra cost of multithreading outweighs its value.
>> >
>> > The value is twofold: pre-processing and multi-threading. If we
>> > pre-process the received packets such that ACKs and streams can be
>> > coalesced, the receive side can indicate a large chunk of information through
>> > the kernel, reducing the cost of system calls and protocol overhead. This is
>> > the same concept as UDP segmentation taken a step further on the receive side.
>> > After this chunk is indicated into the QUIC protocol, the protocol may
>> > process stream and ACK in parallel. While folks may or may not utilize this,
>> > there is an advantage here.
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Manasi
>> >
>> >
>> >
>> >
>> >
>> > From: Ian Swett [mailto:ianswett@google.com]
>> > Sent: Tuesday, June 19, 2018 11:26 AM
>> > To: Deval, Manasi <manasi.deval@intel.com>
>> > Cc: Marten Seemann <martenseemann@gmail.com>; Kazuho Oku
>> > <kazuhooku@gmail.com>; Eric Rescorla <ekr@rtfm.com>; Jana Iyengar
>> > <jri.ietf@gmail.com>; Praveen Balasubramanian <pravb@microsoft.com>; IETF
>> > QUIC WG <quic@ietf.org>; Martin Thomson <martin.thomson@gmail.com>
>> >
>> >
>> > Subject: Re: Proposal to replace ACK block count with ACK length
>> >
>> >
>> >
>> > I'm still not interested in this change, for the reasons I stated above.
>> >
>> >
>> >
>> > On Tue, Jun 19, 2018 at 2:21 PM Deval, Manasi <manasi.deval@intel.com>
>> > wrote:
>> >
>> > Hi All,
>> >
>> >
>> >
>> > Do we have agreement here to create a new PR?
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Manasi
>> >
>> >
>> >
>> > From: Deval, Manasi
>> > Sent: Sunday, June 17, 2018 2:25 PM
>> > To: Marten Seemann <martenseemann@gmail.com>; Kazuho Oku
>> > <kazuhooku@gmail.com>
>> > Cc: Ian Swett <ianswett=40google.com@dmarc.ietf.org>; Eric Rescorla
>> > <ekr@rtfm.com>; Jana Iyengar <jri.ietf@gmail.com>; Praveen Balasubramanian
>> > <pravb@microsoft.com>; IETF QUIC WG <quic@ietf.org>; Martin Thomson
>> > <martin.thomson@gmail.com>
>> > Subject: RE: Proposal to replace ACK block count with ACK length
>> >
>> >
>> >
>> > Hi All,
>> >
>> >
>> >
>> > I have made a list of objections to the proposal and the solutions to those
>> > objections discussed on this thread.
>> >
>> >
>> >
>> > a.      Co-existence of length field with ECN field and ACK blocks.
>> >
>> >
>> >
>> > Christian suggested to move the ECN fields to precede the ACK blocks. This
>> > is an elegant solution. Parsing entire list of ACK blocks to review ECN bits
>> > would have been annoying, even though it can work.
>> >
>> >
>> >
>> > b.      There are two cases to be parsed – entire ACK and parse ACK to
>> > identify length. There are some reservations when ACK parsing gets harder
>> > for the case where the entire header needs to be parsed.
>> >
>> >
>> >
>> > Agreement from several folks here. In the original ACK defined in draft 12
>> > of the slide, one would count down number of ACK blocks to get to the end of
>> > the packet. In the proposal I made, one would count down the length to
>> > identify the end of the packet. The logic is very similar in cycle count and
>> > complexity. Several folks also commented to this effect.
>> >
>> >
>> >
>> > c.      Multi-threaded packet processing
>> >
>> >
>> >
>> > I would expect that there are tens of thousands of connections in use at any time
>> > for a server with a high-speed link. Multi-threading to handle each of these
>> > flows / connections in parallel is a necessity to be able to support a large
>> > number of connections on a high-speed link. Tx segmentation and Rx coalescing
>> > are well-known strategies to reduce the processing cost. In initial stages,
>> > code is often written as single-threaded and then refactored to
>> > parallelize cycle-intensive operations. In order to allow this protocol to
>> > scale in future, I would suggest we do not preclude this case.
>> >
>> >
>> >
>> > d.      Increase in ACK size by 1 byte.
>> >
>> >
>> >
>> > I do not see this as a serious issue, but we can consider making this a
>> > varint if others have strong feelings about it. It’s a trade-off: 2
>> > reads to save 1 byte.
>> >
>> >
>> >
>> > e.      Every encodable value should be valid
>> >
>> > Not every length will be valid. This is inherent to lengths. This same issue
>> > ails the ‘payload length’ in QUIC header. Not only does the issue exist for
>> > small values, it also applies to large values since data stream will be sent
>> > after crypto negotiation.  E.g.  - how does one craft a payload with 62 bit
>> > payload length in a large header?
>> >
>> >
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Manasi
>> >
>> >
>> >
>> >
>> >
>> > From: Marten Seemann [mailto:martenseemann@gmail.com]
>> > Sent: Sunday, June 17, 2018 6:59 AM
>> > To: Kazuho Oku <kazuhooku@gmail.com>
>> > Cc: Ian Swett <ianswett=40google.com@dmarc.ietf.org>; Eric Rescorla
>> > <ekr@rtfm.com>; Jana Iyengar <jri.ietf@gmail.com>; Praveen Balasubramanian
>> > <pravb@microsoft.com>; IETF QUIC WG <quic@ietf.org>; Martin Thomson
>> > <martin.thomson@gmail.com>; Deval, Manasi <manasi.deval@intel.com>
>> > Subject: Re: Proposal to replace ACK block count with ACK length
>> >
>> >
>> >
>> > Maybe it's specific to Go, but I'm using a single io.Reader for the whole
>> > packet, so as long as the packet payload is long enough, the varint parsing
>> > will not fail.
>> >
>> > I don't think that specifics of programming languages matter here though,
>> > and I'm sure both frame formats can be reasonably implemented in C as well
>> > as in Go. The reasons I'm opposed to Manasi's proposal are that it moves us
>> > away from the principle that only reasonable values should be encodable, and
>> > that it increases the size of the ACK frame, for the questionable benefit of
>> > being able to parallelise the frame parser.
>> >
>> >
>> >
>> > On Sun, Jun 17, 2018 at 8:48 PM Kazuho Oku <kazuhooku@gmail.com> wrote:
>> >
>> > 2018-06-17 22:36 GMT+09:00 Marten Seemann <martenseemann@gmail.com>:
>> >> At least for my implementation, parsing doesn't become easier, it becomes
>> >> more complex with this proposal. My varint-parser always consumes as many
>> >> bytes as the varint requires, so after parsing a varint, I'd have to
>> >> introduce an additional check that this didn't overflow the ACK length
>> >> (e.g.
>> >> consider that I parsed the ACK frame so far that only 2 bytes are
>> >> remaining
>> >> according to ACK length field, but the next varint is 4 bytes long).
>> >
>> > Isn't your varint parser checking that it has not (or will not) run
>> > across the end of the packet payload for every ACK block it parses?
>> > I'd assume that you would be doing that, because I think that is
>> > necessary to avoid buffer overrun.
>> >
>> > What I am saying is that that check could be converted to an overrun check
>> > against the end of the "frame payload", and that checking the
>> > remaining block count becomes unnecessary, in case we replace ACK
>> > Block Count with ACK Frame Length.
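
A minimal sketch of the check described here, assuming QUIC-style varints and a frame length that counts the bytes following the current position. It does not model the real ACK field layout; `read_varint` and `parse_until_frame_end` are hypothetical helpers, not code from any implementation in this thread.

```python
def read_varint(buf: bytes, pos: int, end: int):
    """Decode one varint from buf[pos:end]; fail if it would cross `end`."""
    if pos >= end:
        raise ValueError("no bytes left in frame")
    size = 1 << (buf[pos] >> 6)  # top two bits give the encoded size
    if pos + size > end:
        raise ValueError("varint crosses end of frame")
    mask = (1 << (8 * size - 2)) - 1
    value = int.from_bytes(buf[pos:pos + size], "big") & mask
    return value, pos + size


def parse_until_frame_end(buf: bytes, pos: int, frame_len: int):
    """Read varints until exactly frame_len bytes have been consumed.

    The only bound checked per varint is the end of the frame; there is
    no separate block counter to decrement.
    """
    end = pos + frame_len
    if end > len(buf):
        raise ValueError("frame length exceeds packet payload")
    values = []
    while pos < end:
        value, pos = read_varint(buf, pos, end)
        values.append(value)
    return values
```

With `buf = b"\x05\x80\x00\x00\x01"`, a frame length of 5 yields `[5, 1]`, while a frame length of 3 leaves 2 bytes for a 4-byte varint (the case raised earlier in the thread) and raises `ValueError`.
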
>> >
>> >>
>> >> In general, we've been moving the wire image towards making every
>> >> encodable
>> >> value valid. This proposal moves us away from that principle:
>> >> * some small values are always invalid (the length can never be between 0
>> >> and 3)
>> >> * a lot of intermediate values are invalid (if the boundary falls inside a
>> >> varint, as described above)
>> >> Both these cases can't occur with the current ACK frame format.
>> >>
>> >> On Sun, Jun 17, 2018 at 7:54 PM Kazuho Oku <kazuhooku@gmail.com> wrote:
>> >>>
>> >>> 2018-06-17 8:34 GMT+09:00 Ian Swett
>> >>> <ianswett=40google.com@dmarc.ietf.org>:
>> >>> > I'm not a fan of this proposal, because I think it is impractical to
>> >>> > drop
>> >>> > the number of ack blocks, because with the ECN proposal it becomes
>> >>> > impractically complex to parse.
>> >>>
>> >>> For the ECN proposal, as Christian has suggested, we can move the ECN
>> >>> counters before the ACK blocks. Then, it would not be complex to
>> >>> parse.
>> >>>
>> >>> And my view is that parsing becomes easier if we replace ACK Block
>> >>> Count with ACK Frame Length.
>> >>>
>> >>> Now, with ACK Block Count, we need to check the remaining number of
>> >>> blocks and the remaining space in the packet payload for every block
>> >>> that we parse. Failing to check either leads to a bug or a security
>> >>> issue.
>> >>>
>> >>> If we switch to ACK Frame Length, we need to only check the remaining
>> >>> space in the frame.
>> >>>
>> >>> I think that this is the biggest benefit of replacing ACK Block Count
>> >>> with ACK Frame Length. OTOH the downside is that you need extra one to
>> >>> two bits (one if the size of block / gap is expected to be below 65,
>> >>> two if they are expected to be above that) for encoding ACK Frame
>> >>> Length compared to ACK Block Count.
>> >>>
>> >>>
>> >>>
>> >>> Having said that, I honestly wonder if all the frames could have their
>> >>> length encoded (either explicitly or as a signal that
>> >>> says "to the end of the packet"). Consider something like below:
>> >>>
>> >>> |0| frame-type (7) | frame-payload-length (i) | frame-payload (*) |
>> >>>  or
>> >>> |1| frame-type (7) | frame-payload (*) |
>> >>>
>> >>> When the MSB of the first octet is set to zero, the length of the frame
>> >>> payload is designated by the varint that immediately follows the frame
>> >>> type.
>> >>> When the MSB of the first octet is set to one, the frame
>> >>> payload spans to the end of the packet.
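
A rough sketch of how a receiver might split a packet payload under the two encodings above, assuming the length is a QUIC-style varint and that frame types fit in 7 bits. `split_frames` is a hypothetical helper, and the frame type values in the usage note are made up for illustration.

```python
def split_frames(payload: bytes):
    """Split a packet payload into (frame_type, frame_payload) pairs
    without knowing how any individual frame type is encoded."""
    frames = []
    pos = 0
    end = len(payload)
    while pos < end:
        first = payload[pos]
        frame_type = first & 0x7F  # low 7 bits: frame type
        pos += 1
        if first & 0x80:
            # MSB set: the frame payload runs to the end of the packet.
            frames.append((frame_type, payload[pos:]))
            break
        # MSB clear: a varint frame-payload-length follows the type octet.
        if pos >= end:
            raise ValueError("missing frame-payload-length")
        size = 1 << (payload[pos] >> 6)
        if pos + size > end:
            raise ValueError("truncated frame-payload-length")
        mask = (1 << (8 * size - 2)) - 1
        length = int.from_bytes(payload[pos:pos + size], "big") & mask
        pos += size
        if pos + length > end:
            raise ValueError("frame payload exceeds packet payload")
        frames.append((frame_type, payload[pos:pos + length]))
        pos += length
    return frames
```

For instance, `split_frames(bytes([0x0D, 0x02]) + b"ab" + bytes([0x90]) + b"rest")` returns `[(13, b"ab"), (16, b"rest")]`. A receiver that does not recognise type 13 could drop that pair and still process the frame after it.
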
>> >>>
>> >>> In this encoding, we can always omit the Length field of a STREAM
>> >>> frame. So the overhead for carrying stream data will be no different in
>> >>> practice.
>> >>>
>> >>> For the ACK frame, we can omit the ACK Block Count field. And the
>> >>> overhead will be one to two bits if the ACK frame is sent in the
>> >>> middle of the packet (thereby using the encoding with explicit frame
>> >>> payload length), or one octet or more shorter if ACK is the last frame
>> >>> of the packet.
>> >>>
>> >>> We are likely to see an increase in overhead for most of the other types
>> >>> of frames, but I do not think that would be an issue considering that
>> >>> they will be seen far less often than STREAMs and ACKs.
>> >>>
>> >>> To summarize, my anticipation is that we can make all the frames
>> >>> self-contained (i.e. the length can be determined without the
>> >>> knowledge of how each frame is encoded) without any overhead, if we
>> >>> agree on making the frame type space 1 bit smaller.
>> >>>
>> >>> Finally, the biggest benefit of using a self-contained encoding of
>> >>> frames is that we would have the ability to introduce new optional
>> >>> frames without negotiation. By making the frames self-contained, QUIC
>> >>> endpoints will have the freedom of ignoring the frames that they do
>> >>> not understand.
>> >>>
>> >>> Being able to send QUIC frames defined in extensions without
>> >>> negotiating using Transport Parameters will be a win in terms of both
>> >>> security (because clients' TP is sent in clear) and flexibility
>> >>> (because it will be possible to send the extensions before we figure
>> >>> out whether the peer supports that extension).
>> >>>
>> >>> > If we don't remove the number of ack blocks, then the ack frame is
>> >>> > larger,
>> >>> > but I don't think the extra size field is useful for most
>> >>> > implementations.
>> >>> > Also, it means the length can disagree with the actual length, which
>> >>> > adds
>> >>> > complexity and the possibility of writing error-prone code.  The idea
>> >>> > of
>> >>> > someone offloading ack processing and then proceeding to trust the
>> >>> > length
>> >>> > seems like someone could get wrong and cause some concerning issues.
>> >>> >
>> >>> > My experience is multithreaded packet processing is more cost and work
>> >>> > than
>> >>> > it's worth.  Sure you can't fill a 100G NIC with one connection, but
>> >>> > that
>> >>> > seems like an academic problem, not one for workloads I've seen.
>> >>> > Typically
>> >>> > the extra cost of multithreading outweighs its value.
>> >>> >
>> >>> > To be clear, I don't think this is an awful idea, but I also don't see
>> >>> > the
>> >>> > value and it adds complexity.  I read Manasi's email, but I don't think
>> >>> > I
>> >>> > understand why any of those matter in practice.
>> >>> >
>> >>> > On Sat, Jun 16, 2018 at 4:13 PM Eric Rescorla <ekr@rtfm.com> wrote:
>> >>> >>
>> >>> >> On Fri, Jun 15, 2018 at 6:46 PM, Marten Seemann
>> >>> >> <martenseemann@gmail.com>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> This proposal increases the size of the ACK frame by 1 byte in the
>> >>> >>> common
>> >>> >>> case (less than 63 ACK ranges), since the ACK length field here
>> >>> >>> always
>> >>> >>> consumes 2 bytes, whereas the ACK Block Count is a variable-length
>> >>> >>> integer.
>> >>> >>> Considering how much work we put into minimising the size of the
>> >>> >>> frames,
>> >>> >>> this feels like a step in the wrong direction.
>> >>> >>>
>> >>> >>> Regarding the processing cost, I agree with Dmitri. Handling an ACK
>> >>> >>> frame
>> >>> >>> requires looping over and making changes to a data structure that
>> >>> >>> keeps
>> >>> >>> track of sent packets. This is much more expensive than simply
>> >>> >>> parsing
>> >>> >>> a
>> >>> >>> bunch of varints in the ACK frame. It seems unlikely that a
>> >>> >>> multi-threaded
>> >>> >>> packet parser would offer any real-world performance benefits.
>> >>> >>
>> >>> >>
>> >>> >> I don't want to overstate the benefit here, but my point isn't that
>> >>> >> parsing is expensive but that if you want to have a multithreaded
>> >>> >> packet
>> >>> >> processing system, then it's nice to have a simpler data structure
>> >>> >> (the
>> >>> >> unparsed ACK block) to hand to the ACK processing thread.
>> >>> >>
>> >>> >> -Ekr
>> >>> >>
>> >>> >>
>> >>> >>>
>> >>> >>> On Sat, Jun 16, 2018 at 6:19 AM Martin Thomson
>> >>> >>> <martin.thomson@gmail.com>
>> >>> >>> wrote:
>> >>> >>>>
>> >>> >>>> When we discussed this before, some people observed that this
>> >>> >>>> creates
>> >>> >>>> a need to encode in two passes.  That's the trade-off here.  (Not
>> >>> >>>> expressing an opinion.)
>> >>> >>>> On Fri, Jun 15, 2018 at 3:51 PM Jana Iyengar <jri.ietf@gmail.com>
>> >>> >>>> wrote:
>> >>> >>>> >
>> >>> >>>> > I don't have a strong opinion on this. I'm certainly not opposed
>> >>> >>>> > to
>> >>> >>>> > it.
>> >>> >>>> > Does anyone have a strong opposition?
>> >>> >>>> >
>> >>> >>>> > On Fri, Jun 15, 2018 at 3:10 PM Praveen Balasubramanian
>> >>> >>>> > <pravb@microsoft.com> wrote:
>> >>> >>>> >>
>> >>> >>>> >> I agree as well since this can help reduce per packet processing
>> >>> >>>> >> overhead. ACKs are going to be the second most common frame type
>> >>> >>>> >> so no
>> >>> >>>> >> objections to special casing.
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> From: QUIC [mailto:quic-bounces@ietf.org] On Behalf Of Eric
>> >>> >>>> >> Rescorla
>> >>> >>>> >> Sent: Friday, June 15, 2018 9:11 AM
>> >>> >>>> >> To: Deval, Manasi <manasi.deval@intel.com>
>> >>> >>>> >> Cc: Jana Iyengar <jri.ietf@gmail.com>; QUIC WG <quic@ietf.org>
>> >>> >>>> >> Subject: Re: Proposal to replace ACK block count with ACK length
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> I agree with Manasi here. This change would allow ack frame
>> >>> >>>> >> parsing
>> >>> >>>> >> to be more self-contained, which is an advantage for the parser
>> >>> >>>> >> and also
>> >>> >>>> >> potentially for parallelism (because you can quickly find the
>> >>> >>>> >> frame and then
>> >>> >>>> >> process it in parallel).
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> -Ekr
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> On Mon, Jun 11, 2018 at 5:22 PM, Deval, Manasi
>> >>> >>>> >> <manasi.deval@intel.com> wrote:
>> >>> >>>> >>
>> >>> >>>> >> In general, varints require some specific logic for parsing. To
>> >>> >>>> >> skip
>> >>> >>>> >> over any header, I have to read every single varint. As the code
>> >>> >>>> >> sees Stream
>> >>> >>>> >> and ACK headers most frequently, that is my focus.  The Stream
>> >>> >>>> >> frame has a
>> >>> >>>> >> length in its third field.
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> ACK parsing, however, needs 6 + 2*num_blocks reads to identify
>> >>> >>>> >> length. There are two reads each for ‘largest acknowledged’, ‘ACK
>> >>> >>>> >> delay’ and
>> >>> >>>> >> ‘ACK block count’. The pain point is the total number of cycles
>> >>> >>>> >> needed to parse an ACK. If I am processing 10M pps, where 10% - 30%
>> >>> >>>> >> of the packets have a piggybacked ACK, these cycles become a
>> >>> >>>> >> significant bottleneck.
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> Thanks,
>> >>> >>>> >>
>> >>> >>>> >> Manasi
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> From: QUIC [mailto:quic-bounces@ietf.org] On Behalf Of Jana
>> >>> >>>> >> Iyengar
>> >>> >>>> >> Sent: Monday, June 11, 2018 3:11 PM
>> >>> >>>> >> To: Deval, Manasi <manasi.deval@intel.com>; QUIC WG
>> >>> >>>> >> <quic@ietf.org>
>> >>> >>>> >> Subject: Re: Proposal to replace ACK block count with ACK length
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> You're right that we no longer have the ability to skip an ACK
>> >>> >>>> >> frame,
>> >>> >>>> >> and this crept in when we moved to varints.
>> >>> >>>> >>
>> >>> >>>> >> I believe your problem though is generally true of most frames
>> >>> >>>> >> not
>> >>> >>>> >> just ACKs, since ids, packet numbers, and numbers in all frames
>> >>> >>>> >> are now all
>> >>> >>>> >> varints. To skip any frame, you'll need to parse the varint
>> >>> >>>> >> fields
>> >>> >>>> >> in those
>> >>> >>>> >> frames. If you have logic to process and skip varints, then
>> >>> >>>> >> skipping the ack
>> >>> >>>> >> block section is merely repeating this operation (2*num_block+1)
>> >>> >>>> >> times. Do
>> >>> >>>> >> you see specific value in skipping ACK frames over the other
>> >>> >>>> >> control frames?
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> On Mon, Jun 11, 2018 at 8:43 AM Dmitri Tikhonov
>> >>> >>>> >> <dtikhonov@litespeedtech.com> wrote:
>> >>> >>>> >>
>> >>> >>>> >> On Mon, Jun 11, 2018 at 03:33:35PM +0000, Deval, Manasi wrote:
>> >>> >>>> >> > -        Moving the ACK length to the front of the ACK allows
>> >>> >>>> >> > the
>> >>> >>>> >> >          flexibility of either reading the entire ACK or
>> >>> >>>> >> > reading
>> >>> >>>> >> > the
>> >>> >>>> >> >          first 16 bits and skipping over the length. This is a
>> >>> >>>> >> > useful
>> >>> >>>> >> >          feature for the case where ACK processing is split
>> >>> >>>> >> > into
>> >>> >>>> >> >          multiple layers. Depending on the processor this is
>> >>> >>>> >> > run
>> >>> >>>> >> > on,
>> >>> >>>> >> >          there are different advantages -
>> >>> >>>> >>
>> >>> >>>> >> Just a note.  In my experience, the cost of parsing an ACK frame
>> >>> >>>> >> is
>> >>> >>>> >> negligible compared to the cost of processing an ACK frame: that
>> >>> >>>> >> is,
>> >>> >>>> >> poking at various memory locations to discard newly ACKed
>> >>> >>>> >> packets.
>> >>> >>>> >>
>> >>> >>>> >>   - Dmitri.
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>>
>> >>> >>
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Kazuho Oku
>> >
>> >
>> >
>> > --
>> > Kazuho Oku
>> 
>> 
>> 
>> --
>> Kazuho Oku
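
For reference, a rough sketch of the two skipping strategies debated in the
quoted thread. The varint length rule (the top two bits of the first byte
select 1, 2, 4 or 8 bytes) is the QUIC encoding, and the count-based field
order follows the ACK frame in the transport draft at the time (Largest
Acknowledged, ACK Delay, ACK Block Count, First ACK Block, then Gap / ACK
Block pairs). The length-based layout is an assumption made purely for
illustration: a 16-bit big-endian length, counting the bytes that follow it,
placed right after the frame type. All function names are hypothetical, and
offsets are taken to point just past the frame type byte.

```python
def varint_len(first_byte):
    """Size in bytes of a QUIC varint, from the two high bits of its first byte."""
    return 1 << (first_byte >> 6)


def read_varint(buf, off):
    """Decode one varint at buf[off]; return (value, offset after the varint)."""
    n = varint_len(buf[off])
    value = buf[off] & 0x3F  # drop the two length bits
    for b in buf[off + 1:off + n]:
        value = (value << 8) | b
    return value, off + n


def skip_ack_by_count(buf, off):
    """Skip a count-based ACK frame body.

    Returns (offset after the frame, number of varints that had to be read).
    """
    _, off = read_varint(buf, off)      # Largest Acknowledged
    _, off = read_varint(buf, off)      # ACK Delay
    count, off = read_varint(buf, off)  # ACK Block Count
    _, off = read_varint(buf, off)      # First ACK Block
    for _ in range(count):
        _, off = read_varint(buf, off)  # Gap
        _, off = read_varint(buf, off)  # ACK Block
    return off, 4 + 2 * count


def skip_ack_by_length(buf, off):
    """Skip a length-prefixed ACK frame body (ASSUMED layout, see lead-in)."""
    length = int.from_bytes(buf[off:off + 2], "big")
    return off + 2 + length


# Largest=1000, Delay=10, Block Count=2, First Block=3, then (1, 2) and (70, 1).
body = bytes.fromhex("43e80a02030102404601")
print(skip_ack_by_count(body, 0))            # (10, 8)
prefixed = len(body).to_bytes(2, "big") + body
print(skip_ack_by_length(prefixed, 0))       # 12
```

In this sketch the count-based skip touches 4 + 2*count varints, while the
length-based skip is a single fixed-size read. The price is the one extra byte
Marten points out for the common case, plus the need to validate the length
against the actual contents before trusting it.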