RE: Proposal to replace ACK block count with ACK length

Nick Banks <nibanks@microsoft.com> Fri, 22 June 2018 02:20 UTC

From: Nick Banks <nibanks@microsoft.com>
To: Ian Swett <ianswett=40google.com@dmarc.ietf.org>, "Deval, Manasi" <manasi.deval@intel.com>
CC: Eric Rescorla <ekr@rtfm.com>, Jana Iyengar <jri.ietf@gmail.com>, Praveen Balasubramanian <pravb@microsoft.com>, Marten Seemann <martenseemann@gmail.com>, IETF QUIC WG <quic@ietf.org>, Martin Thomson <martin.thomson@gmail.com>, Kazuho Oku <kazuhooku@gmail.com>
Subject: RE: Proposal to replace ACK block count with ACK length
Thread-Topic: Proposal to replace ACK block count with ACK length
Date: Fri, 22 Jun 2018 02:20:19 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/KfqGIocVyrry52hB5xSItTUv--c>

I agree that changing this for the sake of multi-threaded usage isn’t worth it, but I do think the change has merit for the goal of improving the efficiency of parsing through all the frames quickly. The hardware can use this to help find the correct frames in packets to coalesce together. The code shouldn’t have to pay the penalty of walking every ack block if it’s not interested in the information there.
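To make the cost of "walking every ack block" concrete, here is a minimal Python sketch. It assumes the ACK body layout discussed in this thread (Largest Acknowledged, ACK Delay, ACK Block Count, First ACK Block, then Gap / ACK Block pairs, all varints) and, for comparison, a hypothetical variant with a 2-byte, self-inclusive length in front. The function names and the sample values are illustrative only, not taken from any implementation.

```python
def encode_varint(value):
    """Encode an integer as a QUIC variable-length integer."""
    if value < 0:
        raise ValueError("varints are unsigned")
    for prefix, size in enumerate((1, 2, 4, 8)):
        if value < 1 << (8 * size - 2):
            return (value | (prefix << (8 * size - 2))).to_bytes(size, "big")
    raise ValueError("value does not fit in 62 bits")


def decode_varint(buf, pos):
    """Return (value, next_pos); the top two bits of buf[pos] give the size."""
    if pos >= len(buf):
        raise ValueError("no bytes left")
    size = 1 << (buf[pos] >> 6)
    if pos + size > len(buf):
        raise ValueError("varint runs past end of buffer")
    value = buf[pos] & 0x3F
    for byte in buf[pos + 1:pos + size]:
        value = (value << 8) | byte
    return value, pos + size


def skip_ack_by_count(buf, pos=0):
    """Find the end of an ACK body that carries an ACK Block Count.

    Every varint has to be visited. Returns (end_pos, varints_read).
    """
    _largest, pos = decode_varint(buf, pos)
    _delay, pos = decode_varint(buf, pos)
    block_count, pos = decode_varint(buf, pos)
    _first_block, pos = decode_varint(buf, pos)
    for _ in range(block_count):
        _gap, pos = decode_varint(buf, pos)
        _block, pos = decode_varint(buf, pos)
    return pos, 4 + 2 * block_count


def skip_ack_by_length(buf, pos=0):
    """Hypothetical layout: a 2-byte big-endian length that counts itself."""
    if pos + 2 > len(buf):
        raise ValueError("no room for length field")
    length = int.from_bytes(buf[pos:pos + 2], "big")
    if length < 2 or pos + length > len(buf):
        raise ValueError("length field is inconsistent with the buffer")
    return pos + length


# Largest Acknowledged=1000, ACK Delay=10, Block Count=2, First Block=5,
# then (Gap=1, Block=3) and (Gap=0, Block=7).
fields = [1000, 10, 2, 5, 1, 3, 0, 7]
body = b"".join(encode_varint(v) for v in fields)
with_length = (len(body) + 2).to_bytes(2, "big") + body

print(skip_ack_by_count(body))          # (9, 8): eight varints visited
print(skip_ack_by_length(with_length))  # 11: one fixed-size read
```

The sketch only shows how many fields must be touched to find the end of the frame; it says nothing about whether that cost matters next to the cost of processing the ACK, which is the point others on the thread dispute.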


________________________________
From: QUIC <quic-bounces@ietf.org> on behalf of Ian Swett <ianswett=40google.com@dmarc.ietf.org>
Sent: Thursday, June 21, 2018 6:48:04 PM
To: Deval, Manasi
Cc: Eric Rescorla; Jana Iyengar; Praveen Balasubramanian; Marten Seemann; IETF QUIC WG; Martin Thomson; Kazuho Oku
Subject: Re: Proposal to replace ACK block count with ACK length

On Thu, Jun 21, 2018 at 7:08 PM Deval, Manasi <manasi.deval@intel.com<mailto:manasi.deval@intel.com>> wrote:
I feel that the requirement to have every value valid is somewhat academic. The length value provides much more value than the block count

This is a key point we disagree on.  I see no value in a length field.  I believe it having value is predicated on the idea that multi-core QUIC packet processing is a good idea?  I fear efforts to multithread processing of a single QUIC packet are not only a waste of time, but also a potential source of correctness issues.

In my experience, the CPU cost of sending is much higher than receiving, so I have a really difficult time imagining multi-core receive being worthwhile.

and the fact that certain values can never be achieved is an inherent property of the length.

One interesting observation is that this property is not limited to length. One can make a similar argument about ACK block count. The maximum number of ACK blocks that can be encoded will not always be meaningful. In our examples, 0, 1, 2, and 3 are all valid. If I set the two most significant bits of the ACK block count varint to 11, the maximum value of ACK block count is 4611686018427387903. This is the same as the maximum value of largest acknowledged, so if the ACK block count were set to this value, it would still be meaningless.
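For reference, 4611686018427387903 is 2^62 - 1, the largest value a QUIC variable-length integer can carry: the two most significant bits of the first byte select a 1, 2, 4, or 8 byte encoding, leaving 6, 14, 30, or 62 bits for the value. A minimal Python sketch of that encoding (the function names are illustrative, not from any QUIC library):

```python
MAX_VARINT = (1 << 62) - 1  # 4611686018427387903


def encode_varint(value):
    """Encode an integer as a QUIC variable-length integer."""
    if value < 0 or value > MAX_VARINT:
        raise ValueError("value does not fit in 62 bits")
    for prefix, size in enumerate((1, 2, 4, 8)):
        if value < 1 << (8 * size - 2):
            # The prefix occupies the two most significant bits.
            return (value | (prefix << (8 * size - 2))).to_bytes(size, "big")


def decode_varint(buf, pos=0):
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    size = 1 << (buf[pos] >> 6)
    if pos + size > len(buf):
        raise ValueError("varint runs past end of buffer")
    value = buf[pos] & 0x3F
    for byte in buf[pos + 1:pos + size]:
        value = (value << 8) | byte
    return value, pos + size


encoded = encode_varint(MAX_VARINT)
print(encoded.hex())           # ffffffffffffffff
print(decode_varint(encoded))  # (4611686018427387903, 8)
```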

Thanks,
Manasi




From: Marten Seemann [mailto:martenseemann@gmail.com<mailto:martenseemann@gmail.com>]
Sent: Tuesday, June 19, 2018 6:44 PM
To: Deval, Manasi <manasi.deval@intel.com<mailto:manasi.deval@intel.com>>
Cc: Ian Swett <ianswett@google.com<mailto:ianswett@google.com>>; Kazuho Oku <kazuhooku@gmail.com<mailto:kazuhooku@gmail.com>>; Eric Rescorla <ekr@rtfm.com<mailto:ekr@rtfm.com>>; Jana Iyengar <jri.ietf@gmail.com<mailto:jri.ietf@gmail.com>>; Praveen Balasubramanian <pravb@microsoft.com<mailto:pravb@microsoft.com>>; IETF QUIC WG <quic@ietf.org<mailto:quic@ietf.org>>; Martin Thomson <martin.thomson@gmail.com<mailto:martin.thomson@gmail.com>>
Subject: Re: Proposal to replace ACK block count with ACK length

Hi Manasi,

> The risk of disagreement between ack blocks and ack block count is same as the risk of disagreement between ack blocks and ack length. Either way this needs to be counted up while creating the ACK and counted down while parsing it. The possibility of error is the same. Getting the ack block count wrong is as problematic as getting the ack length wrong. Do you agree?

I disagree. Let's take an example of an ACK frame with one ACK range, that needs a 2 byte varint to represent the First ACK Block and another 2 byte varint to represent the Gap.
With your proposal:

  1.  The values 0 and 1 are invalid, since the length field itself is included in the length.
  2.  The values 2, 3, ..., (2 + len(LargestAcknowledged) + len(AckDelay)) - 1 are invalid, since the length needs to include the Largest Acknowledged and the Ack Delay.
  3.  The value 2 + len(LargestAcknowledged) + len(AckDelay) would be the first valid value, and correspond to an ACK frame with no blocks.
  4.  The value 2 + len(LargestAcknowledged) + len(AckDelay) + 1 is invalid, since it would cut the varint for the First ACK Block
  5.  The value 2 + len(LargestAcknowledged) + len(AckDelay) + 2 is invalid, since it would cut the frame after the First ACK Block (but every block must be followed by a gap length)
  6.  The value 2 + len(LargestAcknowledged) + len(AckDelay) + 3 is invalid, since it would cut the varint for the Gap
  7.  Finally, the value 2 + len(LargestAcknowledged) + len(AckDelay) + 4 is valid
There are *a lot* of invalid values that you can encode into the ACK length field. More importantly, *none* of these error cases exists with the current frame format.
The *only* error case that can occur with our current format is that the packet is too short for the number of ACK blocks that are supposed to be contained in the frame. This can occur with your proposal as well (in addition to the error cases listed above).

My concern is not that it's impossible or even particularly hard to catch these errors, but I dislike the property that some (in fact, most) encodable values are invalid.
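The enumeration above can be reproduced with a few lines of Python. This is only a sketch of the counting argument: it assumes a 2-byte, self-inclusive length field followed by Largest Acknowledged, Ack Delay, and then (block, gap) pairs as in the example, and the concrete sizes chosen for Largest Acknowledged (2 bytes) and Ack Delay (1 byte) are arbitrary, since the message leaves them symbolic. The function name is hypothetical.

```python
def valid_ack_lengths(largest_len, delay_len, pair_lens, length_field_len=2):
    """List the length values that end exactly on a (block, gap) boundary.

    pair_lens holds the encoded sizes in bytes of each (block, gap) pair.
    """
    end = length_field_len + largest_len + delay_len
    valid = [end]  # a frame with no blocks at all
    for block_len, gap_len in pair_lens:
        end += block_len + gap_len
        valid.append(end)
    return valid


# One ACK range: 2-byte First ACK Block followed by a 2-byte Gap.
valid = valid_ack_lengths(largest_len=2, delay_len=1, pair_lens=[(2, 2)])
invalid = [n for n in range(valid[-1] + 1) if n not in valid]

print(valid)    # [5, 9]
print(invalid)  # [0, 1, 2, 3, 4, 6, 7, 8]
```

With these assumed sizes, eight of the ten length values up to the true frame length are invalid, which is the property being objected to.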

Best,
Marten


On Wed, Jun 20, 2018 at 3:21 AM Deval, Manasi <manasi.deval@intel.com<mailto:manasi.deval@intel.com>> wrote:
Hi Ian,

Here is another attempt to solve the objections you raised:

>I'm not a fan of this proposal, because I think it is impractical to drop the number of ack blocks, because with the ECN proposal it becomes impractically complex to parse.
Is there a reason the proposal from Christian does not solve this problem?

>If we don't remove the number of ack blocks, then the ack frame is larger, but I don't think the extra size field is useful for most implementations.  Also, it means the length can disagree with the actual length, which adds complexity and the possibility of writing error-prone code.  The idea of someone offloading ack processing and then proceeding to trust the length seems like someone could get wrong and cause some concerning issues.
The risk of disagreement between ack blocks and ack block count is same as the risk of disagreement between ack blocks and ack length. Either way this needs to be counted up while creating the ACK and counted down while parsing it. The possibility of error is the same. Getting the ack block count wrong is as problematic as getting the ack length wrong. Do you agree?

>My experience is multithreaded packet processing is more cost and work than it's worth.  Sure you can't fill a 100G NIC with one connection, but that seems like an academic problem, not one for workloads I've seen.  Typically the extra cost of multithreading outweighs its value.
The value is twofold: pre-processing and multi-threading. If we pre-process the received packets such that ACKs and streams can be coalesced, the receive side can indicate a large chunk of information through the kernel, reducing the cost of system calls and protocol overhead. This is the same concept as UDP segmentation taken a step further on the receive side. After this chunk is indicated into the QUIC protocol, the protocol may process streams and ACKs in parallel. While folks may or may not utilize this, there is an advantage here.

Thanks,
Manasi


From: Ian Swett [mailto:ianswett@google.com<mailto:ianswett@google.com>]
Sent: Tuesday, June 19, 2018 11:26 AM
To: Deval, Manasi <manasi.deval@intel.com<mailto:manasi.deval@intel.com>>
Cc: Marten Seemann <martenseemann@gmail.com<mailto:martenseemann@gmail.com>>; Kazuho Oku <kazuhooku@gmail.com<mailto:kazuhooku@gmail.com>>; Eric Rescorla <ekr@rtfm.com<mailto:ekr@rtfm.com>>; Jana Iyengar <jri.ietf@gmail.com<mailto:jri.ietf@gmail.com>>; Praveen Balasubramanian <pravb@microsoft.com<mailto:pravb@microsoft.com>>; IETF QUIC WG <quic@ietf.org<mailto:quic@ietf.org>>; Martin Thomson <martin.thomson@gmail.com<mailto:martin.thomson@gmail.com>>

Subject: Re: Proposal to replace ACK block count with ACK length

I'm still not interested in this change, for the reasons I stated above.

On Tue, Jun 19, 2018 at 2:21 PM Deval, Manasi <manasi.deval@intel.com<mailto:manasi.deval@intel.com>> wrote:
Hi All,

Do we have agreement here to create a new PR?

Thanks,
Manasi

From: Deval, Manasi
Sent: Sunday, June 17, 2018 2:25 PM
To: Marten Seemann <martenseemann@gmail.com<mailto:martenseemann@gmail.com>>; Kazuho Oku <kazuhooku@gmail.com<mailto:kazuhooku@gmail.com>>
Cc: Ian Swett <ianswett=40google.com@dmarc.ietf.org<mailto:40google.com@dmarc.ietf.org>>; Eric Rescorla <ekr@rtfm.com<mailto:ekr@rtfm.com>>; Jana Iyengar <jri.ietf@gmail.com<mailto:jri.ietf@gmail.com>>; Praveen Balasubramanian <pravb@microsoft.com<mailto:pravb@microsoft.com>>; IETF QUIC WG <quic@ietf.org<mailto:quic@ietf.org>>; Martin Thomson <martin.thomson@gmail.com<mailto:martin.thomson@gmail.com>>
Subject: RE: Proposal to replace ACK block count with ACK length

Hi All,

I have made a list of objections to the proposal and the solutions to those objections discussed on this thread.


a.      Co-existence of length field with ECN field and ACK blocks.



Christian suggested to move the ECN fields to precede the ACK blocks. This is an elegant solution. Parsing entire list of ACK blocks to review ECN bits would have been annoying, even though it can work.



b.      There are two cases to be parsed – entire ACK and parse ACK to identify length. There are some reservations when ACK parsing gets harder for the case where the entire header needs to be parsed.



Agreement from several folks here. In the original ACK defined in draft 12 of the slide, one would count down number of ACK blocks to get to the end of the packet. In the proposal I made, one would count down the length to identify the end of the packet. The logic is very similar in cycle count and complexity. Several folks also commented to this effect.



c.      Multi-threaded packet processing



I would expect that there are 10s of 1000s of connections in use at any time for a server with a high speed link. Multi-threading to handle each of these flows / connections in parallel is a necessity to be able to support a large number of connections on a high speed link. Tx segmentation and Rx coalescing are well known strategies to reduce the processing cost. In initial stages, code is often written as single-threaded and then re-factored to parallelize cycle intensive operations. In order to allow this protocol to scale in future, I would suggest we do not preclude this case.



d.      Increase in ACK size by 1 byte.



I do not see this as a serious issue, but we can consider making this a varint if others have strong feelings about it. It’s a trade-off: 2 reads to save 1 byte.



e.      Every encodable value should be valid

Not every length will be valid. This is inherent to lengths. This same issue ails the ‘payload length’ in QUIC header. Not only does the issue exist for small values, it also applies to large values since data stream will be sent after crypto negotiation.  E.g.  - how does one craft a payload with 62 bit payload length in a large header?





Thanks,

Manasi



From: Marten Seemann [mailto:martenseemann@gmail.com]
Sent: Sunday, June 17, 2018 6:59 AM
To: Kazuho Oku <kazuhooku@gmail.com<mailto:kazuhooku@gmail.com>>
Cc: Ian Swett <ianswett=40google.com@dmarc.ietf.org<mailto:ianswett=40google.com@dmarc.ietf.org>>; Eric Rescorla <ekr@rtfm.com<mailto:ekr@rtfm.com>>; Jana Iyengar <jri.ietf@gmail.com<mailto:jri.ietf@gmail.com>>; Praveen Balasubramanian <pravb@microsoft.com<mailto:pravb@microsoft.com>>; IETF QUIC WG <quic@ietf.org<mailto:quic@ietf.org>>; Martin Thomson <martin.thomson@gmail.com<mailto:martin.thomson@gmail.com>>; Deval, Manasi <manasi.deval@intel.com<mailto:manasi.deval@intel.com>>
Subject: Re: Proposal to replace ACK block count with ACK length

Maybe it's specific to Go, but I'm using a single io.Reader for the whole packet, so as long as the packet payload is long enough, the varint parsing will not fail.
I don't think that specifics of programming languages matter here though, and I'm sure both frame formats can be reasonably implemented in C as well as in Go. The reasons I'm opposed to Manasi's proposal are that it moves us away from the principle that only reasonable values should be encodable, and that it increases the size of the ACK frame, for the questionable benefit of being able to parallelise the frame parser.

On Sun, Jun 17, 2018 at 8:48 PM Kazuho Oku <kazuhooku@gmail.com<mailto:kazuhooku@gmail.com>> wrote:
2018-06-17 22:36 GMT+09:00 Marten Seemann <martenseemann@gmail.com<mailto:martenseemann@gmail.com>>:
> At least for my implementation, parsing doesn't become easier, it becomes
> more complex with this proposal. My varint-parser always consumes as many
> bytes as the varint requires, so after parsing a varint, I'd have to
> introduce an additional check that this didn't overflow the ACK length (e.g.
> consider that I parsed the ACK frame so far that only 2 bytes are remaining
> according to ACK length field, but the next varint is 4 bytes long).

Isn't your varint parser checking that it has not (or will not) run
across the end of the packet payload for every ACK block it parses?
I'd assume that you would be doing that, because I think that is
necessary to avoid buffer overrun.

What I am saying is that that check could be converted to an overrun check
against the end of the "frame payload", and that checking the
remaining block count becomes unnecessary, in case we replace ACK
Block Count with ACK Frame Length.
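A minimal Python sketch of the check being described, assuming the parser knows the offset at which the frame ends (frame_end, derived from a hypothetical ACK Frame Length field). The function name is illustrative. A varint that would straddle the end of the frame, such as a 4-byte varint with only 2 bytes left, is rejected by the same comparison that guards against reading past the buffer.

```python
def parse_varints_until(buf, pos, frame_end):
    """Read varints from buf[pos:frame_end]; reject one that crosses frame_end."""
    if frame_end > len(buf):
        raise ValueError("frame is longer than the packet payload")
    values = []
    while pos < frame_end:
        size = 1 << (buf[pos] >> 6)  # 1, 2, 4, or 8 bytes
        if pos + size > frame_end:
            raise ValueError("varint crosses the end of the frame")
        value = buf[pos] & 0x3F
        for byte in buf[pos + 1:pos + size]:
            value = (value << 8) | byte
        values.append(value)
        pos += size
    return values


# Two 1-byte varints (5 and 1) followed by a 4-byte varint carrying 3.
buf = bytes([0x05, 0x01, 0x80, 0x00, 0x00, 0x03])

print(parse_varints_until(buf, 0, frame_end=6))  # [5, 1, 3]

try:
    # Only 2 bytes remain in the frame, but the next varint needs 4.
    parse_varints_until(buf, 0, frame_end=4)
except ValueError as err:
    print("rejected:", err)
```

No separate "blocks remaining" counter appears in the loop; whether that is simpler overall than the block-count format is exactly what the two sides of this exchange disagree about.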

>
> In general, we've been moving the wire image towards making every encodable
> value valid. This proposal moves us away from that principle:
> * some small values are always invalid (the length can never be between 0
> and 3)
> * a lot of intermediate values are invalid (if the boundary falls inside a
> varint, as described above)
> Both these cases can't occur with the current ACK frame format.
>
> On Sun, Jun 17, 2018 at 7:54 PM Kazuho Oku <kazuhooku@gmail.com<mailto:kazuhooku@gmail.com>> wrote:
>>
>> 2018-06-17 8:34 GMT+09:00 Ian Swett
>> <ianswett=40google.com@dmarc.ietf.org<mailto:40google.com@dmarc.ietf.org>>:
>> > I'm not a fan of this proposal, because I think it is impractical to
>> > drop
>> > the number of ack blocks, because with the ECN proposal it becomes
>> > impractically complex to parse.
>>
>> For the ECN proposal, as Christian has suggested, we can move the ECN
>> counters before the ACK blocks. Then, it would not be complex to
>> parse.
>>
>> And my view is that parsing becomes easier if we replace ACK Block
>> Count with ACK Frame Length.
>>
>> Now, with ACK Block Count, we need to check the remaining number of
>> blocks and the remaining space in the packet payload for every block
>> that we parse. Failing to check either leads to a bug or a security
>> issue.
>>
>> If we switch to ACK Frame Length, we need to only check the remaining
>> space in the frame.
>>
>> I think that this is the biggest benefit of replacing ACK Block Count
>> with ACK Frame Length. OTOH the downside is that you need extra one to
>> two bits (one if the size of block / gap is expected to be below 65,
>> two if they are expected to be above that) for encoding ACK Frame
>> Length compared to ACK Block Count.
>>
>>
>>
>> Having said that, I honestly wonder if all the frames could have their
>> length encoded (either explicitly or as a signal that
>> says "to the end of the packet"). Consider something like below:
>>
>> |0| frame-type (7) | frame-payload-length (i) | frame-payload (*) |
>>  or
>> |1| frame-type (7) | frame-payload (*) |
>>
>> When the MSB of the first octet is set to zero, the length of the frame
>> payload is designated by the varint that immediately follows the frame
>> type.
>> When the MSB of the first octet is set to one, the frame
>> payload spans to the end of the packet.
>>
>> In this encoding, we can always omit the Length field of a STREAM
>> frame. So the overhead for carrying stream data will be indifferent in
>> practice.
>>
>> For the ACK frame, we can omit the ACK Block Count field. And the
>> overhead will be one to two bits if the ACK frame is sent in the
>> middle of the packet (thereby using the encoding with explicit frame
>> payload length), or one octet or more shorter if ACK is the last frame
>> of the packet.
>>
>> We are likely to see increase of overhead for most of the other types
>> of frames, but I do not think that would be an issue considering that
>> they will be far seldom seen compared to STREAMs and ACKs.
>>
>> To summarize, my anticipation is that we can make all the frames
>> self-contained (i.e. the length can be determined without the
>> knowledge of how each frame is encoded) without any overhead, if we
>> agree on making the frame type space 1 bit smaller.
>>
>> Finally, the biggest benefit of using a self-contained encoding of
>> frames is that we would have the ability to introduce new optional
>> frames without negotiation. By making the frames self-contained, QUIC
>> endpoints will have the freedom of ignoring the frames that they do
>> not understand.
>>
>> Being able to send QUIC frames defined in extensions without
>> negotiating using Transport Parameters will be a win in both terms of
>> security (because clients' TP is sent in clear) and flexibility
>> (because it will be possible to send the extensions before we figure
>> out whether the peer supports that extension).
>>
>> > If we don't remove the number of ack blocks, then the ack frame is
>> > larger,
>> > but I don't think the extra size field is useful for most
>> > implementations.
>> > Also, it means the length can disagree with the actual length, which adds
>> > complexity and the possibility of writing error-prone code.  The idea of
>> > someone offloading ack processing and then proceeding to trust the
>> > length
>> > seems like someone could get wrong and cause some concerning issues.
>> >
>> > My experience is multithreaded packet processing is more cost and work
>> > than
>> > it's worth.  Sure you can't fill a 100G NIC with one connection, but
>> > that
>> > seems like an academic problem, not one for workloads I've seen.
>> > Typically
>> > the extra cost of multithreading outweighs its value.
>> >
>> > To be clear, I don't think this is an awful idea, but I also don't see
>> > the
>> > value and it adds complexity.  I read Manasi's email, but I don't think
>> > I
>> > understand why any of those matter in practice.
>> >
>> > On Sat, Jun 16, 2018 at 4:13 PM Eric Rescorla <ekr@rtfm.com<mailto:ekr@rtfm.com>> wrote:
>> >>
>> >> On Fri, Jun 15, 2018 at 6:46 PM, Marten Seemann
>> >> <martenseemann@gmail.com<mailto:martenseemann@gmail.com>>
>> >> wrote:
>> >>>
>> >>> This proposal increases the size of the ACK frame by 1 byte in the
>> >>> common
>> >>> case (less than 63 ACK ranges), since the ACK length field here always
>> >>> consumes 2 bytes, whereas the ACK Block Count is a variable-length
>> >>> integer.
>> >>> Considering how much work we put into minimising the size of the
>> >>> frames,
>> >>> this feels like a step in the wrong direction.
>> >>>
>> >>> Regarding the processing cost, I agree with Dmitri. Handling an ACK
>> >>> frame
>> >>> requires looping over and making changes to a data structure that
>> >>> keeps
>> >>> track of sent packets. This is much more expensive than simply parsing
>> >>> a
>> >>> bunch of varints in the ACK frame. It seems unlikely that a
>> >>> multi-threaded
>> >>> packet parser would offer any real-world performance benefits.
>> >>
>> >>
>> >> I don't want to overstate the benefit here, but my point isn't that
>> >> parsing is expensive but that if you want to have a multithreaded
>> >> packet
>> >> processing system, then it's nice to have a simpler data structure (the
>> >> unparsed ACK block) to hand to the ACK processing thread.
>> >>
>> >> -Ekr
>> >>
>> >>
>> >>>
>> >>> On Sat, Jun 16, 2018 at 6:19 AM Martin Thomson
>> >>> <martin.thomson@gmail.com<mailto:martin.thomson@gmail.com>>
>> >>> wrote:
>> >>>>
>> >>>> When we discussed this before, some people observed that this creates
>> >>>> a need to encode in two passes.  That's the trade-off here.  (Not
>> >>>> expressing an opinion.)
>> >>>> On Fri, Jun 15, 2018 at 3:51 PM Jana Iyengar <jri.ietf@gmail.com<mailto:jri.ietf@gmail.com>>
>> >>>> wrote:
>> >>>> >
>> >>>> > I don't have a strong opinion on this. I'm certainly not opposed to
>> >>>> > it.
>> >>>> > Does anyone have a strong opposition?
>> >>>> >
>> >>>> > On Fri, Jun 15, 2018 at 3:10 PM Praveen Balasubramanian
>> >>>> > <pravb@microsoft.com<mailto:pravb@microsoft.com>> wrote:
>> >>>> >>
>> >>>> >> I agree as well since this can help reduce per packet processing
>> >>>> >> overhead. ACKs are going to be the second most common frame type
>> >>>> >> so no
>> >>>> >> objections to special casing.
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> From: QUIC [mailto:quic-bounces@ietf.org<mailto:quic-bounces@ietf.org>] On Behalf Of Eric
>> >>>> >> Rescorla
>> >>>> >> Sent: Friday, June 15, 2018 9:11 AM
>> >>>> >> To: Deval, Manasi <manasi.deval@intel.com<mailto:manasi.deval@intel.com>>
>> >>>> >> Cc: Jana Iyengar <jri.ietf@gmail.com<mailto:jri.ietf@gmail.com>>; QUIC WG <quic@ietf.org<mailto:quic@ietf.org>>
>> >>>> >> Subject: Re: Proposal to replace ACK block count with ACK length
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> I agree with Manasi here. This change would allow ack frame
>> >>>> >> parsing
>> >>>> >> to be more self-contained, which is an advantage for the parser
>> >>>> >> and also
>> >>>> >> potentially for parallelism (because you can quickly find the
>> >>>> >> frame and then
>> >>>> >> process it in parallel).
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> -Ekr
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> On Mon, Jun 11, 2018 at 5:22 PM, Deval, Manasi
>> >>>> >> <manasi.deval@intel.com<mailto:manasi.deval@intel.com>> wrote:
>> >>>> >>
>> >>>> >> In general, varints require some specific logic for parsing. To
>> >>>> >> skip
>> >>>> >> over any header, I have to read every single varint. As the code
>> >>>> >> sees Stream
>> >>>> >> and ACK headers most frequently, that is my focus.  The Stream
>> >>>> >> frame has a
>> >>>> >> length in its third field.
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> ACK parsing, however, needs 6 + 2*num_blocks reads to identify
>> >>>> >> length. There are two reads each for ‘largest acknowledged’, ‘ACK
>> >>>> >> delay’ and
>> >>>> >> ‘ACK block count’. The pain point is the total number of cycles
>> >>>> >> to parse an
>> >>>> >> ACK. If I am processing 10M pps, where 10% - 30% of the packets
>> >>>> >> have a
>> >>>> >> piggybacked ACK, these cycles become a significant bottleneck.
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> Thanks,
>> >>>> >>
>> >>>> >> Manasi
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> From: QUIC [mailto:quic-bounces@ietf.org<mailto:quic-bounces@ietf.org>] On Behalf Of Jana
>> >>>> >> Iyengar
>> >>>> >> Sent: Monday, June 11, 2018 3:11 PM
>> >>>> >> To: Deval, Manasi <manasi.deval@intel.com<mailto:manasi.deval@intel.com>>; QUIC WG
>> >>>> >> <quic@ietf.org<mailto:quic@ietf.org>>
>> >>>> >> Subject: Re: Proposal to replace ACK block count with ACK length
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> You're right that we no longer have the ability to skip an ACK
>> >>>> >> frame,
>> >>>> >> and this crept in when we moved to varints.
>> >>>> >>
>> >>>> >> I believe your problem though is generally true of most frames not
>> >>>> >> just ACKs, since ids, packet numbers, and numbers in all frames
>> >>>> >> are now all
>> >>>> >> varints. To skip any frame, you'll need to parse the varint fields
>> >>>> >> in those
>> >>>> >> frames. If you have logic to process and skip varints, then
>> >>>> >> skipping the ack
>> >>>> >> block section is merely repeating this operation (2*num_block+1)
>> >>>> >> times. Do
>> >>>> >> you see specific value in skipping ACK frames over the other
>> >>>> >> control frames?
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> On Mon, Jun 11, 2018 at 8:43 AM Dmitri Tikhonov
>> >>>> >> <dtikhonov@litespeedtech.com<mailto:dtikhonov@litespeedtech.com>> wrote:
>> >>>> >>
>> >>>> >> On Mon, Jun 11, 2018 at 03:33:35PM +0000, Deval, Manasi wrote:
>> >>>> >> > -        Moving the ACK length to the front of the ACK allows
>> >>>> >> > the
>> >>>> >> >          flexibility of either reading the entire ACK or reading
>> >>>> >> > the
>> >>>> >> >          first 16 bits and skipping over the length. This is a
>> >>>> >> > useful
>> >>>> >> >          feature for the case where ACK processing is split into
>> >>>> >> >          multiple layers. Depending on the processor this is run
>> >>>> >> > on,
>> >>>> >> >          there are different advantages -
>> >>>> >>
>> >>>> >> Just a note.  In my experience, the cost of parsing an ACK frame
>> >>>> >> is
>> >>>> >> negligible compared to the cost of processing an ACK frame: that
>> >>>> >> is,
>> >>>> >> poking at various memory locations to discard newly ACKed packets.
>> >>>> >>
>> >>>> >>   - Dmitri.
>> >>>> >>
>> >>>> >>
>> >>>>
>> >>
>> >
>>
>>
>>
>> --
>> Kazuho Oku



--
Kazuho Oku
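
The self-delimiting frame layout sketched in Kazuho's quoted message (MSB of the first octet clear: a varint payload length follows the 7-bit frame type; MSB set: the payload runs to the end of the packet) can be illustrated with the following Python sketch. It is an assumption-laden illustration of that idea only, not a format any draft adopted; the function names and the frame type numbers 1 and 2 are arbitrary.

```python
def decode_varint(buf, pos):
    """Return (value, next_pos) for the QUIC varint starting at buf[pos]."""
    if pos >= len(buf):
        raise ValueError("no bytes left")
    size = 1 << (buf[pos] >> 6)
    if pos + size > len(buf):
        raise ValueError("varint runs past end of packet")
    value = buf[pos] & 0x3F
    for byte in buf[pos + 1:pos + size]:
        value = (value << 8) | byte
    return value, pos + size


def split_frames(payload):
    """Split a packet payload into (frame_type, frame_payload) tuples."""
    frames = []
    pos = 0
    while pos < len(payload):
        first = payload[pos]
        frame_type = first & 0x7F
        pos += 1
        if first & 0x80:
            # MSB set: this frame's payload spans to the end of the packet.
            frames.append((frame_type, payload[pos:]))
            break
        # MSB clear: an explicit varint length precedes the payload.
        length, pos = decode_varint(payload, pos)
        if pos + length > len(payload):
            raise ValueError("frame payload runs past end of packet")
        frames.append((frame_type, payload[pos:pos + length]))
        pos += length
    return frames


# Type 1 with an explicit 3-byte payload, then type 2 running to the end.
packet = bytes([0x01, 0x03]) + b"abc" + bytes([0x80 | 0x02]) + b"stream data"
frames = split_frames(packet)
print(frames)  # [(1, b'abc'), (2, b'stream data')]

# A receiver that only understands type 2 can drop the rest unparsed.
known = {2}
print([f for f in frames if f[0] in known])  # [(2, b'stream data')]
```

Because the splitter never looks inside a payload, a receiver can step over frame types it does not understand, which is the extension benefit argued for in the quoted message.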