Re: Proposal - Reduce HTTP2 frame length from 16 to 12 bits

Roberto Peon <grmocg@gmail.com> Wed, 29 May 2013 01:13 UTC

Date: Tue, 28 May 2013 18:12:57 -0700
Message-ID: <CAP+FsNfidCwu9x8Ru2k8ws15pQn-CMGS8dJCa4ELB5kK=BLg-Q@mail.gmail.com>
From: Roberto Peon <grmocg@gmail.com>
To: "William Chan (陈智昌)" <willchan@chromium.org>
Cc: James M Snell <jasnell@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>, Patrick McManus <mcmanus@ducksong.com>

I understand what you're saying. We all agree that the max frame size
should be limited. I'm simply suggesting that 16k is the best small frame
size; Patrick is suggesting 4k.
I wouldn't object strenuously to 4k, but my experience and gut both say 16k
is the better choice.

-=R


On Tue, May 28, 2013 at 4:25 PM, William Chan (陈智昌)
<willchan@chromium.org> wrote:

> Just to be clear, I don't feel too strongly here. I do want to address a
> point as I feel my previous point was lost.
>
>
> On Tue, May 28, 2013 at 1:12 PM, Roberto Peon <grmocg@gmail.com> wrote:
>
>> responses inline
>>
>>
>> On Tue, May 28, 2013 at 12:16 PM, William Chan (陈智昌) <
>> willchan@chromium.org> wrote:
>>
>>> On Tue, May 28, 2013 at 11:50 AM, James M Snell <jasnell@gmail.com> wrote:
>>>
>>>> On Tue, May 28, 2013 at 11:41 AM, Roberto Peon <grmocg@gmail.com>
>>>> wrote:
>>>> > As a reverse proxy, I've seen properties for which 4k writes/reads
>>>> > were too small and induced latency increases.
>>>> >
>>>>
>>>> I haven't played with this part too much yet but this is my general
>>>> suspicion also.
>>>>
>>>
>>> Can you guys clarify this in more detail? Specifically, where the
>>> latency comes from. I have ideas, but I'd rather have an authoritative
>>> explanation.
>>>
>>
>> It always comes down to the cost of the context switches (i.e. syscalls)
>> and the locking that must be done in the lower layers of the IO stack.
>>
>
> Thanks for the clarification, I suspected it was the write()/read() cost,
> which I assume is what you mean by syscall.
>
>
>>
>>
>>>
>>>>
>>>> > Admittedly, frame size doesn't have to be the same as read/write
>>>> > size, but it certainly does encourage that implementation (which is,
>>>> > I think, the point of smaller max frame size that you proposed).
>>>>
>>>
>>> You're right that it does encourage that implementation. Likewise, a
>>> larger length encourages naively breaking up frames into that max
>>> frame size, which hurts responsiveness. Which one is likelier to cause
>>> worse overall "performance" (I know this is vague, since people care about
>>> different aspects of perf)? What we want to do is have the most reasonable
>>> default behavior, with the ability for performant implementations to tune
>>> without unreasonable difficulty. I believe we're mostly focusing here on
>>> optimizing the naive implementations, not the highly tuned implementations.
>>>
>>
>> Remember that I'm the one who proposed the smaller max frame size in the
>> first place (now a fair while ago)? :)
>>
>
> I don't believe I've said anything that would imply I forgot that :)
>
>
>> My sweet-spot number was 16k, as I knew that I could saturate a 10G nic
>> with 16k frames/writes and have enough CPU left over to do some actual
>> work. The amount of overhead goes up more than linearly with the decrease
>> in frame size thanks to contention, etc.
>>
>
> I think you miss my point. Please correct me if I'm wrong, but I think
> you're saying that for your server, 16k was the right choice for write()s.
> write() sizes don't need to be tied to actual frame size, but of course
> that's what a naive implementation would do. And again, I think we should
> pick a max frame size that results in reasonable behavior for naive
> implementations/deployments. And I think the highly performant
> implementations will want to write their code in a way that decouples frame
> size from write() size, and will pick the optimal write() size given the
> tradeoffs.
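
A minimal sketch of the decoupling described above. It assumes a hypothetical 8-byte frame header (16-bit length, 8-bit type, 8-bit flags, 32-bit stream id), and the names `CoalescingWriter`, `MAX_FRAME_PAYLOAD` and `WRITE_TARGET` are illustrative only; none of them comes from a draft or an existing implementation. Frames stay at 4k while the bytes handed to the sink go out in roughly 16k writes.

```python
import io
import struct

MAX_FRAME_PAYLOAD = 4096   # small frames keep prioritization responsive
WRITE_TARGET = 16384       # larger writes amortize per-syscall cost


def encode_frame(stream_id: int, payload: bytes) -> bytes:
    """Assumed header layout: 16-bit length, type, flags, 32-bit stream id."""
    return struct.pack("!HBBI", len(payload), 0, 0, stream_id) + payload


class CoalescingWriter:
    """Buffers small frames and hands them to the sink in larger writes."""

    def __init__(self, sink):
        self.sink = sink
        self.pending = bytearray()
        self.write_calls = 0

    def send(self, stream_id: int, data: bytes) -> None:
        # Frame at the small size, but only write once enough has piled up.
        for offset in range(0, len(data), MAX_FRAME_PAYLOAD):
            chunk = data[offset:offset + MAX_FRAME_PAYLOAD]
            self.pending += encode_frame(stream_id, chunk)
            if len(self.pending) >= WRITE_TARGET:
                self.flush()

    def flush(self) -> None:
        if self.pending:
            self.sink.write(bytes(self.pending))
            self.write_calls += 1
            self.pending.clear()


sink = io.BytesIO()
writer = CoalescingWriter(sink)
writer.send(1, b"x" * 65536)
writer.flush()
print(writer.write_calls, len(sink.getvalue()))
```

With these constants, 64KB of payload becomes sixteen 4k frames delivered in four writes. A real sender would also flush when its queue drains or on a timer, which is left out here.
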
>
>
>>
>>
>>>
>>>
>>>> >
>>>> > I propose we keep the 16 bit frame size and instead allow the (now
>>>> > negotiated setting of) max frame size to default to 12 bits worth,
>>>> > with that going upwards or downwards when a settings frame arrives
>>>> > from the other side indicating its max receive size.
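
A rough illustration of the proposal above, not draft text. It assumes a hypothetical `PeerLimits` helper and a settings value that carries the peer's max receive size in bytes; the draft defines neither.

```python
from typing import List

FIELD_LIMIT = (1 << 16) - 1   # largest value a 16-bit length field can hold
DEFAULT_MAX_FRAME = 1 << 12   # 4096 bytes, the proposed default


class PeerLimits:
    """Tracks the largest frame payload we may send to the peer."""

    def __init__(self) -> None:
        self.max_send_frame = DEFAULT_MAX_FRAME

    def on_settings(self, peer_max_receive: int) -> None:
        # The peer may move the limit up or down, within the 16-bit field.
        if not 1 <= peer_max_receive <= FIELD_LIMIT:
            raise ValueError("max receive size does not fit the length field")
        self.max_send_frame = peer_max_receive

    def split(self, data: bytes) -> List[bytes]:
        """Cut a payload into chunks no larger than the current limit."""
        step = self.max_send_frame
        return [data[i:i + step] for i in range(0, len(data), step)]


limits = PeerLimits()
print([len(c) for c in limits.split(b"a" * 10000)])
limits.on_settings(16384)
print([len(c) for c in limits.split(b"a" * 10000)])
```

The sender starts at the 4096-byte default and only changes its chunking once the peer's settings arrive, so code written this way keeps working if the preferred size changes later.
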
>>>> >
>>>>
>>>> Honestly, I'd prefer to do away with frame size negotiation altogether
>>>> because of the potential for path mtu style issues. Keeping the 16-bit
>>>> size for now with strong encouragement (SHOULD, perhaps?) for keeping
>>>> sizes around 12-bit lengths for the most common cases  seems like the
>>>> right approach.
>>>>
>>>> -- James
>>>>
>>>
>> Unlike TCP/IP, max frame size is a point-to-point thing, as the primitive
>> we mux is streams, not frames. Frames are the way we accomplish the muxing.
>> Why would there be any path MTU like thing?
>>
>> -=R
>>
>>
>>>
>>>> > This would give the best chance that the code would be written in
>>>> > such a way as to adapt with the times as they change.
>>>> > -=R
>>>> >
>>>> > On May 28, 2013 10:01 AM, "William Chan (陈智昌)" <willchan@chromium.org>
>>>> > wrote:
>>>> >>
>>>> >> Can you clarify what you mean by a documented performance metric for
>>>> >> non-browser use cases? I don't think Patrick said anything browser
>>>> >> specific. He provided some serialization latency numbers and noted
>>>> >> that they are high enough to impact responsiveness. And then he
>>>> >> provided numbers on overhead.
>>>> >>
>>>> >> I, for one, find the responsiveness argument compelling for
>>>> >> browsers. I'm not completely sure 0.2% is low enough overhead for
>>>> >> everyone, but I wouldn't complain about it. And in absence of
>>>> >> complaints, I guess I'd support moving forward with only 12 bits for
>>>> >> length.
>>>> >>
>>>> >>
>>>> >> On Tue, May 28, 2013 at 9:22 AM, James M Snell <jasnell@gmail.com>
>>>> >> wrote:
>>>> >>>
>>>> >>> Currently, my only challenge with this is that, so far, we have not
>>>> >>> seen any documented performance metrics for non-browser based uses.
>>>> >>> That said, I don't really have the time currently to put together a
>>>> >>> comprehensive set of such metrics so it wouldn't be polite of me to
>>>> >>> insist on them ;-) ... perhaps for now we ought to keep the 16-bit
>>>> >>> size but include a recommendation about not exceeding 12-bits, then
>>>> >>> see what more implementation experience does for us.
>>>> >>>
>>>> >>> On Tue, May 28, 2013 at 7:20 AM, Patrick McManus
>>>> >>> <mcmanus@ducksong.com> wrote:
>>>> >>> > Hi All,
>>>> >>> >
>>>> >>> > I've been looking at a lot of spdy frames lately, and I've noticed
>>>> >>> > what I consider a common implementation problem that I think a
>>>> >>> > good http/2 spec could help with. I'm commonly seeing frames large
>>>> >>> > enough to interfere with effective prioritization. I've seen this
>>>> >>> > from at least 3 different servers.
>>>> >>> >
>>>> >>> > The HTTP/2 draft has a max frame size of 16 bits, which is a huge
>>>> >>> > improvement from spdy's 24. I propose we reduce it further to 12
>>>> >>> > (i.e. 4096 bytes).
>>>> >>> >
>>>> >>> > The muxed approach of multiple streams onto one connection done in
>>>> >>> > HTTP/2 has great advantages, but the one downside of it is that it
>>>> >>> > creates head of line blocking problems between those streams
>>>> >>> > dictated by frame granularity. With small frames this is pretty
>>>> >>> > manageable; with extremely large ones we've recreated the same
>>>> >>> > head of line problems that HTTP/1 pipelines have. The server needs
>>>> >>> > to be able to respond quickly to higher priority events (including
>>>> >>> > cancellations) and once it has written a frame header to the wire
>>>> >>> > it is committed to the entire frame for however long it takes to
>>>> >>> > serialize it. IMO the shorter that time, the better.
>>>> >>> >
>>>> >>> > Our spec can help implementations do the right thing here by
>>>> >>> > limiting the max frame size to 12 bits.
>>>> >>> >
>>>> >>> > It takes 500msec to serialize 64KB at 1Mbit/sec... 125msec at
>>>> >>> > 4Mbit/sec. Those are some pretty notable task-switch times.
>>>> >>> > Dropping the frame to 4096 cuts them to 32msec and 8msec. That's
>>>> >>> > much more responsive, at the cost of 120 extra bytes of transfer
>>>> >>> > (< 1msec at 1Mbit/sec).
>>>> >>> >
>>>> >>> > In general - the smaller the better as long as the overhead
>>>> >>> > doesn't get to be too large. At 8 in 4096 (~0.2%) I think that's
>>>> >>> > acceptable. It's roughly the same overhead as a VLAN tag.
>>>> >>> >
>>>> >>> > Obviously this makes a continuation bit for control frames
>>>> >>> > absolutely mandatory, but I think we're already in that spot with
>>>> >>> > 16 bit frame lengths.
>>>> >>> >
>>>> >>> > -Patrick
>>>> >>> >
>>>> >>> >
>>>> >>>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
>
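
The serialization and overhead figures in Patrick's original proposal can be roughly reproduced with the sketch below. It assumes an 8-byte frame header (implied by "8 in 4096" and by 120 extra bytes for 15 additional frames) and decimal megabits per second; the function names are illustrative.

```python
FRAME_HEADER_BYTES = 8  # assumed header size, implied by "8 in 4096"


def serialization_ms(frame_bytes: int, link_mbit_per_s: float) -> float:
    """Time to put frame_bytes on the wire at the given link rate, in ms."""
    bits = frame_bytes * 8
    return bits / (link_mbit_per_s * 1_000_000) * 1000


def extra_header_bytes(payload_bytes: int, small_frame: int) -> int:
    """Extra header bytes when one large frame is split into small frames."""
    frames = -(-payload_bytes // small_frame)  # ceiling division
    return (frames - 1) * FRAME_HEADER_BYTES


for size in (65536, 16384, 4096):
    for rate in (1, 4):
        ms = serialization_ms(size, rate)
        print(f"{size:>6} bytes at {rate} Mbit/s: {ms:7.1f} ms")

print("extra header bytes, 64KB in 4096-byte frames:",
      extra_header_bytes(65536, 4096))
print(f"header overhead at 4096: {FRAME_HEADER_BYTES / 4096:.2%}")
```

The exact values are about 524 ms and 131 ms for a 64KB frame and about 33 ms and 8 ms for a 4096-byte frame, in line with the rounded numbers in the proposal. A 16k frame lands in between, at about 131 ms on a 1 Mbit/s link.
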