Re: [hybi] Frame size

Jamie Lokier <jamie@shareable.org> Mon, 19 April 2010 10:21 UTC

Return-Path: <jamie@shareable.org>
X-Original-To: hybi@core3.amsl.com
Delivered-To: hybi@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 4CF703A6979 for <hybi@core3.amsl.com>; Mon, 19 Apr 2010 03:21:39 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.044
X-Spam-Level:
X-Spam-Status: No, score=-2.044 tagged_above=-999 required=5 tests=[AWL=-2.045, BAYES_50=0.001]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id DLB6FH8bogFJ for <hybi@core3.amsl.com>; Mon, 19 Apr 2010 03:21:33 -0700 (PDT)
Received: from mail2.shareable.org (mail2.shareable.org [80.68.89.115]) by core3.amsl.com (Postfix) with ESMTP id 1B8913A6935 for <hybi@ietf.org>; Mon, 19 Apr 2010 03:21:22 -0700 (PDT)
Received: from jamie by mail2.shareable.org with local (Exim 4.63) (envelope-from <jamie@shareable.org>) id 1O3o6K-0008C9-Uj; Mon, 19 Apr 2010 11:21:12 +0100
Date: Mon, 19 Apr 2010 11:21:12 +0100
From: Jamie Lokier <jamie@shareable.org>
To: Mike Belshe <mike@belshe.com>
Message-ID: <20100419102112.GB28758@shareable.org>
References: <8B0A9FCBB9832F43971E38010638454F03E3F313ED@SISPE7MB1.commscope.com> <v2m5c902b9e1004160043i7b5ccc79y2346e1b2b2c55cf5@mail.gmail.com> <s2qad99d8ce1004160053w436a29b1idae0c66737b3760a@mail.gmail.com> <4BC85A31.6060605@webtide.com> <t2iad99d8ce1004160949yb1ba9582l3b626c19dacf8d9@mail.gmail.com> <4BC96DA1.3000706@webtide.com> <u2m2a10ed241004181635qd0554193v36da94ecd7284d31@mail.gmail.com> <l2o2a10ed241004181637hdfab97d5r68f6845be49e8ad8@mail.gmail.com> <20100419005102.GC18876@shareable.org> <g2n2a10ed241004182005n9d8a5f02o29702620ae6205f4@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <g2n2a10ed241004182005n9d8a5f02o29702620ae6205f4@mail.gmail.com>
User-Agent: Mutt/1.5.13 (2006-08-11)
Cc: Hybi <hybi@ietf.org>
Subject: Re: [hybi] Frame size
X-BeenThere: hybi@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Server-Initiated HTTP <hybi.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/hybi>, <mailto:hybi-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/hybi>
List-Post: <mailto:hybi@ietf.org>
List-Help: <mailto:hybi-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/hybi>, <mailto:hybi-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 19 Apr 2010 10:21:39 -0000

Mike Belshe wrote:
>      I think the point of chunks are
>       - to permit the transport implementation to make network-aware
>         decisions (such as aligning with TCP segments)
> 
>    TCP is a stream, so you must mean IP segments.

No, IP has packets; TCP has segments :-)

The segments aren't application visible except in the *timing*, but
there are still performance benefits to an application choosing how to
submit its bytes with segment awareness - primarily to avoid sending
partial segments, to which TCP then reacts by adding more delays...  But
we don't need to go into details here.  It's just a tweak to reduce
latency in some cases with sporadic messages - an implementation detail.
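As a minimal sketch of that sender-side tweak (an assumption about one way to implement it, not anything mandated by the protocol): disabling Nagle's algorithm and writing each sporadic message in a single send avoids leaving a partial segment queued behind unacknowledged data.

```python
import socket

def make_low_latency_socket() -> socket.socket:
    """Hypothetical helper: a TCP socket tuned for sporadic small messages.

    Disabling Nagle (TCP_NODELAY) means small writes go out immediately
    instead of waiting for outstanding data to be ACKed - the delay the
    text above alludes to.  The application should then submit each
    message with one send call so it isn't split into partial segments.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```

This is purely an implementation detail of the sender, invisible on the wire.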

>     But chunking doesn't
>    impact the ability to align with packets or ensure efficient use of
>    them.  I must be missing a key point here?

Two things: Firstly, if the proxy forwards partially received chunks
without rechunking, then it cannot send any control messages (such as
graceful close or anything else) until it receives all of the rest of
the data.  That means it cannot signal network errors or other
conditions - the only option it has is to abruptly close the
connection - which the receiver can't reliably distinguish from
other network errors.

This means that when you do forward partially received chunk data, it's
important to rechunk (unless you'll never have control messages).

(To avoid introducing long, cumulative delays, and infinite memory
requirements, forwarding partial chunks is a good idea, rather than
buffering them whole.)
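A sketch of what rechunking might look like at a proxy, assuming a hypothetical wire format where each chunk is prefixed with a 32-bit network-order length (not any agreed spec):

```python
import struct

def rechunk(data: bytes, max_chunk: int) -> list[bytes]:
    """Re-frame a partially received byte run into complete smaller chunks,
    each prefixed with a 32-bit length (hypothetical format).

    Because every emitted chunk is complete on the wire, the proxy can
    interleave a control message (graceful close, error indication)
    between any two chunks instead of being stuck mid-chunk.
    """
    chunks = []
    for i in range(0, len(data), max_chunk):
        part = data[i:i + max_chunk]
        chunks.append(struct.pack('!I', len(part)) + part)
    return chunks
```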

Secondly, the relationship with packets is that, as a performance
optimisation, you might choose to forward all the data you have
received so far up to a multiple of the estimated next-hop segment
size, and keep the remainder for a short time until you receive more
data.  Remember that's just a performance tweak and needn't be
specified anywhere.  But it's good to make it possible.
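That forwarding heuristic can be sketched in a few lines - assuming a hypothetical `segment_aligned_split` helper and an estimated next-hop MSS of 1460 bytes:

```python
def segment_aligned_split(pending: bytes, mss: int = 1460) -> tuple[bytes, bytes]:
    """Return (forward_now, hold_back).

    forward_now is the largest whole-segment multiple of the estimated
    next-hop MSS, to be forwarded immediately; hold_back is kept for a
    short time in case more data arrives.  Purely a performance tweak -
    nothing on the wire depends on it, as the text above notes.
    """
    n = (len(pending) // mss) * mss
    return pending[:n], pending[n:]
```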

>       - to permit intermediaries to re-chunk when that is advantageous
>         (such as aligning with the next hop TCP segments)
> 
>    This would mean that applications cannot rely on chunking to be
>    preserved end-to-end.  Is that really what you want?  Why?  HTTP
>    chunking cannot be tweaked by intermediaries either.  Well, I suppose
>    an intermediary could try, but they could easily make mistakes, and it
>    is unclear if there is any benefit?

Ah...  HTTP chunking *is* altered by intermediaries :-)  It is a
hop-by-hop mechanism, and for that matter, some hops don't support
chunks at all, so they must be added by the intermediaries.

Any application which depends on HTTP chunks being preserved is as
broken as an application which hopes that TCP write boundaries are
preserved when reading...  (And I've seen plenty of the latter which
break as soon as they leave the shop they were developed in.)

>       - to carry information for smarter buffering decisions at
>      intermediaries
>         (to reduce latency - eager forwarding of every byte when
>      received
>         is not ideal) (and I know you care about latency)
> 
>    It's unclear to me what value intermediaries are adding here.  What
>    feature is this?

To avoid introducing long delays (which add up over multiple proxies),
forwarding partial chunks is a good idea, rather than buffering them whole.

But to send whole TCP segments when possible (for both bandwidth and
latency reasons), it's a good idea *not* to forward every byte as soon
as it's received, and instead wait a short time to receive more data.
Or even wait forever, if you're sure that a small partial message is of
no interest to the receiver.

(Whether small partial messages are of interest to the receiver does
rather depend on the application.  WebSocket API clients don't care,
because the API won't deliver them; but for, say, a byte-streaming
application over WebSocket, partial message delivery reduces the
application-visible byte latency.)

>       - as a realistic basis for multiplexing, even if that's
>         an extension not in the base protocol
> 
>    If we want to add multiplexing, there is more than just chunking
>    involved.

There certainly is! :-)

>       - to leave room for transport control messages inserted by the
>         transport (such as graceful close and transport error
>      indicators),
>         so that applications don't have implement them
> 
>    The protocol has implicit chunks - frames.  Having a second level of
>    chunking has little value here.  Or at least I haven't heard the use
>    case.
>    Applications that want to use chunking can do so already by just using
>    smaller frames.

Ah, no, the point of chunks is to be *not visible* to the
applications.  The same way as they aren't visible in HTTP chunking.

They're to let the WS implementation make smart decisions for the
network they are on, for the sort of reasons we've covered.  Chunking
is a generic and simple method which enables a lot of things.

The application-visible units are called messages.

>    >    Also, why does a buffer need to be contained in RAM?  Just
>    >    because we have a long length doesn't mean that the payload can't
>    >    stream.  Lots of protocols work this way today, like HTTP.
> 
>      WebSocket API spec says that messages must be completely collected
>      before they are signalled as DOM events.  They cannot be streamed
>      in
>      that API.
> 
>    Good point.  But that is still implementation on top of the protocol,
>    and the two can be different.  The WebSocket API also doesn't define a
>    maximum string size.  But obviously, each implementation will have
>    one.

I think you might want to discuss that with Ian Hickson.

>      About buffering *frames* as opposed to messages, I agree.  There is
>      no
>      reason that frames need to be buffered whole before assembling them
>      into messages or forwarding them to the next hop.  Intermediaries
>      should not buffer whole frames of arbitrary length, they should
>      forward them, and probably change the frame boundaries at the
>      moment
>      of forwarding to avoid blocking control messages later.
> 
>    >     And what happens if you
>    >    stream a 10GB message in a single web page?  The browser just
>    hangs up
>    >    the phone at some point.  There is no graceful error.
> 
>      Why?  There is no reason why the browser cannot stop reading, and
>      send
>      the graceful error message...
> 
>      And that is important: Look at it the other way (although either
>      way
>      around this can occur).  You're sending a 10GB message to the
>      server
>      in something akin to HTTP PUT (but over this protocol), and that
>      particular server is being shut down, times out or whatever.  If
>      the
>      server just hangs up, the browser doesn't know if it's safe to
>      retransmit that POST request, so it must report an error.  But if
>      the
>      server sends a graceful "sorry I'm closing" message, the browser
>      can
>      retry automatically (like it does for HTTP GETs).
> 
>    I agree that this is a problem - but note that it exists regardless of
>    the size of the message.  Even if your frame length is 10 bytes, you
>    have this problem.

Exactly, hence rechunking solves the graceful error problem (as well
as other control messages).  You still get ungraceful errors, but they
are considered a "fault", as opposed to graceful errors, where
applications can make safe assumptions, such as knowing that a
message definitely was not received at the other end, and that it is
safe to retry an application request.

>      That is currently why pipelined HTTP is so severely broken and
>      close
>      to unusable...  Let's not repeat that debacle.
> 
>    I agree that pipelining is broken; you're implying that somehow
>    dealing with frame length would fix it - but I know you don't really
>    believe that to be true.  The reasons it never got deployed are not
>    this :-)

I do believe that to be part of the problem.

Sure, there are a bunch of implementation and compatibility problems
which also get in the way :-)

But the inability of the client to distinguish between "network
dropped the connection" and "server gracefully timed out the
connection" is a fundamental *design* error: It makes it technically
impossible to safely retry non-idempotent requests on a pipelined
connection where no actual network errors have occurred.  A normal
situation where you do want retrying.

Even if there were no interop difficulties, HTTP pipelining is broken
by design in that respect.

Frame length doesn't fix this.  Graceful close control messages (if
done correctly) *do* solve it.
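To illustrate the distinction, here is a sketch of a receiver that separates graceful from ungraceful close - using a made-up frame layout (1-byte opcode, 32-bit length, payload) and made-up opcode values, purely for illustration:

```python
import struct

DATA, CLOSE = 0x0, 0x8  # hypothetical opcodes, not from any spec

def parse_frames(stream: bytes):
    """Parse (opcode, payload) frames from the bytes received before EOF.

    EOF without a CLOSE frame is an ungraceful close: the client cannot
    know whether in-flight requests were processed, so it must not retry
    non-idempotent ones.  A CLOSE frame is the graceful case and makes
    automatic retry safe.
    """
    frames, i = [], 0
    while i + 5 <= len(stream):
        op = stream[i]
        (length,) = struct.unpack('!I', stream[i + 1:i + 5])
        if i + 5 + length > len(stream):
            break  # truncated frame: treat like an abrupt drop
        frames.append((op, stream[i + 5:i + 5 + length]))
        i += 5 + length
    graceful = any(op == CLOSE for op, _ in frames)
    return frames, graceful
```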

>    >    While this is a theoretical problem (which Greg raised), it is
>    >    not a practical one.  Unfortunately, once you decide that 32bit
>    >    lengths are not good enough, you're running into lots of
>    >    subproblems: fragmentation/reassembly, max frame size, etc.  I
>    >    think we're just overthinknig the needs to the WebSockets API.  I
>    >    propose we make lengths be 32bits and drop
>    >    fragmentation/reassembly from this protocol.
> 
>      So you're committing to being unable to send 5GB messages?
> 
>    No - just that they have to be split up at the application layer.
> 
>      That seems inconsistent with the mention of future networks.
>      Today, a
>      5GB message takes just 5 seconds on a decent 10gig ethernet.  That
>      is
>      well within the usability range of WebSocket and browser
>      applications,
>      although typically there isn't enough RAM and processing isn't fast
>      enough to do something useful inside a browser application with
>      that
>      kind of data.
> 
>    I don't think we should design for the edge case.  The edge case can
>    be handled at the app layer.

Edge case?
> 
>      In 2020, a 5GB message will take a fraction of a second to
>      transmit,
>      <1% of RAM, and the CPU will be able to process all of it quickly
>      enough to do useful things, such as updating a 3D model in
>      Javascript.
>      I see no benefit to forbidding their use while permitting 3GB
>      messages.
> 
>    Agree, but I didn't impose a limit.  Just a frame size.  UDP has a
>    maximum frame size of 16bits (64KB).  Are you suggesting that you
>    can't send more than 64KB over UDP? :-)

Are you suggesting applications split their messages into 64KB pieces
and reassemble them at the receiver?

When talking about chunks, we're talking about applications *not* having
to deal with that - keep it in the WS implementation.

You seem to be proposing to make it more complicated for application
authors, and for no benefit that I can see.  (End to end fixed-size
chunking - what benefit does that provide?)

>    Overall, I'm talking about simplicity.  A 32bit fixed length is simple
>    and sufficient for purposes today and tomorrow.  It doesn't inhibit
>    the edge case, it merely makes it so that those wishing to support the
>    edge case have to do extra work, rather than the common case doing
>    extra work.

It's only an edge case because you've made it one.

3GB *isn't* an edge case in your picture, but 5GB is.

So in that picture, applications will be written which work fine, and
then some poor user will get upset because the application has a 4GB
file size limit that the app designer didn't think about or care to solve.

I really think by 2020, 5GB strings will be quite common in
applications, including Javascript browser applications.

It's already common to transfer multi-megabyte strings over AJAX requests.

Why have yet another unnecessary hurdle for users and app designers?

Note: There is no problem with having a 32-bit fixed limit on *chunk*
length, invisible to applications.  That doesn't limit anything, it's
just a transport detail and makes life easier there.

Imho, an application message being broken into multiple 32-bit fixed
length chunks is looking good at the moment.
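A sketch of that idea, under the same hypothetical framing assumptions as above (a final-chunk flag plus a 32-bit chunk length; none of these names come from a spec): the implementation splits arbitrarily large messages into bounded chunks, and the application never sees the boundaries.

```python
MAX_CHUNK = 2**32 - 1  # 32-bit chunk length limit; invisible to the app

def chunk_message(payload: bytes, max_chunk: int = MAX_CHUNK):
    """Yield (final_flag, length, data) chunks for one application message.

    The receiver reassembles chunks until it sees final_flag set, so a
    message larger than 4GB needs no application-level splitting - the
    32-bit limit applies only to the transport-level chunk.
    """
    if not payload:
        yield (1, 0, b'')
        return
    for i in range(0, len(payload), max_chunk):
        part = payload[i:i + max_chunk]
        final = 1 if i + max_chunk >= len(payload) else 0
        yield (final, len(part), part)
```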

-- Jamie