Re: [hybi] Call for interest: multiplexing dedicated for WebSocket

Takeshi Yoshino <tyoshino@google.com> Wed, 29 May 2013 05:43 UTC

From: Takeshi Yoshino <tyoshino@google.com>
Date: Wed, 29 May 2013 14:43:11 +0900
Message-ID: <CAH9hSJYms5rPSvDDNeMYoZ19UmRLJUd7YM7d98ONUkB4YZnM=Q@mail.gmail.com>
To: Tobias Oberstein <tobias.oberstein@tavendo.de>
Cc: "hybi@ietf.org" <hybi@ietf.org>
Subject: Re: [hybi] Call for interest: multiplexing dedicated for WebSocket

On Mon, May 27, 2013 at 9:51 PM, Tobias Oberstein <tobias.oberstein@tavendo.de> wrote:

> >> Having those buffers in user space (in app) allows one to tune the
> >> buffer size _per stream_. Thus, we can have large buffers for streams
> >> that carry mass data, and small buffers for chatty streams. That's a
> >> real advantage.
>
> >Exactly.
>
> In the meantime, I have read more about modern TCP/IP stacks.
>
> E.g. beginning with FreeBSD 7, the TCP/IP stack auto-tunes the TCP
> send/receive buffer sizes _per socket_. This is controlled via:
>
> net.inet.tcp.sendbuf_auto=1         # TCP send buffer size autotuning (default: on)
> net.inet.tcp.sendbuf_inc=16384      # TCP send buffer autotuning step size
> net.inet.tcp.recvbuf_auto=1         # TCP receive buffer size autotuning (default: on)
> net.inet.tcp.recvbuf_inc=524288     # TCP receive buffer autotuning step size
> net.inet.tcp.sendbuf_max=16777216   # TCP send buffer maximum size (tune up for Fat-Long-Pipes)
> net.inet.tcp.recvbuf_max=16777216   # TCP receive buffer maximum size (tune up for Fat-Long-Pipes)
>
> """
> TCP Autotuning
>
> Beginning with Linux 2.6, Mac OSX 10.5, Windows Vista, and FreeBSD 7.0,
> both sender and receiver autotuning became available, eliminating the need
> to set the TCP send and receive buffers by hand for each path. However the
> maximum buffer sizes are still too small for many high-speed network path,
> and must be increased as described on the pages for each operating system.
> """
> http://fasterdata.es.net/host-tuning/background/
>
> Apart from autotuning, there is also the option to control send/receive
> buffer sizes _per socket_ using setsockopt() with SO_SNDBUF / SO_RCVBUF.
>
> ===
>
> Given the above, how would the following two approaches compare?
>
> a)
> - Client 1 <=> Server 1 connected with 2 TCP connections.
> - TCP autotuning on both client and server OR manually set buffer sizes:
> small for TCP-1, and large for TCP-2
> - First TCP _only_ carries "chatty" traffic, and second TCP carries "mass
> data" traffic.
>
> b)
> - Client 1 <=> Server 1 connected with 1 TCP connection
> - TCP autotuning OR large buffers set system wide
> - multiplexing done over the single TCP that carries both "chatty" and
> "mass data" traffic and
> - the multiplexing further optimizes buffer size (app level) and does
> prioritization of traffic (chatty = high-prio, mass = low-prio).
>
> "Compare": I am mostly interested in the end-user experience:
>
> i) Is the chatty traffic still low-latency, even in the presence of
> concurrent mass-data traffic?
>

a) It depends on the process scheduler and the network queue design of the
server and the client, I think. I don't know much about per-socket buffer
autotuning, but it probably just improves memory usage and throughput and
doesn't have much effect on the latency of chatty traffic.

b) The endpoints need to cut mass data into smaller chunks to reserve slots
for inserting chatty traffic. E.g., if the acceptable delay for chatty
traffic is x ms, set the maximum send unit of mass traffic to
(current throughput) * x.
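
A rough sketch of what I mean (illustrative only; the class and names are
made up, and real code would also wait for the socket to drain before
sending the next chunk):

// Interleaving sender: chatty messages should wait at most maxChattyDelayMs
// behind one in-flight chunk of mass data, so cap the mass-data chunk size
// at (estimated throughput) * (acceptable delay).
class InterleavingSender {
  constructor(
    private send: (bytes: Uint8Array) => void,  // e.g. ws.send on the single connection
    private throughputBytesPerMs: number,       // measured/estimated elsewhere
    private maxChattyDelayMs: number
  ) {}

  private chatty: Uint8Array[] = [];
  private bulk: Uint8Array[] = [];

  queueChatty(m: Uint8Array) { this.chatty.push(m); this.pump(); }
  queueBulk(m: Uint8Array)   { this.bulk.push(m);   this.pump(); }

  private pump() {
    // Always drain chatty messages first.
    while (this.chatty.length) this.send(this.chatty.shift()!);
    // Then send at most one bounded chunk of bulk data before re-checking.
    const maxChunk = Math.max(1, Math.floor(this.throughputBytesPerMs * this.maxChattyDelayMs));
    const next = this.bulk.shift();
    if (!next) return;
    if (next.length > maxChunk) {
      this.send(next.subarray(0, maxChunk));
      this.bulk.unshift(next.subarray(maxChunk));  // put the remainder back
    } else {
      this.send(next);
    }
  }
}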

To let one endpoint control the peer's message scheduling, per-socket
buffer sizes and ws-mux flow control are both insufficient, I think. The
endpoints also need to exchange ToS info.
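
By exchanging ToS info I mean something along these lines (entirely
hypothetical, just to illustrate the kind of metadata; this is not part of
any current draft):

// Hypothetical per-channel metadata sent when a mux channel is opened,
// so the peer's scheduler can prioritize frames across channels.
interface ChannelOpenInfo {
  channelId: number;
  tosClass: "interactive" | "bulk";  // coarse traffic class
  maxDelayMs?: number;               // desired scheduling delay for "interactive"
}

// Example: the client tells the server how to schedule its two channels.
const chattyChannel: ChannelOpenInfo = { channelId: 1, tosClass: "interactive", maxDelayMs: 20 };
const bulkChannel: ChannelOpenInfo = { channelId: 2, tosClass: "bulk" };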


> ii) Is the throughput of mass-data traffic, concurrent with the chatty
> traffic, still (nearly) as high as when only doing mass data transfer?
>

I think the minimum delay requirement for the chatty traffic determines the
fragmentation of the mass-data traffic, regardless of whether it's done in
the OS's network stack or in ws-mux code.


> >> How about having a shared Web worker in the background that handles all
> >> messaging for the app over a single WS?
> > One restriction of Web Worker is that we cannot share a worker between
> > webapps with different origins while WebSocket can connect to a host
> > different from its origin.
>
> Ah, right. Same origin policy. Thus, with WS-MUX, there is potentially
> _more_ TCP sharing than what's possible with shared WebWorkers.
>
> However, for a given single app (from a single origin) running in multiple
> browser tabs, the shared WebWorker seems not so bad. Works today. No need
> for WS-MUX or HTTP 2.0 MUX + WS over HTTP 2.0.
>

Yes. Thanks for reminding us about the shared worker approach. We haven't
evaluated it well yet.
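
To make the idea concrete, here is a rough, untested sketch of a shared
worker that owns a single WebSocket for all tabs of one origin (the URL and
the broadcast routing are made up; a real app would carry some channel id
in the payload instead of broadcasting):

// ws-shared-worker.ts -- one WebSocket shared by every tab that connects.
const ports: MessagePort[] = [];
const ws = new WebSocket("wss://example.com/app");  // single physical connection

ws.onmessage = (ev) => {
  // Broadcast every server message to all connected tabs.
  for (const p of ports) p.postMessage(ev.data);
};

(self as any).onconnect = (e: MessageEvent) => {
  const port = e.ports[0];
  ports.push(port);
  // Forward messages from any tab onto the shared WebSocket
  // (open-state checks and error handling omitted).
  port.onmessage = (msg) => ws.send(msg.data);
};

// In each tab (same origin):
//   const worker = new SharedWorker("ws-shared-worker.js");
//   worker.port.onmessage = (e) => handle(e.data);
//   worker.port.postMessage("hello over the shared WS");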


> > For load balancer vendors, it's better that multiplexing is standardized
> > than that each service provider develops its own.
>
> I don't have experience running G scale infrastructure. However, I'd like
> to understand why LBs need to have a look into WS at all .. why not Layer-4
> LB? Doing balancing based on hash of source IP?
>

Traffic for various services would be multiplexed into one TCP connection.
We want to demultiplex it and forward it to the designated backends. This
means that there are again lots of TCP connections between the LB and the
backends, but we can apply any optimization we want there.
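
To sketch the LB side (purely illustrative; the frame parsing is elided and
the backend mapping and hostnames are whatever the operator configures):

import * as net from "net";

// Hypothetical mapping: mux channel id -> backend service address.
const backends: { [channel: number]: { host: string; port: number } } = {
  1: { host: "chat-backend.internal", port: 9001 },
  2: { host: "bulk-backend.internal", port: 9002 },
};

const backendConns = new Map<number, net.Socket>();

function forward(channelId: number, payload: Buffer) {
  let conn = backendConns.get(channelId);
  if (!conn) {
    const b = backends[channelId];
    conn = net.connect(b.port, b.host);  // one LB->backend TCP per logical channel
    backendConns.set(channelId, conn);
  }
  conn.write(payload);
}

// For each mux frame read off the single client-facing TCP connection:
//   forward(frame.channelId, frame.payload);
// The point: the LB only has to parse the (uncompressed) mux header,
// never the application payload itself.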


> > Basically multiplexing should be done as close to the network as
> > possible. If it's done in the application layer, for example when
> > compression is used, such a load balancer needs to decode the
> > compression to see the multiplexing information. That doesn't make
> > them happy.
>
> This seems to assume that
>
> - either the LB is done above layer 4 and/or
> - the logical WS connections contained in the physical WS would need to be
> balanced to _different_ backend nodes.
>

Yes. Sorry that we haven't explained in detail the story we have in mind.


> > As Google, our motivation for mux is to make sure that the per-server
> > cost (memory, ports, machines, etc.) of the total meaningful traffic is
> > small enough compared to the HTTP/Comet approach.
>
> I see. For me, I am mostly interested in the use case: one app doing both
> chatty and mass-data comms.
>

OK