Re: [hybi] Additional WebSocket Close Error Codes

Tobias Oberstein <tobias.oberstein@tavendo.de> Thu, 07 June 2012 11:50 UTC

From: Tobias Oberstein <tobias.oberstein@tavendo.de>
To: Jamie Lokier <jamie@shareable.org>, Alexey Melnikov <alexey.melnikov@isode.com>
Date: Thu, 07 Jun 2012 04:50:02 -0700
Cc: Hybi <hybi@ietf.org>

> > 1012/Service Restart
> > 1012 indicates that the service is restarted. a client may reconnect,
> > and if it choses to do, should reconnect using a randomized delay of 5
> > - 30s.
> >
> > Use case:
> > restart a service with 100k clients connected clients present an
> > informative user notification ("service restarting .. reconnecting in
> > N secs) clients should not reconnect all at exactly the same time ..
> > thus the randomized delay
> 
> On the other hand, if you have almost real-time requirements and you simply
> restarted a service, sometimes 5-30s is uncomfortable long.
> For some applications it means the client is effectively frozen for 5-30s.
> Sometimes, you want restarting the service to be as invisible to the user as
> reasonably possible.
> 
> Your server knows the current load, so has an idea if it could handle immediate
> or faster reconnection attempts.  On an intranet, you might want it randomised
> as 0-1s for 10k clients.
> 
> Any reason you chose this length of time?  Could a suggested delay be included
> with the error, with 5-30s being the default?

When I suggested that use case, the 5-30s was merely a rough, unscientific guess at what
range could make sense in a public WAN server scenario. The 5s lower bound gives the server
time to restart at all, and the 30s upper bound leaves 25s for 100k clients to reconnect,
which works out to:

100k/25s = 4k/s opening WebSocket handshakes sustained

But yes, ultimately the interval should be decided by the server based on (at least)

* expected server restart time
* LAN vs WAN
* # of connected clients
* opening handshake rate the server can do
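As a rough sketch of how a server might derive the interval from these factors: the function name and parameters below are hypothetical, and LAN vs WAN is not modelled separately (it would only show up through the restart time and handshake rate passed in).

```python
import math

def reconnect_interval(clients, handshakes_per_sec, restart_secs):
    """Return (low, high) bounds in seconds for the randomized reconnect delay.

    low  = time the server needs to come back up
    high = low + time needed to absorb all clients at the sustained
           opening-handshake rate the server can handle
    """
    if handshakes_per_sec <= 0:
        raise ValueError("handshakes_per_sec must be positive")
    spread = math.ceil(clients / handshakes_per_sec)
    return restart_secs, restart_secs + spread

# The public WAN guess from above: 100k clients, 4k handshakes/s, 5s restart.
print(reconnect_interval(100_000, 4_000, 5))  # (5, 30)
```

With 10k intranet clients, a server that restarts instantly and sustains 10k handshakes/s, the same function gives a 0-1s interval.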

For 1012, the server could provide a close reason (string) such as:

[5,30]

that is, the lower and upper bound (in seconds) of the randomized reconnect interval, comma-separated, as a JSON list.
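On the client side, handling such a reason could look roughly like the sketch below. The function name is made up for illustration, and falling back to 5-30s when the reason is missing or malformed is an assumption taken from the default suggested earlier in the thread, not something any spec defines.

```python
import json
import math
import random

DEFAULT_INTERVAL = (5.0, 30.0)  # assumed fallback, per the 5-30s suggestion

def reconnect_delay(reason, rng=random):
    """Pick a random reconnect delay (seconds) from a 1012 close reason.

    `reason` is expected to be a JSON list "[low, high]". Anything else
    (empty string, non-JSON text, wrong shape, bad bounds) falls back to
    DEFAULT_INTERVAL.
    """
    try:
        low, high = json.loads(reason)
        low, high = float(low), float(high)
    except (TypeError, ValueError):
        low, high = DEFAULT_INTERVAL
    if not (0 <= low <= high and math.isfinite(high)):
        low, high = DEFAULT_INTERVAL
    return rng.uniform(low, high)

delay = reconnect_delay("[5,30]")
print(f"service restarting .. reconnecting in {delay:.0f} secs")
```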

> 
> > 1013/Service Overload
> > 1013 indicates that the service is experiencing overload. a client
> > should only connect to a different IP (when there are multiple for the
> > target) or reconnect to the same IP upon user action.
> 
> Maybe it could suggest which different IP(s) (and/or URL(s))?

The close reason could again be a JSON list, e.g.:

["ws://2nd.example.com", "ws://3rd.example.com:9000", "wss://4th.example.com", "ws://62.146.25.34"]

The client could interpret that as a priority-sorted list of servers to connect to.
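A minimal sketch of both ends, with hypothetical helper names. One real constraint matters here: RFC 6455 limits control frame payloads to 125 bytes, and 2 of those carry the status code, so the reason text can be at most 123 bytes. The server list has to fit in that.

```python
import json
from urllib.parse import urlsplit

MAX_REASON_BYTES = 123  # 125-byte control frame payload minus 2-byte status code

def encode_reason(urls):
    """Server side: encode the alternative servers as a compact JSON list."""
    reason = json.dumps(urls, separators=(",", ":"))
    if len(reason.encode("utf-8")) > MAX_REASON_BYTES:
        raise ValueError("server list does not fit into a close reason")
    return reason

def _is_ws_url(value):
    """True if value is a string that looks like a ws:// or wss:// URL."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        return False
    return parts.scheme in ("ws", "wss") and bool(parts.hostname)

def fallback_servers(reason):
    """Client side: return the ws/wss URLs from a 1013 reason, in list order."""
    try:
        entries = json.loads(reason)
    except (TypeError, ValueError):
        return []
    if not isinstance(entries, list):
        return []
    return [entry for entry in entries if _is_ws_url(entry)]
```

The client would then try the returned URLs front to back and stop at the first one that accepts the connection.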

The priorities could also be given explicitly:

[["ws://2nd.example.com", 2], ["ws://3rd.example.com:9000", 5], ["wss://4th.example.com", 12], ["ws://62.146.25.34", 9]]

where the client then connects to a server from the list with a probability proportional to its priority value, e.g. to

ws://3rd.example.com:9000

with probability

5 / (2+5+12+9)

and so on.
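The weighted pick can be sketched with Python's `random.choices`, which selects with probability proportional to the given weights. The function names are hypothetical, and the sketch assumes the reason is well-formed JSON of `[url, priority]` pairs as in the example above.

```python
import json
import random

def selection_probabilities(reason):
    """Map each server URL to priority / sum(priorities)."""
    entries = json.loads(reason)
    total = sum(priority for _, priority in entries)
    return {url: priority / total for url, priority in entries}

def pick_server(reason, rng=random):
    """Pick one server with probability proportional to its priority value."""
    entries = json.loads(reason)
    urls = [url for url, _ in entries]
    priorities = [priority for _, priority in entries]
    return rng.choices(urls, weights=priorities, k=1)[0]

reason = ('[["ws://2nd.example.com", 2], ["ws://3rd.example.com:9000", 5], '
          '["wss://4th.example.com", 12], ["ws://62.146.25.34", 9]]')
probs = selection_probabilities(reason)
print(probs["ws://3rd.example.com:9000"] == 5 / (2 + 5 + 12 + 9))  # True
print(pick_server(reason))
```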

> 
> -- Jamie