Re: [hybi] NAT reset recovery? Was: Extensibility mechanisms?

Vladimir Katardjiev <vladimir@d2dx.com> Mon, 19 April 2010 13:35 UTC

From: Vladimir Katardjiev <vladimir@d2dx.com>
Date: Mon, 19 Apr 2010 15:34:25 +0200
To: Hybi <hybi@ietf.org>
Subject: Re: [hybi] NAT reset recovery? Was: Extensibility mechanisms?

On 19 apr 2010, at 14.10, Jamie Lokier wrote:
> Keepalives have two roles, and they actually have *distinct* timing
> requirements:
> 
>    1. To tell NATs / SPIs / TCP relays / HTTP proxies (CONNECT)
>       not to drop the connection.
> 
>    2. To tell the client application that the connection is still
>       present so it knows when to abort and make a *new* connection.
> 
> The timeout needed for 1 is typically larger than the timeout for 2
> but it depends on the app.
> 
> 1 is network dependent; 2 is application dependent.

#2 is the key, at least to me, and #1 a happy coincidence. The keepalive can tell the application when the connection is NOT there. To put this in context, I have a very simple expectation of the WebSocket (browser) API.

onconnect() should tell me when the connection is ready for use.
onclose() should tell me when the connection is dead. I can't use it anymore, so I need to make a new one.

This is because after onconnect() I should be able to assume a persistent connection (that's the point of WebSockets, no?) until the browser tells me otherwise. Every application that depends on user-generated input (or unpredictable input in general) needs this. Unpredictable inputs are things the application can't know are going to happen, so the application code sits silent, waiting for an event to occur.

Take, for no particular reason, the case of a web-based Tic Tac Toe application. Suppose it is the opponent's turn to move, but he has gone AFK. Meanwhile, some NAT on the path decides to close the connection. Since it's not my turn, my client has no reason to send traffic over the connection. Nor does it have any set time by which it expects a response: because the other party is human, the application has no way to know when he'll move. So nothing tells my client that the connection is gone.

Suppose instead we added a way for one endpoint (for the purposes of this discussion, let's say the client, but I don't want to exclude the server /or an intermediary/) to send a keepalive message as part of the protocol. It would be propagated to the endpoint in the given direction, and the specification would just say "echo it back". The implementation burden on the server end is minimal, but it opens the door to a number of improvements.

- Browsers can opt to include an adaptive keepalive algorithm that keeps tabs on connection idling and automatically sends a keepalive after a certain amount of time with no traffic. The trivial JS implementation, by contrast, is setInterval(sendkeepalive, 30000), which blindly sends a keepalive every 30 seconds even if there's activity on the line.

- Nodes that know their connections won't be interrupted can opt to drop the keepalives between themselves and just regenerate them at the edges. This may not be sexy, but now apply it between a mobile phone client and its first-hop proxy. If it's a WS-aware proxy and there's some form of keepalive negotiation, the two can agree that keepalives are unnecessary on that hop, or that a longer-than-usual interval will do (possible within a controlled network), and not send them. For a mobile phone this can yield significant battery savings (or, put the other way round, keepalives alone may drain a smartphone's battery in less than a day). The keepalives can be regenerated at the edges, so the target server wouldn't see anything different.

- The client can actually be told when a connection is dropped by onclose(wasClean=false) being triggered, and thus reestablish the connection. I mean, if you open a connection, surely you want to know if it's no longer available...

- Our hypothetical amateur programmer, testing his websocket connection against localhost, won't see the need for keepalives, but the protocol requiring them will stop his application from failing on the Wild Web. All he needs to do is echo back the bytes the browser sent, and the expert browser programmer handles the rest.
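The "echo it back" rule really is small enough for the amateur's server. Below is a minimal JavaScript sketch of the receiving side, assuming a hypothetical frame shape `{ type, echo, payload }`; nothing here comes from the current draft, and `handleIncomingFrame`, `sendFrame` and `deliverToApp` are made-up names. The `echo` flag is also an assumption: without some way to mark a reply, two endpoints that both follow the rule would bounce the same keepalive back and forth forever.

```javascript
// Hypothetical frame shape (an assumption, not from any draft):
//   { type: "keepalive" | "data", echo?: boolean, payload?: any }

// Receiving side of the "echo it back" rule.
function handleIncomingFrame(frame, sendFrame, deliverToApp) {
  if (frame.type === "keepalive") {
    if (!frame.echo) {
      // A fresh keepalive from the peer: send it straight back, marked as an
      // echo so the peer does not echo it again.
      sendFrame({ type: "keepalive", echo: true, payload: frame.payload });
    }
    // Keepalives and their echoes are protocol traffic; the application never
    // sees them.
    return;
  }
  deliverToApp(frame);
}
```

Everything else about timing and failure detection stays with the side that originates the keepalive.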

So I have trouble seeing which applications would _not_ want keepalive functionality as part of the base offering of WebSockets.
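The browser-side half (the adaptive keepalive from the first point and the dead-connection detection from the third) might look roughly like the sketch below. It is hypothetical throughout: `KeepaliveMonitor`, `sendKeepalive` and `onDead` are invented names, and the clock is passed in explicitly rather than read from timers so the logic is easy to follow. `idleMs` corresponds to Jamie's role 1 (keep the NATs happy) and `replyTimeoutMs` to role 2 (tell the application when to give up). Unlike sending on a fixed 30-second timer, it only probes after `idleMs` with no traffic in either direction.

```javascript
// Hypothetical adaptive keepalive monitor. The clock is passed in as `now`
// (milliseconds) so the logic does not depend on real timers.
class KeepaliveMonitor {
  constructor({ idleMs, replyTimeoutMs, sendKeepalive, onDead }) {
    this.idleMs = idleMs;                 // role 1: probe before NATs/proxies time out
    this.replyTimeoutMs = replyTimeoutMs; // role 2: how long to wait for an echo
    this.sendKeepalive = sendKeepalive;   // callback that puts a keepalive on the wire
    this.onDead = onDead;                 // callback run once when the peer stops answering
    this.lastActivity = 0;                // last frame seen in either direction
    this.pendingSince = null;             // send time of an unanswered keepalive
    this.dead = false;
  }

  start(now) {
    this.lastActivity = now;
  }

  // Call for every frame the application sends.
  noteSent(now) {
    this.lastActivity = now;
  }

  // Call for every frame received, keepalive echoes included: any inbound
  // traffic shows the path still works.
  noteReceived(now) {
    this.lastActivity = now;
    this.pendingSince = null;
  }

  // Call periodically with the current time.
  tick(now) {
    if (this.dead) return;
    if (this.pendingSince !== null) {
      if (now - this.pendingSince >= this.replyTimeoutMs) {
        this.dead = true;
        this.onDead();
      }
      return; // one outstanding probe at a time
    }
    if (now - this.lastActivity >= this.idleMs) {
      this.sendKeepalive();
      this.pendingSince = now;
      this.lastActivity = now;
    }
  }
}
```

In a browser, `tick` could be driven by `setInterval(() => monitor.tick(Date.now()), 1000)`, and `onDead` is the natural place to fire onclose(wasClean=false) and open a new connection. One design caveat: because outbound traffic counts as activity, a client that sends constantly but never hears back is not probed; a stricter variant would start probing on inbound silence alone.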

> 
>>   but I bet NATs are smart enough to break even there too.  Data is
>>   needed....
> 
> I can give you personal experience.  Yes: NATs do break connections
> with traffic on them.

Of course, NATs are not the only thing that can fail silently. Connections are also commonly lost for other reasons, anything from a cut cable to someone on a wireless connection going into a tunnel.

(This only appears to contradict what I said above about mobile keepalives if you assume all mobile networks are equal. They're not. What I mean is that even though we assume, in the general case, that the connection WILL fail if left on its own, we should still make it possible to keep it alive more efficiently where the network allows.)
> 
>>   I agree this is an important point. If I'm writing an application on
>>   top of the WebSocket API, is it my responsibility to deal with
>>   keepalives and connection failures due to NAT timer resets or reboots,
>>   or does the protocol do something for me in this regard? I suppose the
>>   current draft (-75) is pretty silent on this, which means it's all up
>>   to the application programmer. That sounds a bit unreasonable if we
>>   want the app development to be really simple. Probably, even if the
>>   app developers figured out how to do it, they probably would not do it
>>   in an optimal manner. So, I suppose it would be good to have either an
>>   extension (or a subprotocol) to deal with this. At least it's a piece
>>   of code that all client side apps will need anyway. I am not sure to
>>   which level it needs to be standardized, but there has to be some way
>>   to get it done.
[...]
> There is currently no consensus on whether the application should
> handle these network issues itself, or if the WebSocket implementation
> should play a role.
> 
> [...]
> 
> Proxy and server implementers seem to prefer that WebSockets handles
> it, so application programmers get a robust pipe without having to deal
> with these issues (and it might use the network a bit better).

This really depends on how you define "robust". I'm okay with WebSockets failing because the recovery conditions aren't necessarily easy to define, and for some values of robustness you need to do stuff like deferring messages on the server-side, and then you need to identify the connection that requested them, and then authenticate it (if needed) and then you need to determine when you're NOT waiting for the robustness recovery, and it just goes on and on forever, much like this sentence.

So, yeah. Failure is good. My preference, though, is that the protocol itself takes care of the failing part when the failure is due to network conditions, so that everything written on top of the protocol doesn't have to fall into the same old networking traps every. single. time. It doesn't hurt if the protocol also minimises the risk of failure, and, go figure, keepalives do both. Keepalives for President!

Vladimir