Re: [hybi] NAT reset recovery? Was: Extensibility mechanisms?

Vladimir Katardjiev <vladimir@d2dx.com> Tue, 20 April 2010 07:15 UTC

On 20 apr 2010, at 08.03, Pieter Hintjens wrote:

> On Tue, Apr 20, 2010 at 3:40 AM, Jamie Lokier <jamie@shareable.org> wrote:
> 
>> Perhaps the missing thing is that request-response is more complicated
>> in this kind of protocol, because you also have to define a way to
>> _associated_ each response with the request.  That means sequence
>> numbers, or identifiers, or being careful with counting response
>> messages distinct from non-response messages.
> 
> This is pretty much the problem we solved in AMQP by using async KAs.

Is it really necessary to match responses to requests? The point is just to see whether the connection is alive, no? All the keepalive would do is tell the server "hey, can you send me something". Logically, if the server is already sending data then it shouldn't need to send a keepalive back (because there's no need to establish liveness). So what interest would we have in mapping these keepalive messages to each other? What semantics would that give us? The best I can figure out is that it gives the req/resp latency...

I mean, I can't really see why you'd need to match keepalive "responses" to the keepalive "request" that triggered them. Since we're mainly interested in ascertaining liveness, doesn't the fact that we receive a keepalive suffice? If there's traffic on the line, we can already deduce liveness, and if there's no traffic on the line, we can't send two keepalives in a row (because we'd time out if we didn't receive some traffic in response to the first one).
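To make that concrete, here is a rough client-side sketch of liveness checking with no request/response matching at all. Everything in it is an assumption for illustration: the class name, the method names, and the use of a single `timeout` for both the idle period and the wait after a keepalive. Any received frame counts as proof of liveness, and at most one keepalive is ever outstanding.

```python
class LivenessTracker:
    """Client-side liveness check that never matches replies to requests.

    Hypothetical sketch: names and the single `timeout` value are
    illustrative assumptions, not taken from any draft.
    """

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout      # idle seconds tolerated before acting
        self.last_rx = now          # time the last frame was received
        self.ka_sent_at = None      # set while one keepalive is outstanding

    def on_receive(self, now):
        # Any frame at all (data or keepalive) proves the link is alive.
        self.last_rx = now
        self.ka_sent_at = None

    def poll(self, now):
        """Return 'ok', 'send_keepalive' or 'dead'."""
        if self.ka_sent_at is not None:
            # A keepalive is outstanding, so we never send a second one:
            # either traffic arrives or we time out.
            if now - self.ka_sent_at >= self.timeout:
                return "dead"
            return "ok"
        if now - self.last_rx >= self.timeout:
            self.ka_sent_at = now
            return "send_keepalive"
        return "ok"
```

Note that there is no identifier or sequence number anywhere: the only state is two timestamps.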

The reason I suggested "echo" though (at least for a basic case) is that it fulfils the "amateur programmer" requirement. The server is completely stateless: no timers, no nothing. It doesn't even need to track whether it is currently sending.

(To put it in RFC terms: if the server receives a KA, it MUST send some data to the client; if the server is already sending application data, it MAY skip sending a keepalive back.)
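As a sketch of how little the server needs, the MUST/MAY rule could look roughly like this. The `KEEPALIVE` marker, the function name and the `app_data_queued` flag are made up for the example; no frame format is implied.

```python
KEEPALIVE = "KA"  # hypothetical frame marker, not a real wire format


def on_frame(frame, app_data_queued=False):
    """Return the frames the server sends in reaction to one received frame.

    The simplest server never passes `app_data_queued`, so it keeps no
    state at all and echoes every keepalive it receives.
    """
    if frame != KEEPALIVE:
        return []            # application frames are handled elsewhere
    if app_data_queued:
        return []            # MAY skip: outgoing data already proves liveness
    return [KEEPALIVE]       # MUST send something back
```

A trivial implementation only ever calls `on_frame(frame)`; the optional flag is there for servers that happen to know they are mid-send.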
> 
>> There are also subtleties: If the other end sends a graceful close
>> message, and then receives a KA request, does it send a KA response,
>> or does the graceful close mean that KA response is among the message
>> types that won't be sent?
> 
> Indeed... having to respond to a KA request makes a graceful close
> much more complex.

I think this problem applies generally. Graceful close was supposed to let the remote side empty its buffer. But suppose it has 500 GB in its sending buffer. That's going to take a while to gracefully close, so keepalives are still needed during this time period. And since the side that already sent the graceful close won't send any more _data_, it needs to send keepalives instead.

Once both sides have sent a graceful close, what would be the point of replying to a keepalive? If the connection is alive, it would be closed first. If the connection is dead it's not going to make a world of difference.
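Spelled out as a decision rule, that might look something like the following. This is only my reading of the cases above; the function name, argument names and return values are invented for the sketch.

```python
def keepalive_reply(local_close_sent, remote_close_sent, data_queued):
    """Pick what to send when a keepalive arrives.

    Returns 'nothing', 'data' or 'keepalive'. Hypothetical sketch.
    """
    if local_close_sent and remote_close_sent:
        # Both sides are closing: a live link will be torn down anyway,
        # and a dead one gains nothing from a reply.
        return "nothing"
    if local_close_sent:
        # We promised to send no more data, so a keepalive is all we have.
        return "keepalive"
    if data_queued:
        # Data already on its way proves liveness by itself.
        return "data"
    return "keepalive"
```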

(Of course, there is an argument for async KAs here, because if the server sent the graceful close, and the client is flushing a single 500 GB frame, it can't send any KA frames, so the server would never echo back a KA in synchronous mode. Perhaps some sort of hybrid solution? It's a tricky question though, because my next point is also a concern...)
> 
>> However for packet efficiency, synchronising timing by using
>> request-response, or another synchronisation method, may still be
>> advantageous.  Even though it's more complicated than async keepalive.
> 
> Complexity is always worth removing unless the overhead is significant
> and since KAs can be entirely switched off when there is traffic at
> all, and then reduced to once per 5 or 10 or 30 seconds, packet
> efficiency seems irrelevant here.

Well, to be precise, my main concern isn't _packet_ efficiency so much as _power_ efficiency. Async keepalive keeps two separate timers for when to send, one on each side, so in the worst case (both intervals equally long, firing out of phase) keepalives hit the link at double the frequency that synchronised keepalives would.
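A back-of-the-envelope simulation of that worst case, counting how often the radio has to wake up. The numbers are purely illustrative assumptions: 30-second intervals, a radio that stays up 5 seconds after each event, a 5-minute window, and an echo that arrives 1 second after the request.

```python
def fire_times(interval, offset, duration):
    """Times at which one periodic keepalive timer fires within [0, duration)."""
    times = []
    t = offset
    while t < duration:
        times.append(t)
        t += interval
    return times


def radio_wakeups(event_times, tail):
    """Count radio wake-ups, assuming the link stays up `tail` seconds
    after each send or receive."""
    wakeups = 0
    up_until = float("-inf")
    for t in sorted(event_times):
        if t > up_until:
            wakeups += 1            # radio was idle, must be woken
        up_until = max(up_until, t + tail)
    return wakeups


# Async: client and server timers run independently, half an interval apart.
async_events = fire_times(30, 0, 300) + fire_times(30, 15, 300)

# Synchronised: the server echoes 1 s after each client keepalive.
client = fire_times(30, 0, 300)
sync_events = client + [t + 1 for t in client]

print(radio_wakeups(async_events, tail=5))  # 20
print(radio_wakeups(sync_events, tail=5))   # 10
```

Same number of frames on the wire in both cases; the difference is only in how many times the radio is forced out of idle.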

In a mobile scenario, what costs the most is radio uptime, not bandwidth. Whenever you need to set up a radio link to receive data, you need to spend a certain amount of time doing control signaling (during which time you're drawing power, but not getting any data). To minimise this, after a data link is set up it stays active for a couple of seconds, before going idle again.

For the purposes of keepalives, uncoordinated keepalives will almost always force the radio to wake up. This is bad enough for uncoordinated keepalives between different applications, but the problem would be doubled if each connection had two uncoordinated keepalives. All it takes is three uncoordinated keepalive timers to drain a modern smartphone's battery in the timespan of a single day -- and this is doing nothing but sending keepalives. If each connection has two uncoordinated timers, that means all it takes is two connections to empty the battery in a single day, and it only goes downhill from there.

This leaves us in the paradoxical scenario where sending keepalives from the client (slightly) more frequently than the keepalive timeout is actually more efficient, because they can be sent when the radio is already up. But this is way overkill for a trivial WebSocket implementation. My interest is to have a base case of keepalives that is simple enough so that every server supports it, but flexible enough that I can have a smart client (smart phone? har har har) that can coerce the server into behaving well. Or that an even smarter client could cooperate with an intermediary on liveness and avoid many keepalives.
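For the "smart client" side, the piggybacking idea could be sketched as below. The function, its parameters and the `early_fraction` knob are all hypothetical; in particular, how a client learns that the radio is up is platform-specific and not covered here.

```python
def should_send_keepalive(now, last_traffic, timeout, radio_up,
                          early_fraction=0.5):
    """Decide whether the client should send a keepalive right now.

    `early_fraction` is an assumed tuning knob: once this share of the
    timeout has elapsed, a keepalive is sent early if the radio happens
    to be up already (for example because another app just used it).
    """
    idle = now - last_traffic
    if idle >= timeout:
        return True     # deadline reached: send even if it wakes the radio
    if radio_up and idle >= early_fraction * timeout:
        return True     # radio is up anyway: a keepalive now is almost free
    return False
```

The server side stays the dumb echo; all of the cleverness lives in the client, which is the split I'm after.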

Vladimir

PS: Thanks Jamie for your in-depth replies! I'd like to look more into the per-hop keepalive option, especially if it can be done in an implicit manner.