Re: [hybi] NAT reset recovery? Was: Extensibility mechanisms?

Jamie Lokier <jamie@shareable.org> Tue, 20 April 2010 16:55 UTC

Date: Tue, 20 Apr 2010 17:55:42 +0100
From: Jamie Lokier <jamie@shareable.org>
To: Markus.Isomaki@nokia.com
Message-ID: <20100420165542.GB11723@shareable.org>
Cc: hybi@ietf.org, Martin.Thomson@andrew.com
Subject: Re: [hybi] NAT reset recovery? Was: Extensibility mechanisms?

Markus.Isomaki@nokia.com wrote:
> Martin Thomson wrote:
> >
> >Furthermore, you don't need a response, TCP provides ACKs.  A 
> >one-way exchange of as little as a single byte should be 
> >enough to keep any bindings alive in (TCP) intermediaries.
>
> This is true. 

It doesn't work.

It's fine for intermediaries: they must be willing to accept data in
either direction as sufficient to prevent disconnection, because it's
quite common that data flows in one direction only for long periods.

But with that mechanism, the client won't be able to detect that the
connection is down and open a new one in a reasonable time.  Two reasons:

1. There is usually no operating system interface for the client to
detect TCP ACKs.

2. Any intermediaries will send TCP ACKs.  These are not confirmation
that the next hops and server are responding.

Remember that keepalive isn't just to keep the intermediaries' state
alive; it's also to inform the endpoints that the link is still alive
and doesn't need recreating.

> I would say there are two options to do keepalives:
> i) Define a keepalive frame that the client sends. There is no
> websocket level response but the TCP ACK is still enough to keep
> NATs, firewalls etc. open.

Works for intermediaries, but fails in multiple ways as described above.

> ii) Define a keepalive request frame that the client sends and
> keepalive response that the server uses to respond to it. The
> response is not really necessary for NATs etc., but might help the
> client to notice a broken connection faster. (I'm not sure if this
> is true but I remember this was an argument in another protocol,
> SIP, why keepalive over TCP was done in that way.) The response may
> make the protocol more complicated, though, as people have
> commented.

Yes, massively faster.  If I recall correctly, TCP takes about 20
minutes to detect send failures.  That's too long for most interactive
applications to detect and recreate broken connections.
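As a rough cross-check on that figure, the sketch below sums TCP's exponentially backed-off retransmission timeouts. The parameters are assumptions taken from Linux's documented defaults (tcp_retries2 = 15, minimum RTO 200 ms, maximum RTO 120 s); Linux documents the resulting 924.6 s (about 15 minutes) only as a lower bound, because the real RTO grows with the measured round-trip time, and other operating systems use different values.

```python
def linux_retransmit_give_up_time(retries: int = 15,
                                  rto_initial: float = 0.2,
                                  rto_max: float = 120.0) -> float:
    """Approximate how long TCP keeps retransmitting unacknowledged data.

    Assumes Linux-style exponential backoff: each retransmission timeout
    (RTO) doubles, capped at rto_max. The loop sums the timeout of the
    original transmission plus `retries` retransmissions.
    """
    total = 0.0
    rto = rto_initial
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)  # double the timeout, but never above the cap
    return total


print(round(linux_retransmit_give_up_time(), 1))  # 924.6 seconds, about 15.4 minutes
```

Either way, the failure surfaces many minutes after the peer vanished, which is the point being made.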

But in most situations that response is redundant if the server has
sent data within the timeout interval.  So why insist on a
request-response pattern if it's adding redundant messages?
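A minimal sketch of that idea for one endpoint, assuming the peer answers any probe with some data: ordinary inbound traffic counts as proof of liveness, so a probe is sent only after `idle` seconds of silence, and the connection is declared dead if nothing arrives within `timeout` seconds of the probe. The class, the `Action` values and both parameters are hypothetical names for illustration, not part of any WebSocket draft.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"
    SEND_PROBE = "send_probe"
    DECLARE_DEAD = "declare_dead"


class KeepaliveTimer:
    """Hypothetical per-connection keepalive logic that suppresses redundant probes."""

    def __init__(self, idle: float, timeout: float, now: float) -> None:
        self.idle = idle          # silence tolerated before probing (seconds)
        self.timeout = timeout    # wait for any reply after a probe (seconds)
        self.last_inbound = now
        self.probe_sent_at: Optional[float] = None

    def on_inbound(self, now: float) -> None:
        # A probe reply or any ordinary application message resets the timer.
        self.last_inbound = now
        self.probe_sent_at = None

    def tick(self, now: float) -> Action:
        if self.probe_sent_at is not None:
            # A probe is outstanding: wait for a reply until the deadline.
            if now - self.probe_sent_at >= self.timeout:
                return Action.DECLARE_DEAD
            return Action.NONE
        if now - self.last_inbound >= self.idle:
            # Nothing received for a whole idle period, so ask the peer.
            self.probe_sent_at = now
            return Action.SEND_PROBE
        return Action.NONE
```

While data flows the timer never probes, so the extra messages appear only on genuinely idle connections.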

> I would not want to make the keepalives peer-to-peer in a sense that
> also the server could send keepalives. I think the client should be
> in charge. The reason is that as a client developer for a mobile
> device I would like to be able to send keepalives as infrequently as
> possible. We have had some bad experiences with IMAP servers sending
> stuff frequently back to the client (presumably because not all
> clients are clever enough to do any keepalives themselves - this is
> a risk of course if we don't specify this at all in websocket spec),
> and the client has no way to affect their rate, which has been
> fixed.

No!  That's too 2010 and too application-specific.  What about when
the server is on your mobile phone and the client is in a data centre?
Think it won't happen?  What about that new Opera thingy where the
browser contains a web server for other browsers to use...

You can't assume the client side will always be the one with stricter
requirements, even in the usual case of a user's web browser.  Servers
have resource limitations and need to manage idle connections.
1M+ connections to a single cluster is a fair implementation target;
then it matters.

> If there really is a requirement for the server to see keepalives at
> a certain minimum rate (for instance so that it can abandon stale
> connections), I would then have that negotiated so that the server
> can tell that back to the client who would still do the sending. But
> I'm not sure if that's needed.

This is not optional.

It is essential to abandon stale connections, otherwise servers run
out of resources over time - no matter how much memory they have.
Then they crash.

The only way to detect stale connections is to send something or expect
to receive something.  TCP won't detect staleness if there's no activity.
(Useful word for blackholed connections, "stale", btw.  Thanks.)
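On the server side, "expect to receive something" can be sketched as a reaper that records the last inbound activity per connection and periodically evicts connections that have been silent too long. This is a minimal illustration under stated assumptions: connection IDs are opaque hashable values, timestamps passed in never go backwards, and the `StaleConnectionReaper` name and its methods are hypothetical. Keeping entries in least-recently-active order makes recording activity O(1), and a sweep stops at the first connection that is still fresh, which matters at the 1M-connection scale mentioned above.

```python
from collections import OrderedDict
from typing import Hashable, List


class StaleConnectionReaper:
    """Evict connections with no inbound activity for `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        # conn_id -> last inbound time, oldest activity first
        self._last_seen: "OrderedDict[Hashable, float]" = OrderedDict()

    def on_inbound(self, conn_id: Hashable, now: float) -> None:
        # Record activity and move the connection to the "most recent" end.
        self._last_seen[conn_id] = now
        self._last_seen.move_to_end(conn_id)

    def forget(self, conn_id: Hashable) -> None:
        # Call when a connection closes normally.
        self._last_seen.pop(conn_id, None)

    def sweep(self, now: float) -> List[Hashable]:
        """Remove and return stale connections, oldest first."""
        stale = []
        while self._last_seen:
            conn_id, seen = next(iter(self._last_seen.items()))
            if now - seen < self.timeout:
                break  # everything after this entry is at least as fresh
            self._last_seen.popitem(last=False)
            stale.append(conn_id)
        return stale
```

For example, with a 60 s timeout, activity on "a" at t=0 and t=50 and on "b" at t=10, `sweep(70)` returns `["b"]`; the caller then closes those sockets and frees their resources.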

Servers can send their own keepalive probes using the request-response
pattern, or by negotiated request to the client to send periodic
keepalives during idle periods, or by the packet-optimal synchronised
combination described in an earlier mail from me.
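The second option, a negotiated request to the client, could look roughly like the sketch below. It assumes a hypothetical handshake parameter in which the server advertises the longest idle period it tolerates; the client keeps its own (possibly battery-driven) preference but stays inside that limit with a safety margin, and the server still enforces its own deadline in case the client does not comply. `KeepaliveOffer`, `max_idle`, `margin` and `grace` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepaliveOffer:
    # Hypothetical handshake field: longest idle period (seconds) the server
    # tolerates before treating the connection as stale.
    max_idle: float


def client_send_interval(offer: KeepaliveOffer, client_preferred: float,
                         margin: float = 0.8) -> float:
    """How often the client sends a keepalive during idle periods.

    The client may prefer long intervals, but it must stay inside the
    server's limit; `margin` leaves room for network delay.
    """
    return min(client_preferred, offer.max_idle * margin)


def server_considers_stale(offer: KeepaliveOffer, last_inbound: float,
                           now: float, grace: float = 5.0) -> bool:
    """Server-side check, independent of whether the client behaves."""
    return now - last_inbound >= offer.max_idle + grace
```

With `max_idle=60`, a client preferring 300 s ends up sending every 48 s, while a client preferring 20 s keeps its own shorter interval.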

All types have pros and cons, the same as the client side.

If both sides periodically send request-response probes, that's high
overhead.  (Nb: In many applications, keepalives are the most expensive
bandwidth cost!)
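To put rough numbers on that, the sketch below counts keepalive messages per second across a server's connections. The inputs (1M connections, a 30 s interval) are illustrative assumptions, and the count ignores TCP-level ACKs and any suppression of probes while data is flowing, so it only compares the schemes against each other.

```python
def keepalive_messages_per_second(connections: int, interval: float,
                                  sides: int, request_response: bool) -> float:
    """Keepalive messages per second, excluding TCP ACKs.

    sides: how many endpoints independently send keepalives (1 or 2).
    request_response: True if each keepalive is answered (2 messages), else 1.
    """
    per_keepalive = 2 if request_response else 1
    return connections * sides * per_keepalive / interval


# Hypothetical load: 1M idle connections, one keepalive per side every 30 s.
both_req_resp = keepalive_messages_per_second(1_000_000, 30, sides=2, request_response=True)
one_way_single = keepalive_messages_per_second(1_000_000, 30, sides=1, request_response=False)
print(round(both_req_resp), round(one_way_single))
```

Under these assumptions, request-response from both sides costs four times as many messages as a one-way keepalive from a single side.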

Don't make the mistake of thinking a client-only regular ping is the
simplest keepalive to code...  It sounds simpler than it really is once
you add what's needed for stale detection on both sides.

-- Jamie