Re: [hybi] NAT reset recovery? Was: Extensibility mechanisms?

Jamie Lokier <jamie@shareable.org> Tue, 20 April 2010 18:52 UTC

To: Dave Cridland <dave@cridland.net>
Cc: Server-Initiated HTTP <hybi@ietf.org>, "Thomson, Martin" <Martin.Thomson@andrew.com>, "Markus.Isomaki@nokia.com" <Markus.Isomaki@nokia.com>

Dave Cridland wrote:
> First, in IMAP, we've found that the client needs to be in control,  
> as it often knows better - however, others (including Mark Crispin)  
> found that having the server send keepalives when it'd otherwise be  
> shutting down the session is useful. Note that in IMAP, there are  
> technically no actual keepalives, but servers can send an ignorable  
> "* OK", and in extreme cases, can fake a delivery to cause a client  
> to do something. Clients have a NOOP command. I suspect a lot of the  
> issues with IMAP relate to there being no simple way for the server  
> to ping the client built-in to the protocol.
> 
> In XMPP, however, we've found that having the ability to ping the  
> client from the server is very useful. In XMPP, there's two distinct  
> protocol features which enable TCP-level keepalive. Firstly, either  
> side may send a SP character between stanzas. Secondly, each side may  
> actually ping the other. Having the server send a SP on an idle  
> session is useful for defeating NAT timeouts, whereas the ping  
> (XEP-0199) is extremely useful to definitively check reachability.

That is an excellent description, thank you.  I think you got it spot
on with XMPP, having both idle one-way keepalives and a ping option
that either side can use at any time.

The ability to definitively check reachability with a ping is useful
in some situations.  Especially with long keepalives giving a
"probably online (may be 2 minutes out of date)" state, sometimes
you need to upgrade that to "definitely online".
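
To make the "probably online" versus "definitely online" distinction
concrete, here is a rough sketch.  The class name, the `fresh_window`
parameter and the status strings are all made up for illustration; they
are not from XMPP or any WebSocket draft.  The idea is only that passive
keepalives bound how stale the state can be, and a ping reply resets the
staleness to zero.

```python
from typing import Optional


class PeerLiveness:
    """Tracks how fresh our knowledge of the peer is (hypothetical sketch)."""

    def __init__(self, keepalive_interval: float, fresh_window: float) -> None:
        # keepalive_interval: longest gap we expect between peer keepalives.
        # fresh_window: how recently we must have heard from the peer to
        # call it "definitely online" (an assumed tuning knob).
        self.keepalive_interval = keepalive_interval
        self.fresh_window = fresh_window
        self.last_heard: Optional[float] = None

    def on_heard(self, now: float) -> None:
        # Call for any traffic from the peer: data, keepalive, or ping reply.
        self.last_heard = now

    def status(self, now: float) -> str:
        if self.last_heard is None:
            return "unknown"
        age = now - self.last_heard
        if age <= self.fresh_window:
            return "definitely online"
        if age <= self.keepalive_interval:
            return f"probably online (may be {age:.0f} s out of date)"
        return "stale"

    def needs_ping(self, now: float) -> bool:
        # An explicit ping is only worth sending when the state is not fresh.
        return self.status(now) != "definitely online"
```

An application that needs certainty would call `needs_ping()` and, if it
returns true, send a ping and feed the reply into `on_heard()`.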

Because keepalives are the primary bandwidth cost in mostly-idle
applications, I've studied optimising them.  I found that one-way
keepalives on idle connections are the most packet-efficient when the
timeout requirements are asymmetric.  Expediting a keepalive when you
think the local TCP is about to send a delayed ACK (i.e. when you have
just received something after an idle period) increases
packet-efficiency further, and that magically settles into a one-way
request-response pattern when there is a symmetric timeout requirement
and no real data happening.  That can be assembled into a single simple
algorithm, but it's not the one people tend to come up with when they
first think about it.
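
One possible reading of that paragraph, as a minimal sketch.  This is an
interpretation, not the actual algorithm: `KeepaliveScheduler`,
`send_interval` and `expedite_after` are hypothetical names, and
"idle period" is assumed to mean "we have not sent anything for at least
`expedite_after` seconds".  An endpoint with no sending requirement
passes `send_interval=None` and never emits keepalives, which gives the
one-way pattern.

```python
from typing import Optional


class KeepaliveScheduler:
    """Decides when one endpoint should emit a keepalive (hypothetical sketch)."""

    def __init__(self, send_interval: Optional[float],
                 expedite_after: float, now: float = 0.0) -> None:
        # send_interval: longest silence the peer (or a NAT) tolerates from
        # us; None means nobody needs to hear from us.
        # expedite_after: how long we must have been silent before incoming
        # traffic triggers an early keepalive (assumed tuning knob).
        self.send_interval = send_interval
        self.expedite_after = expedite_after
        self.last_sent = now

    def on_send(self, now: float) -> None:
        # Real data counts as a keepalive, so it resets the clock.
        self.last_sent = now

    def on_receive(self, now: float) -> bool:
        # The local TCP is about to send a delayed ACK for what we just
        # received.  If we have been silent for a while, send the keepalive
        # now so it can share a packet with that ACK.
        if self.send_interval is None:
            return False
        if now - self.last_sent >= self.expedite_after:
            self.last_sent = now
            return True
        return False

    def on_timer(self, now: float) -> bool:
        # Fallback: nothing arrived to piggyback on, so send when due.
        if self.send_interval is None:
            return False
        if now - self.last_sent >= self.send_interval:
            self.last_sent = now
            return True
        return False
```

With two endpoints that both use a 60 s interval, whichever timer fires
first sends a keepalive, the other side answers immediately on receipt,
and the first side does not answer back because it has only just sent.
That is the request-response pattern, with no ping-pong loop.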

> >If there really is a requirement for the server to see keepalives  
> >at a certain minimum rate (for instance so that it can abandon  
> >stale connections), I would then have that negotiated so that the  
> >server can tell that back to the client who would still do the  
> >sending. But I'm not sure if that's needed.
> 
> I think we need all cases possible, then some sensible advice to  
> implementors. In the XSF, we've learned a lot about how to defeat NAT  
> timeouts and get early failure detection, and the main thing we've  
> learned is that things change - only a year or two back using  
> long-lived TCP over mobile networks was really difficult, whereas now  
> it's much better.

+1, twice!

> What's more interesting, to me, is what to do if the connection does  
> break.
> 
> If each connection is granted an identifier, and - moreover - the  
> client can request a specific identifier to be reattached during the  
> HTTP Upgrade negotiation at a known sequence point, then in principle  
> a WebSocket needn't be conceptually broken even if connectivity is  
> entirely lost due to a NAT reboot.
> 
> Something very similar happens with XEP-0198, where XMPP clients can  
> reacquire a running XMPP session after TCP is lost.

I've looked at something very similar in my multiplexing/tunnelling
applications, and that is indeed a useful feature.

I'm not sure if it's stated as a requirement, but some of the
WebSocket discussion implies it is *intended* to provide a
quasi-reliable connection between web applications and server.

Unfortunately TCP does not provide that reliability on current networks.

I suspect automatic session resumption would be welcomed with open
arms by application developers, especially those who don't want
anything complicated, just something that's reliable.
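
A rough server-side sketch of the reattachment idea quoted above,
assuming each outbound message carries a sequence number and the client
reports the last one it saw when it reconnects.  `ResumableSession`,
`SessionRegistry` and `reattach` are hypothetical names; this is not
XEP-0198's wire format nor anything in the WebSocket drafts, only an
illustration of the bookkeeping involved.

```python
import secrets
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class ResumableSession:
    """Buffers outbound messages until the client confirms them."""

    def __init__(self) -> None:
        self.session_id = secrets.token_hex(8)  # identifier granted to the client
        self.next_seq = 1
        self.unacked: Deque[Tuple[int, str]] = deque()

    def send(self, payload: str) -> int:
        # Record the message so it can be replayed if the transport dies.
        seq = self.next_seq
        self.next_seq += 1
        self.unacked.append((seq, payload))
        return seq

    def ack(self, last_seq_seen: int) -> None:
        # Drop everything the client has confirmed receiving.
        while self.unacked and self.unacked[0][0] <= last_seq_seen:
            self.unacked.popleft()

    def resume(self, last_seq_seen: int) -> List[Tuple[int, str]]:
        # On reconnect: discard confirmed messages, replay the rest in order.
        self.ack(last_seq_seen)
        return list(self.unacked)


class SessionRegistry:
    """Maps session identifiers to live sessions across reconnects."""

    def __init__(self) -> None:
        self._sessions: Dict[str, ResumableSession] = {}

    def create(self) -> ResumableSession:
        session = ResumableSession()
        self._sessions[session.session_id] = session
        return session

    def reattach(self, session_id: str,
                 last_seq_seen: int) -> Optional[List[Tuple[int, str]]]:
        # Returns the messages to replay, or None if the session is unknown
        # (in which case the client has to start a fresh session).
        session = self._sessions.get(session_id)
        if session is None:
            return None
        return session.resume(last_seq_seen)
```

A real design would also need the mirror image for client-to-server
messages, plus an expiry policy so that abandoned sessions do not hold
buffers forever.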

HTTP and XHR effectively do provide it: the browser retries GET
requests automatically after a timeout if it doesn't get a response.
So web application authors are used to expecting some level of
automatic robustness.

It looks like XMPP people have valuable implementation experience that
is directly relevant.

-- Jamie