Re: [hybi] NAT reset recovery? Was: Extensibility mechanisms?

Jamie Lokier <jamie@shareable.org> Tue, 20 April 2010 20:43 UTC

Date: Tue, 20 Apr 2010 21:43:38 +0100
From: Jamie Lokier <jamie@shareable.org>
To: Vladimir Katardjiev <vladimir@d2dx.com>
Message-ID: <20100420204338.GH11723@shareable.org>
References: <20100419140423.GC3631@shareable.org> <6959E9B3-B1AC-4AFB-A53D-AB3BA340208C@d2dx.com> <B3F72E5548B10A4A8E6F4795430F841832040F78C0@NOK-EUMSG-02.mgdnok.nokia.com> <w2q5821ea241004191309t7362de42p922788d380119dc4@mail.gmail.com> <B3F72E5548B10A4A8E6F4795430F841832040F78DB@NOK-EUMSG-02.mgdnok.nokia.com> <l2v5821ea241004191326i50970f32zbda7f876eda777f1@mail.gmail.com> <B3F72E5548B10A4A8E6F4795430F841832040F78ED@NOK-EUMSG-02.mgdnok.nokia.com> <20100420014041.GD21899@shareable.org> <p2o5821ea241004192303v20afca34vf90bcd4325eb2265@mail.gmail.com> <FC3C2E6A-5BE3-490E-AD73-741511E96491@d2dx.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <FC3C2E6A-5BE3-490E-AD73-741511E96491@d2dx.com>
User-Agent: Mutt/1.5.13 (2006-08-11)
Cc: Hybi <hybi@ietf.org>
Subject: Re: [hybi] NAT reset recovery? Was: Extensibility mechanisms?

Vladimir Katardjiev wrote:
> 
> On 20 apr 2010, at 08.03, Pieter Hintjens wrote:
> 
> > On Tue, Apr 20, 2010 at 3:40 AM, Jamie Lokier <jamie@shareable.org> wrote:
> > 
> >> Perhaps the missing thing is that request-response is more complicated
> >> in this kind of protocol, because you also have to define a way to
> >> _associate_ each response with the request.  That means sequence
> >> numbers, or identifiers, or being careful with counting response
> >> messages distinct from non-response messages.
> > 
> > This is pretty much the problem we solved in AMQP by using async KAs.
> 
> Is it really necessary to match responses to requests? The point is
> just to see if the connection is alive, no? All the keepalive would
> do is tell the server "hey, can you send me something". Logically,
> if the server already is sending data then it shouldn't need to send
> a keepalive back (because there's no need to establish liveness). So
> what interest would we have to be mapping these keepalive messages
> to each-other? What semantics would that derive? The best I can
> figure out is give the req/resp latency...

If you receive a message, it could have been queued in the network for
up to 2 minutes.  So you don't know if the connection is really alive.

I see 20 second transient delays often on mobile networks; and 50
second delays occasionally elsewhere.

Whether you care depends on the application.

> The reason I suggested "echo" though (at least for a basic case) is
> because it fulfils the "amateur programmer" requirement. The server
> is completely stateless, no timers, no nothing. This assumption even
> includes the server not keeping track of whether it's sending or not.

It does involve a timer: You have to decide when to send the request.

Both sides need a timer to clean up stale connections.  The server
cannot be stateless; it must do this.
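
A minimal sketch of that cleanup timer, in Python.  The class name
IdleReaper, the 120 second timeout and the connection ids are all made
up for illustration; none of it comes from a draft.  Time is passed in
explicitly so the logic can be checked without a real clock.

```python
class IdleReaper:
    """Tracks when each connection was last heard from (hypothetical helper)."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds of silence before a connection is stale
        self.last_seen = {}         # connection id -> time of last received data

    def touch(self, conn_id, now):
        # Call whenever anything (data or keepalive) arrives on the connection.
        self.last_seen[conn_id] = now

    def reap(self, now):
        # Called from a periodic timer: forget and return the stale connections.
        stale = [c for c, t in self.last_seen.items() if now - t > self.timeout]
        for c in stale:
            del self.last_seen[c]
        return stale


reaper = IdleReaper(timeout=120.0)   # assumed value, pick per application
reaper.touch("a", now=0.0)
reaper.touch("b", now=100.0)
stale_at_130 = reaper.reap(now=130.0)   # only "a" has been silent > 120s
```

The point is only that per-connection state plus a timer exists on the
server no matter how simple the keepalive message itself is.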

> (To put it in RFC terms, if the server receives a KA, it MUST send
> some data to the client; if the server is already sending
> application data it MAY skip sending a keepalive back)

If the server doesn't send a KA response, and the client receives the
data, the client doesn't know if that data was queued in the network
for up to 2 minutes, so it doesn't know if the server's been dead for
2 minutes.

Whether you care depends on the application.

Usually you don't in this case, but sometimes you want to know if the
server is still alive at the specific moment you send the ping
request.  Either to update an online status accurately, or to confirm
delivery of prior messages.
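
To illustrate why that needs the response tied to the request, here is
a rough Python sketch.  The PingTracker name and the integer id scheme
are hypothetical.  A response carrying the id of a given ping cannot
have been generated before that ping was sent, so it proves the peer
was alive at or after the send time; ordinary data proves nothing of
the sort, because it may have been queued.

```python
import itertools


class PingTracker:
    """Matches keepalive responses to requests by id (illustrative only)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._sent = {}             # ping id -> time the ping was sent

    def send_ping(self, now):
        # Returns the id to put in the outgoing ping message.
        ping_id = next(self._ids)
        self._sent[ping_id] = now
        return ping_id

    def on_pong(self, ping_id, now):
        # Returns (alive_at_or_after, round_trip), or None for an unknown id.
        sent_at = self._sent.pop(ping_id, None)
        if sent_at is None:
            return None
        return (sent_at, now - sent_at)


tracker = PingTracker()
first = tracker.send_ping(now=10.0)
second = tracker.send_ping(now=20.0)
result = tracker.on_pong(second, now=25.0)   # peer alive at or after t=20
```

Without the id, the client receiving a response at t=25 could not tell
whether it answers the ping from t=10 or the one from t=20.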

> I think this problem applies generally. Graceful close was supposed
> to let the remote side empty its buffer. But suppose it has 500GB in
> its sending buffer. That's going to take a while to gracefully
> close, so keepalives are still needed during this time period. And
> since the side that already sent graceful close won't send any more
> _data_ it needs to send keepalives.

I think you're right, keepalives are probably needed in this case, as
long as the closing side is prepared to drain incoming data.  (After
that, it becomes a connection error and graceful application recovery
may not be possible.)

However, senders shouldn't queue 500GB if it will take a long time to
send (in 2020 maybe it will be quick), precisely because they then
can't react to things like a graceful close message while it is still
safe to do so :-)
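
One way to avoid that is to cap what is committed to the send queue,
so the sender can still change course when a close request arrives.  A
small Python sketch; BoundedSendQueue and the byte limit are
assumptions for illustration, not anything a draft specifies.

```python
from collections import deque


class BoundedSendQueue:
    """Refuses new data once a byte limit is reached (hypothetical helper)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.queued_bytes = 0
        self._chunks = deque()

    def offer(self, chunk):
        # Returns False instead of queueing when the limit would be exceeded;
        # the application keeps the data and can decide what to do on close.
        if self.queued_bytes + len(chunk) > self.max_bytes:
            return False
        self._chunks.append(chunk)
        self.queued_bytes += len(chunk)
        return True

    def pop(self):
        # Called by the writer when the socket can accept more data.
        chunk = self._chunks.popleft()
        self.queued_bytes -= len(chunk)
        return chunk


queue = BoundedSendQueue(max_bytes=10)
accepted = queue.offer(b"12345678")
rejected = queue.offer(b"abc")       # would exceed 10 bytes, so not queued
```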

> Once both sides have sent a graceful close, what would be the point
> of replying to a keepalive? If the connection is alive, it would be
> closed first. If the connection is dead it's not going to make a
> world of difference.

Depends on the semantics of graceful close.  For some it means "close"
:-) For some it means "I acknowledge receipt of messages up to
sequence number X or all those I've replied to, and will definitely
ignore further messages".  For some it means "requesting graceful
close to save resources, but we might re-open if you still want to use
this connection".

> [mobile radio considerations]

+1 :-)

Synchronised, coordinated keepalives are good for radio power.

Coordination between multiple connections is better.

Best is multiplexing with a single keepalive shared among them ;-)
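
A sketch of the multiplexed case in Python.  MuxConnection, the channel
names and the 30 second interval are invented for the example, and it
only covers the sending side.  Traffic on any logical channel resets
the one shared timer, so a keepalive goes out only when the whole
connection has been idle.

```python
class MuxConnection:
    """One underlying connection carrying several logical channels."""

    def __init__(self, keepalive_interval, now):
        self.keepalive_interval = keepalive_interval
        self.last_sent = now        # time of the last frame of any kind
        self.frames = []            # stand-in for the wire: (channel, payload)

    def send(self, channel, payload, now):
        # Any frame on any channel counts as activity for the shared timer.
        self.frames.append((channel, payload))
        self.last_sent = now

    def tick(self, now):
        # Called periodically; sends one keepalive for all channels if idle.
        if now - self.last_sent < self.keepalive_interval:
            return False
        self.send(None, b"", now)   # channel None marks a keepalive frame here
        return True


mux = MuxConnection(keepalive_interval=30.0, now=0.0)
mux.send("chat", b"hi", now=10.0)
mux.send("presence", b"on", now=20.0)
early = mux.tick(now=45.0)          # 25s since last frame: nothing sent
due = mux.tick(now=50.0)            # 30s idle: one keepalive for both channels
```

With separate connections, each of the two channels would have needed
its own keepalive and its own radio wakeup.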

> PS: Thanks Jamie for your in-depth replies! I'd like to look more
> into the per-hop keepalive option, especially if it can be done in
> an implicit manner.

I'm glad someone finds them useful.  I've just re-read them and worry
about spamming the list. (?)

-- Jamie