Re: [hybi] Reliable message delivery (was Re: Technical feedback.)

Jamie Lokier <jamie@shareable.org> Sat, 30 January 2010 14:49 UTC

Date: Sat, 30 Jan 2010 14:49:36 +0000
From: Jamie Lokier <jamie@shareable.org>
To: Maciej Stachowiak <mjs@apple.com>
Cc: Hybi <hybi@ietf.org>
Subject: Re: [hybi] Reliable message delivery (was Re: Technical feedback.)

Maciej Stachowiak wrote:
> > On Fri, Jan 29, 2010 at 11:25 PM, Maciej Stachowiak <mjs@apple.com> wrote:
> >>> It depends upon what level of "reliability" you are looking for.  If
> >>> you are aiming for the "common" case, the answer to both is "yes".
> >>> 
> >>> However, edge cases make the answer "no" - it is quite possible to
> >>> have "lost" responses that a server actually sends, but the client
> >>> will never see.
> >> 
> >> So you could lose messages, but can you tell in this case that you are not guaranteed yet that they have been delivered?
> > 
> > No, not really - the client simply thinks the server close()'d the
> > connection but it has no way of knowing there were other data packets
> > that the server really meant for the client to see.  Correspondingly,
> > the server did everything in the right order - it wrote all the data
> > it expected and then it close()'d the socket.  Yet...oops.
> 
> Presumably the server could know that at least all the packets ACK'd
> at the TCP level have been successfully delivered, right? So I
> assume the only problem is the remaining packets after that, if you
> don't do a lingering close.

No.

The server sends a TCP RST when it receives data after it has called
close().

That TCP RST causes the client to *discard* data it has previously
received and ACK'd at the TCP level.  The client application does not
see that data, if it hasn't already read it from the OS.

The client should get a socket error, but that's not very useful.
Depending on how the client is used, the practical effect is sometimes
a truncated message, or missing messages.

This is why Apache must implement a rather complicated "lingering
close", which uses shutdown(SHUT_WR) instead of close(), and then
reads and discards any further data received from the HTTP client.

HTTP servers (and proxies) which don't do this are prone to unreliable
response delivery if the client sends any more data, such as a
pipelined request.  It only happens under some network and load
conditions, and with some clients, and some configurations, which is
why there are a lot of implementations that get it wrong.

Some applications piggybacked on WebSocket look likely to get it wrong
and suffer this problem in corner cases.

> I would like to understand the lingering close issue better. Does it consist of waiting for TCP ACKs for all your packets before closing the TCP connection?

No, it consists of calling shutdown(SHUT_WR) immediately, and then
reading and discarding whatever the client sends until you receive a
client-side close (read() returns EOF), or you think you have waited
long enough for the client application to have read the response
(e.g. 2 minutes).
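
As a rough illustration of the sequence just described, here is a
minimal sketch in Python, whose socket module wraps the same
shutdown()/read()/close() calls.  The function name lingering_close
and the parameters max_wait and bufsize are made up for this example;
it is not how Apache or any particular server implements it.

```python
import socket
import time


def lingering_close(conn, max_wait=120.0, bufsize=4096):
    """Half-close conn, then read and discard input until EOF or timeout.

    Hypothetical helper for illustration only.
    Returns True if the peer closed its side (EOF seen), False if we
    gave up after max_wait seconds or hit a socket error.
    """
    deadline = time.monotonic() + max_wait
    saw_eof = False
    try:
        # Tell the peer we have finished sending (FIN), but keep reading
        # so that late data from the peer does not trigger a reset.
        conn.shutdown(socket.SHUT_WR)
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # time heuristic: assume the peer has read the response
            conn.settimeout(remaining)
            try:
                data = conn.recv(bufsize)
            except socket.timeout:
                break
            if not data:
                saw_eof = True  # recv() returned b"": peer closed its side
                break
            # Anything else (e.g. a pipelined request) is discarded.
    except OSError:
        pass  # the connection is already broken; nothing more to drain
    finally:
        conn.close()
    return saw_eof
```

A server would call something like lingering_close(conn) in place of a
bare conn.close() once the whole response has been written.  The
return value only says whether the peer's close was observed, not
whether the client application read the response.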

Due to the time heuristic, and the client's internal delays, there is
no guarantee that the client application will actually have received
the response, but it is ok in practice with normal HTTP clients.

> I think you can do better than just orderly close. Either from
> TCP-level acks or from WebSocket-protocol-level acks, you could tell
> that some number of your messages have definitely been delivered,
> even in the face of a service interruption. Right?

I agree.
 
> Maybe I'm thinking of reliable message delivery differently than
> you, but I assumed a major goal would be to know what might need to
> be retransmitted even if there is an unexpected disconnect.

Yes, that is a major one, because you often want automatic
retransmission when possible.  (See: HTTP pipelining problems).
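
To make "knowing what might need to be retransmitted" concrete, here
is a small sketch of sender-side bookkeeping driven by
application-level acks.  Everything in it (the class OutgoingQueue,
the sequence numbers, the cumulative ack) is an assumption made for
illustration; WebSocket itself does not define such acks.

```python
from collections import OrderedDict


class OutgoingQueue:
    """Remembers messages that were sent but not yet acknowledged.

    Hypothetical illustration: sequence numbers and acks are an
    application-level convention layered on top of the connection.
    """

    def __init__(self):
        self._next_seq = 0
        self._unacked = OrderedDict()  # seq -> payload, oldest first

    def send(self, payload, transmit):
        """Assign a sequence number, remember the payload, and transmit it."""
        seq = self._next_seq
        self._next_seq += 1
        self._unacked[seq] = payload
        transmit(seq, payload)
        return seq

    def ack(self, seq):
        """Cumulative ack: the peer has everything up to and including seq."""
        for s in list(self._unacked):
            if s <= seq:
                del self._unacked[s]

    def resend_unacked(self, transmit):
        """After a reconnect, retransmit whatever was never acknowledged."""
        for seq, payload in self._unacked.items():
            transmit(seq, payload)
```

After an unexpected disconnect, only the entries still held in the
queue need to be sent again on the new connection.  The receiver may
then see some messages twice, which is where duplicate elimination
comes in.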

Unexpected disconnects can happen for many reasons, including the
network itself, which endpoints have no control over.  E.g. NAT router
resets (happens daily on one network I'm aware of).

Orderly close does not help with network-level disconnects, so other
techniques like duplicate elimination are valuable too.
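
One possible shape for duplicate elimination on the receiving side,
assuming the sender numbers its messages from 0 and retransmits in
order after reconnecting.  The class DedupReceiver and the
sequence-number scheme are hypothetical, not part of any WebSocket
draft.

```python
class DedupReceiver:
    """Delivers each numbered message to the application at most once.

    Hypothetical illustration: assumes the sender numbers messages
    0, 1, 2, ... and resends them in order after a reconnect.
    """

    def __init__(self):
        self._expected = 0  # next sequence number not yet delivered

    def receive(self, seq, payload, deliver):
        """Handle one incoming message and return the cumulative ack.

        The return value is the highest sequence number delivered so
        far, or -1 if nothing has been delivered yet.
        """
        if seq == self._expected:
            deliver(payload)
            self._expected += 1
        # seq < expected: a retransmitted duplicate, silently dropped.
        # seq > expected: a gap; ignored so the sender resends from the ack.
        return self._expected - 1
```

The returned number can be sent back as the ack, so a sender that
reconnects knows where to resume, and anything it resends needlessly
is dropped instead of being delivered twice.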

-- Jamie