Re: [hybi] Reliable message delivery (was Re: Technical feedback.)

Maciej Stachowiak <mjs@apple.com> Sat, 30 January 2010 07:47 UTC

From: Maciej Stachowiak <mjs@apple.com>
In-reply-to: <5c902b9e1001292333k79569316lf371938c9aa766@mail.gmail.com>
Date: Fri, 29 Jan 2010 23:47:50 -0800
Message-id: <128BFD31-9835-47B1-B7A9-F20F5CDA8D8C@apple.com>
References: <de17d48e1001280012i2657b587i83cda30f50013e6b@mail.gmail.com> <Pine.LNX.4.64.1001290817520.22020@ps20323.dreamhostps.com> <4B62C5FE.8090904@it.aoyama.ac.jp> <Pine.LNX.4.64.1001291134350.22020@ps20323.dreamhostps.com> <4B62E516.2010003@webtide.com> <5c902b9e1001290756r3f585204h32cacd6e64fbebaa@mail.gmail.com> <4B636757.3040307@webtide.com> <8449BE19-3061-4512-B563-02973FBB707B@apple.com> <5c902b9e1001292310l5442d476n8375139f3480671b@mail.gmail.com> <26D406E7-2319-476E-9ADF-80D84200C270@apple.com> <5c902b9e1001292333k79569316lf371938c9aa766@mail.gmail.com>
To: Justin Erenkrantz <justin@erenkrantz.com>
Cc: Hybi <hybi@ietf.org>
Subject: Re: [hybi] Reliable message delivery (was Re: Technical feedback.)

On Jan 29, 2010, at 11:33 PM, Justin Erenkrantz wrote:

> On Fri, Jan 29, 2010 at 11:25 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>>> It depends upon what level of "reliability" you are looking for.  If
>>> you are aiming for the "common" case, the answer to both is "yes".
>>> 
>>> However, edge cases make the answer "no" - it is quite possible to
>>> have "lost" responses that a server actually sends, but the client
>>> will never see.
>> 
>> So you could lose messages, but can you tell in this case that you are not guaranteed yet that they have been delivered?
> 
> No, not really - the client simply thinks the server close()'d the
> connection but it has no way of knowing there were other data packets
> that the server really meant for the client to see.  Correspondingly,
> the server did everything in the right order - it wrote all the data
> it expected and then it close()'d the socket.  Yet...oops.

Presumably the server could know that at least all the packets ACK'd at the TCP level have been successfully delivered, right? So I assume the only problem is the remaining packets after that, if you don't do a lingering close.

In the WebSocket case things are a bit more complicated: either the client or the server could be transmitting at any time, either could choose to close the connection at any time, and either side may want to know whether some of its messages are not guaranteed to have been delivered.

> 
> I don't know how much code Jetty has to deal with lingering close, but
> httpd has an embarrassingly large amount of code to deal with this
> situation.

I would like to understand the lingering close issue better. Does it consist of waiting for TCP ACKs for all your packets before closing the TCP connection?
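For reference, my understanding is that a lingering close is usually done at the socket API level rather than by inspecting TCP ACKs directly: the closing side shuts down only its sending direction, then keeps reading and discarding input until the peer closes or a timeout expires, and only then closes the socket. The point is to avoid closing while unread data is still arriving, which can cause a reset that discards data the peer has not read yet. Below is a minimal sketch in Python under those assumptions. The function name, the timeout, and the drain limit are hypothetical and are not taken from httpd or Jetty.

```python
import socket


def lingering_close(sock, timeout=2.0, max_drain=65536):
    """Half-close, drain late input, then close.

    Returns True if the peer closed its side (EOF seen) before the
    timeout or drain limit was reached. Illustrative sketch only.
    """
    try:
        # Send FIN but keep the receive direction open.
        sock.shutdown(socket.SHUT_WR)
    except OSError:
        # Connection is already gone; nothing left to linger on.
        sock.close()
        return False

    sock.settimeout(timeout)
    drained = 0
    saw_eof = False
    try:
        while drained < max_drain:
            chunk = sock.recv(4096)
            if not chunk:
                # Peer closed its sending side as well.
                saw_eof = True
                break
            drained += len(chunk)  # Late data is read and discarded.
    except OSError:
        # Timeout or reset: give up lingering.
        pass
    finally:
        sock.close()
    return saw_eof
```

Note that a True result only says the peer closed in an orderly way while we were lingering. It is a hint, not proof that the application on the other end processed every message.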

> 
>> It seems like a WebSocket-level close handshake would only solve part of the problem - you also need to be able to deal with an interruption of service that prematurely breaks the connection, and ideally you would have some better guarantee in that case than just assuming all messages are lost. Does that also need provision at the protocol layer, or could it just piggyback on TCP-level acks?
> 
> Like Greg, I think orderly close is about as good as you can do.
> Interruption of service is always going to be a possibility (power
> failure, router outages, etc.) - at least if orderly close is
> explicitly part of the protocol, then if it doesn't happen, then the
> client knows something went awry and then it can deal with it as best
> as it can.  Currently, in HTTP, you can't tell the difference between
> an orderly close and an "oh, no, something bad happened".  I think
> providing that type of hint would be a big step forward - especially
> when async messages are involved.  -- justin

I think you can do better than just orderly close. Either from TCP-level acks or from WebSocket-protocol-level acks, you could tell that some number of your messages have definitely been delivered, even in the face of a service interruption. Right?

Maybe I'm thinking of reliable message delivery differently than you, but I assumed a major goal would be to know what might need to be retransmitted even if there is an unexpected disconnect.
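To make that concrete, here is a minimal sketch of the sender-side bookkeeping I have in mind, assuming a hypothetical protocol-level ack that carries the highest sequence number received in order (a cumulative ack). Nothing like this exists in the current WebSocket draft; the class and method names are made up for illustration.

```python
from collections import OrderedDict


class AckTracker:
    """Tracks sent messages until the peer acknowledges them."""

    def __init__(self):
        self._next_seq = 1
        self._unacked = OrderedDict()  # seq -> payload, in send order

    def record_send(self, payload):
        """Remember a message that was just written to the connection."""
        seq = self._next_seq
        self._next_seq += 1
        self._unacked[seq] = payload
        return seq

    def on_ack(self, acked_seq):
        """Cumulative ack: everything up to acked_seq was delivered."""
        for seq in list(self._unacked):
            if seq > acked_seq:
                break
            del self._unacked[seq]

    def pending(self):
        """Messages not known to be delivered, oldest first."""
        return list(self._unacked.items())


tracker = AckTracker()
for message in ("a", "b", "c"):
    tracker.record_send(message)
tracker.on_ack(2)
# After an unexpected disconnect, only these might need retransmission.
to_retransmit = tracker.pending()
```

After a service interruption, anything still returned by `pending()` is a candidate for retransmission. Some of those messages may in fact have arrived, so the receiver would need the sequence numbers to discard duplicates.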

Regards,
Maciej