Re: [hybi] Reliable message delivery (was Re: Technical feedback.)

Jamie Lokier <jamie@shareable.org> Tue, 02 February 2010 23:25 UTC

Date: Tue, 02 Feb 2010 23:26:18 +0000
From: Jamie Lokier <jamie@shareable.org>
To: Francis Brosnan Blazquez <francis@aspl.es>
Cc: Hybi <hybi@ietf.org>
Subject: Re: [hybi] Reliable message delivery (was Re: Technical feedback.)

Francis Brosnan Blazquez wrote:
> Hi Greg,
> 
> > TCP does have a distinction between an orderly shutdown (FIN),
> > error shutdown (RST) and timeouts.

Btw, TCP is missing one: it can't distinguish between an application
crash and an application's orderly shutdown.  Both result in FIN.

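And TCP is actively unhelpful with its TCP RST data-discard hazard:
closing a socket that still has unread incoming data (or that receives
more data after the close) makes many stacks send RST, and on receiving
an RST the peer may discard data it has received but its application
has not yet read.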
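On the closing side, the usual defence is a technique often called
"lingering close": half-close with FIN, drain whatever the peer still
sends, and only then close.  A minimal Python sketch, assuming a plain
blocking socket; the function name and the 2-second default are
illustrative choices, not anything from a particular server:

```python
import socket


def lingering_close(sock: socket.socket, timeout: float = 2.0) -> int:
    """Close sock while avoiding the RST data-discard hazard where possible.

    Half-close our sending side (FIN), then read and discard whatever the
    peer still sends until it closes its side (EOF) or stays quiet for
    `timeout` seconds. Only then close the socket, so that in the normal
    case no unread incoming data is left to trigger an RST.
    Returns the number of bytes discarded.
    """
    discarded = 0
    try:
        sock.shutdown(socket.SHUT_WR)  # send FIN; receiving still works
        sock.settimeout(timeout)       # per-recv limit on the lingering
        while True:
            chunk = sock.recv(4096)
            if not chunk:              # peer sent FIN: nothing left to read
                break
            discarded += len(chunk)
    except OSError:
        pass  # timeout (socket.timeout is an OSError) or peer reset: give up
    finally:
        sock.close()
    return discarded
```

The timeout applies to each recv(), so a peer that keeps trickling data
can hold the socket open; a production version would probably also cap
the total time spent lingering.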

> > So if you want to be like TCP, then you'll have those
> > concepts.
> 
> Why not?
> 
> I think people are interested in WebSocket because they will layer their
> favourite protocol on top of it. I'm not stating I'm against this, but I
> still don't see the point and how it would improve the protocol...

Think about what happens when:

   - Assume an app which is all request-response, originated by the client.
   - Assume some requests are state changing ("INSERT INTO...").

   - client opens a WebSocket
   - sends a few requests to the server, server responds
   - after 30 seconds, the server decides to close the connection
     because it's not being used(*)
   - at around the same time, the client sends another request
   - client receives a TCP RST from the server, reported as a
     "connection reset" error

(*) Note that servers which don't send or expect keepalive probes
    *must* timeout connections like that, because it's impossible to
    tell the difference between an idle connection and one which
    has gone away due to the network.
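For contrast, a minimal sketch of the keepalive alternative the footnote
alludes to: after some silence, probe the peer, and close only if the
probe goes unanswered.  The class, its fields and the "wait"/"ping"/
"close" actions are hypothetical; time is passed in explicitly so the
logic can be checked without a real clock or socket:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleTracker:
    """Idle/keepalive bookkeeping for one connection (hypothetical sketch).

    After `ping_after` seconds of silence, ask for a ping; if nothing at
    all arrives within `close_after` seconds of that ping, ask for the
    connection to be closed. Times are plain numbers (seconds) supplied
    by the caller.
    """
    ping_after: float
    close_after: float
    last_seen: float = 0.0
    ping_sent_at: Optional[float] = None

    def on_receive(self, now: float) -> None:
        # Any incoming traffic, including a pong, proves the peer is alive.
        self.last_seen = now
        self.ping_sent_at = None

    def action(self, now: float) -> str:
        """Return "wait", "ping" (send a probe now) or "close"."""
        if self.ping_sent_at is not None:
            # Probe outstanding: close only if it has gone unanswered.
            if now - self.ping_sent_at >= self.close_after:
                return "close"
            return "wait"
        if now - self.last_seen >= self.ping_after:
            self.ping_sent_at = now  # caller is expected to send the ping
            return "ping"
        return "wait"
```

The point of the design is that silence alone never produces "close";
only an unanswered probe does, so an idle but still-present client
keeps its connection.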

Two problems happen:

   1. What does the client do with that last request?  It's not safe to open
      a new connection and repeat it, because it might be acted on twice.
   2. What happens to the response to the second-to-last request?
      It gets truncated, because it was unlucky enough that its last
      packets were delayed by more than 30 seconds.

Now, I know how to use WebSocket to solve both of those, and so do you.
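The fix isn't spelled out here, but one common approach to problem 1
is to make retries safe: the client tags each request with a unique ID
and reuses that ID when it resends on a new connection, and the server
remembers the response for each ID instead of re-executing.  Because
the server replays the stored response, the same retry also recovers a
truncated response (problem 2).  A minimal in-memory sketch with
hypothetical names; a real server would also need to expire old IDs,
and to persist the table if it must survive a restart:

```python
import uuid
from typing import Callable, Dict


class DedupServer:
    """Runs each client request at most once, keyed by a client-chosen ID.

    Hypothetical sketch: `handler` stands in for the application's
    (possibly state-changing) request handler.
    """

    def __init__(self, handler: Callable[[str], str]) -> None:
        self._handler = handler
        self._responses: Dict[str, str] = {}  # request ID -> response

    def handle(self, request_id: str, body: str) -> str:
        # First time we see this ID: execute and remember the response.
        # A retry of the same ID (e.g. after a "connection reset") gets
        # the stored response back without re-executing the handler.
        if request_id not in self._responses:
            self._responses[request_id] = self._handler(body)
        return self._responses[request_id]


def new_request_id() -> str:
    # Chosen once per logical request and reused on every retry.
    return str(uuid.uuid4())
```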

Anyone serious about their WebSocket app will handle it.  We don't
need to worry about them.

But seriously, do you think the millions of authors who write little
web applications in a few pages of Javascript and a few pages of Python
will think of that, or even care?  It almost never happens, and it is
much rarer on the LAN where they are testing.  It'll be one of those
rare glitches that are left broken, like random double-posting to a blog.

Mostly it only causes problems for the unlucky schmuck on some obscure
network in a remote location on a 56k modem - or an over-subscribed
mobile network; inevitably the application author doesn't experience
the problem, and it settles on the bugzilla (if there is one) as
"CANNOT REPRODUCE".

And, getting it right does demand a little more mystical TCP knowledge
than most web app authors will ever care about.  Heck, half the people
I know who write TCP apps don't realise write boundaries aren't
preserved (because their app works fine on Windows over a LAN except
for those pesky rare unexplainable lockups...) - and those are people
writing socket apps in C who should know better, not web app authors.
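For reference, the usual fix for the write-boundary mistake on a raw
TCP socket is to add your own framing, for example a length prefix,
and to loop on recv() until a whole message has arrived.  (WebSocket's
message framing is meant to spare authors exactly this.)  A minimal
Python sketch; the 4-byte big-endian header is an arbitrary choice for
illustration:

```python
import socket
import struct


def send_msg(sock: socket.socket, payload: bytes) -> None:
    # 4-byte big-endian length header, then the payload itself.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested: TCP delivers a byte
    # stream, not the chunks the sender wrote. Loop until we have n bytes.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return bytes(buf)


def recv_msg(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```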

I wouldn't mention _those_ apps and authors, except that it's been
explicitly spelled out elsewhere that they are a target audience for
using WebSocket directly.

> In other words, people who care about orderly close will use a
> protocol on top of WebSocket that provides graceful close (like BEEP),
> and people who don't care about this will get an additional feature
> they didn't request... so it looks to me like neither kind of user will use this.

Orderly close is a bit subtle, and the failure modes when it's done
wrong don't occur for everyone because they are quite dependent on
network configuration.

It's not _complicated_, it just involves non-obvious issues - the TCP
reset hazard being totally non-obvious.

I agree that orderly close can be overlaid on top of WebSocket
(with a small overhead).
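A minimal sketch of what such an overlay might look like: each side
sends a CLOSE message, and the TCP connection is only torn down once a
CLOSE has been both sent and received.  It also covers the
crash-versus-shutdown gap mentioned earlier: if the transport ends
before a CLOSE arrives, the close is reported as abnormal.  Everything
here is hypothetical (the class, the "CLOSE" marker, the send
callback); a real protocol would use a distinct control frame rather
than a magic string that application data could collide with:

```python
from enum import Enum, auto
from typing import Callable, Optional

CLOSE = "CLOSE"  # hypothetical control message


class State(Enum):
    OPEN = auto()
    CLOSING = auto()  # we sent CLOSE, waiting for the peer's CLOSE
    CLOSED = auto()   # CLOSE sent and received: safe to close TCP


class CloseHandshake:
    """Application-level orderly close over any message transport."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send  # transmits one message to the peer
        self._sent = False
        self._received = False
        self.state = State.OPEN

    def send_data(self, msg: str) -> None:
        if self._sent:
            raise RuntimeError("cannot send data after CLOSE")
        self._send(msg)

    def close(self) -> None:
        if not self._sent:
            self._sent = True
            self._send(CLOSE)
        self._update_state()

    def on_message(self, msg: str) -> Optional[str]:
        """Handle one incoming message; return it if it is application data."""
        if msg == CLOSE:
            self._received = True
            self.close()  # answer with our own CLOSE if not sent yet
            return None
        return msg

    def on_transport_end(self) -> str:
        # The underlying connection ended (FIN, RST or timeout).
        orderly = self._sent and self._received
        self.state = State.CLOSED
        return "orderly" if orderly else "abnormal"

    def _update_state(self) -> None:
        if self._sent and self._received:
            self.state = State.CLOSED
        elif self._sent:
            self.state = State.CLOSING
```

A peer process that crashes still produces a FIN, but no CLOSE, so the
other side sees "abnormal" rather than mistaking it for a clean finish.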

But do we really want to encourage fragile applications, given that many
of them will be written by people who don't think about this sort of
thing every day the way we protocol designers do?

I think that's what Greg's getting at with deciding the base requirements.

-- Jamie