Re: [aqm] [Bloat] ping loss "considered harmful"

Brian Trammell <ietf@trammell.ch> Mon, 02 March 2015 14:45 UTC

From: Brian Trammell <ietf@trammell.ch>
In-Reply-To: <802AFC8C-B59B-4971-A4ED-5C0375E683B1@gmail.com>
Date: Mon, 02 Mar 2015 15:45:10 +0100
Message-Id: <CEB558AF-2DBC-4341-83FF-F878AFD3FE8F@trammell.ch>
References: <CAA93jw7KW=9PH002d3Via5ks6+mHScz5VDhpPVqLUGK2K=Mhew@mail.gmail.com> <7B3E53F5-2112-4A50-A777-B76F928CE8F2@trammell.ch> <alpine.DEB.2.02.1503021108270.20507@uplift.swm.pp.se> <802AFC8C-B59B-4971-A4ED-5C0375E683B1@gmail.com>
To: Jonathan Morton <chromatix99@gmail.com>
X-Mailer: Apple Mail (2.2070.6)
Archived-At: <http://mailarchive.ietf.org/arch/msg/aqm/eDB3r_V_Qw7EAwlx8qYXJhukc9c>
Cc: bloat <bloat@lists.bufferbloat.net>, "aqm@ietf.org" <aqm@ietf.org>, "cerowrt-devel@lists.bufferbloat.net" <cerowrt-devel@lists.bufferbloat.net>, Mikael Abrahamsson <swmike@swm.pp.se>
Subject: Re: [aqm] [Bloat] ping loss "considered harmful"

> On 02 Mar 2015, at 11:54, Jonathan Morton <chromatix99@gmail.com> wrote:
> 
> 
>> On 2 Mar, 2015, at 12:17, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
>> 
>> On Mon, 2 Mar 2015, Brian Trammell wrote:
>> 
>>> Gaming protocols do this right - latency measurement is built into the protocol.
>> 
>> I believe this is the only way to do it properly, and the most likely easiest way to get this deployed would be to use the TCP stack.
>> 
>> We need to give users an easy-to-understand metric on how well their Internet traffic is working. So the problem here is that the users can't tell how well it's working without resorting to ICMP PING to try to figure out what's going on.
>> 
>> For instance, if their web browser had insight into what the TCP stack was doing then it could present information a lot better to the user. Instead of telling the user "time to first byte" (which is L4 information), it could tell the less novice user about packet loss, PDV, reordering, RTT, how well concurrent connections to the same IP address are doing, tell more about *why* some connections are slow instead of just saying "it took 5.3 seconds to load this webpage and here are the connections and how long each took". For the novice user there should be some kind of expert system that collects data that you can send to the ISP that also has an expert system to say "it seems your local connection delays packets, please connect to a wired connection and try again". It would know if the problem was excessive delay, excessive delay that varied a lot, packet loss, reordering, or whatever.
>> 
>> We have a huge amount of information in our TCP stacks that is locked in there and not used properly to help users figure out what's going on, and there is basically zero information flow between the applications using TCP and the TCP stack itself. Each just tries to do its best on its own layer.
> 
> This seems like an actually good idea.  Several of those statistics, at least, could be exposed to userspace without incurring any additional overhead in the stack (except for the queries themselves), which is important for high-performance server users.  TCP stacks already track RTT, and sometimes MinRTT - the difference between these values is a reasonable lower-bound estimate of induced latency.
> 
> For stacks which don’t already track all the desirable data, a socket option could be used to turn that on, allocating extra space to do so.  To maximise portability, therefore, it might be necessary to require that option before statistics requests will be valid, even on stacks which do collect it all anyway.

So there seem to me to be three separate but related problems we want to solve here:

(1) How to get users who don't care what ping is some useful information about their applications' network performance.

(2) How to get users who do care what ping is information that actually reflects application performance when they type ping (or, more generally, how to make sure that the common diagnostic tools neither (a) provide misleading information nor (b) require network misconfiguration to ensure "proper" operation of the diagnostic tools (cf. speedtest.net and its ilk)).

(3) How to get application developers tools they can use to integrate network measurement into their apps without having to roll their own (i.e., helping them to solve (1), and enabling the creation of tools for (2) ).

This is an approach to (3)... but (as with many things) the key to getting it deployed and used on endpoints would be defining a universal interface to it. "Yet another API to learn on every platform" == "I'm just going to use ping, thank you very much."
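
To make the statistic quoted above concrete (RTT minus MinRTT as a lower-bound estimate of induced latency), here is a minimal sketch of what an application-side helper might compute from RTT samples. The class and method names are hypothetical, not any existing API, and the 1/8 smoothing gain is borrowed from RFC 6298 as an assumption; a real stack would supply its own smoothed RTT.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class LatencyTracker:
    """Hypothetical helper: tracks smoothed RTT and minimum RTT from samples."""

    alpha: float = 0.125           # smoothing gain (assumed, as in RFC 6298)
    srtt: Optional[float] = None   # smoothed RTT in milliseconds
    min_rtt: float = math.inf      # smallest RTT seen so far, in milliseconds

    def add_sample(self, rtt_ms: float) -> None:
        """Fold one RTT measurement into the running statistics."""
        if rtt_ms <= 0:
            raise ValueError("RTT samples must be positive")
        self.min_rtt = min(self.min_rtt, rtt_ms)
        if self.srtt is None:
            self.srtt = rtt_ms
        else:
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt_ms

    def induced_latency(self) -> float:
        """Smoothed RTT minus MinRTT: a lower-bound estimate of queueing delay."""
        if self.srtt is None:
            raise ValueError("no RTT samples yet")
        return max(0.0, self.srtt - self.min_rtt)
```

Feeding it samples of 20 ms, 20 ms and 100 ms gives a smoothed RTT of 30 ms against a minimum of 20 ms, so the estimate of induced latency is 10 ms.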

> Recent versions of Windows, even, have a semi-magic system which gives a little indicator of whether your connection has functioning Internet connectivity or not.  This could be extended, if Microsoft saw fit, to interpret these statistics and notify the user that their connection was behaving badly in the ways we now find interesting.  Whether Microsoft will do such a thing (which would undoubtedly piss off every major ISP on the planet) is another matter, but it’s a concept that can be used by Linux desktops as well, and with less political fallout.

This is, I'm afraid, the kicker. As long as everyone has an interest in pointing the finger at everyone else, people will choose to interpret the metrics how they like, and fail to respond to metrics they don't, no matter how good they are.

> Now, who’s going to knuckle down and implement it?

web10g.org is a start, for Linux anyway.
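
Separately from web10g, and only as an illustration of how little is needed to get at the RTT the stack already tracks: on Linux the TCP_INFO socket option returns a struct tcp_info, in which the smoothed RTT and its variance (both in microseconds) sit after eight one-byte fields and fifteen 32-bit fields. The sketch below assumes that layout and a Linux host; the function names are made up for this example.

```python
import socket
import struct
from typing import Tuple

# Assumed Linux struct tcp_info layout: 8 one-byte fields, then 15 u32
# fields, precede tcpi_rtt; tcpi_rttvar follows it immediately.
TCPI_RTT_OFFSET = 8 + 15 * 4


def parse_rtt_us(tcp_info: bytes) -> Tuple[int, int]:
    """Return (rtt, rttvar) in microseconds from a raw tcp_info buffer."""
    if len(tcp_info) < TCPI_RTT_OFFSET + 8:
        raise ValueError("tcp_info buffer too short")
    return struct.unpack_from("=II", tcp_info, TCPI_RTT_OFFSET)


def socket_rtt_us(sock: socket.socket) -> Tuple[int, int]:
    """Query a connected TCP socket (Linux only) for its RTT estimate."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return parse_rtt_us(raw)
```

Calling socket_rtt_us() on any connected TCP socket returns the kernel's current estimate without sending extra packets, which is the "no additional overhead" property mentioned upthread. It is also exactly the kind of platform-specific interface that a universal API would need to hide.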

Cheers,

Brian