Re: Measurement bit(s) or not

"Brian Trammell (IETF)" <ietf@trammell.ch> Mon, 12 February 2018 12:35 UTC

From: "Brian Trammell (IETF)" <ietf@trammell.ch>
Message-Id: <5E9E3102-2F45-46D5-A5C2-7D63085F88F5@trammell.ch>
Content-Type: multipart/signed; boundary="Apple-Mail=_B57F3B4C-0CD6-4D4C-A114-F3B5ED1F4925"; protocol="application/pgp-signature"; micalg="pgp-sha512"
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: Measurement bit(s) or not
Date: Mon, 12 Feb 2018 13:35:02 +0100
In-Reply-To: <27669_1518436598_5A8180F6_27669_174_1_5A8180F8.6020705@orange.com>
Cc: "quic@ietf.org" <quic@ietf.org>
To: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
References: <1817_1518284090_5A7F2D3A_1817_79_1_5A7F2D3E.4050806@orange.com> <aa7a56d01f0a41fe9ad0fd9e61c54c50@usma1ex-dag1mb5.msg.corp.akamai.com> <CAN1APddOWZRF6FxiEcJ4MbOpMwxqHm9=LbMB92pVkdUJNMuMyQ@mail.gmail.com> <CAN1APdcTH=oHdf=wixJZXOCCXcaYKR1ZkJQLDpndRdehuKfvBA@mail.gmail.com> <19F415EA-DC06-4FEE-8AFA-8A6EBEBB9AFA@trammell.ch> <27669_1518436598_5A8180F6_27669_174_1_5A8180F8.6020705@orange.com>
X-Mailer: Apple Mail (2.3273)
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/IPh27PwIxYxVY2DlyCg1EnZInME>

hi Alexandre,

> On 12 Feb 2018, at 12:56, alexandre.ferrieux@orange.com wrote:
> 
> Hi Brian,
> 
> We're clearly in violent agreement about both the potential to make intelligent
> use of this measurement nibble, and the desirability of *some* troubleshooting
> helper. Thanks for proposing to include this discussion in the spin bit table.
> 
> As it turns out, we do have concrete illustrations to show there, on real
> networks and long distances. But beyond that, I'd like to "probe" this group for
> a possible veto before digging further: while I have no doubt about an eventual
> consensus among us "troubleshooters", I'm more worried by the other side of the
> board: people primarily concerned about ossification and linkability, to whom
> network debugging is at best a secondary goal.
> 
> More precisely, as you accurately described, the position of the cursor of both
> tradeoffs may display a kind of threshold for acceptability by them :
> 
> - endpoint vs midpoint complexity: if the tool is too easy on the midpoint,
> active Murphies will come in ; a kind of entry barrier should be set -- how high ?
> 
> - fidelity: the coarser, or the more delayed, the better, since it precludes any
> real-time feedback loop like active Murphies do ; what is the minimum
> degradation that is needed ?
> 

While our experience with TCP here provides ample, loud warning about the dangers of letting the middle know too much, IMO these worries are a little misplaced with QUIC.

Two important things to note:

(1) A middlebox that speaks TCP can modify the header without breaking the connection, and it knows which ACK belongs with which packet. It has a wide variety of actions available to it: it can delay or drop packets; coalesce, delay, or even falsify ACKs; manipulate seq/ack and timestamps (though I don't know of any that do the latter); and so on. A middlebox that understands QUIC can only delay or drop a packet.

(2) Nobody will buy (or build and deploy) a middlebox that can't even pretend to solve a problem. If we've designed QUIC such that selective drop or delay of packets based only on path-observable latency and loss will improve performance for anyone, we have done something horribly wrong.

In general, the cost of deriving a measurement from a signal should be paid by the user of the measurement, not the endpoint generating the signal (principle 3 from Principles for Measurability in Protocol Design, https://arxiv.org/pdf/1612.02902.pdf, if you're playing along at home :). This isn't meant to make it harder to mess something up, but rather to make it so easy to implement the signal at the endpoint that doing so isn't seen as a negative tradeoff.
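
To make that cost split concrete, here is a rough Python sketch of the observer side of a spin-bit latency measurement. It assumes the spin bit behaves as proposed (the bit seen in one direction flips once per round trip) and that the observer sees every packet of one direction in arrival order. The function name and the input format are made up for illustration and are not taken from any existing tool.

```python
from typing import Iterable, List, Tuple


def spin_rtt_samples(packets: Iterable[Tuple[float, int]]) -> List[float]:
    """Estimate RTT from spin-bit edges seen in one direction of one flow.

    `packets` is a hypothetical input format: (arrival_time_seconds, spin_bit)
    pairs in arrival order. Each sample is the time between two consecutive
    edges (changes of the spin bit value).
    """
    samples: List[float] = []
    last_bit = None   # spin value of the previous packet
    last_edge = None  # arrival time of the most recent edge
    for arrival, bit in packets:
        if last_bit is not None and bit != last_bit:
            # The spin value changed: this packet marks an edge.
            if last_edge is not None:
                samples.append(arrival - last_edge)
            last_edge = arrival
        last_bit = bit
    return samples


if __name__ == "__main__":
    trace = [(0.0, 0), (0.25, 0), (0.5, 1), (0.75, 1), (1.0, 0), (1.5, 1)]
    print(spin_rtt_samples(trace))
```

All of the state and arithmetic live at the observer; the endpoint only has to set one bit per packet. Reordering or loss around an edge would distort individual samples, which this sketch does not try to handle.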

Fidelity is a hard one. Some troubleshooting tasks turn on the pattern of losses, which means you really want to know *which* packet was lost. Again, knowing which packet was lost doesn't give a QUIC-understanding measurement box as much freedom to get things wrong as the equivalent TCP box.
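
As an illustration of the fidelity question, here is a small Python sketch of how a passive observer might work out *which* packets were lost from a short, wrapping sequence signal. It assumes the 4-bit SSN mentioned earlier in the thread is a plain counter incremented by one per packet and that there is no reordering. Both are assumptions for the sake of the example, and the function names are hypothetical.

```python
from typing import List, Tuple

SSN_BITS = 4              # assumed width of the exposed sequence signal
SSN_MOD = 1 << SSN_BITS   # the counter wraps at 16


def missing_between(prev_ssn: int, cur_ssn: int) -> List[int]:
    """Return the SSN values skipped between two consecutively observed packets.

    Only correct if fewer than SSN_MOD - 1 packets were lost in a row and
    nothing was reordered; longer bursts alias onto shorter gaps.
    """
    gap = (cur_ssn - prev_ssn) % SSN_MOD
    return [(prev_ssn + i) % SSN_MOD for i in range(1, gap)]


def find_losses(observed: List[int]) -> List[Tuple[int, List[int]]]:
    """Return (index, missing SSNs) for each observed packet preceded by a gap."""
    losses = []
    for i in range(1, len(observed)):
        missing = missing_between(observed[i - 1], observed[i])
        if missing:
            losses.append((i, missing))
    return losses


if __name__ == "__main__":
    # One packet (SSN 0) lost across the wrap-around.
    print(find_losses([14, 15, 1, 2]))
    # A burst of exactly 16 losses between SSN 3 and SSN 4 is invisible.
    print(find_losses([3, 4]))
```

The second call shows the limit of such a short counter: once a loss burst reaches the wrap length, the observer can no longer tell it from no loss at all, which is the "when things go really wrong" case raised earlier in the thread.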

Cheers,

Brian



> 
> 
> On 12/02/2018 10:20, Brian Trammell (IETF) wrote:
>> hi Mikkel, Igor, Alexandre, all,
>> 
>> Engineering is fun, but let's step back a bit. :)
>> 
>> It looks like we're exploring a space of proposals that have different
>> tradeoffs for the patterns of loss and reordering they can easily make
>> visible, tradeoffs for sender (endpoint) versus observer (midpoint)
>> complexity, and tradeoffs for fidelity versus overhead.
>> 
>> In any case, it seems like it is possible to design a signal that would be a
>> vast improvement (from the measurement utility standpoint) over no signal and
>> no discernible pattern in the packet number that will fit in bits scavenged
>> from the Type field of the short header, i.e., the bandwidth overhead will be
>> *zero*, because otherwise in an encrypted-PN world we just have to grease
>> those bits anyway.
>> 
>> Back to Alexandre's question:
>> 
>> Do we want to do this?
>> 
>> Rephrased: Is the passive measurability of loss, reordering (and, if we
>> consider the spin bit as one of the measurement bits, latency) of QUIC
>> important to us, or do we decide we can live with the negative pressure a
>> complete loss of visibility and a vast increase in diagnostic complexity
>> will place on deployment?
>> 
>> Note, of course, that all the proposals we have so far represent a decrease
>> in visibility and an increase in complexity of measurement compared to
>> passive measurement of TCP. New tools will have to be developed. But the loss
>> of visibility is minimal compared to blackout, and the deployability and
>> feasibility of all of these is far, far better than an SSLKEYLOGFILE-based
>> debugging approach, especially in the interdomain case.
>> 
>> I've heard at least one dismissal of this whole space as being too abstract
>> to take seriously. (I'm not concerned, but maybe I've been staring at network
>> measurement both passive and active for too long to know what's intuitive
>> anymore.) Let me then suggest a way forward:
>> 
>> I've announced a table at the London hackathon for "Transport Measurability"
>> (see https://trac.ietf.org/trac/ietf/meeting/wiki/101hackathon), which we
>> intend to set up in the vicinity of QUIC. This was originally intended as the
>> "Spin Bit" table, and we (from ETH) will be there working on scalable,
>> open-source passive measurement tools both for the spin bit as well as for
>> the current TCP TSOPT and SEQ/ACK methodologies (as a basis of comparison,
>> mainly; at least in the case of the spin bit we so far believe the explicit
>> signal to have superior usability compared to current TCP measurement). I
>> suggest we expand the scope of the table to hack on various signals for loss and
>> reordering, and to compare their complexity and fidelity against the loss and
>> reordering patterns we want visibility into. One output of this work could be
>> a (smaller) set of suggestion(s) for which signal(s) to add, so that those
>> who want to have concrete proposals to evaluate can do so.
>> 
>> Cheers,
>> 
>> Brian
>> 
>> [...]
> 
> 
> 
> 
> 
>>>>> 
>>>>> -----Original Message----- From: alexandre.ferrieux@orange.com
>>>>> [alexandre.ferrieux@orange.com] Received: Saturday, 10 Feb 2018,
>>>>> 12:34PM To: quic@ietf.org [quic@ietf.org] Subject: Measurement bit(s)
>>>>> or not
>>>>> 
>>>>> On 07/02/2018 14:34, Brian Trammell (IETF) wrote:
>>>>>> hi Jana,
>>>>>> 
>>>>>>> 3. Some sequencing information -- a few bits of the packet number
>>>>>>> perhaps -- should be revealed (for monitoring. Number of bits
>>>>>>> TBD.)
>>>>>> 
>>>>>> This is the crux of the argument. On one side we have the risk of
>>>>>> misuse and ossification (well, not ossification -- these bits are
>>>>>> *meant* for the path -- rather the risk that we'll figure out later
>>>>>> that we specified the wrong thing), on the other side we have the
>>>>>> loss of visibility into how QUIC traffic interacts with the network
>>>>>> as compared to TCP, with a side question of whether or not this
>>>>>> visibility is really the transport layer's problem despite the
>>>>>> evolution of the practice of diagnostics and troubleshooting using TCP
>>>>>> information.
>>>>>> 
>>>>>> If we can come to agreement on this question, everything else falls
>>>>>> into place. I have my arguments here, but as you said, this subthread
>>>>>> is not the place for them. :)
>>>>> 
>>>>> The crux indeed. So what about settling it first ?
>>>>> 
>>>>> With the troubleshooting hat, I can only stress the need for
>>>>> measurement bits, for the benefit of everybody, since s**t happens,
>>>>> networks are imperfect, and nifty encapsulations-with-seqnum will
>>>>> simply not be where you need them.
>>>>> 
>>>>> Now to the exact nature of these measurement bits:
>>>>> 
>>>>> Thanks to the detailed exchanges on this thread, it is by now clear
>>>>> that a simple gapless counter, even nonzero-based and XORed, is not
>>>>> acceptable. The 4-bit SSN comes pretty close but is not enough when
>>>>> things go really wrong (and they will - and that's where we need the
>>>>> tool).
>>>>> 
>>>>> Then Kazuho's square signal and Mikkel's Pi (or any other consensual
>>>>> self-synchronizing sequence) ramification came up. They are both
>>>>> appealing for their elegance and low complexity on QUIC endpoints.
>>>>> Beyond their quirks acknowledged here, here are a few considerations
>>>>> for troubleshooting:
>>>>> 
>>>>> (1) Since reordering is less of a concern to QUIC than to TCP, it
>>>>> becomes a secondary goal. This is nice, because the square doesn't see
>>>>> it, and the self-synchronizing sequence will only tolerate a mild one,
>>>>> and never see its detail like cycle length etc.
>>>>> 
>>>>> (2) There's of course a huge difference between them in complexity for
>>>>> the midpoint: square is trivial, Pi is hefty.
>>>>> 
>>>>> Given these, a benevolent, troubleshooting-minded passive midpoint will
>>>>> clearly vote for the square. Now the obvious question is: is this
>>>>> acceptable, or deemed too easy for a Murphy, Inc. active middlebox to
>>>>> see upstream losses and benevolently wreak havoc by delaying packets ?
>>>>> 