Re: [Plus] Blog post on quic and tou

Jana Iyengar <jri@google.com> Thu, 08 December 2016 17:39 UTC

In-Reply-To: <E416974A-BF2C-4D24-B9CF-1591CAE8D6C2@trammell.ch>
References: <6850cb85-6b26-e5b8-50a9-7565c39b0c28@tik.ee.ethz.ch> <CAGD1bZaXto+fq9A806ME1bonZv0639Yu5yazyp_eeeOQemeNzw@mail.gmail.com> <E416974A-BF2C-4D24-B9CF-1591CAE8D6C2@trammell.ch>
From: Jana Iyengar <jri@google.com>
Date: Thu, 08 Dec 2016 09:39:13 -0800
Message-ID: <CAGD1bZZMTEeJ+07Y90vXE6j1yWTdDqPQ2-tWy-gNJSoNFaWsZg@mail.gmail.com>
To: Brian Trammell <ietf@trammell.ch>
Archived-At: <https://mailarchive.ietf.org/arch/msg/plus/Vvqt9i2viJ8NNi0bLHzTG2F_qtk>
Cc: plus@ietf.org, Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
Subject: Re: [Plus] Blog post on quic and tou
Precedence: list

Hi Brian,

I'm largely in agreement; a couple of thoughts inline.

On Thu, Dec 8, 2016 at 1:35 AM, Brian Trammell <ietf@trammell.ch> wrote:

> hi Jana,
>
> > On 07 Dec 2016, at 22:35, Jana Iyengar <jri@google.com> wrote:
> >
> > Thanks for forwarding the article. I'll offer some thoughts (and some
> corrections.)
> >
> > There's surely a solid argument to be made about network monitoring, as
> this blog post makes. Operators' needs are real, and we need to ensure that
> they are able to reasonably do the things that they need to do. At least
> for QUIC, exposing a small additional bit of information addresses ~80% of
> the use cases mentioned in the document: a "largest acked" ack number on
> all packets (or at least packets that contain acks). I won't design this
> mechanism on this list,
>
> (...noting that this list exists *precisely* for designing this mechanism
> ;) but yes, details should happen on the quic@ list...)
>
> > but I'll note that it's a conversation that's happening in several
> corners in the QUIC wg. It needs to be aired and discussed, and I expect it
> to happen relatively soon.
>
> For those not familiar with the details of IETF-quic (which AFAIK from
> others' implementation reports diverges somewhat from the version of QUIC
> deployed by Google right now, and will diverge more during the WG's work):
> the packet number is already exposed. Together with highest-ack, this
> allows one-observation-point split-RTT measurement with an unknown
> responder delay term, equivalent to TCP; two-observation-point approaches
> for loss measurement; and one-observation-point approaches for loss
> estimation to work with more information about the dynamics of the
> particular version of QUIC running, also similar to TCP.
>
> I personally think we can do a good deal better than this with epsilon
> more complexity and overhead, without either constraining QUIC's transport
> dynamics or requiring measurement devices to know about the details of
> those dynamics. Need to do a bit more work before I can say how small
> epsilon is, though.
>
> Missing is more detailed information about TCP dynamics mentioned in the
> post. Many of these are TCP (and CC algorithm) specific, so it doesn't make
> much sense to expose the same information, though each of the requirements
> implicit in the list is worth evaluating separately for its
> ossification/security/utility tradeoff.
>
> One requirement from that list that seems quite useful, though I don't
> know how to solve in the general case or in QUIC specifically: "determine
> [if] the software on the client or the server is the bottleneck". This is a
> very common triage task in network operations: does this problem indicate a
> misconfiguration of my network, or (more cynically) can I demonstrate that
> it's not my fault and therefore not my problem? Requiring access to one or
> both endpoint machines to answer that... seems like a question for a future
> Internet architecture research project.


This can be done by measuring at ingress and egress. A simplistic design,
at a high level, is:
(i) use packet number / ack number to measure queue buildup and loss
downstream of the ingress point, measured at the ingress.
(ii) use packet number / ack number to measure queue buildup and loss
downstream of the egress point, measured at the egress.
(iii) subtract (ii) from (i) to find queue buildup and loss in the network.

That at least gets you as far as identifying the problem as in the network
or outside it.
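
A rough sketch of steps (i)-(iii) in Python, for illustration only. It
covers just the queue-buildup half, and everything in it is an assumption
of mine rather than anything QUIC specifies: the observation tuple format,
the function names, matching a packet to the first ack whose "largest
acked" equals its packet number, and treating the excess over the minimum
RTT sample as queueing delay. Loss could be handled with the same
subtraction, given per-point loss counts.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical observation format: (timestamp_seconds, direction, number)
#   "data": number is the packet number of a packet heading downstream
#   "ack":  number is the "largest acked" seen on a packet heading upstream
Observation = Tuple[float, str, int]


def downstream_rtt_samples(observations: Iterable[Observation]) -> List[float]:
    """RTT samples for the path downstream of one observation point.

    A sample is taken only when an ack's largest-acked value exactly
    matches a packet number seen earlier at this point.
    """
    pending: Dict[int, float] = {}  # packet number -> time first seen
    samples: List[float] = []
    for ts, direction, number in observations:
        if direction == "data":
            pending.setdefault(number, ts)
        elif direction == "ack":
            if number in pending:
                samples.append(ts - pending[number])
            # Everything at or below largest acked is no longer pending.
            for pn in [p for p in pending if p <= number]:
                del pending[pn]
    return samples


def queue_buildup(samples: List[float]) -> float:
    """Mean excess delay over the minimum sample (0.0 if no samples)."""
    if not samples:
        return 0.0
    base = min(samples)
    return sum(s - base for s in samples) / len(samples)


def network_queue_buildup(ingress: Iterable[Observation],
                          egress: Iterable[Observation]) -> float:
    """Step (iii): buildup seen at ingress minus buildup seen at egress."""
    return (queue_buildup(downstream_rtt_samples(ingress))
            - queue_buildup(downstream_rtt_samples(egress)))
```

If the result is close to zero, the delay is building up beyond the egress
point; if it is close to the ingress figure, it is building up inside the
network. The min-RTT baseline is a crude stand-in for propagation delay, so
this is only approximate.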

> The article though looks at real needs and current tools that operators
> have, and over-generalizes to saying that the "entire header" should be
> visible. My argument remains that only what is absolutely required should
> be exposed, and that every bit exposed should be debated.
>
> I would go further (again, this is the philosophical underpinning of PLUS,
> to the extent that it has one): the design of the header that is exposed
> unencrypted to the network (which constitutes its "wire image") should be
> treated as an entirely separate endeavour from the design of the transport
> protocol machinery.
>
> > This is not a security argument, it's an ossification one. The whole
> point of ossification is that there are third parties that are unresponsive
> to changes in allegedly e2e protocols. Middleboxes are reactive. If they
> see traffic shifting a particular way, they'll go build something in
> response -- I've seen this happen several times. But, they are not
> proactive. This creates a serious "deployment impossibility cycle" where
> deploying a protocol change widely requires it to work through a huge range
> of middleboxes, but even high-end middleboxes will not change behavior in
> response until the protocol change is widely deployed.
>
> My (possibly starry-eyed optimistic) hope is that a deliberately designed
> wire image will create a path of least resistance for middlebox designs to
> (reactively) follow. A well-designed wire image should be so obvious that
> the in-network reaction will be the one desired by the designers even for
> middleboxes built by people who didn't read the spec.
>

The problem that remains is that once middleboxes react to certain bits,
deployment of changes to those bits requires herculean effort, as we see
with TFO. But this is exactly the tradeoff -- every bit that is exposed may
help the operator, but it costs agility.
