Re: Opsdir last call review of draft-ietf-quic-manageability-14

"Brian Trammell (IETF)" <ietf@trammell.ch> Tue, 22 March 2022 14:42 UTC

From: "Brian Trammell (IETF)" <ietf@trammell.ch>
Subject: Re: Opsdir last call review of draft-ietf-quic-manageability-14
Date: Tue, 22 Mar 2022 15:42:41 +0100
In-Reply-To: <DM8PR02MB7973BBE35F26700D004BF9A3D3119@DM8PR02MB7973.namprd02.prod.outlook.com>
Cc: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>, "last-call@ietf.org" <last-call@ietf.org>, "draft-ietf-quic-manageability.all@ietf.org" <draft-ietf-quic-manageability.all@ietf.org>, "quic@ietf.org" <quic@ietf.org>, "ops-dir@ietf.org" <ops-dir@ietf.org>
To: "MORTON JR., AL" <acmorton@att.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/8AiRVWMj5A9BQlI45obPS7oKbzY>

Hi Al,

> On 16 Mar 2022, at 20:23, MORTON JR., AL <acmorton@att.com> wrote:
> 
> Hi Mirja,
> 
>> -----Original Message-----
>> From: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>
>> Sent: Wednesday, March 16, 2022 10:40 AM
>> To: MORTON JR., AL <acmorton@att.com>
>> Cc: last-call@ietf.org; draft-ietf-quic-manageability.all@ietf.org;
>> quic@ietf.org; ops-dir@ietf.org
>> Subject: Re: Opsdir last call review of draft-ietf-quic-manageability-14
>> 
>> Hi Al,
>> 
>> as you might have seen we merged the remaining PRs and submitted a new version
>> last week.
>> But unfortunately, I don't think we were able to address your comment below
>> fully.
>> 
>> Regarding use of version number I believe the text in the draft reflects the
>> group consensus, so we only made my editorial change to make it clearer.
> [acm] 
> Your draft and WG consensus discourage use of the version field for admission purposes.
> This is a question of whether any WG should state a consensus that expresses a *policy* for network managers and operators in an IETF RFC. If the same intent was stated as a conclusion reached by the WG, it would be far more palatable, and I offered alternative text as an example.

I don’t think we’re stating a policy here; we’re stating a recommendation.

Note that QUIC v2 is mostly done; it’s a minimal change to the wire image meant to exercise the versioning mechanism. A network that admits only QUIC v1 (which, indeed, seems mostly reasonable from the standpoint of an operator used to the last few decades of the use and abuse of extensions in the Internet) will, at that point, reject ~half of QUIC traffic for ~no benefit to the operator or its users. The recommendation is meant to avoid that sort of silliness.

The reason behind this version agility is, in turn, to maintain the deployability of new versions. Networks are of course free to admit whatever traffic they want; the language is there to point out the mostly negative tradeoff of filtering on the version field.
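
To make the tradeoff concrete, here is a minimal sketch (not taken from the draft) contrasting a v1-only admission check with a version-agnostic one. It assumes only the version-independent long header layout from RFC 8999: the high bit of the first byte is set, and the next four bytes carry the version. The function names are hypothetical, the v2 value is the one later published in RFC 9369, and a set high bit alone is not a reliable way to tell QUIC from other UDP traffic.

```python
from typing import Optional

QUIC_V1 = 0x00000001  # RFC 9000
QUIC_V2 = 0x6B3343CF  # RFC 9369 (assumption: final value, assigned after this thread)


def long_header_version(datagram: bytes) -> Optional[int]:
    """Return the version field of a long header packet, or None.

    Only uses the version-independent layout: header form bit, then a
    32-bit version. Short header packets carry no version and yield None.
    """
    if len(datagram) < 5 or not (datagram[0] & 0x80):
        return None
    return int.from_bytes(datagram[1:5], "big")


def admit_v1_only(datagram: bytes) -> bool:
    """Hypothetical allowlist policy: pass only packets marked as QUIC v1."""
    return long_header_version(datagram) == QUIC_V1


def admit_any_version(datagram: bytes) -> bool:
    """Hypothetical version-agnostic policy: pass any long header packet."""
    return long_header_version(datagram) is not None


def fake_long_header(version: int) -> bytes:
    """Build a toy long header prefix with zero-length connection IDs."""
    return bytes([0xC0]) + version.to_bytes(4, "big") + b"\x00\x00"
```

With these definitions, `admit_v1_only(fake_long_header(QUIC_V2))` is false while `admit_any_version(fake_long_header(QUIC_V2))` is true: the allowlist drops v2 handshakes even though nothing else about the traffic has changed from the operator's point of view.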

> So, let's consider this issue as needing further discussion in a wider venue.

I think we’re having that discussion on last-call@ietf.org right now. :)

>> Regarding when the handshake fails, I'm not sure if it would be correct to say
>> anything more here. You can always just not see some of the packets on the
>> path, or the handshake could even change with a new version or an extension I
>> guess. Again I'm also not really sure what to do with that information either.
>> If you don't see any further packets flowing at any time, incl. right after
>> the handshake, something went either wrong or the transmission is just done.
>> It's really hard to make any assumption from the network here.
> [acm] 
> The case I cited was an operator that wants to support QUIC, and wants to identify when QUIC setup fails and how frequently failure occurs, to support analysis and troubleshooting and properly manage their network.

There seems to be a tacit assumption here that holds in the TCP case that does not necessarily hold in the QUIC case: that an operator can helpfully debug the operation and performance of a transport protocol within their network. One of the reasons this is a useful (indeed, essential) role of network operators in the TCP world is that there is often an unavoidable, unintentional, transport-dependent differential impact of an operator’s own network on different traffic flows, where the remedy is often only actionable by the operator itself.

I’d submit that the main reason this happens with protocols like TCP is that the TCP wire image is path-observable and path-mutable. Without this path-observability and path-mutability, the set of possible flow-dependent impacts is necessarily reduced, if not eliminated. Without operator-actionable problems on the network, the observability of internal protocol dynamics from non-cooperative third parties becomes less important.

In other words, the set of wire image features that can cause differential treatment in an operator's network is equal to the set of wire image features that are freely observable by that operator.
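
As a rough illustration of how small the freely observable set is for QUIC, the sketch below extracts everything RFC 8999 guarantees about a long header packet across versions: the version and the two connection IDs. Everything after those fields is version-specific and, in v1, largely protected. The type and function names are hypothetical, and this is a sketch under those assumptions rather than a complete parser.

```python
from typing import NamedTuple, Optional


class LongHeaderInvariants(NamedTuple):
    version: int
    dcid: bytes  # Destination Connection ID
    scid: bytes  # Source Connection ID


def parse_long_header_invariants(datagram: bytes) -> Optional[LongHeaderInvariants]:
    """Parse the version-independent fields of a long header packet.

    Returns None for short header packets or truncated input.
    """
    # Minimum: 1 flag byte + 4 version bytes + 2 length bytes.
    if len(datagram) < 7 or not (datagram[0] & 0x80):
        return None
    version = int.from_bytes(datagram[1:5], "big")

    pos = 5
    dcid_len = datagram[pos]
    pos += 1
    # Need the DCID itself plus the following SCID length byte.
    if pos + dcid_len + 1 > len(datagram):
        return None
    dcid = datagram[pos:pos + dcid_len]
    pos += dcid_len

    scid_len = datagram[pos]
    pos += 1
    if pos + scid_len > len(datagram):
        return None
    scid = datagram[pos:pos + scid_len]

    return LongHeaderInvariants(version, dcid, scid)
```

An on-path device can key its behavior on these three fields and little else, which is the sense in which the observable set bounds the set of features that can trigger differential treatment.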

Cheers,

Brian

> I also note the dependency on knowing the version number in your paragraph above (when attempting to understand the handshake), as hint to accomplishing this management goal (by relating the version to a published specification). 
> 
> I think that a supporting operator (like the one above) is the most-likely reader of your memo, so it will help them to add a few sentences about non-Figure 1 handshakes. Even if the sentences are something like this (based on what you said above):
> 
> 	If the handshake in Figure 1 is truncated or missing packets, many actual outcomes are possible (and not necessarily handshake failure). The end-points may have switched to a different version and handshake, switched to a different path, implemented fallback, terminated the attempt as the end-points intended, or other outcome. 
> -=-=-=-=-=-=-=-=-=-
> 
> Over time, observers will likely develop heuristics to mitigate these uncertainties and draw probable conclusions (like they did with TCP), but you don't need to add that aspect. Just indicate the possibilities and try to improve the manageability of QUIC.
> 
> Al
> 
>> 
>> Mirja
>> 
>> 
>> 
>> On 05.03.22, 17:02, "MORTON JR., AL" <acmorton@att.com> wrote:
>> 
>> Hi Mirja, thanks for your replies and PRs.
>> please see replies below, I clipped discussions we have closed.
>> Al
>> 
>>> -----Original Message-----
>>> From: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>
>>> Sent: Tuesday, March 1, 2022 1:49 PM
>>> To: MORTON JR., AL <acmorton@att.com>
>>> Cc: last-call@ietf.org; draft-ietf-quic-manageability.all@ietf.org;
>>> quic@ietf.org; ops-dir@ietf.org
>>> Subject: Re: Opsdir last call review of draft-ietf-quic-manageability-14
>>> 
>>> Hi Al,
>>> 
>>> thanks again! See below!
>>> 
>>> On 27.02.22, 19:50, "MORTON JR., AL" <acmorton@att.com> wrote:
>>> 
>>> [snip]
>> ...
>>> [acm] I see that there was additional editing since you wrote last
>> Monday,
>>> so I made a comment and suggestions on GitHub.
>>> 
>>> [MK] Thx! Added your suggestions!
>>> 
>>> [snip]
>>> 
>>> 
>>>>> 
>>>>> 2.8. Version Negotiation and Greasing
>>>>> 
>>>>> ...
>>>>> QUIC is expected to evolve rapidly, so new versions, both
>>>>> experimental and IETF standard versions, will be deployed on the
>>>>> Internet more often than with traditional Internet- and
>>>>> transport-layer protocols. Using a particular version number to
>>>>> recognize valid QUIC traffic is likely to persistently miss a
>>>>> fraction of QUIC flows and completely fail in the near future,
>>>>> and is therefore not recommended.
>>>>> [acm] Where "valid traffic" is the focus, I agree, let
>> it
>>> flow.
>>>>> But the Operator's focus may instead be "admissible
>> traffic",
>>> where
>>>>> experimental traffic is not wanted or allowed. IOW, only
>>> traffic
>>>> that is
>>>>> understood to conform to <RFC list> shall pass, because
>>> "Active
>>>> Attacks are
>>>>> also Pervasive", to put a different spin on 7258. [acm]
>> See
>>> also the
>>>> comment in
>>>>> 3.4.1.
>>>>> 
>>>>> [MK] This is not about experimentation.
>>>> [acm]
>>>> OK, let's just say unexpected traffic.
>>>> 
>>>>> The expectation is that QUIC versions
>>>>> will change often, e.g. we already have a draft for a new
>> version
>>>> adopted in
>>>>> the group and there might be another RFC some time this
>> year. So
>>> if you
>>>>> "manually" have to allow for new versions in all your
>> equipment
>>> that
>>>> will
>>>>> delay deployment of new versions (or even hinder them
>> because
>>> there is
>>>> always
>>>>> one box that doesn't get updated). Therefore we strongly
>> recommend
>>> to
>>>> not use
>>>>> the version to filter QUIC traffic. Is that not clear enough
>> in
>>> the
>>>> text?
>>>>> 
>>>>> In addition, due to the speed of evolution of the protocol,
>>>>> devices that attempt to distinguish QUIC traffic from non-QUIC
>>>>> traffic for purposes of network admission control should admit
>>>>> all QUIC traffic regardless of version.
>>>> [acm]
>>>> I think it is clear, and at the same time, it is aspirational
>> for
>>> many
>>>> networks.
>>>> This sentence informs, but then strays into policy.
>>>> 
>>>> Maybe this will work:
>>>> ...devices that attempt to distinguish QUIC traffic from
>> non-
>>>> QUIC traffic for purposes of network admission control
>> should
>>> not
>>>> rely
>>>> on the version field alone.
>>>> 
>>>> [MK] I think your proposal is not correct because the whole point
>> is
>>> that you
>>>> really should not use the version field _at all_. I know that
>> people
>>> will
>>>> still do that, but I think we should at least spell it out clearly
>> here
>>> that
>>>> this is problematic and hinders evolution.
>>> [acm]
>>> Evolution is what happens when a succeeding RFC is approved.
>>> Experimentation is the many months between approvals.
>>> 
>>> 	...devices that attempt to distinguish QUIC traffic from non-
>>> QUIC traffic for purposes of network admission control
>>> *** should admit all QUIC traffic regardless of version.***
>>> The last phrase attempts to define operator policy.
>>> Don't do that.
>>> The version field exists. It's specified in a standard.
>>> If you simply say,
>>> "The version field will change in the future." no one will be
>> surprised.
>>> 
>>> [MK] Okay I got your point about policy. However, this document is meant
>> to
>>> provide guidance/recommendations to operators. I also see now that this is
>> in the
>>> "background" part which is also rather to explain QUIC than give
>>> recommendations. However, I think this is actually one of the essential
>>> recommendations of the document, so I would like to still spell this out
>>> clearly and as early/often as possible. I tried a slightly different
>> wording
>>> in a new PR on github. Is that any better?
>>> 
>>> 
>>> https://github.com/quicwg/ops-drafts/pull/459/files
>>> 
>> [acm]
>> Not yet. Maybe we can compose your message to operators *without* making
>> it sound like you are trying to set policy. I suggested text in the PR like
>> this:
>> 
>> Developers would prefer admission of all QUIC traffic regardless of
>> version in order to support continuous version-based evolution. However, all
>> parties understand the value of versions with a corresponding, fully-approved
>> standard.
>> 
>>> 
>>>> 
>>>>> 
>>>>> [acm] I was hoping to see a description of fallback to
>> TCP (I
>>> see
>>>> that fallback
>>>>> is mentioned briefly at the end of section 4.2., and
>> later,
>>> fail
>>>> over and
>>>>> failover. pick one...)
>>>>> 
>>>>> How can Network Operators observe when a QUIC setup has
>>> failed, and
>>>> the
>>>>> corresponding TCP fallback connection(s) succeeded?
>>>>> 
>>>>> [MK] There is no unified way how and if fallback is
>> implemented.
>>>> However, why
>>>>> do you think a network operator would need that information?
>>>> [acm]
>>>> To affirm that their admission policy is working properly, for
>> one
>>> reason.
>>>> 
>>>> [MK] However, there is really no guarantee that all QUIC will have
>> a
>>> fallback.
>>>> Without further knowledge about what higher layer service the QUIC
>>> transport
>>>> carries, I don't think you can make any assumption about fallback.
>> If
>>> you want
>>>> to support evolution, you need to support QUIC and not rely on any
>>> potential
>>>> fallbacks.
>>> [acm]
>>> I chose the example carefully: the operator wants to support QUIC, but
>> has
>>> reports that QUIC setup is failing and needs to make measurements to
>> gather
>>> symptoms & info. Experience will indicate the circumstances where QUIC
>> setup
>>> failure is accompanied by fallback, and other possibilities. Repeated
>>> experiences become heuristics for passive observation.
>>> No assumptions necessary.
>>> Has QUIC setup failed if the exchanges in Figure 1 are incomplete?
>>> I think there might be a yes or no answer...
>>> If no, then the passive observation procedure will mostly be
>> governed by
>>> heuristics.
>>> 
>>> [MK] I think I lost the point now. If QUIC fails even if there is a
>> fallback,
>>> that's still not great because the original intention was obviously to
>> use
>>> QUIC. Is there anything we need to say in the draft that is missing?
>> [acm]
>> Without getting into fallback in any way,
>> Help the operator determine when a QUIC setup has failed by providing a
>> little more info.
>> It would be useful to know:
>> What QUIC messages would accompany a QUIC setup failure? (other than those
>> in Figure 1)
>> OR
>> A statement like:
>> If the exchange in Figure 1 is incomplete, then the QUIC setup has failed.
>> (IF that is true)
>> 
>> 
>>> 
>>>> 
>>>>> 
>>>>> Is there a reference available with this info, to save
>> effort
>>> here?
>>>>> 
>>>>> [MK] As I said this is rather implementation specific, so I
>> would
>>> say
>>>> no.
>>>>> 
>>>>> ...
>>>>> 
>>>>> 3.4.1. Extracting Server Name Indication (SNI)
>> Information
>>>>> 
>>>>> ...
>>>>> 
>>>>> Note that proprietary QUIC versions, that have been deployed
>>>>> before standardization, might not set the first bit in a QUIC
>>>>> long header packet to 1. However, it is expected that these
>>>>> versions will gradually disappear over time.
>>>>> [acm]
>>>>> And some networks may prefer not to admit experimental
>>> traffic. The
>>>> goal of the
>>>>> experiment may be problematic for the network operator
>> and/or
>>> their
>>>>> subscribers. I think this is legitimate operator
>> behavior, and
>>> worth
>>>> a few more
>>>>> words in the draft.
>>>>> 
>>>>> [MK] To be honest I don't understand this point. How would
>> an
>>> operator
>>>> even
>>>>> know if an experiment would be problematic or not? QUIC is
>> fully
>>>> encrypted.
>>>>> Versioning is only one extension mechanism. So basically
>> even if
>>> you see
>>>> the
>>>>> same version number, the QUIC behind that could behave very
>>> differently
>>>>> depending on which extensions are used and because of the
>>> encryption,
>>>> there is
>>>>> no chance for the operator to know about this. Is this not
>> clear
>>> in the
>>>>> document? Do we need to state this more clearly?
>>>> [acm]
>>>> First, let's say s/experimental/unexpected/ or
>>> s/experimental/proprietary/
>>>> Then, I'm responding to your reply more than the paragraph in
>> the
>>> draft
>>>> now:
>>>> Network operators are also end users, and often act on their
>>> subscribers'
>>>> behalf. Observations are not strictly limited to mid-points, where
>>> encryption
>>>> is present.
>>>> Harboring old notions of what operators cannot do will not sit
>> well
>>> with
>>>> your audience...
>>>> 
>>>> So, (in the paragraph above) you've informed operators that
>> some
>>>> proprietary QUIC versions remain in use as of this writing.
>>>> But traffic that doesn't conform *might* be considered
>> nefarious.
>>> That's
>>>> all. It's a message for everyone involved.
>>>> 
>>>> [MK] I think the point is actually rather that we want to say
>> here: if
>>> you
>>>> don't support these old versions that will not be a problem in the
>> near
>>>> future.
>>> [acm]
>>> Ok, say that in the draft, please.
>>> 
>>> [MK] Okay started a PR on github. Is that more clear now?
>>> 
>>> 
>>> https://github.com/quicwg/ops-drafts/pull/460/files
>>> 
>>> 
>> [acm]
>> I'm ok with this one, thanks.