Re: [tsvwg] Roman Danyliw's Discuss on draft-ietf-tsvwg-l4s-arch-19: (with DISCUSS and COMMENT)

Bob Briscoe <ietf@bobbriscoe.net> Thu, 25 August 2022 12:07 UTC

Content-Type: multipart/alternative; boundary="------------TOpgCGYp5DTyNY2RoHw4wwKl"
Message-ID: <3bb5d621-3586-d7da-f341-564a3827c2dd@bobbriscoe.net>
Date: Thu, 25 Aug 2022 13:07:10 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0
Content-Language: en-GB
To: Roman Danyliw <rdd@cert.org>, The IESG <iesg@ietf.org>
Cc: draft-ietf-tsvwg-l4s-arch@ietf.org, tsvwg-chairs@ietf.org, tsvwg@ietf.org
References: <166135543475.8949.3772316498330227070@ietfa.amsl.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
In-Reply-To: <166135543475.8949.3772316498330227070@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/pV0uW56svXtlA24fuLO14KgkPCM>
Subject: Re: [tsvwg] Roman Danyliw's Discuss on draft-ietf-tsvwg-l4s-arch-19: (with DISCUSS and COMMENT)
Precedence: list

Roman,

Thank you for reading it through and for all your comments/questions. See [BB]
inline.

On 24/08/2022 16:37, Roman Danyliw via Datatracker wrote:
> Roman Danyliw has entered the following ballot position for
> draft-ietf-tsvwg-l4s-arch-19: Discuss
>
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
>
>
> Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/
> for more information about how to handle DISCUSS and COMMENT positions.
>
>
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-tsvwg-l4s-arch/
>
>
>
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
>
> ** Section 6.4.2.  My read of Figure 3 is that the suggested architecture can
> be incrementally deployed.  My other read is that it appears that this staged
> deployment isn’t safe outside of controlled environments (i.e., for the
> Internet) until after phase 1.  Assuming this is accurate, please add clear
> normative language that partial roll-outs of the L4S architecture MUST NOT
> occur outside of controlled environments until <insert needed capabilities>.

[BB] The architecture draft is informational, so it contains no normative
language. Nonetheless, in the editor's copy, I've altered the bullets under
Figure 3 as shown in this diff, temporarily uploaded to my web site.

https://bobbriscoe.net/tmp/draft-ietf-tsvwg-l4s-arch-20b-DIFF-20a.html

Is this sufficient?

You've probably already found that draft-ietf-tsvwg-ecn-l4s-id is 
experimental track and replete with normative statements about not using 
a non-compliant CC over the public Internet.
But I understand that not everyone reads all the drafts.

>
>
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> ** Section 8.1.
>     In the current Internet, scheduling usually enforces separation
>     between 'sites'
>
> It would be clearer to explain scheduling of what and by whom.

[BB] I've edited to:

     In the current Internet, ISPs usually enforce separation
     between the capacity of shared links assigned to different
     'sites' (e.g. households, businesses or mobile users) using
     some form of scheduler [RFC0970].

OK?

>
> ** Section 8.1
>
>     However, there has
>     never been a universal need to police the rate of individual
>     application flows - the Internet has generally always relied on self-
>     restraint of congestion controls at senders for sharing intra-'site'
>     capacity.
>
> I don’t follow the generalization being suggested here.  Is the text suggesting
> that enterprises do not shape traffic on a per application basis?  “Upstream
> networks” to provide different treatment of select traffic?  Is there a
> measurement study that can be cited?

[BB] This para was meant to set the scene for the rest - we didn't 
expect it to be controversial.
I have a feeling we're talking past each other, because there seem to be 
some subtle but important changes of words between the draft and your 
question:

  * per-application is class-based (e.g. Diffserv), whereas
    per-application-flow is 5-tuple based. The draft is saying the
    latter is not generally deployed or found to be necessary (the
    exception is FQ-CoDel which was primarily motivated by latency
    isolation, and capacity division came as a side-effect).
  * shaping implies delaying some packets to smooth out bursts, whereas
    the draft was talking about policing, and specifically about
    limiting bit rate of each 'site' rather than each 5-tuple.
  * enterprises (and institutions) are a less demanding security
    environment than the public Internet, which is what the draft was
    mainly talking about (institutions have disciplinary ways of
    enforcing reasonable behaviour). Nonetheless the answer to your
    question is, yes, the generalization does hold for enterprises and
    institutions, which also don't generally police individual
    application flow rates (some might police on a per-IP basis, but
    not per-5-tuple).
  * I suspect the word 'site' also made you think of enterprises. It's
    actually defined in the terminology section, but I admit that the
    natural meaning of the word is not a good fit for all the uses we
    have tried to put it to.
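To make the first distinction concrete, here is a minimal Python sketch. The packet fields, function names and addresses are hypothetical, chosen only for illustration (the addresses come from the documentation ranges). A class-based scheduler such as Diffserv keys only on the DSCP, a per-IP policer keys only on the address, whereas per-application-flow treatment keys on the full 5-tuple, so two connections of the same application share a class and a host but are distinct flows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet header fields, for illustration only."""
    src_ip: str
    dst_ip: str
    protocol: int  # e.g. 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int
    dscp: int      # Diffserv codepoint, e.g. 0 = default class


def class_key(pkt: Packet) -> int:
    """Per-application (class-based) treatment, e.g. Diffserv: only the DSCP matters."""
    return pkt.dscp


def host_key(pkt: Packet) -> str:
    """Per-IP treatment, as some policers use: only the source address matters."""
    return pkt.src_ip


def flow_key(pkt: Packet) -> tuple:
    """Per-application-flow treatment: the 5-tuple identifies each flow."""
    return (pkt.src_ip, pkt.dst_ip, pkt.protocol, pkt.src_port, pkt.dst_port)


# Two TCP connections from the same host and application, in the same class
a = Packet("192.0.2.1", "198.51.100.7", 6, 50000, 443, 0)
b = Packet("192.0.2.1", "198.51.100.7", 6, 50001, 443, 0)

print(class_key(a) == class_key(b))  # True: same class
print(host_key(a) == host_key(b))    # True: same host
print(flow_key(a) == flow_key(b))    # False: two distinct flows
```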

Regarding a citation, we'd have to cite each particular spec defining 
the scheduler function for each access network (DSL, PoN, 3G, 4G, 5G, 
DVB, DOCSIS, etc). But I think such lengths are only really necessary to 
support a controversial assertion, whereas this is stating a fact that 
is well-known (or perhaps your question is evidence that it's only 
well-known in the circles I move in?).

There have been measurement studies to infer specific details about
scheduler algorithms, but I can't imagine anyone would try to measure 
the mere existence of a scheduler when they could just look up the specs 
for the system architecture of a particular access technology. This is 
how Internet access networks have been built ever since packet 
networking was first deployed. That's why I cited RFC0970.

In summary, I won't change anything here, pending further clarification.

>
> ** Section 8.1.  The first paragraph ends with a conclusion that there isn’t a
> universal need to police individual flow rates.  However, the second paragraph
> then discusses the use of per-flow-queuing.  One paragraph doesn’t follow from
> the other.  If no one does per-flow treatment as suggested, why discuss the
> approach?

[BB] This was intended to say that, if it turns out in future that a 
need for per-flow policing arises, it can be added 'cos it's orthogonal.

But you're right that the logic doesn't flow. Indeed, to record mailing-list
discussions, various other paras have been added to this section over
the life of the draft, and only some are meant to follow one thread. To 
help, I've divided the section into two subsections, and shifted one 
point into the next section where it was more relevant. Then I've 
reworked the logical flow.

Does this help?

> ** Section 8.2
>
>     It is hoped that self-interest and guidance on dynamic
>     behaviour (especially flow start-up, which might need to be
>     standardized) will be sufficient to prevent transports from sending
>     excessive bursts of L4S traffic, given the application's own latency
>     will suffer most from such behaviour.
>
> Why is “self-interest and guidance on dynamic behaviour” the appropriate threat
> model?

[BB] In traffic security, the large majority of stakeholders is expected 
to act in their own self-interest. Then a much smaller minority use 
traffic to harm others out of malice, even if they do not need to 
transfer any information in the traffic. So the system has to be 
inherently secure against self-interest, while additional security 
mechanisms might need to be employed intermittently that are specific to 
various active attacks (or bugs and other human errors).

The 'guidance on dynamic behaviour' part is because, even if it's 
possible to avoid sending bursts (for the user's own self-interest), 
with a new technology, developers might need to be told how.

Does this all need explaining in the draft?

>
> ** Section 8.2
>
>   None of these possible queue protection capabilities are considered a
>     necessary part of the L4S architecture, which works without them
>
> Could this be restated.  The current text suggests to me that various security
> mitigations are not part of the L4S architecture.  I don’t see how this would
> be possible if it were to see Internet deployment.

[BB] I've changed the editor's copy to:

"No single one of these possible queue protection capabilities is 
considered an essential part of the L4S architecture, which works 
without any of them under non-attack conditions (in a similar way to how 
the Internet normally works without per-flow rate policing)."

Better?

>
> ** Section 8.3
>
>     So, in networks that already use rate policers and plan to deploy
>     L4S, it will be preferable to redesign these rate policers to be more
>     friendly to the L4S service.
>
> It might be helpful to be more specific on where this traffic shaping is
> happening as part of the migration consideration.

[BB] It already says "in certain scenarios (e.g. corporate networks)". 
Did you want it to say precisely where within those corporate networks? 
Or are you asking for other examples?
Assuming you meant the latter, I've added the example in parentheses below:
     "Similarly, in the default Diffserv class, rate policers are
     sometimes used to partition shared capacity (e.g. for some passive
     optical networks)."

Nonetheless, I'm not sure how useful it will be to give examples. 
Policers are one of the component parts that have always been available 
to system designers. So I suspect they will have been used in random 
places where the designers thought they were a good idea, whether they 
were or not.
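For readers less familiar with policers, a generic token-bucket policer is sketched below. It is not taken from the draft, and the class and parameter names are hypothetical. It shows the property that separates policing from shaping: a packet outside the configured rate and burst is dropped at once rather than delayed.

```python
class TokenBucketPolicer:
    """Hypothetical token-bucket rate policer for one 'site'.

    A packet within the configured rate and burst is forwarded; any other
    packet is dropped at once. Nothing is queued, which is what separates
    policing from shaping (a shaper would delay the packet instead).
    """

    def __init__(self, rate_bps: float, burst_bytes: float) -> None:
        self.rate_bytes_per_s = rate_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last_time_s = 0.0

    def allow(self, now_s: float, pkt_bytes: int) -> bool:
        # Refill for the time elapsed since the last packet, capped at the bucket depth
        elapsed = now_s - self.last_time_s
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_time_s = now_s
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes
            return True
        return False  # out of profile: drop


# 8 Mb/s (1 MB/s) with a two-packet burst allowance
policer = TokenBucketPolicer(rate_bps=8_000_000, burst_bytes=3000)
arrivals = [(0.0, 1500), (0.0, 1500), (0.0, 1500), (0.002, 1500)]
print([policer.allow(t, size) for t, size in arrivals])
# [True, True, False, True]
```

The third back-to-back packet is dropped because the burst allowance is exhausted; 2 ms later enough tokens have accrued to forward the next one.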

> ** Section 8.4.  This section discusses malicious senders, thank you.  Could a
> statement also be made about an on-path attacker modifying the ECN marks in the
> absence of integrity mechanisms.  For example, draft-ietf-tsvwg-l4s-ids notes
> in Section 5.4.1.1:
>
>     If a non-compliant or
>     malicious network node did swap ECT(0) to ECT(1), the packet could
>     subsequently be ECN-marked by a downstream L4S AQM, but the sender
>     would respond to congestion indications thinking it had sent a
>     Classic packet.  This could result in the flow yielding excessively
>     to other L4S flows sharing the downstream bottleneck.

[BB] As you can see from my conversation with Valery Smyslov, I'm not a
great fan of including concerns about on-path attackers in every RFC:
https://mailarchive.ietf.org/arch/msg/tsvwg/irOO5LvRl__e2d8yue6fETbilJE/
On-path attacks are a concern in discussion of access control (physical 
and logical) to network equipment. Once that is breached, the service is 
toast (see above link for various ways of burning toast to a crisp).

The IP-ECN field is mutable, so the more useful of the integrity 
mechanisms listed in §8.4 check or constrain the integrity of the 
/behaviour/ of the nodes contributing to the congestion feedback loop. 
This section is only a summary, because ecn-l4s-id is more specifically 
about the L4S ECN protocol itself. So this section refers to the more 
comprehensive section in the Security Considerations of ecn-l4s-id that 
Valery helped me improve, adding clearer applicability of each technique.

Is that sufficient?
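The ecn-l4s-id scenario quoted above can be sketched in a few lines of Python. The codepoint values are those of RFC 3168 and the classifier follows the L4S identifier rule (ECT(1) and CE to the L4S queue), but all function names are hypothetical, both window responses are simplified, and the 10% marking fraction is merely illustrative. Because an L4S AQM marks far more often than a Classic AQM would, a Classic sender whose packets have been re-marked to ECT(1) halves its window in every marked round trip, whereas the L4S flows it competes with trim theirs only slightly.

```python
from enum import IntEnum


class ECN(IntEnum):
    """The two-bit IP-ECN field codepoints (RFC 3168)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def dualq_classify(ecn: ECN) -> str:
    """ECT(1) and CE go to the L4S queue; Not-ECT and ECT(0) to the Classic queue."""
    return "L4S" if ecn in (ECN.ECT_1, ECN.CE) else "Classic"


def faulty_remark(ecn: ECN) -> ECN:
    """Hypothetical non-compliant or malicious node that swaps ECT(0) to ECT(1)."""
    return ECN.ECT_1 if ecn == ECN.ECT_0 else ecn


def scalable_response(cwnd: float, marked_fraction: float) -> float:
    """Simplified DCTCP-style reduction (RFC 8257), without its moving average."""
    return cwnd * (1 - marked_fraction / 2)


def classic_response(cwnd: float, marked_fraction: float) -> float:
    """RFC 3168-style response: halve at most once per RTT if anything was marked."""
    return cwnd / 2 if marked_fraction > 0 else cwnd


sent = ECN.ECT_0                               # a Classic sender's packet
print(dualq_classify(sent))                    # Classic
print(dualq_classify(faulty_remark(sent)))     # L4S

# One RTT in which an L4S AQM marks 10% of packets (illustrative value)
print(scalable_response(100, 0.1))  # an L4S sender trims its window by about 5%
print(classic_response(100, 0.1))   # 50.0: the re-marked Classic sender halves
```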

>
> ** Section 8.5.
>     Because L4S can provide low delay for a broad set of applications
>     that choose to use it, there is no need for individual applications
>     or classes within that broad set to be distinguishable in any way
>     while traversing networks.
>
> From a purely technical view, this might be true.  However, couldn’t local
> policy require different treatment?

[BB] There might still be different bandwidth treatments (e.g. using 
Diffserv), but there is no longer a need to distinguish different 
degrees of /queuing delay/ nor to distinguish different flow IDs, when 
/all/ traffic can have extremely low delay.

>
> ** Section 8.5.
>     There may be some types of traffic
>     that prefer not to use L4S, but the coarse binary categorization of
>     traffic reveals very little that could be exploited to compromise
>     privacy.
>
> After L4S is more widely adopted, that specific application could be identified
> by its _lack_ of L4S use?  In the early days where there is limited L4S use,
> wouldn’t these early adopters stick out?

[BB] Yes to both. But, if you care about privacy, you don't have to be 
an early or late adopter.

FYI, once in full production usage, L4S will be a per-host upgrade*, not 
per application. But during initial testing, I'm sure it will be more
on a per-app basis.

* (either to the OS for TCP, or to libraries used for a userland
transport like QUIC)

Pls see 
https://bobbriscoe.net/tmp/draft-ietf-tsvwg-l4s-arch-20b-DIFF-20a.html 
for a complete diff of everything done to address your concerns so far, 
including the nits below.

>
> Nits
> ** Section 2.  Typo. s/two queues is sufficient/two queues are sufficient/

[BB] Instead, I've explicitly added the implicit "_using_ two queues is 
sufficient and does not require inspection...".

>
> ** Section 5.1.  Per “Cubic [RFC8312] was developed to be less unscalable …”,
> is there a way to rephrase “less unscalable”?

[BB] The only other way I can think of would be 'more scalable', but 
that doesn't get over the sense of 'still not scalable enough'. Given it
says what it means, I'll leave it.
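'Less unscalable' can be made concrete with a back-of-envelope Python sketch. Assumptions: 1500-byte packets, a 30 ms RTT, RFC 8312's defaults beta_cubic = 0.7 and C = 0.4, and Cubic operating in its true Cubic mode rather than its Reno-friendly region (which governs at lower rates and is not modelled here); the helper names are hypothetical. For an 8x rise in flow rate, Reno's recovery time after a congestion event grows 8x, whereas Cubic's K grows only by the cube root of 8, i.e. 2x.

```python
def window_packets(rate_bps: float, rtt_s: float, pkt_bytes: int = 1500) -> float:
    """Window (in packets) needed to sustain rate_bps at the given RTT."""
    return rate_bps * rtt_s / (pkt_bytes * 8)


def reno_recovery_s(rate_bps: float, rtt_s: float) -> float:
    """Reno halves its window, then regains one packet per RTT: W/2 RTTs."""
    return window_packets(rate_bps, rtt_s) / 2 * rtt_s


def cubic_k_s(rate_bps: float, rtt_s: float, beta: float = 0.7, c: float = 0.4) -> float:
    """RFC 8312: K = cbrt(W_max * (1 - beta_cubic) / C), in seconds."""
    return (window_packets(rate_bps, rtt_s) * (1 - beta) / c) ** (1 / 3)


low, high, rtt = 120e6, 960e6, 0.03  # an 8x rise in rate at a 30 ms RTT
print(f"Reno recovery grows {reno_recovery_s(high, rtt) / reno_recovery_s(low, rtt):.1f}x")
print(f"Cubic K grows {cubic_k_s(high, rtt) / cubic_k_s(low, rtt):.1f}x")
print(f"At 960 Mb/s: Reno {reno_recovery_s(high, rtt):.1f} s, Cubic K {cubic_k_s(high, rtt):.1f} s")
```

Run as-is, it reports growth factors of 8.0x and 2.0x. Both times still grow with rate, which is why 'less unscalable' is apt; a scalable congestion control, by contrast, keeps the number of congestion signals per round trip roughly constant as rate grows, so its recovery time does not grow.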

>
> ** Section 6.4.1.  Per
>
>     An L4S AQM would often next be needed where the WiFi links in a home
>     sometimes become the bottleneck.
>
> Is this suggesting that L4S AQM isn’t suitable for WiFi links but that is
> future work?

[BB] "isn't suitable" is too strong. But, in general, implementation for 
radio links is more challenging for any queue management technology.
There has been an L4S WiFi access point available for 3 years, and a 
second one was brought to the latest interop (in Philly in July). For 5G,
there are only emulations and simulations as yet.

> ** Section 4.6.2.  s/Scalable CC/scalable congestion control approach/

[BB] Done.

Thank you again.

Bob

-- 
________________________________________________________________
Bob Briscoe http://bobbriscoe.net/
