Re: [CCWG] [tsvwg] Network feedback and security (was: New Version Notification for draft-huang-tsvwg-transport-challenges-00.txt)

Sebastian Moeller <moeller0@gmx.de> Fri, 20 October 2023 08:05 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <AM8PR07MB8137643CB3BAA31CC5EA9BC6C2DBA@AM8PR07MB8137.eurprd07.prod.outlook.com>
Date: Fri, 20 Oct 2023 10:05:23 +0200
Cc: "Shihang(Vincent)" <shihang9=40huawei.com@dmarc.ietf.org>, Christian Huitema <huitema@huitema.net>, Tom Herbert <tom@herbertland.com>, tsvwg <tsvwg@ietf.org>, "ccwg@ietf.org" <ccwg@ietf.org>
Message-Id: <0F46FABB-726B-4F2A-B078-C1DD7F9C4C80@gmx.de>
References: <8c04c73ed6424da7a8c0a560ba673f63@huawei.com> <AM8PR07MB8137064565A08222EF49D6FFC2D6A@AM8PR07MB8137.eurprd07.prod.outlook.com> <FR2P281MB152725788020368B2BB483A39CD6A@FR2P281MB1527.DEUP281.PROD.OUTLOOK.COM> <2A095262-F757-4995-9A30-38E915E92021@gmx.de> <FR2P281MB15270FF6D523A75427CDE6419CD6A@FR2P281MB1527.DEUP281.PROD.OUTLOOK.COM> <7B7D6B62-F78E-4400-81EB-31DFB8445E17@gmx.de> <2c4d3b5f-8b4d-e427-8095-6de26f91f543@huitema.net> <CALx6S37=pFnh_61LF6YhXD5wimNa4Dsb7WoUMRSUdSymZ2QjMA@mail.gmail.com> <f6b4a845-5f2c-b83a-519e-d3c30b052d95@huitema.net> <CALx6S34V8C_YhB2UWJDP9TKFbgCz3EGDUwHDe8kwcmQCbPWZuQ@mail.gmail.com> <E251131D-BD02-457D-A982-C2EC84F766DB@gmx.de> <c35c4bf8-4f50-4c55-1c19-0b3978566114@huitema.net> <f0b66bbc9b3a449ab3034e2d5b1640df@huawei.com> <AM8PR07MB8137643CB3BAA31CC5EA9BC6C2DBA@AM8PR07MB8137.eurprd07.prod.outlook.com>
To: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ccwg/tUSGAK8r1I1MGpP_rcsWXbD96Uk>
Subject: Re: [CCWG] [tsvwg] Network feedback and security (was: New Version Notification for draft-huang-tsvwg-transport-challenges-00.txt)
Precedence: list

Hi Ingemar,


> On Oct 20, 2023, at 09:03, Ingemar Johansson S <ingemar.s.johansson@ericsson.com> wrote:
> 
> Hi
> 
> A question (that will probably give many answers)
> Is this something that is more for the ICCRG? Or perhaps some parts are in ICCRG and other parts belong elsewhere?
> 
> The multi-bit approach is to me a bit futuristic, as it requires space in some yet-to-be-defined header. 
> 
> And then comes the discussion of how to define the actual congestion marking. Is it based on measured queue delay or something else,

	[SM] Looking at the proposals, there clearly is not one measure to rule them all at the moment, but rather experimentation with different measures, so the proposals seem to aim at making the new header generic enough to allow for different measures. I am not sure that this would be the best idea for a standards-track RFC, but to allow the required experiments I think it would be excellent if the IETF standardized a header, as that should make experimentation much more efficient (it would/should allow quickly changing measures to see which is most robust and reliable). This is completely orthogonal to existing protocols and will have few side-effects, so I see no reason not to ratify it. I think, however, we should look far enough to ratify something that at least has the potential for use over the internet (which, e.g., would rule out L2 headers).
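As an illustration of what "generic enough to allow for different measures" could mean on the wire, here is a minimal Python sketch of a 4-byte congestion-information field that carries a measure type next to the value. The code points, the layout, and the names `Measure`, `encode`, and `decode` are all hypothetical; no such header has been defined.

```python
import struct
from enum import IntEnum
from typing import Tuple


class Measure(IntEnum):
    """Hypothetical code points for the kind of congestion measure carried."""
    QUEUE_DELAY_US = 1        # predicted sojourn time, microseconds
    BUFFER_FILL_PERMILLE = 2  # buffer occupancy in 1/1000 of capacity


# Hypothetical layout, network byte order: type (1 byte), flags (1 byte), value (2 bytes).
_FIELD = struct.Struct("!BBH")


def encode(measure: Measure, value: int, flags: int = 0) -> bytes:
    """Serialize one congestion-information field."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    return _FIELD.pack(measure, flags, value)


def decode(buf: bytes) -> Tuple[Measure, int, int]:
    """Parse a field; an unknown measure type raises ValueError so experiments fail loudly."""
    mtype, flags, value = _FIELD.unpack(buf[:_FIELD.size])
    return Measure(mtype), flags, value
```

For example, `encode(Measure.QUEUE_DELAY_US, 1500)` yields `b'\x01\x00\x05\xdc'`. Carrying the type in every field is what would let an experiment switch measures without changing the header format.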

> and how should it interact with the endpoints?

	[SM] Same here: this is ongoing research. The current goal should be to define the method to encapsulate the required pieces of information end to end and to make it possible for network nodes to change them.


> The latter definitely sounds like ICCRG to me,

	[SM] I fully agree.

> while the former requires work in other groups (T.B.D.), as it needs to deal with both the protocol aspects and the security concerns around it.

	[SM] I think that TSV is not a bad place, given that we are already discussing how/if UDP options might be a suitable "carrier".


> PS. I guess we can discuss the ups and downs of L4S, like, forever. Only experience from deployment and live testing will tell. 
	
	[SM] Again I agree; where we differ is that I would have liked to see that data before ratifying things, as L4S clearly has side-effects. And it is the side-effects that concern me. If L4S were completely orthogonal, I would happily ignore it (unless and until it could be demonstrated to be worth its complexity).

> My experience so far is very positive. We have however not been overly open with publishing results for various reasons, hopefully we can do this better in the future. 
> You can anyway find a small collection of results here
> https://github.com/EricssonResearch/scream/blob/master/L4S-Results.pdf?raw=true 

	[SM] Thanks for posting that.

> 
> /Ingemar 
> 
>> -----Original Message-----
>> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Shihang(Vincent)
>> Sent: Friday, 20 October 2023 08:19
>> To: Christian Huitema <huitema@huitema.net>; Sebastian Moeller
>> <moeller0@gmx.de>; Tom Herbert <tom@herbertland.com>
>> Cc: tsvwg <tsvwg@ietf.org>; ccwg@ietf.org
>> Subject: Re: [tsvwg] Network feedback and security (was: New Version
>> Notification for draft-huang-tsvwg-transport-challenges-00.txt)
>> 
>> Adding CCWG into the loop since the CCWG charter says:
>> "The congestion control expertise in the working group also makes it a
>> natural venue to take on other work related to *indications of
>> congestion such as delay, queuing algorithms, rate pacing, multipath,
>> interaction with other layers*, among others."
>> 
>> Hi Christian,
>> Please see comments inline marked as [HS]
>> 
>> Thanks,
>> Hang
>> 
>> -----Original Message-----
>> From: tsvwg <tsvwg-bounces@ietf.org> On Behalf Of Christian Huitema
>> Sent: Thursday, October 19, 2023 2:06 AM
>> To: Sebastian Moeller <moeller0@gmx.de>; Tom Herbert
>> <tom@herbertland.com>
>> Cc: tsvwg <tsvwg@ietf.org>
>> Subject: Re: [tsvwg] Network feedback and security (was: New Version
>> Notification for draft-huang-tsvwg-transport-challenges-00.txt)
>> 
>> 
>> 
>> On 10/18/2023 8:40 AM, Sebastian Moeller wrote:
>>> Hi Tom,
>>> 
>>> 
>>>> On Oct 18, 2023, at 16:43, Tom Herbert <tom@herbertland.com> wrote:
>>>> 
>>>> On Tue, Oct 17, 2023, 6:29 PM Christian Huitema <huitema@huitema.net> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 10/17/2023 9:09 AM, Tom Herbert wrote:
>>>>>> On Tue, Oct 17, 2023 at 8:28 AM Christian Huitema <huitema@huitema.net> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> On 10/17/2023 3:17 AM, Sebastian Moeller wrote:
>>>>>>>> Personally, ever since I read Arslan, Serhat, and Nick McKeown.
>>>>>>>> ‘Switches Know the Exact Amount of Congestion’. In Proceedings of the
>>>>>>>> 2019 Workshop on Buffer Sizing, 1–6, 2019, I came to the prediction that
>>>>>>>> something like max(bufferoccupancy) over a network path, expressed either
>>>>>>>> as predicted sojourn time or as percentage buffer filling, could really
>>>>>>>> improve congestion control if included in all packets. That paper and
>>>>>>>> follow-ups on that idea are to me quite convincing. I see no real security
>>>>>>>> concern, as the network node adding this information might as well have
>>>>>>>> dropped the packet if it wanted to harm that flow, so there is little increase
>>>>>>>> in attack surface in that direction. What I hope something like this might
>>>>>>>> allow is a gentler exit from slow start (assuming one can measure and compare
>>>>>>>> the dynamics of the congestion indicator with those of the slow-starting
>>>>>>>> flow to predict when to exit slow start without first having to dump ~2x
>>>>>>>> too much data into the network in one RTT). The network could indirectly
>>>>>>>> profit if end-points use this information to better rein in congestion
>>>>>>>> proactively, but that clearly is not guaranteed, especially over the
>>>>>>>> open internet.
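The per-hop aggregation described above (each node carries the maximum congestion seen so far along the path, here expressed as predicted sojourn time) can be sketched in a few lines of Python. The `Hop` fields and the choice of milliseconds are illustrative assumptions, not taken from the Arslan/McKeown paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    queue_bytes: int        # backlog currently queued at this hop's egress
    link_rate_bps: float    # rate at which that queue drains

    def sojourn_ms(self) -> float:
        # Predicted wait for a newly arriving packet: backlog divided by drain rate.
        return self.queue_bytes * 8 * 1000 / self.link_rate_bps


def congestion_at_receiver(path: List[Hop]) -> float:
    """Value of the congestion field when the packet reaches the receiver.

    The sender initializes the field to 0 and every hop overwrites it with
    max(field, own sojourn time), so the receiver sees the bottleneck's value
    rather than a sum over all hops.
    """
    field = 0.0
    for hop in path:
        field = max(field, hop.sojourn_ms())
    return field
```

With `[Hop(15_000, 100e6), Hop(125_000, 50e6), Hop(0, 1e9)]` the receiver reads 20.0 ms, the sojourn time of the second hop.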
>>>>>>> 
>>>>>>> Yes, network feedback is useful. You mention "max(bufferoccupancy)
>>>>>>> over a network path" as a signal, but I would argue that ECN is
>>>>>>> pretty much a
>>>>>>> 1 bit version of exactly that. So your argument is really for
>>>>>>> "more ECN bits". That's plausible, but the whole "L4S/Prague"
>>>>>>> effort seems to indicate that 1 bit is enough, because you can
>>>>>>> observe that bit on many consecutive packets. The conservative
>>>>>>> side of me would say we should first deploy the 1 bit version, and
>>>>>>> then study whether we really need many more bits.
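To make the "1 bit is enough because you observe it on many consecutive packets" argument concrete, here is a sketch of the DCTCP-style estimator (RFC 8257) that turns per-packet CE marks into a smoothed congestion fraction, plus a helper that counts how many round trips the estimate needs to catch up with a new marking rate. It counts packets rather than bytes for simplicity, and g = 1/16 is the commonly used default (e.g. in Linux), not a value taken from this thread.

```python
import math

G = 1 / 16  # commonly used DCTCP estimation gain


def update_alpha(alpha: float, marked: int, total: int, g: float = G) -> float:
    """One per-RTT update: EWMA of the fraction of CE-marked packets."""
    if total == 0:
        return alpha  # nothing acknowledged this round; keep the old estimate
    return (1 - g) * alpha + g * (marked / total)


def rtts_to_reach(fraction_of_steady_state: float, g: float = G) -> int:
    """RTTs needed for alpha, starting at 0, to reach the given fraction of a
    new, constant marking rate: solve 1 - (1 - g)**n >= fraction for n."""
    return math.ceil(math.log(1 - fraction_of_steady_state) / math.log(1 - g))
```

`rtts_to_reach(0.9)` returns 36: with this gain, the 1-bit estimate needs roughly three dozen round trips to settle, which is the latency cost of a 1-bit signal being discussed here.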
>>>>>> 
>>>>>> Hi Christian,
>>>>>> 
>>>>>> https://datatracker.ietf.org/doc/draft-ravi-ippm-csig might be
>>>>>> relevant here. I believe they are using more than one bit of
>>>>>> information in the network to host signals. Their use case is
>>>>>> clearly intended for use in the datacenter not over the Internet.
>>>>>> One of the problems we see in the DC is that congestion events may
>>>>>> be short-lived. If one bit of information means we can make a
>>>>>> decision only after multiple packets and multiple RTTs, then that
>>>>>> might be too long. In some cases even waiting a single RTT for a
>>>>>> congestion indication might be too long!
>>>>>>
>>>>>>> The "gentler exit from slow start" issue is largely addressed by
>>>>>>> Hystart, which uses the variations of the RTT as a signal. And
>>>>>>> yes, Hystart can be improved by also monitoring the ECN bits, not just the RTT.
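For reference, the RTT-based exit test mentioned above looks roughly like this in HyStart++ (RFC 9406), the standardized variant of HyStart. This is a simplified sketch: it only evaluates the delay-increase condition for one round, and it omits the Conservative Slow Start phase that RFC 9406 enters instead of leaving slow start outright. The constants are the ones RFC 9406 recommends; the function name is illustrative.

```python
from typing import Optional, Sequence

MIN_RTT_THRESH_MS = 4.0
MAX_RTT_THRESH_MS = 16.0
MIN_RTT_DIVISOR = 8
N_RTT_SAMPLE = 8


def delay_increase_detected(last_round_min_rtt_ms: Optional[float],
                            current_round_rtts_ms: Sequence[float]) -> bool:
    """True when this round's minimum RTT has risen enough over the previous
    round's minimum to suggest that a queue is building."""
    if last_round_min_rtt_ms is None or len(current_round_rtts_ms) < N_RTT_SAMPLE:
        return False  # not enough evidence yet
    current_round_min_rtt = min(current_round_rtts_ms)
    # The threshold scales with the base RTT but is clamped to [4 ms, 16 ms].
    rtt_thresh = max(MIN_RTT_THRESH_MS,
                     min(last_round_min_rtt_ms / MIN_RTT_DIVISOR, MAX_RTT_THRESH_MS))
    return current_round_min_rtt >= last_round_min_rtt_ms + rtt_thresh
```

With a 40 ms base RTT the threshold is 5 ms, so eight samples at 46 ms trigger the check and eight at 44 ms do not.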
>>>>>>> 
>>>>>>> But there are in fact security concerns. We already know about
>>>>>>> ICMP attacks in which attackers spoof the network feedback and
>>>>>>> inject fake ICMP packets, for example to disrupt PMTU discovery. "Man on the side"
>>>>>>> attackers can easily send a second copy of a packet with modified
>>>>>>> header bits and race it to the destination. If transports accept
>>>>>>> that copy as genuine, they will heed the faked congestion signals
>>>>>>> and the attacker will succeed in disrupting the connection.
>>>>>> 
>>>>>> I think this is also where we might see some divergence when the
>>>>>> transport is running in a limited domain versus the open Internet.
>>>>>> In a limited domain datacenter the devices in the path might be
>>>>>> trusted so that network to host signaling might be reasonably secure.
>>>>> 
>>>>> Maybe. But what if 100,000 of your new best friends are also running
>>>>> VMs in the same datacenter? I suppose that there will be firewalls
>>>>> and other filters, but when it comes to security the "limited
>>>>> domain" arguments are very suspect.
>>>> 
>>>> Hi Christian,
>>>> 
>>>> If traffic isolation is failing to the extent that VMs can intercept
>>>> unrelated traffic or inject packets into other tenants' networks, then
>>>> you have bigger problems than someone forging congestion signals!
>> 
>> In principle you are right, but it is certainly a failure mode.
>> 
>>>>> The 1-bit ECN is probably fine, because it is a slow signal. The
>>>>> attacker would be forced to race a lot of packets, for a modest result.
>>>>> But the more signal bits you allow in the header, the more effective
>>>>> attacks become.
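A rough way to see this point is to compare a receiver that averages 1-bit CE marks with a hypothetical receiver that reacts to the largest multi-bit value seen in each RTT, when a man-on-the-side attacker wins the race for a fraction `p_win` of packets. The sketch below assumes the path itself is uncongested, a DCTCP-style EWMA for the 1-bit case, and a max-per-RTT policy for the multi-bit case; both receiver policies and the function name are illustrative choices, not proposals from this thread.

```python
import random
from typing import Tuple


def attack_effect(p_win: float, pkts_per_rtt: int, rtts: int,
                  g: float = 1 / 16, seed: int = 1) -> Tuple[float, float]:
    """Return (1-bit EWMA estimate after `rtts` rounds,
    fraction of RTTs in which the max-based receiver saw a forged full-scale value)."""
    rng = random.Random(seed)
    alpha = 0.0
    full_scale_rtts = 0
    for _ in range(rtts):
        # Number of packets this RTT for which the forged copy arrived first.
        wins = sum(rng.random() < p_win for _ in range(pkts_per_rtt))
        # 1-bit receiver: each win is just one more CE mark in the average.
        alpha = (1 - g) * alpha + g * wins / pkts_per_rtt
        # Multi-bit receiver acting on the per-RTT maximum: one win is enough.
        if wins > 0:
            full_scale_rtts += 1
    return alpha, full_scale_rtts / rtts
```

With `p_win=0.05` and 100 packets per RTT, the 1-bit estimate stays close to 0.05, while the max-based receiver sees a forged full-scale reading in nearly every RTT (the chance of at least one win per RTT is 1 - 0.95**100, about 0.994).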
>>>> 
>>>> That seems like a variant on security by obscurity, in this case
>>>> security by being slow :-). And being slow for an attacker means it's
>>>> also slow for everyone else. AI/ML workloads running in the
>>>> datacenter are going to need faster reaction time to congestion than
>>>> several RTTs.
>> 
>> First, we should recognize that ECN is essentially a 1-bit version of
>> what Sebastian is asking for. I am not enthusiastic about the L4S approach
>> that rigidly encodes a signal into the frequency of CE marks, but I can
>> certainly go with the general idea that repeated CE marks signal more
>> congestion, and that absence of CE marks over some period signals "all
>> clear", with the expectation that congestion control algorithms will
>> combine CE marks with delay measurements and with packet drop
>> measurements.
>> 
>>>> I think there are only two options for securing network-to-host
>>>> signals and host-to-network signals: authenticate the signals or
>>>> confine the signals to a trusted limited domain.
>>> 
>>> 	[SM] That would essentially make this useless for better congestion control over the internet. Authentication is not an option, nor is limited domain (with the scope being "over the internet"). That would be a rather sad state of affairs. With ECN/L4S we already created the "security" issue; the only question I have is: Is adding (a bit) more resolution to the per-packet congestion information really an unacceptable security trade-off? For my expected use-cases I would say that trade-off seems OK, but others might differ.
>> 
>> The more powerful the signal, the more you will need authentication. I
>> believe that ECN might just squeeze in, because it is a narrow signal
>> and faking it might not be worth the expense. But if the signal can carry
>> the equivalent of an ICMP unreachable or some such, then yes, no serious
>> endpoint will trust it without authentication.
>> 
>> 
>>>> For network to host signaling,
>>> 
>>> 	[SM] Well, advertising the current congestion state is not really a "signal" the network node expects immediate responses to, no? As a best-effort thing, does this actually need strong security? The node writing this information might already have dropped the packet, so it could do far more harm much more easily, and any man-in-the-middle/replay attack would need to get hold of a packet, modify the congestion magnitude value, and have it delivered before the original packet... If the attacker is also able to corrupt the original packet so it is not delivered at all, we are back at the first situation. I assume that protocols used over the internet have by now all grown a duplicate/replay rejection method, no?
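Duplicate rejection by itself does not stop the race described earlier in the thread: a receiver that deduplicates on a transport packet number keeps whichever copy arrives first. The sketch below assumes a QUIC-style packet number and a hypothetical unauthenticated, network-writable congestion field; end-to-end encryption cannot cover a field that routers are supposed to modify, so the forged copy otherwise looks genuine.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    number: int       # transport packet number, e.g. a QUIC packet number
    congestion: int   # hypothetical network-writable congestion field
    origin: str       # "path" or "attacker"; only for illustration


def accept(arrivals: Iterable[Packet]) -> List[Packet]:
    """Deliver packets in arrival order, dropping later copies of a packet number."""
    seen = set()
    delivered = []
    for pkt in arrivals:
        if pkt.number in seen:
            continue  # duplicate: dropped, whether or not it was the genuine copy
        seen.add(pkt.number)
        delivered.append(pkt)
    return delivered
```

`accept([Packet(7, 255, "attacker"), Packet(7, 0, "path")])` delivers only the forged copy; the replay check helps only when the original wins the race.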
>>> 
>>> 
>>>> like congestion indication or IOAM, the only viable option seems to
>>>> be to restrict to a limited domain. Having the network authenticate
>>>> to hosts doesn't seem scalable since we have a few nodes trying to
>>>> authenticate to hundreds of thousands of hosts.
>>> 
>>> 	[SM] I agree on the second. The first would be unfortunate.
>> 
>> Even limited domain is questionable. Yes, you can conceive of setups in
>> which everything sent or received in the local domain is tightly
>> controlled, but in QoS applications "limited domain" is often much
>> looser than that, e.g., an ISP network. I think this group should focus
>> on solutions that work on the whole Internet.
>> 
>> We might want to explore the concept of "honest signals". In human
>> communication, honest signals are those that are hard to fake. In
>> networking, transmission delays are hard to fake: yes, nodes could put
>> some packets on the slow path to fake a long delay, but that's hard and
>> costly. Packet losses are somewhat hard to fake, because the overall
>> loss rate will affect the reputation of the ISP. Painting bits in
>> packets, on the other hand, is not that hard to fake. The marks could
>> also come from bugs, like these reports of the CE bits being set for
>> every packet on a given path.
>> 
>> If we want stronger signals than delays or losses, maybe we need a
>> mechanism to monitor usage of these signals and have negative
>> consequences for network providers that misuse them.
>> [HS] Not only will packet loss affect the reputation of the ISP; so will
>> latency. Every once in a while, Tencent publishes the network
>> conditions (loss, latency, etc.) seen from its most famous mobile game.
>> By releasing the report, they are putting the ISPs of each province in an
>> arena to compete. If we have more congestion indication from the
>> network, I am confident that applications can add these to the report too
>> and catch an ISP that misuses the congestion indication (for example, one
>> that fires far more congestion signals than its peers). An
>> application can correlate the multi-bit congestion indication with the
>> latency/loss rate to validate the signal, just like the ECT0, ECT1, and
>> ECN-CE counts used in TCP Accurate ECN and QUIC.
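One way an application could do this cross-check is to derive a per-interval CE fraction from the cumulative ECT(0)/ECT(1)/CE counters that QUIC ACK frames and Accurate ECN report, and correlate it with measured queuing delay. The sketch below is a hypothetical heuristic with illustrative inputs; a path that marks every packet regardless of delay (like the CE bug mentioned above) shows up as zero correlation.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def ce_fractions(counters: Sequence[Tuple[int, int, int]]) -> List[float]:
    """CE fraction per interval from cumulative (ect0, ect1, ce) counters."""
    fractions = []
    for (e0a, e1a, cea), (e0b, e1b, ceb) in zip(counters, counters[1:]):
        newly_counted = (e0b - e0a) + (e1b - e1a) + (ceb - cea)
        fractions.append((ceb - cea) / newly_counted if newly_counted else 0.0)
    return fractions


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)
```

For counters `[(0, 0, 0), (90, 0, 10), (150, 0, 50), (240, 0, 60), (290, 0, 110)]` the CE fractions are `[0.1, 0.4, 0.1, 0.5]`; paired with delays of 2, 8, 2.5 and 10 ms the correlation is close to 1, whereas a constant marking rate yields 0.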
>> 
>>>> For host-to-network signaling, authentication may be feasible where
>>>> network nodes authenticate a signal from a host that the network
>>>> originally gave to that host (draft-herbert-host2netsig).
>>>> In this case the network authenticates data that it created and
>>>> authorized for use by a host.
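The property pointed out here, that the network only has to verify data it minted itself, means a single secret held by the network is enough and no keys need to be shared with hosts. The sketch below illustrates that idea with a truncated HMAC-SHA256 tag over an issue time, a service class, and the host address. The token layout, lifetime, and the names `SignalIssuer`, `issue`, and `verify` are hypothetical and do not reflect the format in draft-herbert-host2netsig.

```python
import hashlib
import hmac
import os
import struct
import time
from typing import Optional

_BODY = struct.Struct("!IB")  # issue time (s), service class; hypothetical layout
_MAC_LEN = 16                 # truncated HMAC-SHA256 tag


class SignalIssuer:
    """Network-side issuer/verifier; the key never leaves the network."""

    def __init__(self, key: Optional[bytes] = None, lifetime_s: int = 300):
        self._key = key if key is not None else os.urandom(32)
        self._lifetime = lifetime_s

    def _tag(self, host: bytes, body: bytes) -> bytes:
        # Binding the host address stops a token being reused by another host.
        return hmac.new(self._key, host + body, hashlib.sha256).digest()[:_MAC_LEN]

    def issue(self, host: bytes, service_class: int, now: Optional[float] = None) -> bytes:
        issued = int(time.time() if now is None else now)
        body = _BODY.pack(issued, service_class)
        return body + self._tag(host, body)

    def verify(self, host: bytes, token: bytes, now: Optional[float] = None) -> Optional[int]:
        """Return the service class if the token is genuine, fresh, and bound to this host."""
        if len(token) != _BODY.size + _MAC_LEN:
            return None
        body, tag = token[:_BODY.size], token[_BODY.size:]
        if not hmac.compare_digest(tag, self._tag(host, body)):
            return None
        issued, service_class = _BODY.unpack(body)
        current = time.time() if now is None else now
        if not 0 <= current - issued <= self._lifetime:
            return None
        return service_class
```

A token issued to 10.0.0.1 verifies for that address within its lifetime, and is rejected if presented by another address, after expiry, or with any byte altered.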
>>> 
>>> 	[SM] Here I agree: if the information is to influence network behavior, it needs to be hard to abuse, either by being essential for packet delivery (addresses and ports mostly fall into this category, and routers already evaluate these for the end hosts) or by cryptographic methods.
>> 
>> Maybe, but in practice network to host signalling is mostly used for
>> differentiating services. Most app developers clearly do not want that.
>> They deploy encryption which is a big step towards enforcing network
>> neutrality.
>> 
>> I think we miss an "all clear" signal, and we should concentrate on
>> that. Congestion control algorithms know how to slow down. They struggle
>> to ramp up fast enough after congestion has eased. Some kind of "all
>> clear" convention would do that. Maybe it could be ECN. Seeing no mark
>> at all for some time, or some number of packets, could mean that. That
>> would work if in stable mode the network marks some packets, to
>> distinguish the "stable" condition from "all clear".
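The "all clear" convention could be read by an endpoint roughly as follows: classify the path from the fraction of CE-marked packets in a sliding window, with zero marks meaning "all clear", a small nonzero fraction meaning "stable", and anything at or above a threshold meaning "congested". The window size, the threshold, and the class names are hypothetical; the scheme only works if, as noted above, the network marks some packets in the stable state.

```python
from collections import deque
from enum import Enum


class PathState(Enum):
    ALL_CLEAR = "all clear"
    STABLE = "stable"
    CONGESTED = "congested"


class MarkWindow:
    """Classify the path from CE marks on the last `window` packets (hypothetical thresholds)."""

    def __init__(self, window: int = 200, congested_fraction: float = 0.1):
        self._marks = deque(maxlen=window)
        self._congested = congested_fraction

    def on_packet(self, ce_marked: bool) -> PathState:
        self._marks.append(ce_marked)
        if len(self._marks) < self._marks.maxlen:
            return PathState.STABLE  # not enough evidence yet: keep current behavior
        frac = sum(self._marks) / len(self._marks)
        if frac == 0:
            return PathState.ALL_CLEAR  # no marks at all: safe to ramp up quickly
        if frac >= self._congested:
            return PathState.CONGESTED
        return PathState.STABLE
```

The design choice is that "stable" requires some marks, so silence becomes a distinct, usable signal rather than the default.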
>> 
>> -- Christian Huitema
>
