RE: My BoF report: multipath

"Flinck, Hannu (Nokia - FI/Espoo)" <hannu.flinck@nokia-bell-labs.com> Fri, 23 October 2020 08:05 UTC

From: "Flinck, Hannu (Nokia - FI/Espoo)" <hannu.flinck@nokia-bell-labs.com>
To: Christian Huitema <huitema@huitema.net>, Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com>, Martin Thomson <mt@lowentropy.net>
CC: IETF QUIC WG <quic@ietf.org>
Subject: RE: My BoF report: multipath
Date: Fri, 23 Oct 2020 08:04:53 +0000
Message-ID: <HE1PR07MB3386AD2C9AA68DDFD9FB234C9B1A0@HE1PR07MB3386.eurprd07.prod.outlook.com>
References: <d84c82b1-fa67-4676-9ce2-d2a53d81b5f7@www.fastmail.com> <CAKKJt-d7iQG-5BsTjR+xKLNt_Ru+h_XpDuaO3JgtAx7+Z3_S0A@mail.gmail.com> <f876d34e-0dbd-be85-62f7-7252328454a8@huitema.net>
In-Reply-To: <f876d34e-0dbd-be85-62f7-7252328454a8@huitema.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/55NY8CkDKOsx5ToYxyP6pTWMWzc>
List-Id: Main mailing list of the IETF QUIC working group <quic.ietf.org>

Hello Martin,

You wrote: “There might be too much diversity in use cases or a schism in approaches, but we probably could, with sufficient energy, overcome that.  However, I have to conclude that this is not a good time for starting that work.”

Of course there are multiple use cases. If there were only one or two, one would argue that the application space is too narrow.
But ALL these use cases are about two things: 1) load balancing across two or more interfaces and 2) scheduling based on latency.

If you can do those you can implement the “diverse” use cases.

Best regards
Hannu


From: QUIC <quic-bounces@ietf.org> On Behalf Of Christian Huitema
Sent: Friday, October 23, 2020 8:08 AM
To: Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com>; Martin Thomson <mt@lowentropy.net>
Cc: IETF QUIC WG <quic@ietf.org>
Subject: Re: My BoF report: multipath



On 10/22/2020 8:17 PM, Spencer Dawkins at IETF wrote:
Hi, Martin,

Thanks for this. I have a question and a couple of comments.
On Thu, Oct 22, 2020, 19:06 Martin Thomson <mt@lowentropy.net> wrote:
(I put a variation of this comment in the meeting and in slack, but I wanted to expand on it some.  Sorry, but this got long.  Four hours is not enough sleep.)

Multipath seems pretty clearly useful for certain cases.  I think that the meeting today answered at least the first two of the BoF questions I posed earlier on the list.  So if we are to regard this as a BoF, it met its goals (thanks chairs).  There is some uncertainty about the first question about having a clear problem to solve, but I am of the view that we could muddle through with some combination of either ignoring our differences or working around them.  The third question regarding constituency is where I didn't find a satisfactory answer.  I want to be clear though, this is no fault of the proponents.  At the current time, I am convinced that formally starting work on multipath would be unwise.

I've been assuming that targeting Experimental would allow people who care about multipath to work on it as part of the QUIC community without derailing needed standards-track work, rather than working in isolation or in a splinter group. At a minimum, the chairs could give priority to needed standards-track work when that's required to make progress.

Does that make sense?

As Martin writes later in his report, anyone can use the extensibility mechanisms of QUIC and design a multipath extension. I would expect to see several of those. If applications really demand that, then some of those extensions will be deployed, experience will be gained, etc. At that point it might be time to draft a standard proposal based on these prototype deployments.

Multipath aims to improve performance either through latency, robustness, or throughput.  Application awareness and involvement in scheduling seemed to be the key factor that enables finding the optimal usage pattern or scheduling algorithm that allow multipath to deliver on those goals.  Applications and users are in the best place to balance goals against other factors like cost or whatever else matters most.  (For reference, I recall the same point being made by Roberto and Christian most clearly, but several others made the same point.)  Christoph did a good job of showing how this applies to very specific use cases, and I thought I saw that in the Alibaba presentation also, but we didn't quite get enough time to get the necessary detail in either presentation.  One potential advantage in this regard is that QUIC implementations are often closer to applications, so they might be in a good position to integrate better.

One of the things I came away with today was an appreciation for at least two kinds of applications: the kind that can do better if they handle multiple paths themselves because the details of the application matter so much, and the kind that don't care as much. If you're using multiple paths simultaneously just to make use of all your bandwidth, that's a lot easier to delegate to a general-purpose transport mechanism.

There are very few applications that "use multiple paths simultaneously just to make use of all your bandwidth". Christoph told us for example how Siri and Apple Music use very different strategies, one to minimize latency in a low bandwidth application, the other to minimize use of the expensive cellular link unless that's required to avoid stalls. Multipath can easily go to "win some, lose some" trade-offs, and in particular trade-offs between bandwidth and latency, or bandwidth and cost of operation.
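
To make the contrast between those two strategies concrete, here is a minimal illustrative sketch. It is not taken from the presentations or from any QUIC implementation: the `Path` type, the function names, the RTT figures, and the 2-second stall threshold are all assumptions chosen for illustration. It only shows how a latency-first policy and a cost-first policy can pick different paths from the same inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical description of one network path (not a real QUIC API)."""
    name: str
    rtt_ms: float   # assumed round-trip time estimate for the path
    metered: bool   # True for a costly link such as cellular


def pick_lowest_latency(paths):
    """Latency-first policy: always use the path with the smallest RTT."""
    return min(paths, key=lambda p: p.rtt_ms)


def pick_cheapest_unless_stalling(paths, buffer_s, stall_threshold_s=2.0):
    """Cost-first policy: stay on unmetered paths while the playback
    buffer is healthy; use any path (lowest RTT) when a stall is near."""
    unmetered = [p for p in paths if not p.metered]
    if unmetered and buffer_s >= stall_threshold_s:
        return min(unmetered, key=lambda p: p.rtt_ms)
    return min(paths, key=lambda p: p.rtt_ms)


# Assumed example values: Wi-Fi is free but slower, cellular is fast but metered.
paths = [
    Path("wifi", rtt_ms=80.0, metered=False),
    Path("cellular", rtt_ms=35.0, metered=True),
]

latency_choice = pick_lowest_latency(paths)
healthy_choice = pick_cheapest_unless_stalling(paths, buffer_s=10.0)
stalling_choice = pick_cheapest_unless_stalling(paths, buffer_s=0.5)
```

With these assumed inputs the latency-first policy selects the cellular path, while the cost-first policy stays on Wi-Fi until the buffer drops below the threshold. Neither choice is "the" right answer for a generic transport, which is the trade-off described above.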

However, many of the cases that were presented were exactly the sorts of opaque intermediation that is almost the antithesis of that ideal.  Similarly, David's assertion that multipath is orthogonal to MASQUE is reliant on the assumption that application involvement is not that important.  In these cases, it's not clear that using multipath is strictly good.

I should unpack that a little.  For those people who are making scheduling decisions outside of the endpoint (possible examples being the satellite case and the 3GPP case), it's not clear that this is anything endpoints can prevent.  An endpoint probably can't stop a network provider from using ECMP either.  Similarly, it is not clear how an application endpoint could be aware of these decisions at a level that would allow them to understand and adapt to this treatment.  The result is that these cases have a far more ambiguous value proposition.  Improvements come with trade-offs: for instance, the application might get better throughput, but it comes at a cost to latency.  So I conclude that while these intermediary-based designs might provide an aggregate gain, they will probably not realize the full performance gains that come from end-to-end awareness and control.

For IETF insiders, see also the BANANA or LOOPS BoFs which were strictly network-based analogues of these.  Many of the same concerns that caused those BoFs to fail apply to those use cases.

Maybe we accept the application of the protocol to these questionable ends as acceptable collateral if we are able to deploy at the endpoints.  Maybe we allow intermediaries to seek marginal improvements, but try to ensure that we have a clear path to deploying something better in the long term.  But there is a risk that deployment in the network could interact poorly with more-ideal end-to-end solutions and even prevent those deployments.

ISTM that this is an extremely helpful observation. The presentation I gave talked about short-term synergy (can't do multipath QUIC without a QUIC stack, right?), but you're raising a really important question about opt-in and bypass for intermediaries, that people who care about intermediaries should be thinking about.

At a minimum, transparent interception for QUIC intermediaries will be a lot harder than it was for TCP intermediaries ...

One of the priorities in QUIC development was to restore end-to-end sanity by getting rid of middle-box interference. ATSSS appears to be yet another design of middle boxes, so you can imagine that there is very little enthusiasm for that scenario.

These are systems-level questions that are large in scope and subtle in their effect.  I think that it will require considerable energy to resolve them.  Or, as seems more likely in my experience, it will take more time and effort to design a protocol where there are fundamental disagreements about the nature of the deployment models.

However, this isn't the only factor.  We are not deciding on the merits and value of multipath in a vacuum.  It was pretty clear that multipath has potential, at least in principle, or in certain cases.  I'm also mostly convinced now that we could produce a design.  There's some uncertainty, but it seems like we could tolerate that.  QUIC definitely wasn't a sure thing when we started out; I can't expect any large effort to be risk free.

So, with some uncertainty about use cases, I might still conclude that we have satisfactory answers to the first two BoF questions.  My concern here is about the third: constituency.

What I think is most important at this point is understanding if this protocol will remain a single, coherent thing.  That we can keep building on the "synergies" that Spencer referred to.  No matter the technical merits of the protocol (it's great! probably!) that synergy is probably the most important feature that this working group has delivered with QUIC.  The details of the protocol matter less than the fact that we have a group of people committed to building and maintaining that protocol.  This working group needs to be the venue where work happens so that this community can continue to build on this success.

So for multipath, if we take it on, I'd only like to do so if I was convinced that a non-trivial proportion of the active deployments are committed to working on it and deploying the new extension or version.  That is, that this community wants to do the work.  I see no evidence of that yet, which is why I will claim that this fails to satisfactorily answer that third BoF question.

It is very easy for a splinter group to define a new version of QUIC that does anything.  draft-deconinck or draft-huitema could be the basis of that sort of effort and that could result in the definition of QUIC 84 or 0x0219c81 or whatever.  Call it QUICv2 if you really want.  But if that protocol is only used in certain narrow contexts, then it doesn't produce any of those synergies.  On the contrary, it works to undermine them, so I would prefer to avoid that.

So rather than ask whether multipath is doable, I think we need to instead decide what the QUIC working group - the group that built the core protocol - is doing next for that core protocol and the deployments that depend on it.  Personally, I don't think that we're ready for another large project.  We need deployment experience with the protocol.  We also need to go in and backfill those pieces of QUIC we need for the next thing, like version negotiation.  For me, that's more than enough.

We had a discussion recently on Slack, comparing the size of the early prototype implementations, maybe 3,000 LOC, and the size of the current implementations, often around 30,000 LOC. We discovered and solved a lot of complex issues during the standardization process, and the increased code size is an indicator of that additional complexity. Every additional feature interacts with other features, potentially in multiplicative ways. Add multipath, and expect the code size to grow significantly yet again. Hence the call for caution.

I've now seen a lot of enthusiasm for the idea of multipath.  There were some great presentations with convincing use cases.  There might be too much diversity in use cases or a schism in approaches, but we probably could, with sufficient energy, overcome that.  However, I have to conclude that this is not a good time for starting that work.

I realize that this is likely unsatisfactory to those who want multipath.  I also recognize that deferring work when there is such clear demand could result in that demand manifesting in a bunch of non-interoperable protocols.  Those are risks that we each have to assess for ourselves.

This will change over time.  I don't know how long it will take.  But it's not now.

Yes. I don't see how we can have a generic multipath solution now, not without having some clear consensus from application developers about the features that they want.

-- Christian Huitema