RE: [ippm] New QUIC Packet Loss Measurement (draft-cfb-ippm-spinbit-measurements)

Marcus Ihlar <marcus.ihlar@ericsson.com> Thu, 14 May 2020 13:52 UTC

From: Marcus Ihlar <marcus.ihlar@ericsson.com>
To: "fabio.bulgarella=40guest.telecomitalia.it@dmarc.ietf.org" <fabio.bulgarella=40guest.telecomitalia.it@dmarc.ietf.org>, "ilubashe=40akamai.com@dmarc.ietf.org" <ilubashe=40akamai.com@dmarc.ietf.org>, "ianswett@google.com" <ianswett@google.com>, "draft-cfb-ippm-spinbit-measurements@ietf.org" <draft-cfb-ippm-spinbit-measurements@ietf.org>
CC: "isabelle.hamchaoui@orange.com" <isabelle.hamchaoui@orange.com>, "quic@ietf.org" <quic@ietf.org>, "alexandre.ferrieux@orange.com" <alexandre.ferrieux@orange.com>, "ippm@ietf.org" <ippm@ietf.org>
Subject: RE: [ippm] New QUIC Packet Loss Measurement (draft-cfb-ippm-spinbit-measurements)
Thread-Topic: [ippm] New QUIC Packet Loss Measurement (draft-cfb-ippm-spinbit-measurements)
Date: Thu, 14 May 2020 13:52:15 +0000
Message-ID: <HE1PR07MB4426E089C7D7D5183367E970E2BC0@HE1PR07MB4426.eurprd07.prod.outlook.com>
References: <3ca3b5aae01d4650a3451639268b3f1e@TELMBXD14BA020.telecomitalia.local> <CAKcm_gMEELBizN_h5+s3Ow0LKXEgTRGg+-AqzJMZXVBDwQcDLA@mail.gmail.com>, <6b9e74ac94114d28ae4a66f1e9625ebd@usma1ex-dag1mb6.msg.corp.akamai.com> <1587582698016.72632@guest.telecomitalia.it>, <676f723b239b42ab9992780c440c5297@usma1ex-dag1mb6.msg.corp.akamai.com> <1588081856900.50391@guest.telecomitalia.it>, <ca220795058b472b9910beb35d65569c@usma1ex-dag1mb5.msg.corp.akamai.com> <1589461715896.14688@guest.telecomitalia.it>
In-Reply-To: <1589461715896.14688@guest.telecomitalia.it>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/cq1JAr95RsXk3gvcLGF0rYRfU2c>

Hi,
I agree with the original problem that we need a good way to do packet loss measurements for both aggregates and individual flows.
I'm not yet in a position to say which methodology is the most beneficial for the use cases I'm interested in (mainly SLA assurance and troubleshooting).

In Spindump we have support for both measurement methodologies and could quite easily extend it to observe loss using both methodologies simultaneously.
So I like the approach that Fabio is describing here. Perhaps this could be a good Hackathon project?

Marcus Ihlar

-----Original Message-----
From: QUIC <quic-bounces@ietf.org> On Behalf Of Bulgarella Fabio (Guest)
Sent: 14 May 2020 15:09
To: Lubashev, Igor <ilubashe=40akamai.com@dmarc.ietf.org>; Ian Swett <ianswett@google.com>; draft-cfb-ippm-spinbit-measurements@ietf.org
Cc: isabelle.hamchaoui@orange.com; quic@ietf.org; alexandre.ferrieux@orange.com; IETF IPPM WG (ippm@ietf.org) <ippm@ietf.org>
Subject: Re: [ippm] New QUIC Packet Loss Measurement (draft-cfb-ippm-spinbit-measurements)

Hi Igor and all QUIC/IPPM WG members,

I think it's clear to everyone that both systems have advantages and disadvantages. In this email exchange, we both expressed our views on it.
At this point we would like to ask WG members whether they think it would be interesting to see a comparative test between the two methodologies.

Our proposal is to implement both solutions on a common implementation of the QUIC protocol so that the same data flow can be marked using both techniques (i.e., Qbit, Rbit and Lbit - borrowing the third most significant bit occupied by the spinbit) and analyzed using the two observers.

We look forward to your feedback on this proposal. Best regards,

Fabio, Mauro, Massimo, Giuseppe and Riccardo


________________________________________
From: Lubashev, Igor <ilubashe=40akamai.com@dmarc.ietf.org>
Sent: Saturday, 9 May 2020 08:00
To: Bulgarella Fabio (Guest); Ian Swett; draft-cfb-ippm-spinbit-measurements@ietf.org
Cc: isabelle.hamchaoui@orange.com; quic@ietf.org; alexandre.ferrieux@orange.com; IETF IPPM WG (ippm@ietf.org)
Subject: [EXT] RE: [ippm] New QUIC Packet Loss Measurement (draft-cfb-ippm-spinbit-measurements)

Fabio,

I think we are in complete agreement that any loss reporting is a very useful extension - both for monitoring the network and for troubleshooting individual flows. I think we agree that L-bit and R-bit are better than nothing.  The disagreement is about the relative fitness for the purpose and ease of deployability.

I would also welcome other people to weigh in with novel insights.  Maybe more people will catch up on both drafts before the next IETF.

I am happy to get into the details of each point (inline), but my biggest concern is aligned with Dmitri's concern that:

   (a) Simultaneous captures/observations by different organizations of infrequently occurring events is a futile proposition, and
   (b) A very large percentage of internet flows is asymmetric.

As a result, a method that relies on either of these properties is limiting its utility for troubleshooting customer problems.

(The rest of the comments are inline)

> On Tue, April 28, 2020 at 9:51 AM, Bulgarella Fabio (Guest) <fabio.bulgarella=40guest.telecomitalia.it@dmarc.ietf.org> wrote:
>
> Hello Igor,
> thanks for your reply.
>
> This exchange of e-mails is very useful to understand which of the two
> solutions has the best balance between positive and negative aspects,
> at least in the measurement scenarios of interest for telecommunication operators. The important thing is that one of the two solutions is chosen and a packet loss feature is finally included in the QUIC protocol (after the necessary checks in terms of privacy and security).
> However, we can already say, from the point of view of Telecom Italia and Orange, that the Qbit is a fixed point of this type of measures.
> It remains to decide which is better between L-bit and R-bit.
>
> First of all, let's remember the reasons that led us to design the Rbit.
> We found some weaknesses in the Lbit:
>
> - the LBit logic is strictly tied to the protocol implementation.
> For example in LSQUIC there are 6 different hook points where this
> variable is set. QL performance mechanism logic is intertwined with
> the protocol logic. So, the concern here is that future changes in the
> protocol or in new implementations need a careful analysis of the code
> to avoid different behaviors and different measurement results;

The quality of the implementation of each stack is mostly related to the implementor.  The behavior for both L-bit and R-bit is specified by the standards, so if the standards are implemented correctly, no analysis of the implementation should be needed.

> - loss of packets carrying only ACK frames (are they detected?).
> We saw in the Montreal demo last July (using the Ericsson SpinDump tool) that if QUIC packets contain only ACK frames, the Lbit is not set.
> Actually, during a file download we inserted losses in the upload direction that were not detected by the system.
> Alexandre confirmed this behavior. Have you changed something since then?

With QL-bits, each endpoint is free to implement or NOT implement the marking.  The beauty of the QL scheme is that if an endpoint chooses to mark packets, those markings are sufficient to signal upstream and downstream loss of all packets sent by THAT endpoint. In the implementation you are referring to, it looks like the client endpoint did NOT choose to implement QL markings, so its packets (pure ACKs) did not provide loss signaling.  If the client did implement QL, those pure ACKs would report losses all the same.

> - the loss measurement signal is inaccurate in case of losses:
> The Lbit is a simple signaling protocol. If you miss a Lbit marked
> packet you miss a signal of loss. If losses are not equally
> distributed between L-marked and not L-marked packets, it's very
> likely that the produced measurements are wrong or at least inaccurate. The two approaches are completely different. Rbit works exactly as the Qbit. It counts the lost packets, it's not a signaling. So, as the Qbit, the correct percentage of lost packets is detected.

There is no difference between the QL and QR schemes in signal quality when packets carrying the signal are lost. As you note, we discuss in the draft that it is conceivable for packets with L=0 and L=1 to see somewhat different loss ratios in the QL-bit scheme. Likewise, it is conceivable for packets belonging to "shorter" and "longer" R-trains to see different loss ratios in the QR-bit scheme. After all, when there are losses to report (shorter R trains), the ACKs will be sent more frequently, increasing the chance of return path congestion.

> Now some inline answers to your considerations, even though at this
> point it would be convenient to organize a comparison involving also
> other people interested in the topic, maybe in a next IETF meeting. We
> are also available to carry out comparative tests on the same flows (marking simultaneously with both solutions) using for example the Ericsson SpinDump tool which has already been used in the previous hackathons as QUIC performance measurement tool.
>
>> Strong reordering exists.  That's why we are recommending the minimum Q to be 64. This comes from real data, not lab data.
>> If 5 packets were enough to tell the end of marking periods, we would have recommended minimum Q to be 8.
>
> The length of the period is chosen not only as a function of reordering but also to prevent burst losses from completely cancelling one or more periods.

That's right, that is a factor also.  However, we have not observed losses that obliterate several dozens of consecutive packets, though I am sure they exist somewhere.  Consistent packet reordering of that magnitude we did see.

> However, stimulated by Igor's concerns, we thought of an improvement regarding the tolerance of our system against out of sequence.
> In a nutshell, it would be possible to intercept reordered packets
> received after the detection of an incoming "Qbit marking period", using them to update the computed average until the current "Rbit marking period" is completed in generation.

Sure, though the receiver endpoint has access to the packet numbers and knows the loss rate exactly and can signal it directly.

> After the necessary laboratory tests, we could add this new feature to the next revision of our draft.
>
>> R signal is still going to be significantly delayed, especially with
>> slower connections (not every connection is going to be streaming
>> media at a sustained rate of 15Mbps). So if for a 15Mbps average-rate connection you can expect to see R-signal with a delay between 8-RTTs (1:2 ACK) and 38-RTTs (1:10 ACK), the delay increases to 24-RTTs and 114-RTTs for a 5Mbps average-rate connection.
>
> As already said in the previous mail, based on our tests, this is not
> a big problem because it's still possible to obtain enough measurements to evaluate the packet loss of a session. Moreover, the measurement delay is shortened when packet loss occurs.

It depends.  For a large media download it is probably enough data for generic network monitoring.  It would not work for shorter flows.  Also, as I showed above, even with an ack2 policy, you may get a 38-RTT delay for very realistic network media flows, which will make troubleshooting specific flows difficult.

>> L-bit signal provides a crisp shape of the loss - whether it is random loss, or packets are lost in batches (and how many packets are in a loss batch).
>> By contrast, R-signal only indicates an average loss over 2 to 10
>> Q-bit marking periods.  The shape of the loss is erased. I think for that alone, L-signal is more useful for troubleshooting.
>
> Although we calculate the E2E loss through an average (only if the
> reflection is on a slower channel than the signal reception channel), we are still able to accurately calculate the loss using a correction factor (see https://tools.ietf.org/html/draft-cfb-ippm-spinbit-measurements-00#section-7.4).

I think you did not understand the concern here.  The specific shape of the loss may be very useful for network troubleshooting.  QL-signal allows one to observe the precise shape of the loss.  QR-scheme only gets you an average rate over multiple RTTs.

In fact, even if the loss is on the slower channel and reflection is on the faster channel, QR only gets an average loss rate rather than the precise shape of the loss.
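
To illustrate the "shape versus average" point, here is a minimal sketch of what an observer could compute from a sequence of observed L-bit values. It assumes (this is a simplification, not something stated in the thread) that the sender reports each declared loss on its next outgoing packet, so that a run of consecutive L=1 packets approximates the size of a loss batch. The function names and the two sample sequences are hypothetical.

```python
from itertools import groupby


def l_bit_bursts(l_bits):
    """Return the lengths of consecutive runs of L=1 in an observed packet sequence."""
    return [sum(1 for _ in run) for bit, run in groupby(l_bits) if bit == 1]


def average_loss(l_bits):
    """Return the fraction of observed packets that carry L=1."""
    return sum(l_bits) / len(l_bits)


# Two hypothetical traces with the same average loss but different shapes.
scattered = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
batched = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

scattered_bursts = l_bit_bursts(scattered)  # four isolated reports
batched_bursts = l_bit_bursts(batched)      # one run of four reports
```

Both traces give the same `average_loss`, which is all an averaged signal would retain, while the run lengths differ. Reordering and sender pacing are ignored in this sketch.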

>> This is an important difference, and it seems accurate on its
>> surface.  In reality, every QUIC implementation will keep track of packets and declare them lost for the purposes of data retransmission and/or congestion control.  The timing and details of when packets are declared lost (L-bit signal) will vary.
>> Likewise, every QUIC implementation will send some ACKs, but their
>> timing and frequency (R-signal) will vary.  R-signal quality will also heavily depend on the "marking period detection" implementation quality and "Q-signal averaging" implementation details.
>
> See first point in the L-bit weaknesses paragraph. Moreover, the two
> functions "marking period detection" and "Q-signal averaging", really simple in implementation, are completely detached from the rest of the protocol logic and their behaviors are expressly indicated in the draft.

There is no particular L-bit weakness here.  See my comment above - the precise timings of the signals will vary in both approaches.  The timing of L-bit will vary based on the sender's loss detection.  The timing of R-bit will vary based on the receiver's ACK policy (and also the details of their "reordering mitigation" algorithms).

>> Also, accurate accounting with QR scheme requires correlation of both directions, with the R-bit direction significantly lagging the Q-bit direction.
>> It should be understood how this scheme behaves in cases of changing Connection IDs.
>
> We do not see the problem of implementing the solution on both endpoints since generally the client, unlike the server, does not have scalability problems.
> So if you implement it on the server it costs nothing to implement it on the client as well.

There is no scalability concern for the QL implementation at all.  It is just two counters that are incremented and decremented.
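
As a rough illustration of the "two counters" remark, the sketch below shows one way a sender could mark Q and L bits. It is an assumption-laden reading, not the draft's normative algorithm: one counter tracks how many packets have been sent in the current Q marking period, the other tracks losses that have been declared but not yet reported. The class and method names are hypothetical, and the default period of 64 is the minimum Q mentioned elsewhere in this thread.

```python
class QLMarker:
    """Sender-side sketch of Q-bit and L-bit marking (illustrative only)."""

    def __init__(self, q_period=64):
        self.q_period = q_period   # packets per Q marking period (assumed parameter)
        self.q_value = 0           # current value of the Q square wave
        self.q_run = 0             # counter 1: packets sent in the current period
        self.unreported_loss = 0   # counter 2: losses declared but not yet signaled

    def on_packet_declared_lost(self):
        """Call when the stack's own loss detection declares a packet lost."""
        self.unreported_loss += 1

    def mark_outgoing(self):
        """Return the (Q, L) bit pair for the next outgoing packet."""
        if self.q_run == self.q_period:
            # The current period is full, so flip the square wave.
            self.q_value ^= 1
            self.q_run = 0
        self.q_run += 1

        l_bit = 0
        if self.unreported_loss > 0:
            # Report one outstanding loss on this packet.
            l_bit = 1
            self.unreported_loss -= 1
        return self.q_value, l_bit


# Usage: a short period of 4 makes the square wave easy to see.
marker = QLMarker(q_period=4)
first_period = [marker.mark_outgoing() for _ in range(4)]
marker.on_packet_declared_lost()
marker.on_packet_declared_lost()
second_period = [marker.mark_outgoing() for _ in range(4)]
```

The per-packet cost is a comparison, an increment and at most one decrement, which is consistent with the claim that scalability is not the concern here.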

The real concern is developer skills and developer resources.  Servers typically have substantial development organizations that will get it right. Except for a handful of most popular desktop and phone clients, however, clients have very limited developer support and update story. A lot of bits are pushed by clients built into various streaming devices, smart TVs, set top boxes, BluRay players, and some such.  Their implementations tend to be minimal and somewhat sloppy, and their upgrade path is sporadic at best.

Best regards,

- Igor

________________________________________
From: Lubashev, Igor <ilubashe=40akamai.com@dmarc.ietf.org>
Sent: Thursday, 23 April 2020 18:41
To: Bulgarella Fabio (Guest); Ian Swett; Cociglio Mauro
Cc: Riccardo Sisto; isabelle.hamchaoui@orange.com; quic@ietf.org; alexandre.ferrieux@orange.com; IETF IPPM WG (ippm@ietf.org)
Subject: [EXT] RE: [ippm] New QUIC Packet Loss Measurement (draft-cfb-ippm-spinbit-measurements)

Fabio, thanks for the reply.

* "Moreover, with a strong amount of reordering the Qbit (and the SpinBit) would not work either."

Strong reordering exists.  That's why we are recommending the minimum Q to be 64.  This comes from real data, not lab data.  If 5 packets were enough to tell the end of marking periods, we would have recommended minimum Q to be 8.


* R-bit signal delay.

R signal is still going to be significantly delayed, especially with slower connections (not every connection is going to be streaming media at a sustained rate of 15Mbps).  So if for a 15Mbps average-rate connection you can expect to see R-signal with a delay between 8-RTTs (1:2 ACK) and 38-RTTs (1:10 ACK), the delay increases to 24-RTTs and 114-RTTs for a 5Mbps average-rate connection.

* R-bit signal quality.

L-bit signal provides a crisp shape of the loss - whether it is random loss, or packets are lost in batches (and how many packets are in a loss batch).  By contrast, R-signal only indicates an average loss over 2 to 10 Q-bit marking periods.  The shape of the loss is erased.

I think for that alone, L-signal is more useful for troubleshooting.


* L-bit signal depends on each implementation's determination of when to "declare a packet lost" (which may differ for each implementation), while R-signal depends only on more generic behaviors.

This is an important difference, and it seems accurate on its surface.  In reality, every QUIC implementation will keep track of packets and declare them lost for the purposes of data retransmission and/or congestion control.  The timing and details of when packets are declared lost (L-bit signal) will vary.  Likewise, every QUIC implementation will send some ACKs, but their timing and frequency (R-signal) will vary.  R-signal quality will also heavily depend on the "marking period detection" implementation quality and "Q-signal averaging" implementation details.


* End-to-end loss: "We can also place two independent observers wherever on the two directions and they will still produce coherent E2E measures."
* End-to-end loss: "by observing only one direction we can measure the [...] the end-to-end in the opposite direction (Rbit)."

Correct, as long as both endpoints implement the entire QR scheme (even if this is an unusual setup - measuring path A requires observing path B).

Also, accurate accounting with QR scheme requires correlation of both directions, with the R-bit direction significantly lagging the Q-bit direction.  It should be understood how this scheme behaves in cases of changing Connection IDs.

Best wishes,

- Igor


From: Bulgarella Fabio (Guest) <fabio.bulgarella=40guest.telecomitalia.it@dmarc.ietf.org>
Sent: Wednesday, April 22, 2020 3:12 PM
To: Lubashev, Igor <ilubashe@akamai.com>; Ian Swett <ianswett@google.com>; Cociglio Mauro <mauro.cociglio@telecomitalia.it>
Cc: Riccardo Sisto <riccardo.sisto@polito.it>; isabelle.hamchaoui@orange.com; quic@ietf.org; alexandre.ferrieux@orange.com; IETF IPPM WG (ippm@ietf.org) <ippm@ietf.org>
Subject: Re: [ippm] New QUIC Packet Loss Measurement (draft-cfb-ippm-spinbit-measurements)

Hello Igor,
here are some thoughts about your concerns.

1. If big differences in delay are present between two parallel paths on which the same connection is divided, this is certainly an unwanted behavior, because it could create problems for the protocol itself even before damaging the performance measurements. As a network operator, we try to avoid such scenarios. However, our mechanism works correctly in most cases and is still resistant to normal reordering. Moreover, with a strong amount of reordering the Qbit (and the SpinBit) would not work either.


2. It's true that your mechanism allows the measurement of E2E and upstream loss of the observed direction. However, these measurement methodologies are born with the idea of providing information on the performance of the entire connection, as was done with the SpinBit. Actually with our mechanism, when observing a single direction, we still are able to measure the E2E loss of the opposite channel (using the R-bit).


3. The ACK ratio of 1:10 should only occur in the absence of loss. By introducing loss, the ratio between ACK and data is certainly better. So the quality of our signal improves just when it is needed; we don't care if we get delayed measurements when everything is working great. This behavior is confirmed by our laboratory tests where we have seen that loss can be easily detected in an extremely accurate manner even in the case of highly unbalanced flows.

L-Bit has an RTO delay, but to evaluate an accurate loss ratio you still have to observe a significant number of packets.


4. First of all, our solution is not dependent on any specific protocol implementation unlike the QL and therefore there shouldn't be problems related to the quality of the implementation. In other words, our portion of code is always the same, in every protocol implementation.
Secondly, as anticipated in point 2, by observing only one direction we can measure the losses in the operator's domain (Qbit) and the end-to-end in the opposite direction (Rbit). So, our solution does work also for asymmetric path segments. We can also place two independent observers wherever on the two directions and they will still produce coherent E2E measures.

The QL solution does not give any information about the opposite direction, it only measures what it observes, that is the current direction. Having two independent measures, one per direction, can be an advantage in some aspects, but a disadvantage for others.


Best regards,
Fabio B.


________________________________________
From: ippm <ippm-bounces@ietf.org> on behalf of Lubashev, Igor <ilubashe=40akamai.com@dmarc.ietf.org>
Sent: Tuesday, 21 April 2020 18:31
To: Ian Swett; Cociglio Mauro
Cc: Riccardo Sisto; isabelle.hamchaoui@orange.com; quic@ietf.org; alexandre.ferrieux@orange.com; IETF IPPM WG (ippm@ietf.org)
Subject: [EXT] Re: [ippm] New QUIC Packet Loss Measurement (draft-cfb-ippm-spinbit-measurements)

I like the QR draft from the perspective that it mostly works (and lots of credit is due for things that work).

I have a few concerns, though.

1. The most trivial one is that QR is actually harder to implement correctly for endpoints, given the tricky processing required to identify when a marking interval has ended.  Experimental data that Akamai & Orange compiled shows that there are networks that do ECMP among links with significantly different latencies.  They will cause a lot more reordering to a burst of packets than the recommended "observe 5 packets (in a row?) of the opposite markings" rule can address.  You might need to potentially defer the decision on where exactly the marking period boundary is until you've seen almost half of the next marking period.  Requiring every endpoint to implement this right is tough.

2. Since the most important direction (at least for downloads and media streaming) is server-to-client, to get this signal with QR, you need both endpoints implementing the logic, with the heaviest lift on the clients (see above).  For QL, a client that does not want to implement QL logic but is ok with the loss data reporting can just update the TP and header protection mask - all the lift is on the server side.

3. I am concerned with the quality of the signal you get from QR.  While the upstream loss signal is strong (same Q-bit), the downstream/end-to-end loss signal is weak and is likely to result in very delayed and inconclusive signal.  Google's experience (as I understand it) and protocol research (https://erg.abdn.ac.uk/~downloads/ackscaling.pdf) point to the benefit of ACK-ing only 1:10 packets after the initial slow start (draft-fairhurst-quic-ack-scaling).  Since such behavior has a beneficial impact on networks, mobile devices, and endpoint CPU, and no negative impact on the protocol performance, I expect it to be in common use.  With 1:10 ACK ratio, R-bit signal is significantly delayed as it is lagging the loss by at least 1400*(Q=64)*2*10=1.8MB of data transferred, which is at least 38 RTTs for a 15Mbps stream over a 25ms link (L-bit signal is delayed by an RTO).

4. QR scheme only works for symmetric path segments, when the observer can capture both directions of the flow. You also cannot compute end-to-end loss in any direction, unless both endpoints implement the entire QR algorithm.  In contrast, QL works for single-direction observers, and whichever endpoint implements QL algorithm ensures end-to-end loss signal for the packets it sends, and the quality of that loss signal solely depends on the quality of that endpoint's implementation (not the quality of the other endpoint's implementation).  QL seems more deployable.
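
The delay estimate in point 3 can be reproduced with a few lines of arithmetic. The sketch below simply re-evaluates the formula 1400*(Q=64)*2*10 for a given stream rate and RTT. The function name and parameter names are hypothetical, the factor of 2 is taken verbatim from the formula without further interpretation, and the "25ms link" is assumed here to mean a 25 ms round-trip time.

```python
def r_signal_lag_rtts(rate_mbps, ack_ratio, rtt_ms=25, q_period=64,
                      packet_bytes=1400, factor=2):
    """Estimate how many RTTs the R-bit signal lags behind the loss.

    lag_bytes follows the formula quoted in point 3:
    packet size * Q period * 2 * ACK ratio.
    """
    lag_bytes = packet_bytes * q_period * factor * ack_ratio
    lag_seconds = lag_bytes * 8 / (rate_mbps * 1_000_000)
    return lag_seconds / (rtt_ms / 1000)


# Lag for the 15 Mbps stream with the 1:10 and 1:2 ACK ratios.
lag_15_ack10 = r_signal_lag_rtts(rate_mbps=15, ack_ratio=10)
lag_15_ack2 = r_signal_lag_rtts(rate_mbps=15, ack_ratio=2)

# The same ratios for a slower 5 Mbps stream.
lag_5_ack10 = r_signal_lag_rtts(rate_mbps=5, ack_ratio=10)
lag_5_ack2 = r_signal_lag_rtts(rate_mbps=5, ack_ratio=2)
```

With these inputs the 15 Mbps, 1:10 case comes out at a little over 38 RTTs, matching the figure in point 3. The lag scales inversely with the stream rate, so the 5 Mbps cases are three times larger.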

- Igor


From: Ian Swett <ianswett@google.com>
Sent: Wednesday, April 8, 2020 3:44 PM
To: Cociglio Mauro <mauro.cociglio=40telecomitalia.it@dmarc.ietf.org>
Cc: quic@ietf.org; Lubashev, Igor <ilubashe@akamai.com>; alexandre.ferrieux@orange.com; isabelle.hamchaoui@orange.com; Riccardo Sisto <riccardo.sisto@polito.it>; IETF IPPM WG (ippm@ietf.org) <ippm@ietf.org>
Subject: Re: [ippm] New QUIC Packet Loss Measurement (draft-cfb-ippm-spinbit-measurements)

Thanks for sharing this.  From my perspective, this is an improvement upon the previous proposal in terms of robustness.  This design feels very similar to the spin bit, and seems trivial to implement, which are also nice properties.

Also, QUIC doesn't really have retransmissions, so the 'R bit' name always made me a bit uncomfortable.

I'm not saying QUIC should adopt this yet, but I'd be interested in seeing a privacy analysis completed for it.

Ian

On Wed, Apr 8, 2020 at 4:25 AM Cociglio Mauro <mauro.cociglio=40telecomitalia.it@dmarc.ietf.org> wrote:
Dear QUIC WG Members.

We submitted to the IPPM WG a draft describing a new 2-bit packet loss measurement methodology (https://urldefense.proofpoint.com/v2/url?u=https-3A__tools.ietf.org_html_draft-2Dcfb-2Dippm-2Dspinbit-2Dmeasurements-2D01&d=DwMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=Djn3bQ5uNJDPM_2skfL3rW1tzcIxyjUZdn_m55KPmlo&m=e_BQ3UDNSWkXu_ID1rkGv3CCZs89sezu-6Ibkvltf7Y&s=UeTW_fLvxHy_ewVvm0y6kIqZoXUR8PYWgxAOxmQScYw&e=).
Packet loss measurement is under discussion in the QUIC WG, and our draft introduces two alternatives for it.
The first is a spin-bit-dependent signal that uses a single bit. The second, described in the slides linked below, is a standalone solution based on a two-bit loss signal and on alternate marking (RFC8321).
This second methodology improves, in our opinion, on the algorithm proposed by Orange and Akamai described in https://urldefense.proofpoint.com/v2/url?u=https-3A__tools.ietf.org_html_draft-2Dferrieuxhamchaoui-2Dquic-2Dlossbits-2D03&d=DwMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=Djn3bQ5uNJDPM_2skfL3rW1tzcIxyjUZdn_m55KPmlo&m=e_BQ3UDNSWkXu_ID1rkGv3CCZs89sezu-6Ibkvltf7Y&s=V649Uqu8mfuBWvMRBTUkpEWfpEqlLfufGs67XXzvNFo&e=.

The 2 bits are the sQuare bit (Q-bit) and the Reflection square bit (R-bit).
The Q-bit is unchanged from the Ferrieux-Hamchaoui draft, but the R-bit replaces the L-bit.
This avoids the L-bit's dependence on an internal protocol variable, a problem raised at the last QUIC interim meeting.
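
[Illustration added in editing, not part of the original message.] As a rough sketch of the square-signal idea behind the Q-bit, and not the algorithm text of either draft: assume the sender flips the bit every N packets (N=64 here, an assumed value), and an on-path observer counts the length of each run of equal bits and treats any shortfall from N as upstream loss. The function names are hypothetical, and the sketch ignores reordering and the case where an entire run is lost.

```python
# Illustrative sketch of a square-wave loss signal (alternate-marking style).
# Names and the period value are assumptions, not taken from the drafts.
from itertools import groupby

Q_PERIOD = 64  # assumed number of packets per half-period of the square signal


def q_bits(num_packets, period=Q_PERIOD):
    """Q-bit value the sender would set on each outgoing packet."""
    return [(i // period) % 2 for i in range(num_packets)]


def estimate_upstream_loss(observed_bits, period=Q_PERIOD):
    """Sum the shortfall of each completed run against the nominal period.

    The final run is skipped because it may still be in progress.
    Simplification: reordering and fully lost runs are not handled.
    """
    runs = [len(list(group)) for _, group in groupby(observed_bits)]
    return sum(max(0, period - n) for n in runs[:-1])


sent = q_bits(256)
dropped = {5, 70, 71}  # packet indices lost before the observer
observed = [bit for i, bit in enumerate(sent) if i not in dropped]
print(estimate_upstream_loss(observed))
```

With one packet dropped from the first run and two from the second, the observer sees runs of 63 and 62 packets and infers three upstream losses.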

You can find the slides we prepared for the IPPM interim meeting at the following link: https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ietf-2Dippm_meeting-2Dmaterials_blob_aa486f7ead28b9f55c5ef499e5c9ed33ab06daf9_ietf107-2Dvirtual_Slides_07-2Dspinbit-2Dmeasurements.pdf&d=DwMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=Djn3bQ5uNJDPM_2skfL3rW1tzcIxyjUZdn_m55KPmlo&m=e_BQ3UDNSWkXu_ID1rkGv3CCZs89sezu-6Ibkvltf7Y&s=3l6t3FYQn0iz_xX0sxNFKUnh7bwYFMOpQ0z_zNhXHpg&e=

Comments and suggestions are always welcome.

Of course, we are available to present our proposals at the next QUIC WG meeting or to arrange a Webex side meeting if needed.

Best regards.

Mauro, Fabio, Giuseppe, Massimo and Riccardo


_____________________
Mauro Cociglio
TIM - CT.TA.EI
Via G. Reiss Romoli, 274
10148 - Torino (Italy)
Tel.: +39 011 228 5028
Mobile: +39 335 766 9751
_____________________

