Re: [tsvwg] ECN encapsulation draft - proposed resolution

Jonathan Morton <chromatix99@gmail.com> Fri, 04 June 2021 01:03 UTC

From: Jonathan Morton <chromatix99@gmail.com>
In-Reply-To: <alpine.DEB.2.21.2106021717300.4214@hp8x-60.cs.helsinki.fi>
Date: Fri, 04 Jun 2021 04:02:55 +0300
Cc: David Black <David.Black@dell.com>, Donald Eastlake <d3e3e3@gmail.com>, John Kaippallimalil <kjohn@futurewei.com>, "tsvwg@ietf.org" <tsvwg@ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <BE497F82-5452-41A1-943F-7ABD0048C7F9@gmail.com>
References: <MN2PR19MB40454BC50161943BC33AAAD783289@MN2PR19MB4045.namprd19.prod.outlook.com> <43e89761-d168-1eca-20ce-86aa574bd17a@bobbriscoe.net> <de8d355d-08b6-34fb-a6cc-56755c9a11ee@bobbriscoe.net> <MN2PR19MB4045DB9D2C45066AEB0762DB83259@MN2PR19MB4045.namprd19.prod.outlook.com> <alpine.DEB.2.21.2106021717300.4214@hp8x-60.cs.helsinki.fi>
To: Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>
X-Mailer: Apple Mail (2.3445.9.7)
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/QkCyzwUSKwgcgELelHLoFd2ZyG8>

> On 3 Jun, 2021, at 2:41 pm, Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org> wrote:
> 
> I'm afraid there is an open/ongoing discussion on this where we have not reached consensus.

I think I agree with your analysis that preserving the number of bytes marked is probably the wrong thing to do.  The implementation complexity argument is fairly compelling in itself, particularly noting that there isn't any known "running code" which would place some upper bound on what the complexity is.

By contrast the RFC-3168 approach of "if any part of the packet is marked, all of it is marked" is very simple to implement, and this is proved by actual implementations in the wild.  This approach is also consistent with the behaviour when ECN is not in use, since if any fragment of a packet is lost, the packet can't be reassembled and is discarded in whole.

I'd like to try and back this position up with some mathematical rigour from a different direction.

One thing I noticed about Bob's position is that he's trying to preserve the number of bytes marked, but usually the transport is actually sensitive to the number of bytes, or possibly the interval of time, *between* marks.  These are two (or three) very different quantities, and it's important not to get them confused in the debate.

In engineering, a very useful tool for checking whether an answer is correct is "dimensional analysis".  Simply put, work out the units that your inputs and outputs are most naturally expressed as, and check that the formulae linking them are consistent with respect to the units.  In this case, we can look at the units of the AQM's marking function, the units of the transport's congestion control steady state criterion, and whether each approach to resolving ECN marks at reassembly preserves the quality of the marking with respect to those units.

For the purpose of this analysis, we are looking for the quantity that is most nearly preserved if we hold the AQM state constant or the transport in steady state, but vary packet size and/or link bandwidth arbitrarily.

Starting with the AQMs:

1: Codel is a time-domain AQM; its marking function is explicitly a frequency, expressed for performance reasons as a time interval between marks.  For analysis, we can equivalently express this quantity in units of marks/sec or its reciprocal, seconds per mark.  If we change the bandwidth or the packet size while holding Codel's control law constant, the number of bytes in marked packets and/or the number of packets between marks will change, but the time between marks will stay constant.

2: A packet-domain marking function (as used in BLUE and many RED-family AQMs) maintains a probability that any given packet will be marked.  Amortised over a large interval, we can express this as a (fractional) number of marks per packet, or equivalently as packets between marks. If the packet rate increases, so will the number of marks per time interval; if the packet size decreases, so will the number of bytes between marks.

3: A byte-domain marking function, as Markku described for early RED work, is obtained by taking a nominally packet-mode probability function, but multiplying the probability by bytes/packet (and a constant scale factor).  The resulting unit is (fractional) marks per byte, or bytes between marks.
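To make the units of the three marking functions concrete, here is a rough Python sketch. It assumes a fully saturated link carrying fixed-size packets; the function name `mark_spacing` and its parameters are illustrative inventions, not taken from any AQM implementation. For each domain it converts the control-law value into seconds, packets and bytes between marks, so one can see which quantity stays fixed when packet size or bandwidth changes.

```python
def mark_spacing(domain, law, link_bps, pkt_bytes):
    """Return (seconds, packets, bytes) between marks on a saturated link.

    domain == "time":   law is seconds between marks (Codel-like)
    domain == "packet": law is marks per packet (BLUE / packet-mode RED-like)
    domain == "byte":   law is marks per byte (byte-mode RED-like)
    All names here are hypothetical; this only illustrates the units.
    """
    byte_rate = link_bps / 8.0  # bytes per second actually carried
    if domain == "time":
        seconds = law
        nbytes = seconds * byte_rate
    elif domain == "packet":
        nbytes = pkt_bytes / law
        seconds = nbytes / byte_rate
    elif domain == "byte":
        nbytes = 1.0 / law
        seconds = nbytes / byte_rate
    else:
        raise ValueError("unknown domain: %r" % (domain,))
    return seconds, nbytes / pkt_bytes, nbytes


if __name__ == "__main__":
    # Same control-law value, two packet sizes, same 10 Mbit/s link.
    for domain, law in (("time", 0.01), ("packet", 0.01), ("byte", 1e-5)):
        for size in (1500, 500):
            print(domain, size, mark_spacing(domain, law, 10e6, size))
```

Running it with two packet sizes shows the time-domain law holding seconds constant, the packet-domain law holding packets constant, and the byte-domain law holding bytes constant, while the other two columns move.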

Now the transports:

1: DCTCP's steady state is two marks per RTT.  Since RTT has units of time, the steady-state criterion has units of marks/sec.  This is precisely the same as Codel's marking function.

2: Reno's steady state is BDP/2 RTTs per mark, where BDP has units of bytes (the two time terms in the formula cancel).  The steady-state criterion thus has units of byte-seconds per mark.  This does not exactly match any of the AQMs' control laws, but it seems likely that "seconds per mark" or "bytes between marks" would be a better match than "packets between marks".  Regardless, for constant performance with Reno, we would look for methods that preserve the byte-seconds per mark value.

3: CUBIC's steady state, when not in the Reno compatibility regime, is quoted in RFC-8312 as (BDP/MSS) = 1.054 * RTT^0.75 / p^0.75, where p is the (fractional) number of marks per segment, and the quantity BDP/MSS has units of segments.  But each segment, as seen by the transport, is MSS bytes long, and it is ignorant of any fragmentation that might happen at Layer 3 or 2 along the way - so if that occurs, "segments" is not the same as "packets" as seen by the AQM.  So we can convert p into a byte quantity by dividing it by MSS, yielding (BDP/MSS) = 1.054 * (RTT*MSS / p)^0.75. This reveals that for a constant cwnd, a constant byte-seconds per mark value is required, similar to Reno, and we can probably expect this to be true for other "TCP friendly" congestion controls as well.
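As a quick numerical check of the transport criteria above, here is a small Python sketch. The function names are made up for illustration. The CUBIC constant is computed from the general form in RFC 8312 with its default C = 0.4 and beta_cubic = 0.7; the DCTCP figure of two marks per RTT is simply the one quoted above, not derived here.

```python
def cubic_constant(c=0.4, beta=0.7):
    """Leading constant of the RFC 8312 average-window formula."""
    return (c * (3.0 + beta) / (4.0 * (1.0 - beta))) ** 0.25


def cubic_avg_window_segments(rtt_s, p, c=0.4, beta=0.7):
    """Average CUBIC window in segments; p is marks per segment."""
    return cubic_constant(c, beta) * (rtt_s / p) ** 0.75


def byte_seconds_per_mark(rtt_s, mss_bytes, p):
    """RTT times bytes between marks, where bytes between marks = MSS / p."""
    return rtt_s * mss_bytes / p


def dctcp_marks_per_second(rtt_s, marks_per_rtt=2.0):
    """Steady-state mark frequency, taking 'two marks per RTT' as given."""
    return marks_per_rtt / rtt_s


if __name__ == "__main__":
    print(round(cubic_constant(), 3))
    # Doubling both RTT and p leaves RTT/p, and hence the window, unchanged.
    print(cubic_avg_window_segments(0.1, 0.001))
    print(cubic_avg_window_segments(0.2, 0.002))
    print(byte_seconds_per_mark(0.1, 1460, 0.001))
    print(dctcp_marks_per_second(0.05))
```

The point of the second and third prints is that the window in segments depends only on RTT/p, which is segment-seconds per mark; multiplying by MSS turns that into byte-seconds per mark.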

So now we can turn to the methods of resolving congestion marks at reassembly.  I'll examine three candidate rules that seem relevant to the debate, though others probably exist:

1: "If any fragment of a packet is marked, the whole packet is marked."  This is the RFC-3168 rule, generalised to also cover the weird Layer 2 fragmentation scheme that sparked this debate.  It is clear that this rule preserves the number of bytes between marks and the time between marks (and therefore also the byte-seconds between marks), but not the number of packets between marks - except for the specific case where a fragment contains portions of more than one packet.  If these packets all belong to the same flow, then this is harmless to Reno and CUBIC since consecutive ECN marks are reported to the sender as a single event.  If multiple flows are involved, then this will stochastically amplify the congestion signal by having more than one flow react to a single mark - which seems tolerable in practice.

2: "Every mark is sacred, every mark is great."  The AQM produces one mark, and exactly one reassembled packet receives that mark; the reassembly process can make a reasonable choice as to which one.  This preserves bytes between marks, time between marks, and byte-seconds between marks *exactly*.

3: "The number of bytes marked should be preserved by reassembly."  This would tend to preserve the number of packets between marks better than the other two options, but that is not useful to any of the transports considered.  The bytes between marks and time between marks metrics would be distorted in proportion to the difference between average fragmented and reassembled packet sizes, and the byte-seconds between marks measure would be distorted *quadratically* by that difference.
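The three rules can be compared with a toy Python model. This is only a sketch under simplifying assumptions: all fragments are the same size, exactly k consecutive fragments reassemble into one packet, and no fragment spans two packets. The rule names and the carry-over bookkeeping for rules 2 and 3 are my own hypothetical interpretation, not taken from any draft or implementation.

```python
def reassemble(frag_marks, k, rule):
    """Map per-fragment marks to per-packet marks (k fragments per packet).

    rule == "any":            packet marked if any fragment is marked (rule 1)
    rule == "one_per_mark":   each fragment mark marks exactly one packet (rule 2)
    rule == "preserve_bytes": mark a packet once a packet's worth of marked
                              fragment bytes has accumulated (rule 3)
    Trailing fragments that do not fill a whole packet are ignored.
    """
    packets = []
    carry = 0  # marks (rule 2) or marked fragments (rule 3) not yet delivered
    for i in range(0, len(frag_marks) - k + 1, k):
        n = sum(frag_marks[i:i + k])
        if rule == "any":
            marked = n > 0
        elif rule == "one_per_mark":
            carry += n
            marked = carry >= 1
            if marked:
                carry -= 1
        elif rule == "preserve_bytes":
            carry += n
            marked = carry >= k
            if marked:
                carry -= k
        else:
            raise ValueError("unknown rule: %r" % (rule,))
        packets.append(marked)
    return packets


def bytes_between_marks(marks, unit_bytes):
    """Average bytes carried per mark over the whole sequence."""
    return len(marks) * unit_bytes / sum(marks)


if __name__ == "__main__":
    frag_bytes, k = 375, 4
    frags = [i % 10 == 0 for i in range(1200)]  # AQM marks every 10th fragment
    print("fragments", bytes_between_marks(frags, frag_bytes))
    for rule in ("any", "one_per_mark", "preserve_bytes"):
        pkts = reassemble(frags, k, rule)
        print(rule, bytes_between_marks(pkts, frag_bytes * k))
```

In this toy case the first two rules leave the average bytes between marks as the AQM produced it, while the byte-preserving rule stretches it by the factor k, the ratio of reassembled to fragment size.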

In summary, the "preserve the number of marked bytes" rule is mathematically unreasonable, as well as being much more complicated to implement than either the existing RFC-3168 rule or the "sacred mark" rule, both of which actually preserve the quantities that need to be preserved.

 - Jonathan Morton