Re: [tsvwg] Slides to support discussion of draft-ietf-tsvwg-rfc6040shim-update

Bob Briscoe <ietf@bobbriscoe.net> Wed, 08 April 2020 23:04 UTC

To: Jonathan Morton <chromatix99@gmail.com>
Cc: tsvwg IETF list <tsvwg@ietf.org>
References: <5f207793-e26a-30eb-3243-f2d9bb5d4430@bobbriscoe.net> <F1A42D3C-6D5D-42F1-944E-412D8015B024@gmail.com>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <8641d39b-9a35-ad88-6e99-7b723ae6b55b@bobbriscoe.net>
Date: Thu, 9 Apr 2020 00:04:40 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/PJHlLEjLLFszZMpjPO-ZNOorNtY>

Jonathan,

On 08/04/2020 17:30, Jonathan Morton wrote:
>> On 8 Apr, 2020, at 4:54 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>
>> I've produced 4 slides that might be useful to support the fragment reassembly discussion on draft-ietf-tsvwg-rfc6040shim-update
>> http://bobbriscoe.net/presents/2004ietf/2004rfc6040update-shim.pdf
> I'd like to get out ahead of the main discussion over the fragment reassembly semantics, because these slides contained some analysis of that problem which I believe to be erroneous.  I was going to mention this during the meeting, but David Black rightly pushed this topic off to mailing list discussion.
>
> It's also worth remembering that this discussion is technically off-topic for rfc6040shim-update, since the latter is supposed to be dropping fragment reassembly semantics as out of scope.  But I think I should present an alternative analysis now, so that we can converge on the truth more quickly.
>
> The analysis as presented appears to proceed along the following lines:
>
> 1: CE marking is modelled as a uniform, steady-state probability over all packets, regardless of packet size or AQM type.
>
> 2: Traffic sources utilising different packet sizes are contrasted as to fragmentation behaviour when encountering a tunnel, and their packet rate for constant throughput, both on the tunnel path and after reassembly.  Specifically, 1500 byte packets are fragmented into full and runt packets, 1480 byte packets are not fragmented and have the same origin packet rate, and 750 byte packets are not fragmented but have twice the packet rate.
>
> 3: There is a well-known equation relating Reno average cwnd to marking probability, and another relating segment size, cwnd (in segments) and RTT to flow throughput.  This is applied to find the relative behaviour of the above traffic sources when experiencing the same, uniform marking probability, somewhere on the tunnel path.
>
> 4: Another scenario involving an FQ-AQM is presented, in which the throughput of each flow is equalised and the marking rate and probability calculated from that.  It is pointed out, unsurprisingly, that these end up being different for each flow.
>
> 5: The conclusion is explicitly drawn that RFC-3168's existing rule for preserving CE marks on fragment reassembly "is broken".  Implied is an assertion that it needs to be materially changed.
>
> Steps 2, 3, and 4 appear (at first glance) to be sound.  However, the modelling assumption in step 1 is flawed, and I think this torpedoes the conclusion.
>
> The central assumption is that all AQMs mark packets with uniform probability, including those which calculate a timebase marking schedule (eg. Codel) instead of a marking probability (eg. RED).  The analysis as presented is valid only for the latter case.
>
> When timebase marking is in use, the relative sizes of the packets being considered for marking become relevant.  A runt fragment of 40 bytes occupies the head position in the queue (where Codel does its marking) for a different length of time than the full 1500-byte fragments either side of it, and this strongly influences the relative probability of it being marked.  Likewise a stream of 750-byte packets will each spend only half the time at the queue head as an otherwise similar stream of 1500-byte packets, so their probability of marking is halved in the same timebase schedule.
>
> Therefore, it is incorrect to convert a timebase marking rate to a uniform marking probability, when packets of significantly different sizes are involved.  A different analysis must therefore be run to establish the effect of a shared timebase AQM on the tunnel path.

I had looked at the code to check this. When a timer fires, it's the 
/next/ packet to reach the head that gets marked, not the one being 
dequeued when the timer fires. Usually two fragments will get onto the 
wire back to back, with the runt behind. Then, what happens depends on 
whether it's FQ or a shared queue, and of course on the specifics of 
implementations, but in general:

* If FQ, the likelihood of marking a runt vs a full-size fragment will 
be roughly in inverse proportion to their sizes, because the timer most 
often fires while the larger fragment is being dequeued (e.g. 40/1540 
for the larger packets and 1500/1540 for the runts). Then RFC3168 
reassembly will still roughly double the marking probability for 
fragmented flows vs non-fragmented.

* If shared queue, the runts are more likely to be marked than any other 
packet, because they will generally follow their larger sister fragment. 
The relative likelihood of marking of the rest of the packets will 
depend on how much of the packet mix consists of fragments. If there's 
a decent amount of non-fragments, the marking probabilities for the rest 
of the packets will be at least 'fairly equal' to each other.
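
As a rough illustration of the FQ case above, the sketch below assumes 
the timer fires at a uniformly random instant while a repeating 
back-to-back pattern of packets is being dequeued, and that the next 
packet to reach the head is the one marked. The function name and the 
1500/40 byte pattern are illustrative assumptions, not taken from any 
AQM implementation.

```python
from fractions import Fraction


def next_at_head_mark_shares(sizes):
    """Share of marks taken by each position in a repeating packet pattern.

    Assumptions (illustrative only):
    - the timer fires at a uniformly random instant, so it lands within a
      packet's dequeue time with probability proportional to that packet's size;
    - the packet that reaches the head *after* the one being dequeued is marked.
    """
    total = sum(sizes)
    n = len(sizes)
    # Position i is marked whenever the timer fires during position i - 1.
    return [Fraction(sizes[(i - 1) % n], total) for i in range(n)]


# One fragmented packet per cycle: a full-size fragment followed by its runt.
shares = next_at_head_mark_shares([1500, 40])
print("full-size fragment share of marks:", shares[0])
print("runt fragment share of marks:     ", shares[1])
```

Under these assumptions the full-size fragment takes 40/1540 of the 
marks and the runt takes 1500/1540, matching the figures in the FQ 
bullet.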

.....
However, this is all beside the point. This just argues that the effects 
I have laid out will not be quite as pronounced as the simple 
approximations I have used. But the effects will still be there.
* None of what you've said argues /for/ preserving the time of each 
marking during reassembly.
* None of what you've said argues /against/ preserving probability 
rather than timing.

With shared buffers:
* Preserving probability will remove the unfairness effects.
* Preserving timing will tend to have some degree of unfairness effects.
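
To put rough numbers on the shared-buffer case, the sketch below uses 
the simple approximation of a uniform per-packet marking probability p 
on the tunnel path. It compares a rule where CE on any fragment is 
carried onto the reassembled packet with a hypothetical rule that marks 
reassembled packets in proportion to the fraction of their fragments 
that were marked. Both function names and the second rule are 
assumptions for illustration, not text from RFC 3168 or the draft.

```python
def p_any_fragment_ce(p, n_frags):
    """P(reassembled packet is CE) when CE on any fragment carries over.

    Assumes each fragment is marked independently with probability p.
    """
    return 1 - (1 - p) ** n_frags


def p_fraction_preserving(p, n_frags):
    """Expected CE rate under a hypothetical probability-preserving rule.

    The rule marks reassembled packets in proportion to the fraction of
    their fragments that arrived marked. The expected number of marked
    fragments is n_frags * p, so the expected fraction is p.
    """
    return (n_frags * p) / n_frags


p = 0.01  # illustrative marking probability on the tunnel path
for n_frags in (1, 2):
    any_ce = p_any_fragment_ce(p, n_frags)
    preserved = p_fraction_preserving(p, n_frags)
    print(f"{n_frags} fragment(s): any-CE {any_ce:.4f}, "
          f"fraction-preserving {preserved:.4f}")
```

With two fragments per packet, the any-fragment rule gives a marking 
probability of 1 - (1 - p)^2, which is close to 2p for small p, whereas 
the proportional rule leaves it at p.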

With per-flow scheduling:
* There are (obviously) no unfairness effects anyway.
* The reason I put up the FQ example wasn't fairness, it was to 
illustrate that there is nothing special about the particular time 
between one mark and the next - because flows B&C with identical average 
packet sizes and packet rates converge on different times between marks.

Do you have an argument /for/ preserving the time when individual marks 
occur, given it causes unfairness in shared queue AQMs (even if not to 
the degree my simple calculations have predicted)?

If you haven't got such an argument, it would be helpful if you could 
say so. Then we can move on.

>
> Another implicit assumption is that AQMs apply a static marking rate (in whichever paradigm they choose) over a timescale of many seconds.  In fact, they typically react to changes in queue depth over timescales of milliseconds.  This has the effect of concentrating marking events on the flows with highest throughput at the moment congestion occurs, at the peak of each flow's sawtooth.  This is a further factor which may invalidate the assumption of uniform marking probability, even for probabilistic AQMs.

During a dynamic episode of higher congestion, the fragmented packets 
will still end up with a higher incidence of marking than if they had 
been identical but non-fragmented packets.

You're not arguing for time-preserving reassembly any more. You're 
picking on second order issues with my approximations. Please try to 
stay focused on the question at hand.

>
> I hope the above will help to inform the main discussion about RFC-3168 semantics, when that is taken up.

That argument is happening here and now. There's no point writing a 
draft for a new reassembly RFC, if my arguments are flawed.


Bob

>
>   - Jonathan Morton
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/