Re: [TERNLI] Notes from tonight's ad hoc

Bob Briscoe <rbriscoe@jungle.bt.co.uk> Thu, 03 August 2006 04:47 UTC

Message-Id: <5.2.1.1.2.20060802212808.018d68a0@pop3.jungle.bt.co.uk>
X-Sender: rbriscoe@pop3.jungle.bt.co.uk
X-Mailer: QUALCOMM Windows Eudora Version 5.2.1
Date: Wed, 02 Aug 2006 21:29:14 +0100
To: Aaron Falk <falk@ISI.EDU>
From: Bob Briscoe <rbriscoe@jungle.bt.co.uk>
Subject: Re: [TERNLI] Notes from tonight's ad hoc
In-Reply-To: <408642C4-8B8E-4766-A42B-B90D8E508E38@ISI.EDU>
References: <44D0B465.7060902@isi.edu> <20060728125608.670AF444391@lawyers.icir.org> <94176DAD-C2FC-4018-8A15-005606394435@isi.edu> <44D0B465.7060902@isi.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Cc: ternli@ietf.org
X-BeenThere: ternli@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Transport-Enhancing Refinements to the Network Layer Interface <ternli.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/ternli>, <mailto:ternli-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www1.ietf.org/pipermail/ternli>
List-Post: <mailto:ternli@ietf.org>
List-Help: <mailto:ternli-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/ternli>, <mailto:ternli-request@ietf.org?subject=subscribe>
Errors-To: ternli-bounces@ietf.org

Aaron,

At 16:12 02/08/2006, Aaron Falk wrote:
>I agree with you so I must not be making myself clear.  What I mean
>is to select those characteristics of the path which are possibly/ 
>generally of interest to transport.  It seems to me as though some of
>those characteristics will be more clearly visible to the network
>(e.g., path), some to the link (or subnetwork, really) (e.g., loss
>rate).
>
>I'm not asserting I have an answer here.  Just probing the problem.

I prefer to talk specifics (but excuse me if I've missed some shared 
context - I missed the ad hoc BoF, but I've tried to pick it up from the 
archive)...

We certainly know what info the *congestion control* part of the transport 
needs: congestion and delay. So for that we merely need to signal the 
information the transport needs in packets. And if either of these values 
changes, it's obvious it's changed. Surely the transport doesn't then need 
a "something's changed" message. Surely the problem is that the network 
layer doesn't signal congestion information with enough precision for the 
modern transport's needs.

My personal preference is to keep the DoS-resistant model where lower layer 
info is piggy-backed on packets to the receiver, then the receiver deals 
with feedback.

So we would need the following changes to protocols:
* a multi-bit network layer congestion field within the forwarded packet.
* the addition of multi-bit congestion feedback in transport protocols.

A drastic change in load at any router would immediately be visible as a 
change in this higher-precision congestion field (if it's a re-route but 
the new path has the same congestion, the transport doesn't need to know 
anything has changed). A multi-bit congestion field would also quickly 
bootstrap a connection's knowledge of path congestion in the feedback from 
the very first packet.

I'm not just choosing congestion as the signal because we already have it. 
There is a good argument for why it is right (and other metrics are wrong).

Congestion is a probability p in [0,1] which represents the proportion of 
the load that is unlikely to be served. IMHO, signalling p with more 
precision is *better* than signalling explicit rate information (as QS/XCP 
do). p gives as much information as signalling explicit rate information, 
without the router having to decide who gets what.

For instance, the variable p is the internal variable in the RED algorithm 
that drives the random dice throw to decide whether to {drop | mark ECN} or 
not. I'm saying routers should write this number into the multi-bit 
congestion field of each packet (accumulating it from multiple routers is 
described later).
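
As a rough sketch of what that could look like: the first function below is 
the textbook RED ramp between the two queue thresholds (it leaves out RED's 
count-based correction), and the second shows one possible way to quantise p 
into a multi-bit field. The 8-bit width, the linear encoding and both 
function names are assumptions for illustration only, not part of any 
proposal.

```python
def red_probability(avg_queue, min_th, max_th, max_p):
    """Basic RED ramp: 0 below min_th, linear up to max_p, 1 at or above max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def encode_field(p, bits=8):
    """Hypothetical linear quantisation of p in [0,1] into a `bits`-wide field."""
    levels = (1 << bits) - 1
    return round(p * levels)


# Made-up operating point: average queue halfway between the thresholds.
p = red_probability(avg_queue=10, min_th=5, max_th=15, max_p=0.1)
field = encode_field(p)
```

A linear encoding wastes most of its code points on high values of p that 
should be rare; a real design would probably want finer resolution near zero.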

From p, the source can work out the excess rate it is sending like this: 
if the local congestion on an interface is p, incoming load is Y and 
outgoing capacity is X, then p ~ (Y-X)/Y when Y > X. So if every flow 
getting signal p reduced its rate by a fraction p, they would all fit 
through the available capacity. Of course, sources actually do some form of 
AIMD to allow for new flows etc, but the algorithm is essentially designed 
to converge on a rate that is a fraction p lower than the one being used. 
Essentially, a signal of p can be reverse engineered to get a rate signal.
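
A small numeric sketch of that relation, with made-up flow rates and 
capacity (the function name and the figures are illustrative only, and the 
AIMD dynamics are ignored):

```python
def congestion_level(load, capacity):
    """Fraction of the offered load that cannot be served: p ~ (Y - X) / Y."""
    if load <= capacity:
        return 0.0
    return (load - capacity) / load


# Hypothetical figures: three flows share a 100 Mb/s interface.
rates = [80.0, 40.0, 5.0]   # Mb/s per flow, so Y = 125
capacity = 100.0            # X

p = congestion_level(sum(rates), capacity)

# Every flow backs off by the same fraction p; the router keeps no
# per-flow state and never decides who gets what.
reduced = [r * (1 - p) for r in rates]
```

Here p comes out as 0.2, and the reduced rates sum to the capacity, with each 
flow keeping the same relative share it had before.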

But the important difference is that p is normalised, so the reduction it 
asks for is proportionate to the bit rate sent in each flow, yet the router 
doesn't have to understand flows. Basically, the router doesn't have to dish 
out bandwidth allocations, which saves it from having to understand who it 
is dishing them out to.

If instead the router dishes out rate to flows (like XCP/QS), sources can 
cheat by splitting one flow into multiple flows. The idea that fairness 
means that all flows should get equal rate is completely daft and should 
never have taken hold - it's just soooo vulnerable to cheating by splitting 
flow IDs.
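
A toy comparison of the two models, under heavy simplification: the user 
names, the rates and both function names are invented, and the "signal p" 
case just applies one proportional back-off rather than real AIMD.

```python
def equal_per_flow(capacity, flows_per_user):
    """Router gives every flow ID the same rate; a user gets one share per flow."""
    share = capacity / sum(flows_per_user.values())
    return {user: n * share for user, n in flows_per_user.items()}


def after_signal_p(rates_per_user, capacity):
    """Every flow backs off by the same fraction p ~ (Y - X) / Y."""
    load = sum(sum(rates) for rates in rates_per_user.values())
    p = max(0.0, (load - capacity) / load)
    return {user: sum(r * (1 - p) for r in rates)
            for user, rates in rates_per_user.items()}


# Per-flow allocation: bob quadruples his share by using four flow IDs.
honest = equal_per_flow(100.0, {"alice": 1, "bob": 1})
split = equal_per_flow(100.0, {"alice": 1, "bob": 4})

# Normalised signal: splitting the same 60 Mb/s four ways gains bob nothing.
one_flow = after_signal_p({"alice": [60.0], "bob": [60.0]}, 100.0)
four_flows = after_signal_p({"alice": [60.0], "bob": [15.0] * 4}, 100.0)
```

In the per-flow case bob goes from 50 to 80 out of 100 by splitting; with 
the normalised signal his total stays at 50 however he slices it.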

The other strength of using p is that it can be combined properly when 
there are multiple bottlenecks. The accumulation algorithm to update the 
header field h as it passes through a router with local congestion p should 
use combinatorial probability:
         h  <-    1 - (1-h)(1-p)
So, for instance, if there are two bottlenecks, with p of 5% and 1%, the 
header would end up carrying h = 5.95% (= 100% - 95%*99%).
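
The same update written out as a sketch (the function names are mine, and 
starting the header at h = 0 at the sender is an assumption):

```python
def accumulate(h, p):
    """One router's update of the header field: h <- 1 - (1-h)(1-p)."""
    return 1 - (1 - h) * (1 - p)


def path_congestion(local_ps):
    """Header value after the packet has crossed every router in order."""
    h = 0.0  # assumed initial value at the sender
    for p in local_ps:
        h = accumulate(h, p)
    return h


h = path_congestion([0.05, 0.01])
print(f"{h:.2%}")  # prints 5.95%
```

Because the update is just a product of (1-p) terms, the result doesn't 
depend on the order in which the bottlenecks are met.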

This way, the traffic matrix packs into the network most efficiently. It 
stems from the meaning of congestion as a probability. Finding the minimum 
bottleneck (like XCP & QS) isn't necessarily the best approach.

Basically, I'm saying
a) Instead of signalling "something's changed", I believe it is more 
practical to continuously signal the "something".
b) What congestion control needs most is greater precision for the 
congestion signal
c) There are strong but subtle reasons for using probability as a universal 
congestion signal that lead to overall simplicity when the whole picture is 
taken into account (which we lose if we adopt QS or XCP).


Bob



____________________________________________________________________________
Bob Briscoe, <bob.briscoe@bt.com>      Networks Research Centre, BT Research
B54/77 Adastral Park,Martlesham Heath,Ipswich,IP5 3RE,UK.    +44 1473 645196