Re: [conex] byte vs packet counting

Bob Briscoe <bob.briscoe@bt.com> Tue, 13 December 2011 20:12 UTC

Date: Tue, 13 Dec 2011 20:12:50 +0000
To: Toby Moncaster <toby.moncaster@cl.cam.ac.uk>
From: Bob Briscoe <bob.briscoe@bt.com>
Cc: "T. Moncaster" <tm444@cam.ac.uk>, ConEx IETF list <conex@ietf.org>

Toby,

Catching up after being away for a week...
Inline, and I've snipped everything I agree with...

At 09:50 06/12/2011, Toby Moncaster wrote:
>On 5 Dec 2011, at 23:01, John Leslie wrote:
> > T. Moncaster <tm444@cam.ac.uk> wrote:

[snip]

> >>> Auditing every packet is plausible near the sender and receiver,
> >>> but not in the backbone. I don't believe we have agreement among
> >>> ourselves what a receiver's uplink will do if their auditing shows
> >>> "abuse" of ConEx marking: so I don't see what auditing at the
> >>> sender's uplink can do and be confident it's the right thing.
> >>
> >> I agree, it seems hard to see how auditing something when you are
> >> missing half the information is going to be hard.
> >
> >   (I'm having trouble parsing that sentence.)
>
>I think we are hitting the issue of "what is 
>audit?" If audit is taken to mean accounting for 
>ConEx marks, then that can happen anywhere. 
>However I was taking auditing to mean accounting 
>for the balance of ConEx marks with actual 
>congestion. That function can only work 
>somewhere where you can see both the congestion 
>and the ConEx marks. The only way you can 
>achieve that at the sender side is by inspecting 
>the ACKs on the return path (which may be impossible).

[BB]: I want to take issue with 'the only way'. 
There's at least one other very plausible 
scenario. Assume by design there's one major 
bottleneck on the upstream path from a sender 
(e.g. a radio access network). Then a network 
node at the congested point can see all the ConEx 
and most of the loss (or ECN). It cannot know 
what 'most' means, but bear with me...

Imagine a sender is trying to cheat. This can 
even be a really knowledgeable sender - a 
disgruntled ex-employee of the network operator 
who knows that the audit function is badly placed 
- not near the receiver, but near the sender.

Even if there might sometimes be another 
bottleneck nearer the receiver on the path, when 
this dishonest sender sees congestion feedback 
(from loss or ECN), it cannot know which 
bottleneck each congestion signal was from.

So if this sender is trying to cheat, it will 
sometimes understate ConEx markings with respect 
to the first bottleneck alone. The audit at that 
first bottleneck will detect persistent cheating, 
so it knows the sender is suspect. An honest 
sender might accidentally get ConEx marks wrong 
occasionally, but not persistently badly like this.
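
As a rough illustration of the first-bottleneck audit described above, the sketch below keeps two per-flow counters at the congested node and flags a flow whose ConEx marks fall persistently short of the congestion seen locally. The class name, the `deficit_limit` tolerance and the cumulative-deficit test are assumptions made for the example; nothing here is specified by the ConEx drafts.

```python
from collections import defaultdict


class BottleneckAudit:
    """Hypothetical per-flow audit at one known bottleneck (illustration only)."""

    def __init__(self, deficit_limit=3):
        # Assumed tolerance: how far local congestion may exceed ConEx marks
        # before a flow is treated as suspect.
        self.deficit_limit = deficit_limit
        self.conex_marks = defaultdict(int)
        self.local_congestion = defaultdict(int)

    def on_packet(self, flow_id, conex_marked, congested_here):
        """Record one packet; congested_here means this node dropped or ECN-marked it."""
        if conex_marked:
            self.conex_marks[flow_id] += 1
        if congested_here:
            self.local_congestion[flow_id] += 1

    def deficit(self, flow_id):
        # Positive when the flow has declared less than this node alone has seen.
        return self.local_congestion[flow_id] - self.conex_marks[flow_id]

    def is_suspect(self, flow_id):
        return self.deficit(flow_id) > self.deficit_limit


audit = BottleneckAudit()
for i in range(100):
    congested = (i % 10 == 0)  # the bottleneck signals congestion on every 10th packet
    # The honest sender re-echoes every signal on its next packet.
    audit.on_packet("honest", conex_marked=(i % 10 == 1), congested_here=congested)
    # The cheat echoes only half of them, guessing the rest came from elsewhere.
    audit.on_packet("cheat", conex_marked=(i % 20 == 1), congested_here=congested)
```

An honest sender that also covers a second bottleneck further downstream would show a negative deficit here, so it is never flagged; only under-declaration relative to this node alone accumulates.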


> >> There is a separate accounting mechanism where you can measure bulk
> >> flows of congestion, but that has to be based on a belief in the
> >> accuracy of the marking of the underlying individual flows.
> >
> >   I don't see how that follows.
>
>Measuring the bulk is very easy, but if all the 
>flows making up the bulk are understating their 
>congestion then the bulk will be understated as 
>well. Bob always had in mind a clever mechanism 
>to do spot checking at any border to identify 
>persistent under (or over) declaration, but I 
>never entirely understood how it was meant to work.

[BB]: The bulk auditing idea is primarily 
designed to remove the motivation for networks to 
launch attacks on /each other/ using ConEx. I 
don't expect many networks would even think of 
doing that, but unless you have a way of 
detecting breaches of trust, you cannot rely on 
that trust. You don't necessarily have to use the 
mechanism; other networks just need to know that 
there is a feasible mechanism and you might be using it.

> >>> Unless I misread draft-ietf-conex-concepts-uses, our principal
> >>> use case is Informing Traffic Management. I have posted before that
> >>> the most important feature of ConEx marking (IMHO) is informing
> >>> any node along the way that the sender is knowingly sending packets
> >>> despite having reason to doubt that congestion has cleared.
> >>>
> >>> Auditing does nothing to help here. The abusive thing to do is
> >>> to _clear_ ConEx marks -- and auditing can't detect that.
> >>
> >> No. ConEx is about a balance of markings. To cheat you fail to send
> >> forward into the network sufficient ConEx markings to cover the
> >> congestion your flow is causing (or has caused).
> >
> >   Not quite... an ISP can cheat by clearing marks which one sender
> > placed in his outgoing traffic -- that is what I meant to refer to.
>
>OK, yes. That is a separate requirement for audit.

[BB]: Just to ensure we make clear which sort of 
abuse we're talking about at any one time, I 
detect three different types of abuse being discussed here:
a) The sender continuing to send despite 
expecting that congestion has not cleared
b) The sender understating ConEx markings relative to path congestion
c) An ISP clearing the sender's ConEx markings

a-type behaviour could describe normal TCP/IP or 
UDP. It's only abuse if in excess. The idea of 
ConEx is to give ISPs the info, so ISPs can draw 
the line between normal and abuse, rather than 
the IETF making this judgement call.

(b) and (c) are about validity of ConEx 
information, whereas (a) is about behaviour 
measured by ConEx info. The idea is to prevent 
b-type & c-type behaviour so that ConEx info is 
reliable enough that ISPs can use it to make judgements about (a).

b-type behaviour is intended to be addressed by audit.

c-type behaviour can be addressed by e2e verification of ConEx info.

We're not doing protocol work on (c) in the w-g, 
but the info is in the protocol if we wanted to. 
I can straight-off think of three possible protocols to address this:
- The source could protect the integrity of the 
DSOPT extension header with IPsec AH, so if an 
ISP altered ConEx markings, AH integrity checks 
would detect that something on-path had altered an immutable header.
- We could add an e2e feedback field in TCP so 
that the receiver can feed back the received 
ConEx markings to the sender (re-re-feedback?). 
Then the sender can check that an ISP hasn't changed what it sent.
- The sender could log the packets it marks with 
ConEx, and the receiver could occasionally send 
back a log of the packets it receives with ConEx 
markings for the sender to check (not sure what 
protocol this would be added to).
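
The third option (comparing logs) can be sketched as a set comparison. This is only an illustration: the packet identifiers, the function name and the idea that the receiver reports both "arrived" and "arrived with ConEx" are assumptions, since no such protocol is defined.

```python
def cleared_marks(sent_marked_ids, received_ids, received_marked_ids):
    """Return ids of packets the sender marked with ConEx that reached the
    receiver without the mark (hypothetical log format, illustration only)."""
    arrived_and_was_marked = set(sent_marked_ids) & set(received_ids)
    return arrived_and_was_marked - set(received_marked_ids)


# The sender marked packets 3, 7 and 9; packet 9 was lost in transit.
sent_marked = {3, 7, 9}
received = {1, 2, 3, 4, 5, 7, 8}
received_marked = {3}

suspicious = cleared_marks(sent_marked, received, received_marked)
```

Packets that never arrived are excluded from the comparison, so ordinary loss is not mistaken for an ISP clearing marks.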

> >> So the function of auditing is to check that the two numbers
> >> roughly match.
> >
> >   (I note that Toby says "roughly match". Others seem to be assuming
> > a much stricter match. I'm afraid we don't have a clear understanding
> > here.)
>
>Clearly, given the min 1RTT delay between 
>getting congestion info and responding to it, 
>there can never be a completely accurate match. 
>I did have numerous debates with Bob about this 
>topic. Basically you run the risk of a user just 
>terminating his flow rather than making up his losses.

[BB]: That attack is dealt with. That's why I/we 
decided that the sender has responsibility for 
keeping ConEx info greater than or equal to 
congestion, despite RTT delay. Then if the sender 
runs up a 'loss' and jumps to a new flow, the end 
of its last flow will have been discarded by the 
audit function, so the sender won't have gained by this cheat.
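
A minimal sketch of that rule, assuming a simple per-flow balance of ConEx marks minus congestion events (the class and method names are made up for the example): while the balance is non-negative packets are forwarded, and once the flow is in deficit its remaining packets are discarded.

```python
class FlowAudit:
    """Hypothetical audit state for one flow (illustration only)."""

    def __init__(self):
        self.balance = 0  # ConEx marks seen minus congestion events seen

    def on_congestion(self):
        self.balance -= 1

    def on_packet(self, conex_marked):
        """Return True if the packet is forwarded, False if the audit discards it."""
        if conex_marked:
            self.balance += 1
        return self.balance >= 0


# A sender that declares ahead of the congestion is never discarded.
honest = FlowAudit()
honest_log = [honest.on_packet(True)]
honest.on_congestion()
honest_log += [honest.on_packet(False), honest.on_packet(True)]

# A sender that runs up a 'loss' has the tail of its flow discarded,
# so jumping to a new flow at this point gains it nothing.
cheat = FlowAudit()
cheat_log = [cheat.on_packet(False)]
cheat.on_congestion()
cheat_log += [cheat.on_packet(False), cheat.on_packet(False)]
```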

>You also have to be cautious that the actual 
>process of dropping packets for audit doesn't 
>simply distort the apparent congestion 
>information (because the sender will respond by 
>retransmitting and hopefully adding more ConEx marks)

[BB]: We (in BT) established that's also not a 
problem. The sender ought to match these audit 
losses with the ConEx marks it didn't send in the 
first place. If it doesn't, the audit node can 
detect that, because it knows about the losses it introduced.

> >
> >> And yes, this is a very difficult problem to solve if you can't
> >> assume ECN is being used.
> >
> >   I absolutely agree with Toby here.

[BB]: I also agree audit is hard without ECN - 
but I believe it's not a lost cause (excuse the 
pun). I started to agree with Matt when I 
realised cases like that discussed earlier ('not 
the only way' above) are common enough for ConEx without ECN to be useful.

> >
> >>> Auditing, to tell truth, may or may not prove useful after we
> >>> have done enough experiments. IMHO, if it proves useful we'll find
> >>> that it's useful when done statistically, rather than on every
> >>> packet.
> >>
> >> I think this may point to the heart of the issue. Yes, you can sample
> >> to do audit, but only if you assume a relatively high density of
> >> markings.
> >
> >   I don't follow why that is true. If auditing is not strict, sampling
> > should work for a wide range of density; if auditing is strict, it
> > will fail pretty much regardless of the density…
>
>OK, this comes down to the point below

[BB]:  Sampling isn't applicable if anyone who is 
'caught' can just whitewash their reputation by 
adopting a new clean identity. The problem is 
that identity here is just a flow ID and new flow IDs can be created at will.

A dishonest transport can cheat in as many flows 
as it can get away with, until sampling catches 
one of its flows, which it just terminates and 
starts another flow. E.g. a misbehaving receiver 
could understate congestion feedback to a server. 
When the receiver detects high loss, it assumes 
it is being randomly audited, so it just 
terminates that connection and opens a new one.

"Reputation whitewashing" is easy when your 
identity is as ephemeral as a flow ID and you're 
the one who controls the end with the ephemeral ports.

> >> If ConEx is a specific number carried in each packet then sparse
> >> sampling may well give you enough to be able to achieve the aims of
> >> audit.
> >
> >   Carrying what I call "congestion-expected" as a multi-bit fraction
> > is an interesting idea, in which I see potential benefits. I'd be
> > happy to discuss it, but my impression is that we're not heading in
> > that direction.
>
>That was certainly how I understood a re-ECN 
>like scheme working… The whole core of this 
>argument (should you account for bytes or 
>packets) has always been a major bone of 
>contention between me and Bob. Not least because 
>it can be impossible for a sender to comply (it 
>is very easy to construct pathological sequences 
>of packets where the sender is forced to either 
>over or under declare if they are doing per-byte 
>auditing). I think this is a case where the IETF 
>view has to diverge from the researcher's view. 
>In the IETF we have to make prosaic engineering 
>decisions. Personally I would like ConEx to 
>account on a per-packet basis, BUT with an 
>allowance for it to be more accurate if an operator so wishes

[BB]: that's my position too.
Assuming "more accurate" means per-byte.

I don't think this is disagreeing with the 
wording in abstract-mech, or the proposed wording 
in the tcp mods draft. If it is, pls say how.

>(in other words MUST as a minimum account per 
>packet, but MAY account per byte)

Except I wouldn't state it in terms of what audit 
must or may do, because the argument is about 
what we say the transport should do (to be robust against what audit may do).
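
To make the kind of pathological sequence Toby mentions concrete, here is a small arithmetic sketch. It assumes, purely for illustration, that each ConEx mark covers the bytes of the packet carrying it, and that all packets following the loss are the same size; the function name is hypothetical.

```python
import math


def per_byte_cover(lost_bytes, packet_size):
    """Return (marks, over, under) for covering lost_bytes with marked
    packets of packet_size bytes each (illustration only).

    marks: packets to mark so that declared bytes >= lost bytes
    over:  bytes over-declared when marking that many packets
    under: bytes under-declared when marking one packet fewer
    """
    marks = math.ceil(lost_bytes / packet_size)
    over = marks * packet_size - lost_bytes
    under = lost_bytes - (marks - 1) * packet_size
    return marks, over, under


# One 1500-byte packet is lost and the sender only has 40-byte packets to send.
marks, over, under = per_byte_cover(1500, 40)
```

With per-byte accounting the sender here must either mark 38 packets and over-declare by 20 bytes, or mark 37 and under-declare by 20. With per-packet accounting the same loss is covered exactly by a single mark, whatever the packet sizes.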

> >> Which brings me on to the aims of audit. To my mind there are two
> >> possible aims.
> >> 1) to guarantee that every single packet, flow, aggregation of traffic
> >>   in the network has accurately declared their congestion; or
> >> 2) to act as a "speed camera" that catches a sufficiently high
> >>   proportion of infringements to ensure that senders don't risk
> >>   under-declaring their congestion.
> >
> >   I don't see auditing as either of these; but that may be more a
> > function of my tendency to think of auditing as gathering information,
> > not enforcing disciplinary measures based on this information.
>
>Perhaps auditing is an unhelpful word to use. To 
>my mind an audit implies an accountancy audit 
>that tries to balance the books. I agree that 
>carries no implication of policing or 
>enforcement, but I think that a lot of this 
>debate has assumed the two concepts are identical

The critical issue is whether the audit function 
is possible. Once it is, the action as a result of audit can be added.

When audit is in a remote network from the 
sender, the only action available seems to be to 
'punish' the packets (e.g. drop) because it's too 
complex in general to track down and 'punish' a 
user that might be in another network.

> >   IMHO, "guaranteeing" much of anything isn't worth the effort over
> > organizational boundaries. At significant expense, we could use
> > cryptography to give arbitrarily high confidence to a particular
> > "guarantee" having come from a particular entity, but without also
> > signing the data "guaranteed", the effort is entirely wasted.
> >
> >   (And processing all that cryptography in backbone routers seems
> > a non-starter to me.)
>
>Agreed, this has to be a lightweight mechanism. 
>All the guarantees have to come from the idea 
>that operators have to have some level of trust 
>among one another (or at least, the realisation 
>that an operator found to have cheated at the 
>expense of others runs the risk of getting sat on heavily by the others),

See point above - in a large distributed system, 
trust can only be relied on if it would at least 
be feasible to detect breaches of trust (in the 
case of trust building, random sampling is highly appropriate).

> >> My own view is that the second is more feasible in the real world,
> >> although the first might well be more desirable.
> >
> >   I at least partly agree here. However, my approach to thinking
> > about such problems simply stops descending into details when I reach
> > a feasibility blockage.
>
>And therein lies the lesson for ConEx. We HAVE 
>to fix on a SIMPLE subset of ConEx that we can 
>all agree on and that seems at least feasible in 
>the real world. Arguments like the recent ones 
>make ConEx appear to be still firmly stuck in 
>the land of research (IRTF). Too many people (on 
>all sides of all arguments) seem to want 
>perfection and that is not sensible in the real 
>world. That is just a recipe for ratholing 
>ourselves to the point where the IESG declares our WG dysfunctional…

See above.


Bob


>Toby
>
>
> >
> > --
> > John Leslie <john@jlc.net>
>

________________________________________________________________
Bob Briscoe,                                BT Innovate & Design