[conex] Catching "Cheaters"

John Leslie <john@jlc.net> Thu, 15 December 2011 21:18 UTC

Date: Thu, 15 Dec 2011 16:18:01 -0500
From: John Leslie <john@jlc.net>
To: Toby Moncaster <toby.moncaster@cl.cam.ac.uk>
Message-ID: <20111215211801.GB26608@verdi>
References: <20111121203356.GG22465@verdi> <201111212314.pALNEhvZ013554@bagheera.jungle.bt.co.uk> <20111122001928.GH22465@verdi> <82AB329A76E2484D934BBCA77E9F524924B97E81@Polydeuces.office.hd> <20111202232051.GH31463@verdi> <Prayer.1.3.4.1112031010520.17047@hermes-2.csi.cam.ac.uk> <20111205230153.GC39149@verdi> <9BD81879-81A2-4EF0-A60B-F541D0BA418B@cl.cam.ac.uk> <201112132012.pBDKCokX014681@bagheera.jungle.bt.co.uk> <6342D66E-6525-49D4-9DD9-3713230F2303@cl.cam.ac.uk>
In-Reply-To: <6342D66E-6525-49D4-9DD9-3713230F2303@cl.cam.ac.uk>
Cc: ConEx IETF list <conex@ietf.org>
Subject: [conex] Catching "Cheaters"

This is still rather longish, but I think it holds together enough...

Toby Moncaster <toby.moncaster@cl.cam.ac.uk> wrote:
> On 13 Dec 2011, at 20:12, Bob Briscoe wrote:
>>
>> Imagine a sender is trying to cheat. This can even be a really
>> knowledgeable sender - a disgruntled ex-employee of the network
>> operator who knows that the audit function is badly placed - not
>> near the receiver, but near the sender.
>>
>> Even if there might sometimes be another bottleneck nearer the
>> receiver on the path, when this dishonest sender sees congestion
>> feedback (from loss or ECN), it cannot know which bottleneck each
>> congestion signal was from.

   The most likely bottleneck near the sender is the customer's uplink
over the ADSL or cable modem.

>> So if this sender is trying to cheat, it will sometimes understate
>> ConEx markings with respect to the first bottleneck alone.

   Assuming Bob means the cheater _is_ the sender, it can still have
a pretty good picture of the congestion rate in its upstream link.

>> The audit at that first bottleneck will detect persistent cheating,
>> so it knows the sender is suspect...

   Understand that ConEx credit marks can get lost: the audit function
can't expect to see _every_ ConEx mark.

   Even assuming that the cheating is sufficiently flagrant, I don't
see how we could specify what the audit function must do...

>> [BB]: Just to ensure we make clear which sort of abuse we're talking
>> about at any one time, I detect three different types of abuse being
>> discussed here:
>> a) The sender continuing to send despite expecting that congestion
>>    has not cleared
>> b) The sender understating ConEx markings relative to path congestion
>> c) An ISP clearing the sender's ConEx markings
>>
>> a-type behaviour could describe normal TCP/IP or UDP. It's only abuse
>> if in excess. The idea of ConEx is to give ISPs the info, so ISPs
>> can draw the line between normal and abuse, rather than the IETF
>> making this judgement call.

   I agree the IETF shouldn't make that judgment call.

>> (b) and (c) are about validity of ConEx information, whereas (a) is
>> about behaviour measured by ConEx info. The idea is to prevent
>> b-type & c-type behaviour so that ConEx info is reliable enough
>> that ISPs can use it to make judgements about (a).
>>
>> b-type behaviour is intended to be addressed by audit.
>>
>> c-type behaviour can be addressed by e2e verification of ConEx info.
>
> I think this is a really good and clear description of the ways to
> distort ConEx info. I would like to see something like this as a
> stand-alone informational draft (at least for now) so we have it
> captured somewhere and can refer back to it as and when needed.

   I'd love to see Toby work on such an I-D. I'd be willing to help.

>> We're not doing protocol work on (c) in the w-g, but the info is
>> in the protocol if we wanted to. I can straight-off think of three
>> possible protocols to address this:
>> - The source could protect the integrity of the DSOPT extension
>>   header with IPsec AH, so if an ISP altered ConEx markings, AH
>>   integrity checks would detect that something on-path had altered
>>   an immutable header.
>> - We could add an e2e feedback field in TCP so that the receiver
>>   can feed back the received ConEx markings to the sender
>>   (re-re-feedback?). Then the sender can check that an ISP hasn't
>>   changed what it sent.
>> - The sender could log the packets it marks with ConEx, and the
>>   receiver could occasionally send back a log of the packets it
>>   receives with ConEx markings for the sender to check (not sure
>>   what protocol this would be added to).

   I don't believe the IPsec case is worth pursuing.

   The second and third might prove worthwhile, but at this point I
don't see what we could specify which would help.
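
   As a minimal sketch of the third option (the log comparison), and
assuming both ends can name packets by some shared identifier, the
sender's check might look like the Python below. The function name,
the integer packet IDs, and the idea that the receiver reports both
everything it received and the subset that arrived with ConEx marks
are all assumptions for illustration; none of this is specified
anywhere.

```python
def cleared_marks(sent_marked: set, received_all: set,
                  received_marked: set) -> set:
    """Return IDs of packets sent with a ConEx mark that arrived unmarked.

    All three arguments are hypothetical logs of packet identifiers:
      sent_marked     - packets the sender marked with ConEx
      received_all    - every packet the receiver logged
      received_marked - packets the receiver logged as ConEx-marked
    """
    # Only packets that actually arrived can tell us anything; a lost
    # packet is not evidence that a marking was cleared on the path.
    arrived_and_was_marked = sent_marked & received_all
    return arrived_and_was_marked - received_marked


if __name__ == "__main__":
    sent = {1, 5, 9}             # sender marked these
    got = {1, 2, 3, 5, 6}        # receiver saw these (9 was lost)
    got_marked = {1}             # only packet 1 still carried its mark
    print(sorted(cleared_marks(sent, got, got_marked)))
```

   Packets that never arrived are excluded, since a loss says nothing
about whether an ISP cleared the marking.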

>> Toby Moncaster <toby.moncaster@cl.cam.ac.uk> wrote:
>> 
>>> Clearly, given the min 1RTT delay between getting congestion info
>>> and responding to it, there can never be a completely accurate
>>> match. I did have numerous debates with Bob about this topic.
>>> Basically you run the risk of a user just terminating his flow
>>> rather than making up his losses.
>>
>> [BB]: That attack is dealt with. That's why I/we decided that the
>> sender has responsibility for keeping ConEx info greater than or
>> equal to congestion, despite RTT delay. Then if the sender runs up
>> a 'loss' and jumps to a new flow, the end of its last flow will
>> have been discarded by the audit function, so the sender won't have
>> gained by this cheat.
>
> I think this may be one of the things people haven't understood.
> Namely, we need any sender to start by assuming they will cause a
> little bit of congestion and paying for that up-front.

   Worth including in that I-D I hope Toby will write...

   Nonetheless, I don't think we should make any requirement for ConEx
credits for _every_ packet in flight (which I have seen proposed).

   While, indeed, 100% loss is possible, that's not a case worth a lot
of worry -- it's essentially a dead path, and optimizing a dead path
seems silly.

   Nor do I think we should _define_ a likely loss rate for packets
in flight. That deserves to be a judgment call for each sender. The
question is, if the sender guesses low, what should happen?

   It doesn't bother me if an audit policer drops some more packets
in that case; nor does it bother me if it doesn't. It's not worth the
effort to me to keep that much state about connections originating at
some other ISP.

   But I'm happy to see a statement that senders SHOULD estimate a
likely loss rate for packets in flight.
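
   To make the "guesses low" question concrete, here is a minimal
sketch of the bookkeeping an audit function might keep for one flow,
assuming it simply counts bytes. The class name, the fields, and the
byte-counting rule are assumptions for illustration, not anything the
w-g has specified; what the auditor does once a flow is in deficit is
deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class FlowBalance:
    """Hypothetical per-flow audit state, counted in bytes."""
    conex_bytes: int = 0       # bytes in packets carrying ConEx marks
    congestion_bytes: int = 0  # bytes the auditor saw lost or ECN-marked

    def record(self, size: int, conex_marked: bool, congested: bool) -> None:
        """Account for one observed packet of `size` bytes."""
        if conex_marked:
            self.conex_bytes += size
        if congested:
            self.congestion_bytes += size

    def in_deficit(self) -> bool:
        # The sender is expected to keep ConEx info greater than or
        # equal to congestion, so only a strict shortfall is a deficit.
        return self.conex_bytes < self.congestion_bytes


if __name__ == "__main__":
    flow = FlowBalance()
    flow.record(1500, conex_marked=True, congested=False)   # up-front credit
    flow.record(1500, conex_marked=False, congested=True)   # credit used up
    print(flow.in_deficit())
    flow.record(1500, conex_marked=False, congested=True)   # sender guessed low
    print(flow.in_deficit())
```

   A sender that sends its marks up front never shows a deficit here;
one that guesses low does, and the policy question above is what (if
anything) should follow.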

>>> You also have to be cautious that the actual process of dropping
>>> packets for audit doesn't simply distort the apparent congestion
>>> information (because the sender will respond by retransmitting
>>> and hopefully adding more ConEx marks)
>>
>> [BB]: We (in BT) established that's also not a problem. The sender
>> ought to match these audit losses with the ConEx marks it didn't
>> send in the first place. If it doesn't, the audit node can detect
>> that, because it knows about the losses it introduced.

   (Assuming a single audit node...)

> The only drawback then is that the aggregate over the path is possibly
> distorted (as in, some of the stated congestion isn't actual congestion)
> which may have an impact on some future ConEx use cases.

   IMHO, we're going to have to live with cases of overstated congestion;
and it isn't worth the effort of trying to stop them.

>> [BB]:  Sampling isn't applicable if anyone who is 'caught' can just
>> whitewash their reputation by adopting a new clean identity. The
>> problem is that identity here is just a flow ID and new flow IDs
>> can be created at will.

   Actually, I think we are still waving hands wildly on what constitutes
a "flow" as we get farther from the sender. The IPv6 "flow ID" field
is almost never used, and proposals I've seen to get it used more seem
a bit buggy...

>> A dishonest transport can cheat in as many flows as it can get away
>> with, until sampling catches one of its flows, which it just
>> terminates and starts another flow. E.g. a misbehaving receiver
>> could understate congestion feedback to a server. When the receiver
>> detects high loss, it assumes it is being randomly audited, so it
>> just terminates that connection and opens a new one.

   In truth, just the escalation of state information to be kept would
be a problem. (And there probably isn't even a need to terminate a
"connection".)

   IMHO, keeping any per-flow state far from the sender (or the
receiver, if the receiver is being held responsible for congestion) is
a non-starter: I just don't see how we can limit the state information
to a manageable level.

   OTOH, the overall amount of congestion coming into my AS from each
peer is a concern worthy of keeping per-AS state: thus I expect per-AS
state to be kept and acted upon. For purposes of per-AS state, it
doesn't help to argue whether it's honest or dishonest: either way, it
threatens to congest _all_ my traffic to the particular destination.
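
   For contrast with per-flow state, a minimal sketch of per-AS state
might look like the Python below. It assumes the border router can
attribute each incoming packet to a peer AS and can see whether the
packet carries a ConEx mark; the class and method names are
hypothetical, and using the marked-byte fraction as the measure of
"congestion coming in" is an assumption of this sketch.

```python
from collections import defaultdict


class PerASCongestion:
    """Hypothetical per-peer-AS counters: one pair of integers per peer."""

    def __init__(self) -> None:
        self.total_bytes = defaultdict(int)
        self.conex_bytes = defaultdict(int)

    def observe(self, peer_as: int, size: int, conex_marked: bool) -> None:
        """Account for one packet of `size` bytes arriving from `peer_as`."""
        self.total_bytes[peer_as] += size
        if conex_marked:
            self.conex_bytes[peer_as] += size

    def congestion_fraction(self, peer_as: int) -> float:
        """Fraction of bytes from `peer_as` that carried ConEx marks."""
        total = self.total_bytes[peer_as]
        return self.conex_bytes[peer_as] / total if total else 0.0


if __name__ == "__main__":
    state = PerASCongestion()
    # 64500 and 64501 are AS numbers reserved for documentation.
    state.observe(64500, 1000, conex_marked=False)
    state.observe(64500, 1000, conex_marked=True)
    state.observe(64501, 500, conex_marked=False)
    print(state.congestion_fraction(64500), state.congestion_fraction(64501))
```

   The state grows with the number of peers, not the number of flows,
which is the point of keeping it at this granularity.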

>> "Reputation whitewashing" is easy when your identity is as ephemeral
>> as a flow ID and you're the one who controls the end with the
>> ephemeral ports.

   Agreed. But the reputation of my peer ASes can't be whitewashed.

> So what you are arguing is that speed camera-style auditing is not
> sensible because in the Internet you can change your car registration
> at will with near zero cost, and there is no external mechanism
> (licensing authority) that can retrospectively apply a sanction to
> you? (sorry for the convoluted simile)

   I'll argue that one: speed-camera auditing is not particularly
sensible for blaming individual _flows_, because we have no way to
protect any of the available methods of identifying the flows to blame.

>> When audit is in a remote network from the sender, the only action
>> available seems to be to 'punish' the packets (e.g. drop) because
>> it's too complex in general to track down and 'punish' a user that
>> might be in another network.
>
> Agreed

   Likewise, agreed -- with the additional note that matching the drops
to the actual flow will not always be possible.

   Nonetheless, I would agree it's reasonable to drop packets that _are_
marked ConEx-aware but appear to be part of a flow which has not built up
sufficient credit -- in those cases where we have reason to believe
_some_ packets to that destination will have to be dropped anyway.

   I don't honestly see much benefit from "punishing" a failure to
estimate enough congestion unless we have evidence that packet drop is
inevitable (somewhere) anyway. (If nothing else, that would be a DoS
invitation.)
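
   The drop rule argued for in the last two paragraphs can be sketched
as a single predicate. This is only an illustration under stated
assumptions: the function name, the per-flow credit counter, and the
`min_credit` threshold are hypothetical, and how a node decides that a
drop is needed anyway is left outside the sketch.

```python
def should_drop(conex_aware: bool, flow_credit: int, drop_needed: bool,
                min_credit: int = 1) -> bool:
    """Decide whether an audit node drops this packet.

    conex_aware - packet is marked as ConEx-aware
    flow_credit - hypothetical credit the packet's apparent flow has built up
    drop_needed - node has reason to believe some packets to this
                  destination will have to be dropped anyway
    min_credit  - assumed threshold for "sufficient" credit
    """
    if not drop_needed:
        # No "punishment" unless a drop is inevitable somewhere anyway.
        return False
    # Prefer ConEx-aware packets whose flow lacks sufficient credit.
    return conex_aware and flow_credit < min_credit


if __name__ == "__main__":
    print(should_drop(True, 0, drop_needed=False))   # no pressure: keep
    print(should_drop(True, 0, drop_needed=True))    # under-credited: drop
    print(should_drop(True, 5, drop_needed=True))    # enough credit: keep
    print(should_drop(False, 0, drop_needed=True))   # not ConEx-aware: keep
```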

>> See point above - in a large distributed system, trust can only be
>> relied on if it would at least be feasible to detect breaches of
>> trust (in the case of trust building, random sampling is highly
>> appropriate).
>
> Absolutely. Among operators you can genuinely rely on the speed camera
> approach to audit - the risk of getting caught may be low (but has to
> be finite) but the harm to your reputation makes the risk unjustifiable

   Exactly!

>>> And therein lies the lesson for ConEx. We HAVE to fix on a SIMPLE
>>> subset of ConEx that we can all agree on and that seems at least
>>> feasible in the real world. Arguments like the recent ones make
>>> ConEx appear to be still firmly stuck in the land of research
>>> (IRTF). Too many people (on all sides of all arguments) seem to
>>> want perfection and that is not sensible in the real world.
>>> That is just a recipe for ratholing ourselves to the point where
>>> the IESG declares our WG dysfunctional?
>>
>> See above.
>
> I actually think the disagreements among us are quite small and are
> (generally) semantic. The problem is that they look much bigger to
> any outsider.

   IMHO our disagreements, whether small or not, are often fundamental;
and our intransigence is what makes the problems appear so big. To
whatever extent we are willing to let go of our insistence upon any
particular solution, I think folks are happy to see this work continue;
but where we hang on too tightly to one particular "solution", we start
to look "dysfunctional". :^(

--
John Leslie <john@jlc.net>