Re: [LOOPS] draft-li-tsvwg-loops-problem-opportunities-05 - why do traffic engineering at the overlay as well?

Carsten Bormann <cabo@tzi.org> Tue, 21 July 2020 09:03 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <CAKKJt-cUEhf-1nY=aBzrgJVZvdqHPM5sRba2Qob1ESrjXUv9-Q@mail.gmail.com>
Date: Tue, 21 Jul 2020 11:03:22 +0200
Cc: Gorry Fairhurst <gorry@erg.abdn.ac.uk>, Liyizhou <liyizhou@huawei.com>, loops <loops@ietf.org>, Michael Welzl <michawe@ifi.uio.no>
Message-Id: <9D5A00EA-55CC-4C78-B771-34C129301698@tzi.org>
References: <dd240ea2-f1b7-28eb-00ad-bb037c764d4d@erg.abdn.ac.uk> <C5795E6B-14AE-47ED-ADB1-DBEEE37A024A@tzi.org> <e57dbf09-d1d0-e899-f12d-59db29a11f21@erg.abdn.ac.uk> <19d1f8379e464b70a00c025371a15e31@huawei.com> <9777f17b-4da4-6632-0ec7-606c7b7c9f9f@erg.abdn.ac.uk> <0C024505-D07F-46CC-8293-6E9EC1CE520E@ifi.uio.no> <bf0dd742-fa48-695f-7949-cb2b8d40231f@erg.abdn.ac.uk> <9EF400E0-8810-434F-A1CC-EAE13B006F7C@tzi.org> <6cee1b1f-93c8-d930-3e0b-a34f627521e3@erg.abdn.ac.uk> <54C340AB-C746-4D58-8B11-A6EF9EE15DFE@tzi.org> <1de7101e-2455-face-0978-4aba45169de7@erg.abdn.ac.uk> <BB806211-E27B-445E-8638-E31156006E17@tzi.org> <CAKKJt-cUEhf-1nY=aBzrgJVZvdqHPM5sRba2Qob1ESrjXUv9-Q@mail.gmail.com>
To: Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/loops/l3OHm-h1xWvOJO0YGY46-3bInnU>

Hi Spencer,

Interesting questions.

> On 2020-07-17, at 14:03, Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com> wrote:
> 
> I'm thinking about "first, do no harm". 
> 	• If a path has a large number of short flows that respond to CE, that's awesome.

Indeed.  Let’s talk about short + transactional (“chatty”) flows, though.

> 	• If a path has a large number of short flows that don't respond to CE markings, but do respond to drops, the senders should slow down as much as you can slow down an exchange with fewer than a dozen packets, when they see drops. Is LOOPS making that situation better, or worse?

Until countermeasures such as circuit breakers hit, worse.  Since LOOPS is converting drops to marks, it is converting observed congestion events into ignored congestion events for these flows.

So where do you get those flows that don’t respond to CE markings?  And how did you avoid them breaking other CE-marking places in the Internet already?  I think the argument really boils down to “we can’t have LOOPS adding CE-marking places because CE-marking is bad”, and I’m not sure we have consensus on the latter.

I know about the case where a middlebox somewhere on the path outside the LOOPS segment removes ECE markings from TCP headers; this is about the worst case I can think of (but it is already bad without LOOPS).

For short flows, it may be worth considering what “respond to drops” means.  For a short flow, the drop can be (1) in the establishment phase (where LOOPS cannot help TCP flows), essentially acting as a flow-delayer or flow-preventer (in the presence of racing).  LOOPS is not changing that picture.  Or it can be (2) in the IW10 phase, and there LOOPS is likely to react to a drop by adding a packet or two to the ones that are already flowing.  That doesn’t help with any congestion in the LOOPS-controlled path segment, but it doesn’t really hurt that much either.

> 	• If a path has a large number of short flows that don't respond to CE markings or drops, I THINK an aggregate circuit breaker would cover that situation. Is LOOPS making that situation better, or worse? 

LOOPS is introducing a circuit breaker where there might not have been one before, so it is making the situation better.

The question of course is how fast it trips (i.e., what the damage is until it does), and what exactly it circuit-breaks.  I would have designed it to just stop expending packets on recovery (i.e., stop LOOPS), but not to stop the traffic itself.  That would mean there would be no benefit, but also no lasting damage.
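
To make that design choice concrete, here is a minimal sketch of a breaker that only switches off repair and never touches forwarding.  Everything in it is an assumption for illustration: the class name, the 10% loss threshold, and the "three bad intervals in a row" trip rule are hypothetical and not taken from any LOOPS draft.

```python
from dataclasses import dataclass


@dataclass
class RepairCircuitBreaker:
    """Hypothetical breaker: trips on sustained loss and disables repair only."""

    loss_threshold: float = 0.1  # assumed trip point: more than 10% loss in an interval
    max_bad_intervals: int = 3   # assumed: consecutive bad intervals before tripping
    bad_intervals: int = 0
    tripped: bool = False

    def end_interval(self, sent: int, lost: int) -> None:
        """Feed the packet counters of one measurement interval into the breaker."""
        if self.tripped or sent == 0:
            return
        if lost / sent > self.loss_threshold:
            self.bad_intervals += 1
        else:
            self.bad_intervals = 0  # a good interval resets the streak
        if self.bad_intervals >= self.max_bad_intervals:
            self.tripped = True

    def may_repair(self) -> bool:
        """Retransmission or FEC is allowed only while the breaker has not tripped."""
        return not self.tripped

    def may_forward(self) -> bool:
        """Original traffic is always forwarded, whether tripped or not."""
        return True
```

How fast this trips is governed entirely by the two assumed parameters; the point of the sketch is only that tripping removes the LOOPS benefit without blocking the flows themselves.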

I think in the end we will need to limit any damage from misbehaving endpoints (whether they misbehave on their own or because of some bad intervention from middleboxes) by limiting the amount of repair that is being done by LOOPS.
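
One way to picture such a limit is a credit scheme in which repair packets can never exceed a fixed fraction of the forwarded data packets.  This is only a sketch under stated assumptions: the class name, the 5% ratio, and the burst allowance of two packets are hypothetical and do not come from the draft.

```python
class RepairBudget:
    """Hypothetical cap: repair packets are limited to a fraction of forwarded packets."""

    def __init__(self, ratio: float = 0.05, burst: float = 2.0) -> None:
        self.ratio = ratio    # assumed: repair credit earned per forwarded data packet
        self.burst = burst    # assumed: maximum credit that can accumulate
        self.credit = burst   # start with a full allowance

    def on_forward(self) -> None:
        """Earn a little repair credit for each data packet forwarded."""
        self.credit = min(self.burst, self.credit + self.ratio)

    def try_repair(self) -> bool:
        """Spend one credit on a repair packet; refuse when the budget is exhausted."""
        if self.credit >= 1.0:
            self.credit -= 1.0
            return True
        return False
```

With these assumed defaults, a segment gets at most two repair packets up front and then one more per twenty forwarded packets, so a misbehaving flow cannot drive LOOPS into adding unbounded load.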

Grüße, Carsten