Re: [Stackevo-discuss] Scope of stackevo and ossification in DC

Brian Trammell <> Tue, 08 December 2015 08:56 UTC

To: Tom Herbert <>

hi Tom,

(apologies on the delay getting back to this thread -- i'm finally done traveling, just in time to go off for the holidays...)

> On 17 Nov 2015, at 19:49, Tom Herbert <> wrote:
> [ Reposting from stackevo list]
> Hello,
> Similar to the use of protocols on the Internet, we are hitting the
> transport protocol ossification problem in the data center.
> Specifically, performance optimizations in networking devices only
> support TCP or UDP, and the lack of these optimizations negatively
> impacts our use of other protocols.

Right. But this is a fundamental problem, I think. NIC offloads reach pretty deeply into the transport protocol, and as such won't work with new transports, encrypted or not, until the hardware and drivers explicitly support them. A question: for NIC offload, how much of the win comes from segmentation offload, and how much comes from other trickery? If the biggest win really is bundling a bunch of packets into a single context switch, then how would the performance of the current offload architecture compare with a smart library on top of approaches like netmap?
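To make the segmentation-offload point concrete, here's a toy sketch (my illustration, not any real NIC's implementation) of what TSO/GSO does conceptually: the stack hands down one large buffer, and the segmentation step splits it into MSS-sized chunks, advancing the sequence number for each. On real hardware this split happens on the NIC, so the host pays one trip through the stack instead of one per segment:

```python
# Toy sketch of segmentation offload (illustrative only): split one
# large send into per-segment (sequence_number, payload_chunk) pairs,
# the work a TSO-capable NIC performs so the host stack handles a
# single large buffer instead of one packet per MSS.

MSS = 1448  # typical TCP MSS for a 1500-byte MTU with timestamp options

def segment(payload: bytes, seq: int, mss: int = MSS):
    """Split one large send into per-segment (seq, chunk) pairs."""
    segments = []
    for off in range(0, len(payload), mss):
        chunk = payload[off:off + mss]
        segments.append((seq + off, chunk))
    return segments

if __name__ == "__main__":
    segs = segment(b"x" * 4000, seq=1000)
    print(len(segs))              # 3 segments
    print([s for s, _ in segs])   # [1000, 2448, 3896]
```

The other half of the win (coalescing on receive, checksum offload, interrupt moderation) isn't captured here, which is exactly why the split between "segmentation" and "other trickery" is worth measuring.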

> One example of this is the need
> for fine-grained ECMP, which has become a driver behind many of the
> foo-over-UDP proposals (e.g. MPLS/UDP, GRE/UDP, ...).

So this is a separate issue -- ECMP is a (semi-elegant) hack, predicated on the assumption that things on a five-tuple need to stay together and things on separate five-tuples don't. NAT plus TCP (really, any reordering-intolerant transport) makes this assumption more or less hold. Driving it in the opposite direction -- using knowledge that there's ECMP on path to do cheap traffic engineering -- leads to the unintended consequences that foo-over-UDP brings with it.
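The mechanism at play can be sketched as follows (a toy illustration under my own assumptions, not any vendor's actual hash function): a router hashes the five-tuple to pick a path, so one flow always stays on one path; a foo-over-UDP encapsulator then derives the outer UDP source port from the *inner* flow's tuple, so distinct inner flows spread across paths while each flow's packets stay together:

```python
# Toy ECMP path selection and foo-over-UDP entropy (illustrative only).
import zlib

def ecmp_path(src, dst, proto, sport, dport, n_paths):
    """Pick a path by hashing the five-tuple -- deterministic per flow."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % n_paths

def outer_sport(inner_five_tuple, lo=49152, hi=65535):
    """Fold inner-flow entropy into the outer UDP source port."""
    key = "|".join(map(str, inner_five_tuple)).encode()
    return lo + zlib.crc32(key) % (hi - lo + 1)

if __name__ == "__main__":
    inner_a = ("10.0.0.1", "10.0.0.2", 6, 40000, 80)
    inner_b = ("10.0.0.1", "10.0.0.2", 6, 40001, 80)
    # a fixed-port tunnel pins both inner flows to the same path:
    fixed = ecmp_path("192.0.2.1", "192.0.2.2", 17, 4789, 4789, 8)
    # per-inner-flow outer source ports let them diverge:
    pa = ecmp_path("192.0.2.1", "192.0.2.2", 17, outer_sport(inner_a), 4789, 8)
    pb = ecmp_path("192.0.2.1", "192.0.2.2", 17, outer_sport(inner_b), 4789, 8)
    print(fixed, pa, pb)
```

Note that the routers never see the inner flows; the traffic engineering rides entirely on their five-tuple assumption, which is exactly the "cheap TE" with unintended consequences described above.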

What you really want architecturally is a way for the network layer (at a gateway) to explicitly say "keep these packets together" and "it's okay to split these packets apart". It'd be even better if we had a way to request/measure/enforce actual path diversity without manually managing tunnels, but sadly this is explicitly a non-feature of our routing protocols. In any case this seems to have a harder incremental deployment story than simple transport state exposure.

> This problem is likely a proper subset of the general problem, but
> might be more amenable to some "simpler" solutions. Is this within
> scope of stackevo?

It very much seems to be, yes. Let's keep this discussion going on this list...

Thanks, cheers,


> Thanks,
> Tom
> _______________________________________________
> Stackevo-discuss mailing list