Re: [Stackevo-discuss] On boundaries and interfaces in transport protocol evolution (was Re: [tsvwg] draft-byrne-opsec-udp-advisory)

Tom Herbert, Tue, 28 July 2015 16:08 UTC

To: Brian Trammell
Cc: Ca By, Joe Touch

Hi Brian,

Some replies inline...

> I've skimmed the draft but need to look into GUE more deeply. Is there code I can play with somewhere?
> for GUE in Linux for FOU which allows direct
encapsulation of any IP protocol over UDP (GUE and FOU share code

> I'm only passingly familiar with the situation here -- I doubt any of the TCP/IP stack code from the kernel that was around the last time I dug into it (1997) is still there. But I will observe that an interface with fifty hardpoints isn't an interface, it's an arbitrary line drawn through what is essentially an integrated design.
For the most part the interfaces are clean. SCTP and DCCP interact
with the rest of the stack in a similar way, although probably not as
intensely, since TCP tends to be the first focus for optimizations.

> Protocol offloading is pretty much the opposite of what we're trying to do here -- offloads only work for ossified protocols pretty much by definition, so we'd have to give these up in the near term for new protocols.
Not really; many good protocol offloads are generic mechanisms that
permit broad application. For instance, if implemented properly, HW
checksum offload can work on any transport layer protocol checksum
even if the device does not know how to parse the transport or even
network layer or any encapsulation protocols in use. CRC could be
similarly offloaded (like for SCTP) at least on transmit. For UDP
encapsulations, we are requesting that vendors implement offloads that
are agnostic to (the many :-) ) different encapsulation protocols (see
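
As a rough illustration of what "generic" checksum offload means, here
is a minimal Python sketch. It assumes the stack hands the device two
numbers, called csum_start and csum_offset here (illustrative names for
this sketch, not any particular device's API), and has already seeded
the checksum field with the pseudo-header sum. The device then just
sums bytes and never parses a header:

```python
def ones_complement_sum(data: bytes, initial: int = 0) -> int:
    """Fold 16-bit big-endian words into a 16-bit ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def seed_pseudo_header(packet: bytearray, csum_start: int,
                       csum_offset: int, pseudo_header: bytes) -> None:
    """Stack side (assumed): store the uncomplemented pseudo-header sum
    in the checksum field before handing the packet to the device."""
    pos = csum_start + csum_offset
    seed = ones_complement_sum(pseudo_header)
    packet[pos:pos + 2] = seed.to_bytes(2, "big")


def offload_checksum(packet: bytearray, csum_start: int,
                     csum_offset: int) -> None:
    """Device side (assumed): sum everything from csum_start to the end
    of the packet and write the complement at csum_start + csum_offset.
    No knowledge of IP, UDP, TCP, or any encapsulation is needed."""
    pos = csum_start + csum_offset
    total = ones_complement_sum(bytes(packet[csum_start:]))
    packet[pos:pos + 2] = ((~total) & 0xFFFF).to_bytes(2, "big")
```

For a UDP datagram behind a 20-byte IPv4 header the stack would pass
csum_start=20 and csum_offset=6; for some other protocol only the two
numbers change, not the device logic.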

>> The
>> stack interacts with packet steering mechanisms (deliver packets to
>> right CPU),
> This may be exposing my ignorance, but packet steering works on (a function of) the 5-tuple, no? One of the points we're conceding here is that the ports are de facto part of the network layer, and appear at fixed offsets from the end of the network header.
The 5-tuple hash for TCP and UDP is what most devices have
implemented; however, IMO there is a better solution moving forward. As
I pointed out before, not all packets contain port numbers (IP
fragments, IPIP, GRE/IP, MPLS/IP, etc.). Also, for ECMP and RSS the
hardware needs to parse the transport protocol, so currently we are
effectively limited to TCP or UDP in the
DC for high packet rate applications. I have come to believe that the
real fix for these problems (as opposed to just blindly encapsulating
everything in UDP) is to use IPv6 flow labels for load balancing (a la
RFC6438). We've implemented the necessary kernel support, and are
actively requesting HW vendors to use the field as input to ECMP and
RSS (several already do). We intend to enable automatic flow label
generation by default in Linux after due diligence, so in the not too
distant future expect to see a lot of IPv6 packets on the Internet
with non-zero flow labels!
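
To make the flow label idea concrete, here is a minimal Python sketch
of path selection that looks only at the fixed IPv6 header (source
address, destination address, flow label), in the spirit of RFC 6438.
The function names are hypothetical and SHA-256 is only a stand-in for
whatever hash a real device would use; the point is that nothing past
the first 40 bytes is parsed, so the transport protocol does not
matter:

```python
import hashlib


def parse_ipv6_flow(header: bytes):
    """Return (src, dst, flow_label) from a 40-byte IPv6 fixed header."""
    if len(header) < 40 or header[0] >> 4 != 6:
        raise ValueError("not an IPv6 header")
    first_word = int.from_bytes(header[0:4], "big")
    flow_label = first_word & 0xFFFFF  # low 20 bits of the first word
    return header[8:24], header[24:40], flow_label


def pick_path(header: bytes, n_paths: int) -> int:
    """Choose an ECMP path (or RSS queue) from the 3-tuple alone.
    The Next Header field and everything after it are ignored."""
    src, dst, label = parse_ipv6_flow(header)
    digest = hashlib.sha256(src + dst + label.to_bytes(3, "big")).digest()
    return int.from_bytes(digest[:4], "big") % n_paths
```

Two packets with the same addresses and flow label land on the same
path whether they carry TCP, SCTP, GRE, or a fragment, which is the
property the 5-tuple hash cannot give us.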