[Stackevo-discuss] On boundaries and interfaces in transport protocol evolution (was Re: [tsvwg] draft-byrne-opsec-udp-advisory)

Brian Trammell <ietf@trammell.ch> Tue, 28 July 2015 08:27 UTC

From: Brian Trammell <ietf@trammell.ch>
In-Reply-To: <CALx6S37cm0fNTQ6g26tSwhEC_BRG_UEtEXKfpCfyus7BF5szdQ@mail.gmail.com>
Date: Tue, 28 Jul 2015 10:27:48 +0200
Message-Id: <C229130D-336B-4E13-9927-117C32016BBE@trammell.ch>
References: <CAD6AjGRA0-z6H9b2UEBSoOmkdmcVuCkfxhfaOuzZ2jgwLm+fZA@mail.gmail.com> <55AEED07.9080804@isi.edu> <CAD6AjGSgnSBo_RxMoecvMTvWGMQhv1CGu6Pc0gAes0zOBRB1Gg@mail.gmail.com> <EA4C43BE752A194597B002779DF69BAE23DB842D@ESESSMB303.ericsson.se> <DFB2C14B-9C6D-4393-A9B4-434D58C9DED7@trammell.ch> <CAD6AjGTuHwW+RY3hc6+DmY=T2RT847HZ_RNbNmByumc45zQ-8A@mail.gmail.com> <7CFB38B0-F4E9-4C49-AEA0-FFA3E5BD41B0@trammell.ch> <CALx6S37cm0fNTQ6g26tSwhEC_BRG_UEtEXKfpCfyus7BF5szdQ@mail.gmail.com>
To: Tom Herbert <tom@herbertland.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/stackevo-discuss/xnE-3Vb6URS6loRu4hEL-kzOO7g>
Cc: stackevo-discuss@iab.org, Ca By <cb.list6@gmail.com>, "draft-byrne-opsec-udp-advisory@tools.ietf.org" <draft-byrne-opsec-udp-advisory@tools.ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>, Joe Touch <touch@isi.edu>

Hi, Tom,

Thanks for your message. A partial reply for now, may revisit a few bits later once I've chewed through the post-IETF queue a bit...

> On 26 Jul 2015, at 04:17, Tom Herbert <tom@herbertland.com> wrote:
> 
>> This is tantamount to saying the future of the Internet is over TCP, which is not a particularly useful future. The inability to get new transport protocols deployed isn't just an issue with NATs, it's an issue with the kernelspace/userspace boundary in the endpoints. This has more than NAT to do with why we failed to deploy SCTP.
>> 
> 
> Hi Brian,
> 
> Can you provide any more context to support this statement? SCTP has
> been around a long time, and also there is SCTP-over-UDP which should
> already facilitate userspace implementation of SCTP (in fact, there
> is a publicly available userspace SCTP stack:
> https://docs.google.com/presentation/d/190sBrVsLICDn6ayni9bYTjZNRUgpOjlvUhbIVhyjfpA/edit#slide=id.p).
> When I Googled "why isn't SCTP deployed", I got
> http://stackoverflow.com/questions/1171555/why-is-sctp-not-much-used-known
> which seems to mostly imply that SCTP is not deployed because home
> routers can't deal with it (at least as of 2011). I suppose the demise
> of SCTP deployment could be a classic chicken-and-the-egg problem: OS
> vendors won't invest in a protocol that can't be reliably transported
> over the Internet, and router vendors are probably unwilling to invest in
> support for protocols that aren't widely used.

First, I should have used a different tense in my statement above:

"This has more than NAT to do with why we *have* failed to deploy SCTP." We haven't deployed SCTP widely *yet*, but history isn't over. If I weren't optimistic, I wouldn't be posting to a list called "stack evolution". :)

I agree with you that it's chicken-and-egg. But it would be easier to conjure chickens from the vacuum without needing to hack the kernel to do so.

As someone who was introduced to SCTP by Randall's chapter in the third edition of Stevens, I was excited about using the features it described (at the time, transport-layer multihoming and multistreaming) in my applications. I then had occasion to build a compliant IPFIX implementation in 2006 (for which SCTP is the MTI), and suddenly found myself caught between a maze of kernel patches and userland implementations on the one side and the depths of the OpenSSL codebase on the other, where I was trying to wedge DTLS atop SCTP. That was the way you were supposed to do security for SCTP before we decided reversing the stack was the way to go and put SCTP on DTLS on UDP. I was not a good enough programmer to get it to work reliably. In any case, the maze of kernel patches meant I could never get the compliance guys to sign off on deploying it; I suspect there's still a fair deployed base of that particular application running IPFIX over TLS over TCP for exactly that reason.

This is admittedly a relatively specific anecdote. I'll observe that with the move to mobile this problem is getting worse, not better, though.

Or, tl;dr: even an open source kernel doesn't solve the problem of needing kernel changes to get an app deployed on a new stack.

> In any case, for any proposed transport protocol, the questions of why
> existing protocols (e.g. SCTP, DCCP) aren't sufficient or are
> chronically non-deployable will undoubtedly be raised. IMO, we should
> separate out transport protocol definition from the delivery mechanism
> intended to facilitate the short term or "transitional" goal of
> deployment on the current Internet (e.g. encapsulation in UDP). SCTP
> already demonstrates this model since it is both a native IP protocol
> (132) and can be encapsulated in UDP (RFC6951). Encapsulation of
> pretty much any IP protocol (foo-over-UDP for TCP, DCCP, etc.) is
> straightforward, and in fact GUE already defines a common and
> extensible method that allows encapsulation of any IP protocol over
> UDP without needing a port number for each protocol.

I've skimmed the draft but need to look into GUE more deeply. Is there code I can play with somewhere?

In any case I agree completely that the "evolutionary" requirement ("support userspace implementation and deployment of new transports in the Internet") should be separated from the "transitional" requirement (UDP encaps).
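
To make the "transitional" piece concrete, here is a minimal sketch of carrying an arbitrary IP protocol inside a UDP payload behind one shared port. The two-byte shim (a version byte plus the inner IP protocol number) and the names `SHIM`, `encapsulate`, and `decapsulate` are hypothetical, chosen only for illustration. This is not GUE's wire format, and it is not RFC 6951 either, which places the SCTP packet directly in the UDP payload.

```python
import struct

# Hypothetical shim, NOT the GUE header: one version byte followed by
# the IP protocol number of the encapsulated packet (e.g. 132 for SCTP).
SHIM = struct.Struct("!BB")


def encapsulate(inner_proto: int, inner_packet: bytes, version: int = 0) -> bytes:
    """Prefix the inner transport packet with the shim; the result is
    what would be handed to a UDP socket as the datagram payload."""
    if not 0 <= inner_proto <= 255:
        raise ValueError("IP protocol numbers are 8 bits")
    return SHIM.pack(version, inner_proto) + inner_packet


def decapsulate(udp_payload: bytes) -> tuple[int, bytes]:
    """Return (inner protocol number, inner packet) so the receiver can
    dispatch on the protocol number instead of on a per-protocol UDP port."""
    if len(udp_payload) < SHIM.size:
        raise ValueError("payload shorter than shim header")
    _version, inner_proto = SHIM.unpack_from(udp_payload)
    return inner_proto, udp_payload[SHIM.size:]
```

The point of the design is that the demultiplexing key travels in the payload, so adding a new inner transport needs no new port assignment and no change to the encapsulation itself.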

> As for the desire to run transport protocols in userspace, I would
> point out that transport layer stacks are *not* implemented in a
> vacuum. For instance, a fully functional and performant kernel TCP
> stack will directly interact with the IP layer protocols, driver
> layer, the device, and other sub-systems in the kernel. Interactions
> with IP protocol layer are needed to support PMTU discovery, explicit
> congestion notification, and network QoS via diff-serv. For PMTU
> discovery, we need an interface to set DF for IPv4 and to get ICMP PTB
> messages, for ECN we need to get bits from the IP header of marked
> packets, for diff-serv we need a mechanism to set DSCP bits. I would
> be surprised if support for these were not requirements of a new
> transport protocol intended for Internet scale.

I agree with you here: we need a standard kernel interface for getting PMTU and ECN information back from the IP stack into userspace, and a bidirectional interface for DSCP -- and the abstract API should be reasonably portable across platforms. These are not merely necessary for transport protocols in userspace; this information is often useful up at the application layer, as well.
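
For reference, a rough sketch of how a userspace stack can reach these three signals through the sockets API on Linux today. It assumes Linux and IPv4; the numeric fallbacks are the Linux values, used only if the Python build does not export the names, and the helper names (`make_tos`, `split_tos`, `open_udp_socket`, `recv_with_ecn`) are made up for this example. Other platforms expose the same information differently, which is the portability gap described above.

```python
import socket

# Linux option values, used as fallbacks when the socket module does not
# export these names (assumption: Linux; other OSes use other values).
IP_RECVTOS = getattr(socket, "IP_RECVTOS", 13)
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)

# The two low-order bits of the former TOS byte are the ECN field.
ECN_NAMES = {0: "Not-ECT", 1: "ECT(1)", 2: "ECT(0)", 3: "CE"}


def make_tos(dscp: int, ecn: int = 0) -> int:
    """Combine a 6-bit DSCP and a 2-bit ECN codepoint into one byte."""
    if not 0 <= dscp <= 63 or not 0 <= ecn <= 3:
        raise ValueError("DSCP is 6 bits, ECN is 2 bits")
    return (dscp << 2) | ecn


def split_tos(tos: int) -> tuple[int, int]:
    """Split a TOS / traffic-class byte into (DSCP, ECN)."""
    return tos >> 2, tos & 0x3


def open_udp_socket(dscp: int) -> socket.socket:
    """UDP socket that marks outgoing packets with `dscp`, asks the kernel
    to report the TOS byte of incoming packets, and sets DF for PMTUD."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, make_tos(dscp))
    s.setsockopt(socket.IPPROTO_IP, IP_RECVTOS, 1)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    return s


def recv_with_ecn(sock: socket.socket, bufsize: int = 2048):
    """Receive one datagram; return (data, sender, ECN name or None)."""
    data, ancdata, _flags, addr = sock.recvmsg(bufsize, socket.CMSG_SPACE(1))
    ecn = None
    for level, ctype, cdata in ancdata:
        # With IP_RECVTOS enabled, Linux delivers an IP_TOS control message.
        if level == socket.IPPROTO_IP and ctype == socket.IP_TOS and cdata:
            ecn = ECN_NAMES[split_tos(cdata[0])[1]]
    return data, addr, ecn
```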

> Other than that, there are probably >50 places (at least in the Linux
> stack) where TCP interacts with other layers.

I'm only passingly familiar with the situation here -- I doubt any of the TCP/IP stack code from the kernel that was around the last time I dug into it (1997) is still there. But I will observe that an interface with fifty hardpoints isn't an interface, it's an arbitrary line drawn through what is essentially an integrated design.

At least for Linux, then, there would need to be a design effort to determine which of these interfaces are essential to expose, and which are essentially accidents of localized optimization decisions. But some of the particular interfaces you mention sound like exactly what would be needed to put a "userspace transport layer" atop IP in Linux, especially the "wider" interfaces (sendmmsg, recvmmsg, sendfile).

> This includes the
> routing layer (like re-routing for failing connections), saving of
> routing path metrics such as ssthresh and ICW which can be applied to
> new connections to a common destination, interactions with queuing
> layer to prioritize and optimize transmission (TCP sockets, unlike UDP
> sockets can be directly flow controlled by local egress queues). There
> is a rich set of protocol offloads in devices to provide checksum
> offload, LSO, LRO, as well as software analogues in GSO and GRO.

Protocol offloading is pretty much the opposite of what we're trying to do here: offloads only work for ossified protocols, almost by definition, so we'd have to give these up in the near term for new protocols.

> The
> stack interacts with packet steering mechanisms (deliver packets to
> right CPU),

This may be exposing my ignorance, but packet steering works on (a function of) the 5-tuple, no? One of the points we're conceding here is that the ports are de facto part of the network layer, and appear at fixed offsets from the end of the network header.
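
A small sketch of that idea, assuming IPv4 and ignoring fragments: the ports are read from the first four bytes after the IP header, wherever that header ends, and the resulting 5-tuple is hashed to pick a CPU. The CRC32 hash and the names `five_tuple` and `steer` are stand-ins for illustration; they are not what any particular NIC or kernel steering implementation uses.

```python
import ipaddress
import struct
import zlib

# IP protocols whose header begins with 16-bit source and destination
# ports: TCP (6), UDP (17), DCCP (33), SCTP (132).
PORT_BEARING = {6, 17, 33, 132}


def five_tuple(packet: bytes) -> tuple[str, str, int, int, int]:
    """Extract (src, dst, protocol, sport, dport) from a raw IPv4 packet."""
    ihl = (packet[0] & 0x0F) * 4          # IPv4 header length in bytes
    proto = packet[9]
    src = ipaddress.IPv4Address(packet[12:16])
    dst = ipaddress.IPv4Address(packet[16:20])
    sport = dport = 0
    if proto in PORT_BEARING:
        # Ports sit at a fixed offset from the end of the network header.
        sport, dport = struct.unpack_from("!HH", packet, ihl)
    return str(src), str(dst), proto, sport, dport


def steer(packet: bytes, n_cpus: int) -> int:
    """Map a packet to a CPU index so all packets of a flow land together.
    CRC32 is an illustrative stand-in for a real steering hash."""
    key = repr(five_tuple(packet)).encode()
    return zlib.crc32(key) % n_cpus
```

Note that nothing here depends on which transport follows the ports, which is why this kind of steering keeps working for a new protocol as long as it preserves that four-byte layout.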

Thanks again, cheers,

Brian

> pacing mechanisms to mitigate the effect of packet bursts
> within a flow, has interactions with DOS mitigation mechanisms,
> interactions with memory management for network buffers. There are
> interfaces to set IP options or extension headers, appropriately set
> IPv6 flow labels. There's also newer interactions with mechanisms to
> fight buffer bloat such as TCP small queues and Byte Queue Limits
> (BQL). Kernels also implement protocol aware NAT and packet filtering
> per TCP flags and fields. For UDP specifically, there are newer socket
> APIs which can be used to improve efficiency of the UDP data path
> (sendmmsg, recvmmsg). If a new transport protocol is intended to serve
> high throughput (like for serving video) then support for sendfile
> might be important
> (https://people.freebsd.org/~rrs/asiabsd_2015_tls.pdf is interesting
> reading on where kernel support is likely headed for that).
> 
> For many of these items there are already kernel interfaces that a
> user space stack could use to get or set the appropriate information
> (will vary between OSes), for some newer features new interfaces will
> need to be developed, for a few extending support into userspace might
> not be feasible. Support for some features, such as LSO or LRO, would
> require the kernel or device to understand the encapsulated
> transport protocol as well.
> 
> Along these lines, for anyone who is planning on attending Linux
> Plumber's Conference next month we're planning a BOF to discuss kernel
> support for user space transport protocols. We are posing two
> questions: 1) is there anything we can do from the kernel side to
> address the perception that OS development and deployment impedes
> advancements in transport protocols 2) assuming user space transport
> protocols become popular, what sort of support and interfaces can we
> provide to optimize them (like for above interactions).
> 
> Thanks,
> Tom
