Re: [multipathtcp] Options or Payload: pros and cons

Costin Raiciu <> Wed, 02 December 2009 14:50 UTC

Hi Bryan,

Your mail is not late. It's spot on, as we're still trying to figure out the best solution. 
You make some very good points.

Thanks for the thoughts.

On 30 Nov 2009, at 15:24, Bryan Ford wrote:

> Trying to catch up on the list activity before, during, and after the Hiroshima meeting, I'm wondering if there's a summary of any significant progress on the "options vs payload" debate.  To add my $.02 to part of the list discussion that might already be obsolete, but maybe not...
> On Nov 10, 2009, at 9:21 PM, Costin Raiciu wrote:
>> Options
>> + nice with tcp
>> + single-subflow MPTCP is friendly with IDSes and traffic normalizers (i.e., the data flow is what the app receives).
>> - reliability is more difficult: how often do you retransmit? Do you drop packets? What's the failure mode when options don't get through?
>> - boxes may crash when parsing unknown stuff. The hope is that SYN processing is more robust with unknown options. The ECN story is frightening :). Murari (Microsoft) says that even changing the position of padding bits in options crashes some middleboxes; I've heard a similar story from the Linux kernel experience. However, we can't (cleanly) avoid sending options on SYNs.
> Not to mention the show-stopper in my view:
> - there just isn't enough free option space left to do what needs to be done, as has been pointed out before (e.g., by Bill Herrin).  Especially given that some of these options would need to carry cryptographically unpredictable identifiers (Connection IDs, Subflow IDs, etc.), which will need to be either "fairly big" or "growable" or both.
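A rough sketch of the option-space arithmetic behind this point. The 40-byte limit follows from TCP's 4-bit Data Offset field, and the lengths of the standard options are taken from their specifications. The multipath option sized here (a 2-byte kind/length header plus an 8-byte connection ID and an 8-byte subflow ID) is purely hypothetical and only illustrates how quickly the space runs out.

```python
# TCP's 4-bit Data Offset field counts 32-bit words, so the header is at most
# 15 * 4 = 60 bytes; 20 of those are the fixed header.
MAX_TCP_HEADER = 15 * 4
FIXED_TCP_HEADER = 20
MAX_OPTION_SPACE = MAX_TCP_HEADER - FIXED_TCP_HEADER  # 40 bytes

# Total length (kind + length + value) of options commonly sent on a SYN.
COMMON_SYN_OPTIONS = {
    "MSS": 4,
    "Window Scale": 3,
    "SACK Permitted": 2,
    "Timestamps": 10,
}

# Hypothetical multipath option: kind + length + 8-byte connection ID
# + 8-byte subflow ID. These sizes are assumptions, not from any spec.
HYPOTHETICAL_MPTCP_OPTION = 2 + 8 + 8


def padded(n, align=4):
    """Round n up to the next multiple of align (options are padded to 32 bits)."""
    return (n + align - 1) // align * align


def remaining_option_space(options, extra=0):
    """Bytes of option space left after the given options plus `extra` bytes."""
    return MAX_OPTION_SPACE - padded(sum(options.values()) + extra)


free_on_syn = remaining_option_space(COMMON_SYN_OPTIONS)
free_with_mptcp = remaining_option_space(COMMON_SYN_OPTIONS, HYPOTHETICAL_MPTCP_OPTION)
print(free_on_syn, free_with_mptcp)
```

Under these assumptions a SYN carrying the usual options has 20 bytes to spare, and a single 18-byte multipath option consumes all of it, leaving no room for larger or growable identifiers.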
>> Payload
>> + option space limits no longer matter; can do funkier stuff later, like negotiate keys, etc.
>> + easier to get reliability; retransmit of control data linked to data retransmit
>> + easier to deal with middleboxes that resegment (it just works!)
>> + new options on SYNs may be allowed, but not on data (is this true?)
>> - feels like a hack, not the traditional way to evolve tcp
> From my architectural perspective MPTCP is a new, separate transport protocol that happens to use legacy TCP as an underlying "Flow/Endpoint Layer" protocol for legacy network compatibility purposes.  As such, from my perspective, adding a bunch of new options to legacy TCP feels like the hack to me.  In contrast, putting the new protocol's new stuff into a new "multipath transport layer" cleanly layered atop and independent of the legacy TCP "flow/endpoint layer" - i.e., encoded within the legacy TCP payloads - feels like the architecturally clean approach.  All a matter of perspective, of course.
>> - implies encoding of any data sent MUST be ASCII; middleboxes (IDSes, traffic normalizers) that look at content and see binary stuff will probably drop the data; finally, if we used binary stuff, there is no guarantee of fate sharing. Text encoding is less space efficient; does it matter?
> This issue seems like either a red herring or just the tip of the iceberg.  Any way you cut it, the stuff a NAT/firewall is going to see within an MPTCP subflow is often going to look a lot different from the same application running atop classic TCP.  That's going to confuse some middleboxes; the question is just how hard we want to work to avoid that.
> Consider the obvious "common case" of an HTTP conversation atop a classic TCP flow versus two or more MPTCP subflows.  Suppose the HTTP request and response headers are small, as they typically are, and are (sometimes) followed by a large binary blob, e.g., a JPEG.  Assume, as is likely, that the HTTP request/response text fits in one IP packet.  If MPTCP is doing its job, then that first payload segment containing the HTTP request/response header will flow over only one subflow (most likely the initial subflow); the first TCP payload data any middlebox sees flowing on any OTHER subflow is going to come from somewhere in the middle of the binary blob.  Unless MPTCP deliberately takes some really special measures to hack around this (e.g., by redundantly sending the initial HTTP request/response on all subflows), this binary blob data IS going to confuse any middlebox that is trying to look into one of those other subflows expecting a parseable HTTP request or response.  The question is then just how badly the middlebox behaves in that situation, and I expect to find a wide range of answers in the real world.
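The scenario above can be sketched in a few lines. This is only an illustration under stated assumptions: a 1460-byte MSS, a simple round-robin scheduler across two subflows (not a real MPTCP scheduler), and a naive middlebox check that only inspects the first payload it sees on each subflow. All function names are made up for the example.

```python
MSS = 1460  # assumed segment size


def segment(stream, mss=MSS):
    """Cut the application byte stream into MSS-sized payloads."""
    return [stream[i:i + mss] for i in range(0, len(stream), mss)]


def round_robin(segments, n_subflows=2):
    """Assumed scheduler: hand segments to subflows in strict rotation."""
    subflows = [[] for _ in range(n_subflows)]
    for i, seg in enumerate(segments):
        subflows[i % n_subflows].append(seg)
    return subflows


def looks_like_http_response(first_payload):
    """Naive middlebox check on the first payload seen on a connection."""
    return first_payload.startswith(b"HTTP/1.")


header = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: image/jpeg\r\n"
          b"Content-Length: 8000\r\n\r\n")
# Stand-in for a JPEG: the JPEG/JFIF start marker followed by filler bytes.
body = b"\xff\xd8\xff\xe0" + (bytes(range(256)) * 32)[:7996]

subflows = round_robin(segment(header + body))
verdicts = [looks_like_http_response(sf[0]) for sf in subflows]
print(verdicts)
```

The initial subflow starts with the HTTP header and passes the check; the second subflow's first payload comes from the middle of the binary body and does not.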
> The problem here is that, unlike most incremental, backward-compatible TCP enhancements of the past like SACK, PAWS, etc., MPTCP is not just a control-plane tweak but fundamentally involves slicing and dicing the payload data.  If a legacy middlebox is looking at an MPTCP subflow, ignoring any new MPTCP negotiation options but expecting to find an HTTP header in the payload because the subflow is directed at port 80, it's going to get confused because what's running on top of the legacy TCP subflow is not really HTTP anymore (as the port number 80 suggests it should be) but rather an MPTCP payload with HTTP somehow more deeply layered/embedded atop that.  If we really want to avoid such forms of confusion, I see at least the following options:
> 1. Have MPTCP use a different TCP port number: e.g., allocate one TCP port number for all MPTCP subflows, and embed the "real" application-visible port number in an MPTCP header embedded in TCP payload.  Then legacy middleboxes just see legacy TCP connections to an unknown port: on the upside, they are unlikely to expect to see an HTTP header and fall over if they don't; on the downside, such TCP connections will be less likely to get through many firewalls, will break applications' attempts at doing TCP-based NAT traversal, etc.
> 2. Have MPTCP use the application's port number in the initial flow but a "special" port number for other subflows, in the hopes that the middlebox only really cares about the beginning of the stream and will "stop paying attention" in any detail after it sees, e.g., the expected HTTP header.  Non-initial subflows are still likely to get filtered as above, but at least the initial flow will look "normal" up until the point MPTCP starts actually distributing traffic.  Application-directed NAT traversals will probably work to the extent they work under classic TCP, though only on the initial flow: other subflows will probably just fail, effectively disabling MPTCP for such connections.
> 3. Design MPTCP to use not just TCP but in fact HTTP-over-TCP as its legacy-network-compatibility "Flow/Endpoint Layer".  That is, _every_ MPTCP subflow looks like an HTTP connection to port 80 (regardless of the application running atop MPTCP), and its TCP payload starts with its own pseudo-HTTP header whose purpose is merely to get through legacy middleboxes.  This is likely to get through more middleboxes, but adds overhead and all the usual complications of tunneling.  Furthermore, based on Microsoft's experience with DirectAccess I hear even HTTP-over-TCP is often not adequate to get through many middleboxes - therefore, one might even consider...
> 4. Encapsulate MPTCP within an HTTP-over-SSL-over-TCP tunnel, as Microsoft did with IP-over-HTTPS.  All MPTCP subflows look to the network like HTTPS connections to port 443.  The internal HTTP layer doesn't matter to the network since it's encrypted; its presence is just so that port 443 can be shared with other conventional web server functions on the responder side of the connection.
> I'm not seriously suggesting that the MPTCP spec should necessarily incorporate any of these alternatives; just pointing out how slippery this slope is once we start down it.  Perhaps it does at least represent a further argument that MPTCP needs the architectural flexibility to have multiple compatibility Flow/Endpoint layers "plugged into" it underneath the multipath transport logic, each providing varying compatibility/efficiency/desperation tradeoffs.
> Bryan
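
One way to picture the "pluggable" compatibility layers Bryan describes is as interchangeable objects that decide which TCP port a subflow uses and what bytes lead its payload. The sketch below covers alternatives 1 and 3 only (4 would additionally need TLS). Every name in it, the port number 49152, and the pseudo-HTTP header contents are hypothetical and exist only for illustration; nothing here comes from an MPTCP specification.

```python
import struct
from abc import ABC, abstractmethod


class FlowLayer(ABC):
    """Hypothetical interface for a legacy-compatibility flow/endpoint layer."""

    @abstractmethod
    def outer_port(self, app_port: int, initial: bool) -> int:
        """TCP destination port the network sees for this subflow."""

    @abstractmethod
    def preamble(self, app_port: int) -> bytes:
        """Bytes placed at the start of the subflow's TCP payload."""


class DedicatedPort(FlowLayer):
    """Alternative 1: one port for all subflows, real port carried in-band."""

    PORT = 49152  # hypothetical; no such port is allocated

    def outer_port(self, app_port, initial):
        return self.PORT

    def preamble(self, app_port):
        # Real application port as a 16-bit big-endian integer.
        return struct.pack("!H", app_port)


class HttpDisguise(FlowLayer):
    """Alternative 3: every subflow looks like HTTP to port 80."""

    def outer_port(self, app_port, initial):
        return 80

    def preamble(self, app_port):
        # Made-up pseudo-HTTP header whose only job is to look parseable.
        return (b"GET / HTTP/1.1\r\n"
                b"Host: example.invalid\r\n"
                b"X-App-Port: " + str(app_port).encode("ascii") + b"\r\n\r\n")


for layer in (DedicatedPort(), HttpDisguise()):
    print(type(layer).__name__, layer.outer_port(22, initial=False),
          layer.preamble(22))
```

The multipath logic above such a layer would stay the same whichever one is plugged in; only the bytes and port numbers visible to middleboxes change, which is the compatibility/efficiency tradeoff the message describes.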