Re: [tcpm] More TCP option space in a SYN: draft-briscoe-tcpm-syn-op-sis-02

Andrew Yourtchenko <ayourtch@cisco.com> Thu, 25 September 2014 23:28 UTC

To: Bob Briscoe <bob.briscoe@bt.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/tcpm/7fHP2E5_WVgTDywa7DtJjqD9S7g
Cc: tcpm IETF list <tcpm@ietf.org>, Joe Touch <touch@isi.edu>

Bob,

On Thu, 25 Sep 2014, Bob Briscoe wrote:

> Andrew,
>
> At 17:41 25/09/2014, Andrew Yourtchenko wrote:
>> Bob, Joe, all,
>> 
>> some comments below, maybe with some obvious questions as I did not follow 
>> the latest discussions very closely.
>> 
>> On Thu, 25 Sep 2014, Bob Briscoe wrote:
>> 
>>> Joe,
>>> 
>>> At 23:25 24/09/2014, Joe Touch wrote:
>>>> Hi, Bob (et al.),
>>>> It's good to have a more detailed description of the proposal.
>>>> I still find a dual-SYN solution untenable, though, as it has been for
>>>> other upgrade paths in the past (e.g., IPv6).
>>> 
>>> That's why I prefer syn-op-sis because it only uses 2 SYNs for transition.
>> 
>> Both syn-op-sis and tcp-syn-ext-opt talk about using
>> different port pairs. Is this due to the explosion in the number of
>> possibilities for three-way handshake recovery?
>
> SYN-EOS OOB (within tcp-syn-ext-opt) uses the same src & dst ports.
> SYN-EOS DS (within tcp-syn-ext-opt) and syn-op-sis use same dst, but 
> different src.
>
> Rationale for the latter: there are too many middleboxes out there that would
> reject a second SYN to/from the same ports, in their misguided attempts to be
> 'helpful' by removing SYNs that appear to be duplicates, or that have the same
> 4-tuple but a different initial sequence number. Many firewalls, and all
> split connections, would not forward the second SYN.

Oh, *different* ISNs. Yes. I remember having to file a bug against at least
one middlebox that did not like such a scenario; it got fixed, but there are
probably more. An experiment would definitely be interesting.
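To make the failure mode above concrete, here is a minimal Python sketch of a hypothetical middlebox that remembers the ISN of the first SYN it sees on each 4-tuple and drops any later SYN on that 4-tuple with a different ISN. The class is an illustrative model, not any particular product; it shows why sending the second SYN from a different source port avoids the drop.

```python
from typing import Dict, Tuple

# (src_ip, src_port, dst_ip, dst_port)
FourTuple = Tuple[str, int, str, int]


class NaiveSynFilter:
    """Hypothetical middlebox: forwards a SYN only if it is the first SYN
    seen on its 4-tuple, or an exact retransmission of that SYN."""

    def __init__(self) -> None:
        self.first_isn: Dict[FourTuple, int] = {}

    def forward_syn(self, flow: FourTuple, isn: int) -> bool:
        """Return True if the SYN is forwarded, False if it is dropped."""
        seen = self.first_isn.get(flow)
        if seen is None:
            self.first_isn[flow] = isn
            return True
        # Same 4-tuple with a different ISN looks like a bogus duplicate.
        return seen == isn


mb = NaiveSynFilter()
legacy = ("192.0.2.1", 50000, "198.51.100.7", 80)
print(mb.forward_syn(legacy, 1000))      # True: first SYN on this 4-tuple
print(mb.forward_syn(legacy, 2000))      # False: same 4-tuple, different ISN
other_port = ("192.0.2.1", 50001, "198.51.100.7", 80)
print(mb.forward_syn(other_port, 2000))  # True: a new 4-tuple
```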

>
>
>> (Of course the application will see only one file descriptor anyway, which
>> will assume the identity of the connection that succeeds, right?)
>
> Correct.
>
>
>
>>> I've included a new section on ways to use only 1 SYN during transition 
>>> (building a white-list) and ultimately moving to 1 SYN in the future, only 
>>> falling back to the legacy SYN in series rather than parallel if you hit a 
>>> legacy listener.
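A minimal sketch of the white-list idea described above, assuming the client keys its memory by server address and learns from whichever response arrives first. The class name and the two plans are hypothetical illustrations, not the draft's terminology: a white-listed server gets only the extended-option SYN (the caller is assumed to send a legacy SYN in series if that fails), while any other server gets both SYNs.

```python
from enum import Enum
from typing import Set


class Plan(Enum):
    EXTENDED_ONLY = 1  # one SYN; send a legacy SYN afterwards only if this fails
    DUAL = 2           # legacy SYN and extended-option SYN together


class SynWhiteList:
    """Hypothetical per-client memory of servers that accepted extended options."""

    def __init__(self) -> None:
        self.extended_ok: Set[str] = set()

    def plan(self, server: str) -> Plan:
        return Plan.EXTENDED_ONLY if server in self.extended_ok else Plan.DUAL

    def record(self, server: str, extended_accepted: bool) -> None:
        # Called with the outcome of whichever response arrived first.
        if extended_accepted:
            self.extended_ok.add(server)
        else:
            # e.g. a legacy listener: stop assuming support for this server.
            self.extended_ok.discard(server)


wl = SynWhiteList()
print(wl.plan("198.51.100.7"))  # Plan.DUAL: nothing known yet
wl.record("198.51.100.7", True)
print(wl.plan("198.51.100.7"))  # Plan.EXTENDED_ONLY
```

Moving to a single SYN in the future would amount to flipping the default in this sketch: treat unknown servers as EXTENDED_ONLY and remember known legacy servers instead.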
>> 
>> The fundamental difference between the transition to the extended option
>> space and the IPv4->IPv6 transition is that there is no equivalent of DNS to
>> signal the capabilities of the endpoint before the connection even starts.
>> If the target host has both A and AAAA records, you only have to deal with
>> failures on the path, while if the target host has no AAAA record, it's
>> pointless to try IPv6.
>> 
>> OTOH, when attempting to negotiate the extended option space, you do not
>> have any hints about the remote end, except perhaps past memory.
>> 
>> In RFC6555 we tried to avoid dual-SYN as much as possible by providing a
>> headstart that shrinks and inverts on failures, so describing RFC6555 as
>> "parallel" is not correct; see section 5.5:
>>
>>    The primary purpose of Happy Eyeballs is to reduce the wait time for
>>    a dual-stack connection to complete, especially when the IPv6 path is
>>    broken and IPv6 is preferred.  Aggressive timeouts (on the order of
>>    tens of milliseconds) achieve this goal, but at the cost of network
>>    traffic.  This network traffic may be billable on certain networks,
>>    will create state on some middleboxes (e.g., firewalls, intrusion
>>    detection systems, NATs), and will consume ports if IPv4 addresses
>>    are shared.  For these reasons, it is RECOMMENDED that connection
>>    attempts be paced to give connections a chance to complete.  It is
>>    RECOMMENDED that connection attempts be paced 150-250 ms apart to
>>    balance human factors against network load.
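The "headstart that shrinks and inverts on failures" described above can be sketched as a signed per-destination delay. This is a minimal illustration only: the 300 ms starting value echoes the figure mentioned later in this thread, while the 100 ms step and the symmetric bound are assumptions, not values taken from RFC 6555.

```python
from typing import Tuple


class Headstart:
    """Hypothetical signed headstart, in milliseconds, for one destination.

    value_ms > 0: the preferred attempt starts first; the other waits value_ms.
    value_ms < 0: roles are inverted; the alternative starts first.
    """

    def __init__(self, initial_ms: int = 300, step_ms: int = 100) -> None:
        self.limit_ms = initial_ms
        self.step_ms = step_ms
        self.value_ms = initial_ms

    def on_preferred_failure(self) -> None:
        # Shrink towards zero, then past it (inversion), bounded by -limit_ms.
        self.value_ms = max(self.value_ms - self.step_ms, -self.limit_ms)

    def on_preferred_success(self) -> None:
        # Recover gradually towards the full headstart.
        self.value_ms = min(self.value_ms + self.step_ms, self.limit_ms)

    def schedule(self) -> Tuple[str, int]:
        """Which attempt to start first, and how long to wait before the other."""
        if self.value_ms >= 0:
            return ("preferred", self.value_ms)
        return ("alternative", -self.value_ms)


h = Headstart()
for _ in range(4):
    h.on_preferred_failure()
print(h.schedule())  # ('alternative', 100)
```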
>
> The dual SYN approaches could use a delay between the SYNs like 6555 does
> (not necessarily with exactly the same numbers). However, in a world of short
> flows, I would argue that 250ms is completely unacceptable unless you're
> fairly certain the first one will work, which is more or less the v4/v6
> scenario that Happy Eyeballs addressed. Reasoning: multiply 250ms by a few
> connections with serial dependencies in a sequence and you have delays of
> seconds.
>
> I've made a note to change the description of 6555 in the next rev. Clearly
> the word parallel is not strictly correct anyway: one serial (e.g. network)
> interface can't actually send two SYNs in parallel; the closest they can
> get is leaving back-to-back.

True... You might start with some conservative values, but then, once you
have an RTT estimate for the connection, an implementation might opt to save
that estimate in a cache (the 2Q algorithm described at
http://www.tedunangst.com/flak/post/2Q-buffer-cache-algorithm might be
appropriate for this type of use?). And then we're somewhere in 6555
territory anyway. So much for simplicity! :-)
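Bob's arithmetic and the caching idea above can be put side by side in a small Python sketch. It assumes the worst case, in which the first SYN of every one of n serially dependent connections fails, so each pays the full gap before its second SYN is sent. The cache class, its 250 ms default and the 1.5 x RTT multiplier are hypothetical choices, and a plain dict stands in for a bounded cache such as 2Q.

```python
from typing import Dict


def worst_case_added_ms(gap_ms: float, serial_connections: int) -> float:
    """Extra latency if every serially dependent connection waits the full gap."""
    return gap_ms * serial_connections


class SynGapCache:
    """Hypothetical per-destination choice of the gap between the two SYNs."""

    def __init__(self, default_gap_ms: float = 250.0, rtt_multiplier: float = 1.5) -> None:
        self.default_gap_ms = default_gap_ms  # conservative value before any RTT sample
        self.rtt_multiplier = rtt_multiplier
        self.rtt_ms: Dict[str, float] = {}    # unbounded dict standing in for e.g. 2Q

    def record_rtt(self, dest: str, rtt_ms: float) -> None:
        self.rtt_ms[dest] = rtt_ms

    def gap_ms(self, dest: str) -> float:
        rtt = self.rtt_ms.get(dest)
        if rtt is None:
            return self.default_gap_ms
        # Never wait longer than the conservative default.
        return min(self.default_gap_ms, self.rtt_multiplier * rtt)


cache = SynGapCache()
print(worst_case_added_ms(cache.gap_ms("198.51.100.7"), 8))  # 2000.0 (ms)
cache.record_rtt("198.51.100.7", 40.0)
print(worst_case_added_ms(cache.gap_ms("198.51.100.7"), 8))  # 480.0 (ms)
```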

>
> Regarding billable total traffic:
> Damon Wischik once proposed to improve performance by duplicating the most 
> sensitive packets at flow start:
>        "three copies of the first packet of every message, two copies of the 
> second and one copy of subsequent packets"
> He calculated these 6 extra full-sized packets per flow would (in 2005) 
> increase total traffic by 2.7% (because elephants dominate total traffic).
> <http://rsta.royalsocietypublishing.org/content/366/1872/1941.full>
> Duplicating just the legacy part of the SYN (60B max) would have a much
> smaller total effect (I could calculate it if you want; probably sub-0.1%).
> That is certainly too small for people to care about the increase in their
> bills, or even in packet load.
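The "probably sub-0.1%" estimate can be checked with back-of-envelope arithmetic from Wischik's figure. This sketch assumes "full-sized" means 1500-byte packets and takes the 6 extra packets and the 2.7% increase as quoted; it derives the implied mean bytes per flow and applies that to one duplicated 60-byte legacy SYN header per flow. IP headers and any change in the traffic mix since 2005 are ignored.

```python
FULL_SIZED_PACKET_BYTES = 1500  # assumption: Ethernet-MTU packets
EXTRA_PACKETS_PER_FLOW = 6      # as quoted from Wischik
TRAFFIC_INCREASE = 0.027        # as quoted: +2.7% total traffic
DUPLICATED_SYN_BYTES = 60       # maximum TCP header, per the figure above


def implied_mean_flow_bytes() -> float:
    # extra bytes per flow / mean bytes per flow = fractional increase
    extra = EXTRA_PACKETS_PER_FLOW * FULL_SIZED_PACKET_BYTES
    return extra / TRAFFIC_INCREASE


def duplicated_syn_overhead() -> float:
    # Fraction of extra traffic if every flow duplicates one legacy SYN header.
    return DUPLICATED_SYN_BYTES / implied_mean_flow_bytes()


print(f"{implied_mean_flow_bytes():,.0f} bytes per flow")  # 333,333 bytes per flow
print(f"{duplicated_syn_overhead():.3%} extra traffic")    # 0.018% extra traffic
```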

Thanks for the link! I will read it in detail tomorrow; for now I have just
glanced at it, and there does not seem to be any discussion of what happens
if the link is constrained?

Some time ago I had to work for a nontrivial amount of time over a 3G link
throttled down to 64kbps, and a number of tricks were required to change
the experience from "unusable" to "slow but eventually loaded": switching
from Chrome to Firefox (primarily because of my lack of knowledge of deep
tuning for the former; maybe there was a better way!), and reducing the
number of simultaneous connections from 32 down to 4.

Maybe it's small enough and I am being paranoid, but the 64kbps 
environments are still real (unfortunately).
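To put the constrained-link concern in numbers, here is a rough serialization-delay sketch. It assumes each duplicated SYN costs 80 bytes on the wire (the 60-byte maximum TCP header plus a 20-byte IPv4 header, ignoring link-layer framing and any extended options), and compares the 32 and 4 simultaneous connections from the anecdote above.

```python
def serialization_ms(packet_bytes: int, link_bps: int) -> float:
    """Time to clock a packet onto a link, in milliseconds."""
    return packet_bytes * 8 * 1000 / link_bps


LINK_BPS = 64_000          # the throttled 3G link
EXTRA_SYN_BYTES = 60 + 20  # assumption: max TCP header + IPv4 header

per_syn = serialization_ms(EXTRA_SYN_BYTES, LINK_BPS)
for connections in (32, 4):
    # Link time consumed by one extra SYN per connection opened at once.
    print(connections, "connections:", connections * per_syn, "ms")
# 32 connections: 320.0 ms
# 4 connections: 40.0 ms
```

Small in bytes, but on such a link a burst of extra SYNs is not negligible in time.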

>
>
>> In today's conditions, a simple 300ms headstart for IPv6 if AAAA is
>> advertised might be sufficient, as subpar IPv6 connectivity disappears
>> and RTT-related corner cases come more into play.
>> 
>> So, depending on how much of a drop rate we'd expect for a "new" SYN across
>> the variety of network scenarios, the simple approach of just using a 300ms
>> delay could be reasonable; an experiment could show.
>> 
>> Doing this experiment on both address families at once could also reveal
>> different drop rates (I'd expect the IPv6 drop rates to be smaller), so
>> maybe there is an opportunity to simplify things: if it "just works" in
>> IPv6 using a single fancy-looking SYN in a sufficient percentage of cases,
>> that could be a great simplification *and* would give IPv6 an additional
>> use case, assuming there is enough demand for more option space.
>
> The delay between the pair should be implementation dependent anyway. No 
> point arguing this to death in standards... but I will at least state the 
> opposing view...
>
> The world is changing. Latency is everything and soon it will be even more 
> than everything. If one values latency, the delay between the two will be as 
> small as possible.

Agreed.

>
> My simple view of the world is that, dual SYN seems the /only/ way to extend 
> TCP option space. I'm not proud of it. It's not nice. But it's necessary. If 
> dual SYNs require more memory for connection state, it's OK as long as the 
> need for more memory appears slowly enough to be able to deploy it. If this 
> is the price of extra capabilities, and those capabilities are valued, then 
> we invest in memory to provide the capabilities.
>
> Whether the number of connections that need >40B of TCP options will
> suddenly jump, or gradually grow, I can't predict though.
>

Again, makes sense.

>
>
>>> The other two in tcp-syn-ext-opt have to be 2-SYN for ever.
>>> 
>>>> Also, I don't particularly think that putting the options at the end of
>>>> the segment serves any useful purpose other than to make this option
>>>> handle the option space in a different way than the default DO header 
>>>> field.
>>> 
>>> That's the architectural point.
>>> There's also a tactical point. It should get through every middlebox.
>> 
>> Help me understand this argument. If we were sending the "old" and "new"
>> SYNs on the same 5-tuple, then sure, the data must be identical.
>> But if the source ports are different, then isn't a typical middlebox
>> either happy about "data in the SYN" or not?
>> 
>> Or are you saying that placing them at the end of the packet makes it
>> difficult for current architectures to "grab" them and mess around with
>> them? Sure, it does. Higher-performing boxes in particular might only look
>> at the start of the packet.
>> 
>> But that's *today's* architectures - if someone pays one of my employer's 
>> BUs enough to create a box that fiddles with these options, that box will 
>> get created. Not an architecturally beautiful reality, but it is there.
>
> Two clarifications:
>
> 1) By putting options after the payload, I'm including the (currently common) 
> case of a SYN with no payload. Then the extra options will start immediately 
> at the Data Offset.
>
> 2) A longer clarification...
>
> Let's separate middleboxes into two types (I'm trying to deal with both):
> - (meddleboxes) those that want to mess with headers for security
> - (muddleboxes) those that happen to mess with headers, because their 
> developers never stopped to think that TCP might change.
>
> Muddleboxes tend to forward the payload unaltered. Some muddleboxes look at
> HTTP headers and try to be clever. By putting extra options after the Data
> Offset, they will pass through the former. By putting them at the end of
> the payload, they will mostly pass through the latter.
>
> Meddleboxes might well evolve to find these new extra options at the end of 
> the packet. But in the meantime, we've got over the transition of deploying a 
> new protocol. Then the meddleboxes are on the back foot, and they have to 
> justify removing these packets that are useful to someone. If they can - e.g. 
> for security reasons - then fine. But they are not just removing packets 
> because they are using a new rev of the protocol.
>
> As I said, the most important thing is we will have separated TCP options 
> into two sets:
> - Those that middleboxes (past and future) can mess with
> - Those that middleboxes (future) should think much harder about before 
> messing with them
>
> Then, this (maybe) enables the next step - for endpoints to encrypt those 
> options they don't want middleboxes to interact with at all. (See my response 
> to Alex Zimmerman's post in this thread.)
>
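Where a receiver would look for the two sets of options can be sketched as follows. This assumes, per the -02 change log quoted further down, that EOO counts the octets of extra options back from the end of the segment; how EOO itself is carried is not modelled, and the function and field names are hypothetical. Options before the Data Offset are the set middleboxes may see; the trailing set is for the destination only. With no payload, the extra options start right at the Data Offset.

```python
from typing import NamedTuple


class SplitSegment(NamedTuple):
    header_options: bytes  # before the Data Offset: visible to middleboxes
    payload: bytes
    extra_options: bytes   # after the payload: destination only


def split_segment(segment: bytes, eoo: int) -> SplitSegment:
    """Split a TCP segment, given EOO as an offset from the end (hypothetical)."""
    if len(segment) < 20:
        raise ValueError("shorter than the fixed 20-byte TCP header")
    data_offset = (segment[12] >> 4) * 4  # Data Offset field is in 32-bit words
    if data_offset < 20 or data_offset > len(segment):
        raise ValueError("invalid Data Offset")
    extra_start = len(segment) - eoo
    if eoo < 0 or extra_start < data_offset:
        raise ValueError("EOO points inside the TCP header")
    return SplitSegment(
        header_options=segment[20:data_offset],
        payload=segment[data_offset:extra_start],
        extra_options=segment[extra_start:],
    )


# A SYN with no payload: 20-byte fixed header + 4 bytes of header options
# (Data Offset = 6 words), followed by 8 placeholder bytes of extra options.
header = bytes(12) + bytes([6 << 4]) + bytes(7)
syn = header + b"\x01" * 4 + b"\xfe" * 8
parts = split_segment(syn, eoo=8)
print(len(parts.header_options), len(parts.payload), len(parts.extra_options))  # 4 0 8
```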

Thanks for the detailed explanation! It might be useful to just copy and
paste this into the next rev of the draft, as the rationale.

Now, a trick question. We all know IPv4 is entrenched with middleboxes that
necessarily meddle with L7 because they have to do NAT in the process.
What about IPv6? I have a totally unfounded hope that the situation there
might be better, at least for now?

>
>
>>>> I'm glad to hear what others think and agree that an unbiased comparison
>>>> of the approaches would be a useful place for the discussion to start...
>>> 
>>> Yes, I'm already repeating myself - you're right we need wider opinions.
>> 
>> Even though I am nominally "on the list of alternatives" due to LOIC, I came
>> to dislike the idea shortly afterwards, when I tried to implement it: the
>> Linux kernel's accelerated NIC path showed that segments with an invalid
>> checksum would be dropped very early, before the software could see them.
>> So I would argue against that approach today from an implementation
>> standpoint as well.
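The failure mode described above can be illustrated with the standard Internet checksum (the RFC 1071 one's-complement sum) and a toy model of a receive path with checksum offload. The nic_receive function and its deliver callback are hypothetical, and the TCP pseudo-header is omitted for brevity; the point is only that a segment carrying a deliberately invalid checksum is discarded before any software hook runs.

```python
from typing import Callable, List


def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: complement of the one's-complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def nic_receive(segment: bytes, deliver: Callable[[bytes], None]) -> bool:
    """Hypothetical offloading NIC: bad checksums never reach `deliver`."""
    # Summing data that includes a correct checksum field gives 0xFFFF,
    # so the complemented result is 0.
    if internet_checksum(segment) != 0:
        return False
    deliver(segment)
    return True


seen: List[bytes] = []
body = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])  # RFC 1071 example data
good = body + internet_checksum(body).to_bytes(2, "big")
bad = body + b"\x00\x00"  # deliberately invalid checksum
print(hex(internet_checksum(body)))  # 0x220d
print(nic_receive(good, seen.append), nic_receive(bad, seen.append), len(seen))  # True False 1
```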
>> 
>> Hopefully arguing with my past self does buy me an "unbiased" ribbon, or at 
>> least a half of it :-)
>
> Certainly, this seems to give you the role of unbiased /and/ educated by 
> experience.
>
> I also no longer like an idea that my past self proposed (SYN-EOS DS), for
> the same reasons Joe didn't like it: the two SYNs will often not end up in
> the same place to be reunited. But I still have a candidate I do prefer
> (syn-op-sis). That rules me out for the role of innocent bystander.

Probably the best way to decide is to implement both, get them in parallel 
into the olde wilde internets and see what sticks better!

--a

>
>
> Bob
>
>
>> --a
>> 
>>> 
>>> 
>>> 
>>> Bob
>>> 
>>> 
>>>> Joe
>>>> On 9/22/2014 7:55 AM, Bob Briscoe wrote:
>>>> > Joe,
>>>> >
>>>> > I've just posted a revision here:
>>>> > <http://tools.ietf.org/html/draft-briscoe-tcpm-syn-op-sis-02>
>>>> >
>>>> > * I've taken more care to explain it. It was a bit terse before. So it's
>>>> > ready for the list to review properly now.
>>>> >
>>>> > It would be useful if someone wrote a comparison with the other ideas on
>>>> > how to do this, including the ones we wrote jointly and SLO, LOIC and
>>>> > David Borman's 4-way HS. Better if someone independent of all the authors
>>>> > does this. But I want to get syn-op-sis right first; a comparison won't be
>>>> > much use until it's stable.
>>>> >
>>>> >
>>>> > The underlying idea is still the same. The reasons I've updated it are
>>>> > many (the change log taken from the appendix is pasted at the end of
>>>> > this email for convenience).
>>>> >
>>>> > * Mainly I've articulated that all these approaches that add TCP options
>>>> > after the Data Offset are not just hacks to get more space; they add a
>>>> > new architectural capability to TCP - a distinction between TCP options
>>>> > that might be relevant to middleboxes, and ones that are only for the
>>>> > destination (a bit like destopt in IPv6 - needing to specify which TCP
>>>> > options are for the destination only is ironic but necessary).
>>>> >
>>>> > * I've also realised that this approach will ultimately return to a
>>>> > single SYN solution, whereas both the ideas in
>>>> > draft-touch-tcpm-tcp-syn-ext-opt will always require two SYNs.
>>>> >
>>>> >
>>>> >
>>>> > Bob
>>>> >
>>>> > Pasted from the change log:
>>>> >
>>>> >    From briscoe...-01 to briscoe...-02:
>>>> >
>>>> >       Technical changes:
>>>> >
>>>> >       *  Defined the client behaviour dependent on which response
>>>> >          arrives first.
>>>> >
>>>> >       *  Allowed retransmission of either SYN or SYN-U if no response
>>>> >          from either.
>>>> >
>>>> >       *  Redefined EOO as an offset from the end of the packet, not from
>>>> >          the beginning of the payload.
>>>> >
>>>> >       *  Added section on Migration to a Single Handshake.  Reworded
>>>> >          dual handshake so that it is not mandatory for the client to
>>>> >          send dual SYNs simultaneously; only the relation between the
>>>> >          SYNs and the response to either is mandatory, while parallel
>>>> >          SYNs is purely for latency reduction.
>>>> >
>>>> >       *  Added rules for writing TCP options, i.e. i) options like TFO
>>>> >          MUST NOT be located in the TCP header and ii) add no-ops to
>>>> >          align on 4-octet boundary.
>>>> >
>>>> >       *  Added rules for forwarding TCP options, i.e. only the
>>>> >          destination looks for TCP options after the Data Offset, not
>>>> >          middleboxes.
>>>> >
>>>> >       *  Moved the Explicit Handshake variant (SYN-L) into the body from
>>>> >          the appendix, and recommended the choice could be down to
>>>> >          implementers or apps.  Included section on corner cases.
>>>> >
>>>> >       *  Introduced more normative language throughout the Protocol
>>>> >          Spec.
>>>> >
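Rule ii) above (add no-ops to align on a 4-octet boundary) can be sketched using the standard TCP option encoding (kind, length, data) and the NOP option (kind 1). Putting the padding NOPs in front of the block is an assumption for illustration; the change log only says to align on a 4-octet boundary.

```python
from typing import List, Tuple

NOP = 1  # TCP No-Operation option kind (RFC 793)


def encode_option(kind: int, data: bytes) -> bytes:
    """Kind, Length (counting the kind and length octets), then Data."""
    return bytes([kind, 2 + len(data)]) + data


def encode_aligned(options: List[Tuple[int, bytes]]) -> bytes:
    """Encode options and prepend NOPs so the block is a multiple of 4 octets."""
    block = b"".join(encode_option(kind, data) for kind, data in options)
    padding = (-len(block)) % 4
    return bytes([NOP] * padding) + block


# Window Scale (kind 3, shift 7) is 3 octets, so one NOP is added.
print(encode_aligned([(3, bytes([7]))]).hex())  # 01030307
# MSS (kind 2, 1460) is already 4 octets, so no padding.
print(encode_aligned([(2, (1460).to_bytes(2, "big"))]).hex())  # 020405b4
```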
>>>> >       Editorial changes:
>>>> >
>>>> >       *  Added temporary motivation section
>>>> >
>>>> >       *  Added confusable terminology to Terminology section.
>>>> >
>>>> >       *  Divided protocol spec into sub-sections.
>>>> >
>>>> >       *  Handshake table: Clarified that the two columns under each
>>>> >          server represent separate threads, that may run on separate
>>>> >          servers, without co-ordination.  Represented message
>>>> >          dependencies in the alignment of the rows.
>>>> >
>>>> >       *  Explained the table.
>>>> >
>>>> >       *  Explained why a legacy server won't ever pass SYN-U to the app.
>>>> >
>>>> >       *  More precisely described loss as 'not arrived before a
>>>> >          timeout', and explained the tradeoff between latency and extra
>>>> >          TCP options.
>>>> >
>>>> >       *  Gave reasoning for locating TCP options in three groups.
>>>> >
>>>> >       *  Acknowledged Rob Hancock for the architectural idea of hiding
>>>> >          an extension to a protocol in the layer above.
>>>> >
>>>> >       *  Appendix about protocol alternatives now only presents the
>>>> >          SYN-UD alternative, given the implicit/explicit handshake choice
>>>> >          has been moved to the body.
>>>> >
>>>> >       *  Rewrote appendix about comparing the choices to treat the two
>>>> >          pairs of choices separately, rather than discussing all four
>>>> >          combinations of pairs of choices.
>>>> >
>>>> > ________________________________________________________________
>>>> > Bob Briscoe,                                                  BT