Re: [tcpm] More TCP option space in a SYN: draft-briscoe-tcpm-syn-op-sis-02

Bob Briscoe <bob.briscoe@bt.com> Thu, 25 September 2014 18:42 UTC

Message-ID: <201409251842.s8PIgUdQ015414@bagheera.jungle.bt.co.uk>
X-Mailer: QUALCOMM Windows Eudora Version 7.1.0.9
Date: Thu, 25 Sep 2014 19:42:28 +0100
To: Andrew Yourtchenko <ayourtch@cisco.com>
From: Bob Briscoe <bob.briscoe@bt.com>
In-Reply-To: <alpine.OSX.2.00.1409251716260.69041@ayourtch-mac>
References: <201409222045.s8MKjZdD002071@bagheera.jungle.bt.co.uk> <542344DA.9020905@isi.edu> <201409250956.s8P9uae9013452@bagheera.jungle.bt.co.uk> <alpine.OSX.2.00.1409251716260.69041@ayourtch-mac>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format="flowed"
X-Scanned-By: MIMEDefang 2.56 on 132.146.168.158
Archived-At: http://mailarchive.ietf.org/arch/msg/tcpm/lHE4suJd6OYn3aIMG7N0DPKnSQs
Cc: tcpm IETF list <tcpm@ietf.org>, Joe Touch <touch@isi.edu>
Subject: Re: [tcpm] More TCP option space in a SYN: draft-briscoe-tcpm-syn-op-sis-02

Andrew,

At 17:41 25/09/2014, Andrew Yourtchenko wrote:
>Bob, Joe, all,
>
>some comments below, maybe with some obvious questions as I did not 
>follow the latest discussions very closely.
>
>On Thu, 25 Sep 2014, Bob Briscoe wrote:
>
>>Joe,
>>
>>At 23:25 24/09/2014, Joe Touch wrote:
>>>Hi, Bob (et al.),
>>>It's good to have a more detailed description of the proposal.
>>>I still find a dual-SYN solution untenable, though, as it has been for
>>>other upgrade paths in the past (e.g., IPv6).
>>
>>That's why I prefer syn-op-sis because it only uses 2 SYNs for transition.
>
>Both syn-op-sis and tcp-syn-ext-opt talk about using
>different port pairs - is this due to the explosion with the number 
>of possibilities for the three-way handshake recovery ?

SYN-EOS OOB (within tcp-syn-ext-opt) uses the same src & dst ports.
SYN-EOS DS (within tcp-syn-ext-opt) and syn-op-sis use same dst, but 
different src.

Rationale for the latter: too many middleboxes out there would 
reject a second SYN to/from the same ports, in their misguided attempts 
to be 'helpful' by removing SYNs that appear to be duplicates, or that 
have the same 4-tuple but a different initial sequence no. Not only 
would many firewalls drop it, but any split connection would not 
forward the second SYN either.


>(Of course the application will anyway see only 1 file descriptor, 
>which will assume the identity of the connection that succeeds, right ?)

Correct.



>>I've included a new section on ways to use only 1 SYN during 
>>transition (building a white-list) and ultimately moving to 1 SYN 
>>in the future, only falling back to the legacy SYN in series rather 
>>than parallel if you hit a legacy listener.
>
>The fundamental difference between the transition to the extended 
>option space and IPv4->IPv6 transition is the absence of the DNS to 
>signal the capabilities of the endpoint before the connection even 
>starts - if the target host has A&AAAA you only have to deal with 
>the failures on the path, while if the target host does not have 
>AAAA record, it's worthless to try IPv6.
>
>OTOH, when attempt to negotiate the extended option space, you do 
>not have any hints about the remote end - besides maybe past memory.
>
>In RFC6555 we tried to avoid dual-SYN as much as possible by 
>providing a headstart that shrinks and inverses on failures - so the 
>reference to RFC6555 being "parallel" is not correct, see section 5.5:
>
>    The primary purpose of Happy Eyeballs is to reduce the wait time for
>    a dual-stack connection to complete, especially when the IPv6 path is
>    broken and IPv6 is preferred.  Aggressive timeouts (on the order of
>    tens of milliseconds) achieve this goal, but at the cost of network
>    traffic.  This network traffic may be billable on certain networks,
>    will create state on some middleboxes (e.g., firewalls, intrusion
>    detection systems, NATs), and will consume ports if IPv4 addresses
>    are shared.  For these reasons, it is RECOMMENDED that connection
>    attempts be paced to give connections a chance to complete.  It is
>    RECOMMENDED that connection attempts be paced 150-250 ms apart to
>    balance human factors against network load.

The dual SYN approaches could use a delay between the SYNs like 6555 
does (not necessarily exactly the same numbers). However, in the short 
flow world, I would argue that 250ms is completely unacceptable unless 
you're fairly certain the first one will work, which is more-or-less 
the v4/v6 scenario that happy eyeballs addressed. Reasoning: multiply 
250ms by a few connections with serial dependencies in a sequence and 
you have delays of seconds.
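
To put rough numbers on that reasoning, here is a minimal sketch. It 
assumes the worst case, where the first SYN fails on every connection 
in the chain so each one waits out the full delay; the chain lengths 
are illustrative assumptions, not measurements.

```python
def serial_penalty_ms(stagger_ms: float, dependent_connections: int) -> float:
    """Worst-case extra latency for a chain of connections that must
    complete one after another, when each waits out the full stagger
    before its second SYN is sent."""
    return stagger_ms * dependent_connections


# Chain lengths below are assumptions chosen only for illustration.
for chain in (1, 4, 8):
    print(chain, "connections:", serial_penalty_ms(250, chain), "ms")
```

With a 250ms stagger, a chain of 4 dependent connections already adds 
a full second in this worst case.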

I've made a note to change the description of 6555 in the next rev - 
clearly the word parallel is not strictly correct anyway - one serial 
(e.g. network) interface can't actually send two SYNs in parallel, 
the closest they can leave is back-to-back.

Regarding billable total traffic:
Damon Wischik once proposed to improve performance by duplicating the 
most sensitive packets at flow start:
         "three copies of the first packet of every message, two 
copies of the second and one copy of subsequent packets"
He calculated these 6 extra full-sized packets per flow would (in 
2005) increase total traffic by 2.7% (because elephants dominate 
total traffic).
<http://rsta.royalsocietypublishing.org/content/366/1872/1941.full>
Duplicating just the legacy part of the SYN (60B max) would have a 
much smaller total effect (I could calculate it if you want - 
probably sub-0.1%). Certainly too small for people to care about the 
increment on bills or even packet load.
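
A back-of-envelope version of that calculation, as a sketch only. It 
assumes a "full-sized" packet is 1500B, that the 2005 traffic mix 
behind the 2.7% figure still applies, and that every flow sends exactly 
one duplicate 60B SYN. None of these assumptions comes from Wischik's 
paper.

```python
FULL_PACKET_BYTES = 1500      # assumption: typical Ethernet-sized packet
WISCHIK_EXTRA_PACKETS = 6     # extra packets per flow in the quoted proposal
WISCHIK_OVERHEAD = 0.027      # the 2.7% increase quoted above
DUP_SYN_BYTES = 60            # "60B max" for the duplicated legacy SYN


def implied_mean_flow_bytes() -> float:
    """Mean bytes per flow implied by '6 extra packets = 2.7% more traffic'."""
    extra = WISCHIK_EXTRA_PACKETS * FULL_PACKET_BYTES
    return extra / WISCHIK_OVERHEAD


def dup_syn_overhead(extra_bytes: int = DUP_SYN_BYTES) -> float:
    """Fractional traffic increase from one duplicate SYN per flow."""
    return extra_bytes / implied_mean_flow_bytes()


print(f"{dup_syn_overhead():.4%}")
```

Under these assumptions the increase comes out at roughly 0.02%, which 
is consistent with the sub-0.1% guess.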


>In today's conditions, a simple 300ms headstart for IPv6 if AAAA is 
>advertised might be sufficient, as the subpar IPv6 connectivity
>disappears, and more of the corner cases get into play for RTTs.
>
>So, depending on how much of a drop rate we'd expect for a "new" SYN 
>across the variety of network scenarios, the simpler approach of 
>just using 300ms delay could be reasonable due to its simplicity - 
>an experiment could show.
>
>Doing this experiment on both address families at once could also 
>reveal the drop rates to be different - I'd expect the IPv6 drop 
>rates to be smaller - so maybe there is an opportunity to simplify 
>things - if it "just works" in IPv6 using a single fancy-looking SYN 
>in a sufficient %% of cases, that could be a great simplification 
>*and* will give additional use case for IPv6 assuming there is 
>enough demand for more space in the options.

The delay between the pair should be implementation dependent anyway. 
No point arguing this to death in standards... but I will at least 
state the opposing view...

The world is changing. Latency is everything and soon it will be even 
more than everything. If one values latency, the delay between the 
two will be as small as possible.
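
As an illustration of leaving the delay to the implementation, the 
sketch below races two simulated handshakes with a configurable 
stagger and keeps whichever succeeds first. It is not taken from 
either draft: `attempt` merely sleeps in place of a real handshake, 
and the labels "SYN-U" and "SYN" are only names for the upgraded and 
legacy attempts.

```python
import asyncio


async def attempt(name: str, rtt_s: float, succeeds: bool) -> str:
    """Stand-in for one handshake; a real client would send a SYN here."""
    await asyncio.sleep(rtt_s)
    if not succeeds:
        raise ConnectionError(f"{name} got no usable response")
    return name


async def staggered_race(first, second, stagger_s: float) -> str:
    """Start `first`, start `second` after `stagger_s` unless `first` has
    already succeeded, and return the name of the first success."""
    t1 = asyncio.create_task(first)
    await asyncio.wait([t1], timeout=stagger_s)   # head start for `first`
    if t1.done() and t1.exception() is None:
        second.close()                            # second attempt never needed
        return t1.result()
    t2 = asyncio.create_task(second)
    pending = {t for t in (t1, t2) if not t.done()}
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if task.exception() is None:
                for loser in pending:
                    loser.cancel()                # abandon the slower attempt
                return task.result()
    raise ConnectionError("both attempts failed")


# Stagger of 0 approximates back-to-back SYNs; the upgraded attempt fails here.
winner = asyncio.run(staggered_race(
    attempt("SYN-U", 0.05, False), attempt("SYN", 0.02, True), 0.0))
print(winner)
```

Setting `stagger_s` to 0 models the back-to-back case argued for 
above, while a larger value models a happy-eyeballs-style head start.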

My simple view of the world is that dual SYN seems the /only/ way to 
extend TCP option space. I'm not proud of it. It's not nice. But it's 
necessary. If dual SYNs require more memory for connection state, 
it's OK as long as the need for more memory appears slowly enough to 
be able to deploy it. If this is the price of extra capabilities, and 
those capabilities are valued, then we invest in memory to provide 
the capabilities.

Whether the number of connections that need >40B TCP options will 
suddenly jump, or gradually grow, I can't predict tho.



>>The other two in tcp-syn-ext-opt have to be 2-SYN for ever.
>>
>>>Also, I don't particularly think that putting the options at the end of
>>>the segment serves any useful purpose other than to make this option
>>>handle the option space in a different way than the default DO header field.
>>
>>That's the architectural point.
>>There's also a tactical point. It should get through every middlebox.
>
>Help me understand this argument. If we were sending the "old" and 
>"new" SYNs on the same 5-tuple - sure, the data must be identical.
>But if source ports are different, then a typical middlebox is 
>either happy about the "data in the SYN" or not ?
>
>Or, are you saying that placing them at the end of the packet makes 
>it difficult for the current architectures to "grab" them and mess 
>around with them ? Sure, it does. Especially higher-performing boxes 
>might look at the start of the packet.
>
>But that's *today's* architectures - if someone pays one of my 
>employer's BUs enough to create a box that fiddles with these 
>options, that box will get created. Not an architecturally beautiful 
>reality, but it is there.

Two clarifications:

1) By putting options after the payload, I'm including the (currently 
common) case of a SYN with no payload. Then the extra options will 
start immediately at the Data Offset.

2) A longer clarification...

Let's separate middleboxes into two types (I'm trying to deal with both):
- (meddleboxes) those that want to mess with headers for security
- (muddleboxes) those that happen to mess with headers, because their 
developers never stopped to think that TCP might change.

Muddleboxes tend to forward the payload unaltered. Some muddleboxes 
look at HTTP headers and try to be clever. Putting the extra options 
after the Data Offset gets them past the former. Putting them at 
the end of the payload gets them past most of the latter.

Meddleboxes might well evolve to find these new extra options at the 
end of the packet. But in the meantime, we've got over the transition 
of deploying a new protocol. Then the meddleboxes are on the back 
foot, and they have to justify removing these packets that are useful 
to someone. If they can - e.g. for security reasons - then fine. But 
they are not just removing packets because they are using a new rev 
of the protocol.
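
Clarification 1 can be pictured with a small sketch of how a receiver 
might split a segment when the extra options sit after the payload. 
The parameter `extra_opts_len` (a byte count measured back from the 
end of the segment) is a hypothetical stand-in and is not the draft's 
actual field encoding.

```python
def split_segment(segment: bytes, data_offset_words: int, extra_opts_len: int):
    """Split a TCP segment into (header, payload, extra_options).

    extra_opts_len is a hypothetical byte count measured back from the
    end of the segment; it is not the draft's wire format.
    """
    header_len = data_offset_words * 4   # Data Offset counts 32-bit words
    if not 20 <= header_len <= len(segment):
        raise ValueError("Data Offset outside the segment")
    extra_start = len(segment) - extra_opts_len
    if extra_opts_len < 0 or extra_start < header_len:
        raise ValueError("extra options overlap the TCP header")
    return (segment[:header_len],
            segment[header_len:extra_start],
            segment[extra_start:])


# SYN with no payload: the extra options begin right at the Data Offset.
syn = bytes(20) + b"\x01\x01\x01\x01"    # 20B header + 4 No-Operation octets
header, payload, extra = split_segment(syn, 5, 4)
print(len(header), payload, extra)
```

With an empty payload the extra options start immediately at the Data 
Offset, and with a payload they follow it, so the same rule covers 
both cases.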

As I said, the most important thing is we will have separated TCP 
options into two sets:
- Those that middleboxes (past and future) can mess with
- Those that middleboxes (future) should think much harder about 
before messing with them

Then, this (maybe) enables the next step - for endpoints to encrypt 
those options they don't want middleboxes to interact with at all. 
(See my response to Alex Zimmermann's post in this thread.)



>>>I'm glad to hear what others think and agree that an unbiased comparison
>>>of the approaches would be a useful place to for the discussion to start...
>>
>>Yes, I'm already repeating myself - you're right we need wider opinions.
>
>Even though I am nominally "on the list of alternatives" due to 
>LOIC, I've disliked the idea shortly after, after trying to 
>implement it - the linux kernel's accelerated NIC path showed that 
>the invalid CRC would be dropped very early, before the software can 
>see it - so I would argue against that approach today from 
>implementation standpoint as well.
>
>Hopefully arguing with my past self does buy me an "unbiased" 
>ribbon, or at least a half of it :-)

Certainly, this seems to give you the role of unbiased /and/ educated 
by experience.

I also no longer like an idea that my past self proposed (SYN-EOS 
DS), for the same reasons Joe didn't like it - the two SYNs will 
often not end up in the same place to be re-united. But I still have 
a candidate I do prefer (syn-op-sis). That rules me out for the role 
of innocent bystander.


Bob


>--a
>
>>
>>
>>
>>Bob
>>
>>
>>>Joe
>>>On 9/22/2014 7:55 AM, Bob Briscoe wrote:
>>> > Joe,
>>> >
>>> > I've just posted a revision here:
>>> > <http://tools.ietf.org/html/draft-briscoe-tcpm-syn-op-sis-02>
>>> >
>>> > * I've taken more care to explain it. It was a bit terse before. So it's
>>> > ready for the list to review properly now.
>>> >
>>> > It would be useful if someone wrote a comparison with the other ideas on
>>> > how to do this, including the ones we wrote jointly and SLO, LOIC and
>>> > DBormam's 4-way HS. Better someone independent of all the authors does
>>> > this. But I want to get syn-op-sis right first, a comparison won't be
>>> > much use until its stable.
>>> >
>>> >
>>> > The underlying idea is still the same. The reasons I've updated it are
>>> > many (the change log taken from the appendix is pasted at the end of
>>> > this email for convenience).
>>> >
>>> > * Mainly I've articulated that all these approaches that add TCP options
>>> > after the Data Offset are not just hacks to get more space; they add a
>>> > new architectural capability to TCP - a distinction between TCP options
>>> > that might be relevant to middleboxes, and ones that are only for the
>>> > destination (a bit like destopt in IPv6 - needing to specify which TCP
>>> > options are for the destination only is ironic but necessary).
>>> >
>>> > * I've also realised that this approach will ultimately return to a
>>> > single SYN solution, whereas both the ideas in
>>> > draft-touch-tcpm-tcp-syn-ext-opt will always require two SYNs.
>>> >
>>> >
>>> >
>>> > Bob
>>> >
>>> > Pasted from the change log:
>>> >
>>> >    From briscoe...-01 to briscoe...-02:
>>> >
>>> >       Technical changes:
>>> >
>>> >       *  Defined the client behaviour dependent on which response
>>> >          arrives first.
>>> >
>>> >       *  Allowed retransmission of either SYN or SYN-U if no response
>>> >          from either.
>>> >
>>> >       *  Redefined EOO as an offset from the end of the packet, not from
>>> >          the beginning of the payload.
>>> >
>>> >       *  Added section on Migration to a Single Handshake.  Reworded
>>> >          dual handshake so that it is not mandatory for the client to
>>> >          send dual SYNs simultaneously; only the relation between the
>>> >          SYNs and the response to either is mandatory, while parallel
>>> >          SYNs is purely for latency reduction.
>>> >
>>> >       *  Added rules for writing TCP options, i.e. i) options like TFO
>>> >          MUST NOT be located in the TCP header and ii) add no-ops to
>>> >          align on 4-octet boundary.
>>> >
>>> >       *  Added rules for forwarding TCP options, i.e. only the
>>> >          destination looks for TCP options after the Data Offset, not
>>> >          middleboxes.
>>> >
>>> >       *  Moved the Explicit Handshake variant (SYN-L) into the body from
>>> >          the appendix, and recommended the choice could be down to
>>> >          implementers or apps.  Included section on corner cases.
>>> >
>>> >       *  Introduced more normative language throughout the Protocol
>>> >          Spec.
>>> >
>>> >       Editorial changes:
>>> >
>>> >       *  Added temporary motivation section
>>> >
>>> >       *  Added confusible terminology to Terminology section.
>>> >
>>> >       *  Divided protocol spec into sub-sections.
>>> >
>>> >       *  Handshake table: Clarified that the two columns under each
>>> >          server represent separate threads, that may run on separate
>>> >          servers, without co-ordination.  Represented message
>>> >          dependencies in the alignment of the rows.
>>> >
>>> >       *  Explained the table.
>>> >
>>> >       *  Explained why a legacy server won't ever pass SYN-U to the app.
>>> >
>>> >       *  More precisely described loss as 'not arrived before a
>>> >          timeout', and explained the tradeoff between latency and extra
>>> >          TCP options.
>>> >
>>> >       *  Gave reasoning for locating TCP options in three groups.
>>> >
>>> >       *  Acknowledged Rob Hancock for the architectural idea of hiding
>>> >          an extension to a protocol in the layer above.
>>> >
>>> >       *  Appendix about protocol alternatives now only presents the SYN-
>>> >          UD alternative, given the implicit/explicit handshake choice
>>> >          has been moved to the body.
>>> >
>>> >       *  Rewrote appendix about comparing the choices to treat the two
>>> >          pairs of choices separately, rather than discussing all four
>>> >          combinations of pairs of choices.
>>> >
>>> > ________________________________________________________________
>>> > Bob Briscoe,                                                  BT
>>
>>________________________________________________________________
>>Bob Briscoe,                                                  BT 
>>_______________________________________________
>>tcpm mailing list
>>tcpm@ietf.org
>>https://www.ietf.org/mailman/listinfo/tcpm
>
>________________________________________________________________
>Bob Briscoe,                                                  BT