Re: [tcpm] More TCP option space on SYNs

Joe Touch <touch@isi.edu> Sat, 31 May 2014 19:21 UTC

Date: Sat, 31 May 2014 12:20:54 -0700
From: Joe Touch <touch@isi.edu>
To: Bob Briscoe <bob.briscoe@bt.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/tcpm/jh0F6fxRratSUpRzEADeILr8xN4
Cc: "tcpm@ietf.org" <tcpm@ietf.org>
Subject: Re: [tcpm] More TCP option space on SYNs

Hi, Bob,

On 5/31/2014 11:19 AM, Bob Briscoe wrote:
> Joe,
>
> Thx for consolidating this thread. I've given it a new subject line.
>
> 1) You've silently made an important alteration to the proposed
> protocol. You've put the extra-options directly in the TCP option space
> of the C-SYN, not within the payload.

I hadn't assumed that; I was describing only which SYN carried which 
options, not where they were. As you do, I was assuming the C-SYN 
options were in the data space.

To be clear, in my variant - using data outside the sync'd connection 
(with ACK clear), the option space is similarly in the data portion of 
the 'extra' segment.

> Yes, your altered proposal is cleaner. However, don't imagine I didn't
> think of this. I did and I deliberately didn't do it this way. We have a
> choice:
>          clean and vulnerable vs. messy but robust.
>
> I'm not wedded to using port 80 and http headers, but this is perhaps
> the most pragmatic approach.

That's fraught with a whole slew of other problems, notably the 
middleboxes that currently try to translate, proxy, or otherwise modify 
anything they see as port 80.

Further, that would limit the number of pending connections.

> It will be really unorthodox to define such
> a protocol I know. We would have to say something like
>
>          "The dst port of the C-SYN MUST be 80, and the payload MUST start
>          with the constant magic_token, where
>          magic_token = 'PUT / HTTP/1.1<CRLF>Connection : DSO<CRLF><CRLF>'
>          "
>
> I'm sorry if even thinking about this makes you feel dirty :|
>
> Other suggestions for inner protocols are welcome, including tunnelled
> protocols, as long as middleboxes widely forward them, given their dst
> port.

I don't see what this gets us other than potential interactions. There's 
no reason to use HTTP to format TCP options, and it would necessitate 
new encodings and new processing for existing options that might end up 
in the extended space.
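For concreteness, the C-SYN rule quoted above amounts to the check below. This is a minimal Python sketch only: the function names are hypothetical, <CRLF> is assumed to mean b"\r\n", and the token is copied as quoted (including the space before the colon). It shows the matching rule a receiver would apply, not how a port-80 middlebox would treat such a segment, which is the concern raised above.

```python
# Hypothetical check of the proposed C-SYN rule: dst port 80 and a payload
# that starts with a fixed magic_token. <CRLF> is expanded to b"\r\n".
MAGIC_TOKEN = b"PUT / HTTP/1.1\r\nConnection : DSO\r\n\r\n"
C_SYN_DST_PORT = 80


def looks_like_c_syn(dst_port: int, payload: bytes) -> bool:
    """Return True if a SYN matches the proposed C-SYN rule."""
    return dst_port == C_SYN_DST_PORT and payload.startswith(MAGIC_TOKEN)


def extra_options_from_c_syn(payload: bytes) -> bytes:
    """Return the bytes after the token (assumed to hold conn_id + extra_opt)."""
    if not payload.startswith(MAGIC_TOKEN):
        raise ValueError("payload does not start with magic_token")
    return payload[len(MAGIC_TOKEN):]
```

Note that every existing option carried this way would need a new encoding inside the HTTP-shaped payload, which is the extra processing objected to above.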

> 2) The main problem with your notation is it doesn't say /where/ the
> info is placed.

EDO says the options are at the head of the data segment, just with a 
longer pointer - the same way as current options.

For extending the SYN using a separate segment - whether a C-SYN or 
unsync'd data segment - the data similarly comes at the head of the 
data, just that there's no user data permitted after it.
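As a rough illustration of "options at the head of the data, with a longer pointer", the sketch below splits a segment's data at an option length and parses the option area using the standard TCP kind/length layout (kind 0 = End of Option List, kind 1 = No-Operation, otherwise kind, length, value). The name opt_len stands in for whatever the extended pointer yields; how EDO actually encodes that pointer is not modelled, and all names here are hypothetical.

```python
from typing import List, Tuple

Option = Tuple[int, bytes]  # (kind, value)


def parse_tcp_options(buf: bytes) -> List[Option]:
    """Parse standard TCP kind/length/value options from buf."""
    opts: List[Option] = []
    i = 0
    while i < len(buf):
        kind = buf[i]
        if kind == 0:            # End of Option List: stop parsing
            break
        if kind == 1:            # No-Operation: one byte of padding
            i += 1
            continue
        if i + 1 >= len(buf):
            raise ValueError("truncated option")
        length = buf[i + 1]      # covers kind + length + value bytes
        if length < 2 or i + length > len(buf):
            raise ValueError("bad option length")
        opts.append((kind, buf[i + 2:i + length]))
        i += length
    return opts


def split_segment(data: bytes, opt_len: int,
                  allow_user_data: bool) -> Tuple[List[Option], bytes]:
    """Options occupy the first opt_len bytes; user data (if any) follows.

    allow_user_data=False models the C-SYN / unsync'd extra segment,
    where nothing may follow the options.
    """
    if opt_len > len(data):
        raise ValueError("option area exceeds segment data")
    options, user_data = data[:opt_len], data[opt_len:]
    if user_data and not allow_user_data:
        raise ValueError("no user data permitted after the options")
    return parse_tcp_options(options), user_data
```

For example, an option area holding MSS (kind 2, length 4), a NOP, and window scale (kind 3, length 3), followed by b"hello", splits into two parsed options and the five bytes of user data.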

> I've added notation as follows:
> TCP(base header [TCP options [APP(header[payload])]])
>
> And for the record I've made the if-else logic clearer.
>
> Where I've made more than clarifying edits inline, I've described them
> and tagged them with [BB].
>
> At 21:34 30/05/2014, Joe Touch wrote:
>> Hi, Bob,
>>
>> Let's get back to the core, in a simpler fashion, so others can follow it.
>>
>> I stand by my "there's no way to extend the space in the initial SYN",
>> but you've convinced me there *might* be a way to provide extended
>> space that can occur during the first phase of the TWHS. I think the
>> dual-SYN approach still isn't viable, but I've outlined an alternative
>> below that's similar but doesn't have the same baggage, IMO.
>>
>> Again, I'm still concerned by what midboxes might do to this...
>>
>> What do others think??
>>
>> Joe
>>
>> For quick review, here's what I understand:
>>
>>                 dso = dual-syn option
>>                         dso-D = data
>>                         dso-C = control
>>                 conn_id = identifier to link the two SYNs together
>>                 extra_opt = options that didn't fit in legacy SYN
>>                 fit_opt = options that do fit in the legacy SYN
>          new client endpoint sends
>                  TCP(port A SYN [dso-D(conn_id) + fit_opt] )
>                  TCP(port B SYN [dso-C [APP(headers [conn_id + extra_opt] ) ] ] )

IMO, the conn_id in the C-SYN should similarly be in the regular option 
space, but I don't think it matters.

> [BB]: i/APP(headers...)/
>
>                          if (legacy server endpoint) { sends back two connections:
>                                  TCP(port A SYN-ACK [fit_opt] )
>                                  TCP(port B SYN-ACK [??] )
>>                                 (its interpretation of extra_opt)

I'm assuming EDO after the SYN, which means the SYN-ACK has as much 
space as it wants for options, as would every other segment.

>                                          new client endpoint responds:
>                                          TCP(port A ACK) (established)
>                                          TCP(port B RST)
>
>>                         Notes about legacy servers:
>>                                 - they do twice the work on SYNs
>>                                 - they might keep twice the state
>>                                 (if not using cookies)
>>                                 - they might clean state if the RST
>>                                 is received, but that state might
>>                                 persist indefinitely (until the next
>>                                 connection, depending on timeouts, etc.)
>>
>>                         -----
>                          } elif (new server endpoint) { sends back one connection:
>                                  TCP(port A SYN-ACK [edo + fit_opt + extra_opt] )
>
> [BB]: s/dso-d/edo/

Sure. At this point we're basically in EDO-land.

>                                          new client endpoint responds:
>                                          TCP(port A ACK) (established)
>
>>
>>                         Notes:
>>                                 - can stall when dso-D SYN arrives
>>                                 before dso-C SYN, up to some limit
>>                                 - twice the work on SYNs (or more)
>                          }
>
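The client side of the if/else above can be sketched as a small decision function. This is a minimal Python sketch under stated assumptions: the SynAck record and the string outputs are hypothetical stand-ins for the notation used above, and timing, retransmission, and a new server that also answers port B are not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SynAck:
    port: str       # "A" (D-SYN connection) or "B" (C-SYN connection)
    has_edo: bool   # SYN-ACK carries the EDO option


def client_responses(synacks: List[SynAck]) -> List[str]:
    """Segments the new client sends back, following the if/else above."""
    a = [s for s in synacks if s.port == "A"]
    b = [s for s in synacks if s.port == "B"]
    if a and a[0].has_edo:
        # New server: a single SYN-ACK on port A with edo + fit_opt + extra_opt.
        return ["TCP(port A ACK)"]
    out: List[str] = []
    if a:
        # Legacy server: complete the port A connection using fit_opt only.
        out.append("TCP(port A ACK)")
    if b:
        # Legacy server also accepted the C-SYN: reset that connection.
        out.append("TCP(port B RST)")
    return out
```

In the legacy case the RST on port B is what the notes above rely on for cleaning up state, which a legacy server may or may not do promptly.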
>> Here's what I was assuming, though admittedly it's not documented (yet):
>>
>>         - no significant impact on TCP connection rate for
>>         legacy servers
>>
>>         - no significant impact on TCP connection rate for
>>         legacy clients
>>
>>         - impact dominated by processing the extended option space
>>         for extended clients
>>
>>         - impact dominated by processing the extended option space
>>         for extended servers
>>
>>         - compatible with typical TCP processing optimizations,
>>         notably SYN cookies
>>                 you did provide a potential way forward for these
>>
>>         - capable of successfully traversing typical NATs
>>
>> Your approach has the following properties:
>
> The 3 bullets below are not useful ways to describe performance impact.
> They selectively describe whichever gives the most pessimistic picture
> out of:
> a) either the instantaneous performance change at the moment of connection
> b) or the worst-case long-run performance impact
>
> They don't describe the average long-run performance impact, which is
> important for sizing machines.
>
> Worse, the instantaneous performance impact is only significant when a
> machine's SYN processing time is large relative to the e2e delay, which
> would be a highly unusual scenario on public networks (even in scenarios
> such as intra-data-centre, it's hard to reduce e2e delay to approach SYN
> processing time, but you could for intra-machine connections).

SYN processing is a known load - and potential overload - on machines. 
That's why it's reasonable to focus on it.

>>         - halves the server connection rate for updated servers
>>         from legacy clients when this option is in use
>
> Eh? The long-run server connection rate will be fractionally decreased
> due to updated clients using extra options (which is your third case
> below), but the instantaneous server connection rate seen by a legacy
> client is unchanged, because it only sends one SYN.

Agreed. I was doing too many of these at once...

>>         - lowers (to some extent, if not halves) the client
>>         connection rate of updated clients to all servers
>>         when this option is in use
>>
>>         - halves (roughly) the server rate for all servers
>>         when this option is in use
>
> Nope. All long-run server rates are reduced by a factor of (1+e), i.e. to
> 1/(1+e) of their original value, where e is the
> fraction of connections using extra options.

The rate for connections when this option is in use is halved - I didn't 
see that the sentence above could be parsed the other way. The overall 
rate depends on the fraction of connections using the option.
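Both readings fall out of one line of arithmetic. A minimal Python sketch, assuming SYN processing dominates per-connection cost and each extra-option connection costs two SYNs instead of one, so the average SYN work per connection is 1 + e:

```python
def relative_server_rate(e: float) -> float:
    """Long-run connection rate relative to a server with no extra options.

    e is the fraction of connections using extra options (0 <= e <= 1).
    """
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must be a fraction in [0, 1]")
    return 1.0 / (1.0 + e)


for e in (0.0, 0.1, 0.5, 1.0):
    print(f"e={e:.1f}: rate = {relative_server_rate(e):.3f} of baseline")
```

At e = 1 (every connection uses the option) this gives 0.5, i.e. the halving; at small e the overall reduction is correspondingly small.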

>> It also:
>>
>>         - doubles the number of SYNs in the network
>
> Nope. The number of SYNs in the network is inflated by a factor of (1+e),
> where e is the fraction of connections using extra options.

You keep bringing this up - I don't like a solution that should be used 
only a little. If it works, we should assume it works everywhere all the 
time.
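The SYN count follows the same arithmetic (same assumption: one extra C-SYN per connection that uses extra options). A minimal sketch:

```python
def syns_in_network(connections: int, e: float) -> float:
    """One SYN per connection, plus one C-SYN for each of the
    e * connections that use extra options."""
    return connections * (1.0 + e)


print(syns_in_network(1000, 0.5))   # 1500.0
print(syns_in_network(1000, 1.0))   # 2000.0
```

Assuming the option works everywhere all the time means e = 1, which is exactly the doubling.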

>>         - susceptible to lack of fate-sharing problems, e.g.,
>>         if the two SYNs experience different firewall configurations
>
> Nope. It's fairer to say it's potentially susceptible to second-order
> fate-sharing problems like your firewall example (the first-order fate
> sharing problems have been addressed).

I have no idea why you think firewalls are a second-order fate-sharing 
problem, or what that even means. Fate sharing means fate is shared. When fate is 
not shared by the SYNs, this method has problems. There's no mechanism 
to recover the missing D-SYN or C-SYN.
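To make the missing-SYN case concrete, here is a hypothetical server-side pairing table. The notes above say the server "can stall when dso-D SYN arrives before dso-C SYN, up to some limit"; the class name, hold_time value, and fallback behaviour below are assumptions for illustration. When one half never arrives, the only options are to proceed with fit_opt alone (lost D-SYN's partner) or to drop the orphan C-SYN; nothing recovers the missing options.

```python
from typing import Dict, List, Optional, Tuple


class DualSynPairer:
    """Hypothetical pairing of D-SYN and C-SYN halves by conn_id."""

    def __init__(self, hold_time: float = 0.2) -> None:
        self.hold_time = hold_time  # assumed stall limit, in seconds
        # conn_id -> (kind 'D' or 'C', arrival time, option bytes)
        self.pending: Dict[bytes, Tuple[str, float, bytes]] = {}

    def on_syn(self, kind: str, conn_id: bytes, opts: bytes,
               now: float) -> Optional[Tuple[bytes, bytes]]:
        """Return (fit_opt, extra_opt) once both halves are present."""
        other = self.pending.pop(conn_id, None)
        if other is not None and other[0] != kind:
            return (opts, other[2]) if kind == "D" else (other[2], opts)
        # First half (or a duplicate of the same half): wait for the partner.
        self.pending[conn_id] = (kind, now, opts)
        return None

    def expire(self, now: float) -> List[Tuple[bytes, bytes]]:
        """Drop halves older than hold_time.

        Expired D-SYNs are returned as (conn_id, fit_opt) so they can
        proceed without extra_opt; expired C-SYNs are simply discarded.
        """
        fallbacks: List[Tuple[bytes, bytes]] = []
        for conn_id, (kind, t, opts) in list(self.pending.items()):
            if now - t >= self.hold_time:
                del self.pending[conn_id]
                if kind == "D":
                    fallbacks.append((conn_id, opts))
        return fallbacks
```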

>>         - reduces the space available for fit_opt due to the need
>>         for the conn_id even in the fall-back D-SYN, which means
>>         less option space in the SYNs for fall-back connections
>
> Yup.
>
>
>>         the conn_id which may need to be very large because it
>>         needs to be unique per source port and source IP address
>>         because that information is lost during NAT translation
>
> Given many NATs will typically make the src IPs of both SYNs the same, I
> suggest a larger conn_id should be a fall-back option for the client,
> not a default.

Multiple NATs are quite common - carrier-grade NATs in the net, and user 
NATs at the home Wi-Fi access point.

Any multiple-NAT, or any path through even a single carrier-grade NAT 
will end up potentially un-doing or over-doing source IP overlap, 
creating false positives and false negatives that break this mechanism.

That's really the crux - fate sharing and the need to bind the two SYNs 
- that make this untenable, IMO.

> Even if the src IPs of both SYNs are different once they reach the
> server, the high end bits will invariably be the same.

You can't know this - the two SYNs could have gone through different 
numbers of NATs or different NATs entirely, so the bits could change 
arbitrarily.
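A toy example of the two failure modes, assuming (hypothetically) that the server pairs SYNs by the key (source IP as seen by the server, conn_id). Addresses are from the RFC 5737 documentation ranges. If a CGN maps one client's two SYNs to different public addresses, nothing pairs (false negative); if two clients behind the same CGN address pick the same short conn_id, the wrong SYNs pair (false positive).

```python
from typing import Dict, Tuple

Key = Tuple[str, bytes]  # (source IP seen by the server, conn_id)


def pair_by_src_ip(d_syns: Dict[Key, str], c_syns: Dict[Key, str]) -> Dict[str, str]:
    """Pair each D-SYN with the C-SYN that has the same (src IP, conn_id)."""
    return {d_syns[k]: c_syns[k] for k in d_syns if k in c_syns}


# False negative: client X's two SYNs leave the CGN on different addresses.
d = {("203.0.113.1", b"\x00\x00\x00\x01"): "X-D"}
c = {("203.0.113.2", b"\x00\x00\x00\x01"): "X-C"}
print(pair_by_src_ip(d, c))    # {} : X's extra options are never found

# False positive: X and Y share a CGN address and pick the same conn_id.
d2 = {("198.51.100.7", b"\xab\xcd\xef\x01"): "X-D"}
c2 = {("198.51.100.7", b"\xab\xcd\xef\x01"): "Y-C"}
print(pair_by_src_ip(d2, c2))  # {'X-D': 'Y-C'} : wrong options applied
```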

> So the max size
> of the contents of the DSO TCP option can be 6B, and the server can take
> the rest of conn_id from the higher bits of the src IP addr of each SYN.
> This is a variant of the idea in <draft-wing-nat-reveal-option>.

FWIW, you're now in the land where we would have to start explaining the 
privacy and tracking implications of those bits. I'd really like to 
avoid that.

> In fact, the server doesn't even need a small conn_id for clients that
> know they are not behind a NAT and that want more option space in the
> D-SYN - then the server could use the src port & src IP for the conn_id.
>
> To summarise, these options could be distinguished by the length field
> of the dual-syn option.
> Length = 2B                          => conn_id = src netaddr + src port
> Length = 6B = 2B + 4B conn_id_short  => conn_id = src netaddr + conn_id_short
> Length = 8B = 2B + 6B conn_id_long   => conn_id = hsb(src netaddr) + conn_id_long
>
> Given there have been numerous other attempts to reveal a connection ID
> that is preserved through middleboxes [RFC6967], rather than defining a
> dual-syn option that carries a conn_id, we might want to design a TCP
> connection ID option with a flag to say whether it is also part of a
> dual SYN pair or not.
>
> Where
> * hsb(src netaddr) is the netaddr with the lowest 16 bits truncated
> * src netaddr is the network address (IPv4, IPv6, or any other network
> protocol)
>
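The length-dispatch above could look like this on the server. A minimal Python sketch: derive_conn_id and the tags in the returned tuples are hypothetical; Length counts the option's kind and length bytes, so the carried value is Length - 2 bytes; hsb() drops the lowest 16 bits for IPv4 or IPv6 alike.

```python
import ipaddress
from typing import Tuple


def hsb(src: str) -> int:
    """src netaddr with the lowest 16 bits truncated."""
    return int(ipaddress.ip_address(src)) >> 16


def derive_conn_id(opt_len: int, src: str, src_port: int, value: bytes) -> Tuple:
    """Build the key the server would use to link the two SYNs."""
    if len(value) != opt_len - 2:
        raise ValueError("option value length does not match Length field")
    addr = int(ipaddress.ip_address(src))
    if opt_len == 2:      # no conn_id carried: use src netaddr + src port
        return ("addr+port", addr, src_port)
    if opt_len == 6:      # 4B conn_id_short
        return ("addr+short", addr, value)
    if opt_len == 8:      # 6B conn_id_long, tolerant of low-bit address changes
        return ("hsb+long", hsb(src), value)
    raise ValueError("unsupported DSO Length")
```

With the 8B form, two SYNs from 192.0.2.10 and 192.0.99.200 carrying the same conn_id_long map to the same key, since both share the high 16 bits 192.0. As noted in the reply above, this assumes the high bits survive every NAT on both paths.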
> To reduce latency, a host could use the default conn_id_short for all
> connections at first, then:
> - if it finds that DSO persistently doesn't work it falls back to
> conn_id_long for all connections
> - it occasionally tests the short and zero-length conn_id forms to see
> if it can use shorter DSO options.
>
>
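One way a host might implement that strategy, as a hypothetical Python sketch. The failure threshold, probe interval, and one-step fallback order (zero to short, short to long) are assumptions, not part of the proposal.

```python
class ConnIdPolicy:
    """Hypothetical client policy for choosing the conn_id form."""

    LENGTHS = {"zero": 2, "short": 6, "long": 8}   # DSO option Length, bytes
    FALLBACK = {"zero": "short", "short": "long"}  # assumed one-step fallback
    PROBE = {"long": "short", "short": "zero"}     # next shorter form to test

    def __init__(self, fail_threshold: int = 3, probe_every: int = 100) -> None:
        self.mode = "short"              # default for all connections at first
        self.fail_threshold = fail_threshold
        self.probe_every = probe_every
        self.consecutive_failures = 0
        self.connections = 0

    def choose(self) -> str:
        """Pick the conn_id form for the next connection."""
        self.connections += 1
        if self.connections % self.probe_every == 0 and self.mode in self.PROBE:
            return self.PROBE[self.mode]   # occasional probe of a shorter form
        return self.mode

    def length(self, form: str) -> int:
        return self.LENGTHS[form]

    def report(self, form: str, dso_worked: bool) -> None:
        """Update the policy after learning whether DSO worked with form."""
        if form != self.mode:
            # Result of a probe: adopt the shorter form only if it worked.
            if dso_worked:
                self.mode = form
                self.consecutive_failures = 0
            return
        if dso_worked:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.fail_threshold and self.mode in self.FALLBACK:
            self.mode = self.FALLBACK[self.mode]   # persistent failure: lengthen
            self.consecutive_failures = 0
```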
>>         - requires the ISNs to be related (see RFC6528 - if there's
>>         a rule to generate it, there will be code to validate that
>>         rule, and eventually a BCP to encourage that validation -
>>         typically from the same RFC author)
>
> Eh? The ISNs can and should be independent. To be robust against
> middleboxes that rewrite sequence numbers, we must not require ISNs to
> be related.
>
>
>> I agree that you have proposed potentially viable ways to deal with
>> the SYN cookie, and that RST state is not an issue.
>
> A feature that I think it's fair to add:
>          - Good chance of passing through app-layer middleboxes that forward
>          unrecognised TCP options unchanged, but not those that discard them.
>
>
>> However, there are too many problems with this, IMO, to call it viable.
>
> Once your over-pessimistic analyses of the performance impact are
> corrected, and my ideas to reduce the size of the conn_id are taken into
> account, it's a different story.
>
> But it's up to the WG to decide whether this is worth taking further.
> Not just you or I.

Agreed.

>> Here's another trick that might clean up the above a little:
>
> <snip - I'll respond separately to your later updates on this ASO idea,
> with ACK=0>
>
> Cheers
>
>
> Bob
>
>
> ________________________________________________________________
> Bob Briscoe,                                                  BT