Re: [tcpm] SYN extension using ACK=0 data packets

Joe Touch <touch@isi.edu> Sat, 31 May 2014 22:01 UTC

From: Joe Touch <touch@isi.edu>
In-Reply-To: <201405312113.s4VLDEbG004301@bagheera.jungle.bt.co.uk>
Date: Sat, 31 May 2014 15:01:25 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <6C4E6E63-F3F4-4364-9459-794957DC8799@isi.edu>
References: <20140425221257.12559.43206.idtracker@ietfa.amsl.com> <2586_1398464386_535ADF82_2586_915_1_535ADF56.9050106@isi.edu> <CF8D8E25-E435-4199-8FD6-3F7066447292@iki.fi> <5363AF84.8090701@mti-systems.com> <5363B397.8090009@isi.edu> <CAO249yeyr5q21-=e6p5azwULOh1_jUsniZ6YPcDYd69av8MMYw@mail.gmail.com> <DCC98F94-EA74-4AAA-94AE-E399A405AF13@isi.edu> <655C07320163294895BBADA28372AF5D2CFE36@FR712WXCHMBA15.zeu.alcatel-lucent.com> <20140503122950.GM44329@verdi> <655C07320163294895BBADA28372AF5D2D009E@FR712WXCHMBA15.zeu.alcatel-lucent.com> <201405221710.s4MHAY4S002037@bagheera.jungle.bt.co.uk> <537E3ACD.5000308@isi.edu> <1AD79820-22C1-4500-84D1-1383F264D68C@weston.borman.com> <201405231213.s4NCDa5P005525@bagheera.jungle.bt.co.uk> <537F8202.4020907@isi.edu> <201405281715.s4SHFMm0014634@bagheera.jungle.bt.co.uk> <538623B9.2060209@isi.edu> <201405301642.s4UGgcvY030471@bagheera.jungle.bt.co.uk> <5388EB6F.4010405@isi.edu> <5389263C.8010202@isi.edu> <201405312113.s4VLDEbG004301@bagheera.jungle.bt.co.uk>
To: Bob Briscoe <bob.briscoe@bt.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/tcpm/k-PHZSMGo-Y0tRNiAam5sLqxY50
Cc: "tcpm@ietf.org" <tcpm@ietf.org>
Subject: Re: [tcpm] SYN extension using ACK=0 data packets

Hi, Bob,

On May 31, 2014, at 2:13 PM, Bob Briscoe <bob.briscoe@bt.com> wrote:

> Joe,
> 
> Hope it's ok to have changed the subject line, with tcpm still in cc.
> 
> I'm afraid I'm not as excited about ACK=0 as you are. It's certainly cleaner than anything we've come up with so far.
> 
> However, I see the goal as finding a way to send a supplement to a SYN that is invalid in some way to legacy TCP servers, but likely to appear valid to many/most middleboxes. I suspect many middleboxes and firewalls will discard the ACK=0 segment.
> 
> When assessing the DSO scheme, you were adamant that anything invalid to an endpoint would always eventually become invalid to middleboxes. In the excitement of finding a nice clean way of doing ASO, that critique seems to have suddenly become unimportant.

I worry about checksums in particular - which get recalculated, so using a different checksum means we're sure to fail through a middlebox that doesn't recalculate it properly (if it's stored in a new place), or that won't validate it (if it's in the current checksum field).

I can't say that middleboxes don't check for ACK=0 packets, but it seems a lot of work to do unless they're perceived to be an issue. I doubt they look at the ACK at all unless the SYN or FIN is set.

> Whatever, there may be a place for both solutions:
> a) ASO (ACK=0) for paths where a NAT might be the worst middlebox on the path
> b) DSO for paths with stateful firewalls, TCP normalisers etc.

DSO is better through stateful firewalls only when the FBP packet precedes the SYN. Keep in mind that the FBP can also be sent multiple times after the SYN anyway, with small (1ms) delays; only the first will end up being used, and loss of all of them would still be recoverable.

> The cellular world is certainly more like (b).

Cellular uses CGNs (carrier-grade NATs), which could make DSO SYN pairing more difficult.

> Whether there are many parts of the public Internet left like (a) is unclear. (b) seems to have become the norm for most paths.

If we're counting numbers, there are a *lot* more hosts behind NATs. The ones behind CGNs probably swamp all others combined.

> More inline...
> 
> At 01:45 31/05/2014, Joe Touch wrote:
>> Hi, all,
>> 
>> Some additional information below about "another trick" I proposed, inspired by Bob's dual-SYN mechanism, courtesy a few long discussions today with Ted Faber.
>> 
>> I'll be glad to offer this as a potential solution in the doc.
>> 
>> Joe
>> 
>> On 5/30/2014 1:34 PM, Joe Touch wrote:
>> ...
>>> Here's another trick that might clean up the above a little:
>> 
>> FWIW, I had explained it below as being based on sending out-of-window data; Ted pointed out that I had been assuming that the FBP ACK bit wasn't set - which means the sequence number might be more usefully matched to that of the SYN.
>> 
>> See below...
>> 
>>>     aso - after SYN option
>> 
>>                length = 2 (just a flag)
>>                length = 3 (or 4) indicating the length of the
>>                        FBP expected
>> 
>>>     FBP - front bumper packet (best I could do on names today)
>>>         a packet
>>                ISN = same as the associated SYN
>>                ACK = cleared (i.e., a data packet NOT part of
>>                synchronized connection - see why that's useful below)
> 
> For the avoidance of doubt,
> I assume these two packets have the same src port.

Same IP addresses, same ports, *because* they're associated with the same connection.

> and I assume the FBP has SYN=0.

Yes. All control bits are 0, including ACK.
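
To make the FBP header concrete, here is a minimal sketch in Python of a 20-byte TCP header with the same ports and sequence number as the SYN and every control bit cleared. The function name build_fbp_header is hypothetical, the aso option and the extra_opt payload are left out, and the checksum is a zero placeholder rather than a computed value.

```python
import struct

def build_fbp_header(src_port, dst_port, syn_isn, window=0):
    """Pack a bare 20-byte TCP header for an FBP (illustrative only).

    Assumptions: no TCP options are included (the aso option is omitted),
    and the checksum field is left as 0 instead of being computed.
    """
    data_offset = 5          # header length in 32-bit words, no options
    flags = 0                # SYN, ACK, FIN, RST, PSH, URG all cleared
    offset_and_flags = (data_offset << 12) | flags
    ack_number = 0           # not meaningful because ACK=0
    checksum = 0             # placeholder, not a valid checksum
    urgent_pointer = 0
    return struct.pack(
        "!HHIIHHHH",
        src_port,
        dst_port,
        syn_isn & 0xFFFFFFFF,  # same sequence number as the associated SYN
        ack_number,
        offset_and_flags,
        window,
        checksum,
        urgent_pointer,
    )
```

A caller would reuse the SYN's own port pair and ISN, e.g. build_fbp_header(40000, 80, isn), so that the two segments map to the same connection.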

> Coincidentally, a couple of weeks ago, when I was looking for places to find more bits in the TCP header, I noted that during an established connection ACK=0 is never used, so I looked into setting ACK=0 and overloading the ack_number field.
> 
> 
>>>     new endpoint sends:
>>> 
>>>         SYN + aso + fix_opt
>>>         FBP + aso + extra_opt
>> 
>>                extra_opt in the data field
>>                total length < min MTU (576 for IPv4, 1280 for IPv6)
>>                again ACK bit is zero
>> 
>>>             legacy endpoint sends back one connection:
>>>                 SYN-ACK + fix_opt
>>> 
>>>                 if seg arrives before SYN,
>> 
>>                        it is silently dropped, because
>>                        the ACK bit is clear (this is
>>                        explicit in RFC793)
>> 
>>>                 if seg arrives after SYN,
>> 
>>                        it is silently dropped, because
>>                        the ACK bit is clear (this is also
>>                        explicit in RFC793)
> 
> Two important questions:
> 1) are most/all legacy TCP implementations faithful to the spec?

That's something I'll take a look at. 

> 2) if a TCP endpoint is meant to drop SYN=0, ACK=0, then many middleboxes surely will.

Middleboxes drop things whose checksum fails, or when a packet comes in for a connection that hasn't been established. They don't go out of their way to do a lot of other work AFAICT.

> Both these will need experimental testing.

Certainly.

> Nit: I wouldn't exactly say RFC793 is clear. You have to follow 2 pages of quite imprecise descriptive logic, starting from p66. But, yes it is eventually fairly unambiguous. Paraphrasing, it says:
> 
> SEGMENT ARRIVES
>        ...
>        if state = SYN-SENT
>                1. Check ACK bit
>                        if ACK = 1
>                                do stuff
>                2. Check RST bit
>                        if RST = 1
>                                if ACK OK enter CLOSED state
>                                elif no ACK, drop
>                3. Check security & precedence
>                        Do stuff
>                4. Check SYN bit
>                        Should only get here if ACK OK, or no ACK and no RST
>                        if SYN = 1
>                                do stuff
>                5. if SYN = 0 and RST = 0
>                        drop

In CLOSED, it says to send a RST (like any other packet for a non-connection).

In LISTEN, it says:

        Any other control or text-bearing segment (not containing SYN)
        must have an ACK and thus would be discarded by the ACK
        processing.  An incoming RST segment could not be valid, since
        it could not have been sent in response to anything sent by this
        incarnation of the connection.  So you are unlikely to get here,
(where 'here' is a segment with ACK=0 and no other control bits set)
        but if you do, drop the segment, and return.

In SYN-SENT, it says:

      fifth, if neither of the SYN or RST bits is set then drop the
      segment and return.

In all other states, it says:

      if the ACK bit is off drop the segment and return

If that's not direct and explicit, I don't know what is.
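
The four rules quoted above can be written down as a small lookup, sketched here in Python. This only models a segment with no control bits set (the FBP case); the function name and the result strings are hypothetical, and the sequence-number checks that RFC793 performs first in the synchronized states are not modelled.

```python
# States named in RFC793; only the first three have their own rule above.
RFC793_STATES = {
    "CLOSED", "LISTEN", "SYN-SENT", "SYN-RECEIVED", "ESTABLISHED",
    "FIN-WAIT-1", "FIN-WAIT-2", "CLOSE-WAIT", "CLOSING", "LAST-ACK",
    "TIME-WAIT",
}

def legacy_disposition_no_control_bits(state):
    """Return (action, reason) for a segment with SYN=ACK=RST=FIN=0.

    Mirrors only the rules quoted in this message; earlier checks
    (e.g. sequence-number acceptability) are deliberately left out.
    """
    if state not in RFC793_STATES:
        raise ValueError("unknown TCP state: %r" % (state,))
    if state == "CLOSED":
        return ("send RST", "no connection exists")
    if state == "LISTEN":
        return ("drop", "no SYN and no ACK in LISTEN")
    if state == "SYN-SENT":
        return ("drop", "neither SYN nor RST is set")
    return ("drop", "ACK bit is off")
```

Under these rules the only state that produces any reply is CLOSED; a listening or connected legacy endpoint says nothing.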

>>>             ----
>>> 
>>>             new endpoint sends back one connection:
>>> 
>>>                 SYN-ACK + options + ....
>>> 
>>>             a) if FBP arrives before SYN,
>> 
>>                        it can be silently dropped, but
>>                        it's probably useful for new endpoints
>>                        to hold onto these (without action)
>>                        for a while; they can be silently
>>                        discarded if there are too many
>>                        (which will just result in a
>>                        retransmission and an extra RTT)
>> 
>>>             b) if FBP arrives with the SYN, they can be
>>>             processed together
> 
> By 'arrives with' I assume you mean 'within some time duration'.

Yes - I was thinking of the case where the FBP is either cached (as per (a)), or is in the segment queue at the time the SYN is being processed.
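
One way an upgraded endpoint might hold early FBPs is a small bounded cache keyed by addresses, ports and sequence number, sketched below in Python. The class name, the size limit, the lifetime, and the choice to evict the oldest entry when full are all assumptions for illustration; nothing here is specified by the proposal.

```python
from collections import OrderedDict

class EarlyFbpCache:
    """Bounded, time-limited store for FBPs that arrive before their SYN.

    Keys are (src_ip, src_port, dst_ip, dst_port, seq) tuples. The limits
    and the evict-oldest policy are illustrative assumptions.
    """

    def __init__(self, max_entries=1024, ttl_s=1.0):
        self.max_entries = max_entries
        self.ttl_s = ttl_s
        self._entries = OrderedDict()   # key -> (expiry_time, payload)

    def store(self, key, payload, now):
        if key in self._entries:
            return                      # only the first copy is used
        if len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)   # silently discard the oldest
        self._entries[key] = (now + self.ttl_s, payload)

    def take(self, key, now):
        """Remove and return the cached payload, or None if absent/expired."""
        entry = self._entries.pop(key, None)
        if entry is None:
            return None
        expiry, payload = entry
        return payload if now <= expiry else None
```

When the SYN is processed, the endpoint would call take() with the SYN's own 4-tuple and ISN; a miss just means falling back to case (c) and an FBP retransmission.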

>>>                 the SYN-ACK can include responses to
>>>                 the extra_opts in addition to the
>>>                 fix_opts, and says "FBP received"
>>> 
>>> 
>>>             c) if FBP arrives after the SYN:
>>> 
>>>                 SYN-ACK proceeds, but sends
>>>                 back "wait for option response".
>>> 
>>>                 at this point, the source re-sends FBP
>>>                 until an ACK is received that indicates
>>>                 "FBP received", or times-out as with
>>>                 any connection that doesn't finish TWHS
> 
> There's a dilemma whether the server:
> * prioritises latency and goes ahead without the extra options,
> * or prioritises completeness and blocks until the extra options arrive.

There's no dilemma - if the SYN says "FBP is coming", then it means that an upgraded server MUST wait for the FBP.

I.e., the FBP is paired with the SYN-ACK, and the handshake doesn't complete at the client - the client should NOT send the final ACK of the TWHS - until it has received both the SYN-ACK *and* confirmation that the FBP has been received. That confirmation can arrive inside the SYN-ACK if the server already has the FBP; otherwise the SYN-ACK would say "FBP missing" and the client would retransmit the FBP *until* the server confirms it (it would send an option confirmation, not an ACK).

> I would take the view that if the FBP is late there's a chance it got snarled up in a middlebox, and it will never get through no matter how many times it's retransmitted.

The client can timeout, and can retry without the extended SYN space in that case.
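
The client-side decision can be sketched roughly as follows, in Python. It assumes the length-based encoding suggested elsewhere in this message (aso length 2 on the server's reply means "waiting for FBP", length 3 or 4 means "got the FBP") and a single overall timeout; the names ClientStep and client_next_step are hypothetical.

```python
from enum import Enum

class ClientStep(Enum):
    SEND_FINAL_ACK = "send final ACK of the TWHS"
    RETRANSMIT_FBP = "retransmit FBP"
    RETRY_PLAIN_SYN = "retry without the extended SYN space"

def client_next_step(server_aso_len, elapsed_s, timeout_s):
    """Decide what the client does after hearing from an upgraded server.

    server_aso_len is the aso length in the most recent segment from the
    server: 2 = waiting for FBP, 3 or 4 = FBP received (assumed encoding).
    """
    if server_aso_len not in (2, 3, 4):
        raise ValueError("unexpected aso length: %r" % (server_aso_len,))
    if server_aso_len >= 3:
        return ClientStep.SEND_FINAL_ACK      # SYN-ACK and FBP both confirmed
    if elapsed_s >= timeout_s:
        return ClientStep.RETRY_PLAIN_SYN     # give up on the extension
    return ClientStep.RETRANSMIT_FBP          # server is still waiting
```

The handshake only completes on the first branch; the other two keep the client from either blocking forever or finishing without the options it asked for.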

> Rather than make the choice between latency and completeness at protocol design time, we need a flag in either scheme (on the DSO or ASO option) for the client to say which it wants. Then:
> * the client can choose latency if it knows the extra_opts are not critical.
> * otherwise it can choose completeness, and if it doesn't work after a retransmit or two, the client won't block for ever; it can work out the compromise set of options that fit in a single SYN but still 'work'.

A flag that says "do what a legacy server would do if you have to wait X ms" seems fine to me too.
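
Such a flag could be handled at the server along these lines, sketched in Python. The parameter legacy_after_ms stands in for the "X ms" value carried by the hypothetical flag (None meaning the client insists on completeness), and the reply strings are placeholders, not defined option values.

```python
def server_reply(fbp_available, waited_ms, legacy_after_ms=None):
    """Pick the server's response to a SYN that announced an FBP.

    legacy_after_ms is an assumed client-supplied limit: once the server
    has waited that long without the FBP, it behaves like a legacy server.
    None means the client wants completeness, so the server keeps waiting.
    """
    if fbp_available:
        return "SYN-ACK: FBP received, extra_opts processed"
    if legacy_after_ms is not None and waited_ms >= legacy_after_ms:
        return "SYN-ACK: legacy behaviour, extra_opts ignored"
    return "SYN-ACK: waiting for FBP"
```

A client that knows its extra_opts are not critical would send a small limit; one that needs them would omit the limit and rely on its own timeout instead.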

>>>             I'm still thinking as to whether the ACK number
>>>             might indicate whether FBP has been received,
>> 
>> There are a few ways to handle this, but IMO the best is to have:
>> 
>>                SYN-ACK aso with length=2 means "waiting for FBP"
>> 
>>                SYN-ACK aso with length=3 (or 4) could mean
>>                "got the FBP", or might even indicate how
>>                many bytes the FBP extension contained
>> 
>> Note - the SYN-ACK and all subsequent segments can assume that aso == edo, i.e., they can use the same EDO as spec'd in version 01 of this draft.
>> 
>>> This is cleaner as follows:
>>> 
>>>     - no need for conn_id coordination
>>> 
>>>     - no need for conn_id to consume option space for fall-back
>>> 
>>>     - avoids double-load for legacy servers
> 
> With a typical legacy server that reflects SYN cookies, less activity is doubled:
> * my scheme: sends out two SYN cookies
> * your scheme: sends out one SYN cookie and drops one packet, probably after a long chain of logic, because the FBP is an unusual packet.

The activity involved in parsing a packet is small compared to that of setting up a SYN cookie or creating TCB state (where SYN cookies aren't used).

>>>     - no problem with fate-sharing
> 
> To be as consistently pessimistic as you were with my scheme, I think you mean that you have created machinery to synthesise fate sharing between a connection and an invalid segment, and it may not work in cases that you may not have thought of.

Your scheme uses two different SYNs on different ports - that's a recipe for fates NOT being shared.

Mine uses two segments on the same addresses with the same ports - if they're not fate-shared, then neither will the rest of the TCP connection be, and TCP won't work (if that fate matters, e.g., through different NATs).

>>>     - traverses a NAT just fine
> 
> As above, surely TCP normalising middleboxes and incoming stateful firewalls at the server end are likely to discard the FBP.

I don't think so, but I agree we'll have to try. This is a completely different situation, IMO, than expecting boxes that *already* are known to validate TCP checksums to do otherwise. In this case, we don't *know* the solution won't work from the start.

>>> Upgraded servers still need to wait for the 'seg', but they could get
>>> that retransmitted if necessary.
> 
> See discussion of latency vs completeness dilemma above.

Latency vs. completeness is true for all variants that use multiple packets - that's one reason I don't like that aspect of any of these approaches.

>> And there's a small amount of additional processing to discard the FBP at legacy endpoints, but silently discarding one packet per connection doesn't seem like a huge effort to me.
> 
> Despite this being an exceptional drop (see earlier), this is still a reasonably fair statement.

And I'm glad to concede that we don't know the cost on the code cache, etc. of exercising a dusty, dark path of processing. 

>> Note - there's no special RST processing, based on Ted's observation about "data without ACK" being considered "data sent in the unsynchronized state" -- something legacy TCP explicitly silently discards.
> 
> ...at least in theory.

And in requirements. Yes, I'll see if I can find out what BSD and Linux do - though I have more hope of BSD being correct than Linux.

Joe