Re: [tcpm] New Version Notification for draft-touch-tcpm-tcp-edo-01.txt

Joe Touch <touch@isi.edu> Wed, 28 May 2014 17:59 UTC

To: Bob Briscoe <bob.briscoe@bt.com>
Cc: David Borman <dab@weston.borman.com>, "tcpm@ietf.org" <tcpm@ietf.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/tcpm/sWS7n3o-WaHqR46QcCi7vDi-JmY


On 5/28/2014 10:13 AM, Bob Briscoe wrote:
> Joe,
>
> At 18:14 23/05/2014, Joe Touch wrote:
>
>
>> On 5/23/2014 5:13 AM, Bob Briscoe wrote:
>>> David,
>>>
>>> 1) Parallel control channel
>>> ___________________________
>>> Client A sends two SYNs back-to-back to an existing well-known port
>>> (e.g. 80).
>>
>> You can send in whatever order you want; packets will be reordered,
>> lost, and sent along alternate paths.
>
> Of course.
>
> But as I suggested initially, we can standardise the protocol so that an
> upgraded host synthesises shared fate.

Fate is dependent on path.

> Eg. a 2-octet TCP option on the
> D-type SYN that says "Hold for x [ms] to wait for the supplementary
> C-type SYN, where x is a lot less than the usual time you hold SYN
> connection state (e.g. x=2). If SYN C hasn't arrived by then, continue
> without it, and discard it if it arrives."
>
> And if the C-SYN arrives first, hold it for y [ms] waiting for the
> corresponding D-SYN. Where for example y=2 as well.

(C=legacy; D=extended - from your earlier email, for context)

Legacy clients now experience connection delays when contacting upgraded 
servers - why is that acceptable?

FWIW, many endpoints don't hold SYN state at all if they implement SYN 
cookies, so even a 2 ms hold would saddle the server with far more state 
than it currently keeps.
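
For concreteness, here is roughly what that strawman would require of a 
server that does keep SYN state (a sketch only; the option kind, the fixed 
hold value, and all the names below are my own illustration, not anything 
from a draft - and note that a 2-octet option can only be kind + length, 
so x would have to be standardised rather than carried):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define HOLD_MS 2        /* x: standardised wait for the paired C-SYN */

struct hold_opt {        /* 2 octets on the wire: kind, length (no data) */
    uint8_t kind;        /* hypothetical experimental kind */
    uint8_t len;         /* = 2 */
};

/* Per-pending-D-SYN state a non-SYN-cookie server would have to keep. */
struct pending_dsyn {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint32_t isn;
    bool     c_syn_seen;
};

enum hold_verdict { HOLD_WAIT, HOLD_GO_LEGACY, HOLD_GO_EXTENDED };

/* Decision each time the hold timer is checked. */
static enum hold_verdict check_hold(const struct pending_dsyn *p,
                                    unsigned elapsed_ms)
{
    if (p->c_syn_seen)
        return HOLD_GO_EXTENDED;  /* pair complete: use the extra options */
    if (elapsed_ms >= HOLD_MS)
        return HOLD_GO_LEGACY;    /* continue without C; discard it if it shows up */
    return HOLD_WAIT;             /* keep the D-SYN parked */
}

int main(void)
{
    struct pending_dsyn p = { .c_syn_seen = false };
    printf("t=1ms: %d\n", check_hold(&p, 1));  /* 0: still waiting */
    printf("t=2ms: %d\n", check_hold(&p, 2));  /* 1: fall back to legacy */
    return 0;
}

A SYN-cookie server keeps no such pending entry at all today, which is 
exactly the regression I'm pointing at.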

>> FWIW, do these use the same source port and ISN?
>>         - if they do, it'll reset the connection
>>         - if they don't, you're now limiting the number of
>>         concurrent connections to roughly half:
>>         http://www.isi.edu/touch/pubs/infocomm99/infocomm99-web/
>
> Unless we can think of a way for the C-SYN to be discarded by a legacy
> TCP stack (but not a middlebox), I'm assuming we need to get the C-SYN
> up to the legacy app-layer before discard... So, I was assuming they use
> different source ports (otherwise, as you say, a legacy TCP stack could
> reset the D SYN connection if it arrived second)

If they use different source ports, you could be limiting the connection 
rate. It's not enough to just "discard" late C-SYNs; they could be for 
new connections (how do you know the two are paired?), which means you 
need to RST them and keep state for 2MSL.

> The ISN on the C-SYN is redundant - we might be able to think of another
> use for that field.

But it's not redundant if you're talking to a legacy server, so you 
can't do anything with that field.

> On an upgraded host, max concurrent connections would only be slightly
> impacted, because of the v small timeout of C-SYNs.

See above; you need to know how to pair C-SYNs and D-SYNs. Consider that 
legacy hosts and upgraded hosts might be behind the same NAT, at which 
point how do you know the two SYNs are even from the same host?
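
To illustrate why pairing is the sticking point: with different source 
ports, the only fields the two SYNs share on the wire are the (possibly 
NATted) source address and the destination. A sketch, with field names of 
my own choosing:

#include <stdint.h>
#include <stdio.h>

struct syn_key {
    uint32_t src_ip;   /* behind a NAT: the NAT's public address */
    uint32_t dst_ip;
    uint16_t dst_port; /* source ports differ by design, so they can't be used */
};

static int looks_paired(struct syn_key a, struct syn_key b)
{
    /* Best a server can do without an explicit token in both SYNs. */
    return a.src_ip == b.src_ip &&
           a.dst_ip == b.dst_ip &&
           a.dst_port == b.dst_port;
}

int main(void)
{
    /* Two unrelated hosts behind the same NAT produce identical keys. */
    struct syn_key dsyn_from_host1 = { 0xCB007101u, 0xC6336401u, 80 };
    struct syn_key csyn_from_host2 = { 0xCB007101u, 0xC6336401u, 80 };
    printf("false pairing: %d\n",
           looks_paired(dsyn_from_host1, csyn_from_host2));  /* prints 1 */
    return 0;
}

An explicit token carried in both SYNs would disambiguate, but then it has 
to survive middleboxes - which is the binding question further down.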

> The semantics of the option-space-extension option would be to only hold
> the C-type SYN for a timeout and only the D-type SYN creates full
> connection state (I'm deferring SYN cookie behaviour to tomorrow for now
> - this is a straw man).

I don't think you can defer a widely-deployed feature like SYN cookies, 
and that seems to kill this.

> Admittedly, the number of concurrent connections a /legacy/ host can
> support could reduce by up to half (if all remote TCP clients are
> sending the new option). But they have the choice of upgrading to the
> new stack to stop wasting their memory.

Both legacy clients and legacy servers are impacted. We already know 
that servers are overloaded with connections, so halving that number 
seems like a non-starter. As does adding x ms delay for *each* legacy 
client.
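
To put a rough number on the halving (assuming, just for illustration, the 
common Linux default ephemeral port range of 32768-60999; the exact figures 
don't matter, the factor of two does):

#include <stdio.h>

int main(void)
{
    /* Assumed: Linux default ip_local_port_range = 32768-60999. */
    int ephemeral_ports = 60999 - 32768 + 1;            /* 28232 */

    /* An upgraded client talking to one (addr, port) today vs. with
     * paired SYNs, which burn two source ports per logical connection. */
    printf("single-SYN connections to one server: %d\n", ephemeral_ports);
    printf("paired-SYN connections to one server: %d\n", ephemeral_ports / 2);

    /* A legacy server mirrors this: two TCBs/SYN-queue entries where one
     * sufficed, i.e. half the concurrent connections per unit of memory. */
    return 0;
}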

>>> * SYN D, establishes a regular data connection, with sufficient TCP
>>> options to be workable but they still fit within the existing 40B option
>>> limit.
>>> * SYN C establishes another parallel connection to the same well-known
>>> port that looks like regular data from the outside (it could even be an
>>> extension to HTTP to ensure middleboxes will let it pass), but it talks
>>> a new app-layer 'TCP control' protocol inside.
>>
>> What happens when they arrive out of order? What happens when you get
>> D but not yet C? How long do you wait for C?
>
> See above.
> The timeouts might be standardised, or they might be declared in the
> option as a hint (for the host to ignore if it is under stress).
>
>> This is the problem with dual-stack approaches - new endpoints
>> penalize legacy endpoints if there's a stall, and undermine new
>> endpoints if they don't.
>
> Have I satisfied you that this can be solved sufficiently? Bearing in
> mind the gain, it's reasonable to have to accept some pain.

The pain is the problem. The smaller the timeout, the smaller the pain 
to legacy clients, but the lower the probability that the extension will 
be useful to upgraded clients. The method cuts the rate of connections 
for legacy servers in half - which is *huge*. And it adds to the state 
overhead of upgraded clients, which need to do double the work for 
*every* connection.

Ultimately, it's roughly equivalent to "try both and shut down the one 
you don't want" -- the dual-stack approach we already know about.

>>> If there is no support for the new app-layer protocol on port 80 the
>>> control channel just shuts down with a suitable HTTP error, while SYN D
>>> has opened a data connection with sufficient TCP options to be workable.
>>> If the new app-layer TCP control protocol is supported on port 80, the
>>> parallel control channel (C) adds unlimited additional control
>>> flexibility to the data channel (D) with hardly any added latency.
>>>
>>> Establishing a similar control channel in the opposite direction would
>>> be fairly trivial.
>>>
>>> There are few, if any, middlebox problems with the above approach.
>>> However, there are certainly other problems, but no more insurmountable
>>> than all the problems that have already been discussed with taking the
>>> 'easy' route of EDO:
>>> * A secure binding would have to be added to bind channel C to a secret
>>> known only to the originator of channel D, otherwise it would open up
>>> data channels to spoof control channel attacks. This binding could be
>>> built on a TCP-AO option in channel D.
>>
>> Yes, that's another problem.
>
> Fairly straightforward to solve using standard techniques.

I think that warrants more than a claim. You need to explain how you 
avoid false positives through NATs.
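
For the record, here is the kind of standard-technique binding I assume Bob 
means - a keyed MAC over material both ends see unchanged - sketched with 
OpenSSL's HMAC. Anything a NAT rewrites (addresses, ports) has to stay out 
of the MAC, and the names and inputs here are my own guesses, not anything 
specified:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

/* Tag = HMAC-SHA256(secret, D-channel ISN || client nonce), truncated.
 * Demo only: byte order and key management are ignored here. */
static void bind_tag(const uint8_t *secret, size_t secret_len,
                     uint32_t d_isn, uint32_t nonce, uint8_t tag[8])
{
    uint8_t msg[8];
    uint8_t full[EVP_MAX_MD_SIZE];
    unsigned int full_len = 0;

    memcpy(msg,     &d_isn, 4);   /* addresses/ports deliberately excluded */
    memcpy(msg + 4, &nonce, 4);
    HMAC(EVP_sha256(), secret, (int)secret_len, msg, sizeof(msg),
         full, &full_len);
    memcpy(tag, full, 8);         /* truncated tag carried on the C channel */
}

int main(void)
{
    const uint8_t secret[] = "demo-only-secret";
    uint8_t tag[8];
    bind_tag(secret, sizeof(secret) - 1, 0x12345678u, 0x9abcdef0u, tag);
    for (int i = 0; i < 8; i++)
        printf("%02x", tag[i]);
    printf("\n");
    return 0;
}

Leaving the NATted fields out keeps the tag stable end to end, but it also 
means the tag by itself can't distinguish two clients behind the same NAT 
unless the nonce/secret exchange does - which is the false-positive 
question.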

>>> * Channel C would need some way to refer to the segments of channel D
>>> that was robust against re-segmentation.
>>
>> Which means it won't work in the current Internet, because
>> resegmentation is also widespread (though evil, IMO).
>
> Well, resegmentation isn't usually a problem on a SYN anyway.
>
> We can't improve on the general pain caused by resegmentation. I was
> only talking about the delta pain that my strawman would suffer, where a
> resegmenting function sees data on my SYN and doesn't understand my
> strawman, so it won't patch up the damage in my strawman protocol that
> it would have patched up by altering sequence numbers in a regular
> single SYN with a payload.

You're right, in the sense that SYN data would typically be discarded 
anyway at the endpoint if it implemented SYN cookies.

> I think this is a corner case.

It is, but you raised it in a general sense; were you thinking of it in 
terms of just SYNs or other segments?

>>> * The main problem is that the two channels don't share fate; a control
>>> packet can be delayed relative to the point in the data stream at which
>>> it is attempting to exert control, possibly for a RTT if it is lost and
>>> has to be retransmitted. However, this is not insurmountable. The
>>> control protocol could include a mode to "synthesise shared fate", by
>>> making the data channel buffer data until an associated control segment
>>> had arrived. This would duplicate the latency impact of a loss or delay
>>> on either channel, but one can imagine mitigations that would consign
>>> this latency impact to corner cases.
>>> * It's a bit of a mess, but that comes with the territory when trying to
>>> fix legacy protocol problems.
>>> * The internal stack architecture seems to require a trombone back down
>>> into the kernel from user-space, but that is not insurmountable - a shim
>>> within the kernel on port 80 (for example) could redirect control
>>> channel data across to the "TCP control channel module" in the kernel,
>>> while passing non-control channel connections to user-space.
>>>
>>> 2) Build on LOIC
>>> ______________________
>>> Long option with invalid checksum <draft-yourtchenko-tcp-loic-00>
>>
>> Won't work through current NATs, which won't recalculate the checksum
>> properly.
>
> I'm building on the general idea of using something invalid, not using
> the checksum idea specifically.

Anything considered invalid by an endpoint might be checked by an 
overzealous midbox.
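
To make the checksum point above concrete: the TCP checksum is computed 
over a pseudo-header that includes the IP addresses, so any box that 
rewrites addresses has to touch the checksum one way or another. A minimal 
sketch of the standard ones'-complement sum (illustrative values only):

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* RFC 1071-style ones'-complement sum over the segment, seeded with the
 * pseudo-header sum. */
static uint16_t csum16(const uint8_t *data, size_t len, uint32_t sum)
{
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)data[i] << 8 | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Pseudo-header contribution: src/dst address, protocol, TCP length. */
static uint32_t pseudo_sum(uint32_t src, uint32_t dst, uint16_t tcp_len)
{
    uint32_t s = 0;
    s += src >> 16; s += src & 0xffff;
    s += dst >> 16; s += dst & 0xffff;
    s += 6;                       /* IPPROTO_TCP */
    s += tcp_len;
    return s;
}

int main(void)
{
    uint8_t segment[20] = { 0 };  /* dummy TCP header, checksum field zeroed */
    uint16_t before = csum16(segment, sizeof(segment),
                             pseudo_sum(0xCB007101u, 0xC6336401u, 20));
    uint16_t after  = csum16(segment, sizeof(segment),
                             pseudo_sum(0x0A000001u, 0xC6336401u, 20)); /* src rewritten */
    printf("correct checksum before NAT: %04x, after: %04x\n", before, after);
    return 0;
}

NATs typically adjust the field incrementally (RFC 3022) and other 
middleboxes may recompute it outright; either way it is not an untouched 
end-to-end field, which is why a "deliberately invalid" checksum is a 
fragile signal.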

>>> At 18:53 22/05/2014, John Leslie wrote:
>>>>    That's too big of a change to ask folks to believe it safe.
>>>
>>> When I read an idea, I don't take it as set in stone and just find a
>>> hole and dismiss it. I see it as a potential stepping stone to a
>>> solution and think about how it could be done better. In fact, Andrew
>>> Yourtchenko said that was the intention of his write-up of LOIC.
>>>
>>> I believe that an approach worth further thought would be a mixture of
>>> the control channel idea and the invalid checksum idea. I'm thinking of:
>>> * a pure control SYN (C) sent first, then a base SYN (D) sent
>>> back-to-back, both to the same port.
>>
>> Again, please don't assume back-to-back means anything.
>
> See above.
> Actually, even though order isn't guaranteed, it is important to get them
> in the right order to optimise performance (much as TCP doesn't assume
> perfect order, but it goes smoothest with reasonable ordering).

Right, but I want to emphasize that things emitted back-to-back could 
end up being delivered days later (e.g., when a link goes down, some 
routers simply hold their queued packets until it returns). The impact 
in those cases needs to be considered.

>>> * SYN C would contain something invalid to cause a legacy TCP stack or
>>> legacy app to discard it (and hopefully less probability that a
>>> middlebox would), e.g. a payload that is invalid for the application
>>> protocol on the port.
>>
>> But so will a NAT, etc.
>
> No. You can design a payload that has headers that a NAT will ignore but
> an end-host will have to process.

You can try, but if an end-host looks at it you can be sure that a 
midbox will do the same - if not today, then soon.

> E.g. HTTP connection control headers
> newly defined for this protocol that would be ignored by NATs, without
> any other HTTP behaviour, so a legacy host does nothing at the app layer.

Those are rewritten by "helpful" midboxes all the time, FWIW.

>>> * there would be additional TCP options in the payload of SYN C to be
>>> added to the TCP options that arrived separately on the base SYN
>>> * The control SYN could be bound cryptographically to the base SYN (as
>>> already described).
>>> * It could use the shim-like control stack arrangement described earlier.
>>>
>>> By focusing solely on extending the SYN, this would avoid the ongoing
>>> shared fate problems that a separate control channel suffers throughout
>>> the connection. There would still be shared fate problems with 2 SYNs
>>> (e.g. the two SYNs get re-ordered), but the protocol would have to be
>>> designed to be robust to that (naively, SYN D could include a new TCP
>>> option that told a new stack to wait a few ticks for a SYN C, but that
>>> would be vulnerable to meddleboxes). Not insurmountable.
>>
>> AFAICT, it is.
>
> Still?

If it means halving the performance of legacy servers and delaying 
legacy clients, yes.

Joe