Re: [tcpm] ] WGLC for draft-ietf-tcpm-fastopen-05

Yuchung Cheng <ycheng@google.com> Sun, 24 November 2013 00:42 UTC

From: Yuchung Cheng <ycheng@google.com>
Date: Sat, 23 Nov 2013 16:41:34 -0800
To: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Cc: "tcpm@ietf.org" <tcpm@ietf.org>, "draft-ietf-tcpm-fastopen@tools.ietf.org" <draft-ietf-tcpm-fastopen@tools.ietf.org>

On Tue, Nov 19, 2013 at 9:21 AM, Gorry Fairhurst <gorry@erg.abdn.ac.uk> wrote:
>
> See response in-line...
>
> Many of these are small items,
>
> Somewhere down there is the significant point that I do not yet understand,
> i.e. what are the potential CC issues that need to be evaluated - especially
> how TFO responds to severe path overload? (see below)
>
> Gorry
>
>
> On 09/11/2013 18:06, Yuchung Cheng wrote:
>>
>> Hi Gorry,
>>
>> Sorry for the late response. Too many good sessions and dinner parties
>> during this IETF for me to respond promptly.
>>
>> On Sat, Nov 2, 2013 at 4:09 PM, Gorry <gorry@erg.abdn.ac.uk> wrote:
>>>
>>>
>>> I have reviewed the current version of TFO and have the comments below.
>>> This is rather long ...
>>>
>>> Nits in Section 1 - I think the document here talks of servers and
>>> clients from a TCP endpoint perspective. This is normal for TCP, but the
>>> document also has implications for application designers, who may have a
>>> very different view of what is meant by the words "server" and "client".
>>> To avoid doubt, maybe you should explain this in the intro?
>>
>> They are already explained in the "Terminology" section.
>>
> OK, but it's terse!
>
>>>
>>> Page 3, para 2 is not clear. I think it is just the detail of the
>>> wording, but it would be good to see clear text.
>>>
> In RFC 6936 we wrote (albeit slightly different problem, with a different
> protocol):
>         "IPv6 nodes MUST provide a way for the application/protocol to
>         indicate the set of ports that will be enabled to send datagrams
>         with a zero UDP checksum.  This may be implemented by enabling a
>         transport mode using a socket API call when the socket is
>         established, or by a similar mechanism.  It may also be
>         implemented by enabling the method for a pre-assigned static
>         port"
>
>
>>> Last para section 2: to me, the requirements are not clear.
>>>
>>> I think this piece of text means: TCP stacks should NOT enable TCP-TFO by
>>> default. The stack should provide a global flag to enable this. An
>>> application wishing to use this must enable each TCP socket on a per
>>> service/per port basis.
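Gorry's reading of the Section 2 requirements amounts to two switches that must both be on: a global stack setting that is off by default, and a per-service (per-port) opt-in by the application. A minimal sketch of that policy; the class, field and method names are hypothetical. (On Linux the roughly analogous controls are the net.ipv4.tcp_fastopen sysctl and the TCP_FASTOPEN socket option, which this sketch does not call.)

```python
from dataclasses import dataclass, field


@dataclass
class TcpStack:
    """Hypothetical model of the two switches described above."""
    # Global (administrator) switch: TFO is off unless explicitly enabled.
    tfo_enabled: bool = False
    # Listening ports whose applications have opted in to TFO.
    tfo_ports: set = field(default_factory=set)

    def enable_on_port(self, port: int) -> None:
        """An application opts in for one service/port."""
        self.tfo_ports.add(port)

    def accepts_syn_data(self, port: int) -> bool:
        """Data in a SYN is accepted only if both switches are on."""
        return self.tfo_enabled and port in self.tfo_ports
```

Opting in a port has no effect until the global switch is also set, which is the "not enabled by default" behaviour Gorry describes.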
>>>
>>> I think the other comments/questions may be more significant.
>>>
>>> Section 2 also suggests that the main change is to allow data in the SYN.
>>> I have noted that some people think this is the only change, but it is
>>> not. The bigger implication is that the document proposes to allow new
>>> data to be sent during the 3WHS. This change needs to be noted more
>>> clearly: sending an IW of data before there is an ACK on the return path
>>> is a significant change to TCP semantics and to its behaviour under heavy
>>> load.
>>
>>
>> will revise sec2 and discuss w/ you before we publish the revision.
>>
> Let's do that.
>
>>
>>>
>>> Page 5, Fast Open, bullet 3: This says the server can send data before
>>> the 3WHS completes. I am curious what this actually means: is it implied
>>> that the use is restricted to standards-track mechanisms? I.e., I am
>>> looking here for some explicit understanding of whether this implies IW3.
>>> Or is the experiment linked to both a proposal for IW10 and this new
>>> experiment?
>>
>> will add "initial congestion window of " before the "... the
>> congestion control in .." to make it more clear.
>> or
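To make the IW3 question concrete: if the server's burst before the 3WHS completes were bounded by the standards-track initial window of RFC 3390, that bound is IW = min(4*SMSS, max(2*SMSS, 4380 bytes)). A minimal sketch of the arithmetic, assuming that bound applies; the function names are hypothetical:

```python
def rfc3390_initial_window(smss: int) -> int:
    """RFC 3390 upper bound on the initial window, in bytes."""
    return min(4 * smss, max(2 * smss, 4380))


def full_segments_in_iw(smss: int) -> int:
    """Number of full-sized segments that fit in that window."""
    return rfc3390_initial_window(smss) // smss


for smss in (536, 1460):
    print(smss, rfc3390_initial_window(smss), full_segments_in_iw(smss))
# 536 2144 4
# 1460 4380 3
```

With a typical Ethernet-derived SMSS of 1460 bytes this is the three-segment "IW3" referred to above.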
>>
>>>
>>> Last para of 4.1.3: the text says /should include remote port numbers
>>> too/. Can we be clear which port numbers are used? Is this the server port
>>> number as seen by the client? Actually, I do not understand what is
>>> intended by the paragraph; can you provide an example to be sure what is
>>> meant?
>>
>> A "remote" port number of the client means the "server port", but we will
>> replace "remote" with "server" to avoid confusion.
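The clarified intent is that the client's Fast Open state, including the negative cache of servers where SYN-data failed, is keyed by the server port as well as the server IP address, so that a failure on one service does not disable Fast Open to another service on the same host. A minimal sketch; the class, its methods and the one-hour negative TTL are illustrative assumptions, not taken from the draft:

```python
import time
from typing import Dict, Optional, Tuple

ServerKey = Tuple[str, int]  # (server IP address, server port)


class FastOpenClientCache:
    """Hypothetical client-side TFO state, keyed by server IP and server port."""

    def __init__(self, negative_ttl: float = 3600.0) -> None:
        self.cookies: Dict[ServerKey, bytes] = {}
        self.blocked_until: Dict[ServerKey, float] = {}
        self.negative_ttl = negative_ttl  # illustrative value, not from the draft

    def store_cookie(self, ip: str, port: int, cookie: bytes) -> None:
        self.cookies[(ip, port)] = cookie

    def record_failure(self, ip: str, port: int, now: Optional[float] = None) -> None:
        """Negatively cache this server port after SYN-data to it failed."""
        now = time.monotonic() if now is None else now
        self.blocked_until[(ip, port)] = now + self.negative_ttl

    def cookie_for(self, ip: str, port: int,
                   now: Optional[float] = None) -> Optional[bytes]:
        """Return a cookie to use for SYN-data, or None to send a regular SYN."""
        now = time.monotonic() if now is None else now
        key = (ip, port)
        if self.blocked_until.get(key, float("-inf")) > now:
            return None
        return self.cookies.get(key)
```

A failure recorded for port 8080 leaves SYN-data to port 80 on the same address unaffected.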
>>
> That works.
>
>
>> The paragraph is added because of your suggestion to add negative caching.
>>
> OK
>
>
>>>
>>> Section 4.3.2. I hoped this would say something explicit about not
>>> sending SYN segments that are larger than the default MTU. This seems like a
>>> bad thing to do.
>>
>> It's a bad thing to send any segment larger than the MTU. Why is that
>> specific to Fast Open?
>>
> ... Let me check:
> 1. TFO could allow PMTU to be cached across sessions.
> 2. Multi-homing/ Multipath routing could result in a different path for a
> new connection.
> 3. If TFO used a cached PMTU with the SYN (or before the SYN-ACK), then
> these segments could be black-holed.
>
> - While the local MTU is known, the remote (or path) MTU can only be
> validated once there is a path... To me, allowing TFO to use a larger MTU
> seems like an unnecessary extra complication to have to deal with.
I am still not getting your concern. Path MTU is cached on a per
"path" basis. If you use a different/new path you start with some
default MTU. btw, there is no Section 4.3.2 in the latest draft.

We only send SYN-data when we've done a previous handshake w/ the
server and have both the cookie and the path information. The path
(MTU) can change anyway, just as in a normal data session, and the
same risk holds. If SYN-data is lost, we will retransmit a regular SYN
(and cache the new MTU).
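The two positions on sizing SYN-data can be contrasted directly: use whatever MSS is cached from the previous handshake, or never exceed a default size until the path has been revalidated. A minimal sketch, assuming an IPv4 path and taking the RFC 1122 default MSS of 536 bytes as the "default"; the function, the `conservative` flag and the `SynPlan` type are hypothetical, and the option space used by the cookie is ignored for simplicity:

```python
from dataclasses import dataclass
from typing import Optional

# MSS assumed for IPv4 when none has been learned (RFC 1122 default).
IPV4_DEFAULT_MSS = 536


@dataclass
class SynPlan:
    payload: bytes            # data carried in the SYN (empty = regular SYN)
    cookie: Optional[bytes]   # Fast Open cookie option, if any


def plan_first_syn(data: bytes, cookie: Optional[bytes],
                   cached_mss: Optional[int], conservative: bool) -> SynPlan:
    """Decide what the first SYN carries.

    SYN-data is attempted only with state from a previous handshake
    (a cookie and a cached MSS). With conservative=True the payload never
    exceeds the default MSS, whatever was cached.
    """
    if cookie is None or cached_mss is None:
        return SynPlan(b"", None)  # no prior state: regular SYN
    limit = min(cached_mss, IPV4_DEFAULT_MSS) if conservative else cached_mss
    return SynPlan(data[:limit], cookie)
```

The conservative variant gives up some payload in the SYN in exchange for not relying on a path MTU that may no longer hold.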

>
>
>>>
>>> Why are the examples only for web? This is a TCP spec.
>>
>> Sure, there are many more applications that can benefit from it, but the
>> Web is what motivates our work, and the application is complex enough to
>> demonstrate many aspects of Fast Open. If you have another good example,
>> you are welcome to suggest one.
>>
> ... I could, OR it may be wise to just say something similar to what you
> have written above at the end of Section 1?
>
>
>>>
>>> There are two sections 6.3.2.
>>>
>>> Second Section 6.3.2 - I think this applies to SSL/TLS in general,
>>> doesn't it? ... I think the section should address the general case, and
>>> then provide an example for https, rather than assuming a web-centric use.
>>
>>
>> Yes, it is generic, but anyone who knows SSL/TLS well enough knows that.
>>
> So could you name the section TLS over TCP ... and then give HTTP over TLS
> as an example?
Sure and that's a good idea.

>
>
>>>
>>> Section 7.1, first para, last line: please check the text; this
>>> pathology is not necessarily malicious!!!
>>
>> Sure, we can s/malicious/pathological. Does this deserve three !!!?
>>
> Fine (indeed, we want those nice people to fix all their middleboxes).
>
>
>>> The issue seems to be mainly related to NAPT where ports are used to
>>> discriminate senders?
>>> Also the shared NAPT can result in a range of RTT and PMTU to the same
>>> IP, but this is already possible with ECMP routing.
>>>
> So in this case there is a need to understand the implications of using a
> cached RTT and a cached PMTU value. What concerns me is that we need
> experimental data to show that this is safe, not just to realise that this
> may have no benefit for TFO.
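Why NAPT matters here: the draft describes the cookie as a MAC tag that authenticates the client's source IP address, so every host behind one NAPT address obtains, and can use, the same cookie, and any per-IP RTT or PMTU the server caches is shared in the same way. A sketch of that property; HMAC-SHA256 truncated to 8 bytes is a stand-in MAC chosen for this example, not the draft's construction, and the function names are hypothetical:

```python
import hashlib
import hmac

COOKIE_LEN = 8  # bytes; illustrative


def make_cookie(server_key: bytes, client_ip: str) -> bytes:
    """Stand-in cookie: truncated HMAC-SHA256 over the client's source IP only."""
    tag = hmac.new(server_key, client_ip.encode("ascii"), hashlib.sha256)
    return tag.digest()[:COOKIE_LEN]


def cookie_is_valid(server_key: bytes, client_ip: str, cookie: bytes) -> bool:
    """Constant-time check that the cookie matches this source IP."""
    return hmac.compare_digest(make_cookie(server_key, client_ip), cookie)
```

Because ports never enter the computation, two clients sharing the public address 198.51.100.7 are indistinguishable to the server.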
>
>
>>> The lack of issues listed in section 7 raises serious concerns for me. I
>>> do not regard the list that is currently provided as the only, or even
>>> the most significant, reasons why this is deemed to be experimental work.
>>> Indeed, the lack of identification of potential CC issues is why this
>>> document is, in my opinion, NOT ready to complete a WGLC: if the issues
>>> are not listed, how can we evaluate whether the method is safe enough to
>>> experiment with, or whether it can later be evaluated properly?
>>>
>>> Here are some additional issues I would like considered by the group and
>>> if the group is happy these may be safe enough to recommend testing in the
>>> general internet, then I think we should document that these are things to
>>> be confirmed.
>>>
>>> 1) I raised on the list a negative impact of using the SYN as a
>>> probe to check for new option support. The examples I gave were ECN
>>> interactions and IPv6 probing. ECN, if used, will probe for ECN support in
>>> the SYN segment. If we include data on the SYN and then there is a lack of
>>> response, the sender assumes that both TFO and ECN are not
>>> supported. This is an implication of the model we use to negotiate
>>> features that may be black-holed. It is not an issue with the method, but
>>> I think we need to be clear that this is something that may happen. A
>>> similar case exists if a client attempts TFO to a server using IPv6: is it
>>> assumed that the path does not support IPv6 or TFO (or actually both)? It
>>> does not happen when multiple options are negotiated.
>>
>> if a SYN-data is dropped, TFO will retransmit a regular SYN (with ECN if
>> used).
>>
> May be good, I agree - but I do not see this in the text (RFC 3168 would
> allow this, it kind of suggests clearing ECT on retx).
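The fallback described above can be written as a rule for building the retransmitted SYN. A sketch under two assumptions: "regular SYN" is read as carrying neither data nor the cookie option, and ECN setup (ECE and CWR set, per RFC 3168) is modelled as a single flag. The type and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SynSegment:
    payload: bytes                     # SYN-data, if any
    fast_open_cookie: Optional[bytes]  # TFO cookie option, if any
    ecn_setup: bool                    # ECE and CWR set (RFC 3168 ECN-setup SYN)


def syn_retransmission(first: SynSegment, also_drop_ecn: bool = False) -> SynSegment:
    """SYN to send after the first (SYN-data) attempt timed out.

    Data and cookie are removed, giving a regular SYN. ECN setup is kept
    unless the sender also takes RFC 3168's option of retrying with a
    non-ECN-setup SYN; dropping both at once means a later success cannot
    tell which feature (if either) was being black-holed.
    """
    return SynSegment(payload=b"", fast_open_cookie=None,
                      ecn_setup=first.ecn_setup and not also_drop_ecn)
```

Keeping ECN on the first retransmission changes only one variable at a time, which is what lets the sender attribute a black-hole to the SYN-data rather than to ECN.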
>
>>>
>>> 2) the draft proposes caching RTT per path and using this for SYN
>>> timeout. This updates a standards track RFC, and will have implications in
>>> some network paths that may seriously degrade performance, in other cases it
>>> may improve performance. Examples of negative performance are cases where an
>>> initial packet can take much longer to process than a subsequent packet.
>>> This can and does occur in bandwidth-on-demand L2 networks, where the first
>>> packet causes path setup. It also occurs to some extent with policing and
>>> routing devices that build cached state from the first packet in a flow. I
>>> think this is one of the reasons why the RFC-series has to date not
>>> described a shortened RTT method for the SYN.
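For reference, RFC 6298 uses an initial RTO of 1 second; from a first RTT sample R it sets SRTT = R and RTTVAR = R/2, computes RTO = SRTT + max(G, 4*RTTVAR), and says an RTO below 1 second SHOULD be rounded up to 1 second. Whether seeding the SYN timer from a cached RTT is more aggressive than standard behaviour therefore hinges on that floor: a cached 100 ms RTT gives a 300 ms SYN timeout without it and 1 s with it. A sketch, assuming the cached RTT is treated as a first measurement; the function and its flag are hypothetical:

```python
def syn_rto(cached_rtt=None, granularity=0.001, one_second_floor=True):
    """SYN retransmission timeout in seconds.

    Without a cached sample, RFC 6298's initial RTO of 1 s is used. With a
    cached RTT R treated as a first measurement: SRTT = R, RTTVAR = R/2,
    RTO = SRTT + max(G, 4 * RTTVAR), rounded up to 1 s if the floor applies.
    """
    if cached_rtt is None:
        return 1.0
    srtt = cached_rtt
    rttvar = cached_rtt / 2
    rto = srtt + max(granularity, 4 * rttvar)
    return max(rto, 1.0) if one_second_floor else rto
```

On the bandwidth-on-demand paths described above, where the first packet triggers path setup, a 300 ms timer would fire well before a slow first exchange completes, while the 1 s floor leaves more margin.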
>>
>>
>>>
>>> Another issue is that previous RTT (or PMTU) is not necessarily a good
>>> indication of future value in a different 5tuple, since ECMP etc can lead to
>>> very different path characteristics for a session with a different port
>>> number. The current text therefore proposes an experiment that is beyond
>>> what was previously considered safe. It does not specify a way to avoid the
>>> server making the same mistake next connection.
>>
>>
>> We'll just take RTT caching out of the text if that makes people happy.
>>
> Maybe, although I'm not pushing for that, but it may be OK - if it is an
> orthogonal issue that TFO does not rely upon. If you want to mention caching
> these, I think the doc needs to explain more.
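Why a value cached for one connection may not describe the next one: routers commonly choose among equal-cost next hops by hashing header fields, so two connections to the same server that differ only in the client's ephemeral port can take different paths. A toy model; real routers use vendor-specific hash functions, and SHA-256 here is only a stand-in:

```python
import hashlib


def ecmp_path(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
              proto: int, n_paths: int) -> int:
    """Toy ECMP choice: hash the 5-tuple and pick one of n_paths next hops."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


# Same client and server, different ephemeral source ports (TCP = protocol 6).
paths = {ecmp_path("192.0.2.10", p, "203.0.113.5", 443, 6, 4)
         for p in range(40000, 40100)}
print(sorted(paths))
```

Each connection is pinned to one path, but successive connections are spread across paths, so an RTT or PMTU cached per destination IP may have been measured on a different path.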
>
>
>>>
>>> 3) seeding the SYN RTT from a lower value makes the sender more
>>> aggressive in heavily congested networks. The sender is also made more
>>> aggressive by sending IW data packets before there is any indication the
>>> path can contain even a single data segment. This is a significant change to
>>> standard behaviour. If the proposal is to use IW 3 (as standard) it still
>>> needs to be called out. If the proposal is to use a larger experimental
>>> number then I have concerns here that this is a significant change that
>>> needs an automated recovery method to prevent significant collateral damage
>>> on capacity-limited paths - there needs to be a way to stop a server doing
>>> this each time resulting in recurring loss!
>>
>>
>> when the IW of packets are not all acked, loss recovery is triggered
>> and window is reduced. this is part of standard CC.
>>
>> Also, when you use "significant" four times, please back it up with good
>> theory or data or both. Overusing that word does not make it more
>> significant.
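The standard reaction referred to here is RFC 5681: after a retransmission timeout, ssthresh = max(FlightSize/2, 2*SMSS) and cwnd is set to the loss window of one full-sized segment. A sketch of that arithmetic, assuming the whole initial window was lost so that recovery is driven by the timer (no duplicate ACKs arrive); the function name is hypothetical. Note that this reduction takes effect only after the initial burst has already entered the network.

```python
def after_rto(flight_size: int, smss: int) -> tuple:
    """RFC 5681 reaction to a retransmission timeout.

    Returns (cwnd, ssthresh) in bytes: ssthresh = max(FlightSize/2, 2*SMSS),
    and cwnd drops to the loss window, one full-sized segment.
    """
    ssthresh = max(flight_size // 2, 2 * smss)
    cwnd = smss
    return cwnd, ssthresh
```

For a lost 3-segment window of 1460-byte segments this gives (1460, 2920); for a lost 10-segment window it gives (1460, 7300).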
>>
> So let's see if we can agree on what happens:
>
> - If the path is very lossy - severe congestion:
> TCP standard sends one SYN segment with some probability of loss, and if it
> sees loss backs off and retransmits the SYN. As (over)load increases, new
> sessions add 40B and the new sessions will often defer start-up, controlling
> their rate.
>
> - If the TFO path is very lossy - severe congestion: IW full-sized segments
> are sent. Each new session does this, adding to the load.
> Each new session adds MSS*IW = 6000B (IW=3) or 15000B (IW=10).
>
> I concede this applies to severe congestion. In this case, it seems 150x
> more traffic to me, and this is before CC is engaged. Or is this wrong?
I can make a similar case for any recent proposal that will blow up the
network, e.g., rto-restart, newcwv. The question is how practical and
common those cases are. If the link is thin, not doing Fast Open with
the same IW will experience similar heavy losses. Let's face it: if the
capacity is not enough to handle the demand, there is little that can
be done.

The concern has been raised by Michael Scharf, and step 6 in
Section 4.2.2 does mention it. I did ask the list for a better
solution, but I never heard back any suggestions.
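The per-session burst sizes in this exchange can be recomputed directly. Assuming 1500-byte full-sized packets and the 40-byte SYN used in the message above (both assumptions), three segments come to 4500 B, four to 6000 B, and ten to 15000 B, i.e. 112.5, 150 and 375 times the bytes of a bare SYN:

```python
SYN_BYTES = 40       # IPv4 + TCP headers, no options (figure used above)
PACKET_BYTES = 1500  # assumed full-sized packet


def burst_bytes(iw_segments: int) -> int:
    """Bytes sent in an initial window of iw_segments full-sized packets."""
    return iw_segments * PACKET_BYTES


def ratio_to_syn(iw_segments: int) -> float:
    """How many times larger that burst is than a bare SYN."""
    return burst_bytes(iw_segments) / SYN_BYTES


for iw in (3, 4, 10):
    print(iw, burst_bytes(iw), ratio_to_syn(iw))
# 3 4500 112.5
# 4 6000 150.0
# 10 15000 375.0
```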
>
>
>>>
>>> 4) Are there any changes to PMTUD behaviour that need experience? PMTU
>>> was naturally cached and I am not sure if this changes the behaviour or not.
>>> It does of course allow a packet with a larger MTU to be sent before the
>>> path has been initially validated via the 3WHS (I.e. After a path change
>>> while idle, this can generate TCP segments in a SYN that are larger than the
>>> receiver advertised MSS <I presume the recipient will reset the connection?>
>>> - I suspect this is not crucial, but I would also assume that the SYN itself
>>> should not use an increased PMTU value to carry the data. Is this correct?
>>
>> In our implementation, we use whatever MTU is cached (possibly discovered
>> by a prior PMTU discovery) to send SYN-data.
>>
> One option could be for the document to identify this as a topic to be
> explored (like RTT caching, this does not seem to be a core technique
> required for Fast Open). But if you mention caching these, I think the
> document should also say that there are potential concerns and a need
> for experimentation?
Not sure RTT or PMTU caching should be part of TFO experimentation b/c
the potential concerns seem orthogonal to the core protocol of Fast
Open.

>
>
>>>
>>> Are there other case where this change will impact the network for
>>> specific path characteristics?
>>>
>>> Have others thought through these issues and feel they are ok? Or at
>>> least ok for experimental deployment?
>>>
>>> Gorry
>>>
> P.S. A tiny nit on 7.2:
>
> "Careful experimentation is necessary to evaluate if cookie-less TFO
>    is practical."
> - I think the experiment that is needed by the IETF is "to evaluate if
> cookie-less TFO could be safe for deployment in the general Internet."
Will make that change.
>
