Re: [tcpm] Review of draft-ietf-tcpm-rtorestart-02.txt

Per Hurtig <per.hurtig@kau.se> Mon, 09 June 2014 06:39 UTC

Message-ID: <5395569F.6070303@kau.se>
Date: Mon, 09 Jun 2014 08:39:27 +0200
From: Per Hurtig <per.hurtig@kau.se>
To: "Zimmermann, Alexander" <Alexander.Zimmermann@netapp.com>
References: <38DA60FD-7629-4FCB-BD19-830FC7F5232C@netapp.com> <8CEBB7AD-88F3-45CA-A1C6-D5306EEE02B3@kau.se> <CF98B1E0-1168-482C-B946-081A56FB70E0@netapp.com>
In-Reply-To: <CF98B1E0-1168-482C-B946-081A56FB70E0@netapp.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/tcpm/5IzguuxVK8DlCasya_5tssSnKs0
Cc: "draft-ietf-tcpm-rtorestart@tools.ietf.org" <draft-ietf-tcpm-rtorestart@tools.ietf.org>, "tcpm@ietf.org Extensions" <tcpm@ietf.org>
Subject: Re: [tcpm] Review of draft-ietf-tcpm-rtorestart-02.txt

Sorry for the late reply,

On 2014-05-21 10:28, Zimmermann, Alexander wrote:
> Hi Per,
>
> Am 15.05.2014 um 12:00 schrieb Per Hurtig <per.hurtig@kau.se>:
>
>> Hi Alexander
>>
>> thanks for the review, please see my comments inline…
>>
>>
>> On 13 May 2014, at 11:57, Zimmermann, Alexander <Alexander.Zimmermann@netapp.com> wrote:
>>
>>> Hi folks,
>>>
>>> first of all thank you for writing this draft. The draft is clearly written: the
>>> problem is well described, the doc is self-contained (it’s not necessary to read
>>> dozens of other RFCs to understand the problem), and the authors state why the
>>> doc is experimental and what further experiments are needed for being a STD.
>>> However, on the technical side I see some open points. In detail:
>>>
>>> * Sec 1, 4. para: „a considerable number of flows have such properties“
>>> Can you back this up with some numbers? This is exactly the point I raised at the
>>> (last?) IETF. I would like to see some data on how severe the problem is we would
>>> like to fix.
>>>
>>
>> For short flows (e.g. web) there are a number of references in the draft. These are
>> the primary target for the mechanism. We’ll elaborate some more on the importance
>> for other types of traffic.
>
> Your draft focuses on two different kinds of flows: a) short flows and b) flows
> w/ low transmission rates. For the former you give two references, [RJ10] and [FDT13],
> but none for the latter. You say that „a considerable number of flows have such
> properties“ and my question was whether you can back this up w/ some numbers. Additionally,
> Yuchung reports that he wasn’t able to see much improvement while implementing your
> scheme. Do you have some data that you can share to show how much your scheme helps to
> improve the latency?

Yes, we just recently completed a series of experiments and are 
currently finishing the analysis. The results look good, but we'll come 
back shortly to the list with details.

>
> Also I’m not sure if we really focus on short flows. Is it not rather the case that we
> are concentrating on tail losses? Your scheme will not only kick in for short flows, it
> gets activated at the end of any stream. Or am I missing something here? (BTW [FDT13]
> reports that 77% of the RXmits are RTO-based, it doesn’t say that 77% are short flows.)
>

That's correct, the focus is indeed tail loss.

>>
>>
>>> * Sec 1.1: please change this subsection to a section (1.1 => 2) and also
>>> introduce your new state variable rrthresh here
>>>
>>
>> I don’t get why the section should be renumbered. The “terminology” is often the
>> last subsection of the introduction of most drafts and RFCs.
>
> This was only a nit. In German we have the rule that you cannot introduce a
> subsection x.1 if you don’t have a second subsection in the same section.
> After searching the web a little, it seems that this also holds in English
> (http://www.cs.berkeley.edu/~pattrsn/talks/writingtips.html). Anyway, it was only
> a nit :-)
>

Well, you do have a point. A single subsection is not a beautiful thing, 
regardless of language :)

>> Furthermore, it seems
>> slightly overkill to introduce the variable there. RFCs that manage many more
>> variables, e.g. RFC6298, do not explicitly introduce any variable in this section.
>
> and RFC 5681 or RFC 6675 are counterexamples :-) Anyway, the question is more whether
> or not we need an additional state variable in the TCB. If you think that there are
> cases where 'rrthresh' should not be initialized with 'dupthresh + 1', then we should
> introduce the state variable and explain why we need it. Otherwise, I recommend
> using 'dupthresh + 1' instead of 'rrthresh'.
>

ok, will fix.

>>
>>> * Sec 1, 5. para: „Spurious timeouts typically degrade the performance of flows
>>> with multiple bursts of data, as a burst following a spurious timeout might
>>> not fit within the reduced congestion window (cwnd)“
>>>
>>> This is (only) true with respect to your algo, not in general. The general
>>> problem of a spurious timeout is the cwnd reduction, the go-back-N
>>> retransmissions, … See RFC 3522. After reading section 4.2 of your draft I
>>> understand what you want to see here. Please rephrase the section and maybe
>>> add the spurious RTO RFC as reference.
>>>
>> What is meant is the actual cwnd reduction. We should clarify this.
>>
>>
>>> * Sec 3: Suppose FlightSize is 2 and you have exactly one segment to send,
>>> your algo doesn’t trigger since step 2.b isn’t true. Bug? I would say yes.
>>>
>> Good catch! The question is if it’s worth fixing since the algorithm will
>> become more complex
>
> Not really. You can 1) restart your RTO (as usual), 2) transmit new data,
> 3) re-arm your RTO with RTO - T_earliest if FlightSize < 4. In terms of „re-arming“
> we did more or less the same in RFC6069…
>

Will check your RFC and see if it's applicable in our scenario, thanks.
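
The corner case and the re-arming suggestion quoted above can be illustrated
with a minimal sketch. It is not the draft's pseudocode: the function names,
the millisecond units, and the exact form of the "draft-style" check (few
segments in flight and no unsent data, which is how step 2.b is described in
this thread) are assumptions made for illustration only.

```python
DUPTHRESH = 3
RRTHRESH = DUPTHRESH + 1  # 'dupthresh + 1', i.e. 4, as discussed in the thread


def rto_after_ack(rto_ms, t_earliest_ms, flight_size, unsent_segments):
    """Assumed draft-style check, evaluated when an ACK arrives.

    The timer is shortened by the time already spent on the earliest
    outstanding segment only if few segments are in flight AND no unsent
    data is waiting (the condition referred to as step 2.b).
    """
    if flight_size < RRTHRESH and unsent_segments == 0:
        return rto_ms - t_earliest_ms
    return rto_ms


def rto_rearm_after_send(rto_ms, t_earliest_ms, flight_size,
                         unsent_segments, cwnd):
    """Reviewer's variant: restart as usual, send new data, then re-arm.

    The FlightSize test is applied after the new data has been sent, so a
    pending segment no longer prevents the shorter timer by itself.
    """
    # Hypothetical send step: transmit whatever the window allows.
    sendable = min(unsent_segments, max(cwnd - flight_size, 0))
    flight_size += sendable
    if flight_size < RRTHRESH:
        return rto_ms - t_earliest_ms
    return rto_ms


# Corner case from the review: FlightSize is 2, one segment left to send.
print(rto_after_ack(1000, 300, 2, 1))            # timer stays at the full RTO
print(rto_rearm_after_send(1000, 300, 2, 1, 3))  # timer is shortened
```

With a cwnd of 3 the single pending segment is sent, FlightSize becomes 3,
and only the second variant shortens the timer. Whether this case is common
enough to justify the change is the open question in the thread.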

>> and the situation you mention is really a corner case that
>> requires either (i) the cwnd to be exactly 3 segments large;
>
> less than 4, no?
>

After you reported this, we went through it rather carefully, and I 
actually think it's *exactly* 3, but it was a while ago and I might 
remember this incorrectly.

>> or (ii) having a
>> packet written to the socket just between previous data transmission and the
>> arrival of the acknowledgment.
>
> Yes
>
>>
>> Furthermore, this mechanism is only an optimization to the standard timer. So
>> if it doesn’t work in this particular scenario it won’t break anything.
>
> Sure, but if we exclude a traffic pattern that we could easily include
> with a small algo change, we should make that change. Nevertheless, you
> are right: if we are only talking about rare corner cases (and I don’t know
> the answer), we can ignore them.
>
>> It will
>> just not be triggered.
>>
>>> * Sec 3: Why is condition 2.b different from the early retransmit
>>> condition 2.b or 3.b? Is there any specific reason why we exclude the
>>> advertised receive window part from the condition?
>>>
>> Because the advertised window can be small in situations where it’s not
>> preferable to use RTO restart. For instance, in the middle of a transfer where
>> it’s better to wait for fast/early retransmit to kick in.
>
> Ah, OK I see. Could you please explain this in the draft? Is it valid to say
> that RTO restart tries to cover all cases where early retransmit doesn’t work?
> I have the feeling that I do not fully understand which cases should be covered by
> which algo.
>

Will clarify it!



Thank you once again for reviewing,
Per

>>
>>> * Sec 3: IMO the algo/doc is too much Linux driven. I would like to see a
>>> segment-based *and* byte-based version of the algo, like RFC 5827.
>>>
>> Yes, this comment was given by several people at the last IETF. We will update
>> the draft to address this.
>>
>>> * General: IMO it would be a little bit easier to read the doc if you gave the
>>> algo a proper name. When reading „RTO restart“ I sometimes had trouble
>>> knowing whether you mean your algo or the „action“ of restarting the RTO.
>>>
>> Agreed, this will be fixed.
>>
>>
>>
>> Thank you,
>> Per Hurtig
>>
>>>
>>> Alex
>>>
>>>
>>>
>>
>>
>