Re: [tcpm] draft-ietf-tcpm-rtorestart-04

"Scharf, Michael (Michael)" <michael.scharf@alcatel-lucent.com> Mon, 01 December 2014 15:31 UTC

From: "Scharf, Michael (Michael)" <michael.scharf@alcatel-lucent.com>
To: Anna Brunstrom <anna.brunstrom@kau.se>, "tcpm@ietf.org" <tcpm@ietf.org>
Date: Mon, 01 Dec 2014 15:30:51 +0000
Message-ID: <655C07320163294895BBADA28372AF5D166AC52C@FR712WXCHMBA15.zeu.alcatel-lucent.com>
References: <20141027231119.14539.21372.idtracker@ietfa.amsl.com> <544ED2B8.4020208@kau.se> <655C07320163294895BBADA28372AF5D1669053B@FR712WXCHMBA15.zeu.alcatel-lucent.com> <545F5B71.5090906@kau.se> <655C07320163294895BBADA28372AF5D16699B3C@FR712WXCHMBA15.zeu.alcatel-lucent.com> <547C7D0C.8060500@kau.se>
In-Reply-To: <547C7D0C.8060500@kau.se>
Archived-At: http://mailarchive.ietf.org/arch/msg/tcpm/DmDnQ6Qt5Y892g9xx7LwIEt-RuU
Subject: Re: [tcpm] draft-ietf-tcpm-rtorestart-04

Hi Anna,

> > Some follow-up comments. I omit the parts for which there seems to be agreement on the need for a document update.
> >
> >>> * Section 4: I also wonder if the text should define "previously unsent segments". The term "previously unsent data" is very well-known in the RFC series. A quick search revealed that "previously unsent segments" is usually only used to refer to situations when segments with this property have indeed been transmitted. Is my understanding correct that draft-ietf-tcpm-rtorestart-04 uses this term differently? I think it here refers to data that will be sent *in future*. In that case, I wonder if it is really clear how the sender can determine the number of "previously unsent segments". For instance, that number may depend on the segmentation strategy of the sender, in particular for applications that use many small write() calls. In this case a TCP stack may e.g. decide to send partial segments. I am not sure if the actual number of segments created from the "unsent data" is really known in advance always. Or do I miss something?
> >>
> >> You are correct that "previously unsent segments" refers to data that will be sent in the future. This is the same meaning as in RFC 3517 and RFC 4653. A definition of this term could be added to Section 2, but I am not sure if it is needed? To better understand the possible conflict in terminology, could you point to some of RFCs that use the term to refer to already transmitted segments? Intuitively that seems a bit strange to me.
> > I didn't say that there is a conflict in terminology. I might be wrong, but to me draft-ietf-tcpm-rtorestart-04 defines the term "previously unsent segments" as a *new* TCP state variable.
> 
> Ok, I understood your first mail as two separate issues: 1) that the
> term "previously unsent segment" had a different meaning from other
> RFCs, 2) it was not clear how to calculate the number of previously
> unsent segments. So my pointer to RFC 3517 and RFC 4653 was with respect
> to issue 1 above.  But if I understand correctly now, your question is
> mainly about the second issue of how one can estimate the number of
> previously unsent segments.
> 
> We can add "previously unsent segments" to the terminology section as a
> variable, with the discussion on how to estimate the number of
> previously unsent segments then still contained in section 5.3. Is this
> what you are looking for?

Yes, I am looking for an exact definition of this parameter, given that it is used in the algorithm. I think the document should clearly define the variable "previously unsent segments" and discuss how a TCP stack can determine it. That definition should cover both a TCP stack that stores unsent data internally in a segment-based buffer and a TCP stack that uses a byte-based send buffer.

> > The value of that variable may be obvious for certain architectures of a TCP stack, in particular if segmentation into packets occurs on socket write() calls, i.e., before data is actually sent to network. As far as I recall, this is how the Linux stack works internally. But I'd like to avoid that an RFC describes an algorithm that can only be implemented for certain architecture of a TCP stack.
> >
> > RFC 3517 and RFC 4653 either refer to the "amount of previously unsent data" (which doesn't need to be further defined IMHO), or the "transmission of previously unsent segments" (which is not what draft-ietf-tcpm-rtorestart-04 refers to). For instance, here is the exact wording of the closest matches I could find in RFC 3517 and RFC 4653:
> >
> > RFC 3517 Section 4 Processing and Acting Upon SACK Information
> >
> >       "(2) If no sequence number 'S2' per rule (1) exists but there
> >            exists available unsent data and the receiver's advertised
> >            window allows, the sequence range of one segment of up to SMSS
> >            octets of previously unsent data starting with sequence number
> >            HighData+1 MUST be returned."
> >
> >     => "previously unsent data", NOT segments
> >
> > RFC 3517 Section 5 Algorithm Details
> >
> >    "Note: The first and second
> >     duplicate ACKs can also be used to trigger the transmission of
> >     previously unsent segments using the Limited Transmit algorithm
> >     [RFC3042]."
> >
> >     => "transmission of previously unsent segments", NOT the number of segments queued
> >
> > RFC 4653 Section 2 NCR Description
> >
> >    "The first Extended Limited Transmit variant, Careful Limited
> >     Transmit, calls for the transmission of one previously unsent
> >     segment, in response to duplicate acknowledgments, for every two
> >     segments that are known to have left the network."
> >
> >     => "transmission of previously unsent segments", NOT the number of segments queued
> >
> > RFC 4653 Section 3.2 Terminating Extended Limited Transmit and Preventing Bursts
> >
> >    "(T.3) A TCP is now permitted to transmit previously unsent data as
> >           allowed by cwnd, FlightSize, application data availability, and
> >           the receiver's advertised window."
> >
> >     => "previously unsent data", NOT segments
> >
> > RFC 4653 Section 3.3. Extended Limited Transmit
> >
> >    "(E.2) If the comparison in equation (1), below, holds and there are
> >           SMSS bytes of previously unsent data available for
> >           transmission, then the sender MUST transmit one segment of SMSS
> >           bytes."
> >
> >     => "previously unsent data", NOT segments
> >
> > Unless there is a definition of "number of previously unsent segments", I think this variable has to be defined and it has to be explained how it is calculated.
> >
> >> As discussed in section 5.3, the number of unsent segments can be estimated from the amount of unsent data and the SMSS. Note that we are only talking about data that is already available in the send queue, and later write calls can of course create additional segments. I guess a TCP stack that has one SMSS of data in its send queue is free to send this as multiple segments if it wants to, but I assume that would not be the normal behavior. As the estimate is not very critical it is not really a problem if this happens occasionally.
> > If Nagle's algorithm is disabled, sending out multiple segments would be the normal behavior in this case.
> 
> Why would turning off Nagle's algorithm make the sender split up data
> ALREADY AVAILABLE in the send queue into multiple small-sized segments
> when it gets a chance to send? Note that the sender is blocked by the
> cwnd when the ACK arrives. While technically allowed I guess, I do not
> think this would be normal.

As far as I can tell, by "send queue" you mean a byte-based stack, right?

Example: Let us assume an application that has disabled Nagle and then issues three socket write() calls of 100B each, while CWND currently does not allow sending a new packet.
  write(100B)
  write(100B)
  write(100B)

How many unsent segments does the TCP stack then have?

For a byte-based stack, the most likely answer seems to be one segment of 300B, given that the data has to be queued.

But for a segment-based stack, it depends on whether the stack internally creates a segment immediately after each write() call. In this case, the stack could create *three* segments of 100B and schedule all of them for transmission. Alternatively, the stack could detect during/after the second and third write() call that the queued data is less than the MSS and merge the segments into one, resulting again in one segment of 300B. The latter operation could imply memory copies in the data structure that stores the segment, and this may be more expensive than scheduling three segments. Because of this effort, I could imagine that some TCP stacks would have three segments of unsent data in this example, in particular if Nagle is disabled.
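
To make the three-write example concrete, here is a minimal Python sketch of the three buffering strategies discussed above. It is a toy model under stated assumptions, not the code of any real TCP stack: the function names are hypothetical, and the SMSS of 1460 bytes is only an illustrative value.

```python
import math

SMSS = 1460  # assumed sender maximum segment size in bytes (illustrative value)


def unsent_segments_byte_based(writes, smss=SMSS):
    """Byte-based send buffer: queued bytes are cut into segments only at send time."""
    queued = sum(writes)
    return math.ceil(queued / smss)


def unsent_segments_per_write(writes, smss=SMSS):
    """Segment-based buffer that creates segments on every write() and never merges."""
    return sum(math.ceil(w / smss) for w in writes)


def unsent_segments_coalescing(writes, smss=SMSS):
    """Segment-based buffer that fills the last partial segment before opening a new one."""
    segments = []  # payload size in bytes of each queued segment
    for w in writes:
        remaining = w
        if segments and segments[-1] < smss:
            take = min(smss - segments[-1], remaining)
            segments[-1] += take
            remaining -= take
        while remaining > 0:
            take = min(smss, remaining)
            segments.append(take)
            remaining -= take
    return len(segments)


writes = [100, 100, 100]  # three write() calls of 100B while CWND blocks sending
print(unsent_segments_byte_based(writes))   # 1
print(unsent_segments_per_write(writes))    # 3
print(unsent_segments_coalescing(writes))   # 1
```

In this model the same 300B of queued data counts as one or three "previously unsent segments" depending only on the stack-internal strategy.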

I have not checked recently what TCP stacks actually do in that case, i.e., my example may not be realistic. But there are both segment- and byte-based TCP stacks in the wild, and the document should explain whether or not the parameter "number of previously unsent segments" depends on the stack-internal segmentation strategy.

> > I could imagine that disabling Nagle's algorithm could be pretty common among thin stream applications. This is why the draft should IMHO discuss if disabling Nagle's algorithm affects the method.
> 
> I do not think it affects it. We can of course say this. Do you think
> this would be good/needed? (There are of course a lot of things
> available in TCP/SCTP that do not affect RTO restart.)

Unless I am missing something, disabling Nagle's algorithm does affect the method, depending on the segmentation strategy used by the stack. See the example above.

But my key question is about the dependency on the segmentation strategy. Nagle is just one aspect of this.

> >> The addition of "previously unsent segments" is there to capture a corner case (see issue raised by Alex in http://www.ietf.org/mail-archive/web/tcpm/current/msg08780.html), so it will only be occasionally used. If we then get an incorrect estimate, say a sender with one SMSS of data would instead send this as two SMSS, and the algorithm triggers "incorrectly", this would be equivalent to using a RTOR threshold of rrthresh + 1 so nothing critical happens. (In some cases rrthresh + 1 may even be better.)
> > If a wrong parameter for the estimate is possible but not critical, this should be explained in the document.
> 
> Ok, we can mention this in section 5.3.

OK
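
The "rrthresh + 1" argument quoted above can be checked with a short Python sketch. It assumes, which is not spelled out in this thread, that the algorithm applies when the number of outstanding segments plus the number of previously unsent segments is below rrthresh. The function name and the threshold value of 4 are hypothetical and used only for illustration.

```python
RRTHRESH = 4  # assumed threshold value, for illustration only


def rtor_triggers(outstanding, prevunsnt, rrthresh=RRTHRESH):
    """Assumed trigger condition: only a few segments are outstanding or queued."""
    return outstanding + prevunsnt < rrthresh


# If the estimate misses exactly one segment (e.g. one SMSS of data is later
# sent as two segments), the decision based on the estimate is the same as the
# decision based on the true count with a threshold of rrthresh + 1.
for outstanding in range(6):
    for actual_unsent in range(1, 6):
        estimate = actual_unsent - 1
        with_estimate = rtor_triggers(outstanding, estimate)
        with_actual = rtor_triggers(outstanding, actual_unsent, RRTHRESH + 1)
        assert with_estimate == with_actual
```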

> > In general, the cited email from Alex stated:
> >
> > "* Sec 3: IMO the algo/doc is too much Linux driven. I would like to see a
> >    segment-based *and* byte-based version of the algo, like RFC 5827."
> >
> > I fully agree to this statement and I think it has not been completely addressed in -04 so far.
> >
> >>> * Section 9: I wonder if an attacker could try to send certain ACK patterns to increase the risk of timeouts e.g. at a Web Server, leveraging the smaller RTO value. If successful, the RTOs could significantly reduce application performance. Probably, such an attacker could cause much more harm by other means, i.e., this may not be a significant security risk. But the current security analysis is very short.
> >>
> >> The section is very short, but this is because we do not really see any new security issues.
> >>
> >> If I understand your example above correctly, the receiver is trying to somehow reduce performance for its own application by manipulating the ACK pattern. But if you are trying to mess up things for your own application you can deliver incorrect data, throw away any data received or do whatever you want so I do not think trying to possibly slow down the sender is the thing to worry about in this attack scenario. If someone finds a relevant security issue it should of course be documented, but I do not think it makes sense to put in something like the above just to make the section longer.
> > I was thinking e.g. about an off-path attacker that tries to inject ACKs into the TCP connection to trigger RTOs to harm application performance. I am not arguing that this is really a relevant attack scenario, but it seems not impossible to try that.
> 
> If you inject early ACKs you may be able to affect the RTO calculation
> and thereby increase the risk of spurious retransmissions, but this
> applies to RFC 6298 as well so there is nothing RTO restart specific
> about the attack. You do not increase the risk of a spurious RTO with
> RTO restart by sending an early ACK, as this will be adjusted for in the
> RTO restart calculation: the timer always triggers one RTO after the
> transmission no matter when the ACK arrives.
>
> What should probably be clarified in this section is that the security
> considerations in RFC 6298 still apply with RTO restart, but that
> no additional security problems are known. Unless someone detects some
> specific problems of course. We will clarify this.

Thanks

Michael