Re: [tcpm] Updated Section 3 of draft-ietf-tcpm-1323bis

"Scheffenegger, Richard" <rs@netapp.com> Wed, 15 May 2013 19:07 UTC

From: "Scheffenegger, Richard" <rs@netapp.com>
To: "mallman@icir.org" <mallman@icir.org>, "tcpm (tcpm@ietf.org)" <tcpm@ietf.org>
Cc: "David Borman (David.Borman@quantum.com)" <David.Borman@quantum.com>

Hi,

Before I again post an update with disputed sections of text, here is my current version of Section 3. Note that the title was also changed to shift the emphasis away from the RTTM/RTO update part.

I've tried to keep all the comments reflected in this updated text, but this might not yet reflect the consensus!

Best regards,
  Richard



3.  TCP Timestamp Option

3.1.  Introduction

   TCP measures the round trip time (RTT), primarily for the purpose of
   arriving at a reasonable value for the Retransmission Timeout (RTO)
   timer interval.  Accurate and current RTT estimates are necessary to
   adapt to changing traffic conditions, while a conservative estimate
   of the RTO interval is necessary to minimize spurious RTOs.

   When [RFC1323] was originally written, it was perceived that taking
   RTT measurements for each segment, and also during retransmissions,
   would contribute to reducing spurious RTOs, while maintaining the
   timeliness of necessary RTOs.  At the time, RTO was also the only
   mechanism to make use of the measured RTT.  It has since been shown
   that taking more RTT samples has only a very limited effect on
   optimizing RTOs [Allman99].

   This document makes a clear distinction between the round trip time
   measurement (RTTM) mechanism, and subsequent mechanisms using the RTT
   signal as input, such as RTO (see Section 3.4).

   The timestamp option is important when large receive windows are
   used, to allow the use of the PAWS mechanism (see Section 4).
   Furthermore, the option is useful for all TCPs, since it simplifies
   the sender and allows the use of additional optimizations such as
   Eifel ([RFC3522], [RFC4015]) and others.

3.2.  Timestamp Option

   TCP is a symmetric protocol, allowing data to be sent at any time in
   either direction, and therefore timestamp echoing may occur in either
   direction.  For simplicity and symmetry, we specify that timestamps
   always be sent and echoed in both directions.  For efficiency, we
   combine the timestamp and timestamp reply fields into a single TCP
   Timestamp Option.

   TCP Timestamp Option (TSopt):

   Kind: 8

   Length: 10 bytes

          +-------+-------+---------------------+---------------------+
          |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
          +-------+-------+---------------------+---------------------+
              1       1              4                     4



   The Timestamp Option carries two four-byte timestamp fields.  The
   Timestamp Value field (TSval) contains the current value of the
   timestamp clock of the TCP sending the option.

   The Timestamp Echo Reply field (TSecr) is valid if the ACK bit is set
   in the TCP header; if it is valid, it echoes a timestamp value that
   was sent by the remote TCP in the TSval field of a Timestamp option.
   When TSecr is not valid, its value MUST be zero.  However, a value of
   zero does not imply that TSecr is invalid.  The TSecr value will
   generally be from the most recent Timestamp Option that was received;
   however, there are exceptions that are explained below.
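
   As an illustration only, the option layout above could be encoded
   and decoded as in the following Python sketch.  The function names
   pack_tsopt and unpack_tsopt are hypothetical and not part of this
   document; the fields are assumed to be in network byte order, as
   for other TCP header fields.

```python
import struct

TSOPT_KIND = 8      # option kind from the diagram above
TSOPT_LEN = 10      # total option length in bytes
MASK32 = 0xFFFFFFFF


def pack_tsopt(tsval, tsecr, ack_set):
    """Build the 10-byte Timestamp Option (hypothetical helper).

    TSecr is only valid when the ACK bit is set; otherwise it is
    sent as zero.
    """
    if not ack_set:
        tsecr = 0
    return struct.pack("!BBII", TSOPT_KIND, TSOPT_LEN,
                       tsval & MASK32, tsecr & MASK32)


def unpack_tsopt(data):
    """Parse a Timestamp Option and return (TSval, TSecr)."""
    if len(data) != TSOPT_LEN:
        raise ValueError("Timestamp Option must be 10 bytes long")
    kind, length, tsval, tsecr = struct.unpack("!BBII", data)
    if kind != TSOPT_KIND or length != TSOPT_LEN:
        raise ValueError("not a Timestamp Option")
    return tsval, tsecr
```

   Note that a decoded TSecr of zero is not by itself evidence of an
   invalid echo; validity depends on the ACK bit of the segment.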

   A TCP MAY send the Timestamp option (TSopt) in an initial <SYN>
   segment (i.e., segment containing a SYN bit and no ACK bit), and MAY
   send a TSopt in other segments only if it received a TSopt in the
   initial <SYN> or <SYN,ACK> segment for the connection.

   Once TSopt has been successfully negotiated (sent and received)
   during the <SYN>, <SYN,ACK> exchange, TSopt MUST be sent in every
   non-<RST> segment for the duration of the connection, and SHOULD be
   sent in a <RST> segment (see Section 4.2 for details).  If a non-
   <RST> segment is received without a TSopt, a TCP MAY drop the segment
   and send an <ACK> for the last in-sequence segment.  A TCP MUST NOT
   abort a TCP connection if a non-<RST> segment is received without a
   TSopt.

   If a TSopt is received on a connection where TSopt was not negotiated
   in the initial three-way handshake, the TSopt MUST be ignored and the
   packet processed normally.
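
   The receive-side rules of the preceding paragraphs can be
   summarized in a small decision function.  This is a minimal sketch
   under stated assumptions: the function name and the returned
   action strings are hypothetical, and the handling of a <RST>
   without TSopt is simply "process" here, since the details are left
   to Section 4.2.

```python
def tsopt_action(negotiated, has_tsopt, is_rst):
    """Return what a receiver may do with an incoming segment.

    negotiated: TSopt was exchanged in the <SYN>, <SYN,ACK> handshake
    has_tsopt:  the incoming segment carries a TSopt
    is_rst:     the incoming segment has the RST bit set
    """
    if not negotiated:
        # An unexpected TSopt MUST be ignored; the segment itself is
        # processed normally.
        if has_tsopt:
            return "ignore_option_and_process"
        return "process"
    if has_tsopt or is_rst:
        # TSopt is only a SHOULD on <RST> segments, so a missing
        # option is tolerated there (assumption, see Section 4.2).
        return "process"
    # Non-<RST> segment without TSopt: the receiver MAY drop it and
    # ACK the last in-sequence segment.  Aborting the connection is
    # never one of the possible outcomes.
    return "may_drop_and_ack"
```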

   In the case of crossing <SYN> segments where one <SYN> contains a
   TSopt and the other doesn't, both sides MAY send a TSopt in the
   <SYN,ACK> segment.

   TSopt is required for the two mechanisms described in Sections 3.3
   and 4.2.  There are also other mechanisms that rely on the presence
   of the TSopt, e.g., [RFC3522].  If a TCP stops sending TSopt at any
   time during an established session, it interferes with these
   mechanisms.  This update to [RFC1323] makes explicit the previous
   assumption (see Section 4.2) that each TCP segment must carry a
   TSopt once it has been negotiated.

3.3.  The RTTM Mechanism

   RTTM places a Timestamp Option in every segment, with a TSval that is
   obtained from a (virtual) "timestamp clock".  Values of this clock
   MUST be at least approximately proportional to real time, in order to
   measure actual RTT.

   These TSval values are echoed in TSecr values in the reverse
   direction.  The difference between a received TSecr value and the
   current timestamp clock value provides an RTT measurement.

   When timestamps are used, every segment that is received will contain
   a TSecr value.  However, these values cannot all be used to update
   the measured RTT.  The following example illustrates why.  It shows a
   one-way data flow with segments arriving in sequence without loss.
   Here A, B, C... represent data blocks occupying successive blocks of
   sequence numbers, and ACK(A),... represent the corresponding
   cumulative acknowledgments.  The two timestamp fields of the
   Timestamp Option are shown symbolically as <TSval=x,TSecr=y>.  Each
   TSecr field contains the value most recently received in a TSval
   field.

              TCP A                                      TCP B

                              <A,TSval=1,TSecr=120> ----->

                   <---- <ACK(A),TSval=127,TSecr=1>

                              <B,TSval=5,TSecr=127> ----->

                   <---- <ACK(B),TSval=131,TSecr=5>

                . . . . . . . . . . . . . . . . . . . . . .

                              <C,TSval=65,TSecr=131> ---->

                   <---- <ACK(C),TSval=191,TSecr=65>

                                  (etc.)

   The dotted line marks a pause (60 time units long) in which A had
   nothing to send.  Note that this pause inflates the RTT which B could
   infer from receiving TSecr=131 in data segment C. Thus, in one-way
   data flows, RTTM in the reverse direction measures a value that is
   inflated by gaps in sending data.  However, the following rule
   prevents a resulting inflation of the measured RTT:

   RTTM Rule: A TSecr value received in a segment MAY be used to update
              the averaged RTT measurement only if the segment advances
              the left edge of the send window, i.e.  SND.UNA is
              increased.

   Since TCP B is not sending data, the data segment C does not
   acknowledge any new data when it arrives at B.  Thus, the inflated
   RTT sample is not used to update B's RTT estimate.
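
   A minimal Python sketch of the RTTM Rule follows.  The names
   rtt_sample and seq_gt are hypothetical, and the sketch assumes
   32-bit modular arithmetic for both the timestamp clock and the
   sequence space.  It returns a sample only when SND.UNA advances.

```python
MASK32 = 0xFFFFFFFF


def seq_gt(a, b):
    """True if 32-bit sequence number a is after b (modulo 2**32)."""
    d = (a - b) & MASK32
    return 0 < d < 0x80000000


def rtt_sample(now, tsecr, old_snd_una, new_snd_una):
    """Apply the RTTM Rule to one received segment.

    now:         current timestamp clock value of the receiving TCP
    tsecr:       TSecr value carried in the received segment
    old_snd_una: SND.UNA before processing the segment
    new_snd_una: SND.UNA after processing the segment

    Returns the RTT sample in timestamp clock ticks, or None when
    the segment does not advance the left edge of the send window.
    """
    if not seq_gt(new_snd_una, old_snd_una):
        return None
    return (now - tsecr) & MASK32
```

   With the values of the example, TCP B receives segment C carrying
   TSecr=131.  Assuming B's clock reads 191 at that moment,
   rtt_sample(191, 131, una, una) yields None, so the inflated
   difference of 60 ticks is discarded because SND.UNA is unchanged.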


3.4.  Updating the RTO value

   [Ludwig00] and [Floyd05] have highlighted the problem that an
   unmodified RTO calculation, which is updated with per-packet RTT
   samples, will truncate the path history too soon.  This can lead to
   an increase in spurious retransmissions when the path properties
   vary on the order of a few RTTs, but a high number of RTT samples
   are taken on a much shorter timescale.

   Implementers should note that with timestamps, multiple RTTMs can be
   taken per RTT.  The [RFC6298] RTO estimator has weighting factors,
   alpha and beta, based on an implicit assumption that at most one RTTM
   will be sampled per RTT.  When multiple RTTMs per RTT are available
   to update the RTO estimator, this implicit assumption must be taken
   into account.  An implementation suggestion is detailed in
   Appendix G.


{ 3.5. - former 3.4. - not changed }


Appendix G.  RTO calculation modification

   This document RECOMMENDS that the standard RTO calculation
   ([RFC6298]) be modified in the following way.  We roughly know how
   many samples a congestion window worth of data will yield, not
   accounting for ACK compression and ACK losses.  Such events will
   result in more history of the path being reflected in the final value
   for RTO, and are not critical.  This modification will approximate
   the RTO estimator described in [RFC6298], regardless of how many
   samples are taken per window:

      ExpectedSamples = ceiling(FlightSize / (SMSS * 2))

      alpha' = alpha / ExpectedSamples

      beta' = beta / ExpectedSamples

   Note that the factor 2 in ExpectedSamples is due to "Delayed ACKs".

   Instead of using alpha and beta in the algorithm of [RFC6298], use
   alpha' and beta':

      RTTVAR <- (1 - beta') * RTTVAR + beta' * |SRTT - R'|

      SRTT <- (1 - alpha') * SRTT + alpha' * R'

      (for each sample R')
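
   The formulas above can be sketched in Python as follows.  This is
   an illustration under stated assumptions, not a reference
   implementation: the function names are hypothetical, alpha = 1/8
   and beta = 1/4 are the values given in [RFC6298], the lower bound
   of one expected sample (for an empty FlightSize) is an added
   guard, and the first-sample initialization of [RFC6298] is
   omitted.

```python
import math

ALPHA = 1 / 8   # weighting factor alpha from [RFC6298]
BETA = 1 / 4    # weighting factor beta from [RFC6298]


def expected_samples(flight_size, smss):
    """ExpectedSamples = ceiling(FlightSize / (SMSS * 2)).

    The lower bound of 1 is an assumption of this sketch, so that an
    empty FlightSize does not cause a division by zero later on.
    """
    return max(1, math.ceil(flight_size / (smss * 2)))


def update_rtt(srtt, rttvar, r, flight_size, smss):
    """Fold one RTT sample r into (SRTT, RTTVAR) with scaled gains."""
    n = expected_samples(flight_size, smss)
    alpha_p = ALPHA / n
    beta_p = BETA / n
    # RTTVAR is updated first, using the old SRTT, as in [RFC6298].
    rttvar = (1 - beta_p) * rttvar + beta_p * abs(srtt - r)
    srtt = (1 - alpha_p) * srtt + alpha_p * r
    return srtt, rttvar
```

   With a FlightSize of ten segments, ExpectedSamples is 5, so each
   sample moves SRTT and RTTVAR by one fifth of the standard gain.
   With at most two segments in flight, ExpectedSamples is 1 and the
   update is identical to the unmodified [RFC6298] estimator.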