comments on RFC1323.bis

Vern Paxson <vern@ee.lbl.gov> Tue, 05 May 1998 00:02 UTC

Message-Id: <199805050002.RAA13212@daffy.ee.lbl.gov>
To: David Borman <dab@bsdi.com>
Cc: tcplw@bsdi.com
Subject: comments on RFC1323.bis
Date: Mon, 04 May 1998 17:02:05 -0700
From: Vern Paxson <vern@ee.lbl.gov>

Here are unified diffs to the nroff source that fix some typos and phrasing,
and that also point out some (minor) issues that need to be addressed.  The
latter are marked as comments in the source, except where the comment
falls inside a display.

Some other issues:

	* The document doesn't specify the relationship between the
	  options.  For example, if you use window scaling, then is
	  it a MUST that you use timestamps too?  Or a SHOULD?  Or ... ?

	* I added some MUSTs and SHOULDs (and the obligatory RFC 2119 cite
	  to go with them).  But I may have missed some places where
	  these should be used.

	* A significant technical issue: the current RTTM discussion
	  does not mention anything about altering the constants used
	  for the exponentially-weighted moving average when updating
	  the estimate of RTT.  Sally Floyd has pointed out that using
	  the usual constants is incorrect when the RTT is updated more
	  than once per window; their use will result in an RTT estimate
	  that is much more sensitive to transient changes in RTT.

	  I think at a minimum the document needs to point out that
	  there is an open issue here.

	* It needs a "security considerations" section.  I sketched
	  some thoughts on what might go in one.
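To make the RTTM point concrete, here is a minimal Python sketch. It assumes the conventional SRTT gain alpha = 1/8, and a hypothetical count w of RTT samples per window that all report the same new RTT. It computes how far SRTT moves over one window when every ACK feeds the estimator. It also computes one possible per-sample gain that restores the once-per-window behavior. That adjusted gain is only an illustration; neither the draft nor Floyd's comment specifies it.

```python
# Sketch of Floyd's point about the SRTT gain when RTT is sampled on every ACK.
# Assumptions: the usual gain alpha = 1/8; w is a hypothetical number of
# RTT samples taken per window, all reporting the same new RTT.

ALPHA = 1.0 / 8.0


def update_srtt(srtt: float, sample: float, alpha: float) -> float:
    """One EWMA step: srtt <- (1 - alpha) * srtt + alpha * sample."""
    return (1.0 - alpha) * srtt + alpha * sample


def srtt_after_window(srtt: float, rtt: float, alpha: float, w: int) -> float:
    """Apply w per-ACK updates, each seeing the same RTT sample."""
    for _ in range(w):
        srtt = update_srtt(srtt, rtt, alpha)
    return srtt


def effective_gain(alpha: float, w: int) -> float:
    """Fraction of the way SRTT moves toward the new RTT over one window."""
    return 1.0 - (1.0 - alpha) ** w


def per_sample_alpha(alpha: float, w: int) -> float:
    """Per-sample gain whose effect over w samples equals alpha per window.

    Solves 1 - (1 - a) ** w == alpha for a.  One possible adjustment only.
    """
    return 1.0 - (1.0 - alpha) ** (1.0 / w)


if __name__ == "__main__":
    for w in (1, 2, 8, 32):
        print(f"w={w:2d}  per-window gain={effective_gain(ALPHA, w):.3f}  "
              f"adjusted alpha={per_sample_alpha(ALPHA, w):.4f}")
```

With w = 8, one window in which the RTT doubles moves SRTT about 66% of the way to the new value under the usual constant. Under the adjusted per-sample gain it moves 12.5%, the same as a single per-window update.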

- Vern


--- rfc1323.bis.ORIG	Mon May  4 16:55:48 1998
+++ rfc1323.bis	Mon May  4 16:54:46 1998
@@ -72,6 +72,9 @@
 There is no one-line answer to the question: "How fast can TCP go?".
 There are two separate kinds of issues, performance and reliability,
 and each depends upon different parameters.  We discuss each in turn.
+.sp
+(This document uses terms such as MUST and SHOULD.
+See RFC 2119 for the exact interpretation of these terms.)
 .IN +0.3i
 .LT "1.1  TCP Performance" 0.3i
 .sp
@@ -127,8 +130,10 @@
 corresponding increase of the probability of more than one packet per
 window being dropped.  This could have a devastating effect upon the
 throughput of TCP over an LFN.  In addition, if a congestion control
-mechanism based upon some form of random dropping were introduced into
-gateways, randomly spaced packet drops would become common, possible
+mechanism based upon some form of random dropping (such as those discussed
+in RFC2309)
+were introduced into
+gateways, randomly spaced packet drops would become common, possibly
 increasing the probability of dropping more than one packet per
 window.
 .sp
@@ -318,7 +323,7 @@
 However, some buggy TCP implementation might be crashed by the first
 appearance of an option on a non-SYN segment.  Therefore, for each of
 the extensions defined below, TCP options will be sent on non-SYN
-segments only after an exchange of options on the the SYN segments has
+segments only after an exchange of options on the SYN segments has
 indicated that both sides understand the extension.  Furthermore, an
 extension option will be sent in a <SYN,ACK> segment only if the
 corresponding option was received in the initial <SYN> segment.
@@ -333,6 +338,12 @@
 segment, adding 12 bytes to the 20-byte TCP header.  We
 believe that the bandwidth saved by reducing unnecessary
 retransmissions will more than pay for the extra header bandwidth.
+.\" How does the Timestamps option help with reducing unnecessary
+.\" retransmissions?  It will only do so if current RTO estimates
+.\" are too low.  While some TCP implementations suffer from this
+.\" problem, most do not.  In particular, the coarse-grained
+.\" BSD RTO algorithm is quite conservative.  So this argument
+.\" is not right.
 .sp
 There is also an issue about the processing overhead for parsing the
 variable byte-aligned format of options, particularly with a
@@ -342,7 +353,7 @@
 and if it is verified then use a fast path.  Hosts that use this
 canonical layout will effectively use the options as a set of
 fixed-format fields appended to the TCP header.  However, to retain the
-philosophical and protocol framework of TCP options, a TCP must be
+philosophical and protocol framework of TCP options, a TCP MUST be
 prepared to parse an arbitrary options field, albeit with less
 efficiency.
 .sp
@@ -415,7 +426,7 @@
 with the SYN bit on and the ACK bit off).  It may also be sent in a
 <SYN,ACK> segment, but only if a Window Scale option was received in
 the initial <SYN> segment.  A Window Scale option in a segment without
-a SYN bit should be ignored.
+a SYN bit SHOULD be ignored.
 .sp
 The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself
 is never scaled.
@@ -577,7 +588,7 @@
 set in the TCP header; if it is valid, it echos a timestamp value that
 was sent by the remote TCP in the TSval field of a Timestamps option.
 .mc 
-When TSecr is not valid, its value must be zero.
+When TSecr is not valid, its value MUST be zero.
 .mc
 The TSecr value will generally be from the most recent Timestamp option
 that was received; however, there are exceptions that are explained
@@ -608,6 +619,8 @@
 represent the corresponding cumulative acknowledgments.  The two
 timestamp fields of the Timestamps option are shown symbolically as
 <TSval= x,TSecr=y>.  Each TSecr field contains the value most recently
+.\" "most recently received" conflicts with 3.4, which spells out
+.\" exactly which value is kept
 received in a TSval field; these echoed values. labelled "TS.Recent",
 are shown in parentheses.
 .nf
@@ -625,7 +638,11 @@
 4.  (130)  <--- <ACK(B),TSval=130,TSecr=6>             (6)
                 
        . . . ( Pause for 60 timestamp clock ticks ) . . . . 
-                  
+
+.\" This formatting is messed up.  Epochs 4 and 5 appear
+.\" twice.  In the first 4, TS.Recent is 130, but in the
+.\" subsequent 5, we have TSecr=120.  Also, why in 5 is
+.\" TSval=1??
       
 5.  (130)             <C,TSval=1,TSecr=120> --->       (1)
          
@@ -636,6 +653,8 @@
 5.            ... <--- <y,ACK(A),TSval=191,TSecr=5>    (5)
 
 
+.\" What is the point of this second, less precisely specified
+.\" example??
 
    TCP  A                                          TCP B
       
@@ -685,8 +704,8 @@
 .LT (A) 0.5i
 Delayed ACKs.
 .sp
-Many TCP's acknowledge only every Kth segment out of a group of
-segments arriving within a short time interval; this policy is known
+RFC1122 says that TCPs SHOULD acknowledge at least every 2nd full-sized segment.
+The policy of acknowledging only every 2nd segment is known
 generally as "delayed ACKs".  The data-sender TCP must measure the
 effective RTT, including the additional time due to delayed ACKs, or
 else it will retransmit unnecessarily.  Thus, when delayed ACKs are in
@@ -704,7 +723,7 @@
 situation the sender should be conservative about retransmission.
 Furthermore, it is better to overestimate than underestimate the RTT.
 An ACK for an out-of-order segment should therefore contain the
-timestamp from the most recent segment that advanced the window.
+timestamp from the most recent (in-order) segment that advanced the window.
 .sp
 The same situation occurs if segments are re-ordered by the network.
 .sp
@@ -734,7 +753,8 @@
 SEG.TSval >= TSrecent and SEG.SEQ <= Last.ACK.sent
 .IN -0.3i
 then SEG.TSval is copied to TS.Recent; otherwise, it is
-ignored.
+ignored.  Note that this test replaces Karn's algorithm [Karn87],
+required by section 4.2.3.1 of RFC1122.
 .sp
 .LT (3) 0.5i
 When a TSopt is sent, its TSecr field is set to the current TS.Recent
@@ -743,7 +763,10 @@
 The following examples illustrate these rules.  Here A, B, C...
 represent data segments occupying successive blocks of sequence
 numbers, and ACK(A),...  represent the corresponding acknowledgment
-segments.  Note that ACK(A) has the same sequence number as B.  We show
+segments.  Note that ACK(A) has the same sequence number as B, because
+the first sequence number of B is the one immediately following the
+last sequence number included in A.
+We show
 only one direction of timestamp echoing, for clarity.
 .IN +0.5i
 .LT o 0.5i
@@ -857,8 +880,8 @@
 connection will be discarded by the normal 3-way handshake and
 sequence number checks of TCP.
 .sp
-It is recommended that RST segments NOT carry timestamps, and that RST
-segments be acceptable regardless of their timestamp.  Old duplicate
+RST segments SHOULD NOT carry timestamps, and RST
+segments SHOULD be accepted regardless of their timestamp.  Old duplicate
 RST segments should be exceedingly unlikely, and their cleanup function
 should take precedence over timestamps.
 .IN +0.3i
@@ -869,7 +892,7 @@
 .IN +0.5i
 .LT R1) 0.5i
 If there is a Timestamps option in the arriving segment and SEG.TSval <
-TS.Recent and if TS.Recent is valid (see later discussion), then treat
+TS.Recent and if TS.Recent is valid (see later discussion in 4.2.3), then treat
 the arriving segment as not acceptable:
 .IN +0.5i
 Send an acknowledgement in reply as specified in RFC-793 page 69 and
@@ -939,12 +962,15 @@
 If B's retransmission was triggered by the "fast retransmit" algorithm,
 i.e., by duplicate ACKs, then the queued segments that caused these
 ACKs must have been received already.
+.\" .sp
+.\" Even if a segment were delayed past the RTO, the Fast Retransmit
+.\" mechanism [Jacobson90c] will cause the delayed
+.\" packets to be retransmitted at the same time as B.2, avoiding an extra
+.\" RTT and therefore causing a very small performance penalty.
+.\"
+.\" ^^^^ This isn't right: fast retransmission will only retransmit
+.\" one packet, not all of the delayed packets.
 .sp
-Even if a segment were delayed past the RTO, the Fast Retransmit
-mechanism [Jacobson90c] will cause the delayed
-packets to be retransmitted at the same time as B.2, avoiding an extra
-RTT and therefore causing a very small performance penalty.
-.sp
 We know of no case with a significant probability of occurrence in
 which timestamps will cause performance degradation by unnecessarily
 discarding segments.
@@ -973,7 +999,7 @@
 .sp
 To make this more quantitative, any clock faster than 1 tick/sec will
 reject old duplicate segments for link speeds of ~8 Gbps.  A 1ms
-timestamp clock will work at link speeds up to 8 Tbps (8*10**12) bps!
+timestamp clock will work at link speeds up to 8 Tbps (8*10**12 bps)!
 .sp
 .LT (b) 0.5i
 The timestamp clock must not be "too fast".
@@ -1124,7 +1150,7 @@
 .LT "4.3.  Duplicates from Earlier Incarnations of Connection" 0.3i
 .sp
 The PAWS mechanism protects against errors due to sequence number
-wrap-around on high-speed connection.  Segments from an earlier
+wrap-around on high-speed connections.  Segments from an earlier
 incarnation of the same connection are also a potential cause of old
 duplicate errors.  In both cases, the TCP mechanisms to prevent such
 errors depend upon the enforcement of a maximum segment lifetime (MSL)
@@ -1179,8 +1205,19 @@
 .ne 2
 [Braden89] Braden, R., editor,
 "Requirements for Internet Hosts -- Communication Layers",
-RFC 1122, October, 1989
+RFC 1122, October, 1989.
+.sp
 .ne 2
+[Braden98] Braden, B., et al,
+"Recommendations on Queue Management and Congestion Avoidance in the Internet",
+RFC 2309, April, 1998.
+.sp
+.ne 2
+[Bradner97]
+S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels",
+RFC 2119, March 1997.
+.sp
+.ne 2
 [Clark87]  Clark, D., Lambert, M., and L. Zhang, "NETBLT: A Bulk Data
 Transfer Protocol", RFC 998, MIT, March 1987.
 .sp
@@ -1335,6 +1372,8 @@
 .sp
 So, the MSS value to be sent in an MSS option should be
 equal to the effective MTU minus the fixed IP and TCP headers.
+.\" But you don't know the effective MTU when sending an initial SYN
+.\" and doing PMTU discovery
 Since both IP and TCP options are ignored when calculating the
 value for the MSS option,
 if there are any IP or TCP options to be sent in a packet,
@@ -1361,6 +1400,9 @@
 fragmented, and packets sent with the constraints in
 the lower right of this grid will cause IP fragmentation,
 the only way to guarantee that this doesn't happen is for
+.\" It's not the "only way" - the other way is to confine
+.\" the behavior to the first column (MSS adjusted to include
+.\" options), since that's always conservative.
 the data sender to decrease the TCP data length by the
 size of the IP and TCP options.
 And since the sender will be adjusting the TCP data
@@ -1439,6 +1481,8 @@
 not the major contributor to this problem; the RTT is the limiting
 factor in how quickly connections can be opened and closed.  Therefore,
 this problem will be no worse at high transfer speeds.
+.\" It is worse at high transfer speeds, because you can sustain
+.\" more connections per second.
 .sp
 .LT (b) 0.5i
 Allow old duplicate segments to expire.
@@ -1526,9 +1570,9 @@
 is disabled.
 The Karn algorithm disables all RTT measurements during
 retransmission, since it is ambiguous whether the ACK is
-is for the original packet, or the retransmitted packet.
+for the original packet, or the retransmitted packet.
 With Timestamps, that ambiguity is removed since the TSecr
-in the ACK will contain the TSval from which ever data
+in the ACK will contain the TSval from whichever data
 packet made it to the destination.
 .sp
 .LT (b) 0.5i
@@ -1552,7 +1596,7 @@
 to fill in the SEG.WND value, not SND.WND.
 .sp
 .LT (d) 0.5i
-New pseudo-code summary has been added in Appendix E.
+A new pseudo-code summary has been added in Appendix E.
 .sp
 .LT (e) 0.5i
 Appendix A has been expanded with information about
@@ -1584,7 +1628,7 @@
 Clock Values
 
     my.TSclock:      Local source of 32-bit timestamp values
-    my.TSclock.rate: Period of my.TSclock (1 ms to 1 sec).
+    my.TSclock.rate: Tick granularity of my.TSclock (1 ms to 1 sec).
     
 Per-Connection State Variables
 
@@ -1649,7 +1693,7 @@
                          (my.TSclock - SEG.TSecr)*my.TSclock.rate ) ;
     }
 
-    if Segment contains WSopt) then {
+    if (Segment contains WSopt) then {
           Snd.wind.scale = SEG.WSopt;
           Snd.WS.OK = TRUE;
     }
@@ -1701,6 +1745,9 @@
           else
                 Update_SRTT( /* for compatibility */
                        (my.TSclock - Start.Time)/my.TSclock.rate);
+		       ** Won't this update the RTT estimate on every
+		       ** segment rather than once per window, requiring
+		       ** new EWMA constants?
     }
 }
 
@@ -1972,7 +2019,15 @@
 .ne 3
 .LT "Security Considerations" 0.3i
 .sp
-Security issues are not discussed in this memo.
+"Security issues are not discussed in this memo" is no longer
+acceptable.  A few considerations that come to mind: window scaling
+makes denial-of-service easier if one can find an endless TCP data source
+(such as chargen), since the source can be made to send data at a higher rate
+than it otherwise could; if mandatory, timestamps could make TCP
+spoofing more difficult, because the spoofer has a harder time crafting
+the timestamp echoes for the spoofed side of the connection; accepting
+RSTs regardless of their timestamps doesn't make it any harder or
+easier to spoof RST packets.
 .sp
 .LT "Authors' Addresses" 0.3i
 .sp
@@ -1980,10 +2035,10 @@
 Van Jacobson
 University of California
 Lawrence Berkeley Laboratory
-Mail Stop 46A
+Mail Stop 50B/2239
 Berkeley, CA 94720
 .sp
-Phone: (415) 486-6411
+Phone: (510) 486-7519
 EMail: van@ee.lbl.gov
 .sp 2
 .ne 8
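For reference when checking the timestamp rules touched above, here is a minimal Python sketch. It covers the TS.Recent update test ("SEG.TSval >= TS.Recent and SEG.SEQ <= Last.ACK.sent") and the PAWS test R1. It assumes timestamps and sequence numbers are compared modulo 2**32 in the usual sequence-number manner. The names (TimestampState, mod_lt, and so on) are hypothetical, and the validity rules for TS.Recent are reduced to a single flag.

```python
# Sketch of two timestamp rules from the draft: the TS.Recent update test
# ("SEG.TSval >= TS.Recent and SEG.SEQ <= Last.ACK.sent") and PAWS test R1
# ("SEG.TSval < TS.Recent and TS.Recent is valid" => not acceptable).
# Assumption: 32-bit values are compared modulo 2**32, like sequence numbers.
# Names such as TimestampState are hypothetical.

MOD = 1 << 32
HALF = 1 << 31


def mod_lt(a: int, b: int) -> bool:
    """True if 32-bit value a precedes b, allowing for wrap-around."""
    return a != b and (b - a) % MOD < HALF


def mod_leq(a: int, b: int) -> bool:
    """True if a precedes or equals b in modular arithmetic."""
    return a == b or mod_lt(a, b)


class TimestampState:
    """Per-connection timestamp state (illustrative only)."""

    def __init__(self, ts_recent: int, last_ack_sent: int) -> None:
        self.ts_recent = ts_recent
        self.ts_recent_valid = True  # validity rules (idle timeout) omitted
        self.last_ack_sent = last_ack_sent

    def paws_reject(self, seg_tsval: int) -> bool:
        """R1: segment is not acceptable if its TSval is older than TS.Recent."""
        return self.ts_recent_valid and mod_lt(seg_tsval, self.ts_recent)

    def update_ts_recent(self, seg_tsval: int, seg_seq: int) -> None:
        """Copy SEG.TSval to TS.Recent only if it is not older than TS.Recent
        and the segment starts at or before Last.ACK.sent."""
        if mod_leq(self.ts_recent, seg_tsval) and mod_leq(seg_seq, self.last_ack_sent):
            self.ts_recent = seg_tsval
```

Consider a TSval of 5 arriving after a TS.Recent of 0xFFFFFFF0. The modular comparison treats 5 as newer, which a plain integer comparison would get wrong.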