Re: [tcpm] draft-ietf-tcpm-1323bis
"Scheffenegger, Richard" <rs@netapp.com> Fri, 26 November 2010 01:26 UTC
Date: Fri, 26 Nov 2010 01:27:38 -0000
Message-ID: <5FDC413D5FA246468C200652D63E627A0B9AD5FE@LDCMVEXC1-PRD.hq.netapp.com>
In-Reply-To: <201003221915.UAA02621@TR-Sys.de>
From: "Scheffenegger, Richard" <rs@netapp.com>
To: David Borman <david.borman@windriver.com>, Braden@ISI.EDU, van@parc.com
Cc: Alfred Hönes <ah@TR-Sys.de>, tcpm@ietf.org, mallman@icir.org
Subject: Re: [tcpm] draft-ietf-tcpm-1323bis
Bob, Dave, Van, Group,

Is this WG item still alive?

I would be interested in your opinion on how to address some potential optimizations in the Timestamp algorithm and RTT Measurement (RTTM) when SACK interactions are taken into account.

More specifically, I finally found the original discussion that explains why RFC1323(bis) requires both sender and receiver to a) reflect only very specific timestamps, and b) use them only under very specific circumstances.

However, if the sender also takes available SACK information into account, some of these restrictions can be lifted. That may in turn allow improved RTTM (especially during a window containing loss, which is very likely to show higher latencies than in-sequence delivery), and also improved SACK recovery while strictly adhering to packet conservation.

Best regards,
  Richard Scheffenegger

------------------------

From braden Mon Aug 19 17:22:45 1991
Date: Mon, 19 Aug 91 17:21:38 PDT
From: braden
To: end2end-interest
Subject: RFC-1072

Folks,

Recently we (Van, Dave Borman, and I) have been discussing some details of the timestamp echo mechanism of RFC-1072. There is considerable pressure to enter RFC-1072 and RFC-1185 into the Internet standards track, but I recently noticed some ambiguities and omissions. Upon further discussion with Van, I discovered that his implementation violates my understanding of what RFC-1072 meant.

I thought this topic might be of wider interest, so I am forwarding the string of relevant messages. We might talk about it in Stockholm.
Bob

____________________________________________________________________________

[[I have recently noticed that RFC-1072 did not explain how to prevent old duplicate ACKs from falling into the current window after sequence number wrap-around. I discussed this with Van at the Gigabits meeting, and he said he wants to send timestamps on ACKs as well as data segments. Then I wrote:]]

From braden@ISI.EDU Fri Aug  9 14:48:19 1991
Date: Fri, 9 Aug 91 14:47:52 PDT
From: braden@ISI.EDU
To: dab@cray.com, van@helios.ee.lbl.gov
Subject: RFC-1072 & RFC-1185
Cc: braden@ISI.EDU, postel@ISI.EDU

Friends,

The issue of whether or not to include TCP Echo options on ACK segments per RFC-1185 made me aware that there are other ambiguities in these RFCs. Since the IESG now wants to make them Internet standards, we ought to remove the ambiguities.

To this end, I have written down two descriptions of the algorithms. The first is in pseudo-code; the second is a delta on RFC-793 section 3.9, Event Processing. I would appreciate your looking this over carefully, to see whether it agrees with your understanding.

One new issue that came up: when TCP receives a data segment outside the window, it sends an ACK for the current window in reply. Question: should this ACK echo the timestamp from the (old duplicate) data segment? (The code as written here does send it.)

Bob

[[text deleted as immaterial to this thread. Dave Borman actually read my pseudo-code and pointed out some bugs.]]

______________________________________________________________________________

[[Van called me and we discussed his implementation of RFC-1072 in the DARTnet TCP, which did not seem to me to match RFC-1072.
Van was trying to persuade me that it was logically equivalent, just implemented differently.]]

_______________________________________________________________________________

From van@ee.lbl.gov Mon Aug 12 15:57:18 1991
To: braden@ISI.EDU
Subject: Re: RFC-1072: timestamps and a bit flag
In-Reply-To: Your message of Sun, 11 Aug 91 16:30:43 PDT.
Date: Mon, 12 Aug 91 15:57:31 PDT
From: Van Jacobson <van@ee.lbl.gov>

Bob,

I was ambiguous on the phone. Depending on where in the path you sit, there are different left edges of the window. BSD tracks both the 'receiver' left edge (relative to rcv_nxt) and the 'left edge known to peer' (relative to the ack number in the last packet you sent). The latter view is used as part of the receiver silly-window avoidance to prevent window advertisements that retract the window.

We have a receiver state variable, rcv_acked, that tracks the last sequence number acked. (E.g., when a packet is sent, rcv_acked is updated from seg_ack. In earlier versions of BSD an equivalent variable, rcv_adv, tracked the end of the window. In 4.4, I realized that the code got simpler if you tracked the beginning.) So, the timestamp test we do in tcp_input is

        if (ti->ti_seq == tp->rcv_acked)
                /*
                 * this segment is at the left edge of the window known
                 * to the peer. record its timestamp for echo reply and
                 * incoming validity checking (per rfc1072 & 1185).
                 */
                tp->rcv_tstamp = tstamp;

(There are earlier validity checks, including testing the incoming timestamp against tp->rcv_tstamp, so we're sure at this point that the incoming segment is legal & 'recent'.)

This handles both delayed acks & retransmits. I believe it does just what 1072 wants.
- Van

_____________________________________________________________________

From braden@ISI.EDU Mon Aug 12 16:45:30 1991
Date: Mon, 12 Aug 91 16:45:15 PDT
From: braden@ISI.EDU
To: van@ee.lbl.gov
Subject: Re: RFC-1072: timestamps and a bit flag
Cc: braden@ISI.EDU

I don't think your trick will work for case (B) on page 13 of RFC-1072: "A hole in the sequence space". It goes on about how valuable it is to have RTTs for ACKs resulting from out-of-order segments. If I understand your message, none of the out-of-order segments will update rcv_tstamp, so the resulting ACKs will all echo the same timestamp value. Is this really what you want??

    A.1  --->          (sets tstamp); B is lost
    C.3  --->
         <--  ACK A.1
    D.4  --->
         <--  ACK A.1
    E.5  --->
         <--  ACK A.1
    B.6  --->
         <--  ACK E.6

Right?

Bob

________________________________________________________________________

From braden@ISI.EDU Tue Aug 13 17:00:58 1991
Date: Tue, 13 Aug 91 17:00:23 PDT
From: braden@ISI.EDU
To: van@helios.ee.lbl.gov
Subject: For discussion...
Cc: braden@ISI.EDU, dab@cray.com

Here is what I think RFC-1072 and RFC-1185 imply should happen:

Assume sequence of data segments A, B, C, ...

                                        Receiver does RFC-1185
                                        Timestamp comparison
                                        against value: V

    <A,ECopt=1> --------------------> 0 (say)
    <B,ECopt=2> --------------------> 1
             <---- <ACK(B),ECRopt=1>
    <C,ECopt=3> ------> (lost)
    <D,ECopt=4> --------------------> 1
             <---- <ACK(B),ECRopt=4>
    (retransmit C)
    <C,ECopt=5> --------------------> 1
             <---- <ACK(D),ECRopt=5>

Now, I believe your code would do the following with the same sequence:

    <A,ECopt=1> --------------------> 0 (say)
    <B,ECopt=2> --------------------> 1
             <---- <ACK(B),ECRopt=1>
    <C,ECopt=3> ------> (lost)
    <D,ECopt=4> --------------------> 1
             <---- <ACK(B),ECRopt=1>
    (retransmit C)
    <C,ECopt=5> --------------------> 1
             <---- <ACK(D),ECRopt=5>

Dave, I would be interested to know how your implementation would behave...
Bob

_________________________________________________________________________

From van@ee.lbl.gov Wed Aug 14 06:18:15 1991
To: braden@ISI.EDU
Cc: dab@cray.com
Subject: Re: For discussion...
In-Reply-To: Your message of Tue, 13 Aug 91 17:00:23 PDT.
Date: Wed, 14 Aug 91 06:18:18 PDT
From: Van Jacobson <van@ee.lbl.gov>

Bob,

Ouch. You're too good a protocol lawyer & you caught me. Yes, 1072 suggests the first sequence and I would generate the second. Note that the results for all of 1185 and cases (a) & (c) of 1072 p.13 are the same for both schemes, and what I do is more conservative for case (b). (The sender will overestimate the rtt by some amount that should be <= twice what the algorithm in 1072 would estimate.) So I'll still argue for my scheme as an alternative, safe, and slightly simpler way to implement a 1072 receiver.

- Van

________________________________________________________________________

From braden@ISI.EDU Wed Aug 14 09:44:21 1991
Date: Wed, 14 Aug 91 09:43:49 PDT
From: braden@ISI.EDU
To: van@ee.lbl.gov
Subject: Re: For discussion...
Cc: braden@ISI.EDU, dab@cray.com

van,

What may appear to be protocol lawyering is just simple-mindedness -- which, like guilt, can sometimes be put to positive use.

Hmmmmm. When we put together RFC-1072, we were trying to do the best damn job we could in measuring the true RTT, as free from biases as possible.... that was the whole rationale for the timestamp echo in RFC-1072. I was dealing with the control theorist Van Jacobson. Now I have a feeling I am dealing with the implementor Van Jacobson, and hearing something different. It seems that you are trying to reduce the amount of state (one 32-bit timestamp instead of two, and one fewer control bit) in the tcbcb. Is the control theorist dormant? Are you really sure this is moving in a constructive direction??
Bob

___________________________________________________________________________

From braden@ISI.EDU Thu Aug  8 12:59:17 1991
Date: Thu, 8 Aug 91 12:58:49 PDT
From: braden@ISI.EDU
To: van@helios.ee.lbl.gov
Subject: more Braden bother...
Cc: braden@ISI.EDU

van,

Well, I didn't get a LOT of joy out of the DARTnet netinet/ modules. It seems puzzlingly partial... e.g., it does not seem to USE the returned Echo reply values to compute the RTT (is there something subtle going on here I don't understand?), and it does not seem to implement the "newest segment from the oldest sequence number" algorithm of RFC-1072.

Hmmm. My [delayed] bogosity meter is going off concerning your idea of the symmetry between data and ACKs. OK, so we can send ECopts in both data and ACK segments. But echoing ACK segment timestamps in data segments seems broken... the timing depends upon how soon the application decides to send a response, which would artificially inflate the RTT measurement.

Perhaps I misunderstood your reasoning. If your argument is just that it is simpler to code and/or faster to execute to always stuff some ECRopt value into every packet, but ignore ECRopt values received on data segments, then I at least understand...

Bob

________________________________________________________________________

From dab@berserkly.cray.com Thu Aug 15 08:32:46 1991
Date: Thu, 15 Aug 91 10:31:07 -0500
From: dab@berserkly.cray.com (David Borman)
To: braden@ISI.EDU, van@helios.ee.lbl.gov
Subject: Re: For discussion...

Well, using Bob's example, my code would do the following:

                                        Receiver does RFC-1185
                                        Timestamp comparison
                                        against value: V

    <A,ECopt=1> --------------------> 0 (say)
    <B,ECopt=2> --------------------> 1
             <---- <ACK(B),ECRopt=2>
    <C,ECopt=3> ------> (lost)
    <D,ECopt=4> --------------------> 2   (compare against 2, not 1!)
             <---- <ACK(B),ECRopt=4>   (*)
    (retransmit C)
    <C,ECopt=5> --------------------> 2
             <---- <ACK(D),ECRopt=5>
    <E,ECopt=6> --------------------> 5
             <---- <ACK(D),ECRopt=6>

Uff da.
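To compare the receiver rules behind these traces, here is an illustrative C sketch (not code from any of the implementations discussed). It replays a segment sequence through a toy receiver under two echo rules. The first is Van's rule of recording a segment's timestamp only when the segment starts at rcv_acked, the left edge known to the peer. The second is the behavior Dave describes for his code as written, which echoes the most recently received ECopt. The names seg_t, replay, MAX_SEGS and the RULE_* constants are hypothetical. Sequence numbers are modeled as segment indexes (A=0, B=1, ...), and every validity check (the RFC-1185 comparison, window checks, duplicate handling) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGS 32  /* hypothetical bound on segments in one replay */

/* One arriving data segment in the toy model. */
typedef struct {
    unsigned seq;    /* segment index: A=0, B=1, ... (each one unit long) */
    uint32_t ecopt;  /* timestamp carried in the segment's ECopt */
    int send_ack;    /* nonzero: receiver ACKs right after this segment */
} seg_t;

typedef enum {
    RULE_RCV_ACKED,   /* record ECopt only if seq == rcv_acked */
    RULE_MOST_RECENT  /* echo whatever ECopt arrived last */
} rule_t;

/* Replays segs[0..n-1] and stores the ECRopt of every ACK sent in
 * ecr_out (which must hold at least n entries). Returns the ACK count. */
size_t replay(const seg_t *segs, size_t n, rule_t rule, uint32_t *ecr_out)
{
    unsigned char have[MAX_SEGS] = {0}; /* segments received so far */
    unsigned rcv_nxt = 0;    /* receiver left edge: next in-order segment */
    unsigned rcv_acked = 0;  /* left edge known to the peer (last ACK sent) */
    uint32_t tstamp = 0;     /* value to echo in the next ACK */
    size_t acks = 0;

    assert(n <= MAX_SEGS);
    for (size_t i = 0; i < n; i++) {
        const seg_t *s = &segs[i];
        assert(s->seq < MAX_SEGS);

        if (rule == RULE_RCV_ACKED) {
            /* Only a segment starting at the peer's view of the left
             * edge may replace the timestamp to be echoed. */
            if (s->seq == rcv_acked)
                tstamp = s->ecopt;
        } else {
            tstamp = s->ecopt;
        }

        /* Accept the segment and advance rcv_nxt over any filled hole. */
        have[s->seq] = 1;
        while (rcv_nxt < MAX_SEGS && have[rcv_nxt])
            rcv_nxt++;

        if (s->send_ack) {
            ecr_out[acks++] = tstamp;
            rcv_acked = rcv_nxt;  /* the ACK just sent carried rcv_nxt */
        }
    }
    return acks;
}
```

Replaying Bob's trace (A, then B followed by a delayed ACK, C lost, D, then C retransmitted with ECopt=5) yields echoes 1, 1, 5 under the rcv_acked rule, the sequence Bob attributes to Van's code, and 2, 4, 5 under the most-recent rule, matching Dave's trace above. Bob's "hole" example (A.1, C.3, D.4, E.5, B.6, each ACKed) gives 1, 1, 1, 1, 6 under the rcv_acked rule.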
In my code, the indicated ack (*) will have an ECRopt of 4, not 1, but not because of what's written in 1072 or 1185... As written, it will respond with the ECRopt value of whatever was most recently received in an ECopt. My code more or less looks like:

    tp->echo_value       Most recently received ECopt.
    tp->echo_timestamp   ECopt value from the last packet received
                         at the left edge of the window.
    ECHO_NEEDED          Flag to indicate that tp->echo_value needs
                         to be sent back in ECRopt.
    ECHO_RCVD            Flag to indicate that ECopt was found while
                         processing the options.

On receive:
    Process TCP options: if an echo option is received, its value is saved in tp->echo_value, and the ECHO_NEEDED and ECHO_RCVD flags are set.
    If the packet is then determined to be at the left edge of the window, and the ECHO_RCVD bit is set, tp->echo_value is copied into tp->echo_timestamp. Otherwise, if tp->echo_value is less than tp->echo_timestamp, the packet is tossed.

On transmit:
    If ECHO_NEEDED is set, copy tp->echo_value into ECRopt, and clear ECHO_NEEDED.

My code is obviously wrong. An old packet wandering in could muck things up, and I clearly violate case A on page 13 of 1072.

So, let's toss out some cases for discussion:

1) Packets arrive in sequence, and every packet is acked. This is not an issue; we all understand what should happen here.

    <A, ECopt=1> -------------------> 0
              <---- <ACK(A), ECRopt=1>
    <B, ECopt=2> -------------------> 1
              <---- <ACK(B), ECRopt=2>
    <C, ECopt=3> -------------------> 2
              <---- <ACK(C), ECRopt=3>
    <D, ECopt=4> -------------------> 3
              <---- <ACK(D), ECRopt=4>

2) Packets arrive in sequence, and some of the acks are delayed. What echo value do you use? RFC 1072, pg. 13, Case (A) says you use the oldest echo value received. I don't think that there is an issue here either.
    <A, ECopt=1> -------------------> 0
    <B, ECopt=2> -------------------> 1
    <C, ECopt=3> -------------------> 2
              <---- <ACK(C), ECRopt=1>
    <D, ECopt=4> -------------------> 3
    <E, ECopt=5> -------------------> 4
    <F, ECopt=6> -------------------> 5
              <---- <ACK(F), ECRopt=6>

3) Packets arrive out of order, and we are acking every packet.

    <A, ECopt=1> -------------------> 0
              <---- <ACK(A), ECRopt=1>
    <C, ECopt=3> -------------------> 1
              <---- <ACK(A), ECRopt=3>     3, 1, or no ECR at all?
    <B, ECopt=2> -------------------> 1
              <---- <ACK(C), ECRopt=2>
    <E, ECopt=5> -------------------> 2
              <---- <ACK(C), ECRopt=5>     5, 2, or no ECR at all?
    <D, ECopt=4> -------------------> 2
              <---- <ACK(D), ECRopt=4>
    <F, ECopt=6> -------------------> 4
              <---- <ACK(F), ECRopt=6>

4) Packets arrive out of order, and we are NOT acking every packet.

    <A, ECopt=1> -------------------> 0
    <C, ECopt=3> -------------------> 1
              <---- <ACK(A), ECRopt=1>
    <D, ECopt=4> -------------------> 1
              <---- <ACK(A), ECRopt=4>     4, 1, or no ECR at all?
    <B, ECopt=2> -------------------> 1
              <---- <ACK(C), ECRopt=2>
    <E, ECopt=5> -------------------> 2
              <---- <ACK(E), ECRopt=4>     5, 2, or no ECR at all?
    <F, ECopt=6> -------------------> 5
              <---- <ACK(F), ECRopt=6>

I would argue that when sending an ACK due to out-of-order (lost) packets, you either
  1) send the ECR with the pre-existing value that hasn't been sent yet (delayed acks),
  2) if no ECR was already queued up, send the ECR with the value in the packet, or
  3) don't send an ECR in the ack.

Sending an ECR in the ack with the last left-window-edge value instead of the EC value just received could really screw things up. Imagine:

    <A, ECopt=1> -------------------> 0
              <---- <ACK(A), ECRopt=1>
    (connection is idle for a while...)
    <C, ECopt=103> -------------------> 1
              <---- <ACK(A), ECRopt=103>   103, 1, or no ECR at all?
    <B, ECopt=102> -------------------> 1
              <---- <ACK(C), ECRopt=102>

If the middle ack was 1, not 103, then when the ACK is received, the RTT value that is computed is going to be drastically out of whack. Either use the value that was received in the out-of-sequence packet, or don't send an ECR in the ack.

Comments? I'll be fixing my code, but I'll wait until we all agree on what should really be happening.

-Dave Borman, dab@cray.com

___________________________________________________________________________

From braden@ISI.EDU Fri Aug 16 14:42:15 1991
Date: Fri, 16 Aug 91 14:41:44 PDT
From: braden@ISI.EDU
To: dab@cray.com, van@helios.ee.lbl.gov
Subject: RFC-1072/1185 issues
Cc: braden@ISI.EDU

Dave,

Yes, your corrections to my pseudo-code are certainly right. I will incorporate them...

Van,

Are there any circumstances in which a TCP receiving data containing ECopts would send back an ACK that DID NOT contain an ECRopt? Dave argued in his message yesterday that the answer is "yes", when the only timestamp available to be echoed is known to be seriously out of date. Here is the same argument:

When a host receives an ECRopt, it has to believe it and update its RTT estimate (I don't see a simple alternative). There are times when it SHOULD NOT believe it (an ECRopt in a data segment coming some long idle period but carrying a timestamp echoed from an ACK.)

    ------>   <ACK,ECopt=17>   ------>
                 (idle)                   (=>bad RTT!)
    <----   <data,ECRopt=17>   <-----

The only way to heal this is for the ECopt received in an ACK segment to NOT be echoed in the next data segment that is sent.

    ------>   <ACK,ECopt=17>   ------>
                                          (ignore ECopt in ACK)
                 (idle)
    <------        <data>      <----------

Here is another argument: Consider a simplex data transfer. Assuming the ACKs carry ECopts, if we echo the ACK timestamps in the data segments, then the data receiver must calculate an RTT that it will never use, every time it receives a data segment.
That wasted processing is much greater than the processing to include or exclude an ECRopt option.

Am I convincing you?

Bob

_________________________________________________________________________

From van@ee.lbl.gov Mon Aug 19 06:51:22 1991
To: braden@ISI.EDU
Cc: dab@cray.com
Subject: Re: RFC-1072/1185 issues
In-Reply-To: Your message of Fri, 16 Aug 91 14:41:44 PDT.
Date: Mon, 19 Aug 91 06:51:36 PDT
From: Van Jacobson <van@ee.lbl.gov>

> When we put together RFC-1072, we were trying to do the best
> damn job we could in measuring the true RTT, as free from biases
> as possible.... that was the whole rationale for the timestamp
> echo in RFC-1072.

Wrong. You've made the classic mistake of confusing a means with an end.

A TCP sender is trying to compute a function L() best described as "the earliest time at which I will know some packet has (almost certainly) been lost in transit". The value of this function is crucial to the stability of the network (if L() is optimistic, spurious retransmits will fill up the network and it will congestion collapse). From a mix of theory and experience, we've developed a useful approximation for L() based on the current time, an estimate of the first & second order statistics of the rtt, and some local state (e.g., Karn's algorithm on the recent retransmission history). Note that the rtt estimate is valuable only in that we use it to compute L(). We don't need a "true RTT"; we need a conservative L().

The rationale for the 1072 timestamp was that

a) every TCP implementation I know of (at least 5 independent implementations) measures the rtt on *at most* one packet per window of data sent (usually the packets iss+1, iss+W+1, iss+2W+1, ...). [This feature of the implementations is not intentional and not obvious -- one individual really believed his implementation timed almost every packet until I forced him to put in a trace buffer to look at which packets updated the rtt estimate.]
b) Anyone who looks at how rtt varies over the life of a conversation notices that there is a great deal of structure. In particular, there is almost always a periodic variation with minima at integer multiples of the window size (the reason for this is obvious if you consider how a window protocol loads the pipe):

    [ASCII plot: RTT (vertical axis) versus Sequence Number (horizontal axis, ticks at iss+1, W+iss+1, 2W+iss+1, 3W+iss+1), showing the periodic variation with minima at the window boundaries.]

The consequence of (a) combined with (b) is that sampling artifacts (technically known as Nyquist aliasing) will result in a possibly substantial underestimate of the rtt. A similar effect also appears in the 2nd order statistics (the variance is much lower for the first packet of a window). The magnitude of the total effect is obviously a function of window size: for small W it's often negligible, but for large W it can cause very poor estimates of L() (i.e., spurious retransmits).

1072 was intended to correct this problem by offering a simple way for implementations to time every packet and, thus, use the variance in the rtt over a window to help inflate L() to appropriately conservative values.

In 1072, we gave 3 scenarios where the timestamp would be useful (cases a, b & c on p.13). Subsequent experience showed we got two out of three right but (b) is bogus: When congestion occurs (i.e., at a packet loss) the sender should get cautious ("let's not make the problem worse"). This implies that we want L() to increase. Clearly, the timestamp available to the receiver that will cause the biggest increase in L() at the sender is the timestamp of the left edge of the window (the timestamp of the most recent in-sequence packet). This is what my implementation sends. I believe it is correct on both theoretical & practical grounds. Actually, on practical grounds, (b) on p.13 is almost certainly irrelevant.
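The aliasing effect described in (a) and (b) can be checked numerically. The C sketch below is an illustration under stated assumptions, not code from the thread. The sawtooth RTT pattern and its numbers are hypothetical (RTT lowest for the first packet of each window, matching the minima described above). Because the thread never spells out the estimator behind L(), the sketch swaps in the SRTT/RTTVAR estimator from RFC 6298 (alpha = 1/8, beta = 1/4, RTO = SRTT + 4*RTTVAR), ignoring clock granularity and the minimum RTO. It feeds the same pattern to one estimator that times every packet and to one that times only the first packet of each window. The names rtt_est, rtt_update, rtt_rto, sawtooth_rtt and compare_sampling are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* RFC 6298-style estimator state (G and the minimum RTO are ignored). */
typedef struct {
    double srtt, rttvar;
    int have_sample;
} rtt_est;

static void rtt_update(rtt_est *e, double r)
{
    if (!e->have_sample) {          /* first measurement */
        e->srtt = r;
        e->rttvar = r / 2.0;
        e->have_sample = 1;
        return;
    }
    double dev = e->srtt > r ? e->srtt - r : r - e->srtt;
    e->rttvar = 0.75 * e->rttvar + 0.25 * dev;  /* uses the old SRTT */
    e->srtt = 0.875 * e->srtt + 0.125 * r;
}

static double rtt_rto(const rtt_est *e)
{
    return e->srtt + 4.0 * e->rttvar;
}

/* Hypothetical RTT pattern: lowest for the first packet of each window
 * of w packets, growing by `slope` for each later packet in the window. */
static double sawtooth_rtt(size_t pkt, size_t w, double base, double slope)
{
    return base + slope * (double)(pkt % w);
}

/* Times `windows` windows of w packets two ways and reports both RTOs:
 * every packet, or only packets iss+1, iss+W+1, iss+2W+1, ... */
void compare_sampling(size_t w, size_t windows, double base, double slope,
                      double *rto_every, double *rto_first)
{
    rtt_est every = {0}, first = {0};

    assert(w > 0 && windows > 0);
    for (size_t pkt = 0; pkt < w * windows; pkt++) {
        double r = sawtooth_rtt(pkt, w, base, slope);
        rtt_update(&every, r);
        if (pkt % w == 0)               /* first packet of a window */
            rtt_update(&first, r);
    }
    *rto_every = rtt_rto(&every);
    *rto_first = rtt_rto(&first);
}
```

With w = 8, 64 windows, base = 100 and slope = 10 (arbitrary time units), the first-packet-only estimator sees identical samples, so its RTTVAR decays toward zero and its RTO settles at essentially the 100-unit minimum. The every-packet estimator ends well above that value, which is the inflation of L() that per-packet timing is meant to provide.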
1072 wasn't intended to change the semantics of rtt (i.e., when the sender computes rtt) [though you seem to have missed or disagree with this point -- see the note later]. With the existing semantics, the sender can update rtt only when snd.una moves. Thus the echo reply resulting from the out-of-sequence packets will be ignored (since they cannot move snd.una) *except when* the ack for the packet at the left edge of the window gets lost. In this case, it is best on both information-theoretic & system-stability grounds if the 'duplicate' acks generated by out-of-sequence packets are indeed duplicates, i.e., if they have the same echo reply value as the ack for the packet at the left edge of the window.

Moving on to Dave's scenarios:

> 3) Packets arrive out of order, and we are acking every packet.
>     <A, ECopt=1> -------------------> 0
>               <---- <ACK(A), ECRopt=1>
>     <C, ECopt=3> -------------------> 1
>               <---- <ACK(A), ECRopt=3>     3, 1, or no ECR at all?
>     <B, ECopt=2> -------------------> 1
>               <---- <ACK(C), ECRopt=2>

The ack that results from C must have ECRopt=1, not 3. There are two separate ways of arriving at this:

1) If we consider what happens if the ack resulting from A gets lost, an ECRopt=3 in the ack from C will result in an *underestimate* of the rtt for A. This is the last thing you want to happen. If the ack from A does arrive, the ack from C doesn't move snd.una, so the rtt isn't updated & it doesn't matter what you put in the ECRopt. Since the receiver doesn't know which acks will get dropped, the only safe choice is ECRopt=1 in the ack from C.

2) Ultimately the rtt will include the time it takes the receiving application to consume that data (for most implementations, this happens indirectly via delayed acks & closed window probes, but everything works better if an implementation is structured so the application is in the receive loop.
I.e., if acks are (usually) generated when the receiving application consumes the data, not when the data arrives at the receiver). Since data cannot be delivered out of sequence, the rtt for C is controlled by B (this is why the ACK(C) resulting from B's arrival has ECRopt=2). Since B hasn't arrived yet, all the receiver knows is the bounds on its timestamp: EC(A) <= EC(B) <= EC(C). The most conservative decision (in terms of the effect on L() at the sender) is for the receiver to use the lower bound, EC(A). [I will claim, without proof, that it's a good idea to be conservative about retransmits in the face of packet re-ordering.]

> 4) Packets arrive out of order, and we are NOT acking every packet.
>     <A, ECopt=1> -------------------> 0
>     <C, ECopt=3> -------------------> 1
>               <---- <ACK(A), ECRopt=1>
>     <D, ECopt=4> -------------------> 1
>               <---- <ACK(A), ECRopt=4>     4, 1, or no ECR at all?
>     <B, ECopt=2> -------------------> 1
>               <---- <ACK(C), ECRopt=2>

For the same reasons as above, the ack that results from D must have ECRopt=1, not 4.

> Imagine:
>     <A, ECopt=1> -------------------> 0
>               <---- <ACK(A), ECRopt=1>
>     (connection is idle for a while...)
>     <C, ECopt=103> -------------------> 1
>               <---- <ACK(A), ECRopt=103>   103, 1, or no ECR at all?
>     <B, ECopt=102> -------------------> 1
>               <---- <ACK(C), ECRopt=102>
>
> If the middle ack was 1, not 103, then when the ACK is received,
> the RTT value that is computed is going to be drastically out of whack.

Remember, 1072 changed the information available to compute the rtt. It did *not* change *when* the rtt is computed (i.e., only when new data is acked). The *only* way the ECRopt in the ack resulting from C can be used is when the ack from A is dropped (and, of course, C must be sent before the sender times out & retransmits A). In this case, the sender should get ECRopt=1 in the ack from C so it will compute a reasonable rtt for A.
This will inflate L, but that's perfectly legitimate -- packet loss in the reverse path increases uncertainty at the sender, so it wants to be more conservative about retransmitting.

Going back to Bob's last note,

> When a host receives an ECRopt, it has to believe it and update
> its RTT estimate (I don't see a simple alternative). There are
> times when it SHOULD NOT believe it (an ECRopt in a data segment
> coming some long idle period but carrying a timestamp echoed
> from an ACK.)

The "believe it" part of your statement is correct, but the "and update the rtt estimate" doesn't follow. Nowhere in rfc1072 did we specify the sender's rtt estimation algorithm. It never occurred to me that anyone would conceive of changing what currently happens without echo/echo-reply: the rtt estimate is updated only on 'new' acks (ones that move snd.una). Assuming that people don't make gratuitous changes to the rtt estimate semantics while implementing 1072, the scenario you're worried about can't arise, and every packet should contain an ECRopt.

> Consider a simplex data transfer. Assuming the ACKs carry
> ECopts, if we echo the ACK timestamps in the data segments, then
> the data receiver must calculate an RTT that it will never use,
> every time it receives a data segment. That wasted processing
> is much greater than the processing to include or exclude an
> ECRopt option.

Same mistake as above: 1072 did not suggest that the rtt estimation semantics be changed. I.e., the intent was that an implementation use exactly the code it does now, but the step that looks like (in BSD):

        /*
         * If transmit timer is running and timed sequence
         * number was acked, update smoothed round trip time.
         */
        if (tp->t_rtt && SEQ_GT(ti->ti_ack, tp->t_rtseq)) {
                tcp_xmit_timer(tp, tp->t_rtt);
                tp->t_rtt = 0;
        }

could be augmented to time every packet if echo was negotiated (in 4.4, "echo_reply" points to the ECROPT in an incoming packet, if there is one, and EXTRACT_ECROPT is a machine-dependent (e.g., alignment constraints) macro to marshal the timestamp in the option):

        if (echo_reply)
                tcp_xmit_timer(tp, now - EXTRACT_ECROPT(echo_reply));
        else if (tp->t_rtt && SEQ_GT(ti->ti_ack, tp->t_rtseq)) {
                tcp_xmit_timer(tp, tp->t_rtt);
                tp->t_rtt = 0;
        }

But this rtt estimate update code is executed *only* when new data is acked (snd_una moves forward), and no one proposed changing that. Thus there is no wasted effort updating rtt for the receiver side of a simplex connection (since snd_una will never move). The only cost of the ECRopt when it's not used is the cost of setting up the echo_reply pointer (if we specify a 'suggested order' for the EC/ECR opts, this can be done in two instructions).

> Am I convincing you?

By now, you've probably guessed the answer to this one :).

- Van

> -----Original Message-----
> From: Alfred Hönes [mailto:ah@TR-Sys.de]
> Sent: Monday, 22 March 2010 20:16
> To: tcpm@ietf.org; mallman@icir.org
> Subject: Re: [tcpm] poll for adoption of
> draft-gont-tcpm-tcp-timestamps-03
>
> Mark Allman wrote:
> > ...
> >
> > - But, to me, the right thing to do here is to roll these changes
> >   into the work item this WG already has going: 1323bis. [...]
>
> Going?  Gone ??
> It's listed on <http://tools.IETF.ORG/wg/tcpm/>:
>
>   Expired:
>   draft-ietf-tcpm-1323bis -01   2009-03-04   Expired
>
> [ And, btw, -00 was published 2008-01-29,
>   so extrapolating linearly, I still hope for a -02 soon.
>   But the confidence level of statistics based on a sample of size 2
>   is rather small.  :-) ]
>
> A significant part of my review comments from early in 2009
> have been addressed in the -01 draft version, but other,
> non-trivial, parts have been deferred to the next update.  I
I > cannot recall discussion of this draft on the list since a > very long time. > > > So it looks like the choice for the WG might be having a > short document that can be shipped by the end of this year > (or even much faster), or with a -1323bis in 5 years or so. > > (Or would you prefer doing both?) > > IIRC, Fernando's timestamp draft has been phrased as a BCP > because feedback from the WG (MPLS IETF?) indicated it would > not be acceptable to the WG with normative language, on the > Standards Track. > > I personally would not oppose to Standards Track, but I'm not the WG. > > > Kind regards, > Alfred. > > -- > > +------------------------+------------------------------------ > --------+ > | TR-Sys Alfred Hoenes | Alfred Hoenes Dipl.-Math., > Dipl.-Phys. | > | Gerlinger Strasse 12 | Phone: (+49)7156/9635-0, Fax: -18 > | > | D-71254 Ditzingen | E-Mail: ah@TR-Sys.de > | > +------------------------+------------------------------------ > --------+ > > _______________________________________________ > tcpm mailing list > tcpm@ietf.org > https://www.ietf.org/mailman/listinfo/tcpm >