Re: [conex] New draft(s) on TCP modifications for ConEx

"Scheffenegger, Richard" <rs@netapp.com> Wed, 06 July 2011 19:56 UTC

From: "Scheffenegger, Richard" <rs@netapp.com>
To: Georgios Karagiannis <karagian@cs.utwente.nl>, Mirja Kühlewind <mirja.kuehlewind@ikr.uni-stuttgart.de>, conex@ietf.org

Hi Georgios,


So you are proposing something orthogonal to the TCP encoding (DCCP transports using CCID3/4)? That was the part I had been missing.

Regarding the deployment complexity of accurate ECN:

We envision that only one signaling encoding, with its semantics, will ultimately be chosen by the IETF. For that single method, a backwards-compatible handshake is described (section 2).

If accurate ECN semantics are not negotiated between the end hosts, a compatibility mode is also sketched (section 3.4). This approach makes it possible to extract more than one bit per RTT of ECN feedback from regular RFC3168 TCP receivers. While in that state, the sender itself would obviously adhere to normal RFC3168 semantics. However, a sender may choose not to implement the advanced compatibility mode - but such a TCP sender could underestimate the congestion volume (especially during heavy congestion episodes) and then be throttled.
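
To illustrate the underestimation risk, here is a toy model in Python. It is only a sketch and not the encoding from the draft: the function names, the fixed 1500-byte packet size, and the per-RTT lists of CE marks are all assumptions made for illustration. It compares the congestion volume seen when every CE mark is fed back with the volume a sender would infer from at most one congestion signal per RTT.

```python
PACKET_BYTES = 1500  # assumed IP-layer packet size (illustrative only)


def exact_congestion_volume(rounds, packet_bytes=PACKET_BYTES):
    """Bytes received with CE set when every mark is fed back."""
    return sum(packet_bytes * sum(marks) for marks in rounds)


def one_signal_per_rtt_volume(rounds, packet_bytes=PACKET_BYTES):
    """Estimate when at most one congestion signal per RTT reaches the sender."""
    return sum(packet_bytes for marks in rounds if any(marks))


# Each inner list holds the CE marks (True/False) of the packets in one RTT.
light = [[False] * 9 + [True], [False] * 10]   # one mark in two RTTs
heavy = [[True] * 10, [True] * 10]             # every packet marked

for name, rounds in (("light", light), ("heavy", heavy)):
    print(name, exact_congestion_volume(rounds), one_signal_per_rtt_volume(rounds))
# prints:
# light 1500 1500
# heavy 30000 3000
```

In the light case both views agree; in the heavy case the one-signal-per-RTT view reports a tenth of the marked bytes, which is the kind of gap that could get such a sender throttled.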

Thus, there is a roadmap for incremental deployment of accurate ECN feedback (for TCP).

Best regards,

Richard Scheffenegger



> -----Original Message-----
> From: Georgios Karagiannis [mailto:karagian@cs.utwente.nl]
> Sent: Wednesday, 06 July 2011 21:17
> To: Scheffenegger, Richard; Mirja Kühlewind; conex@ietf.org
> Subject: RE: [conex] New draft(s) on TCP modifications for ConEx
> 
> Hi Richard,
> 
> Please see in line!
> 
> 
> On 7/6/2011, "Scheffenegger, Richard" <rs@netapp.com> wrote:
> 
> >
> >Hi Georgios,
> >
> >Thanks for the quick reply.
> >
> >I'm replying as the co-author of this draft.
> >
> >Your comment is only about one (of the three proposed) mechanisms to
> signal more accurate ECN feedback. As you have noticed, we introduce
> three different signaling encodings and semantics, each with increasing
> feature set (and accuracy) over the previous, but also with increasing
> complexity at implementation.
> 
> Georgios: Yes, sorry for not mentioning them; I was referring to the
> last two solutions.
> 
> >
> >Also note that draft-kuehlewind-conex-accurate-ecn-00, although
> submitted in the ConEx WG, really has a broader scope, and deals with
> the TCP layer alone. ConEx is but one possible framework where this
> more accurate ECN feedback will be required. Submitting this draft to
> TCPM was also briefly discussed internally, but ConEx really has a
> current demand. Nevertheless, the broader scope (and different
> constraints) of TCP should not get lost when discussing this draft.
> 
> Georgios: Okay, but my point that the two latter solutions proposed
> by your draft are complex to deploy is still valid, since protocol
> changes are needed in order to modify the semantics of the TCP header,
> i.e., the semantics of the flags: NS, CWR and ECE.
> 
> >
> >
> >
> >Reading your draft, I fail to understand how this may work. Using
> standard RFC3168 feedback, only a single (!) CE per RTT can be
> signaled, and then TCP will always react with a single CC response per
> RTT... regardless of how many CE marks the receiver actually received
> during the RTT following the initial CE.
> 
> Georgios: Note that the solution uses the implementation specified in
> RFC5348. According to RFC5348:
> 
> "The receiver periodically sends feedback messages to the sender.
>    Feedback packets SHOULD normally be sent at least once per RTT,
>    unless the sender is sending at a rate of less than one packet per
>    RTT, in which case a feedback packet SHOULD be sent for every data
>    packet received.  A feedback packet SHOULD also be sent whenever a
>    new loss event is detected without waiting for the end of an RTT,
>    and whenever an out-of-order data packet is received that removes a
>    loss event from the history.
> 
>    If the sender is transmitting at a high rate (many packets per RTT),
>    there may be some advantages to sending periodic feedback messages
>    more than once per RTT as this allows faster response to changing
>    RTT measurements and more resilience to feedback packet loss."
> 
> >
> >Calculating the sending rate at the sender doesn't seem to accomplish
> what you aim at (or the ingenious idea is missing in that draft). With
> the widespread adoption of TCP SACK, and of congestion control
> responses other than NewReno (e.g. CUBIC, Compound, ...), the
> closed formula you quote is only a crude guess at the actual sending
> rate. These newer TCP congestion control reactions will conform to that
> formula only under certain (limited) circumstances, at best. (The
> formula gives a worst-case estimate, but the typical sending rate will
> be higher.) Furthermore, the goal of ECN is to avoid loss (denoted "p"
> in the formula) due to congestion as much as possible. Because of the
> limited signal bandwidth (one bit per RTT) of RFC3168, compared to
> congestion loss (one bit per segment sent, conveyed more or less
> accurately by ACKs/SACKs), putting standard ECN feedback into "p" will
> not work.
> 
> Georgios: As I mentioned, RFC5348 (TCP Friendly Rate Control (TFRC))
> is used by this solution, so the accuracy of the congestion rate
> calculation will depend on the accuracy of TFRC. Moreover, according
> to RFC5348:
>  "The throughput equation currently REQUIRED for TFRC is a slightly
>    simplified version of the throughput equation for Reno TCP from
>    [PFTK98].  Ideally, we would prefer a throughput equation based on
>    selective acknowledgment (SACK) TCP, but no one has yet derived the
>    throughput equation for SACK TCP, and simulations and experiments
>    suggest that the differences between the two equations would be
>    relatively minor [FF99] (Appendix B).", from [RFC5348]
> 
> Furthermore, note that p is the rate of loss events and depends on
> the loss rate. Loss rate measurement is performed at the receiver,
> based on the detection of lost or marked packets from the sequence
> numbers of arriving packets.
> 
> p can be sent from the receiver to the sender in feedback packets.
> In TFRC, several feedback packets can be sent to the sender within
> one RTT, depending on the sender's data rate.
> 
> 
> 
> >
> >
> >Furthermore, the congestion volume, if I followed ConEx correctly, is
> the volume of bytes (measured as IP-layer bytes, NOT IP payload or TCP
> payload bytes) which were received with CE set. With only a single bit
> of feedback per RTT, the congestion volume therefore has a lower bound
> of PMTU bytes per RTT (actually, the minimum packet size possible, when
> 1-byte TCP payload segments are sent for whatever reason), and an upper
> bound of (floor(FlightSize/MSS)+1)*(IP_Header+TCP_Header[TCP
> Options])+FlightSize.
> >
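
As a worked example of the bounds quoted above, the following Python sketch evaluates them for one set of numbers. It assumes the upper bound is read as (number of segments in flight) times (per-packet header bytes) plus FlightSize, and that the lower bound is a single packet carrying one payload byte. The function name, the 20-byte IPv4 and TCP header defaults, and the sample values are assumptions for illustration only.

```python
def congestion_volume_bounds(flight_size, mss, ip_header=20, tcp_header=20,
                             tcp_options=0):
    """Return (lower, upper) congestion volume in IP-layer bytes for one RTT.

    Assumed reading of the bounds:
      lower = one minimum-size packet (headers plus 1 payload byte)
      upper = (floor(flight_size / mss) + 1) * headers + flight_size
    """
    overhead = ip_header + tcp_header + tcp_options  # header bytes per packet
    lower = overhead + 1
    segments = flight_size // mss + 1
    upper = segments * overhead + flight_size
    return lower, upper


# Hypothetical flow: 10 full segments in flight, 12 bytes of TCP options.
lower, upper = congestion_volume_bounds(flight_size=14600, mss=1460,
                                        tcp_options=12)
print(lower, upper)  # prints: 53 15172
```

With a single bit of feedback per RTT the sender cannot tell where between these two values the real congestion volume lies.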
> >Your formula would, as it seems to me, always estimate the congestion
> volume to be close to cwnd/2 (excluding the IP header, TCP headers, and
> also ignoring the actual number of congestion marks during that RTT).
> This may be a (very) conservative estimate during low congestion
> periods, but a very aggressive (=much too small) estimate during heavy
> congestion periods (e.g. 1 RTT where each segment is CE marked).
> 
> Georgios: As I mentioned, RFC5348 (TCP Friendly Rate Control (TFRC))
> is used by this solution, so the accuracy of the congestion rate
> calculation will depend on the accuracy of TFRC.
> 
> 
> >
> >
> >
> >Again, the goal for Accurate ECN feedback is to allow exact (under
> common circumstances) feedback of the CE marks as seen by a TCP
> receiver to be sent back to the sender. As TCP cannot rely on
> external means to enforce honesty (ConEx would be one such external
> means), ECN Nonce support would be beneficial.
> 
> Georgios: Yes I agree it might be beneficial.
> 
> 
> >
> >
> >One more word w.r.t. complexity: I believe that the gap between
> network performance and CPU performance will not shrink in the future.
> All developments over the last decade point to an ever-widening gap.
> Therefore, schemes which once could not have been hoped to be
> implemented because of their complexity, such as ConEx, are today
> running at wire speed in 10G environments. The times when one major
> design goal of a protocol was to use as few machine instructions as
> possible to encode/decode and interpret are long past.
> >
> >I have seen that early SACK and TS work was discussed in terms of the
> number of machine instructions added to the TCP code path (which is one
> piece of the puzzle to understand why timestamp semantics are defined
> the way they are).
> >
> >(Also, calculating the integer square root is more involved than doing
> some bit-banging ;)
> 
> Georgios: I think that you refer to my statement regarding the
> implementation complexity of the Re-ECN solution. The term that I used
> is not the correct one. I should have used deployment complexity
> instead of implementation complexity.
> 
> Best regards,
> Georgios
> 
> 
> >
> >
> >Richard Scheffenegger
> >
> >> -----Original Message-----
> >> From: Georgios Karagiannis [mailto:karagian@cs.utwente.nl]
> >> Sent: Wednesday, 06 July 2011 14:27
> >> To: 'Mirja Kühlewind'; conex@ietf.org
> >> Subject: Re: [conex] New draft(s) on TCP modifications for ConEx
> >>
> >> Hi Mirja
> >>
> >> I have (quickly) read the document
> >> draft-kuehlewind-conex-accurate-ecn-00.
> >> So sorry if I have not understood the concept well!
> >>
> >> The solution described in this draft is based on
> >> [draft-briscoe-tsvwg-re-ecn-tcp-09] and is actually used to carry the
> >> required feedback from the receiver to the sender such that the
> >> sender could calculate the experienced congestion rate (from sender
> >> to receiver).
> >>
> >> In my opinion this solution, being similar to the one proposed in
> >> [draft-briscoe-tsvwg-re-ecn-tcp-09], seems to be quite complex to
> >> deploy, since protocol changes are needed in order to modify the
> >> semantics of the TCP header, i.e., the semantics of the flags: NS,
> >> CWR and ECE.
> >>
> >> In
> >> http://www.ietf.org/id/draft-karagiannis-conex-congestion-calculation-00.txt
> >> another solution for calculating the congestion rate at the sender
> >> is described, where these changes in the semantics of TCP are not
> >> required.
> >>
> >> This document provides a solution to this problem as follows; see
> >> Figure 2 in draft-karagiannis-conex-congestion-calculation-00.txt.
> >>
> >>  When the sender needs to reduce its sending rate, then the sender
> >>  can calculate the exposed CONGESTION RATE by subtracting the TCP
> >>  throughput calculated during a Round Trip Time (RTT), i.e., (RTTi)
> >>  from the TCP throughput calculated at the same sender side during
> >>  the previous RTT, i.e., (RTTi-1), where i is an integer equal or
> >>  higher than 1.
> >>
> >> The TCP throughput at the sender side can be calculated using, e.g.,
> >> the following equation specified in [RFC5348]:
> >>
> >>
> >>    "The throughput equation for X_Bps, TCP's average sending rate in
> >>    bytes per second, is:
> >>
> >>                                    s
> >>    X_Bps = ----------------------------------------------------------
> >>            R*sqrt(2*b*p/3) + (t_RTO * (3*sqrt(3*b*p/8)*p*(1+32*p^2)))
> >>
> >>    Where:
> >>
> >>       X_Bps is TCP's average transmit rate in bytes per second.
> >>       (X_Bps is the same as X_calc in RFC 3448.)
> >>
> >>       s is the segment size in bytes (excluding IP and transport
> >>       protocol headers).
> >>
> >>       R is the round-trip time in seconds.
> >>
> >>       p is the loss event rate, between 0 and 1.0, of the number of
> >>       loss events as a fraction of the number of packets transmitted.
> >>
> >>       t_RTO is the TCP retransmission timeout value in seconds.
> >>
> >>       b is the maximum number of packets acknowledged by a single
> >>       TCP acknowledgement.", copied from [RFC5348].
> >>
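
For readers who want to try the numbers, the quoted throughput equation and the proposed subtraction between consecutive RTTs can be sketched in Python as follows. This is a minimal illustration, not code from either draft or from RFC 5348: the function names, the sample parameter values, and the clamping of the difference at zero are assumptions.

```python
import math


def tfrc_throughput(s, R, p, t_rto, b):
    """X_Bps from the quoted equation, in bytes per second.

    s: segment size in bytes, R: round-trip time in seconds,
    p: loss event rate (0 < p <= 1), t_rto: retransmission timeout in
    seconds, b: packets acknowledged by a single TCP acknowledgement.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    denominator = (R * math.sqrt(2 * b * p / 3)
                   + t_rto * (3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2)))
    return s / denominator


def exposed_congestion_rate(x_prev, x_curr):
    """Throughput in RTT(i-1) minus throughput in RTT(i).

    Clamping at zero is an assumption; the text only covers the case
    where the sender has to reduce its rate.
    """
    return max(0.0, x_prev - x_curr)


# Hypothetical values: the loss event rate rises from 1% to 2% between RTTs.
x_prev = tfrc_throughput(s=1460, R=0.1, p=0.01, t_rto=0.4, b=1)
x_curr = tfrc_throughput(s=1460, R=0.1, p=0.02, t_rto=0.4, b=1)
print(round(x_prev), round(x_curr), round(exposed_congestion_rate(x_prev, x_curr)))
```

The result depends entirely on p, so it is only as good as the loss event rate that the feedback channel delivers.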
> >>
> >> Note that the transport path throughput calculated at the sender is
> >> defined as the per-flow transport sending rate as a function of the
> >> congestion rate, round-trip time, and segment size. The transport
> >> path throughput is calculated at the sender using the TCP throughput
> >> equation specified in [RFC5348]. Note that in [RFC5348] the term
> >> congestion rate is denoted as loss event rate. According to
> >> [RFC5348], a loss event is defined as one or more lost or marked
> >> packets from a window of data, where a marked packet refers to a
> >> congestion indication (CE) from Explicit Congestion Notification
> >> (ECN) [RFC3168].
> >>
> >> Best regards,
> >> Georgios
> >>
> >>
> >>
> >> > -----Original Message-----
> >> > From: conex-bounces@ietf.org [mailto:conex-bounces@ietf.org] On
> >> > Behalf Of Mirja Kühlewind
> >> > Sent: Wednesday, 6 July 2011 10:23
> >> > To: conex@ietf.org
> >> > Subject: [conex] New draft(s) on TCP modifications for ConEx
> >> >
> >> > Hello,
> >> >
> >> > we submitted two new drafts regarding the needed TCP modifications
> >> > for ConEx.
> >> > As ConEx relies (partially) on ECN, more accurate ECN feedback
> >> > (more than max. one signal per RTT) is needed. The same information
> >> > would be valuable for other currently proposed TCP mechanisms, e.g.
> >> > DCTCP. This is why we decided to split this modification into a
> >> > separate draft and tried to specify it independently of ConEx.
> >> > Thus we have two drafts now:
> >> >
> >> > 1. draft-kuehlewind-conex-accurate-ecn
> >> > This draft currently proposes three different coding schemes to
> >> > realize more accurate ECN feedback. All approaches use the
> >> > available 'classic' ECN bit space, as only one mechanism (classic
> >> > ECN or the new more accurate scheme) is needed at a time. The goal
> >> > is to choose one of the coding options and have the other options
> >> > and all discussions removed in a later version of this draft.
> >> >
> >> > 2. draft-kuehlewind-conex-tcp-modifications
> >> > This draft describes all other modifications/recommendations to
> >> > use ConEx with TCP. This is the use of ECN and SACK and
> >> > recommendations on the credit bit.
> >> > This document contains several points which need further
> >> > discussion, e.g. the handling and validity of credits.
> >> >
> >> > Please have a look at the drafts; Feedback is very welcome!
> >> >
> >> > Mirja and Richard
> >>