Re: [aqm] Codel's count variable and re-entering dropping state at small time intervals

"Agarwal, Anil" <Anil.Agarwal@viasat.com> Mon, 20 July 2015 17:09 UTC

From: "Agarwal, Anil" <Anil.Agarwal@viasat.com>
To: Roland Bless <roland.bless@kit.edu>, "aqm@ietf.org" <aqm@ietf.org>
Date: Mon, 20 Jul 2015 17:09:10 +0000
Archived-At: <http://mailarchive.ietf.org/arch/msg/aqm/qeN26Q96rrYRZ61Nx4O-p1vz2dw>

Hi Roland, Polina,

I posted a note on the bufferbloat forum pointing out these issues (and a few more), with some suggestions to make the CoDel algorithm a bit stronger in these areas. It did not generate any feedback or discussion :(

https://lists.bufferbloat.net/pipermail/codel/2015-June/001023.html

I have attached the updated document.

Regards,
Anil

-----Original Message-----
From: aqm [mailto:aqm-bounces@ietf.org] On Behalf Of Roland Bless
Sent: Monday, July 20, 2015 12:49 PM
To: aqm@ietf.org
Subject: [aqm] Codel's count variable and re-entering dropping state at small time intervals

Dear All,

we (Polina and I) have two questions concerning the behavior of the `count` variable in Codel which can be summarized as:

1. after exiting dropping state, count is usually reset, unless "the next drop state is entered too close to the previous one". This special case is not explained in the text of the codel draft, but is present in all implementations, and, currently, there are at least three different versions of it (see below). We feel that implementers need more guidance here...

2. is 'count' supposed to be reset or saturated on overflow, and what should be its maximum value (it makes a difference whether you are using a 16-, 32-, or 64-bit counter)?

Regarding the first question:

Here are references to the code that we looked at:
1) reference implementation in the draft:
https://tools.ietf.org/html/draft-ietf-aqm-codel-01#page-20
2) ns-2 code:
https://github.com/hbatmit/ns2.35/blob/master/queue/codel.cc#L144
3) linux code:
https://github.com/torvalds/linux/blob/110bc76729d448fdbcb5cdb63b83d9fd65ce5e26/include/net/codel.h
4) cake code:
https://github.com/dtaht/sch_cake/blob/master/codel5.h

When codel encounters congestion it enters drop state, counts the number of packets that were dropped since congestion was encountered, and drops packets at intervals of "interval" / sqrt(count). When there is no more congestion, the drop state is exited and this counter is reset unless "the next drop state is entered too close to the previous one".
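
The drop spacing described above can be sketched as follows. This is only an illustration of "interval" / sqrt(count), not code taken from any of the referenced implementations; the 100 ms value for "interval" and the function names are assumptions made for the example.

```python
import math

# Assumed value for illustration only; the message does not state one.
INTERVAL_MS = 100.0


def next_drop_time(now_ms, count, interval_ms=INTERVAL_MS):
    """Schedule the next drop "interval" / sqrt(count) after now_ms."""
    if count < 1:
        raise ValueError("count must be at least 1 while in drop state")
    return now_ms + interval_ms / math.sqrt(count)


def drop_gaps(n_drops, interval_ms=INTERVAL_MS):
    """Gaps between successive drops as count grows from 1 to n_drops."""
    return [interval_ms / math.sqrt(c) for c in range(1, n_drops + 1)]
```

With these assumptions the gap between drops shrinks as count grows; for instance, count = 4 gives half the gap of count = 1.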

This "too close to the previous one" is a part that is not well documented and differs in implementations:
1) In the ns-2 code it is in lines 187-200. The most important part is the comment that says that this control law is bad, but there is no better one available.
2) In the reference code these lines are the last lines on page 20. It has the same code (except // kmn decay tests line), but doesn't say that this solution is temporary.
3) In the Linux code the lines in question are 334-348 and 352. First, it uses 16 instead of 8, but this is consistent with the comments. Second, it sets the count to something else that we could not make sense of (it doesn't look like the difference between the previous count and the count before that, as the names would suggest, but rather some refinement of count - 2, where 2 can be 3 or 1 or something ...)
4) For Cake the lines in question are 401-411, and in particular 404-405.
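
To make the "too close to the previous one" rule concrete, here is a simplified sketch of the variants as they are described in the list above. It is not a copy of any of the four implementations: the function name, the parameter names, and the use of plain millisecond integers are assumptions, and the exact conditions should be checked against the linked sources.

```python
def reentry_count(count, now_ms, drop_next_ms, interval_ms, window_mult=8):
    """Pick the starting count when drop state is re-entered.

    Hypothetical simplification: if the previous drop state ended
    recently (within window_mult * interval), resume from count - 2
    instead of starting over at 1. window_mult is 8 or 16 depending
    on the variant being modelled.
    """
    recently_dropping = (now_ms - drop_next_ms) < window_mult * interval_ms
    if count > 2 and recently_dropping:
        return count - 2
    return 1
```

For example, with interval = 100 ms, count = 10, and 900 ms since drop_next, a window of 8 restarts at 1 while a window of 16 resumes at 8, so the choice of multiplier alone changes the behavior.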

What is the "official" recommended version or is there any guidance on how to select one?

Regarding the second question:

In the reference implementation of the draft, count is a 32-bit integer, but one cannot find any comments about the overflow behavior.
In Linux there is a comment to ignore overflow since there is no division, and due to the sqrt^-1 approximation it won't break at 0.
According to these two emails on the Cake archive:
https://lists.bufferbloat.net/pipermail/cake/2015-June/000301.html , https://lists.bufferbloat.net/pipermail/cake/2015-June/000302.html , it seems that the original intention of count was to be reset by overflow at 2^16, but according to experiments it is better to saturate it.

So, what is the desired overflow behavior for count: should it overflow or saturate, and if the latter applies, at which maximal value?
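
The two overflow behaviors in question can be compared with a small sketch. The helper names and the emulation of a fixed-width counter with a bit mask are assumptions for illustration; none of the referenced implementations is written this way.

```python
def inc_wrap(count, bits=16):
    """Increment a fixed-width counter that wraps to 0 on overflow."""
    return (count + 1) & ((1 << bits) - 1)


def inc_saturate(count, bits=16):
    """Increment a fixed-width counter that sticks at its maximum."""
    max_val = (1 << bits) - 1
    return count if count >= max_val else count + 1
```

With a 16-bit counter, the wrapping variant goes from 65535 back to 0 (which a plain "interval" / sqrt(count) computation would have to guard against), while the saturating variant stays at 65535.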

Best Regards,
 Roland and Polina (currently implementing stuff)
