RE: [NSIS] AD review: draft-ietf-nsis-ntlp-09 (M1: congestioncontrol/rate limiting issues)

<john.loughney@nokia.com> Fri, 02 June 2006 12:20 UTC

Message-ID: <BAA65A575825454CBB0103267553FCCC18D7B4@esebe199.NOE.Nokia.com>
In-Reply-To: <A632AD91CF90F24A87C42F6B96ADE5C57EBF78@rsys005a.comm.ad.roke.co.uk>
From: john.loughney@nokia.com
To: robert.hancock@roke.co.uk, magnus.westerlund@ericsson.com, nsis@ietf.org

Hi all,

>> M1. Congestion Control
>> 
>> This affects text not only in 5.3.3 but also 7.1.3 and possibly other
>> places. But I do have a general concern that the congestion control
>> measures described in the specification are underspecified.
>> 
>> First in 5.3.3 I don't see any normative minimal values, or even
>> recommended values for T1 and T2 that will be safe to deploy on the
>> Internet. I don't find it acceptable that the developer needs to
>> investigate which values are safe to use and which are not. There
>> should also be some criteria documented for when it is acceptable to
>> go beyond these values. So my concern here is that retransmission
>> runs wild and causes far too many packets to be sent.
>
>This is a fair point. I suppose it is impossible to get away 
>permanently without giving concrete numbers in this case.
>
>In terms of what sort of values to use and how to describe the 
>constraints on them, I think a good model to follow is 
>probably the SIP INVITE transaction specification (17.1.1.2 of 
>RFC 3261) which says [fortunately the timer names are the same...]
>
>"  The default value for T1 is 500 ms.  T1 is an estimate of the RTT
>   between the client and server transactions.  Elements MAY (though it
>   is NOT RECOMMENDED) use smaller values of T1 within closed, private
>   networks that do not permit general Internet connection.  T1 MAY be
>   chosen larger, and this is RECOMMENDED if it is known in advance
>   (such as on high latency access links) that the RTT is larger.
>   Whatever the value of T1, the exponential backoffs on retransmissions
>   described in this section MUST be used."
>
>and later that T2 should be 64*T1. In our case, 500ms seems a 
>reasonable default also; I think T2<=64*T1 since there is a 
>separate bound on the T2 value from the signalling application 
>(see the second paragraph of 5.3.3). I would be tempted to relax 
>the NOT RECOMMENDED clause, since a smaller timeout would be valid and 
>possibly quite useful on a wider range of networks, in 
>particular Internet-connected networks but where it is known 
>that the Query should be answered within the local network. 
>Comments and text suggestions welcome.

What do you think of this, Magnus?
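
To make the T1/T2 proposal quoted above concrete, here is a minimal
Python sketch of the resulting backoff schedule. It assumes the
SIP-derived values under discussion (T1 = 500 ms, T2 = 64*T1) and
treats T2 as a ceiling on the backoff interval, which is one reading
of 5.3.3 and not agreed text. The function name backoff_intervals is
made up for illustration.

```python
def backoff_intervals(t1=0.5, t2=None):
    """Return the waits (in seconds) before each retransmission.

    Assumptions (not from the draft): t1 defaults to 500 ms, t2
    defaults to 64 * t1, and retransmission stops once the next
    doubled interval would exceed t2.
    """
    if t1 <= 0:
        raise ValueError("t1 must be positive")
    if t2 is None:
        t2 = 64 * t1
    if t2 < t1:
        raise ValueError("t2 must not be smaller than t1")

    intervals = []
    wait = t1
    while wait <= t2:
        intervals.append(wait)
        wait *= 2  # binary exponential backoff
    return intervals


if __name__ == "__main__":
    schedule = backoff_intervals()
    print(schedule, "total wait:", sum(schedule), "s")
```

With these defaults a Query is retransmitted at most seven times, and
a signalling application that imposes a tighter bound on T2 simply
shortens the list.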

>> Secondly, there are also no values documented for the rate control. I
>> think it is necessary to document what Internet-safe values are here
>> so that one does not cause problems. In addition it seems a bit
>> simplistic to use a token bucket with some parameters selected based
>> on the local link as GIST clearly sends messages beyond the local
>> link. Thus one might have to consider being a bit smarter and more
>> adaptive to what is seen for different flows.
>
>Here I am not so sure. The text here was informed by the 
>equivalent discussion for ICMPv6 (reference [26], now 
>RFC4443), which caused an extensive thread on the v6 mailing 
>list (start at 
>http://www1.ietf.org/mail-archive/web/ipv6/current/msg01343.html
>+ another 60 or so messages).
>
>GIST is not ICMP but many of the same issues arise: messages 
>are generated in the IP 'control plane' (in so far as this is 
>a meaningful term), partly autonomously but mainly in response 
>to events initiated by end systems, the messages go beyond the 
>local link, rules have to be written so they apply to a host 
>and a core router and everything in between. The end result 
>for ICMP was to write something minimal. (The link bandwidth 
>here is used as an indicator for where a router is in a 
>network - core/access/whatever. It's clearly imperfect but there's 
>nothing else apart from dynamic adaptiveness to make use of, and 
>fixed values seem even worse.)
>
>We'd really like to avoid adaptiveness in the D-mode state machine.
>The main use of D-mode should be for Queries/Responses for 
>which adaptation is not meaningful for initial messages (there 
>is no pre-existing state); if there is a large amount of 
>signalling data to send for a given flow, then GIST should 
>transition to C-mode anyway, and the rate limits should be chosen 
>to be cautious to encourage that. We aimed for robustness and 
>simplicity rather than performance.
>
>It might be possible to use some sort of adaptiveness to 
>select an appropriate rate to apply to refresh queries used 
>for GIST probing (see also your point at the end of L16). The 
>current situation is that you can probe as fast as you like 
>until you hit the rate limit, and that it's up to the 
>implementer to decide how fast is really necessary depending 
>on an assessment of route stability (for which I don't know 
>any good objective estimator). On the assumption that most 
>probes will go to the peer you already know about, one could 
>refine this to apply a separate token bucket limiter for probe 
>messages towards that peer, which was adapted according to 
>knowledge of congestion state with that peer (based on message 
>loss). We need input on whether that complexity is really 
>necessary however, since it doesn't change the situation for 
>the whole of D-mode but just a particular subset of it.

Comments on this?
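
For anyone weighing the complexity, a rough Python sketch of the two
pieces discussed above: a plain token bucket, and a separate bucket
per known peer for probe messages whose rate is cut when loss is seen.
Everything here is an assumption for illustration: the class names,
the rate and burst numbers, and in particular the halve-on-loss rule,
which the thread does not specify. Timestamps are passed in explicitly
so the behaviour is deterministic; a real implementation would
presumably read a monotonic clock.

```python
class TokenBucket:
    """Token bucket: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = now

    def allow(self, now):
        """Return True and spend one token if a message may be sent."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ProbeLimiter:
    """Hypothetical per-peer limiter for refresh/probe Queries."""

    def __init__(self, rate, burst, min_rate=0.1):
        self.rate = rate
        self.burst = burst
        self.min_rate = min_rate
        self._buckets = {}

    def _bucket(self, peer, now):
        if peer not in self._buckets:
            self._buckets[peer] = TokenBucket(self.rate, self.burst, now)
        return self._buckets[peer]

    def allow(self, peer, now):
        return self._bucket(peer, now).allow(now)

    def on_loss(self, peer, now):
        # Assumed adaptation rule: halve the peer's probe rate on loss,
        # never dropping below min_rate.
        bucket = self._bucket(peer, now)
        bucket.rate = max(self.min_rate, bucket.rate / 2)


if __name__ == "__main__":
    limiter = ProbeLimiter(rate=2, burst=1)
    print(limiter.allow("peerA", 0.0), limiter.allow("peerA", 0.0))
```

Note that the per-peer part only covers probes to an already known
peer; initial Queries would still fall under the single node-wide
bucket.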

>> Third, the implications and congestion issues with local repair seem
>> to have been brushed over. Section 7.1.3 does indicate that you need to
>> take care, but nothing more. Is there some potential for aggregation
>> of the queries to minimize the load and have quicker convergence?
>
>There is no mechanism that I can think of. Certainly there is 
>no aggregation possible in general, since every affected flow 
>might be affected differently, especially if next-NSIS-routers 
>are many hops away. We depend on the rate limiting to prevent 
>the generated Queries causing a flood, but that's about it. 
>(There are aggregation techniques for transmitting the 
>notifications, but they take place at the NSLP level.)

Comments on this? I'm wondering if we need to specify anything here,
or just indicate what one should be aware of.
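
As a back-of-envelope aid for that decision, the following Python
sketch estimates how long local repair takes to converge when the
re-Queries are paced only by a token-bucket rate limit. It assumes one
Query per affected flow, a full bucket at the moment the route change
is detected, and no retransmissions; the function name and the numbers
in the example are hypothetical.

```python
def repair_drain_time(flows, rate, burst):
    """Seconds until the last re-Query can be sent after a route change.

    flows: number of affected flows (one Query each, by assumption)
    rate:  sustained Queries per second allowed by the limiter
    burst: Queries that can go out immediately (full bucket assumed)
    """
    if flows < 0 or burst < 0 or rate <= 0:
        raise ValueError("flows and burst must be >= 0, rate must be > 0")
    backlog = max(0, flows - burst)  # Queries that must wait for tokens
    return backlog / rate


if __name__ == "__main__":
    # Hypothetical figures: 1010 flows, 50 Queries/s, burst of 10.
    print(repair_drain_time(1010, 50, 10), "seconds")
```

The point of the arithmetic is the trade-off: a cautious rate limit
protects the network from a Query flood but stretches convergence in
direct proportion to the number of affected flows.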

John
