Re: [NSIS] AD review: draft-ietf-nsis-ntlp-09 (M1: congestion control/rate limiting issues)

Magnus Westerlund <magnus.westerlund@ericsson.com> Wed, 21 June 2006 09:04 UTC

Message-ID: <44990B99.2070008@ericsson.com>
Date: Wed, 21 Jun 2006 11:04:25 +0200
From: Magnus Westerlund <magnus.westerlund@ericsson.com>
User-Agent: Thunderbird 1.5.0.4 (Windows/20060516)
MIME-Version: 1.0
To: "Hancock, Robert" <robert.hancock@roke.co.uk>
Subject: Re: [NSIS] AD review: draft-ietf-nsis-ntlp-09 (M1: congestion control/rate limiting issues)
References: <A632AD91CF90F24A87C42F6B96ADE5C57EBFB7@rsys005a.comm.ad.roke.co.uk>
In-Reply-To: <A632AD91CF90F24A87C42F6B96ADE5C57EBFB7@rsys005a.comm.ad.roke.co.uk>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: nsis@ietf.org

Hi,

See inline.

Hancock, Robert wrote:
> hi,
> 
>> -----Original Message-----
>> From: Magnus Westerlund [mailto:magnus.westerlund@ericsson.com] 
>> Sent: 16 June 2006 15:27
>> To: Hancock, Robert
>> Cc: nsis@ietf.org
>> Subject: Re: [NSIS] AD review: draft-ietf-nsis-ntlp-09 (M1: congestion control/rate limiting issues)
>>
>>
>> Hancock, Robert wrote:
>>> Hi all,
>>>
>>> comments on the congestion control/rate limiting issues
>>> (M1 and in fact L16-part as well):
>>>
>>>> -----Original Message-----
>>>> From: Magnus Westerlund [mailto:magnus.westerlund@ericsson.com] 
>>>> Sent: 01 June 2006 15:23
>>>> To: nsis@ietf.org
>>>> Subject: [NSIS] AD review: draft-ietf-nsis-ntlp-09
>>>>
>>>> M1. Congestion Control
>>>>
>>>> This affects text not only in 5.3.3 but also 7.1.3 and possibly other
>>>> places. But I do have a general concern that the congestion control
>>>> measurements described in the specification is underspecified.
>>>>
>>>> First in 5.3.3 I don't see any normative minimal values, or even
>>>> recommended values for T1 and T2 that will be safe to deploy on the
>>>> internet. I don't find it acceptable that the developer needs to
>>>> investigate which values that are safe to use and which are not. There
>>>> should also be some criteria documented for when it is acceptable to go
>>>> beyond these values. So my concern here is that the retransmission runs
>>>> havoc and create way to many packets to be sent.
>>> This is a fair point. I suppose it is impossible to get away
>>> permanently without giving concrete numbers in this case.
>>>
>>> In terms of what sort of values to use and how to describe the
>>> constraints on them, I think a good model to follow is probably
>>> the SIP INVITE transaction specification (17.1.1.2 of rfc3261)
>>> which says [fortunately the timer names are the same...]
>>>
>>> "  The default value for T1 is 500 ms.  T1 is an estimate of the RTT
>>>    between the client and server transactions.  Elements MAY (though it
>>>    is NOT RECOMMENDED) use smaller values of T1 within closed, private
>>>    networks that do not permit general Internet connection.  T1 MAY be
>>>    chosen larger, and this is RECOMMENDED if it is known in advance
>>>    (such as on high latency access links) that the RTT is larger.
>>>    Whatever the value of T1, the exponential backoffs on retransmissions
>>>    described in this section MUST be used."
>>>
>>> and later that T2 should be 64*T1. In our case, 500ms seems a reasonable
>>> default also; I think T2<=64*T1 since there is a separate bound on the
>>> T2 value from the signalling application (see the second paragraph of
>>> 5.3.3). I would be tempted to relax the NOT RECOMMENDED clause, since
>>> a smaller timeout would be valid and possibly quite useful on a wider
>>> range of networks, in particular Internet-connected networks but
>>> where it is known that the Query should be answered within the local
>>> network. Comments and text suggestions welcome.
>> I think this is a reasonable start. It seems to me to be a good idea to
>> allow the adaptation of the timers based on feedback from the peer. It
>> seems possible to use the GIST query and the response to measure the
>> current RTT. Allowing modification of the timers to track that value +
>> some processing delay + fudge factors seems one way forward.
> 
> ok, we can work on including this.
> 

Good, let's see how complex this becomes and whether it is worth the effort.
I would suggest that the WG come up with some strawmen to discuss.
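
A minimal sketch of one such strawman, in Python. It assumes that T1
tracks the measured Query/Response RTT plus a processing allowance,
scaled by a fudge factor and never dropping below the 500 ms default,
and that the retransmission interval doubles from T1 until it would
exceed T2. The function names, the 50 ms processing allowance, the
factor of 2, and the reading of T2 as a cap on the interval are all
illustrative assumptions, not values from the draft.

```python
def adaptive_t1(rtt_s=None, processing_s=0.05, fudge=2.0, floor_s=0.5):
    """Pick an initial retransmission timeout T1 in seconds.

    Assumption: T1 = fudge * (measured RTT + processing allowance),
    but never below floor_s (500 ms, the SIP default quoted above).
    With no RTT sample yet, fall back to the floor.
    """
    if rtt_s is None:
        return floor_s
    return max(floor_s, fudge * (rtt_s + processing_s))


def backoff_schedule(t1_s=0.5, t2_s=None):
    """Return the list of waits between retransmissions, in seconds.

    The wait starts at T1 and doubles each time (exponential backoff).
    Assumption: T2 is an upper bound on the wait; it defaults to 64*T1,
    the largest value suggested in the thread.
    """
    if t2_s is None:
        t2_s = 64 * t1_s
    waits = []
    wait = t1_s
    while wait <= t2_s:
        waits.append(wait)
        wait *= 2
    return waits


if __name__ == "__main__":
    t1 = adaptive_t1(rtt_s=0.6)          # slow path: T1 grows above 500 ms
    print(t1, backoff_schedule(t1))
    print(backoff_schedule(0.5, 32.0))   # default T1 with a tighter T2
```

Keeping the floor at the default means adaptation can only lengthen
the timer; allowing shorter values on local networks, as Robert
suggests, would amount to lowering floor_s by configuration.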

[snip]
> 
> that would be too easy ... ;-)
> 
> the end result of the ICMP discussion (as I read it) has been that
> there *are* no default parameters in the ICMP spec. (The relevant
> section is 2.4(f) of rfc4443.) There is a definition of a set of
> parameters (bucket rate N and size B), but the closest one gets to
> parameters is "For example, in a small/mid-size device, the possible 
> defaults could be B=10, N=10/s." which is about as tentative as it
> comes. And I would not say that the numbers are that useful; what to
> reuse would be the text that says
> - it's configurable
> - take into account where you are in the network
> (and implicitly that limiting based purely on inter-packet timers
> or absolute bandwidth caps are not useful, which was what was removed
> compared to the previous version of the spec.)
> 
> There have been analysis documents in the distant past about message
> rates, but only applying to QoS (based on RSVP experience) and even
> they have been controversial because they depend so strongly on
> assumptions about refresh reduction. Now we have a wider range of
> signalling applications, so the situation is likely to be even less
> clear.
> 
> the best advice to implementors I can think of would be to say that 
> *) the bucket size (B) should be related to the number of signalling
>    sessions that the node is expected to support
> *) the rate should allow the bucket to be discharged in a second
> 
> (and these would be SHOULD starting points).
> 
> That still doesn't give an actual number, but it does relate the number
> to a more 'objective' system sizing parameter that implementors will
> have to think about anyway.
> 

Well, could we provide some conservative numbers that are quite certain
not to screw up the network completely even in a worst-case scenario,
and provide those as default values with instructions on how one may
modify them? I am still concerned about the risk of misconfiguration
leading to serious issues. A further complication is that the maximum
load from GIST will only show up when there is a need for a massive
amount of D-mode traffic, such as when attempting local repair and
refreshing the routing state. The unfortunate thing here is that these
are situations where there is a high risk that the network is already
quite heavily loaded due to route changes etc.

I fear that we can't do much better than some conservative values and
recommendations for how to configure this if you think you know what you
are doing. The next step seems to be a fully adaptive scheme, which will
not be simple.
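
To illustrate the bucket parameters discussed above (rate N, size B),
here is a minimal token-bucket sketch in Python. The defaults B=10,
N=10/s are the tentative rfc4443 example quoted above, not agreed GIST
values. The class name and the sized_for_sessions helper are
hypothetical; the helper follows Robert's suggested starting point of
a bucket sized to the expected number of signalling sessions and
refilled in about one second.

```python
class TokenBucket:
    """Rate limiter: at most `burst` messages at once, `rate_per_s` sustained."""

    def __init__(self, rate_per_s=10.0, burst=10.0, now=0.0):
        self.rate = float(rate_per_s)   # N: tokens added per second
        self.burst = float(burst)       # B: bucket size
        self.tokens = float(burst)      # start with a full bucket
        self.last = now                 # time of the last refill

    def allow(self, now):
        """Return True if one message may be sent at time `now` (seconds)."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def sized_for_sessions(expected_sessions, now=0.0):
    """Hypothetical helper: B = expected sessions, N = B per second."""
    return TokenBucket(rate_per_s=expected_sessions,
                       burst=expected_sessions, now=now)


if __name__ == "__main__":
    bucket = TokenBucket()                              # B=10, N=10/s
    sent = sum(bucket.allow(0.0) for _ in range(15))    # burst at t=0
    print("sent in burst:", sent)
    print("allowed 1 s later:", bucket.allow(1.0))
```

Time is passed in explicitly so the behaviour is easy to check; a real
implementation would read a monotonic clock instead.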

Cheers

Magnus Westerlund

Multimedia Technologies, Ericsson Research EAB/TVA/A
----------------------------------------------------------------------
Ericsson AB                | Phone +46 8 4048287
Torshamsgatan 23           | Fax   +46 8 7575550
S-164 80 Stockholm, Sweden | mailto: magnus.westerlund@ericsson.com
