Re: [sip-overload] Proposal: Support for the restriction algorithms should be mandatory for clients (draft-ietf-soc-overload-control-02)

<phil.m.williams@bt.com> Wed, 11 May 2011 10:34 UTC

From: <phil.m.williams@bt.com>
To: <volker.hilt@alcatel-lucent.com>
Date: Wed, 11 May 2011 11:34:08 +0100
Thread-Topic: [sip-overload] Proposal: Support for the restriction algorithms should be mandatory for clients (draft-ietf-soc-overload-control-02)
Thread-Index: AcwGEgfMXeAsAOTdS3O+TCMuWeAiewJBjU+Q
Message-ID: <E4B3F0DC6D953D4EBEC223BC86FE322C4A45FEB70B@EMV04-UKBR.domain1.systemhost.net>
References: <E4B3F0DC6D953D4EBEC223BC86FE322C4A42034FB7@EMV04-UKBR.domain1.systemhost.net> <4DBA1D11.7030001@alcatel-lucent.com>
In-Reply-To: <4DBA1D11.7030001@alcatel-lucent.com>
Cc: sip-overload@ietf.org
Subject: Re: [sip-overload] Proposal: Support for the restriction algorithms should be mandatory for clients (draft-ietf-soc-overload-control-02)

Thanks Volker,

Apologies for the slow response (sent out just before I went on leave). I have included some comments below.

I'll look at the other responses...

Regards,

Phil Williams

-----Original Message-----
From: sip-overload-bounces@ietf.org [mailto:sip-overload-bounces@ietf.org] On Behalf Of Volker Hilt
Sent: 29 April 2011 03:06
To: sip-overload@ietf.org
Subject: Re: [sip-overload] Proposal: Support for the restriction algorithms should be mandatory for clients (draft-ietf-soc-overload-control-02)

Phil,

> SIP support for 3 restriction algorithms at client sources defined in
> soc-overload-control-02 is beneficial for the flexibility that it
> provides to potentially support a wide range of server overload controls
> and network applications. However, making proportional restriction,
> termed a 'loss-based' algorithm, mandatory for clients would severely
> limit the usefulness of this approach.
> I will summarise the reasons for this and why this algorithm has limited
> use in future generations of networks below.
> Practical issues
> -----------------------
> If client system suppliers are only required to support proportional
> restriction, then realistically it is likely that that is the only
> algorithm that will be provided by default, and the other algorithms
> will be seen as chargeable enhancements, making it difficult or
> expensive to deploy them.
> But, of the two functional ends of a closed adaptive overload control,
> it is the algorithm on the overloaded server that derives the control
> parameter that is not standardised and has dependency on node
> architecture, whereas the 3 client restriction algorithms are less
> subject to design variation, have been around for a long time, and are
> already widely deployed. Therefore we would expect it to be less
> contentious to standardise these methods within SIP, but in any case we
> would expect less sensitivity to design variations.
> It is not realistic to envisage a situation whereby an overloaded server
> supports several different client restriction algorithms simultaneously,
> because of the complexity in design of the algorithm, and because even
> if a unique solution is guaranteed, there will be issues of speed
> of convergence to that solution, stability, and the need for the server
> capacity amongst upstream clients to be allocated in a predictable and
> precise way. This has implications for capacity guarantees, particularly
> at network boundaries (see below).
> If support for the 3 different restriction algorithms (or at least the 2
> in which we are interested) is mandated by the IETF rather than being
> optional, then a server subject to overload can guarantee that the
> sending entities will all use the same restriction algorithms, and can
> behave as predicted.

So you are arguing that a client should always implement the full set of 
restriction algorithms (e.g., loss, rate and window)?

[PhilW] Yes. I was arguing for all 3 so as not to favour any particular method. However, I think that SLAs most naturally require a maximum rate-based method, and for this purpose I don't really see the need to mandate window-based methods. Others may have particular reasons for applying a windowing method.

If we mandated that, we would lose the possibility to extend the
feedback types at a later point.
[PhilW] I'm not sure why this is the case. Could you clarify?

And, of course, we add the complexity
of requiring clients to implement multiple algorithms that do the same thing.

[PhilW] Yes, there is an additional 'complexity', but as I pointed out these methods are well known and have been deployed widely for a long time, so I don't see this as being a significant issue. The fact that they would not be available by default and have to be specifically required would be a more serious factor, increasing costs and preventing or complicating their deployment.

> Performance limitations
> ----------------------------------
> The main performance limitation of proportional restriction is that it
> is vulnerable to sudden increases in the offered load at client sources,
> since the mean admitted rate after control is proportional to the mean
> arrival rate before control until an adaptation of the control parameter
> is made. But there will always be a delay to this control adaptation (at
> least in the closed loop method implied by the current specification),
> both because of the need for waiting sufficiently long at the overloaded
> server in order to obtain statistically accurate estimates before an
> adaptation is made, and the time and number of received messages
> required to distribute control to the clients.

Can you please point us to results that show this behavior?

[PhilW] This can be substantiated by theory, but note that many conditions affect the magnitude of this issue, and these depend upon the specific network application. The simulations that I did to investigate this were done over 15 years ago - I'm not sure that I can recover or replicate them in my current situation. Note that a particularly critical issue for this scenario, which applies to all algorithms (and can therefore adversely affect modelling results), is 'synchronisation' or 'resonance'. This occurs when the instances of restriction algorithms at multiple sources are synchronised because they are using the same deterministic algorithm (e.g. 'm in n' rejected for proportional, or synchronised bucket fill for rate-based). This results in very peaky arrivals at the overloaded server (hence highly variable throughput, long response time tails or lost messages, and unstable control). There are good and not so good ways to overcome this...
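
The synchronisation effect described above can be illustrated with a minimal sketch. It assumes a hypothetical setup that is not taken from this thread: 10 sources, each receiving one request per tick and rejecting the first 5 of every 10 with a deterministic counter. The evenly staggered phases are only one illustrative way of breaking the lockstep, not a method proposed here.

```python
def admitted_per_tick(phases, m, n, ticks):
    """Total requests admitted per tick over all sources.

    Each source sees one arrival per tick and rejects the first m of every
    n arrivals, using a deterministic counter that starts at its phase.
    """
    totals = []
    for t in range(ticks):
        admitted = 0
        for phase in phases:
            position = (t + phase) % n
            if position >= m:  # positions 0..m-1 of each cycle are rejected
                admitted += 1
        totals.append(admitted)
    return totals


# Hypothetical parameters, chosen only for illustration.
sources, m, n, ticks = 10, 5, 10, 20

# All counters start together: the server sees either nothing or everything.
synchronised = admitted_per_tick([0] * sources, m, n, ticks)

# Counters offset from each other: same mean load, no peaks.
staggered = admitted_per_tick([i % n for i in range(sources)], m, n, ticks)

print("synchronised:", synchronised)
print("staggered:   ", staggered)
```

Both cases admit the same total number of requests, but the synchronised case delivers them to the server in bursts of 10 followed by gaps, while the staggered case delivers a steady 5 per tick.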

I'd also recommend looking at the results of the SIP overload control
design team that has investigated this problem.

[PhilW] I will do.

> This vulnerability is
> worse when the number of clients is 'large' with a high capacity
> relative to the server. In contrast, a maximum rate-based control is
> generally not so vulnerable to short term surges in load.

A rate based mechanism needs to split its capacity across upstream 
neighbors. If new clients arrive, the split has to be adjusted. In 
particular if you are dealing with a large set of clients some of which 
may be inactive for some time, this requires a quick readjustment of the 
allocated capacity. This topic has been discussed in the design
considerations draft.

[PhilW] Yes, but for this situation an adjustment of the control parameter is required whatever restriction algorithm is used. In the case of proportional loss this will be the % rejected, and all source streams will be affected. A difference with a rate-based scheme is that once the leaky bucket is full the upper rate is bounded, so there is an upper bound on the aggregated admission rate over all sources before any control adaptation takes place. Note also that where a stream is already under control by a restrictor bucket with an admission rate at the max, any increase in the offered rate will not require any adjustment of control parameters. This isn't true for proportional rejection, since the admission rate is proportional to the offered (arrival) rate at each source until an adaptation has been distributed.
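
The difference argued above can be shown with simple arithmetic on mean rates. This is only a sketch under assumed numbers (4 sources, a 500 req/s server, offered load doubling from 250 to 500 req/s per source); it ignores bucket depth and short-term burstiness, and the function names are hypothetical.

```python
def admitted_loss(offered, reject_fraction):
    """Mean admitted rate under proportional (loss-based) restriction."""
    return offered * (1.0 - reject_fraction)


def admitted_rate(offered, max_rate):
    """Mean admitted rate under a maximum-rate restriction."""
    return min(offered, max_rate)


# Hypothetical numbers, chosen only for illustration.
server_capacity = 500.0  # req/s
sources = 4
normal_offered = 250.0   # req/s per source before the surge
surge_offered = 500.0    # req/s per source after the surge

# Control parameters tuned so that the normal load exactly fills the server.
reject_fraction = 1.0 - server_capacity / (sources * normal_offered)
max_rate = server_capacity / sources

# Aggregate admitted load, before any adaptation of the control parameters.
before_loss = sources * admitted_loss(normal_offered, reject_fraction)
after_loss = sources * admitted_loss(surge_offered, reject_fraction)
after_rate = sources * admitted_rate(surge_offered, max_rate)

print(before_loss, after_loss, after_rate)
```

With these numbers the loss-based control passes twice the server capacity until the rejection percentage is updated, whereas the rate-based control stays at capacity with unchanged parameters.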

> The need to support precise capacity guarantees
> ------------------------------------------------------------------------
> It is common practice for agreements concerning capacity to be provided
> at network operator boundaries [from now on I'll use SP or Service
> Provider as a generic term encompassing Operators etc], and in many
> realistic applications this is essential (I recall that this requirement
> is absent from the original list of SIP overload control requirements?).
> It is also possible to want to provide guarantees to sub-streams of SP
> traffic.
> These guarantees must be practically useful, so that they can form the
> basis of a service level agreement between SPs. I.e. they must be simple
> enough to be easily understood by all parties, and above all they must
> be clear and precise in the sense that the behaviour of the capacity is
> predictable/deterministic (in a stochastic sense). SPs are not
> interested in technicalities of restriction algorithms, but will want
> the policies to be defined in terms of traffic characteristics that are
> straightforward to interpret and agree.
> Clearly the policies must also be efficient in the sense that they imply that
> the available capacity can be fully utilised. I suggest that these are
> most easily expressed as a guaranteed minimum rate and a precise way in
> which 'spare' capacity not being used by a client originated stream is
> distributed over the other SPs, e.g. in terms of maximum rates
> determined by agreed proportions of the available unused allowances. Of
> course other policies are possible (but they may be less precise or more
> complex). Whatever policies are chosen, realising this is an inherent
> part of the server overload control, but the difficulty and complexity
> is dependent upon the method of call restriction in the clients.
> With proportional restriction, note that the percentages have nothing
> directly to do with the proportions of server capacity allocated to
> different clients. So there is no natural and simple way to map between
> the parameters of the agreement and the control parameters. If the same
> control level were applied to all client traffic, then the changes in
> the offered traffic from one client will always imply changes to the
> traffic admitted by another, (and in particular this applies to sudden
> large increases). To apply maximum rate-based guarantees would require
> monitoring of the received rate from each source separately in order
> that the offered traffic can be derived implicitly and thereby
> percentages derived for each specific source.
> In contrast, with a rate-based restriction, it is much simpler to
> implement policies defined in terms of maximum rates, even though these
> are adapted according to minimum guarantees and use of unused allowances
> in a precise and predictable way.
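
One possible reading of the policy outlined above (a guaranteed minimum rate per SP, with unused allowance shared in agreed proportions) is sketched below. The function name, the weights and all the numbers are assumptions made for illustration; the quoted text does not define a specific formula.

```python
def max_rates(guaranteed, measured, weights):
    """Return one maximum rate (req/s) per SP.

    guaranteed: agreed minimum rate for each SP
    measured:   rate currently received from each SP
    weights:    agreed proportions for sharing unused allowance
    """
    # Allowance that SPs running below their guarantee are not using.
    unused = sum(max(g - m, 0.0) for g, m in zip(guaranteed, measured))

    # SPs at or above their guarantee may claim a share of the unused part.
    claimants = [i for i, (g, m) in enumerate(zip(guaranteed, measured))
                 if m >= g]
    total_weight = sum(weights[i] for i in claimants)

    rates = list(guaranteed)  # nobody is ever cut below the guarantee
    for i in claimants:
        rates[i] = guaranteed[i] + unused * weights[i] / total_weight
    return rates


# Hypothetical example: four SPs, 125 req/s guaranteed each, equal weights.
rates = max_rates(guaranteed=[125.0] * 4,
                  measured=[50.0, 125.0, 125.0, 200.0],
                  weights=[1.0] * 4)
print(rates)
```

Here the first SP uses only 50 of its 125 req/s, so the other three may each send up to 150 req/s. The maximum rates map directly onto the terms of the agreement, but they would need to be re-adapted as soon as the first SP starts using its full guarantee again.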

I don't think overload control is the right tool to police SLAs.

[PhilW] An overload control always enforces a capacity allocation method, and is therefore performing a policing function. This may be predictable, and either agreed between the owners of the traffic streams or not (and therefore would not be regarded formally as an SLA), or unpredictable, e.g. for badly designed overload control schemes, including the situation where the solution in terms of allocation over sources is not unique so that stochastic convergence never occurs. Unfortunately at certain network locations the absence of a predictable allocation can be unacceptable.

Let's assume for a second you are in fact using overload control for 
this purpose and are configuring your overload control rates to match 
your SLAs. Say you have two servers A and B (each with capacity 500 
req/s) and four upstream neighbors. The SLA with each of them is that 
you accept 250 req/s.

If one of your servers goes down, your capacity is cut in half. If you 
have configured overload control to honor your SLAs, your remaining
server will melt down within ms. Your only choice to survive this 
situation is to use overload control and cut the rates below the SLA.

[PhilW] For this reason control (levels) at sources should always be specific to an overloaded server, i.e. there should not be a single restrictive control for multiple target servers. When load is distributed (balanced, etc.) over more than one destination server, the distribution function should therefore always precede the restriction (overload) function, not the other way around. So in your example each source would have two independent restrictive control associations, one with A and another with B. The load distribution over these might be simple round-robin, so that if they are both controlling at 125 req/s and, say, A fails, then the 125 req/s to A would be rejected. There is no surge of traffic to B. Of course it is most likely that the overload would arise only after the failure, but this requires optimum setting of control levels for initiation of control. (More generally one can show that certain rate-based SLAs can be arranged to apply over a set of servers over which load is distributed, and that if the load is balanced in the same proportions as the rates, then the policy is invariant to the number of working servers in the set, but this requires some theory.)
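
The ordering argued for above (distribute first, then restrict per server) can be sketched as follows, using the numbers from the example: four sources, servers A and B, and a 125 req/s control per source per server. The class, the one-second budget model and the assumption that each source offers 250 req/s are hypothetical simplifications.

```python
from itertools import cycle


class Source:
    """One upstream source with an independent restrictor per target server."""

    def __init__(self, servers, max_rate_per_server):
        self.round_robin = cycle(servers)
        # Requests each restrictor admits in a one-second interval.
        self.budget = {s: max_rate_per_server for s in servers}

    def send(self, requests, failed=()):
        """Offer `requests` in one second; return requests delivered per server."""
        remaining = dict(self.budget)
        delivered = {s: 0 for s in self.budget}
        for _ in range(requests):
            target = next(self.round_robin)  # 1. distribution comes first
            if target in failed:
                continue                     # rejected, not re-routed
            if remaining[target] > 0:        # 2. per-server restriction second
                remaining[target] -= 1
                delivered[target] += 1
        return delivered


def load_on(server, failed=()):
    """Aggregate load on `server` from four sources offering 250 req/s each."""
    sources = [Source(["A", "B"], 125) for _ in range(4)]
    return sum(src.send(250, failed)[server] for src in sources)


both_up = load_on("B")
a_failed = load_on("B", failed=("A",))
print(both_up, a_failed)
```

In this model B receives 500 req/s whether or not A is working, because the traffic that round-robin directs to A is rejected at the source rather than shifted onto B.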

Thanks,

Volker (as individual)




> Comments please!
> Phil Williams