Re: [MEDIACTRL] [sip-overload] WGLC: draft-ietf-soc-overload-design

"Parthasarathi R (partr)" <partr@cisco.com> Tue, 31 August 2010 17:09 UTC

Date: Tue, 31 Aug 2010 22:36:44 +0530
Message-ID: <A11921905DA1564D9BCF64A6430A62390293A4B8@XMB-BGL-411.cisco.com>
From: "Parthasarathi R (partr)" <partr@cisco.com>
To: Janet P Gunn <jgunn6@csc.com>, mediactrl@ietf.org, sip-overload@ietf.org
Subject: Re: [MEDIACTRL] [sip-overload] WGLC: draft-ietf-soc-overload-design

Janet,
 
Your new paragraph about non-SIP traffic handling in a SIP server looks good to me.
 
Thanks
Partha

________________________________

From: sip-overload-bounces@ietf.org on behalf of Janet P Gunn
Sent: Tue 8/31/2010 8:54 PM
To: mediactrl@ietf.org; sip-overload@ietf.org
Subject: Re: [sip-overload] [MEDIACTRL] WGLC: draft-ietf-soc-overload-design



Comments on draft-ietf-soc-overload-design-01 

Intro, third paragraph says: 
"For example, a PSTN gateway that runs 
   out of trunk lines but still has plenty of capacity to process SIP 
   messages should reject incoming INVITEs using a 488 (Not Acceptable 
   Here) response [RFC4412]." 

While it is true that RFC 4412 DOES say to use 488 in this case, we have found that, in the real world, this can lead to incorrect mapping back to ISUP.  In at least some contexts, "503 with a Reason header field Q.850 cause value of 34 (no circuit available)" may be used instead of 488.  (I believe this is covered in a PTSC document.)  So I suggest 

"For example, a PSTN gateway that runs 
   out of trunk lines but still has plenty of capacity to process SIP 
   messages should reject incoming INVITEs using a response such as 488 (Not Acceptable 
   Here), as described in RFC4412." 

After this paragraph, I would add a new paragraph saying something like: 

"There are other failure cases in which a SIP server also serves non-SIP traffic (e.g., RTP packets, database queries and updates, event handling) which can lead to server overload.  These other loads may, or may not, be correlated with the SIP message volume. The server is unable to process all SIP requests due to resource constraints, but simply reducing the flow of SIP messages may not sufficiently reduce the load to avoid congestion collapse.  In this context, it is to be expected that the server has some other method of overload control addressing these other sources of load.  However, the specifics of the overload control for other traffic types, and the coordination of the different overload controls, are out of scope for this document."   

This should address the concerns raised by Partha and others. 

Fourth paragraph. 

In addition to the other problems with 503 and Retry-After, 503 is used for other situations (with or without Retry-After), not just SIP Server overload.  A SIP Overload Control process based on 503 would have to specify exactly which cause values trigger the Overload Control. 
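To make the point concrete, here is a minimal Python sketch of the kind of rule a 503-based scheme would have to spell out. It assumes, hypothetically, that a 503 carrying a Reason header with Q.850 cause 34 (no circuit available, as in the gateway example above) signals a non-SIP resource shortage and must not feed the overload control loop, while any other 503 does. The function names and the NON_OVERLOAD_Q850_CAUSES set are illustrative only; they come from neither the draft nor any RFC, and the header parsing is deliberately simplified.

```python
import re
from typing import Optional

# Hypothetical policy table: Q.850 cause values that, when carried in a 503,
# mean "non-SIP resource exhausted" rather than "SIP server overloaded".
NON_OVERLOAD_Q850_CAUSES = {34}  # 34 = no circuit/channel available

# Simplified match for a Q.850 reason value and its cause parameter,
# e.g. 'Q.850 ;cause=34 ;text="No circuit available"'.
_Q850_CAUSE = re.compile(r"\bQ\.850\b[^,]*?;\s*cause\s*=\s*(\d+)", re.IGNORECASE)


def q850_cause(reason_header: Optional[str]) -> Optional[int]:
    """Return the Q.850 cause from a Reason header value, or None if absent."""
    if not reason_header:
        return None
    match = _Q850_CAUSE.search(reason_header)
    return int(match.group(1)) if match else None


def triggers_overload_control(status: int, reason_header: Optional[str] = None) -> bool:
    """True only for 503 responses this (hypothetical) policy treats as SIP overload."""
    if status != 503:
        return False
    # A 503 without a recognized non-overload cause is treated as overload.
    return q850_cause(reason_header) not in NON_OVERLOAD_Q850_CAUSES
```

Under this policy, a plain 503 with no Reason header still counts as overload. That is exactly the ambiguity raised above: without an agreed rule, the receiver cannot tell a congested proxy from a gateway that has run out of trunks.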

Section 2 

Even when SIP messages are not dropped, significant delay can cause time-outs which lead to retransmission.  I would change the second sentence to 
"When SIP is running over the UDP protocol, it will retransmit messages that were dropped or excessively delayed by a SIP server due to overload and thereby increase the offered load for the already overloaded server." 

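The multiplying effect of delay is easy to quantify. The minimal Python sketch below assumes the RFC 3261 default T1 of 500 ms and an INVITE client transaction over UDP. It lists when retransmissions fire if the server never answers: Timer A starts at T1 and doubles on each firing, and Timer B ends the transaction at 64*T1. For simplicity it ignores provisional responses, which stop INVITE retransmissions in a real stack.

```python
T1 = 0.5  # seconds; RFC 3261 default round-trip-time estimate


def invite_retransmission_times(t1: float = T1) -> list[float]:
    """Seconds after the initial send at which a UDP INVITE client transaction
    retransmits when no response arrives: Timer A starts at t1 and doubles on
    each firing, and Timer B (64 * t1) ends the transaction."""
    timer_b = 64 * t1
    times: list[float] = []
    interval = t1
    t = interval
    while t < timer_b:
        times.append(t)
        interval *= 2
        t += interval
    return times


if __name__ == "__main__":
    times = invite_retransmission_times()
    # An INVITE that is never answered is sent 1 + len(times) times.
    print(times, "->", 1 + len(times), "transmissions")
```

With the defaults, retransmissions fire at 0.5, 1.5, 3.5, 7.5, 15.5 and 31.5 s, so an unanswered INVITE is sent seven times. A server that only delays its responses beyond T1 therefore starts receiving duplicate copies of requests it is already processing.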
At the end of section 2 you say 
  "Another challenge for SIP overload control is that the rate of the 
   true traffic source usually cannot be controlled.  Overload is often 
   caused by a large number of UAs each of which creates only a single 
   message.  These UAs cannot be rate controlled as they only send one 
   message.  However, the sum of their traffic can overload a SIP 
   server." 

In fact, the various wireless technologies DO have methods for controlling the load "caused by a large number of UAs each of which creates only a single message."  Some of these are of the form "pick a random number and see if it exceeds the threshold you have been given". 

Examples include Access Class Barring, and Access Persistence Mechanism.  It would be possible to do something similar at the SIP level, though it would probably be redundant. 

My suggested rewording would be: 

"Another challenge for SIP overload control is controlling the rate of the true traffic source.  Overload is often caused by a large number of UAs each of which creates only a single message.  However, the sum of their traffic can overload a SIP server. The overload mechanisms suitable for controlling a SIP server (e.g., rate control) may not be effective for individual UAs.  In some cases, there are other non-SIP mechanisms for limiting the load from the UAs.  These may operate independently from, or in conjunction with, the SIP overload mechanisms described here.  In either case, they are out of scope for this document." 

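For readers unfamiliar with these mechanisms, here is a minimal Python sketch of the "pick a random number and compare it to a threshold" idea, applied hypothetically at the SIP level. It is not the 3GPP Access Class Barring procedure, which has additional parameters such as barring timers. The function names and the admit_probability parameter are assumptions for illustration.

```python
import random


def ua_may_send(admit_probability: float, rng: random.Random) -> bool:
    """Hypothetical SIP-level analogue of access barring: a UA draws a uniform
    random number in [0, 1) and sends its single request only if the draw is
    below the admission probability it was given."""
    if not 0.0 <= admit_probability <= 1.0:
        raise ValueError("admit_probability must be in [0, 1]")
    return rng.random() < admit_probability


def simulate(num_uas: int, admit_probability: float, seed: int = 1) -> int:
    """Count how many of num_uas single-message UAs send in one round."""
    rng = random.Random(seed)
    return sum(ua_may_send(admit_probability, rng) for _ in range(num_uas))


if __name__ == "__main__":
    # Roughly 20% of 100,000 UAs should send when the threshold is 0.2.
    print(simulate(100_000, 0.2))
```

In this scheme the server never has to identify or rate-limit individual UAs. It only distributes one threshold, and the aggregate load from a population of single-message UAs then scales with it.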
Section 4 

Your model is built on the premise of a "sending entity" and a "receiving entity".  In the real world, not only is Server A sending SIP messages to Server B, but Server B is also sending SIP messages to Server A. 

I don't think you should clutter up your model by trying to address both directions at once, but you should state somewhere in the text that you have made that simplification/abstraction for ease of comprehension, and that any mechanism must work in the context of "SIP messages going both ways". 

My suggestion would be to add another sentence after 
"The model in Figure 1 shows a scenario with one sending and one 
   receiving entity.  In a more realistic scenario a receiving entity 
   will receive traffic from multiple sending entities and vice versa 
   (see Section 6)." 

My suggestion would be: 
"In addition, in a more realistic scenario, SIP messages will be going in both directions, from B to A as well as A to B.  However, the overload control mechanisms in each direction can be considered independently." 

Then, in section 5.1, change 
"Each control loop between two servers is 
   completely independent of the control loop between other servers 
   further up- or downstream."   
To 
"Each control loop between two servers is 
   completely independent of the control loop between other servers 
   further up- or downstream, and of the control loop between the two servers in the other direction." 
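Here is a minimal Python sketch of what this independence means in practice. It assumes, for illustration only, that each loop's state reduces to a single hypothetical loss_percent value. State is keyed by the ordered (sender, receiver) pair, so feedback about A-to-B traffic never alters the B-to-A loop. The class and field names are illustrative and do not come from the draft.

```python
from dataclasses import dataclass, field


@dataclass
class ControlLoopState:
    """Per-direction state; loss_percent is an illustrative placeholder."""
    loss_percent: float = 0.0


@dataclass
class ControlLoops:
    """One control loop per ordered (sender, receiver) pair, so the A->B loop
    and the B->A loop never share state."""
    loops: dict[tuple[str, str], ControlLoopState] = field(default_factory=dict)

    def loop(self, sender: str, receiver: str) -> ControlLoopState:
        # Create the loop for this direction lazily on first use.
        return self.loops.setdefault((sender, receiver), ControlLoopState())

    def on_feedback(self, sender: str, receiver: str, loss_percent: float) -> None:
        # Feedback from `receiver` only adjusts traffic flowing sender -> receiver.
        self.loop(sender, receiver).loss_percent = loss_percent
```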

Section 8, 
second paragraph 
After "An 
   overload control mechanism should ensure that the delay encountered 
   by a SIP message is not increased significantly during periods of 
   overload." 
Add 
"Significantly increased delay can lead to time-outs, and retransmission of SIP messages, making the overload worse." 

"Reactiveness" doesn't seem the right word to me.  "Responsiveness" sounds better to me. 

End of section 8 
Another important metric is the CPU load consumed by the overload "monitor" and "actuator" themselves. 

End of section 9 
Suggest changing 
"Explicit overload control 
   mechanisms can be differentiated based on the type of information 
   conveyed in the overload control feedback and whether the control 
   function is in the receiving or sending entity (receiver- vs. sender- 
   based overload control)." 
To 
"Explicit overload control 
   mechanisms can be differentiated based on the type of information 
   conveyed in the overload control feedback and whether the control 
   function is in the receiving or sending entity (receiver- vs. sender- 
   based overload control), or both." 

In 9.2, I think 
"A loss percentage enables a SIP server to ask an upstream neighbor to 
   reduce the number of requests it would normally forward to this 
   server by a percentage X. For example, a SIP server can ask an 
   upstream neighbor to reduce the number of requests this neighbor 
   would normally send by 10%.  The upstream neighbor then redirects or 
   rejects X percent of the traffic that is destined for this server." 
Should be 
"A loss percentage enables a SIP server to ask an upstream neighbor to 
   reduce the number of requests it would normally forward to this 
   server by X%. For example, a SIP server can ask an 
   upstream neighbor to reduce the number of requests this neighbor 
   would normally send by 10%.  The upstream neighbor then redirects or 
   rejects 10% of the traffic that is destined for this server." 
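As an illustration of the corrected wording (X% diverted, so 10% when X is 10), here is a minimal Python sketch of an upstream neighbor applying a loss percentage. It uses an integer accumulator rather than random drops, so exactly X of every 100 requests are rejected or redirected. That choice and the PercentageThrottle name are assumptions; the draft does not specify how the percentage is applied.

```python
class PercentageThrottle:
    """Sketch of loss-percentage throttling at the upstream neighbor: of the
    requests destined for the overloaded server, divert (reject or redirect)
    loss_percent of them and forward the rest, evenly spread."""

    def __init__(self, loss_percent: int) -> None:
        self.set_loss_percent(loss_percent)
        self._acc = 0  # integer accumulator avoids floating-point drift

    def set_loss_percent(self, loss_percent: int) -> None:
        # Called whenever the downstream server sends new feedback.
        if not 0 <= loss_percent <= 100:
            raise ValueError("loss_percent must be between 0 and 100")
        self.loss_percent = loss_percent

    def admit(self) -> bool:
        """Return True to forward the request, False to reject/redirect it."""
        self._acc += self.loss_percent
        if self._acc >= 100:
            self._acc -= 100
            return False
        return True
```

Because the throttle diverts a fraction rather than capping a rate, the forwarded volume still scales with the offered load. This is why, as the draft notes, the percentage has to be re-adjusted as the traffic received changes.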

End of 9.2 
WRT: 
"Thus, percentage throttling requires an adjustment of the throttling 
   percentage in response to the traffic received and may not always be 
   able to prevent a server from encountering brief periods of overload 
   in extreme cases." 
This is not unique to percentage throttling.  Brief periods of overload can occur with rate-based and window-based methods as well.  In all cases, this depends heavily on how frequently the control mechanism sends updates, and that frequency needs to be balanced against the load generated by the control mechanism itself.  I am not sure whether it makes sense to say something about this under each method, or to put it up front as a general comment. 

Sec 9.4 
Here again, remember that there are many other things that can generate 503, with or without Retry-After. 

Sec 11 
Last paragraph add: 
"Conversely, the semantics of any proposed approach should permit a variety of different algorithms." 




Nits/wordsmithing 
Note at end of section 6, change "different than" to "different from". 

Section 12 first para 
Change 
"Overload control can require a SIP server to prioritize requests and 
   select requests that need to be rejected or redirected." 
To 
"Overload control can require a SIP server to prioritize requests and 
   select requests to be rejected or redirected." 

Sec 12 
Third para 
Change 
"Responses should not be targeted when a SIP server is trying to 
   reduce load for a number of reasons." 
To 
"For a number of reasons, SIP responses should not be dropped in order to 
   reduce SIP processing load." 


Janet