Re: [bmwg] RAI-ART review of draft-ietf-bmwg-sip-bench-term-04 and draft-ietf-bmwg-sip-bench-meth-04
Carol Davids <davids@iit.edu> Mon, 16 July 2012 19:09 UTC
Date: Mon, 16 Jul 2012 14:09:54 -0500
From: Carol Davids <davids@iit.edu>
To: "Worley, Dale R (Dale)" <dworley@avaya.com>, "rai@ietf.org" <rai@ietf.org>, Al Morton <acmorton@att.com>, "draft-ietf-bmwg-sip-bench-term@tools.ietf.org" <draft-ietf-bmwg-sip-bench-term@tools.ietf.org>, "bmwg@ietf.org" <bmwg@ietf.org>
Message-id: <CC29C656.21417%davids@iit.edu>
In-reply-to: <CD5674C3CD99574EBA7432465FC13C1B22726A0A88@DC-US1MBEX4.global.avaya.com>
Cc: Mary Barnes <mary.ietf.barnes@gmail.com>
Subject: Re: [bmwg] RAI-ART review of draft-ietf-bmwg-sip-bench-term-04 and draft-ietf-bmwg-sip-bench-meth-04
Dale,

Thanks very much for your careful review of the drafts. Please see below for our responses and descriptions of what we have done to address the issues. We highlight your comments under RAI-ART REVIEW COMMENT: and indicate our responses under RESPONSE: CD/VG:. We also appreciate your pointing out the need for a final editorial review and have begun that work using the detail you provided. We plan to post the edited version in time for IETF 85 in November.

Best,
Carol

Carol Davids
Email: davids@iit.edu
Skype: caroldavids1

..................................................................

RAI-ART REVIEW COMMENT:

I. Technical issues

A. Media

The drafts seem to be undecided as to whether they want to benchmark media processing or not. At one point, it says that the benchmarking is entirely in the signaling plane. At another point, it is specified how to set the number of media streams per INVITE dialog. I suspect the conflict is that there are devices the WG wants to benchmark which do handle the media, and whose media handling affects performance significantly, but the WG is unsure how to parametrize media benchmarking.

RESPONSE: CD/VG:

The media parameters need to be specified because, if the DUT/SUT processes media, the media processing will impact the SIP performance, using cycles that could otherwise have been used for SIP-related processing. We do not measure the performance of the audio and/or video, but we need to document the conditions under which the test was conducted if we are to be able to compare test results. We are measuring the signaling-plane throughput, but this throughput will vary depending upon the conditions of test. One important condition of test is media processing: is the DUT processing media streams, and if so, how many and what parameters define them? We note the presence and character of the media streams being processed by the device without measuring the quality of the resulting media.
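The response above treats the media parameters (codec, packet size, number of streams) as conditions of test that determine how much work a media-relaying DUT does per session. As a rough sketch of the packet load those conditions imply, assuming RTP with a fixed packetization time (all function and field names below are illustrative, not taken from the drafts):

```python
# Illustrative sketch (names are not from the drafts): recording the media
# conditions of test and estimating the RTP packet rate they place on a DUT
# that relays media, assuming a fixed packetization time per stream.

def media_load(codec: str, ptime_ms: int, streams: int, sessions: int) -> dict:
    """Return the per-direction RTP packet rate implied by the media profile."""
    pps_per_stream = 1000 / ptime_ms  # e.g. 20 ms ptime -> 50 packets/second
    return {
        "codec": codec,
        "ptime_ms": ptime_ms,
        "streams_per_session": streams,
        "concurrent_sessions": sessions,
        "total_packets_per_second": pps_per_stream * streams * sessions,
    }

# 100 concurrent sessions, 2 streams each, G.711 at 20 ms ptime:
profile = media_load(codec="G.711", ptime_ms=20, streams=2, sessions=100)
print(profile["total_packets_per_second"])  # 50 pps * 2 * 100 = 10000.0
```

Recording the profile alongside the SIP throughput result makes runs comparable, which is the stated goal: the media is a documented condition of test, not a measured quantity.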
We define the media according to the packet size, type of codec, and number of streams. These parameters describe the conditions under which the test is performed, but we do not measure the quality or other characteristics of the resulting media. We expect that the more work we ask the device under test to do, the lower its SIP throughput will be.

RAI-ART REVIEW COMMENT:

The simplest situation, of course, is if the DUT is a proxy and the media bypasses it entirely. Beyond that, even in the simplest media-relaying situations, significant processing costs can arise. Even if the DUT isn't transcoding, it may have to rewrite the RTP codec numbers (as it may have modified the SDP passing through it). In more complex situations, the DUT may have to transcode the codecs. Some consideration has to be given to how to specify exactly what the media loads are. SDP already has ways to specify the exact encoding of media streams, including the number of samples per packet, which implies the packet size and packet rate. These drafts only allow specifying the packet size as a benchmark parameter, but packet size is notoriously ambiguous -- which layers of encapsulation of the encoded media are counted? Specifying the detailed encoding side-steps that question.

RESPONSE: CD/VG:

We will add a note that the testing organization may choose to specify more characteristics of the associated media and may keep records of comparative results. Our goal in this draft is to describe a method for measuring the SIP throughput of the device under test. Organizations are encouraged to identify boundary conditions that they deem important and to perform tests under as many variants of these boundary conditions as they wish.
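The reviewer's point that "packet size" is ambiguous can be made concrete: the same G.711 frame has a different size at every layer of encapsulation. The sketch below assumes RTP over UDP over IPv4 over Ethernet, with no header options or VLAN tag; it is an illustration of the ambiguity, not a parameter set from the drafts.

```python
# Why "packet size" is ambiguous: one G.711 frame at a 20 ms packetization
# time, sized at each encapsulation layer (minimal headers assumed).

G711_PAYLOAD = 160    # bytes of audio per 20 ms at 64 kbit/s
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20      # no IP options
ETHERNET_HEADER = 14  # excluding preamble and FCS

sizes = {}
sizes["codec payload"] = G711_PAYLOAD
sizes["RTP packet"] = sizes["codec payload"] + RTP_HEADER
sizes["UDP datagram"] = sizes["RTP packet"] + UDP_HEADER
sizes["IPv4 packet"] = sizes["UDP datagram"] + IPV4_HEADER
sizes["Ethernet frame"] = sizes["IPv4 packet"] + ETHERNET_HEADER

for layer, size in sizes.items():
    print(f"{layer}: {size} bytes")  # 160, 172, 180, 200, 214
```

A test report that says only "packet size = 200 bytes" could mean any of these layers; specifying the codec and packetization time, as SDP does, removes the ambiguity.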
RAI-ART REVIEW COMMENT:

The draft wants to consider the duration of a media stream to be separately adjustable from the duration of the containing dialog, but the draft explicitly places out of scope the re-INVITE which is necessary to accomplish that realistically (that is, with the signaling matching the presented media packets).

RESPONSE: CD/VG:

The parameters of test are recorded in the test setup report. The session attempt rate and the total number of sessions to be attempted are identified in this report; these two numbers determine the total length of the test. The duration of a session is also identified in this report, but the name we assigned that parameter is "Media session hold time." We will change the name of the parameter to "Session hold time." When the INVITE-initiated session includes media, the session hold time represents the duration of the media session. Whether or not we include media, the session ends when the BYE message is received by the emulated agent. We do not include the duration of a stream among the parameters of test. We do allow multiple media streams per session, but the session is ended by a BYE.

Re-INVITEs are a different question. They do not affect the duration of the session, but it is true that they consume processing cycles. We will discuss this further and appreciate your identifying the issue. We will also add a test parameter that describes the time between successive call attempts by the emulated agent. We recommend setting this parameter to 0, since setting it to a higher value will make the testing-to-failure take longer.

RAI-ART REVIEW COMMENT:

B. INVITE dialog structure

The drafts seem to want to consider the establishment (or failure thereof) of an INVITE dialog to be instantaneous, after which the dialog continues for a chosen length of time and then vanishes instantly.
Little or no consideration is given to the various scenarios of call establishment, including the most common case: INVITE sent -- 183 response -- significant delay -- 200 response. Dialog teardown is not conceptualized as a processing step that involves significant cost and may fail: "Session disconnect is not considered in the scope of this work item." This lack of consideration is compounded in the forking cases, as the variety of scenarios (and their durations) increases. In addition, the drafts only consider forking which is done within the DUT, whereas it will be common in practice for forking to be done downstream of the DUT, presenting the DUT with a stream of 1xx responses from multiple endpoints, with a 2xx after an extended delay.

Also, in regard to signaling benchmarking, INVITEs that ultimately fail are likely to be as costly as INVITEs that succeed, but there doesn't seem to be a defined parameter "fraction of attempted calls which succeed" (which controls the callee EAs).

RESPONSE: CD/VG:

RE DELAY: The delay in responding with a 200 OK after getting an INVITE is not specified in the current methodology document and is assumed to be instantaneous. Our intent is to stress the DUT as quickly as possible. Introducing a delay serves to increase the time before the DUT produces its first stress-induced error. A high interval (> 32 s) may cause the DUT to enter a stable state and not be subject to stress. For this reason, we intentionally chose not to introduce delays before issuing a 200 OK. Note that many user agents automatically introduce delays by first sending a 180 Ringing, etc. Any additional artificial delays, while easy to introduce, would be an additional tuning parameter that is subject to differing interpretations.

RE FORKING: Regarding the forking being done in the DUT -- this was a design choice, since we wanted to model the complexity and delay caused by forking N branches downstream and collating the responses, etc., at the DUT.
In other words, our assumption is that the DUT is a proxy (or a B2BUA) that is doing forking and response collating. The case in which the tester arranges forking to happen downstream of the DUT is automatically captured when the device acting as a DUT is a user agent client and the next downstream SIP entity is forking and presenting responses to the DUT.

RE THE COST OF FAILURES: Regarding the contention that "INVITEs that ultimately fail are likely to be as costly as INVITEs that succeed, but there doesn't seem to be a defined parameter 'fraction of attempted calls which succeed' (which controls the callee EAs)" -- it seems to us that the Session Establishment Performance benchmark (please see Section 5.2) covers this.

RAI-ART REVIEW COMMENT:

C. Loop detection

All discussion of loop detection needs to be based on the revised loop detection requirements in RFC 5393.

RESPONSE: CD/VG:

We will update the methodology document to ensure that loop detection is based on the revised loop detection requirements in RFC 5393. This is a good catch!

RAI-ART REVIEW COMMENT:

D. Authentication

In some SIP operations, authentication is commonly done. This can have various effects on the message flows that need to be taken into account in the benchmarks. For instance, a registrar may require that the registering UA authenticate itself. Commonly, the UA sends a REGISTER request, which is rejected with 401 because it contains a nonce that is too old. The UA then immediately sends another REGISTER with the nonce provided in the 401 response, and that request receives a 200 response. In this scenario, the number of effective REGISTER requests is half of the total REGISTER requests, leading to an apparent attempt failure rate of 50%, even though the middlebox is doing the Right Thing 100% of the time. This suggests that the definition of "attempt failure" needs to be updated so that a 4xx response "passed upstream" by the DUT is not counted as an attempt failure.
In other scenarios, the DUT itself might be expected to enforce SIP authentication, which would require a somewhat different definition of attempt failure and would be expected to have lower throughput. So some thought needs to be given as to whether these scenarios are to be benchmarked, and to documenting how authentication is to be handled in whatever benchmarks are defined.

RESPONSE: CD/VG:

We agree with your analysis, and in fact the terminology document already recognizes this through its scope (please see Section 2.1 of the terminology document). It says:

   o  REGISTER and INVITE requests may be challenged or remain
      unchallenged for authentication purposes, as this may impact
      the performance benchmarks. Any observable performance
      degradation due to authentication is of interest to the SIP
      community.

Whether or not the REGISTER and INVITE requests are challenged is a condition of test and will be recorded and reported. However, in the methodology document we do not include any further guidance for this. As you point out, there is a need for some guidance. To remedy this, we propose to add the following in Section 5.1 of the methodology document:

   Authentication option = ___________ (on|off; if on, please see
   Note-2 below)

   Number of responses of the following type:
      401:       _____________ (if authentication turned on; N/A otherwise)
      407:       _____________ (if authentication turned on; N/A otherwise)
      2xx-class: _____________
      1xx-class: _____________
      Others:    _____________

This information will enable the tester to analyze how many 401/407 responses were received and to adjust the metrics in Sections 5.2 and 5.3 accordingly.
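As a sketch of the adjustment this reporting enables, the following hypothetical calculation (function and field names are illustrative, not from the drafts) excludes 401/407 challenges from the attempt failure rate when authentication is on, matching the registrar scenario in the review where every REGISTER is challenged once and then accepted:

```python
# Hypothetical sketch: adjusting the apparent attempt failure rate using the
# proposed per-response-class counts.  When authentication is on, a 401/407
# challenge triggers a retried request, so challenges are not failures.

def attempt_failure_rate(responses: dict, authentication_on: bool) -> float:
    """Fraction of final attempts that failed, excluding auth challenges."""
    total = sum(responses.values()) - responses.get("1xx", 0)  # 1xx provisional
    successes = responses.get("2xx", 0)
    challenges = responses.get("401", 0) + responses.get("407", 0)
    if authentication_on:
        total -= challenges  # each challenge corresponds to a retried attempt
    return (total - successes) / total if total else 0.0

# 1000 REGISTERs, each challenged once with a 401 and then accepted:
counts = {"401": 1000, "407": 0, "2xx": 1000, "1xx": 0, "other": 0}
print(attempt_failure_rate(counts, authentication_on=False))  # 0.5 (misleading)
print(attempt_failure_rate(counts, authentication_on=True))   # 0.0
```

The naive rate reports the 50% apparent failure the reviewer describes; subtracting the recorded 401/407 counts recovers the fact that the DUT handled every attempt correctly.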