[bmwg] RAI-ART review of draft-ietf-bmwg-sip-bench-term-04 and draft-ietf-bmwg-sip-bench-meth-04
"Worley, Dale R (Dale)" <dworley@avaya.com> Fri, 27 April 2012 22:01 UTC
From: "Worley, Dale R (Dale)" <dworley@avaya.com>
To: "rai@ietf.org" <rai@ietf.org>, Al Morton <acmorton@att.com>, "draft-ietf-bmwg-sip-bench-term@tools.ietf.org" <draft-ietf-bmwg-sip-bench-term@tools.ietf.org>, "bmwg@ietf.org" <bmwg@ietf.org>
Date: Fri, 27 Apr 2012 17:56:42 -0400
Cc: Mary Barnes <mary.ietf.barnes@gmail.com>
Subject: [bmwg] RAI-ART review of draft-ietf-bmwg-sip-bench-term-04 and draft-ietf-bmwg-sip-bench-meth-04
I am the assigned RAI-ART reviewer for this draft.  For background on
RAI-ART, please see the FAQ at
<http://wiki.tools.ietf.org/area/rai/trac/wiki/RaiArtfaq>.  Please
resolve these comments along with any other comments you may receive.

This draft is on the right track but has open issues, described in the
review.

Please note that I am quite familiar with SIP but have little
familiarity with benchmarking.  Thus, some of the points I make might
have implicit answers in the usual practice of benchmarking.  But
since one audience of these documents is the SIP community, such
implicit answers should probably be made explicit.

I. Technical issues

A. Media

The drafts seem to be undecided as to whether they want to benchmark
media processing or not.  At one point, they say that the benchmarking
is entirely in the signaling plane.  At another point, they specify
how to set the number of media streams per INVITE dialog.  I suspect
the conflict is that there are devices the WG wants to benchmark which
do handle the media, and whose media handling affects performance
significantly, but the WG is unsure how to parametrize media
benchmarking.

The simplest situation, of course, is if the DUT is a proxy and the
media bypasses it entirely.  Beyond that, even in the simplest media
relaying situations, significant processing costs can arise.  Even if
the DUT isn't transcoding, it may have to rewrite the RTP codec
numbers (as it may have modified the SDP passing through it).  In more
complex situations, the DUT may have to transcode the codecs.

Some consideration has to be given to how to specify exactly what the
media loads are.  SDP already has ways to specify the exact encoding
of media streams, including the number of samples per packet, which
implies the packet size and packet rate.  These drafts only allow
specifying the packet size as a benchmark parameter, but packet size
is notoriously ambiguous -- which layers of encapsulation of the
encoded media are counted?
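To make the ambiguity concrete, here is a minimal sketch (my own
illustration, assuming G.711 at 20 ms packetization; none of these
numbers come from the drafts) of how the "same" media packet has a
different size at each encapsulation layer:

```python
# Illustration only: "packet size" for one G.711 (PCMU) RTP packet,
# counted at successive encapsulation layers.  G.711 is 8000 one-byte
# samples per second; 20 ms packetization => 160 payload bytes.
PTIME_S = 0.020
SAMPLE_RATE = 8000          # samples/second
BYTES_PER_SAMPLE = 1        # G.711 is 8-bit companded PCM

payload = int(SAMPLE_RATE * PTIME_S * BYTES_PER_SAMPLE)   # 160 bytes

layers = {}
layers["RTP payload"] = payload
layers["+ RTP header (12)"] = layers["RTP payload"] + 12
layers["+ UDP header (8)"] = layers["+ RTP header (12)"] + 8
layers["+ IPv4 header (20)"] = layers["+ UDP header (8)"] + 20
layers["+ Ethernet (14 + 4 FCS)"] = layers["+ IPv4 header (20)"] + 18

for name, size in layers.items():
    print(f"{name:28s} {size:4d} bytes")
```

Depending on which layer the tester counts, the "packet size" of the
identical stream is anywhere from 160 to 218 bytes -- which is why
specifying the encoding (codec plus packetization time) is less
ambiguous than specifying a size.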
But specifying the detailed encoding side-steps that question.

The draft wants to consider the duration of a media stream to be
separately adjustable from the duration of the containing dialog, but
the draft explicitly places out of scope the re-INVITE which is
necessary to accomplish that realistically (that is, with the
signaling matching the presented media packets).

B. INVITE dialog structure

The drafts seem to want to consider the establishment (or failure
thereof) of an INVITE dialog to be instantaneous, after which the
dialog continues for a chosen length of time, and then vanishes
instantly.  Little or no consideration is given to the various
scenarios of call establishment, including the most common case:
INVITE sent -- 183 response -- significant delay -- 200 response.
Dialog teardown is not conceptualized as being a processing step that
involves significant cost and may fail: "Session disconnect is not
considered in the scope of this work item."

This lack of consideration is compounded in the forking cases, as the
variety of scenarios (and their durations) increases.  In addition,
the drafts only consider forking which is done within the DUT, whereas
it will be common in practice for forking to be done downstream of the
DUT, presenting the DUT with a stream of 1xx responses from multiple
endpoints, with a 2xx after an extended delay.

Also, in regard to signaling benchmarking, INVITEs that ultimately
fail are likely to be as costly as INVITEs that succeed, but there
doesn't seem to be a defined parameter "fraction of attempted calls
which succeed" (which controls the callee EAs).

C. Loop detection

All discussion of loop detection needs to be based on the revised loop
detection requirements in RFC 5393.

D. Authentication

In some SIP operations, authentication is commonly done.  This can
have various effects on the message flows that need to be taken into
account in the benchmarks.  For instance, a registrar may require that
the registering UA authenticate itself.
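A sketch of the accounting problem this challenge-response pattern
creates (my own illustration; the counts and the "corrected" rule are
assumptions for demonstration, not text from the drafts):

```python
# Illustration only: naive vs. corrected "attempt failure" accounting
# when every REGISTER is first challenged with 401 and then retried
# with credentials.  Each registration attempt generates two requests:
# (REGISTER -> 401), then (REGISTER with credentials -> 200).
attempts = 100
transactions = []
for _ in range(attempts):
    transactions.append(("REGISTER", 401))   # challenged, no credentials
    transactions.append(("REGISTER", 200))   # retried with credentials

# Naive rule: every non-2xx final response is an attempt failure.
naive_failures = sum(1 for _, status in transactions if status >= 300)
naive_rate = naive_failures / len(transactions)

# Corrected rule (assumed): a 401 challenge that is followed by a
# successful authenticated retry is part of the same attempt, not a
# failure in its own right.
corrected_failures = 0
for i, (_, status) in enumerate(transactions):
    if status >= 300:
        retried_ok = (i + 1 < len(transactions)
                      and transactions[i + 1][1] < 300)
        if not (status == 401 and retried_ok):
            corrected_failures += 1
corrected_rate = corrected_failures / attempts

print(f"naive failure rate:     {naive_rate:.0%}")
print(f"corrected failure rate: {corrected_rate:.0%}")
```

Under the naive rule the registrar appears to fail half the time even
though it is behaving correctly; the corrected rule is one possible
way to phrase the needed fix to the "attempt failure" definition.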
Commonly, the UA sends a REGISTER request, which is rejected with 401
because it contains a nonce that is too old.  The UA then immediately
sends another REGISTER with the nonce provided in the 401 response,
and that request receives a 200 response.  In this scenario, the
number of effective REGISTER requests is half of the total REGISTER
requests, leading to an apparent attempt failure rate of 50%, even
though the middlebox is doing the Right Thing 100% of the time.  This
suggests that the definition of "attempt failure" needs to be updated
so that a 4xx response "passed upstream" by the DUT is not counted as
an attempt failure.

In other scenarios, the DUT itself might be expected to enforce SIP
authentication, which would require a somewhat different definition of
attempt failure, and would be expected to have lower throughput.  So
some thought needs to be given to whether these scenarios are to be
benchmarked, and to documenting how authentication is to be handled in
whatever benchmarks are defined.

II. Editorial issues

The drafts appear to me to be, well, drafts.  That is, they generally
contain the intended technical content, but the exposition is not
complete or clear, and at points the reader has to guess at the exact
meaning.  At various points the sentences are not correct English,
references are not complete or correct, and there are various points
of contradiction between different parts of the documents.  The
documents need a proper final revision, with each paragraph gone over
carefully to maximize its clarity and to verify that all parts of the
documents are consistent with each other.

Examples of editorial problems are:

draft-ietf-bmwg-sip-bench-term-04

section 1

   The term Throughput is defined in RFC2544 [RFC2544].

The definition is in RFC 1242; RFC 2544 refers to RFC 1242.

   This document uses existing terminology defined in other BMWG
   work.  Examples include, but are not limited to:

   Device under test (DUT) (c.f., Section 3.1.1 RFC 2285 [RFC2285]).
   System under test (SUT) (c.f., Section 3.1.2, RFC 2285 [RFC2285]).

In what way would the reader determine the relevant "other BMWG work"?
This reference needs to be made definite in some way.

   The behavior of a stateful proxy is further defined in Section 16.

This sentence was copied directly from RFC 3261, in whose section 16
the referenced definition exists.  However, *this* document has no
section 16.

section 2.2

The figures are inconsistent in how they label the "Tester" boxes as
"EA" or not.  Is this difference meaningful?

Figures 9 and 10 show "SUT" embracing the entire test setup, including
the "Tester" boxes, whereas RFC 2285 section 3.1.2 says that the SUT
does not include the tester components of the setup.

section 3.1.1

The various components of a "session" (usually "dialog" in SIP
terminology) are given quasi-mathematical names that look peculiar to
me.  Worse, they're not completely correct: since each RTP stream has
a corresponding RTCP stream, "session[x].medc" should be
"session[x].medc[y]".  (Is there any benefit to introducing this
symbolism?)

section 3.1.5

This section defines "overload" and then says "The distinction between
an overload condition and other failure scenarios is outside the scope
of this document which is blackbox testing."  If the distinction is
outside the scope, why is there a definition here?

section 3.1.6

The definition of "Session Attempt" seems to be incorrect.  Of course,
a session attempt is each sending of an INVITE/SUBSCRIBE/MESSAGE,
whether or not it is ultimately successful.  But the definition as
written makes "session attempt" a time-varying property, which is true
only until a response is received by the EA.

section 3.1.8

   An IS is identified by the Call-ID, To-tag, and From-tag of the
   SIP message that establishes the session.

As written, this is incorrect, as the to-tag is present only in the
response(s) to the INVITE.

   2.  If a media session is described in the SDP body of the
       signaling message, then the media session is established by
       the end of Establishment Threshold Time (c.f. Section 3.3.3).

This sentence is correct as far as it goes, but there is no clear
description (that I can tell) of when a media session is
"established".  Indeed, it's not clear to me what a proper way to set
up the media is -- in real SIP systems, a proxy or SBC can start
receiving early media from a callee before it has received *any* SIP
responses from the callee, and the middlebox can have some difficulty
matching the RTP to a dialog being set up.  So the exact details of
when to start the media flows are needed to make the benchmarking
process reproducible.

section 3.2.3

This section defines "SIP-Aware Stateful Firewall" as

   Device in test topology that provides Denial-of-Service (DoS)
   Protection to the Signaling and Media Planes for the EAs and
   Signaling Server

But a device can be a SIP-Aware Stateful Firewall (in the ordinary
sense of the words) without providing DoS protection.

section 3.3.2

"IS Media Attempt Rate" is defined as

   Configuration on the EA for number of ISs with Associated Media
   to be established at the DUT per continuous one-second time
   intervals.

However, "established" should be "attempted".

section 3.3.3

   Configuration of the EA for representing the amount of time that
   an EA will wait before declaring a Session Attempt Failure.

The discussion makes clear that there may be different Establishment
Threshold Times for IS and NS, but these times are written about in
most places as if they were the same.

draft-ietf-bmwg-sip-bench-meth-04

section 2

Refers to "Presence Rate", but draft-ietf-bmwg-sip-bench-term-04 says
that presence is out of scope.

section 4

What is the relationship between the data items described in this
draft's "Benchmarking Considerations" and
draft-ietf-bmwg-sip-bench-term-04's "Test Setup Parameters"?
section 4.2

I'm having trouble understanding the various terms that include the
word "server", including "Signaling Server", "The Server", and "SIP
Server".  Are these intended to have the same meaning?  Are they
intended to cover the whole range of SIP "middleboxes" (including,
e.g., proxies)?  In section 2 is

   The DUT is a SIP Server, which may be any [RFC3261] conforming
   device.  The SUT can be any device or group of devices containing
   RFC 3261 conforming functionality along with Firewall and/or NAT
   functionality.

If read literally, that would *exclude* pure proxies, since they don't
have firewall or NAT functionality.

section 4.3

References "Figures 4 and 5", which don't exist.  Does it mean to
reference draft-ietf-bmwg-sip-bench-term-04?

section 4.9

The formatting of the pseudocode is inconsistent.  The variable "c"
should be boolean rather than 0/1.

section 6.6

   1.  If the DUT is being benchmarked as a proxy or B2BUA, and
       forking is supported in the DUT, then configure the DUT in the
       test topology shown in Figure 5 in [I-D.sip-bench-term].  If
       the DUT does not support forking, then this step can be
       skipped.

   2.  Configure a SUT according to the test topology shown in
       Figure 8 of [I-D.sip-bench-term].

It's not clear to me how one can configure the DUT/SUT according to
both figures 5 and 8.  And neither figure shows two callee EAs.  The
text suggests that DUTs that do not support forking can be tested,
even though this is a test specifically of performance when the DUT is
doing forking.

Dale