[bmwg] Last Call: <draft-ietf-bmwg-sip-bench-term-08.txt> (Terminology for Benchmarking Session Initiation Protocol (SIP) Networking Devices) to Informational RFC
Carol Davids <davids@iit.edu> Tue, 04 March 2014 05:01 UTC
Date: Mon, 03 Mar 2014 23:01:00 -0600
From: Carol Davids <davids@iit.edu>
To: bmwg@ietf.org, Robert Sparks <rjsparks@nostrum.com>
Message-ID: <CF3AB9C0.588A3%davids@iit.edu>
Robert, All,

Below are replies to the comments provided by Robert Sparks on January 24, 2014, both the general comments and those specific to the Terminology document. We will reply to the comments related to the Methodology document shortly. These comments were very helpful to us as we wrote version 09 of the documents.

Robert's comments are identified by double asterisks (**). Our responses are identified by triple asterisks (***). The general comments are identified by us as G: Item 1, G: Item 2, etc. The ones specific to a particular section are identified by the section to which they relate, preceded by a T in the case of comments related to the Terminology document and by an M in the case of comments related to the Methodology document.

Regards,
Carol Davids
Vijay Gurbani
Scott Poretsky

Reviews of draft-ietf-bmwg-sip-bench-term-08 and draft-ietf-bmwg-sip-bench-meth-08

**G: Summary: These drafts are not ready for publication as RFCs.

***Response, G: Summary: We have edited the documents in light of the comments provided by Robert and other reviewers, and also in light of experience running the tests in a lab environment with the collaboration of a vendor of a commercial product. We changed the titles of the documents to reflect their scope more accurately; we reduced the number of benchmarks and the number of tests. We reduced the number of distinct test architectures to two and moved the illustrations of the two architectures to the Methodology document for ease of use. Details on these and other changes are inline below.

**G: Item 1: First, some of the text in these documents shows signs of being old, and the working group may have been staring at them so long that they've become hard to see. The terminology document says "The issue of overload in SIP networks is currently a topic of discussion in the SIPPING WG." (SIPPING was closed in 2009.)
The methodology document suggests a "flooding" rate that is orders of magnitude below what simple devices achieve at the moment. That these survived working group last call indicates a different type of WG review may be needed to groom other bugs out of the documents.

***Response, G: Item 1: We removed the comments and tests related to flooding from the documents.

**G: Item 2: Who is asking for these benchmarks, and are they (still) participating in the group? The measurements defined here are very simplistic and will provide limited insight into the relative performance of two elements in a real deployment. The documents should be clear about their limitations, and it would be good to know that the community asking for these benchmarks is getting tools that will actually be useful to them. The crux of these two documents is in the last paragraph of the introduction to the methodology doc: "Finally, the overall value of these tests is to serve as a comparison function between multiple SIP implementations". The documents punt on providing any comparison guidance, but even if we assume someone can figure that out, do these benchmarks provide something actually useful for inputs?

***Response, G: Item 2: Yes, they are valuable to the community.

1. A major SBC vendor used these documents, and the paid services of two students, to perform the tests described therein and learn the values of the benchmarks, which were subsequently published for external release.

2. Regarding the measurements being simplistic: they were intentionally designed to be simplistic, because the goal of the BMWG is not to reproduce real-world traffic in the lab. To quote the BMWG charter: "To better distinguish the BMWG from other measurement initiatives in the IETF, the scope of the BMWG is limited to the characterization of implementations of various internetworking technologies using controlled stimuli in a laboratory environment."
Said differently, the BMWG does not attempt to produce benchmarks for live, operational networks.

3. Regarding the documents not providing any comparison guidance: again, that is intentional. The documents were designed such that testing two different implementations will result in two different reports that can then be compared by operations personnel. It is not the job of the document itself to provide comparison guidance. The metrics generated by the methods in these documents define a frontier beyond which "there be dragons."

4. In summary, we believe that these documents are useful, and they have been used by vendors in the community.

**G: Item 3: It would be good to explain how these documents relate to RFC6076.

***Response, G: Item 3: The authors have been in contact for several years and agreed that there is little overlap. RFC 6076 relates to the end-to-end performance of a service on a network. These drafts, on the other hand, refer to lab tests of a device.

**G: Item 4: The terminology tries to refine the definition of session, but the definition provided, "The combination of signaling and media messages and processes that support a SIP-based service", doesn't answer what's in one session vs another. Trying to generically define session has been hard and several working groups have struggled with it (see INSIPID for a current version of that conversation). This document doesn't _need_ a generic definition of session - it only needs to define the set of messages that it is measuring. It would be much clearer to say "for the purposes of this document, a session is the set of SIP messages associated with an INVITE-initiated dialog and any Associated Media, or a series of related SIP MESSAGE requests". (And looking at the benchmarks, you aren't leveraging related MESSAGE requests - they all appear to be completely independent.) Introducing the concepts of INVITE-initiated sessions and non-INVITE-initiated sessions doesn't actually help define the metrics.
When you get to the metrics, you can speak concretely in terms of a series of INVITEs, REGISTERs, and MESSAGEs. Doing that, and providing a short introduction for folks with PSTN backgrounds relating these to "Session Attempts", will be clearer. To be clear, I strongly suggest a fundamental restructuring of the document to describe the benchmarks in terms of dialogs and transactions, and remove the IS and NS concepts completely.

***Response, G: Item 4: Re-definition of a session: We believe that the 3D depiction of the session is useful. As we state in the document, the definition is for the purpose of this document only. The reason we created it was to be able to refer to all the different cases: the case of an INVITE-initiated session with media, in which all three of the components are non-null; the case of an INVITE-initiated session without media, in which the media and control components are null and only the Sig component is non-null; and the case of non-INVITE-initiated sessions, such as REGISTER and MESSAGE, in which, again, the only non-null component is the Sig component. We will, in the next revision of the document, refer to the diagram and its nomenclature in our descriptions of the metrics and the test cases. Each test case describes the set of SIP messages and the order in which they should be sent. For this reason we do not need to define a session as this or that set of SIP requests.

**G: Item 5: The INVITE-related tests assume no provisional responses, leaving out the effect on a device's memory when the state machines it is maintaining transition to the proceeding state. Further, by not including provisionals, and building the tests to search for Timer B firing, the tests ensure there will be multiple retransmissions of the INVITE (when using UDP) that the device being tested has to handle.
The traffic an element has to handle, and likely the memory it will consume, will be very different with even a single 100 Trying, which is the more usual case in deployed networks. The document should be clear _why_ it chose the test model it did and left out metrics that took having a provisional response into account. Similarly, you are leaving out the delayed-offer INVITE transactions used by 3pcc, and it should be more obvious that you are doing so. Likewise, the media-oriented tests take a very basic approach to simulating media. It should be explicitly stated that you are simulating the effects of a codec like G.711 and that you are assuming an element would only be forwarding packets and has to do no transcoding work. It's not clear from the documents whether the EA is generating actual media or dummy packets. If it's actual media, the test parameters that assume constant-sized packets at a constant rate will not work well for video (and I suspect endpoints, like B2BUAs, will terminate your call early if you send them garbage). The sections on a series of INVITEs are fairly clear that you mean each of them to have different dialog identifiers. I don't see any discussion of varying the To: URI. If you don't, what's going to keep a gateway or B2BUA from rejecting all but the first with something like Busy? Similarly, I'm not finding where you talk about how many AoRs you are registering against in the registration tests. I think, as written, someone could write this where all the REGISTERs affected only one AoR.

***Response, G: Item 5: Why not define all the metrics in terms of dialogs and transactions? These documents describe black-box testing. The evidence of the existence of the transactions is that the session was set up. In the case of a REGISTER request, for example, we see the 200 OK to the REGISTER and know there was a successful session.
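To make the black-box approach concrete, here is a hedged sketch of how an EA might search for the highest session attempt rate a DUT sustains with zero observed failures, which is the general shape of the benchmarking algorithm discussed below. The `run_trial` callback is a hypothetical stand-in for driving the EA at a given rate for the sustained period and reporting whether every attempt succeeded; the methodology document defines the actual search, which need not be the binary search shown here.

```python
# Hedged illustration, not the algorithm mandated by the methodology
# document: find the largest session attempt rate (sessions per second)
# for which a sustained trial completes with zero failed attempts.

def find_max_zero_failure_rate(run_trial, low=1, high=10000):
    """Return the largest integer rate in [low, high] for which
    run_trial(rate) reports zero failures, or None if even `low` fails.

    run_trial(rate) is a hypothetical callback: it drives the EA at
    `rate` sps for the sustained period T and returns True only when
    every session attempt succeeded (e.g. every REGISTER got a 200 OK).
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if run_trial(mid):      # zero failures at this rate: try higher
            best = mid
            low = mid + 1
        else:                   # at least one failure: back off
            high = mid - 1
    return best

# Example with a simulated DUT that handles at most 730 sps cleanly:
if __name__ == "__main__":
    print(find_max_zero_failure_rate(lambda r: r <= 730))  # prints 730
```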
**G: Item 6: Stress Testing: The methodology document calls Stress Testing out of scope, but the very nature of the benchmarking algorithm is a stress test. You are iteratively pushing to see at what point something fails, _exactly_ by finding the rate of attempted sessions per second that the thing under test would consider too high.

***Response, G: Item 6: These are benchmark tests, designed to find the highest rate at which the system can handle session attempts with no failures of the application itself. The tests stop at the point where a single application error is observed. Stress testing would continue to run, with an ever-increasing number of errors at the application layer, at ever-higher rates, until the platform upon which the application runs fails catastrophically, for example by rebooting, or by stopping operation entirely and failing to reboot.

- - - - - - - - - - - - - - - - - - - - - -

TERMINOLOGY: Now to specific issues in document order, starting with the terminology document (nits are separate and at the end):

**T (for Terminology document): The title and abstract are misleading - this is not general benchmarking for SIP performance. You have a narrow set of tests, gathering metrics on a small subset of the protocol machinery. Please (as RFC 6076 did) look for a title that matches the scope of the document. For instance, someone testing a SIP Events server would be ill-served with the benchmarks defined here.

***Response: T: The documents have been renamed as follows:

Methodology for Benchmarking Session Initiation Protocol (SIP) Devices: Basic session setup and registration
Terminology for Benchmarking Session Initiation Protocol (SIP) Devices: Basic session setup and registration

**T, section 1: RFC5393 should be a normative reference. You probably also need to pull in RFCs 4320 and 6026 in general - they affect the state machines you are measuring.

***Response, T, section 1: Agreed.
We have pulled in RFC 5393, RFC 4320, and RFC 6026.

**T, 3.1.1: As noted above, this definition of session is not useful. It doesn't provide any distinction between two different sessions. I strongly disagree that SIP reserves "session" to describe services analogous to telephone calls on a switched network - please provide a reference. SIP INVITE transactions can pend forever - it is only the limited subset of the use of the transactions (where you don't use a provisional response) that keeps this communication "brief". In the normal case, an INVITE and its final response can be separated by an arbitrary amount of time. Instead of trying to tweak this text, I suggest replacing all of it with simpler, more direct descriptions of the sequence of messages you are using for the benchmarks you are defining here.

***Response, T, 3.1.1: Same as the response to Item 4: Re-definition of a session: We believe that the 3D depiction of the session is useful. As we state in the document, the definition is for the purpose of this document only. The reason we created it was to be able to refer to all the different cases: the case of an INVITE-initiated session with media, in which all three of the components are non-null; the case of an INVITE-initiated session without media, in which the media and control components are null and only the Sig component is non-null; and the case of non-INVITE-initiated sessions, such as REGISTER and MESSAGE, in which, again, the only non-null component is the Sig component. Each test case describes the set of SIP messages and the order in which they should be sent. For this reason we do not need to define a session as this or that set of SIP requests.

**T, 3.1.1: How is this vector notion (and graph) useful for this document? I don't see that it's actually used anywhere in the documents.
Similarly, the arrays don't appear to be actually used (though you reference them from some definitions) - what would be lost from the document if you simply removed all this text?

***Response, T, 3.1.1: It is not necessary to refer to the diagram after the initial explanation. We do in fact refer to the components of the session in the methodology document.

- - - - - - -

**T, 3.1.5, Discussion, last sentence: Why is it important to say "For UA-type of network devices such as gateways, it is expected that the UA will be driven into overload based on the volume of media streams it is processing."? It's not clear that's true for all such devices. How is saying anything here useful?

***Response: T, 3.1.5: We do not consider gateways anymore, so we have removed this from T, 3.1.5.

**T, 3.1.6: This definition says an outstanding BYE or CANCEL is a Session Attempt. Why not just say INVITE? You aren't actually measuring "session attempts" for INVITEs or REGISTERs - you have separate benchmarks for them.

***Response: T, 3.1.6: Agreed. The definition was modified to say, "A SIP INVITE or REGISTER request sent by the EA that has not received a final response."

**T, 3.1.7: It needs to be explicit that these benchmarks are not accounting for/allowing early dialogs.

***Response: T, 3.1.7: Agreed. We added a sentence to that effect.

**T, 3.1.8: The words "early media" appear here for the first time. Given the way the benchmarks are defined, does it make sense to discuss early media in these documents at all (beyond noting you do not account for it)? If so, there needs to be much more clarity. (By the way, this Discussion will be much easier to write in terms of dialogs.)

***Response: T, 3.1.8: We now refer to early pre-call media, following what RFC 3261 does in Section 20.11 when it first talks about early media.

**T, 3.1.9, Discussion point 2: What does "the media session is established" mean?
If you leave this written as a generic definition, then is this when an MSRP connection has been made? If you simplify it to the simple media model currently in the document, does it mean an RTP packet has been sent? Or does it have to be received? For the purposes of the benchmarks defined here, it doesn't seem to matter, so why have this as part of the discussion anyway?

***Response: T, 3.1.9: We did not find that phrase in T, 3.1.9, but we did find a SUBSCRIBE given as an example of an NS session and changed that to a REGISTER.

**T, 3.1.9, Definition: A series of CANCELs meets this definition.

***Response: We have clarified the fact that we only consider the REGISTER request as an NS. The CANCELs are out of scope.

**T, 3.1.10 Discussion: This doesn't talk about 3xx responses, and they aren't covered elsewhere in the document.

***Response: T, 3.1.10 Discussion: The 3xx has been added to the list as well. Only the 2xx is considered to be a success.

**T, 3.1.11 Discussion: Isn't the MUST in this section methodology? Why is it in this document and not -meth-?

***Response: T, 3.1.11 Discussion: T, 3.1.11 was removed from version (-09).

**T, 3.1.11 Discussion, next-to-last sentence: "measured by the number of distinct Call-IDs" means you are not supporting forking, or you would not count answers from more than one leg of the fork as different sessions, like you should. Or are you intending that there would never be an answer from more than one leg of a fork? If so, the documents need to be clearer about the methodology and what's actually being measured.

***Response: T, 3.1.11 Discussion: T, 3.1.11 was removed from version (-09).

**T, 3.2.2 Definition: There's something wrong with this definition. For example, proxies do not create sessions (or dialogs). Did you mean "forwards messages between"?

***Response: T, 3.2.2 Definition: Wording was changed to, "Device in the test topology that facilitates the creation of sessions between EAs."
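The success convention described in the responses above (a Session Attempt is an INVITE or REGISTER that has not received a final response, and only a 2xx final response counts as success, with 3xx grouped among the failures) can be sketched as a small classifier. This is an illustrative sketch of the convention, not code from the drafts; the status-code classes follow RFC 3261.

```python
# Hedged sketch: how an EA might score SIP final responses under the
# convention in these responses. 1xx is provisional (the attempt is
# still outstanding); only 2xx counts as success; 3xx-6xx are failures.

def classify(status_code):
    """Map a SIP status code to 'provisional', 'success', or 'failure'."""
    if 100 <= status_code < 200:
        return "provisional"    # no final response yet: attempt outstanding
    if 200 <= status_code < 300:
        return "success"        # only 2xx is a successful session attempt
    if 300 <= status_code < 700:
        return "failure"        # 3xx/4xx/5xx/6xx all count as failures here
    raise ValueError("not a SIP status code: %d" % status_code)

assert classify(180) == "provisional"   # 180 Ringing
assert classify(200) == "success"       # 200 OK
assert classify(302) == "failure"       # 3xx is not a success per T, 3.1.10
assert classify(486) == "failure"       # 486 Busy Here
```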
**T, 3.2.2 Discussion: This is definition by enumeration since it uses a MUST, and is exclusive of any future things that might sit in the middle. If that's what you want, make this the definition. The MAY seems contradictory unless you are saying a B2BUA or SBC is just a specialized User Agent Server. If so, please say it that way.

***Response: T, 3.2.2 Discussion: The text now reads as follows: "The DUT is an RFC3261-compatible network intermediary such as ..."

**T, 3.2.3: This seems out of place or under-explored. You don't appear to actually _use_ this definition in the documents. You declare these things in scope, but the only consequence is the line in this section about not lowering performance benchmarks when present. Consider making that part of the methodology of a benchmark and removing this section. If you think it's essential, please revisit the definition - you may want to generalize it into _anything_ that sits on the path and may affect SIP processing times (otherwise, what's special about this either being SIP-aware, or being a firewall?).

***Response: T, 3.2.3: References to firewalls, both stateful and otherwise, have been removed.

**T, 3.2.5 Definition: This definition just obfuscates things. Point to 3261's definition instead. How is TCP a measurement unit? Does the general terminology template include "enumeration" as a type? Do you really want to limit this enumeration to the set of currently defined transports? Will you never run these benchmarks for SIP over WebSockets?

***Response: T, 3.2.5 Definition: The set of transports now includes WebSockets (RFC 7118).

**T, 3.3.2 Discussion: Again, there needs to be clarity about what it means to "create" a media session. This description differentiates attempt vs success, so what is it exactly that makes a media session attempt successful? When you say number of media sessions, do you mean number of m lines or total number of INVITEs that have SDP with m lines?
***Response: T, 3.3.2 Discussion: This term was removed.

**T, 3.3.3: This would be much clearer written in terms of transactions and dialogs (you are already diving into transaction state machine details). This is a place where the document needs to point out that it is not providing benchmarks relevant to environments where provisionals are allowed to happen and INVITE transactions are allowed to pend.

***Response: T, 3.3.3: This is about whether or not the attempt to set up a call has succeeded. It is about how we define success and failure, and about how long you wait before you declare a failure. This section defines a parameter, measured in units of time, that represents the amount of time the EA client will wait for a response from the EA server, after the elapse of which the EA will declare a failure to establish a call. Remember, this is lab testing, not end-to-end testing. We are not concerned with whether or not the call is ever set up after some errors have occurred. We are testing to failure. The failure to establish the session before X seconds have passed is a failure within the context of this test. The edited version reads as follows:

3.3.3. Establishment Threshold Time
Definition: Configuration of the EA that represents the amount of time that an EA client will wait for a response from the EA server before declaring a Session Attempt Failure.

**T, 3.3.4: How does this model (a single session duration separate from the media session hold time) produce useful benchmarks? Are you using it to allow media to go beyond the termination of a call? If not, then you have media only for the first part of a call? What real-world thing does this reflect? Alternatively, what part of the device or system being benchmarked does this provide insight into?

***Response: T, 3.3.4: The term "Media Session Hold Time" was removed.

**T, 3.3.5: The document needs to be honest about the limits of this simple model of media.
It doesn't account for codecs that do not have constant packet sizes. The benchmarks that use the model don't capture differences based on the content of the media being sent - a B2BUA or gateway may behave differently if it is transcoding or doing content processing (such as DTMF detection) than it will if it is just shoveling packets without looking at them.

***Response: T, 3.3.5: The following changes were made to the definition:

Definition: Configuration on the EA for a fixed number of frames or samples to be sent in each RTP packet of the media session.
Discussion: For a single benchmark test, media sessions use a defined number of samples or frames per RTP packet. If two SBCs, for example, used the same codec but one put more frames into the RTP packets than the other, this might cause variation in performance benchmark measurements.
Measurement Units: An integer number of frames or samples, depending on whether hybrid or sample-based codecs are used, respectively.
Issues: None.
See Also:

In addition, a new parameter, "Codec Type", was added as follows:

Definition: The name of the codec used to generate the media session.
Discussion: For a single benchmark test, all sessions use the same size packet for media streams. The size of packets can cause variation in performance benchmark measurements.
Measurement Units: This is an alphanumeric name assigned to uniquely identify the codec.
Issues: None.
See Also:

In addition, this parameter was added to the Test Setup Report in M, 5.1.

**T, 3.3.6: Again, the model here is that any two media packets present the same load to the thing under test. That's not true for transcoding, mixing, or analysis (such as for DTMF detection). It's not clear that if you have two streams, each stream has its own "constant rate". You call out having one audio and one video stream - how do you configure different rates for them?

***Response: T, 3.3.6: This definition has been deleted.
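The interaction between the two new EA parameters above ("frames or samples per RTP packet" and "Codec Type") can be illustrated with a little arithmetic. The sketch below assumes the sample-based G.711 codec (8000 samples per second, one byte per sample); the drafts only name the parameters, so the specific numbers here are illustrative, not mandated.

```python
# Hedged sketch of the arithmetic behind the EA media parameters,
# assuming sample-based G.711. More samples per packet means a larger
# RTP payload but fewer packets per second for the DUT to forward,
# which is why the parameter can shift benchmark results.

SAMPLE_RATE = 8000      # G.711: 8 kHz sampling
BYTES_PER_SAMPLE = 1    # G.711: 8-bit companded samples

def g711_stream(samples_per_packet):
    """Return (payload_bytes, packets_per_second) for one media stream."""
    payload_bytes = samples_per_packet * BYTES_PER_SAMPLE
    packets_per_second = SAMPLE_RATE / samples_per_packet
    return payload_bytes, packets_per_second

# 160 samples per packet (20 ms of audio) is a common configuration:
payload, pps = g711_stream(160)
assert (payload, pps) == (160, 50.0)   # 160-byte RTP payload, 50 packets/s
```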
**T, 3.3.7: This document points to the methodology document for indicating whether streams are bi-directional or uni-directional. I can't find where the methodology document talks about this (the string 'direction' does not occur in that document).

***Response: T, 3.3.7: This definition has been deleted.

**T, 3.3.8: This text is old - it was probably written pre-RFC5056. If you fork, loop detection is not optional. This, and the methodology document, should be updated to take that into account.

***Response: T, 3.3.8: This text has been removed. It relates to loop detection, which is no longer considered in version 09.

**T, 3.3.9: Clarify if more than one leg of a fork can be answered successfully and update 3.1.11 accordingly. Talk about how this affects the success benchmarks (how will the other legs getting failure responses affect the scores?).

***Response: T, 3.3.9: This text has been removed. It relates to forking, which is no longer considered in version 09.

**T, 3.3.9, Measurement units: There is confusion here. The unit is probably "endpoints". This section talks about two things: that, and the type of forking. How is "type of forking" a unit, and are these templates supposed to allow more than one unit for a term?

***Response: T, 3.3.9: This text has been removed. It relates to forking, which is no longer considered in version 09.

**T, 3.4.2, Definition: It's not clear what "successfully completed" means. Did you mean "successfully established"? This is a place where speaking in terms of dialogs and transactions rather than sessions will be much clearer.

***Response: T, 3.4.2, Definition: The SER was re-defined as follows:

3.4.1. Session Establishment Rate
Definition: The maximum value of the Session Attempt Rate that the DUT can handle for an extended, pre-defined period with zero failures.
Discussion: This benchmark is obtained with zero failure, in which 100% of the sessions attempted by the Emulated Agent are successfully completed by the DUT.
The session attempt rate provisioned on the EA is raised and lowered as described in the algorithm in the accompanying methodology document, until a traffic load at the given attempt rate, over the sustained period of time identified by T in the algorithm, completes without any failed session attempts. Sessions may be IS or NS or a mix of both, as defined in the particular test.
Measurement Units: sessions per second (sps)
Issues: None.
See Also:
Invite-Initiated Sessions
Non-Invite-Initiated Sessions
Session Attempt Rate

**T, 3.4.3: This benchmark metric is underdefined. I'll focus on that in the context of the methodology document (where the docs come closer to defining it). This definition includes a variable T but doesn't explain it - you have to read the methodology to know what T is all about. You might just say "for the duration of the test" or whatever is actually correct.

***Response: T, 3.4.3: This was a reference to Session Capacity, a concept that has been removed from version 09.

**T, 3.4.3, Discussion: "Media Session Hold Time MUST be set to infinity". Why? The argument you give in the next sentence just says the media session hold time has to be at least as long as the session duration. If they were equal, and finite, the test result would not change. What's the utility of the infinity concept here?

***Response: T, 3.4.3, Discussion: This was a reference to Session Capacity, a concept that has been removed from version 09.

**T, 3.4.4: "until it stops responding". Any non-200 response is still a response, and if something sends a 503 or 4xx with a Retry-After (which is likely when it's truly saturating), you've hit the condition you are trying to find. The notion that the Overload Capacity is measurable by not getting any responses at all is questionable. This discussion has a lot of methodology in it - why isn't that (only) in the methodology document?
***Response: T, 3.4.4: This related to Session Overload Capacity, a concept that has been removed from version 09.

**T, 3.4.5: A normal, fully correct system that challenged requests and performed flawlessly would have a .5 Session Establishment Performance score. Is that what you intended? The SHOULD in this section looks like methodology. Why is this a SHOULD and not a MUST (the document should be clearer about why sessions remaining established is important)? Or wait - is this what Note 2 in section 5.1 of the methodology document (which talks about reporting formats) is supposed to change? If so, that needs to be moved to the actual methodology and made _much_ clearer.

***Response: T, 3.4.5: This section related to Session Establishment Performance, a concept that has been removed from version 09.

**T, 3.4.6: You talk of the first non-INVITE in an NS. How are you distinguishing subsequent non-INVITEs in this NS from requests in some other NS? Are you using dialog identifiers or something else? Why do you expect that to matter (why is the notion of a sequence of related non-INVITEs useful from a benchmarking perspective - there isn't state kept in intermediaries because of them - what will make this metric distinguishable from a metric that just focuses on the transactions?)

***Response: T, 3.4.6: This section related to Session Attempt Delay, a concept that was removed from version 09.

**T, 3.4.7: What's special about MESSAGE? Why aren't you focusing on INFO or some other end-to-end non-INVITE? I suspect it's because you are wanting to focus on a simple non-INVITE transaction (which is why you are leaving out SUBSCRIBE/NOTIFY). MESSAGE is good enough for that, but you should be clear that's why you chose it. You should also talk about whether the payloads of all of the MESSAGE requests are the same size and whether that size is a parameter to the benchmark. (You'll likely get very different behavior from a MESSAGE that fragments.)
***Response: T, 3.4.7: This section related to the IM Rate. We removed IM from the scope of these documents in version 09, due to the fact that there are many ways to deliver such services, and specifying one or the other to be tested would not be useful.

**T, 3.4.7: The definition says "messages completed" but the discussion talks about "definition of success". Does success mean an IM transaction completed successfully? If so, the definition of success for a UAC has a problem. As written, it describes a binary outcome for the whole test, not how to determine the success of an individual transaction - how do you get from what it describes to a rate?

***Response: T, 3.4.7: IM is outside the scope of the documents in version 09.

**T, Appendix A: The document should better motivate why this is here. Why does it mention SUBSCRIBE/NOTIFY when the rest of the document(s) are silent on them? The discussion says you are _selecting_ a Session Attempts Arrival Rate distribution. It would be clearer to say you are selecting the distribution of messages sent from the EA. It's not clear how this particular metric will benefit from different sending distributions.

***Response: T, Appendix A: Appendix A has been removed.

- - - - - - - - - - - - - - - - -

Comments related to the Methodology document will be sent later.

Carol Davids
Professor & Director, RTC Lab
Illinois Institute of Technology
Office: 630-682-6024
Mobile: 630-292-9417
Email: davids@iit.edu
Skype: caroldavids1
Web: rtc-lab.itm.iit.edu