Document: draft-ietf-bmwg-sdn-controller-benchmark-meth-08
Reviewer: Stewart Bryant
Review Date: 2018-04-16
IETF LC End Date: 2018-02-02
IESG Telechat date: 2018-04-19

Summary: This is a well written, comprehensive test set for SDN controllers. Two minor points remain from my previous review that I would draw to the attention of the responsible AD.

Major issues: None

Minor issues:

I found the large amount of text on OpenFlow that appears out of the blue in the appendix somewhat strange, since the test suite is controller protocol agnostic. I understand that this is included by way of illustrative example. It might be useful to the reader to make a statement to this effect.

[Authors] Regarding your #1 comment, we will add a note to mention that 'OpenFlow protocol is used as an example to illustrate the methodologies defined in this document'. Hope this works.

Nits/editorial comments:

   The test traffic generators TP1 and TP2 SHOULD be connected to the
   first and the last leaf Network Device.

SB> I am sure I know what first and last mean, but the meaning should be called out.

[Authors] We will clarify the Test Traffic Generator (TP1/TP2) connectivity in the setup. We will update the text such that 'TP1 SHOULD be connected to Network Device 1 and TP2 SHOULD be connected to Network Device n'.

-------------------------------------------------------------------------------------------------------------------

Spencer Dawkins' No Objection on draft-ietf-bmwg-sdn-controller-benchmark-meth-08: (with COMMENT)

I have a few questions, at the No Objection level ... do the right thing, of course.

I apologize for attempting to play amateur statistician, but it seems to me that this text

   4.7. Test Repeatability

   To increase the confidence in measured result, it is recommended
   that each test SHOULD be repeated a minimum of 10 times.

is recommending a heuristic, when I'd think that you'd want to repeat a test until the results seem to be converging on some measure of central tendency, given some acceptable margin of error, and this text

   Procedure:
   1. Establish the network connections between controller and network
      nodes.
   2. Query the controller for the discovered network topology
      information and compare it with the deployed network topology
      information.
   3. If the comparison is successful, increase the number of nodes by 1
      and repeat the trial. If the comparison is unsuccessful, decrease
      the number of nodes by 1 and repeat the trial.
   4. Continue the trial until the comparison of step 3 is successful.
   5. Record the number of nodes for the last trial (Ns) where the
      topology comparison was successful.

seems to beg for a binary search, especially if you're testing whether a controller can support a large number of nodes ...

[Authors] I would like to clarify that the above procedure is for a single test trial. We recommend repeating the procedure at least 10 times for better accuracy of results.
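
A minimal sketch of the trade-off Spencer raises, for illustration only (Python; discovery_succeeds(n) is a hypothetical stand-in for steps 1-2 of the quoted procedure, i.e. deploy n nodes, query the controller, and compare the discovered topology with the deployed one):

   def largest_discovered_linear(start, discovery_succeeds):
       """One-node-at-a-time walk, roughly as in the quoted procedure."""
       n = start
       while discovery_succeeds(n):
           n += 1
       return n - 1   # Ns: last node count where the comparison succeeded

   def largest_discovered_binary(lo, hi, discovery_succeeds):
       """Spencer's binary-search suggestion: search between a known-good
       count lo and a known-failing count hi, taking O(log(hi - lo)) trials
       instead of O(hi - lo)."""
       while hi - lo > 1:
           mid = (lo + hi) // 2
           if discovery_succeeds(mid):
               lo = mid
           else:
               hi = mid
       return lo

Either search would still be repeated (at least 10 times, per the authors' recommendation) to build confidence in the resulting Ns.
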
This text

   Reference Test Setup: The test SHOULD use one of the test setups
   described in section 3.1 or section 3.2 of this document in
   combination with Appendix A.

or some variation is repeated about 16 times, and I'm not understanding why this is using BCP 14 language, and if BCP 14 language is the right thing to do, I'm not understanding why it's always SHOULD. I get the part that this will help compare results, if two researchers are running the same tests. Is there more to the requirement than that?

[Authors] Our intention is to help compare results, if two testers are running the same tests. We do not have any other requirements than this.

In this text,

   Procedure:
   1. Perform the listed tests and launch a DoS attack towards
      controller while the trial is running.

   Note: DoS attacks can be launched on one of the following interfaces.
   a. Northbound (e.g., Query for flow entries continuously on
      northbound interface)
   b. Management (e.g., Ping requests to controller's management
      interface)
   c. Southbound (e.g., TCP SYN messages on southbound interface)

is there a canonical description of "DoS attack" that researchers should be using, in order to compare results? These are just examples, right?

[Authors] You are correct. The Note section gives some examples of how to simulate DoS attacks.

Is the choice of

   [OpenFlow Switch Specification] ONF, "OpenFlow Switch Specification"
   Version 1.4.0 (Wire Protocol 0x05), October 14, 2013.

intentional? I'm googling that the current version of OpenFlow is 1.5.1, from 2015.

[Authors] This is intentional, as all our examples are derived based on this version of the specification.

-------------------------------------------------------------------------------------------------------------------

Eric Rescorla's No Objection on draft-ietf-bmwg-sdn-controller-benchmark-meth-08: (with COMMENT)

Rich version of this review at: https://mozphab-ietf.devsvcdev.mozaws.net/D3948

COMMENTS

   4.7. Test Repeatability

   To increase the confidence in measured result, it is recommended
   that each test SHOULD be repeated a minimum of 10 times.

Nit: you might be happier with "RECOMMENDED that each test be repeated ...". Also, where does 10 come from? Generally, the number of trials you need depends on the variance of each trial.

[Authors] The RECOMMENDED number 10 was arrived at based on our experience during the benchmarking. I will discuss with the other authors about changing SHOULD to RECOMMENDED.

[Eric] SHOULD and RECOMMENDED have the same normative force in 2119. It's just editorial, and I thought it would read better.

[Authors] Updated SHOULD to RECOMMENDED in the revised version.

   Test Reporting

   Each test has a reporting format that contains some global and
   identical reporting components, and some individual components that
   are specific to individual tests. The following test configuration
   parameters and controller settings parameters MUST be reflected in

This is an odd MUST, as it's not required for interop.

[Authors] The intent of specifying MUST is to capture the relevant test parameters to enable apple-to-apple comparison of test results across two testers/test runs.

   5. Stop the trial when the discovered topology information matches
      the deployed network topology, or when the discovered topology
      information return the same details for 3 consecutive queries.
   6. Record the time last discovery message (Tmn) sent to controller
      from the forwarding plane test emulator interface (I1) when the
      trial completed successfully. (e.g., the topology matches).

How large is the TD usually? How much does 3 seconds compare to that?

[Authors] The test duration varies depending on the size of the test topology. For a smaller topology (3 - 10 nodes) the TD was within a minute. So we kept the query interval at 3 seconds to accommodate smaller and larger topologies.

[Eric] So, 3 seconds is a pretty big fraction of that. It introduces non-trivial random (I think) error. As for n-1, I *think* it's the right one here, but I'm not sure. It's what you use for "sample variance" typically. Have you talked to a statistician?

[Authors] The 3-second query is used as a stop criterion for the test. The measurement is based on the time at which the controller receives the last discovery message. We have also reworded Step 4 to mention 3 seconds as a 'RECOMMENDED value' (as below):

   "Query the controller every t seconds (RECOMMENDED value for t is 3)
   to obtain the discovered network topology information through the
   northbound interface or the management interface and compare it with
   the deployed network topology information."
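
As an illustration of the stop criterion described above, a minimal sketch (Python; query_topology() and deployed_topology are hypothetical stand-ins for the northbound/management query and the known deployed topology, and max_wait is an added safety bound that is not in the draft; the reported discovery time itself is still taken from the last discovery message observed at the forwarding-plane emulator, not from this loop):

   import time

   def wait_for_discovery(query_topology, deployed_topology, t=3, max_wait=900):
       """Poll every t seconds (RECOMMENDED value for t is 3); stop when the
       discovered topology matches the deployed one, or when the same details
       are returned for 3 consecutive queries."""
       previous, repeats = None, 0
       deadline = time.monotonic() + max_wait
       while time.monotonic() < deadline:
           discovered = query_topology()
           if discovered == deployed_topology:
               return True            # trial completed successfully
           repeats = (repeats + 1) if discovered == previous else 1
           if repeats >= 3:
               return False           # converged on a non-matching answer
           previous = discovered
           time.sleep(t)
       return False
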
                                              SUM[SQUAREOF(Tri-TDm)]
   Topology Discovery Time Variance (TDv)     ----------------------
                                                  Total Trials - 1

You probably don't need to specify individual formulas for mean and variance. However, you probably do want to explain why you are using the n-1 sample variance formula.

[Authors] We have added both formulas based on the feedback received on the mailing list. We are using n-1, as it is a commonly used variance measure. Do we need an explanation here, or is providing a reference sufficient?

[Eric] Well, my point was that you could specify mean and variance in one place and not repeat them over and over. Well, n-1 is typically used for sample variance. This is something a little different.

[Authors] You are right. We used the n-1 sample variance to correct the bias in the estimation of the population variance.

   Measurement:
                                                    (R1-T1) + (R2-T2)..(Rn-Tn)
      Asynchronous Message Processing Time Tr1 = ------------------------------
                                                               Nrx

Incidentally, this formula is the same as \sum_i{R_i} - \sum_i{T_i}

[Authors] Good suggestion, we have incorporated this in the new version.

   messages transmitted to the controller. If this test is repeated
   with varying number of nodes with same topology, the results SHOULD
   be reported in the form of a graph. The X coordinate SHOULD be the
   Number of nodes (N), the Y coordinate SHOULD be the average
   Asynchronous Message Processing Time.

This is an odd metric, because an implementation which handled overload by dropping every other message would look better than one which handled overload by queuing.

[acm] If processing time were the only number reported, you're right. Although the early generation of controller benchmarking tools overlooked the important combinations of metrics, the Reporting Format adds the success/loss message performance:

   The report should capture the following information in addition to
   the configuration parameters captured in section 5.
   - Successful messages exchanged (Nrx)
   - Percentage of unsuccessful messages exchanged, computed using the
     formula (1 - Nrx/Ntx) * 100, where Ntx is the total number of
     messages transmitted to the controller.

BUT, it would be better if SHOULD or RECOMMENDED terms were used, to cover the case you identified.

[Authors] Updated as SHOULD in the revised draft.
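
Tying the quoted formulas together, a minimal sketch (Python; the timestamp lists and message counts are hypothetical inputs, and statistics.variance() already implements the n-1 sample variance discussed above):

   from statistics import mean, variance   # variance() uses the n-1 (sample) form

   def async_processing_metrics(tx_times, rx_times, ntx):
       """One trial: Tr = SUM(Ri - Ti) / Nrx over the Nrx answered messages,
       plus the loss percentage (1 - Nrx/Ntx) * 100 from the reporting format."""
       nrx = len(rx_times)
       tr = (sum(rx_times) - sum(tx_times)) / nrx   # same value as Eric's sum identity
       loss_pct = (1 - nrx / ntx) * 100
       return tr, loss_pct

   def across_trials(per_trial_values):
       """Mean and n-1 sample variance across repeated trials (e.g., TDm/TDv,
       or the mean/variance of Tr over 10 trials)."""
       return mean(per_trial_values), variance(per_trial_values)
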
-------------------------------------------------------------------------------------------------------------------

Mirja Kühlewind's No Objection on draft-ietf-bmwg-sdn-controller-benchmark-meth-08: (with COMMENT)

Editorial comments:

1) sdn-controller-benchmark-term should probably rather be referred to in the intro (instead of the abstract).

[Authors] We have moved this sentence to the Introduction.

2) Is the test setup needed in both docs (this and sdn-controller-benchmark-term) or would a reference to sdn-controller-benchmark-term maybe be sufficient?

[Authors] We have added the test setup in both drafts as per the feedback received on the mailing list.

3) Appendix A.1 should probably also be moved to sdn-controller-benchmark-term.

[Authors] We have removed Appendix section A.1, as the same test topology is illustrated in the test setup.

-------------------------------------------------------------------------------------------------------------------

Alissa Cooper's No Objection on draft-ietf-bmwg-sdn-controller-benchmark-meth-08: (with COMMENT)

Regarding this text:

   "The test SHOULD use one of the test setups described in section 3.1
   or section 3.2 of this document in combination with Appendix A."

Appendix A is titled "Example Test Topology." If it's really an example, then it seems like it should not be normatively required. So either the appendix needs to be re-named, or the normative language needs to be removed. And if it is normatively required, why is it in an appendix? The document would also benefit from describing what the exception cases to the SHOULD are (I guess if the tester doesn't care about having comparable results with other tests?).

[Authors] We will remove this section and the corresponding references in the draft, as this is already captured in the test setup.

-------------------------------------------------------------------------------------------------------------------

Adam Roach's No Objection on draft-ietf-bmwg-sdn-controller-benchmark-meth-08: (with COMMENT)

I again share Martin's concerns about the use of the word "standard" in this document's abstract and introduction.

[Authors] Reworded the text in the abstract section.

-------------------------------------------------------------------------------------------------------------------

Benjamin Kaduk's No Objection on draft-ietf-bmwg-sdn-controller-benchmark-meth-08: (with COMMENT)

In the Abstract:

   This document defines the methodologies for benchmarking control
   plane performance of SDN controllers.

Why "the" methodologies? That seems more authoritative than is appropriate in an Informational document.

[Authors] Agreed. We have rephrased the sentence as 'This document defines methodologies for benchmarking control plane performance of SDN controllers'.

Why do we need the test setup diagrams in both the terminology draft and this one? It seems like there is some excess redundancy, here.

[Authors] We agree, but this was done based on the feedback from the mailing list.

In Section 4.1, how can we even have a topology with just one network device? This "at least 1" seems too low. Similarly, how would TP1 and TP2 *not* be connected to the same node if there is only one device?

[Authors] You are right. This may not be a case with a single node in the topology. We have fixed that assumption.

Thank you for adding consideration of key distribution in Section 4.4, as noted by the secdir review. But insisting on having key distribution done prior to testing gives the impression that keys are distributed once and updated never, which has questionable security properties. Perhaps there is value in doing some testing while rekeying is in progress?

[Authors] The intention of this draft is to benchmark a controller after bringing it to a stable state. So we do not recommend doing benchmarking during transitions, to avoid inconsistency in the observed results.

I agree with others that the statistical methodology is not clearly justified, such as the sample size of 10 in Section 4.7 (with no consideration for sample relative variance), use of sample vs. population variance, etc.

[Authors] We are using sample variance for all test calculations, as it is widely used for finding the variance of a sample from the mean. As we only use sample variance, we felt it was not necessary to clarify this further within the scope of the document.
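
For reference on the sample-versus-population point above (background only, not text from the draft), with x_1..x_n the n per-trial measurements and x-bar their mean:

   % Population variance divides by n; the sample variance divides by n-1
   % (Bessel's correction), giving an unbiased estimate when the trials are
   % treated as a sample of a larger population of possible runs.
   \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2
   \qquad
   s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2
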
It seems like the measurements being described sometimes start the timer at an event at a network element and other times start the timer when a message enters the SDN controller itself (similarly for outgoing messages), which seems to include a different treatment of propagation delays in the network, for different tests. Assuming these differences were made by conscious choice, it might be nice to describe why the network propagation is/is not included for any given measurement.

[Authors] We have captured information related to this in Section 4.5.

It looks like the term "Nrxn" is introduced implicitly and the reader is supposed to infer that the 'n' represents a counter, with Nrx1 corresponding to the first measurement, Nrx2 the second, etc. It's probably worth mentioning this explicitly, for all fields that are measured on a per-trial/counter basis.

[Authors] We noticed a forward reference to Nrxn in Section 5.1.3 (Step 5). We have addressed it in the revised draft.

I'm not sure that the end condition for the test in Section 5.2.2 makes sense.

[Authors] We have added additional steps to 5.2.2 to capture the end condition.

It seems like the test in Section 5.2.3 should not allow flexibility in "unique source and/or destination address" and rather should specify exactly what happens.

[Authors] As suggested, we have removed the flexibility.

In Section 5.3.1, only considering 2% of asynchronous messages as invalid implies a preconception about what might be the reason for such invalid messages, but that assumption might not hold in the case of an active attack, which may be somewhat different from the pure DoS scenario considered in the following section.

[Authors] You are right. But this test helps to understand the system behaviour when network devices malfunction, not during DoS attacks.

Section 5.4.1 says "with incremental sequence number and source address" -- are both the sequence number and source address incrementing for each packet sent?

[Authors] Yes.

This could be more clear. It also is a little jarring to refer to "test traffic generator TP2" when TP2 is just receiving traffic and not generating it.

[Authors] Both the sequence number and the source address are increased. We will remove the term test traffic generator and simply mention TP2.
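
A small illustrative sketch of that Section 5.4.1 traffic pattern as clarified above (Python; the base address and packet count are hypothetical, and the actual packet encapsulation and sending are out of scope here):

   import ipaddress

   def flow_parameters(base_src="10.0.0.1", count=100):
       """Per-packet parameters: the sequence number and the source address
       both increment for each packet, so every packet maps to a new flow
       towards the controller."""
       base = ipaddress.IPv4Address(base_src)
       return [(seq, str(base + seq)) for seq in range(count)]

   # e.g. flow_parameters(count=3) -> [(0, '10.0.0.1'), (1, '10.0.0.2'), (2, '10.0.0.3')]
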
Appendix B.3 indicates that plain TCP or TLS can be used for communications between switch and controller. It seems like this would be a highly relevant test parameter to report with the results for the tests described in this document, since TLS would introduce additional overhead to be quantified!

[Authors] In Section 4.7, we have recommended reporting this parameter in the test report.

The figure in Section B.4.5 leaves me a little confused as to what is being measured, if the SDN Application is depicted as just spontaneously installing a flow at some time vaguely related to traffic generation but not dependent on or triggered by the traffic generation.

[Authors] You are correct. We have updated the sequence diagram to inject the flow at the beginning of the test to avoid confusion.

-------------------------------------------------------------------------------------------------------------------

Ignas Bagdonas' No Objection on draft-ietf-bmwg-sdn-controller-benchmark-meth-08: (with COMMENT)

The document seems to assume the OpenFlow dataplane abstraction model, which is one of the possible models; the practical applicability of such a model to anything beyond experimental deployments is a completely separate question outside of the scope of this document. The methodology tends to apply to a broader set of central control based systems, and not only to the data plane operations; therefore the document seems to be setting out at least something practically usable for benchmarking of such central control systems. Possibly the document could mention the assumptions made about the overall model to which the defined methodology applies.

[acm] That's a good wording suggestion; it certainly captures the working group consensus to prepare the methods independent from OpenFlow, for wider applicability. However, we also received feedback from Stewart Bryant seeking *more* message specificity (appended below, see *), which is not really possible for the general methods. Please try to strike a balance between these comments in discussion today, if possible!

A nit: s/Khasanov Boris/Boris Khasanov, unless Boris himself would insist otherwise.

-------------------------------------------------------------------------------------------------------------------

Suresh Krishnan's No Objection on draft-ietf-bmwg-sdn-controller-benchmark-meth-08: (with COMMENT)

I share Ignas's concern about this being too tightly associated with the OpenFlow model.

[Authors] We have added text to state that OpenFlow is just used as an illustrative example in this document.

* Section 4.1

   The test cases SHOULD use Leaf-Spine topology with at least 1
   Network Device in the topology for benchmarking.

How is it even possible to have a leaf-spine topology with one Network Device?

[Authors] We have fixed the text to use at least 2 network devices in the test topology in the revised draft.

-------------------------------------------------------------------------------------------------------------------

Martin Vigoureux's No Objection on draft-ietf-bmwg-sdn-controller-benchmark-meth-08: (with COMMENT)

Hello,

I have the same question/comment as on the companion document: I wonder about the use of the term "standard" in the abstract in view of the intended status of the document (Informational). Could the use of this word confuse the reader?

[Authors] We have rephrased the abstract to avoid the usage of the term 'standard'.