Re: [ippm] review: draft-ietf-ippm-responsiveness-03

Sebastian Moeller <moeller0@gmx.de> Thu, 18 January 2024 08:49 UTC

From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <0CFCC300-FEC1-4997-9B95-5BD67C6BC94C@apple.com>
Date: Thu, 18 Jan 2024 09:48:45 +0100
Cc: Sebastian Moeller <moeller0=40gmx.de@dmarc.ietf.org>, IETF IPPM WG <ippm@ietf.org>
Message-Id: <0C37FF52-83D3-4F61-9823-597E3F5DFA5A@gmx.de>
References: <94654535-67F5-4719-B988-3306BFEA2D6F@gmx.de> <CB77FB78-121F-41C7-80E9-683EF2C5C509@apple.com> <A4B0257D-ECA3-4C03-BDFF-D7306F553B3B@gmx.de> <0CFCC300-FEC1-4997-9B95-5BD67C6BC94C@apple.com>
To: Christoph Paasch <cpaasch=40apple.com@dmarc.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/BZs-gNIXBEa9Z4nVoDG9qdqLLh0>
List-Id: IETF IP Performance Metrics Working Group <ippm.ietf.org>

Hi Christoph,


> On 18. Jan 2024, at 00:46, Christoph Paasch <cpaasch=40apple.com@dmarc.ietf.org> wrote:
> 
> Hello,
> 
>> On Jan 17, 2024, at 12:03 PM, Sebastian Moeller <moeller0=40gmx.de@dmarc.ietf.org> wrote:
>>> 
>>> 
>>> On 17. Jan 2024, at 18:55, Christoph Paasch <cpaasch=40apple.com@dmarc.ietf.org> wrote:
>>>> On Dec 10, 2023, at 7:50 AM, Sebastian Moeller <moeller0=40gmx.de@dmarc.ietf.org> wrote:
>>>> 
> […]
>> 
>>>> 
>>>> 4.3.1.  Aggregating the Measurements
>>>> 
>>>>  The algorithm produces sets of 4 times for each probe, namely: tcp_f,
>>>>  tls_f, http_f, http_l (from the previous section).  The
>>>>  responsiveness of the network connection being tested evolves over
>>>>  time as buffers gradually reach saturation.  Once the buffers are
>>>>  saturated, responsiveness will stabilize.  Thus, the final
>>>>  calculation of network responsiveness considers the last MAD (Moving
>>>>  Average Distance - default to 4) intervals worth of completed
>>>>  responsiveness probes.
>>>> 
>>>> NOTE: for a network limited by MPS that means 400 latency samples, which is pretty impressive! For a PTC-limited network, however, that could be considerably less. Could we require a verbose mode that reports the number of latency samples taken into account and that also reports some measure of variance?
>>> 
>>> Implementations are free to report additional information. I’m not sure whether that should be specified in an IETF RFC, because the RFC defines the methodology only, no?
>> 
>> [SM] Well, there is this idea that if one reports aggregate data one should also give an idea how well that data holds, and often that starts with the number of data points included, something more and more on-line tests start to reveal (e.g. by allowing to download a csv of the detailed results, which includes the individual samples, so getting to the N just requires counting :) )
> 
> Sure. But this is IMO a tool/implementation-question.

[SM] Not really, IMHO this is a validity-of-reporting question ;) The way things are phrased at the moment, I believe the lowest possible number of samples might be MAD * 2, so 8 samples (we need at least 2 samples to calculate a standard deviation), while the expected number for most users will be closer to ~400. That is a really wide range, and hence giving some indication of N seems really desirable. In the same line of thought, we might consider not only reporting the mean RPM but also a measure of deviation. To keep the default recommended result display clean and simple, this could be moved to a verbose mode if a client offers one; if there is no verbose mode, it should be reported anyway (potentially as part of the confidence report). I might be biased here by requirements at work and do not want to claim I am offering a universal truth, but I do want to be sure that whatever the draft recommends in this direction is a conscious decision and not an oversight. So whatever you decide here, I am fine with.
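To make the point concrete, here is a purely illustrative Python sketch (the function name and output fields are my own invention, not draft text) of what reporting N and a dispersion measure alongside the mean RPM could look like:

```python
import statistics

def summarize_rpm(latencies_s):
    """Summarize per-probe round-trip latencies (in seconds) into a mean RPM
    plus the supporting statistics argued for above. The field names are
    invented for illustration; the draft only defines the RPM metric itself."""
    n = len(latencies_s)
    mean_latency = statistics.mean(latencies_s)
    rpm = 60.0 / mean_latency  # round-trips per minute
    # A standard deviation needs at least 2 samples, per the point above.
    stdev_ms = statistics.stdev(latencies_s) * 1000.0 if n >= 2 else float("nan")
    return {"rpm": round(rpm), "n_samples": n, "latency_stdev_ms": stdev_ms}
```

A result line carrying `n_samples` would show at a glance whether an RPM number rests on 8 samples or on ~400.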


> 
>> 
>> [SM] True, as long as the draft does not imply that only 20 seconds is permitted, I am happy.
> 
> “20 seconds is permitted” ? I guess you mean “20 seconds is mandated”. Yes, we are not mandating 20 seconds. In general, everything is “permitted” :)

[SM] Yes, that is what I meant by "only 20 seconds is permitted"... I have a feeling quite a number of RFC-based implementations stick really close to the text and do not think outside of the box... (Case in point: there is zero reason why ICMPv6 should not have supported the base types that ICMPv4 supports, like timestamp requests, but that was not stated explicitly in the relevant RFC as far as I can see, and so ICMPv6 lost the ability to use timestamps.)


> 
>>>>  We define "High" confidence if the algorithm was able to fully reach
>>>>  stability based on the defined standard deviation tolerance.
>>>> 
>>>>  It must be noted that depending on the chosen standard deviation
>>>>  tolerance or other parameters of the methodology and the network-
>>>>  environment it may be that a measurement never converges to a stable
>>>>  point.  This is expected and part of the dynamic nature of networking
>>>>  and the accompanying measurement inaccuracies.  Which is why the
>>>>  importance of imposing a time-limit is so crucial, together with an
>>>>  accurate depiction of the "confidence" the methodology was able to
>>>>  generate.
>>>> 
>>>> QUESTION: Should we propose a recommended verbiage to convey the confidence to the user, to make different implementations easier to compare?
>>> 
>>> I think we did that, no? “Low” “Medium” and “High”.
>> 
>> [SM] I was daft and was expecting something like "The confidence score should be reported to the user as part of the main results", but I guess that is the implicit message of this section?
> 
> In my opinion it is fairly clear in the draft. Feel free to send a PR if you have an idea on how to make it more clear.

[SM] Fine then, I just note that goresponsiveness currently does not report that (I did not check it against all other parts of the draft yet, and there is still lots of progress in the go code, so it might have learned that already ;) )
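For reference, my reading of the draft's convergence idea, as a hedged Python sketch (the parameter names `mad` and `tolerance_pct` are mine; the draft only speaks of a standard deviation tolerance over the last MAD intervals):

```python
import statistics

def is_stable(interval_means, mad=4, tolerance_pct=5.0):
    """Sketch of the convergence check: the measurement counts as stable once
    the standard deviation of the last `mad` per-interval averages stays
    within a percentage tolerance of their mean. Parameter names and the 5%
    default are my own illustration, not draft-mandated values."""
    if len(interval_means) < mad:
        return False  # not enough completed intervals yet
    window = interval_means[-mad:]
    mean = statistics.mean(window)
    return statistics.stdev(window) <= mean * tolerance_pct / 100.0
```

A run that never satisfies this within the time limit would then be reported with "Low" or "Medium" confidence, per Section 4.3.1.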

> 
>>>> 5.1.3.  Server side influence
>>>> 
>>>>  Finally, the server-side introduces the same kind of influence on the
>>>>  responsiveness as the client-side, with the difference that the
>>>>  responsiveness will be impacted during the downlink load generation.
>>>> 
>>>> 5.2.  Root-causing Responsiveness
>>>> 
>>>>  Once a responsiveness result has been generated one might be tempted
>>>>  to try to localize the source of a potential low responsiveness.  The
>>>>  responsiveness measurement is however aimed at providing a quick,
>>>>  top-level view of the responsiveness under working conditions the way
>>>>  end-users experience it.  Localizing the source of low responsiveness
>>>>  involves however a set of different tools and methodologies.
>>>> 
>>>>  Nevertheless, the Responsiveness Test allows one to gain some insight
>>>>  into what the source of the latency is.  The previous section
>>>>  described the elements that influence the responsiveness.  From there
>>>>  it became apparent that the latency measured on the load-generating
>>>>  connections and the latency measured on separate connections may be
>>>>  different due to the different elements.
>>>> 
>>>>  For example, if the latency measured on separate connections is much
>>>>  less than the latency measured on the load-generating connections, it
>>>>  is possible to narrow down the source of the additional latency on
>>>>  the load-generating connections.  As long as the other elements of
>>>>  the network don't do flow-queueing, the additional latency must come
>>>>  from the queue build-up at the HTTP and TCP layer.  This is because
>>>>  all other bottlenecks in the network that may cause a queue build-up
>>>>  will be affecting the load-generating connections as well as the
>>>>  separate latency probing connections in the same way.
>>>> 
>>>> NOTE: This however requires that the output of the measurement tool exposes these two as separate numbers, no?
>>> 
>>> Yes, it does. But I don’t think that is relevant for the RFC itself, whose goal is to define the RPM metric itself. Not how implementations can expose more verbose information. I’m not 100% certain where to draw the line for this...
>> 
>> [SM] Nor am I, hence my question. That said, it would be nice if the draft would explicitly mention this as an option?
> 
> I will add a note here in this section:
> 
> Nevertheless, the Responsiveness Test allows one to gain some insight into what the
> source of the latency is. To gain this insight, implementations of the responsiveness
> test are encouraged to have an optional verbose mode that exposes the inner workings
> of the algorithm. Specifically, it is useful to expose TM(tcp_f), TM(tls_f), TM(http_f) and TM(http_l)
> to enable the root-causing analysis detailed hereafter.
> 
> The previous section described the elements that influence
> the responsiveness. From there it became apparent that the latency measured
> on the load-generating connections and the latency measured on separate connections
> may be different due to the different elements.
> 
> 
> Sounds good?

[SM] Excellent! Thanks!
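Just to illustrate how the exposed per-connection latencies could feed the root-causing reasoning of Section 5.2 — a hypothetical sketch with an invented threshold, not anything the draft mandates:

```python
def root_cause_hint(separate_ms, load_generating_ms, ratio=2.0):
    """Compare latency on separate probing connections against latency on the
    load-generating connections. Per Section 5.2's reasoning, if the loaded
    connections are much slower and no element on the path does flow-queueing,
    the extra delay must come from queue build-up at the HTTP/TCP layer of the
    loaded connections. The 2x threshold is purely illustrative."""
    if load_generating_ms >= separate_ms * ratio:
        return "queue build-up on load-generating connections (HTTP/TCP layer)"
    return "no clear attribution from this comparison alone"
```

The point being: this comparison is only possible if the tool's output keeps the two latency populations separate, which is exactly what the verbose mode above enables.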


> 
>>>> 6.  Responsiveness Test Server API
>>>> 
>>>>  The responsiveness measurement is built upon a foundation of standard
>>>>  protocols: IP, TCP, TLS, HTTP/2.  On top of this foundation, a
>>>>  minimal amount of new "protocol" is defined, merely specifying the
>>>>  URLs that are used for GET and PUT in the process of executing the test.
>>>> 
>>>>  Both the client and the server MUST support HTTP/2 over TLS.  The
>>>>  client MUST be able to send a GET request and a POST.  The server
>>>>  MUST be able to respond to both of these HTTP commands.  The server
>>>>  MUST have the ability to provide content upon a GET request.  The
>>>>  server MUST use a packet scheduling algorithm that minimizes internal
>>>>  queueing to avoid affecting the client's measurement.
>>>> 
>>>> QUESTION: While I fully agree that is what a server should do, I would prefer to get numbers over not getting numbers... so maybe make this a SHOULD? Also, if the server needs to return the time the request was received and the time the response was sent out, maybe we can factor out the local processing?
>>> 
>>> Agreed on the SHOULD use a packet scheduling […].
>>> 
>>> Regarding the time - see my response in the other thread.
>>> 
>>>> 
>>>> 
>>>>  As clients and servers become deployed that use L4S congestion
>>>>  control (e.g., TCP Prague with ECT(1) packet marking), for their
>>>>  normal traffic when it is available, and fall back to traditional
>>>>  loss-based congestion controls (e.g., Reno or CUBIC) otherwise, the
>>>>  same strategy SHOULD be used for Responsiveness Test traffic.  This
>>>>  is RECOMMENDED so that the synthetic traffic generated by the
>>>>  Responsiveness Test mimics real-world traffic for that server.
>>>> 
>>>> NOTE: This clearly needs to be under end-user control, that is, there should be a way for the user to request/enforce either option (I am fine with the default being what the local OS defaults to). L4S is an experimental RFC and is expected to have odd failure modes, so an L4S-aware measurement tool should keep in mind that L4S itself needs to be measured/tested. At the very least the results need to report whether L4S was used or not.
>>> 
>>> IMO, this is an implementation choice, whether a tool exposes configuration options to enable/disable L4S and what kind of additional statistics it exposes.
>> 
>> [SM] Mmmh, I am a bit uneasy with that, as I said it should report if a test was using L4S or not, I accept that maybe RPM is not the best tool for A/B testing L4S ;)
> 
> In a verbose mode there are a lot of things that can be exposed. L4S on/off. IPv4/IPv6. Once we go down this “rabbit hole” of what else could be exposed, we will start listing every single bit along the protocol stack that may theoretically have an impact on the responsiveness.

[SM] Clearly not a black and white matter given the multitude of potentially relevant parameters, however the client will know whether L4S was used or not, while it is quite hard for a user to figure this out (I think one would need to get packet captures and then look into the headers and connection negotiation).


> 
>>>> 
>>>>  2.  A "large" URL/response: The server must respond with a status
>>>>      code of 200 and a body size of at least 8GB.  The server SHOULD
>>>>      specify the content-type as application/octet-stream.  The body
>>>>      can be bigger, and may need to grow as network speeds increase
>>>>      over time.  The actual message content is irrelevant.  The client
>>>>      will probably never completely download the object, but will
>>>>      instead close the connection after reaching working condition and
>>>>      making its measurements.
>>>> 
>>>> NOTE: Some ISPs in the past used compression on access links that will not give realistic results for highly compressible data chunks, but this likely is solved for TLS encrypted data. HOWEVER if we allow measurements without TLS this issue becomes potentially relevant again...
>>> 
>>> We explicitly disable compression in the HTTP-GET by omitting the `Accept-Encoding` header. I realize this is not specified. I will add it a bit higher up:
>>> 
>>> "The client MUST send the GET without the "Accept-Encoding" header to ensure the server will not compress the data."
>> 
>> [SM] Great, this was not even on my radar; I was thinking more about IP packet compression across the access link (like ITU V.42bis and V.44 back in the old analog modem days)...
> 
> I see… so the recommendation would be to have random content in the HTTP payload. Not sure though how frequent IP packet compression is these days… (I didn’t even know it exists 😅)

[SM] I have no data on how likely this is, just a few anecdotal reports suggesting it is not universally extinct ;)
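For what it's worth, serving incompressible (random) content for the "large" endpoint side-steps both concerns at once. A minimal Python sketch one could use to sanity-check payload compressibility (function names are my own, purely illustrative):

```python
import os
import zlib

def make_chunk(size=65536):
    """Generate an incompressible chunk for the 'large' download endpoint.
    os.urandom output barely compresses, so neither HTTP-level nor
    link-layer (V.42bis/V.44-style) compression can inflate the measured
    throughput."""
    return os.urandom(size)

def compression_ratio(data):
    """Compressed size divided by original size; ~1.0 means incompressible."""
    return len(zlib.compress(data)) / len(data)
```

Random payloads make the test robust even in the (hopefully rare) case of an access link that still compresses IP traffic.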

Regards
	Sebastian

> 
>>> Thanks a lot for your detailed feedback! It was very helpful! I hope I addressed all of your concerns and am happy to discuss further.
>> 
>> [SM] Thank you for your response, yes, most of my notes and nits are addressed/solved. As I wrote before this is a great draft that is ready to progress to the next stage.
> 
> Thanks!


> 
> Christoph
> 
>> 
>> Regards
>> Sebastian
>> 
>> 
>>> 
>>> I will push the changes to the git repo soon.
>>> 
>>> 
>>> Christoph
>>> 
>>>> 
>>>> 
>>>> [...]
>>>> _______________________________________________
>>>> ippm mailing list
>>>> ippm@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/ippm
> 
> 