[LOOPS] Some observations regarding ioam//RE: Measuring forward latency

Hi,

IOAM work has been brought up offline from time to time. The basic question is whether LOOPS can use IOAM to some extent for its local measurement.

IOAM (draft-ietf-ippm-ioam-data) defines four option types: pre-allocated trace, incremental trace, POT (proof of transit), and E2E (edge-to-edge).
- The trace option types (pre-allocated and incremental) are the core part of IOAM. They are not appropriate for LOOPS, since LOOPS works on tunnel endpoints: no information from intermediate nodes is required, and there is no need to collect a large amount of metadata.
- POT is not relevant to LOOPS.
- The E2E option type seems the most relevant one. It currently defines 4 flags: 64-bit sequence number, 32-bit sequence number, timestamp seconds, and timestamp subseconds. "Edge" here refers to the IOAM domain edge.

I do not think IOAM E2E is a good fit for LOOPS measurement.

- LOOPS has a binding state machine and its own operation logic at the ingress/egress, which deals with sequence numbers, acknowledgements/FEC, and ECN marking or packet recovery policy. It would require additional information, for example FEC and ACK numbers, to be carried in the data plane. However, we should not expect IOAM E2E to be used solely by LOOPS to carry such information; IOAM E2E is also used for other purposes. So it is difficult to couple the LOOPS module with the IOAM module in an implementation in order to trigger state transitions in LOOPS. It is even harder to determine whether a certain IOAM function is enabled (IOAM has many metadata fields and features) so that LOOPS works properly. LOOPS deployment should not rely on IOAM being enabled.

- Conceptually, IOAM is designed for monitoring and in-band metadata collection, whereas LOOPS is meant to provide a best-effort reliable tunnel. LOOPS measurement requires some reaction to be taken, such as ACK generation, and such reactions are very different from data collection and monitoring. Some fields, e.g. timestamps, are similar in IOAM and LOOPS. However, I do not think repurposing a protocol just because of that provides significant benefit.

- In addition, an IOAM E2E path may not be a single LOOPS-enabled tunnel. Consider an IOAM domain like R1-R2-R3-R4: IOAM E2E refers to R1 and R4. However, LOOPS may operate on R1-R2, R2-R3, and R3-R4 separately. This would require IOAM E2E to have some specific handling which does not exist right now and which may not be consistent with the current IOAM E2E design assumptions.

So I personally think LOOPS needs its own measurement based on tunnel endpoints, and its state machine should be well defined based on that measurement.
Any opinions regarding IOAM & LOOPS?

Rgds,
Yizhou

-----Original Message-----
From: Liyizhou 
Sent: Wednesday, June 19, 2019 5:41 PM
To: 'Carsten Bormann' <cabo@tzi.org>; loops@ietf.org
Subject: RE: [LOOPS] Measuring forward latency

Hi Carsten,

There are different metrics to measure for a LOOPS-enabled segment, i.e. between ingress and egress:
- local RTT or one-way latency
- loss rate over certain time period
- throughput (by calculation)

The fundamental purpose of measuring these metrics and monitoring their changes is to help determine what kind of loss signal (CE marking or not, recovering the loss or not, etc.) should be relayed to the end-host sender when a packet loss is discovered.
So measurements should be frequent enough.
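
As a rough illustration of the loss rate and calculated throughput metrics listed above, here is a minimal sketch of per-window bookkeeping. The WindowStats record, its field names, and the example numbers are assumptions made for illustration only; none of them come from a LOOPS draft.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Counters for one measurement window (hypothetical structure)."""
    duration_s: float    # length of the window in seconds
    packets_sent: int    # packets sent into the segment during the window
    packets_acked: int   # packets acknowledged by the egress
    bytes_acked: int     # bytes covered by those acknowledgements


def loss_rate(w: WindowStats) -> float:
    """Fraction of packets sent in the window that were not acknowledged."""
    if w.packets_sent == 0:
        return 0.0
    return (w.packets_sent - w.packets_acked) / w.packets_sent


def throughput_bps(w: WindowStats) -> float:
    """Acknowledged bits per second over the window."""
    if w.duration_s <= 0:
        raise ValueError("window duration must be positive")
    return w.bytes_acked * 8 / w.duration_s


# Example window: 200 packets of 1250 bytes sent in 0.5 s, 190 acknowledged.
window = WindowStats(duration_s=0.5, packets_sent=200,
                     packets_acked=190, bytes_acked=190 * 1250)
print(loss_rate(window), throughput_bps(window))
```

A shorter window gives fresher values at the cost of noisier estimates, which is one way to read "frequent enough" here.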

TCP's delayed ACK mechanism also looks like a good fit here if we set the delayed ACK count to some larger value.

Another thing worth noting is which node will consume the measurement information.
In local retransmission mode, the ingress is the one consuming it, as it determines whether to apply CE marking or stop retransmitting. In FEC mode, the egress is more likely to consume the information, as it recovers the packet and determines the loss signal independently.

Rgds,
Yizhou

-----Original Message-----
From: LOOPS [mailto:loops-bounces@ietf.org] On Behalf Of Carsten Bormann
Sent: Tuesday, June 18, 2019 6:42 AM
To: loops@ietf.org
Subject: [LOOPS] Measuring forward latency

The proposal in

https://tools.ietf.org/html/draft-welzl-loops-gen-info

measures forward latency by having the egress node send back some acknowledgements that carry a timestamp of arrival (expressed in egress node local clock time).  No timing information is conveyed in the forward direction.

This assumes that the local clock of the egress node is useful for relaying timing information to the ingress node.  In many cases in today’s networks, nodes are synchronized using NTP (or maybe even PTP), so the clocks should not be diverging wildly.  Also, the assumption is that today’s nodes can keep their clock drift in check well enough that the latency measurements are not useless.

However, there is some systematic error being introduced if the nodes are not perfectly in sync.  This error should be an essentially constant addend (positive or negative) to the measured latency (maybe even leading to mostly negative “latency” measurements, which may seem illogical but may not invalidate the congestion detection math).

If the latency measurement is mostly used to detect upticks in latency that might be indicative of congestion, all this should not be a big problem.
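
The constant-addend argument can be checked with a small sketch. It assumes the clock offset stays fixed over the samples considered (real clocks also drift), and it uses "sample minus smallest sample seen so far" as a stand-in uptick signal; that signal and the numbers below are illustrative assumptions, not the congestion detection math of the proposal.

```python
def latency_increase(samples):
    """Return, for each sample, its excess over the smallest sample seen so far."""
    baseline = None
    increases = []
    for s in samples:
        baseline = s if baseline is None else min(baseline, s)
        increases.append(s - baseline)
    return increases


# True one-way latencies in milliseconds (made-up values).
true_latency_ms = [10, 10, 11, 10, 18, 25, 12]

# Assumption: the egress clock runs 40 ms behind the ingress clock, so every
# measured value is shifted by the same amount and comes out negative.
clock_offset_ms = -40
measured_ms = [t + clock_offset_ms for t in true_latency_ms]

print(measured_ms)
print(latency_increase(measured_ms))
print(latency_increase(true_latency_ms))
```

Both calls to latency_increase() produce the same list, because the constant offset cancels in the subtraction even though every measured value is negative.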

Alternative approaches might:

- need to send more information (essentially running a per-tunnel time synchronization protocol, which is wasteful)
- complicate the processing.

Note that the ingress node controls how many ACKs with a time stamp it receives, by controlling the rate at which it sets the “ACK desired” flag; an ingress node that does not care about latency might never set that flag (and rely on cumulative “bitmap” style acknowledgements, called “block 2” in section 4.3).
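
One simple way an ingress could pace timestamped ACKs is to set the flag on every Nth packet. The sketch below only illustrates that idea; the function name, the every_n parameter, and the every-Nth policy itself are assumptions and are not taken from the proposal.

```python
def ack_desired_schedule(num_packets, every_n=None):
    """Return one boolean per packet: True where "ACK desired" would be set.

    every_n=None models an ingress that never sets the flag and relies on
    cumulative acknowledgements only.
    """
    if every_n is not None and every_n < 1:
        raise ValueError("every_n must be a positive integer or None")
    return [
        every_n is not None and (i + 1) % every_n == 0
        for i in range(num_packets)
    ]


print(ack_desired_schedule(8, every_n=4))  # flag set on the 4th and 8th packet
print(sum(ack_desired_schedule(8)))        # 0: flag never set
```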

So are we happy with the latency measurement approach described in the proposal?

Note that an ingress node can independently estimate RTT by noting when packets have been sent (in ingress node clock time) and noting the arrival time of an acknowledgement for that packet (if the PSN is different per transmission there is no ambiguity).  The overhead for storing the transmission point in time nearly vanishes behind the storage of the packet itself (which is needed at least in retransmission mode).  This independent RTT measurement is necessary because the path for the reverse information may be rather different from the forward path and also because delayed acknowledgement schemes add to the RTT.
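
A minimal sketch of this ingress-side RTT estimate follows. It assumes every transmission, including a retransmission, gets a fresh PSN, so an acknowledgement maps to exactly one send time. The RttEstimator class and its method names are hypothetical, and times are integer milliseconds on the ingress clock to keep the example exact.

```python
class RttEstimator:
    """Track send times per PSN and derive RTT samples from acknowledgements."""

    def __init__(self):
        self._sent_at_ms = {}   # PSN -> ingress-clock send time in ms
        self.samples_ms = []    # RTT samples collected so far

    def on_send(self, psn, now_ms):
        """Record when the packet with this PSN left the ingress."""
        self._sent_at_ms[psn] = now_ms

    def on_ack(self, psn, now_ms):
        """Return the RTT sample for this PSN, or None if the PSN is unknown."""
        sent_ms = self._sent_at_ms.pop(psn, None)
        if sent_ms is None:
            return None  # duplicate ACK or PSN never sent
        rtt_ms = now_ms - sent_ms
        self.samples_ms.append(rtt_ms)
        return rtt_ms


est = RttEstimator()
est.on_send(psn=7, now_ms=0)    # original transmission, lost on the segment
est.on_send(psn=9, now_ms=50)   # retransmission of the same data, new PSN
print(est.on_ack(psn=9, now_ms=80))   # 30: measured against the retransmission
print(est.on_ack(psn=9, now_ms=95))   # None: already consumed
```

Because the retransmission carries its own PSN, the sample is 30 ms rather than an ambiguous 80 ms measured from the original transmission. Only the ingress clock is involved, so clock offset between the nodes does not affect it.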

Does the proposed design look right?

Grüße, Carsten

--
LOOPS mailing list
LOOPS@ietf.org
https://www.ietf.org/mailman/listinfo/loops