[iccrg] Re: New draft submitted for draft-pan-tsvwg-hpccplus-02.txt

Hi Yuchung,

That's a great question! In the current draft, and also in the paper, we assume a uniform transport deployment so that we can discuss the algorithm cleanly. Many CC algorithms are presented under the same assumption.

However, you make a good observation: QoS does interact with congestion control in production datacenters. We believe WRR/QoS in the switch should work well with HPCC++, because in datacenters we mostly use a single priority level with different WRR/DWRR weights, which avoids the starvation that strict priority (SP) can cause. (Some small protocol messages, such as BGP, still use SP.) DWRR/WRR provides a minimum bandwidth guarantee for each queue. In this sense, upon congestion, HPCC++ should know that it is at least safe to back off to its bandwidth guarantee, so utilization still converges quickly and the emerging congestion is mitigated. By contrast, I agree that we should be careful and slow when increasing the rate in this scenario, since we are not the only protocol on the link.
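
To make the bandwidth-guarantee argument concrete, here is a minimal sketch. It assumes every queue on the port is backlogged, so each queue gets its weight's share of the link. The function names, the queue names, and the floor-at-guarantee back-off policy are hypothetical illustrations, not something the draft specifies.

```python
def dwrr_min_guarantee(weights, link_bps):
    """Minimum bandwidth per queue under WRR/DWRR.

    Assumption: all queues are backlogged, so a queue with weight w
    receives w / sum(weights) of the link capacity.
    """
    total = sum(weights.values())
    return {queue: link_bps * w / total for queue, w in weights.items()}


def back_off(current_bps, target_bps, guarantee_bps):
    """Hypothetical policy: reduce the aggregate rate of the traffic class
    toward the congestion-control target, but never below its guarantee."""
    return max(min(current_bps, target_bps), guarantee_bps)


# Two queues sharing a 100 Gbps port with weights 3:1 (made-up values).
guarantees = dwrr_min_guarantee({"hpcc": 3, "other": 1}, 100_000_000_000)
new_rate = back_off(90_000_000_000, 60_000_000_000, guarantees["hpcc"])
```

In this example the target asks for 60 Gbps, but the class is guaranteed 75 Gbps, so the sketch stops the decrease at the guarantee. Rate increases are left out on purpose, since those need the more careful handling described above.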

We are happy to work through the details of supporting QoS in HPCC++ if it is an essential concern.

Thanks,
Rui

------------------------------------------------------------------
From: Yuchung Cheng <ycheng=40google.com@dmarc.ietf.org>
Sent: Tuesday, December 15, 2020 20:45
To: "Pan, Rong" <rong.pan@intel.com>
Cc: "Miao, Rui(缪睿)" <miao.rui@alibaba-inc.com>; "Liu, Hongqiang(洪强)" <hongqiang.liu@alibaba-inc.com>; jri.ietf <jri.ietf@gmail.com>; iccrg <iccrg@irtf.org>; Barak Gafni <gbarak@nvidia.com>; "Lee, Jeongkeun" <jk.lee@intel.com>; Barak Gafni <gbarak@mellanox.com>; Yuval Shpigelman <yuvals@mellanox.com>
Subject: Re: [iccrg] New draft submitted for draft-pan-tsvwg-hpccplus-02.txt

Hi Rong,

I actually think ingress versus egress won't matter as much as qlen-based versus sojourn-time-based telemetry. I have not read the HPCC paper yet (apologies), but I am curious about its performance when multiple QoS classes are present:

Let's say a port has two queues for high (H) and low (L) priority packets separately.

A low-prio packet arrives at an empty L. It waits a long time because H is extremely busy. So the qlen metered for this packet, whether at ingress or egress, is 0, but its sojourn (actual queuing) time is large. The degree depends on the switch's priority scheduling policy, of course. So the packet does experience congestion, but that congestion is not visible in the qlen-based INT metric.
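
The scenario can be reproduced with a toy simulation. This is only a sketch under simplifying assumptions: a strict-priority scheduler, one packet served per fixed time slot, and an H queue that starts with a fixed backlog. The function name and parameters are made up for illustration.

```python
from collections import deque


def simulate_strict_priority(high_backlog, slot_us=1.0):
    """One low-priority packet arrives at an empty L queue while the H queue
    already holds `high_backlog` packets.

    Returns (L qlen seen at ingress, L qlen seen at egress, sojourn time in us).
    """
    high = deque(range(high_backlog))
    low = deque()
    now = 0.0

    qlen_at_ingress = len(low)  # L is empty when the packet arrives
    low.append(now)             # remember the enqueue timestamp

    while True:
        if high:
            # Strict priority: H is always served first.
            high.popleft()
            now += slot_us
        else:
            enqueued_at = low.popleft()
            sojourn_us = now - enqueued_at
            qlen_at_egress = len(low)  # L is empty again after departure
            return qlen_at_ingress, qlen_at_egress, sojourn_us


ingress_qlen, egress_qlen, sojourn_us = simulate_strict_priority(1000)
```

Both qlen readings for the L queue are 0, while the sojourn time grows linearly with the H backlog. A weighted scheduler would bound that wait, which is why the degree depends on the scheduling policy.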

On Tue, Dec 15, 2020 at 6:21 PM Pan, Rong <rong.pan@intel.com> wrote:
Yuchung,

Thanks for reviewing the draft. Good point about where and how to measure the queue length. We will look into adding a paragraph that describes the difference between ingress-based and egress-based qlen, and whether the two can be mixed. From your experience, what issues do you think we need to look out for?

Best,

Rong

From: Yuchung Cheng <ycheng@google.com>
Date: Tuesday, December 15, 2020 at 3:17 PM
To: Barak Gafni <gbarak@nvidia.com>
Cc: NBU-Contact-Rui Miao <miao.rui@alibaba-inc.com>, iccrg <iccrg@irtf.org>, "Pan, Rong" <rong.pan@intel.com>, NBU-Contact-Harry Liu <hongqiang.liu@alibaba-inc.com>, "jri.ietf" <jri.ietf@gmail.com>, "Lee, Jeongkeun" <jk.lee@intel.com>, Barak Gafni <gbarak@mellanox.com>, Yuval Shpigelman <yuvals@mellanox.com>
Subject: Re: [iccrg] New draft submitted for draft-pan-tsvwg-hpccplus-02.txt

On Tue, Dec 15, 2020 at 3:03 PM Barak Gafni <gbarak@nvidia.com> wrote:
Hi,
Thanks for the interest in this work. To answer your question: at this point the draft deliberately leaves open which inband telemetry technology is used to implement the algorithm. The idea is to enable a variety of implementations. That said, one option we are focusing on is IOAM, which is in progress in the IPPM WG and has a data draft specifying formats for communicating these metrics. Alongside this main data draft, there is another draft, in its early stages, that adds a few more fields that may be used for HPCC++.
You are welcome to look here:
https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-11
https://tools.ietf.org/html/draft-gafni-ippm-ioam-additional-data-fields-00
Actually the qlen is very specifically defined:)
https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-11#section-5.4.2.7

I understand and agree with the intention to keep the telemetry options flexible (to get wider HW support). A paragraph explaining the key properties or requirements these metrics must meet to achieve a precise link load estimate would provide more guidance. For example, the qlen defined in the IPPM draft is the "queue length at departure time". Will the algorithm work the same if qlen is metered at ingress (say, because some HW can't meter at egress)? What if there is a mix of different qlen measurements along the path?

      includes link load (txBytes, qlen, ts) and link spec (switch_ID,
      port_ID, B) at the egress port.  Note, each switch should record
      all those information at the single snapshot to achieve a precise
      link load estimate."
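
To illustrate why the snapshot matters, here is a rough sketch of how a sender might turn two consecutive INT records from the same egress port into a link load estimate. It is modeled on the utilization estimate in the HPCC paper as I understand it; the draft's exact formula may differ, and the class name, field names, units, and the base_rtt parameter are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class IntSample:
    tx_bytes: int     # cumulative bytes transmitted on the egress port
    qlen: int         # queue length in bytes
    ts: float         # timestamp of the snapshot, in seconds
    bandwidth: float  # link capacity B, in bytes per second


def link_utilization(prev, cur, base_rtt):
    """Estimate normalized link load from two snapshots of one egress port.

    Assumed form: queued bytes relative to one bandwidth-delay product,
    plus the measured tx rate relative to the link capacity.
    """
    elapsed = cur.ts - prev.ts
    if elapsed <= 0:
        raise ValueError("snapshots must have increasing timestamps")
    tx_rate = (cur.tx_bytes - prev.tx_bytes) / elapsed
    # The smaller of the two queue readings filters out transient spikes.
    standing_queue = min(prev.qlen, cur.qlen)
    return standing_queue / (cur.bandwidth * base_rtt) + tx_rate / cur.bandwidth


# Made-up numbers chosen for easy arithmetic, not realistic link speeds.
prev = IntSample(tx_bytes=0, qlen=100, ts=0.0, bandwidth=1000.0)
cur = IntSample(tx_bytes=500, qlen=200, ts=1.0, bandwidth=1000.0)
load = link_utilization(prev, cur, base_rtt=0.5)
```

If qlen, txBytes, and ts were captured at different moments, or if qlen meant different things at different hops, the two terms would no longer describe the same instant, which is the concern raised above.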

Any further feedback is welcome. 

Thanks,
Barak

From: Yuchung Cheng <ycheng@google.com> 
Sent: Tuesday, December 15, 2020 2:35 PM
To: NBU-Contact-Rui Miao <miao.rui@alibaba-inc.com>
Cc: iccrg <iccrg@irtf.org>; Pan, Rong <rong.pan@intel.com>; NBU-Contact-Harry Liu <hongqiang.liu@alibaba-inc.com>; jri.ietf <jri.ietf@gmail.com>; Lee, Jeongkeun <jk.lee@intel.com>; Barak Gafni <gbarak@mellanox.com>; Yuval Shpigelman <yuvals@mellanox.com>
Subject: Re: [iccrg] New draft submitted for draft-pan-tsvwg-hpccplus-02.txt

Interesting work!

It'd be good to know more precise requirements on INT, to help both vendor support (besides MLX) and CC evaluation.

For example
qlen         | Telemetry info: link j queue length 

qlen == instantaneous qlen snapshot at packet ingress or egress, on a per-port, per-queue basis, or some windowed average / aggregate, etc.?
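
As a small illustration of how much these definitions can differ, the sketch below reports both an instantaneous reading and a windowed average over the same samples. The class name, the window size, and the byte values are hypothetical.

```python
from collections import deque


class QlenMeter:
    """Keeps the last `window` queue-length samples for one port and queue."""

    def __init__(self, window):
        self.samples = deque(maxlen=window)

    def record(self, qlen_bytes):
        self.samples.append(qlen_bytes)

    def instant(self):
        # Most recent snapshot only.
        return self.samples[-1]

    def windowed_avg(self):
        # Mean over the retained window.
        return sum(self.samples) / len(self.samples)


meter = QlenMeter(window=3)
for qlen in [0, 0, 0, 90]:
    meter.record(qlen)
instant_qlen = meter.instant()
average_qlen = meter.windowed_avg()
```

After a sudden burst, the instantaneous reading is 90 bytes while the windowed average is 30 bytes, so a sender would react quite differently depending on which one the switch exports.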

On Mon, Dec 14, 2020 at 4:11 PM Rui, Miao <miao.rui@alibaba-inc.com> wrote:
Hello ICCRG members,

Alibaba, Intel, and Mellanox have worked on an INT-based High Precision Congestion Control algorithm: HPCC++. We have posted an initial draft that can be found at
https://www.ietf.org/id/draft-pan-tsvwg-hpccplus-02.txt

The key design choice of HPCC++ is to use inband telemetry to obtain fine-grained load information, such as queue size and accumulated tx traffic, and to compute precise flow rates from it. This has two major benefits:
1. HPCC++ can quickly converge to proper flow rates to highly utilize bandwidth while avoiding congestion;
2. HPCC++ can consistently maintain a close-to-zero queue for low latency.

We would love to hear your comments and feedback.

 Best regards,
Rui Miao