Re: [Idr] draft-li-idr-congestion-status-extended-community

<bruno.decraene@orange.com> Wed, 19 July 2017 16:12 UTC

From: bruno.decraene@orange.com
To: li zhenqiang <li_zhenqiang@hotmail.com>, "draft-li-idr-congestion-status-extended-community@ietf.org" <draft-li-idr-congestion-status-extended-community@ietf.org>
CC: idr <idr@ietf.org>
Thread-Topic: draft-li-idr-congestion-status-extended-community
Thread-Index: AQHS/xR2ZquQmAoAMES5a+pKpEUSQaJbTbkg
Date: Wed, 19 Jul 2017 16:12:26 +0000
Message-ID: <14914_1500480746_596F84EA_14914_50_1_53C29892C857584299CBF5D05346208A47814C43@OPEXCLILM21.corporate.adroot.infra.ftgroup>
References: <12536_1499779219_5964D093_12536_99_1_53C29892C857584299CBF5D05346208A477FE298@OPEXCLILM21.corporate.adroot.infra.ftgroup> <HK2PR0601MB1361F407B2018606BEC9AEE5FCA00@HK2PR0601MB1361.apcprd06.prod.outlook.com>
In-Reply-To: <HK2PR0601MB1361F407B2018606BEC9AEE5FCA00@HK2PR0601MB1361.apcprd06.prod.outlook.com>
Accept-Language: fr-FR, en-US
Content-Language: fr-FR
Content-Type: multipart/alternative; boundary="_000_53C29892C857584299CBF5D05346208A47814C43OPEXCLILM21corp_"
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/8ugXj4ewfE1JYu1PYAsb6ZK-Caw>
Subject: Re: [Idr] draft-li-idr-congestion-status-extended-community
Precedence: list

Hello Zhenqiang,

Thanks for your reply.
Please see inline [Bruno]

From: li zhenqiang [mailto:li_zhenqiang@hotmail.com]
Sent: Monday, July 17, 2017 5:51 PM


Hello Bruno,

Thank you very much for your constructive comments. Please see my replies inline, each beginning with "Reply:". Sorry for the late response.

Best Regards,
________________________________
li_zhenqiang@hotmail.com<mailto:li_zhenqiang@hotmail.com>

From: bruno.decraene@orange.com<mailto:bruno.decraene@orange.com>
Date: 2017-07-11 21:20
To: draft-li-idr-congestion-status-extended-community@ietf.org<mailto:draft-li-idr-congestion-status-extended-community@ietf.org>
CC: idr@ietf.org<mailto:idr@ietf.org>
Subject: draft-li-idr-congestion-status-extended-community
Hi authors,

Please find below some minor comments.

§2  Congestion Status Extended Community

> The "Utilization" field is 1 octet.  Its value is the utilization of the exit link in unit of percent

- Is this the utilization on the link or before the link (i.e. before dropping the excess traffic)? IOW, can this be greater than 100%? I think I'd prefer the latter, but in any case, I'd like to see text detailing the handling of values > 100%.
Reply: If it is easy to get the utilization before the link, we can use it. But I am not sure about it. If you can, please contribute some text that I will incorporate in the next version.
[Bruno] Thanks.
I could propose
OLD:   Its value is the utilization of the exit link in unit of percent.
NEW:   Its value is the utilization of the exit link in unit of percent. It may be higher than 100% if the incoming traffic is higher than the link capacity.
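
The proposed NEW text can be illustrated with a minimal sketch. It assumes, and this is not stated in the draft, that the offered load is measured before any excess traffic is dropped and that the 1-octet field saturates at 255. The function name and parameters are hypothetical.

```python
def encode_utilization(offered_bps: float, capacity_bps: float) -> int:
    """Return the utilization in percent as a value that fits in one octet.

    Assumptions (not from the draft): offered_bps is measured before
    excess traffic is dropped, and values above 255% saturate at 255.
    """
    if capacity_bps <= 0:
        raise ValueError("capacity must be positive")
    percent = round(100 * offered_bps / capacity_bps)
    # Clamp to the range a single unsigned octet can carry.
    return max(0, min(255, percent))


# 12 Gbit/s offered to a 10 Gbit/s link gives a value above 100.
print(encode_utilization(12e9, 10e9))
```

With a saturating field, a receiver cannot distinguish 255% from anything higher, so the text detailing values above 100% would also need to say what the maximum means.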


- In order to offer more granularity, sometimes the traffic is rate limited to a capacity smaller than the physical link, e.g. the physical link has 10G of capacity, but only 3G is available to the user. Please make explicit whether you are referring to physical or "real"/"offered"/"available" capacity. On my side, I'd prefer the latter.
Reply: I agree that the bandwidth should be the configured available capacity of the link. I will make this explicit in the next version.
[Bruno] Great, thanks.

> The "Bandwidth" field is 1 octet.  Its value is the bandwidth of the exit link in unit of 10 gbps (gigabits per second).

I don't find this very future proof, as the usable range is 1-255. By the time this proposal gets used (years from now), 100G links would presumably be the default, meaning we would already have consumed a factor of 10 from the budget, leaving only a factor of 25 for the future. I'd prefer a wider range with less precision, e.g. the bandwidth in gbps is 10^bandwidth (or 2^bandwidth).
Reply: Indeed this is one of the key points we should consider during the design of the mechanism. In units of 10 gbps, one octet can express bandwidths from 10gbps to 2550gbps. I think that should be ok for the future. With your suggestion it is difficult to express 40gbps, 100gbps, 400gbps, etc. Anyway we will think about your suggestion and try to find a better solution. Further suggestions are welcome.
[Bruno] I agree that my suggestion can’t express precise values.  Given that typically we have multiple/many ingresses sending traffic to one single interface, I don’t feel that having a precise bandwidth value is an absolute requirement, as one sender can’t use all that bandwidth. It needs to share it with the other ingresses and has no way of knowing whether they will adapt and send more or less traffic. But it’s really up to you.
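
To make the trade-off between the two encodings concrete, here is a minimal sketch. The base-2 default and the round-down rule are assumptions for illustration only; neither the draft nor this thread fixes a base or a rounding rule, and all function names are hypothetical.

```python
def decode_linear(octet: int) -> int:
    """Encoding in the draft: the octet counts units of 10 Gbit/s."""
    return octet * 10


def decode_exponential(octet: int, base: int = 2) -> int:
    """Alternative raised in the review: bandwidth = base ** octet, in Gbit/s."""
    return base ** octet


def encode_exponential(gbps: float, base: int = 2) -> int:
    """Largest exponent whose decoded value does not exceed gbps (rounds down)."""
    if gbps < 1:
        raise ValueError("below 1 Gbit/s is not representable in this sketch")
    exponent = 0
    while base ** (exponent + 1) <= gbps:
        exponent += 1
    return exponent


# Show what each common link speed becomes after an encode/decode round trip.
for gbps in (10, 40, 100, 400):
    exp = encode_exponential(gbps)
    print(gbps, "->", exp, "->", decode_exponential(exp), "Gbit/s")
print("linear maximum:", decode_linear(255), "Gbit/s")
```

The linear form is exact for multiples of 10 Gbit/s but tops out at 2550 Gbit/s; the exponential form covers a far wider range in the same octet at the cost of rounding common speeds such as 40G or 100G to the nearest lower power of the base.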


> The link with bandwidth less than 10 gbps is not suitable to use this feature.

Why not? It looks to me that the %Utilization could be useful even if the bandwidth is not advertised. Possibly a "Bandwidth" reserved value (e.g. 0) could be specified to indicate that the bandwidth is not advertised. This would also fit the case where some ASes do not want to advertise their link capacity.
Reply: Agree. We will revise the draft to reflect this.
[Bruno] ok, thanks

§3.  Application Considerations

> The SDN controller uses the exit link utilization information to steer the Internet access traffic among all the exit links from the perspective of the whole network.

Indeed, presumably this information is used to influence the routing behavior. Maybe the document should indicate that the reception of such a community over an IBGP session should not influence the routing decision unless tunneling is used to reach the BGP Next-Hop (to avoid forwarding loops, incremental deployment issues, and complications in error handling).
Reply: Thanks for pointing this out. We will add some description in the next revision.
[Bruno] ok, thanks.

In addition, what are the interactions with the BGP Link Bandwidth Extended Community? https://tools.ietf.org/html/draft-ietf-idr-link-bandwidth  (I know that the draft expired, but it is still a WG document with an official IANA code point)
Ships in the night?
Reply: So far this draft has not considered the interaction with the link bandwidth Ext. community. As the link bandwidth Ext. community is non-transitive, and the global administrator subfield is only 2 octets, it may not be applicable to all the scenarios described in this document.
[Bruno] I was not suggesting replacing congestion-status with link-bandwidth. I was asking for the expected behavior if link-bandwidth advertises a capacity of 40G while congestion-status advertises a capacity of 100G.

>   To avoid route oscillation, the exit router SHOULD set a threshold.   When the utilization change reaches the threshold, the exit router  SHOULD generate a BGP update message with congestion status extended  community.

I think that the document should better evaluate the churn introduced. In particular, the churn is cumulative as the number of ASes crossed increases, e.g. if we assume that each AS uses the community quite carefully by not advertising more than 1 update per hour, and we have 5 ASes on the way, the ingress AS can receive 5 (additional) updates per prefix per hour.
Reply: We will think about it further. Should something like a TTL (time to live) be introduced? Should we use the BGP community container instead, to have more space for this? Or do you have some suggestions?
[Bruno] As of today, I was only calling for more text highlighting that the churn increases linearly with the number of ASes attaching this community to a given prefix.
One suggestion to improve the behavior: “When one BGP router needs to re-advertise a BGP path due to attribute changes, it SHOULD update its congestion-status-community at the same time. This allows reducing the churn as the final ingress will receive a single UPDATE refreshing the N communities, rather than N UPDATEs, each refreshing one community”.
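
The threshold rule quoted from the draft and the churn estimate above can be sketched as follows. This is only an illustration: the draft says the exit router SHOULD set a threshold but gives no value, so the 10-point default, the class name, and the helper function are all assumptions.

```python
from typing import Optional


class UtilizationAdvertiser:
    """Suppress updates until utilization moves by at least `threshold` points.

    Hypothetical helper; the 10-point default is an arbitrary assumption.
    """

    def __init__(self, threshold: int = 10) -> None:
        self.threshold = threshold
        self.last_advertised: Optional[int] = None

    def observe(self, utilization: int) -> bool:
        """Return True if a BGP UPDATE should be generated for this sample."""
        if (self.last_advertised is None
                or abs(utilization - self.last_advertised) >= self.threshold):
            self.last_advertised = utilization
            return True
        return False


def worst_case_updates_per_hour(as_count: int, per_as_rate: int = 1) -> int:
    """Upper bound on extra updates per prefix seen by the ingress AS,
    assuming each AS on the path refreshes its community independently."""
    return as_count * per_as_rate


adv = UtilizationAdvertiser(threshold=10)
samples = [50, 53, 58, 61, 62, 49]
print([adv.observe(u) for u in samples])
print(worst_case_updates_per_hour(5))
```

The comparison is made against the last advertised value rather than the previous sample, so a slow drift still triggers an update once it accumulates past the threshold.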

§4.  Security Considerations

> This new extended community does not directly introduce any new security issues.

What about trust/cheating considerations? Especially from remote ASes with which you have zero relationship?
e.g. advertising alleged congestion in order to TE/influence the routing of other ASes, advertising plenty of fake capacity to attract more traffic/customers, advertising that they never experience any congestion for commercial reasons, fake advertisement on "behalf" of other ASes...
Reply: The trust/cheating problem you mentioned is a general issue for BGP. Can we be sure that the routes advertised by a BGP peer are ones it should be advertising? A BGP peer can generate routes maliciously. Anyway, we will add some analysis of this problem in the next revision. For example, the BGP receiver may choose to trust only the congestion information advertised by particular ASes, or by ASes within a given number of hops. Do you have any suggestions?
[Bruno] I agree that this is a general trust issue in BGP, but this document proposes to advertise more information and hence introduces new considerations. Plus, in the end, if you don’t trust the data, there is no need to send it in the first place.
One possible deployment model is to filter congestion-status communities at the border of your trust/administrative domain. Hence all the ones you receive are trusted.
Another option may be to sanitize the received values based on a-priori knowledge/expectations (e.g. receiving a bandwidth capacity of 56kbit/s is highly suspicious in an Internet backbone). But this requires additional configuration to maintain.
One may also record the communities received over time, monitor the congestion (e.g. via probing), detect inconsistencies, and choose to no longer trust the ASes which advertise fake news…
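
A receiver-side policy combining the first two options could look like the sketch below. Everything in it is hypothetical: the record layout (in particular an `origin_as` field identifying who attached the community), the use of 0 to mean "bandwidth not advertised", and the plausibility bounds are assumptions for illustration, not the draft's format. The AS numbers come from the documentation range.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class CongestionStatus:
    origin_as: int        # AS that attached the community (assumed field)
    utilization: int      # percent, one octet
    bandwidth_gbps: int   # advertised capacity; 0 = not advertised (assumed)


def sanitize(cs: CongestionStatus, trusted_ases: Set[int],
             min_gbps: int = 10, max_gbps: int = 2550) -> Optional[CongestionStatus]:
    """Return the community if it passes local policy, else None (drop it)."""
    # Option 1: only accept values from ASes inside the trust domain.
    if cs.origin_as not in trusted_ases:
        return None
    # Option 2: reject values that contradict a-priori expectations.
    if not 0 <= cs.utilization <= 255:
        return None
    if cs.bandwidth_gbps != 0 and not min_gbps <= cs.bandwidth_gbps <= max_gbps:
        return None
    return cs


trusted = {64500, 64501}
print(sanitize(CongestionStatus(64500, 80, 100), trusted))   # kept
print(sanitize(CongestionStatus(64511, 5, 100), trusted))    # untrusted AS
print(sanitize(CongestionStatus(64501, 80, 5000), trusted))  # implausible capacity
```

The bounds are exactly the "additional configuration to maintain" mentioned above: they have to be revisited whenever link speeds in the network change.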
--Bruno
Thanks,
Regards,
--Bruno

_________________________________________________________________________________________________________________________

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.


