Re: [Tsv-art] [Lsr] Tsvart early review of draft-ietf-lsr-isis-fast-flooding-06

"Mirja Kuehlewind (IETF)" <ietf@kuehlewind.net> Thu, 29 February 2024 14:30 UTC

From: "Mirja Kuehlewind (IETF)" <ietf@kuehlewind.net>
Message-Id: <655B1867-DE84-452D-8B0B-92E20C4CACE3@kuehlewind.net>
Date: Thu, 29 Feb 2024 15:30:07 +0100
In-Reply-To: <AS2PR02MB8839EC9039D4970F596AAC1CF04B2@AS2PR02MB8839.eurprd02.prod.outlook.com>
Cc: "Les Ginsberg (ginsberg)" <ginsberg=40cisco.com@dmarc.ietf.org>, "gsoligna@protonmail.com" <gsoligna@protonmail.com>, "draft-ietf-lsr-isis-fast-flooding.all@ietf.org" <draft-ietf-lsr-isis-fast-flooding.all@ietf.org>, "lsr@ietf.org" <lsr@ietf.org>, "tsv-art@ietf.org" <tsv-art@ietf.org>
To: bruno.decraene@orange.com
References: <170688584229.8474.2689075630611259243@ietfa.amsl.com> <AS2PR02MB883972D90690CAD347D0309FF0472@AS2PR02MB8839.eurprd02.prod.outlook.com> <75AD3A89-9BF7-495A-A53D-AFF6445CB47B@kuehlewind.net> <AS2PR02MB8839EC9039D4970F596AAC1CF04B2@AS2PR02MB8839.eurprd02.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-art/TFgPZ3zaQ8mzaMxA8X9BT8MfArQ>
Subject: Re: [Tsv-art] [Lsr] Tsvart early review of draft-ietf-lsr-isis-fast-flooding-06

Hi Bruno,

Sorry for my late reply.

Please see some more comments below.

> On 9. Feb 2024, at 21:50, bruno.decraene@orange.com wrote:
> 
> Hi Mirja,
>  
> Thanks for your replies. Please see inline [Bruno2]
>  
> On a side note, we presented some test results at IETF 111. If you want to have a look at them, please find the slides below. If you have any comments on the test results, I would obviously be interested, either on or off the list.
> https://datatracker.ietf.org/meeting/111/materials/slides-111-lsr-22-flow-congestion-control-00.pdf
>  
>  
>  
> > From: Lsr <lsr-bounces@ietf.org <mailto:lsr-bounces@ietf.org>> On Behalf Of Mirja Kuehlewind (IETF)
> > Sent: Thursday, February 8, 2024 2:06 PM
> > 
> > Hi Bruno,
> > 
> > Thanks for your replies.
> > 
> > At a high level, I think that some or most of the explanation you provide below about parameter values should actually go into the draft. I understand that there is no one-size-fits-all, but that's why min/max values are often more important than recommended values. Yes, these might also not hold forever, but maybe that just means there is a point in the future where it makes sense to update this RFC. You also say it depends on the performance/capability of the router; however, I think you can say something like: with an average router today with these performance parameters, these are our tested values that showed good performance, and this or that value can scale with, e.g., more CPU power. Something like this.
> > 
> > See further below.
>  
> [Bruno2] OK. See further below.
>  
> > > On 5. Feb 2024, at 19:17, bruno.decraene@orange.com <mailto:bruno.decraene@orange.com> wrote:
> > > 
> > > [+Les, Guillaume as we go quite deep in the discussion]
> > > 
> > > Hi Mirja,
> > > 
> > > Thank you for your review and comments. Very useful.
> > > 
> > > Please see inline [Bruno]
> > > 
> > >> From: Mirja Kühlewind via Datatracker <noreply@ietf.org <mailto:noreply@ietf.org>>
> > >> Sent: Friday, February 2, 2024 3:57 PM
> > >> 
> > >> Reviewer: Mirja Kühlewind
> > >> Review result: Not Ready
> > >> 
> > >> First of all I have a clarification question: The use of the Flags TLV with the O flag is not clear to me. Is that also meant as a configuration parameter, or is that supposed to be a sub-TLV that has to be sent together with the PSNP? If it is a configuration, doesn't the receiver need to confirm that the configuration is used, and how does that work in the LAN scenario where multiple configurations are used? If it has to be sent together with the PSNP, this needs to be clarified, and it seems a bit strange to me that it is part of the same TLV. Or maybe I'm missing something completely about the flag?
> > > 
> > > [Bruno] The O-flag is advertised by the receiver in the Flags sub-TLV, which may be sent either in PSNP or IIH.
> > > That's not a configuration but a capability of the receiver which is signaled to the sender.
> > > That's only applicable to the point-to-point scenario, not the LAN scenario (as on a LAN there is no explicit acknowledgment of the receipt of LSPs between a given LSP transmitter and a given LSP receiver).
> > > 
> > >> it seem a bit strange to me that it is part of the same TLV
> > > 
> > > [Bruno]
> > > All those sub-TLVs, at least the ones currently defined, carry
> > > (relatively) static parameters and are not required to be sent in every IIH or PSNP. The way IS-IS acknowledges the reception of LSPs is not changed. They are all grouped in a single TLV called the "Flooding Parameters TLV" for grouping purposes and also because IS-IS has a limited TLV space.
> > > If the above does not clarify, could you please elaborate on what you feel "strange" about?
> > 
> > Okay, it's a capability. Does that mean that if the capability is announced, the router that has sent the announcement will preserve order in all PSNPs? I think this could be stated more clearly.
>  
> [Bruno2] Correct, that's a capability but also a commitment to do it.
> Would the following rephrase help?
>  
> OLD:
> When the O-flag (Ordered acknowledgement) is set, the LSPs will be
> acknowledged in the order they are received: a
> PSNP acknowledging N LSPs is acknowledging the
> N oldest LSPs received. The order inside the
> PSNP is meaningless. If the sender keeps track
> of the order of LSPs sent, this indication
> allows a fast detection of the loss of an
> LSP. This MUST NOT be used to alter the
> retransmission timer for any LSP. This MAY be used to
> trigger a congestion signal.</t>
>  
> NEW:
> When setting the O-flag, the LSP receiver MUST acknowledge LSPs in the same order as it has received them. Therefore, a PSNP acknowledging N LSPs is acknowledging the N oldest received LSPs. Note that the order of LSP-IDs inside the PSNP is meaningless.
> The LSP sender MAY use this information to detect the loss of an LSP faster and to trigger a congestion signal. It MUST NOT use this information to alter the retransmission timer for any LSP.
>  
I guess the point that is not clear to me is: how does the sender of the LSP know that the receiver of the LSP understands the O-flag and has processed it accordingly? Maybe I missed something?
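As a rough sketch of the ordered-acknowledgement rule above (Python, all names hypothetical): a sender that records send order can flag any unacknowledged LSP that is older than an acknowledged one as likely lost.

```python
from collections import OrderedDict

class OrderedAckTracker:
    """Sender-side sketch of the O-flag semantics described above:
    a PSNP acknowledging N LSPs acknowledges the N oldest received LSPs,
    so any unacked LSP older than an acked one was probably lost.
    Illustrative only, not from the draft."""

    def __init__(self):
        self.outstanding = OrderedDict()  # lsp_id -> None, in send order

    def on_send(self, lsp_id):
        self.outstanding[lsp_id] = None

    def on_psnp(self, acked_ids):
        """Return LSP-IDs presumed lost, then remove the acked ones."""
        acked = set(acked_ids)
        newest_acked_pos = -1
        for pos, lsp_id in enumerate(self.outstanding):
            if lsp_id in acked:
                newest_acked_pos = pos
        lost = []
        for pos, lsp_id in enumerate(self.outstanding):
            if pos < newest_acked_pos and lsp_id not in acked:
                lost.append(lsp_id)  # older than an acked LSP, yet unacked
        for lsp_id in acked:
            self.outstanding.pop(lsp_id, None)
        return lost
```

Per the proposed text, a detected gap would only feed a congestion signal; it must not alter the retransmission timer, so the "lost" LSPs stay outstanding here.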

>  
> > > 
> > > 
> > >> Then, generally thank you for considering overload and congestion carefully.
> > >> Please see my many comments below, however, I think one important part is to ensure that the network/link doesn’t get normally overloaded with the parameter selected. You give some recommendation about parameters to use but not for all and, more importantly, it would be good to define boundaries for safe use.
> > >> What’s a “safe” min or max value? I know this question is often not easy to answer, however, if you as the expert don’t give the right recommendations, how should a regular implementer make the right choice?
> > > 
> > > [Bruno]
> > > Very fair points. And thank you for acknowledging that this question is not easy to answer...
> > > TL;DR: sorry, I don't know.
> > > 
> > > A few general statements first:
> > > - IS-IS is running in the control plane of two adjacent routers, typically in the backbone of network operators. There is a single point-to-point link over fiber, typically with the latest interface speeds (e.g. > 100G today). I would not assume that IS-IS would overload, or even significantly load, this interface. From a jitter standpoint, packet priority/CoS could be discussed, but I'm assuming that this is a different discussion.
> > > - Currently IS-IS has neither flow control nor congestion control. Given this, current values are very conservative (e.g., one packet every 33 ms). At the same time, this is very important signaling for the network: we would prefer not to drop LSPs, but on the other hand delaying LSPs for seconds is not helping. For historical and good reasons, IS-IS implementers are very conservative. As of today, I would not assume that they would be too aggressive.
> > > - One problem with stating values in an RFC is that those values may not age well. That's typically the case with IS-IS, where some parameter values are still 25 years old while CPUs and networks have evolved significantly since then. So I'm a bit reluctant to write static values in stone again.
> > > 
> > > Coming back to min and max value:
> > > - I'm not an implementor, but I do care about my networks. If I were an implementor, I would play it safe and advertise values which are safe for my implementation as a receiver. I would rather use those values as a protection from a too-aggressive sender than as a permission to overload (DoS) me. Both sender and receiver are within the same administrative domain (for certain). In case of an issue, the network operator will debug both the sender and receiver and blame the one which did not behave.
> > > - There is a wide range of router sizes/prices/capabilities/generations. Plus, I would expect this value to gradually improve as implementations improve thanks to the capabilities introduced by this document.
> > > - I'm definitely not a transport/TCP expert. Does TCP define such min and max values?
> > 
> > Yes and no. TCP by default acknowledges every second packet. There is no recommendation on e.g. the receive window; however, for TCP the purpose is to fully utilise the link, and the receive window can be adjusted over time depending on the current congestion window (and local load). Also, there is no fixed pacing rate; pacing is dynamically calculated based on RTT and the congestion window. So in short, I don't think this is comparable, as many of the parameters are not a fixed configuration, and that's because it's not only about avoiding overload but also about maximising throughput.
>  
> [Bruno2] OK, IS-IS history is a bit different from the TCP one.
> As indicated in §3, IS-IS implementations had "historical behaviors" with static parameters configured on the LSP sender, irrespective of the capability of the receiver (although they represent limitations of the receiver).
> This document allows two things:
> -a- the signaling of those current parameters by the receiver. That's useful now and easy to do.
> -b- new flow control and congestion control behavior, using the new Receive Window sub-TLV and requiring improvements on the receiver (§5). That's more medium-term in multi-vendor networks, as both the sender and the receiver need to be upgraded (although this may depend on what implementations already do). Just like with TCP, I believe different congestion controls may be specified for the sender.
>  
> In retrospect, may be having two separate documents would have been easier.
>  
> Coming back to "safe" values. I'm a bit reluctant to indicate values which may quickly become outdated, especially given that TCP does not do it either, but I can live with adding a typical acceptable range in Section 6.2.4 ("Determining values to be advertised in the Flooding Parameters TLV").
> Something like, to be added after the second paragraph

As you say, there are two parts: one is setting static parameters; the other is dynamically adjusting congestion control. Note that TCP always has congestion control. However, for your case a) you still need to give some recommendation to implementors, otherwise they will get it wrong. It is better to be too conservative than too aggressive. Also, if you describe what the assumptions are for these recommendations, e.g. in relation to CPU capabilities, implementors can adjust accordingly when these assumptions change in the future. I think there is still a lack of both in the document: 1) making default recommendations or at least discussing appropriate value ranges, and 2) clearly spelling out the assumptions made rather than just recommending a random number (where something is recommended).

>  
> NEW: Static values are dependent on the CPU generation, the class of router, and network scaling, typically the number of adjacent neighbors. As examples at the time of publication: LSP Burst Size could be in the range 5 to 20; LSP Transmission Interval in the range of 1 ms to 33 ms; LPP in the range of 5 to 90, with a proposed 15; PartialSNPInterval in the range 50 ms to 500 ms, with a proposed 200 ms; Receive Window in the range of 30 to 200, with a proposed 60. In general, the larger the Receive Window, the better the performance on links with high RTT.

I think it would be good to add these recommendations; however, I don't think I saw this in the new draft version? However, as said above and in line with your concern about outdating, I would recommend saying even more about why these values. Are there any hard limits (MUST be larger/smaller than)? Which value depends on which of the things you name in the first sentence? E.g., is there anything that should scale with the number of neighbours?
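For illustration only, the ranges proposed above could be captured in a simple sanity check (Python sketch; the names and structure are mine, and the ranges are the ones proposed in this thread, not normative values from the draft):

```python
# Ranges proposed in this thread (not normative), keyed by hypothetical names.
SUGGESTED_RANGES = {
    "lsp_burst_size": (5, 20),
    "lsp_tx_interval_ms": (1, 33),
    "lpp": (5, 90),
    "partial_snp_interval_ms": (50, 500),
    "receive_window": (30, 200),
}

def check_flooding_params(params):
    """Return warnings for advertised values outside the suggested ranges."""
    warnings = []
    for name, value in params.items():
        lo, hi = SUGGESTED_RANGES[name]
        if not lo <= value <= hi:
            warnings.append(f"{name}={value} outside suggested [{lo}, {hi}]")
    return warnings
```

Such a check would only warn, not reject: as discussed above, the peer advertising a value is the one responsible for honoring it.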


>  
> FYI, the IS-IS spec use this type of phrasing "A reasonable value is 10 s"
>  
> > > 
> > > We propose some guidance for the LPP (which is somewhat IS-IS
> > > specific) in 
> > > https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-5.1
> > > But for other parameters, I'm not sure that indicating values would be useful.
> > > 
> > >> Please see further comments below.
> > >> 
> > >> Section 4.7:
> > >> “NOTE: The focus of work used to develop the example algorithms discussed later in this document focused on operation over point-to-point interfaces. A full discussion of how best to do faster flooding on a LAN interface is therefore out of scope for this document.”
> > >> 
> > >> Actually this is quite important and also not clear to me. You do discuss how to interpret parameters in a LAN scenario but then you say you only give proper guidance how to adjust the sending rate for non-LAN. But what’s the right thing to do in LAN then? Why is LAN out of scope? If you don’t give guidance, I think you have to also say that this mechanism that enables using higher values in this document MUST NOT be used on LAN setups.
> > > 
> > > [Bruno] In point-to-point there is one sender for one receiver. On a LAN, there are N receivers for 1 sender, and possibly N senders for each receiver. The guidance is whether the multiplicative factor is to be handled by the sender or the receiver. The document says that the value is used by the sender as-is, so it's up to the receiver to take into account the number of speakers on the LAN. This guidance seems required for correct semantics.
> > > 
> > > Then the TLV may carry different sub-TLVs. Some may be applicable to a LAN (e.g., Burst Size).
> > > Some are less applicable to a LAN because the way IS-IS acknowledges LSPs is different on a LAN and less dynamic. The LAN case is both less frequent these days (if not rare in backbones) and more difficult to handle, as IS-IS acknowledges LSPs in a slower and less explicit way, hence we have a loose feedback loop to use. Eventually, someone could define new sub-TLVs or procedures to improve the LAN case, therefore I don't think that we should define the TLV as not applicable to LANs.
> > > https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-6.2.1.2
> > > does partially discuss and cover the LAN case. Possibly the operation could be further improved, but I think that it's too late to add a specification for the LAN. This may be covered in a subsequent document (but really, LAN is not a priority these days).
> > 
> > As I said above, I think either you need to provide appropriate and equivalent guidance for LAN, or you need to restrict this extension to non-LAN scenarios and say that explicitly in the draft (like "MUST NOT be used in LAN scenarios").
>  
> [Bruno2]
> The sub-TLVs may be used in LAN and the LAN case is defined in §4.7.
> The flow control algorithm may be left out of scope. The content of Section 6.2.1.2 has been replaced with "Flow and congestion control on a LAN interface is out of scope for this document." Les essentially proposed the same in another email.

I still feel that Section 4.7 is not specified fully enough to use any of this on a LAN.

 
>  
> > >> 
> > >> Section 5.1:
> > >> “The receiver SHOULD reduce its partialSNPInterval. The choice of this lower value is a local choice. It may depend on the available processing power of the node, the number of adjacencies, and the requirement to synchronize the LSDB more quickly. 200 ms seems to be a reasonable value.”
> > >> 
> > >> Giving some recommended value is fine, however, it would be more important to ensure safe operation to give a range or at least a minimum value.
> > > 
> > > [Bruno] The maximum value is defined in the "old" IS-IS spec. A minimal value seems very implementation-specific to me.
> > > More importantly, I don't think that safety comes into play, but the text could be more explicit on this. The goal is for the receiver to provide some "frequent" feedback to the sender so that the sender can adapt faster and hence be "safer": "Faster LSP flooding benefits from a faster feedback loop. This requires a reduction in the delay in sending PSNPs."
> > > Nothing breaks if the receiver is too slow. Quite the contrary: the flow control algorithm would slow down, hence be on the safe side.
> > > 
> > > In order to be more explicit, I would propose the addition of the following text:
> > > "The value of the "Partial SNP Interval sub-TLV" MAY be used by the sender for flow control and congestion control. It MUST NOT be used to trigger LSP retransmission."
> > > I'd rather add this in section 4.5 which defines the sub-TLV
> > > https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#name-partial-snp-interval-sub-tl
> > 
> > Not sure we are talking about the same thing when we say min and max here. Given this is an interval, I thought it means that if I have a smaller interval, I will send PSNPs more often.
>  
> [Bruno2] Agreed
>  
> > Thus it’s a safety measure to define a minimum, no?
>  
> [Bruno2] A safety measure to protect you from yourself? The one choosing the value is the one responsible for complying with it. The other system has no specific requirement.
> In any case, cf. the text above with the ranges indicated.

Yes, but I can overload the other end. Also, as I said, sometimes implementors don't consider that fully, and a min value is an easy way to avoid implementors shooting themselves in the foot. I agree that the protocol cannot enforce that, but it is still useful to recommend something.

>  
> That being said, in order to increase the safety margin, one may advertise in the Partial SNP Interval sub-TLV a value higher than the one actually used. That would reduce one's commitment. But I'm not sure that going into such detail would help with clarity.
>  
> > >> 
> > >> Also on use of normative language. Just saying “The receiver SHOULD reduce its partialSNPInterval.” Is a bit meaningless without saying when and to with value/by how much. I guess you should say something like “partialSNPInterval SHOULD be set to 200ms and MUST NOT be lower than X.”
> > > 
> > > [Bruno] Good point. Thank you. Proposed change:
> > > OLD:  The receiver SHOULD reduce its partialSNPInterval.
> > > NEW: For the generation of PSNPs, the receiver SHOULD use a partialSNPInterval smaller than the one defined in [ISO10589].
> > > 
> > > 
> > >> “The LPP SHOULD also be less than or equal to 90 as this is the maximum number of LSPs that can be acknowledged in a PSNP at common MTU sizes, hence waiting longer would not reduce the number of PSNPs sent but would delay the acknowledgements. Based on experimental evidence, 15 unacknowledged LSPs is a good value assuming that the Receive Window is at least 30 and that both the transmitter and receiver have reasonably fast CPUs.”
> > >> 
> > >> Why is the first SHOULD a SHOULD and not a MUST? What is a reasonably fast CPU?
> > > 
> > > [Bruno] The first "SHOULD" is a SHOULD because nothing breaks if it's not applied. Also, the goal is for the receiver to provide frequent/fast feedback to the sender. If the throughput were very fast (e.g., 1 Gb/s in a distant future), sending an acknowledgement (PSNP) every 90 LSPs would provide feedback every 1 ms, which seems relatively responsive, especially compared to the link RTT in a WAN. Possibly this could be rephrased to better focus on the need.
> > > OLD:   The LPP SHOULD also be less than or equal to 90 as this is the maximum number of LSPs that can be acknowledged in a PSNP at common MTU sizes, hence waiting longer would not reduce the number of PSNPs sent but would delay the acknowledgements.
> > > NEW:  The LPP SHOULD be less than or equal to the maximum number of LSPs that can be acknowledged in a PSNP because waiting longer would not reduce the number of PSNPs sent but would delay the acknowledgements. This is 90 at common MTU sizes. 
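A rough sketch of the acknowledgement trigger being discussed (send a PSNP once LPP LSPs are pending, or when partialSNPInterval expires, and never acknowledge more per PSNP than fits at common MTU sizes); Python, with illustrative names, and defaults taken from the values proposed in this thread:

```python
class PsnpScheduler:
    """Receiver-side sketch: batch LSP acknowledgements into PSNPs.
    Illustrative only; names and structure are not from the draft."""

    MAX_ACKS_PER_PSNP = 90  # rough fit at common MTU sizes, per the text

    def __init__(self, lpp=15, partial_snp_interval=0.2):
        self.lpp = min(lpp, self.MAX_ACKS_PER_PSNP)
        self.interval = partial_snp_interval  # seconds
        self.pending = []                     # LSP-IDs awaiting a PSNP
        self.deadline = None                  # when the interval expires

    def on_lsp(self, lsp_id, now):
        """Record a received LSP; return a PSNP if LPP is reached."""
        self.pending.append(lsp_id)
        if self.deadline is None:
            self.deadline = now + self.interval
        if len(self.pending) >= self.lpp:
            return self._flush()
        return None

    def on_timer(self, now):
        """Return a PSNP if partialSNPInterval has expired with LSPs pending."""
        if self.deadline is not None and now >= self.deadline and self.pending:
            return self._flush()
        return None

    def _flush(self):
        psnp = self.pending[:self.MAX_ACKS_PER_PSNP]
        self.pending = self.pending[self.MAX_ACKS_PER_PSNP:]
        self.deadline = None
        return psnp
```

Whichever trigger fires first wins, which matches the intent above: the LPP bounds how long feedback is delayed under load, while partialSNPInterval bounds it when the LSP rate is low.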
> > > 
> > >> What is a reasonably fast CPU?
> > > Touché! That's the issue with indicating a fixed value which will be outdated in the future.
> > > Proposed change:
> > > OLD:   Based on experimental evidence, 15 unacknowledged LSPs is a good value assuming that the Receive Window is at least 30 and that both the transmitter and receiver have reasonably fast CPUs.
> > > NEW:  Based on experimental evidence, 15 unacknowledged LSPs is a good value assuming that the Receive Window is at least 30.
> > > 
> > > 
> > >> Why would the receive window be 30? Is that also the value that you would recommend? So maybe you more generally aim to recommend setting the LPP to half the Receive Window (or does it have to be those specific values)?
> > > 
> > > LPP value is discussed in 
> > > https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-6.2.2.5
> > > I would propose to add
> > > NEW: The choice of the LPP value is discussed in
> > > https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-6.2.2.5
> > > 
> > > To answer your questions:
> > > - In tests, we found that an LPP of 15 is a good trade-off between an increased feedback rate to the sender and an increased acknowledgement load on both the receiver and sender.
> > > - As indicated in
> > > https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-6.2.2.5
> > > for performance reasons it's better if the LPP is an integer fraction of the Receive Window. Hence choosing an LPP of 15 assumes that the receive window is 30.
> > > - We don't recommend the receive window to be 30. In general, the larger the better for links with high RTT. However, this is a new concept for IS-IS, and indicating a high number may scare IS-IS implementations. 30 LSPs is a 45 kB receive window, which seems relatively small for control-plane memory (typically between a laptop and a small server) and for a critical protocol.
> > > - I agree with you that setting the LPP to half the receive window works fine. But a third would probably be even better with a large receive window. It really depends on the receive window, and the goal is to provide fast feedback: if the receive window is very large, we don't necessarily want to delay PSNPs too much.
> > > - It definitely does not have to be a specific value.
> > > - Essentially, the LPP adds a delay to the feedback loop, in addition to the link RTT (it adds LPP / "LSP sending rate"). Depending on the Receive Window and link RTT, the LPP may reduce the achievable rate, but in many cases it would not. Maybe indicating that LPP / "desired LSP sending rate" should be significantly smaller than the link RTT would help the reader, but on the other hand I feel that it's a bit late for a change like this, unless you would support it given your experience in the transport area.
> > > 
> > I think you need to discuss all this in the draft.
>  
> [Bruno2]
> Adding in §5.1
> NEW: The smaller the LPP, the faster the feedback to the sender, and possibly the higher the rate if the rate is limited by the end-to-end RTT (link RTT + time to acknowledge). But the smaller the LPP, the higher the number of PSNPs sent, and hence possibly the CPU and I/O load on both the sender and receiver.
>  
> Also added:
> NEW: LPP should not be chosen too high as the congestion control starts with a congestion window of LPP+1.
>  
> As proposed above, some text has also been added on the Received Window when discussing the safe ranges.

This makes it better.
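The arithmetic behind the added text could be sketched as follows (illustrative Python; the function names are mine, not from the draft):

```python
def feedback_delay(link_rtt, lsp_rate, lpp):
    """Rough end-to-end feedback delay as discussed above: link RTT plus
    the time the receiver waits to accumulate LPP LSPs before sending a
    PSNP. lsp_rate is in LSPs per second. Illustrative arithmetic only."""
    return link_rtt + lpp / lsp_rate

def initial_cwnd(lpp):
    """Per the added text, congestion control starts with a window of LPP+1."""
    return lpp + 1
```

For example, with a 10 ms link RTT, 1000 LSPs/s, and an LPP of 15, the LPP term (15 ms) already exceeds the RTT, which illustrates why an over-large LPP can limit the achievable rate.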
>  
> > >> 
> > >> Section 5.2:
> > >> 
> > >> “Therefore implementations SHOULD prioritize the receipt of Hellos and then SNPs over LSPs. Implementations MAY also prioritize IS-IS packets over other less critical protocols.”
> > >> 
> > >> 
> > >> What do you mean by prioritise exactly? I find the second sentence here meaningless when you only say "less critical protocols". What do you mean by this? How should I as an implementer decide which protocols are more or less critical?
> > > 
> > > [Bruno] On routers, packets in transit are typically forwarded at line rate. But packets received from a set of very high-speed interfaces (e.g. an aggregated received bandwidth of 10 Tb/s) and sent to the router's control plane (e.g. a laptop-class CPU) face a significant bottleneck. Typically this bottleneck can give priority to some packets and may rate-limit flows to protect from (D)DoS.
> > > On one hand, relative priority between protocols is indeed a local choice of the implementer. Based on experience, they should know. I don't expect this to be novel, but as we increase the rate of IS-IS LSPs, the point becomes more important, so we felt we should raise it. To some extent, the question is the same for CPU allocation and scheduling. E.g., I would give more priority to IS-IS compared to BMP (monitoring), or even BGP, even though BGP is an important routing protocol. Because essentially IS-IS is critical, the foundation of routing in the network, and its computation just assumes that flooding is "perfect", so it is sensitive to a lack of global database consistency.
> > 
> > I think you also need to provide more explanation in the draft to make this recommendation useful.
>  
> [Bruno2]
> OLD:  Therefore implementations SHOULD prioritize the receipt of Hellos and then SNPs over LSPs. Implementations MAY also prioritize IS-IS packets over other less critical protocols
> NEW: Therefore implementations SHOULD prioritize IS-IS PDUs on the way from the incoming interface to the IS-IS process. The relative priority of packets in decreasing order SHOULD be: Hellos, SNPs, LSPs. Implementations MAY also prioritize IS-IS packets over other protocols which are less critical for the router or network, less sensitive to delay, or more bursty (e.g., BGP).
>  
> We'll see if people complain about comparing the relative importance of IETF protocols and we'll adapt if needed.

That is better. I guess the other option would be to not talk about prioritisation with respect to other protocols, but to say that timely execution is important for this protocol and this must be considered if other protocols are sending data over the same link/interface.

>  
> > 
> > > 
> > >> Section 6.1:
> > >> “Congestion control creates multiple interacting control loops between multiple transmitters and multiple receivers to prevent the transmitters from overwhelming the overall network.”
> > >> 
> > >> This is an editorial comment: I think I know what you mean but the sentence is not clear as there is always only one congestion loop between one transmitter and one receiver.
> > > 
> > > [Bruno] Yes you know much better than us. Hence a suggestion would be welcomed 😉.
> > > I don't feel that the sentence contradicts your point as:
> > > - I agree that "there is always only one congestion loop between one transmitter and one receiver"
> > > - The sentence is trying to say that there are multiple senders, hence
> > > multiple control loops, and that they affect each other as they may
> > > compete for the same common resource on the way
> > > 
> > > We are trying to explain the difference between flow control and congestion control, why both are useful, and possibly why congestion control is harder.
> > > Suggestion is welcomed.
> > 
> > Yes, the sentence is not wrong, that's why I said it’s an editorial comment. I just think the sentence is not very clear. Actually I'm not sure why this sentence is needed at all.
>  
> [Bruno2] The easy answer would be to remove the sentence. However during discussions in the WG, sometimes the distinction between flow control and congestion control was not always clear. E.g., if we have congestion control, do we need flow control in addition? 
> What about
> OLD:  Congestion control creates multiple interacting control loops between multiple transmitters and multiple receivers to prevent the transmitters from overwhelming the overall network.
> NEW: Congestion control prevents the set of transmitters from overwhelming the network on the path.

Works for me.

>  
> > >> 
> > >> Section 6.2.1:
> > >> “If no value is advertised, the transmitter should initialize rwin with its own local value.”
> > >> 
> > >> I think you need to give more guidance anyway on what a good buffer size might be.
> > >> However, if you don’t know the other end's capability, I’m not sure if your own value is a good idea, or if it would be better to be rather conservative and select a low value that still provides reasonable performance.
> > > 
> > > [Bruno] Actually we meant
> > > OLD:  If no value is advertised, the transmitter should initialize rwin with its own local value.
> > > NEW: If no value is advertised, the transmitter should initialize rwin with its locally configured value for this neighbor.
> > > 
> > > As https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-4 says, if a value is not advertised, a locally configured value should be used.
> > > 
> > > I agree with you that more guidance would be helpful. Thank you for the comment.
> > > Yet not completely easy especially for IS-IS as this is a new concept and different implementations may be very different.
> > > 
> > > We could add the following text at the end of §6.2.1
> > > NEW: The RWIN value is of importance when the RTT is the limitation. In this case the optimal size is the desired LSP rate multiplied by the RTT, where the RTT is the sum of the link RTT and the time taken by the receiver to acknowledge the first received LSP in its PSNP. 50 or 100 may be reasonable default values.
> > > 
> > > Comments and suggestion welcomed.
> > > FYI:
> > > - typical hardware is the equivalent of a high-end laptop, although RAM
> > > tends to be less restricted these days (e.g. 128GB)
> > > - I would rate IS-IS as critical, so it would have priority
> > > - 100 IS-IS neighbors seems a reasonable medium range
> > > - 1k to 10k LSP/s would already be great.
> > > - link RTT is very variable and depends on link distance. Could be 10 km in a dense area, 100s of km in a typical WAN in Europe; 1000 km seems high given country sizes and optical capabilities. Max would be intercontinental links crossing oceans.
> > 
> > Again, I think you need to explain this in the draft.
>  
> [Bruno2] Proposed NEW:
> The RWIN value is of importance when the RTT is the limiting factor for the throughput. In this case the optimal size is the desired LSP rate multiplied by the RTT, where the RTT is the sum of the link RTT and the time taken by the receiver to acknowledge the first received LSP in its PSNP. 50 or 100 may be reasonable default values. As an example, an RWIN of 100 requires a control plane input buffer of 150 KB per neighbor, assuming an IS-IS MTU of 1500 octets, and limits the throughput to 10000 LSPs per second per neighbor for a link RTT of 10 ms. With the same RWIN, the throughput limitation is 2000 LSPs per second when the RTT is 50 ms. That's the maximum throughput assuming no other limitations such as CPU limitations.
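The arithmetic in the proposed text can be sketched as follows (function and variable names are illustrative, not from the draft):

```python
# Throughput and buffer implications of a receive window (RWIN):
# throughput <= RWIN / RTT, and per-neighbor buffer = RWIN * MTU.

def max_lsp_rate(rwin: int, rtt_s: float) -> float:
    """Maximum LSPs per second when the RTT is the limiting factor."""
    return rwin / rtt_s

def buffer_bytes(rwin: int, mtu: int = 1500) -> int:
    """Control-plane input buffer needed per neighbor, in bytes."""
    return rwin * mtu

assert max_lsp_rate(100, 0.010) == 10_000   # RWIN=100, link RTT=10 ms
assert max_lsp_rate(100, 0.050) == 2_000    # same RWIN, RTT=50 ms
assert buffer_bytes(100) == 150_000         # ~150 KB per neighbor
```

This reproduces the two examples in the text above; conversely, the optimal RWIN for a desired rate is the rate multiplied by the RTT.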

I think that helps.

>  
> > > 
> > >> Section 6.2.1.1:
> > >> “The LSP transmitter MUST NOT exceed these parameters. After having sent a full burst of un-acknowledged LSPs, it MUST send the following LSPs with an LSP Transmission Interval between LSP transmissions. For CPU scheduling reasons, this rate may be averaged over a small period, e.g., 10-30ms.”
> > >> 
> > >> I'm not sure I fully understand what you mean by “averaged over a small period”?
> > >> What exactly?
> > > 
> > > [Bruno]
> > > The rate is averaged over a small period T: #LSPs sent during T / T. During this period, bursts are allowed.
> > > E.g., with 100 LSPs / 10 ms, one may send 10 LSPs every 1 ms rather than a strict 1 LSP every 0.1 ms.
> > > In any case, the burst size is not to be exceeded.
> > > 
> > > If this is not clear, any suggestion welcomed.
> > 
> > That is actually not clear to me. Why do I need a burst-size then if I can exceed it?
>  
> [Bruno2] The burst size is the number of LSPs which can be sent at infinite rate (aka back-to-back).
> Subsequently, one must not exceed the sustainable rate (1 LSP per "LSP Transmission Interval" on average)
>  
> The ability to send N LSPs back-to-back originates from the IS-IS spec and is common in existing implementations. Hence it's useful to signal, in particular for senders which don't want to implement the congestion control algorithm.
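A hypothetical sketch of the sender-side shaping described here: an initial burst of `burst_size` LSPs may go back-to-back; afterwards the sustained rate is one LSP per transmission interval, enforced as an average over a small period T (e.g., 10 LSPs every 10 ms rather than exactly one every 1 ms). All names are illustrative, not from the draft:

```python
# Sender-side shaping: back-to-back burst, then a rate averaged over
# a small period T for CPU-scheduling convenience.

class LspShaper:
    def __init__(self, burst_size: int, tx_interval_s: float, period_s: float = 0.010):
        self.burst_remaining = burst_size                      # back-to-back allowance
        self.quota_per_period = int(period_s / tx_interval_s)  # LSPs allowed per period T
        self.sent_in_period = 0

    def on_period_tick(self):
        """Called by the scheduler once per averaging period T."""
        self.sent_in_period = 0

    def may_send(self) -> bool:
        if self.burst_remaining > 0:            # initial burst at "infinite" rate
            self.burst_remaining -= 1
            return True
        if self.sent_in_period < self.quota_per_period:
            self.sent_in_period += 1            # within the averaged sustained rate
            return True
        return False                            # wait for the next period tick

shaper = LspShaper(burst_size=10, tx_interval_s=0.001, period_s=0.010)
sent = sum(shaper.may_send() for _ in range(100))
assert sent == 20   # 10 burst LSPs + 10 LSPs allowed in the first 10 ms period
```

Within a period the 10 allowed LSPs may be sent as a mini-burst, which is the "averaged over a small period" behavior the text describes.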
>  
> > > 
> > >> Section 6.2.1.2:
> > >> “If no PSNPs have been generated on the LAN for a suitable period of time, then an LSP transmitter can safely set the number of un-acknowledged LSPs to zero.
> > >> Since this suitable period of time is much higher than the fast acknowledgment of LSPs defined in Section 5.1, the sustainable transmission rate of LSPs will be much slower on a LAN interface than on a point-to-point interface.”
> > >> 
> > >> What is a suitable period of time? Can you be more concrete?
> > > 
> > > [Bruno] Good question. But difficult question.
> > > Les, would you have some suggestion?
> > > Otherwise, rather than adding more text for the LAN case, I'd rather
> > > remove some text with the following change
> > > 
> > > OLD:
> > > However, an LSP transmitter on a LAN can infer whether any LSP receiver on the LAN has requested retransmission of LSPs from the DIS by monitoring PSNPs generated on the LAN. If no PSNPs have been generated on the LAN for a suitable period of time, then an LSP transmitter can safely set the number of un-acknowledged LSPs to zero. Since this suitable period of time is much higher than the fast acknowledgment of LSPs defined in Section 5.1, the sustainable transmission rate of LSPs will be much slower on a LAN interface than on a point-to-point interface.
> > > 
> > > NEW: /nothing/
> > > 
> > > As already stated, probably one could do better in the LAN case. E.g., advertising the delay between periodic CSNP (which would answer your question), sending in (LSP-ID) order, having the receiver send PSNP on a range of LSP-ID after a specific delay/LPP. But again, LAN is not seen as the priority in this document.
> > > 
> > >> Section 6.2.2.1
> > >> 
> > >> - As a side note, I don’t think figure 1 is useful at all…
> > > 
> > > [Bruno] OK. Note that Figure 2 is somewhat referring to Figure 1.
> > > Would you suggest removing it entirely or adding more information. E.g.
> > >   +---------------+
> > >   |               |
> > >   |               v
> > >   |     cwin = cwin0 = LPP + 1
> > >   |   +----------------------------------+
> > >   |   | Congestion avoidance             |
> > >   |   | cwin increases as LSPs are acked |
> > >   |   +----------------------------------+
> > >   |               |
> > >   |               | Congestion signal
> > >   +---------------+
> > > 
> > 
> > It’s a matter of taste. I realised that this relates to Figure 2. Figure 2 has at least some more information. I don’t find either figure super helpful, but it's totally your choice!
> > 
>  
> [Bruno2] OK. At this point we'll keep the figures as some people found them useful.
>  
> > >> - cwin = LPP + 1: Why is LPP selected as the start/minimum value? Earlier on you say that LPP must be equal or less than 90 and recommend a value of 15.
> > >> These values seem already large.
> > > 
> > > [Bruno] The receiver will not acknowledge anything before LPP LSPs are sent (*). In the absence of feedback the sender has no feedback loop, and not even information about the RTT. So we may shape the LSPs being sent, but we do need to send LPP LSPs to get some feedback. The +1 is to allow for the loss of an LSP.
> > > (*) Well, it will acknowledge after a delay (Partial SNP Interval), but the suggested value is 200 ms, which seems significant, and some implementations may not advertise this sub-TLV, in which case the default IS-IS value is in seconds.
> > > 
> > > Retrospectively, we could have required the receiver to quickly acknowledge the first received LSP (e.g., starting with LPP=1 and then increasing it). But we didn't, and this may be seen as too late for a change. Still, adding this would not make existing implementations non-compliant and may improve future evolutions. So I would welcome your feedback on this.
> > > Also, this argument only applies to the first iteration of congestion avoidance. Subsequent ones could use information from the past.
> > > 
> > > Regarding cwin of 16, yes that's a significant start, but we know that
> > > the single link can largely handle this (>100Gbit/s), that diffserv
> > > CoS is enabled if needed, that some existing implementations blindly
> > > send an initial burst of 5 to 10 LSPs already (e.g., 10 for Cisco IOS
> > > "default optimized enabled" which is already years old)
> > > https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/seg_routing/configuration/xe-17/segrt-xe-17-book-cat8000/sr-fast-convergence-default-optimize.pdf , and that we are talking about a "carrier grade" router with a significant price tag and an assumed significant performance.
> > > Even the 25-year-old OSI specification allows for an initial burst of 10 back-to-back LSPs.
> > > 
> > > That being said, if you feel that this is not appropriate, the expertise is on your side, so please say so.
> > 
> > As I said further below, I think it's more important to recommend “safe” values. I guess 16 is fine, but is 91 fine as well? Or even higher? I see your point about ACKs; in TCP you actually send an ack for every packet in slow start.
>  
> [Bruno2] OK. Cf text added on safe range of value.
> Thanks for the info on TCP. Yes, sending one ACK per packet makes sense in slow start.
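The cwin cycle discussed above (and pictured in Figure 1) can be sketched as follows. This is illustrative only: the start value cwin0 = LPP + 1 is from the draft, while the TCP-like "+1 per window acked" growth rule is an assumption for the sketch:

```python
# Congestion-window cycle: start at cwin0 = LPP + 1, grow as LSPs are
# acknowledged, reset to cwin0 on a congestion signal.

LPP = 15          # draft's recommended LSP-per-PSNP value
CWIN0 = LPP + 1   # initial window; +1 allows for the loss of one LSP

def next_cwin(cwin: int, acked: int, congestion: bool) -> int:
    if congestion:
        return CWIN0                  # congestion signal: back to the initial window
    return cwin + acked // cwin       # roughly +1 per full window acknowledged

cwin = CWIN0
cwin = next_cwin(cwin, acked=CWIN0, congestion=False)   # one full window acked
assert cwin == 17
cwin = next_cwin(cwin, acked=0, congestion=True)        # congestion signal
assert cwin == CWIN0
```

The point debated in the thread is only the start value: the sender gets no feedback before LPP LSPs are sent, so cwin0 cannot usefully be smaller than LPP + 1.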
>  
> > 
> > > 
> > >> Section 6.2.2.2:
> > >> “This value should include a margin of error to avoid false positives (e.g., estimated MAT measure variance) which would have a significant impact on performance.”
> > >> 
> > >> First, you call the first congestion signal “Timer”; however, I think it should be called “Loss”, where the loss detection algorithm you are proposing is based on a timer. In TCP the retransmission (and therefore loss detection) timer is initially set to 3xRTT and then adapted with more measurements (see RFC6298).
> > >> The text above, however, seems really too vague to implement that right. I guess you can take the simple approach and just set it to 3xRTT. However, given the delays on a point-to-point link are really small, I’m not sure a timer based on RTT is useful at all. Is the system even able to maintain a timer with that granularity? My understanding, however, is that IS-IS has a way to detect loss already in order to retransmit; therefore it would make more sense to simply always reset the cwin when you retransmit a PDU. Or how does IS-IS decide to send a retransmission?
> > > 
> > > [Bruno] Agreed with the name "Loss".
> > > I'm personally fine with the proposed simple way of using 3*RTT. What about also referring to RFC6298?
> > > Note that in our case, the timer is only used for detecting
> > > congestion. Retransmission uses a much larger timer (and is unchanged
> > > by this document)
> > > 
> > > Regarding link delay, within a dense area link delay would typically
> > > be "negligible". In a WAN, e.g., in France, link delay would easily be in the 1 ms - 10 ms range. Crossing the Atlantic Ocean is larger. Also, the RTT includes the time for IS-IS to acknowledge. I would assume that this may be significant (10 ms or maybe more than 100 ms) in some conditions, such as if the routing process is handling both IS-IS flooding and BGP computations. In terms of system timer granularity, this seems very implementation and condition specific.
> > > 
> > > Guillaume, please disagree if needed.
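The two options discussed here (a flat 3xRTT, or the adaptive RFC 6298 estimator) can be sketched together; note that the RFC 6298 formula seeded from a single sample yields exactly 3xRTT on the first measurement. Names are illustrative:

```python
# RFC 6298-style timer for the congestion (loss) signal:
# RTO = SRTT + 4 * RTTVAR, seeded from the first RTT sample.

class RttEstimator:
    ALPHA, BETA = 1 / 8, 1 / 4   # RFC 6298 smoothing factors

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def sample(self, rtt: float):
        if self.srtt is None:        # first measurement seeds the estimator
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt

    def rto(self) -> float:
        return self.srtt + 4 * self.rttvar

est = RttEstimator()
est.sample(0.010)   # 10 ms link RTT + acknowledgement time
# First RTO = RTT + 4 * (RTT / 2) = 3 * RTT
assert abs(est.rto() - 0.030) < 1e-9
```

As Bruno notes, this timer would only trigger the congestion signal; LSP retransmission keeps its own, much larger timer.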
> > > 
> > >> 
> > >> “Reordering: a sender can record its sending order and check that acknowledgements arrive in the same order as LSPs. This makes an additional assumption and should ideally be backed up by a confirmation by the receiver that this assumption stands.”
> > >> 
> > >> Regarding re-ordering as an input: if a packet is “just” re-ordered but still received, it should not be taken as a congestion signal. However, can that even happen on a point-to-point link? If you mean the packet was never received and there is a gap in the packet number, that’s again called loss and not reordering, but simply using a packet-number-based detection mechanism instead of a timer. However, based on the description above it is not fully clear to me how you think this would work and what you mean by “additional assumption”…? I think you further need to clarify this.
> > > 
> > > [Bruno] IS-IS has no notion of packet number nor re-ordering. The receiver may acknowledge the LSPs in any order. The receiver acknowledges sets of LPP LSPs, and ordering within a set does not matter.
> > > In addition, we don't want to change the way LSPs are retransmitted. So we are ready to make a distinction between the loss signal (e.g., not acknowledged within 5 seconds) and the congestion signal (e.g., not acknowledged in order).
> > > With regard to point-to-point interfaces, clearly the fiber will not reorder packets. There could be multiple Ethernet members "below" the IP layer, but AFAIK a given flow should always be sent on the same Ethernet member. (Here the flow would be based on the group MAC address, as IS-IS does not use IP.)
> > > 
> > > Trying to clarify
> > > OLD:  Reordering: a sender can record its sending order and check that acknowledgements arrive in the same order as LSPs. This makes an additional assumption and should ideally be backed up by a confirmation by the receiver that this assumption holds. The O flag defined in Section 4.4 serves this purpose.
> > > NEW: Reordering: if the receiver has signaled the O-flag (Ordered acknowledgement) (Section 4.4), a sender MAY record its sending order and check that acknowledgements arrive in the same order. If not, this MAY be used to trigger a congestion signal.
> > 
> > Can reordering even happen on a point-to-point link and if so how? Is that really a sign of congestion?
>  
> [Bruno2] Re-ordering does not happen on a point-to-point link. We may have multiple parallel links (Ethernet LAG), but a priori hashing rules would prevent reordering.
>  
> The point of the O-flag is not to detect re-ordering. The point is to detect missing LSPs:
> Sender sends LSPs: 1, 2, 3, 4
> Receiver: acknowledges LSPs 1, 2, 4.
>  
> As the receiver has advertised the O-flag, it would have acknowledged LSP 3 if it had received it. Hence it has not received it. The sender can detect this thanks to the O-flag.
> This is because IS-IS does not encode any order inside the packet (compared to TCP)
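Bruno's example can be sketched as a small gap-detection routine on the sender side. LSP-IDs are plain integers here for illustration; real LSP-IDs are structured identifiers:

```python
# O-flag logic: the receiver acknowledges LSPs in the order they were
# received, so an acknowledgement that skips entries in the sender's
# send log implies the skipped LSPs were lost (a congestion signal).

from collections import deque

def detect_missing(sent_order, acked_order):
    """Return the LSPs skipped by in-order acknowledgements."""
    pending = deque(sent_order)
    missing = []
    for ack in acked_order:
        while pending and pending[0] != ack:
            missing.append(pending.popleft())   # skipped => not received
        if pending:
            pending.popleft()                   # matched acknowledgement
    return missing

# Sender sent 1, 2, 3, 4; receiver acknowledged 1, 2, 4 => 3 was lost.
assert detect_missing([1, 2, 3, 4], [1, 2, 4]) == [3]
```

This only works when the receiver has advertised the O-flag; without it, out-of-order acknowledgements are legitimate and carry no congestion information.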

Thanks! I saw that was changed in the draft.

>  
> > > 
> > >> Sec 6.2.2.3:
> > >> 
> > >> You call this refinement “Fast recovery” which I guess is inspired by TCP.
> > >> However, it’s a bit confusing because in TCP it’s a different algorithm. In TCP’s fast recovery, you do not reduce your congestion window to the initial value but only halve it (or decrease it by a different factor depending on the congestion control algorithm). This is done if only some packets are lost while most packets still arrive. In TCP, resetting to the initial value only happens if a timeout occurs, meaning no data/acks have arrived for some time.
> > > 
> > > [Bruno] Guillaume, any feedback on this?
> > > One option is to remove those 2 refinements.
> > > That being said, I feel like the text is readable even if it does not match TCP algorithm. If the name brings confusion it can be easily changed.
> > > 
> > > 
> > >> Sec 6.2.2.4:
> > >> “The rates of increase were inspired from TCP [RFC5681], but it is possible that a different rate of increase for cwin in the congestion avoidance phase actually yields better results due to the low RTT values in most IS-IS deployments.”
> > >> 
> > >> I don’t think this is really a refinement but rather some consideration.
> > >> However, saying this without any further guidance doesn’t really seem helpful, and may even be harmful.
> > > 
> > > [Bruno] OK so this section could be removed.
> > > 
> > >> However, more generally, all in all I’m also not sure a fine-grained congestion control is really needed on a point-to-point link, as there is only one receiver that could be overloaded, and in that case you should rather adapt your flow control. I think what you want is to set the flow control parameters in a reasonable range in the first place.
> > > 
> > > [Bruno] I fully agree with you. We first implemented flow control and had very good results with it (no loss of LSPs, very good adaptation of the sender rate to the receiver rate). We really had to purposely create congestion in order to observe any.
> > > However, although there is a point-to-point link between adjacent routers, within the receiving router there is typically a "switch", i.e., a possible congestion point between the N high-speed incoming interfaces and the control plane (host). The feedback we received from multiple implementors is that the internals are platform dependent and touchy. They do experience some congestion on this path.
> > > So we added congestion control, and this does improve the situation in case of congestion.
> > > 
> > >> If then there is actual congestion for some reason, meaning you detect a constant or increasing loss rate, maybe you want to implement some kind of circuit breaker by stop sending or fall-back to a minimum rate for a while (e.g. see RFC8085 section 3.1.10.). However why would that even happen on a point-point-link?
> > > 
> > > [Bruno] Falling back to a minimum rate (the current rate, actually 😉) seems indeed like a simple and pragmatic solution. However, if congestion happens frequently, this would significantly degrade performance. On the other hand, we found that reducing cwin was typically helpful.
> > > So maybe we could remove both refinements and add that minimally the
> > > sender can fall back to the current rate of one LSP every 33 ms (or
> > > whatever value was used before this document).
> > > 
> > >> 
> > >> Sec 6.2.3
> > >> “Senders SHOULD limit bursts to the initial congestion window.“ I don’t think this is a requirement, because based on your specified algorithm this happens automatically. The initial window is LPP, which is also the max number of PDUs that can be acknowledged in one PSNP, and thus also the maximum number of packets you can send out on receipt of the PSNP (given you don’t let the cwin grow beyond what’s ready to send). However, you have this new parameter for the LSP burst window. That’s what should limit the burst size (and it should rather be smaller than LPP/the initial window).
> > > 
> > > [Bruno] OK. 
> > > So is your proposal the change below?
> > > OLD: Senders SHOULD limit bursts to the initial congestion window. A sender with knowledge that the receiver can absorb larger bursts, such as by receiving the LSP Burst Size sub-TLV from this receiver may use a higher limit.
> > > NEW: Senders SHOULD limit bursts to LSP Burst Size.
> > > 
> > >> Also, pacing is used over Internet links to avoid overloading small buffers on the path, as the buffer sizes of the network elements are unknown. This is not the case in your point-to-point scenario. If you know all buffer sizes, it is probably sufficient to limit the burst size accordingly.
> > > 
> > > [Bruno] Agreed, but the capacity and behavior of the "internal switch" in the receiving router are unknown in general, and people don't want to know them, to allow the control plane implementation to be platform independent.
> > > Also a receiver may be connected to 100 other IS-IS routers sending their LSPs and the buffer size may be shared by all senders. So it seemed like pacing would be a good thing in the general case.
> > > 
> > >> 
> > >> Sec 6.2.4
> > >> “If the values are too low then the transmitter will not use the full bandwidth or available CPU resources.” As a side note, I hope that fully utilising the link or CPU just with LSP traffic is not the goal here. Which is another reason why you might not need a fine-grained congestion control: congestion control has two goals, avoid congestion but also fully utilise the link -> I think you only need the former.
> > > 
> > > [Bruno] You are correct that fully utilizing the link or the CPU is
> > > not the goal. I would even say that fully utilizing the link would be
> > > well above the capability of IS-IS. But in our tests we achieved full
> > > CPU utilization on the receiver when we used 10 senders in
> > > parallel. So essentially each sender was using 10% of the CPU of the
> > > receiver. But this included all the extra work done by IS-IS. And we
> > > had no loss of LSPs, and the rate of each sender was controlled by the
> > > rate of the receiver (just by using flow control, so no magic but
> > > still very efficient)
> > 
> > 
> > This all sounds to me like you need a more dynamic flow control (to avoid receiver overload) but I would probably not call that congestion control…
>  
> [Bruno2] We can illustrate our context with the below image.
> The difficult part is the purple part. It is very specific to each implementation, does not behave like a typical switch/router, is not modeled, and is mostly unknown. Even within a single implementation/vendor, the IS-IS team would rather not deal with the purple part, nor figure out its limitations. That’s the part which triggered the addition of “congestion control”. Given this, I’m not sure how you would do a more dynamic flow control.
> There is a simpler case with a smaller pizza-box router, with mostly one NPU (ASIC) connecting all line cards and connected to the control plane. It could handle some QoS and buffering for IS-IS packets toward the control plane. But typically, one wants an IS-IS implementation which is platform independent and which doesn't bother about that purple part.
>  
> <image003.jpg>
>  
>  
> The nice thing about flow control is that the sender doesn’t have to estimate or guess anything. The receiver can just tell the sender to limit its rate by announcing a lower receive window.
>  
> [Bruno2] Fully agreed. I wish flow control alone was sufficient. And I think it mostly is; that’s why a rustic/any congestion control should probably just work.
> But if a packet is lost in the purple part of the receiver, the control plane of the receiver is not aware of the issue/loss. Essentially, only the sender knows that it sent an LSP and that the LSP was not received.

Okay, you are saying packets can get lost at the “internal switch” because there is no backpressure to the NIC? Given this is a local element, it seems it would be more efficient to indicate the loss locally, or to implement it in a blocking way. However, this is really not my expertise. I still feel that doing some kind of e2e congestion control for a local link seems one of the most complicated solutions.

Mirja



>  
> --Bruno
>  
> > > 
> > >> 
> > >> Sec 6.3
> > >> This section is entirely unclear to me. There is no algorithm described, and I would not know how to implement this. Also because you interchangeably use the terms congestion control and flow control. Further, you say that no input signal from the receiver is needed; however, if you want to send at the rate of acknowledgements received, that is an input from the receiver. The window-based TCP-like algorithm actually implicitly does exactly that: it only sends new data if an acknowledgement is received. It further also takes the number of PDUs that are acknowledged into account, because that can be configured. If you don’t do that, your sending rate will get lower and lower.
> > > 
> > > [Bruno] I agree with you.
> > > That section served two purposes:
> > > -  to indicate that the signaling specified on the receiver side (defined in sections 4 and 5) may actually be used by different congestion control algorithms.
> > > - as the result of a compromise when we had to merge two competing proposals at the time of WG adoption.
> > > 
> > > Les and Marek, it's up to you to reply on your section.
> > > 
> > > Alternatively, the last paragraph of section 6.1 "Overview" probably already covers the first objective and to be transparent, the sentence is inspired from RFC 9002 section 7. (QUIC Loss Detection and Congestion Control).
> > > And for the second objective, the text could possibly be moved to another document. But we would probably need WG feedback on this.
> > > 
> > > 
> > >> Some small nits:
> > >> - Sec 4: advertise flooding-related parameters parameters -> advertise flooding-related parameters
> > >> - Sec 5.1: PSNP PDUs -> PSNPs or PSN PDUs
> > >> - Sec 5.2: Sequence Number Packets (SNPs) -> probably: Sequence Number PDUs (SNPs)?
> > >> - Sec 6.2.1.1.: leave space for CSNP and PSNP (SNP) PDUs -> leave space for CSNPs and PSNPs ?
> > > 
> > > [Bruno] Thank you for your careful review (of ISO terms...). Corrected.
> > > 
> > > 
> > > Again, thank you very much for your careful review. Much appreciated and very useful.
> > > I'll upload a new version of the draft once we have converged or whenever you believe that this would help (e.g., when you ack a significant set of the proposed changes).
> > > 
> > > Best regards,
> > > --Bruno
> > > 
> > > 
> > >> 
> > >> 
> > > ____________________________________________________________________________________________________________
> > > Ce message et ses pieces jointes peuvent contenir des informations
> > > confidentielles ou privilegiees et ne doivent donc pas etre diffuses,
> > > exploites ou copies sans autorisation. Si vous avez recu ce message
> > > par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.
> > > 
> > > This message and its attachments may contain confidential or
> > > privileged information that may be protected by law; they should not be distributed, used or copied without authorisation.
> > > If you have received this email in error, please notify the sender and delete this message and its attachments.
> > > As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
> > > Thank you.
> > 
> > 
> > _______________________________________________
> > Lsr mailing list
> > Lsr@ietf.org
> > https://www.ietf.org/mailman/listinfo/lsr
> > 