Re: [Gen-art] Review: draft-ietf-pcn-sm-edge-behaviour-08

The description of the factor U has been updated. The new text reads as 
follows:

    1.  [SM-specific] The sustainable aggregate rate (SAR) for the given
        ingress-egress-aggregate is estimated using the formula:

           SAR = U * NM-Rate

        for the latest reported interval, where U is a configurable
        factor greater than one which is the same for all ingress-egress-
        aggregates.  In effect, the value of the PCN-supportable-rate for
        each link is approximated by the expression

           U*PCN-admissible-rate

        rather than being calculated explicitly.
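
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the estimate above. It is only an illustration under stated assumptions: the function names, the example values, and the rate units are hypothetical, and the termination amount is assumed to be the rate of PCN-traffic admitted to the aggregate minus SAR, floored at zero.

```python
def sustainable_aggregate_rate(u: float, nm_rate: float) -> float:
    """Estimate SAR = U * NM-Rate for one ingress-egress-aggregate.

    u       -- the configurable factor U (must be greater than one)
    nm_rate -- NM-Rate from the latest reported interval (any rate unit)
    """
    if u <= 1.0:
        raise ValueError("U must be greater than one")
    if nm_rate < 0.0:
        raise ValueError("NM-Rate cannot be negative")
    return u * nm_rate


def rate_to_terminate(u: float, nm_rate: float, admitted_rate: float) -> float:
    """Assumed termination amount: admitted rate minus SAR, never negative.

    admitted_rate is a hypothetical name for the rate of PCN-traffic
    admitted to the aggregate, in the same unit as nm_rate.
    """
    sar = sustainable_aggregate_rate(u, nm_rate)
    return max(0.0, admitted_rate - sar)


if __name__ == "__main__":
    # Illustrative values only: U = 1.5, NM-Rate = 100, admitted rate = 180.
    print(sustainable_aggregate_rate(1.5, 100.0))
    print(rate_to_terminate(1.5, 100.0, 180.0))
```

With these example numbers, SAR is 150 and 30 rate units would be candidates for termination; an admitted rate at or below SAR yields zero.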

Tom Taylor

On 12/03/2012 2:28 PM, Russ Housley wrote:
> I am very confused about the state of this.  My skimming of the thread seems to indicate at least one unresolved issue.
>
> Russ
>
>
> On Jan 2, 2012, at 1:04 PM, Joel M. Halpern wrote:
>
>> The clarification on U is very helpful.  I look forward to comments from others on the routing based behavior / ECMP text removal / replacement question.
>>
>> On 1/2/2012 12:58 PM, Michael Menth wrote:
>>> Hi Joel, hi Tom,
>>>
>>> On 02.01.2012 18:18, Joel M. Halpern wrote:
>>>> Michael, I am not sure what to make of your recommended text about ECMP.
>>>> ECMP is used by almost all operators. It is generally considered a
>>>> necessary tool in the tool-kit.
>>>> More significantly, at least for the egress understanding of the
>>>> ingress, it is not even the single operator's ECMP, but other
>>>> operators' selections of paths that produce the issue. So even in the
>>>> unlikely event that this operator does not use ECMP, it still is not
>>>> sufficient.
>>>
>>> Then I had better leave the ECMP issue for others to answer.
>>>
>>> A better correction to the definition of U is as follows (an improved
>>> rewording of my previous email):
>>>
>>> U represents the average ratio of PCN-supportable-rate to
>>> PCN-admissible-rate over all the links of the PCN-domain.
>>> ->
>>> U is a domain-wide constant which implicitly defines the
>>> PCN-supportable-rate by U*PCN-admissible-rate on all links of the PCN
>>> domain.
>>>
>>> Best wishes,
>>>
>>> Michael
>>>
>>>
>>>>
>>>> Yours,
>>>> Joel
>>>>
>>>> On 1/2/2012 11:54 AM, Michael Menth wrote:
>>>>> Hi Tom, hi Joel,
>>>>>
>>>>> I wish you a happy new year!
>>>>>
>>>>> Here are my comments to address Joel's concerns:
>>>>>
>>>>> ====================================================================
>>>>>
>>>>> The issue with ECMP: I'd add a comment that CL and SM should not be used
>>>>> in the presence of ECMP if routing information is used to determine
>>>>> ingress-egress-aggregates since this seems to be messy and error-prone.
>>>>>
>>>>> ====================================================================
>>>>>
>>>>> The following text, placed at the beginning of Section 3.3.2, may clarify
>>>>> the relation between admission control and flow termination to address
>>>>> one of Joel's comments (for both SM and CL):
>>>>>
>>>>> In the presence of light pre-congestion, i.e., in the presence of a
>>>>> small, positive ETM-rate (relative to the overall PCN traffic rate),
>>>>> new flows may already be blocked. However, in the presence of heavy
>>>>> pre-congestion, i.e., in the presence of a relatively large ETM-rate,
>>>>> termination of some admitted flows is required. Thus, flow blocking is
>>>>> a logical prerequisite for flow termination.
>>>>>
>>>>> ====================================================================
>>>>>
>>>>> The following sentence in 3.3.2 should be corrected (only SM-specific):
>>>>>
>>>>> U represents the average ratio of PCN-supportable-rate to
>>>>> PCN-admissible-rate over all the links of the PCN-domain.
>>>>>
>>>>> ->
>>>>>
>>>>> U represents the ratio of PCN-supportable-rate to PCN-admissible-rate
>>>>> for all the links of the PCN-domain.
>>>>>
>>>>> ====================================================================
>>>>>
>>>>> I also recommend changing the following text, as I think it may be
>>>>> misinterpreted (applies both to SM and CL):
>>>>>
>>>>> If the difference calculated in the second step is positive, the
>>>>> Decision Point SHOULD select PCN-flows to terminate, until it
>>>>> determines that the PCN-traffic admission rate will no longer be
>>>>> greater than the estimated sustainable aggregate rate. If the Decision
>>>>> Point knows the bandwidth required by individual PCN-flows (e.g., from
>>>>> resource signalling used to establish the flows), it MAY choose to
>>>>> complete its selection of PCN-flows to terminate in a single round of
>>>>> decisions.
>>>>>
>>>>> Alternatively, the Decision Point MAY spread flow termination over
>>>>> multiple rounds to avoid over-termination. If this is done, it is
>>>>> RECOMMENDED that enough time elapse between successive rounds of
>>>>> termination to allow the effects of previous rounds to be reflected in
>>>>> the measurements upon which the termination decisions are based. (See
>>>>> [IEEE-Satoh] and sections 4.2 and 4.3 of [MeLe10].)
>>>>>
>>>>> ->
>>>>>
>>>>> If the difference calculated in the second step is positive (traffic
>>>>> rate to be terminated), the Decision Point SHOULD select PCN-flows to
>>>>> terminate. To that end, the Decision Point MAY use upper rate limits
>>>>> for individual PCN-flows (e.g., from resource signalling used to
>>>>> establish the flows) and select a set of flows whose sum of upper rate
>>>>> limits is up to the traffic rate to be terminated. Then, these flows
>>>>> are terminated. The use of upper limits on flow rates avoids
>>>>> over-termination.
>>>>>
>>>>> Termination may be continuously needed after consecutive measurement
>>>>> intervals for various reasons, e.g., if the used upper rate limits
>>>>> overestimate the actual flow rates. For such cases it is RECOMMENDED
>>>>> that enough time elapses between successive termination events to
>>>>> allow the effects of previous termination events to be reflected in
>>>>> the measurements upon which the termination decisions are based;
>>>>> otherwise, over-termination may occur. See [IEEE-Satoh] and Sections
>>>>> 4.2 and 4.3 of [MeLe10].
>>>>>
>>>>> ====================================================================
>>>>>
>>>>> [IEEE-Satoh] is not a good key for Daisuke's work as the prefix "IEEE"
>>>>> makes it look like a reference to a standards document.
>>>>> It would be better to use [SaUe10] or [Satoh10]. This applies to both CL and SM.
>>>>>
>>>>>
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>> On 02.01.2012 15:21, Tom Taylor wrote:
>>>>>> It shall be as you say, subject to comment from my co-authors when
>>>>>> they get back from holiday.
>>>>>>
>>>>>> On 01/01/2012 5:43 PM, Joel M. Halpern wrote:
>>>>>>> In-line...
>>>>>>>
>>>>>>> On 1/1/2012 4:06 PM, Tom Taylor wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01/01/2012 2:58 PM, Joel M. Halpern wrote:
>>>>>>>>> Thank you for responding promptly, Tom. Let me try to elaborate on
>>>>>>>>> the two issues where I was unclear.
>>>>>>>>>
>>>>>>>>> On the ingress-egress-aggregate issue and ECMP, the concern I have
>>>>>>>>> is relative to the third operational alternative where routing is
>>>>>>>>> used to determine where the ingress and egress of a flow are. To be
>>>>>>>>> blunt, as far as I can tell this does not work.
>>>>>>>>> 1) It does not work on the ingress side because traffic from a given
>>>>>>>>> source prefix can come in at multiple places. Some of these places
>>>>>>>>> may claim reachability to the source prefix. Some may not. While a
>>>>>>>>> given flow will use only one of these paths, there is no way to
>>>>>>>>> determine from routing information, at the egress, which ingress
>>>>>>>>> that flow used.
>>>>>>>>> 2) A site may use multiple exits for a given destination prefix.
>>>>>>>>> Again, while the site will only use one of these egresses for a
>>>>>>>>> given flow, there is no way for the ingress to know which egress it
>>>>>>>>> will be on the basis of routing information.
>>>>>>>>> Thus, the text seems to allow for a behavior that simply does not
>>>>>>>>> work.
>>>>>>>>
>>>>>>>> [PTT] I think the disconnect here is that you read the text to say
>>>>>>>> that an individual node uses routing information to determine the
>>>>>>>> IEA. That wasn't the intention. Instead, administrators use routing
>>>>>>>> information to derive filters that are installed at the ingress and
>>>>>>>> egress nodes.
>>>>>>>
>>>>>>> As far as I can tell, your response describes a situation even less
>>>>>>> effective than what I assumed.
>>>>>>> Firstly, it does not matter whether it is the edge node, the decision
>>>>>>> node, or the human administrator. Routing information is not enough to
>>>>>>> determine what the ingress-egress pairing is. The problems I describe
>>>>>>> above apply no matter who is making the decision.
>>>>>>> Secondly, having a human make the decision means that as soon as
>>>>>>> routing changes, the configured filters are wrong.
>>>>>>>
>>>>>>> I would suggest that the text in question be removed, and replaced
>>>>>>> with a warning against attempting what is currently described.
>>>>>>>
>>>>> My view is also that CL and SM do not work in the presence of ECMP. This
>>>>> should be indicated as a warning.
>>>>>
>>>>>>>>>
>>>>>>>>> I am still confused about the relationship of section 3.3.2 to the
>>>>>>>>> behavior you describe. 3.3.2 says that as long as any excess traffic
>>>>>>>>> is being reported, the decision point shall direct the blocking of
>>>>>>>>> additional flows. That does not match 3.3.1, and does not match your
>>>>>>>>> description.
>>>>>>>>
>>>>>>>> [PTT] I can't see the text in section 3.3.2 that says you continue to
>>>>>>>> block as long as any excess traffic is being reported. What I think
>>>>>>>> it says is that as long as excess traffic is reported, the decision
>>>>>>>> point checks to see whether the traffic being admitted to the
>>>>>>>> aggregate exceeds the supportable level. Excess traffic may be
>>>>>>>> non-zero, yet no termination may be required (i.e., traffic is below
>>>>>>>> the second threshold).
>>>>>>>
>>>>>>> I think I see what you are saying. If I am reading this correctly, the
>>>>>>> decision process must re-calculate to determine if there is
>>>>>>> termination every time it receives a report with non-zero excess and
>>>>>>> the port is already blocked. But it does not have to actually block
>>>>>>> anything.
>>>>>>> This however seems to depend upon the correct relative configuration
>>>>>>> of the limit that flips it into blocked state, the value of U, and
>>>>>>> maybe some other values.
>>>>>>> Put differently, I understand that the two are not contradictory.
>>>>>>> However, since the two things use different calculations, it is not at
>>>>>>> all clear that they are consistent. This may well be acceptable. But
>>>>>>> the difference in methods is likely to lead to confusion. So, as a
>>>>>>> minor (rather than major) comment, I would suggest that you provide
>>>>>>> clarifying text explaining why it is okay to use one condition to
>>>>>>> decide if there is blocking, but a different condition (which could
>>>>>>> produce a lower threshold) to decide how much to get rid of.
>>>>>>>
>>>>>>> Yours,
>>>>>>> Joel
>>>>>>>
>>>>>>>>>
>>>>>>>>> Yours,
>>>>>>>>> Joel
>>>>>>>>>
>>>>>>>>> On 1/1/2012 2:48 PM, Tom Taylor wrote:
>>>>>>>>>> Thanks for the review, Joel. Comments below, marked with [PTT].
>>>>>>>>>>
>>>>>>>>>> On 31/12/2011 4:50 PM, Joel M. Halpern wrote:
>>>>>>>>>>> I am the assigned Gen-ART reviewer for this draft. For background
>>>>>>>>>>> on Gen-ART, please see the FAQ at
>>>>>>>>>>> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
>>>>>>>>>>>
>>>>>>>>>>> Please resolve these comments along with any other Last Call
>>>>>>>>>>> comments you may receive.
>>>>>>>>>>>
>>>>>>>>>>> Document: draft-ietf-pcn-sm-edge-behaviour-08
>>>>>>>>>>> PCN Boundary Node Behaviour for the Single Marking (SM) Mode of
>>>>>>>>>>> Operation
>>>>>>>>>>> Reviewer: Joel M. Halpern
>>>>>>>>>>> Review Date: 31-Dec-2011
>>>>>>>>>>> IETF LC End Date: 13-Jan-2012
>>>>>>>>>>> IESG Telechat date: N/A
>>>>>>>>>>>
>>>>>>>>>>> Summary: This document is almost ready for publication as an
>>>>>>>>>>> Informational RFC.
>>>>>>>>>>>
>>>>>>>>>>> Question: Given that the document defines a complex set of
>>>>>>>>>>> behaviors, which are mandatory for compliant systems, it seems
>>>>>>>>>>> that this ought to be Experimental rather than Informational. It
>>>>>>>>>>> describes something that could, in theory, later become standards
>>>>>>>>>>> track.
>>>>>>>>>>
>>>>>>>>>> [PTT] OK, we've wobbled on this one, but we can follow your
>>>>>>>>>> suggestion.
>>>>>>>>>>>
>>>>>>>>>>> Major issues:
>>>>>>>>>>> Section 2 on Assumed Core Network Behavior for SM, in the third
>>>>>>>>>>> bullet, states that the PCN-domain satisfies the conditions
>>>>>>>>>>> specified in RFC 5696. Unfortunately, looking at RFC 5696 I cannot
>>>>>>>>>>> tell what conditions these are. Is this supposed to be a reference
>>>>>>>>>>> to RFC 5559 instead? No matter which document it is referencing,
>>>>>>>>>>> please be more specific about which section / conditions are meant.
>>>>>>>>>>
>>>>>>>>>> [PTT] You are right that RFC 5696 isn't relevant. It's such a long
>>>>>>>>>> time since that text was written that I can't recall what the
>>>>>>>>>> intention was. My inclination at the moment is simply to delete the
>>>>>>>>>> bullet.
>>>>>>>>>>>
>>>>>>>>>>> It would have been helpful if the early part of the document
>>>>>>>>>>> indicated that the edge node information about how to determine
>>>>>>>>>>> ingress-egress-aggregates was described in section 5.
>>>>>>>>>>> In conjunction with that, section 5.1.2, third paragraph, seems to
>>>>>>>>>>> describe an option which does not seem to quite work. After
>>>>>>>>>>> describing how to use tunneling, and how to work with signaling,
>>>>>>>>>>> the text refers to inferring the ingress-egress-aggregate from the
>>>>>>>>>>> routing information. In the presence of multiple equal-cost domain
>>>>>>>>>>> exits (which does occur in reality), the routing table is not
>>>>>>>>>>> sufficient information to make this determination. Unless I am
>>>>>>>>>>> very confused (which does happen) this seems to be a serious hole
>>>>>>>>>>> in the specification.
>>>>>>>>>>
>>>>>>>>>> [PTT] I'm not sure what the issue is here. As I understand it,
>>>>>>>>>> operators don't assign packets randomly to a given path in the
>>>>>>>>>> presence of alternatives -- they choose one based on values in the
>>>>>>>>>> packet header. The basic intent is that packets of a given
>>>>>>>>>> microflow all follow the same path, to prevent unnecessary
>>>>>>>>>> reordering and minimize jitter. The implication is that filters can
>>>>>>>>>> be defined at the ingress nodes to identify the packets in a given
>>>>>>>>>> ingress-egress-aggregate (i.e. flowing from a specific ingress node
>>>>>>>>>> to a specific egress node) based on their header contents. The
>>>>>>>>>> filters to do the same job at egress nodes are a different problem,
>>>>>>>>>> but they are not affected by ECMP.
>>>>>>>>>>>
>>>>>>>>>>> Minor issues:
>>>>>>>>>>> Section 3.3.1 states that the "block" decision occurs when the CLE
>>>>>>>>>>> (excess over total) rate exceeds the configured limit. However,
>>>>>>>>>>> section 3.3.2 states that the decision node must take further
>>>>>>>>>>> steps if the excess rate is non-zero in further reports. Is this
>>>>>>>>>>> inconsistency deliberate? If so, please explain. If not, please
>>>>>>>>>>> fix. (If it is important to drive the excess rate to 0, then why
>>>>>>>>>>> is action only initiated when the ratio is above a configured
>>>>>>>>>>> value, rather than any non-zero value? I can conceive of various
>>>>>>>>>>> reasons. But none are stated.)
>>>>>>>>>>
>>>>>>>>>> [PTT] We aren't driving the excess rate to zero, but to a value
>>>>>>>>>> equal to something less than (U - 1)/U. (The "something less" is
>>>>>>>>>> because of packet dropping at interior nodes.) The assumption is
>>>>>>>>>> that (U - 1)/U is greater than CLE-limit. Conceptually, PCN uses
>>>>>>>>>> two thresholds. When the CLE is below the first threshold, new
>>>>>>>>>> flows are admitted. Above that threshold, they are blocked. When
>>>>>>>>>> the CLE is above the second threshold, flows are terminated to
>>>>>>>>>> bring it down to that threshold. In the SM mode of operation, the
>>>>>>>>>> first threshold is specified directly on a per-link basis by the
>>>>>>>>>> value CLE-limit. The second threshold is specified by the same
>>>>>>>>>> value (U - 1)/U for all links. With the CL mode of operation the
>>>>>>>>>> second threshold is also specified directly for each link.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Nits/editorial comments:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>
>