Re: [Gen-art] Review: draft-ietf-pcn-sm-edge-behaviour-08

"Joel M. Halpern" <> Mon, 02 January 2012 18:04 UTC

To: Michael Menth <>
Cc:, Steven Blake <>,, Bob Briscoe <>, Tom Taylor <>, David Harrington <>

The clarification on U is very helpful.  I look forward to comments from 
others on the routing-based behavior / ECMP text removal / replacement.

On 1/2/2012 12:58 PM, Michael Menth wrote:
> Hi Joel, hi Tom,
> Am 02.01.2012 18:18, schrieb Joel M. Halpern:
>> Michael, I am not sure what to make of your recommended text about ECMP.
>> ECMP is used by almost all operators. It is generally considered a
>> necessary tool in the tool-kit.
>> More significantly, at least for the egress understanding of the
>> ingress, it is not even the single operator's ECMP, but other
>> operators' selections of paths that produce the issue. So even in the
>> unlikely event that this operator does not use ECMP, it still is not
>> sufficient.
> Then I had better leave the ECMP issue for others to answer.
> The definition of U can be better corrected as follows (improved
> rewording of my previous email):
> U represents the average ratio of PCN-supportable-rate to
> PCN-admissible-rate over all the links of the PCN-domain.
> ->
> U is a domain-wide constant which implicitly defines the
> PCN-supportable-rate by U*PCN-admissible-rate on all links of the PCN
> domain.
> Best wishes,
> Michael
>> Yours,
>> Joel
>> On 1/2/2012 11:54 AM, Michael Menth wrote:
>>> Hi Tom, hi Joel,
>>> I wish you a happy new year!
>>> Here are my comments to address Joel's concerns:
>>> ====================================================================
>>> The issue with ECMP: I'd add a comment that CL and SM should not be used in
>>> the presence of ECMP if routing information is used to determine
>>> ingress-egress-aggregates since this seems to be messy and error-prone.
>>> ====================================================================
>>> The following text may clarify at the beginning of Section 3.3.2 the
>>> relation
>>> between admission control and flow termination to address one of Joel's
>>> comments (for both SM and CL):
>>> In the presence of light pre-congestion, i.e., in the presence of a
>>> small,
>>> positive ETM-rate (relative to the overall PCN traffic rate), new
>>> flows may
>>> already be blocked. However, in the presence of heavy pre-congestion,
>>> i.e.,
>>> in the presence of a relatively large ETM-rate, termination of some
>>> admitted
>>> flows is required. Thus, flow blocking is a logical prerequisite for flow
>>> termination.
>>> ====================================================================
>>> The following sentence in 3.3.2 should be corrected (only SM-specific):
>>> U represents the average ratio of PCN-supportable-rate to
>>> PCN-admissible-rate
>>> over all the links of the PCN-domain.
>>> ->
>>> U represents the ratio of PCN-supportable-rate to PCN-admissible-rate
>>> for all
>>> the links of the PCN-domain.
>>> ====================================================================
>>> I also recommend changing the following text as I think it may cause
>>> misinterpretations (applies both to SM and CL):
>>> If the difference calculated in the second step is positive, the
>>> Decision
>>> Point SHOULD select PCN-flows to terminate, until it determines that the
>>> PCN-traffic admission rate will no longer be greater than the estimated
>>> sustainable aggregate rate. If the Decision Point knows the bandwidth
>>> required by individual PCN-flows (e.g., from resource signalling used to
>>> establish the flows), it MAY choose to complete its selection of
>>> PCN-flows to
>>> terminate in a single round of decisions.
>>> Alternatively, the Decision Point MAY spread flow termination over
>>> multiple
>>> rounds to avoid over-termination. If this is done, it is RECOMMENDED
>>> that
>>> enough time elapse between successive rounds of termination to allow the
>>> effects of previous rounds to be reflected in the measurements upon
>>> which the
>>> termination decisions are based. (See [IEEE-Satoh] and sections 4.2
>>> and 4.3
>>> of [MeLe10].)
>>> ->
>>> If the difference calculated in the second step is positive (traffic
>>> rate to
>>> be terminated), the Decision Point SHOULD select PCN-flows to
>>> terminate. To
>>> that end, the Decision Point MAY use upper rate limits for individual
>>> PCN-flows (e.g., from resource signalling used to establish the
>>> flows) and
>>> select a set of flows whose sum of upper rate limits is up to the
>>> traffic
>>> rate to be terminated. Then, these flows are terminated. The use of
>>> upper
>>> limits on flow rates avoids over-termination.
>>> Termination may be continuously needed after consecutive measurement
>>> intervals for various
>>> reasons, e.g., if the used upper rate limits overestimate the actual
>>> flow rates.
>>> For such cases it is RECOMMENDED that enough time elapses between
>>> successive
>>> termination events to allow the effects of previous termination events
>>> to be
>>> reflected in the measurements upon which the termination decisions are
>>> based;
>>> otherwise, over-termination may occur. See [IEEE-Satoh] and Sections 4.2
>>> and
>>> 4.3 of [MeLe10].
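
A minimal sketch of the selection step proposed above, added for illustration
and not taken from the draft. It assumes the Decision Point holds a mapping
from flow identifiers to upper rate limits, and it picks flows largest-first;
that ordering, the function name, and the parameter names are all assumptions,
since the proposed text does not prescribe a selection policy. The only
property it is meant to show is that the summed upper rate limits of the
selected flows never exceed the traffic rate to be terminated.

```python
def select_flows_to_terminate(flow_upper_limits, rate_to_terminate):
    """Pick flows whose upper rate limits sum to at most rate_to_terminate.

    flow_upper_limits: dict of flow id -> upper rate limit (hypothetical input).
    rate_to_terminate: the positive difference from the second step.
    Largest-first ordering is an assumption, not something the draft requires.
    """
    selected = []
    remaining = rate_to_terminate
    by_limit = sorted(flow_upper_limits.items(),
                      key=lambda item: item[1], reverse=True)
    for flow_id, limit in by_limit:
        # Skip any flow that would push the total past the target rate.
        if limit <= remaining:
            selected.append(flow_id)
            remaining -= limit
    return selected


if __name__ == "__main__":
    flows = {"a": 5, "b": 3, "c": 2, "d": 4}
    print(select_flows_to_terminate(flows, 8))
```

Because upper limits can overestimate actual flow rates, a single pass may
terminate less traffic than intended, which is the reason the proposed text
allows further termination after later measurement intervals.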
>>> ====================================================================
>>> [IEEE-Satoh] is not a good key for Daisuke's work as the prefix "IEEE"
>>> makes it look like a reference to a standards document.
>>> You had better use [SaUe10] or [Satoh10]. Applies both to CL and SM.
>>> Best wishes,
>>> Michael
>>> Am 02.01.2012 15:21, schrieb Tom Taylor:
>>>> It shall be as you say, subject to comment from my co-authors when
>>>> they get back from holiday.
>>>> On 01/01/2012 5:43 PM, Joel M. Halpern wrote:
>>>>> In-line...
>>>>> On 1/1/2012 4:06 PM, Tom Taylor wrote:
>>>>>> On 01/01/2012 2:58 PM, Joel M. Halpern wrote:
>>>>>>> Thank you for responding promptly Tom. Let me try to elaborate on
>>>>>>> the
>>>>>>> two issues where I was unclear.
>>>>>>> On the ingress-egress-aggregate issue and ECMP, the concern I
>>>>>>> have is
>>>>>>> relative to the third operational alternative where routing is
>>>>>>> used to
>>>>>>> determine where the ingress and egress of a flow is. To be blunt,
>>>>>>> as far
>>>>>>> as I can tell this does not work.
>>>>>>> 1) It does not work on the ingress side because traffic from a given
>>>>>>> source prefix can come in at multiple places. Some of these
>>>>>>> places may
>>>>>>> claim reachability to the source prefix. Some may not. While a given
>>>>>>> flow will use only one of these paths, there is no way to determine
>>>>>>> from
>>>>>>> routing information, at the egress, which ingress that flow used.
>>>>>>> 2) A site may use multiple exits for a given destination prefix.
>>>>>>> Again,
>>>>>>> while the site will only use one of these egresses for a given flow,
>>>>>>> there is no way for the ingress to know which egress it will be
>>>>>>> on the
>>>>>>> basis of routing information.
>>>>>>> Thus, the text seems to allow for a behavior that simply does not
>>>>>>> work.
>>>>>> [PTT] I think the disconnect here is that you read the text to say
>>>>>> that
>>>>>> an individual node uses routing information to determine the IEA.
>>>>>> That
>>>>>> wasn't the intention. Instead, administrators use routing
>>>>>> information to
>>>>>> derive filters that are installed at the ingress and egress nodes.
>>>>> As far as I can tell, your response describes a situation even less
>>>>> effective than what I assumed.
>>>>> Firstly, it does not matter whether it is the edge node, the decision
>>>>> node, or the human administrator. Routing information is not enough to
>>>>> determine what the ingress-egress pairing is. The problems I describe
>>>>> above apply no matter who is making the decision.
>>>>> Secondly, having a human make the decision means that as soon as
>>>>> routing
>>>>> changes, the configured filters are wrong.
>>>>> I would suggest that the text in question be removed, and replaced
>>>>> with
>>>>> a warning against attempting what is currently described.
>>> My view is also that CL and SM do not work in the presence of ECMP. This
>>> should be indicated as a warning.
>>>>>>> I am still confused about the relationship of section 3.3.2 to the
>>>>>>> behavior you describe. 3.3.2 says that as long as any excess
>>>>>>> traffic is
>>>>>>> being reported, the decision point shall direct the blocking of
>>>>>>> additional flows. That does not match 3.3.1, and does not match your
>>>>>>> description.
>>>>>> [PTT] I can't see the text in section 3.3.2 that says you continue to
>>>>>> block as long as any excess traffic is being reported. What I
>>>>>> think it
>>>>>> says is that as long as excess traffic is reported, the decision
>>>>>> point
>>>>>> checks to see whether the traffic being admitted to the aggregate
>>>>>> exceeds the supportable level. Excess traffic may be non-zero, yet no
>>>>>> termination may be required (i.e., traffic is below the second
>>>>>> threshold).
>>>>> I think I see what you are saying. If I am reading this correctly, the
>>>>> decision process must re-calculate to determine if there is
>>>>> termination
>>>>> every time it receives a report with non-zero excess and the port is
>>>>> already blocked. But it does not have to actually block anything.
>>>>> This however seems to depend upon the correct relative
>>>>> configuration of
>>>>> the limit that flips it into blocked state, the value of U, and maybe
>>>>> some other values.
>>>>> Put differently, I understand that the two are not contradictory.
>>>>> However, since the two things use different calculations, it is not at
>>>>> all clear that they are consistent. This may well be acceptable.
>>>>> But the
>>>>> difference in methods is likely to lead to confusion. So, as a minor
>>>>> (rather than major) comment, I would suggest that you provide
>>>>> clarifying
>>>>> text explaining why it is okay to use one condition to decide if there
>>>>> is blocking, but a different condition (which could produce a lower
>>>>> threshold) to decide how much to get rid of.
>>>>> Yours,
>>>>> Joel
>>>>>>> Yours,
>>>>>>> Joel
>>>>>>> On 1/1/2012 2:48 PM, Tom Taylor wrote:
>>>>>>>> Thanks for the review, Joel. Comments below, marked with [PTT].
>>>>>>>> On 31/12/2011 4:50 PM, Joel M. Halpern wrote:
>>>>>>>>> I am the assigned Gen-ART reviewer for this draft. For
>>>>>>>>> background on
>>>>>>>>> Gen-ART, please see the FAQ at
>>>>>>>>> <>.
>>>>>>>>> Please resolve these comments along with any other Last Call
>>>>>>>>> comments
>>>>>>>>> you may receive.
>>>>>>>>> Document: draft-ietf-pcn-sm-edge-behaviour-08
>>>>>>>>> PCN Boundary Node Behaviour for the Single Marking (SM) Mode of
>>>>>>>>> Operation
>>>>>>>>> Reviewer: Joel M. Halpern
>>>>>>>>> Review Date: 31-Dec-2011
>>>>>>>>> IETF LC End Date: 13-Jan-2012
>>>>>>>>> IESG Telechat date: N/A
>>>>>>>>> Summary: This document is almost ready for publication as an
>>>>>>>>> Informational RFC.
>>>>>>>>> Question: Given that the document defines a complex set of
>>>>>>>>> behaviors,
>>>>>>>>> which are mandatory for compliant systems, it seems that this
>>>>>>>>> ought to
>>>>>>>>> be Experimental rather than Informational. It describes something
>>>>>>>>> that
>>>>>>>>> could, in theory, later become standards track.
>>>>>>>> [PTT] OK, we've wobbled on this one, but we can follow your
>>>>>>>> suggestion.
>>>>>>>>> Major issues:
>>>>>>>>> Section 2 on Assumed Core Network Behavior for SM, in the third
>>>>>>>>> bullet,
>>>>>>>>> states that the PCN-domain satisfies the conditions specified
>>>>>>>>> in RFC
>>>>>>>>> 5696. Unfortunately, looking at RFC 5696 I cannot tell what
>>>>>>>>> conditions
>>>>>>>>> these are. Is this supposed to be a reference to RFC 5559
>>>>>>>>> instead? No
>>>>>>>>> matter which document it is referencing, please be more specific
>>>>>>>>> about
>>>>>>>>> which section / conditions are meant.
>>>>>>>> [PTT] You are right that RFC 5696 isn't relevant. It's such a long
>>>>>>>> time
>>>>>>>> since that text was written that I can't recall what the intention
>>>>>>>> was.
>>>>>>>> My inclination at the moment is simply to delete the bullet.
>>>>>>>>> It would have been helpful if the early part of the document
>>>>>>>>> indicated
>>>>>>>>> that the edge node information about how to determine
>>>>>>>>> ingress-egress-aggregates was described in section 5.
>>>>>>>>> In conjunction with that, section 5.1.2, third paragraph, seems to
>>>>>>>>> describe an option which does not seem to quite work. After
>>>>>>>>> describing
>>>>>>>>> how to use tunneling, and how to work with signaling, the text
>>>>>>>>> refers to
>>>>>>>>> inferring the ingress-egress-aggregate from the routing
>>>>>>>>> information. In
>>>>>>>>> the presence of multiple equal-cost domain exits (which does
>>>>>>>>> occur in
>>>>>>>>> reality), the routing table is not sufficient information to make
>>>>>>>>> this
>>>>>>>>> determination. Unless I am very confused (which does happen) this
>>>>>>>>> seems
>>>>>>>>> to be a serious hole in the specification.
>>>>>>>> [PTT] I'm not sure what the issue is here. As I understand it,
>>>>>>>> operators
>>>>>>>> don't assign packets randomly to a given path in the presence of
>>>>>>>> alternatives -- they choose one based on values in the packet
>>>>>>>> header.
>>>>>>>> The basic intent is that packets of a given microflow all follow
>>>>>>>> the
>>>>>>>> same path, to prevent unnecessary reordering and minimize
>>>>>>>> jitter. The
>>>>>>>> implication is that filters can be defined at the ingress nodes to
>>>>>>>> identify the packets in a given ingress-egress-aggregate (i.e.
>>>>>>>> flowing
>>>>>>>> from a specific ingress node to a specific egress node) based on
>>>>>>>> their
>>>>>>>> header contents. The filters to do the same job at egress nodes
>>>>>>>> are a
>>>>>>>> different problem, but they are not affected by ECMP.
>>>>>>>>> Minor issues:
>>>>>>>>> Section 3.3.1 states that the "block" decision occurs when the CLE
>>>>>>>>> (excess over total) rate exceeds the configured limit. However,
>>>>>>>>> section
>>>>>>>>> 3.3.2 states that the decision node must take further steps if
>>>>>>>>> the
>>>>>>>>> excess rate is non-zero in further reports. Is this inconsistency
>>>>>>>>> deliberate? If so, please explain. If not, please fix. (If it is
>>>>>>>>> important to drive the excess rate to 0, then why is action only
>>>>>>>>> initiated when the ratio is above a configured value, rather than
>>>>>>>>> any
>>>>>>>>> non-zero value? I can conceive of various reasons. But none are
>>>>>>>>> stated.)
>>>>>>>> [PTT] We aren't driving the excess rate to zero, but to a value
>>>>>>>> equal to
>>>>>>>> something less than (U - 1)/U. (The "something less" is because of
>>>>>>>> packet dropping at interior nodes.) The assumption is that (U -
>>>>>>>> 1)/U is
>>>>>>>> greater than CLE-limit. Conceptually, PCN uses two thresholds.
>>>>>>>> When the
>>>>>>>> CLE is below the first threshold, new flows are admitted. Above
>>>>>>>> that
>>>>>>>> threshold, they are blocked. When the CLE is above the second
>>>>>>>> threshold,
>>>>>>>> flows are terminated to bring them down to that threshold. In
>>>>>>>> the SM
>>>>>>>> mode of operation, the first threshold is specified directly on a
>>>>>>>> per-link basis by the value CLE-limit. The second threshold is
>>>>>>>> specified
>>>>>>>> by the same value (U - 1)/U for all links. With the CL mode of
>>>>>>>> operation
>>>>>>>> the second threshold is also specified directly for each link.
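
The two-threshold picture described above can be sketched as follows. This is
an illustrative simplification, not the procedure in the draft: it assumes
idealized metering with no packet loss, so that the excess-marked rate on a
link is simply the total PCN rate minus the PCN-admissible-rate, and the
function and parameter names are hypothetical. Under that assumption, when the
total rate equals the PCN-supportable-rate U * PCN-admissible-rate, the CLE is
(U*A - A) / (U*A) = (U - 1)/U, which is where the second threshold comes from.

```python
def cle(total_rate, admissible_rate):
    """Congestion level estimate: excess-marked share of total PCN traffic.

    Assumes idealized marking: everything above admissible_rate is
    excess-marked and nothing is dropped at interior nodes.
    """
    if total_rate <= 0:
        return 0.0
    excess = max(0.0, total_rate - admissible_rate)
    return excess / total_rate


def decide(cle_value, cle_limit, u):
    """Map a CLE value onto the conceptual admit / block / terminate states."""
    termination_threshold = (u - 1.0) / u  # second threshold, same for all links
    if cle_value > termination_threshold:
        return "terminate"
    if cle_value > cle_limit:  # first threshold, configured as CLE-limit
        return "block"
    return "admit"


if __name__ == "__main__":
    admissible, u, cle_limit = 100.0, 1.25, 0.05
    for total in (90.0, 110.0, 150.0):
        print(total, decide(cle(total, admissible), cle_limit, u))
```

The sketch only behaves sensibly when (U - 1)/U is greater than CLE-limit,
which is the assumption stated in the reply above.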
>>>>>>>>> Nits/editorial comments: