Re: [Gen-art] Review: draft-ietf-pcn-sm-edge-behaviour-08

Russ Housley <> Mon, 12 March 2012 22:35 UTC

From: Russ Housley <>
Date: Mon, 12 Mar 2012 14:28:42 -0400
To: Joel M. Halpern <>
Cc:, Steven Blake <>, Michael Menth <>,, Bob Briscoe <>, Tom Taylor <>, David Harrington <>

I am very confused about the state of this.  My skimming of the thread seems to indicate at least one unresolved issue.


On Jan 2, 2012, at 1:04 PM, Joel M. Halpern wrote:

> The clarification on U is very helpful.  I look forward to comments from others on the routing based behavior / ECMP text removal / replacement question.
> On 1/2/2012 12:58 PM, Michael Menth wrote:
>> Hi Joel, hi Tom,
>> Am 02.01.2012 18:18, schrieb Joel M. Halpern:
>>> Michael, I am not sure what to make of your recommended text about ECMP.
>>> ECMP is used by almost all operators. It is generally considered a
>>> necessary tool in the tool-kit.
>>> More significantly, at least for the egress understanding of the
>>> ingress, it is not even the single operator's ECMP, but other
>>> operators' selections of paths that produce the issue. So even in the
>>> unlikely event that this operator does not use ECMP, it still is not
>>> sufficient.
>> Then I had better leave the ECMP issue for others to answer.
>> The definition of U can be better corrected as follows (improved
>> rewording of my previous email):
>> U represents the average ratio of PCN-supportable-rate to
>> PCN-admissible-rate over all the links of the PCN-domain.
>> ->
>> U is a domain-wide constant which implicitly defines the
>> PCN-supportable-rate by U*PCN-admissible-rate on all links of the PCN
>> domain.
>> Best wishes,
>> Michael
>>> Yours,
>>> Joel
>>> On 1/2/2012 11:54 AM, Michael Menth wrote:
>>>> Hi Tom, hi Joel,
>>>> I wish you a happy new year!
>>>> Here are my comments to address Joel's concerns:
>>>> ====================================================================
>>>> The issue with ECMP: I'd add a comment that CL and SM should not be
>>>> used in the presence of ECMP if routing information is used to
>>>> determine ingress-egress-aggregates, since this seems to be messy and
>>>> error-prone.
>>>> ====================================================================
>>>> The following text may clarify at the beginning of Section 3.3.2 the
>>>> relation
>>>> between admission control and flow termination to address one of Joel's
>>>> comments (for both SM and CL):
>>>> In the presence of light pre-congestion, i.e., in the presence of a
>>>> small,
>>>> positive ETM-rate (relative to the overall PCN traffic rate), new
>>>> flows may
>>>> already be blocked. However, in the presence of heavy pre-congestion,
>>>> i.e.,
>>>> in the presence of a relatively large ETM-rate, termination of some
>>>> admitted
>>>> flows is required. Thus, flow blocking is a logical prerequisite for flow
>>>> termination.
>>>> ====================================================================
>>>> The following sentence in 3.3.2 should be corrected (only SM-specific):
>>>> U represents the average ratio of PCN-supportable-rate to
>>>> PCN-admissible-rate
>>>> over all the links of the PCN-domain.
>>>> ->
>>>> U represents the ratio of PCN-supportable-rate to PCN-admissible-rate
>>>> for all
>>>> the links of the PCN-domain.
>>>> ====================================================================
>>>> I also recommend changing the following text as I think it may cause
>>>> misinterpretations (applies both to SM and CL):
>>>> If the difference calculated in the second step is positive, the
>>>> Decision
>>>> Point SHOULD select PCN-flows to terminate, until it determines that the
>>>> PCN-traffic admission rate will no longer be greater than the estimated
>>>> sustainable aggregate rate. If the Decision Point knows the bandwidth
>>>> required by individual PCN-flows (e.g., from resource signalling used to
>>>> establish the flows), it MAY choose to complete its selection of
>>>> PCN-flows to
>>>> terminate in a single round of decisions.
>>>> Alternatively, the Decision Point MAY spread flow termination over
>>>> multiple
>>>> rounds to avoid over-termination. If this is done, it is RECOMMENDED
>>>> that
>>>> enough time elapse between successive rounds of termination to allow the
>>>> effects of previous rounds to be reflected in the measurements upon
>>>> which the
>>>> termination decisions are based. (See [IEEE-Satoh] and sections 4.2
>>>> and 4.3
>>>> of [MeLe10].)
>>>> ->
>>>> If the difference calculated in the second step is positive (traffic
>>>> rate to
>>>> be terminated), the Decision Point SHOULD select PCN-flows to
>>>> terminate. To
>>>> that end, the Decision Point MAY use upper rate limits for individual
>>>> PCN-flows (e.g., from resource signalling used to establish the
>>>> flows) and
>>>> select a set of flows whose sum of upper rate limits is up to the
>>>> traffic
>>>> rate to be terminated. Then, these flows are terminated. The use of
>>>> upper
>>>> limits on flow rates avoids over-termination.
>>>> Termination may be continuously needed after consecutive measurement
>>>> intervals for various
>>>> reasons, e.g., if the used upper rate limits overestimate the actual
>>>> flow rates.
>>>> For such cases it is RECOMMENDED that enough time elapses between
>>>> successive
>>>> termination events to allow the effects of previous termination events
>>>> to be
>>>> reflected in the measurements upon which the termination decisions are
>>>> based;
>>>> otherwise, over-termination may occur. See [IEEE-Satoh] and Sections 4.2
>>>> and
>>>> 4.3 of [MeLe10].
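
The selection step in the proposed text above can be sketched as follows. This is a minimal illustration only, not text from the draft: the function name, the flow identifiers, and the largest-first ordering are assumptions, since the proposal does not say in which order flows are considered. The traffic rate to be terminated is taken as an input rather than computed here.

```python
from typing import Dict, List


def select_flows_to_terminate(upper_rate_limits: Dict[str, float],
                              rate_to_terminate: float) -> List[str]:
    """Pick flows whose summed upper rate limits stay within rate_to_terminate.

    Because upper limits (not measured rates) are summed, the traffic actually
    removed cannot exceed rate_to_terminate, which is how the proposed text
    avoids over-termination.
    """
    selected: List[str] = []
    remaining = rate_to_terminate
    # Assumption: consider the largest flows first; the text prescribes no order.
    ordered = sorted(upper_rate_limits.items(),
                     key=lambda item: item[1], reverse=True)
    for flow_id, limit in ordered:
        if limit <= remaining:
            selected.append(flow_id)
            remaining -= limit
    return selected
```

If the upper limits overestimate the real flow rates, one pass removes less traffic than required, which is the case where the proposed text recommends waiting for fresh measurements before terminating again.
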
>>>> ====================================================================
>>>> [IEEE-Satoh] is not a good key for Daisuke's work as the prefix "IEEE"
>>>> makes it look like a reference to a standards document.
>>>> You had better use [SaUe10] or [Satoh10]. This applies to both CL and SM.
>>>> Best wishes,
>>>> Michael
>>>> Am 02.01.2012 15:21, schrieb Tom Taylor:
>>>>> It shall be as you say, subject to comment from my co-authors when
>>>>> they get back from holiday.
>>>>> On 01/01/2012 5:43 PM, Joel M. Halpern wrote:
>>>>>> In-line...
>>>>>> On 1/1/2012 4:06 PM, Tom Taylor wrote:
>>>>>>> On 01/01/2012 2:58 PM, Joel M. Halpern wrote:
>>>>>>>> Thank you for responding promptly Tom. Let me try to elaborate on
>>>>>>>> the
>>>>>>>> two issues where I was unclear.
>>>>>>>> On the ingress-egress-aggregate issue and ECMP, the concern I
>>>>>>>> have is
>>>>>>>> relative to the third operational alternative where routing is
>>>>>>>> used to
>>>>>>>> determine where the ingress and egress of a flow are. To be blunt,
>>>>>>>> as far
>>>>>>>> as I can tell this does not work.
>>>>>>>> 1) It does not work on the ingress side because traffic from a given
>>>>>>>> source prefix can come in at multiple places. Some of these
>>>>>>>> places may
>>>>>>>> claim reachability to the source prefix. Some may not. While a given
>>>>>>>> flow will use only one of these paths, there is no way to determine
>>>>>>>> from
>>>>>>>> routing information, at the egress, which ingress that flow used.
>>>>>>>> 2) A site may use multiple exits for a given destination prefix.
>>>>>>>> Again,
>>>>>>>> while the site will only use one of these egresses for a given flow,
>>>>>>>> there is no way for the ingress to know which egress it will be
>>>>>>>> on the
>>>>>>>> basis of routing information.
>>>>>>>> Thus, the text seems to allow for a behavior that simply does not
>>>>>>>> work.
>>>>>>> [PTT] I think the disconnect here is that you read the text to say
>>>>>>> that
>>>>>>> an individual node uses routing information to determine the IEA.
>>>>>>> That
>>>>>>> wasn't the intention. Instead, administrators use routing
>>>>>>> information to
>>>>>>> derive filters that are installed at the ingress and egress nodes.
>>>>>> As far as I can tell, your response describes a situation even less
>>>>>> effective than what I assumed.
>>>>>> Firstly, it does not matter whether it is the edge node, the decision
>>>>>> node, or the human administrator. Routing information is not enough to
>>>>>> determine what the ingress-egress pairing is. The problems I describe
>>>>>> above apply no matter who is making the decision.
>>>>>> Secondly, having a human make the decision means that as soon as
>>>>>> routing
>>>>>> changes, the configured filters are wrong.
>>>>>> I would suggest that the text in question be removed, and replaced
>>>>>> with
>>>>>> a warning against attempting what is currently described.
>>>> My view is also that CL and SM do not work in the presence of ECMP. This
>>>> should be indicated as a warning.
>>>>>>>> I am still confused about the relationship of section 3.3.2 to the
>>>>>>>> behavior you describe. 3.3.2 says that as long as any excess
>>>>>>>> traffic is
>>>>>>>> being reported, the decision point shall direct the blocking of
>>>>>>>> additional flows. That does not match 3.3.1, and does not match your
>>>>>>>> description.
>>>>>>> [PTT] I can't see the text in section 3.3.2 that says you continue to
>>>>>>> block as long as any excess traffic is being reported. What I
>>>>>>> think it
>>>>>>> says is that as long as excess traffic is reported, the decision
>>>>>>> point
>>>>>>> checks to see whether the traffic being admitted to the aggregate
>>>>>>> exceeds the supportable level. Excess traffic may be non-zero, yet no
>>>>>>> termination may be required (i.e., traffic is below the second
>>>>>>> threshold).
>>>>>> I think I see what you are saying. If I am reading this correctly, the
>>>>>> decision process must re-calculate to determine if there is
>>>>>> termination
>>>>>> every time it receives a report with non-zero excess and the port is
>>>>>> already blocked. But it does not have to actually block anything.
>>>>>> This however seems to depend upon the correct relative
>>>>>> configuration of
>>>>>> the limit that flips it into blocked state, the value of U, and maybe
>>>>>> some other values.
>>>>>> Put differently, I understand that the two are not contradictory.
>>>>>> However, since the two things use different calculations, it is not at
>>>>>> all clear that they are consistent. This may well be acceptable.
>>>>>> But the
>>>>>> difference in methods is likely to lead to confusion. So, as a minor
>>>>>> (rather than major) comment, I would suggest that you provide
>>>>>> clarifying
>>>>>> text explaining why it is okay to use one condition to decide if there
>>>>>> is blocking, but a different condition (which could produce a lower
>>>>>> threshold) to decide how much to get rid of.
>>>>>> Yours,
>>>>>> Joel
>>>>>>>> Yours,
>>>>>>>> Joel
>>>>>>>> On 1/1/2012 2:48 PM, Tom Taylor wrote:
>>>>>>>>> Thanks for the review, Joel. Comments below, marked with [PTT].
>>>>>>>>> On 31/12/2011 4:50 PM, Joel M. Halpern wrote:
>>>>>>>>>> I am the assigned Gen-ART reviewer for this draft. For
>>>>>>>>>> background on
>>>>>>>>>> Gen-ART, please see the FAQ at
>>>>>>>>>> <>.
>>>>>>>>>> Please resolve these comments along with any other Last Call
>>>>>>>>>> comments
>>>>>>>>>> you may receive.
>>>>>>>>>> Document: draft-ietf-pcn-sm-edge-behaviour-08
>>>>>>>>>> PCN Boundary Node Behaviour for the Single Marking (SM) Mode of
>>>>>>>>>> Operation
>>>>>>>>>> Reviewer: Joel M. Halpern
>>>>>>>>>> Review Date: 31-Dec-2011
>>>>>>>>>> IETF LC End Date: 13-Jan-2012
>>>>>>>>>> IESG Telechat date: N/A
>>>>>>>>>> Summary: This document is almost ready for publication as an
>>>>>>>>>> Informational RFC.
>>>>>>>>>> Question: Given that the document defines a complex set of
>>>>>>>>>> behaviors,
>>>>>>>>>> which are mandatory for compliant systems, it seems that this
>>>>>>>>>> ought to
>>>>>>>>>> be Experimental rather than Informational. It describes something
>>>>>>>>>> that
>>>>>>>>>> could, in theory, later become standards track.
>>>>>>>>> [PTT] OK, we've wobbled on this one, but we can follow your
>>>>>>>>> suggestion.
>>>>>>>>>> Major issues:
>>>>>>>>>> Section 2 on Assumed Core Network Behavior for SM, in the third
>>>>>>>>>> bullet,
>>>>>>>>>> states that the PCN-domain satisfies the conditions specified
>>>>>>>>>> in RFC
>>>>>>>>>> 5696. Unfortunately, looking at RFC 5696 I cannot tell what
>>>>>>>>>> conditions
>>>>>>>>>> these are. Is this supposed to be a reference to RFC 5559
>>>>>>>>>> instead? No
>>>>>>>>>> matter which document it is referencing, please be more specific
>>>>>>>>>> about
>>>>>>>>>> which section / conditions are meant.
>>>>>>>>> [PTT] You are right that RFC 5696 isn't relevant. It's such a long
>>>>>>>>> time
>>>>>>>>> since that text was written that I can't recall what the intention
>>>>>>>>> was.
>>>>>>>>> My inclination at the moment is simply to delete the bullet.
>>>>>>>>>> It would have been helpful if the early part of the document
>>>>>>>>>> indicated
>>>>>>>>>> that the edge node information about how to determine
>>>>>>>>>> ingress-egress-aggregates was described in section 5.
>>>>>>>>>> In conjunction with that, section 5.1.2, third paragraph, seems to
>>>>>>>>>> describe an option which does not seem to quite work. After
>>>>>>>>>> describing
>>>>>>>>>> how to use tunneling, and how to work with signaling, the text
>>>>>>>>>> refers to
>>>>>>>>>> inferring the ingress-egress-aggregate from the routing
>>>>>>>>>> information. In
>>>>>>>>>> the presence of multiple equal-cost domain exits (which does
>>>>>>>>>> occur in
>>>>>>>>>> reality), the routing table is not sufficient information to make
>>>>>>>>>> this
>>>>>>>>>> determination. Unless I am very confused (which does happen) this
>>>>>>>>>> seems
>>>>>>>>>> to be a serious hole in the specification.
>>>>>>>>> [PTT] I'm not sure what the issue is here. As I understand it,
>>>>>>>>> operators
>>>>>>>>> don't assign packets randomly to a given path in the presence of
>>>>>>>>> alternatives -- they choose one based on values in the packet
>>>>>>>>> header.
>>>>>>>>> The basic intent is that packets of a given microflow all follow
>>>>>>>>> the
>>>>>>>>> same path, to prevent unnecessary reordering and minimize
>>>>>>>>> jitter. The
>>>>>>>>> implication is that filters can be defined at the ingress nodes to
>>>>>>>>> identify the packets in a given ingress-egress-aggregate (i.e.
>>>>>>>>> flowing
>>>>>>>>> from a specific ingress node to a specific egress node) based on
>>>>>>>>> their
>>>>>>>>> header contents. The filters to do the same job at egress nodes
>>>>>>>>> are a
>>>>>>>>> different problem, but they are not affected by ECMP.
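
A minimal sketch of hash-based path selection may make the two positions above concrete. The CRC32 hash, the field layout of the flow key, and the egress names are assumptions for illustration only; deployed routers use their own hash functions and inputs. The sketch shows that every packet of a microflow maps to the same egress, while the routing table itself only supplies the list of equal-cost candidates.

```python
import zlib
from typing import Sequence, Tuple

# Hypothetical flow key: source, destination, protocol, source port, dest port.
FlowKey = Tuple[str, str, int, int, int]


def pick_egress(flow: FlowKey, equal_cost_egresses: Sequence[str]) -> str:
    """Map a microflow onto one of several equal-cost egresses by hashing its header."""
    if not equal_cost_egresses:
        raise ValueError("at least one egress is required")
    # Serialise the header fields and hash them; CRC32 is an assumed stand-in.
    key = "|".join(str(field) for field in flow).encode("ascii")
    index = zlib.crc32(key) % len(equal_cost_egresses)
    return equal_cost_egresses[index]
```

Predicting the result for a given flow requires knowing the hash function and the order of the candidate list on the forwarding node, neither of which appears in the routing information.
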
>>>>>>>>>> Minor issues:
>>>>>>>>>> Section 3.3.1 states that the "block" decision occurs when the CLE
>>>>>>>>>> (excess over total) rate exceeds the configured limit. However,
>>>>>>>>>> section
>>>>>>>>>> 3.3.2 states that the decision node must take further steps if
>>>>>>>>>> the
>>>>>>>>>> excess rate is non-zero in further reports. Is this inconsistency
>>>>>>>>>> deliberate? If so, please explain. If not, please fix. (If it is
>>>>>>>>>> important to drive the excess rate to 0, then why is action only
>>>>>>>>>> initiated when the ratio is above a configured value, rather than
>>>>>>>>>> any
>>>>>>>>>> non-zero value? I can conceive of various reasons. But none are
>>>>>>>>>> stated.)
>>>>>>>>> [PTT] We aren't driving the excess rate to zero, but to a value
>>>>>>>>> equal to
>>>>>>>>> something less than (U - 1)/U. (The "something less" is because of
>>>>>>>>> packet dropping at interior nodes.) The assumption is that (U -
>>>>>>>>> 1)/U is
>>>>>>>>> greater than CLE-limit. Conceptually, PCN uses two thresholds.
>>>>>>>>> When the
>>>>>>>>> CLE is below the first threshold, new flows are admitted. Above
>>>>>>>>> that
>>>>>>>>> threshold, they are blocked. When the CLE is above the second
>>>>>>>>> threshold,
>>>>>>>>> flows are terminated to bring them down to that threshold. In
>>>>>>>>> the SM
>>>>>>>>> mode of operation, the first threshold is specified directly on a
>>>>>>>>> per-link basis by the value CLE-limit. The second threshold is
>>>>>>>>> specified
>>>>>>>>> by the same value (U - 1)/U for all links. With the CL mode of
>>>>>>>>> operation
>>>>>>>>> the second threshold is also specified directly for each link.
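
The two-threshold view described above can be sketched as follows. This is a conceptual illustration under stated assumptions, not the procedure of the draft: the function names are hypothetical, the handling of values exactly on a threshold is an arbitrary choice, and the "something less" correction for packet dropping at interior nodes is ignored.

```python
def termination_threshold(u: float) -> float:
    """Second threshold for the SM mode, (U - 1)/U, the same for all links."""
    if u <= 1.0:
        raise ValueError("U is assumed to be greater than 1")
    return (u - 1.0) / u


def classify(cle: float, cle_limit: float, u: float) -> str:
    """Return 'admit', 'block', or 'terminate' for a reported CLE value."""
    second = termination_threshold(u)
    # The explanation above assumes (U - 1)/U is greater than CLE-limit.
    if second <= cle_limit:
        raise ValueError("(U - 1)/U must be greater than CLE-limit")
    if cle <= cle_limit:
        return "admit"      # below the first threshold: new flows are admitted
    if cle <= second:
        return "block"      # between the thresholds: new flows are blocked
    return "terminate"      # above the second threshold: flows are terminated
```

With U = 2 the second threshold is 0.5, so a CLE-limit of 0.05 leaves a wide band in which new flows are blocked but no admitted flow is terminated.
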
>>>>>>>>>> Nits/editorial comments: