Re: [Pce] Preliminary feedback on I.-D. draft-pouyllau-pce-enhanced-errors-02.txt

"POUYLLAU, HELIA (HELIA)" <helia.pouyllau@alcatel-lucent.com> Fri, 23 July 2010 11:43 UTC

Dear Ramon,

Thank you for your feedback. I will try to comment on the main points you raise in your mail and will study your detailed comments later:

1) If I understand correctly, one of your concerns is about the use of error and notification type codes rather than TLVs. On that point you argue that in other documents the error-type relates to a family of errors and the value to a "refinement", whereas in the draft it relates to a kind of processing indication.

The proposal intends to allow the definition of specific errors and notifications, but within the frame of normalized processing information. So your proposal to define objects which could be combined in order to specify this processing is in line with the I-D objectives. I see no strong argument against it. Flags of the PCEP-ERROR and NOTIFICATION objects could also be used.

Any other opinion on that point?

2) You first comment that "an error should cause the shutdown of the connection, is an attribute of every particular concrete error pair, and belongs to its description". In the I-D, we have indeed defined some kind of "warning". This is because, when you read RFC5440, the fact that an error causes the shutdown of the connection is not always mentioned. But, as you say further, an error-type which causes the connection to close is rather information that the connection will be closed by the PCE which sent the error message (see sections 5.3 and 5.4 as illustrations). You are right in your comment on section 5: it is not clear whether a "warning" should be defined as an error or as a notification specific to a request. Hence, we have kept both options in the I-D, but it may well be that a single option would suffice.

3) Actually the use of the DLO is related to notification-type 5 of the I-D (propagation of the notification) and its purpose is to avoid flooding. Section 4.3.2 explains that it must be used with this type. Its use with other types is optional, and maybe some "SHOULD NOT" could be added for some types. About your last remark on how a PCC could be identified, this could indeed be achieved by pre-configuration. The purpose of the "not to PCC" option is to avoid flooding the routers of a routing domain which would appear in the DLO and which have the PCC feature but not the PCE one.
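
To make the "not to PCC" filtering concrete, here is a minimal sketch of how a node might select propagation targets from a DLO using a pre-configured role table. Everything in it is an assumption for illustration: the names `Role`, `select_targets` and `configured_roles` are hypothetical, the addresses are documentation addresses, and the choice to skip peers with an unknown role is not specified by the I-D.

```python
from enum import Enum


class Role(Enum):
    """Pre-configured role of a peer (hypothetical; PCEP itself does not signal it)."""
    PCC = "pcc"
    PCE = "pce"


def select_targets(dlo_addresses, configured_roles, not_to_pcc):
    """Return the DLO entries that should receive the propagated notification.

    dlo_addresses    -- addresses listed in the DLO, in order
    configured_roles -- dict mapping address -> Role, filled by pre-configuration
    not_to_pcc       -- True when PCC-only nodes must not be notified
    """
    targets = []
    for addr in dlo_addresses:
        role = configured_roles.get(addr)
        # Assumption: with "not to PCC" set, anything not known to be a PCE is skipped.
        if not_to_pcc and role is not Role.PCE:
            continue
        targets.append(addr)
    return targets


roles = {"192.0.2.1": Role.PCE, "192.0.2.2": Role.PCC, "192.0.2.3": Role.PCE}
dlo = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
print(select_targets(dlo, roles, not_to_pcc=True))
```

With `not_to_pcc=False` the function returns the DLO unchanged, which corresponds to notifying every listed node.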


Any other feedback on these issues and others is welcome.

Thanks again, Ramon, for your detailed comments; they will help to launch discussions in Maastricht and to progress on the I-D.

Best regards,
Hélia

-----Original Message-----
From: pce-bounces@ietf.org [mailto:pce-bounces@ietf.org] On behalf of Ramon Casellas
Sent: Thursday, 22 July 2010 12:34
To: pce@ietf.org
Subject: [Pce] Preliminary feedback on I.-D. draft-pouyllau-pce-enhanced-errors-02.txt

 > We have submitted an updated version of draft
 > http://www.ietf.org/id/draft-pouyllau-pce-enhanced-errors-02.txt
 > The major update is - as mentioned during IETF 77 meeting - a new
 > section describing some potential scenarios of usages of the error and
 > notification types specified by the draft.

Dear Draft authors, all

As requested, please find below some preliminary feedback on the draft. 
I am not aware of previous comments so apologies in advance if some of 
the questions I raise have been addressed.
I considered that some early feedback may be appropriate in view of 
Maastricht, but some of the points may not be valid due to my still 
limited comprehension of the draft. Some comments also reflect my 
subjective views :)


As a short executive summary
===========================

I would agree that extending PCEP to convey error / notification 
processing indications (fatal, not fatal, transient, and to be 
forwarded) and secondarily related information (including a list of 
potential targets) could be useful, but I see a few open issues in its 
current form, as detailed below. I also agree that the fact that 
different PCEs have different support for errors is a good use case for 
the applicability of the proposed extensions.

* First, the error hierarchy type/value as found in RFC5440 and other 
normative documents is no longer maintained. In those documents, the 
error type indicates a "family" of errors and the value is a 
"refinement". Instead, in the proposed I.-D. a given error type 
indicates a "processing indication" rather than the actual error cause, 
which is delegated to the error value alone. Did I miss something? 
(see point in Section 4)

* Second, I would think (as it seems the case in other errors) that the 
notion of "unrecoverable" and whether an error should cause the shutdown 
of the connection, is an attribute of every particular concrete error 
pair, and belongs to its description. For example, receiving a message 
with protocol version ">1" triggers an error and causes the PCEP 
connection to close. I would also think that closing the TCP/PCEP 
connection is a local decision, and I am not sure of the idea of "this 
is a serious error and _you_ should close the connection", rather "this 
is a serious error and _I_'ll be closing the connection" (on a side 
note, I always thought that the implication that the endpoint receiving 
a CLOSE message should close the connection does not prevent the PCE 
from closing it itself.)

* Third, the DLO seems a means to indicate the set of destinations for 
an error/notification but raises questions on how to map data (or even 
control) plane resources to PCEs responsible for them and the fact that 
in a chain of requests, the history is lost. I still fail to see 
concrete cases for DLO other than ("all your other peers") which is 
still not clear to me and could lead to flooding and more complex 
procedures.

In short, in my particular humble view, giving "context" to errors, where 
"context" means "processing indications" or "destination targets" or 
"more detailed, fine-grained information", could be done with, for 
example, TLVs rather than by reserving several error types for this 
purpose (as an illustrating example, we often add an ASCII_TLV to 
PCEP_ERROR objects to add some descriptive error cause). Ideally, the 
actual error (x/y) should be descriptive enough to state whether a) the 
error needs to be forwarded as is or not, b) the error should cause the 
connection to be shut down, and c) the error/notification could, 
eventually, be interpreted as a warning without implying that the 
request has been canceled.
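
As a minimal sketch of the TLV idea above (an illustration only, not taken from the draft or from any implementation), the code below attaches a "processing indication" TLV to the body of a PCEP-ERROR object. The body layout (Reserved, Flags, Error-Type, Error-value, then optional TLVs) and the TLV layout (16-bit type, 16-bit length, value padded to 4 bytes) follow RFC5440. The TLV type 0xFF01, the `Processing` flag names and their bit values are hypothetical and not assigned by IANA.

```python
import struct
from enum import IntFlag


class Processing(IntFlag):
    """Hypothetical processing indications carried in the TLV value."""
    PROPAGATE = 0x1      # a) forward the error as is
    CLOSE_SESSION = 0x2  # b) the sender will shut the connection down
    WARNING = 0x4        # c) a response may still follow


HYPOTHETICAL_TLV_TYPE = 0xFF01  # assumption: not an IANA-assigned code point


def encode_tlv(tlv_type, value):
    """Encode a PCEP TLV; Length excludes the padding to a 4-byte boundary."""
    padding = (-len(value)) % 4
    return struct.pack("!HH", tlv_type, len(value)) + value + b"\x00" * padding


def encode_pcep_error_body(error_type, error_value, processing):
    """Encode the PCEP-ERROR object body (without the common object header)."""
    fixed = struct.pack("!BBBB", 0, 0, error_type, error_value)
    tlv = encode_tlv(HYPOTHETICAL_TLV_TYPE, struct.pack("!I", int(processing)))
    return fixed + tlv


def decode_pcep_error_body(body):
    """Return (error_type, error_value, processing); unknown TLVs are skipped."""
    _reserved, _flags, error_type, error_value = struct.unpack("!BBBB", body[:4])
    processing = Processing(0)
    offset = 4
    while offset + 4 <= len(body):
        tlv_type, length = struct.unpack("!HH", body[offset:offset + 4])
        value = body[offset + 4:offset + 4 + length]
        if tlv_type == HYPOTHETICAL_TLV_TYPE and length == 4:
            processing = Processing(struct.unpack("!I", value)[0])
        offset += 4 + length + ((-length) % 4)
    return error_type, error_value, processing


body = encode_pcep_error_body(10, 1, Processing.WARNING | Processing.PROPAGATE)
print(decode_pcep_error_body(body))
```

The point of the sketch is that the error pair (here 10/1, chosen arbitrarily) keeps its family/refinement meaning, while the processing rules travel separately and could be applied to existing error pairs.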



Detailed review
=========================

Section 2.2.
----------------
* Section 2.2 is, in my opinion, a bit misleading and confusing, when 
discussing the dead-timer (although it does not impact the rest of the 
I.-D). The dead-timer is not related to the latency to get a response to 
a request (PCRep message), but the ability to detect a dead connection, 
which can be kept alive by means of KeepAlives. It is possible to have 
latencies in path computations higher than deadtimers, where KeepAlives 
are used accordingly. Thus, the text "If the PCC does not receive any 
reply before the dead timer is out" and the text "it supposes that the 
deadtimer is long enough to support end to end distributed path 
computation" are, imho, not right. The latency to get a Reply is 
implementation defined. These are two orthogonal aspects: the deadtimer 
is local to a given adjacency and can be very low, while the end-to-end 
path computation time is notably higher.

* I would suggest changing PCEReq and PCERep to PCReq and PCRep (which 
seems to be the common notation)

* Rather than stating "there are two types of Notifications", I would 
suggest "[RFC5440] defines two types of notifications".
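
To illustrate the dead-timer point in the first bullet of this section, here is a small sketch under simplifying assumptions: time is an integer number of seconds, the peer sends a Keepalive every `keepalive_interval` seconds until the reply, and the dead timer restarts on every received message. The function name and parameters are hypothetical.

```python
def dead_timer_expires(dead_timer, keepalive_interval, reply_latency):
    """Return True if the dead timer expires before the PCRep arrives.

    dead_timer         -- seconds without any received message before the
                          session is declared dead
    keepalive_interval -- seconds between Keepalives from the peer (> 0)
    reply_latency      -- seconds between sending the PCReq and receiving the PCRep
    """
    # Arrival times of messages from the peer: Keepalives, then the reply.
    arrivals = []
    t = keepalive_interval
    while t < reply_latency:
        arrivals.append(t)
        t += keepalive_interval
    arrivals.append(reply_latency)

    last_received = 0
    for arrival in arrivals:
        if arrival - last_received > dead_timer:
            return True  # nothing was received for longer than the dead timer
        last_received = arrival  # any received message restarts the dead timer
    return False


# A 4 s dead timer with 1 s Keepalives survives a 120 s path computation.
print(dead_timer_expires(dead_timer=4, keepalive_interval=1, reply_latency=120))
# With Keepalives only every 10 s, the same dead timer expires.
print(dead_timer_expires(dead_timer=4, keepalive_interval=10, reply_latency=120))
```

The reply latency only matters through the gaps between received messages, which is why the two quantities can be dimensioned independently.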

Section 2.3
----------------

* Also related to a bullet in Section 4, note that the RP (rplist) bound 
to PCEP_ERROR object is also optional, as the I-D says is the case for 
notifications.

Section 4
------------
* "PCE errors are always request specific". Why are you assuming this? 
The RP list is optional and nothing prevents you from adding new error 
types and new error values for, for example, PCE status. Clearly, 
RFC5440 states "The PCErr message is sent ... in response ... or 
unsolicited manner" (for what it is worth, the parsing and treatment of 
PCNtf and PCErr messages is quite similar in our case)

* Minor detail: Error Types 16 / 17 are already assigned to P2MP - it 
could be worth adding an IANA TBD

* Typo: must backward -> must send backwards / forwards

* One of my main concerns about using error types 16-19 is the following: 
in all other cases the error type conveys some kind of 
meaning/semantics. Error "X" means that e.g. "P2MP" has failed. Error 
types are a "family of errors" and error values are a "refinement". 
In OOP terms, one could conceive a hierarchy with a base class, error 
types inheriting from the base class, and error type/value pairs 
("concrete errors") as a further refinement. In the proposed approach 
this is no longer the case. 
Personally, I would be for the idea of extending PCEP for support of 
"status quo" / "propagation" as proposed in the draft, but by means of 
TLVs. Since PCEP_ERROR objects support optional TLVs, errors can be 
tagged with TLVs specifying error processing rules (which, in turn, 
could be applied to existing error pairs). In other words: processing 
indications can be conveyed by other means, while leaving error 
type/value pairs to indicate concrete errors (with a two level hierarchy).

* If an error / notification is a "warning" and a response is still to 
be expected, I wonder whether an alternative could be to extend the 
RBNF of PCRep to allow for such a warning to be "embedded" in the 
response, rather than sent beforehand in a separate PCErr message. 
Likewise, I would think that whether sending an error or a 
notification "cancels" the request ("no response is to be expected") 
or, on the contrary, "a response is to be expected" is either a 
property of a concrete error or notification pair or, in a more strict 
manner, PCEP maintains the motto: "a request causes a response or an 
error or a notification". The response can be positive (1+ paths) or 
negative (no path). An error always implies no response (and, as far 
as I know, so does a notification bound to an RP), and warnings are 
embedded in the response (as when adding an object with the I flag, 
which means "warning: I was not able to take this into account").
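
The two-level hierarchy and the idea that session closure and "response expected" are properties of each concrete pair can be sketched as a small registry. This is an illustration under assumptions: the family names for Error-Types 1 and 3 and the refinements shown are those of RFC5440, but the boolean properties attached to each pair, and the names `ErrorSpec`, `FAMILIES`, `REGISTRY` and `lookup`, are hypothetical and only show where such properties would live.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorSpec:
    family: str              # meaning of the Error-Type ("family of errors")
    refinement: str          # meaning of the Error-value ("refinement")
    closes_session: bool     # illustrative per-pair property
    response_expected: bool  # illustrative per-pair property


# First level: Error-Type -> family.
FAMILIES = {
    1: "PCEP session establishment failure",
    3: "Unknown Object",
}

# Second level: (Error-Type, Error-value) -> concrete error and its properties.
REGISTRY = {
    (1, 1): ErrorSpec(FAMILIES[1],
                      "reception of an invalid Open message or a non Open message",
                      closes_session=True, response_expected=False),
    (3, 1): ErrorSpec(FAMILIES[3], "Unrecognized object class",
                      closes_session=False, response_expected=False),
    (3, 2): ErrorSpec(FAMILIES[3], "Unrecognized object Type",
                      closes_session=False, response_expected=False),
}


def lookup(error_type, error_value):
    """Resolve a pair; fall back to the family alone when the value is unknown."""
    spec = REGISTRY.get((error_type, error_value))
    if spec is not None:
        return spec
    family = FAMILIES.get(error_type, "unknown family")
    return ErrorSpec(family, "unspecified",
                     closes_session=False, response_expected=False)


print(lookup(1, 1).closes_session)
print(lookup(3, 99).family)
```

In such a layout a receiver that does not know a particular value can still act on the family, which is what is lost when the error type is used as a processing indication.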


Diffusion List object (DLO)
------------------------------

The purpose of a DLO seems to be to indicate a generic set of 
"destinations" for a given error / notification. I am still not sure of 
this (I may need more time to digest the use cases). At first sight, I 
would say that this is a good example of the need to generalize the 
use of PCEid identifiers (using IPv4 / IPv6 addresses) that refer to 
PCEs regardless of their actual IP addresses (or addresses used in the 
actual TCP connection), playing a similar role to router addresses / 
node Ids / router ids / loopbacks, etc. A DLO could indeed be, in its 
simplest form, the list of PCEs to be notified (something similar to the 
MONITORING document). The I.-D. proposes using ERO-like objects for the 
DLO, which raises the questions of: a) mixing data and control plane 
entities and b) mapping data plane resources (e.g. an unnumbered 
interface ID) to the one (or more) PCEs responsible for the domain where 
that resource is found.

Another of my main concerns is that I still fail to see the use cases 
where a given PCE could send an error including a DLO with a set of 
peers for which it has no visibility (typically, a given PCE involved in 
a PCE or domain chain does not have the information of the "history" of 
the end-to-end request. At most, one could keep the original endpoints, 
but that's all).

Finally, I am not sure of the procedure regarding the DLO when a 
deployment opts for "non-persistent" connections, in which the TCP/PCEP 
connection is closed after every request. Should the reception of a DLO 
including a peer for which there is no PCEP session in the UP state: 
a) trigger the establishment of the connection for the sole purpose of 
forwarding an error? b) trigger an error to the peer sending the ERROR 
with the DLO? (this imho should not be the case, to avoid an error on 
an error)

How do you plan to recognize that the remote endpoint of a TCP/PCEP 
connection is a PCC or a PCE? (this is implied by the use of TT field in 
DLO). PCEP does not easily provide a means to recognize it unless 
somehow pre-configured.


Section 5
---------------

* Section 5.1 In the description of Error Type 16, it is said that Error 
type 16, not critical, implies that the PCEP session need not be 
shut down. I tend to think this is a property of every concrete error 
pair. Also, I don't agree with the example: a Metric with a bound value 
of -1 is an encoding error; it is a "contract" a PCE may not be able to 
fulfill (especially if the processing bit is set to 1) and must trigger 
an error; it should not be allowed to "expect a response". In general 
there are two different aspects: whether a TCP/PCEP connection must be 
shut down after a given error (this is somehow related to the 
"severity" of the error: a malformed message, a different PCEP version, 
etc.) and whether the error is a "warning" or not. I would agree with 
the extension of "warnings" meaning that the response "may follow". I 
don't agree with the example. However, warnings could also be conveyed 
by (yet to be defined) notifications.

* Section 5.2 As mentioned before, the ability to "propagate upstream" 
an error is a good addition, but it may be a property of a given error 
(modified by policies of the PCEs in the chain)


Typos
----------
* Section 2.1.1 - maybe the word "throw" is too exception-specific, I 
would suggest "returns"

* If PCE ... (unfinished sentence/typo)

* ... and sends back / and send back (could send back)

* ... in details -> in detail



I hope this is somehow useful, and open to discussion. Thanks for 
reading and best regards

Ramon.



-- 
Ramon Casellas, Ph.D.
Research Associate - Optical Networking Area -- http://wikiona.cttc.es
CTTC - Centre Tecnològic de Telecomunicacions de Catalunya, PMT Ed B4
Av. Carl Friedrich Gauss 7 08860 Castelldefels (Barcelona) - Spain
Tel.: +34 93 645 29 16 -- Fax. +34 93 645 29 01
