Re: [Pce] Preliminary feedback on I.-D. draft-pouyllau-pce-enhanced-errors-02.txt

Gino Carrozzo <g.carrozzo@nextworks.it> Sat, 24 July 2010 11:46 UTC

Message-ID: <4C4AD2A4.1030404@nextworks.it>
Date: Sat, 24 Jul 2010 13:46:44 +0200
From: Gino Carrozzo <g.carrozzo@nextworks.it>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.7) Gecko/20100713 Lightning/1.0b2 Thunderbird/3.1.1
MIME-Version: 1.0
To: pce@ietf.org
References: <4C481E93.4020207@cttc.es> <2D26984ED849C94A8C7D6276DA008FF2125CC5AD2C@FRMRSSXCHMBSA1.dc-m.alcatel-lucent.com>
In-Reply-To: <2D26984ED849C94A8C7D6276DA008FF2125CC5AD2C@FRMRSSXCHMBSA1.dc-m.alcatel-lucent.com>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Pce] Preliminary feedback on I.-D. draft-pouyllau-pce-enhanced-errors-02.txt

Dear Hélia, authors

your proposal for augmenting the processing-related information exchanged
among multiple PCEs is very interesting and may help to improve the path
computation task in multi-domain or hierarchical PCE contexts.

However, I share Ramon's concerns about (1) the use of PCErr for sending 
non-failure (processing) related notifications/warnings, and (2) the 
mixing of data plane and PCE identifiers in DLO.

Concerning PCErr semantics, I'd prefer to convey processing warnings
in PCEP-ERROR TLVs, with specific flags defining the propagation scope of
the warning contents. TLVs may also specify details on the cause of the
processing failure (e.g. a missing switching resource in the data
plane, PCE congestion, etc.).
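
As a rough illustration of the TLV idea (a sketch only: the TLV type code,
the flag names and the value layout below are hypothetical and are not taken
from the I-D or from any IANA registry; only the generic type/length/padding
framing follows RFC5440 section 7.1):

```python
import struct

# Hypothetical values, for illustration only (not assigned anywhere).
WARNING_TLV_TYPE = 0xFF01
FLAG_PROPAGATE_UPSTREAM = 0x01  # forward the warning towards the requester
FLAG_PROPAGATE_TO_PEERS = 0x02  # forward the warning to other PCEP peers
FLAG_RESPONSE_FOLLOWS = 0x04    # a PCRep is still to be expected


def encode_warning_tlv(flags: int, cause: str) -> bytes:
    """Encode a PCEP TLV: 2-byte type, 2-byte length, value padded to 4 bytes."""
    # Assumed value layout: one flags byte followed by an ASCII cause string.
    value = struct.pack("!B", flags) + cause.encode("ascii")
    padding = (-len(value)) % 4
    # The Length field counts the value only, not the padding.
    header = struct.pack("!HH", WARNING_TLV_TYPE, len(value))
    return header + value + b"\x00" * padding


def decode_warning_tlv(data: bytes):
    """Return (tlv_type, flags, cause) from an encoded TLV."""
    tlv_type, length = struct.unpack("!HH", data[:4])
    value = data[4:4 + length]
    return tlv_type, value[0], value[1:].decode("ascii")


tlv = encode_warning_tlv(
    FLAG_PROPAGATE_UPSTREAM | FLAG_RESPONSE_FOLLOWS, "PCE congestion"
)
```

A receiving PCE would test the flag bits to decide whether to forward the
warning, leaving the error type/value pair free to name the concrete cause.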

About DLO contents, I'm not convinced that the 'Unnumbered Interface ID'
is as applicable to notifications as it is to the IRO (path
requests/responses). It is a topology detail, not necessarily meaningful
for all the PCEs, and its resolution would imply complex lookups in the
topology map for each object and for each notification.

Other comments I have on the ID are briefly listed below.

* In case of notification type 5 (forward to all or to a list of PCEP peers)
you mention a maximum number of PCEP peers to be notified to avoid flooding.
Given the potential interest of all the known PCEs in that information and
its hopefully low rate of change, it would probably be better to use the
IGP instead (e.g. as previously used for PCE discovery). This could benefit
from an inherent flooding mechanism and avoid the need for a maximum number
of notification recipients (who's in, who's out?).

* Concerning propagation restrictions (section 4.3), might an
aggregation/correlation mechanism for errors/notifications also be
useful/needed? It would be used to aggregate multiple concurrent errors and
notifications reaching a PCE in the same multi-domain request context within
a given time window.
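
To make the aggregation idea concrete, here is a minimal sketch (the class
name, the window length and the use of a request identifier as the
correlation key are all assumptions for illustration; the I-D defines no
such mechanism):

```python
from collections import defaultdict

WINDOW_SECONDS = 2.0  # hypothetical correlation window


class ErrorAggregator:
    """Group errors/notifications per request context within a time window."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        # request_id -> list of (arrival_time, error) in arrival order
        self.pending = defaultdict(list)

    def add(self, request_id, error, now):
        """Record an error received at time `now` for a request context."""
        self.pending[request_id].append((now, error))

    def flush(self, now):
        """Return {request_id: [errors]} for contexts whose window expired."""
        ready = {}
        for request_id in list(self.pending):
            first_seen = self.pending[request_id][0][0]
            if now - first_seen >= self.window:
                entries = self.pending.pop(request_id)
                ready[request_id] = [error for _, error in entries]
        return ready


agg = ErrorAggregator(window=2.0)
agg.add(7, "no-path in domain B", now=0.0)
agg.add(7, "PCE congestion in domain C", now=1.0)
agg.add(9, "policy violation", now=1.5)
```

Each flushed group could then be sent upstream as one aggregated
error/notification instead of several separate messages.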

* In section 2, multi-vendor and homogeneous multi-domain PCEs are
mentioned as possible motivating scenarios for the ID.
Does ‘homogeneous’ stand here for ‘the same PCE in all domains/ASes’? If
yes, could the variation in error/notification processing behaviour depend
on the different topological views of the PCEs in this case?

* Section 2.2 seems to present just PCEP foundations, which should be
known to the reader through the RFC5440 citation. Maybe it is redundant in
this ID.

Hope this can help during the upcoming IETF meeting.

Best regards
Gino
-- 
==============================================================
   Dr. Gino Carrozzo
   R&D Project Manager

   Ph     : +39 050 3871679
   Fax    : +39 050 3871601
   e-mail : g.carrozzo@nextworks.it
   address: Nextworks s.r.l.
            via Turati, 43/45
            56125 Pisa, ITALY (IT)
==============================================================


On 23/07/2010 13.43, POUYLLAU, HELIA (HELIA) wrote:
> Dear Ramon,
>
> Thank you for your feedback. I will try to comment on the main points you raise in your mail and will study your detailed comments later:
>
> 1) If I understand well, one of your concerns is about the use of error and notification type codes rather than TLVs. On that point you argue that in other documents the error-type relates to a family of errors and the value to a "refinement", whereas in the draft it relates to a kind of processing indication.
>
> The proposal intends to allow the definition of specific errors and notifications, but within the frame of normalized processing information. So your proposal to define objects which could be combined in order to define this specific processing is in line with the I-D objectives. I see no strong argument against it. Flags of the PCEP-ERROR and NOTIFICATION objects could also be used.
>
> Any other opinion on that point?
>
> 2) You first comment that "an error should cause the shutdown
> of the connection, is an attribute of every particular concrete error
> pair, and belongs to its description". In the I-D, we have indeed defined some kind of "warning". Well, this is because when you read RFC5440 the fact that an error causes the shutdown of the connection is not always mentioned. But, as you say further, an error-type which causes the connection to close is rather information that the connection will be closed by the PCE which sent the error message (see sections 5.3 and 5.4 as illustrations). But you are right about your comment on section 5: it is not clear whether a "warning" should be defined as an error or as a notification specific to a request. Hence, we have kept both options in the I-D, but it is certainly true that perhaps only one option should suffice.
>
> 3) Actually the use of the DLO is related to notification-type 5 of the I-D (propagation of the notification) and its purpose is to avoid flooding. In section 4.3.2 it is explained that it must be used with this type. The use with other types is optional and maybe some "SHOULD NOT" could be added for some types. About your last remark on how a PCC could be identified, it could indeed be achieved by pre-configuration. The purpose of the "not to PCC" option is to avoid flooding the routers of a routing domain (which would appear in the DLO) that have the PCC feature but not the PCE.
>
>
> Any other feedback on these issues and others is welcome,
>
> Thanks again Ramon for your detailed comments; they will help to launch discussions in Maastricht and to progress on the I-D.
>
> Best regards,
> Hélia
>
> -----Original Message-----
> From: pce-bounces@ietf.org [mailto:pce-bounces@ietf.org] On behalf of Ramon Casellas
> Sent: Thursday 22 July 2010 12:34
> To: pce@ietf.org
> Subject: [Pce] Preliminary feedback on I.-D. draft-pouyllau-pce-enhanced-errors-02.txt
>
>    >  We have submitted an updated version of draft
> http://www.ietf.org/id/draft-pouyllau-pce-enhanced-errors-02.txt
>   >  The major update is - as mentioned during IETF 77 meeting - a new
> section describing some potential scenarios of usages of the error and
> notification types specified by the draft.
>
> Dear Draft authors, all
>
> As requested, please find below some preliminary feedback on the draft.
> I am not aware of previous comments so apologies in advance if some of
> the questions I raise have been addressed.
> I considered that some early feedback may be appropriate in view of
> Maastricht, but some of the points may not be valid due to my still
> limited comprehension of the draft. Some comments also reflect my
> subjective views :)
>
>
> As a short executive summary
> ===========================
>
> I would agree that extending PCEP to convey error / notification
> processing indications (fatal, not fatal, transient, and to be
> forwarded) and secondarily related information (including a list of
> potential targets) could be useful, but I see a few open issues in its
> current form, as detailed below. I also agree that the fact that
> different PCEs have different support for errors is a good use case for
> the applicability of the proposed extensions.
>
> * First, the error hierarchy type/value as found in
> RFC5440 and other normative documents is no longer maintained. In those
> aforementioned docs, the error type indicates a "family" of errors and
> the value is a "refinement". Instead, in the proposed I.-D. a given
> error type indicates a "processing indication" rather than the actual
> error cause, which is delegated to the error value alone. Did
> I miss something? (see point in Section 4)
>
> * Second, I would think (as it seems the case in other errors) that the
> notion of "unrecoverable" and whether an error should cause the shutdown
> of the connection, is an attribute of every particular concrete error
> pair, and belongs to its description. For example, receiving a message
> with protocol version ">1" triggers an error and causes the PCEP
> connection to close. I would also think that closing the TCP/PCEP
> connection is a local decision, and I am not sure of the idea of "this
> is a serious error and _you_ should close the connection", rather "this
> is a serious error and _I_'ll be closing the connection" (on a side
> note, I always thought that the implication that the endpoint receiving
> a CLOSE message should close the connection does not prevent the PCE
> from closing it itself.)
>
> * Third, the DLO seems a means to indicate the set of destinations for
> an error/notification but raises questions on how to map data (or even
> control) plane resources to PCEs responsible for them and the fact that
> in a chain of requests, the history is lost. I still fail to see
> concrete cases for DLO other than ("all your other peers") which is
> still not clear to me and could lead to flooding and more complex
> procedures.
>
> In short, in my particular humble view, giving "context" to errors where
> "context" means "processing indications" or "destination targets" or
> "conveys more detailed fine grained information" could use for example
> TLVs rather than blocking several error types for this purpose (as an
> illustrative example, we often add an ASCII_TLV to PCEP_ERROR objects to
> add some descriptive error cause). Ideally, the actual error (x/y)
> should be descriptive enough to state whether a) the error needs to be
> forwarded as is or not, b) the error should cause the connection to be
> shut down and c) the error/notification could, eventually, be interpreted
> as a warning without implying that the request has been canceled.
>
>
>
> Detailed review
> =========================
>
> Section 2.2.
> ----------------
> * Section 2.2 is, in my opinion, a bit misleading and confusing, when
> discussing the dead-timer (although it does not impact the rest of the
> I.-D). The dead-timer is not related to the latency to get a response to
> a request (PCRep message), but the ability to detect a dead connection,
> which can be kept alive by means of KeepAlives. It is possible to have
> latencies in path computations higher than deadtimers, where KeepAlives
> are used accordingly. Thus, the text "If the PCC does not receive any
> reply before the dead timer is out" and the text "it supposes that the
> deadtimer is long enough to support end to end distributed path
> computation" is, imho, not right. The latency to get a Reply is
> implementation defined. These are two orthogonal aspects, the deadtimer
> is local to a given adjacency, and can be very low yet the end to end
> path computation be notably higher.
>
> * I would suggest replacing PCEReq and PCERep with PCReq and PCRep (seems
> to be the common notation)
>
> * Rather than stating "there are two types of Notifications", I would
> suggest "[RFC5440] defines two types of notifications".
>
> Section 2.3
> ----------------
>
> * Also related to a bullet in Section 4, note that the RP (rplist) bound
> to PCEP_ERROR object is also optional, as the I-D says is the case for
> notifications.
>
> Section 4
> ------------
> * "PCE errors are always request specific". Why are you assuming this?
> The RP list is optional and nothing prevents you from adding new error
> types and new error values for, for example, PCE status. Clearly,
> RFC5440 states "The PCErr message is sent ... in response ... or unsolicited
> manner" (for what it's worth, the parsing and treatment of PCNtf and PCErr
> messages is quite similar in our case)
>
> * Minor detail: Error Types 16 / 17 are already assigned to P2MP - could
> be worth adding an IANA TBD
>
> * Typo: must backward ->  must send backwards / forwards
>
> * One of my main concerns of using error types 16-19 is the following:
> in all other cases the error type conveys some kind of
> meaning/semantics. Error "X" means that e.g. "P2MP" has failed. Error
> types are a "family of errors" and error values are a "refinement".
> Using OOP terms one could conceive a hierarchy with a base class, error
> types inheriting base class and error types/values ("concrete errors")
> another refinement. In the proposed approach this is no longer the case.
> Personally, I would be for the idea of extending PCEP for support of
> "status quo" / "propagation" as proposed in the draft, but by means of
> TLVs. Since PCEP_ERROR objects support optional TLVs, Errors can be
> tagged with TLVs specifying error processing rules (which, in turn,
> could be applied to existing error pairs). In other words: processing
> indications can be conveyed by other means, while leaving error
> type/value pairs to indicate concrete errors (with a two level hierarchy).
>
> * If an error / notification is a "warning" and a response is still to
> be expected, I don't know whether an alternative means could be to
> extend the RBNF of PCRep to allow for such a warning to be "embedded" in
> the response, rather than a separate PCErr message that is sent
> previously. Likewise, I would think that whether the fact of sending an
> error or a notification "cancels" or "implies that no response is to be
> expected" or, on the contrary "a response is to be expected" is either a
> property of a concrete error or notification pair, or, in a more strict
> manner, PCEP maintains the motto: "a request causes a response or an
> error or a notification". The response can be positive (1+ paths) or
> negative (no path). An error always implies no response (and as far as
> I know, so does a notification bound to an RP) and warnings are embedded
> in the response (as when adding an object with the I flag, which means
> "warning: I was not able to take this into account").
>
>
> Diffusion List object (DLO)
> ------------------------------
>
> The purpose of a DLO seems to be to indicate a generic set of
> "destinations" for a given error / notification. I am still not sure of
> this (I may need more time to digest the use cases). At first sight, I
> would say that this is a good example of the need of generalizing the
> use of PCEid identifiers (using IPv4 / IPv6 addresses) that refer to
> PCEs regardless of their actual IP addresses (or addresses used in the
> actual TCP connection), playing a similar role to router addresses /
> node Ids/ router ids/ loopbacks, etc. A DLO could indeed be, in its
> simplest form, the list of PCEs to be notified (something similar to the
> MONITORING document). The I.-D. proposes using ERO-like objects for the
> DLO, which raises the questions of a) mixing data and control plane
> entities and b) mapping data plane resources (e.g. an unnumbered
> interface ID) to the one (or more) PCEs responsible for the domain where
> that resource is found.
>
> Another of my main concerns is that I still fail to see the use cases
> where a given PCE could send an error including a DLO with a set of
> peers for which it has no visibility (typically, a given PCE involved in
> a PCE or domain chain does not have the information of the "history" of
> the end-to-end request. At most, one could keep the original endpoints,
> but that's all).
>
> Finally, I am not sure of the procedure regarding DLO when a deployment
> opts for "non-persistent" connections, in which the TCP/PCEP connection
> is closed after every request. Should the reception of a DLO including a
> peer for which there is no PCEP session in the UP state a) trigger
> the establishment of the connection for the sole purpose of forwarding
> an error, or b) trigger an error to the peer sending the ERROR with the DLO
> (this imho should not be the case to avoid the error on error)?
>
> How do you plan to recognize that the remote endpoint of a TCP/PCEP
> connection is a PCC or a PCE? (this is implied by the use of TT field in
> DLO). PCEP does not easily provide a means to recognize it unless
> somehow pre-configured.
>
>
> Section 5
> ---------------
>
> * Section 5.1 In the description of Error Type 16, it is said that Error
> type 16, not critical, implies that the PCEP session need not be
> shut down. I tend to think this is a property of every concrete error
> pair. Also I don't agree with the example: a Metric with a bound value
> of -1 is an encoding error, it is a "contract" a PCE may not be able to
> fulfill (especially if processing bit is set to 1) and must trigger an
> error, it should not be allowed to "expect a response". In general there
> are two different aspects: whether a TCP/PCEP connection must be
> shut down after a given error (it is somehow related to the "severity" of
> the error: a malformed message, a different PCEP version, etc) and
> whether the error is a "warning" or not. I would agree with the
> extension of "warnings" meaning that the response "may follow". I don't
> agree with the example. However, warnings could also be conveyed by (yet
> to be defined) notifications.
>
> * Section 5.2 As mentioned before, the ability to "propagate upstream"
> an error is a good addition, but it may be a property of a given error
> (modified by policies of the PCEs in the chain)
>
>
> Typos
> ----------
> * Section 2.1.1 - maybe the word "throw" is too exception-specific, I
> would suggest "returns"
>
> * If PCE... (unfinished sentence/typo)
>
> * ... and sends back / and send back (could send back)
>
> * ... in details ->  in detail
>
>
>
> I hope this is somehow useful, and open to discussion. Thanks for
> reading and best regards
>
> Ramon.
>
>
>