Re: [netconf] Magnus Westerlund's Discuss on draft-ietf-netconf-subscribed-notifications-25: (with DISCUSS and COMMENT)

"Eric Voit (evoit)" <evoit@cisco.com> Mon, 06 May 2019 21:19 UTC

From: "Eric Voit (evoit)" <evoit@cisco.com>
To: Magnus Westerlund <magnus.westerlund@ericsson.com>, The IESG <iesg@ietf.org>
CC: "draft-ietf-netconf-subscribed-notifications@ietf.org" <draft-ietf-netconf-subscribed-notifications@ietf.org>, Kent Watsen <kent+ietf@watsen.net>, "netconf-chairs@ietf.org" <netconf-chairs@ietf.org>, "netconf@ietf.org" <netconf@ietf.org>
Date: Mon, 06 May 2019 21:19:10 +0000
Message-ID: <7efee2607c62458d865ce663f9d4fc27@XCH-RTP-013.cisco.com>
References: <155679988653.24963.16586553566609600202.idtracker@ietfa.amsl.com>
In-Reply-To: <155679988653.24963.16586553566609600202.idtracker@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netconf/upvzS289Ea4aGXQ5UtqsGyldyrA>
Subject: Re: [netconf] Magnus Westerlund's Discuss on draft-ietf-netconf-subscribed-notifications-25: (with DISCUSS and COMMENT)
Precedence: list

Hi Magnus,

Thanks very much for your comments.   Thoughts in-line...

> From: Magnus Westerlund, May 2, 2019 8:25 AM
> 
> Magnus Westerlund has entered the following ballot position for
> draft-ietf-netconf-subscribed-notifications-25: Discuss
> 
> When responding, please keep the subject line intact and reply to all email
> addresses included in the To and CC lines. (Feel free to cut this introductory
> paragraph, however.)
> 
> 
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
> 
> 
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-netconf-subscribed-notifications/
> 
> 
> 
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> 
> My focus when reviewing this document was from a perspective of how to
> handle overload. 

You are absolutely correct to focus on overload.  Mitigating different dimensions of the overload risk has been a design goal since this effort’s inception.  And this is the reason a variety of QoS mechanisms were included in the document.

> I have a hard time understanding how this will actually work,
> especially in a fashion that preserves goodput and ensure what is considered
> the most important subscriptions are delivered. Not having good understanding
> into netconf and restconf don't hesitate to call out likely misunderstanding by
> me and provide clarification and pointers.

Few of the mechanisms are specific to either RESTCONF or NETCONF.  For completeness, let me summarize the overload mechanisms available...

1. The publisher is allowed to decline a dynamic subscription.  One reason for doing so is that the incremental traffic generated by that subscription would be too high.  Errors are returned to the subscriber indicating that this condition exists.
2. A publisher can suspend any subscription for capacity reasons, and a receiver must be able to gracefully accept this suspension. 
3. Much like with HTTP2 streams, higher priority subscriptions intended for a particular receiver can be dequeued first from a publisher. 
4. Much like with HTTP2 streams, proportional dequeuing of subscriptions intended for a particular receiver can be performed by a publisher.
5. DSCP markings can be placed on subscribed data.
6. Mechanisms for detecting and reacting to different types of subscribed data loss have been embedded.
7. Methods have been included to ensure a configured receiver “ok’s” information push before subscribed data is sent. (This should reduce one DDoS vector.) 
8. Keep-alive mechanisms are established for different transport types, so that subscribed data isn’t being sent when there is no receiver listening.
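
As a rough illustration of mechanisms (3) and (4), the following Python sketch dequeues records for one receiver in proportion to each subscription's weighting, and holds back a dependent subscription while its parent still has queued data. It is only a sketch in the spirit of RFC-7540 Sections 5.3.1 and 5.3.2, not behavior specified by the draft: the class and function names, the "weighting + 1" scaling, and the virtual-time scheduler are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Optional


@dataclass
class Subscription:
    """Hypothetical per-receiver subscription state (names are illustrative)."""
    sub_id: int
    weighting: int = 16                 # 0..255; larger means a bigger share
    dependency: Optional[int] = None    # id of a parent subscription, if any
    queue: deque = field(default_factory=deque)
    served: Fraction = Fraction(0)      # virtual time consumed so far


def eligible(subs):
    """Subscriptions with queued data whose parent (if any) has nothing queued."""
    by_id = {s.sub_id: s for s in subs}
    ready = []
    for s in subs:
        if not s.queue:
            continue
        parent = by_id.get(s.dependency) if s.dependency is not None else None
        if parent is not None and parent.queue:
            continue  # assumption: a dependent waits while its parent has data
        ready.append(s)
    return ready


def dequeue_next(subs):
    """Pick the eligible subscription with the least virtual time served."""
    ready = eligible(subs)
    if not ready:
        return None
    chosen = min(ready, key=lambda s: (s.served, s.sub_id))
    # Assumption: add one to the weighting, mirroring the HTTP2 weight encoding.
    chosen.served += Fraction(1, chosen.weighting + 1)
    return chosen.sub_id, chosen.queue.popleft()
```

With two always-busy subscriptions whose weightings are 2 and 0, this scheduler hands out records in a 3:1 ratio. A real publisher would also need to decide how DSCP interacts with this, which the sketch ignores.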

Mechanisms (3) & (4) will likely be seen only with HTTP2 based transports.*   This is because (as documented within draft-ietf-netconf-restconf-notif section 4), these capabilities are to be integrated directly with HTTP2 RFC-7540 sections 5.3.1 & 5.3.2.

* One background/review note...  Earlier versions of draft-ietf-netconf-subscribed-notifications included specific parallels to RFC-7540 when describing (3) & (4).  However, WG members wanted to abstract away what they felt were HTTP2 specific references. This is despite the fact that what was being referenced was the desired functional behavior rather than anything HTTP2 specific.   I can understand the WG reviewers' concern.  This is already a long document, and a reader who only cares about NETCONF hopefully won't need to wade through complex issues which they are unlikely to worry about in deployment.

> A) The QoS and priority sending mechanism discussed in 2.3 and further defined
> by the YANG model.
> 
> I do want to raise the usage of the DSCP code point to a discuss. As the DSCP
> defines different PHB and relative priorities in the router queues a DSCP value
> does not provide the publisher any information about priority. I get the feeling
> from the text that this may be intended. If the only intention is to have the
> transport perform differential treatment in the network between the publisher
> and the receiver 

Yes, this is the case.

> there are still more considerations are needed.
> First of all I think these sentence needs a total rewrite:
> 
>    If the publisher supports the "dscp" feature, then a subscription
>    with a "dscp" leaf MUST result in a corresponding [RFC2474] DSCP
>    marking being placed within the IP header of any resulting
>    notification messages and subscription state change notifications.
>    Where TCP is used, a publisher which supports the "dscp" feature
>    SHOULD ensure that a subscription's notification messages are
>    returned within a single TCP transport session where all traffic
>    shares the subscription's "dscp" leaf value.
> 
> I think one need to put a requirement on the transport to use a transport that
> utilize the DSCP marking on its traffic. 

I believe you are asking that the publisher respect the DSCP markings for traffic egressing an interface on the publisher.  Yes, this requirement was assumed, and can be explicitly added.

> Which for the current set of connection
> oriented transport protocols, TCP, SCTP, and QUIC all currently only support
> using a single DSCP per connection. Implying multiple transport protocol
> connections using a particular transport to enable this mapping.

Yes, fully agree, a connection would be needed per DSCP.     Is your objection to the text above that it says "SHOULD ensure" rather than "MUST ensure"?    If yes, I can ask whether the WG objects to making this requirement a MUST.
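
To make the "one connection per DSCP" point concrete, here is a minimal Python sketch of a publisher keeping one TCP socket per DSCP value and marking it through the standard IP_TOS socket option. The helper and class names are hypothetical, connecting to a receiver is deliberately left out, and IP_TOS behavior is platform dependent (the sketch assumes a Linux-like stack and IPv4).

```python
import socket


def open_marked_socket(dscp: int) -> socket.socket:
    """Create an unconnected TCP socket whose packets carry the given DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0..63)")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # DSCP occupies the upper six bits of the former TOS byte, hence the shift.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return sock


class DscpConnectionPool:
    """Hypothetical pool: subscriptions sharing a DSCP share one socket."""

    def __init__(self):
        self._by_dscp = {}

    def socket_for(self, dscp: int) -> socket.socket:
        if dscp not in self._by_dscp:
            self._by_dscp[dscp] = open_marked_socket(dscp)
        return self._by_dscp[dscp]

    def close(self):
        for sock in self._by_dscp.values():
            sock.close()
        self._by_dscp.clear()
```

Two subscriptions with the same "dscp" leaf value would then call socket_for() with the same argument and share a transport session, while a subscription with a different value gets its own.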

> A.2 Queuing model of a publisher.
> With the DSCP and the Weight and dependency model I think it is important to
> clarify the model of the queueing in the publisher. So is the intention that several
> subscriptions with different weights and possibly dependencies have their
> individual queues that goes into a scheduler?

As you know, queuing models are non-trivial.   For that reason we wanted to adopt 100% of the functional behavior of RFC-7540 Section 5.3.1 and 5.3.2 without having to re-document/mirror that content at a higher level of the network stack.   We are hoping that a reader of network subscriptions can look to well documented, implemented HTTP2 behaviors as applicable.  Providing an intermediate layer of functional description could easily result in some misalignments from what is intended.

> To avoid complex queue
> interactions on this level I think there need to be separate scheduler instances
> per DSCP. I would also note that Dependency mechanism can't ensure that a
> dependent stream arrive at receiver after the identified dependency if they are
> on different DSCP. In fact if one would have HTTP/3 (over QUIC) we may not
> even guarantee it within a single connection and same DSCP due to packet loss
> impact. To me this model and what relationship one need to consider here need
> to be clarified. I think RFC 7540 Section 5.3.1 is an excellent indication of just the
> importance of considering what is in the same dependency tree and what it
> means to have different weighting. To conclude I think this needs a model
> description and clearer definition and possibly requirements towards the
> transport. Especially if you actually need an in-order delivery requirement?

I think we have the same technical objective in mind.   And it is absolutely the desire to have identical behavior to RFC-7540 Section 5.3.1 and 5.3.2.    For the current set of documents before the IESG, we include within draft-ietf-netconf-restconf-notif Section 4, a 1:1 mapping between draft-ietf-netconf-subscribed-notifications and RFC-7540.    

We are hoping that the transport documents like draft-ietf-netconf-restconf-notif can be the place where such supporting documentation and mappings occur.  

> B. The unpredictability of the circuit breaker overload mechanism.
> 
> My description of the overload handling in this document is that it is a circuit
> breaker based mechanism that can blow a fuse on subscriptions that it fails to
> honor in overload situations. What worries me deeply is the total unpredictability
> of this mechanism.

At the beginning of this email, eight methods of overload handling are listed.  We optimized these eight mechanisms for different failure scenarios and congestion conditions.   Our goal was to depend on existing mechanisms/technologies wherever possible.  Reuse of existing mechanisms should reduce at least some of the unpredictability.

> First, is it the intention to derive what subscriptions are least important from the
> DSCP, weighting and Dependency parameters? If it is, I think that may be a
> mistake as priority on what subscriptions are most important to retain are not
> necessarily reflected in their QoS parameters.

At this point the document does not attempt to define "important".  All that is done is to provide guidance relating to some elements of network transport.   If there is a dimension of "importance" which an implementer would like to layer onto this solution, it could be done.  For example, "importance" could be applied in areas such as which subscriptions should be suspended in case of CPU exhaustion.   However, guidance on such enhancements is not included in this document.

> Secondly, what are the values when a subscription are considered to be to heavy
> or not be handled accordingly. Are there any parameter sets that actually
> describe what SLA the subscriber expect that can be converted into timeout
> timers or buffer depth thresholds to indicate that publisher side isn't delivering
> these in time?

There is no guidance on this provided in the document.   As equipment types, subscription volumes, and available memory will vary between solutions, this will take a while to dimension properly.  I can imagine that someday we might have something like "Erlang for subscriptions", much like we used Erlangs in the old telephony network to dimension the call handling capabilities of phone switches.  But we are a long way from that.
 
> Third, I what I understand there are no any additional back pressure mechanism
> between the receiver and the publisher than the transport protocols flow
> control? So a receiver that is not keeping up processing the data it process will
> not read out the data out of the flow controlled buffers in the receiver and thus
> prevent the publisher to write to the transport connection, thus causing the
> publisher to eventually trigger a suspension due to its queue build up?

There is nothing beyond transport flow control.  We thought about it initially, but we were not ready to pick up even more complexity than we already had.
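
The chain of events described in the question (receiver stops reading, transport flow control blocks writes, publisher queue builds, subscription is suspended) can be sketched as follows. This is purely illustrative: the document gives no thresholds, so the max_depth parameter, the class name, and the choice to discard the backlog on suspension are all assumptions of this example. Only the "subscription-suspended" notification name is taken from the draft.

```python
from collections import deque


class PublisherQueue:
    """Hypothetical per-subscription queue on the publisher side."""

    def __init__(self, max_depth: int):
        self.max_depth = max_depth   # assumed implementation-specific limit
        self.pending = deque()
        self.suspended = False

    def enqueue(self, record):
        """Queue a record; return a state change notification name if this
        record pushes the queue past its limit, otherwise None."""
        if self.suspended:
            return None              # nothing is queued while suspended
        self.pending.append(record)
        if len(self.pending) > self.max_depth:
            self.suspended = True
            self.pending.clear()     # assumption: backlog is discarded
            return "subscription-suspended"
        return None

    def drain(self, writable: int):
        """Send at most `writable` records, modelling how much the transport's
        flow control currently lets the publisher write."""
        sent = []
        while self.pending and len(sent) < writable:
            sent.append(self.pending.popleft())
        return sent
```

A receiver that stops reading shows up here as drain(0) on every pass, so the queue only grows until the limit is crossed.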

> To my understanding the current mechanism places all subscriptions on the
> same importance and with the same SLA. Thus likely causing short term overload
> situations to cause subscription suspensions in random subscriptions. Is the WG
> fine with this type of randomness occurring and expecting that normally there
> will be such amount of overprovisioning that being able to indicate priority and
> SLA are overkill?

Yes.   We needed a starting point.  And there are technical solutions which can be layered on top.   But what we have now took many years to finalize, and should be a big enough technical jump considering our current knowledge.

> At a minimal a change of this sentence in Section 2.5.1 is needed:
> 
>   This could
>    be for reasons of an unexpected but sustained increase in an event
>    stream's event records, degraded CPU capacity, a more complex
>    referenced filter, or other higher priority subscriptions which have
>    usurped resources.

I have removed "higher priority".

> As it states that subscriptions has priorities without reference to a mechanism
> that provides that priority.
> 
> C. 2.4.5.  Killing a Dynamic Subscription
> 
>    The "kill-subscription" operation permits an operator to end a
>    dynamic subscription which is not associated with the transport
>    session used for the RPC.  A publisher MUST terminate any dynamic
>    subscription identified by the "id" parameter in the RPC request, if
>    such a subscription exists.
> 
> Can someone please clarify the use case for this functionality. To my
> understanding it allows another receiver given authority to kill the subscription
> being delivered to another receiver. Based on the otherwise rather strict that all
> manipulations of dynamic subscriptions happens from the transport session
> context that established it I want to better understand the use case it appears
> out place.

A network operator needs a very secure mechanism to end a dynamic subscription in progress which it sees as harmful.   The operator cannot do this via configuration operations, as the dynamic subscription is not configured (and therefore cannot be deleted in the configuration datastore).  
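
For readers unfamiliar with the operation, the following Python sketch builds what a NETCONF "kill-subscription" request could look like on the wire. The function name, the message-id, and the subscription id of 22 are made up for the example; the YANG namespace is given as it appears in the ietf-subscribed-notifications module to the best of my knowledge and should be checked against the draft.

```python
import xml.etree.ElementTree as ET

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
# Namespace of the ietf-subscribed-notifications module (verify against the draft).
SN_NS = "urn:ietf:params:xml:ns:yang:ietf-subscribed-notifications"


def build_kill_subscription(sub_id: int, message_id: str = "101") -> str:
    """Return a NETCONF <rpc> carrying a kill-subscription for `sub_id`."""
    rpc = ET.Element(f"{{{NC_NS}}}rpc", {"message-id": message_id})
    kill = ET.SubElement(rpc, f"{{{SN_NS}}}kill-subscription")
    ET.SubElement(kill, f"{{{SN_NS}}}id").text = str(sub_id)
    return ET.tostring(rpc, encoding="unicode")


if __name__ == "__main__":
    # An operator ending subscription 22 from a session that did not create it.
    print(build_kill_subscription(22))
```

Because the RPC is expected to be restricted to suitably privileged operators, access control on who may invoke it matters more than the message itself, which is intentionally tiny.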

> D. The Requirements on a transport supporting Configured Subscriptions
> Reading through Section 2.5 I arrived at a number of questions. I tried to
> understand what the requirements are for the transport that supports
> Configured Subscriptions. I did note that neither draft-ietf-netconf-restconf-
> notif-13 nor
> draft-ietf-netconf-netconf-event-notifications-17 supports configured
> subscriptions. Thus, there appear no template for a solution either, or does
> there exist another document that is being worked on defining such a transport?

This is the case.   Originally both of those documents *did* include configured subscriptions.  However, within the WG there was a decision not to progress configured subscriptions yet.  One reason is that other YANG model drafts moving in the NETCONF WG were seen as pre-requisites for securely creating transport sessions via call home for the configured subscriptions.  As a result, support for configured subscriptions was extracted from those transport documents.   It is expected that updated versions of just those transport documents will be driven when the YANG models complete.

> Questions that arose for me related to Configured Subscription Transport where
> the following: 1. Can Transport Session be initiated in both direction. RFC
> 8071 provides a solution for Publisher to Receiver initiation. It is unclear if all
> parts are in place to have a receiver to publisher initiated transport to be bound
> to the subscription. 

This will be up to a specific transport draft to make this determination.  

To see what might be viable for NETCONF, check out the earlier version of the document at:
https://datatracker.ietf.org/doc/draft-ietf-netconf-netconf-event-notifications/10/ 

This was seen as a complete version of such a solution.  However the WG wanted a YANG model for session parameters discussed in Section 5.2.

>2. What is "name" really? It appears to be defined as an
> empty container. Despite that it appears to have requirements on being both a
> security identity as well as network address. 

You are correct.  In previous versions of this draft, a receiver was identified by the combination of address + port.  However due to the YANG doctor reviews, and call home discussions referenced above, the WG wasn't ready to finalize this YANG structure.  The compromise was the current structure, plus the example YANG model of Appendix A to show how this might be built out.

> 3. In Figure 9, which is stated to be
> for the receiver. What information does the receiver use to determine the
> transition (d)? And what does it do in this step. Related to Discuss part B). 

This determination is implementation specific.

> 4. RFC
> 8071 appears to lack any considerations for how often and how many times it
> attempts to connect to the receiver. So applying that mechanism appears to
> require some usage guidance to avoid causing overload situations or DoS
> potential by misconfiguring network devices with this solution to dial out
> frequently to a target.
> 
> As the transport solution requirements are not detail it is actually hard to
> determine if there are short comings in the description in Section 2.5 and the
> corresponding YANG model. Is it an reasonable request to document these
> requirements?

The requirements were documented for both NETCONF and RESTCONF.   However, these requirements were removed when the WG decided to wait until there was a YANG model for RFC-8071 ready to go to the IESG.   For a preview of what these requirements might look like, I refer you to Section 5.2 of
https://datatracker.ietf.org/doc/draft-ietf-netconf-netconf-event-notifications/10/


> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> 1. Section 2.3: DSCP domains
> I think the text could benefit from pointing out that the subscriber actually needs
> to know what the DSCP values represents in the diffserv domain of the publisher.
> As these could be different, they also create an interesting problems for
> transports of how they establish a transport connection that uses a particular
> DSCP, as the receiver if initiating need to know the local DSCP value that
> corresponds to the behavior in the publisher's domain.

This makes sense.  

I have added in Section 5.3 Transport Requirements a new entry which states:

"A subscriber which includes a "dscp" leaf within an "establish-subscription" request will need to understand and consider what the corresponding DSCP value represents within the domain of the publisher."

Let me know if this is not sufficient.

> 2. In general I think there are more than one description that are fuzzy about if it
> describes a publisher or receiver action/requirement.

It would be great if you could provide some specifics.   The authors and previous reviewers have likely looked at this often enough that things seem obvious to us which perhaps are not.

Thanks again for the review,
Eric
