Re: [Netconf] LC on subscribed-notifications-10

"Eric Voit (evoit)" <evoit@cisco.com> Tue, 24 April 2018 20:53 UTC

From: "Eric Voit (evoit)" <evoit@cisco.com>
To: Kent Watsen <kwatsen@juniper.net>, Alexander Clemm <ludwig@clemm.org>, "netconf@ietf.org" <netconf@ietf.org>
Thread-Topic: [Netconf] LC on subscribed-notifications-10
Date: Tue, 24 Apr 2018 20:53:37 +0000
Message-ID: <96615f0331cd455182901ddf3e6ece23@XCH-RTP-013.cisco.com>
References: <17B884BF-0BB8-4B7C-BFBB-0AAFBEA857F6@juniper.net> <aedeb7390d0b4faa9f2bf12c2fe45cd2@XCH-RTP-013.cisco.com> <040a01d3be9f$09700490$1c500db0$@clemm.org> <2089023D-DA09-48E9-8F37-8FE459DC4F49@juniper.net> <dfc78f2b1062498388824b1f6dd97ff6@XCH-RTP-013.cisco.com> <1EC2E732-C524-4552-A3AD-27507239F763@juniper.net> <2b788c22f7ee4af889813b805348d69a@XCH-RTP-013.cisco.com> <9E7F3A66-98B9-4528-882C-43AAD19F0AEC@juniper.net>
In-Reply-To: <9E7F3A66-98B9-4528-882C-43AAD19F0AEC@juniper.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netconf/B8DgaPacicGVU_l1krBeYzlPZ7M>
Subject: Re: [Netconf] LC on subscribed-notifications-10

Hi Kent,

See <Eric4>...

From: Kent Watsen, April 23, 2018 4:51 PM


[further trimming]





>   Only two notifications?



Only two notifications indicate a change in the state of the subscription.

<KENT> okay, but then can you add somewhere that only two notifications are represented because they're the only ones indicating a change in the state of the subscription?



<Eric2> Text now says:



The two state change notifications "subscription-suspended" and "subscription-resumed" are shown.  These are under the control of a publisher. These are the only two state change notifications which indicate a change in state of a dynamic subscription.

 <Kent2> better, but s/only two state/only state/ - right?



<Eric3> I should have been more explicit and said these are the only two state change notifications which signal a new state for an ACTIVE dynamic subscription.  There are other state change notifications used, but they don’t indicate a state change when ACTIVE.



In any case, ‘two’ isn’t in the draft text itself, and the key question is addressed in the point below.



<Kent3> true, my goal below is to remove these notifications from the diagram altogether…  (see below)



<Eric4>   Yes, they are removed from the diagram.  (Full answer below)





>   Looking at the graphic, how is the reader to

>   distinguish these as notifications?



Added a * to the two notifications, and text at the bottom of the drawing which says:



* indicates a state-change-notification



<KENT> better, but somehow not satisfying…  Mentally removing these two notifications from the diagram entirely, I notice that there is no other arrow going from ACTIVE to SUSPENDED; it seems like you might need one, perhaps labeled something like "<internal state event>"?  Assuming this is done, could we then remove listing these notifications from the diagram?



<Eric2> My reading of your comment is that you don’t like the identification of the “suspend subscription” transition cause via the “subscription-suspended*” notification.   To clarify, I have removed all state change notifications from the diagram, and described them in the text below...


                        .........
                        : start :
                        :.......:
                            |
                   establish-subscription
                            |
                            |   .------modify-subscription-------.
                            v   v                                |
                      .-----------.                        .-----------.
           .--------. | receiver  |--suspend-subscription->| receiver  |
       modify-       '|  ACTIVE   |                        | SUSPENDED |
       subscription   |           |<--resume-subscription--|           |
           ---------->'-----------'                        '-----------'
                            |                                    |
                 delete/kill-subscription                   delete/kill-
                            |                               subscription
                            v                                    |
                        .........                                |
                        :  end  :<-------------------------------'
                        :.......:

          Figure 1: Publisher's state for a dynamic subscription

Of interest in this state machine are the following:
...(snip)...

   o  A publisher may choose to suspend a subscription; this is notified to a subscriber with a "subscription-suspended" state change notification.

   o  A resume subscription state change is notified to a subscriber via a "subscription-resumed" state change notification. There are no direct external controls over resuming a subscription other than for a subscriber to attempt the modification of a subscription in a way which reduces the resources consumed.

<Kent2> er, you said that all state change notifications were removed from the diagram, but the diagram still looks the same to me, listing "suspend-subscription" and "resume-subscription".  Also, which is it, "resume-subscription" (in the diagram) or "subscription-resumed" in the text above?

<Eric3>  I read your point <KENT> as you wanted to show the cause of the state transition (e.g., suspend subscription) rather than the resulting notification (e.g., “subscription-suspended”).

<Kent3> correct

So what I was trying to do with the top part of the picture was to remove the names of the state change notifications, and just show the event which drives the subscription from one state to the other.

<Kent3> I'm still confused: are there typos in your diagram or not - e.g., suspend-subscription should be subscription-suspended, right?  Maybe I'm not being clear enough; what's bothering me is that all the other transitions are via RPCs so, for these two transitions, it seems that the diagram should either identify an RPC or put something like "implicit" or "internal event" here - makes sense?

Otherwise the document would be confusing/overly-verbose if it tried to show both on a link (e.g., suspend subscription “subscription-suspended”).
I can go back to the ‘*’ version of the diagram if you prefer.  Do you have a different suggestion?



<Kent3> I agree that it would be cluttered.  Of course, my hope is to remove the notifications from the diagram altogether, I just want the diagram to identify what caused the transition to occur…



 <Eric4>  Ahhh.  I got it now.  The two reasons are:

·       Insufficient resources (e.g., CPU)

·       Unsupportable volume (i.e., a bandwidth constraint)



I adjusted the diagram to:
                      .........
                      : start :
                      :.......:
                          |
                 establish-subscription
                          |
                          |   .------modify-subscription-------.
                          v   v                                |
                    .-----------.                        .-----------.
         .--------. | receiver  |-insufficient CPU, b/w->| receiver  |
     modify-       '|  ACTIVE   |                        | SUSPENDED |
     subscription   |           |<---CPU, b/w sufficient-|           |
         ---------->'-----------'                        '-----------'
                          |                                    |
               delete/kill-subscription                   delete/kill-
                          |                               subscription
                          v                                    |
                      .........                                |
                      :  end  :<-------------------------------'
                      :.......:



With the supporting bullet items under the diagram:



·       A publisher may choose to suspend a subscription when there is insufficient CPU or bandwidth available to service the subscription. This is notified to a subscriber with a "subscription-suspended" state change notification.



·       A suspended subscription may be modified by the subscriber (for example in an attempt to use fewer resources).  Successful modification returns the subscription to an active state.



·       Even without a "modify-subscription" request, a publisher may return a subscription to the active state should the resource constraints clear.  This is announced to the subscriber via the "subscription-resumed" subscription state change notification.
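To make the adjusted diagram and the three bullets concrete, here is a minimal Python sketch of the publisher-side state for one dynamic subscription. The class, its method names, and the `fits_resources` flag are hypothetical illustrations, not definitions from the draft; only the state names and the "subscription-suspended" / "subscription-resumed" notifications come from the text above.

```python
from enum import Enum


class State(Enum):
    # States of the adjusted diagram (publisher's view of a dynamic subscription).
    ACTIVE = "receiver ACTIVE"
    SUSPENDED = "receiver SUSPENDED"
    END = "end"


class DynamicSubscription:
    """Hypothetical model of the adjusted diagram; not an API from the draft."""

    def __init__(self):
        # A successful establish-subscription moves "start" to ACTIVE.
        self.state = State.ACTIVE
        # State change notifications sent to the subscriber, in order.
        self.sent = []

    def resources_insufficient(self):
        # Insufficient CPU or bandwidth; modeled as the publisher always
        # choosing to suspend.
        if self.state is State.ACTIVE:
            self.state = State.SUSPENDED
            self.sent.append("subscription-suspended")

    def resources_sufficient(self):
        # Constraints cleared without any modify-subscription request.
        if self.state is State.SUSPENDED:
            self.state = State.ACTIVE
            self.sent.append("subscription-resumed")

    def modify_subscription(self, fits_resources=True):
        # Allowed from ACTIVE or SUSPENDED.  A modification the publisher can
        # support leaves the subscription ACTIVE; a rejected one changes nothing.
        if self.state is State.END:
            raise RuntimeError("subscription has ended")
        if fits_resources:
            self.state = State.ACTIVE
        return fits_resources

    def delete_or_kill(self):
        # delete-subscription or kill-subscription ends it from either state.
        self.state = State.END
```

The sketch sends no state change notification when a successful "modify-subscription" reactivates a suspended subscription, since the bullets above only tie "subscription-resumed" to the resource constraints clearing on their own.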





>   Are your tree diagrams dynamically-generated?  - is there any concern

>   that they are out-of-date?



Generated from pyang, then manually snipped from the output.  Concerns are discussed more below.  For the next drafts, I am certainly changing my integration environment.

<KENT> the question more regards if they've been generated (via pyang or whatever) recently…



<Eric2> With the tool Martin pointed me to for automatically generating to a fixed column width, life is much easier now.
<Kent2> so now the tree-diagrams are automatically generated?



<eric3> Every tree was automatically generated just before posting.  The only hand editing was to remove extra “|” connectors between discrete Notifications & RPCs.



<Kent3> right, I sometimes find myself needing to make similar edits as well, though I do so via scripts with lots of `sed` and `grep` commands…
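For anyone wanting to script the hand edit Eric describes, here is a minimal Python equivalent of such a `sed`/`grep` step. It assumes, hypothetically, that the extra connectors show up as lines containing nothing but whitespace and "|" characters; that assumption about the tree output and the function name are illustrative only.

```python
import re

# A line made up only of whitespace and at least one "|" connector.
CONNECTOR_ONLY = re.compile(r"^[\s|]*\|[\s|]*$")


def strip_connector_lines(tree_text: str) -> str:
    """Drop lines that contain nothing but "|" connectors; keep everything else."""
    kept = [line for line in tree_text.splitlines()
            if not CONNECTOR_ONLY.match(line)]
    return "\n".join(kept) + "\n"
```

Blank lines are kept on purpose, since they usually separate sections such as "rpcs:" and "notifications:" rather than being stray connectors.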







<snip/>


<Eric3> Per above, RESTCONF-notif is my first thing for tomorrow.   I am now feeling more comfortable I won’t be churning contexts there as the rest of the documents’ frameworks settle.

<Kent3> okay, any ETA on when an update might be posted?

<Eric4> Goal is about a week from now.







When any such configured subscription receivers become ACTIVE, buffered event records (if any) will be sent immediately after the “subscription-started” notification.  The first event sent will be the most recent following the latest of four different times: the "replay-log-creation-time", "replay-log-aged-time", "replay-start-time", or the most recent publisher boot time.

<KENT> I don't understand the 2nd sentence here



<Eric2>   Rewrote to: “The leading event record sent will be the first event record subsequent to the latest of four different times: the "replay-log-creation-time", "replay-log-aged-time", "replay-start-time", or the most recent publisher boot time.”
<Kent2> Still confusing.   I think that you may need to expand this into a paragraph where you more carefully discuss the four times and how they're used here, or perhaps use an example to explain it…

<Eric3>  Added the descriptive paragraph requested in the middle of the three paragraphs below...

It is possible to place a start time on a configured subscription.  This enables streaming of logged information immediately after restart.

Replay of event records created since restart can be quite useful.  This allows event records generated before transport connectivity was supportable by a publisher to be passed to a receiver.  In addition, event records logged before restart are not sent.  This avoids the potential for accidental event record duplication.  Such duplication might otherwise be likely as a configured subscription’s identifier before and after the reboot is the same, and there may not be evidence to a receiver that a restart has occurred.  By establishing restart as the earliest potential time for event records to be included in notification messages, a well-understood timeframe for replay is defined.

Therefore, when configured replay subscription receivers first become ACTIVE, buffered event records (if any) will be sent immediately after the "subscription-started" notification.  And the leading event record sent will be the first event record subsequent to the latest of four different times: the "replay-log-creation-time", "replay-log-aged-time", "replay-start-time", or the most recent publisher boot time.
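Since Kent asked for an example of how the four times interact, here is a minimal Python sketch. It assumes each time is a timezone-aware `datetime`, treats the two replay log times and "replay-start-time" as possibly absent (`None`), assumes the log is already held in the order records were placed onto the stream, and reads "subsequent to" as strictly later. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


def effective_replay_start(
    replay_log_creation_time: Optional[datetime],
    replay_log_aged_time: Optional[datetime],
    replay_start_time: Optional[datetime],
    last_boot_time: datetime,
) -> datetime:
    """Return the latest of the four times, ignoring absent (None) values."""
    candidates = [
        t
        for t in (replay_log_creation_time, replay_log_aged_time,
                  replay_start_time, last_boot_time)
        if t is not None
    ]
    return max(candidates)


def events_to_replay(log: Iterable[Tuple[datetime, str]],
                     start: datetime) -> List[str]:
    """Records to send right after "subscription-started", oldest first."""
    return [record for ts, record in log if ts > start]


# Example: log created at 01:00, replay-start-time 03:00, last boot at 05:00.
def at(hour: int) -> datetime:
    return datetime(2018, 4, 24, hour, tzinfo=timezone.utc)


start = effective_replay_start(at(1), None, at(3), at(5))
print(start.isoformat())  # 2018-04-24T05:00:00+00:00
print(events_to_replay([(at(2), "a"), (at(5), "b"), (at(6), "c")], start))  # ['c']
```

In the example the boot time wins, so the record logged at 02:00 (before the reboot) is not replayed even though "replay-start-time" would otherwise have admitted it.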

<Kent3> Hmmm, I'm having a negative reaction to the "event records logged before restart are not sent" bit.  I know what you are trying to do, but I worry that this behavior might drop important logs, perhaps to the advantage of an adversary.  Note that some devices implement an <edit-config> with a restart.  Maybe the solution should require publishers to maintain a per configured-subscription awareness of (roughly) which log was sent last?   - and notify the receiver when a restart has occurred, or when the replaying of events occurs, so that they can be aware that there might be some duplicates?

<Eric4>  The current solution guarantees no duplicates, and also informs the receiver of each new “start-time”.  This allows the receiver to attempt to reconstruct any gaps from the last event previously pushed, should they choose to attempt such reconstruction.   As a dynamic subscription has no such boundary constraints on replay and boot time, all a subsequent dynamic subscription needs to do is to request the events between the last event previously received from that configured subscription and the new replay-start-time.

Note that this solution acts identically for loss of events when the platform *doesn’t* reboot, and events are just lost due to some overflow.  See the Section 2.5.2 text:
   “However if events are lost (rather than just delayed) due to replay buffer overflow, a new "subscription-started" must be sent.  This new "subscription-started" indicates an event record discontinuity.”
I.e., this way the receiver doesn’t have to do forensics to attempt to determine the cause of a transient loss of events on a publisher.

In any case, tracking the last event sent to each receiver will be a pretty hard requirement to meet during a publisher crash.  It is simpler to just let the receiver attempt a reconstruction should it need to.
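A receiver-side sketch of that reconstruction in Python, assuming the receiver remembers the timestamp of the last event record it received from the configured subscription. The function name, and the mapping of the returned pair onto the start and stop times of a separate dynamic replay request, are assumptions for illustration.

```python
from datetime import datetime
from typing import Optional, Tuple


def replay_gap(last_received: Optional[datetime],
               new_replay_start: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Time window a receiver could ask to have replayed after a configured
    subscription reports a new "subscription-started" with a later start.

    Returns None if nothing was received earlier or there is no gap.
    """
    if last_received is None or new_replay_start <= last_received:
        return None
    # Hypothetical mapping: these become the start and stop times of a
    # separate dynamic subscription with replay.  If that start time is
    # inclusive, the record at `last_received` comes back again and should
    # be discarded as a duplicate.
    return (last_received, new_replay_start)
```

The same check applies whether the discontinuity came from a reboot or from a replay buffer overflow, since both are signalled by a new "subscription-started".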



<Kent3> Going back to my original comment, the new paragraph helps, it certainly caught my attention regarding reboots wiping out the replay log buffer.



<Eric4>  There is no requirement that the reboot wipes out the buffer (the solution is agnostic to that).   The only requirement is that a configured subscription replay start no earlier than the last reboot time.



But this wasn't quite what I was looking for either.   What I was hoping for was something akin to references (e.g., <xref>), or perhaps just stating that the "replay-log-creation-time" and "replay-log-aged-time" are identified as being via the read-only "streams" data tree, and "replay-start-time" is configured in the "subscription" tree.  It's clear what you mean by the publisher boot time, though I have to admit that, before, I didn't understand its envisioned impact.



<Eric4> Added a subsequent sentence including <xref> which says:
“The "replay-log-creation-time" and "replay-log-aged-time" are discussed in Section 2.4.2.1, and "replay-start-time" in Section 2.7.1.”











All other replay functionality remains the same as with dynamic subscriptions as described in Section 2.4.2.1

<KENT> I'm not sure I like having to look at 2.4.2.1 and trying to figure out what this means.  Can you make this more explicit or, since 5.6 is pretty small, copy the parts into this section?



<Eric2> I initially had all the text in 2.4.2.1.  But this hid the fact that you can do replay on a configured subscription.  So your comment above led to this section being introduced.  Which is a good thing.   But as 2.4.2.1 is not very small, to me it feels like repeating all that text here might be overkill.



<Kent2> hmmm, maybe factor the relevant text in 2.4.2.1 to yet another section and have both refer to it instead?



<Eric3> If we move the relevant Replay text to a new Section just before 2.6, then we create bigger difficulties.   For example, there are establish-subscription errors specific to replay in Section 2.4 -- we shouldn’t talk about these errors before replay is introduced.   And if we tried to fix this by replicating a different table for establish-subscription errors related to replay, we would confuse readers.  I believe the current partitioning is a reasonable approach.



<Kent3> still seems clunky, but it will do.







>   The 2nd paragraph would make more sense if I was looking at a tree

>   diagram.  But then I realize that this would be the same tree-diagram

>   that should've been presented in Configured Subscriptions.



The tree is in the subscriptions container section just below.  I will gladly reference it wherever it ends up.

<KENT> you already need to be referring to it regardless.  As for where it is, see my previous comment on this topic



<Eric2> References to Figure 20 have been made.   If the tree must be moved up, it can be.   I think it fits better where it is.

<Kent2> reference by figure-number is okay, but you might want to also say which section the figure appears in.





<Eric3>  Now says:


   A tree diagram describing these parameters is shown in Figure 20
   within Section 3.3.



 <Kent3> perfect.











<KENT> better, though I'm unsure the "none" nodes need to be listed.



 <Eric2> The template text  “These are the subtrees and data nodes and their sensitivity/vulnerability” appears to make the list of all nodes mandatory.  As this was not your intent, I pulled the “none” out.



<Kent2> not my template, so it's not so much a question of my intent, per se.   But what I do is to call out the nodes with special considerations (i.e., having an NACM extension statement) and state that everything else isn't noteworthy.



<Eric3>  Ok,  I tweaked the template to say:

These are the subtrees and data nodes where there is a specific sensitivity/vulnerability:



<Kent3> okay





[snip]





>   Re: the 6th paragraph, I'm surprised that requirements for transport-

>   bindings wasn't discussed before in its own section.  It seems like

>   a new thing here, that a receiver's transport might not be secure.

>   I'm okay with and support this, btw, as it's sometimes better to

>   offload devices thru the use of a local collector node, for which

>   encryption may not be needed...



Agree with your comments.

<KENT> but where's the change?  Shouldn't this have been discussed previously in the draft somewhere?



<Eric2> The vast majority of transport binding discussions are addressed in the transport document.  So I see this as guidance to a documenter of a transport document.  Perhaps that is unnecessary for this document, and the paragraph should be removed.  I would be fine with that.



<Kent2> wait, I don't think you can offload transport-requirements to the transport-binding documents.   I think that this document needs to define the requirements and the transport-binding documents then show how they adhere to them.   Does this make sense?



 <Eric3> With the varied transports of NETCONF, HTTP/RESTCONF, UDP, and CoAP already in drafts, my belief is that only a high-level subset of transport requirements spanning the universe of potential transports can be abstracted in this document.  The secure transport requirement is one such example, and that is a recommendation.  The Security Considerations section is a good place for that one.  Beyond the security recommendation there aren’t too many transport-independent possibilities.   I did just add one new transport requirement to the very end of the “Event Streams” section though (which perhaps wasn’t explicit enough elsewhere).  This requirement is:



“Event records MUST NOT be delivered to a receiver in a different order than they were placed onto an event stream.”
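One way a receiver could check this ordering requirement, sketched in Python. The draft does not define per-record sequence numbers; the sketch assumes, hypothetically, that the publisher stamps each record with a per-stream counter that increases as records are placed onto the stream. Gaps are tolerated because filtering may legitimately skip records.

```python
from typing import Iterable


def check_stream_order(received_sequence_ids: Iterable[int]) -> bool:
    """True if records arrived in the order they were placed onto the stream.

    A repeated or decreasing sequence number means a duplicate or a
    reordering; a jump forward is allowed (records may be filtered out).
    """
    previous = None
    for seq in received_sequence_ids:
        if previous is not None and seq <= previous:
            return False
        previous = seq
    return True
```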



What other transport-independent transport requirements might there be which are not already documented?



Stepping back, I see the transport draft plus this draft providing the aggregate set of requirements for a full solution.  And I had thought it would be up to the draft authors plus WGs to validate that the sum of the documents is sufficient.





<Kent3> unsure.  For example, RFC 6241 has Section 2 (Transport Protocol Requirements) that the SSH and TLS binding drafts refer to.  It seems that this draft should have a similar section that highlights what MUST or MUST NOT be supported.  It could even include some additional text indicating that bindings MAY introduce additional requirements.



<Eric4> I re-read RFC 6241 Section 2 a couple times.  There are comparisons that can be made from that document to a subset of requirements currently in this document’s security section.  But I don’t see anything missing on the MUST and MUST NOT side of things.   FYI: the specific requirements I am thinking of are:



   For both configured and dynamic subscriptions the publisher MUST
   authenticate and authorize a receiver via some transport level
   mechanism before sending any updates.

   A secure transport is highly recommended and the publisher MUST
   ensure that the receiver has sufficient authorization to perform the
   function they are requesting against the specific subset of content
   involved.

   With configured subscriptions, one or more publishers could be used
   to overwhelm a receiver.  Notification messages SHOULD NOT be sent to
   any receiver which does not support this specification.  Receivers
   that do not want notification messages need only terminate or refuse
   any transport sessions from the publisher.
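For illustration only, the three requirements above could gate a publisher's send path as in this Python sketch. The `Receiver` fields and the function name are hypothetical. A secure transport is deliberately left out of the gate because the text only recommends it, which also leaves room for the unencrypted local-collector case Kent raised.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    # Hypothetical per-receiver facts a publisher might track.
    transport_authenticated: bool  # authenticated via a transport-level mechanism
    authorized_for_content: bool   # authorized for the requested subset of content
    supports_spec: bool            # known to support this specification


def may_send_notifications(receiver: Receiver) -> bool:
    """Gate reflecting the three quoted requirements (sketch only)."""
    # MUST: authenticate and authorize before sending any updates.
    if not (receiver.transport_authenticated and receiver.authorized_for_content):
        return False
    # SHOULD NOT: send to a receiver that does not support this specification.
    return receiver.supports_spec
```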



That is about it for common stuff.  Considering the wide variety of potential transports, and the ubiquity of the need for stream transports, I am simply not aware of any more common requirements.  If you need me to, I can extract these three requirements and put them under a separate transport requirements section.   But this seems excessive, especially as we have transport-specific documents with eyes on them from the WG.  But if you really do want this, I will place these into a new, separate section; and I will add your text: “bindings MAY introduce additional requirements.”



Eric









Kent