Re: [Gen-art] Gen-art LC review: draft-mm-netconf-time-capability-05

Tal Mizrahi <talmi@marvell.com> Sun, 02 August 2015 06:59 UTC

From: Tal Mizrahi <talmi@marvell.com>
To: Andy Bierman <andy@yumaworks.com>
Thread-Topic: Gen-art LC review: draft-mm-netconf-time-capability-05
Thread-Index: AdDIdyKlVE3ctVSLQXagUYpGMyo4/wEDs/AgAAioM4AAECkPsA==
Date: Sun, 02 Aug 2015 06:59:24 +0000
Message-ID: <8dd1c8efac4949beb332a5d6b90bc680@IL-EXCH01.marvell.com>
References: <60322a704b1e4d1cbc85f6a3b6a33b8e@IL-EXCH01.marvell.com> <CABCOCHSE-T66rHw6=6RJ2h+o+8=KYT4O3tY2QWnZrR1oRYxMDA@mail.gmail.com>
In-Reply-To: <CABCOCHSE-T66rHw6=6RJ2h+o+8=KYT4O3tY2QWnZrR1oRYxMDA@mail.gmail.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/gen-art/E7gvTi95kIP9HX1EHrOt4d0N1J0>
Cc: General Area Review Team <gen-art@ietf.org>, "draft-mm-netconf-time-capability.all@ietf.org" <draft-mm-netconf-time-capability.all@ietf.org>, "ietf@ietf.org" <ietf@ietf.org>
Subject: Re: [Gen-art] Gen-art LC review: draft-mm-netconf-time-capability-05
Precedence: list

Hi Andy,

Thanks for the prompt response.

>IMO returning the execution-time is not needed.
>How far from the requested time do you expect the server to be?
>Maybe a few milli-seconds?

The scheduled-time refers to the *start time* of the RPC, whereas the execution-time refers to the *completion time* of the RPC.
The difference between these two values is affected by: (i) How accurately the server is able to *start* the RPC compared to its scheduled time, and (ii) The elapsed time of execution of the RPC.
Yes, we expect the typical difference between the execution-time and the scheduled-time to be hundreds of microseconds to a few milliseconds, depending on the RPC type, and on the server’s processing power. However, in some cases, especially when the server is heavily utilized, this difference can be as high as hundreds of milliseconds, or even more than a second. In such cases it is important for the client to know that the RPC was executed a long time after it was scheduled to be performed.
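To make the distinction concrete, here is a minimal sketch of how a client might compare the two values. The function names, the timestamp layout (UTC with fractional seconds and a trailing "Z"), and the 100 ms threshold are assumptions for illustration only; they are not taken from the draft.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(text: str) -> datetime:
    """Parse a UTC timestamp such as 2015-08-02T06:59:24.350Z (assumed layout)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def execution_lag_ms(scheduled_time: str, execution_time: str) -> float:
    """Milliseconds between the scheduled start and the reported completion."""
    delta = parse_ts(execution_time) - parse_ts(scheduled_time)
    return delta / timedelta(milliseconds=1)


def is_late(scheduled_time: str, execution_time: str, threshold_ms: float = 100.0) -> bool:
    """True when the RPC completed later than the client is willing to tolerate.

    The 100 ms default is a hypothetical client-side policy, not a protocol value.
    """
    return execution_lag_ms(scheduled_time, execution_time) > threshold_ms


if __name__ == "__main__":
    lag = execution_lag_ms("2015-08-02T06:59:24.000Z", "2015-08-02T06:59:24.350Z")
    print(f"lag: {lag} ms, late: {lag > 100.0}")
```

A client could log or alarm on `is_late` results, which is the feedback the execution-time is meant to provide.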

>I don't understand the arbitrary 15 sec. limit.
>What problems magically disappear if the timeout is 14 sec vs. 16 sec.?

Instead of ‘15 seconds’, it should actually read ‘a few seconds’.

The idea is that ‘a few seconds’ is *long enough* to allow the client to:

1. Send the scheduled RPC, potentially to multiple servers.
2. Receive notifications or rpc-error messages from all the servers (or wait for a timeout and decide that if no response has arrived then something is wrong).
3. [If necessary] Send a cancellation message, potentially to multiple servers.
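The three client steps above can be sketched roughly as follows. This is only an illustration: the `send` and `cancel` callables, and the shape of the reply (a dict with "ok" and "schedule-id", or None on timeout), are hypothetical stand-ins for a real NETCONF client library.

```python
from typing import Callable, Dict, List, Optional


def coordinate(
    servers: List[str],
    send: Callable[[str], Optional[dict]],
    cancel: Callable[[str, str], None],
) -> bool:
    """Schedule an RPC on every server; cancel everywhere if any server fails.

    send(server) is assumed to return {"ok": True, "schedule-id": ...} on
    success, an error dict on rpc-error, or None when the wait timed out.
    """
    accepted: Dict[str, str] = {}
    failed = False

    # Steps 1 and 2: send the scheduled RPC and collect each server's response.
    for server in servers:
        reply = send(server)
        if reply and reply.get("ok"):
            accepted[server] = reply["schedule-id"]
        else:
            failed = True  # rpc-error or timeout: something is wrong

    # Step 3 (if necessary): cancel the schedules that were accepted.
    if failed:
        for server, schedule_id in accepted.items():
            cancel(server, schedule_id)

    return not failed
```

All three steps have to complete before the scheduled time arrives, which is why the window needs to be a few seconds rather than a few milliseconds.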

At the same time, ‘a few seconds’ is *short enough* to guarantee a low probability of something going wrong, for example:

- If a server is restarted due to a fault, the procedure of the fault detection + restart will typically take more than a few seconds.
- A change in the authorization level of the client is usually an event that is triggered by a human operation, and therefore takes more than a few seconds.

Having said that, it is still possible (with a low probability) that something *will* go wrong during those ‘few seconds’. However, this short potentially hazardous period is not significantly worse than in conventional (unscheduled) RPCs, as even a conventional RPC may in some cases be executed a few seconds after it was sent by the client.

15 seconds was selected as a default value. Please let us know if you think the value should be different.
We expect that this number may vary in different environments.

>>A typical example of using near-future scheduling is a coordinated commit;
>>a client needs to trigger a commit at n servers, so that the n servers perform
>>the commit as close as possible to simultaneously. Without the time capability,
>>the client sends a sequence of n commit messages, and thus each server
>>performs the commit at a different time. By using the time capability, the client
>>can send commit messages that are scheduled to take place at time Ts, which is
>>5 seconds in the future, causing the servers to invoke the commit as close as
>>possible to time Ts.

>You should pick a different example because the NETCONF confirmed-commit
>procedure is designed to be loose-coupled.  The default timeout is 10 minutes.
>Since the client needs sessions open with all servers involved in the network-wide
>commit, there is no advantage in staging the <commit> operations 15 sec. in advance,
>to make sure the servers are reachable.

Right, confirmed-commit is loose-coupled. But the example quoted above (Example 1 in the draft) is not intended to replace the confirmed commit. The purpose in this example is different: the client wants the commit RPCs to be executed at the same time in all servers.
The confirmed-commit serves a different purpose, which is to make sure that everyone either commits or rolls back. BTW, a confirmed commit can be sent with the scheduled-time element, allowing the client to enjoy the best of both worlds.
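As a rough illustration of the coordinated commit, the sketch below builds one <commit> message per server, all carrying the same scheduled time Ts, 5 seconds in the future. The namespace used for scheduled-time, the placement of the element inside <commit>, and the helper names are assumptions made for this example and should be checked against the draft's YANG module.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
# Assumed namespace for the time capability; verify against the draft.
TIME_NS = "urn:ietf:params:xml:ns:yang:ietf-netconf-time"


def scheduled_commit(message_id: str, ts: datetime, confirmed: bool = False) -> str:
    """Build a <commit> RPC that asks the server to start at time ts."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    commit = ET.SubElement(rpc, f"{{{NETCONF_NS}}}commit")
    if confirmed:
        # A confirmed commit may carry scheduled-time as well.
        ET.SubElement(commit, f"{{{NETCONF_NS}}}confirmed")
    sched = ET.SubElement(commit, f"{{{TIME_NS}}}scheduled-time")
    # Millisecond-resolution UTC timestamp, e.g. 2015-08-02T06:59:29.000Z
    utc = ts.astimezone(timezone.utc)
    sched.text = utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return ET.tostring(rpc, encoding="unicode")


def coordinated_commits(servers, now: datetime, lead=timedelta(seconds=5)) -> dict:
    """One message per server, all scheduled for the same Ts = now + lead."""
    ts = now + lead
    return {
        server: scheduled_commit(str(msg_id), ts)
        for msg_id, server in enumerate(servers, start=101)
    }
```

Because every message carries the same Ts, the order and timing in which the client sends them no longer determines when each server performs the commit.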

>I thought the synchronized <get> on operational state was a good use-case.

Please let us know if you have further comments.
Thanks,
Tal.

From: Andy Bierman [mailto:andy@yumaworks.com]
Sent: Sunday, August 02, 2015 4:23 AM
To: Tal Mizrahi
Cc: Robert Sparks; General Area Review Team; ietf@ietf.org; draft-mm-netconf-time-capability.all@ietf.org
Subject: Re: Gen-art LC review: draft-mm-netconf-time-capability-05

On Sat, Aug 1, 2015 at 11:32 AM, Tal Mizrahi <talmi@marvell.com> wrote:
Hi Robert,

Thanks for the comments.

We have submitted an updated version of the draft, which addresses the comments we received from you and other reviewers in IETF last call.
https://tools.ietf.org/html/draft-mm-netconf-time-capability-06

Our responses to your comments can be found below.
Please let us know if you have further comments or questions.

Thanks,
Tal and Yoram.

>-----Original Message-----
>From: ietf [mailto:ietf-bounces@ietf.org] On Behalf Of Robert Sparks
>Sent: Thursday, July 09, 2015 12:40 AM
>To: General Area Review Team; ietf@ietf.org; draft-mm-netconf-time-capability.all@ietf.org
>Subject: Gen-art LC review: draft-mm-netconf-time-capability-05
>
>I am the assigned Gen-ART reviewer for this draft. For background on
>Gen-ART, please see the FAQ at
>
><http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
>
>Please resolve these comments along with any other Last Call comments
>you may receive.
>
>Document: draft-mm-netconf-time-capability-05
>Reviewer: Robert Sparks
>Review Date: 8 Jul 2015
>IETF LC End Date: 29 Jul 2015
>IESG Telechat date: not yet scheduled
>
>Summary: This draft has open issues to address before publication
>
>This draft adds two separable concepts to netconf
>* Asking for and receiving knowledge of when a command was executed
>* Requesting that a command be executed at a particular time
>
>The utility of the first is obvious, and I have no problems with the
>specification of that part of this extension. Would it be better to
>pull these apart and progress them separately?
>

We believe there is a great benefit to defining these two features together, although each of them can be used independently. The second certainly gains from the first, since the execution-time provides feedback to the client about the actual time of execution compared to the scheduled time of execution.

IMO returning the execution-time is not needed.
How far from the requested time do you expect the server to be?
Maybe a few milli-seconds?

>The utility of the second would be more obvious if the draft didn't
>limit the time to be "near future scheduling". It punts on most of the
>hard problems with scheduling things outside a very tight range (15
>seconds in the future by default), without motivating the advantages of
>saying "wait until 5 seconds from now before you do this".
>
>So:
>
>Why was 15 seconds chosen? Could you add a motivating example that
>shows why being able to say "now is not good, but 5 seconds from now is
>better" is useful? (Something like having a series of things happen as
>close to simultaneously without the network delay of sending the
>requests impacting how they are separated perhaps?)
>

Point well taken. We have added the following example, motivating why near future scheduling (<15 seconds) can be useful:

I don't understand the arbitrary 15 sec. limit.
What problems magically disappear if the timeout is 14 sec vs. 16 sec.?

A typical example of using near-future scheduling is a coordinated commit; a client needs to trigger a commit at n servers, so that the n servers perform the commit as close as possible to simultaneously. Without the time capability, the client sends a sequence of n commit messages, and thus each server performs the commit at a different time. By using the time capability, the client can send commit messages that are scheduled to take place at time Ts, which is 5 seconds in the future, causing the servers to invoke the commit as close as possible to time Ts.

You should pick a different example because the NETCONF confirmed-commit
procedure is designed to be loose-coupled.  The default timeout is 10 minutes.
Since the client needs sessions open with all servers involved in the network-wide
commit, there is no advantage in staging the <commit> operations 15 sec. in advance,
to make sure the servers are reachable.

I thought the synchronized <get> on operational state was a good use-case.

Andy

We have also added an explanation of why 15 seconds were chosen as the default value:

The default value of sched-max-future is defined to be 15 seconds. This duration is long enough to allow the scheduled RPC to be sent by the client, potentially to multiple servers, and in some cases to send a cancellation message, as described in Section 3.2. On the other hand, the 15 second duration yields a very low probability of a reboot or a permission change.

>Given the punt, why isn't there a statement that sched-max-future MUST
>NOT be configured for more than some small value (twice the default, or
>30 seconds, perhaps), especially while this is targeted for
>Experimental? Without something like that, I think the document needs
>to talk about more of the issues it is trying to avoid with longer term
>scheduling, even if it doesn't solve those issues. (If I have a fast
>pipe, I can make a server keep a lot of queued requests, eating a lot
>of state, even if the window is only 15 seconds. Pointing to how
>netconf protects against state-exhaustion abuse might be useful).
>

Note that we did not define a maximal value for sched-max-future, since one of the goals was to define a generic tool that can be used in a variety of environments. The draft clearly states the intention of using near-future scheduling, but the requirements and constraints of different environments may require sched-max-future to have a different value, potentially higher than 30 seconds. Hence, we prefer not to define a maximal value. Draft 06 does include a more detailed discussion of the issues we are trying to prevent by using near-future scheduling (Section 3.6).

>The security considerations section talks about malicious parties
>attempting to cause sched-max-future to be configured to "a small
>value". Could you more clearly characterize  "small", given that the
>default is 15 seconds?
>

Agreed.
We rephrased this paragraph to be more clear about the "small" value:

This YANG module defines <sched-max-future> and <sched-max-past>, which are writable/creatable/deletable. These data nodes may be considered sensitive or vulnerable in some network environments. An attacker may attempt to maliciously configure these parameters to a low value, thereby causing all scheduled RPCs to be discarded. For instance, if a client expects <sched-max-future> to be 15 seconds, but in practice it is maliciously configured to 1 second, then a legitimate scheduled RPC that is scheduled to be performed 5 seconds in the future will be discarded by the server.
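The example in this paragraph can be sketched as a simple window check. The function name is hypothetical, and the sketch models only the sched-max-future bound; handling of scheduled times in the past (sched-max-past) is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def within_future_window(
    now: datetime,
    scheduled_time: datetime,
    sched_max_future: timedelta = timedelta(seconds=15),
) -> bool:
    """True if the scheduled time is no further ahead than sched-max-future.

    Only the future bound is modeled; sched-max-past is not considered here.
    """
    return scheduled_time - now <= sched_max_future


if __name__ == "__main__":
    now = datetime(2015, 8, 2, 6, 59, 24, tzinfo=timezone.utc)
    rpc_time = now + timedelta(seconds=5)

    # With the default of 15 seconds the RPC is accepted.
    print(within_future_window(now, rpc_time))
    # With a maliciously configured 1 second window it is discarded.
    print(within_future_window(now, rpc_time, timedelta(seconds=1)))
```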

>Even with the near-future limit, there are issues to discuss introduced
>with the ability to cancel a request:
>
>* What prevents a 3rd party from cancelling a request? I think it's
>only that the 3rd party would have to obtain the right id to put in the
>cancel message. If so, the document should talk about how you keep
>eavesdroppers from seeing those ids, and that the servers that generate
>them should make ids that are hard to guess.
>

We understand this needs further clarification. As noted by Andy Bierman in a corresponding mail:

>>Since the scheduled rpc event is sent to every client that is
>>listening for notifications, there is no possibility for security
>>through hard-to-guess token, as is done with the "persist-id"  for cancelling a confirmed-commit.

We rephrased the paragraph to clarify these issues:

This YANG module defines the <cancel-schedule> RPC. This RPC may be considered sensitive or vulnerable in some network environments. Since the value of the <schedule-id> is known to all the clients that are subscribed to notifications from the server, the <cancel-schedule> RPC may be used maliciously to attack servers by canceling their pending RPCs. This attack is addressed in two layers: (i) security at the transport layer, limiting the attack only to clients that have successfully initiated a secure session with the server, and (ii) the authorization level required to cancel an RPC should be the same as the level required to schedule it.

>* Especially given the near-future limitation, you run a high risk that
>the cancel arrives after the identified request has been executed. It's
>not clear in the current text what the server should do. I assume you
>want the server to reply to the cancel with a "I couldn't cancel that"
>rather than to do something like try to undo the request. The document
>should be explicit.
>
>* The document should explicitly disallow adding <scheduled-time> to
><cancel-schedule>
>

Agreed.
We have addressed these two comments by adding the following paragraph:

A cancel-schedule message MUST NOT include the scheduled-time parameter. A server that receives a cancel-schedule should try to cancel the schedule as soon as possible. If the server is unable to cancel the scheduled RPC, for example because it has already been executed, it should respond with an rpc-error [RFC6241], in which the error-type is 'protocol', and the error-tag is 'operation-failed'.
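A minimal sketch of the server-side behaviour described in this paragraph is shown below. The `Scheduler` class, its `pending` table, and the dict-shaped replies are hypothetical; only the error-type and error-tag values come from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Scheduler:
    """Toy model of a server's table of pending scheduled RPCs."""

    # Maps schedule-id to a description of the pending RPC (hypothetical shape).
    pending: Dict[str, str] = field(default_factory=dict)

    def cancel_schedule(self, schedule_id: str) -> dict:
        """Cancel a pending RPC, or report failure as an rpc-error.

        An RPC that has already been executed is no longer in the pending
        table, so the cancel fails rather than attempting to undo it.
        """
        if schedule_id in self.pending:
            del self.pending[schedule_id]
            return {"ok": True}
        return {"error-type": "protocol", "error-tag": "operation-failed"}


if __name__ == "__main__":
    server = Scheduler(pending={"sid-1": "commit"})
    print(server.cancel_schedule("sid-1"))  # still pending, so it is cancelled
    print(server.cancel_schedule("sid-1"))  # no longer pending, so rpc-error
```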

>One editorial comment: It would help to move the concept of the
>near-future limitation much earlier in the document, perhaps even into
>the introduction and abstract.
>

Agreed.
We added the following to the introduction:

The NETCONF time capability is intended for scheduling RPCs that should be performed in the near future, allowing to coordinate simultaneous configuration changes, or to specify an order of configuration updates. Time-of-day-based policies and far-future scheduling, e.g., [Cond], are outside the scope of this memo.

[Cond] Watsen, K., "Conditional Enablement of Configuration Nodes", draft-kwatsen-conditional-enablement-00 (expired), 2013.

>And for the shepherding AD: The document has no shepherd or shepherd
>writeup. While a writeup is not required, one would have been useful in
>this case to discuss the history of (lack of) discussion of the
>document on the group's list and the group's reaction to progressing as
>Experimental as an Individual Submission.
