Re: [Gen-art] Gen-art LC review: draft-mm-netconf-time-capability-05

Tal Mizrahi <talmi@marvell.com> Tue, 04 August 2015 16:19 UTC

From: Tal Mizrahi <talmi@marvell.com>
To: Robert Sparks <rjsparks@nostrum.com>
Thread-Topic: Gen-art LC review: draft-mm-netconf-time-capability-05
Date: Tue, 04 Aug 2015 16:19:27 +0000
Message-ID: <03c295837c984138bb30bd9aacf21999@IL-EXCH01.marvell.com>
References: <60322a704b1e4d1cbc85f6a3b6a33b8e@IL-EXCH01.marvell.com> <55BFEDC8.6040800@nostrum.com>
In-Reply-To: <55BFEDC8.6040800@nostrum.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/gen-art/R9utfFrFj-NIOI0jOnpXqAj_Jig>
Cc: General Area Review Team <gen-art@ietf.org>, "ietf@ietf.org" <ietf@ietf.org>, "draft-mm-netconf-time-capability.all@ietf.org" <draft-mm-netconf-time-capability.all@ietf.org>
Subject: Re: [Gen-art] Gen-art LC review: draft-mm-netconf-time-capability-05

Hi Robert,

Thanks for the comments.


>>A typical example of using near-future scheduling is a coordinated commit; 
>>a client needs to trigger a commit at n servers, so that the n servers perform 
>>the commit as close as possible to simultaneously. Without the time capability, 
>>the client sends a sequence of n commit messages, and thus each server 
>>performs the commit at a different time. By using the time capability, the client 
>>can send commit messages that are scheduled to take place at time Ts, which 
>>is 5 seconds in the future, causing the servers to invoke the commit as close as 
>>possible to time Ts.

>I'm interested in your response to Andy's point on this paragraph.

Okay, so here is Andy's point:

>>You should pick a different example because the NETCONF confirmed-commit
>>procedure is designed to be loose-coupled.  The default timeout is 10 minutes.
>>Since the client needs sessions open with all servers involved in the network-wide
>>commit, there is no advantage in staging the <commit> operations 15 sec. in advance,
>>to make sure the servers are reachable.

And here is our response from 02-Aug-2015:

>Right, confirmed-commit is loose-coupled. But the example quoted above (Example 
>1 in the draft) is not intended to replace the confirmed commit. The purpose in this 
>example is different: the client wants the commit RPCs to be executed at the same 
>time in all servers.
>The confirmed-commit serves a different purpose, which is to make sure that everyone 
>either commits or rolls back. BTW, a confirmed commit can be sent with the scheduled-time 
>element, allowing the client to enjoy the best of both worlds.


Please let us know if you have further concerns about this point.
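To illustrate the coordinated-commit example, here is a minimal client-side sketch (not part of the draft) that builds one <commit> RPC per server, all carrying the same scheduled time Ts = now + 5 seconds, optionally combined with <confirmed/> as mentioned above. The namespace TIME_NS is a hypothetical placeholder, and placing <scheduled-time> directly inside <commit> is a simplification; the draft defines the exact encoding.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple
import xml.etree.ElementTree as ET

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"  # NETCONF base namespace (RFC 6241)
TIME_NS = "urn:example:netconf-time"  # hypothetical placeholder, not the draft's URI


def build_scheduled_commit(message_id: str, ts: datetime, confirmed: bool = False) -> str:
    """Return a <commit> RPC asking the server to execute it at time ts."""
    rpc = ET.Element(f"{{{NC_NS}}}rpc", {"message-id": message_id})
    commit = ET.SubElement(rpc, f"{{{NC_NS}}}commit")
    if confirmed:
        # Confirmed commit (RFC 6241) combined with near-future scheduling.
        ET.SubElement(commit, f"{{{NC_NS}}}confirmed")
    sched = ET.SubElement(commit, f"{{{TIME_NS}}}scheduled-time")
    # UTC timestamp in the "Z" form used by YANG date-and-time values.
    sched.text = ts.isoformat().replace("+00:00", "Z")
    return ET.tostring(rpc, encoding="unicode")


def coordinated_commit(n_servers: int,
                       lead: timedelta = timedelta(seconds=5),
                       now: Optional[datetime] = None) -> Tuple[datetime, List[str]]:
    """Build one commit RPC per server, all scheduled for Ts = now + lead."""
    if now is None:
        now = datetime.now(timezone.utc)
    ts = now + lead
    return ts, [build_scheduled_commit(str(i + 1), ts) for i in range(n_servers)]
```

Because every server receives the same Ts, the order and timing in which the client sends the n messages no longer determine when each server commits, as long as all messages arrive before Ts.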


>>The default value of sched-max-future is defined to be 15 seconds. This duration 
>>is long enough to allow the scheduled RPC to be sent by the client, potentially to 
>>multiple servers, and in some cases to send a cancellation message, as described 
>>in Section 3.2. On the other hand, the 15 second duration yields a very low probability 
>>of a reboot or a permission change.

>I'm not finding the explanation terribly persuasive, but it's at least 
>_some_ explanation - thanks for that.  I'll leave it to the ADs and 
>other reviewers in the field to see if it's sufficient for an 
>experimental protocol.

(*) Please see comment (**) below.

>>Note that we did not define a maximal value for sched-max-future, since one 
>>of the goals was to define a generic tool that can be used for various different 
>>environments. The draft clearly states the intention of using near-future-scheduling, 
>>but the requirements and constraints of different environments may require the 
>>sched-max-future to have a different value, potentially higher than 30 seconds. Hence, 
>>we prefer not to define a maximal value. Indeed, in the draft 06 there is a more detailed 
>>discussion about the issues we are trying to prevent by using near-future scheduling (Section 3.6).

>Without a maximal value, I think you need more of a discussion guiding 
>the choice of sched-max-future. Otherwise, you are just waiving your 
>hands at not addressing the problems with far-future scheduling, and 
>potentially well-meaning but uninformed people are going to go step in 
>them anyway. There was a point to choosing the near-future limit. 
>Enforce it or explain it with more vigor please.

(**) Your point is well taken. Regarding both this point and the previous point (*), we suggest adding text to Section 3.6 that explains the factors affecting the choice of sched-max-future.

Here is the new text we suggest. Please let us know if this addresses your comment:


The challenge in far-future scheduling is that during the long period between the time at which the RPC is sent and the time at which it is scheduled to be executed, the following erroneous events may occur:
- The server may restart.
- The client's authorization level may be changed.
- The client may restart and send a conflicting RPC.
- A different client may send a conflicting RPC.

In these cases, if the server performs the scheduled operation, it may perform an action that is inconsistent with the current network policy or with the currently active clients.

Near-future scheduling ensures that external events such as the examples above have a low probability of occurring during the sched-max-future period, and even when they do occur, the period of inconsistency is limited to sched-max-future, which is short.

Hence, sched-max-future should be configured to a value that is high enough to allow the client to:
1. Send the scheduled RPC, potentially to multiple servers.
2. Receive notifications or rpc-error messages from the server(s), or wait for a timeout and conclude that something is wrong if no response has arrived.
3. If necessary, send a cancellation message, potentially to multiple servers.

On the other hand, sched-max-future should be configured to a value that is low enough to keep the probability of the erroneous events above low, typically on the order of a few seconds. Note that even if sched-max-future is configured to a low value, it is still possible (with a low probability) that an erroneous event will occur. However, this short, potentially hazardous period is not significantly worse than in conventional (unscheduled) RPCs, as even a conventional RPC may in some cases be executed a few seconds after it was sent by the client.

The default value of sched-max-future is defined to be 15 seconds. This duration is long enough to allow the client to send the scheduled RPC, potentially to multiple servers, and in some cases to send a cancellation message, as described in Section 3.2. On the other hand, the 15-second duration yields a very low probability of a reboot or a permission change.
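To make the sizing guidance concrete, here is a minimal sketch (not part of the proposed text) that expresses the lower bound as the sum of the three client steps listed above, plus a server-side check that only accepts scheduled times within sched-max-future of the current time. The class and function names and the timing figures in the usage note are hypothetical; handling of past times and the exact error returned are left to the draft.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ClientTimingBudget:
    """Hypothetical per-deployment estimates of the three client steps."""
    send_all: timedelta       # step 1: send the scheduled RPC to all servers
    response_wait: timedelta  # step 2: wait for notifications/rpc-errors, or time out
    cancel_all: timedelta     # step 3: send cancellations to all servers

    def minimum_sched_max_future(self) -> timedelta:
        # sched-max-future must leave room for all three steps.
        return self.send_all + self.response_wait + self.cancel_all


def budget_fits(budget: ClientTimingBudget, sched_max_future: timedelta) -> bool:
    """True if the configured sched-max-future covers the client's budget."""
    return budget.minimum_sched_max_future() <= sched_max_future


def check_schedule(scheduled: datetime, now: datetime,
                   sched_max_future: timedelta) -> str:
    """Server-side sketch: accept only near-future scheduled times.

    Past times and the specific rpc-error to return are out of scope here.
    """
    if scheduled - now > sched_max_future:
        return "reject"
    return "accept"
```

For example, with illustrative estimates of 2 s to send, 5 s to wait for responses, and 2 s to cancel, the minimum is 9 s, which fits within the 15-second default. The upper side of the trade-off (keeping the hazardous window short) cannot be computed this way; it remains an operator judgment.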


>>This YANG module defines the <cancel-schedule> RPC. This RPC may 
>>be considered sensitive or vulnerable in some network environments. 
>>Since the value of the <schedule-id> is known to all the clients that are 
>>subscribed to notifications from the server, the <cancel-schedule> RPC 
>>may be used maliciously to attack servers by canceling their pending RPCs. 
>>This attack is addressed in two layers: (i) security at the transport layer, 
>>limiting the attack only to clients that have successfully initiated a secure 
>>session with the server, and (ii) the authorization level required to cancel 
>>an RPC should be the same as the level required to schedule it.

>To help me along, point me to the specifics of what you use to set and 
>verify such an authorization level?

Indeed, an authorization scheme that can set and verify the authorization level is needed.
NETCONF (RFC 6241) does not explicitly define an authorization scheme, and defining one is probably outside the scope of the current draft as well.
Quoting RFC 6241:

   This document does not specify an authorization scheme, as such a
   scheme will likely be tied to a meta-data model or a data model.
   Implementors SHOULD provide a comprehensive authorization scheme with
   NETCONF.
   ...
   Different environments may well allow different rights prior to and
   then after authentication.  Thus, an authorization model is not
   specified in this document.  When an operation is not properly
   authorized, a simple "access denied" is sufficient. 
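As an illustration of rule (ii) in the security text, here is a minimal sketch (not part of the draft) in which each pending RPC records the authorization level that was required to schedule it, and <cancel-schedule> is allowed only for a client holding at least that level; otherwise it returns "access-denied", in line with the RFC 6241 text quoted above. Integer levels are a hypothetical simplification; a real deployment would more likely map this onto an access control model such as NACM (RFC 6536). Returning "invalid-value" for an unknown schedule-id is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PendingRpc:
    operation: str
    required_level: int  # hypothetical: level that was needed to schedule this RPC


@dataclass
class Scheduler:
    pending: Dict[str, PendingRpc] = field(default_factory=dict)

    def schedule(self, schedule_id: str, operation: str,
                 client_level: int, required_level: int) -> str:
        """Accept a scheduled RPC if the client is authorized to issue it."""
        if client_level < required_level:
            return "access-denied"
        self.pending[schedule_id] = PendingRpc(operation, required_level)
        return "ok"

    def cancel_schedule(self, schedule_id: str, client_level: int) -> str:
        """Cancel only if the client holds the level required to schedule it."""
        rpc = self.pending.get(schedule_id)
        if rpc is None:
            return "invalid-value"  # assumption: unknown schedule-id
        if client_level < rpc.required_level:
            # Knowing the schedule-id (e.g. from notifications) is not enough.
            return "access-denied"
        del self.pending[schedule_id]
        return "ok"
```

The point of the design is that the schedule-id acts only as an identifier, not as a credential: a subscribed client that learns it from notifications still cannot cancel an RPC it would not have been authorized to schedule.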



Please let us know if you have further comments or concerns about any of the issues above.

Thanks,
Tal.
