[Netconf] Updated draft-mm-netconf-time-capability-04

Tal Mizrahi <talmi@marvell.com> Wed, 15 April 2015 17:55 UTC

From: Tal Mizrahi <talmi@marvell.com>
To: "netconf@ietf.org" <netconf@ietf.org>
Thread-Topic: Updated draft-mm-netconf-time-capability-04
Thread-Index: AdB3oueMF4dHIRkvQ2WManeoD+WZFg==
Date: Wed, 15 Apr 2015 17:55:13 +0000
Message-ID: <dff01581726e4f57b072a6957e81f34e@IL-EXCH01.marvell.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/netconf/098oSx0nLNn6jpx4iMjmbe5yzzw>
Cc: Tal Mizrahi <dew@campus.technion.ac.il>, Yoram Moses <moses@ee.technion.ac.il>
Subject: [Netconf] Updated draft-mm-netconf-time-capability-04
Precedence: list

Hi,

Feedback is welcome.
Even short feedback such as "I think this draft is useful" is welcome.

https://tools.ietf.org/html/draft-mm-netconf-time-capability-04

We want to thank everyone who reviewed previous versions of the draft and sent comments. We would appreciate it if you could go over the comment list below and make sure we have addressed your comments.


A short overview
==============
This draft defines a time capability in NETCONF; an RPC can include an element that defines its scheduled time of execution.
This allows a few interesting use cases:
- Network-wide commit (see https://www.ietf.org/proceedings/92/slides/slides-92-netconf-13.pdf).
- Time-based network update (see http://www.ietf.org/proceedings/87/slides/slides-87-netconf-1.pdf).
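
To make the idea concrete, here is a minimal Python sketch of what a scheduled RPC might look like on the wire. It is an illustration only and not text from the draft: the time-capability namespace below is a placeholder, and placing <scheduled-time> inside the <commit> element is an assumption. Only the <scheduled-time> element name and the NETCONF base namespace are taken as given.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
# Assumption: placeholder namespace for the time capability (not from the draft).
TIME_NS = "urn:example:netconf-time"


def build_scheduled_commit(message_id: str, scheduled_time: str) -> str:
    """Build a <commit> RPC that carries a scheduled time of execution."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    commit = ET.SubElement(rpc, f"{{{NETCONF_NS}}}commit")
    # Assumption: the scheduled time is a child of the operation element.
    sched = ET.SubElement(commit, f"{{{TIME_NS}}}scheduled-time")
    sched.text = scheduled_time
    return ET.tostring(rpc, encoding="unicode")


xml_text = build_scheduled_commit("101", "2015-04-15T18:00:00Z")
print(xml_text)
```

Sending the same <scheduled-time> value to every server is what would make a network-wide commit start at a coordinated time.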

As a side note, the ability to perform time-triggered network updates was recently added to version 1.5 of the OpenFlow protocol.
NETCONF is not OpenFlow. Yet, the ability to perform time-triggered operations seems to be a basic and important tool for a network configuration protocol.


Changes compared to draft 03
=======================
The most notable changes compared to version 03 are:
- The notification message now uses a unique <schedule-id>, rather than using the message-id of the corresponding RPC.
- The security considerations section was extended to include the content of http://www.ops.ietf.org/netconf/yang-security-considerations.txt.


===================================
Feedback from previous versions of the draft
===================================
We received a lot of feedback from the NETCONF WG about the previous drafts.
Below are some of the main questions and comments, and for each one a description of how we addressed it.
Please let us know if we missed something, or if you believe we did not address some of the issues below.


Comments from Andy:
==================
>This solution seems to send 1 <rpc-reply>
>when the operation is finally executed.  How does the client know the operation
>was scheduled successfully?  IMO, a better solution would be an immediate
><rpc-reply> (scheduled OK) and the execution results sent in a <notification>.

(1) Agreed.
In the current draft the server sends a notification once it receives a scheduled commit. This allows the client to know that the scheduled RPC was received. Then, when the RPC is completed, the server sends the RPC reply.

>IMO, YANG date-and-time is better time parameter than 'seconds since 1970'.
>It is already supported by NETCONF servers.

(2) Agreed.
The current draft uses date-and-time.

>How does a client cancel an operation?
>Can client A cancel operations for client B,
>assuming client A is allowed to invoke <kill-session>?

(3) Agreed.
The current draft includes a cancel-schedule RPC.

>I agree it is better to synch the start-times, not the finish-times.
>That is too difficult to predict.

(4) Agreed.
The current draft defines <scheduled-time> to be the start time of the operation.

>IMO, 1 micro-second resolution is enough, not nano-seconds.

(5) Understood. Draft 00 used the IEEE 1588 time format, which may have implied that nanosecond accuracy is expected.
The current draft uses the date-and-time format. Nanosecond accuracy is not required (neither explicitly nor implicitly).
Like the Network Time Protocol (NTP) and the Precision Time Protocol (PTP), the draft intentionally does not define an accuracy requirement. The current draft defines time as a tool. Accuracy will depend on the application, implementation, and network size/topology.
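
As an illustration of why the date-and-time format does not imply any particular accuracy, the Python sketch below renders the same instant with different fractional resolutions. The regular expression reflects the lexical pattern of YANG date-and-time as we understand it from RFC 6991, and the helper name to_date_and_time is hypothetical.

```python
import re
from datetime import datetime, timezone

# Lexical pattern of YANG date-and-time, as we understand it from RFC 6991.
DATE_AND_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})"
)


def to_date_and_time(moment: datetime, timespec: str = "milliseconds") -> str:
    """Render an aware datetime in UTC; timespec selects the fractional digits."""
    return moment.astimezone(timezone.utc).isoformat(timespec=timespec)


moment = datetime(2015, 4, 15, 17, 55, 13, 250000, tzinfo=timezone.utc)
coarse = to_date_and_time(moment, "seconds")      # no fractional part
fine = to_date_and_time(moment, "microseconds")   # six fractional digits
print(coarse)
print(fine)
```

Both strings are valid date-and-time values; how many fractional digits are meaningful is left to the application.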

>Can you explain why the server returns the execution time for operations
>since that doesn't seem to have anything to do with the problem statement
>of starting operations at a coordinated time?

(6) The server returns the actual execution time to the client. This gives the client feedback about when the operation was actually completed compared to its intended scheduled time. This is an important mechanism, as it tells the client how accurately the scheduled RPC was executed.
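
As a rough sketch of how a client might use this feedback, the Python snippet below computes the offset between the scheduled time and the execution time reported by the server. The function names are hypothetical and the timestamps are made-up examples, not values from the draft.

```python
from datetime import datetime


def parse_time(text: str) -> datetime:
    """Parse a date-and-time string; 'Z' is mapped to an explicit UTC offset."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def execution_offset_seconds(scheduled: str, executed: str) -> float:
    """Positive result: the operation ran later than it was scheduled."""
    return (parse_time(executed) - parse_time(scheduled)).total_seconds()


offset = execution_offset_seconds(
    "2015-04-15T18:00:00.000Z",  # scheduled time sent by the client
    "2015-04-15T18:00:00.120Z",  # execution time reported by the server
)
print(f"offset: {offset:.3f} s")
```

A client could collect these offsets across servers to judge whether the achieved accuracy is good enough for its application.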

>How do clients monitor what operations are pending?
>What if the NACM rules change while the operation is pending?
>What if a session is lost or closed before its scheduled operation is started?
>What if the server reboots while operations are pending?

(7) Essentially, all these questions boil down to the discussion in Section 3.6, Near Future Scheduling vs. Far Future Scheduling.
We focus on near future scheduling. Quoting from section 3.6:
"Near future scheduling guarantees that external events such as the examples above have a low probability of occurring during the sched-max-future period, and even when they do, the period of inconsistency is limited to sched-max-future, which is a short period of time."
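
To illustrate the near future restriction, here is a minimal Python sketch of the kind of check a server could apply to an incoming scheduled time. The 15-second value, the function name, and the result strings are assumptions made for this example; the draft defines the actual parameters and error handling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: an arbitrary illustrative value, not quoted from the draft.
SCHED_MAX_FUTURE = timedelta(seconds=15)


def check_scheduled_time(scheduled_time, now, sched_max_future=SCHED_MAX_FUTURE):
    """Classify a requested scheduled time relative to the server's clock."""
    if scheduled_time < now:
        return "in-the-past"
    if scheduled_time > now + sched_max_future:
        return "too-far-in-future"
    return "accepted"


now = datetime(2015, 4, 15, 17, 55, 13, tzinfo=timezone.utc)
print(check_scheduled_time(now + timedelta(seconds=5), now))
print(check_scheduled_time(now + timedelta(minutes=1), now))
```

Because accepted operations are pending only for a short window, events such as a reboot or a session loss have little time to interfere.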

>There are lots of operational problems with delayed operations.
>I like the solution in draft-kwatsen-conditional-enablement-00
>better than this one because it covers more use-cases and
>seems more tractable.

(8) Continuing the point made in (7), I believe the solution of draft-kwatsen-conditional-enablement-00 was optimized for far future scheduling, while our draft is optimized for near future scheduling. We believe that, viewed in the context of near future scheduling, the operational problems you mention are not an issue.


Comments from Joe:
================
>I think there should be a way to query the server's current time to make sure the client and server agree.
>In terms of this draft should there be some text that explain this as a potential concern with suggestions

(9) Agreed.
This is discussed in Section 3.3 of the current draft.

>In terms of things that need to be done to the draft, I think it needs a
>section on error-handling specifically around what happens if the clock
>on the device is insane or if a time is specified in the past.  I think
>there needs to be some direction about how far in the future one can set
>the timer or how many pending operations can be outstanding.

(10) Agreed.
This is currently discussed in Section 3.5.

Comments from Martin:
===================
>date-and-time (and RFC 3339) allow finer granularity than a 10th of a
>second.

(11) Agreed.
The current draft uses date-and-time.

Comments from Juergen:
====================
>If nobody seriously can implement a certain precision, implementations
>will simply cheat and the consequence of that in the long run is often
>worse than having a less fine grained precision that is implementable
>and thus meaningful.

(12) Understood. Please see (5).

Comments from Jonathan:
=====================
>wouldn't it be more appropriate to define the scheduled time
>as "the time at which the RPC should be executed

(13) Agreed. Please see (4).


>How would you see this working when supporting the configuration
>of a set of network elements in a robust and transaction-oriented way,
>where the operation should complete on all devices or be fully reversed?

(14) Agreed.
The current draft includes the cancel-schedule RPC, which allows a network-wide all-or-none operation.
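
One possible client-side pattern for such an all-or-none operation is sketched below in Python. FakeServer is a hypothetical in-memory stand-in for a NETCONF server, and its methods are not a real client API; the sketch only assumes that a scheduled RPC can be identified by a schedule-id and cancelled before its scheduled time.

```python
class FakeServer:
    """Hypothetical in-memory stand-in for a NETCONF server."""

    def __init__(self, name, accepts=True):
        self.name = name
        self.accepts = accepts
        self.pending = {}      # schedule-id -> (operation, scheduled time)
        self._next_id = 1

    def schedule(self, operation, scheduled_time):
        """Return a schedule-id, or None if the server refuses the request."""
        if not self.accepts:
            return None
        schedule_id = self._next_id
        self._next_id += 1
        self.pending[schedule_id] = (operation, scheduled_time)
        return schedule_id

    def cancel_schedule(self, schedule_id):
        self.pending.pop(schedule_id, None)


def schedule_all_or_none(servers, operation, scheduled_time):
    """Schedule on every server; if one refuses, cancel the ones already set."""
    scheduled = []
    for server in servers:
        schedule_id = server.schedule(operation, scheduled_time)
        if schedule_id is None:
            for done_server, done_id in scheduled:
                done_server.cancel_schedule(done_id)
            return False
        scheduled.append((server, schedule_id))
    return True


servers = [FakeServer("a"), FakeServer("b"), FakeServer("c", accepts=False)]
ok = schedule_all_or_none(servers, "commit", "2015-04-15T18:00:00Z")
print(ok, [len(s.pending) for s in servers])
```

The pattern only works if the cancellations reach the servers before the scheduled time, so the client needs to leave enough margin when choosing that time.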

Comments from Radek:
===================
>Even if you do <edit-config> on a single leaf, it can represent a
>complex operation with a hardly predictable duration.

(15) Agreed.
The current draft defines <scheduled-time> to be the start time of the operation.

Comments from Balazs:
===================

>Can you give us some examples of applications today that use coordinated
>configuration? What precision is used today?

(16) Some interesting use cases are given at the beginning of this email.

>IMHO anything better then millisec precision is unrealistic, unneeded or
>rather just wishfull thinking.

(17) Understood. Please see (5).

>The idea that scheduled time should be the required completion time for
>an operation is unfortunate. That assumes a device can estimate how long a
>complex configuration operation will take.

(18) Agreed. Please see (4).

>there must be a way to check pending scheduled operations.
>if a management user is removed/undefined will his pending operations also
>be removed, or will they stand there and fail at the scheduled execution time?
>What if my session is closed e.g. by a network timeout?

(19) Understood. We believe these comments boil down to the point made in (7), which is covered in Section 3.6 of the current draft.

>often a configuration session will consist of multiple operations:
>(lock, edit-config, unlock), (lock, discard-changes, edit-config, commit, unlock).
>If I can only schedule single operations does that mean I can not use lock? If I
>want to schedule commit, should I keep the running and the candidate
>configurations locked to ensure I commit, what I really want?

(20) Multiple operations can be sent as multiple scheduled RPCs with different execution times. The scheduled times of these RPCs determine the order of their execution. Similarly, the lock itself can be scheduled, so you do not need to lock the datastore in advance.
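
As a small illustration of ordering by scheduled time, the Python sketch below assigns staggered times to a sequence of operations. The 100 ms gap and the helper name stagger are arbitrary choices for this example and are not recommendations from the draft.

```python
from datetime import datetime, timedelta, timezone


def stagger(operations, start, gap):
    """Pair each operation with a scheduled time: start, start + gap, ..."""
    return [
        (op, (start + i * gap).isoformat(timespec="milliseconds"))
        for i, op in enumerate(operations)
    ]


start = datetime(2015, 4, 15, 18, 0, 0, tzinfo=timezone.utc)
plan = stagger(
    ["lock", "edit-config", "commit", "unlock"],
    start,
    timedelta(milliseconds=100),  # assumption: arbitrary gap between RPCs
)
for operation, scheduled_time in plan:
    print(scheduled_time, operation)
```

The gap has to be large enough for each operation to finish before the next one starts, which depends on the implementation.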


Regards,
Tal and Yoram.
