Re: [Gen-art] Gen-art LC review: draft-mm-netconf-time-capability-05

Tal Mizrahi <talmi@marvell.com> Tue, 04 August 2015 23:41 UTC

From: Tal Mizrahi <talmi@marvell.com>
To: Robert Sparks <rjsparks@nostrum.com>
Date: Tue, 04 Aug 2015 23:41:29 +0000
Message-ID: <a788b8d09b104d9a9f48a8486fbdb33c@IL-EXCH01.marvell.com>
References: <60322a704b1e4d1cbc85f6a3b6a33b8e@IL-EXCH01.marvell.com> <55BFEDC8.6040800@nostrum.com> <03c295837c984138bb30bd9aacf21999@IL-EXCH01.marvell.com> <55C0FDD7.1050203@nostrum.com>
In-Reply-To: <55C0FDD7.1050203@nostrum.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/gen-art/-pQGER-_2oF8nFZYNcZmq6ejct4>
Cc: General Area Review Team <gen-art@ietf.org>, "ietf@ietf.org" <ietf@ietf.org>, "draft-mm-netconf-time-capability.all@ietf.org" <draft-mm-netconf-time-capability.all@ietf.org>
Subject: Re: [Gen-art] Gen-art LC review: draft-mm-netconf-time-capability-05

Hi Robert,

Thanks again for the prompt responses.


>Well, those are just a subset of the things that could change in a command's
>context that would cause the command to be erroneous or even damaging if
>it were run, and you're not addressing the other security issues that come
>with very long scheduling (overflowing buffers, or having lots of time to
>schedule a massive number of commands to all try to happen at once). I
>suspect there are other things that pressured adding the "near future"
>restriction that haven't been captured well yet.

The thing is that 15 seconds (or 'a few seconds', for that matter) is long enough to send thousands of scheduled RPCs or more, so I am not sure that sched-max-future mitigates the buffer overflow threat. Generally speaking, Section 3.6 discusses erroneous scenarios, not security threats.

I suggest adding text to the Security Considerations section that discusses the overflow attack you mentioned here. Would this address your concern?
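
To make the distinction concrete, here is a minimal sketch of a server-side acceptance check. All names in it (Scheduler, accept, MAX_PENDING) are hypothetical and do not come from the draft; the cap of 100 pending RPCs is an arbitrary illustrative value. It only shows that the sched-max-future check limits how far ahead an RPC may be scheduled, while limiting how many RPCs are queued takes a separate check.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical names throughout; none of these come from the draft.
SCHED_MAX_FUTURE = timedelta(seconds=15)  # default value discussed in this thread
MAX_PENDING = 100                         # assumed per-server cap, illustrative only


@dataclass
class Scheduler:
    pending: list = field(default_factory=list)

    def accept(self, scheduled_time: datetime, now: datetime) -> str:
        # Near-future check: bounds how far ahead an RPC may be scheduled.
        if scheduled_time - now > SCHED_MAX_FUTURE:
            return "rejected: too far in the future"
        # Separate resource check: bounds how many RPCs may be queued.
        if len(self.pending) >= MAX_PENDING:
            return "rejected: too many pending RPCs"
        self.pending.append(scheduled_time)
        return "accepted"


now = datetime(2015, 8, 4, 23, 41, 29, tzinfo=timezone.utc)
server = Scheduler()
# A client floods the server with RPCs that all satisfy sched-max-future.
results = [server.accept(now + timedelta(seconds=5), now) for _ in range(1000)]
accepted = results.count("accepted")
```

Without the second check, all 1000 requests in this sketch would be queued even though each one is well inside the 15 second window.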


>I think you're saying that in production deployments today, the 
>authorization policy is "the peer was able to send me a packet". Is that 
>wrong?

I can't comment on what is deployed in production today, although I am sure there are operators out there who can. RFC 6536, which defines a NETCONF access control model, is cited by 6 other RFCs, so I do not think access control has been overlooked by the community. Nevertheless, I believe that, as in RFC 6241, the access control specifics are not within the scope of the current draft.


Thanks,
Tal.


>-----Original Message-----
>From: Robert Sparks [mailto:rjsparks@nostrum.com]
>Sent: Tuesday, August 04, 2015 9:01 PM
>To: Tal Mizrahi
>Cc: ietf@ietf.org; General Area Review Team;
>draft-mm-netconf-time-capability.all@ietf.org
>Subject: Re: Gen-art LC review: draft-mm-netconf-time-capability-05
>
>
>
>On 8/4/15 11:19 AM, Tal Mizrahi wrote:
>> Hi Robert,
>>
>> Thanks for the comments.
>>
>>
>>>> A typical example of using near-future scheduling is a coordinated
>>>> commit; a client needs to trigger a commit at n servers, so that the
>>>> n servers perform the commit as close as possible to simultaneously.
>>>> Without the time capability, the client sends a sequence of n commit
>>>> messages, and thus each server performs the commit at a different
>>>> time. By using the time capability, the client can send commit
>>>> messages that are scheduled to take place at time Ts, which is 5
>>>> seconds in the future, causing the servers to invoke the commit as close
>>>> as possible to time Ts.
>>> I'm interested in your response to Andy's point on this paragraph.
>> Okay, so here is Andy's point:
>>
>>>> You should pick a different example because the NETCONF
>>>> confirmed-commit procedure is designed to be loose-coupled.  The
>>>> default timeout is 10 minutes.
>>>> Since the client needs sessions open with all servers involved in
>>>> the network-wide commit, there is no advantage in staging the
>>>> <commit> operations 15 sec. in advance, to make sure the servers are
>>>> reachable.
>> And here is our response from 02-Aug-2015:
>>
>>> Right, confirmed-commit is loose-coupled. But the example quoted
>>> above (Example
>>> 1 in the draft) is not intended to replace the confirmed commit. The
>>> purpose in this example is different: the client wants the commit
>>> RPCs to be executed at the same time in all servers.
>>> The confirmed-commit serves a different purpose, which is to make
>>> sure that everyone either commits or rolls back. BTW, a confirmed
>>> commit can be sent with the scheduled-time element, giving the best
>>> of both worlds.
>>
>> Please let us know if you have further concerns about this point.
>>
>>
>>>> The default value of sched-max-future is defined to be 15 seconds.
>>>> This duration is long enough to allow the scheduled RPC to be sent
>>>> by the client, potentially to multiple servers, and in some cases to
>>>> send a cancellation message, as described in Section 3.2. On the
>>>> other hand, the 15 second duration yields a very low probability of a
>>>> reboot or a permission change.
>>> I'm not finding the explanation terribly persuasive, but it's at
>>> least _some_ explanation - thanks for that.  I'll leave it to the ADs
>>> and other reviewers in the field to see if it's sufficient for an
>>> experimental protocol.
>> (*) Please see comment (**) below.
>>
>>>> Note that we did not define a maximal value for sched-max-future,
>>>> since one of the goals was to define a generic tool that can be used
>>>> for various different environments. The draft clearly states the
>>>> intention of using near-future-scheduling, but the requirements and
>>>> constraints of different environments may require the
>>>> sched-max-future to have a different value, potentially higher than
>>>> 30 seconds. Hence, we prefer not to define a maximal value. Indeed, in
>>>> draft 06 there is a more detailed discussion about the issues we are
>>>> trying to prevent by using near-future scheduling (Section 3.6).
>>> Without a maximal value, I think you need more of a discussion
>>> guiding the choice of sched-max-future. Otherwise, you are just
>>> waving your hands at not addressing the problems with far-future
>>> scheduling, and potentially well-meaning but uninformed people are
>>> going to go step in them anyway. There was a point to choosing the
>>> near-future limit.
>>> Enforce it or explain it with more vigor please.
>> (**) Your point is well taken. What we suggest, regarding this point and the
>> previous point (*), is that we add more text explaining the factors that affect
>> sched-max-future to Section 3.6.
>>
>> Here is the new text we suggest. Please let us know if this addresses your
>> comment:
>>
>>
>> The challenge in far future scheduling is that during the long period between
>> the time at which the RPC is sent and the time at which it is scheduled to be
>> executed, the following erroneous events may occur:
>> - The server may restart.
>> - The client's authorization level may be changed.
>> - The client may restart and send a conflicting RPC.
>> - A different client may send a conflicting RPC.
>Well, those are just a subset of the things that could change in a command's
>context that would cause the command to be erroneous or even damaging if
>it were run, and you're not addressing the other security issues that come
>with very long scheduling (overflowing buffers, or having lots of time to
>schedule a massive number of commands to all try to happen at once). I
>suspect there are other things that pressured adding the "near future"
>restriction that haven't been captured well yet.
>>
>> In these cases, if the server performs the scheduled operation it may
>> perform an action that is inconsistent with the current network policy, or
>> inconsistent with the currently active clients.
>>
>> Near future scheduling guarantees that external events such as the
>> examples above have a low probability of occurring during the
>> sched-max-future period, and even when they do, the period of inconsistency
>> is limited to sched-max-future, which is a short period of time.
>>
>> Hence, sched-max-future should be configured to a value that is high
>> enough to allow the client to:
>> 1. Send the scheduled RPC, potentially to multiple servers.
>> 2. Receive notifications or rpc-error messages from the server(s), or wait for
>> a timeout and decide that if no response has arrived then something is wrong.
>> 3. If necessary, send a cancellation message, potentially to multiple servers.
>>
>> On the other hand, sched-max-future should be configured to a value that is
>> low enough to allow a low probability of the erroneous events above, typically
>> on the order of a few seconds. Note that even if sched-max-future is
>> configured to a low value, it is still possible (with a low probability) that an
>> erroneous event will occur. However, this short potentially hazardous period
>> is not significantly worse than in conventional (unscheduled) RPCs, as even a
>> conventional RPC may in some cases be executed a few seconds after it was
>> sent by the client.
>>
>> The default value of sched-max-future is defined to be 15 seconds. This
>> duration is long enough to allow the scheduled RPC to be sent by the client,
>> potentially to multiple servers, and in some cases to send a cancellation
>> message, as described in Section 3.2. On the other hand, the 15 second
>> duration yields a very low probability of a reboot or a permission change.
>I still think, especially while this is at experimental, you should scope this with
>an absolute max. But I'm just one reviewer. Work it out with your AD.
>
>>
>>
>>>> This YANG module defines the <cancel-schedule> RPC. This RPC may
>>>> be considered sensitive or vulnerable in some network environments.
>>>> Since the value of the <schedule-id> is known to all the clients that are
>>>> subscribed to notifications from the server, the <cancel-schedule> RPC
>>>> may be used maliciously to attack servers by canceling their pending
>>>> RPCs.
>>>> This attack is addressed in two layers: (i) security at the transport layer,
>>>> limiting the attack only to clients that have successfully initiated a secure
>>>> session with the server, and (ii) the authorization level required to cancel
>>>> an RPC should be the same as the level required to schedule it.
>>> To help me along, point me to the specifics of what you use to set and
>>> verify such an authorization level?
>> Indeed, there is a need for an authorization scheme, which is able to set and
>> verify the authorization level.
>> NETCONF (RFC 6241) does not explicitly define an authorization scheme, and
>> it is probably not within the scope of the current draft to define such a
>> scheme either.
>> Quoting RFC 6241:
>>
>>     This document does not specify an authorization scheme, as such a
>>     scheme will likely be tied to a meta-data model or a data model.
>>     Implementors SHOULD provide a comprehensive authorization scheme with
>>     NETCONF.
>>     ...
>>     Different environments may well allow different rights prior to and
>>     then after authentication.  Thus, an authorization model is not
>>     specified in this document.  When an operation is not properly
>>     authorized, a simple "access denied" is sufficient.
>I think you're saying that in production deployments today, the
>authorization policy is "the peer was able to send me a packet". Is that
>wrong?
>>
>>
>>
>> Please let us know if you have further comments or concerns about any of
>> the issues above.
>>
>> Thanks,
>> Tal.