Re: [Netconf] WG LC for zerotouch

Martin Bjorklund <mbj@tail-f.com> Wed, 04 October 2017 16:58 UTC

Return-Path: <mbj@tail-f.com>
X-Original-To: netconf@ietfa.amsl.com
Delivered-To: netconf@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 652AD1342FC for <netconf@ietfa.amsl.com>; Wed, 4 Oct 2017 09:58:46 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level:
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 0W3QUwrmrWWY for <netconf@ietfa.amsl.com>; Wed, 4 Oct 2017 09:58:43 -0700 (PDT)
Received: from mail.tail-f.com (mail.tail-f.com [46.21.102.45]) by ietfa.amsl.com (Postfix) with ESMTP id 790FC13430D for <netconf@ietf.org>; Wed, 4 Oct 2017 09:58:43 -0700 (PDT)
Received: from localhost (h-40-225.A165.priv.bahnhof.se [94.254.40.225]) by mail.tail-f.com (Postfix) with ESMTPSA id 20DFC1AE02A7; Wed, 4 Oct 2017 18:58:12 +0200 (CEST)
Date: Wed, 04 Oct 2017 18:58:11 +0200
Message-Id: <20171004.185811.1601400798069063177.mbj@tail-f.com>
To: kwatsen@juniper.net
Cc: netconf@ietf.org
From: Martin Bjorklund <mbj@tail-f.com>
In-Reply-To: <36B6D286-F8B6-4423-8DD2-4DF74943F2A1@juniper.net>
References: <B08CEFF5-8E7E-4D67-A7E4-9C8992114623@juniper.net> <20170928.094514.526215195914925651.mbj@tail-f.com> <36B6D286-F8B6-4423-8DD2-4DF74943F2A1@juniper.net>
X-Mailer: Mew version 6.7 on Emacs 24.5 / Mule 6.0 (HANACHIRUSATO)
Mime-Version: 1.0
Content-Type: Text/Plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/netconf/MgKSr_tisbeF8_8rR2-oeOOuoEo>
Subject: Re: [Netconf] WG LC for zerotouch
X-BeenThere: netconf@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Network Configuration WG mailing list <netconf.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/netconf>, <mailto:netconf-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/netconf/>
List-Post: <mailto:netconf@ietf.org>
List-Help: <mailto:netconf-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/netconf>, <mailto:netconf-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Oct 2017 16:58:46 -0000

Hi,

Kent Watsen <kwatsen@juniper.net> wrote:
> Hi Martin,
> 
> I trimmed the discussion down to just the open threads.
> 
> 
> >> > o  Section 7.2
> >> >
> >> >  Adding custom query parameters like this breaks the normal YANG
> >> >  contract.  If I understand this correctly, the instance data tree
> >> >  presented to the client is supposed to change based on these query
> >> >  parameters.
> >> 
> >> Correct.  The idea is that the response could vary based on the input
> >> query params.  Of course, in case there was any doubt, the response 
> >> would always conform to same YANG schema.
> >
> >See below.
> >
> >> >  If you really need to have that functionality, it would better to
> >> >  model the device-specific data as an action; input would be the
> >> >  os-name, os-version etc, and output would be the
> >> >  zerotouch-information and voucher etc.
> >> 
> >> I thought of this as well, but didn't do anything about it as my 
> >> last message to the list said that we'd add "query parameters"
> >> and I wanted to follow through on that statement.
> >> 
> >> But what is the issue?  Is it primarily that using query params 
> >> on GET should be limited to returning different representations
> >> of a resource, and this use seems to be returning different 
> >> resources altogether?
> >
> > Yes, this is my concern.  A normal RESTCONF / YANG server has *one*
> > single instance of the operational state, and it just provides an API
> > to this data.  It can be filtered and pruned based on access rights,
> > but not modified from the outside based on query parameters.
> 
> I agree, it seems more proper.  Okay, let's make this change.  No
> objections from the WG, right?  As a heads-up for those following
> along, this change doesn't affect the essence of the solution, but
> the diff will appear somewhat big...
> 
> 
> >> FWIW, if we were to do this, we might also consider moving the 
> >> 'serial-number' field from the URL to an input field
> >
> > Hmm, did you mean 'unique-id'?
> 
> Yes. But what about the idea, moving 'unique-id' from the URL path
> to an input param?  Or, even more dramatically, we could remove 
> 'unique-id' altogether, relying instead on the RESTCONF username
> extracted from the DevID certificate?

See below.

> If relying on the extracted username, note that none of the 
> 'cert-to-name' identities defined in RFC 7407, which is also
> used in draft-ietf-netconf-restconf-client-server, would work.
> The 'common-name' identity is closest, but what is really needed
> is a combination of a 'serialNumber' attribute and an optional
> 'hardwareModuleName' attribute (not all DevID certs define it).

Ok.  I've tried to find a spec for the DevID, but couldn't find a
public version.

> I suppose that this is an issue for the afore-mentioned restconf
> client/server draft more so than this one though.  Still, since
> we're talking about it, assuming we did this, would that draft
> add said identities, or would a 7407bis be better?

I think that such an identitiy could be defined in this draft.  Isn't
such an identity quite specific to this use case?

> >> , and make
> >> a similar change to the 'update-progress' action.  But that then
> >> would mess up your NACM rule below.  Thoughts?
> >
> >We can have:
> >
> >  list device {
> >    key unique-id;
> >
> >    leaf unique-id { ... }
> >
> >    action get-bootstrapping-data {
> >      input {
> >        leaf os-name { ... }
> >        leaf os-version { ... }
> >        ...
> >      }
> >      output {
> >        leaf zerotouch-information { ... }
> >        leaf ownership-voucher { ... }
> >        leaf owner-certificate { ... }
> >      }
> >    }
> >
> >    action report-progress {  // I like this name better :)
> >      ...
> >    }
> >  }
> 
> Yes, this is roughly what switching to an action statement would look
> like though, pending on the above discussion, we may use top-level RPCs
> instead of actions, if relying on the RESTCONF username.
> 
> I agree, 'report-progress' does seem better.  It's a minor thing, but
> happy to oblige.  (no objections from the WG, right?)
> 
> 
> >> > o  Section 4.4
> >> >
> >> >     When a device is
> >> >     not able to trust a bootstrap server, it MUST NOT send its IDevID
> >> >     certificate in the form of a TLS client certificate
> >> >
> >> >  How will the server authenticate the client in this case?
> >> 
> >> The idea is that the server doesn't auth the client in this case.
> >> The goal behind not sending the IDevID certificate was so a rouge
> >> bootstrap server wouldn't be able to discover the device's serial
> >> number, which is why Appendix B suggests also masking the serial
> >> number the device sends in the URL.  That said, we might need to
> >> rethink this, especially in conjunction with your next comment.  
> >> (continued below)  
> >> 
> >> 
> >> >  I see that in section 7.3 you have:
> >> >
> >> >   Note that the bootstrap server MUST NOT process a progress update
> >> >   from a device without first authenticating the device.  This is in
> >> >   contrast to when a device is fetching data from the server, a read-
> >> >   only operation, in which case device authentication is not strictly
> >> >   required (e.g., when sending signed information).
> >> >
> >> >  But the server is a normal RESTCONF server, so it will follow
> >> >  section 2.5 of RFC 8040; it will require clients to be
> >> >  authenticated.
> >> 
> >> I was hoping that the draft could suppress this auth requirement
> >> for this specific "untrusted" interaction.  In essence, reducing
> >> the bootstrap server to a simple file server providing anonymous
> >> read-access.
> >> 
> >> But in the security analysis of things, it comes down to what's 
> >> more important to prevent:
> >>  a) a rogue server discovering a valid device's serial number,
> >>     while not being able to provide the device a response the
> >>     device can trust, or
> >>  b) a valid server giving signed redirect information to a
> >>     rogue device, where the information given doesn't provide
> >>     the device any more information than the device must have
> >>     already had to make the request in the first place, other
> >>     than a confirmation that indeed that serial number is one
> >>     that the server is expecting.  
> >> 
> >> Looking at it this way, (b) is more important to prevent and, 
> >> besides:
> >> 1) there is already a Security Consideration related to the use
> >>    of serial numbers in the solution, so (a) seems to be just
> >>    more of the same, 
> >> 2) rogue connections could play havoc with the server's logs, and
> >> 3) (to your point) it would be tricky business to define an
> >>    exception for the standard RESTCONF server to follow.
> >> 
> >> All this is to say that I think we should go back to the device
> >> always sending its IDevID cert and, of course, the server always
> >> authenticating the device.  But I also think that a device should
> >> continue NOT sending any other information (i.e. query params,
> >> progress updates, etc.), beyond its serial number & IDevID cert,
> >> until it can fully authenticate the server.
> >
> > Ok.
> 
> Okay, it seems that we agree.  Any objections from the WG?  In case it's
> unclear, this change is fairly significant from a solution-perspective.
> It will also have a non-trivial impact from a draft-diff perspective.
> Please chime in now with any opinions now.
> 
>  
> >> > o  Section 5.6
> >> >
> >> >     Upon rebooting, the device MUST still be
> >> >     in its initial state, causing the bootstrapping process to run again,
> >> >     which will eventually come to this very point,
> >> >
> >> >  Why this MUST?  What if the new boot image contains other factory
> >> >  defaults that does not enable ZTP, would that be illegal?
> >> >
> >> >  I would think that implementations are free to do whatever they want
> >> >  in case of errors.
> >> 
> >> Maybe not *illegal*, but it would certainly brick the device at that
> >> point.  Would just reducing the "MUST" to a "must" suffice, or should
> >> we add a qualifier along the lines of "in order for the solution to
> >> work..." ?
> >
> > I would just say something like 
> >
> >     Upon rebooting, if the new image also enables zerotouch, the
> >     device will still be
> >     in its initial state, causing the bootstrapping process to run again,
> >     which will eventually come to this very point,
> 
> Okay.
> 
> 
> >> >    In the case of errors, the device MUST reset
> >> >    itself in such a way that forces a reinstallation of the boot image,
> >> >    thereby wiping out any bad state the script may have left behind.
> >> >
> >> >  Is this also required?  Why can't stop and wait for manual
> >> >  intervention be allowed?
> >> 
> >> It could, but again, it would brick the device.  Same question as above,
> >> would reducing the "MUST" to a "must" suffice, or should we add a 
> >> qualifier along the lines of "in order for the solution to work..." ?
> >
> > Well, this depends if the error is transient or not.  If it is not
> > transient, it will just happen again and again.
> 
> Maybe, maybe not.  The device could potentially land onto another/different
> source of bootstrapping data that works the next go around.  The hope is
> that the device keeps trying (and posting progress-updates regarding its
> errors) ad infinitum until things eventually clear up.  For instance, an
> admin eventually takes notice and administers corrective action.

Ok.

> > I don't have a strong opinion, but I would probably be silent about
> > what to do in case of these kinds of errors.  Leave that to the
> > implementation to figure out.
> 
> I feel that it's critical that manual intervention is never required.
> Yes, a manual step could be employed, but it should be a rare step.  I
> think it's best to say something along these lines.  I'll try to state
> this more clearly.

Ok.

> >> >  I suggest you shorten the lines of the examples even more, so that
> >> >  the examples are properly indented in the draft (they are currently
> >> >  outdented)
> >> 
> >> It's hard to make the lines much shorter with the URN being so long.
> >> Options are 1) don't indent just the xmlns line or 2) move from two-
> >> space indent to single-space indent.  Do you have a preference?
> >
> > In the second example you use (1).  You can also use "\", which you
> > are also already using.
> 
> So I do, albeit only in the HTTP-header parts, not the HTTP-body parts.
> 
> FWIW, my build-script inserts source files in situ as directed by
> processing instructions. I can modify the build-script to do the 
> auto-'\' insertion logic on some column-boundary, but it may need
> a complimentary update on the extraction tools.  Do we need some 
> sort of standard around this?
> 
> I'm aware that here you're only talking about examples, which 
> currently are not extracted via tools, but the same auto-'\' 
> insertion logic could also help with the YANG modules, as they
> could become unwieldy too...though I suppose liberal use of 
> groupings could always be employed to keep line-lengths in
> check.  Okay, but still, a YANG Doctors discussion at IETF 99
> suggested introducing the ability to auto-extract and validate
> examples in drafts too, to automate as much of the YD review
> process as possible, so still I think there may be a need for
> a '\' standard of some sort here.  Thoguhts?
> 
> 
> 
> > I just found another outdented figure in section 5.3, which probably
> > is easier to fix.
> 
> Huh?  I don't see it.  The figure in 5.3 looks okay to me...

Yes, my bad.  I think I meant 5.2.


> >> > o  Section 9
> >> >
> >> >  Should you also mention that /device list should be protected by
> >> >  nacm rules?
> >> 
> >> Yes, indeed, I left out the standard security template for the bootstrap
> >> server YANG module.  Will add.
> >> 
> >> 
> >> >  If the RESTCONF user name is the device's unique-id, then a single
> >> >  NACM rule can be used:
> >> >
> >> >       <rule>
> >> >         <name>allow-device</name>
> >> >         <path xmlns:ztbs="urn:ietf:params:xml:ns:yang:\ietf-zerotouch-bootstrap-server">
> >> >           /ztbs:device[ztbs:unique-id=$USER]
> >> >         </path>
> >> >         <access-operations>read</access-operations>
> >> >         <action>permit</action>
> >> >       </rule>
> >> 
> >> Yes, but would you expect this to appear in the draft?  Typically, the specific
> >> NACM rule syntax is left as an exercise to the reader...
> >
> > In order to do this, the "username" must be the same as the
> > "unique-id".  This requires the client cert to be constructed
> > properly.
> 
> Agreed.  In this case, the client cert is a DevID cert, and  802.1AR 
> governs its construction.  Not too many options there.  Of course, 
> the mapping of said cert to username is being discussed above wrt the
> cert-to-name mapping logic... 
> 
> 
> >> Since the NACM rule is not obvious, and due to the cert implications,
> >> I think it would be very helpful with an appendix that describes how
> >> this can be set up.
> 
> I think that this will depend on if we go with an action or a top-level 
> RPC. If an action, then NACM may have a place and an example may help.
> But, if using a top-level RPC, then it seems that the server has no 
> option but to auth the DevID cert and provide a RESTCONF username-specific
> response - NACM can't constrain it any further, right?

Correct.

> (note: this seems
> more secure to me - I like it)

Hmm, I think it is less clear.  With an explicit list and actions, you
can use NACM to do the access control.  With top-level RPCs, the
implementation of the RPC has to do the access control.

Also, a list of devices is more transparent - I mean, this list must
somehow exist anyway, but with top-level RPCs it will be hidden.
Maybe implementations want to add additional objects to this list, for
monitoring or something.

> >> > o  Other comment
> >> >
> >> >  Is it assumed that there will be a vendor-specific YANG module to
> >> >  enable the zero-touch process?  Did you consider an
> >> >  ietf-zerotouch-device module for this purpose?  Such a configuration
> >> >  must be carefully designed to allow for a "merge" configuration file
> >> >  to actually disable the zero touch process.
> >> 
> >> It is assumed to be vendor-specific currently.  I have not thought to
> >> define a ietf-zerotouch-device module as of yet that, presumably, would
> >> contain a boolean or an enum (but not a p-container) for enabling the
> >> service.  Do you think we should add such a module to the draft, or just
> >> put a note somewhere about it?
> >
> > I don't have a strong opinion, but if there's not a standard module
> > for this, the text should mention that it is assumed that vendors
> > define this themselves.   Hmm, I think I would prefer a standard
> > module; it would make the solution more complete.
> 
> We can add a module, but maybe it should be called something like
> ietf-zerotouch-bootstrap-client, to mirror the ietf-zerotouch-
> bootstrap-server module? - though this may be a mismatch as the
> former regards a southbound protocol and the latter regards a
> configuration model...
> 
> Maybe we can discuss what would be in this model.  I'm not sure if
> I'm oversimplifying it, but I think the module might be just a single
> config true leaf called something like "enabled".   There might also
> be some config false leafs for the stuff mentioned in section 5.1, 
> though it doing so is not so important since the bootstrapping only
> occurs once (not ongoing management).  So a module would primarily
> be for the "enabled" leaf.  Anything else?  Still think it's worth it?

I'm not sure.  It does seem odd to write that we expect vendors to
provide this single leaf in a proprietary module, but maybe it is ok.



/martin