Re: Genart early review of draft-ietf-opsawg-mud-08

Robert Sparks <rjsparks@nostrum.com> Thu, 07 September 2017 15:51 UTC

Subject: Re: Genart early review of draft-ietf-opsawg-mud-08
To: Eliot Lear <lear@cisco.com>, gen-art@ietf.org
Cc: draft-ietf-opsawg-mud.all@ietf.org, opsawg@ietf.org, ietf@ietf.org
References: <150411366399.21627.17047458871931107094@ietfa.amsl.com> <0a8c04d6-eb0f-09d0-eeed-da2dacf8260c@cisco.com>
From: Robert Sparks <rjsparks@nostrum.com>
Message-ID: <3570a07d-6786-3077-f1a7-4ec61a4bf9d0@nostrum.com>
Date: Thu, 07 Sep 2017 10:51:20 -0500
In-Reply-To: <0a8c04d6-eb0f-09d0-eeed-da2dacf8260c@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/aaAyK78x8q5BEg-Z-JsE0Plyano>

Apologies for being a little laggy.

Some comments inline.


On 8/31/17 2:07 PM, Eliot Lear wrote:
>
> Robert,
>
> As I wrote earlier, this was a great review.  Thanks for that. Please 
> see below.
>
>
> On 8/30/17 7:21 PM, Robert Sparks wrote:
>> Reviewer: Robert Sparks
>> Review result: Almost Ready
>>
>> This is an exciting concept, and the draft overall is approachable. I
>> have identified a few areas I think need more detail, and have a
>> longish list of nits (please don't take that to be negative).
>>
>> ==Issues==
>>
>> I find the structure of the introduction unclear. Please consider
>> reworking it.  I would suggest even more succinctly listing goals and
>> constraints, and then intended applicability (these things are in the
>> current text, but I think you can render them much more efficiently). In
>> particular, the argument that implementers of things are incented only to
>> provide the minimal amount of behavior to get their thingyness could be
>> more strongly highlighted.
>
> I've received conflicting reviews here.  Most reviewers like what's there,
> and while I'm open to specific textual changes, a full reorganization
> would likely be destabilizing.
I disagree. I know it's work, but if you read this as an implementer, I 
think it would pay off.
This is a stylistic suggestion though, so take it only as my opinion.
>
>> The document proposes "reputation services". It needs more words about
>> whether those exist, and what scopes the architecture imagines (an
>> enterprise might have a different idea of a reputation service than a
>> residence). There is a notion of "decent web reputations" in the security
>> considerations section. Who determines that? The security considerations
>> section should talk about attacks against the reputation services.
>
> This is discussed in security considerations:
>> It may also be useful
>> to limit retrieval of MUD URLs to only those sites that are known to
>> have decent web reputations.
> What I am specifically talking about are web or domain reputation 
> services.  These are pretty commonplace today.  Your browser uses one, 
> and numerous companies offer them, including our own.  But to be 
> clearer, I propose to use the term "web or domain reputation 
> services", so that people know what I'm talking about.
I'll look at the diff.
>
>> In the first paragraph of Section 2, it's not clear if you are trying
>> to restrict the models to only those in the two documents in the list
>> following the paragraph.
>
> Right.  This was caught earlier, and I'll correct.
>
>> I am not a YANG doctor, so this may be in the weeds, but it feels like
>> there's a discrepancy between the diagram at the end of section 2 and the
>> element definitions in section 3. In particular 3.7 doesn't seem to align
>> with what the diagram or the example in Appendix B uses. Should you be
>> defining "from-device-policy" and "to-device-policy" instead of
>> "packet-direction"? (I'm wondering if 3.7 reflects an older design?)
>
> Yupper.  Fixed (I think).
>> At section 3.13, the description of my-controller is not quite right.
>> This bit signals to the mud controller to use a mapping that it knows
>> about or creates. Something else established that class (and maybe gave
>> it a name). I talked about this with Eliot and he has a better
>> description to use.
>
> Proposed text:
>
>> This null-valued node signals the MUD controller to use whatever
>> mapping it has for this URL to a controller class.  This may require
>> prompting the administrator for class members.  Future work should
>> seek to automate membership management.
ack
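A minimal sketch of the lookup this node triggers, assuming the MUD controller keeps a local table from MUD URL to the members of the controller class. The table, the function name, and the example URL and addresses are hypothetical, not taken from the draft:

```python
from typing import Dict, List, Optional

# Hypothetical controller state: MUD URL -> members (addresses) of the
# controller class the administrator has associated with that URL.
ControllerMap = Dict[str, List[str]]


def resolve_my_controller(mud_url: str, mapping: ControllerMap) -> Optional[List[str]]:
    """Expand a my-controller node for the device that presented mud_url.

    Returns the class members to permit, or None when no mapping exists yet
    and the administrator has to be prompted for them.
    """
    members = mapping.get(mud_url)
    if not members:
        return None  # no mapping yet: prompt the administrator
    return list(members)  # copy so callers cannot mutate controller state
```

For example, with `{"https://mud.example.com/bulb.json": ["192.0.2.10"]}` as the table, a device presenting that URL expands to `["192.0.2.10"]`, while any other URL yields `None` and an administrator prompt.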
>
>> It's not clear to me that this is a good use of .well-known. I suggest
>> getting an expert review on the proposed usage. (I had a quick
>> conversation with Mark Nottingham and got some initial feedback that
>> I'm passing along here. I'm sure there's more that an in-depth review
>> would identify.) Why wouldn't a URI template (RFC6570) do the job?
>> Rather than use RFC3986's query, consider pointing to HTML5 (which
>> would bring the more familiar key=value format).
>
> The key issue is that we want to externalize versioning AND hardcode 
> it in the URL so that it's independent of transport. Remember, there 
> is very little information exchange between the Thing and the network, 
> and I will claim that's a good thing.
Sure, and URI templates would _do_ that, no? But please take that 
argument up with folks like Mark.
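To make the URI-template suggestion concrete, here is a sketch of RFC 6570 Level 1 (simple string) expansion, in which the version travels as a template variable rather than as a fixed .well-known path segment. The template and variable names are hypothetical, and operator expressions such as {+var} or {?query} are deliberately not handled:

```python
import re
from urllib.parse import quote

# Matches a Level 1 expression such as {version}.
_VAR = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand_level1(template: str, variables: dict) -> str:
    """Minimal RFC 6570 Level 1 expansion.

    Each {var} is replaced by its value with every character outside the
    unreserved set percent-encoded; undefined variables expand to "".
    """
    def substitute(match: re.Match) -> str:
        value = variables.get(match.group(1))
        return "" if value is None else quote(str(value), safe="")

    return _VAR.sub(substitute, template)
```

With the hypothetical template `https://mud.example.com/{model}/{version}.json`, the values `model="bulb 2"` and `version="1.0"` expand to `https://mud.example.com/bulb%202/1.0.json`. A production implementation would use a complete RFC 6570 library rather than this fragment.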
>
>> The document needs to say more about how HTTP is used. I assume you only
>> intend to use GET, and that you expect redirects to be followed, and that
>> nothing special needs to be considered with caching? The document needs
>> to be explicit about it. Take a look at
>> <https://mnot.github.io/I-D/bcp56bis/>. (There's been some conversation
>> about it on the art list, so Eliot, at least, is already aware of it -
>> see <https://mailarchive.ietf.org/arch/search/?q=bcp56bis>)
>
> What we say today is the following:
>> Processing of this
>> URL occurs as specified in {{RFC2818}} and {{RFC3986}}.
> There is one aspect of caching semantics we should probably capture, 
> which is that the cache-validity period should exceed the HTTP cache 
> or expiry period as specified by max-age or Expires. Does that sound 
> about right to you?
Goes in the right direction. Do you expect POST to work with this?
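One reading of the caching proposal, sketched under stated assumptions: the controller only ever issues GET, cache-validity is expressed in hours, and the controller waits for whichever is longer, the MUD cache-validity or the HTTP max-age. The function names are hypothetical, and Expires handling is omitted:

```python
import re
from typing import Optional

# Finds a max-age directive inside a Cache-Control header value.
_MAX_AGE = re.compile(r'(?:^|,)\s*max-age\s*=\s*"?(\d+)"?\s*(?:,|$)', re.IGNORECASE)


def http_max_age(cache_control: Optional[str]) -> Optional[int]:
    """Return max-age in seconds from a Cache-Control value, if present."""
    if not cache_control:
        return None
    m = _MAX_AGE.search(cache_control)
    return int(m.group(1)) if m else None


def refresh_interval(cache_validity_hours: int, cache_control: Optional[str]) -> int:
    """Seconds to wait before re-fetching (with GET) the MUD file.

    Assumption: cache-validity is in hours and should not undercut the
    HTTP freshness lifetime, so the larger of the two wins.
    """
    validity = cache_validity_hours * 3600
    max_age = http_max_age(cache_control)
    if max_age is None:
        return validity
    return max(validity, max_age)
```

With a cache-validity of 48 and `Cache-Control: public, max-age=3600`, the controller would wait 172800 seconds; with a cache-validity of 1 and `max-age=86400`, it would wait the full day.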
>
>> I think there needs to be more discussion of the PKI used for signing MUD
>> files.
>
> We do have some discussion in Section 12.2.  I'm happy to add an
> additional sentence or two, but would seek guidance on what you think
> is missing.
So, are you expecting to reuse the web PKI here? Will the MUD files be 
signed with the same credentials used by the HTTP server? I'm thinking 
you aren't, and are waving your hands at where trust lies with the 
recommendation that signers be validated directly etc. Either way, I 
think you need to be more explicit, and that spelling out how you expect 
trust to be established is going to take more than a couple of sentences.
>
>> Consider discussing whether the stacks used by typical things will let
>> them add DHCP options (or include bits in the other protocols being
>> enabled). If it's well known (I can't say) that these stacks typically
>> _won't_ provide that functionality, then you should punch up the
>> discussion of the controllers mapping other identifiers to MUD URLs on
>> behalf of the thing.
>
> I agree.  We allude to this in the draft.  We say, for instance:
>> It is possible that there may be other means for a MUD URL to be
>> learned by a network.  For instance, if a device has a serial number,
>> it may be possible for the MUD controller to perform a lookup of the
>> device, if it has some knowledge as to who the device manufacturer
>> is, and what its MUD file server is.  Such mechanisms are not
>> described in this memo, but are possible.
>
> The case we have in mind is LoRaWAN.  Should we go further?
I think explicitly acknowledging that some things' stacks limit their 
behavior will pay off. It would be unfortunate if someone starting a 
MUD controller implementation assumed that the majority of things will 
hand them the DHCP option (etc.) and then had to bolt the complexity of 
the lookup above onto their initial design.
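A sketch of the resolution order being suggested, so that the fallback path is part of the initial design rather than bolted on later. The lookup callable stands in for whatever manufacturer or serial-number mapping a controller might have; it, the function name, and the identifiers below are hypothetical:

```python
from typing import Callable, Optional


def resolve_mud_url(emitted_url: Optional[str],
                    serial_number: Optional[str],
                    lookup_by_serial: Callable[[str], Optional[str]]) -> Optional[str]:
    """Pick a MUD URL for a newly seen device.

    1. Use the URL the device emitted itself (e.g. via the DHCP option).
    2. Otherwise, try a lookup on the device's behalf, keyed here on a
       serial number.
    3. Otherwise, there is no MUD URL for this device.
    """
    if emitted_url:
        return emitted_url
    if serial_number:
        return lookup_by_serial(serial_number)
    return None
```

A plain dictionary's `get` method can serve as the lookup in a test, e.g. `resolve_mud_url(None, "SN-0001", {"SN-0001": "https://mud.example.com/sensor.json"}.get)`.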
>
>
>> You suggest the DHCP Client (which is a thing) SHOULD log or report
>> improper acknowledgments from servers. That's asking a bit much from
>> a thing. I suspect the requirement is unrealistic and should be removed
>> or rewritten to acknowledge that things typically won't do that.
> I think there's a philosophical question hiding here, though: what
> expectations should we have of devices?  With a SHOULD, we're saying that
> if you have good cause not to, that's OK.  But otherwise, for the sake of
> the sanity of the customer, please log.
Why not acknowledge in the document that the expectation is that most 
won't be able to?
Painting as explicit and accurate a picture as possible can only do good, no?
(Again, I'm not trying to _assert_ that the majority of things can't, 
but that's my suspicion. People who work with these things on a daily 
basis should weigh in.)
>
>> The security and deployment considerations sections talk about the
>> need for coordination if control over the domain name used in the URL
>> changes. It should talk more about what happens if the new administration
>> of the domain is not interested in facilitating a transition (consider
>> the case of a young company with a few thousand start-up-ish things out
>> there that loses a suit over its name). Please discuss whether or not
>> suddenly losing the MUD assisted network configuration is expected to
>> leave the devices effectively cut-off.
>
> It should not, and here's why:
>
>   * Assuming the device has already been used, there is no reason to
>     simply delete the MUD file from one's cache.  The cache-validity
>     value is meant as a timer to keep implementations from harassing
>     the MUD file server, but the information is still useful,
>     even if it may not have been refreshed.
>
Hrmm - there should be some description about using the information even 
if the cache has expired. That might have security ramifications (it at 
least enables an attacker to cause a set of devices to use old 
information by attacking the access to what might be newer information).
>
>   * In the case where the MUD file service is unavailable when the
>     device is first turned up, it's as if it had not included a
>     MUD URL in the first place.  While this may be a downgrade attack,
>     there is, as I understand it, really no way to get around it,
>     other than for the MUD controller to log a problem.
>
Worth a short discussion in the text.
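A sketch of both cases, under the assumption that a stale cached MUD file remains usable and that a fetch failure on first contact is treated as though no MUD URL was presented. The fetch callable, the names, and the logging choices are hypothetical; the warning on the stale path reflects the attack noted above, where blocking the refresh pins devices to old policy:

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

log = logging.getLogger("mud-controller")


@dataclass
class CachedMud:
    body: str          # MUD file contents as retrieved
    fetched_at: float  # time of last successful retrieval (epoch seconds)
    validity_s: float  # cache-validity converted to seconds

    def is_fresh(self, now: float) -> bool:
        return now - self.fetched_at < self.validity_s


def mud_for_device(url: str,
                   cache: Dict[str, CachedMud],
                   fetch: Callable[[str], Optional[str]],
                   validity_s: float,
                   now: Optional[float] = None) -> Optional[str]:
    """Return the MUD file to enforce for url, or None if there is none."""
    now = time.time() if now is None else now
    entry = cache.get(url)
    if entry is not None and entry.is_fresh(now):
        return entry.body  # within cache-validity: no network traffic
    body = fetch(url)      # hypothetical fetcher; returns None on failure
    if body is not None:
        cache[url] = CachedMud(body, now, validity_s)
        return body
    if entry is not None:
        # Expired but still useful; log because a blocked refresh can keep
        # devices on older policy.
        log.warning("refresh of %s failed; using stale MUD file", url)
        return entry.body
    # First contact and the server is unreachable: behave as if the device
    # presented no MUD URL, and log so the administrator can investigate.
    log.warning("MUD file %s unavailable on first contact", url)
    return None
```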
>
>> Right now, you leave the DHCP server (when it's used) responsible for
>> clearing state in the MUD controller. Please discuss what happens when
>> those are distinct elements (as you have at the end of section 9.2) and
>> the DHCP server reboots. Perhaps it would make sense for the DHCP server
>> to hand the length of the lease it has granted to the MUD controller and
>> let the MUD controller clean up on its own?
>
> See other response.
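A sketch of the alternative suggested above, in which the DHCP server passes the lease end time to the MUD controller and the controller expires bindings on its own, so a DHCP server reboot cannot strand state. The class and method names are hypothetical; lease renewals are handled by ignoring superseded heap entries:

```python
import heapq
from typing import Dict, List, Tuple


class MudBindingTable:
    """Hypothetical MUD-controller state keyed by device address.

    Each binding carries the lease end time reported by the DHCP server,
    so policy can be removed even if no release ever arrives.
    """

    def __init__(self) -> None:
        self._bindings: Dict[str, Tuple[str, float]] = {}  # addr -> (url, lease_end)
        self._expiries: List[Tuple[float, str]] = []       # min-heap of (lease_end, addr)

    def bind(self, address: str, mud_url: str, lease_end: float) -> None:
        """Record a new lease or a renewal for address."""
        self._bindings[address] = (mud_url, lease_end)
        heapq.heappush(self._expiries, (lease_end, address))

    def expire(self, now: float) -> List[str]:
        """Drop bindings whose lease has ended; return their addresses."""
        removed = []
        while self._expiries and self._expiries[0][0] <= now:
            lease_end, address = heapq.heappop(self._expiries)
            current = self._bindings.get(address)
            # Skip heap entries superseded by a later renewal.
            if current is not None and current[1] == lease_end:
                del self._bindings[address]
                removed.append(address)
        return removed

    def __contains__(self, address: str) -> bool:
        return address in self._bindings
```

A periodic call to `expire(time.time())` would then be the controller's only cleanup path, independent of whether the DHCP server is still up.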
>
>> The document currently suggests that a piece of software inspect the
>> WHOIS database to see if registration ownership of a domain has changed.
>> Do you really mean software, or should this be advice to the
>> administrator of the controller instead?
>
> The controller.  The idea is to catch bad behavior and anomalies. And 
> the bigger idea is to reduce the number of decisions that the 
> administrator must make, while providing relevant information with 
> which to make the decisions.
I don't think this is a reasonable thing to do. It has many of the 
same properties we complain about when someone suggests that code inspect 
an IANA registry.
>
>> ==Nits==
>>
>> I recommend an editorial pass focusing on simplifying sentences. Look
>> particularly where the word "therefore" is used and consider
>> restructuring the surrounding text. (It is used as a non sequitur in a
>> couple of places.) Be careful to call out actors explicitly (I note the
>> places that particularly caught my eye below).
>
> ok.
>> Some specific nits:
>>
>> The abstract speaks only about properties of MUD but does not describe
>> what MUD _is_, or is good for. A few more words here would help.
> Right.  See response to Henk.
>> Next to last paragraph of section 1 (before 1.1): A means for _who_ to
>> retrieve the description? (Consider rendering the three list elements on
>> their own lines.)
> Right.  Fixed.
>> The last sentence of section 1 treats "enterprise networks" more
>> specially than it intends, I think. Why couldn't _any_ network do this?
>> Could the sentence be reworded to make it clear that enterprise networks
>> are an example?
>
> s/enterprise networks/local deployments/
>
> ?
Sure
>
>> First sentence of 1.1: Perhaps you mean "general purpose computing
>> devices" instead of "general computing"? "their" has an unclear
>> antecedent.
>
> Indeed.  fixed.
>
>> Last paragraph of 1.3: It's unclear what "such an approach" is intended
>> to point to. Would "a general solution that required capabilities their
>> particular device would not use" make more sense?
>
> Reworded.
>
>> First paragraph of 1.5: "might to allow" is probably meant to be "might
>> be to allow". What does it mean for a controller to "need to speak COAP".
>> Do you mean "controllers capable of speaking COAP"?
> Fixed.
>> Fourth paragraph of 1.5 at the discussion of time and effort: Consider
>> rephrasing this to focus on the result of the time and effort (high
>> quality) rather than the time and effort itself.
>
> Existence is good enough in this case ;-)
>
>> In the list of abstractions at the end of 1.5, you have three things you
>> describe as devices and one thing you describe as a class. You later talk
>> about the abstractions you've described as devices as classes. At this
>> point in the document what you mean by "class" has not been made as
>> explicit as it could be.
> I've reviewed all instances of "class" to make sure the term is
> used consistently.  This is somewhat difficult given natural language,
> but I hope I've gotten it right.
>
>> Section 1.8, item 3: the MUD file doesn't have hosts in it (it has
>> identifiers of some kind). Consider being more explicit about what
>> you mean by testing that against a reputation service.
>
> Actually, it can have hosts in it, but see above.
>
>> Section 3.1: You say "Which turn was taken". I think you meant
>> "Which, in turn, was taken". Consider deleting "for those keeping score".
>
> Awwww.  Just a bit of humor? ;-)
>
>> Section 3.3 is missing a word at "the location any MASA service"?
>
> Ok it's cleaned up.
>
>> I found the prose in the descriptions of the "manufacturer" and
>> "same-manufacturer" elements (3.8 and 3.9) very confusing. I think
>> additional prose introducing the concepts and maybe some examples would
>> be very useful.
>
> Added examples per your suggestion.
>
>> What do you mean by "matches" in 3.10? Do you mean "is"?
>
> All of this is applicable in the context of the matches statement in 
> the ACL model.  I've added some explanatory text at the beginning of 
> the chapeau.
>> The caution in the 2nd paragraph of 3.12 is not clear.
>
> Ok, I've cleaned that up and added an example.
>
>> At section 4, consider pointing out that you are not allowing
>> DHCP by default, and that devices that are expected to use DHCP
>> need to have an explicit allow in their MUD file.
>
> Hmm.  The issue here is that DHCP operates on the local link and isn't
> forwarded.  Do you think it needs to be listed anyway?
Hmm indeed. You're right. That said, the thing that triggered the thought
was the ability of a MUD file to say whether or not something can talk to
other things in the local network. Maybe some reinforcement in the
discussion about what that rule would expand to would prevent someone from
walling off the device more than you intended (it would be a creative
mistake to do so, I agree).
>
>> The description of the manufacturer leaf in the MUD YANG model
>> could be made more useful.
>
> ALL the descriptions have been improved.
>> Provide a reference for "giaddr" when you use it in section 9.2.
>
> Cleaned up (that's defined in RFC 2131, already normatively 
> referenced, but I expanded).
>> Section 14, 2nd paragraph: additional segmentation of what?
>
> Make that "network segmentation".
>> Second paragraph of Section 15 - it would help to be more precise
>> with agency. _Who_ should review the class?
>
> Fixed.
>
>> In the security considerations section, when you get to the "if for some
>> reason it is not possible to determine whether ownership has changed",
>> _who_ are you suggesting conduct further review?
>
> It's always the network administrator.
>> ==Micro-nits==
>>
>> 1,$s/enorcement/enforcement/g
>
> Doh!
>
>> s/autjors/authors/
>>
>
> Fixed.
>
> Thanks again,
>
> Eliot
>
