Re: [nmrg] [Anima] review of draft-irtf-nmrg-autonomic-network-definitions-03

"Michael Behringer (mbehring)" <mbehring@cisco.com> Fri, 03 October 2014 13:48 UTC

From: "Michael Behringer (mbehring)" <mbehring@cisco.com>
To: Rene Struik <rstruik.ext@gmail.com>, "anima@ietf.org" <anima@ietf.org>
Thread-Topic: [Anima] review of draft-irtf-nmrg-autonomic-network-definitions-03
Thread-Index: AQHP0e9qhBeXD8S03Uq0EGQztbFMk5weKbzQ
Date: Fri, 03 Oct 2014 13:48:46 +0000
Message-ID: <3AA7118E69D7CD4BA3ECD5716BAF28DF21C5AD3E@xmb-rcd-x14.cisco.com>
References: <5418A1BC.3070803@gmail.com>
In-Reply-To: <5418A1BC.3070803@gmail.com>
Accept-Language: en-GB, en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Archived-At: http://mailarchive.ietf.org/arch/msg/nmrg/MMDCmyJCxfaUZq6oL1CVPggtEc0
Cc: "nmrg@irtf.org" <nmrg@irtf.org>
Subject: Re: [nmrg] [Anima] review of draft-irtf-nmrg-autonomic-network-definitions-03
Precedence: list

I am finally working on the next version of this document. Thanks for your detailed review, Rene; more inline...

> -----Original Message-----
> From: Anima [mailto:anima-bounces@ietf.org] On Behalf Of Rene Struik
> Sent: 16 September 2014 22:47
> To: anima@ietf.org
> Subject: [Anima] review of draft-irtf-nmrg-autonomic-network-definitions-
> 03
> 
> Dear colleagues:
> 
> Please find below my initial comments on draft-irtf-nmrg-autonomic-
> network-definitions-03.
> 
> 1. Summary:
> 
> The draft provides a loose introduction of some autonomic networking
> notions and then lists some design goals and non-goals.
> 
> 2. General comments:
> 
> a) The title of the document is somewhat misleading, since the document is
> mostly about goals and non-goals and less about definitions.

I don't have a better title for now, and still think that the title should mention both parts.

> b) The goal of autonomic networking is described {Section 2} in terms of

Section 2? That's just the definitions. "goals" points to section 3, but actually, I think you're really referring to the intro?  

> reversing the swing of a pendulum that previously swung into the direction
> of replacing localized configuration and network management decision
> criteria by centralized ones (by putting "intelligence" into network
> management tools rather than devices themselves) into the counter-swing
> of putting more intelligence in devices themselves. What is missing,
> though, is a somewhat more balanced treatment of the topic:
> after all, network design exercises have to deal with many trade-offs, with
> optimal choices not a priori cast in stone. As case in point, consider sensor
> networks: here, energy savings and spotty connectivity favor localized
> decisions, whereas configuration may benefit from a more holistic view of
> the network (e.g., to manage time latency and energy cost, as with, e.g.,
> 6TiSCH, or to reduce the sheer amount of traffic to communicate micro-
> updates on state that may be flushed from a more centralized repository in
> one protocol pass). I would strongly suggest giving this entire section more
> substance by bringing in various elements of network design considerations
> besides configuration (e.g., audit/log functionality) and paying tribute to
> trade-offs in device capabilities, energy usage, time latencies, flexibility,
> security, ease of use, and the-like. Now, it reads like a "cutting the non-
> localized cord" ode no matter what. This should also include some
> discussion on whether localized decisions result in anything close to a
> "global optimum", what the effect is if some local decisions are disturbed
> (whether due to malicious act or otherwise), and whether convergence
> actually can take place. {NOTE: I do realize that this is an internet draft and
> not an academic paper, but why not at least mention some of this??}

Independently of where this actually belongs, let me make a clarifying statement here: We are trying NOT to judge what is better. This is invariably going to get into a religious war, which I really don't want to have, because it doesn't help.

But I see the point you're making - we shouldn't make it sound as if complete decentralisation is the real goal, and I agree with that (and never meant to imply it). Let me work in some verbiage to that effect. Bottom line of the argument is, as I see it:
- "autonomic" is for most networks NOT black and white
- some functions are better centralised, some decentralised
- the discussion on which function should follow which paradigm is out of scope for this doc. 
- this doc describes the decentralised functions. 

Does that cut it? 

And, looking at where this discussion fits best, I think it should be part of the "introduction" section.  I added the following (plus some small edits along the way): 

      <t>Some network deployments benefit from a fully autonomic approach, for 
      example networks with a large number of relatively simple devices. 
      Most currently deployed networks, however, will require a mixed approach,
      where some functions are autonomic and others are centrally managed.
      Central management of networking functions clearly has advantages and
      will be chosen for many networking functions. This document does not 
      discuss which functions should be centralised or follow an autonomic 
      approach. Instead, it should help decide which approach is best
      for a given situation.</t>

> c) I would also describe some other factors that may have contributed
> {Section 2} to complexity and non-flexibility, where lack of "separation of
> concern" (e.g., encoding addresses in a topology-aware fashion -- nice for
> semi-static networks, but a major headache for mobile networks, where
> flexibility comes at a premium) may unfortunately have driven part of the
> design activities in the past for whatever reason and where vested
> economical interests may have favored centralized control points
> (think: service providers, with centralized user authentication functionality;
> and with an interest in raising the entry bar for new network entrants, via
> these same control points).

Hmmm... These are all good points, but I think these are mainly historical considerations, and covering them may lead to more and more text being added. I would really like to keep this document as short as possible.
Can we leave out WHY certain decisions were made in the past, or do you really think this is a major part of the document? 

> d) It should be made more clear that autonomic networking has limits, since
> it cannot really deal well with device-specific functions, since "intent"
> treats all devices within a network generically and similarly (e.g., in their
> role as routers or as "members of a domain"). This may have repercussions
> when applying these notions to security architectural design that includes
> applications, since physical devices are often not generic, but display
> tailored functions, without too much redundancy and replication.

I think the new section above covers that, by not even trying to make autonomics the "right" architecture in the future, but a part of a bigger scheme. Again, yes, we could enter into more detail here, but I think the above paragraph might be clear enough? I think your main point is "AN isn't for everything" - and I think we make that point now.

> e) Some of the non-goals seem to be actually goals that one, for some
> reason (politics, unions?) does not want to label as such. As an example, the
> objective clearly is to significantly reduce human involvement in network
> management, for all but specialized functions 

I can see how you can read it this way, and I am thinking about how to fix that perception. But we state clearly in the "goals" that "It is the design goal to make functions on network nodes self-managing, in other words, minimally dependent on management systems or controllers, as well as human operators." (3.1). But it is NOT a design goal to ELIMINATE human operators.

I'll try to fix that perception, because it's clearly misleading (and thanks for pointing this out!). I now start that section with "Section 3.1 states that "It is the design goal to [...] minimally dependent on [...] human operators". It is however not a design goal to completely eliminate them." 

> (conflicting with Section 4.1,
> but implied by the last sentence hereof
> -- "more like doctors than hospital orderlies"), to reduce control by
> management rather than "network management controls" (which are
> completely different things altogether!) (Section 4.3), 

We mean to say that even if a function is autonomic, you can still control it (albeit by different means). 
I have renamed the phrase in question to "Eliminate central control" - is that clearer?

But again, I *do* think it is a non-goal to ELIMINATE central control. Do we agree? 

I now start this section with "While it is a goal to simplify northbound interfaces (<xref target="simple-north"/>), it is not a goal to eliminate central control, but to allow it on a higher abstraction level."

> to sanitize the
> current complexities and thereby strive for elimination of configuration
> tools (Section 4.4) and existing network management systems (Section 4.5).

For sections 4.4 and 4.5 I actually agree with you! This is incorrect as listed here, and thanks for spotting this. For a given autonomic function (!) the goal is indeed to eliminate complex config tools and NMS systems. 
These two points really belong in the co-existence section: since in reality only some functions will be autonomic, traditional tools will still be needed for the non-autonomic functions.
And I just re-read the co-existence section, and think it is sufficiently clear. I'll therefore just remove those two paragraphs. 

Well spotted, thanks! :-) 

> If the goal of the autonomic networking exercise is *not* to clean up the
> current complexities, do things better, with ultimate goal of brushing some
> of the current procedures/dead wood/ossified methods aside, why even
> start this effort at all? Stating goals as non-goals is not credible. More to the
> point would be that one takes into account "managed change", co-existence
> of new systems with pre-existing ones, etc., but that is something that
> applies to any development. A draft with goals should make a convincing
> case, including convincing no loss of manageability and anomaly control,
> and not pay lip service to all those who might be impacted by this
> (moreover, this is a technical document, isn't it?).

Let me know whether it's better now, and if you spot anything else that's incoherent or misleading. 

> 3. Specific comments
> 
> Section 1:
> 1) First sentence (p. 2): "control loops" should read "control loop"

Actually, the way I see it there isn't necessarily a single control loop. I guess it depends on how you define "control loop".

> 2) First sentence (pp. 2-3): remove the word autonomic (twice), which is
> really overused here, without definition yet. 

done. 

> Moreover, the definition in
> Section 2 is really strict, so most networks may not be labelled this way.

Yes, this is the result of some previous discussion, where some folks felt strongly that an autonomic node can only be called that if it is fully self-managed, with all self-CHOP features. The new version emphasises much more the autonomic function. To me, that is way more relevant to the work at hand. 

Frankly we can re-open that debate, but I think the current set of definitions is coherent and logical. You could come up with other coherent sets of definitions. We just need to settle on one. 

> 3) Forelast para (p. 3): define "as a well as northbound" (or, better, remove
> this topology-related notions and replace by "and with external processes"
> or the-like)

Fixed. 

> Section 2:
> 4) The word "autonomic" is defined in terms of self*, which is not that
> descriptive. Couldn't one describe this in terms of devices reaching
> localized decisions, via observation (akin to the nervous system analogue)?

The self-* paradigm has been widely used in research and other autonomic architectures, and I suggest we use the same scheme here as elsewhere. Actually, it is quite hard to define "autonomic" in a couple of sentences... Let's use existing definitions here. 

> 5) The description of "intent" seems to preclude codifying device-specific
> functionality in, e.g., a device certificate or attribute certificate. (As
> described in the draft, it seems to codify only "domain" policy attributes.)

This is intentional. Well, the intent is network wide. Certificates can have attributes. The two don't contradict each other. 

> As an aside, codifying domains seems to require careful network planning
> and coordination and knowing in advance which domain a node should
> belong to. I wonder to what degree this limits flexibility of deployment and
> ease of use.

This is a good question. Indeed, we need to keep our eyes open to this question in the future. 
 
> Section 3:
> 6) First bullet (p. 6): With self-configuration, how does one distinguish two
> devices A and B or elevate one to be in control of the other? 

You use a tie-breaker algorithm to decide, as in routing protocols. Worst case: "lowest IP wins". There are many ways to do that. Should this be elaborated here? Is that not too detailed?
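
To make the "lowest IP wins" idea concrete, here is a minimal sketch in Python. It is purely illustrative and not something the draft specifies: the function name elect_controller is made up, and I assume every candidate advertises exactly one address of the same family (all IPv4 or all IPv6). The addresses are compared numerically, not as strings, since a string comparison would rank "10.0.0.10" below "10.0.0.9".

```python
import ipaddress


def elect_controller(candidates):
    """Pick one node out of several by a "lowest IP wins" tie-breaker.

    `candidates` is an iterable of address strings, all of one address
    family (an assumption of this sketch; mixing IPv4 and IPv6 raises
    TypeError because the two kinds of address are not comparable).
    """
    candidates = list(candidates)
    if not candidates:
        raise ValueError("need at least one candidate address")
    # Parse each string so the comparison is numeric, not lexicographic.
    return min(candidates, key=ipaddress.ip_address)


if __name__ == "__main__":
    print(elect_controller(["10.0.0.10", "10.0.0.9"]))
```

Every node that sees the same candidate set reaches the same result without any further message exchange, which is the property a tie-breaker needs. Real protocols usually compare a configured priority first and fall back to the address only on a tie.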

> Shouldn't one
> mention somewhere privacy aspects (what if a retail store "discovers" my
> cell phone and now conveniently tracks this, since the cell phone has to
> authenticate itself first?)

I prefer to keep that one out of a "definitions and goals" draft. 

> 7) Second bullet (p. 6): With self-healing, how does one recover from
> anomalies that cannot easily be decided by majority rules (such as with
> security)?

We seem to have different page numbering... To me, page 6 doesn't have a second bullet... Are you referring to page 4 (definition of self-healing)? And I don't understand that question... Can you expand, please?

> 8) Third bullet (p. 7): With self-optimizing, shouldn't one add some verbiage
> as to local vs. global optimization and the potential impact of a local
> disturbance of the system on state? (As to the latter, I can imagine there to
> be a few algorithms that lend themselves to easy convergence and failure
> recovery, whereas with others, localized failure recovery is almost doomed
> to fail.)

[yes, now I have it: your page scheme = mine+2 pages]
So, yes, as with some of the previous points, we could go into more details. I think we don't want to roll up all nuances of the approach in a definitions draft? Please let me know if you disagree... 

> 9) First para Section 3.2: "forseeable" should read "foreseeable"

Fixed, thanks! 

> 10) First para Section 3.2: "fully autonomic nodes and network(s)"
> (i.e., make the last word plural)

Fixed.

> 11) In Section 3.2, the priority order (where human decisions rule) should
> not necessarily be a "design principle"; rather, it should be a potential
> instantiation of a policy-setting. After all, management functionality should
> consider a mix of local self-management functions, plus oversight (thereby,
> keeping operational invariants), where different applications may call for
> different decision criteria prioritization and choices. (In fact, this is alluded
> to in the last para of Section 3.2, with the auto-pilot and nuclear plant
> scenarios.) One may want to add some verbiage here as well to graceful
> degradation and failure recovery.

OK, you're right that strictly speaking the design goal is "coexistence", and questions on the order of priorities are not really part of the goal. But this question has been asked and should be addressed somewhere, and I don't see a better place right now...

"more verbiage" - same comment as before :-)   Yes, we could, but we were trying to keep this document short and limited to definitions and goals. I can be convinced I think, but I'm not convinced yet :-)  

> 12) First para (p. 6): replace "trustworhty" by "reliable"

ok. done.

> 13) In Section 3.3, first para, one should omit the subsentence "using a
> domain identity, for example a certificate issued by a domain certificate
> authority". After all, there are multiple ways for a device to evidence group
> membership, besides issuing domain specific certs. As case in point, with
> sensor networks such as w/HART, ZigBee, ISA SP100, network membership
> is evidenced by the use of a group key and network access is arbitraged by a
> network manager (who may coincide or coordinate with the originator of
> the group key).

This is why we use the term "for example". You're mentioning another example. Both are ok. :-) 

> 14) In Section 3.3, first para, replace "common trust anchor" by "mutually
> respected trust anchor" (after all, it may be that two devices A and B have
> certs issued by different CAs, but nevertheless A respect B's certificate
> authority's CA and vice-versa).

good point. Fixed! (you don't miss anything, do you?! :-) 

> 15) In Section 3.3, 2nd para, while it is possible to have domain entities, this
> is not a priori necessary for autonomic networking: as already mentioned
> above, if network access involves a network manager, it could simply
> perform membership tests via a white list of devices (SubjectPublicKeyInfo,
> so to speak). Should one use domain identities, one should clearly
> articulate how these should be issued, what preplanning and coordination
> this requires and what the impact is on flexibility of deployment. This
> should be compared with alternatives, such as the use of group keys in the
> sensor network example above. As a final note, if one were to use domain
> identities, these could also be realized via attribute certs, rather than
> device certs.

So. Again, we can go into 100 times more detail here, and all the options you list are correct, and will probably be used somewhere. I think the purpose of this document is not to be complete, but to point out the fundamental concepts really. 

What we're trying to say is "a node must be able to authenticate another node". Now you're right, "domain identity" doesn't capture all of your examples, and you may also not require always a "strong, cryptographically verifiable" one. Can we still try to put this concept into a short paragraph? 

I think this should be explained in a different doc. - Goes beyond the scope here. 

> 16) In Section 3.4, I believe it has merit to replace the somewhat
> positional/almost ideological language ("decentralization above all") by the
> more neutral statement that one aims to "improve the ability to cope with
> changes". (This is also the language in the ACM 2006 survey paper by Dobson
> et al).

Ideologies are bad. Except if you create them yourself! :-) 

I see your point. And it's politically a bit risky, I see that. At the same time I think the phrase "de-centralisation and distribution are fundamental to the concept." is actually a lot clearer to the reader than "improving the ability to cope with changes". 

In other words, here I think it's worth walking a fine line on political acceptability, to the benefit of clarity. Thoughts? 

> 17) In Section 3.4, 2nd para, one should add some verbiage to the effect that
> a central repository of information has also been in the interest of vested
> interests, since this has been about control and lock-in of users/customers
> as well. It is not clear to me why an autonomic network
> *must* be able to use a centralized system in order to be deployable.
> Isn't this just lip service to the vested interests? {I can very well imagine,
> e.g., completely distributed wifi access, with local micropayment to pay for
> ongoing access, without need for any centralized system, except perhaps
> for settling those micro-payments. This seems to indicate that the
> autonomic approach could live in parallel of whatever central
> repositories/control points might be operational right now.}

I think we're trying to say "if operationally relevant data is kept centrally, an AN system must be able to use it". We're also saying that "it is possible to distribute such databases and that should be considered". Actually we're trying NOT to be dogmatic here: in practice a lot of information is centralised, and we had better work with it. Actually I use this to justify the above sentence. :-)

There is one change I can make that will soften the argument: I'll change the "must" to "should". But re-reading this section, I think it expresses exactly what we want to say?!

And frankly, I don't want to go into the reason WHY some data might be centralised, and what the commercial consequences are. OK? 

> 18) In Section 3.5, I would explicitly mention audit/log/monitoring and
> failure recovery mechanisms, as well as, perhaps, aggregated functionality
> that is relevant when one zooms out (e.g., lease more capacity on backbone
> if local network generates lots of traffic).

Added.

> 19) Section 3.6, last sentence of first para: I would argue that some of these
> details do matter, e.g., auto-address configuration, naming of entities, etc.

Can you expand? I'm not with you...

> 20) Section 3.7, first para: I wonder how "autonomic network reporting"
> provides functional trouble shooting? (Faulty components, anomaly
> detection, dead link, etc.)

Example: instead of getting 50 syslogs that all have a common root cause, aggregated reporting could provide a single message back to the operator "link between x and y down." Keep in mind, this is in parallel with traditional methods such as syslog for the time being. 
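
A rough sketch of what such aggregation could look like, in Python. It assumes that each raw event has already been tagged with its root cause; how a node works out that root cause is the hard part and is not shown. The function name aggregate_reports and the "root_cause" field are made up for illustration only.

```python
from collections import Counter


def aggregate_reports(events):
    """Collapse events that share a root cause into one message each.

    `events` is an iterable of dicts carrying a hypothetical "root_cause"
    key. Returns one summary string per distinct root cause, in the order
    in which each root cause was first seen.
    """
    counts = Counter(event["root_cause"] for event in events)
    return [
        f"{cause} ({count} underlying messages)"
        for cause, count in counts.items()
    ]


if __name__ == "__main__":
    # 50 per-node events that all trace back to the same failed link.
    raw = [
        {"node": f"node-{i}", "root_cause": "link between x and y down"}
        for i in range(50)
    ]
    for line in aggregate_reports(raw):
        print(line)
```

The operator gets one line per root cause, while the individual syslog messages can still be delivered in parallel by the traditional path.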

> 21) Section 3.7, last para: replace "special algorithms" by "specific
> algorithms"

done.

> 22) Section 3.7, last para: while I agree one should be cognizant to not
> create unnecessary messaging, I again have trouble to see how one could
> build a reliable system if one purportedly does not care about detail. With
> some systems, this may violate per-device requirements (e.g., specific
> safety valve info does not come through).

Good discussion, but not for a definitions doc, IMO. 

> 23) Section 3.8, first para: replace "however, they" by "however, these"

Done.

> 24)  Section 4, first para: replace "items which" by "items that" (also on
> numerous other locations in the draft)

Done here. I would prefer a native English speaker to fix the others, as I am not sure about them...

> 25) Section 4.2, p. 9: replace "fault" by "error"

Actually, in the hardware context I think "fault" is more appropriate than "error". Am I looking at the wrong spot?

THANKS for this very detailed review, with tons of good comments! I expect to issue a new version later today.
Michael
