Re: [Anima] Rtgdir telechat review of draft-ietf-anima-reference-model-07

Brian E Carpenter <brian.e.carpenter@gmail.com> Sun, 26 August 2018 20:58 UTC

To: Christian Hopps <chopps@chopps.org>, rtg-dir@ietf.org
Cc: anima@ietf.org, draft-ietf-anima-reference-model.all@ietf.org
References: <153529941582.11902.1347468414499836311@ietfa.amsl.com>
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
Message-ID: <6288ec99-fbf6-e2e0-32c3-e402c19fdecd@gmail.com>
Date: Mon, 27 Aug 2018 08:57:51 +1200
Archived-At: <https://mailarchive.ietf.org/arch/msg/anima/cEFH6q7bVqTjJxw6KA0VEw2Vslc>

(Ccs trimmed)

Christian,

Thanks for this careful review. I'll comment here on the larger issues:

On 2018-08-27 04:03, Christian Hopps wrote:
...
> Minor Major Issues:
> 
> - Virtualization is mentioned once in "4.2 addressing" section. To quote:
> 
>   TEXT: "Support for virtualization: Autonomic Nodes may support Autonomic
>   Service Agents in different virtual machines or containers. The addressing
>   scheme should support this architecture."
> 
>   The special casing of VM/containers here seems to indicate that virtual
>   devices are not "1st class citizens" in an autonomic network. In particular I
>   could easily imagine virtual machines being full blown autonomic nodes
>   themselves. Assuming the intent is not to restrict virtual devices in this
>   manner, something needs to be said (somewhere) to make that clear.

I don't think that was the intention. We haven't really explored this in detail,
but I can certainly imagine a deployment (for example) where each tenant in
a data centre has its own virtual autonomic network, and the underlying physical
network is also autonomic. Since the ACP is expected to be implemented as
a VRF, you could even argue that every autonomic network is virtual.

So, yes, we can reword this.

> 
> - Robust programming techniques. I think the intention here is to say that the
>   design of ASAs must have robustness as a top design principle. I think in
>   doing that it should talk about what being robust means; however, it should
>   not be talking about how to accomplish that as there are multiple ways to
>   achieve this goal.
> 
>   In particular I feel saying that restarting is the *last* thing an ASA should
>   do is way overreaching into engineering the solution rather than specifying
>   the requirement. Indeed plenty of people think that overly complex recovery
>   mechanisms that try everything under the sun to *not* restart often have more
>   bugs and are less robust than KISS solutions that "fail" simply but recover
>   quickly with minimal or no disruption.
> 
>   I feel this section reads a bit more like someone's idea of how to design a
>   robust system instead of talking about what robust means which is the intent I
>   believe.
> 
>   Perhaps better is just to focus on robust design ideas (some are already
>   stated in the text):
> 
>   - must deal with discovery and negotiation failure as routine.
>   - recovering from failures should be minimally disruptive.
>   - must not leak resources.
>   - must monitor for and deal with hung code.
>   - must include security analysis

OK. Since I drafted that text, I will leave the document editor to fix
it. (Some of the detail probably belongs in another draft specifically
about ASAs, which I am editing.)
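
To make two of the suggested requirements concrete (treat discovery and
negotiation failure as routine; monitor for hung code), here is a minimal
Python sketch. It is an illustration only, not text from the draft and not a
GRASP API: NegotiationFailed, negotiate_with_retry and run_with_watchdog are
hypothetical names, and the retry count, backoff and timeout values are
arbitrary assumptions.

```python
import threading
import time


class NegotiationFailed(Exception):
    """Hypothetical error for a failed discovery or negotiation attempt."""


def negotiate_with_retry(attempt, max_tries=3, base_delay=0.01, sleep=time.sleep):
    """Call attempt() until it succeeds, treating failure as a routine outcome.

    Returns (result, tries_used); result is None if every try failed.
    """
    for n in range(1, max_tries + 1):
        try:
            return attempt(), n
        except NegotiationFailed:
            if n < max_tries:
                # Exponential backoff before the next try.
                sleep(base_delay * 2 ** (n - 1))
    return None, max_tries


def run_with_watchdog(task, timeout):
    """Run task() in a worker thread and report whether it finished in time.

    Returns (finished, value); value is None if the task is still running.
    """
    done = threading.Event()
    result = {}

    def worker():
        result["value"] = task()
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    finished = done.wait(timeout)
    return finished, result.get("value")
```

What the caller does when the watchdog reports a hang (restart, degrade,
alert) is deliberately left open here, which matches the point that the
document should state the requirement rather than the recovery strategy.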

> 
> - 7.4: When text talks about feedback loop, it mentions "allow the intervention"
>   of human admin or control system; however, it then describes the feedback loop
>   as presenting default actions and allowing for override. This is fine, but it
>   seems to leave out the common case where something is misbehaving and would
>   not be presenting any choices to the administrator (using the feedback loop),
>   so the admin must forcefully intervene.

Yes. I think the word "feedback" is a bad choice. For engineers raised on
Nyquist diagrams it is part of a closed loop; for other people it means
feedback to humans. The text needs clarifying.
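
The two cases being distinguished (a function that offers a default action
the administrator may override, versus a misbehaving function that must be
stopped from outside) can be sketched as follows. This is an assumption-laden
illustration in Python, not anything defined by the draft; Decision,
AutonomicFunction and force_suspend are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Decision:
    """A proposed action that a human or control system may override."""
    default_action: str
    override: Optional[str] = None

    def chosen(self) -> str:
        # The override wins when one was supplied; otherwise use the default.
        return self.override or self.default_action


@dataclass
class AutonomicFunction:
    """Hypothetical stand-in for an autonomic function on a node."""
    name: str
    enabled: bool = True
    log: List[str] = field(default_factory=list)

    def apply(self, decision: Decision) -> Optional[str]:
        # Cooperative path: the function itself presents the choice.
        if not self.enabled:
            return None
        action = decision.chosen()
        self.log.append(action)
        return action

    def force_suspend(self) -> None:
        # Forced path: works even if the function never offers a choice.
        self.enabled = False
```

The point of the second method is that it does not depend on the function
behaving well enough to ask.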

> 
> Minor Issues:
> 
> - 6.1 TEXT: "It must be possible to run ASAs as non-privileged (user space)
>   processes except for those (such as the infrastructure ASAs) that necessarily
>   require kernel privilege. Also, it is highly desirable that ASAs can be
>   dynamically loaded on a running node."
> 
>   ISSUE: Discussing implementation details like user-space, kernel privilege and
>   dynamic loading seems unnecessary and outside the scope of this document. Does
>   this document care if I implement my ASA on a real-time architecture with no
>   "user space" etc..?

Fair enough. See my above comment re robustness.

I'll leave the rest of your comments to the document editor.

Regards
    Brian

> 
> - 4.6 Why call out global routing and overlay networks in particular? Is the
>   real intention to just say that the ACP implementation is not restricted to
>   any specific type of networking?
> 
> - TEXT: 6.3.1.2 "on a given LAN"
> 
>   NIT: Everyone knows what a LAN is; however, I wonder if the text should be
>   more generic and actually describe what it really requires here which is a
>   broadcast or multicast network?
> 
> Questions/Comments:
> 
> - QUESTION: IoT and node requirements. There are a couple of node ASA
>   requirements. I found myself wondering if a very simple IoT thing like a
>   thermostat might ever be an AN and, if so, whether they all really need to
>   have join assistant ASAs. It could be that the answer is "Yes, they do or
>   they can't be nodes". I was just curious.
> 
> - COMMENT: For the types of ASAs: simple (run anywhere), complex (resource
>   restricted), and infra (run everywhere), I was reminded of Kubernetes/cloud
>   orchestration, and the concept of DaemonSets (pods that run everywhere) and
>   Deployments (pods that can run anywhere, possibly be scaled or replicated,
>   and may also have requirements that restrict where they can run). I imagine
>   that folks in Anima have also looked at this, but if not, it would be good
>   to, as they seem to be solving very similar problems.
> 
> Nits:
> 
> - TEXT: 3.2 "However, the information is tracked independently of the status of
>   the peer nodes; specifically, it contains information about non-enrolled
>   nodes, nodes of the same and other domains. "
> 
>   QUESTION: What are peer nodes? Is this another name for adjacent nodes? If so
>   "s/peer/adjacent/".
> 
> - TEXT: 3.3.1 "enrols"
>   CHANGE: "enrolls"
> 
> - TEXT: 3.3.3 "In this state, the autonomic node has at least one ACP channel to
>   another device. It can participate in further autonomic transactions, such as
>   starting autonomic service agents. For example it must now enable the join
>   assistant ASA, to help other devices to join the domain.
> 
>   NIT: "For example foo" is not a sentence on its own, also "It" is not a good
>   subject as there are multiple nouns in the previous sentence that could serve
>   as antecedents.
> 
>   SUGGEST: 3.3.3 "In this state, the autonomic node has at least one ACP channel
>   to another device. The node can now participate in further autonomic
>   transactions, such as starting autonomic service agents (e.g., it must now
>   enable the join assistant ASA, to help other devices to join the domain).
> 
> - TEXT: 4.1 "Names are typically assigned by a Registrar at bootstrap time and
>   persistent over the lifetime of the device."
> 
>   NIT: s/persistent/persist/
> 
> - TEXT: "Out of scope are addressing approaches for the data plane of the
>   network, which may be configured and managed in the traditional way, or
>   negotiated as a service of an ASA. One use case for such an autonomic function
>   is described in [I-D.ietf-anima-prefix-management]."
> 
>   NIT: Sounds sort of Yoda-like, and the compounding makes things less clear.
> 
>   SUGGEST: "Addressing approaches for the data plane of the network are outside
>   the scope of this document. These addressing approaches may be configured and
>   managed in the traditional way, or negotiated as a service of an ASA. One use
>   case for such an autonomic function is described in
>   [I-D.ietf-anima-prefix-management]."
> 
> - TEXT: 6.1: "Following an initial discovery phase, the device properties and
>   those of its neighbors are the foundation of the behavior of a specific
>   device. A device and its ASAs have no pre-configuration for the particular
>   network in which they are installed."
> 
>   NIT: Why suddenly lose the "node" abstraction and start talking about devices
>   here? I think it continues to work well to say "node" (e.g., "node
>   properties", "specific node" and "A node and its ASAs...").
> 
> - TEXT: 6.2 "install ASA: copy the ASA code onto the host and start it,"
>   NIT: "s/host/node/"
> 
> 
>
