Re: [Anima] Rtgdir telechat review of draft-ietf-anima-reference-model-07

Brian E Carpenter <brian.e.carpenter@gmail.com> Sun, 26 August 2018 20:58 UTC

To: Christian Hopps <chopps@chopps.org>, rtg-dir@ietf.org
Cc: anima@ietf.org, draft-ietf-anima-reference-model.all@ietf.org
References: <153529941582.11902.1347468414499836311@ietfa.amsl.com>
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
Message-ID: <6288ec99-fbf6-e2e0-32c3-e402c19fdecd@gmail.com>
Date: Mon, 27 Aug 2018 08:57:51 +1200
In-Reply-To: <153529941582.11902.1347468414499836311@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/anima/cEFH6q7bVqTjJxw6KA0VEw2Vslc>
Subject: Re: [Anima] Rtgdir telechat review of draft-ietf-anima-reference-model-07

(Ccs trimmed)

Christian,

Thanks for this careful review. I'll comment here on the larger issues:

On 2018-08-27 04:03, Christian Hopps wrote:
...
> Minor Major Issues:
> 
> - Virtualization is mentioned once in "4.2 addressing" section. To quote:
> 
>   TEXT: "Support for virtualization: Autonomic Nodes may support Autonomic
>   Service Agents in different virtual machines or containers. The addressing
>   scheme should support this architecture."
> 
>   The special casing of VM/containers here seems to indicate that virtual
>   devices are not "1st class citizens" in an autonomic network. In particular, I
>   could easily imagine virtual machines being full-blown autonomic nodes
>   themselves. Assuming the intent is not to restrict virtual devices in this
>   manner, something needs to be said (somewhere) to make that clear.

I don't think that was the intention. We haven't really explored this in detail,
but I can certainly imagine a deployment (for example) where each tenant in
a data centre has its own virtual autonomic network, and the underlying physical
network is also autonomic. Since the ACP is expected to be implemented as
a VRF, you could even argue that every autonomic network is virtual.

So, yes, we can reword this.
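To make that deployment a little more concrete, here is a minimal Python data model. It assumes, for illustration only, that every autonomic node instance, physical or virtual, owns its own ACP modelled as a separate VRF-style routing table. The names `Vrf`, `AutonomicNode`, `PhysicalHost` and `add_tenant` are hypothetical and do not come from any ANIMA document.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    """A separate routing table; here the ACP is assumed to be one of these."""
    name: str
    routes: dict = field(default_factory=dict)  # prefix -> next hop

    def install(self, prefix: str, next_hop: str) -> None:
        self.routes[prefix] = next_hop


@dataclass
class AutonomicNode:
    """An autonomic node instance; it may be physical or virtual."""
    name: str
    acp: Vrf


@dataclass
class PhysicalHost:
    """A physical autonomic node that also hosts per-tenant virtual nodes."""
    underlay: AutonomicNode
    tenants: dict = field(default_factory=dict)  # tenant -> AutonomicNode

    def add_tenant(self, tenant: str) -> AutonomicNode:
        # Each tenant's virtual node gets its own ACP table, isolated from the
        # underlay ACP and from every other tenant's ACP.
        node = AutonomicNode(f"{self.underlay.name}/{tenant}", Vrf(f"acp-{tenant}"))
        self.tenants[tenant] = node
        return node
```

Because each instance carries its own `acp` table, a virtual node is modelled exactly like a physical one, which is the "first class citizen" property asked for in the review.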

> 
> - Robust programming techniques. I think the intention here is to say that the
>   design of ASAs must have robustness as a top design principle. I think in
>   doing that it should talk about what being robust means; however, it should
>   not be talking about how to accomplish that as there are multiple ways to
>   achieve this goal.
> 
>   In particular I feel saying that restarting is the *last* thing an ASA should
>   do is way overreaching into engineering the solution rather than specifying
>   the requirement. Indeed plenty of people think that overly complex recovery
>   mechanisms that try everything under the sun to *not* restart often have more
>   bugs and are less robust than KISS solutions that "fail" simply but recover
>   quickly with minimal or no disruption.
> 
>   I feel this section reads a bit more like someone's idea of how to design a
>   robust system than a description of what robust means, which I believe is
>   the intent.
> 
>   Perhaps better is just to focus on robust design ideas (some are already
>   stated in the text):
> 
>   - must deal with discovery and negotiation failure as routine.
>   - recovering from failures should be minimally disruptive.
>   - must not leak resources.
>   - must monitor for and deal with hung code.
>   - must include security analysis

OK. Since I drafted that text, I will leave it to the document editor to
fix. (Some of the detail probably belongs in another draft specifically
about ASAs, which I am editing.)
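The robustness properties in Christian's list can be illustrated with a small Python sketch. It is not a proposal for how ASAs must be built (which is exactly the point of the comment); it only shows, under hypothetical names (`DiscoveryFailed`, `AsaSupervisor`, `negotiate`, `check_and_recover`), what "discovery failure is routine", "no leaked resources" and "detect hung code, then recover quickly" might look like. The security-analysis item is a design activity and has no code counterpart here.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


class DiscoveryFailed(Exception):
    """Raised by a (hypothetical) discovery step when no peer answers."""


@dataclass
class AsaSupervisor:
    heartbeat_timeout: float = 5.0  # seconds without progress => assume hung
    last_heartbeat: float = field(default_factory=time.monotonic)
    open_sessions: set = field(default_factory=set)
    restarts: int = 0

    def heartbeat(self, now: Optional[float] = None) -> None:
        """Called by the ASA's main loop to show it is still making progress."""
        self.last_heartbeat = time.monotonic() if now is None else now

    def is_hung(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self.last_heartbeat > self.heartbeat_timeout

    def negotiate(self, discover: Callable[[], str], max_attempts: int = 3) -> Optional[str]:
        """Treat discovery failure as routine: retry a few times, then give up cleanly."""
        for attempt in range(max_attempts):
            session = f"session-{attempt}"
            self.open_sessions.add(session)
            try:
                return discover()
            except DiscoveryFailed:
                continue  # a routine outcome, not an error to escalate
            finally:
                self.open_sessions.discard(session)  # released on every path: no leaks
        return None

    def check_and_recover(self, now: Optional[float] = None) -> bool:
        """If the ASA looks hung, drop its state and restart it quickly."""
        if not self.is_hung(now):
            return False
        self.open_sessions.clear()
        self.restarts += 1
        self.heartbeat(now)
        return True
```

The recovery path deliberately just restarts, in the simple fail-fast style that the review suggests is often more robust than elaborate attempts to avoid restarting.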

> 
> - 7.4: When the text talks about the feedback loop, it mentions "allow the
>   intervention" of a human admin or control system; however, it then describes
>   the feedback loop as presenting default actions and allowing for override.
>   This is fine, but it seems to leave out the common case where something is
>   misbehaving and would not be presenting any choices to the administrator
>   (using the feedback loop), so the admin must forcefully intervene.

Yes. I think the word "feedback" is a bad choice. For engineers raised on
Nyquist diagrams it is part of a closed loop; for other people it means
feedback to humans. The text needs clarifying.
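One way to separate the two meanings: in the control-engineering sense the loop below runs closed, with no human in it, while humans can touch it in two distinct ways. They can answer a proposed default action, or they can forcibly intervene when the function is misbehaving and proposes nothing useful, the case the review says is missing. All names (`ControlLoop`, `decide`, `review`, `forced`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ControlLoop:
    decide: Callable[[float], str]                           # autonomic decision
    review: Optional[Callable[[str], Optional[str]]] = None  # admin may replace a proposal
    forced: Optional[str] = None                             # admin's forceful intervention

    def step(self, measurement: float) -> str:
        # Forceful intervention wins even when nothing is offered to the admin.
        if self.forced is not None:
            return self.forced
        proposed = self.decide(measurement)
        # Closed loop: with no reviewer attached, the proposal is applied as is.
        if self.review is None:
            return proposed
        replacement = self.review(proposed)
        return proposed if replacement is None else replacement
```

For example, `ControlLoop(decide=lambda load: "scale-up" if load > 0.8 else "hold")` acts on its own until a `review` hook or a `forced` action is set.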

> 
> Minor Issues:
> 
> - 6.1 TEXT: "It must be possible to run ASAs as non-privileged (user space)
>   processes except for those (such as the infrastructure ASAs) that necessarily
>   require kernel privilege. Also, it is highly desirable that ASAs can be
>   dynamically loaded on a running node."
> 
>   ISSUE: Discussing implementation details like user-space, kernel privilege and
>   dynamic loading seems unnecessary and outside the scope of this document. Does
>   this document care if I implement my ASA on a real-time architecture with no
>   "user space", etc.?

Fair enough. See my above comment re robustness.

I'll leave the rest of your comments to the document editor.

Regards
    Brian

> 
> - 4.6 Why call out global routing and overlay networks in particular? Is the
>   real intention to just say that the ACP implementation is not restricted to
>   any specific type of networking?
> 
> - TEXT: 6.3.1.2 "on a given LAN"
> 
>   NIT: Everyone knows what a LAN is; however, I wonder if the text should be
>   more generic and describe what it actually requires here, which is a
>   broadcast or multicast network?
> 
> Questions/Comments:
> 
> - QUESTION: IoT and node requirements. There are a couple of node ASA
>   requirements. I found myself wondering whether very simple IoT things like
>   thermostats might ever be ANs and, if so, whether they all really need to
>   have join assistant ASAs. It could be that the answer is "Yes, they do or
>   they can't be nodes". I was just curious.
> 
> - COMMENT: For the types of ASAs: simple (run anywhere), complex (resource
>   restricted), and infra (run everywhere), I was reminded of Kubernetes/cloud
>   orchestration, and the concept of DaemonSets (pods that run everywhere) and
>   Deployments (pods that can run anywhere, possibly be scaled or replicated, and
>   may also have requirements that restrict where they can run). I imagine that
>   folks in Anima have also looked at this, but if not it would be good to do so, as
>   they seem to be solving very similar problems.
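The Kubernetes analogy can be sketched as a toy placement function, assuming (this is not in the draft) that each ASA declares its kind, a resource requirement and a replica count. "infra" ASAs behave like a DaemonSet (one instance on every node); the others behave like a Deployment (a bounded number of replicas on eligible nodes). The names, the `cpu` capacity unit and the first-fit choice are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    cpu: int  # hypothetical capacity unit


@dataclass
class AsaSpec:
    name: str
    kind: str            # "infra", "simple", or "complex"
    cpu_needed: int = 0  # only constrains "complex" ASAs
    replicas: int = 1    # ignored for "infra" ASAs


def place(asa: AsaSpec, nodes: List[Node]) -> List[str]:
    """Return the names of the nodes that should run this ASA."""
    if asa.kind == "infra":
        # DaemonSet-like: every node runs one instance.
        return [n.name for n in nodes]
    # Deployment-like: "simple" ASAs can run anywhere, "complex" ones only
    # where resources allow; first fit, up to the requested replica count.
    eligible = [n for n in nodes if asa.kind == "simple" or n.cpu >= asa.cpu_needed]
    return [n.name for n in eligible[: asa.replicas]]
```

With nodes of capacity 2, 8 and 16, an infra ASA lands on all three, while a complex ASA needing 8 units with one replica lands only on the first node that fits.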
> 
> Nits:
> 
> - TEXT: 3.2 "However, the information is tracked independently of the status of
>   the peer nodes; specifically, it contains information about non-enrolled
>   nodes, nodes of the same and other domains. "
> 
>   QUESTION: What are peer nodes? Is this another name for adjacent nodes? If so
>   "s/peer/adjacent/".
> 
> - TEXT: 3.3.1 "enrols"
>   CHANGE: "enrolls"
> 
> - TEXT: 3.3.3 "In this state, the autonomic node has at least one ACP channel to
>   another device. It can participate in further autonomic transactions, such as
>   starting autonomic service agents. For example it must now enable the join
>   assistant ASA, to help other devices to join the domain.
> 
>   NIT: "For example foo" is not a sentence on its own; also, "It" is not a good
>   subject as there are multiple nouns in the previous sentence that could serve
>   as antecedents.
> 
>   SUGGEST: 3.3.3 "In this state, the autonomic node has at least one ACP channel
>   to another device. The node can now participate in further autonomic
>   transactions, such as starting autonomic service agents (e.g., it must now
>   enable the join assistant ASA, to help other devices to join the domain).
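The quoted 3.3.3 text describes a state transition, which may be easier to review as code. In this Python sketch only the rule "at least one ACP channel, then the join assistant ASA must be enabled" comes from the quoted text; the state names, the `EnrollingNode` class and the check that a node is enrolled before it builds ACP channels are assumptions made for illustration.

```python
from enum import Enum, auto


class NodeState(Enum):
    # Illustrative state names, not quoted from the draft.
    NOT_ENROLLED = auto()
    ENROLLED = auto()
    ACP_CONNECTED = auto()


class EnrollingNode:
    def __init__(self) -> None:
        self.state = NodeState.NOT_ENROLLED
        self.acp_channels: set = set()
        self.running_asas: set = set()

    def enroll(self) -> None:
        self.state = NodeState.ENROLLED

    def add_acp_channel(self, peer: str) -> None:
        # Assumption: ACP channels need the credentials obtained at enrollment.
        if self.state is NodeState.NOT_ENROLLED:
            raise RuntimeError("enroll before building ACP channels")
        self.acp_channels.add(peer)
        # 3.3.3: with at least one ACP channel, the node can take part in
        # further autonomic transactions and must enable the join assistant ASA.
        self.state = NodeState.ACP_CONNECTED
        self.running_asas.add("join-assistant")
```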
> 
> - TEXT: 4.1 "Names are typically assigned by a Registrar at bootstrap time and
>   persistent over the lifetime of the device."
> 
>   NIT: s/persistent/persist/
> 
> - TEXT: "Out of scope are addressing approaches for the data plane of the
>   network, which may be configured and managed in the traditional way, or
>   negotiated as a service of an ASA. One use case for such an autonomic function
>   is described in [I-D.ietf-anima-prefix-management]."
> 
>   NIT: Sounds sort of Yoda-like, and the compounding makes things less clear.
> 
>   SUGGEST: "Addressing approaches for the data plane of the network are outside
>   the scope of this document. These addressing approaches may be configured and
>   managed in the traditional way, or negotiated as a service of an ASA. One use
>   case for such an autonomic function is described in
>   [I-D.ietf-anima-prefix-management]."
> 
> - TEXT: 6.1: "Following an initial discovery phase, the device properties and
>   those of its neighbors are the foundation of the behavior of a specific
>   device. A device and its ASAs have no pre-configuration for the particular
>   network in which they are installed."
> 
>   NIT: Why suddenly lose the "node" abstraction and start talking about devices
>   here? I think it continues to work well to say "node" (e.g., "node
>   properties", "specific node" and "A node and its ASAs...").
> 
> - TEXT: 6.2 "install ASA: copy the ASA code onto the host and start it,"
>   NIT: "s/host/node/"
> 
> 
>