[netmod] Re: 2nd WGLC on system-config-08

Jürgen Schönwälder <jschoenwaelder@constructor.university> Tue, 20 August 2024 05:54 UTC

Return-Path: <jschoenwaelder@constructor.university>
X-Original-To: netmod@ietfa.amsl.com
Delivered-To: netmod@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id E9BFBC14F602 for <netmod@ietfa.amsl.com>; Mon, 19 Aug 2024 22:54:43 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.242
X-Spam-Level:
X-Spam-Status: No, score=-6.242 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_HI=-5, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_SOFTFAIL=0.665, T_SCC_BODY_TEXT_LINE=-0.01, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id qvninbH2drD4 for <netmod@ietfa.amsl.com>; Mon, 19 Aug 2024 22:54:40 -0700 (PDT)
Received: from beadg.de (beadg.de [178.254.54.206]) by ietfa.amsl.com (Postfix) with ESMTP id BB54CC14F5FE for <netmod@ietf.org>; Mon, 19 Aug 2024 22:54:38 -0700 (PDT)
Received: from localhost (firewallix.jacobs-university.de [212.201.44.246]) by beadg.de (Postfix) with ESMTPSA id B57B316A04E; Tue, 20 Aug 2024 07:54:32 +0200 (CEST)
Date: Tue, 20 Aug 2024 07:54:31 +0200
From: Jürgen Schönwälder <jschoenwaelder@constructor.university>
To: Kent Watsen <kent+ietf@watsen.net>
Message-ID: <ZsQvl__JDf2EIivX@alice.eecs.jacobs-university.de>
Mail-Followup-To: Kent Watsen <kent+ietf@watsen.net>, "netmod@ietf.org" <netmod@ietf.org>
References: <0100019141c6c601-53d41832-302a-4589-9f75-f181bc2a4306-000000@email.amazonses.com> <Zr4agbxHuzSP9cQ-@alice.eecs.jacobs-university.de> <010001916c81b9af-a9417248-0ebd-461e-a679-3c338a3c56aa-000000@email.amazonses.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <010001916c81b9af-a9417248-0ebd-461e-a679-3c338a3c56aa-000000@email.amazonses.com>
Message-ID-Hash: BPBTCSCOBO36PVRXQLOQBDNRMEOTLHSU
X-Message-ID-Hash: BPBTCSCOBO36PVRXQLOQBDNRMEOTLHSU
X-MailFrom: jschoenwaelder@constructor.university
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-netmod.ietf.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: "netmod@ietf.org" <netmod@ietf.org>
X-Mailman-Version: 3.3.9rc4
Precedence: list
Reply-To: Jürgen Schönwälder <jschoenwaelder@constructor.university>
Subject: [netmod] Re: 2nd WGLC on system-config-08
List-Id: NETMOD WG list <netmod.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/QGU1coVxdsRPeNJ3VauJC08nIH8>
List-Archive: <https://mailarchive.ietf.org/arch/browse/netmod>
List-Help: <mailto:netmod-request@ietf.org?subject=help>
List-Owner: <mailto:netmod-owner@ietf.org>
List-Post: <mailto:netmod@ietf.org>
List-Subscribe: <mailto:netmod-join@ietf.org>
List-Unsubscribe: <mailto:netmod-leave@ietf.org>

As an old Unix guy, I hardly know any complex service that is not
configured via multiple configuration files, using combinations of
include and merge mechanisms to produce the final configuration the
service uses. The idea of a monolithic <running> gave us conceptual
simplicity. But in practice, it helps to be able to break down large
and complex configurations into meaningful pieces that are merged
together. If you follow this logic, then introducing a well defined
merge mechanism when <running> flows into <intended> makes <system>
fall out as just a special. Introducing a well defined merge mechanism
is a powerful addition to NMDA, a special purpose copy from <system>
to <running> is at best a point solution.

The authors of this draft likely just want a copy to running solution
standardized. But then the draft needs to say that and not something
different. For me, a copy to <running> solution is a missed
opportunity to introduce a mechanism that is much more general and
powerful than just providing a solution for the <system> datastore.

/js

PS: One way to look at templates is that they expand to config
    snippets that are merged with running.

PS: And with a well designed merge operation, one might in the future
    even move towards breaking the monolithic <running> into pieces,
    like we do for the <candidate> singleton today.

On Mon, Aug 19, 2024 at 09:18:46PM +0000, Kent Watsen wrote:
> Hi Jürgen,
> 
> 
> > On Aug 15, 2024, at 11:10 AM, Jürgen Schönwälder <jschoenwaelder@constructor.university> wrote:
> > 
> > On Sun, Aug 11, 2024 at 02:10:31PM +0000, Kent Watsen wrote:
> >> 
> >> This email begins a two-week WGLC on:
> >> 
> >> 	System-defined Configuration
> >> 	https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdatatracker.ietf.org%2Fdoc%2Fhtml%2Fdraft-ietf-netmod-system-config-08&data=05%7C02%7Cjschoenwaelder%40constructor.university%7Ce9f7b1ecbe534b0205b508dcc0948485%7Cf78e973e5c0b4ab8bbd79887c95a8ebd%7C0%7C0%7C638596991341830370%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=DIVpTU%2BLerxzFk6QLUJTWJxohdj%2Bx0MrG3D6sj9Bk04%3D&reserved=0
> > 
> > I have reviewed draft-ietf-netmod-system-config-08 as part of the WG
> > last call. I think the document is _not_ready_, in fact, I find the
> > document inconsistent. There is text (and a figure) suggesting that
> > system config is merged after template expansion into intendedn and
> > then there is text and lots of details how system config is copied
> > into running. These are two very different things and you either do
> > copy to running or merge to intended.
> 
> +1  If everything in <system> is copied into <running>, there is no need to merge.
> 
> That said, not *everything* in <system> is necessarily copied.  It’s more likely everything used-by-running (referenced, overridden, etc.) that is copied.  Other things (e.g., loopback interface) may never be copied to running, yet are still conceptually merged into intended, and later appear in <operational>.
> 
> Earlier today I wrote Jan that the reason why the merge occurs after template-expansion is because we (presumably) know how to merge configuration, but not how to merge templates.   Put aside for a moment the fact that we don’t have a template solution defined, and just accept that it would likely be really hard.  But your comment could be played backwards, to show that the “resolve-system” parameter is similarly intractable.  That is, if <running> contains templates, how would the server know how/where to copy <system> data into <running>?  
> 
> Double-checking why config is being copied from <system> into <running>, I see broadly two reasons:
> 
> 1. For so-called “shared objects”, only because some feel that “running alone must be valid”, solely for the benefit of existing systems that do offline validation.  If it is felt that such change cannot occur without versioning the protocols, then this (NETMOD WG) draft can state that, leaving it to the NETCONF WG to decide when to publish NETCONF 1.2 and RESTCONF 1.1.
> 
> 2. For the ability for clients to a) override system-defined values and b) configure descendent nodes to system-defined nodes.  The question is, though, why is this needed?  Do any existing systems exist where clients can do (a) or (b)?   Is there actually a use-case where this is needed?  Section 5.5.3 gives an example of the MTU on the loopback interface being changed, and 5.5.4 gives an example of a “description” being changed - how do vendors support such things today or, if they don’t, then is it needed now?
> 
> One might claim that there is a hybrid case (mixing 1 and 2 above) where it is desired to change the values of a system-defined shared-object.  But maybe we shouldn’t support that case.  Maybe it should be that either <running> references the <system> object as is, or <running> defines its own version of the system object and references it instead.  No exceptions.
> 
> To your very last point at bottom, I’m in the "<intended> as an overlay of <running> and <system>” camp.  I hope that my reasoning above makes sense to the WG, and this draft is updated accordingly.
> 
> 
> Kent // as a contributor
> 
> 
> > Here are my notes as I was going through the document from the
> > beginning to the end.
> > 
> > * Abstract
> > 
> >  - The abstract is not well written for readers not familiar with the
> >    details of the work nor is it clear to me what the second sentence
> >    tries to convey. Here is an attempt to write a replacement
> >    abstract:
> > 
> >      The Network Management Datastore Architecture (NMDA) defined in
> >      RFC 8342 defines several configuration datastores holding
> >      configuration information. The content of these configuration
> >      datastores is controlled by clients. This document introduces
> >      the concept of a system configuration datastore holding
> >      configuration controlled by the system on which a server is
> >      running.
> > 
> >      This document updates RFC 6241, RFC 8040, RFC 8342, and RFC
> >      8526.
> > 
> > * Introduction
> > 
> >  - I think the second paragraph in the Introduction is
> >    handwaving. The system configuration is clearly exposed and the
> >    structure follows a data model, the new system datastore is no
> >    different. Also the 3rd paragraph is not good, a server allowing a
> >    reference from a configuration datastore to operational state is
> >    broken. I would love to see this text be rewritten to state the
> >    problem instead of avoiding an elaboration of bad implementation
> >    practice as a justification for this work. In other words, the
> >    text should say that YANG does not allow configuration data to
> >    reference operational state data (and why) and then it should
> >    motivate why a system configuration datastore helps to improve the
> >    situation.
> > 
> >  - There is text about a 'resolve-system" parameter in the
> >    Introduction and it remains unclear where and when this parameter
> >    is used. It is a parameter of what?
> > 
> >  - The text "If a system-defined node is referenced ..." seems to be
> >    belong to the terminology section. And the term "system-defined
> >    node" deserves to be defined.
> > 
> >  - Why is the merging of system into intended happening after any
> >    expansions of templates or removal of inactive configuration? Is
> >    there a specific reason for this? Given that we have no
> >    specification for templates or inactive configuration, is it wise
> >    to fix the order? Also note that Figure 1 is misleading since it
> >    kind of indicates a merge _before_ intended. Why do we not have:
> > 
> >      |                              | // configuration transformations,
> >      |                              | // e.g., removal of nodes marked
> >      |                              | // as "inactive", expansion of
> >      +--------------+---------------+ // templates, merge of <system>
> >                     |
> >                     |
> >                     v
> >               +------------+
> >               | <intended> |  // subject to validation
> >               | (ct, ro)   |
> >               +------------+
> > 
> > * Kinds of System Configuration
> > 
> >  - I did search for immediate and conditional and it seems that the
> >    two terms defined here are not really used anywhere. If so, do we
> >    need the definitions? And if we need them, why are these terms not
> >    defined in section 1.1? My preference is to not defined terms we
> >    do not use. In section 4.2, it is stated that <system> MAY change
> >    dynamically and hence all information is conditionally present. I
> >    suggest to simply remove section 2.
> > 
> > * The System Configuration Datastore (<system>)
> > 
> > * Static Characteristics of <system>
> > 
> >  - I am confused about what is being said there. The text says
> >    "should not cause the automatic update of <running> and later it
> >    says "Servers migrate system configuration update in <running>". I
> >    think this needs more clarity.
> > 
> >    I think you _never_ update running. According to your model,
> >    <system> feeds into intended. Perhaps you meant <intended>? If so,
> >    I believe there are situations where updating intended is the
> >    right thing to do (say the system's default MTU of an interface
> >    changes and there is no explicit configuration for it) but there
> >    may also be situations where indeed the upgrade may need to be
> >    rejected (e.g., if an update would rename all interfaces, which
> >    may cause for example firewall rules to mean something very
> >    different). I believe we should have better and contradiction free
> >    text here. And the text should not talk about updating <running>.
> > 
> >    I am also not sure why this is a static characteristic. Perhaps
> >    the important portions from section 4 can be merged into section 5
> >    and we get rid of the distinction of static characteristics and
> >    dynamic characteristics?
> > 
> > * Dynamic Behaviors
> > 
> >  - The phrase "Clients MAY reference nodes defined in <system>" is
> >    misleading. A client does not reference nodes, a client controls
> >    configuration in other datastores and these datastores may contain
> >    nodes referencing nodes defined in <system>. In other words, it
> >    should be "Clients may create configuration data nodes that
> >    reference nodes in the <system> datastore ..." or something like
> >    that. I also believe the MAY should just be a may.
> > 
> >  - Again, what is the reason for merging after template expansion and
> >    removal of inactive? Why is this a MUST? What breaks if I would
> >    say remove inactive after the merge? There may be valid reasons
> >    and it may be useful to capture them.
> > 
> >  - "Whenever configuration in <system> changes, the server MUST also
> >    immediately update and validate <intended>."
> > 
> >  - See above on comments on the figure. I would go even further and
> >    draw it as follows:
> > 
> > +-----------+         |        +-----------+         |
> > | <system>  |         +------->| <running> |<--------+
> > | (ct, ro)  |                  | (ct, rw)  |
> > +-----------+                  +-----------+
> >      |                              | 
> >      |              +---------------+
> >      |              |
> >      |              |  // configuration transformations, e.g., removal
> >      +--------------+  // of nodes marked as "inactive", expansion of
> >                     |  // templates, merge of <system>
> >                     v
> >               +------------+
> >               | <intended> |  // subject to validation
> >               | (ct, ro)   |
> >               +------------+
> > 
> >  - What does it mean to "remove a copied system node from <running>"?
> >    Copied by whom? Perhaps you mean 'delete and overriding system
> >    node'?
> > 
> >  - I did not understand what section 5.1.1 tries to tell me. And it
> >    talks about "configuration copied from <system> into <running>",
> >    copied by whom? And why would something in <running> be tagged
> >    origin intended? My attempts to make sense out of 5.1.1 cause
> >    disagreement with it.
> > 
> >  - I think section 5.2 is not well worded. It talks about 'declaring
> >    system configuration' but I think we never really used the phrase
> >    to 'declare configuration'. It seems that all the text in 5.2
> >    essentially reduces to this statement:
> > 
> >      Clients can create configuration data nodes in <running> that
> >      match configuration data nodes in <system>. This can be
> >      necessary, for example, when the client does not support the
> >      "resolve-system" parameter (Section 5.3) in order to validate
> >      <intended>.
> > 
> >   - Section 5.3 says that "resolve-system" causes the server to copy
> >     the entire referenced system configuration, including all
> >     descendants into the target datastore (e.g., <candidate> and
> >     <running>). Really? This is not what I got from the text I read
> >     before and this is clearly not what the figures suggest. I also
> >     find this "copy into running" problematic; I thought so that that
> >     "resolve-system" causes <system> content to be merged into
> >     <intended>?
> > 
> >   - Why is it desirable to allow implementations supporting
> >     "resolve-system" but not exposing <system>? This means that there
> >     is no way to predict for a client what an edit-config will
> >     resolve to. Is this desirable?
> > 
> >   - Section 5.4 tells me that some data in <running> may override
> >     what is in <system> but some other data may not since system
> >     config is immutable. "The immutability of system configuration is
> >     defined in [I-D.ietf-netmod-immutable-flag]." So does this create
> >     a dependency on the immutability flag? Or is the idea that
> >     clients do not require to know this since they can go into trial
> >     and error mode? Is this desirable? And if so, what exactly is the
> >     error information to expect and react on?
> > 
> >   - Is <system> configuration that is not referenced directly or
> >     indirectly from <running> applied (or merged into intended) or
> >     not? In the example, are all applications visible if there is a
> >     "resolve-system"? Or only if they are referenced by explicit
> >     config?
> > 
> >   - In the example, can a explicit client express that a specific
> >     application should not make it into <intended>? Something like
> >     resolve-system but if that brings in xyz, please fail the
> >     edit-config since xyz is a no-go. And can I request that a
> >     specific application should make it into running without having
> >     to write an ACL or something else that references the
> >     application? If I do not know what system provides, do I go trial
> >     and error to explore what the list of system preconfigured
> >     applications is?
> > 
> > * Default Interactions
> > 
> >  - There are "should not" and MUST rules here but there is no
> >    explanation why these rules are important. What breaks if a
> >    <system> data node happens to have the schema's default value?
> > 
> > * The "ietf-system-datastore" Module
> > 
> >  - Why is section 8.2 here? I expected to see the definition of the
> >    "ietf-system-datastore" module and not a 2+ pages example of
> >    something relatively unrelated. Perhaps move all lengthy examples
> >    into the appendix and use only short concise examples where really
> >    needed in the body of the specification. In this section, I do not
> >    see the need for an example at all.
> > 
> > * Security Considerations
> > 
> >  - While you follow the default security templates, I wonder whether
> >    more needs to be said here. If I have a server supporting
> >    resolve-system but not implementing <system>, I have no way to
> >    predict how the result produced by an edit-config will look
> >    like. So I have to fully trust that resolve-system does not lead
> >    to a modification of running that may have undesirable
> >    consequences.
> > 
> > I did not go to rewrite my review now that I see that I fundamentally
> > misunderstood the proposed solution at the beginning. The reason is
> > that Figure 1 is misleading. The reason is that you seem to have
> > two models and neither is captured in Figure 1:
> > 
> > +-----------+         |        +-----------+         |
> > | <system>  |         +------->| <running> |<--------+
> > | (ct, ro)  |                  | (ct, rw)  |
> > +-----------+                  +-----------+
> >                                     | // configuration transformations,
> >                                     | // e.g., removal of nodes marked
> >                                     | // as "inactive", expansion of
> >                     +---------------+ // templates
> >                     |
> >                     |
> >                     v
> >               +------------+
> >               | <intended> |  // subject to validation
> >               | (ct, ro)   |
> >               +------------+
> > 
> > This is the situation where the client learns about system config by
> > reading <system> and then explicitly copying over into <running>
> > whatever is needed. The other model is really a partial copy of
> > <system> into <running> via the "resolve-system" magic:
> > 
> > +-----------+         |        +-----------+         |
> > | <system>  |         +------->| <running> |<--------+
> > | (ct, ro)  |----------------->| (ct, rw)  |
> > +-----------+                  +-----------+
> > 	                             | // configuration transformations,
> > 	                             | // e.g., removal of nodes marked
> >                                     | // as "inactive", expansion of
> >       	       	     +---------------+ // templates
> >                     |
> >                     |
> >                     v
> > 	       +------------+
> > 	       | <intended> |  // subject to validation
> > 	       | (ct, ro)   |
> > 	       +------------+
> > 
> > But then I do find statements like this:
> > 
> >   Configuration in <running> is merged with <system> to create the
> >   contents of <intended> after the configuration transformations to
> >   <running> (e.g., template expansion, removal of inactive
> >   configuration defined in [RFC8342]) have been performed
> >   (Section 5.1).
> > 
> > This apparently is conflict with what is really proposed. So my
> > confusion is very well justified I think. There is a very big
> > difference between thinking of <intended> as an overlay of <running>
> > and <system> or thinking of <system> as something that either explicit
> > or implicitly is copied into <running> as a side-effect of certain
> > operations.
> > 
> > /js
> > 
> > -- 
> > Jürgen Schönwälder              Constructor University Bremen gGmbH
> > Phone: +49 421 200 3587         Campus Ring 1 | 28759 Bremen | Germany
> > 
> > _______________________________________________
> > netmod mailing list -- netmod@ietf.org
> > To unsubscribe send an email to netmod-leave@ietf.org
> 

-- 
Jürgen Schönwälder              Constructor University Bremen gGmbH
Phone: +49 421 200 3587         Campus Ring 1 | 28759 Bremen | Germany