Re: [netmod] AD review: draft-ietf-netmod-revised-datastores-08

Robert Wilton <rwilton@cisco.com> Tue, 09 January 2018 15:34 UTC

To: Martin Bjorklund <mbj@tail-f.com>
Cc: andy@yumaworks.com, netmod@ietf.org
References: <cf27d398-1883-c1ce-a54a-4644bac8a1dc@cisco.com> <CABCOCHQCv8ih9uKFxmews_=3c_rX6fSAA=L8vtW91k-pMSHOEg@mail.gmail.com> <d2f8abd1-56fb-93b0-da3c-37cf16d2d4db@cisco.com> <20180109.122807.1121028038684414186.mbj@tail-f.com>
From: Robert Wilton <rwilton@cisco.com>
Message-ID: <b9aca498-d056-7f50-b098-70b765f47cf9@cisco.com>
Date: Tue, 09 Jan 2018 15:33:35 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/ZX81H65JCD9G3oe7yLq6a7Mo6Ag>
Subject: Re: [netmod] AD review: draft-ietf-netmod-revised-datastores-08


On 09/01/2018 11:28, Martin Bjorklund wrote:
> Robert Wilton <rwilton@cisco.com> wrote:
>> Hi Andy,
>>
>>
>> On 08/01/2018 19:45, Andy Bierman wrote:
>>>
>>> On Mon, Jan 8, 2018 at 5:55 AM, Robert Wilton <rwilton@cisco.com> wrote:
>>>
>>>      Hi Andy,
>>>
>>>      Regarding your comment below, this intent is captured by this text
>>>      describing the operational datastore in section 5.3:
>>>
>>>          <operational> SHOULD conform to any constraints specified in the
>>>          data model, but given the principal aim of returning "in use"
>>>          values, it is possible that constraints MAY be violated under
>>>          some circumstances, e.g., an abnormal value is "in use", the
>>>          structure of a list is being modified, or due to remnant
>>>          configuration (see Section 5.3.1).  Note, that deviations SHOULD
>>>          be used when it is known in advance that a device does not fully
>>>          conform to the <operational> schema.
>>>
>>>          Only semantic constraints MAY be violated, these are the YANG
>>>          "when", "must", "mandatory", "unique", "min-elements", and
>>>          "max-elements" statements; and the uniqueness of key values.
>>>
>>>          Syntactic constraints MUST NOT be violated, including hierarchical
>>>          organization, identifiers, and type-based constraints.  If a node in
>>>          <operational> does not meet the syntactic constraints then it MUST
>>>          NOT be returned, and some other mechanism should be used to flag the
>>>          error.
>>>
>>>
>>>      Do you agree that this is sufficient?
>>>
>>>
>>>
>>> Not really.
>>> It does not address my concern, which is that NMDA is
>>> removing the YANG constraints on config=false data nodes
>>> for no apparent reason.
>> There is a reason. I don't think that the constraints on config=false
>> nodes are really being removed, because I don't think that they truly
>> existed in the first place (despite what RFC 7950 might indicate!).
> I agree.  But note that RFC 7950 says:
>
>     o  If the constraint is defined on state data, it MUST be true in a
>        valid state data tree.
>
> It is not defined anywhere that <get> must return a "valid state data
> tree".
>
> In reality, I suspect that all implementations of <get> call various
> instrumentation call back functions in some order, possibly in
> parallel, which means that data will be collected at different times
> from the backend systems.  I don't think it is feasible to freeze the
> operational state of a device, collect all data, and unfreeze, in
> order to get a consistent snapshot of the operational state.
I agree.

It is not just that the management agent may be reading the data from
the backend systems at different times; the operational state in those
backend systems may not have converged at the point that it is being
read (e.g. a route that has been installed in the FIB on some
linecards, but not all).

I think that the operational state is probably best considered as being 
eventually consistent.  I.e. if the device stops receiving further 
updates (in config or state), and if no abnormal conditions have 
occurred, then the operational state of the system must end up 
conforming to the schema for <operational>, and <operational> would be a 
valid state data tree.
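
One way a client could act on this "eventually consistent" view is to raise an inconsistency only once it has persisted while the device is quiescent. This is a minimal Python sketch, not anything NMDA specifies. It assumes that the client already derives a set of violation identifiers from each snapshot and can tell whether updates are still arriving. `ConvergenceTracker`, `observe` and the `threshold` value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConvergenceTracker:
    """Hypothetical client-side helper that treats <operational> as
    eventually consistent.

    A constraint violation is only reported once it has been seen in
    `threshold` consecutive snapshots taken while the device was quiescent
    (no further config or state updates received).  Violations seen while
    the system is still churning are treated as transient.
    """

    threshold: int = 3
    streaks: dict[str, int] = field(default_factory=dict)

    def observe(self, violations: set[str], quiescent: bool) -> set[str]:
        """Record one snapshot; return the violations considered persistent."""
        if not quiescent:
            # Updates are still arriving: inconsistency is expected, so
            # restart the count for every violation.
            self.streaks.clear()
            return set()
        # Extend the streak of each violation that is still present;
        # violations that have cleared drop out of the dict.
        self.streaks = {v: self.streaks.get(v, 0) + 1 for v in violations}
        return {v for v, n in self.streaks.items() if n >= self.threshold}
```

Requiring several quiescent snapshots, rather than one, tolerates the case where data is collected from different backends at slightly different times.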

Thanks,
Rob


>
>
> /martin
>
>> I think that we all agree on the expected behavior for configuration:
>> If a client sends configuration to a server that would cause <running>
>> to become invalid then the server should reject that change, to ensure
>> that <running> always holds a consistent configuration.  Having a
>> consistent configuration is the most important property here.
>> I.e. the server has the right to reject an invalid configuration
>> request from a client.
>>
>> However, the flow of operational state data in the opposite direction
>> cannot follow the same rules.  If, during the processing of a get
>> request (or YANG push), a server sends operational state data back to a
>> client, then the client has to choose how to process the message:
>>   - if the message is garbled or not sane then it makes sense to
>> discard it.
>>   - however, what should the client do if the message is well formed
>> but either (i) contains some values outside the permitted schema range
>> (but can be represented by the schema datatype), or (ii) applying
>> the values would cause the client's copy of <operational> to become
>> invalid?
>>
>> If the client discards the message because of one bad value, then that
>> doesn't seem to be helpful, since it allows for a very fragile model
>> of system management.  I.e. if one small thing is bad then the whole
>> house of cards collapses.
>>
>> So I think that the only sensible behaviour here is that the client
>> has to process the operational state update in a best effort fashion,
>> keep all the good data and probably flag any values that are outside
>> the value constraints.  Similarly, any reference constraint failures
>> (i.e. when/must) can be flagged up, but throwing away an
>> update message that would cause the operational state to become
>> inconsistent doesn't seem to be helpful.  I.e. it is much better if
>> the client gets to see the true state of the server, even if that
>> state isn't good (or consistent).
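
Here is a minimal Python sketch of the best-effort processing described above. It covers case (i) only. The whole message is decoded first and discarded if any value cannot be represented by the datatype. Otherwise every value is kept, and values outside the range restriction are flagged. The paths, the assumption that every leaf is a uint8 with an optional range restriction, and the function names are all hypothetical.

```python
# Hypothetical schema: every leaf is a uint8; some carry a range restriction.
BASE_RANGE = (0, 255)                                  # what uint8 can represent
RANGE_RESTRICTIONS = {"/system/cpu-utilization": (0, 100)}


class MalformedUpdate(ValueError):
    """The message cannot be decoded against the schema datatypes."""


def apply_update(client_copy: dict[str, int],
                 update: list[tuple[str, str]]) -> list[str]:
    """Merge an operational-state update best-effort.

    Returns the paths whose values fit the datatype but fall outside the
    schema's range restriction; these are kept and flagged, not dropped.
    Raises MalformedUpdate, leaving `client_copy` untouched, if any value
    cannot be decoded at all.
    """
    decoded = []
    for path, text in update:
        try:
            value = int(text)
        except ValueError:
            raise MalformedUpdate(f"{path}: {text!r} is not an integer") from None
        if not BASE_RANGE[0] <= value <= BASE_RANGE[1]:
            raise MalformedUpdate(f"{path}: {value} does not fit in uint8")
        decoded.append((path, value))

    flagged = []
    for path, value in decoded:
        low, high = RANGE_RESTRICTIONS.get(path, BASE_RANGE)
        if not low <= value <= high:
            flagged.append(path)        # keep the value, but flag it
        client_copy[path] = value
    return flagged
```

Decoding everything before touching `client_copy` keeps the "discard garbled messages" rule atomic, while a single out-of-range value no longer brings down the whole update.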
>>
>> Similar questions arise on the server itself:
>>   - what if the real value in use (e.g. that is read from the hardware)
>> is outside the permitted range (because of a logic defect)?  Is it
>> really better to suppress that value entirely or return a value that
>> the server knows to be wrong?
>>   - can a server even know that its operational view is consistent? For
>> complex systems where the real operational state is split across
>> multiple underlying linecards, or remote devices, I think that this is
>> very hard (if not impossible) to do.
>>
>> So what the NMDA architecture states is:
>>   (i) if a server knows that it won't conform to the operational schema
>> then it must use deviations,
>>   (ii) a server in a normal steady state should conform to the
>> operational schema (and be valid),
>>   (iii) but, if the system is churning (e.g. configuration, route
>> update, etc) then the operational state of the server might be
>> transiently inconsistent and this is OK,
>>   (iv) if the server is in a bad state, then it is better to return
>> the actual state than to lie or not report a particular value (as long
>> as it can be encoded).
>>   (v) a server does not need to explicitly validate that its view of
>> operational is valid. It is unclear what it would/could do if it
>> detected that the operational state is invalid, nor is it clear that
>> servers would generally be able to always perform this operation.
>>
>>> The server implementation requirements expressed in YANG constraints
>>> are applicable to any data node, not just config=true data nodes.
>>> The requirement to implement the ancestor nodes (with keys) does not
>>> change.
>> The draft does not allow this to be violated.  I.e. the following
>> statement prevents this: "Syntactic constraints MUST NOT be violated,
>> including hierarchical organization".
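
The split between semantic and syntactic constraints in the section 5.3 text quoted earlier in this thread can be expressed as a small decision function. This is a hypothetical Python sketch of that rule only. The constraint-kind labels and `action_for` are illustrative names, not part of the draft or of any YANG library.

```python
from enum import Enum

# Constraint kinds that the quoted section 5.3 text says MAY be violated
# in <operational>.  The string labels are hypothetical.
SEMANTIC = {"when", "must", "mandatory", "unique",
            "min-elements", "max-elements", "key-uniqueness"}

# Constraint kinds that MUST NOT be violated (hierarchical organization,
# identifiers, type-based constraints).
SYNTACTIC = {"hierarchy", "identifier", "type"}


class Action(Enum):
    RETURN = "return the node"
    OMIT_AND_FLAG = "do not return the node; flag the error by other means"


def action_for(violated: set[str]) -> Action:
    """Decide how a server treats a node in <operational>, given the kinds
    of constraint that the node currently violates."""
    unknown = violated - SEMANTIC - SYNTACTIC
    if unknown:
        raise ValueError(f"unknown constraint kinds: {sorted(unknown)}")
    if violated & SYNTACTIC:
        return Action.OMIT_AND_FLAG
    # No violations, or only semantic ones: the node may still be returned.
    return Action.RETURN
```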
>>
>>
>>> The requirement to conform to the YANG constraints defined within
>>> config=false data nodes does not change.
>>>
>>> To do otherwise does not make sense.  E.g. "when" conditions that add
>>> ethernet counters only when the interface type is ethernetCsmacd. Why
>>> would it be OK for the server to ignore that when-stmt and add ethernet
>>> counters to every interface?
>> It is not OK for a server to ignore that and add Ethernet counters to
>> every interface (without using a deviation).  The draft is not trying
>> to allow that.
>>
>> But if an interface could change type (e.g. between Ethernet and ATM
>> via a different optics module being inserted) then it would be allowed
>> for a server to transiently report the ethernet counters on the
>> interface whilst it is in the process of changing the interface type
>> from ethernet to ATM (e.g. if the counters are maintained by a
>> separate daemon that is updated asynchronously with respect to the
>> config or optics change).  Once the change has completed and the
>> system has reached steady state, the Ethernet counters must no longer
>> be reported.
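
The interface-type example can be modelled as two pieces of state that converge at different times. This is a hypothetical Python sketch. The `Interface` class, the simplified `when` check and `counters_daemon_sync` are illustrative stand-ins, not a real YANG server API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interface:
    name: str
    type: str                             # e.g. "ethernetCsmacd" or "atm"
    eth_counters: Optional[dict] = None   # owned by a separate counters daemon


def when_holds(intf: Interface) -> bool:
    """Simplified check of a "when" condition that only allows the
    Ethernet counters to exist when the interface type is ethernetCsmacd."""
    return intf.eth_counters is None or intf.type == "ethernetCsmacd"


def counters_daemon_sync(intf: Interface) -> None:
    """Hypothetical asynchronous daemon step: reconcile the counters with
    the current interface type."""
    if intf.type != "ethernetCsmacd":
        intf.eth_counters = None
    elif intf.eth_counters is None:
        intf.eth_counters = {"in-frames": 0}


if __name__ == "__main__":
    intf = Interface("if0", "ethernetCsmacd", {"in-frames": 42})
    intf.type = "atm"              # optics swapped: the type changes first
    print(when_holds(intf))        # False: transiently inconsistent
    counters_daemon_sync(intf)     # the counters daemon catches up later
    print(when_holds(intf))        # True: steady state, counters gone
```

A snapshot taken between the type change and the daemon's sync would show the transient violation. A snapshot taken after convergence would not.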
>>
>> Thanks,
>> Rob
>>
>>
>>> IMO the text above can only apply to the operational values of
>>> config=true nodes.
>>>
>>>
>>>      Thanks,
>>>      Rob
>>>
>>>
>>>
>>> Andy
>>>
>>>
>>>
>>>      On 21/12/2017 22:49, Andy Bierman wrote:
>>>>      Hi,
>>>>
>>>>      It should be clear somehow that server requirements to provide
>>>>      config=false data that is valid according to the YANG definitions
>>>>      are not affected by NMDA.  That is not being taken away.  The
>>>>      ability to validate operational values of configuration data has
>>>>      never been provided, and therefore is not being taken away either.
>>>>
>>>>      A constraint on config=true nodes only applies to configuration
>>>>      datastores.  These are the only constraints that should be ignored
>>>>      in <operational>.  Constraints on config=false nodes still apply in
>>>>      <operational>.
>>>>
>>>>
>>>>      Andy
>>>>
>>>>
>>>>
>>>>      On Thu, Dec 21, 2017 at 2:27 PM, Juergen Schoenwaelder
>>>>      <j.schoenwaelder@jacobs-university.de> wrote:
>>>>
>>>>          On Thu, Dec 21, 2017 at 07:52:54PM +0100, Vladimir Vassilev wrote:
>>>>          > On 12/21/2017 02:20 PM, Juergen Schoenwaelder wrote:
>>>>          >
>>>>          > > On Thu, Dec 21, 2017 at 02:03:45PM +0100, Vladimir Vassilev wrote:
>>>>          > > > On 12/21/2017 11:34 AM, Robert Wilton wrote:
>>>>          > > >
>>>>          > > > > Hi Vladimir,
>>>>          > > > >
>>>>          > > > > First point of clarification is that this is not about
>>>>          > > > > running/intended at all.  The contents of running/intended
>>>>          > > > > do not change in any way depending on whether hardware is
>>>>          > > > > present or absent.
>>>>          > > > >
>>>>          > > > > The section is only concerned with how the configuration
>>>>          > > > > is applied in operational, and basically says that you
>>>>          > > > > cannot apply configuration for resources that are missing
>>>>          > > > > (which seems reasonable).  E.g. I cannot configure an IP
>>>>          > > > > address on a physical interface that isn't there.  Or if
>>>>          > > > > the physical interface gets removed then the configuration
>>>>          > > > > associated with that interface is also removed from
>>>>          > > > > operational.
>>>>          > > > >
>>>>          > > > > Operational isn't validated and data model constraints are
>>>>          > > > > allowed to be broken (ideally transiently).
>>>>          > > > I want to focus on this. IMO giving up schema validity for
>>>>          > > > any datastore is an unacceptable price. Pre-NMDA devices had
>>>>          > > > full model support in operational data (all YANG constraints
>>>>          > > > that are part of the model were enforced, without
>>>>          > > > discrimination).
>>>>          > > There was a long debate about the value of returning the
>>>>          > > true operational state. What do you do if the operational
>>>>          > > state is invalid?  A server can reject configuration changes
>>>>          > > if they lead to invalid state, but a server cannot reject
>>>>          > > reality.
>>>>          > IMO if the model can represent reality then data conforming to
>>>>          > the model can. If not, a better model is needed, not a hack
>>>>          > that breaks the datastore conformance to the YANG model. I do
>>>>          > not see how /interfaces/interface/oper-status=not-present was
>>>>          > not representing the reality of a system with a removed line
>>>>          > card that is configured and ready to resume operation as soon
>>>>          > as the line card is reconnected.
>>>>
>>>>          I assume this is all system and implementation specific. If your
>>>>          system knows about interfaces that are not present (i.e., there
>>>>          is operational state about them), you can report these
>>>>          interfaces.  But 'is configured' is confusing here. I am not sure
>>>>          a line card that does not exist should be considered configured.
>>>>          But yes, this may be system specific. Anyway,
>>>>          draft-ietf-netmod-rfc7223bis-01.txt still has oper-status
>>>>          'not-present' - so this seems to be a moot point.
>>>>
>>>>          > > > If this is about to change it will compromise
>>>>          > > > interoperability, and a significant portion of the client
>>>>          > > > implementation workload that can be automated will need to
>>>>          > > > be coded by hand and tested. Unresolved leafrefs, undefined
>>>>          > > > behaviour of different implementations removing different
>>>>          > > > configuration nodes in violation of YANG semantic
>>>>          > > > constraints (which I do not think can be so clearly
>>>>          > > > separated from the syntactic constraints when one considers
>>>>          > > > types like leafref, instance-identifier etc.) and the
>>>>          > > > corresponding side effects based on the server implementers'
>>>>          > > > own creativity are eventually going to create more problems.
>>>>          > > >
>>>>          > > > 1. IMO the only acceptable solution is to have a YANG-valid
>>>>          > > > operational datastore at all times. Operational, like any
>>>>          > > > other datastore, MUST be a valid YANG data tree, and it has
>>>>          > > > to be a system implementation task to consider all
>>>>          > > > complications resulting from the removal of the resources
>>>>          > > > leading to any data transformations. If this is difficult or
>>>>          > > > impossible, other mechanisms to flag missing resources
>>>>          > > > should be used (e.g.
>>>>          > > > /interfaces/interface/oper-status=not-present). This sounds
>>>>          > > > like a useful contract that provides the value of a
>>>>          > > > standard; the alternative does not.
>>>>          > > As said above, it is impossible to report valid operational
>>>>          > > state if the operational state is not valid according to the
>>>>          > > models.
>>>>          > >
>>>>          > > > 2. Even with the change in 1., I do not see the removal of
>>>>          > > > intended configuration nodes from operational as a solution
>>>>          > > > worth implementing on our servers. I do not see a real-world
>>>>          > > > plug-and-play scenario that can be automatically solved
>>>>          > > > without specific additions to the models, e.g.
>>>>          > > > /interfaces/interface/oper-status=not-present is an
>>>>          > > > oversimplified solution, but it needs to be extended exactly
>>>>          > > > as much as the solution provided by the removal of config
>>>>          > > > true; nodes, without sacrificing the YANG validity of
>>>>          > > > operational.
>>>>          > > Your thinking is likely wrong. <operational> reports the
>>>>          > > operational state. It may have little in common with
>>>>          > > <intended>. Trying to derive operational from intended is
>>>>          > > likely not a workable approach.
>>>>          > The proposal for this solution ("derive operational from
>>>>          > intended", e.g. merge /interfaces-state in /interfaces) comes
>>>>          > from the revised datastores draft, not me.
>>>>          >
>>>>          > By definition config true; data represents intent. Reusing the
>>>>          > model of config true; data to represent state absent of intent
>>>>          > (e.g. /interfaces/interface with origin="or:system") is a hack.
>>>>          > The hack works fine without compromising the conformance of
>>>>          > operational to the YANG model as long as certain conditions are
>>>>          > met. I am pointing out that one of the conditions is to keep
>>>>          > all of the intended configuration data present in
>>>>          > 'operational' and handle missing resources with conventional
>>>>          > means, e.g. /interfaces/interface/oper-status=not-present,
>>>>          > instead of adding the straw that breaks the camel's back.
>>>>
>>>>          I fail to see why you believe all objects that appear in intended
>>>>          configuration need to exist in applied configuration. In fact,
>>>>          operators told us very clearly that they care about the
>>>>          distinction between intended and applied config.
>>>>
>>>>          > > > 3. Solutions like /interfaces/interface/admin-state stop
>>>>          > > > working. With the interface removed you can no longer
>>>>          > > > figure out whether or not the if-mib has the interface
>>>>          > > > enabled, so an operator has to use SNMP or wait for a
>>>>          > > > replacement line card to be connected to find out this bit
>>>>          > > > of information.
>>>>          > > At least on my boxes, if I remove a line card, the interface
>>>>          > > also disappears in SNMP tables. Stuff that is operationally
>>>>          > > not present is simply operationally not present.
>>>>          > >
>>>>          > > > My interpretation of the MAY requirement level in sec. 5.3.
>>>>          > > > The Operational State Datastore (<operational>) is that
>>>>          > > > plug-and-play solutions can be implemented without this
>>>>          > > > limited approach, which has the same problem as pre-NMDA,
>>>>          > > > only now we have to have /interfaces-state to keep config
>>>>          > > > false; data relevant to hardware that is configured but not
>>>>          > > > present:
>>>>          > > >
>>>>          > > >     configuration data nodes supported in a configuration
>>>>          > > >     datastore MAY be omitted from <operational> if a server
>>>>          > > >     is not able to accurately report them.
>>>>          > > >
>>>>          > > > I realize this discussion comes late. I have stated my
>>>>          > > > objections to this particular part of the NMDA draft
>>>>          > > > earlier.
>>>>          > > I believe there is a conceptual misunderstanding. I think
>>>>          > > there never was a requirement that a server reports the state
>>>>          > > of hardware that is not present.
>>>>          > "Data relevant to hardware that is configured but not present"
>>>>          > is different from "state of hardware that is not present". For
>>>>          > example, information indicating when the line card became
>>>>          > unavailable, what the reason was, or other information like how
>>>>          > many packets that had this interface as egress destination are
>>>>          > being dropped as a result of the removal.
>>>>
>>>>          I think that systems handle non-existing interfaces differently.
>>>>          It seems that ietf-interfaces is flexible enough to accommodate
>>>>          the different styles.
>>>>
>>>>          /js
>>>>
>>>>          --
>>>>          Juergen Schoenwaelder           Jacobs University Bremen gGmbH
>>>>          Phone: +49 421 200 3587         Campus Ring 1 | 28759 Bremen | Germany
>>>>          Fax:   +49 421 200 3103         <http://www.jacobs-university.de/>
>>>>
>>>>          _______________________________________________
>>>>          netmod mailing list
>>>>          netmod@ietf.org
>>>>          https://www.ietf.org/mailman/listinfo/netmod
>>>>
>>>>
>>>>
>>>>
>>>