[Gen-art] Gen-ART IETF Last Call review of draft-ietf-netmod-rfc6020bis-12

worley@ariadne.com (Dale R. Worley) Fri, 13 May 2016 00:36 UTC

Return-Path: <worley@alum.mit.edu>
X-Original-To: gen-art@ietfa.amsl.com
Delivered-To: gen-art@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id A3C4C12B01E for <gen-art@ietfa.amsl.com>; Thu, 12 May 2016 17:36:07 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.934
X-Spam-Level:
X-Spam-Status: No, score=-1.934 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.001, RCVD_IN_DNSWL_LOW=-0.7, SPF_SOFTFAIL=0.665] autolearn=no autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=comcast.net
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 69b0fIbcZJSJ for <gen-art@ietfa.amsl.com>; Thu, 12 May 2016 17:36:04 -0700 (PDT)
Received: from resqmta-ch2-06v.sys.comcast.net (resqmta-ch2-06v.sys.comcast.net [IPv6:2001:558:fe21:29:69:252:207:38]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id E042412B01B for <gen-art@ietf.org>; Thu, 12 May 2016 17:36:03 -0700 (PDT)
Received: from resomta-ch2-20v.sys.comcast.net ([69.252.207.116]) by comcast with SMTP id 114Gbq3uD2R4w115Gbcozr; Fri, 13 May 2016 00:36:02 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20140121; t=1463099762; bh=C4VN7gy6WjMOPhi65FQK+dKkY5m4yltEVFI1x5yYlzs=; h=Received:Received:Received:Received:From:To:Subject:Date: Message-ID; b=KnIskx9sIvEYJms0wJgbDkL7+Aqp+EyyFZgyUQH1u5BOkCsYukU92ikBrZ7RMhVIo KRX+M3SMZsNz9UwUkBLjwi9CmmhKPn7z6j3F6AzjOJm8KN4cBRuxpYvLj/5F15hmtg 7E8VoYUapHj6KuinzGeK3+rlUw0xpWV36XNv1Rl+6RbQ69J+jMfNUmzxWQyPKWXl4D Yp5Vmb+rnUho3cIeyOmgbglyxOfI+o4N3gCSjUPXVvkuLB9V3lASF3OWYKwS93O6P3 YrPFdZwpTmVNTdzW8S/NalH6oD9JLDb9vQ26uWo4H2aToprMiAVUSKDZc9TV3nuXai geDxtiYnf41pw==
Received: from hobgoblin.ariadne.com ([73.143.237.82]) by resomta-ch2-20v.sys.comcast.net with comcast id tcc11s00H1nMCLR01cc14S; Fri, 13 May 2016 00:36:02 +0000
Received: from hobgoblin.ariadne.com (hobgoblin.ariadne.com [127.0.0.1]) by hobgoblin.ariadne.com (8.14.7/8.14.7) with ESMTP id u4D0Zxti032087; Thu, 12 May 2016 20:35:59 -0400
Received: (from worley@localhost) by hobgoblin.ariadne.com (8.14.7/8.14.7/Submit) id u4D0ZxcM032082; Thu, 12 May 2016 20:35:59 -0400
X-Authentication-Warning: hobgoblin.ariadne.com: worley set sender to worley@alum.mit.edu using -f
From: worley@ariadne.com
To: gen-art@ietf.org, draft-ietf-netmod-rfc6020bis.all@ietf.org, ietf@ietf.org
Sender: worley@ariadne.com
Date: Thu, 12 May 2016 20:35:59 -0400
Message-ID: <878tzerdzk.fsf@hobgoblin.ariadne.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/gen-art/FLcIj0ewD8nOo3ZbrxAK76BGG2w>
Subject: [Gen-art] Gen-ART IETF Last Call review of draft-ietf-netmod-rfc6020bis-12
X-BeenThere: gen-art@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: "GEN-ART: General Area Review Team" <gen-art.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/gen-art>, <mailto:gen-art-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/gen-art/>
List-Post: <mailto:gen-art@ietf.org>
List-Help: <mailto:gen-art-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/gen-art>, <mailto:gen-art-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 13 May 2016 00:36:07 -0000

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Document: draft-ietf-netmod-rfc6020bis-12
Reviewer: Dale R. Worley
Review Date: 2016-05-12
IETF LC End Date: 2016-05-12
IESG Telechat date: 2016-05-19

Summary: This draft is basically ready for publication, but has nits
that should be fixed before publication.  I have a large number of
editorial comments.  A few of these comments address technical
uncertainties, but I strongly suspect that the technical issues have
long since been fixed, as this is a revision of Yang 1, and the
only needed work is clarifying the text.

- Abstract

This Abstract does not list what the document does.  E.g., define Yang
1.1.  Also might help to mention upward-incompatibility.  Basically,
go through section 1 to see what should be in Abstract.  Perhaps:

   YANG is a data modeling language originally designed to model
   configuration and state data manipulated by the Network
   Configuration Protocol (NETCONF), NETCONF remote procedure calls,
   and NETCONF notifications.  This document describes the syntax and
   semantics of version 1.1 of the YANG language.  YANG version 1.1 is
   a maintenance release of the YANG language, addressing ambiguities
   and defects in the original specification.  There are a small
   number of upward-incompatibilities from Yang 1.  This document also
   describes how a data model defined in a YANG module is encoded in
   the Extensible Markup Language (XML), and how NETCONF operations
   are used to manipulate the data.

- section 1.1

The incompatibilities should be marked whether the old, incompatible
usage always causes an error "at compile time", "at run time", or
changed behavior.

A number of later items seem to be upward-incompatibilities but not
marked as such:

   o  Made the "yang-version" statement mandatory.

   o  Made noncharacters illegal in the built-in type "string".

- section 3

   o  identifier: Used to identify different kinds of YANG items by
      name.

This item, unlike the others, does not start with a noun phrase.
Perhaps, "A string used to ..."?

   o  leaf: A data node that exists in at most one instance in the data
      tree.  A leaf has a value but no child nodes.

I'm not sure that this is correct; a leaf schema node inside a list
schema node can have many instances in a data tree.  The real
criterion is that it has a value but no child nodes.

   o  leaf-list: Like the leaf node but defines a set of uniquely
      identifiable nodes rather than a single node.  Each node has a
      value but no child nodes.

Better to say a "sequence of nodes".  Without the concept that the
nodes are a sequence (and can be identified by location within the
sequence), there's no assurance that the nodes are uniquely
identifiable (since they can have duplicate values in state data).

   o  mandatory node: A mandatory node is one of:
      ...
      *  A container node without a "presence" statement, which has at
         least one mandatory node as a child.

Perhaps replace the comma with "and"?

It would be useful to insert a note somewhere that all data is either
"configuration" or "state" data.  It's hard to learn that now, because
the terms "configuration data" and "state data" are defined only by
reference to RFC 6241.

There are general issues with the terms "namespace" and "prefix".

"namespace" is used to identify both the namespaces of identifiers in
Yang source (6.2.1) as well as XML namespaces in the XML encodings of
data trees.  It would make things clearer if all uses that refer to
XML namespaces used the form "XML namespace".

"prefix" is used to name both identifier prefixes in Yang source code
and XMLNS prefixes.  Worse, the prefix statement is used to declare
one string, which is then both the prefix used within the Yang source
code and also the preferred (but not mandatory!) XMLNS prefix used to
refer to the associated XML namespace.  It would help if "XMLNS
prefix" was used whenever talking specifically about prefixes used in
XML.  See also comments on section 7.1.4.

- section 3.1

There is an overall question regarding whitespace in XML in
"non-significant" places.  Is it allowed in the XML representation of
data trees?  And if so, precisely what whitespace is allowed?  Also,
to what degree is the whitespace shown in examples what is allowed on
the wire, and to what degree is it there just to make the example
easier to read?  (It's possible that XML has implemented some sort of
global solution for the issue of non-significant whitespace, but back
when I was using it regularly, there was none.)

- section 4.1

   YANG structures data models into modules and submodules.  A module
   can import data from other external modules, and include data from
   submodules.

"data" has other meanings.  Perhaps change to "elements of models" or
"definitions"?

   The instantiation of these groupings can refine or augment the
   nodes, allowing it to tailor the nodes to its particular needs.

"instantiate" is used with two meanings, the more common one being
the relationship between nodes and data tree elements, and the
insertion of a grouping into a node tree.  E.g., see the definitions
of "data tree" and "uses" in section 3.  It would make life easier if
there was a different word for the second meaning -- perhaps "use"?
In any case, this particular sentence has no context to disambiguate
the two meanings.

   The conversion from YANG to
   YIN is lossless, so content in YIN can be round-tripped back into
   YANG.

This is almost certainly not true.  But there is no loss of Yang
semantics, or something like that, and that could be stated.

   YANG strikes a balance between high-level data modeling and low-level
   bits-on-the-wire encoding.  The reader of a YANG module can see the
   high-level view of the data model while understanding how the data
   will be encoded on-the-wire.

Is that true in cases where the NETCONF XML wire encoding is not used?
Or is that encoding mandatory in some sense, even if the environment
is not NETCONF?  If Yang is intended to describe data which will
*always* be expressed in XML in a certain way, that fact should be
introduced earlier in the document.

- section 4.2.1

Section should start with a definition of "module" and what modules
are used for.  The current text is a list of details about "module"
without telling what the *point* of a module is.

   A module may be divided into submodules, based on the needs of the
   module owner.

There is a global problem that the reader needs to know the
relationships between "module" and "submodule", but those are not
stated explicitly anywhere.  (This is probably a candidate for
inclusion in section 3.)  AFAICT:

    A module
    - is defined by a module statement in its own file
    - defines a group of data trees, etc. that can be used by NETCONF
    - "includes" the submodules that "belong to" it

    A submodule
    - is defined by a submodule statement in its own file
    - defines things that can be used in the module statement
    - "belongs to" a single module, which is specified in the "belongs-to"
      substatement of the submodule statement
    - does NOT have submodules of its own

"module owner" is not defined in section 3.  And is it the right term;
is there an ownership relationship, or does this just mean "whoever
wrote the module"?  Or is there an understood social context in which
a module will be written that needs to be paid attention to in this
document?  (E.g., regarding allocation of XML namespaces.)

There is also the risk of misunderstanding between "own" and "belong
to" -- a submodule "belongs to" its module but the module doesn't
"own" the submodule.

Also, the module isn't "divided" into submodules, since there is
always a module statement.

Perhaps a better version of this paragraph is:

   A module may have portions of its definition separated into
   submodules, based on the needs of the module writer.  This
   separation is not visible outside the text of the module and
   submodules; neither importation into other modules nor the data
   encodings are affected.

--

   The "import" statement allows a module or submodule to reference
   material defined in other modules.

There is no definition of "material".  Clearly, it means "things that
can be referenced", but this points out that there is no term for
"things that can be referenced", but it would be useful to have one.
Then one could state, e.g., that a module can reference ???s defined
in its submodules, or it can reference ???s defined in modules that it
imports.

   The "include" statement is used by a module to incorporate the
   contents of its submodules into the module.

In a sense, "include" isn't really like "import", because it's not
optional; it's the mandatory reverse-link of the "belongs-to"
statement in the submodule.  (The module and all submodules
automatically have access to definitions in all submodules due to
their relationship.)

And that has been the cause of a lot of my confusion on this topic;
"include" and "import" are discussed together in many places, making
it unclear that "a module includes a submodule" is really a *static*
relationship, not a statement that the programmer could insert or not
according to need.  A clearer statement would be:

   The "include" statement is used in a module to identify each
   submodule that belongs to it.  The module's definitions have access
   to the definitions in the submodule.

It might be useful to define "include" and "belong to" in section 3.

- section 4.2.2.1

     leaf host-name {
       type string;
       description
         "Hostname for this system";
     }

It might be useful here to provide a forward reference to 6.3 as a
reference for the syntax of statements.  If you don't already know the
syntax, it can be hard to figure out that "host-name" is the node's
identifier.  Or perhaps the forward reference could be put at the end
of 4.2.2.

- section 4.2.2.3

   A container may contain any number of child nodes of any type
   (leafs, lists, containers, leaf- lists, actions, and
   notifications).

If the plural of "leaf" is "leafs" (rather than the standard English
"leaves"), that should be noted in the entry in section 3.

- section 4.2.2.4

   A list defines a sequence of list entries.  Each entry is like a
   structure or a record instance, and is uniquely identified by the
   values of its key leafs.  A list can define multiple key leafs and
   may contain any number of child nodes of any type (including leafs,
   lists, containers etc.).

Each entry is like a *container*, which was just defined above, which
might be a more intuitive way to describe it.

Perhaps "... is uniquely identified by the values of one or more
specified leaf children, the key leafs."

   XML Encoding Example:

     <user>
       <name>glocks</name>
       <full-name>Goldie Locks</full-name>
       <class>intruder</class>
     </user>
     <user>
       <name>snowey</name>
       <full-name>Snow White</full-name>
       <class>free-loader</class>
     </user>
     <user>
       <name>rzell</name>
       <full-name>Rapun Zell</full-name>
       <class>tower</class>
     </user>

I note that this example, unlike the previous ones, is not a single
XML element; there is no XML element that designates the list as a
whole.  That seems to be a general pattern in the XML encodings for
Yang, that groups of elements can be defined in Yang, but the
resulting set of elements is not wrapped in a single element in the
XML.  That happens with lists (where each list element is an element
but the list as a whole is not), leaf-list (where there is no element
surrounding the repeated leaf element), groupings, and choices with
their child cases.  Perhaps this is a general pattern in XML used for
network management, but it's not quite what one would expect if one
thinks of data structures in a programming language.  Perhaps it would
be helpful to point out this pattern somewhere that global XML issues
are described (e.g., whitespace issues).

- section 4.2.3

   YANG can model state data, as well as configuration data, based on
   the "config" statement.  When a node is tagged with "config false",
   its subhierarchy is flagged as state data.  In NETCONF, state data is
   reported using the <get> operation, not the <get-config> operation.
   Parent containers, lists, and key leafs are reported also, giving the
   context for the state data.

Does this mean that <get> returns the state data and <get-config>
returns the configuration data?  Or does <get> return both the state
and configuration data?  Perhaps this is not really a question for
this document, but if the behavior of <get> and <get-config> is
mentioned, they should be described thoroughly.

- section 4.2.4

   YANG has a set of built-in types, similar to those of many
   programming languages, but with some differences due to special
   requirements from the management domain.

"from the management domain" is awkward, and "domain" has no previous
definition; better "requirements of network management".

       | enumeration         | Enumerated strings                  |

Perhaps "one of an enumerated set of strings".

       | string              | Human-readable string               |

What does "human-readable" mean?  Can passwords be stored in such
leafs?  Does this really mean "a string of Unicode characters"?
Perhaps "character string" would be better.

- section 4.2.7

   YANG allows the data model to segregate incompatible nodes into
   distinct choices using the "choice" and "case" statements.  The
   "choice" statement contains a set of "case" statements that define
   sets of schema nodes that cannot appear together.  Each "case" may
   contain multiple nodes, but each node may appear in only one "case"
   under a "choice".

   When a node from one case is created in the data tree, all nodes from
   all other cases are implicitly deleted.  The server handles the
   enforcement of the constraint, preventing incompatibilities from
   existing in the configuration.

   The choice and case nodes appear only in the schema tree but not in
   the data tree.  The additional levels of hierarchy are not needed
   beyond the conceptual schema.

This description reads very oddly.  AFAICT, "choice" simply defines a
union type, where each alternative is a container (usually called a
structure or record).  Or rather, it's conceptually a container, but
in an XML instance of data, there is no start-end tags for the
structure as a whole (paralleling the lack of start-end tags for lists
as a whole).  Once you state that, it's clear why there are "sets of
schema nodes that cannot appear together".

- section 4.2.8

It would help to give a general discussion of augmenting.  If one
module augments another, is the augmented data definition part of the
data structures defined by augmenting module or the augmented one?

If I've got it right, this module:

   module /system/login/user {
     namespace "urn:user";
     prefix "user";

     list user {
       key "name";
       leaf name {
         type string;
       }
       leaf full-name {
         type string;
       }
       leaf class {
         type string;
       }
     }
   }

gives this as the XML encoding of a typical data tree:

     <user xmlns="urn:user">
       <name>alicew</name>
       <full-name>Alice N. Wonderland</full-name>
       <class>drop-out</class>
       <other:uid>1024</other:uid>
     </user>

whereas the *existence* (in some sense) of this module:

   module /system/login/user-augmenter {
     namespace "urn:user-extension";
     prefix "ext";

     augment /system/login/user {
       when "class != 'wheel'";
       leaf uid {
         type uint16 {
           range "1000 .. 30000";
         }
       }
     }
   }

changes the encoding of the data tree to:

     <user xmlns="urn:user" xmlns:ext="urn:user-extension">
       <name>alicew</name>
       <full-name>Alice N. Wonderland</full-name>
       <class>drop-out</class>
       <ext:uid>1024</ext:uid>
     </user>

What is it that triggers this action-at-a-distance?

   If a module augments another module, the XML encoding of the data
   will reflect the prefix of the augmenting module.  For example, if
   the above augmentation were in a module with prefix "other", the XML
   would look like:

This is hard to interpret.  Better would be:

   If a module augments another module, the XML elements that are
   added to the encoding are in the namespace of the augmenting module
   (which should preferentially be associated with the prefix of the
   augmenting module).

The restriction is on the XML namespace of the new element, not its
XMLNS prefix.

- section 5.1

   The module is the base unit of definition in YANG.  A module defines
   a single data model.  A module can define a complete, cohesive model,
   or augment an existing data model with additional nodes.

What is the semantic content of "complete, cohesive"?  It seems like
"defines a data model" is shorter and equally accurate.

If I understand Yang correctly, every module defines a data model
(which may have no leafs!), but some also augment other data models.
If that's so, the use of "or" is misleading, and it would be more
accurate to say:

   The module is the base unit of definition in YANG.  A module
   defines a single data model.  A module can also augment an existing
   data model with additional nodes.

Then again, perhaps a module that defines no data nodes cannot be
referenced as a data model.  If so, that should be stated.

--

   Developers of YANG modules and submodules are RECOMMENDED to choose
   names ...

"RECOMMENDED" is an adjective which qualifies the choice, not an
adverb that qualifies the act of choosing.  I think you want
"... SHOULD choose ...".  See RFC 2119.

   A module uses the "include" statement to include all its submodules,
   and the "import" statement to reference external modules.  Similarly,
   a submodule uses the "import" statement to reference other modules.

This tends to confuse the nature of "include" and "import".  I would
prefer:

   A module has an "include" statement for each of its submodules.  A
   module, or submodule belonging to that module, can reference
   definitions in the module and all submodules included by the
   module.

   A module or submodule uses the "import" statement to reference
   external modules.  Statements in the module or submodule can
   reference definitions in the external module using the prefix
   specified in the "import" statement.

--

   For backward compatibility with YANG version 1, a submodule is ...

It's not clear why backward compatibility is needed here, since
yang-version marks each module explicitly with the language version
that is used.

   References to definitions in the local module MAY use the prefix
   notation.

I would add "... using the prefix defined for the module."

- The syntax of colon

Note there are four uses of ":" in the text:

section 5.1

   When a definition in an external module is referenced, a locally
   defined prefix MUST be used, followed by ":", and then the external
   identifier.

section 6.1.2

   A keyword
   is either one of the YANG keywords defined in this document, or a
   prefix identifier, followed by ":", followed by a language extension
   keyword.

section 7.1.4

   When a reference to an identifier from the imported
   module is used, the prefix string for the imported module is used in
   combination with a colon (":") and the identifier, e.g.,
   "if:ifIndex".

section 7.19

   The
   statement's name is created by combining the prefix of the module in
   which the extension was defined, a colon (":"), and the extension's
   keyword, with no interleaving whitespace.

The BNF shows that in all cases, no whitespace is allowed around the
":".  I think that consistent wording should be used in all four
locations to make it clear that whitespace/comments/newlines are not
allowed.  Also "combined with" is unnecessarily inexact, "concatenated
with" or "followed by" is much better.

- section 5.1.1

The beginning of this section should be a discussion of the nature of
revisions abstractly.  AFAICT, an (abstract) module has one or more
revisions, each of which is identified by a distinct revision date.
Each of these is described by one module statement, each of which is
in a separate file.  It seems to be preferred (but seemingly not
required) that a revision of a module will have revision statements
that specify the revision dates of itself and all older revisions in
reverse chronological order -- the actual specification of the
revision date of a module seems to be done externally to the module
statement.

In regard to submodules, it seems that a revision of the module
specifies the revisions of all of its submodules, but a revision of a
submodule only specifies the module's identifier, not its revision.

   When a module is written, it can import the current revisions of
   other modules, based on what is available at the time.

"current" is a dangerous word to use, because it means the narrative
present moment.  You want to say:

   When a module is written, it can import the revisions of other
   modules that are current when the module is written.

--

   When the author of the
   module is prepared to move to the most recently published revision of
   an imported module, the module is republished with an updated
s/the module is republished/the author's module is republished/
   "import" statement.  By republishing with the new revision, the
   authors explicitly indicate their acceptance of any changes in the
s/their acceptance of any changes in/the use of the newer revision of/
   imported module.

Why is "acceptance" being used here?  It has political connotations.
It seems like "use of" carries less baggage.

   For submodules, the issue is related but simpler.  A module or
   submodule that includes submodules needs to specify the revision of
   the included submodules.  If a submodule changes, any module or
   submodule that includes it needs to be updated.

The second sentence is unclear.  I think you mean "In order for a
module or submodule to use definitions in a new revision of a
submodule, the module must be updated to include the new revision of
the referenced submodule."

   If a module is not imported with a specific revision, it is undefined
   which exact revision is used.

Delete "exact" here; it isn't needed and is too informal.  Perhaps
replace with "specific".  (There are other instances of "exact" with
the same issue.)

- section 5.1.2

   YANG allows modeling of data in multiple hierarchies, where data may
   have more than one top-level node.  Models that have multiple top-
   level nodes are sometimes convenient, and are supported by YANG.

Any single module seems to define only one data structure, the one
whose elements are the items listed in the module, and Yang doesn't
seem to provide for any sort of "model" definition other than a
module.  Or are the nodes at the top level within the module all
usable as independent data structures?  I.e., given a module
statement, what is the set of allowed root XML elements for data trees
in its XML namespace?

- section 5.1.2.1

   For example:

     module example-config {
       yang-version 1.1;
       namespace "urn:example:config";
       prefix "co";

       container system { ... }
       container routing { ... }
     }

   could be encoded in NETCONF as:

s/For example/For example, an instance of/

   NETCONF is capable of carrying any XML content as the payload in the
   <config> and <data> elements.  The top-level nodes of YANG modules
   are encoded as child elements, in any order, within these elements.

This isn't very clear.  AFAICT from the example:

   NETCONF <config> and <data> elements each contain sequences of
   instances of top-level nodes of the appropriate YANG module.

Given that this example is about a particular module, it would help if
the module namespace and prefix was used correctly in the XML example.

- section 5.2

   YANG modules and submodules are typically stored in files, one module
   or submodule per file.

This might be clearer as "... one module or submodule statement per
file."

   The name of the file SHOULD be of the form:

     module-or-submodule-name ['@' revision-date] ( '.yang' / '.yin' )

Does the extension ('.yang' or '.yin') correspond to whether the
contents of the file are in Yang or Yin syntax?

- section 5.3

   All YANG definitions are specified within a module that is bound to a
   particular XML namespace [XML-NAMES], which is a globally unique URI
   [RFC3986].

This doesn't clearly specify the range of "particular" and "unique".
A better phrasing is

   All YANG definitions are specified within a module.  Each module is
   bound to a distinct XML namespace [XML-NAMES], which is a globally
   unique URI [RFC3986].

--

   XML namespaces for private modules are assigned by the organization
   owning the module without a central registry.  Namespace URIs MUST be
   chosen so they cannot collide with standard or other enterprise
   namespaces, for example by using the enterprise or organization name
   in the namespace.

I don't see the use of "without a central registry"; haven't you
already said the module is is private?  (Or is "without a central
registry" a term of art?)

Also, the second sentence is redundant, as you've previously said that
the URI is globally unique, which implies that whoever assigned the
URI has some mechanism for assuring its uniqueness.

- section 5.4

   There is no worry over conflicts if both modules define the
   type, since there is no ambiguity.

I think you mean "There is no ambiguity if both modules define types
with the same name.".

- section 5.5

This section talks a great deal about why scoping is done, but isn't
specifically clear what the rules are.  (Traditionally, the rule is
stated as either what range of source text a particular definition is
visible in, or given an identifier use, how the matching definition is
found.)

   Scoped definitions MUST NOT shadow definitions at a higher scope.  A
   type or grouping cannot be defined if a higher level in the schema
   hierarchy has a definition with a matching identifier.

I think the second sentence is confusing, because (to me) the term
"schema hierarchy" implies the data tree structure, whereas the
scoping rule seems to be intended to be entirely lexical.  In either
case, this needs to be clarified.

- section 5.6

   Conformance is a measure of how accurately a server follows the
   model.

There is no model already under discussion, so you can't use "the
model".  Perhaps

   Conformance to a model is a measure of how accurately a server
   follows a model.

- section 5.6.5

I suspect that if a server implements module A, and A imports B, the
server is not required to implement B.  (Even if A references
groupings in B.)  The answer to that should probably be stated
explicitly.

   If a server lists a module C in the "/modules-state/module" list from
   "ietf-yang-library", and there are other modules Ms listed that
   import C without specifying the revision date of module C, the server
   MUST use the definitions from the most recent revision of C listed
   for modules Ms.

I think the end of that would be clearer as

   ... the server MUST implement Ms by importing the most recent
   revision of C listed in the "/modules-state/module" list.

     module b {
       yang-version 1.1;
       namespace "urn:example:b";
       prefix "b";

       revision 2015-04-04;
       revision 2015-01-01;

       typedef myenum {
         type enumeration {
           enum zero; // added in 2015-01-01
           enum one;  // added in 2015-04-04
         }
       }

       container x {  // added in 2015-01-01
         container y; // added in 2015-04-04
       }
     }

Is this the correct way to specify multiple revisions of this module?
Or is this an informal notation for the two module definitions:

     module b {
       yang-version 1.1;
       namespace "urn:example:b";
       prefix "b";

       revision 2015-04-04;
       revision 2015-01-01;

       typedef myenum {
         type enumeration {
           enum zero; // added in 2015-01-01
           enum one;  // added in 2015-04-04
         }
       }

       container x {  // added in 2015-01-01
         container y; // added in 2015-04-04
       }
     }

     module b {
       yang-version 1.1;
       namespace "urn:example:b";
       prefix "b";

       revision 2015-01-01;

       typedef myenum {
         type enumeration {
           enum zero; // added in 2015-01-01
         }
       }

       container x {  // added in 2015-01-01
       }
     }

If it is the latter, it would help the new reader to explain the
convention (as no example of a multi-revision module is given in the
document).

- section 6

   YANG modules use the UTF-8 [RFC3629] character encoding.

Better to say that YANG modules use the Unicode character set and are
stored in files using the UTF-8 character encoding.

Verify that form-feed (0x0C) is not allowed.

This might be a good place to talk about line-breaks in Yang source
files.  It looks from section 14 that line-breaks MUST be either CRLF
or LF.

It is worth noting (either here or in 6.1.3) how line-breaks inside
quoted strings are transcribed into the string's value.  As now
written, it seems that the line-break is transcribed identically to
how it is represented in the source.  That means (1) If the source is
recoded with the other type of line break, the semantics of the Yang
code change; and (2) if the source uses line-breaks of one type (CRLF
or LF), only that type can be directly transcribed into string values.
(But regardless of the source line-breaks, an LF can be transcribed
into a double-quoted string with "\n".  But a CRLF cannot be
transcribed into a double-quoted string with escape sequences.  Was
that intended, or was "\r" intended to be legal?)

- section 6.1

   This section details the rules for recognizing tokens from an input
   stream.

Generally, language definitions intersperse the narrative text with
the relevant grammar definitions.  Yang's statement grammar is simple
enough that one doesn't need to see the context-free part of the
grammar to understand the narrative for statements.  But when reading
about tokenization, not having the grammar presented at the same time
is quite a burden.  I'd recommend duplicating the relevant productions
from section 14 into the subsections of section 6.

There is some sort of exposition problem.  The result of
"tokenization" is that the sequence of characters of the source is
converted into a sequence of "tokens".  Then some subset of the tokens
is discarded as being non-significant (e.g., whitespace and comments),
and the remainder is parsed with a context-free grammar.  Here I can't
figure out what the set of tokens is.  Looking at the grammar in
section 14, it seems to be a context-free grammar on characters.  But
that implies that there is no separate tokenization phase.

An example that shows the problems:

   mod:ext

Is this one token, which is also an extension keyword, or is it a
sequence of three tokens?

- section 6.1.1

   A block comment is enclosed within "/*" and "*/".

Should say "starts with "/*" and ends with the nearest following
"*/"".

- section 6.1.2

   A token in YANG is either a keyword, a string, a semicolon (";"), or
   braces ("{" or "}").

What is the intention of this sentence?  One interpretation is that it
defines "token" as the union of several other classes.  But that has
the limitation that those classes are not well-defined in this
section.  E.g., what is the syntax of an "unquoted string"?  The other
interpretation is that the Yang file is already separated into
"tokens", and that the set of tokens is further subdivided, with all
tokens that are not keywords, quoted strings, semicolons, or braces
being unquoted strings, whose value is the sequence of characters
within the token.  But the separation into tokens is not described.

"string" is particularly confusing.  In most languages, "string"
values are always quoted in source code.  But it seems that in Yang,
any "token" that isn't a keyword or one of ;, {, }, is automatically
an "unquoted string".  If that's so, it needs to be stated explicitly,
since it's unusual.

- section 6.1.3

   If a string contains any space, tab, or newline characters, a single
   or double quote character, a semicolon (";"), braces ("{" or "}"), or
   comment sequences ("//", "/*", or "*/"), then it MUST be enclosed
   within double or single quotes.

Is a string containing "*/" required to be quoted?  I ask because the
character sequence "*/" is not ambiguous if it is not preceded by "/*"
-- it must be an unquoted string.

   If a double-quoted string contains a line break followed by space or
   tab characters that are used to indent the text according to the
   layout in the YANG file, this leading whitespace is stripped from the
   string, up to and including the column of the double quote character,
   or to the first non-whitespace character, whichever occurs first.  In
   this process, a tab character is treated as 8 space characters.

This description isn't quite careful enough.  Better:

   If a double-quoted string contains a line break followed by space or
   tab characters, an initial part of this whitespace is removed from the
   string.  The amount removed is the longest prefix whose width is no
   larger than the width of the prefix of Yang source line containing
   the opening double quote character of the string to and including the
   opening double quote character.  For this purpose, the width of a
   tab character is 8 and the width of any other character is 1.

This does assume that all tabs are considered to have width 8, that
is, tabs do not have the usual semantics of "advance to the next
column that is divisible by 8".  That will sometimes cause unexpected
results, e.g., if some source lines start with SPC TAB.  (Consider
that whitespace before a line break is removed, which suggests the
intention is that the value of the string should depend only on its
visual appearance.)

Also, we're using the convention that "whitespace" does NOT include CR
or LF, which is not always how the term is used.  Perhaps a definition
of "whitespace" should be put in section 3.

There is also the special case:

   SPC " LF
   TAB x "

Is the initial TAB of the second line to be removed or not?  There is
no whitespace removal in the second line that will exactly reach the
opening double quote.  As I've written it, the TAB is not removed.

   Within a double-quoted string (enclosed within " "), a backslash
   character introduces a special character, which depends on the
   character that immediately follows the backslash:

s/introduces a special character/introduces a representation of a
special character/

Verify that CR cannot be specified by an escape.

   If a quoted string is followed by a plus character ("+"), followed by
   another quoted string, the two strings are concatenated into one
   string, allowing multiple concatenations to build one string.

Are whitespace, line endings, or comments allowed between the quoted
strings and the "+"?  (This description is in the lexical section, so
non-significant lexemes can't be assumed to have been removed.)  (See
also comments on section 14.)

   If a quoted string is followed by a plus character ("+"), followed by
   another quoted string, the two strings are concatenated into one
   string, allowing multiple concatenations to build one string.
   Whitespace trimming is done before substitution of backslash-escaped
   characters in double-quoted strings.  Concatenation is performed as
   the last step.

I think you want to reverse the phrase order in the second sentence:
"In double-quoted strings, whitespace trimming is done before
substitution of backslash-escaped characters."  However, it's not
clear that that matters; only a backslash followed by whitespace would
be affected by whitespace trimming, and whitespace trimming would
leave the backslash followed by CR or LF -- all of those cases are
invalid.

Also, if you keep that second sentence, it should be moved earlier --
the first and third sentences of the paragraph apply to all quoted
strings, but the second sentence only applies to double-quoted
strings, and should be in a paragraph that deals with only
double-quoted strings.

- section 6.1.3.1

   The following examples show some illegal strings:

     ''''  - a single-quoted string cannot contain single quotes
     """   - a double quote must be escaped in a double-quoted string

These shouldn't be described as "illegal" strings; they aren't
strings, truly, but the lexing process doesn't isolate them as tokens,
after which they are decreed to be improper.  The first example is two
single-quoted strings (''), one after the other.  The second is a
double-quoted string (""), followed by the double-quote that starts
another double-quoted string.  Better to say "The following examples
are not string tokens:".

- section 6.2

   Identifiers can be specified as quoted or unquoted strings.

If there is no lexical difference between how identifiers may be
represented and how strings may be represented, and what is an
identifier is distinguished only by the syntactic context, it would be
helpful if that was stated explicitly in section 6.1.  (Though the
characters allowed in identifiers are restricted.)

- section 6.3.1

   The processing of extensions depends on whether support for those
   extensions is claimed for a given YANG parser or the tool set in
   which it is embedded.  An unsupported extension, appearing in a YANG
   module as an unknown-statement (see Section 14) MAY be ignored in its
   entirety.  Any supported extension MUST be processed in accordance
   with the specification governing that extension.

This allows a loophole, where an implementation can not claim support,
but then not ignore the extension.  I think you want to say "An
unsupported extension ... MUST be ignored in its entirety."  (See also
7.20.1, which implies that an unsupported *feature* must be ignored.)
If you really mean "MAY", what are the limits regarding what the
server can do?

- section 6.5

   In an absolute
   schema node identifier, the first identifier after the leading slash
   is any top-level schema node in the local module or in all imported
   modules.

I think "in all imported modules" should be "in an imported module".
(The prefix of the identifier will tell which module.)

- section 7.1

   The module
   name follows the rules for identifiers in Section 6.2.

Why not say "The module name is an identifier."?  Similarly for all
other occurrences of "follows the rules for identifiers".  (I don't
think we hang any semantics on the term "identifier", only syntax.)

- section 7.1.4

See discussion of "prefix" in comments for section 1.1.  I think all
that is needed is to update the second paragraph to:

   When used inside the "module" statement, the "prefix" statement
   defines the prefix suggested to be used when this module is imported.

   To improve readability of the NETCONF XML, a NETCONF client or server
   that generates XML or XPath that use prefixes SHOULD use the prefix
   defined by the module as the XMLNS prefix to associate with the
   module's namespace (which may be impossible if there is a conflict
   with another XMLNS prefix).

The important thing is that the discussion of using the prefix value
as an XMLNS prefix is split into a separate paragraph, so it's clear
that it is a different subject.

   The
   prefix string MAY be used to refer to definitions contained in the
   module, e.g., "if:ifName".

I think this should be qualified "MAY be used within the module to
refer to the definitions contained"; the following paragraph suggests
that a module that imports this module can specify a different prefix
for its references to use, and this prefix will not work in that
module.

- section 7.1.5

   Multiple
   "import" statements may be specified to import from different
   modules.

Is this a requirement that two "import" statements in the same module
MUST import different modules?  If not, the sentence should be shorted
to "Multiple "import" statements may be specified.", or omitted
entirely.  But if a module can't be imported twice, that should
probably be made clearer

   Multiple "import" statements may be specified, but they must import
   from different modules.

- section 7.1.6

   When a module includes a submodule, it incorporates the contents of
   the submodule into the node hierarchy of the module.

What does this mean?  The "include" statement is a top-level statement
within its module, so how is its contents part of the node hierarchy
-- what node is the parent of its nodes?  Perhaps the meaning is that
top-level nodes of the submodule are implicitly top-level nodes of the
module.

- section 7.2

   While the primary unit in YANG is a module, a YANG module can itself
   be constructed out of several submodules.

This doesn't really capture the essence.  What you want to say is

   While the primary unit in YANG is a module, a YANG module can
   include submodules, whose definitions are incorporated into it.

- section 7.5.1

   In the first style, the container has no meaning of its own, existing
   only to contain child nodes.

I think it would be good to add here "In particular, the presence of
the container element with no child elements is semantically
equivalent to the absence of the container element."

- section 7.5.4.3

       must "ifType != 'ethernet' or " +
            "(ifType = 'ethernet' and ifMTU = 1500)" {
	    ...
       }
       must "ifType != 'atm' or " +
            "(ifType = 'atm' and ifMTU <= 17966 and ifMTU >= 64)" {
	    ...

Are these a preferred style?  I ask because it would be shorter to say

       must "ifType != 'ethernet' or ifMTU = 1500" {

       must "ifType != 'atm' or " +
            "(ifMTU <= 17966 and ifMTU >= 64)" {

(Perhaps the ifMTU element is optional, but even then, "and" must
short-circuit for the given expressions to always be valid, so it
seems that "or" must short-circuit.)

- section 7.5.7

I suspect that child elements can be omitted if they're not mandatory,
but their order need not match the order in the schema.

- section 7.6.1

   Note that if the leaf or any of its ancestors has a "when" condition
   or "if-feature" expression that evaluates to "false", then the
   default value is not in use.

ISTM that this is not a "Note that" but rather a precondition of the
analysis of the preceding paragraphs, so it should be moved above
"Otherwise, the usage of the default value...".  (See also the similar
condition in 7.7.2.)

- section 7.6.4

   The default value MUST NOT be marked with an "if-feature" statement.

I think you want "The definition of the default value..." rather than
"The default value...".

But it's not clear what the whole set of situations is that should be
forbidden.  For instance, this should work:

     typedef xyz {
       type enumeration {
	 enum blue { if-feature blue; }
	 ...
       }
     }

     leaf color {
       if-feature blue;
       type xyz;
       default blue;
     }

Whereas this won't:

     typedef xyz {
       type enumeration {
	 enum blue { if-feature blue; }
	 ...
       }
     }

     leaf color {
       // No if-feature here.
       type xyz;
       default blue;
     }

Is this rule only meant to cover the situation where the leaf's type
is an "in-line" enum and the particular enum value has an if-feature?

- section 7.7.1

It might be worth mentioning that duplicated values are allowed and
are significant, since that has changed since Yang 1.  I.e., even in a
system ordered list, the number of list elements with any particular
value must be preserved.

- section 7.7.2

It appears to be assumed that the server's behavior is the same if (1)
a leaf-list node is specified with zero values, and (2) if the
leaf-list node is absent and has no default values.

   Otherwise, if the leaf-list's type has
   a default value, and the leaf-list does not have a "min-elements"
   statement with a value greater than or equal to one, then the leaf-
   list's default value is the type's default value.  In all other
   cases, the leaf-list does not have any default values.

I think it would help to s/the type's default value/one instance of
the type's default value/.

Is the final condition correct?  E.g., if the leaf-list's type has a
default value, and it has a min-elements with value 2, then the
leaf-list has no default values -- which is invalid.

- section 7.7.7.1

   The entries in the list are sorted according to an order determined
   by the system.  The "description" string for the list may suggest an
   order to the server implementor.  If not, an implementation is free
   to sort the entries in the most appropriate order.

This is mealy-mouthed about what is required and what is recommended.
Better would be to omit "If not, an implementation is free to sort the
entries in the most appropriate order."

- section 7.7.7.2

   The entries in the list are sorted according to an order defined by
   the user.  This order is controlled by using special XML attributes
   in the <edit-config> request.

Actually, the entries aren't "sorted", as there is no requirement that
any particular set of values always has a particular order; the user
can insert 1, 2, 3 at one time, and 3, 2, 1 at another.  The correct
meaning is splicing these two sentences:

   The user orders entries in the list by using special XML attributes
   in the <edit-config> request.

- section 7.7.8

   The XML elements
   representing leaf-list entries MAY be interleaved with other sibling
   elements, ...

I think it would be more accurate to say "interleaved with elements
for siblings of the leaf-list node".  (See also section 7.8.5.)

- section 7.8

   A list entry is uniquely identified by the values of the list's keys,
   if defined.

What does "if defined" apply to?  Is the point that if keys are not
defined, then list entries have no such uniqueness requirement?  If
the latter, then the text "Each entry ... is uniquely identified by
the values of its key leafs." in section 4.2.2.4 needs to be changed.

- section 7.8.2

   The "key" statement, which MUST be present if the list represents
   configuration, and MAY be present otherwise, takes as an argument a
   string that specifies a space-separated list of leaf identifiers of
   this list.

It would help to change to "list of one or more leaf identifiers"

- section 7.8.3

   The "unique" constraint specifies that the combined values of all the
   leaf instances specified in the argument string, including leafs with
   default values, MUST be unique within all list entry instances in
   which all referenced leafs exist.

Should that sentence end "... within all list entry instances in which
all referenced leafs exist or have default values"?

- section 7.9

   The "choice" statement defines a set of alternatives, only one of
   which may exist at any one time.

Better, "only one of which may be present in any one data tree".

   A choice node does not exist in the data tree.

This is a good phrase and should be applied to list nodes (in that
there is no node for the list, only for each list element).

   A choice consists of a number of branches, defined with the "case"
   substatement.

Better, "each defined with a "case" substatement".

   As a shorthand, the "case" statement can be omitted if the branch
   contains a single "anydata", "anyxml", "choice", "container", "leaf",
   "list", or "leaf-list" statement.  In this case, the case node still
   exists in the schema tree, and its identifier is the same as the
   identifier in the branch statement.

There is no "branch" statement...  How about:

   As a shorthand, the "case" statement can be replaced by just the
   statement for the child node if the branch contains a single
   "anydata", "anyxml", "choice", "container", "leaf", "list", or
   "leaf-list" statement.  In this case, the case node still exists in
   the schema tree, and its identifier is the same as the identifier
   of the child node.

- section 7.9.3

   The argument is the identifier of the "case" statement.

Better, "The argument is the identifier of the default "case"
statement."

- section 7.9.5.  XML Encoding Rules

   The choice and case nodes are not visible in XML.

Use this sort of statement for lists and leaf-lists, which have no
visible XML for the lists as a whole.

- section 7.10

   The "anydata" statement is used to represent an unknown set of nodes
   that can be modelled with YANG, except anyxml, ...

What is the practical consequence of this restriction?  It's not at
all clear to me which chunks of XML could be modelled with Yang and
which could not.

- section 7.12

   A grouping is like a "structure" or a "record" in conventional
   programming languages.

Add, "... though no grouping node exists in the XML."

- section 7.15

   The "action" statement is used to define an operation connected to a
   specific container or list data node.

Might it be better to say "list element data node"?  What isn't being
said explicitly is that an action that appears in a list statement is
an action on a single element of the list, rather than an action on
the list as a whole.  Similarly for notifications, section 7.16.

- section 7.15.2

   The last container or list contains an XML element that ...

Would "innermost" be better than "last"?

- section 7.17

   The "augment" statement allows a module or submodule to add to >a<
   schema tree defined in an external module, or >in< the current module and
   its submodules, and to add to the nodes from a grouping in a "uses"
   statement.

I think adding "in" where indicated makes the sentence clearer.

I've changed a "the" is changed to an "a" where indicated.  The
question is whether the module defines a schema tree or a schema
forest, a set of trees.  That depends on whether each top-level node
can be used independently as the root of a data tree or whether they
have to be used together.  See comments on section 5.1.2.

- section 7.18

It would help to insert a discussion of the global purpose and use of
identities.  In particular, what things can refer to them?  It looks
like only identityref leafs can do so, but I could easily be wrong.
Also, more discussion in the example (7.18.3) would be helpful.

   The "identity" statement is used to define a new globally unique,
   abstract, and untyped identity.  Its only purpose is to denote its
   name, semantics, and existence.

The two uses of "its" in the second sentence are ambiguous.  I think
you mean "The statement's only purpose is to denote the identity's
name, semantics, and existence."  Though I'm not sure that an identity
has any semantics beyond its existence and base identities.

- section 7.19

   The statement's argument is an identifier that is the new keyword for
   the extension and must be followed by a block of substatements that
   holds detailed extension information.

s/The statement's/The "extension" statement's/ -- this talks about the
"extension" statement that defines the extension statement [sic!].

- section 7.20.2.1

   In this example, the container "target" is implemented if any of the
   features "outbound-tls" or "outbound-ssh" are supported by the
   server.

"either" is better than "any" if there are only two choices.

- section 7.21.4

   The "reference" statement takes as an argument a string ...

Perhaps s/a string/a human-readable string/.

- section 7.21.5

Note that if a data definition has both an "if-feature" and a "when",
then the "if-feature" is tested first.

   If the XPath expression references any node that also has associated
   "when" statements, these "when" expressions MUST be evaluated first.
   There MUST NOT be any circular dependencies in these "when"
   expressions.

I think this could be better phrased:

   If the XPath expression references any node that also has
   associated "when" statements, then the "when" expressions of the
   referenced nodes MUST be evaluated first.  There MUST NOT be any
   circular dependencies among "when" expressions.

- section 9.3.4

Can one apply a fraction-digits restriction to a derived type that
already has a fraction-digits restriction?  ISTM that this is sensible
if the fraction-digits numbers of the two restrictions are the same.
Then again, that would be pointless, and we might want to forbid
fraction-digits re-restrictions.

The fraction-digits statement doesn't seem to have any of the
substatements (e.g., description, error-app-tag) that other
restrictions (e.g., length, pattern, range) do.  (The grammar in
section 14. agrees.)  Is this OK?  It seems like a value could fail to
conform to fraction-digits.

- section 9.6.3

   An enumeration can be restricted with the "enum" (Section 9.6.4)
   statement.

This should be fleshed out:  "... can be restricted with one or more
"enum" (Section 9.6.4) statements, which enumerate a subset of the
values of the base type."

- section 9.7.2

   The lexical representation of the bits type is a space-separated list
   of the individual bit values that are set.

I think that should be "... list of the names of the bits that are
set."  I tend to think of a "bit value" as 0 or 1.

- section 9.8.3

   The canonical form of a binary value follows the rules in [RFC4648].

Best specify "the rules of 'Base 64 Encoding' in [RFC4648]."

- section 9.9

   The leafref type is used to declare a constraint on the value space
   of a leaf, based on a reference to a set of leaf instances in the
   data tree.  The "path" substatement (Section 9.9.2) selects a set of
   leaf instances, and the leafref value space is the set of values of
   these leaf instances.

The first sentence isn't quite right.  Perhaps:

   The value space of a leaf with leafref type is one of the values of
   a set of leaf instances in the data tree.  The "path" substatement
   (Section 9.9.2) selects the set of leaf instances, and the leafref
   value space is the set of values of these leaf instances.

- section 9.9.3

   If "require-instance" is "true", it means that the instance being
   referred MUST exist for the data to be valid.  This constraint is
   enforced according to the rules in Section 8.

   If "require-instance" is "false", it means that the instance being
   referred MAY exist in valid data.

What does "the instance being referred" mean?  Perhaps you meant "the
instance being referred to"?

Also, I think the second paragraph means that if require-instance is
false, the referred-to instance need not exist.  But it's not clear to
me what that would mean -- if there are no instances referenced by the
XPath expression, then the set of value values of the leafref would be
empty and the leafref would necessarily be invalid. ...  I think these
paragraphs need to be expanded in some way.

(require-instance is also used in instance-identifier, which may have
subtly different semantics than when it is used in leafref.)

- section 9.9.4.  Lexical Representation

   A leafref value is lexically represented the same way as the leaf it
   references.

Probably better as "... the same way as the leaf it references
represents its value."

- section 9.12

   It is used to repeatedly specify each member type of the
   union.

I think you want "It is used repeatedly to specify ..." --
"repeatedly" modifies "used", not "specify".

   In the XML encoding, a value representing a union data type is
   validated consecutively against each member type, in the order they
   are specified in the "type" statement, until a match is found.  The
   type that matched will be the type of the value for the node that was
   validated.

I think a distinction needs to be made here between generating XML and
interpreting XML:

   When generating an XML encoding, a value is encoded according to
   the rules of the member type to which the value belongs.  When
   interpreting an XML encoding, a value is validated consecutively
   against each member type, in the order they are specified in the
   "type" statement, until a match is found.  The type that matched
   will be the type of the value for the node that was validated, and
   the encoding is interpreted according to the rules for that type.

- section 10.3.1

   If the first argument node is of type leafref, the function returns a
   node set that contains the nodes that the leafref refers to.

I *think* that this means the set of nodes that the leafref's XPath
selects, but a plausible alternative is the subset of those nodes
which have the same value as the leafref does.  This hinges on the
meaning of "the leafref refers to a node", which isn't defined
unambiguously in section 9.9.

- section 11

   o  A "description" statement may be added or clarified without
      changing the semantics of the definition.

Probably better to change "clarified" to "changed".  (What is the
specificational content of "clarified"?)

   o  A "type" statement may be replaced with another "type" statement
      that does not change the syntax or semantics of the type.  For
      example, an inline type definition may be replaced with a typedef,
      but an int8 type cannot be replaced by an int16, since the syntax
      would change.

Is the last sentence correct?  Any value that was previously valid
would remain valid.  This is similar to changing "range '-128..127'"
to "range '-32768..32767'".

- section 13

   The translated module is called a YIN module.  This
   section describes symmetric mapping rules between the two formats.

I don't think "symmetric" is the right word; perhaps "bidirectional".

- section 14

   This grammar assumes that the scanner replaces YANG comments with a
   single space character.

It would probably be better to explicitly include comments in the
separator nonterminals:

   ;; 'comment' is specified in Section 6.1.1

   sep                 = 1*(WSP / line-break / comment)
                         ; unconditional separator

   optsep              = *(WSP / line-break / comment)

   stmtsep             = *(WSP / line-break / comment / unknown-statement)

--

   module-stmt         = optsep module-keyword sep identifier-arg-str
   		       	 ...

This has the consequence that 'module"name"{...' is invalid, because
there is no separator after 'module', even though traditional
tokenizers would handle it.  Verify that this is intended.  (This
pattern is consistent for all statement types.)

   linkage-stmts       = ;; these stmts can appear in any order
                         *import-stmt
                         *include-stmt

This could easily be handled like body-stmts:

   linkage-stmts       = *(import-stmt /
                           include-stmt)

--

   if-feature-expr     = "(" if-feature-expr ")" /
                         if-feature-expr sep boolean-operator sep
                           if-feature-expr /
                         not-keyword sep if-feature-expr /
                         identifier-ref-arg

This should probably be marked ";; precedence rules specified in
Section 7.20.2", as the grammar is ambiguous and does not specify the
precedence, whereas typical programming language grammars encode the
precedence rules.

   instance-identifier-specification =
                         [require-instance-stmt]

Why is require-instance-stmt in [...]?  The only use of
instance-identifier-specification is in type-body-stmts, and the only
use of type-body-stmts is in type-stmt, where it is in [...].  Compare
with numerical-restrictions:

   numerical-restrictions = range-stmt

--

   binary-specification = [length-stmt]

Similarly.

   quoted-string       = (DQUOTE string DQUOTE) / (SQUOTE string SQUOTE)

   string              = < an unquoted string as returned by >
                         < the scanner, that matches the rule >
                         < yang-string >

The handling of "string" is hard to follow and/or inexact.  First, we
need a term for the "value" of a string written in Yang source, so we
can say

   prefix-arg-str      = < a string whose ??? matches the rule >
                         < prefix-arg >

The text that is now used is not too difficult to understand, but it's
inexact and makes it hard to write specifications about how the value
is determined.

(Also, there seem to be too many <...>, dividing the narrative text
into two parts.  The natural way to write it would be:

   prefix-arg-str      = < a string whose ??? matches the rule
                           prefix-arg >

But it might not be worth the effort to change the use of <...>
throughout section 14.)

We need a production that tells exactly the syntax of "an unquoted
string as return by the scanner".

We also need productions for quoted strings.  I can reconstruct these:

   quoted-string       = (single-quoted-string / double-quoted-string)
   		         *(optsep "+" optsep
                             (single-quoted-string / double-quoted-string))

   single-quoted-string = SQUOTE *(yang-char excluding "'") SQUOTE

(I don't know a good way to exclude one character from a character set
in ABNF.)

   double-quoted-string = DQUOTE *dq-char DQUOTE

   dq-char = yang-char excluding BACKSLASH and DQUOTE /
             BACKSLASH ( "n" / "t" / DQUOTE / BACKSLASH )

Verify that "optsep" is what is allowed to appear around "+".

Needs a comment pointing to Section 6.1.3 as telling how to interpret
quoted-strings.

[END]