[Lager] Fwd: AD review of draft-ietf-lager-specification-10

All,

here are my notes describing the changes made in response to the AD review.

They do not cover all points raised by Barry. (I've left alone most of
the ones that involved sections that Kim wrote, except for some minor
editorial fixes.)

The result is uploaded to the GitHub site for Kim to review and add his 
own changes.

A./

-------- Forwarded Message --------
Subject: 	[Lager] AD review of draft-ietf-lager-specification-10
Date: 	Sun, 13 Mar 2016 14:57:15 +0000
From: 	Barry Leiba <barryleiba@computer.org>
To: 	draft-ietf-lager-specification@ietf.org
CC: 	lager@ietf.org

Hi, authors, shepherd, and working group.
Here's my AD review of draft-ietf-lager-specification-10.

I have a lot of comments, but most are just editorial things that I'd
like you to consider, but that won't delay last call.  I'm putting the
few substantive questions up here, and I'd like them answered before
we do last call.  Any other comments you can address before last call,
or in parallel with it.

------------------------

These are the questions/comments that I'd like addressed now:

-- Section 5.3.5 --

    Two contexts may be complementary, as in the following example, which
    shows ARABIC LETTER TEH MARBUTA (U+0629) as a variant of ARABIC
    LETTER ALEF MAKSURA (U+0649), but with two different types.

        <char cp="0647" >
          <var cp="0629" not-when="arabic-final" type="blocked" />
          <var cp="0629" when="arabic-final" type="allocatable" />
        </char>

There's an error here: the text says "ALEF MAKSURA (U+0649)", but the
example says "<char cp="0647" >"

==> changed 0649 to 0647
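
For readers who want to see how the two complementary contexts in the quoted example play out, here is a minimal sketch in Python. It assumes the rule engine has already decided which named rules match at the current position; the function name and the `matched_rules` input are hypothetical and not part of the specification.

```python
import xml.etree.ElementTree as ET

# The <char> element quoted from Section 5.3.5 above.
CHAR_XML = """
<char cp="0647">
  <var cp="0629" not-when="arabic-final" type="blocked" />
  <var cp="0629" when="arabic-final" type="allocatable" />
</char>
"""

def active_variant_types(char_xml, matched_rules):
    """Return the types of the variants whose context condition holds.

    matched_rules is a set of rule names assumed to match at the
    current position (hypothetical input; a real implementation
    would evaluate the rules against the label).
    """
    types = []
    for var in ET.fromstring(char_xml).findall("var"):
        when, not_when = var.get("when"), var.get("not-when")
        if when is not None and when not in matched_rules:
            continue  # required context is absent
        if not_when is not None and not_when in matched_rules:
            continue  # prohibited context is present
        types.append(var.get("type"))
    return types

in_final = active_variant_types(CHAR_XML, {"arabic-final"})
elsewhere = active_variant_types(CHAR_XML, set())
print(in_final, elsewhere)
```

Because the two conditions are complementary, exactly one of the two variants is active in any given context.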

-- Section 6.2.5 --
Am I the only one who doesn't understand the distinction between
"difference" and "symmetric difference"?  I had assumed that
"difference" contained all those in one or the other, but not both...
but that seems to be what "symmetric difference" is (you say it's
xor).  Please explain (to me and in the document).

==> added (subtraction) after "difference"
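
As a quick illustration of the distinction (a sketch using Python's built-in sets rather than LGR syntax; the two classes are made-up sample data):

```python
# Two made-up classes of code points (hypothetical sample data).
a = {0x0061, 0x0062, 0x0063}
b = {0x0062, 0x0063, 0x0064}

# Difference (subtraction): members of a that are not in b.
difference = a - b

# Symmetric difference (xor): members of exactly one of the two classes.
symmetric_difference = a ^ b

print(sorted(f"{cp:04X}" for cp in difference))
print(sorted(f"{cp:04X}" for cp in symmetric_difference))
```

The difference depends on the order of the operands; the symmetric difference does not.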

-- Section 6.3.9 --
I'm confused here, so maybe you can help me understand:
It seems to me that in this example, the ranges are defined in terms
of the "mixed-digits" rule, and that the rule is defined in terms of
the ranges.  How does such a self-referential thing work?

==> Moved 6.3.9 to just before Section 6.4.1. Touched up the surrounding
     text to make it flow and added further clarification.
     Also distinguished "context rule" (anything invoked via "when")
     from "parametrized context rule" (anything using "anchor").

     Affected: Sections 6.4, new 6.4.1 (old 6.3.9), new 6.4.2 (old 6.4.1)

-- Section 8.5 --

    Because of symmetry and transitivity, all variant mappings form
    disjoint sets in which each of them is a variant of all other
    members.

I can't sort this one out.  "Each of them" appears to refer to each
set, and I don't know how a set can be a variant of a member.  But if
I instead assume that "each of them" is meant to refer to "all variant
mappings", then I don't understand what the sets have to do with
anything.  I'm also not following how the disjoint sets are formed in
the first place.  Can you explain/clarify?

==> "Because of symmetry and transitivity, all variant mappings form
      disjoint sets. In each of these sets, the source and target of
      each mapping are also variants of the sources and targets of
      all the other mappings."

(and some rewording of the remainder)
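
One way to picture how the disjoint sets arise is the following sketch. It assumes every mapping is treated as symmetric and transitive and uses a simple union-find; the function name and the sample mappings are hypothetical and are not taken from the draft.

```python
def variant_sets(mappings):
    """Group code points into disjoint sets, treating each
    (source, target) mapping as symmetric and transitive."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the sets of the source and the target of every mapping.
    for source, target in mappings:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_s] = root_t

    # Collect the members of each set under its representative.
    groups = {}
    for cp in parent:
        groups.setdefault(find(cp), set()).add(cp)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical mappings: A<->B and B<->C land in one set, X<->Y in another.
sets = variant_sets([("A", "B"), ("B", "C"), ("X", "Y")])
print(sets)
```

Within each resulting set, every member is a variant of every other member, even where no mapping between the two was listed directly (A and C in the sample).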

------------------------

These are the editorial things that can wait:

There are a number of number-agreement errors in the document (for
example, "A conditional context rules is a specialized form of WLE").
The RFC Editor will fix these, so I didn't call them out, but you
should be alert to that and fix them when you run into them.

==> fixed the one mentioned

Also throughout the document, you RECOMMEND listing things in
increasing order.  Remembering that RECOMMEND is synonymous with
SHOULD, which means that implementors need to understand the
consequences of choosing not to abide by what you say, it's usually
best to explain.  In this case, I presume it's to minimize errors such
as duplication that can occur when things aren't sorted.  I suggest
explaining that once, early in the document somewhere.

==> added that for the recommendation to list class elements in order

-- Section 1 --

    This memo describes a method of using Extensible Markup Language
    (XML) to describe the algorithm used to determine whether a given
    identifier label is permitted, and under which conditions, based on
    the code points it contains and their context.

Total nit: Using "describe" twice in the same sentence is awkward, and
"describes a method ... to describe an algorithm ... to determine
whether ..." also feels awkward.  I'd split the sentence, but you can
take this suggestion or leave it, as you see best (I also prefer
"document" to "memo"):

  REPLACEMENT PARAGRAPH
    This document specifies a method of using Extensible Markup Language
    (XML) to describe Label Generation Rulesets (LGRs).  LGRs are
    algorithms used to determine whether, and under what conditions, a
    given identifier label is permitted, based on the code points it
    contains and their context.  These algorithms comprise a list of
    permissible code points, variant code point mappings, and a set of
    rules that act on the code points and mappings.  LGRs form part of
    an administrator's policies.  In deploying internationalized domain
    names (IDNs), they have also been known as IDN tables or variant
    tables.
END

==> incorporated as is.

-- Section 2 --
I'm rather a stickler for parallel lists.  The format of this list is
set as "<noun> <has this attribute>", but item four is not parallel
("Provide the ability...").  Can you rephrase it so it is?

==> matched preceding "An LGR needs to .." Not super elegant, but parallel.

-- Section 4.2 --
You say that the "meta" and "rules" elements are optional, and we
assume that "data" is required because you don't say it's optional.
It's easy and clearer to say << This is followed by a required "data"
element ...>>

==> while we state that everything is required unless noted, in this
     case noting the required nature of <data> seems fine.

    A document MUST contain exactly one "lgr" element.  Each "lgr"
    element MUST contain zero or one "meta" element, contain exactly one
    "data" element, and contain zero or one "rules" element; and these
    three elements MUST be in that order.

Another total nit, with no response needed: In the second sentence, I
would remove the second and third "contain".

==> done

A minor question: Why are you making the order significant?  Why is
that important?

==> see discussion - no text changes so far.
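
For illustration only, the structural requirement quoted above could be checked along these lines. This is a sketch using Python's standard library; the function names are hypothetical, and namespaces are simply stripped rather than verified.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    # Drop a "{namespace}" prefix if the parser added one.
    return tag.rsplit("}", 1)[-1]

def check_lgr_structure(xml_text):
    """Return True if the root is "lgr" and its children are an
    optional "meta", exactly one "data", and an optional "rules",
    in that order."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "lgr":
        return False
    children = [local_name(child.tag) for child in root]
    allowed = (
        ["meta", "data", "rules"],
        ["meta", "data"],
        ["data", "rules"],
        ["data"],
    )
    return children in allowed

print(check_lgr_structure("<lgr><meta/><data/><rules/></lgr>"))
print(check_lgr_structure("<lgr><data/><meta/></lgr>"))
```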

-- Section 4.3 --

    The "meta" element is RECOMMENDED to express metadata associated
    within the LGR.

This can be read two ways: that the presence of "meta" is recommended,
or that it is recommended that "meta" express something.  I think
you're aiming for the former.  I don't think "within" works here, and
I think you mean "with".

==> "within" --> "part of" and associated changes. I also added the
     note: Metadata allow the LGR document to become fully self-documenting,
     for example if rendered in a human readable format by an appropriate tool.

    (I have written such a tool for LGR presentation and by picking up
     the metadata it can eliminate the need for secondary documents for
     all but the most complex LGRs)

Also, the next section title introduces an
element that wasn't previously mentioned, and I have to look at the
section nesting to realize that it's meant to be a sub-element of
"meta".  I suggest this, with a paragraph break after it:

NEW
    The "meta" element expresses metadata associated with the LGR,
    and the element SHOULD be included.  The following subsections
    describe elements that may appear within the "meta" element.
END

==> incorporated

I suggest that you put the element names in quotes in the sub-section titles.

==> not done - we played with "decorating" the section headers but abandoned that.

    My personal preference was to have all <elements> and "attributes"
    decorated as shown in this sentence.

-- Section 4.3.7 --
Are there global "Unicode properties", or are we always talking about
properties of a specific Unicode character?

==> The Unicode Standard may drop "character" because, in its
      context, this is understood; likewise, when it drops the
      qualifier "Unicode", that too is understood in its context.

You use "character
properties" and "Unicode properties" apparently interchangeably.
Please pick one throughout the document (I think "character
properties" is better) and use it consistently, lest people think
you're talking about two different things.

==> There are not that many instances so, for the avoidance of doubt,
     I resolved all of them to "Unicode character properties" except
     if the source (Unicode Standard) appears in the same sentence.

     We want to be very specific that any other source of character
     properties cannot be used with the "property" attribute.

There's a spurious "in" in the first sentence of the second paragraph.

==> fixed

-- Section 5 --

    Discrete permissible code points or code point sequences are declared
    with a "char" element, e.g.

You neither show nor explain how code point sequences are declared
with a "char" element.  I suggest a forward reference, so readers
don't wonder, as I did:

NEW
    Discrete permissible code points or code point sequences are declared
    with a "char" element.  See Section 5.1 for the description of
    sequences.  A single code point example:
END

Or maybe you like this better?:

NEW
    Discrete permissible code points or code point sequences (see Section
    5.1) are declared with a "char" element.  A single code point example:
END

==> good suggestion.

Also, you say that 'A "range" element has no child elements,' but you
say nothing about "char" elements.  Maybe add 'A "char" element may
have child elements; see the subsections below for further
information.' ?

    Code points must be expressed in uppercase, hexadecimal, and zero
    padded to a minimum of 4 digits.  In other words, they are
    represented according to the standard Unicode convention but without
    the prefix "U+".

I think you kind of bury the lede here, and I would reverse the order
of the description (and, as it's a protocol requirement, I'd use
"MUST"):

NEW
    Code points MUST be represented according to the standard
    Unicode convention but without the prefix "U+": they are
    expressed in uppercase hexadecimal, and are zero-padded
    to a minimum of 4 digits.
END

==>OK

==>Also added language why we recommend ordering the Char entries.
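
To make the code point convention concrete, here is a small sketch in Python. The helper names are made up for illustration; the pattern only checks the textual form plus the Unicode upper bound and does not reject excess zero padding.

```python
import re

# Uppercase hexadecimal, at least 4 digits, no "U+" prefix.
CP_PATTERN = re.compile(r"[0-9A-F]{4,}")

def format_cp(cp):
    """Render an integer code point in the convention described above."""
    return f"{cp:04X}"

def is_valid_cp(text):
    """Check the textual form and that the value is within Unicode range."""
    return CP_PATTERN.fullmatch(text) is not None and int(text, 16) <= 0x10FFFF

print(format_cp(0x61), format_cp(0x1F600))
print(is_valid_cp("0061"), is_valid_cp("U+0061"), is_valid_cp("00e9"))
```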

-- Section 5.1 --

    In addition, doing so allows the choice of
    either specifying a prohibited or a required context.

You need to factor out "specifying" (or repeat it):

NEW
    In addition, doing so allows the choice of
    specifying either a prohibited or a required context.
END

==> OK

The last paragraph doesn't belong in this section (it has nothing to
do with sequences).  Please move it up to the end of Section 5.

==>moved.

-- Section 5.2 --

May I suggest this?:

OLD
    The content condition is met when
    the rule specified in the "when" attribute is matched.
    Alternatively, a "not-when" attribute may be used for a rule that
    must not be matched.
NEW
    The content condition is met when
    the rule specified in the "when" attribute is matched or when
    the rule specified in the "not-when" attribute fails to match.
END

    In the following example no digit from either range must
    occur in a label that mixes digits from both ranges

"no digit must occur" is hard to parse; please make it "may occur", or
"is allowed" (and I'd put a comma after "example").

    If a contextual condition is not satisfied for any code point in a
    label, the label is invalid, see Section 7.5.

This is ambiguous: it can be read to mean that all code points have to
fail in order to make the label invalid (if there is no code point in
the label for which the contextual condition is satisfied).  Let's
rephrase to eliminate the ambiguity:

NEW
    If a label contains one or more code points that fail to satisfy
    a contextual condition, the label is invalid (see Section 7.5).
END

==>OK

-- Section 5.3.2 --

    Making this relation explicit allows a generalization of the "type"
    attribute from directly reflecting dispositions to a more
    differentiated intermediate value that used in the resolution of
    label disposition.

Is there a missing or extra word around "value that used"?

==> yep. Found a few spare ones and put them in.

-- Section 5.3.5 --

    While only a single "when" or "not-when" attribute MUST be applied to
    any "var" element, multiple "var" elements using the same mapping,
    but different "when" or "not-when" attributes MAY be specified.  The
    combination of mapping and conditional context defines a unique
    variant.

I don't think you mean the MUST the way it seems here.  You're not
saying that every "var" MUST have a "when" or a "not-when".  Rather,
you've phrased a MUST NOT as a MUST.  Let's turn it back into MUST
NOT:

NEW
    While a "var" element MUST NOT contain multiple conditions (it
    is only allowed a single "when" or "not-when" attribute), multiple
    "var" elements using the same mapping MAY be specified with
    different "when" or "not-when" attributes.  The combination of
    mapping and conditional context defines a unique variant.
END

==>OK
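
The reworded requirement lends itself to a mechanical check. The following is a sketch only, with a hypothetical function name; it treats the tuple of target code point plus context as the identity of a variant, as the last sentence of the new text suggests.

```python
import xml.etree.ElementTree as ET

def check_vars(char_xml):
    """Return a list of problems found among the "var" children
    of a single "char" element."""
    problems = []
    seen = set()
    for var in ET.fromstring(char_xml).findall("var"):
        cp = var.get("cp")
        when, not_when = var.get("when"), var.get("not-when")
        # A "var" is only allowed one of "when" or "not-when".
        if when is not None and not_when is not None:
            problems.append(f"var {cp}: both when and not-when")
        # Mapping plus conditional context must be unique.
        key = (cp, when, not_when)
        if key in seen:
            problems.append(f"var {cp}: duplicate mapping and context")
        seen.add(key)
    return problems

ok = check_vars(
    '<char cp="0647">'
    '<var cp="0629" not-when="arabic-final" type="blocked" />'
    '<var cp="0629" when="arabic-final" type="allocatable" />'
    '</char>'
)
bad = check_vars(
    '<char cp="0647">'
    '<var cp="0629" when="a" not-when="b" type="blocked" />'
    '</char>'
)
print(ok, bad)
```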

-- Section 5.5 --

    Formally, it MUST correspond to
    the XML 1.0 Nmtoken (Name token) production.

Because conforming to Nmtoken is a requirement, it would be good to
have a citation to where in the XML spec Nmtoken is defined.

==> not done - could use help locating
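
For what it is worth, an Nmtoken check can be sketched as below. The character ranges are transcribed from my reading of the NameStartChar and NameChar productions in XML 1.0 (Fifth Edition) and should be verified against the specification before being relied upon; the function name is hypothetical.

```python
import re

# NameStartChar ranges (assumed transcription; verify against XML 1.0).
NAME_START = (
    r":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F"
    r"\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
    r"\U00010000-\U000EFFFF"
)
# NameChar adds hyphen, full stop, digits, and a few combining ranges.
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Nmtoken is one or more NameChar; unlike Name, it may start with a digit.
NMTOKEN = re.compile(f"[{NAME_CHAR}]+")

def is_nmtoken(text):
    return NMTOKEN.fullmatch(text) is not None

print(is_nmtoken("arabic-final"), is_nmtoken("mixed digits"))
```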

-- Section 6.2.5 --
I don't understand why "intersection" is limited to two classes: the
intersection of more than two sets is as well defined as the union is.
Please explain (at least to me).

==> we started out allowing only binary (or unary) operators. Unions
     from more than two sets proved a rather common case, so we dropped that
     restriction. However, internally in my tool, for example, I simply record
     that as a fully parenthesized expression involving pairs of classes.

     In over a year of using this schema for writing 40 or so LGRs, I've
     made only very limited use of any other operator. So the use case
     seems to not be there. (And I don't think I've used intersection at all)

     What I observe is that the situation where you define "consonants"
     and "vowels" is more common than defining "letters" and then
     subtracting the consonants to get the vowels.

     But, having intersection, even if only pairwise, means that if some
     language needs it, any type of set operation can be composed, so
     if that language is notationally a bit more complex, it's not
     crippling.
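
To illustrate the composition argument: an intersection of more than two classes can be built from pairwise intersections, much like the fully parenthesized form mentioned above. This is a sketch with Python sets and made-up sample classes, not LGR syntax.

```python
from functools import reduce

def intersect_all(*classes):
    """Fold a pairwise intersection over any number of classes,
    i.e. ((a & b) & c) & ...  Requires at least one class."""
    return reduce(lambda left, right: left & right, classes)

# Made-up sample classes (hypothetical data).
letters = {"a", "b", "c", "e", "i"}
vowels = {"a", "e", "i", "o"}
front = {"e", "i", "y"}

common = intersect_all(letters, vowels, front)

# The less common route mentioned above: subtract to derive a class.
consonants = letters - vowels

print(sorted(common), sorted(consonants))
```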

-- Section 6.3 --
Pet peeve (sorry): Correct use of "comprises" is as a synonym of "is
composed of"... so "is comprised of" is, in correct usage, wrong and
nonsensical.  I know this is a lost battle, but I'd feel oh, so much
better if you would change "is comprised of" in Sections 6.3 and
Appendix A to "comprises", so "Each rule comprises a series of
matching operators..."

==> hope you feel better now

-- Section 7 and subsections --
This was a real slog to get through, and I'm quite sure I didn't
follow a lot of it.  I have to take your word that people who are
actually doing this stuff can follow it and get it right, including
all the variant stuff, including the reflexive variants and the
any-variant vs only-variants vs everything else.  I just can't imagine
needing to do anything complicated with this and actually getting it
right.  (This is just a comment; there's nothing actionable here.)

-- Section 9 --
Nit: in the first sentence, it's "cater to", not "cater for".

==> there seem to be regional differences in prepositions, so I changed the verb

The second paragraph is confusing because of the "these" and "those"
and "this", and even the first paragraph suffers from a "these".  The
problem is this: does "these formats" refer to the ones in the cited
RFCs or the ones in this document?  See what I mean?

How about this (and did I get this right?):

NEW
    Both [RFC3743] and [RFC4290] provide different grammars for IDN
    tables.  The formats in those documents are unable to fully cater
    to the increased requirements of contemporary IDN variant policies.

    This specification provides a superset of the functionality
    provided by the older IDN table formats, so any table expressed
    in those formats can be expressed in this new format.  Automated
    conversion can be conducted between tables conformant with the
    grammar specified in each document.
END

==> OK

-- Section 10 --
Transmission is not transmitted.  The LGR is.

OLD
    Transmission of a well-formed LGR in accordance with this
    specification SHOULD be transmitted with a media type of
    "application/lgr+xml".
NEW
    Well-formed LGRs that comply with this specification SHOULD be
    transmitted with a media type of "application/lgr+xml".
END

==> OK

-- Appendix A --

    In practice, any LGR that includes the hyphen might also contain
    rules invalidating any labels beginning, ending, and containing a
    hyphen in the third and fourth positions as required by [RFC5891].

Mm, then why didn't you put those in the example?  It seems to me that
getting the rules right is the hard part, and that using real-world
examples when you can would be helpful.

==> because it's supremely ugly :), but I've put it in.
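
Outside the LGR format itself, the hyphen restrictions are easy to state procedurally. The sketch below reflects my reading of the RFC 5891 restriction quoted above (no leading or trailing hyphen, and no "--" in the third and fourth positions); the function name is hypothetical.

```python
def violates_hyphen_rules(label):
    """Return True if the label begins or ends with a hyphen, or has
    hyphens in both the third and fourth positions."""
    return (
        label.startswith("-")
        or label.endswith("-")
        or label[2:4] == "--"
    )

for sample in ("-abc", "abc-", "ab--cd", "a-b-c"):
    print(sample, violates_hyphen_rules(sample))
```

Expressing the same three conditions as whole-label rules in XML is what makes the LGR version comparatively verbose.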

  --
Barry, ART Director
