[Lager] AD review of draft-ietf-lager-specification-10

Barry Leiba <barryleiba@computer.org> Sun, 13 March 2016 14:57 UTC

MIME-Version: 1.0
Sender: barryleiba@gmail.com
Date: Sun, 13 Mar 2016 14:57:15 +0000
Message-ID: <CALaySJJP0deDOxCs8YSPr72pfyRUsbZBVE9XO=_4d2AvEhVEtQ@mail.gmail.com>
From: Barry Leiba <barryleiba@computer.org>
To: draft-ietf-lager-specification@ietf.org
Content-Type: text/plain; charset="UTF-8"
Archived-At: <http://mailarchive.ietf.org/arch/msg/lager/whczAPI2hNOnKOWD5kScczlYNPE>
Cc: lager@ietf.org

Hi, authors, shepherd, and working group.
Here's my AD review of draft-ietf-lager-specification-10.

I have a lot of comments, but most are just editorial things that I'd
like you to consider, but that won't delay last call.  I'm putting the
few substantive questions up here, and I'd like them answered before
we do last call.  Any other comments you can address before last call,
or in parallel with it.

------------------------

These are the questions/comments that I'd like addressed now:

-- Section 5.3.5 --

   Two contexts may be complementary, as in the following example, which
   shows ARABIC LETTER TEH MARBUTA (U+0629) as a variant of ARABIC
   LETTER ALEF MAKSURA (U+0649), but with two different types.

       <char cp="0647" >
         <var cp="0629" not-when="arabic-final" type="blocked" />
         <var cp="0629" when="arabic-final" type="allocatable" />
       </char>

There's an error here: the text says "ALEF MAKSURA (U+0649)", but the
example says "<char cp="0647" >".
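
For reference, here is a minimal sketch that looks up the Unicode names
of the three code points involved, using Python's standard unicodedata
module.  The helper name cp_name is only illustrative.  It shows that
"0647" matches neither name used in the prose:

```python
import unicodedata


def cp_name(cp_hex):
    """Return the Unicode character name for a hex code point string."""
    return unicodedata.name(chr(int(cp_hex, 16)))


# Code points as they appear in the draft's text and XML example.
for cp in ("0647", "0649", "0629"):
    print(f"U+{cp} {cp_name(cp)}")
```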

-- Section 6.2.5 --
Am I the only one who doesn't understand the distinction between
"difference" and "symmetric difference"?  I had assumed that
"difference" contained all those in one or the other, but not both...
but that seems to be what "symmetric difference" is (you say it's
xor).  Please explain (to me and in the document).
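
To make the question concrete, here is a small sketch using Python
sets.  It assumes the draft intends the usual set-theoretic meanings,
which is exactly what the document should state; the two sets are
made-up values, not taken from the draft:

```python
# Two illustrative sets of code points (hypothetical values).
class_a = {0x0061, 0x0062, 0x0063}
class_b = {0x0062, 0x0063, 0x0064}

# Difference: members of the first set that are not in the second.
difference = class_a - class_b

# Symmetric difference (xor): members of exactly one of the two sets.
symmetric_difference = class_a ^ class_b

print(sorted(f"{cp:04X}" for cp in difference))
print(sorted(f"{cp:04X}" for cp in symmetric_difference))
```

Under that reading, "difference" is one-sided and depends on operand
order, while "symmetric difference" does not.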

-- Section 6.3.9 --
I'm confused here, so maybe you can help me understand:
It seems to me that in this example, the ranges are defined in terms
of the "mixed-digits" rule, and that the rule is defined in terms of
the ranges.  How does such a self-referential thing work?

-- Section 8.5 --

   Because of symmetry and transitivity, all variant mappings form
   disjoint sets in which each of them is a variant of all other
   members.

I can't sort this one out.  "Each of them" appears to refer to each
set, and I don't know how a set can be a variant of a member.  But if
I instead assume that "each of them" is meant to refer to "all variant
mappings", then I don't understand what the sets have to do with
anything.  I'm also not following how the disjoint sets are formed in
the first place.  Can you explain/clarify?
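
My best guess at the intended meaning is that the code points (not the
mappings) fall into disjoint sets once the mappings are symmetric and
transitive.  The sketch below illustrates that guess only; the function
name and the sample pairs are hypothetical, and the union-find approach
is my choice, not something the draft specifies:

```python
def variant_sets(mappings):
    """Group code points into disjoint sets from (source, target) pairs.

    Each pair is treated as symmetric, and grouping is transitive.
    """
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, tgt in mappings:
        root_s, root_t = find(src), find(tgt)
        if root_s != root_t:
            parent[root_s] = root_t  # merge the two sets

    groups = {}
    for cp in list(parent):
        groups.setdefault(find(cp), set()).add(cp)
    return sorted(sorted(group) for group in groups.values())


print(variant_sets([("A", "B"), ("B", "C"), ("X", "Y")]))
```

If that is what is meant, the sentence should say that each member of a
set is a variant of every other member of the same set.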

-- Section 11.1 --
I'm amazed at how many drafts cite RFC 6838, but have not updated
their registration template to conform with it.  Please see RFC 6838,
Section 5.6, and make sure your template includes everything from
there.  Please also post the updated template and draft name to the
<media-types@ietf.org> list for review (this can be done in parallel
with last call, and doesn't have to delay things, but I'd like it to
be done).

-- Section 11.3 --
For Private Dispositions, what is the registration policy?  First Come
First Served?  You need to say.

-- Section 12 --

   If a system that is querying an identifier list (such as a domain
   zone) that uses the rules in this memo, and those rules are not
   implemented correctly, and that system is relying on the rules being
   applied, the system might fail if the rules are not applied in a
   predictable fashion.  This could cause security problems for the
   querying system.

First, I think you have an extra "that" after the parentheses.
Second, can you be more specific than "This could cause security
problems"?  What sort of problems?  How can they be mitigated?  This
needs to be something more than "bad implementations can result in bad
things."

------------------------

These are the editorial things that can wait:

There are a number of number-agreement errors in the document (for
example, "A conditional context rules is a specialized form of WLE").
The RFC Editor will fix these, so I didn't call them out, but you
should be alert to that and fix them when you run into them.

Also throughout the document, you RECOMMEND listing things in
increasing order.  Remembering that RECOMMEND is synonymous with
SHOULD, which means that implementors need to understand the
consequences of choosing not to abide by what you say, it's usually
best to explain.  In this case, I presume it's to minimize errors such
as duplication that can occur when things aren't sorted.  I suggest
explaining that once, early in the document somewhere.

-- Section 1 --

   This memo describes a method of using Extensible Markup Language
   (XML) to describe the algorithm used to determine whether a given
   identifier label is permitted, and under which conditions, based on
   the code points it contains and their context.

Total nit: Using "describe" twice in the same sentence is awkward, and
"describes a method ... to describe an algorithm ... to determine
whether ..." also feels awkward.  I'd split the sentence, but you can
take this suggestion or leave it, as you see best (I also prefer
"document" to "memo"):

REPLACEMENT PARAGRAPH
   This document specifies a method of using Extensible Markup Language
   (XML) to describe Label Generation Rulesets (LGRs).  LGRs are
   algorithms used to determine whether, and under what conditions, a
   given identifier label is permitted, based on the code points it
   contains and their context.  These algorithms comprise a list of
   permissible code points, variant code point mappings, and a set of
   rules that act on the code points and mappings.  LGRs form part of
   an administrator's policies.  In deploying internationalized domain
   names (IDNs), they have also been known as IDN tables or variant
   tables.
END

-- Section 2 --
I'm rather a stickler for parallel lists.  The format of this list is
set as "<noun> <has this attribute>", but item four is not parallel
("Provide the ability...").  Can you rephrase it so it is?

-- Section 4.2 --
You say that the "meta" and "rules" elements are optional, and we
assume that "data" is required because you don't say it's optional.
It's easy and clearer to say << This is followed by a required "data"
element ...>>

   A document MUST contain exactly one "lgr" element.  Each "lgr"
   element MUST contain zero or one "meta" element, contain exactly one
   "data" element, and contain zero or one "rules" element; and these
   three elements MUST be in that order.

Another total nit, with no response needed: In the second sentence, I
would remove the second and third "contain".

A minor question: Why are you making the order significant?  Why is
that important?
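
For what it's worth, the ordering requirement as quoted is easy to
check mechanically.  A rough sketch with Python's standard
xml.etree.ElementTree follows; the function name is hypothetical, and
namespace prefixes are simply stripped rather than validated:

```python
import re
import xml.etree.ElementTree as ET


def lgr_children_ok(xml_text):
    """Check for optional "meta", required "data", optional "rules", in order."""
    root = ET.fromstring(xml_text)
    if root.tag.split("}")[-1] != "lgr":
        return False
    # Local names of the direct children, in document order.
    names = [child.tag.split("}")[-1] for child in root]
    sequence = "".join(name + "," for name in names)
    return re.fullmatch(r"(meta,)?data,(rules,)?", sequence) is not None


print(lgr_children_ok("<lgr><meta/><data/><rules/></lgr>"))
print(lgr_children_ok("<lgr><rules/><data/></lgr>"))
```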

-- Section 4.3 --

   The "meta" element is RECOMMENDED to express metadata associated
   within the LGR.

This can be read two ways: that the presence of "meta" is recommended,
or that it is recommended that "meta" express something.  I think
you're aiming for the former.  I don't think "within" works here, and
I think you mean "with".  Also, the next section title introduces an
element that wasn't previously mentioned, and I have to look at the
section nesting to realize that it's meant to be a sub-element of
"meta".  I suggest this, with a paragraph break after it:

NEW
   The "meta" element expresses metadata associated with the LGR,
   and the element SHOULD be included.  The following subsections
   describe elements that may appear within the "meta" element.
END

I suggest that you put the element names in quotes in the sub-section titles.

-- Section 4.3.5 --

   The attribute SHOULD be a valid MIME
   type.

You correctly call it a "media type" in the first sentence of the
paragraph; we're trying not to use "MIME type" any more.  Please
change this to "The attribute SHOULD be a valid media type from the
IANA Media Types registry."  (You can optionally include a URL for
that: http://www.iana.org/assignments/media-types )

-- Section 4.3.6 --

   The "validity-start" and "validity-end" elements are OPTIONAL
   elements that describe the time period from which the contents of the
   LGR become valid (i.e. are used in registry policy), and the contents
   of the LGR cease to be used.

You're missing something here:

NEW
   The "validity-start" and "validity-end" elements are OPTIONAL
   elements that describe the time period from which the contents of the
   LGR become valid (are used in registry policy), and the time when
   the contents of the LGR cease to be used, respectively.
END

One thing I'm unsure of, and that you might want to clarify: When I
read "validity-end", I assume that it's the last valid date, and that
the LGR is valid on that date.  But the description says "the time
when the contents of the LGR cease to be used," which implies that
they are NOT valid on that date.  Which is it?
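
The two readings differ only on the end date itself.  A small sketch
makes that visible; the function, its end_inclusive flag, and the dates
are hypothetical and only illustrate the ambiguity:

```python
from datetime import date


def lgr_is_valid(day, start, end, end_inclusive):
    """Return True if the LGR is in force on the given day."""
    if end_inclusive:
        # Reading 1: "validity-end" is the last valid date.
        return start <= day <= end
    # Reading 2: the LGR ceases to be used on "validity-end".
    return start <= day < end


start, end = date(2016, 1, 1), date(2016, 12, 31)
print(lgr_is_valid(end, start, end, end_inclusive=True))
print(lgr_is_valid(end, start, end, end_inclusive=False))
```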

-- Section 4.3.7 --
Are there global "Unicode properties", or are we always talking about
properties of a specific Unicode character?  You use "character
properties" and "Unicode properties" apparently interchangeably.
Please pick one throughout the document (I think "character
properties" is better) and use it consistently, lest people think
you're talking about two different things.

There's a spurious "in" in the first sentence of the second paragraph.

-- Section 5 --

   Discrete permissible code points or code point sequences are declared
   with a "char" element, e.g.

You neither show nor explain how code point sequences are declared
with a "char" element.  I suggest a forward reference, so readers
don't wonder, as I did:

NEW
   Discrete permissible code points or code point sequences are declared
   with a "char" element.  See Section 5.1 for the description of
   sequences.  A single code point example:
END

Or maybe you like this better?:

NEW
   Discrete permissible code points or code point sequences (see Section
   5.1) are declared with a "char" element.  A single code point example:
END

Also, you say that 'A "range" element has no child elements,' but you
say nothing about "char" elements.  Maybe add 'A "char" element may
have child elements; see the subsections below for further
information.' ?

   Code points must be expressed in uppercase, hexadecimal, and zero
   padded to a minimum of 4 digits.  In other words, they are
   represented according to the standard Unicode convention but without
   the prefix "U+".

I think you kind of bury the lede here, and I would reverse the order
of the description (and, as it's a protocol requirement, I'd use
"MUST"):

NEW
   Code points MUST be represented according to the standard
   Unicode convention but without the prefix "U+": they are
   expressed in uppercase hexadecimal, and are zero-padded
   to a minimum of 4 digits.
END
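
As a quick illustration of that convention, here is a sketch in Python;
the helper name format_cp is only illustrative:

```python
def format_cp(code_point):
    """Format an integer code point as uppercase hex, zero-padded to 4 digits."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # "04X" pads to at least 4 digits; longer values are left as they are.
    return format(code_point, "04X")


print(format_cp(0x61))     # small value gets padded
print(format_cp(0x1F600))  # five-digit value is not truncated
```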

-- Section 5.1 --

   In addition, doing so allows the choice of
   either specifying a prohibited or a required context.

You need to factor out "specifying" (or repeat it):

NEW
   In addition, doing so allows the choice of
   specifying either a prohibited or a required context.
END

The last paragraph doesn't belong in this section (it has nothing to
do with sequences).  Please move it up to the end of Section 5.

-- Section 5.2 --

May I suggest this?:

OLD
   The content condition is met when
   the rule specified in the "when" attribute is matched.
   Alternatively, a "not-when" attribute may be used for a rule that
   must not be matched.
NEW
   The content condition is met when
   the rule specified in the "when" attribute is matched or when
   the rule specified in the "not-when" attribute fails to match.
END
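
In other words, the evaluation I have in mind looks like the sketch
below.  The function and parameter names are hypothetical; treating a
missing condition as "met" and rejecting both attributes together are
my assumptions about the intent:

```python
def context_condition_met(rule_matches, when=None, not_when=None):
    """Evaluate a "when" or "not-when" condition.

    rule_matches is a callable taking a rule name and returning True
    if that rule matches in the current context.
    """
    if when is not None and not_when is not None:
        raise ValueError('only one of "when" and "not-when" is allowed')
    if when is not None:
        return rule_matches(when)          # met when the rule matches
    if not_when is not None:
        return not rule_matches(not_when)  # met when the rule fails to match
    return True                            # assumption: no condition given


matched_rules = {"arabic-final"}
print(context_condition_met(matched_rules.__contains__, when="arabic-final"))
print(context_condition_met(matched_rules.__contains__, not_when="arabic-final"))
```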

   In the following example no digit from either range must
   occur in a label that mixes digits from both ranges

"no digit must occur" is hard to parse; please make it "may occur", or
"is allowed" (and I'd put a comma after "example").

   If a contextual condition is not satisfied for any code point in a
   label, the label is invalid, see Section 7.5.

This is ambiguous: it can be read to mean that all code points have to
fail in order to make the label invalid (if there is no code point in
the label for which the contextual condition is satisfied).  Let's
rephrase to eliminate the ambiguity:

NEW
   If a label contains one or more code points that fail to satisfy
   a contextual condition, the label is invalid (see Section 7.5).
END
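
The two readings correspond to "all" versus "any".  Here is a sketch,
with hypothetical function names and str.isalpha standing in for a real
contextual condition:

```python
def label_is_valid(label, condition_ok):
    """Intended reading: a single failing code point invalidates the label."""
    return all(condition_ok(cp) for cp in label)


def label_is_valid_misreading(label, condition_ok):
    """Misreading: invalid only if no code point satisfies the condition."""
    return any(condition_ok(cp) for cp in label)


# "ab1" has one code point that fails the stand-in condition.
print(label_is_valid("ab1", str.isalpha))
print(label_is_valid_misreading("ab1", str.isalpha))
```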

-- Section 5.3.2 --

   Making this relation explicit allows a generalization of the "type"
   attribute from directly reflecting dispositions to a more
   differentiated intermediate value that used in the resolution of
   label disposition.

Is there a missing or extra word around "value that used"?

-- Section 5.3.5 --

   While only a single "when" or "not-when" attribute MUST be applied to
   any "var" element, multiple "var" elements using the same mapping,
   but different "when" or "not-when" attributes MAY be specified.  The
   combination of mapping and conditional context defines a unique
   variant.

I don't think you mean the MUST the way it seems here.  You're not
saying that every "var" MUST have a "when" or a "not-when".  Rather,
you've phrased a MUST NOT as a MUST.  Let's turn it back into MUST
NOT:

NEW
   While a "var" element MUST NOT contain multiple conditions (it
   is only allowed a single "when" or "not-when" attribute), multiple
   "var" elements using the same mapping MAY be specified with
   different "when" or "not-when" attributes.  The combination of
   mapping and conditional context defines a unique variant.
END

-- Section 5.5 --

   Formally, it MUST correspond to
   the XML 1.0 Nmtoken (Name token) production.

Because conforming to Nmtoken is a requirement, it would be good to
have a citation to where in the XML spec Nmtoken is defined.

-- Section 6.2.5 --
I don't understand why "intersection" is limited to two classes: the
intersection of more than two sets is as well defined as the union is.
Please explain (at least to me).
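
To show what I mean, here is a sketch using Python sets.  The helper
name and the sample classes are made up for illustration:

```python
def intersect_all(*classes):
    """Return the code points common to every given class."""
    if not classes:
        raise ValueError("at least one class is required")
    return set.intersection(*(set(c) for c in classes))


# Three illustrative classes; only 0x0063 is in all of them.
common = intersect_all(
    {0x0061, 0x0062, 0x0063},
    {0x0062, 0x0063, 0x0064},
    {0x0063, 0x0064, 0x0065},
)
print(sorted(f"{cp:04X}" for cp in common))
```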

-- Section 6.3 --
Pet peeve (sorry): Correct use of "comprises" is as a synonym of "is
composed of"... so "is comprised of" is, in correct usage, wrong and
nonsensical.  I know this is a lost battle, but I'd feel oh, so much
better if you would change "is comprised of" in Section 6.3 and
Appendix A to "comprises", so "Each rule is comprised of a series of
matching operators..." becomes "Each rule comprises a series of
matching operators..."

-- Section 7 and subsections --
This was a real slog to get through, and I'm quite sure I didn't
follow a lot of it.  I have to take your word that people who are
actually doing this stuff can follow it and get it right, including
all the variant stuff, including the reflexive variants and the
any-variant vs only-variants vs everything else.  I just can't imagine
needing to do anything complicated with this and actually getting it
right.  (This is just a comment; there's nothing actionable here.)

-- Section 9 --
Nit: in the first sentence, it's "cater to", not "cater for".

The second paragraph is confusing because of the "these" and "those"
and "this", and even the first paragraph suffers from a "these".  The
problem is this: does "these formats" refer to the ones in the cited
RFCs or the ones in this document?  See what I mean?

How about this (and did I get this right?):

NEW
   Both [RFC3743] and [RFC4290] provide different grammars for IDN
   tables.  The formats in those documents are unable to fully cater
   to the increased requirements of contemporary IDN variant policies.

   This specification provides a superset of the functionality
   provided by the older IDN table formats, so any table expressed
   in those formats can be expressed in this new format.  Automated
   conversion can be conducted between tables conformant with the
   grammar specified in each document.
END

-- Section 10 --
Transmission is not transmitted.  The LGR is.

OLD
   Transmission of a well-formed LGR in accordance with this
   specification SHOULD be transmitted with a media type of
   "application/lgr+xml".
NEW
   Well-formed LGRs that comply with this specification SHOULD be
   transmitted with a media type of "application/lgr+xml".
END

-- Section 11.1 --

   The media type "application/lgr+xml" should be registered to denote
   transmission of label generation rulesets that are compliant with
   this specification, in accordance with [RFC6838].

The last phrase is another dangling modifier: to what does it refer?
Put it closer to that:

NEW
   The media type "application/lgr+xml" should be registered in
   accordance with [RFC6838] as the type for transmission of label
   generation rulesets that are compliant with this specification.
END

-- Appendix A --

   In practice, any LGR that includes the hyphen might also contain
   rules invalidating any labels beginning, ending, and containing a
   hyphen in the third and fourth positions as required by [RFC5891].

Mm, then why didn't you put those in the example?  It seems to me that
getting the rules right is the hard part, and that using real-world
examples when you can would be helpful.
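
For reference, the restrictions named in that paragraph amount to a
check like the sketch below.  This is plain Python rather than LGR
rules, and the function name is hypothetical; the real value would be
in seeing the same thing expressed in the draft's own rule syntax:

```python
def violates_hyphen_restrictions(label):
    """True if the label starts or ends with a hyphen, or has hyphens
    in both the third and fourth positions."""
    return (
        label.startswith("-")
        or label.endswith("-")
        or label[2:4] == "--"
    )


for label in ("-abc", "abc-", "ab--cd", "a-b-c"):
    print(label, violates_hyphen_restrictions(label))
```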

-- 
Barry, ART Director
