Re: [Lager] Stephen Farrell's Discuss on draft-ietf-lager-specification-11: (with DISCUSS and COMMENT)

"Asmus Freytag (c)" <asmusf@ix.netcom.com> Thu, 28 April 2016 19:58 UTC

To: Stephen Farrell <stephen.farrell@cs.tcd.ie>, Alexey Melnikov <aamelnikov@fastmail.fm>
References: <20160421102401.19578.54300.idtracker@ietfa.amsl.com> <1461412191.851961.587365345.53A5CC4C@webmail.messagingengine.com> <571B634F.9070600@cs.tcd.ie>
From: "Asmus Freytag (c)" <asmusf@ix.netcom.com>
Message-ID: <df5235b5-314d-274f-0579-de5de36b7d85@ix.netcom.com>
Date: Thu, 28 Apr 2016 12:58:18 -0700
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.0
MIME-Version: 1.0
In-Reply-To: <571B634F.9070600@cs.tcd.ie>
Content-Type: multipart/alternative; boundary="------------ED1501078810B560D8C7200C"
Archived-At: <http://mailarchive.ietf.org/arch/msg/lager/-wwX_fsNzG7Vm6CUGHzUTLR6FQg>
Cc: draft-ietf-lager-specification@ietf.org, audric.schiltknecht@viagenie.ca, The IESG <iesg@ietf.org>, lager@ietf.org
Subject: Re: [Lager] Stephen Farrell's Discuss on draft-ietf-lager-specification-11: (with DISCUSS and COMMENT)

Following on our discussion by e-mail, here is suggested wording.

A./

On 4/21/2016 3:24 AM, Stephen Farrell wrote:
> (1) section 5: this says code points MUST be 4 hex digits.
> What is s/w supposed to do if it sees only 2 hex digits?
> Should it ignore the range or char element or fail to process
> the entire LGR document? I think the same issue applies to
> other uses of 2119 language as well, (e.g. "MUST be treated as
> an error at the end of p19), so I'd recommend you state some
> kind of general rule if you can.

One would expect the schema validator to catch such issues.
It's a simple matter to provide explicit language in Section 4 stating
that an LGR that does not conform to the schema in Appendix D is to be
rejected.

"An LGR is expressed as a well-formed XML Document <xref target="XML"/> 
that conforms to the schema defined in <xref target="schema"/>."

Then we only need to address the constraints that schema validation
cannot enforce, and we do that at the point where each such constraint
is discussed.
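
To make the "reject the whole document" rule concrete, here is a minimal
sketch in Python. It is not a schema validator; it only stands in for the
one check raised in the DISCUSS (the form of code points in "cp"
attributes). The pattern (4 to 6 uppercase hex digits, space-separated
for sequences) and the function names are my assumptions for
illustration, not text from the draft.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a code point is 4 to 6 uppercase hex digits and sequences are
# space-separated. Adjust the pattern to whatever the final text requires.
CODE_POINT = re.compile(r"[0-9A-F]{4,6}")


def malformed_code_points(lgr_xml: str) -> list[str]:
    """Return every token of a "cp" attribute that is not a valid code point."""
    root = ET.fromstring(lgr_xml)  # raises ParseError if not well-formed
    bad = []
    for elem in root.iter():
        cp = elem.get("cp")
        if cp is None:
            continue
        bad.extend(tok for tok in cp.split() if not CODE_POINT.fullmatch(tok))
    return bad


def load_lgr(lgr_xml: str) -> ET.Element:
    """Parse the document, rejecting it as a whole on any malformed code point."""
    bad = malformed_code_points(lgr_xml)
    if bad:
        raise ValueError(f"LGR rejected, malformed code points: {bad}")
    return ET.fromstring(lgr_xml)
```

With this shape, a two-digit value such as cp="2D" causes the entire LGR
to be refused rather than the single element being skipped.
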
> (2) 5.2: when and not-when etc seem to me to allow for
> infinitely baroque representations of useless things like:
>
> 	<char cp="200D" when="cpisnot200D" />
> 	<classs "cpis200D">
> 		200D
> 	</class>
> 	<rule name="cpisnot200D">
> 		<start/>
> 			<complement><class by-ref="cpis200D"/></complement>
> 		<end/>
> 	</rule>
>
> What is parsing s/w supposed to do with structures like that?

There is no issue with parsing.

It's an example of deliberately obfuscatory design of context rules, but 
a conformant implementation should have no problem evaluating such an 
LGR. The result is that 200D is never valid in a label, because its 
conditional context (a single character label not containing 200D) is 
never matched.

Specifying contexts that cannot be matched by any label is not an error. 
In fact, it is a useful design feature for LGR templates. Such templates 
can describe optional code points that can be enabled/disabled by simply 
changing a context rule from never matching to always matching or vice 
versa. Incidentally, there are less obfuscatory ways to write a 
never-matching context, for example:

<rule name="never">
     <start />
     <end />
</rule>

> For example, how would you handle the likes of this or more
> convoluted but equivalent structures which could be delivered
> by accident or deliberately?  I think the response to this
> discuss point needs to either be a) all such constructs are
> automatically detectable and here's why, or else b) here's how
> s/w can handle that (without crashing or looping forever).

These kinds of convoluted constructs do not lead to crashes or looping. 
Rules are simply regular expressions, and no matter how convoluted a 
rule definition appears to be, it can always be converted (resolved) to 
a regex pattern which can then be used in a regex engine to do the matching.
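
As a rough illustration of that resolution step, the following Python
sketch converts a small subset of rule elements (start, end, char, class
by reference, and complement of a referenced class) into a regex. The
function names and the dictionary of named classes are hypothetical, the
element coverage is deliberately incomplete, and the rule is matched
against the whole label, in line with the reading given above.

```python
import re
import xml.etree.ElementTree as ET


def char_class(members, negate=False):
    """Build a regex character class from a set of characters."""
    body = "".join(re.escape(c) for c in sorted(members))
    return ("[^" if negate else "[") + body + "]"


def rule_to_regex(elem, classes):
    """Resolve a rule element (illustrative subset only) to a regex string."""
    if elem.tag == "rule":
        return "".join(rule_to_regex(child, classes) for child in elem)
    if elem.tag == "start":
        return r"\A"
    if elem.tag == "end":
        return r"\Z"
    if elem.tag == "char":
        return re.escape(chr(int(elem.get("cp"), 16)))
    if elem.tag == "class":
        return char_class(classes[elem.get("by-ref")])
    if elem.tag == "complement":
        (inner,) = list(elem)  # sketch: exactly one referenced class inside
        return char_class(classes[inner.get("by-ref")], negate=True)
    raise ValueError(f"unsupported element: {elem.tag}")


# Named classes, as they would be collected from the <rules> element.
classes = {"cpis200D": {"\u200d"}}
rule = ET.fromstring(
    '<rule name="cpisnot200D"><start/>'
    '<complement><class by-ref="cpis200D"/></complement><end/></rule>'
)
pattern = rule_to_regex(rule, classes)
```

Because the resolved pattern only matches a one-character label whose
character is not 200D, no label containing 200D can satisfy it, and
evaluating it is an ordinary, terminating regex match.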

> Note that I don't think that the "MUST be rejected" at the end
> of 6.1 provides an answer to this point.  (But if you do,
> please argue that.)

That text addresses a different issue altogether, so I agree entirely.

However, section 6.1 flowed badly and didn't mention the "rules" 
element, so I suggest the following tweak:


  6.1. Basic Concepts

The "rules" element contains the specification of both context-based and
Whole Label Evaluation (WLE) rules (Section 6.3), the character classes
(Section 6.2) that they depend on, and any actions (Section 7) that
assign dispositions to labels based on rules or variant mappings.

A Whole Label Evaluation rule (WLE) is applied to the whole label. It is 
used to validate both original labels and any variant labels computed 
from them.

A conditional context rule is a specialized form of rule that does not
necessarily apply to the whole label, but may be specific to the context
around a single code point or code point sequence. Certain code points
in a label sometimes need to satisfy context-based rules, for example
for the label to be considered valid, or to satisfy the context for a
variant mapping (see the description of the "when" attribute in
Section 6.4).



> (3) 6.3.4: While recursion is said to be disallowed, the "for
> which the complete definition has not been seen" is pretty odd
> for an XML specification, as it means that you need a full
> ordering for the elements in the document (or at least within
> the <rules> element).  That means if some editor decodes from
> disk and then encodes to disk, you need to be sure that the
> order is preserved or else you break the "has been seen"
> constraint. (And if you do that, then you're allowing rules to
> mutually refer to one another, which brings us back to discuss
> point 2.) 7.4 maybe has a similar issue. I think for this you
> could simply state up front that these XML documents MUST NOT
> be re-ordered during editing. (Or else add some kind of
> attribute to help with ordering which seems ickky.)

In line with the discussion we had on this issue, the following text
is proposed to be added to Section 4.2:

    Some elements that are direct or nested child elements of the
    <rules> element MUST be placed in a specific relative order to
    other elements for the LGR to be valid. An LGR that violates these
    constraints MUST be rejected. In other cases, a different ordering
    would result in a changed specification.
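
For the "definition has been seen" part of that constraint, a checker
can be quite small. The sketch below (Python, hypothetical function
name) walks the children of the <rules> element in document order and
refuses any "by-ref" that points at a name whose definition is not yet
complete. It assumes that named definitions are direct children of
<rules> and ignores namespaces; it does not cover the ordering
constraints that change meaning rather than validity.

```python
import xml.etree.ElementTree as ET


def check_definition_order(rules: ET.Element) -> None:
    """Raise ValueError if a by-ref precedes the complete definition it names."""
    defined = set()
    for child in rules:  # direct children, in document order
        for node in child.iter():  # the child itself and all its descendants
            ref = node.get("by-ref")
            if ref is not None and ref not in defined:
                raise ValueError(
                    f"reference to {ref!r} before its definition is complete"
                )
        # Only now is this child's own definition considered complete,
        # so a rule can never refer to itself, directly or indirectly.
        name = child.get("name")
        if name is not None:
            defined.add(name)
```

Since a name only becomes available after its element has been fully
processed, mutual or recursive references cannot pass the check.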


> (4) section 12: I don't think this is at all sufficient.
> Missing aspects include: Imprecise LGRs could result in
> registration of identifiers that are unexpected in many other
> protocols, leading to new vulnerabilities; LGRs could be
> deliberately manipulated so as to create such imprecision, and
> if I could feed one such to a registry (e.g. via some nice
> friendly looking git repo) then I could exploit the vuln later
> for fun and profit - that seems to call for some
> interoperable form of data integrity and origin
> authentication (is the lager WG doing that?) and lastly (for
> now), the XML language defined here is very flexible as noted
> earlier - I would expect there to be many implementation bugs
> in new code that attempts to parse this language. So I think
> the security considerations needs to be re-done really.

My reading of this comment is that it presupposes a particular use-case.

The use case we've identified so far is that these will substitute for 
IDN tables.

IDN tables are controlled by registries and define registry policy. As
such it would be up to the registry to have a policy that is well-defined,
and the registries would be in charge of developing or adopting an LGR
that matches their policies.

The LGR data format (and that's what it is) doesn't change anything in who
can set and enforce registry policy.

I also do not understand what an "imprecise" LGR would be. In contrast
to what they replace (IDN tables), LGRs are way more definite.

Issues of deployment of LGRs in IDN registration seem out of scope for
this document.
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
>
> - 4.3.6: my bet: validity-end is a mistake.
    This comment is unclear. Does it refer to the existence of the
    validity-end element, or to its mention in the text?

    For use in describing historical registration processes, a
    validity-end appears useful enough.
> - 4.3.8: odd that the examples have no URLs
    The text will include the formal reference of the Unicode Standard,
    including a URL.
> - 4.3 generally: I bet that these objects will evolve and will
> be stored in some VCS. I'm surprised so that there's no
> meta-data element to represent information about the version
> of say the previous instance of the object. If there were,
> then it would then be possible to establish a hash-chain which
> I'd have thought would be useful in disputes.

    I have no opinion on this. It's a fine idea and if anybody wants to
    suggest additional metadata, we can put them in. This is also not
    part of the DISCUSS, so we are under no obligation to rush this.


> - 5.3.1: "normally not only symmetric, but also transitive" -
> isn't that hugely ambiguous?  What is the programmer supposed
> to do? The answer is "nothing" I think so I'd just recommend
> to delete that paragraph and say that only the explicitly
> stated rules apply.

I suggest we reword the end of the second to last paragraph:
"As with symmetry, these transitive relations are only part of the LGR if
spelled out explicitly."

We could further add a suggestion to the programmer:
"Implementations that require an LGR to be symmetric
and transitive should verify this mechanically."
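
What "verify this mechanically" could look like, as a minimal Python
sketch: variant mappings are modelled as a set of (source, target)
pairs, and each function returns the mappings that would have to be
added. The representation and function names are assumptions for
illustration; reflexive pairs (a code point mapped to itself) are not
demanded by the transitivity check.

```python
def missing_symmetric(mappings: set) -> set:
    """Reverse mappings that are absent: for each (s, t), (t, s) is expected."""
    return {(t, s) for (s, t) in mappings} - mappings


def missing_transitive(mappings: set) -> set:
    """Mappings (a, c) implied by (a, b) and (b, c) but not spelled out."""
    implied = {
        (a, c)
        for (a, b) in mappings
        for (b2, c) in mappings
        if b == b2 and a != c  # skip the reflexive case a -> b -> a
    }
    return implied - mappings


def is_symmetric_and_transitive(mappings: set) -> bool:
    return not missing_symmetric(mappings) and not missing_transitive(mappings)
```

Returning the missing pairs, rather than only a yes/no answer, lets a
tool report exactly which mappings the LGR author still has to spell out.
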
> - 6.2.4: "repertoire" - please define that term or reference a
> definition. I don't think it's a commonly understood technical
> term, at least for implementers.

Added to Section 5, 2nd para: "Collectively, these are known as the
repertoire."
> - 8.2: What "suitable optimisations" do you mean? It seems
> cruel to the implementer to say that but not even have a
> reference.

A reference to Section 8.5 has been added.

> - 8.5: I think you could have explained this more clearly - the
> 2nd para there is not really useful as it's too opaque.

This is never going to make "easy reading", but here's an attempt at 
tweaking this a bit.

    8.5 *Checking Labels for Collision*

    The obvious method for checking collision between labels is to
    generate the fully permuted set of variants for one of them and see
    whether it contains the other label as a member. As discussed above,
    this can be prohibitive and is not necessary.

    Because of symmetry and transitivity, all variant mappings form
    disjoint sets. In each of these sets, the source and target of each
    mapping are also variants of the sources and targets of all the
    other mappings. However, members of two different sets are never
    variants of each other.

    If two labels have code points at the same position that are members
    of two different variant mapping sets, any variant labels of one
    cannot be variant labels of the other: the sets of their variant
    labels are likewise disjoint. Instead of generating all permutations
    to compare all possible variants, it is enough to find out whether
    code points at the same position belong to the same variant set or
    not.

    For that, it is sufficient to substitute an "index" mapping that
    identifies the set. This index mapping could be, for example, the
    variant mapping for which the target code point (or sequence) comes
    first in some sorting order. This index mapping would, in effect,
    identify the set of variant mappings for that position.

    To check collision then means generating a single variant label from
    the original by substituting the respective "index" value for each
    code point. This results in an "index label". Two labels collide
    whenever the index labels for them are the same.
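
A sketch of the index-label idea in Python may help implementers. It
makes several simplifying assumptions that are mine, not the draft's:
only single code points are handled (no sequences), mappings are
unconditional (no "when" contexts), the mappings are already symmetric
and transitive, each code point counts as a member of its own set, and
the index of a set is simply its smallest member. All names are
hypothetical.

```python
def build_index(mappings):
    """Map each code point to the index (smallest member) of its variant set."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the sets of source and target for every mapping.
    for source, target in mappings:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_s] = root_t

    # Group code points by set, then pick the smallest member as the index.
    groups = {}
    for cp in list(parent):
        groups.setdefault(find(cp), []).append(cp)
    return {cp: min(members) for members in groups.values() for cp in members}


def index_label(label, index):
    """Substitute the index value for each code point of the label."""
    return tuple(index.get(cp, cp) for cp in label)


def collide(label_a, label_b, index):
    """Two labels collide when their index labels are the same."""
    return index_label(label_a, index) == index_label(label_b, index)
```

The cost is linear in the label length, with no variant label ever being
enumerated. Code points without any variant mapping act as their own
index.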

> - Appendix C: Another example of BNF being impressively
> more terse than XML :-)
>
>
>


