Re: [salud] New version of the ABNF-syntax

worley@ariadne.com (Dale R. Worley) Mon, 18 February 2013 18:56 UTC

Date: Mon, 18 Feb 2013 13:54:23 -0500
Message-Id: <201302181854.r1IIsNFG2067515@shell01.TheWorld.com>
From: worley@ariadne.com
Sender: worley@ariadne.com
To: Laura Liess <laura.liess.dt@googlemail.com>
In-reply-to: <CACWXZj0Qq=Q=7necdgCPLeFAMbr3gg-WmBb-8UzegseEd_b_Qw@mail.gmail.com> (laura.liess.dt@googlemail.com)
References: <CACWXZj2WhAsmQ3Ku7bVpiNhbFxX7-vx9d9wWzzKgiVLSeKk__g@mail.gmail.com> <201302132105.r1DL5BM01801234@shell01.TheWorld.com> <CACWXZj0Qq=Q=7necdgCPLeFAMbr3gg-WmBb-8UzegseEd_b_Qw@mail.gmail.com>
Cc: salud@ietf.org
Subject: Re: [salud] New version of the ABNF-syntax

(as an individual)

> From: Laura Liess <laura.liess.dt@googlemail.com>
> 
> Thank you for adding the rules. I like them :-). I have following
> proposals on the two open items.

Thanks!

> > 2) date:  <date> syntax is a subset of ??? in RFC 3339.
> 
> I would propose following text, which is a modification of the
> corresponding text in the rfc 4198:
> 
> "<date> is a date in ISO 8601 Extended Format ([CC]YY["-"MM["-"DD]]),
> and MUST correspond to a specific day on which the organization
> allocating the URN owned the domain name specified in the provider-id.
>  If not included, the default value for MM and DD is "01". "

This looks good to me.  RFC 3339 Appendix A contains the BNF from ISO
8601 for comparison.  It might be helpful to include a reference to
RFC 3339, since it seems to be the RFC people use as a reference for
date syntax and semantics.

However, there are two parts to this issue.  One is the syntax, which
you write as:

> "<date> is a date in ISO 8601 Extended Format ([CC]YY["-"MM["-"DD]]),
> [...]
>  If not included, the default value for MM and DD is "01". "

We want to add the default for the century as well.  This encompasses
what I wrote as (2b) and (2c).
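
To make the defaulting concrete, here is a rough Python sketch of a
parser for the quoted [CC]YY["-"MM["-"DD]] form.  The function name
parse_urn_date and the default_century parameter are my own
inventions for illustration, and the value "20" is only a
placeholder assumption; the actual century default is whatever we
settle on for (2b) and (2c).

```python
import re
from datetime import date

# [CC]YY["-"MM["-"DD]]: optional century, two-digit year,
# optional month, optional day (only if a month is present).
_DATE_RE = re.compile(r"([0-9]{2})?([0-9]{2})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def parse_urn_date(text, default_century="20"):
    """Parse a <date> and fill in defaults for the omitted parts.

    default_century is a HYPOTHETICAL placeholder; the text above only
    says that a century default should be added, not what it is.
    """
    match = _DATE_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a valid <date>: %r" % text)
    century, year, month, day = match.groups()
    century = century or default_century   # assumed default (see docstring)
    month = month or "01"                  # default stated in the proposed text
    day = day or "01"                      # default stated in the proposed text
    # date() itself rejects impossible values such as month 13 or day 31 in June.
    return date(int(century + year), int(month), int(day))
```

With those assumptions, "2013" and "13" both denote 1 January 2013,
and "2013-02" denotes 1 February 2013.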

And the other is the semantics:

> MUST correspond to a specific day on which the organization
> allocating the URN owned the domain name specified in the provider-id.

But the semantics are more complicated than that; they include what I
wrote as (5a) and (5b):

> > 5a) A <provider> has an "owner", which is the entity that was the
> >     registered owner of the domain name <provider-id> on the date
> >     <date> (with respect to rule (2)).
> >
> > 5b) If an entity is the first registrant of a domain name
> >     <provider-id>, it owns all <provider>s with that <provider-id> and
> >     all <date>s preceding when it registered the domain name.
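
As an illustration of how (5a) and (5b) interact, here is a small
Python sketch.  It assumes the registration history of a domain name
is available as a list of (start date, owner) pairs with no gaps in
ownership; that data model and the name provider_owner are
hypothetical, and lapsed registrations are not modeled.

```python
from datetime import date


def provider_owner(registrations, on):
    """Return the owner of a <provider> whose <date> is `on`.

    registrations: list of (start_date, owner) pairs for one
    <provider-id>, each owner holding the name from start_date until
    the next entry begins (an assumed, simplified history).
    """
    history = sorted(registrations)
    if not history:
        return None
    # Rule (5b): the first registrant also owns every earlier <date>.
    owner = history[0][1]
    # Rule (5a): otherwise the owner is whoever held the name on `on`.
    for start, entity in history:
        if start <= on:
            owner = entity
        else:
            break
    return owner
```

For a history in which entity A registers the name in 2005 and
entity B takes it over in 2010, a <date> in 2001 or 2007 belongs to
A, and a <date> in 2012 belongs to B.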

> > 1) provider-id:  This is intended to match the allowed ABNF for domain
> > names.  (What is the correct normative reference?  With
> > internationalization, is our syntax still correct?)
> 
> This reminds me of Alfred's comment #1 and the discussion
> http://www.ietf.org/mail-archive/web/salud/current/msg00260.html and
> previous threads.
> 
> By then, we (also Alfred) agreed on the following text, based on the RFC 5890:
> 
> "The <alert-indication>s are hierarchical identifiers.  The set of
> allowable characters is the same as that for domain names [RFC1123].
> Labels used in <standard-name> MUST comply with the syntax for Non
> Reserved LDH-labels [RFC5890].  Labels used in <private-name> MUST
> comply with the syntax for Non Reserved LDH-labels or for A-labels
> [RFC5890]. Comparisons MUST follow the comparison rules for the
> corresponding type of label.  Registered URNs MUST be transmitted as
> registered. A new name MUST NOT be registered if it is equal by the
> comparison rules above to an already registered name."
> 
> However, we changed the syntax since then. IMO, with Dale's the syntax
> after "name" is deleted,  the second and third sentences in the above
> text should be modified as follows:
> 
> " <Label>s used in && <alert-name>s excepting <label>s used in
> <provider-id>s && MUST comply with the syntax for Non Reserved
> LDH-labels [RFC5890].  Labels used in &&<provider-id>s&& MUST
> comply with the syntax for Non Reserved LDH-labels or for A-labels
> [RFC5890]. " (I put the modified text  between  &&..&&.)
> 
> We also could use different names for the two types of labels, e.g.
> "label" and "provider-id-label", but I am not sure we want this.

Ah, yes, I'm starting to remember that discussion.

Let me examine this text (as you have edited it) in detail:

> "The <alert-indication>s are hierarchical identifiers.  The set of
> allowable characters is the same as that for domain names [RFC1123].

I just looked at RFC 1123, and it doesn't seem to have any BNF.  So
it's a background reference for domain names, but not really
normative.  We may want to remove the reference, as the references to
RFC 5890 below tell the real syntax.

> <Label>s used in <alert-name>s excepting <label>s used in
> <provider-id>s MUST comply with the syntax for Non Reserved
> LDH-labels [RFC5890].  Labels used in <provider-id>s MUST
> comply with the syntax for Non Reserved LDH-labels or for A-labels
> [RFC5890].

(I use RFC 5890 figure 1 on page 10 as a reference for terms like
"A-label".)

There are three sorts of strings that we may want to treat separately:

The first sort is domain names.  These are called <provider-id> in the
current BNF.  As far as I can tell from RFC 5890, valid domain names
(when represented on the wire) must be composed of:

- NR-LDH labels (ordinary ASCII labels that meet certain validity
  rules, especially not starting with "xn--", which indicates an
  A-label), and
- A-labels (non-ASCII labels encoded with Punycode).
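
A rough Python sketch of that distinction, following my reading of
the RFC 5890 terminology, might look like the following.  The name
classify_label and the returned strings are made up for
illustration.  Note that it only recognizes the "xn--" prefix; it
does not check that the rest of the label is valid Punycode or that
it passes the other IDNA tests, so "xn-label" here means "candidate
A-label", not "valid A-label".

```python
import re

# LDH label: ASCII letters, digits and hyphens, 1 to 63 characters,
# neither starting nor ending with a hyphen.
_LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_label(label):
    """Classify one label as seen on the wire (approximate)."""
    if _LDH_RE.fullmatch(label) is None:
        return "not-ldh"
    if label[2:4] != "--":
        return "nr-ldh"          # ordinary ASCII label
    if label[:4].lower() == "xn--":
        return "xn-label"        # candidate A-label; Punycode not verified
    return "reserved-ldh"        # "--" in positions 3 and 4, other prefix


def allowed_in_provider_id(label):
    """Labels of a <provider-id>: NR-LDH labels or (candidate) A-labels."""
    return classify_label(label) in ("nr-ldh", "xn-label")
```

So "example" is an NR-LDH label, "xn--bcher-kva" is a candidate
A-label, and "ab--cd" is a reserved LDH label that would not be
allowed in either role.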

The second sort is "privately-defined labels", that is, the <label>
part of a <private-name> and the <alert-indication>s that are <label>s
that follow it before the next <private-name>.  These are the ones
covered by:

    5c) The entity owning <provider> defines the meaning of a
	<private-name>, whether it is used as an <alert-category> or an
	<alert-indication>.

    5d) The entity owning <provider> within a <private-name> (in either an
	<alert-category> or an <alert-indication>) defines the meaning of
	each <alert-indication> which is a <label> following that
	<private-name> and preceding the next <alert-indication> which is
	a <private-name>.

We probably want to allow these to be internationalized as well.  If
so, we would define these as "NR-LDH / A-label".

The third sort is the "standardized labels", the <label>s that aren't
covered by the preceding two categories.  We probably want to
restrict these to traditional ASCII labels, so their syntax is
"NR-LDH".

OK, now that I've thought that through, I can compare it with your
text.  I make the following observations:

- In my listing of rules 1, 2, 3, etc. in
  http://www.ietf.org/mail-archive/web/salud/current/msg00341.html,
  though I specify when a <label> *is* defined by a private entity, I
  don't state when a <label> *is not*, that is, when its definition
  must be standard.  That would be accomplished by:  "(5x) The meaning
  of a <label> whose meaning is not defined by an entity according to
  the rules (5c) and (5d) is defined by standardization."  (That
  wording is not quite correct.)

- It would make discussion easier if the <label>s within a
  <provider-id> were a different nonterminal symbol.  Then we could
  say (taking an example from your text above), "<Label>s MUST comply
  with the syntax for Non Reserved LDH-labels [RFC5890]." rather than
  "<Label>s used in <alert-name>s excepting <label>s used in
  <provider-id>s MUST comply with the syntax for Non Reserved
  LDH-labels [RFC5890]."

- Your wording "<Label>s ... MUST comply with the syntax for Non
  Reserved LDH-labels [RFC5890]" means that only "ASCII labels" can be
  used for privately-defined alert-names.  Do we want this
  restriction?  Or do we want to allow entities to define
  "internationalized" alert-names?  To allow internationalization, we
  would say, "<Label>s MUST comply with the syntax for Non Reserved
  LDH-labels or for A-labels [RFC5890]."

- I'm sure we want to restrict the standardized <label>s to be NR-LDH
  labels, i.e., ASCII.  We can state such a restriction in the RFC, or
  we can just assume that we will never define non-ASCII standardized
  labels.

> Comparisons MUST follow the comparison rules for the
> corresponding type of label.

I'm taking the comparison rule for internationalized domain names from
RFC 3490 section 2:

   In IDNA, equivalence of labels is defined in terms of the ToASCII
   operation, which constructs an ASCII form for a given label, whether
   or not the label was already an ASCII label.  Labels are defined to
   be equivalent if and only if their ASCII forms produced by ToASCII
   match using a case-insensitive ASCII comparison.  ASCII labels
   already have a notion of equivalence: upper case and lower case are
   considered equivalent.  The IDNA notion of equivalence is an
   extension of that older notion.  Equivalent labels in IDNA are
   treated as alternate forms of the same label, just as "foo" and "Foo"
   are treated as alternate forms of the same label.

That seems to mean that internationalized labels are compared by first
converting them into ASCII strings and then comparing the ASCII
strings case-insensitively.  Since Alert-Info URN domain names are
already in their ASCII form, this means that the URN domain names can
just be compared case-insensitively.

And the <label>s that are not part of domain names are also to be
compared case-insensitively.

So I think we can safely reduce all that to the statement that all
comparisons are to be done case-insensitively.  It would certainly
make life easier for implementers if we state that directly.  (I
suppose we need a reference to RFC 3490 in regard to comparing
internationalized domain names.)
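
For implementers, the whole comparison rule would then be about this
small.  This is only a sketch in Python; the function names are mine,
and the URN strings in the usage note are illustrative rather than
taken from the draft.  It folds only the ASCII letters A-Z, which is
all that is needed if the URNs are already in their ASCII form.

```python
def ascii_casefold(text):
    """Lower-case the ASCII letters A-Z and leave everything else alone."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in text
    )


def urns_equal(a, b):
    """Compare two URNs (already in ASCII form) case-insensitively."""
    return ascii_casefold(a) == ascii_casefold(b)
```

Under this rule "urn:alert:service:call-waiting" and
"URN:ALERT:Service:Call-Waiting" are the same URN, while a URN that
differs in any character other than letter case is a different one.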

> Registered URNs MUST be transmitted as
> registered. A new name MUST NOT be registered if it is equal by the
> comparison rules above to an already registered name."

We probably want to make it clear what we are concerned about:

    > Registered URNs MUST be transmitted with the case in which they are
    > registered. A new name MUST NOT be registered if it is
    > case-insensitively equal to an already registered name."
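
A registry that enforces both sentences could be sketched like this
in Python.  The class NameRegistry and its methods are hypothetical,
purely to show the two rules working together: names are looked up
case-insensitively, but the spelling that was registered is the one
returned for transmission.

```python
class NameRegistry:
    """Toy registry keyed on the case-insensitive form of each name."""

    def __init__(self):
        self._by_key = {}

    @staticmethod
    def _key(name):
        # encode("ascii") rejects non-ASCII input; bytes.lower() folds
        # only the ASCII letters A-Z.
        return name.encode("ascii").lower()

    def register(self, name):
        key = self._key(name)
        if key in self._by_key:
            raise ValueError(
                "%r is case-insensitively equal to registered name %r"
                % (name, self._by_key[key])
            )
        self._by_key[key] = name

    def as_registered(self, name):
        """Return the registered spelling, or None if not registered."""
        return self._by_key.get(self._key(name))
```

After registering "Call-Waiting", a lookup of "call-waiting" returns
"Call-Waiting", and an attempt to register "CALL-WAITING" is refused.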

(I see that I am using "standardized" where the draft uses
"registered".)

Dale