Re: Syntax

Stephane Bortzmeyer <> Mon, 08 January 2007 20:47 UTC

Date: Mon, 8 Jan 2007 21:46:18 +0100
From: Stephane Bortzmeyer <>
To: Julian Reschke <>

On Mon, Jan 08, 2007 at 10:31:14AM +0100,
 Julian Reschke <> wrote 
 a message of 48 lines which said:

> Choosing characters for identifiers: again, just borrow from
> somewhere else, such as <>).

Has anyone already tried to convert it to ABNF? The production
BaseChar in the XML standard is a bit frightening and may exercise the
limits of some programs. Implementation reports are welcome.
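
To give an idea of what a mechanical conversion might look like, here
is a rough sketch in Python. It is only an illustration: the function
name xml_to_abnf and the short sample string are mine, and the sample
contains a few items written in the style of BaseChar, not the full
production. It assumes the production only uses "[#xNNNN-#xNNNN]"
ranges and single "#xNNNN" code points.

```python
import re

# Matches "[#xNNNN-#xNNNN]" ranges and single "#xNNNN" code points,
# the two forms assumed to appear in the XML character-class productions.
ITEM = re.compile(r"\[#x([0-9A-Fa-f]+)-#x([0-9A-Fa-f]+)\]|#x([0-9A-Fa-f]+)")


def xml_to_abnf(name: str, production: str) -> str:
    """Rewrite an XML EBNF character class as an ABNF rule (hypothetical helper)."""
    alternatives = []
    for match in ITEM.finditer(production):
        low, high, single = match.groups()
        if single is not None:
            alternatives.append("%x{:X}".format(int(single, 16)))
        else:
            alternatives.append("%x{:X}-{:X}".format(int(low, 16), int(high, 16)))
    return "{} = {}".format(name, " / ".join(alternatives))


# A short sample in the style of BaseChar, not the complete production.
sample = "[#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] | #x0386"
print(xml_to_abnf("BaseChar", sample))
```

Note that ABNF terminal values are plain integers, so a rule produced
this way describes code points; how they map to octets would still
have to be stated by the specification that uses the grammar.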

Otherwise, what do you think of the solution used in RFC 4646?

   ASCCHAR    = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26
   UNICHAR    = "&#x" 2*6HEXDIG ";"
   Characters from outside the US-ASCII [ISO646] repertoire, as well as
   the AMPERSAND character ("&", %x26) when it occurs in a field-body,
   are represented by a "Numeric Character Reference" using hexadecimal
   notation in the style used by [XML10] (see
   <>).  This consists of the
   sequence "&#x" (%x26.23.78) followed by a hexadecimal representation
   of the character's code point in [ISO10646] followed by a closing
   semicolon (%x3B).  For example, the EURO SIGN, U+20AC, would be
   represented by the sequence "&#x20AC;".  Note that the hexadecimal
   notation MAY have between two and six digits.
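
As an illustration of how little code this convention requires, here
is a minimal sketch in Python. The function names (encode_char, encode,
decode) are mine, not from RFC 4646. It assumes that every character
outside %x21-25 / %x27-7E is escaped, which includes the space, since
ASCCHAR itself does not cover %x20.

```python
import re

# UNICHAR = "&#x" 2*6HEXDIG ";"
UNICHAR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def encode_char(ch: str) -> str:
    """Return ch unchanged if it is a literal ASCCHAR, else a UNICHAR escape."""
    cp = ord(ch)
    if 0x21 <= cp <= 0x7E and cp != 0x26:  # AMPERSAND (%x26) is always escaped
        return ch
    # At least two hex digits; code points never need more than six.
    return "&#x{:02X};".format(cp)


def encode(text: str) -> str:
    return "".join(encode_char(ch) for ch in text)


def _replace(match):
    cp = int(match.group(1), 16)
    # Leave the escape untouched if it is beyond the Unicode range.
    return chr(cp) if cp <= 0x10FFFF else match.group(0)


def decode(text: str) -> str:
    return UNICHAR.sub(_replace, text)


print(encode("\u20ac"))        # EURO SIGN, the example from the RFC text
print(decode(encode("a&b")))
```

Because the ampersand is always escaped on output, a "&#x" sequence in
the encoded form can only come from the encoder, so decoding is
unambiguous.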

> I think inventing a new format, but not taking I18N is very hard to
> defend. As far as I can tell, there's no real chance to get it
> published.

Hmmm, how many IETF formats are in Unicode? (Apart from those based
only on XML, like Atom in RFC 4287.) ABNF is not, for instance (right,
it is not a new format; the RFC is recent, but it derives from an older

Cosmogol mailing list