Re: Syntax
"Clive D.W. Feather" <clive@demon.net> Wed, 10 January 2007 10:46 UTC
Date: Wed, 10 Jan 2007 10:46:37 +0000
From: "Clive D.W. Feather" <clive@demon.net>
To: Julian Reschke <julian.reschke@gmx.de>
Message-ID: <20070110104637.GA32555@finch-staff-1.thus.net>
References: <45A129E9.50905@gmx.de> <20070107205255.GA14621@sources.org> <45A20F62.9060306@gmx.de> <20070108204618.GA29407@sources.org> <45A34BC6.3050407@gmx.de>
In-Reply-To: <45A34BC6.3050407@gmx.de>
Cc: Stephane Bortzmeyer <bortzmeyer@nic.fr>, cosmogol@ietf.org
Subject: Re: Syntax

Julian Reschke said:
> The reason why XML's production is complex is that it excludes
> characters that do not belong into identifiers.
>
> The escaping rule quoted above doesn't solve that problem at all; it's
> just an escaping rule.

Right. This sort of exclusion is not easy to do in syntax. It's easier to
do in a separate restriction. In the C Standard, the syntax simply says you
have a "universal character name", and then there's a separate requirement:

    [#3] Each universal character name in an identifier shall
    designate a character whose encoding in ISO/IEC 10646 falls into
    one of the ranges specified in annex D.

>> Hmmm, how many IETF formats are in Unicode? (Apart from those based
>> only on XML, like Atom in RFC 4287.) ABNF is not, for instance (right,
>> it is not a new format, the RFC is recent but it derives from an older
>> format.)

> I just tried to understand how RFC4234 works with non-ASCII characters,
> and it's not obvious at all. Section 2.4 seems to deal with it but
> really sounds a bit like hand-waving.

I take it as saying that you have to write your grammar to show the
encoding that you want to use, and may want to have alternative grammars
for different contexts. It doesn't let you use non-ASCII characters in
grammars.

In RFC3977 we wrote the following, which seems to be acceptable to IETF:

    UTF8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4
    UTF8-2         = %xC2-DF UTF8-tail
    UTF8-3         = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2UTF8-tail /
                     %xED %x80-9F UTF8-tail / %xEE-EF 2UTF8-tail
    UTF8-4         = %xF0 %x90-BF 2UTF8-tail / %xF1-F3 3UTF8-tail /
                     %xF4 %x80-8F 2UTF8-tail
    UTF8-tail      = %x80-BF

So, putting it together, one approach would be something like this:

    name        = identifier / quoted-name
    identifier  = id-initial *(["-"] 1*id-char)
    id-char     = id-initial / DIGIT
    id-initial  = ALPHA / UTF8-non-ascii
        ; Each UTF8-non-ascii in an identifier shall designate a character
        ; whose encoding in ISO/IEC 10646 falls into one of the ranges
        ; specified in appendix X.
    quoted-name = DQUOTE 1*q-char DQUOTE
    q-char      = %x21 / %x23-5B / %x5D-7E / UTF8-non-ascii / q-escape
        ; excludes DQUOTE and BACKSLASH from ASCII
    q-escape    = %x5C.75 4HEXDIG / %x5C.55 8HEXDIG

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | Fax:    +44 870 051 9937
Demon Internet      | WWW: http://www.davros.org | Mobile: +44 7973 377646
THUS plc            |                            |
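
For readers who want to experiment with the proposed grammar, the following is a minimal Python sketch that transcribes the rules above into byte-level regular expressions. It is an illustration only, not part of the original proposal: the function name `is_name` and the pattern names are invented here, and the sketch checks syntax alone. It does not enforce the separate "appendix X" restriction on which characters may appear in an identifier. The identifier rule is written as id-initial *id-char *("-" 1*id-char), which should accept the same strings as the rule above while avoiding nested repetition in the regular expression.

```python
import re

# Byte-level transcription of the UTF8-* rules quoted from RFC3977.
TAIL = rb"[\x80-\xBF]"
UTF8_2 = rb"[\xC2-\xDF]" + TAIL
UTF8_3 = (rb"(?:\xE0[\xA0-\xBF]" + TAIL
          + rb"|[\xE1-\xEC]" + TAIL + rb"{2}"
          + rb"|\xED[\x80-\x9F]" + TAIL
          + rb"|[\xEE-\xEF]" + TAIL + rb"{2})")
UTF8_4 = (rb"(?:\xF0[\x90-\xBF]" + TAIL + rb"{2}"
          + rb"|[\xF1-\xF3]" + TAIL + rb"{3}"
          + rb"|\xF4[\x80-\x8F]" + TAIL + rb"{2})")
UTF8_NON_ASCII = rb"(?:" + UTF8_2 + rb"|" + UTF8_3 + rb"|" + UTF8_4 + rb")"

# id-initial = ALPHA / UTF8-non-ascii ; id-char = id-initial / DIGIT
ID_INITIAL = rb"(?:[A-Za-z]|" + UTF8_NON_ASCII + rb")"
ID_CHAR = rb"(?:[0-9A-Za-z]|" + UTF8_NON_ASCII + rb")"

# identifier = id-initial *(["-"] 1*id-char), rewritten in an equivalent
# form: every hyphen must be followed by at least one id-char.
IDENTIFIER = ID_INITIAL + ID_CHAR + rb"*(?:-" + ID_CHAR + rb"+)*"

# q-escape = %x5C.75 4HEXDIG / %x5C.55 8HEXDIG  (i.e. \uXXXX or \UXXXXXXXX)
Q_ESCAPE = rb"(?:\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})"

# q-char = %x21 / %x23-5B / %x5D-7E / UTF8-non-ascii / q-escape
Q_CHAR = (rb"(?:[\x21\x23-\x5B\x5D-\x7E]|" + UTF8_NON_ASCII
          + rb"|" + Q_ESCAPE + rb")")

# quoted-name = DQUOTE 1*q-char DQUOTE
QUOTED_NAME = rb'"' + Q_CHAR + rb'+"'

# name = identifier / quoted-name
NAME_RE = re.compile(rb"(?:" + IDENTIFIER + rb"|" + QUOTED_NAME + rb")")


def is_name(data: bytes) -> bool:
    """Return True if the whole byte string matches the 'name' rule."""
    return NAME_RE.fullmatch(data) is not None


if __name__ == "__main__":
    samples = [b"foo-bar2", b"foo--bar", "caf\u00e9".encode("utf-8"),
               b'"a\\u00e9"', b"\xc0\xaf"]
    for sample in samples:
        print(sample, is_name(sample))
```

Working on bytes rather than decoded text mirrors the point made above: the grammar spells out the encoding itself, so overlong sequences and encoded surrogates are rejected by the syntax, while the choice of permitted characters is left to a separate rule.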