Re: Syntax

"Clive D.W. Feather" <clive@demon.net> Wed, 10 January 2007 06:00 UTC

Date: Wed, 10 Jan 2007 05:59:50 +0000
From: "Clive D.W. Feather" <clive@demon.net>
To: Stephane Bortzmeyer <bortzmeyer@nic.fr>
Message-ID: <20070110055950.GA5608@finch-staff-1.thus.net>
References: <45A129E9.50905@gmx.de> <20070107205255.GA14621@sources.org> <45A20F62.9060306@gmx.de> <20070108204618.GA29407@sources.org> <20070109000704.GB17340@finch-staff-1.thus.net> <20070109081753.GA1875@nic.fr>
In-Reply-To: <20070109081753.GA1875@nic.fr>
Cc: cosmogol@ietf.org
Subject: Re: Syntax

Stephane Bortzmeyer said:
>> I seem to have missed a message or two. What is the problem you're trying
>> to solve?
> 
> Julian Reschke expressed it here:
> 
> http://www1.ietf.org/mail-archive/web/cosmogol/current/msg00007.html

Okay.

My opinions:

(1) Allow UTF-8 in both quoted and unquoted identifiers. In quoted ones,
allow all characters. In unquoted ones, limit it to an "alphanumeric"
set. Don't attempt to do this limitation in the grammar.

(2) The requirements on an XML notation are very different from those on a
human-readable one (e.g. you don't need the ability to "group" states)
and I'm not convinced they fit in the same document.
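
As an illustration of point (1), here is a minimal sketch of restricting
unquoted identifiers with a semantic check applied after parsing, rather than
in the grammar. The function name and the exact contents of the
"alphanumeric" set (Unicode letters, combining marks and decimal digits) are
assumptions made for this example; nothing in the proposal fixes them.

```python
import unicodedata


def is_valid_unquoted(name: str) -> bool:
    """Return True if `name` could be written without quotes.

    Assumption: "alphanumeric" means Unicode letters (L*), combining
    marks (M*) and decimal digits (Nd). The grammar itself would accept
    any run of non-delimiter characters; this check runs afterwards.
    """
    if not name:
        return False
    for ch in name:
        category = unicodedata.category(ch)
        if not (category[0] in ("L", "M") or category == "Nd"):
            return False
    return True
```

A name that fails this check would simply have to be written in quotes.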

>> If it's identifiers using characters outside the ASCII regime, then
>> you want people to be able to write them in their own language,
> 
> Hmmm, RFC authors are a small minority. RFC readers are more
> numerous. An encoding solution, like in RFC 4646 ("Proven&#xE7;al"),
> painful for the writer but allowing translators like Shadok to display
> nice Unicode characters for the readers (assuming that Graphviz or
> other back-ends are Unicode-aware) would be already a big step, it
> seems.

But it's very little effort to allow UTF-8 in plain identifiers, so why not
do so? I would limit any encoding solution to quoted names, using them to
provide an ASCII representation.

By the way, I assume that

    begin
and
    "begin"

are interchangeable? Or are they intended to be different? Assuming they're
the same, then:

    xán
    "xán"
    "x\u00E1n"

would also all be the same.

[I prefer \u because users are more likely to want to write & in a string.]
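
The equivalence can be sketched as a normalisation step: strip the quotes,
decode any escapes, and compare the results. This is only an illustration;
the function name is hypothetical, and it assumes the fixed-width C-style
forms (\u plus four hex digits, \U plus eight) and that escapes are
recognised only inside quoted names.

```python
import re

# Assumed escape forms: \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def canonical_name(token: str) -> str:
    """Map a quoted or unquoted identifier token to one canonical string."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        body = token[1:-1]
        # Replace each escape with the character it names.
        # chr() raises ValueError for values above U+10FFFF.
        return _ESCAPE.sub(
            lambda m: chr(int(m.group(1) or m.group(2), 16)), body
        )
    # Unquoted identifiers are taken literally.
    return token
```

With this, the tokens begin and "begin" compare equal after normalisation,
and a quoted name containing & needs no special treatment.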

>> The way we solved this in the C Standard is that you can use such
>> characters directly, or you can encode them as \u#### or \U########
>> (where # represents a hexadecimal digit). So I can write "xán" or
>> "x\u00E1n", and the two are interchangeable.
> In C, only in strings, no, not in identifiers?

In identifiers.

    int xán;

    xán = getchar ();

is legal in the current C Standard.

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | Fax:    +44 870 051 9937
Demon Internet      | WWW: http://www.davros.org | Mobile: +44 7973 377646
THUS plc            |                            |
