Re: Syntax

Julian Reschke <> Mon, 08 January 2007 09:31 UTC

To: Stephane Bortzmeyer <>
List-Id: Discussion on state machine specification in IETF protocols <>

Stephane Bortzmeyer wrote:
> On Sun, Jan 07, 2007 at 06:12:09PM +0100,
>  Julian Reschke <> wrote 
>  a message of 31 lines which said:
>> 1) Although the language is designed to be used in IDs and RFCs,
>> restricting it to ASCII here is IMHO a very bad idea. After all, you
>> may want to use it for other specifications, and the IETF may lift
>> the current restrictions at some point in time. I would suggest
>> requiring a specific text encoding such as UTF-8,
> Any idea about the support of UTF-8 in typical languages *and* parsing
> tools? For instance, with C and Yacc, I assume it is quite
> difficult. With Haskell, I'm not sure :-)

Well, I've been living in the Java world for a long time, so I really
can't say anything useful about other languages anymore.

> The problem of UTF-8 is that many assumptions no longer hold:
> * case insensitivity becomes a problem,
> * enumeration of "reasonable" characters becomes a complex task.

That's a problem with Unicode (the character set), not UTF-8 (the encoding).

Yes, case insensitivity is harder, but is this relevant for Cosmogol if
everything stays case-sensitive (which I think is the right thing to do)?
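
To illustrate the case-sensitive point, here is a minimal Java sketch. It
assumes both names are already valid UTF-8 and it ignores Unicode
normalization. The class and method names are hypothetical and not taken
from any Cosmogol draft. Under those assumptions, a case-sensitive match is
plain byte equality, so no case-folding tables are needed:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class CaseSensitiveSketch {
    // Case-sensitive comparison of two UTF-8 encoded names:
    // equal byte sequences mean equal code point sequences.
    static boolean sameName(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] upper = "Start".getBytes(StandardCharsets.UTF_8);
        byte[] lower = "start".getBytes(StandardCharsets.UTF_8);
        // \u00e9 is "e with acute accent", written as an escape so the
        // source file itself stays ASCII.
        byte[] accented = "\u00e9tat".getBytes(StandardCharsets.UTF_8);

        System.out.println(sameName(upper, lower));
        System.out.println(sameName(accented, accented.clone()));
    }
}
```

The same byte-wise comparison would presumably carry over to C, since it
never needs to look inside a multi-byte sequence.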

Choosing characters for identifiers: again, just borrow from somewhere
else, such as <>.
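
As one possible way of "borrowing", the Java sketch below reuses the
Unicode identifier classes that the Java platform already exposes through
java.lang.Character. This is only an assumption about what a borrowed
definition could look like, not the rule from the reference above, and the
class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any Cosmogol draft.
final class IdentifierSketch {
    // Decode the UTF-8 input, then check every code point against the
    // Unicode identifier classes provided by java.lang.Character.
    static boolean isIdentifier(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isUnicodeIdentifierStart(first)) {
            return false;
        }
        // Step by code point, not by char, so that characters outside
        // the Basic Multilingual Plane are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("\u00e9tat".getBytes(StandardCharsets.UTF_8)));
        System.out.println(isIdentifier("2fast".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Decoding happens once, up front; everything after that works on code
points and does not depend on UTF-8 at all.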

> I'm a big fan of internationalization and Unicode but, since Cosmogol
> is intended for a technical and limited use, is it reasonable? 

I think inventing a new format, but not taking I18N into account, is very
hard to defend. As far as I can tell, there's no real chance of getting it
published.

> Anyway, I recorded the point as a TODO (we do not have a formal issue
> tracker) in the draft source. Any other advice?

Let's leave it at this for now.

Best regards, Julian

Cosmogol mailing list