Re: Require guidance on Unicode in IETF formats

"Clive D.W. Feather" <> Thu, 11 January 2007 15:17 UTC

Date: Thu, 11 Jan 2007 15:12:32 +0000
From: "Clive D.W. Feather" <>
To: Lisa Dusseault <>
Cc: "Ted Hardie - App. Area Director" <>, Stephane Bortzmeyer <>,
Subject: Re: Require guidance on Unicode in IETF formats
List-Id: Discussion on state machine specification in IETF protocols <>

Lisa Dusseault said:
> I will point out, since it was brought up earlier, that ABNF is
> expressed in ASCII but it *can* specify protocol syntax in UTF-8 or
> another encoding. Do we need ABNF to be able to declare rule names
> with non-ASCII characters, or to allow non-ASCII characters in
> comments? Would we bother rewriting ABNF to make that possible?
> Specific use cases may be helpful here. One use case could be "A
> German speaker communicating with German-speaking coworkers about
> their code-base needs to be able to name a state something like
> 'Exclusiv Verändert'". If everybody agrees to support that use case,
> then the SMDL needs to be able to support non-ASCII (or at least
> obviously encoded) state names. Alternatively, the consensus could
> be that this use case isn't necessary, either because in practice
> programmers use state machines and they're used to ASCII labels, or
> because of a decision to limit the scope of the SMDL to IETF RFCs
> where labels are even more consistently in English.

My interpretation of this is "it's up to us".
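Lisa's point that ABNF, although written in ASCII, can still describe UTF-8 input is easy to see in RFC 3629, which defines the UTF-8 byte syntax itself in ABNF (rules UTF8-octets, UTF8-char, UTF8-1 to UTF8-4 and UTF8-tail, section 4). As a rough illustration only, here is a Python sketch that transcribes those rules into a byte-level checker; the function and table names are illustrative, not taken from the RFC.

```python
# Transcription of the UTF-8 ABNF in RFC 3629, section 4, into a byte checker.
# Names (TAIL, MULTIBYTE, matches_utf8_octets) are illustrative only.

TAIL = (0x80, 0xBF)  # UTF8-tail = %x80-BF

# Each entry: (lead-byte range, ranges for the bytes that must follow it).
MULTIBYTE = [
    ((0xC2, 0xDF), [TAIL]),                      # UTF8-2
    ((0xE0, 0xE0), [(0xA0, 0xBF), TAIL]),        # UTF8-3, first alternative
    ((0xE1, 0xEC), [TAIL, TAIL]),
    ((0xED, 0xED), [(0x80, 0x9F), TAIL]),        # excludes UTF-16 surrogates
    ((0xEE, 0xEF), [TAIL, TAIL]),
    ((0xF0, 0xF0), [(0x90, 0xBF), TAIL, TAIL]),  # UTF8-4, first alternative
    ((0xF1, 0xF3), [TAIL, TAIL, TAIL]),
    ((0xF4, 0xF4), [(0x80, 0x8F), TAIL, TAIL]),  # caps code points at U+10FFFF
]


def matches_utf8_octets(data: bytes) -> bool:
    """Return True if data matches UTF8-octets = *( UTF8-char )."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:  # UTF8-1 = %x00-7F
            i += 1
            continue
        for (lo, hi), rest in MULTIBYTE:
            if lo <= lead <= hi:
                seq = data[i + 1 : i + 1 + len(rest)]
                if len(seq) != len(rest) or not all(
                    r_lo <= b <= r_hi for b, (r_lo, r_hi) in zip(seq, rest)
                ):
                    return False  # truncated sequence or bad continuation byte
                i += 1 + len(rest)
                break
        else:
            return False  # no rule starts with this byte (C0, C1, F5-FF, 80-BF)
    return True


print(matches_utf8_octets("Exclusiv Verändert".encode("utf-8")))    # True
print(matches_utf8_octets("Exclusiv Verändert".encode("latin-1")))  # False
```

The grammar, not the notation, carries the Unicode knowledge: every terminal is an ASCII-spelled byte range, which is why ABNF itself never needed non-ASCII characters to describe UTF-8.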

I can see two use cases that are of interest in this context. The first is
Lisa's: should a German speaker be able to name a state or an action
something in German? If "yes", then the SMDL needs at least native UTF-8
support. If "no", then we can stick with ASCII.
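To make the yes/no choice concrete, a hypothetical state-name policy might look like the Python sketch below. The allowed characters (letters, digits, space, '_' and '-') are an assumption for illustration, not anything the SMDL has agreed; the only difference between the two answers is whether non-ASCII letters such as 'ä' are accepted. Note also that the UTF-8 form of Lisa's example name is one byte longer than its character count.

```python
# Hypothetical SMDL state-name policy; the character rules are assumptions
# made for illustration, not part of any agreed SMDL syntax.


def is_valid_state_name(name: str, allow_non_ascii: bool) -> bool:
    """Accept letters, digits, spaces, '_' and '-'; optionally require ASCII."""
    if not name or name != name.strip():
        return False  # reject empty names and leading/trailing spaces
    if not allow_non_ascii and not name.isascii():
        return False  # the "no" answer: ASCII only
    return all(ch.isalnum() or ch in " _-" for ch in name)


name = "Exclusiv Verändert"
print(is_valid_state_name(name, allow_non_ascii=True))   # True
print(is_valid_state_name(name, allow_non_ascii=False))  # False
print(len(name), len(name.encode("utf-8")))              # 18 19
```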

The second depends on the first: if our hypothetical German speaker has
written a state machine in German, does she need to be able to transform
its description into ASCII, transmit it to someone else, and have him
reconstruct the original? [1] Or can we assume that she has an 8-bit-clean
mailer to send it with?
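One way to picture the second use case is a reversible ASCII transform. The Python sketch below uses a hypothetical escaping scheme (backslash doubled, every non-ASCII code point written as \uXXXX or \UXXXXXXXX, as several programming languages do) and relies on Python's built-in unicode_escape codec to undo it. It illustrates what "transform and reconstruct" would require rather than proposing a format; Punycode (RFC 3492) would be another reversible option.

```python
import codecs

# Hypothetical reversible ASCII escaping for SMDL text (illustration only):
# a backslash is doubled, and any non-ASCII code point becomes \uXXXX
# or \UXXXXXXXX.


def to_ascii(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")          # escape the escape character itself
        elif cp < 0x80:
            out.append(ch)              # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")  # Basic Multilingual Plane
        else:
            out.append(f"\\U{cp:08x}")  # supplementary planes
    return "".join(out)


def from_ascii(escaped: str) -> str:
    # unicode_escape decodes \\, \uXXXX and \UXXXXXXXX (among other escapes);
    # because to_ascii doubles every backslash, only those three forms occur.
    return codecs.decode(escaped.encode("ascii"), "unicode_escape")


state = "Exclusiv Verändert"
wire = to_ascii(state)
print(wire)                       # Exclusiv Ver\u00e4ndert
assert from_ascii(wire) == state  # the original is reconstructed
```

Doubling the backslash is the step that makes the transform reversible: without it, a name that already contained the literal text "\u00e4" could not be told apart from an escaped 'ä'.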

My personal vote: the first use case should be addressed; the second is one
we can ignore.

[1] This was the use case for ISO C that led to trigraphs, for those who
know what they are. And no, I am *not* suggesting them.
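For readers who don't know them: ISO C trigraphs are nine three-character sequences beginning with "??" that stand in for characters (# [ \ ] ^ { | } ~) missing from some national variants of ISO 646, so that C source written in the restricted set still means the same thing. A minimal Python sketch of the replacement step, for illustration only (the footnote is explicit that they are not being proposed here); the names are hypothetical.

```python
import re

# The nine ISO C trigraphs: "??" followed by one of these characters.
TRIGRAPHS = {
    "=": "#", "(": "[", "/": "\\", ")": "]", "'": "^",
    "<": "{", "!": "|", ">": "}", "-": "~",
}
_TRIGRAPH_RE = re.compile(r"\?\?([=(/)'<!>-])")


def replace_trigraphs(source: str) -> str:
    """Replace each trigraph, scanning left to right without overlaps."""
    return _TRIGRAPH_RE.sub(lambda m: TRIGRAPHS[m.group(1)], source)


print(replace_trigraphs("??=include <stdio.h>"))  # #include <stdio.h>
print(replace_trigraphs("a??(1??) = ??-x;"))      # a[1] = ~x;
```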

Clive D.W. Feather  | Work:  <>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <>  | Fax:    +44 870 051 9937
Demon Internet      | WWW: | Mobile: +44 7973 377646
THUS plc            |                            |

Cosmogol mailing list