Re: Require guidance on Unicode in IETF formats

Lisa Dusseault <> Wed, 10 January 2007 21:36 UTC

From: Lisa Dusseault <>
Date: Wed, 10 Jan 2007 13:25:09 -0800
To: Stephane Bortzmeyer <>
Cc: "Ted Hardie - App. Area Director" <>,
Subject: Re: Require guidance on Unicode in IETF formats
List-Id: DIscussion on state machine specification in IETF protocols <>

I am not sure I have any special wisdom or can point to any policy  
requirements on this one. Our i18n requirements for IETF standards
usually focus on what gets shown to the application user and, less
importantly, the administrative user, not what the protocol
implementor or debugger (or log file or wire trace) sees. Thus error
messages need not be Unicode and need not be translated, as long as  
it's reasonable that a client implementation could look up an ASCII  
error message or the associated code and figure out what to display  
in the user's language.  Method names and header/parameter names in  
protocols can definitely be ASCII, and not translated.

So who's the *user* of a State Machine Description Language (SMDL)?
If I can use an SMDL to describe a bunch of states for a protocol that
is properly i18n for the final end-user of the *protocol*, that seems
like the minimum that we can base on general IETF policy
requirements. If there are further requirements, such as comments
being i18n, those are community requirements rather than a direct
consequence of IETF/IESG policy.

I will point out, since it was brought up earlier, that ABNF is
expressed in ASCII but it *can* specify protocol syntax in UTF-8 or
another encoding. Do we need ABNF to be able to declare rule names
with non-ASCII characters, or to allow non-ASCII characters in
comments? Would we bother rewriting ABNF to make that possible?

Specific use cases may be helpful here. One use case could be "A  
German speaker communicating with German-speaking coworkers about  
their code-base needs to be able to name a state something like  
'Exclusiv Verändert'". If everybody agrees to support that use case
then the SMDL needs to be able to support non-ASCII (or at least
obviously encoded) state names. Alternatively the consensus could
be that the use case isn't necessary, either because in practice
programmers use state machines and they're used to ASCII labels, or  
because of a decision to limit the scope of the SMDL to IETF RFCs  
where labels are even more consistently in English.


On Jan 10, 2007, at 12:30 PM, Stephane Bortzmeyer wrote:

> We require some guidance from our Area Directors about the use of
> Unicode in an IETF format. On the mailing list, a
> discussion was raised on whether we should accept only ASCII in the
> language we define (our work is to define a format, not a protocol) or
> the full Unicode character set.
> (
> and follow-ups.)
> Some people claimed that Unicode support was more or less mandatory at
> the IETF and that a format without it had no chance of being
> adopted. Besides, internationalization is a very good thing, anyway,
> for the world-wide Internet.
> Some people feared that mandating Unicode would complicate the grammar
> and would drastically reduce the number of tools available to write
> parsers for this format. They think that Cosmogol, being intended
> mostly for RFCs or other ultra-technical uses, does not have the same
> requirements as a general protocol like HTTP or NNTP.
> We identified the following RFCs as possibly relevant:
> RFC 2277 / BCP 18 IETF Policy on Character Sets and Languages
> RFC 2223 Instructions to RFC Authors
> RFC 3536 Terminology Used in Internationalization in the IETF
> But none seems to bring a clear answer. Is Unicode support a MUST, a
> SHOULD or a MAY in a new protocol?
> How many *new* IETF formats are in Unicode? (Apart from those based
> only on XML, like Atom in RFC 4287.) Old formats like ABNF do not
> count because they derive from an older format.

Cosmogol mailing list