Re: Require guidance on Unicode in IETF formats

I am not sure I have any special wisdom or can point to any policy
requirements on this one. Our internationalization (i18n)
requirements for IETF standards usually focus on what is shown to the
application user and, less importantly, the administrative user, not
on what the protocol implementor or debugger (or a log file or wire
trace) sees. Thus error messages need not be Unicode and need not be
translated, as long as it's reasonable that a client implementation
could look up an ASCII error message or the associated code and
figure out what to display in the user's language. Method names and
header/parameter names in protocols can definitely be ASCII, and not
translated.
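
As a rough illustration of that lookup (a minimal sketch only; the
catalog, the function name and the translated strings are hypothetical
and not drawn from any IETF specification), a client might map the
numeric code to text in the user's language and fall back to the ASCII
text from the wire:

```python
# Hypothetical client-side message catalog, keyed by language and then
# by the numeric code carried in the protocol. Nothing here is on the wire.
MESSAGES = {
    "en": {404: "Not found"},
    "de": {404: "Nicht gefunden"},
    "fr": {404: "Introuvable"},
}


def display_text(code: int, wire_text: str, lang: str) -> str:
    """Pick the text to show the end user for a protocol error.

    The protocol only carries `code` and an ASCII `wire_text`; the
    translation is purely a client concern.
    """
    catalog = MESSAGES.get(lang, {})
    # Fall back to the untranslated ASCII text if no translation exists.
    return catalog.get(code, wire_text)
```

With this catalog, display_text(404, "Not Found", "de") gives the
German string, while a code with no entry is shown as the ASCII text
the server sent.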

So who's the *user* of a State Machine Description Language (SMDL)?
If I can use an SMDL to describe a set of states for a protocol that
is properly internationalized for the final end-user of the
*protocol*, that seems like the minimum we can derive from general
IETF policy requirements. If there are further requirements, such as
comments being i18n, those are community requirements rather than a
direct consequence of IETF/IESG policy.

I will point out, since it was brought up earlier, that ABNF is
expressed in ASCII but it *can* specify protocol syntax in UTF-8 or
another encoding. Do we need ABNF to be able to declare rule names
with non-ASCII characters, or to allow non-ASCII characters in
comments? Would we bother rewriting ABNF to make that possible?
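
For a concrete case of ASCII notation describing non-ASCII syntax:
RFC 3629, section 4, defines well-formed UTF-8 with an ABNF grammar
that contains only ASCII characters, using hex ranges such as
%xC2-DF. The sketch below transcribes that grammar into a Python
byte-level regular expression. The names TAIL, ALTERNATIVES and
is_utf8 are mine, and this is an illustration rather than a
recommended validator.

```python
import re

# UTF8-tail = %x80-BF
TAIL = rb"[\x80-\xBF]"

# One entry per alternative of UTF8-char in RFC 3629, section 4.
ALTERNATIVES = [
    rb"[\x00-\x7F]",                        # UTF8-1
    rb"[\xC2-\xDF]" + TAIL,                 # UTF8-2
    rb"\xE0[\xA0-\xBF]" + TAIL,             # UTF8-3 (four alternatives)
    rb"[\xE1-\xEC]" + TAIL + rb"{2}",
    rb"\xED[\x80-\x9F]" + TAIL,
    rb"[\xEE-\xEF]" + TAIL + rb"{2}",
    rb"\xF0[\x90-\xBF]" + TAIL + rb"{2}",   # UTF8-4 (three alternatives)
    rb"[\xF1-\xF3]" + TAIL + rb"{3}",
    rb"\xF4[\x80-\x8F]" + TAIL + rb"{2}",
]

# UTF8-octets = *( UTF8-char )
UTF8_OCTETS = re.compile(rb"(?:" + rb"|".join(ALTERNATIVES) + rb")*")


def is_utf8(data: bytes) -> bool:
    """True if `data` matches the UTF8-octets rule in its entirety."""
    return UTF8_OCTETS.fullmatch(data) is not None
```

The source of this sketch is itself pure ASCII, yet it accepts the
UTF-8 bytes of a non-ASCII label and rejects a truncated sequence
such as the single octet 0xC3.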

Specific use cases may be helpful here. One use case could be "A
German speaker communicating with German-speaking coworkers about
their code-base needs to be able to name a state something like
'Exclusiv Verändert' ". If everybody agrees to support that use case,
then the SMDL needs to be able to support non-ASCII (or at least
obviously encoded) state names. Alternatively, the consensus could
be that the use case isn't necessary, either because in practice
it is programmers who use state machines and they're used to ASCII
labels, or because of a decision to limit the scope of the SMDL to
IETF RFCs, where labels are even more consistently in English.
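
To make "obviously encoded" concrete, here is one possible
convention, purely an assumption on my part and not something the
list has proposed: keep the SMDL source in ASCII and write non-ASCII
characters in state names as backslash escapes. The helper name is
hypothetical.

```python
def ascii_state_name(name: str) -> str:
    """Return an ASCII-only spelling of a state name.

    ASCII characters pass through unchanged; anything else is replaced
    by a Python-style backslash escape.
    """
    return name.encode("ascii", "backslashreplace").decode("ascii")


# The example label from the use case, written with an escape so that
# this source file stays ASCII.
print(ascii_state_name("Exclusiv Ver\u00e4ndert"))
```

A name that already contains a backslash would be ambiguous under
this scheme, so a real design would have to escape that character
too.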

Lisa

On Jan 10, 2007, at 12:30 PM, Stephane Bortzmeyer wrote:

> We require some guidance from our Area Directors about the use of
> Unicode in an IETF format. On the mailing list cosmogol@ietf.org, a
> discussion was raised on whether we should accept only ASCII in the
> language we define (our work is to define a format, not a protocol) or
> the full Unicode character set.
> (http://www1.ietf.org/mail-archive/web/cosmogol/current/msg00007.html
> and follow-ups.)
>
> Some people claimed that Unicode support was more or less mandatory at
> the IETF and that a format without it had no chance of being
> adopted. Besides, internationalization is a very good thing, anyway,
> for the world-wide Internet.
>
> Some people feared that mandating Unicode would complicate the grammar
> and would drastically reduce the number of tools available to write
> parsers for this format. They think that Cosmogol, being intended
> mostly for RFCs or other ultra-technical usages, does not have the same
> requirements as a general protocol like HTTP or NNTP.
>
> We identified the following RFCs as possibly relevant:
>
> RFC 2277 / BCP 18 IETF Policy on Character Sets and Languages
>
> RFC 2223 Instructions to RFC Authors
>
> RFC 3536 Terminology Used in Internationalization in the IETF
>
> But none seems to bring a clear answer. Is Unicode support a MUST, a
> SHOULD or a MAY in a new protocol?
>
> How many *new* IETF formats are in Unicode? (Apart from those based
> only on XML, like Atom in RFC 4287.) Old formats like ABNF do not
> count because they derive from an older format.

_______________________________________________
Cosmogol mailing list
Cosmogol@ietf.org
https://www1.ietf.org/mailman/listinfo/cosmogol