[rfc-i] How lack of Unicode support in IDs is detrimental to design

duerst at it.aoyama.ac.jp ( "Martin J. Dürst" ) Mon, 30 July 2012 06:22 UTC

From: "duerst at it.aoyama.ac.jp"
Date: Mon, 30 Jul 2012 15:22:11 +0900
Subject: [rfc-i] How lack of Unicode support in IDs is detrimental to design
In-Reply-To: <20120727193945.230971A0FD@ld9781.wdf.sap.corp>
References: <20120727193945.230971A0FD@ld9781.wdf.sap.corp>
Message-ID: <50162813.3050307@it.aoyama.ac.jp>

On 2012/07/28 4:39, Martin Rex wrote:
> Phillip Hallam-Baker wrote:
>> I am just writing a draft that describes how to implement a PIN based
>> challenge response.
>>
>> To establish an initial connection to the server, the user presents a
>> PIN value such as CS0F40-30LV09-K000.
>>
>> Now the specification states that the PIN code is in UTF8. It probably
>> does not make any good sense for a French implementation to use accent
>> characters but I would hope that an implementer would have the sense to
>> use the Greek alphabet for a Greek language deployment, Cyrillic for
>> Russian and so on.
>
> For most developers at the ~ 1 dozen software layers at and on top
> of the network layer (below the UI), this information will be a
> simple series of octets.  So for the vast amount of purposes,
> the use of a hex dump is the most appropriate form to provide
> the example in your specification.

When you wrote a spec, did you make sure that all the US-ASCII stuff was 
also presented in hex? If not, why not?

It's my experience that most programmers, most of the time, prefer 
characters over hex. Programmers can get used to hex, and they use it 
when nothing else works, but it's not that they are really keen on it. 
With a tiny bit of effort, programmers can also get used to a few 
examples of non-ASCII.
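
To illustrate the difference, here is a minimal Python sketch that prints 
the same short string once as characters and once as a hex dump of its 
UTF-8 octets. The sample string and the helper name `utf8_hex` are 
assumptions chosen for illustration; they do not come from any draft.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex octets."""
    # bytes.hex() accepts a separator argument from Python 3.8 onwards.
    return text.encode("utf-8").hex(" ")


sample = "Dürst"  # illustrative string with a single non-ASCII character

print(sample)            # Dürst
print(utf8_hex(sample))  # 44 c3 bc 72 73 74
```

Both lines carry the same information; the first is what a reader 
recognizes at a glance, the second is what has to be decoded by hand.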


>> Not being able to express these ideas in drafts means not being able
>> to communicate them effectively.

> Nope, it means helping the vast amount of developers at the network
> layer and the dozen software layers below the UI being able to
> easily understand your document.

When using UTF-8, there's usually a good reason to do so. To make this 
clear, it is very appropriate to use actual characters in examples. 
Actual characters should work on command lines and in similar places, 
and while they are in some sense "UI", they are really more for 
programmers than for the average user.


> It would be a real nightmare (for the document authors and the document
> consumers) if every single document that uses DNS had to deal with
> Unicode to A-label conversion over and over and over again.

> The more reasonable approach is to put all that crap into a very
> small amount of documents, and keep the world straight and simple
> for all the rest.

A more reasonable approach would be to not use "crap" and similar words 
in email. For the record, I don't like U-label <-> A-label conversion 
either; it would be much easier if the DNS were just UTF-8 throughout, 
but that's not where we are at, unfortunately.

[It seems that you are implying that "straight and simple" means "just 
use A-labels". In my view, it would be "just use U-labels", however.]
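
For readers who have not seen the conversion, the following is a rough 
sketch using Python's built-in "idna" codec. Note the assumptions: that 
codec implements the older IDNA 2003 rules (RFC 3490), not IDNA2008, and 
the domain name and the helper names `to_a_label` and `to_u_label` are 
made up for illustration.

```python
def to_a_label(name: str) -> str:
    """Convert a Unicode domain name to its ASCII (xn--) form."""
    # The built-in "idna" codec follows IDNA 2003 (RFC 3490).
    return name.encode("idna").decode("ascii")


def to_u_label(name: str) -> str:
    """Convert an ASCII (xn--) domain name back to its Unicode form."""
    return name.encode("ascii").decode("idna")


print(to_a_label("bücher.example"))         # xn--bcher-kva.example
print(to_u_label("xn--bcher-kva.example"))  # bücher.example
```

The U-label form is the one a person can read; the A-label form is what 
currently goes on the wire in the DNS.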

Regards,   Martin.

> draft-ietf-dane-protocol-23.txt does not contain any unicode glyphs.
> Adding such examples would make the document worse, because the support
> of internationalized domain names is completely orthogonal to most
> uses of the DNS protocol.  Adding examples with real unicode glyphs
> to that document would only create confusion, complexity and problems
> for rendering the document.