[idn] Using draft-jseng-idn-admin-01.txt

Hilde Margrethe Thunem <Hilde.Thunem@uninett.no> Thu, 06 February 2003 08:14 UTC

I've been looking at the "Internationalized Domain Names Registration 
and Administration Guideline for Chinese, Japanese and Korean" 
(draft-jseng-idn-admin-01.txt).

It looked rather interesting as a language for expressing a policy when 
there are characters that could be seen as variants of each other. Coming 
from a Norwegian background, I've looked at using the guidelines in the 
draft to describe an administrative domain name policy for the Norwegian
language, with a language character variant table. In Norwegian there
are few (if any) characters that may be said to be variants of each 
other in all instances where they are used. The closest we get is 
perhaps by dropping the accents (using o as a predominant variant of ó 
and ò), but as e.g. "for", "fór" and "fòr" have three different meanings 
(for, travelled and furrow), it is not immediately obvious that they should 
be bundled together in one IDL package. There are also certain characters 
that are imported from neighbouring languages and used in Norwegian names, 
and that sound similar in speech to existing Norwegian characters 
(e.g. ø and ö). These may be considered variants of each other. (And even 
if in the end the administrative policy is that no character has any
variants, the draft does give a way of easily expressing both that fact 
and which characters are within the set of valid code points.)

Having tried to use it, I have a few questions concerning the interpretation 
of the different groups in the language character variant table. As far as I 
can see the recommended variants must also be valid codepoints, and result 
in domain names that are in the zonefile, while character variants are merely 
reserved when a valid combination is registered and are not in themselves 
added to the zonefile. In addition, character variants that aren't valid 
codepoints can't be the "starting point" for an IDL package.

In other words, given a language character variant table where the letters 
a-z are among the valid codepoints, but do not have any recommended variants 
or character variants (except the letter itself), and the following lines in 
addition:

00F8; 00F8; 00F6;
00F6; 00F8; 00F8;

(ø; ø; ö;
 ö; ø; ø;)

If I've understood correctly, an application for the domain name bjørn.no 
will result in an IDL package consisting of bjørn.no and björn.no, where 
bjørn.no is added to the zonefile and björn.no is reserved. (As the ø is the 
recommended variant, while the poor ö is just a character variant). What
happens according to this policy if the domain name applied for is björn.no?
Will the registered name still be bjørn.no while björn.no is reserved?
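
To make that reading concrete, here is a minimal Python sketch of how an
IDL package might be derived from the two extra table lines. It is only one
possible interpretation, not the draft's algorithm: the TABLE layout and the
idl_package helper are hypothetical, and treating the applied-for label as
reserved rather than registered is an assumption, since that is exactly the
open question.

```python
from itertools import product

# Hypothetical encoding of the table quoted above (not the draft's format):
# valid code point -> (recommended variants, character variants)
TABLE = {c: ({c}, {c}) for c in "abcdefghijklmnopqrstuvwxyz"}
TABLE["\u00f8"] = ({"\u00f8"}, {"\u00f6"})  # 00F8; 00F8; 00F6;
TABLE["\u00f6"] = ({"\u00f8"}, {"\u00f8"})  # 00F6; 00F8; 00F8;


def idl_package(label, table):
    """Return (zone, reserved) label sets, or None if the label is rejected."""
    if any(ch not in table for ch in label):
        return None  # contains a code point outside the valid set
    # Assumption: labels built only from recommended variants enter the zone.
    zone = {"".join(p) for p in product(*(table[ch][0] for ch in label))}
    # Assumption: every other combination of the applied-for character and
    # its variants is reserved but not added to the zone.
    every = {
        "".join(p)
        for p in product(*({ch} | table[ch][0] | table[ch][1] for ch in label))
    }
    return zone, every - zone


for applied in ("bj\u00f8rn", "bj\u00f6rn"):
    print(applied, idl_package(applied, TABLE))
```

Under these assumptions both applications end up with the same package:
bjørn in the zone and björn reserved. A policy that also puts the
applied-for label in the zone would need a different rule for the zone set.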

And if the language table only had a-z and 

00F8; 00F8; 00F6;
(ø; ø; ö;)

would that mean that while one could still apply for bjørn.no (and get 
björn.no as a reserved name), someone applying for björn.no would get
rejected as that name contains a character that is not part of the valid
codepoints?
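
The rejection case in this second table can be sketched the same way. The
snippet below is a hedged illustration, assuming that only the first column
of the table defines the valid code points; the names VALID and
invalid_code_points are hypothetical and do not come from the draft.

```python
# Hypothetical reduced table: a-z plus the single line "00F8; 00F8; 00F6;".
# Assumption: only the first column contributes valid code points, so
# U+00F6 (ö) appears as a character variant but is not itself valid.
VALID = set("abcdefghijklmnopqrstuvwxyz") | {"\u00f8"}


def invalid_code_points(label, valid):
    """List the code points in label that are outside the valid set."""
    return [f"U+{ord(ch):04X}" for ch in label if ch not in valid]


for applied in ("bj\u00f8rn", "bj\u00f6rn"):
    bad = invalid_code_points(applied, VALID)
    print(applied, "rejected: " + ", ".join(bad) if bad else "accepted")
```

With this reading bjørn is accepted, while björn is rejected because of
U+00F6, even though björn would be reserved as part of the bjørn package.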

Sorry for asking the basic questions :-) but as the draft states, the 
quality of the language table is critical for the result... which means that
the logic behind how the table is built is important.

And while I'm asking questions, have any of you in the WG used the draft for 
creating a draft policy for a language?

- Hilde Thunem