[dnsext] Name equivalence - thoughts on the Greek issue

Brian Dickson <brian.peter.dickson@gmail.com> Mon, 13 September 2010 16:39 UTC

Date: Mon, 13 Sep 2010 13:35:01 -0300
Message-ID: <AANLkTiktL=5b_izJGpuOPHHoOH0XJoVbTJK+0H5FoXUP@mail.gmail.com>
Subject: [dnsext] Name equivalence - thoughts on the Greek issue
From: Brian Dickson <brian.peter.dickson@gmail.com>
To: namedroppers@ops.ietf.org

In the case of the Greek characters and the need for "equivalence", this
might be one area to push back to the IDN folks.

Here's what occurs to me:

Some subset of IDNs might be better encoded in a way that exploits ASCII
case insensitivity, which is already fundamentally embedded in DNS.

For instance, if the encodings of a specific character, with and without
some special property unique to the language/script of its origin (such as
Greek "tonos"), were to differ only in the ASCII case of the encoding
itself, the problem goes away by virtue of DNS case insensitivity.
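To make the idea concrete, here is a minimal sketch of the comparison DNS already performs. The "gr--" prefix and the one-letter encodings are hypothetical placeholders, not part of any assigned scheme.

```python
def dns_fold(label: str) -> bytes:
    """Fold a label the way DNS compares it: only ASCII letters change case."""
    return label.encode("ascii").lower()


def dns_labels_equal(a: str, b: str) -> bool:
    """True if two ASCII labels are the same name as far as DNS is concerned."""
    return dns_fold(a) == dns_fold(b)


# Hypothetical encodings of one Greek letter, differing only in ASCII case.
PLAIN = "gr--a"        # assumed encoding of the letter without tonos
WITH_TONOS = "gr--A"   # assumed encoding of the same letter with tonos

if __name__ == "__main__":
    print(dns_labels_equal(PLAIN, WITH_TONOS))
```

Because the two encodings differ only in case, no change to DNS matching is needed for them to resolve to the same name.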

This would require (some) encodings that are aligned on a per-character
basis, meaning each original character becomes a whole number of ASCII
characters.
Potentially, if there are more than 36 characters in the original script, or
more than 26 that differ by property, then the result could be a doubling of
the length of the label (plus 4 characters for the new ACE prefix).
The savings would be that DNS itself would no longer need to be augmented
to treat such names as "the same".
And it would only require one DNS label to handle all the constituent
variants within the label.
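The length cost can be worked out directly. This sketch assumes two ASCII characters per original character and a 4-character ACE prefix, as described above, together with the 63-octet DNS label limit; the function names are illustrative only.

```python
MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def encoded_length(n_chars: int, per_char: int = 2, prefix_len: int = 4) -> int:
    """Length of the encoded label for n_chars original characters."""
    return prefix_len + per_char * n_chars


def max_original_chars(per_char: int = 2, prefix_len: int = 4) -> int:
    """Longest original label that still fits in one DNS label."""
    return (MAX_LABEL_OCTETS - prefix_len) // per_char


if __name__ == "__main__":
    print(encoded_length(1))       # single-character label
    print(max_original_chars())    # longest label under the doubled encoding
```

Under these assumptions a one-character label needs 6 octets, and the doubled encoding caps the original label at 29 characters.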

This would either have to be done with a unique ACE prefix for each such
language/script, or with a new ACE prefix with enough encodings for all the
covered language/scripts.

Done right, this could also accommodate case plus another property, so four
variations per "symbol" could be encoded using case.

E.g. if there were 36 characters in a particular language, no more than 26
of which had a "property bit" and a "case bit", then "foo" maps to "aa",
where the case of the first "a" signifies the "property bit" and the second
"a" is the character+case.
Then "Aa" is "foo with property, lower case", "aA" is "foo without property,
upper case", and "AA" is "foo with property, upper case".
The shortest label would be $IANA--aa (e.g. "gr--aa"), for a single-character
label.
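The two-character scheme could be sketched roughly as follows. This is one possible reading, not a worked-out proposal: it assumes the first character is always the letter "a", whose case carries the property bit, and that the 36 characters are numbered 0-35 and coded as a-z then 0-9, so only the first 26 can carry a case bit. The names and the "gr--" prefix are hypothetical.

```python
import string

# Assumed code alphabet: 26 letters (can carry a case bit) plus 10 digits.
CODES = string.ascii_lowercase + string.digits


def encode_symbol(index: int, has_property: bool = False, upper: bool = False) -> str:
    """Encode one original character as two ASCII characters."""
    if not 0 <= index < len(CODES):
        raise ValueError("index outside the 36-character script")
    second = CODES[index]
    if upper:
        if not second.isalpha():
            raise ValueError("digit-coded characters cannot carry a case bit")
        second = second.upper()
    first = "A" if has_property else "a"   # case of first char = property bit
    return first + second


def decode_pair(pair: str) -> tuple:
    """Recover (index, has_property, upper) from a two-character code."""
    first, second = pair
    return CODES.index(second.lower()), first.isupper(), second.isupper()


def encode_label(symbols, prefix: str = "gr--") -> str:
    """Encode a sequence of (index, has_property, upper) triples as one label."""
    return prefix + "".join(encode_symbol(*s) for s in symbols)
```

All four variants of character 0 ("aa", "Aa", "aA", "AA") fold to the same lower-case string, so DNS treats them as one name while the application can still recover both bits from the case.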

NB: What the property bit is may differ by character in the original
language, and/or context of usage.
E.g. it is my understanding that the Greek sigma can vary by upper/lower
case or by terminal/non-terminal position, but not both, so while the
property may be context dependent, the mapping is deterministic given that
context.
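As an illustration of a context-determined property, the sketch below picks the sigma form purely from position. It is a deliberate simplification that treats the end of the label as the end of a word, which is not the full Unicode rule; the function name is made up for the example.

```python
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA


def place_sigma(label: str) -> str:
    """Use final sigma only in the last position of the label (simplified)."""
    body = label.replace(FINAL_SIGMA, SIGMA)
    if body.endswith(SIGMA):
        body = body[:-1] + FINAL_SIGMA
    return body
```

Since the form follows from position alone, an encoder would not need to spend a bit on it, and a decoder could restore it after the fact.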

The rest of RFC 3490 and RFC 3492 would need to be -bis'd to handle the
additional logic and ACE prefix(es).
E.g. "If all (one-script) then (use specific ACE prefix and mapping) else
(use standard "xn--" prefix and mapping)". The reverse mapping would
obviously be completely deterministic, based on the ACE prefix.

It might not make us popular with the IDN folks, but I bet it would be well
received by those using such scripts...

Brian