Re: [apps-discuss] CONTEXTJ in TLD DNS-Labels (draft-liman-tld-names-05)

Behnam Esfahbod <behnam@esfahbod.info> Wed, 20 July 2011 17:42 UTC

In-Reply-To: <85FB14D637D54FBC5A95D68E@PST.JCK.COM>
References: <B464B2C6607E04FD0572AA74@192.168.1.128> <CANp6Ttw4MaAJy2VRvZ8929oBju9jL3b69PkSyFLi-SC4YaNTnw@mail.gmail.com> <85FB14D637D54FBC5A95D68E@PST.JCK.COM>
From: Behnam Esfahbod <behnam@esfahbod.info>
Date: Wed, 20 Jul 2011 13:41:29 -0400
X-Google-Sender-Auth: vpcT5lEUCxXaw_6ouX7SmQfLIqM
Message-ID: <CANp6Ttxjpye3odm+8gNfH5iMUpeL1kqQ2JpyOeVdho2mp4HWeQ@mail.gmail.com>
To: John C Klensin <john-ietf@jck.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Cc: Siavash Shahshahani <shahshah@nic.ir>, apps-discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] CONTEXTJ in TLD DNS-Labels (draft-liman-tld-names-05)

Dear John,

I trust you understand that the Persian language and the Arabic
script have already faced many more problems than other major scripts
(the left-to-right ones) because of the script's nature: it is
bidirectional and has contextual joining.  I believe we have
sacrificed enough, and there should be a good reason for any further
limitations on this script and language.

We greatly appreciate the work that you all have done on
IDNA2008, but let's not forget that IDNA2003 *was broken* for many
languages and IDNA2008 was *fixing* it.  I think IDNA2003 was broken
because it was not challenged enough.  In fact, this is a very good
example of why RFCs should not "simply" (or "blindly") ignore
characters at each level of protocols and standards.

The other issue is that, in IDNA2003, users could at least enter a
word that included ZWNJ (like [ARABIC BEH, ZWNJ, ARABIC ALEF]) as a
label, and ZWNJ was "removed" before the label was resolved. Yes, the
final label would not be the same as the one entered, but users
"could" use words with ZWNJ in URLs or in a browser's address bar.
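
To make that IDNA2003 behaviour concrete, here is a minimal sketch
using Python's built-in "idna" codec, which follows the IDNA2003
(Nameprep) rules, not IDNA2008.  The choice of Python and the variable
names are assumptions made only for illustration.

```python
import stringprep

ZWNJ = "\u200c"   # ZERO WIDTH NON-JOINER
BEH = "\u0628"    # ARABIC LETTER BEH
ALEF = "\u0627"   # ARABIC LETTER ALEF

with_zwnj = BEH + ZWNJ + ALEF
without_zwnj = BEH + ALEF

# Stringprep table B.1 ("commonly mapped to nothing") contains ZWNJ,
# so Nameprep silently drops it.
zwnj_is_mapped_away = stringprep.in_table_b1(ZWNJ)

# The stdlib "idna" codec applies Nameprep before Punycode, so both
# inputs end up as the same ASCII label.
a_label_with = with_zwnj.encode("idna")
a_label_without = without_zwnj.encode("idna")
same_label = a_label_with == a_label_without
```

Under these rules the user can type the word with ZWNJ, but the label
that is actually looked up is the one without it.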

Now let me explain why I believe there is no "good reason" to not have
ZWNJ in the TLD labels:

1. We know that in many parts of the DNS technology, TLD labels are
"used" (evaluated, cached, etc.) in A-label form. I think we all agree
that there is no difference between PVALID and CONTEXTJ characters in
A-label form, thus there is no problem with ZWNJ in A-labels.

2. TLD labels are evaluated and delegated in both A-label and U-label
forms. We should note that we don't expect everybody responsible for
these processes (evaluation, delegation, root-zone management, etc.) to
be able to understand all the languages that will ever be used in
Internet TLDs, and that's why A-labels are very important in the
root zone.

3. Now let's consider the cases in which Arabic TLDs will be "used" in
U-label form.

3.1. User's understanding of Arabic script:
3.1.1. If the user understands Arabic script, then there wouldn't be
any problem, as the IDNA2008 rules for ZWNJ make sure that ZWNJ is
"visible", thus the user can "read" the label correctly.
3.1.2. If the user does not understand Arabic script, the shape
of the letters would still be different (thanks to IDNA2008's rules
for ZWNJ), thus the result would still be "visible".

3.2. Suitable font and rendering engine:
3.2.1. If there is no suitable font with Arabic support, or the
rendering engine does not support Arabic, the user would have problems
with any Arabic IDN, whether or not it includes ZWNJ.
3.2.2. If there is a suitable font and the rendering engine
implements the Arabic contextual joining algorithm, then there would
not be any problem with ZWNJ and it would be "visible" to the user.

4. And finally, as I mentioned in the other thread (shared with the
VIP-Arabic team), there are many more possible security risks when
using only PVALID Arabic characters.  So why start with CONTEXTJ
(ZWNJ and ZWJ) and stop right there?
4.1. If this RFC is required to make sure TLD labels are secure "all
the way", there is still a lot of work to be done and we should extend
it to cross-script issues (like the case of .py) as well.
4.2. If we agree that it is not possible to take care of all the
security risks of the characters of all major scripts/languages in
a few RFCs, why is ZWNJ different from the other characters?
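
As an illustration of points 1 and 3 above, here is a minimal sketch
of the IDNA2008 contextual rule for ZWNJ (RFC 5892, Appendix A.1) and
of the A-label form of a label that contains ZWNJ.  The function
names and the tiny joining-type table are hypothetical; a real
implementation would load the full Unicode Joining_Type data and run
every other IDNA2008 check as well.

```python
import unicodedata

ZWNJ = "\u200c"
VIRAMA_CCC = 9  # Canonical_Combining_Class value named "Virama"

# ASSUMPTION: hand-written excerpt of Unicode Joining_Type values,
# just enough for the examples (D = dual, R = right, T = transparent).
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0647": "D",  # ARABIC LETTER HEH
    "\u06cc": "D",  # ARABIC LETTER FARSI YEH
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062f": "R",  # ARABIC LETTER DAL
    "\u064e": "T",  # ARABIC FATHA
}


def joining_type(ch: str) -> str:
    """Return the joining type, or 'U' (non-joining) if not in the excerpt."""
    return JOINING_TYPE.get(ch, "U")


def zwnj_allowed(label: str, i: int) -> bool:
    """Check the ZWNJ context rule for the ZWNJ at index i of label."""
    if label[i] != ZWNJ:
        raise ValueError("label[i] is not ZWNJ")
    # Rule 1: ZWNJ directly after a virama is allowed.
    if i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA_CCC:
        return True
    # Rule 2: (L|D) T* ZWNJ T* (R|D), i.e. ZWNJ breaks a real join.
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return k < len(label) and joining_type(label[k]) in ("R", "D")


def unchecked_a_label(u_label: str) -> str:
    """Punycode-encode a label with the ACE prefix; no IDNA validation."""
    return "xn--" + u_label.encode("punycode").decode("ascii")


beh_zwnj_alef = "\u0628" + ZWNJ + "\u0627"
alef_zwnj_beh = "\u0627" + ZWNJ + "\u0628"
```

With this sketch, zwnj_allowed(beh_zwnj_alef, 1) is true because BEH
would otherwise join to ALEF, so the ZWNJ changes the rendered shape.
zwnj_allowed(alef_zwnj_beh, 1) is false because ALEF never joins to
the following letter, so a ZWNJ there would be invisible.  The
A-label from unchecked_a_label(beh_zwnj_alef) is plain ASCII, like
any other A-label.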

I hope these points make it clear why there is no good reason to
"disallow" CONTEXTJ characters (especially ZWNJ) in TLD labels in
general, and that security risks should be considered on a
case-by-case basis, as would happen for many PVALID characters (in
the Arabic script, and most probably in every other script).

Thanks to all for the comments and support,
-Behnam




On Tue, Jul 19, 2011 at 9:24 AM, John C Klensin <john-ietf@jck.com> wrote:
> Behnam,
>
> I'm sorry I was not clear.   Let me try again, first by
> reference to Patrik's comment: independent of how ICANN has
> formulated the variant investigation, the question remains "what
> is safe across all scripts" and not "what does a particular
> language need".  The ASCII ("English") examples were not
> intended to justify the situation, only to point out that
> restrictions have been with us for a very long time and that one
> of those restrictions is that a string being a valid word in
> some language does not create an entitlement to use that string
> as a DNS label... and never has.  In retrospect, the terms
> "domain name system", and the earlier "hostname" are misleading.
> Precision would have called for substituting something like
> "mnemonic" for "name".
>
> Second, while your detailed explanation is appreciated, we fully
> understand the importance of ZWNJ to writing Persian (and most
> non-Arabic language use of Arabic script) and, although the use
> is a little different, the importance of ZWNJ and ZWJ in writing
> most Indic scripts.  CONTEXTJ was not included in IDNA2008 by
> some magical accident: we (including both Patrik and myself)
> fought to include it in the standard precisely to facilitate
> those uses.
>
> But, examples, explanations, and language requirements aside,
> the issue remains one of whether those characters are safe in
> the root.  With the understanding that this is just my opinion,
> part of that safety evaluation is that the root zone almost
> certainly should have a clear and simple set of rules, rules
> that are easily checked and enforced by the various types of
> (language-independent) software that call on the DNS.  While one
> could imagine a large collection of rules based on a model of
> "determine the script, guess at the language, and then interpret
> and render accordingly", it is almost certainly not feasible
> even if ICANN agrees to use self-discipline about single-script
> labels.  First, the DNS and IDNA do not support explicit
> language information and heuristics to determine language that
> work well with moderate or large blocks of text are not reliable
> when strings are only a few characters long.  Second, and
> equally important, we know that complex procedures based on
> layers of tables are rarely implemented correctly.
>
> So, again returning to one of the implications of Patrik's note:
> please assume that we understand the importance of this
> character to most of the languages that use Arabic script (and
> to most of the languages that use several of the Indic scripts)
> and that, in case knowing this is helpful, we understood it long
> before ICANN created the VIP program.  We also understand its
> importance regardless of how (or whether) "variants" (whatever
> that means in the general case) are supported.  The question is
> whether the use of characters that, among other things, become
> invisible if the wrong rendering engine is chosen, is safe in a
> root context or can be made safe by a plausible, understandable,
> and, if appropriate, enforceable set of rules.
>
> regards,
>   john
>
>
>
>
> --On Monday, July 18, 2011 21:01 -0400 Behnam Esfahbod
> <behnam@esfahbod.info> wrote:
>
>> Hi John,
>>
>> I think it is time to stop general pronouncements that have
>> been repeated and repeated so many times over these past years
>> and get down to specifics.  Here are two very concrete points
>> you should note:
>>
>> 1. ZWNJ is not a special quirk of Persian language, it is not
>> a mnemonic tool,    nor is it an optional writing-style
>> device.  ZWNJ is used in the writing of    MOST languages
>>...
>
>



-- 
    '     بهنام اسفهبد
    '     Behnam Esfahbod
   '      http://behnam.esfahbod.info
  *  ..   http://zwnj.org/
 *  `  *  http://persian-computing.ir
  * o *   3E7F B4B6 6F4C A8AB 9BB9 7520 5701 CA40 259E 0F8B