Re: [I18nrp] [art] Use Unicode if Using Unicode?

Asmus Freytag <asmusf@ix.netcom.com> Fri, 12 October 2018 11:43 UTC

To: i18nrp@ietf.org
References: <MW2PR2101MB0908F009734817997508274282E00@MW2PR2101MB0908.namprd21.prod.outlook.com> <FB4FE0D631E6F6D4C72B19A1@PSB> <MW2PR2101MB0908D4D3EB13FFAA07AD610682E10@MW2PR2101MB0908.namprd21.prod.outlook.com> <5bc0288a.1c69fb81.73f2f.0213@mx.google.com>
From: Asmus Freytag <asmusf@ix.netcom.com>
Message-ID: <9ae4b82a-f7e3-35cd-ef40-cf5fc7287c70@ix.netcom.com>
Date: Fri, 12 Oct 2018 01:12:37 -0700
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1
MIME-Version: 1.0
In-Reply-To: <5bc0288a.1c69fb81.73f2f.0213@mx.google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/ThTj7wJvdHYego81HaGgzFnerM4>
Subject: Re: [I18nrp] [art] Use Unicode if Using Unicode?

On 10/11/2018 9:52 PM, Larry Masinter wrote:
>
> No matter what path you take for new RFCs, implementations will lag,
> and some choices for strings used for domain names, (LHS of) email
> addresses and short URLs will have problems:
>
>  1. When put through different implementations of normalization
>     (IDNA2008 vs 2003 vs. some exceptions)
>  2. When used in user interfaces for comparison or validation
>     (in the address bar or rollover links)
>  3. When scanning text to pick out URLs
>  4. When used in print or other media (in an ad or written on a napkin)
>
> These are in order of decreasing specificity, increasing “liberal 
> displacement”.
>

That sounds vaguely political :)

Not sure what you mean by that expression, actually.

> I think those who are choosing names to use (as domain name, email 
> address, short url) will want to use (4), no matter what rules the 
> registry applies. Anyone registering a domain name, a user name at a 
> mail host, or a (short) URL will want to choose names that they can 
> print on a business card or an ad or a roster and believe that their 
> target audience can enter and get the string intended.
>
> And 1 < 2 < 3 < 4 as equivalence goes.
>
> I don’t think it’s practical to redo all the protocols that use these 
> strings to pass along a tuple of string and context.
>
You can avoid certain problems by limiting what a registry (or mail
server) allows to be registered. This limitation does not belong in the
protocol; it is a separate layer of the architecture (label generation
rules).

With label generation rules you can restrict strings to those that can
be rendered unambiguously. Should display technology change over time,
you can relax those constraints without affecting other implementations
of the protocol.
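
As a rough illustration of that layering, here is a minimal sketch of a
registry-side check. The repertoires (STRICT, RELAXED) and the function
name are hypothetical, chosen only for this example; a real set of label
generation rules would be considerably richer (variants, context rules,
whole-label evaluation).

```python
import unicodedata

# Hypothetical repertoires: illustrative assumptions, not taken from any
# published set of label generation rules.
STRICT = frozenset("abcdefghijklmnopqrstuvwxyz0123456789-")
RELAXED = STRICT | frozenset("äöü")


def is_registrable(label: str, repertoire: frozenset) -> bool:
    """Return True if every code point of the NFC-normalized label is in
    the registry's repertoire and the label has no leading/trailing hyphen."""
    label = unicodedata.normalize("NFC", label)
    if not label or label[0] == "-" or label[-1] == "-":
        return False
    return all(ch in repertoire for ch in label)


if __name__ == "__main__":
    for candidate in ("muenchen", "münchen"):
        print(candidate,
              is_registrable(candidate, STRICT),
              is_registrable(candidate, RELAXED))
```

The point of the sketch is that moving from STRICT to RELAXED is a change
in registry policy only; nothing in the protocol or in other
implementations has to change.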

Many of these kinds of constraints are (largely) generic to certain 
scripts. It would therefore be possible to address them by informational 
RFCs laying out "best practices" that can serve as a starting point.

To some extent, similar mechanisms can be used to address discrepancies
between IDNA2003 and IDNA2008: in some cases the affected code points
could either be restricted or treated as equivalent.
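
A minimal sketch of both options, assuming the four code points usually
cited as handled differently by IDNA2003 and IDNA2008 (sharp s, final
sigma, ZWNJ, ZWJ). The function names and the choice of folding target
are hypothetical and only meant to show the two policies side by side.

```python
# Code points that IDNA2003 mapped or removed but IDNA2008 can allow.
# The right-hand side is what IDNA2003 processing turned them into.
DEVIATIONS = {
    "\u00df": "ss",      # LATIN SMALL LETTER SHARP S
    "\u03c2": "\u03c3",  # GREEK SMALL LETTER FINAL SIGMA -> SIGMA
    "\u200c": "",        # ZERO WIDTH NON-JOINER
    "\u200d": "",        # ZERO WIDTH JOINER
}


def is_unaffected(label: str) -> bool:
    """Option 1 (restrict): accept only labels with no deviation code point."""
    return not any(ch in DEVIATIONS for ch in label)


def variant_key(label: str) -> str:
    """Option 2 (equivalence): fold deviation code points so that labels
    differing only in them share one key."""
    return "".join(DEVIATIONS.get(ch, ch) for ch in label)


def conflicts_with_existing(new_label: str, registered: list) -> bool:
    """True if new_label folds to the same key as an already registered label."""
    key = variant_key(new_label)
    return any(variant_key(existing) == key for existing in registered)


if __name__ == "__main__":
    print(is_unaffected("straße"))
    print(variant_key("straße"))
    print(conflicts_with_existing("strasse", ["straße"]))
```

Under the first policy a registry simply refuses such labels; under the
second it would allow them but treat, say, the sharp-s and "ss" spellings
as one registration, so the two cannot end up with different owners.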

Am I on the right track, or were you trying to address something different?

A./

> Larry
>
> --
>
> https://LarryMasinter.net
>