Re: [I18nrp] [art] Use Unicode if Using Unicode?

Larry Masinter <LMM@acm.org> Fri, 12 October 2018 04:52 UTC

Message-ID: <5bc0288a.1c69fb81.73f2f.0213@mx.google.com>
Sender: Larry Masinter <masinter@gmail.com>
MIME-Version: 1.0
To: "art@ietf.org" <art@ietf.org>, "i18nrp@ietf.org" <i18nrp@ietf.org>
From: Larry Masinter <LMM@acm.org>
Date: Thu, 11 Oct 2018 21:52:28 -0700
Importance: normal
X-Priority: 3
Thread-Topic: [art] Use Unicode if Using Unicode?
In-Reply-To: <MW2PR2101MB0908D4D3EB13FFAA07AD610682E10@MW2PR2101MB0908.namprd21.prod.outlook.com>
References: <MW2PR2101MB0908F009734817997508274282E00@MW2PR2101MB0908.namprd21.prod.outlook.com> <FB4FE0D631E6F6D4C72B19A1@PSB> <MW2PR2101MB0908D4D3EB13FFAA07AD610682E10@MW2PR2101MB0908.namprd21.prod.outlook.com>
Content-Type: multipart/alternative; boundary="_27E5E152-D49A-46EE-9DFB-47591A9551FD_"
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/WV56FTNVdg8qXwfOKMqAgHHT5T4>
Subject: Re: [I18nrp] [art] Use Unicode if Using Unicode?

No matter what path you take for new RFCs, implementations will lag, and some choices of strings used for domain names, (LHS of) email addresses and short URLs will have problems:
1) When put through different implementations of normalization (IDNA2008 vs. IDNA2003 vs. some exceptions)
2) When used in user interfaces for comparison or validation (in the address bar or rollover links)
3) When scanning text to pick out URLs
4) When used in print or other media (in an ad or written on a napkin)
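
To make (1) concrete, here is a minimal sketch, not part of the original message. It relies on Python's built-in "idna" codec, which implements the IDNA2003 rules; the helper name idna2003_ascii and the sample names are made up for illustration. Under IDNA2003 the sharp s is mapped to "ss", whereas IDNA2008 treats it as a valid character in its own right, so an IDNA2008 implementation would produce a different (Punycode) label for the same input.

```python
# Illustration only: the stdlib "idna" codec applies IDNA2003 (nameprep) mapping.
# The helper name and the sample domain names are hypothetical.

def idna2003_ascii(name: str) -> bytes:
    """Return the ASCII form of a domain name under IDNA2003 rules."""
    return name.encode("idna")


if __name__ == "__main__":
    samples = [
        "fa\u00df.example",  # "faß": sharp s is mapped to "ss" by IDNA2003
        "\uff25\uff38\uff21\uff2d\uff30\uff2c\uff25.example",  # fullwidth "EXAMPLE"
    ]
    for name in samples:
        print(repr(name), "->", idna2003_ascii(name))
```

A name that survives one mapping unchanged may come out as a different name under another, which is the kind of lag-induced mismatch described in (1).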

These are in order of decreasing specificity and increasing “liberal displacement”.
I think those who are choosing names to use (as a domain name, email address, or short URL) will want to use (4) as the test, no matter what rules the registry applies. Anyone registering a domain name, a user name at a mail host, or a (short) URL will want to choose names that they can print on a business card or an ad or a roster and believe that their target audience can enter and get the string intended.
And 1 < 2 < 3 < 4 as equivalence goes.
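
One way to picture that ordering, as a rough sketch only: the three comparison keys below (exact match, NFC, NFKC plus case folding) are stand-ins chosen for illustration and are not the four contexts listed above. Each coarser key folds more distinct strings into the same equivalence class. The function names and sample strings are hypothetical.

```python
import unicodedata

# Three comparison keys, from strictest to most liberal.
# These are illustrative stand-ins, not the rules of any particular protocol.

def exact(s: str) -> str:
    return s

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def nfkc_casefold(s: str) -> str:
    folded = unicodedata.normalize("NFKC", s).casefold()
    return unicodedata.normalize("NFKC", folded)

def class_count(key, names) -> int:
    """Number of distinct equivalence classes the key produces."""
    return len({key(n) for n in names})

NAMES = [
    "caf\u00e9.example",                    # precomposed e-acute
    "cafe\u0301.example",                   # "e" followed by combining acute
    "\uff23\uff21\uff26\u00c9.example",     # fullwidth "CAF" plus capital E-acute
]

if __name__ == "__main__":
    for key in (exact, nfc, nfkc_casefold):
        print(key.__name__, class_count(key, NAMES))
```

The more liberal the comparison, the fewer distinct names remain, so a name chosen to be safe under the most liberal context is also safe under the stricter ones.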

I don’t think it’s practical to redo all the protocols that use these strings to pass along a tuple of string and context.

Larry
--
https://LarryMasinter.net