Re: [regext] [I18ndir] [Last-Call] last call reviews of draft-ietf-regext-epp-eai-12 (and -15)

Asmus Freytag <asmusf@ix.netcom.com> Tue, 27 September 2022 03:00 UTC

Message-ID: <ab95711d-88d3-7951-9edb-4515da1745d3@ix.netcom.com>
Date: Mon, 26 Sep 2022 20:00:36 -0700
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, John C Klensin <john-ietf@jck.com>, "Gould, James" <jgould=40verisign.com@dmarc.ietf.org>, beldmit@gmail.com
Cc: paf@paftech.se, art@ietf.org, draft-ietf-regext-epp-eai.all@ietf.org, last-call@ietf.org, regext@ietf.org, i18ndir@ietf.org, gen-art@ietf.org
References: <DFBC3847-F489-42D8-AB28-A22F25BC7CD9@verisign.com> <1C6C85AAF7DA5EFEDCED0BB0@PSB> <cc7308a2-69a1-5490-d418-453fe1a20d58@it.aoyama.ac.jp>
From: Asmus Freytag <asmusf@ix.netcom.com>
In-Reply-To: <cc7308a2-69a1-5490-d418-453fe1a20d58@it.aoyama.ac.jp>
Archived-At: <https://mailarchive.ietf.org/arch/msg/regext/crmaHuBOfU4XZlFFrtrYfa2gH9o>
List-Id: Registration Protocols Extensions <regext.ietf.org>

On 9/26/2022 12:31 AM, Martin J. Dürst wrote:
> Very sorry to be late with my reply, and for not replying to the 
> latest posting from John Klensin in this thread.
>
> On 2022-09-14 04:03, John C Klensin wrote:
>> James,
>>
>> My apologies for not having responded to your note sooner.
>> I've been preoccupied with several unrelated things.
>>
>> I greatly appreciate the changes to use an existing EPP
>> extension framework and to correct the terminology error of EAI
>> -> SMTPUTF8.   I agree that the more substantive SMTPUTF8
>> technical issues should go back to the WG.
>>
>> However, in order that the discussion you suggest for IETF 115
>> be useful and not just lead to another round of heated Last Call
>> discussions, I think that, for the benefit of those who have
>> been following the discussion closely and those who should have
>> been, it is important to be clear about what the disagreement is
>> about.  When you characterize the issue as "e-mail cardinality",
>> it makes it sound, at least to me (maybe everyone in the WG has
>> a better understanding) like this is some subtle technical
>> matter.
>>
>> It really isn't.  The EAI WG was very clear during the
>> development of the SMTPUTF8 standards that the biggest problems
>> with non-ASCII email addresses were going to be with user agents
>> (MUAs) (and, to some degree, with IMAP and POP servers that are
>> often modeled as part of MUAs) and not with SMTP transport over
>> the Internet.  Making an MUA tailored to one particular language
>> and script (in addition to ASCII), or even a handful of them, is
>> fairly easy.  Making one that can deal well with all possible
>> SMTPUTF8 addresses is very difficult (some would claim
>> impossible, at least without per-language, or
>> per-language-group, plugins or equivalent).
>
> I very strongly think that "an MUA that can deal well with all 
> possible SMTPUTF8 addresses" is a red herring.
>
> First, as far as backing store (in-memory representation) is 
> concerned, any implementation that is able to handle full Unicode and 
> SMTPUTF8 will be fine; there's no dependency there on natural 
> languages or scripts. And because these days most MUAs will use
> user-interface tool kits or OS components that support Unicode, for
> most MUAs that part may come essentially for free. This leaves the
> logic of "if non-ASCII in LHS of email address, then use SMTPUTF8, 
> otherwise not" and the transcoding from the internal Unicode 
> representation (possibly UTF-16) to and from UTF-8 (available as a 
> library function). So on this level, an MUA that is able to deal with 
> SMTPUTF8 is able to deal with all possible SMTPUTF8 addresses, or 
> otherwise it's very badly written.
Thank you for putting this so clearly. I had assumed that to be true,
but didn't want to say anything because I'm not specifically conversant
with the e-mail protocols. The situation you describe is now pretty
much the standard for any type of application that "supports Unicode",
for whatever purpose. That makes carving out exceptions for particular
kinds of Unicode strings rather less well motivated.
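A minimal Python sketch of the logic Martin describes, for illustration only. The function names are hypothetical; the address is split at its last "@" without full RFC 5321 parsing; a non-ASCII domain is assumed to be converted to an A-label elsewhere, so only a non-ASCII local part (LHS) triggers SMTPUTF8 here; and UTF-16LE stands in for a typical in-memory representation.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split at the last '@' into (local part, domain); no full RFC 5321 parsing."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a local@domain address: {address!r}")
    return local, domain


def needs_smtputf8(address: str) -> bool:
    """Request SMTPUTF8 only when the local part (LHS) contains non-ASCII.

    Assumption: a non-ASCII domain is handled elsewhere by converting it
    to an A-label, so it does not trigger SMTPUTF8 in this sketch.
    """
    local, _domain = split_address(address)
    return not local.isascii()


def utf16_to_utf8(buffer: bytes) -> bytes:
    """Transcode an in-memory UTF-16LE buffer to UTF-8 for the wire."""
    return buffer.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16(buffer: bytes) -> bytes:
    """Transcode UTF-8 received from the wire back to UTF-16LE."""
    return buffer.decode("utf-8").encode("utf-16-le")


if __name__ == "__main__":
    for addr in ["user@example.com", "ユーザー@example.jp"]:
        print(addr, needs_smtputf8(addr))
```

Nothing in these functions depends on the script of the address: a Cyrillic or Devanagari local part goes through exactly the same path as a Japanese one.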
>
> Second is the level of display. Here again, it's important to 
> understand that MUA implementers will just use a tool kit, which 
> includes a rendering library (such as harfbuzz) that takes care of all 
> the glyph selection and shaping details. And it will use (via that 
> library) the fonts available on the OS. If the necessary font is not 
> available (e.g. for scripts just recently added to Unicode), then 
> square boxes or question marks or something similar will be displayed, 
> but it should still be possible to copy an address from a browser to 
> an (SMTPUTF8-capable) MUA and send the mail. The same goes for rendering
> variations; the browser may show a frog with a tongue, but the MUA may
> show a frog followed by a tongue. If that's the result of copy-paste, 
> the mail should still be delivered correctly.
This is the crux. This kind of toolkit and platform support is widely
available (except for scripts that Unicode explicitly recommends
excluding from IDNs, but I see that you are getting to that part of the
argument below).
>
>
> [It is important to note here that these days, the number of email
> addresses that get copied by hand from a napkin or business card to an
> MUA is way down, and copying from one application (e.g. a browser) to
> another is the mainstream.]
Transcoding to ASCII (or using an alternate address) solves only the issue
of guaranteeing that an operator can distinguish two strings from each
other (having had to learn only one small set of symbols, in case ASCII
isn't part of their native writing system), and it's a nice fallback way
of keying in data, again for trained operators. (We are informed that
there are scripts whose native users see ASCII as a barrier.)
>
> Third, there's a saying "the better is the enemy of the good". It can 
> be abused to justify sloppiness, but in the area of 
> internationalization, it's very important. If somebody wants to use a 
> Cyrillic or Devanagari or Han (Chinese/Japanese) or Greek,... email 
> address, they don't care whether a script such as Nag Mundari (new in 
> Unicode 15.0.0, out on September 13) or some Egyptian hieroglyph 
> format controls (also new in Unicode 15.0.0) or even some Devanagari 
> characters used to represent auspicious signs found in inscriptions 
> and manuscripts (ditto) are available. Because of the very long tail of
> languages, scripts, and characters, a requirement that "all possible 
> SMTPUTF8 addresses" are covered is very counterproductive. It denies 
> the huge majority of people interested in such addresses something 
> because there may be others who aren't yet able to get it, and in turn
> will only cause additional delay for everybody.
Realistically, there is little use case for anything outside the roughly
30 scripts recommended for identifiers. There may be fewer than a dozen
of the "limited use" scripts for which there is detectable online use of
the kind that would correlate with those scripts being used for any type
of identifier (IDN, email names, user handles in social media).

I did an informal study a while back on that. So that leaves between 
half and two thirds of all scripts that are used in very constrained 
settings (digitally archiving ancient text, text examples in scholarly 
discourse and what have you, including scripts for moribund languages or 
obsolescent writing systems).

Achieving solutions that "perfectly" cover these cases, and holding up 
specifications on their account, makes them the "enemy of the good".

This is a bit of a change for Unicode. Up until about 10 versions ago, 
there were still significant additions or improvements for the modern 
repertoires of modern scripts. That has basically stopped. The only
modern writing system whose modern repertoire is still being actively
added to is emoji.

All other additions to "modern" scripts are those that are used for 
historic documents, scholarly purposes or to capture some smaller 
languages, dialects or whatever, many of them falling out of use, used 
only orally, except when documented, etc.

Those cases really aren't realistically required to "work", because if
you insist on using them, you might as well have put a box into your 
alias, because nobody other than you is likely to understand what you 
are trying to do.

On the other hand, Hindi, Ukrainian, Greek, Farsi and Korean should be 
easy to support as is and thus present no great impediments to use.

>
> So my conclusion for the draft in question is that allowing more than 
> one email address won't hurt, saying that one of them can be used for 
> an all-ASCII fallback won't hurt, but not moving the draft forward if 
> these changes are not made isn't really justified.

I wholeheartedly concur!
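To make the proposal concrete, here is a minimal sketch of how a client might use a list of addresses with an all-ASCII fallback. The function names and the smtputf8_available flag are hypothetical and are not elements of the EPP extension in the draft; the list is assumed to be ordered by preference. The sketch only illustrates that a second, all-ASCII address is trivial to select when the delivery path lacks SMTPUTF8.

```python
from typing import List, Optional


def has_ascii_fallback(addresses: List[str]) -> bool:
    """True if at least one address is entirely ASCII (usable without SMTPUTF8)."""
    return any(addr.isascii() for addr in addresses)


def select_address(addresses: List[str], smtputf8_available: bool) -> Optional[str]:
    """Return the first address usable on this delivery path, or None.

    With SMTPUTF8, the first (preferred) address is used as-is; without it,
    only an all-ASCII address qualifies.
    """
    for addr in addresses:
        if smtputf8_available or addr.isascii():
            return addr
    return None


if __name__ == "__main__":
    contact = ["ユーザー@example.jp", "user@example.jp"]
    print(select_address(contact, smtputf8_available=True))
    print(select_address(contact, smtputf8_available=False))
```

A record holding only the non-ASCII address still works wherever SMTPUTF8 is available; the fallback merely widens reach, which is why allowing it "won't hurt".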

A./

>
> Regards,   Martin.
>