Re: [I18ndir] HTML, email addresses, etc
John C Klensin <john-ietf@jck.com> Tue, 09 June 2020 05:59 UTC
Date: Tue, 09 Jun 2020 01:58:58 -0400
From: John C Klensin <john-ietf@jck.com>
To: Marc Blanchet <marc.blanchet@viagenie.ca>, John Levine <johnl@taugh.com>
cc: i18ndir@ietf.org
Message-ID: <A2B5494F35A428F832AE94AA@PSB>
In-Reply-To: <EEB31A7E-4A82-4BCF-B048-82C0BE66A3DB@viagenie.ca>
References: <20200608145452.EB3E51A4BADD@ary.qy> <EEB31A7E-4A82-4BCF-B048-82C0BE66A3DB@viagenie.ca>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/_fgq-8a8cjfxfokG38ZwHu1hLcI>
Subject: Re: [I18ndir] HTML, email addresses, etc
--On Monday, June 8, 2020 11:05 -0400 Marc Blanchet <marc.blanchet@viagenie.ca> wrote:

> On 8 Jun 2020, at 10:54, John Levine wrote:
>
>> In article <C6967F02-35D9-484B-9BF8-7436D1DF3B65@viagenie.ca>
>> you write:
>>> this is what I was going to say. IMHO, any protocol
>>> identifier (such as EAI) using something more than ASCII
>>> must define a PRECIS profile. This is the only chance to
>>> get it working. Plain UTF8 for an identifier is just plain
>>> wrong.
>>
>> That is true but I'm wondering how many profiles we need. As
>> jck pointed out elsewhere, an address with a Tamil mailbox
>> and a Chinese domain name is unlikely to work.
>
> one profile. profile will get rid of all irrelevant
> codepoints, specify the normalization form, etc… Profile
> won't go into specific scripts.

Combining a comment on your earlier note with this one:

One difficulty here is that, absent the ability to erase the differences among languages and scripts and, at the very least, to make sure everyone can conveniently read and enter every code point and sequence of code points, we have only a choice between learning as we go along (and adjusting as needed) and not learning and repeating old mistakes.

IDNA2003 (and UTR#46) were designed, more or less, around the assumptions that most Unicode graphic characters should be allowed, that normative tables were a good idea, and that case folding, NFKC, and ignoring code points that were "ignorable" would protect us from any of the associated issues without causing serious problems. As described in RFC 4690 (which would have been longer had we known then what we know now, particularly about issues normalization does not address), that didn't work out very well: names that people legitimately wanted to use were excluded, there was a great deal of potential for confusion of a kind that would affect security, the maintenance requirements were too high, and so on.
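To make the IDNA2003-era approach concrete, here is a rough Python sketch of that style of mapping: drop "ignorable" code points, case-fold, then apply NFKC. It is an approximation under stated assumptions, not Nameprep itself: `idna2003_style_fold` is a hypothetical name, the ignorable set is only a small subset of Stringprep's (RFC 3454) table B.1, and Python's `str.casefold()` and `unicodedata` use the interpreter's current Unicode version rather than the Unicode 3.2 tables IDNA2003 froze.

```python
import unicodedata

# Illustrative SUBSET of Stringprep (RFC 3454) table B.1,
# "commonly mapped to nothing"; the real table is longer.
IGNORABLE = {"\u00ad", "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def idna2003_style_fold(label: str) -> str:
    """Rough approximation of Nameprep-style mapping, NOT real Nameprep.

    Steps: drop the ignorable subset above, apply full case folding,
    then normalize to NFKC. Real Nameprep also uses Unicode 3.2 tables,
    prohibits some code points, and checks bidi rules; none of that is
    modeled here.
    """
    kept = "".join(ch for ch in label if ch not in IGNORABLE)
    return unicodedata.normalize("NFKC", kept.casefold())


# Fullwidth "Example" (U+FF25 U+FF58 ...), soft-hyphenated "Example",
# and the "fi" ligature U+FB01 all collapse to plain ASCII.
FULLWIDTH_EXAMPLE = "\uff25\uff58\uff41\uff4d\uff50\uff4c\uff45"
```

With this sketch, fullwidth "Ｅｘａｍｐｌｅ", "Ex" + SOFT HYPHEN + "ample", and plain "Example" all fold to "example", and "\ufb01le" (with the ligature) folds to "file". Mapping visibly different inputs onto one form is convenient, but it also means the stored name is not necessarily what anyone typed or saw.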
So we went back and did IDNA2008 and, in particular, tried to adapt the LDH rule of traditional host names to Unicode (as well as addressing some other issues). That adaptation made it, nearly intact, into PRECIS.

It might be worth remembering that, before any of the work that began with Stringprep and IDNA, rather strong arguments were made that domain names were protocol identifiers, that they should stay in a limited subset of ASCII, and that non-ASCII naming should be dealt with in other ways, probably at another layer, and mapped into the DNS as needed. Part of that point of view was that non-ASCII labels in the DNS would just lead to endless trouble with character equivalences, visual confusion, difficulties caused by different principles (not just different code points) among writing systems, a need for types of aliasing the DNS model doesn't support, difficulties with the DNS "feature" that octets with the high bit off were treated as ASCII (and so got ASCII behavior such as case-insensitive matching) while other octets were just octets, and so on. I wouldn't go so far as to claim that those taking that position have been proven right, but it has now been more than 17 years since the Stringprep and IDNA2003 specs were published and at least 20 since the demands for IDNs got loud and, lo, we are [still] dealing with those predicted issues. To some extent, we did IDNA at least as much because it was clear that the alternative would have been many registries and implementations making up and deploying their own incompatible, non-interoperable ideas about how to have IDNs as out of any conviction that IDNs were a good idea. And the IDNA approach was (IMO) a brilliant one for avoiding tremendous transitional disruptions at the application level.

Then we came to a demand for non-ASCII email mailbox names.
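The two DNS properties mentioned above, the traditional LDH rule and ASCII-only case insensitivity, can be sketched in a few lines of Python. The helper names `is_ldh_label` and `dns_labels_equal` are hypothetical; the comparison follows RFC 4343's rule that only the ASCII letters are case-insensitive, and the label pattern reflects RFC 952 as updated by RFC 1123, with the 63-octet limit from RFC 1035.

```python
import re

# Traditional host name (LDH) label: ASCII letters, digits, and hyphen,
# 1 to 63 characters, not starting or ending with a hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Hypothetical helper: True if label satisfies the traditional LDH rule."""
    return LDH_LABEL.fullmatch(label) is not None


def dns_labels_equal(a: bytes, b: bytes) -> bool:
    """Compare two wire-format labels with DNS's ASCII-only case folding.

    bytes.lower() changes only the ASCII letters A-Z, so octets with the
    high bit set (e.g. the UTF-8 bytes of non-ASCII characters) must match
    exactly, which is the "other octets are just octets" behavior.
    """
    return a.lower() == b.lower()
```

For example, `dns_labels_equal(b"Example", b"eXAMPLE")` is true, but `dns_labels_equal("CAFÉ".encode("utf-8"), "café".encode("utf-8"))` is false: "CAF" is folded, while É and é are simply different UTF-8 byte sequences.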
The situation was a bit similar, with some people arguing that such addresses would do nothing but cause trouble and fragment the mail system, and that personal name phrases were a more than adequate option, especially in a world in which many popular MUAs didn't display actual addresses unless coerced. And, as with IDNs around 2001 and 2002, it was clear that, if the IETF didn't take action to standardize something, we would end up with nasty interoperability problems. So what became the EAI WG was created as a mix of people who really, really wanted internationalized addresses and people who had misgivings but were concerned about the interoperability issues.

The WG drew on experience with email local parts going back to RFCs 821 and 822, and on the knowledge that there was a long history of email being used to communicate with embedded devices (some decades old), of commands transmitted in subject lines, of per-message or per-recipient backward-pointing addresses, of signed local parts, and of other strangenesses, including addresses that were deliberately extremely difficult to type. It extrapolated from the 821 rules and allowed virtually any Unicode code point in the local part, noting that, unlike the DNS situation, local parts were firmly under the control of the delivery system's management and that rules in the basic protocol specification prohibited anything else from trying to interpret addresses or transform them into other forms. The discussion is possibly worth having again, but I'm not convinced either that the very permissive rule that had WG consensus was unwise or that it should be changed, so the situation is not like the PRECIS one. However, RFC 5321 already notes that [ASCII] local parts are case-sensitive, but that any mail system that establishes "mailbox" and "mAIlbOx" as pointing to separate mailboxes, and especially to separate user accounts, is looking for trouble.
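The RFC 5321 point about "mailbox" and "mAIlbOx" can be made concrete. A relay, which is not supposed to interpret local parts, has to compare them exactly, while a provisioning tool at the delivery site might warn before creating a local part that differs from an existing one only by case. A minimal Python sketch, assuming simple unquoted addresses split at the last "@" and ASCII domains; both function names are hypothetical, not taken from any real mail system.

```python
from __future__ import annotations


def same_mailbox_for_relay(a: str, b: str) -> bool:
    """How a relay must compare two addresses (RFC 5321 local-part rule).

    Assumes simple unquoted addresses with ASCII domains. The local part
    is compared exactly (case-sensitive); the domain case-insensitively.
    """
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    return local_a == local_b and domain_a.lower() == domain_b.lower()


def case_only_collisions(new_local: str, existing: set[str]) -> list[str]:
    """Existing local parts that differ from new_local only by case.

    A hypothetical provisioning check: the protocol allows these to be
    distinct mailboxes, but creating one next to the other is the kind
    of "looking for trouble" configuration RFC 5321 warns about.
    """
    folded = new_local.casefold()
    return sorted(e for e in existing if e != new_local and e.casefold() == folded)
```

So `same_mailbox_for_relay("mailbox@example.com", "mAIlbOx@example.com")` is false, as the protocol requires, while `case_only_collisions("mAIlbOx", {"mailbox", "other"})` returns `["mailbox"]`, giving an operator the chance to refuse or warn before the trouble starts.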
If and when we get around to revising 5321/5322 (and maybe sooner), I'm going to argue that the quoting mechanisms for assorted characters that cannot be used without quoting have never worked consistently well (often because of operating system conventions and procedures not under the control of the mail system), that people don't understand them, and that the same "understand that, if you do this, you are looking for trouble" warning might reasonably be applied to the creation of such mailbox names. Suggesting (or even specifying) a PRECIS profile for what is reasonably safe would seem reasonable; pushing to require it, maybe not so much.

But then we get back to John's comment and my problem example, and to why PRECIS is useful but not sufficient. There are a number of combinations (of local parts with domains, and within local parts) that I don't expect we will ever see in the wild, because those who operate servers would not dream of allowing them except as a concession to people whose desire to be cute or to express themselves goes far enough to be a hazard to others. It is reasonable to assume that, if such addresses show up on the wire, they are associated with spam or other malicious or destructive behavior. And it is reasonable to advise server operators that allowing those combinations is generally a bad idea, both to support them when someone objects and to make the Internet a better and more reliable place.

Generally, PRECIS (like IDNA) deals with code points, not strings, and rules like "no mixed scripts" are far too blunt an instrument for the cases of interest. Instead, proper treatment of those cases actually needs to deal with individual sets of scripts and maybe even languages (my earlier notes said "writing system" rather than "script" for a reason). That means that, for mailbox names, requiring a PRECIS profile would be both overly prescriptive and inadequate. As a source of advice, it is probably quite appropriate.
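The "useful but not sufficient" point can be illustrated with a per-code-point advisory check that returns warnings rather than rejecting anything. This is a hypothetical sketch loosely in the spirit of a PRECIS profile (RFC 8264); the category list and the `advisory_issues` name are assumptions, and it is not the real IdentifierClass or FreeformClass derivation.

```python
from __future__ import annotations

import unicodedata

# Hypothetical advisory rules: Unicode general categories to warn about.
# This is NOT the PRECIS IdentifierClass/FreeformClass derivation.
WARN_CATEGORIES = {
    "Cc": "control character",
    "Cf": "format character",
    "Cn": "unassigned code point",
    "Co": "private-use code point",
    "Cs": "surrogate",
    "Zs": "space separator",
    "Zl": "line separator",
    "Zp": "paragraph separator",
}


def advisory_issues(local_part: str) -> list[str]:
    """Return human-readable warnings; an empty list means 'no objection'.

    Every check looks at the string's normalization form or at one code
    point at a time; nothing here looks at how code points combine.
    """
    issues = []
    if unicodedata.normalize("NFC", local_part) != local_part:
        issues.append("not in Normalization Form C")
    for ch in local_part:
        category = unicodedata.category(ch)
        if category in WARN_CATEGORIES:
            issues.append(f"U+{ord(ch):04X}: {WARN_CATEGORIES[category]}")
    return issues
```

The interesting input is `"p\u0430ypal"`, which uses CYRILLIC SMALL LETTER A: every code point passes and the check returns an empty list, even though the string is an obvious look-alike. Catching it needs string-level knowledge of which script and writing-system combinations make sense, which is the gap described above; a blanket "no mixed scripts" rule would catch this case but, per the argument above, is too blunt in general.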
Just my opinion, but one based on a few years of experience with email and strong memories of the EAI WG discussions.

best,
   john