Re: [Emailcore] A/S outstanding issue #51 (email addresses in HTML forms)

John C Klensin <john-ietf@jck.com> Mon, 17 October 2022 14:32 UTC

Date: Mon, 17 Oct 2022 10:32:00 -0400
From: John C Klensin <john-ietf@jck.com>
To: Alexey Melnikov <aamelnikov@fastmail.fm>, emailcore@ietf.org, John R Levine <johnl@taugh.com>
Message-ID: <7D9B45F3E50A3F0DBF3BAE98@JcK-HP5>
In-Reply-To: <fd5dc688-621f-4f1e-97fd-0231dcff2232@app.fastmail.com>
References: <20221007203938.49CCD4C1266B@ary.qy> <f4e4025f-82dc-4453-866c-8c8893f64421@app.fastmail.com> <5A01B9831F9D4C0D01CA61BB@JcK-HP5> <fd5dc688-621f-4f1e-97fd-0231dcff2232@app.fastmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/emailcore/SEMK3CQqPC_LPVTjGI3nn-Cu1sQ>
Subject: Re: [Emailcore] A/S outstanding issue #51 (email addresses in HTML forms)


--On Monday, 17 October, 2022 14:35 +0100 Alexey Melnikov
<aamelnikov@fastmail.fm> wrote:

> Hi John,
> 
> On Mon, Oct 17, 2022, at 2:25 PM, John C Klensin wrote:
>> As participant only...
> 
> Likewise.
> 
>> --On Monday, 17 October, 2022 14:00 +0100 Alexey Melnikov
>> <aamelnikov@fastmail.fm> wrote:
>> 
>>> Hi John,
>>> I agree with you that we should say a bit more about
>>> problematic cases. Possibly add something like your text
>>> after the paragraph that Ken suggested.
>>> 
>>> Some specific comments below:
>>> 
>>> On Fri, Oct 7, 2022, at 9:39 PM, John Levine wrote:
>>>> It appears that Ken Murchison  <murch@fastmail.com> said:
>>>>> I have crafted the following text for this issue:
>>> ...
>>>> If we are going to stick our foot into this swamp at all, I
>>>> think we should dive in and describe the popular ways that
>>>> non-mail systems screw up mail addresses such as
>>>> 
>>>> * Everyone assumes ASCII upper and lower case are
>>>> equivalent. Many turn addresses into all upper or all lower
>>>> before sending
>>> 
>>> Yes, I think we should do this.
>> 
>> Agreed, but "everyone" is too strong and therein lies the
>> problem.  A bit more needs to be said to discourage the
>> practices and/or to predict occasional problems when those
>> transformations are made.
> 
> I think enough systems assume ASCII case-insensitivity that
> insisting that they are not is not going to work in many
> cases. I am afraid the boat has sailed on enforcing this one.

Then someone should be proposing that we change 5321bis, not
just make a comment in the A/S.  Either way, this increases my
concern about excluding SMTPUTF8 comments/advice from the A/S.
Based on the "case sensitive local parts" requirement, the EAI
WG decided that it did not need to explicitly insist on that.
However, if we say something equivalent to "it is ok to assume
that local-parts of addresses are case-insensitive because
everyone else does", then we probably need to be clear that, in
general, that does not apply to non-ASCII addresses in either
the local-part or, if expressed in UTF-8 rather than Punycode
encoding, the domain part. The A/S already steps rather far into
that swamp by saying that Internationalized Email SHOULD be
supported in Section 2.4 (incidentally the citation there is
wrong).  And then we probably need to figure out whether those
who assume case insensitivity for ASCII also assume it for
non-ASCII Latin script strings.  A reasonable, but naive,
assumption is that it should ("after all, what difference does a
diacritical make?") but the reality is that it does not work for
many cases.  
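
To make the ASCII versus non-ASCII distinction concrete, here is a
minimal Python sketch.  The helper name ascii_lower is hypothetical
(it does not come from 5321bis or the SMTPUTF8 specs); it folds only
A-Z, which is roughly what "ASCII case-insensitive" would have to
mean, and it is contrasted with Python's Unicode-aware str.lower().
It is an illustration under those assumptions, not a recommendation.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only ASCII letters A-Z; leave every other character as-is.

    Hypothetical helper for illustration; not taken from any mail spec.
    """
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


# U+00C1 is A-acute and U+00C9 is E-acute (both upper case).
addr = "\u00c1BC@\u00c9FG.example"

ascii_folded = ascii_lower(addr)   # non-ASCII letters keep their case
unicode_folded = addr.lower()      # Unicode-aware: every letter is lowered

# The two "lowercased" forms differ, so two systems that both claim
# to treat addresses case-insensitively can still disagree.
print(ascii_folded == unicode_folded)
```

The point of the contrast is only that "case-insensitive" is not a
single well-defined operation once non-ASCII characters can appear.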

(( Example for those who have avoided immersion in the i18n
swamp: for some languages, in some localities, the upper case of
"á" (U+00E1) is "A" (U+0041).   Now, in a context in which
SMTPUTF8 addresses are allowed, what is the lower case of
"ABC@EFG"?  If one assumes, a priori, that it is an ASCII string,
then "abc@efg" is a reasonable (and correct and unique) answer.
But what if the "real" address was "ábc@éfg" and someone got
"ABC@EFG" by applying a "drop the diacritical marks when going
to upper case" rule?   The Unicode Case Mapping and Case Folding
rules prevent doing that, but the SMTPUTF8 specs don't reference
them as useful operations.   And, at the risk of invoking an
issue that brought about conflicting standards in the IDN world,
the character "ß" (U+00DF) does not have a distinct upper case
form... except when it does.  Those are just examples that should
be at least mostly understandable to those reading this: there
are cases that are arguably much worse.  ))
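
The information loss in the example above can be sketched in Python.
The strip_marks helper is hypothetical and stands in for a "drop the
diacritical marks when going to upper case" rule; it is not a
Unicode-defined operation, and the code is only an illustration under
that assumption.

```python
import unicodedata


def strip_marks(s: str) -> str:
    """Drop combining marks after canonical decomposition (NFD).

    Hypothetical stand-in for a locale rule that omits diacriticals
    on capitals; not part of Unicode case mapping.
    """
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


real = "\u00e1bc@\u00e9fg"            # the "real" address, a-acute and e-acute
shouted = strip_marks(real).upper()   # what the lossy rule produces
round_trip = shouted.lower()          # best guess at undoing it

unicode_upper = real.upper()          # Unicode case mapping keeps the marks

sharp_s = "\u00df"                    # U+00DF, LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"            # U+1E9E, LATIN CAPITAL LETTER SHARP S

print(round_trip == real)             # False: the diacriticals are gone
print(sharp_s.upper(), sharp_s.casefold(), capital_sharp_s.lower() == sharp_s)
```

Lowercasing the stripped form cannot recover the original address,
while Unicode's own upper-casing keeps the marks.  The sharp s shows
the other kind of surprise: upper-casing it with the default mapping
changes the string's length, yet a distinct capital form also exists.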

So, if we are going to say something in the A/S that essentially
changes the requirement, we'd better write it very carefully --
and probably explicitly include RFC 6530ff in its scope.
 
>>> ...

More generally, as non-ASCII email addresses (even ASCII local
parts with IDNs expressed in UTF-8 rather than Punycode) become
more prevalent, and especially if the A/S is going to put a SHOULD
on Internationalized Address support, I am becoming convinced that
we would be performing a real disservice to the international
email community, as well as nearly contradicting ourselves, by
ignoring i18n issues like the above and, in particular, by saying
"ASCII addresses" and assuming the reader will understand all of
those subtleties.

(A/S co-author hat momentarily back on.)
Ken, unless someone sees a way to avoid the i18n issues that I
don't and can quickly get what appears to be WG consensus behind
it, I believe the next draft should include (at least) a
placeholder section after the current Section 4 ("MIME and Its
Implications") called "Internationalization of Addresses and
Headers and Its Implications" or words to that effect.

And I hope that at least some of those who are actively
promoting the use of SMTPUTF8 addresses and also following this
list will do some writing rather than either expecting me to do
it or assuming the correct text will magically appear.

best,
    john

