Re: [Emailcore] A/S outstanding issue #51 (email addresses in HTML forms)

John C Klensin <john-ietf@jck.com> Mon, 17 October 2022 21:15 UTC

Date: Mon, 17 Oct 2022 17:14:57 -0400
From: John C Klensin <john-ietf@jck.com>
To: Barry Leiba <barryleiba@computer.org>
cc: Alexey Melnikov <aamelnikov@fastmail.fm>, emailcore@ietf.org, John R Levine <johnl@taugh.com>
Message-ID: <8C1D7ED1CBA4FDD103892A88@PSB>
In-Reply-To: <CALaySJJeM6myw0ZhmDp=-A-46WfutWNQdL0+iV-FXDA5HQ25Cg@mail.gmail.com>
References: <20221007203938.49CCD4C1266B@ary.qy> <f4e4025f-82dc-4453-866c-8c8893f64421@app.fastmail.com> <5A01B9831F9D4C0D01CA61BB@JcK-HP5> <fd5dc688-621f-4f1e-97fd-0231dcff2232@app.fastmail.com> <7D9B45F3E50A3F0DBF3BAE98@JcK-HP5> <CALaySJJeM6myw0ZhmDp=-A-46WfutWNQdL0+iV-FXDA5HQ25Cg@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/emailcore/YsfJ-qTwhO96CF_CMNQ69u5SILU>
Subject: Re: [Emailcore] A/S outstanding issue #51 (email addresses in HTML forms)

Barry,

In case it wasn't clear from what I wrote earlier, I think your
description/approach below, at least if I understand it
correctly, is absolutely correct.  

Put differently, I think saying anything like "if you depend on
case-insensitivity for ASCII characters, you are looking for
trouble and will not interwork well with many, maybe most, other
systems" in the A/S would be fine (and helpful).  It probably
would not even trigger my anxieties about the i18n-related
cases.  By contrast, anything that gets much closer to "it is ok
to treat addresses as case-insensitive because everyone else
does" causes me to get anxious, first because "everyone" is
hyperbole, because of those i18n issues, because we occasionally
don't remember that email is used for more than interpersonal
communication, etc.   I think it would be possible to write the
latter sort of statement without causing serious problems, but
it would need to be accompanied by a great deal of very careful
explanation.  

I read an intent to go most of the way toward the latter
statement into some of the recent notes in this thread.  If that
was not the intent, I apologize and suggest that we all need to
be precise about what we are suggesting.

thanks,
   john


--On Monday, October 17, 2022 10:53 -0400 Barry Leiba
<barryleiba@computer.org> wrote:

> Process: I think that if we change the case-sensitivity of
> local-part, we are no longer in an Internet Standard path, but
> would have to go back to Proposed Standard.
> 
> I think the best approach for us now is to leave the text in
> 5321bis that's in Section 2.4, which discourages
> case-sensitivity, to put very clear text in the AS that
> actually using case-sensitive local-part is bad for
> interoperability and will break with a lot of current software
> that assumes insensitivity, however incorrectly, and to thus
> have the AS highlight that discouragement.
> 
> The result would be that the formal grammar would still allow
> case-sensitive local-part and SMTP would still normatively
> say, "The local-part of a mailbox MUST BE treated as case
> sensitive.  Therefore, SMTP implementations MUST take care to
> preserve the case of mailbox local-parts."  (Except that the
> "BE" should be in lower case... JCK please note.)  But it also
> would still say, "However, exploiting the case sensitivity of
> mailbox local-parts impedes interoperability and is
> discouraged," and the AS would follow up on that part.
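
The comparison rule described above can be illustrated with a minimal Python sketch. It is only a sketch under stated assumptions: the helper names (split_mailbox, ascii_lower, same_mailbox) are hypothetical, the split at the last "@" is not a full RFC 5321 parser, and the domain is assumed to be an ASCII name that is compared case-insensitively as in the DNS. The local-part is left untouched and compared exactly.

```python
def split_mailbox(address: str) -> tuple[str, str]:
    """Split at the last "@" (hypothetical helper, not a full RFC 5321 parser)."""
    local_part, sep, domain = address.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"not a mailbox: {address!r}")
    return local_part, domain


def ascii_lower(text: str) -> str:
    """Lowercase only A-Z and leave every non-ASCII character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def same_mailbox(a: str, b: str) -> bool:
    """Compare local-parts exactly and domains ASCII case-insensitively."""
    local_a, domain_a = split_mailbox(a)
    local_b, domain_b = split_mailbox(b)
    return local_a == local_b and ascii_lower(domain_a) == ascii_lower(domain_b)


# The case of the local-part is preserved and significant; the domain's is not.
print(same_mailbox("Barry@Example.ORG", "Barry@example.org"))
print(same_mailbox("Barry@example.org", "barry@example.org"))
```

A system that lowercases the whole address before storing or sending it would treat the second pair as equal, which is exactly the assumption the text above discourages relying on.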
> 
> I'm working on some text to propose for the AS in line with
> what I'm suggesting.
> 
> Barry
> 
> On Mon, Oct 17, 2022 at 10:32 AM John C Klensin
> <john-ietf@jck.com> wrote:
>> 
>> 
>> 
>> --On Monday, 17 October, 2022 14:35 +0100 Alexey Melnikov
>> <aamelnikov@fastmail.fm> wrote:
>> 
>> > Hi John,
>> > 
>> > On Mon, Oct 17, 2022, at 2:25 PM, John C Klensin wrote:
>> >> As participant only...
>> > 
>> > Likewise.
>> > 
>> >> --On Monday, 17 October, 2022 14:00 +0100 Alexey Melnikov
>> >> <aamelnikov@fastmail.fm> wrote:
>> >> 
>> >>> Hi John,
>> >>> I agree with you that we should say a bit more about
>> >>> problematic cases. Possibly add something like your text
>> >>> after the paragraph that Ken suggested.
>> >>> 
>> >>> Some specific comments below:
>> >>> 
>> >>> On Fri, Oct 7, 2022, at 9:39 PM, John Levine wrote:
>> >>>> It appears that Ken Murchison  <murch@fastmail.com> said:
>> >>>>> I have crafted the following text for this issue:
>> >>> ...
>> >>>> If we are going to stick our foot into this swamp at
>> >>>> all, I think we should dive in and describe the popular
>> >>>> ways that non-mail systems screw up mail addresses such
>> >>>> as
>> >>>> 
>> >>>> * Everyone assumes ASCII upper and lower case are
>> >>>> equivalent. Many turn addresses into all upper or all
>> >>>> lower before sending
>> >>> 
>> >>> Yes, I think we should do this.
>> >> 
>> >> Agreed, but "everyone" is too strong and therein lies the
>> >> problem.  A bit more needs to be said to discourage the
>> >> practices and/or to predict occasional problems when those
>> >> transformations are made.
>> > 
>> > I think enough systems assume ASCII case-insensitivity that
>> > insisting that they are not is not going to work in many
>> > cases. I am afraid the boat has sailed on enforcing this
>> > one.
>> 
>> Then someone should be proposing that we change 5321bis, not
>> just make a comment in the A/S.  Either way, this increases my
>> concern about excluding SMTPUTF8 comments/advice from the A/S.
>> Based on the "case sensitive local parts" requirement, the EAI
>> WG decided that it did not need to explicitly insist on that.
>> However, if we say something equivalent to "it is ok to assume
>> that local-parts of addresses are case-insensitive because
>> everyone else does", then we probably need to be clear that,
>> in general, that does not apply to non-ASCII addresses in
>> either the local-part or, if expressed in UTF-8 rather than
>> Punycode encoding, the domain part. The A/S already steps
>> rather far into that swamp by saying that Internationalized
>> Email SHOULD be supported in Section 2.4 (incidentally the
>> citation there is wrong).  And then we probably need to
>> figure out whether those who assume case insensitivity for
>> ASCII also assume it for non-ASCII Latin script strings.  A
>> reasonable, but naive, assumption is that it should ("after
>> all, what difference does a diacritical make?") but the
>> reality is that it does not work for many cases.
>> 
>> (( Example for those who have avoided immersion in the i18n
>> swamp: for some languages, in some localities, the upper case
>> of "á" (U+00E1) is "A" (U+0041).   Now, in a context in which
>> SMTPUTF8 addresses are allowed, what is the lower case of
>> "ABC@EFG"?  If one assumes, a priori, that it is an ASCII string,
>> then "abc@efg" is a reasonable (and correct and unique)
>> answer. But what if the "real" address was "ábc@éfg" and
>> someone got "ABC@EFG" by applying a "drop the diacritical
>> marks when going to upper case" rule?   The Unicode Case
>> Mapping and Case Folding rules prevent doing that, but the
>> SMTPUTF8 specs don't reference them as useful operations.
>> And, at the risk of invoking an issue that brought about
>> conflicting standards in the IDN world, the character "ß"
>> (U+00DF) does not have a distinct upper case form... except
>> when it does.  Those are just examples that should be at least
>> mostly understandable to those reading this: there are cases
>> that are arguably much worse.  ))
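
The parenthetical example above can be reproduced with a short Python sketch. It rests on assumptions: strip_marks is a hypothetical helper that imitates a "drop the diacritical marks when going to upper case" rule, and Python's str.upper() and str.lower() are used as stand-ins for the default (locale-independent) Unicode case mappings. Nothing here is something the SMTPUTF8 specifications prescribe.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# The "real" address from the example: a-acute "bc@" e-acute "fg".
real = "\u00e1bc@\u00e9fg"

# Locale-style rule from the example: drop diacritics, then upper-case.
lossy_upper = strip_marks(real).upper()

# Lower-casing the result cannot recover the original address.
round_trip = lossy_upper.lower()

# The default Unicode mapping keeps the diacritics instead.
default_upper = real.upper()

# U+00DF: default upper-casing expands to two letters, so the
# upper/lower round trip does not return the original character.
sharp_s_upper = "\u00df".upper()
sharp_s_round_trip = sharp_s_upper.lower()

print(lossy_upper, round_trip, round_trip == real)
print(default_upper)
print(sharp_s_upper, sharp_s_round_trip)
```

The point of the sketch is only that "lower case of ABC@EFG" has a unique answer if the string is known to be ASCII, and no recoverable answer if it was produced from a non-ASCII address by a lossy mapping.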
>> 
>> So, if we are going to say something in the A/S that
>> essentially changes the requirement, we'd better write it
>> very carefully -- and probably explicitly include RFC 6530ff
>> in its scope.
>> 
>> >>> ...
>> 
>> More generally, as non-ASCII email addresses (even ASCII local
>> parts with IDNs expressed in UTF-8 not Punycode) become more
>> prevalent and especially if the A/S is going to put a SHOULD
>> on Internationalized Address support, I am becoming convinced
>> that we would be performing a real disservice to the
>> international email community, as well as nearly
>> contradicting ourselves, by pretending that issues like the
>> above do not exist, by ignoring the i18n issues and, in
>> particular, by saying "ASCII addresses" and assuming the
>> reader will understand all of those subtleties.
>> 
>> (A/S co-author hat momentarily back on.)
>> Ken, unless someone sees a way to avoid the i18n issues that I
>> don't and can quickly get what appears to be WG consensus
>> behind it, I believe the next draft should include (at least)
>> a placeholder section after the current Section 4 (" MIME and
>> Its Implications") called "Internationalization of Addresses
>> and Headers and Its Implications" or words to that effect.
>> 
>> And I hope that at least some of those who are actively
>> promoting the use of SMTPUTF8 addresses and also following
>> this list will do some writing rather than either expecting
>> me to do it or assuming the correct text will magically
>> appear.
>> 
>> best,
>>     john
>> 
>> 
>> --
>> Emailcore mailing list
>> Emailcore@ietf.org
>> https://www.ietf.org/mailman/listinfo/emailcore