Re: [Emailcore] A/S outstanding issue #51 (email addresses in HTML forms)

Barry Leiba <barryleiba@computer.org> Thu, 20 October 2022 18:35 UTC

Return-Path: <barryleiba@gmail.com>
X-Original-To: emailcore@ietfa.amsl.com
Delivered-To: emailcore@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 7A5D7C1522A8 for <emailcore@ietfa.amsl.com>; Thu, 20 Oct 2022 11:35:06 -0700 (PDT)
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fX6MrccL3wIJ for <emailcore@ietfa.amsl.com>; Thu, 20 Oct 2022 11:35:05 -0700 (PDT)
Received: from mail-ed1-f48.google.com (mail-ed1-f48.google.com [209.85.208.48]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 64638C14CE41 for <emailcore@ietf.org>; Thu, 20 Oct 2022 11:34:43 -0700 (PDT)
Received: by mail-ed1-f48.google.com with SMTP id q19so797765edd.10 for <emailcore@ietf.org>; Thu, 20 Oct 2022 11:34:43 -0700 (PDT)
MIME-Version: 1.0
References: <20221007203938.49CCD4C1266B@ary.qy> <f4e4025f-82dc-4453-866c-8c8893f64421@app.fastmail.com> <5A01B9831F9D4C0D01CA61BB@JcK-HP5> <fd5dc688-621f-4f1e-97fd-0231dcff2232@app.fastmail.com> <7D9B45F3E50A3F0DBF3BAE98@JcK-HP5> <CALaySJJeM6myw0ZhmDp=-A-46WfutWNQdL0+iV-FXDA5HQ25Cg@mail.gmail.com> <9b021a56-e226-3a34-3a72-933ceaf724b5@fastmail.com>
In-Reply-To: <9b021a56-e226-3a34-3a72-933ceaf724b5@fastmail.com>
From: Barry Leiba <barryleiba@computer.org>
Date: Thu, 20 Oct 2022 14:34:29 -0400
Message-ID: <CALaySJKbVOTmXijit-nZO2wasWVmoLkQCe6SF+_85Xe5zE9iQg@mail.gmail.com>
To: Ken Murchison <murch@fastmail.com>
Cc: John C Klensin <john-ietf@jck.com>, Alexey Melnikov <aamelnikov@fastmail.fm>, emailcore@ietf.org, John R Levine <johnl@taugh.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/emailcore/VBGzgYn-k7AlvJxLEOz_lUsiYoM>
Subject: Re: [Emailcore] A/S outstanding issue #51 (email addresses in HTML forms)
X-BeenThere: emailcore@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: EMAILCORE proposed working group list <emailcore.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/emailcore>, <mailto:emailcore-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/emailcore/>
List-Post: <mailto:emailcore@ietf.org>
List-Help: <mailto:emailcore-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/emailcore>, <mailto:emailcore-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Oct 2022 18:35:06 -0000

> I look forward to your proposed text.  If you can post it and/or send it
> to me, I can get it in the A/S update that I intend to post before the
> deadline on Monday.

Here is text that I proposed adding after the paragraph Ken proposed
for the new Section 3.2:

ADD

In particular, SMTP specifies that the local-part of an email address
is case-sensitive (see Section 2.4 of
[I-D.ietf-emailcore-rfc5321bis]):

   The local-part of a mailbox MUST be treated as case sensitive.
   Therefore, SMTP implementations MUST take care to preserve the case
   of mailbox local-parts.  In particular, for some hosts, the user
   "smith" is different from the user "Smith".  However, exploiting the
   case sensitivity of mailbox local-parts impedes interoperability and
   is discouraged.

While case-sensitivity is specified as an absolute requirement, it is
important to stress that most implementations do not make case
distinctions in local-parts (most treat “smith”, “Smith”, and “SMITH”
as the same), and most implementations do preserve the case that is
received (from SMTP or HTTP, from address books, or from user input).
Maximum interoperability will be achieved by keeping local-parts
unchanged (and especially making no attempt to change their case in
any way) and by assuming that local-parts that differ only in their
case probably refer to the same mailbox.  This is particularly
important for software that validates user-input fields, where case
changes are tempting, but must be avoided.

It is also important to note, as we encounter non-ASCII local-parts
over time, that case changes are both character-set dependent and
language dependent, and attempts to change case without having the
full context necessary are likely to be wrong often enough to matter.

END
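To make the form-validation guidance above concrete, here is a minimal Python sketch. The names normalize_form_address, split_address, and probably_same_mailbox are hypothetical, invented only for illustration, and the sketch assumes full addr-spec syntax checking happens elsewhere. It shows two rules: never change the case of what was received, and treat local-parts that differ only in ASCII case as probably the same mailbox. Case folding is deliberately restricted to pure-ASCII strings, because case mapping for non-ASCII text is language dependent.

```python
def normalize_form_address(raw: str) -> str:
    """Return the address as entered, minus surrounding whitespace.

    Deliberately does NOT call .lower() or .upper(): the local-part is
    case-sensitive in SMTP, so the case received must be preserved.
    """
    return raw.strip()


def split_address(addr: str) -> tuple[str, str]:
    # Split on the LAST "@", since a quoted local-part may itself contain "@".
    local, at, domain = addr.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not an addr-spec: {addr!r}")
    return local, domain


def _ascii_fold_equal(x: str, y: str) -> bool:
    # Exact match always counts; case-insensitive match only when both
    # strings are pure ASCII (no guessing about non-ASCII case mappings).
    if x == y:
        return True
    if x.isascii() and y.isascii():
        return x.lower() == y.lower()
    return False


def probably_same_mailbox(a: str, b: str) -> bool:
    """Heuristic comparison; never use the result to rewrite an address."""
    local_a, domain_a = split_address(a)
    local_b, domain_b = split_address(b)
    return _ascii_fold_equal(domain_a, domain_b) and _ascii_fold_equal(local_a, local_b)
```

With this sketch, "Smith@example.com" and "SMITH@EXAMPLE.COM" compare as probably the same mailbox, while "ábc@example.com" and "ÁBC@example.com" do not, because the sketch refuses to guess at non-ASCII case mappings. In both cases the stored address is exactly what the user typed.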

I also wonder if, somewhere, we should say that new implementations
SHOULD make local-parts that differ only in their case refer to the
same mailbox, thus strengthening the "is discouraged" from the SMTP
spec.  We might also be able to get away with moving from "is
discouraged" to some SHOULD NOT wording in SMTP, while still moving to
Internet Standard.
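For the receiving side, a new implementation following the suggested SHOULD might resolve mailboxes roughly as in the Python sketch below. MailboxTable is a hypothetical class, not taken from any existing server. It folds only ASCII local-parts and matches non-ASCII local-parts exactly. It affects only which mailbox a message is delivered to and does not alter the address as received. The ASCII-only restriction matters because non-ASCII case mapping is not reversible: in Python, "straße".upper() is "STRASSE", whose lower case is "strasse", not "straße".

```python
from typing import Dict, Optional


class MailboxTable:
    """Hypothetical mailbox directory for a receiving host.

    ASCII local-parts that differ only in case resolve to the same mailbox;
    non-ASCII local-parts are matched exactly, as received.
    """

    def __init__(self) -> None:
        self._boxes: Dict[str, str] = {}

    @staticmethod
    def _key(local_part: str) -> str:
        # Fold case only for pure-ASCII local-parts; leave anything else untouched.
        return local_part.lower() if local_part.isascii() else local_part

    def add(self, local_part: str, mailbox: str) -> None:
        key = self._key(local_part)
        if key in self._boxes:
            # Refuse to create "smith" and "Smith" as distinct mailboxes.
            raise ValueError(f"{local_part!r} collides with an existing mailbox")
        self._boxes[key] = mailbox

    def lookup(self, local_part: str) -> Optional[str]:
        return self._boxes.get(self._key(local_part))
```

In this sketch, after add("Smith", ...), both lookup("smith") and lookup("SMITH") find the mailbox, and adding "SMITH" again is rejected. After add("straße", ...), lookup("STRASSE") finds nothing rather than silently conflating the two strings.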

Barry

> On 10/17/22 10:53 AM, Barry Leiba wrote:
> > Process: I think that if we change the case-sensitivity of local-part,
> > we are no longer on an Internet Standard path, but would have to go
> > back to Proposed Standard.
> >
> > I think the best approach for us now is to leave the text in 5321bis
> > that's in Section 2.4, which discourages case-sensitivity, to put very
> > clear text in the AS that actually using case-sensitive local-part is
> > bad for interoperability and will break with a lot of current software
> > that assumes insensitivity, however incorrectly, and to thus have the
> > AS highlight that discouragement.
> >
> > The result would be that the formal grammar would still allow
> > case-sensitive local-part and SMTP would still normatively say, "The
> > local-part of a mailbox MUST BE treated as case sensitive.  Therefore,
> > SMTP implementations MUST take care to preserve the case of mailbox
> > local-parts."  (Except that the "BE" should be in lower case... JCK
> > please note.)  But it also would still say, "However, exploiting the
> > case sensitivity of mailbox local-parts impedes interoperability and
> > is discouraged," and the AS would follow up on that part.
> >
> > I'm working on some text to propose for the AS in line with what I'm suggesting.
> >
> > Barry
> >
> > On Mon, Oct 17, 2022 at 10:32 AM John C Klensin <john-ietf@jck.com> wrote:
> >>
> >>
> >> --On Monday, 17 October, 2022 14:35 +0100 Alexey Melnikov
> >> <aamelnikov@fastmail.fm> wrote:
> >>
> >>> Hi John,
> >>>
> >>> On Mon, Oct 17, 2022, at 2:25 PM, John C Klensin wrote:
> >>>> As participant only...
> >>> Likewise.
> >>>
> >>>> --On Monday, 17 October, 2022 14:00 +0100 Alexey Melnikov
> >>>> <aamelnikov@fastmail.fm> wrote:
> >>>>
> >>>>> Hi John,
> >>>>> I agree with you that we should say a bit more about
> >>>>> problematic cases. Possibly add something like your text
> >>>>> after the paragraph that Ken suggested.
> >>>>>
> >>>>> Some specific comments below:
> >>>>>
> >>>>> On Fri, Oct 7, 2022, at 9:39 PM, John Levine wrote:
> >>>>>> It appears that Ken Murchison  <murch@fastmail.com> said:
> >>>>>>> I have crafted the following text for this issue:
> >>>>> ...
> >>>>>> If we are going to stick our foot into this swamp at all, I
> >>>>>> think we should dive in and describe the popular ways that
> >>>>>> non-mail systems screw up mail addresses such as
> >>>>>>
> >>>>>> * Everyone assumes ASCII upper and lower case are
> >>>>>> equivalent. Many turn addresses into all upper or all lower
> >>>>>> before sending
> >>>>> Yes, I think we should do this.
> >>>> Agreed, but "everyone" is too strong and therein lies the
> >>>> problem.  A bit more needs to be said to discourage the
> >>>> practices and/or to predict occasional problems when those
> >>>> transformations are made.
> >>> I think enough systems assume ASCII case-insensitivity that
> >>> insisting that they are not is not going to work in many
> >>> cases. I am afraid the boat has sailed on enforcing this one.
> >> Then someone should be proposing that we change 5321bis, not
> >> just make a comment in the A/S.  Either way, this increases my
> >> concern about excluding SMTPUTF8 comments/advice from the A/S.
> >> Based on the "case sensitive local parts" requirement, the EAI
> >> WG decided that it did not need to explicitly insist on that.
> >> However, if we say something equivalent to "it is ok to assume
> >> that local-parts of addresses are case-insensitive because
> >> everyone else does", then we probably need to be clear that, in
> >> general, that does not apply to non-ASCII addresses in either
> >> the local-part or, if expressed in UTF-8 rather than Punycode
> >> encoding, the domain part. The A/S already steps rather far into
> >> that swamp by saying that Internationalized Email SHOULD be
> >> supported in Section 2.4 (incidentally the citation there is
> >> wrong).  And then we probably need to figure out whether those
> >> who assume case insensitivity for ASCII also assume it for
> >> non-ASCII Latin script strings.  A reasonable, but naive,
> >> assumption is that it should ("after all, what difference does a
> >> diacritical make?") but the reality is that it does not work for
> >> many cases.
> >>
> >> (( Example for those who have avoided immersion in the i18n
> >> swamp: for some languages, in some localities, the upper case of
> >> "á" (U+00E1) is "A" (U+0041).   Now, in a context in which
> >> SMTPUTF8 addresses are allowed, what is the lower case of
> >> "ABC@EFG"?  If one assumes, a priori, that it is an ASCII string,
> >> then "abc@efg" is a reasonable (and correct and unique) answer.
> >> But what if the "real" address was "ábc@éfg" and someone got
> >> "ABC@EFG" by applying a "drop the diacritical marks when going
> >> to upper case" rule?   The Unicode Case Mapping and Case Folding
> >> rules prevent doing that, but the SMTPUTF8 specs don't reference
> >> them as useful operations.   And, at the risk of invoking an
> >> issue that brought about conflicting standards in the IDN world,
> >> the character "ß" (U+00DF) does not have a distinct upper case
> >> form... except when it does.  Those are just examples that should
> >> be at least mostly understandable to those reading this: there
> >> are cases that are arguably much worse.  ))
> >>
> >> So, if we are going to say something in the A/S that essentially
> >> changes the requirement, we'd better write it very carefully --
> >> and probably explicitly include RFC 6530ff in its scope.
> >>
> >>>>> ...
> >> More generally, as non-ASCII email addresses (even ASCII local
> >> parts with IDNs expressed in UTF-8 not Punycode) become more
> >> prevalent and especially if the A/S is going to put a SHOULD on
> >> Internationalized Address support, I am becoming convinced that
> >> we would be performing a real disservice to the international
> >> email community, as well as nearly contradicting ourselves, by
> >> ignoring i18n issues like the above and, in particular, by
> >> saying "ASCII addresses" and assuming the reader will
> >> understand all of those subtleties.
> >>
> >> (A/S co-author hat momentarily back on.)
> >> Ken, unless someone sees a way to avoid the i18n issues that I
> >> don't and can quickly get what appears to be WG consensus behind
> >> it, I believe the next draft should include (at least) a
> >> placeholder section after the current Section 4 ("MIME and Its
> >> Implications") called "Internationalization of Addresses and
> >> Headers and Its Implications" or words to that effect.
> >>
> >> And I hope that at least some of those who are actively
> >> promoting the use of SMTPUTF8 addresses and also following this
> >> list will do some writing rather than either expecting me to do
> >> it or assuming the correct text will magically appear.
> >>
> >> best,
> >>      john
> >>
> >>
> >> --
> >> Emailcore mailing list
> >> Emailcore@ietf.org
> >> https://www.ietf.org/mailman/listinfo/emailcore