Re: Last Call: <draft-ietf-dane-openpgpkey-07.txt>

E Taylor <hagfish@hagfish.name> Mon, 15 February 2016 08:36 UTC

Message-ID: <56C18E14.8060608@hagfish.name>
Date: Mon, 15 Feb 2016 08:36:36 +0000
From: E Taylor <hagfish@hagfish.name>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130116 Icedove/10.0.12
MIME-Version: 1.0
To: John C Klensin <john-ietf@jck.com>, ietf@ietf.org
Subject: Re: Last Call: <draft-ietf-dane-openpgpkey-07.txt>
References: <56C09764.1020700@hagfish.name> <3E8BDD1E0C94F17DFD06C92C@JcK-HP5.jck.com>
In-Reply-To: <3E8BDD1E0C94F17DFD06C92C@JcK-HP5.jck.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/ietf/7qqmoghCO_BfAt3YdRn81chn5Mc>

Hello,

Thank you, John, for your detailed comments on the i18n aspect of this
draft, which I admit I hadn't fully considered.  I think you're right
that, whatever approach is taken, it would make sense to add a short
"Internationalization Considerations" section to state what the expected
interaction is between this specification and non-ASCII addresses.

More comments inline below:

> Temporarily and for purposes of discussion, assume I agree with
> the above as far as it goes (see below).   Given that, what do
> you, and the systems you have tested, propose to do about
> addresses that contain non-ASCII characters in the local-part
> (explicitly allowed by the present spec)?  Note that lowercasing
> [1] and case folding are different and produce different results
> and that both are language-sensitive in a number of cases, what
> specifically do you think the spec should recommend?  

I have not seen any specific examples of software which unintentionally
converts characters to uppercase (although I can readily imagine such
bugs/features), so I'm prepared to assume that the lowercasing logic can
be safely limited to just the input strings which include only ASCII
characters.  My idea was for the client to make a reasonable effort to
correct for a plausible (but rare) problem, so for the purposes of an
experiment I think it is acceptable if this correction does not try
anything more clever, like converting MUSTAFA.AKINCI@EXAMPLE.COM to
mustafa.akıncı@example.com (although mustafa.akinci@example.com should
be tried).
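
To make the ASCII-only rule concrete, here is a minimal Python sketch.
The function name candidate_local_parts is hypothetical and not taken
from the draft; it only shows the order in which a client might try
local-parts, not how the lookup itself is performed.

```python
from typing import List


def candidate_local_parts(local_part: str) -> List[str]:
    """Return the local-parts to try, in order (illustrative only)."""
    # Always try the local-part exactly as it was given first.
    candidates = [local_part]
    # Apply the lowercase fallback only when every character is ASCII,
    # so no language-sensitive case mapping is ever attempted.
    if local_part.isascii():
        lowered = local_part.lower()
        # Skip the second lookup if lowercasing changed nothing.
        if lowered != local_part:
            candidates.append(lowered)
    return candidates
```

Under this rule MUSTAFA.AKINCI gets a second candidate, mustafa.akinci,
while a local-part containing any non-ASCII character is only ever
tried as given.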

> Also, do you think it is acceptable to publish this document
> with _any_ suggestions about lower-casing or "try this, then try
> something else" search without at least an "Internationalization
> Considerations" section that would discuss the issues [1] and/or
> some more specific recommendation than "try lowercase" (more on
> that, with a different problem case, below).

You are right that adding such a section could be of great benefit to at
least some implementers, even if the discussion in that section is
simply "Only try lower-casing when the input is all ASCII".  If someone
can come up with something more helpful than that brief statement, then
I'd be very supportive of it.

> Dropping that assumption of agreement for discussion, I
> personally believe that this document could be acceptable _as an
> Experimental spec_ with any of the following three models, but
> not without any of them:
>
>  (i) The present "MUST not try to guess" text.
>
>  (ii) A recommendation about lowercasing along the lines
> 	you have outlined but with a clear discussion of i18n
> 	issues and how to handle them [2].
>
>  (iii) A clear statement that the experiment is just an
> 	experiment and that, for the purposes of the experiment,
> 	addresses that contain non-ASCII characters in the local
> 	part are not acceptable (note that would also require
> 	pulling the UTF-8 discussion out of Section 3 and
> 	dropping the references to RFC 6530 and friends).

Perhaps you would settle for an option (ii.v): my lowercasing
recommendation, plus a discussion of the i18n issues, with that
discussion based on the experimental restriction of applying the
lowercasing logic only to ASCII-only local-parts.  I hope that would be
in keeping with your sensible suggestions above.

> ...
> e.g., 
>    U+0066 U+006F U+0308 U+006F   and
>    U+0066 U+00F6 U+006F
> are perfectly good (and SMTPUTF8-valid) representations of the
> string "föo"    
>
> Using the same theory as your lower case approach, would you
> recommend trying first one of those and then the other [3]?

That is tempting, but I accept that it may add too much complexity to
suggest or recommend at this stage of the experiment.
I know that various ideas have been proposed for handling normalisation
of local-parts more generally, and I think we should allow that work to
progress separately, uncoupling it from the document at hand.
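
For illustration, the two code point sequences quoted above can be
compared in Python with the standard unicodedata module.  This is only
a sketch of why the two forms would be treated as different local-parts
by anything that works on the raw octets; it is not a proposal for how
the draft should normalise them.

```python
import unicodedata

# The two representations of "föo" from the quoted example.
decomposed = "\u0066\u006f\u0308\u006f"   # f, o, combining diaeresis, o
precomposed = "\u0066\u00f6\u006f"        # f, o-with-diaeresis, o

# They render alike but are different strings and different UTF-8 octets.
same_string = decomposed == precomposed
decomposed_hex = decomposed.encode("utf-8").hex()    # 666fcc886f
precomposed_hex = precomposed.encode("utf-8").hex()  # 66c3b66f

# Normalising both to the same form (NFC here) makes them compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", decomposed)
    == unicodedata.normalize("NFC", precomposed)
)

print(same_string, decomposed_hex, precomposed_hex, nfc_equal)
```

Any lookup key derived from the octets of the local-part would therefore
differ between the two forms unless both sides agree on a normalisation
first.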

> The more I think about it, the more I'm convinced that the
> specification and allowance for UTF-8 [4] in the first bullet of
> Section 3 is unacceptable without either text there that much
> more carefully describes (and specifies what to do about) these
> cases or an "Internationalization Considerations" section that
> provides the same information.  I suggest that anyone
> contemplating writing such text carefully study (not just
> reference) Section 10.1 of RFC 6530.   Of course, simply
> excluding non-ASCII local-parts from the experiment, as
> suggested in (iii) above, would be an alternative.  I have mixed
> feelings about whether it would be an acceptable one for an
> experiment.  I am quite sure it would not be acceptable for a
> standards-track document when the EAI work and/or the IETF
> commitment to diversity are considered.

I think that excluding non-ASCII local-parts from just the extra
lower-casing logic, and pointing out the complexity of case handling in
non-ASCII contexts in a separate section as you have suggested, might
address the outstanding concerns, without hindering diversity.

> ...
> [2] I note that, historically, the DNS community has been very
> reluctant to accept techniques that depend on or imply multiple
> lookups for a single perceived object and, separately, for
> "guess at this, try it, and, if that does not work, guess at
> something else" approaches.  Unless those concerns have
> disappeared, the potential for combinatorial explosion when
> lower-casing characters that may lie outside the ASCII
> repertoire is truly impressive.

That's another reasonable point, thank you.  Hopefully it is mitigated,
at least for the most part, by lower-casing only all-ASCII local-parts,
which avoids the combinatorial explosion you mention.  Also, this extra
lower-casing step will only happen in the
relatively rare situations where the input local-part contains at least
one upper-case character (although I don't know in practice how many
extra lookups that will lead to, on average).
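
As a rough illustration of the explosion described in the quoted
footnote, consider just one ambiguity: an upper-case "I" lower-cases to
"i" in most languages but to "ı" in Turkish.  The helper below is
hypothetical and deliberately ignores every other language-specific
rule; it simply enumerates the candidates a client would face if it
tried both mappings for each "I".

```python
from itertools import product
from typing import List


def dotted_and_dotless_candidates(local_part: str) -> List[str]:
    """Enumerate lower-casings where each 'I' may become 'i' or 'ı'."""
    # Each upper-case "I" contributes two options; every other
    # character contributes exactly one (its ordinary lowercase).
    options = [
        ("i", "\u0131") if ch == "I" else (ch.lower(),)
        for ch in local_part
    ]
    # The candidate count doubles with every ambiguous character.
    return ["".join(combo) for combo in product(*options)]


candidates = dotted_and_dotless_candidates("MUSTAFA.AKINCI")
print(len(candidates))
```

Two ambiguous characters already give four candidates, and each further
one doubles the count, whereas the ASCII-only rule never adds more than
one extra lookup per address.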

Best regards,
Edwin