Re: [dane] email canonicalization for SMIMEA owner names

Viktor Dukhovni <ietf-dane@dukhovni.org> Thu, 11 December 2014 20:50 UTC

Date: Thu, 11 Dec 2014 20:50:53 +0000
From: Viktor Dukhovni <ietf-dane@dukhovni.org>
To: dane@ietf.org
Message-ID: <20141211205053.GN25666@mournblade.imrryr.org>
References: <95826148-4F06-4942-87A4-2F6601BA0F90@nist.gov>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <95826148-4F06-4942-87A4-2F6601BA0F90@nist.gov>
User-Agent: Mutt/1.5.23 (2014-03-12)
Archived-At: http://mailarchive.ietf.org/arch/msg/dane/Q-Ho-v4gAauEu_Uxd50B3j8MAnc
Subject: Re: [dane] email canonicalization for SMIMEA owner names

On Thu, Dec 11, 2014 at 02:51:27PM -0500, Rose, Scott W. wrote:

> Realized the other action item I was assigned to from the interim
> meeting was email canonicalization for SMIMEA.  I believe it stems
> from Viktor Dukhovni's email to the endymail list:
> http://www.ietf.org/mail-archive/web/endymail/current/msg00134.html
> 
> I was wondering if we can borrow a page from RFC 4034 Section 6.2
> and include text in the draft Section 3, item 1 in the numbered list:
> 
>      1.   The user name (the "left-hand side" of the email address, called
>        the "local-part" in the mail message format definition [RFC2822]
>        and the "local part" in the specification for internationalized
>        email [RFC6530]), is hashed using the SHA2-224 [RFC5754]
>        algorithm (with the hash being represented in its hexadecimal
>        representation, to become the left-most label in the prepared
>        domain name.  This does not include the "@" character that
>        separates the left and right sides of the email address.  The
>        string that is used for the local part is a Unicode string
>        encoded in UTF-8 **with all upper case letters converted to their
>        corresponding lower case letters where appropriate.**
> 
> The text between the '**' is new.  The goal is to prevent a situation
> when the email address is "JRandom@example.com" and the SMIMEA is
> created using "jrandom" as the user name.  Would this be enough, or
> are there scripts where this would result in different or potentially
> conflicting owner names?

This proposal is sadly simply wrong.  There is no correct
(language-independent) canonicalization of Unicode to lower case.
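
To make the Unicode point concrete, here is a small Python sketch (my
illustration, not part of any draft).  Python's str.lower() and
str.upper() apply the default, locale-independent Unicode mappings, and
the variable names are purely illustrative:

```python
# Default (locale-independent) Unicode case mappings, as applied by
# Python's str.lower() / str.upper().

DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I
SHARP_S = "\u00df"           # LATIN SMALL LETTER SHARP S

# Default rules map "I" to "i"; Turkish orthography expects dotless i here.
default_lower_of_I = "I".lower()

# One code point becomes two: "i" followed by COMBINING DOT ABOVE (U+0307).
lowered_dotted_I = DOTTED_CAPITAL_I.lower()

# Round trips are lossy: dotless i upper-cases to plain "I", which then
# lower-cases to plain "i", not back to dotless i.
round_trip_dotless = DOTLESS_SMALL_I.upper().lower()

# Sharp s upper-cases to two letters, so it does not survive a round trip.
round_trip_sharp_s = SHARP_S.upper().lower()

if __name__ == "__main__":
    print(repr(default_lower_of_I), repr(lowered_dotted_I))
    print(repr(round_trip_dotless), repr(round_trip_sharp_s))
```

Two parties that apply different (language-specific versus default)
rules to the same localpart would therefore hash different strings.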

Nor is it appropriate to down-case even ASCII localparts, because
these are by definition case-sensitive on the wire, with any
case-folding solely at the discretion of the destination system.

I have a proposal that solves the ASCII use-case.  Sadly, little
can be done for non-ASCII Unicode; those names will just have to
be used consistently by all parties.

For all-ASCII addresses (ignoring for the moment Turkish
case-folding of "I" to a non-ASCII "dotless" "i"), the proposal
is as follows:

    * Clarification: Localparts that are not dot-atoms and
      require quoting retain the quotes when hashed.  Only
      the @domain part of the address is removed; the rest
      of the address is retained verbatim.

    * Domains that publish user SMIMEA records and intend
      for the names to be treated case-insensitively compute two
      hashes for each name:

	    SHA2-224("Frank.Jr.")            -> <base32-hash1>
	    SHA2-224(@lower:"frank.jr.") -> <base32-hash2>

      The DNS records are then: 

	<base32-hash1>.example.com. IN SMIMEA ...
	<base32-hash2>.example.com. IN CNAME <base32-hash1>

    * Domains that don't do case-insensitive delivery publish only
      the as-is form of each address without any "@lower:" prefix.

    * Clients that encounter an ASCII localpart that is not all lower-case
      try both keys, first the localpart as-is, then case-folded with
      the "@lower:" prefix.
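
The steps above can be sketched in Python roughly as follows.  This is
only an illustration under stated assumptions: the function names are
hypothetical, and the exact base32 form of the label (RFC 4648 alphabet,
padding stripped, lower-cased) is my guess, since only the placeholder
<base32-hashN> appears above.

```python
import base64
import hashlib

LOWER_PREFIX = "@lower:"  # prefix string taken from the proposal


def localpart_of(address):
    """Remove only the final "@domain"; any quotes are kept verbatim."""
    localpart, sep, _domain = address.rpartition("@")
    return localpart if sep else address


def ascii_lower(text):
    """Lower-case A-Z only; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def owner_label(hash_input):
    """SHA2-224 of the input as a base32 label (encoding details assumed)."""
    digest = hashlib.sha224(hash_input.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def publisher_hash_inputs(localpart, case_insensitive):
    """Strings a domain hashes: as-is, plus the "@lower:" form if it folds case."""
    inputs = [localpart]
    if case_insensitive and localpart.isascii():
        inputs.append(LOWER_PREFIX + ascii_lower(localpart))
    return inputs


def client_hash_inputs(localpart):
    """Strings a client tries, in order: as-is first, then the "@lower:" form."""
    inputs = [localpart]
    if localpart.isascii() and localpart != ascii_lower(localpart):
        inputs.append(LOWER_PREFIX + ascii_lower(localpart))
    return inputs


if __name__ == "__main__":
    # Owner names a client would query for this address, in order.
    for text in client_hash_inputs(localpart_of("JRandom@example.com")):
        print(owner_label(text) + ".example.com.")
```

The "@lower:" prefix keeps the case-folded hash distinct from the as-is
hash of a different mailbox whose localpart happens to be all lower-case.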
      
-- 
	Viktor.