Re: [dane] email canonicalization for SMIMEA owner names

"John Levine" <johnl@taugh.com> Thu, 11 December 2014 22:40 UTC

Date: 11 Dec 2014 22:40:07 -0000
Message-ID: <20141211224007.10592.qmail@ary.lan>
From: "John Levine" <johnl@taugh.com>
To: dane@ietf.org
In-Reply-To: <20141211220308.GH3448@localhost>
Archived-At: http://mailarchive.ietf.org/arch/msg/dane/zplY4TlC71RevHriDoypHsQdUhk
Subject: Re: [dane] email canonicalization for SMIMEA owner names

>Well, domains could publish the local-part canonicalization function
>they use, or, rather, a small index of well-known canonicalization
>functions.

Mail systems do fuzzy matching on local parts in an enormous number of
ways.  But you will find that once you get past "map it all to lower
case" the rest of them are all out at the tail of the curve.

I don't know of anyone other than Gmail that treats dots as noise
characters (bobsmith@gmail.com, Bob.Smith@gmail.com, and
B.o.B.s.M.i.T.h@gmail.com are all the same mailbox) but since Gmail is
such a large player, do they get their own special case?
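
To make the "small index of well-known canonicalization functions" idea concrete, here is a rough sketch of what such an index might look like. The index numbers, function names, and the `canonicalize` helper are all made up for illustration; nothing here comes from a draft or from any real mail system's code. The dot-stripping entry only models the Gmail behavior described above.

```python
# Hypothetical index of local-part canonicalization functions.
# The numbering and names are assumptions for illustration only.

def identity(local: str) -> str:
    """Leave the local part exactly as written."""
    return local

def lowercase(local: str) -> str:
    """The common case: map the local part to lower case."""
    return local.lower()

def lowercase_strip_dots(local: str) -> str:
    """Gmail-style: lower case, and treat dots as noise characters."""
    return local.lower().replace(".", "")

# A domain would publish one of these index values (hypothetical).
CANONICALIZERS = {
    0: identity,
    1: lowercase,
    2: lowercase_strip_dots,
}

def canonicalize(address: str, scheme: int) -> str:
    """Apply the chosen scheme to the local part; lower-case the domain."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        raise ValueError("not an addr-spec: %r" % address)
    return CANONICALIZERS[scheme](local) + "@" + domain.lower()

if __name__ == "__main__":
    for addr in ("bobsmith@gmail.com", "Bob.Smith@gmail.com",
                 "B.o.B.s.M.i.T.h@gmail.com"):
        print(addr, "->", canonicalize(addr, 2))
```

With scheme 2 all three addresses above collapse to the same string, while scheme 1 keeps the dotted forms distinct. Every additional entry in the table is one more thing each client has to implement identically.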

Personally, I think that:

a) Viktor's approach is terrible, and

2) it's the best we're going to do so we might as well use it.

R's,
John