[newprep] Directions for a Framework

Mark Lentczner <markl@lindenlab.com> Wed, 19 May 2010 15:56 UTC

From: Mark Lentczner <markl@lindenlab.com>
Date: Wed, 19 May 2010 08:56:20 -0700
Message-Id: <E9728BD9-05DE-485B-B2DB-7F3D440B49E6@lindenlab.com>
To: newprep@ietf.org
Subject: [newprep] Directions for a Framework
List-Id: Stringprep after IDNA2008 <newprep.ietf.org>

Charter issues aside, I'd like to share some thoughts about possible directions for a new framework (which would be one way to address the needs of current stringprep users):

Like stringprep, I imagine that such a framework should be limited in scope to strings that are chosen by humans, typed by humans (at some point), and then used as tokens within protocols. The aim is that when two humans enter what they believe to be the same string, the resulting prepared strings compare equal character by character. Different protocols using such a framework would be expected to adapt it to their needs, notably by adjusting the set of acceptable characters, primarily for syntactic reasons. (For example, one protocol might need to exclude U+0040 COMMERCIAL AT ('@') because it is a separator in that protocol. Another protocol might be fine with that character but need to exclude U+007C VERTICAL LINE ('|') for similar reasons.)
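To make that aim concrete, here is a minimal Python sketch. It assumes a hypothetical preparation step of case folding followed by NFKC normalization (chosen only for illustration; it is not stringprep's mapping or that of any proposed profile), followed by a per-protocol set of excluded characters. The protocol names and exclusion sets are made up to echo the examples above.

```python
from __future__ import annotations

import unicodedata

# Hypothetical per-protocol exclusion sets, echoing the examples above.
PROTOCOL_A_EXCLUDED = frozenset({"\u0040"})  # COMMERCIAL AT is a separator here
PROTOCOL_B_EXCLUDED = frozenset({"\u007C"})  # VERTICAL LINE is a separator here


def prepare(s: str, excluded: frozenset[str]) -> str:
    """Map a human-entered string to a comparable token (illustrative only)."""
    # Case-fold, then apply NFKC so that width and compatibility variants
    # (e.g. FULLWIDTH LATIN CAPITAL LETTER B) collapse to one form.
    prepared = unicodedata.normalize("NFKC", s.casefold())
    # Check exclusions on the *prepared* form, after normalization.
    bad = sorted({c for c in prepared if c in excluded})
    if bad:
        raise ValueError(f"disallowed characters: {bad!r}")
    return prepared


def same_identifier(a: str, b: str, excluded: frozenset[str]) -> bool:
    # After preparation, plain code-point-by-code-point equality suffices.
    return prepare(a, excluded) == prepare(b, excluded)
```

Checking exclusions after normalization matters: FULLWIDTH COMMERCIAL AT (U+FF20) would otherwise pass protocol A's check and only then normalize to '@'.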

In the exploratory work I did (reported here earlier), I found there were three potential ways forward:

1) Build a stringprepbis along the same lines as stringprep, but defined in terms of Unicode properties. This would insulate it from being tied to one version of Unicode, while keeping it essentially the same.
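As a rough illustration of "defined in terms of Unicode properties", the Python sketch below prohibits characters by General_Category instead of enumerating code points in fixed tables, as RFC 3454's appendices do. The category list is a hypothetical, illustrative choice, not the actual stringprep prohibition tables and not a proposal.

```python
from __future__ import annotations

import unicodedata

# Hypothetical property-based prohibition rule (illustrative category choice).
PROHIBITED_CATEGORIES = frozenset({
    "Cc",  # controls
    "Cf",  # format characters (e.g. ZERO WIDTH SPACE)
    "Cs",  # surrogates
    "Co",  # private use
    "Cn",  # unassigned, in the Unicode version this Python was built with
    "Zl",  # line separator
    "Zp",  # paragraph separator
})


def prohibited_characters(s: str) -> list[str]:
    """Return the characters of s that the property-based rule rejects."""
    return [c for c in s if unicodedata.category(c) in PROHIBITED_CATEGORIES]
```

Because the rule names properties rather than code points, moving to a newer Unicode release (Python reports its data version as `unicodedata.unidata_version`) changes the data but not the rule; a newly assigned character simply stops being `Cn`.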

2) Build a framework based on UAX #31. UAX #31 is already a framework, though it is admittedly looser in approach than stringprep. An IETF framework based on UAX #31 would probably settle on one basic method (UAX #31 has at least two) and reduce the set of options available to profiles to something closer to stringprep's mix-and-match approach.
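For a feel of the UAX #31 style, here is a minimal Python sketch of its default identifier syntax (one XID_Start character followed by XID_Continue characters), with a hypothetical `extra_continue` parameter standing in for the kind of profile adjustment UAX #31 allows. It relies on CPython's `str.isidentifier()`, which tests XID_Start for the first character and XID_Continue for the rest but additionally accepts a leading '_'; the sketch removes that exception. Normalizing to NFC first is an assumption of the sketch, not a requirement quoted from UAX #31.

```python
import unicodedata


def is_xid_start(c: str) -> bool:
    # str.isidentifier() on one character is XID_Start or "_";
    # the default UAX #31 syntax does not allow a leading "_".
    return c != "_" and c.isidentifier()


def is_xid_continue(c: str) -> bool:
    # "a" is XID_Start, so "a" + c is an identifier exactly when c is XID_Continue.
    return ("a" + c).isidentifier()


def is_profile_identifier(s: str, extra_continue: frozenset = frozenset()) -> bool:
    """Default-syntax identifier check plus hypothetical profile additions."""
    s = unicodedata.normalize("NFC", s)
    if not s or not is_xid_start(s[0]):
        return False
    return all(is_xid_continue(c) or c in extra_continue for c in s[1:])
```

A real profile would probably also constrain where added characters may appear (this sketch accepts a trailing '-', for instance).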

3) Abstract out IDNA2008's work into a new framework. This would require generalizing the work in IDNA2008, as it is defined in light of IDNA2008's specific identifier needs. (For example, the restriction to lower case is incorporated into several of the other character tests.)
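One place the lower-case restriction shows up is the "Unstable" rule in IDNA2008's derivation (RFC 5892), which treats a code point as DISALLOWED when toNFKC(toCaseFold(toNFKC(cp))) != cp. Below is a minimal Python sketch of that test, using `str.casefold()` (Unicode full case folding) for toCaseFold. The `fold` switch is a hypothetical generalization for a framework whose profiles need not require lower case; it is not part of IDNA2008.

```python
import unicodedata


def to_nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def is_unstable(cp: str, fold: bool = True) -> bool:
    """RFC 5892-style "Unstable" test for a single code point.

    fold=True mirrors IDNA2008: toNFKC(toCaseFold(toNFKC(cp))) != cp.
    fold=False (hypothetical) only checks stability under NFKC.
    """
    if len(cp) != 1:
        raise ValueError("expected a single code point")
    mapped = to_nfkc(cp)
    if fold:
        mapped = to_nfkc(mapped.casefold())
    return mapped != cp
```

With `fold=True`, every capital letter is unstable, so the case restriction is embedded in a test that is nominally about normalization stability; pulling it out into a parameter is the sort of generalization the third approach would need.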

I found that the second and third approaches are much more closely aligned with each other (given UAX #31 as of Unicode 5.2, and given a conceptual generalization of the IDNA2008 approach) than either is with the first. The stringprepbis approach would need significant work to incorporate the understanding gleaned from the original IDNA and from the work of the IDNAbis WG.

Between the second and third, the second would be less work (as the bulk of it is already done in UAX #31), but the third is likely to match IDNA2008 more closely (perhaps to the point that, in practice, IDNA2008 could be implemented as a profile of the new framework).

Either of the latter two approaches induces standards coupling: with UAX #31 or with IDNA2008, respectively. UAX #31 does have significant stability guarantees and is intended for this kind of use. IDNA2008 expressly decided not to build a framework, so the third approach would either have to depend on the wording of IDNA2008 in a way it wasn't intended, or duplicate the work and then either track it or risk divergence. A particular issue with either is the "contextual checks", which both have and which can be considered a moving target.

It should be noted that *all* approaches induce standards coupling with Unicode itself, though that is, I think, a clear aim of this group. Unicode as a whole has many different stability guarantees, of various levels, and I believe it is quite reasonable to choose which parts of Unicode to couple to in order to achieve the aims needed by IETF protocols and human nomenclature.

While I think the above discussion reveals that I lean toward the UAX #31 approach, I'm eager to learn what others think of these approaches, and to hear ideas for other ways to proceed.

	- Mark




Mark Lentczner
Sr. Systems Architect
Technology Integration
Linden Lab

markl@lindenlab.com

Zero Linden
zero.linden@secondlife.com