[newprep] Directions for a Framework

Mark Lentczner <markl@lindenlab.com> Wed, 19 May 2010 15:56 UTC

From: Mark Lentczner <markl@lindenlab.com>
Date: Wed, 19 May 2010 08:56:20 -0700
Message-Id: <E9728BD9-05DE-485B-B2DB-7F3D440B49E6@lindenlab.com>
To: newprep@ietf.org
Subject: [newprep] Directions for a Framework

Charter issues aside, I'd like to share some thoughts about possible directions for a new framework (which would be one way to address the needs of current stringprep users):

I imagine that, like stringprep, such a framework should be limited in scope to strings that are chosen by humans and typed by humans (at some point), but that then need to be used as tokens within protocols. The aim is that when two humans enter what they believe to be the same string, the resulting prepared strings compare equal character by character. It is expected that different protocols using such a framework would need to adapt it to their needs, notably by adjusting the set of acceptable characters, primarily for syntactic reasons. (For example, one protocol might need to exclude U+0040 COMMERCIAL AT ('@') because that is a separator in the protocol. Another protocol might be fine with that character, but need to exclude U+007C VERTICAL LINE ('|') for similar reasons.)
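To make the goal concrete, here is a minimal Python sketch of what "prepare, then compare character by character" could look like, with the per-protocol exclusions passed in as a profile. The function name `prepare`, the profile names, and the choice of NFKC normalization plus case folding are all assumptions made for illustration; nothing above commits the framework to those particular steps.

```python
import unicodedata

def prepare(s, excluded=frozenset()):
    """Hypothetical preparation step: normalize, then apply a profile's exclusions.

    NFKC plus case folding is an assumed choice for this sketch only.
    """
    prepared = unicodedata.normalize("NFKC", s).casefold()
    # Case folding can produce unnormalized text, so normalize once more.
    prepared = unicodedata.normalize("NFKC", prepared)
    # Check exclusions after normalization, so compatibility variants
    # (e.g. a fullwidth '@') are caught as well.
    bad = sorted(set(prepared) & excluded)
    if bad:
        names = ", ".join("U+%04X" % ord(ch) for ch in bad)
        raise ValueError("character(s) not allowed by this profile: " + names)
    return prepared

# Two hypothetical profiles that differ only in their syntactic exclusions.
AT_IS_SEPARATOR = frozenset("@")
PIPE_IS_SEPARATOR = frozenset("|")
```

With this sketch, `prepare("MARK")` and `prepare("\uff2dark")` (the second starting with a fullwidth capital M) produce the same string, while `prepare("a@b", AT_IS_SEPARATOR)` is rejected and `prepare("a@b", PIPE_IS_SEPARATOR)` is accepted.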

In the exploratory work I did (reported here earlier), I found there were three potential ways forward:

1) Build a stringprepbis along the same lines as stringprep, but defined in terms of Unicode properties. This would insulate it from being tied to one version of Unicode, while keeping it essentially the same.

2) Build a framework based on UAX #31. UAX #31 already is a framework, though it is admittedly looser in approach than stringprep. An IETF framework based on UAX #31 would probably settle on the basic method (UAX #31 has at least two), and reduce the set of options available for profiles to something closer to stringprep's mix-n-match approach.

3) Abstract out IDNA2008's work into a new framework. This would require generalizing the work in IDNA2008, as it is defined in light of IDNA2008's specific identifier needs. (For example, the restriction to lower case is incorporated into several of the other character tests.)
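As a rough illustration of how the first two options differ in mechanics, the Python sketch below puts a property-based test next to a UAX #31-style test. The category set in `ALLOWED_CATEGORIES` is invented for this example and is not a proposal. The second function leans on Python's `str.isidentifier()`, whose definition of an identifier is based on UAX #31 (with '_' additionally allowed as a start character); treating it as a stand-in for a UAX #31 profile is an assumption of the sketch.

```python
import unicodedata

# Option 1 (sketch): define the allowed set by Unicode property rather than
# by fixed tables. This particular category set is a made-up example.
ALLOWED_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Nd"})

def property_based_ok(s):
    """True if every character's General_Category is in the allowed set."""
    return all(unicodedata.category(ch) in ALLOWED_CATEGORIES for ch in s)

# Option 2 (sketch): defer to a UAX #31-style identifier definition.
# str.isidentifier() follows Python's identifier syntax, which is based
# on UAX #31 and also accepts '_' as a start character.
def uax31_style_ok(s):
    """True if s is an identifier under Python's UAX #31-based syntax."""
    return s.isidentifier()

# The answers depend on the Unicode data shipped with the runtime,
# which is the version-independence point made for option 1.
UNICODE_VERSION = unicodedata.unidata_version
```

Both tests accept "lentczner" and reject "zero.linden" (the full stop is punctuation). They disagree on "zero_linden" and on "2010": the example category set has no connector punctuation, and a UAX #31-style identifier cannot start with a digit. Differences of that kind are what a profile mechanism would have to pin down.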

As I found, the second and third approaches are much more closely aligned with each other (given UAX #31 as of Unicode 5.2, and given a conceptual generalization of the IDNA2008 approach) than either is with the first. The stringprepbis approach would need significant work to incorporate the understanding gleaned from the original IDNA and the work of the IDNAbis WG.

Between the second and third, the second would be less work (as the bulk of it is already in UAX #31), but the third is likely to match IDNA2008 more closely (perhaps to the point that, in practice, implementations could implement IDNA2008 as a profile of the new framework).

Either of the latter two approaches induces standards coupling, either with UAX #31 or with IDNA2008. UAX #31 does have significant stability guarantees, and is intended for this kind of use. IDNA2008 expressly decided not to build a framework, so the third approach would have to either depend on the wording of IDNA2008 in a way it wasn't intended, or duplicate the work and then either track it or risk divergence. A particular issue with either is the "contextual checks", which both have and which can be considered a moving target.
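For readers who haven't met a contextual check: it is a rule under which a character is acceptable only in certain surroundings. The Python sketch below is modeled on the IDNA2008 contextual rule for U+200D ZERO WIDTH JOINER, as I understand it: the joiner is allowed only when the preceding character has canonical combining class Virama (value 9). The function name is hypothetical, and the sketch covers this one rule only, not the full set of contextual rules in either specification.

```python
import unicodedata

ZWJ = "\u200d"      # ZERO WIDTH JOINER
VIRAMA_CCC = 9      # canonical combining class value named Virama

def zwj_in_context(s):
    """Sketch of one contextual rule: each ZWJ must directly follow a virama."""
    for i, ch in enumerate(s):
        if ch != ZWJ:
            continue
        # A ZWJ at the start has no preceding character, so it fails the rule.
        if i == 0 or unicodedata.combining(s[i - 1]) != VIRAMA_CCC:
            return False
    return True
```

Under this rule the Devanagari sequence KA, VIRAMA, ZWJ, SSA passes, while a ZWJ between two Latin letters does not. Because such rules are written per character and per script, they are the part most likely to change as Unicode grows, which is the "moving target" concern.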

It should be noted that *all* approaches induce standards coupling with Unicode itself, though that is, I think, a clear aim of this group. Unicode as a whole has many different stability guarantees, of various levels, and I believe it is quite reasonable to choose which parts of Unicode to couple to in order to achieve the aims needed by IETF protocols and human nomenclature.

While I think the above discussion reveals that I lean toward the UAX #31 approach, I'm eager to learn what others think of these approaches, and to hear ideas for other ways to proceed.

- Mark

Mark Lentczner
Sr. Systems Architect
Technology Integration
Linden Lab

markl@lindenlab.com

Zero Linden
zero.linden@secondlife.com
