[storm] More on the newprep BOF

<Black_David@emc.com> Thu, 25 March 2010 06:40 UTC

Reminder - Look for newprep BOF materials here: https://datatracker.ietf.org/meeting/77/materials.html

These are the rest of my notes from the newprep BOF on Tuesday.

A major motivation for moving beyond stringprep is that stringprep is fixed at Unicode 3.2 (the current version of Unicode is 5.2).  Characters added to Unicode versions subsequent to 3.2 are usually prohibited by stringprep because the stringprep tables treat their Unicode codepoints as unassigned.
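
As a rough illustration of the Unicode 3.2 limitation, the sketch below uses Python's standard stringprep module, whose tables are derived from the Unicode 3.2 database, to check whether a code point falls in the "unassigned" table (RFC 3454 Table A.1). The helper name is_unassigned_in_unicode_3_2 and the sample characters are chosen purely for illustration; they were not part of the BOF discussion.

```python
import stringprep


def is_unassigned_in_unicode_3_2(ch: str) -> bool:
    """Return True if ch is in stringprep Table A.1 (unassigned in Unicode 3.2)."""
    return stringprep.in_table_a1(ch)


if __name__ == "__main__":
    # Illustrative samples only (hypothetical choices, not from the BOF).
    samples = {
        "a": "assigned well before Unicode 3.2",
        "\u0221": "added in Unicode 4.0",
        "\u1e9e": "capital sharp s, added in Unicode 5.1",
    }
    for ch, note in samples.items():
        flag = is_unassigned_in_unicode_3_2(ch)
        print(f"U+{ord(ch):04X} unassigned in 3.2: {flag} ({note})")
```

A stringprep profile that prohibits unassigned code points would reject the last two samples, even though both are valid characters in Unicode 5.2.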

In addition, there are problems in the original specification of stringprep that require changes to the processing of four Unicode characters:
	1) The eszett (sharp s) character in German (already discussed on this list).
	2) The sigma character in Greek when it occurs at the end of a word.
	3&4) The zero width joiner and zero width non-joiner characters.
These changes will show up in domain names, and hence impact IQN names for iSCSI.
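
To make the four characters concrete, here is a minimal sketch of what the existing stringprep mapping step does to them, assuming the nameprep-style combination of Table B.1 (map to nothing), Table B.2 (case folding) and NFKC. It uses Python's standard stringprep module; the function name stringprep_map is hypothetical, and the sketch covers only the mapping step, not the prohibited-character or bidi checks.

```python
import stringprep
import unicodedata


def stringprep_map(text: str) -> str:
    """Apply Table B.1 and B.2 mappings, then NFKC (Unicode 3.2 data)."""
    mapped = []
    for ch in text:
        if stringprep.in_table_b1(ch):
            # Table B.1: characters that are "mapped to nothing",
            # which includes ZWJ (U+200D) and ZWNJ (U+200C).
            continue
        # Table B.2: case folding for use with NFKC.
        mapped.append(stringprep.map_table_b2(ch))
    return unicodedata.ucd_3_2_0.normalize("NFKC", "".join(mapped))


if __name__ == "__main__":
    print(stringprep_map("Stra\u00dfe"))      # eszett is folded to "ss"
    print(stringprep_map("\u03c2"))           # final sigma becomes ordinary sigma
    print(repr(stringprep_map("a\u200cb")))   # ZWNJ is dropped
    print(repr(stringprep_map("a\u200db")))   # ZWJ is dropped
```

In each case the original character does not survive the mapping, which is the behavior that the changes listed above alter.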

The two zero width characters are apparently crucial to Persian (Farsi) text.  We may need some help in determining whether/how that crucial role carries over to iSCSI IQN names.  If there's someone on this list who is fluent in written Persian (Farsi) and is familiar with iSCSI IQN names, please send a quick email to Tom (ttalpey@microsoft.com) and me (black_david@emc.com), as we may want to call upon your expertise at some point in the future.

But wait ... there's another (bigger, IMHO) problem.  iSCSI names are always lower case, and stringprep is used to realize a Unicode version of tolower().  The current approach to domain names (IDNA2008) has abandoned case folding (e.g., converting to lower case), as there are apparently languages in which it is not as simple as tolower().  In Unicode terms, this is a side effect of domain names changing from normalization form KC (NFKC) used by stringprep to normalization form C (NFC).
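
A small sketch of the difference, assuming Python's unicodedata module (which uses whatever Unicode version ships with the interpreter, not 3.2) and two hypothetical helper names: nfkc_lower approximates the old "normalize with NFKC and lower-case" behavior, while nfc_only applies NFC and leaves case alone. Neither helper is a full implementation of stringprep or IDNA2008.

```python
import unicodedata


def nfkc_lower(text: str) -> str:
    """Approximation of the stringprep-style result: NFKC plus lower-casing."""
    return unicodedata.normalize("NFKC", text).lower()


def nfc_only(text: str) -> str:
    """NFC normalization alone; case and compatibility characters are kept."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    name = "IQN.Example"  # illustrative string, not a valid iSCSI name
    print(nfkc_lower(name))   # lower-cased
    print(nfc_only(name))     # case preserved

    # Lower-casing is not always one character in, one character out:
    # U+0130 (capital I with dot above) lower-cases to two code points.
    print([hex(ord(c)) for c in "\u0130".lower()])

    # NFKC also folds compatibility characters such as the "fi" ligature.
    print(nfkc_lower("\ufb01"), nfc_only("\ufb01") == "\ufb01")
```

The point is only that NFC by itself will not produce lower-case names, so a specification that relies on lower case has to get it some other way.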

I want to emphasize that this is at an early stage, and it looks like we're going to get some help from the newprep effort.  My sense of the conclusion of the newprep BOF is that there was strong support to form a working group to do something about the problem, but very little sense about exactly what the new WG should do; watch for a proposed charter to emerge before the Maastricht meeting week.

Anyone interested in following or further participating in this adventure should join the newprep@ietf.org mailing list: https://www.ietf.org/mailman/listinfo/newprep

David L. Black, Distinguished Engineer
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
black_david@emc.com        Mobile: +1 (978) 394-7754