[precis] Ambiguity in specification of case mapping in RFC 7613 and draft-ietf-precis-nickname
Tom Worster <fsb@thefsb.org> Tue, 29 September 2015 21:28 UTC
Date: Tue, 29 Sep 2015 17:28:45 -0400
From: Tom Worster <fsb@thefsb.org>
To: Peter Saint-Andre <peter@andyet.com>, Alexey Melnikov <Alexey.Melnikov@isode.com>
Message-ID: <D230767C.6587A%fsb@thefsb.org>
Archived-At: <http://mailarchive.ietf.org/arch/msg/precis/7wX6-L_9HxLN2qmFp09J3MnOGLw>
Cc: precis@ietf.org
Subject: [precis] Ambiguity in specification of case mapping in RFC 7613 and draft-ietf-precis-nickname
Peter, Alexey,

I think there is an ambiguity in the specification of case mapping in RFC 7613 and draft-ietf-precis-nickname-19.

RFC 7613 section 3.2.2 says

   3.  Case-Mapping Rule: Uppercase and titlecase characters MUST be mapped to their lowercase equivalents, preferably using Unicode Default Case Folding as defined in the Unicode Standard [Unicode] (at the time of this writing, the algorithm is specified in Chapter 3 of [Unicode7.0], but the chapter number might change in a future version of the Unicode Standard); see further discussion in Section 3.4.

But there are 55 code points in Unicode 7.0.0 that change under default case folding that are neither uppercase nor titlecase characters, 12 of which are Lowercase_Letter.

I suspect this stems from a confusion between Unicode case mapping and case folding. From the Unicode Case Mapping FAQ (http://unicode.org/faq/casemap_charprop.html):

   Q: What is the difference between case mapping and case folding?

   A: Case mapping or case conversion is a process whereby strings are converted to a particular form - uppercase, lowercase, or titlecase - possibly for display to the user. Case folding is mostly used for caseless comparison of text, such as identifiers in a computer program, rather than actual text transformation. Case folding in Unicode is primarily based on the lowercase mapping, but includes additional changes to the source text to help make it language-insensitive and consistent. As a result, case-folded text should be used solely for internal processing and generally should not be stored or displayed to the end user.

The purpose of the UsernameCaseMapped and Nickname PRECIS Profiles is internal string comparison, so I would expect that Unicode case folding without exception was intended. But it seems the text might be saying that we should remove from UCD CaseFolding.txt those code points that do not have either Uppercase_Letter or Titlecase_Letter.

Similar text in draft-ietf-precis-nickname-19 leads to similar ambiguity.
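To make the two readings concrete, here is a minimal Python sketch. It is an illustration only, not text from either specification. The function names are hypothetical, Python's `str.casefold()` stands in for Unicode default case folding, and the Unicode data is whatever version the Python build ships (not necessarily 7.0). The other PRECIS rules (width mapping, normalization, and so on) are left out.

```python
import unicodedata


def fold_all(s: str) -> str:
    """Reading 1 (hypothetical name): fold every code point, no exceptions."""
    return s.casefold()


def fold_upper_title_only(s: str) -> str:
    """Reading 2 (hypothetical name): fold only code points whose
    General Category is Uppercase_Letter (Lu) or Titlecase_Letter (Lt)."""
    return "".join(
        ch.casefold() if unicodedata.category(ch) in ("Lu", "Lt") else ch
        for ch in s
    )


long_s_name = "\u017Fam"  # LATIN SMALL LETTER LONG S followed by "am"
plain_name = "Sam"

# Under reading 1 the two names compare equal; under reading 2 they do not,
# because U+017F is a Lowercase_Letter and is left untouched.
print(fold_all(long_s_name) == fold_all(plain_name))
print(fold_upper_title_only(long_s_name) == fold_upper_title_only(plain_name))
```

So the choice between the readings decides whether two implementations agree on which usernames or nicknames match.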
The nickname profile can be corrected or the algorithm clarified. I'm not sure what to do with a Proposed Standard RFC. Errata? Can the case mapping rule be changed in IANA?

https://www.iana.org/assignments/precis-parameters/profiles/UsernameCaseMapped.txt

e.g. to "Apply Unicode default case folding"

Tom


These are the lines in question from UCD CaseFolding.txt to which I prefixed their General Category.

Ll; 00B5; C; 03BC; # MICRO SIGN
Ll; 017F; C; 0073; # LATIN SMALL LETTER LONG S
Mn; 0345; C; 03B9; # COMBINING GREEK YPOGEGRAMMENI
Ll; 03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA
Ll; 03D0; C; 03B2; # GREEK BETA SYMBOL
Ll; 03D1; C; 03B8; # GREEK THETA SYMBOL
Ll; 03D5; C; 03C6; # GREEK PHI SYMBOL
Ll; 03D6; C; 03C0; # GREEK PI SYMBOL
Ll; 03F0; C; 03BA; # GREEK KAPPA SYMBOL
Ll; 03F1; C; 03C1; # GREEK RHO SYMBOL
Ll; 03F5; C; 03B5; # GREEK LUNATE EPSILON SYMBOL
Ll; 1E9B; C; 1E61; # LATIN SMALL LETTER LONG S WITH DOT ABOVE
Ll; 1FBE; C; 03B9; # GREEK PROSGEGRAMMENI
Nl; 2160; C; 2170; # ROMAN NUMERAL ONE
Nl; 2161; C; 2171; # ROMAN NUMERAL TWO
Nl; 2162; C; 2172; # ROMAN NUMERAL THREE
Nl; 2163; C; 2173; # ROMAN NUMERAL FOUR
Nl; 2164; C; 2174; # ROMAN NUMERAL FIVE
Nl; 2165; C; 2175; # ROMAN NUMERAL SIX
Nl; 2166; C; 2176; # ROMAN NUMERAL SEVEN
Nl; 2167; C; 2177; # ROMAN NUMERAL EIGHT
Nl; 2168; C; 2178; # ROMAN NUMERAL NINE
Nl; 2169; C; 2179; # ROMAN NUMERAL TEN
Nl; 216A; C; 217A; # ROMAN NUMERAL ELEVEN
Nl; 216B; C; 217B; # ROMAN NUMERAL TWELVE
Nl; 216C; C; 217C; # ROMAN NUMERAL FIFTY
Nl; 216D; C; 217D; # ROMAN NUMERAL ONE HUNDRED
Nl; 216E; C; 217E; # ROMAN NUMERAL FIVE HUNDRED
Nl; 216F; C; 217F; # ROMAN NUMERAL ONE THOUSAND
So; 24B6; C; 24D0; # CIRCLED LATIN CAPITAL LETTER A
So; 24B7; C; 24D1; # CIRCLED LATIN CAPITAL LETTER B
So; 24B8; C; 24D2; # CIRCLED LATIN CAPITAL LETTER C
So; 24B9; C; 24D3; # CIRCLED LATIN CAPITAL LETTER D
So; 24BA; C; 24D4; # CIRCLED LATIN CAPITAL LETTER E
So; 24BB; C; 24D5; # CIRCLED LATIN CAPITAL LETTER F
So; 24BC; C; 24D6; # CIRCLED LATIN CAPITAL LETTER G
So; 24BD; C; 24D7; # CIRCLED LATIN CAPITAL LETTER H
So; 24BE; C; 24D8; # CIRCLED LATIN CAPITAL LETTER I
So; 24BF; C; 24D9; # CIRCLED LATIN CAPITAL LETTER J
So; 24C0; C; 24DA; # CIRCLED LATIN CAPITAL LETTER K
So; 24C1; C; 24DB; # CIRCLED LATIN CAPITAL LETTER L
So; 24C2; C; 24DC; # CIRCLED LATIN CAPITAL LETTER M
So; 24C3; C; 24DD; # CIRCLED LATIN CAPITAL LETTER N
So; 24C4; C; 24DE; # CIRCLED LATIN CAPITAL LETTER O
So; 24C5; C; 24DF; # CIRCLED LATIN CAPITAL LETTER P
So; 24C6; C; 24E0; # CIRCLED LATIN CAPITAL LETTER Q
So; 24C7; C; 24E1; # CIRCLED LATIN CAPITAL LETTER R
So; 24C8; C; 24E2; # CIRCLED LATIN CAPITAL LETTER S
So; 24C9; C; 24E3; # CIRCLED LATIN CAPITAL LETTER T
So; 24CA; C; 24E4; # CIRCLED LATIN CAPITAL LETTER U
So; 24CB; C; 24E5; # CIRCLED LATIN CAPITAL LETTER V
So; 24CC; C; 24E6; # CIRCLED LATIN CAPITAL LETTER W
So; 24CD; C; 24E7; # CIRCLED LATIN CAPITAL LETTER X
So; 24CE; C; 24E8; # CIRCLED LATIN CAPITAL LETTER Y
So; 24CF; C; 24E9; # CIRCLED LATIN CAPITAL LETTER Z
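
A list in the format above could be produced along the following lines. This is a sketch under stated assumptions, not necessarily how the list was actually generated. The function name and the `SAMPLE` excerpt are hypothetical, and the default of keeping only status `C` entries is a guess based on the lines shown. General Category values come from Python's `unicodedata`, whose Unicode version depends on the Python build, so counts against the full file may differ from the 55 reported for Unicode 7.0.0.

```python
import unicodedata

# A few lines in UCD CaseFolding.txt format (hypothetical excerpt for the demo).
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
00B5; C; 03BC; # MICRO SIGN
01C5; C; 01C6; # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
2160; C; 2170; # ROMAN NUMERAL ONE
24B6; C; 24D0; # CIRCLED LATIN CAPITAL LETTER A
"""


def non_upper_title_entries(casefolding_text, statuses=("C",)):
    """Return (category, code, status, mapping) for every folding entry whose
    source code point is neither Uppercase_Letter (Lu) nor Titlecase_Letter (Lt)."""
    rows = []
    for line in casefolding_text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = [field.strip() for field in data.split(";")]
        code, status, mapping = fields[0], fields[1], fields[2]
        if status not in statuses:
            continue
        category = unicodedata.category(chr(int(code, 16)))
        if category not in ("Lu", "Lt"):
            rows.append((category, code, status, mapping))
    return rows


for category, code, status, mapping in non_upper_title_entries(SAMPLE):
    print(f"{category}; {code}; {status}; {mapping};")
```

To run it against the real data, pass the contents of a locally downloaded CaseFolding.txt instead of `SAMPLE`.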