[precis] Ambiguity in specification of case mapping in RFC 7613 and draft-ietf-precis-nickname

Tom Worster <fsb@thefsb.org> Tue, 29 September 2015 21:28 UTC

Date: Tue, 29 Sep 2015 17:28:45 -0400
From: Tom Worster <fsb@thefsb.org>
To: Peter Saint-Andre <peter@andyet.com>, Alexey Melnikov <Alexey.Melnikov@isode.com>
Message-ID: <D230767C.6587A%fsb@thefsb.org>
Archived-At: <http://mailarchive.ietf.org/arch/msg/precis/7wX6-L_9HxLN2qmFp09J3MnOGLw>
Cc: precis@ietf.org
Subject: [precis] Ambiguity in specification of case mapping in RFC 7613 and draft-ietf-precis-nickname

Peter, Alexey,

I think there is an ambiguity in the specification of case mapping in RFC
7613 and draft-ietf-precis-nickname-19.

RFC 7613 section 3.2.2 says

   3.  Case-Mapping Rule: Uppercase and titlecase characters MUST be
       mapped to their lowercase equivalents, preferably using Unicode
       Default Case Folding as defined in the Unicode Standard [Unicode]
       (at the time of this writing, the algorithm is specified in
       Chapter 3 of [Unicode7.0], but the chapter number might change in
       a future version of the Unicode Standard); see further discussion
       in Section 3.4.

But there are 55 code points in Unicode 7.0.0 that change under default
case folding even though their General Category is neither Uppercase_Letter
nor Titlecase_Letter; 12 of them are Lowercase_Letter. I suspect this stems
from a confusion between Unicode case mapping and case folding. From the
Unicode Case Mapping FAQ (http://unicode.org/faq/casemap_charprop.html):

  Q: What is the difference between case mapping and case folding?

  A: Case mapping or case conversion is a process whereby strings are
  converted to a particular form (uppercase, lowercase, or titlecase),
  possibly for display to the user. Case folding is mostly used for
  caseless comparison of text, such as identifiers in a computer program,
  rather than actual text transformation. Case folding in Unicode is
  primarily based on the lowercase mapping, but includes additional
  changes to the source text to help make it language-insensitive and
  consistent. As a result, case-folded text should be used solely for
  internal processing and generally should not be stored or displayed to
  the end user.
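A small Python sketch of the FAQ's distinction (not part of the original message), assuming Python 3.9+: `str.lower()` applies the Unicode lowercase mapping and `str.casefold()` applies full Unicode case folding, both using the Unicode version bundled with the interpreter (`unicodedata.unidata_version`) rather than 7.0.0. The helper name `mapping_vs_folding` is hypothetical. The three sample characters are Lowercase_Letter entries from the CaseFolding.txt excerpt at the end of this message.

```python
import unicodedata


def mapping_vs_folding(ch: str) -> tuple[str, str, str]:
    """Return (General Category, lowercase mapping, case folding) for ch.

    Hypothetical helper for illustration only.
    """
    return unicodedata.category(ch), ch.lower(), ch.casefold()


# MICRO SIGN, LATIN SMALL LETTER LONG S, GREEK SMALL LETTER FINAL SIGMA:
# all three are already Lowercase_Letter (Ll).
for ch in ["\u00b5", "\u017f", "\u03c2"]:
    category, lowered, folded = mapping_vs_folding(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({category}): "
          f"lower={lowered!r} casefold={folded!r}")
```

Lowercasing leaves all three unchanged while case folding does not, so a comparison built only on lowercasing would, for example, treat U+00B5 MICRO SIGN and U+03BC GREEK SMALL LETTER MU as different.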

The purpose of the UsernameCaseMapped and Nickname PRECIS Profiles is
internal string comparison, so I would expect that Unicode case folding,
without exception, was intended. But the text might be read as saying that
we should remove from UCD CaseFolding.txt those code points whose General
Category is neither Uppercase_Letter nor Titlecase_Letter.
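The two readings can be contrasted with a minimal Python sketch (illustrative only; both function names are hypothetical). It uses Python's `str.casefold()`, which implements full default case folding (CaseFolding.txt statuses C and F) from the interpreter's bundled Unicode data, as a stand-in for "Unicode Default Case Folding".

```python
import unicodedata


def fold_everything(s: str) -> str:
    """Reading 1 (hypothetical name): case-fold the whole string."""
    return s.casefold()


def fold_upper_and_title_only(s: str) -> str:
    """Reading 2 (hypothetical name): case-fold only characters whose
    General Category is Uppercase_Letter (Lu) or Titlecase_Letter (Lt),
    leaving every other character unchanged."""
    return "".join(
        ch.casefold() if unicodedata.category(ch) in ("Lu", "Lt") else ch
        for ch in s
    )


# ASCII name, MICRO SIGN (Ll), ROMAN NUMERALS (Nl), CIRCLED CAPITALS (So)
for s in ["Alice", "\u00b5-meter", "\u2160\u2161", "\u24b6\u24b7"]:
    same = fold_everything(s) == fold_upper_and_title_only(s)
    print(repr(s), "readings agree" if same else "readings differ")
```

For plain ASCII input the two readings agree; they diverge on code points such as those listed at the end of this message.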

Similar text in draft-ietf-precis-nickname-19 leads to similar ambiguity.

The nickname draft can still be corrected, or its algorithm clarified,
but I'm not sure what to do about a Proposed Standard RFC. Errata? Can the
case-mapping rule in the IANA registry be changed, e.g. to "Apply Unicode
default case folding"?
https://www.iana.org/assignments/precis-parameters/profiles/UsernameCaseMapped.txt

Tom


These are the lines in question from the UCD file CaseFolding.txt, each
prefixed with its code point's General Category.

Ll; 00B5; C; 03BC; # MICRO SIGN
Ll; 017F; C; 0073; # LATIN SMALL LETTER LONG S
Mn; 0345; C; 03B9; # COMBINING GREEK YPOGEGRAMMENI
Ll; 03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA
Ll; 03D0; C; 03B2; # GREEK BETA SYMBOL
Ll; 03D1; C; 03B8; # GREEK THETA SYMBOL
Ll; 03D5; C; 03C6; # GREEK PHI SYMBOL
Ll; 03D6; C; 03C0; # GREEK PI SYMBOL
Ll; 03F0; C; 03BA; # GREEK KAPPA SYMBOL
Ll; 03F1; C; 03C1; # GREEK RHO SYMBOL
Ll; 03F5; C; 03B5; # GREEK LUNATE EPSILON SYMBOL
Ll; 1E9B; C; 1E61; # LATIN SMALL LETTER LONG S WITH DOT ABOVE
Ll; 1FBE; C; 03B9; # GREEK PROSGEGRAMMENI
Nl; 2160; C; 2170; # ROMAN NUMERAL ONE
Nl; 2161; C; 2171; # ROMAN NUMERAL TWO
Nl; 2162; C; 2172; # ROMAN NUMERAL THREE
Nl; 2163; C; 2173; # ROMAN NUMERAL FOUR
Nl; 2164; C; 2174; # ROMAN NUMERAL FIVE
Nl; 2165; C; 2175; # ROMAN NUMERAL SIX
Nl; 2166; C; 2176; # ROMAN NUMERAL SEVEN
Nl; 2167; C; 2177; # ROMAN NUMERAL EIGHT
Nl; 2168; C; 2178; # ROMAN NUMERAL NINE
Nl; 2169; C; 2179; # ROMAN NUMERAL TEN
Nl; 216A; C; 217A; # ROMAN NUMERAL ELEVEN
Nl; 216B; C; 217B; # ROMAN NUMERAL TWELVE
Nl; 216C; C; 217C; # ROMAN NUMERAL FIFTY
Nl; 216D; C; 217D; # ROMAN NUMERAL ONE HUNDRED
Nl; 216E; C; 217E; # ROMAN NUMERAL FIVE HUNDRED
Nl; 216F; C; 217F; # ROMAN NUMERAL ONE THOUSAND
So; 24B6; C; 24D0; # CIRCLED LATIN CAPITAL LETTER A
So; 24B7; C; 24D1; # CIRCLED LATIN CAPITAL LETTER B
So; 24B8; C; 24D2; # CIRCLED LATIN CAPITAL LETTER C
So; 24B9; C; 24D3; # CIRCLED LATIN CAPITAL LETTER D
So; 24BA; C; 24D4; # CIRCLED LATIN CAPITAL LETTER E
So; 24BB; C; 24D5; # CIRCLED LATIN CAPITAL LETTER F
So; 24BC; C; 24D6; # CIRCLED LATIN CAPITAL LETTER G
So; 24BD; C; 24D7; # CIRCLED LATIN CAPITAL LETTER H
So; 24BE; C; 24D8; # CIRCLED LATIN CAPITAL LETTER I
So; 24BF; C; 24D9; # CIRCLED LATIN CAPITAL LETTER J
So; 24C0; C; 24DA; # CIRCLED LATIN CAPITAL LETTER K
So; 24C1; C; 24DB; # CIRCLED LATIN CAPITAL LETTER L
So; 24C2; C; 24DC; # CIRCLED LATIN CAPITAL LETTER M
So; 24C3; C; 24DD; # CIRCLED LATIN CAPITAL LETTER N
So; 24C4; C; 24DE; # CIRCLED LATIN CAPITAL LETTER O
So; 24C5; C; 24DF; # CIRCLED LATIN CAPITAL LETTER P
So; 24C6; C; 24E0; # CIRCLED LATIN CAPITAL LETTER Q
So; 24C7; C; 24E1; # CIRCLED LATIN CAPITAL LETTER R
So; 24C8; C; 24E2; # CIRCLED LATIN CAPITAL LETTER S
So; 24C9; C; 24E3; # CIRCLED LATIN CAPITAL LETTER T
So; 24CA; C; 24E4; # CIRCLED LATIN CAPITAL LETTER U
So; 24CB; C; 24E5; # CIRCLED LATIN CAPITAL LETTER V
So; 24CC; C; 24E6; # CIRCLED LATIN CAPITAL LETTER W
So; 24CD; C; 24E7; # CIRCLED LATIN CAPITAL LETTER X
So; 24CE; C; 24E8; # CIRCLED LATIN CAPITAL LETTER Y
So; 24CF; C; 24E9; # CIRCLED LATIN CAPITAL LETTER Z
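In principle, the list above can be regenerated with a short Python sketch like the following (not necessarily how it was produced; the function name `non_upper_folds` is hypothetical). It assumes a local copy of CaseFolding.txt and keeps only the simple-folding entries (statuses C and S) by default. It takes General Category values from Python's bundled `unicodedata`, whose Unicode version is newer than 7.0.0, so the output may differ if any category has changed since then.

```python
import unicodedata
from typing import FrozenSet, Iterable, Iterator

# Status codes in CaseFolding.txt: C (common), S (simple), F (full),
# T (Turkic). Simple folding uses C + S; full folding uses C + F.
SIMPLE_FOLDING = frozenset({"C", "S"})


def non_upper_folds(
    lines: Iterable[str], statuses: FrozenSet[str] = SIMPLE_FOLDING
) -> Iterator[str]:
    """Yield CaseFolding.txt entries whose source code point has a General
    Category other than Uppercase_Letter (Lu) or Titlecase_Letter (Lt),
    each prefixed with that category (hypothetical helper)."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = [field.strip() for field in line.split(";")]
        code, status = fields[0], fields[1]
        if status not in statuses:
            continue  # entry is not part of the selected folding
        category = unicodedata.category(chr(int(code, 16)))
        if category not in ("Lu", "Lt"):
            yield f"{category}; {line}"
```

Usage, assuming the file is in the current directory: `list(non_upper_folds(open("CaseFolding.txt", encoding="utf-8")))`. Passing `statuses=frozenset({"C", "F"})` instead selects the entries used by full case folding.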
