Re: [precis] Ambiguity in specification of case mapping in RFC 7613 and draft-ietf-precis-nickname

Peter Saint-Andre - &yet <peter@andyet.net> Wed, 30 September 2015 21:17 UTC

To: Tom Worster <fsb@thefsb.org>, Alexey Melnikov <Alexey.Melnikov@isode.com>
References: <D230767C.6587A%fsb@thefsb.org>
From: Peter Saint-Andre - &yet <peter@andyet.net>
Message-ID: <560C5149.5090607@andyet.net>
Date: Wed, 30 Sep 2015 15:16:57 -0600
In-Reply-To: <D230767C.6587A%fsb@thefsb.org>
Archived-At: <http://mailarchive.ietf.org/arch/msg/precis/k4Nep28ztBMcfgeXHsghGfopwpQ>
Cc: precis@ietf.org
Subject: Re: [precis] Ambiguity in specification of case mapping in RFC 7613 and draft-ietf-precis-nickname

Hi Tom, thanks for the note.

My feeling is that we phrased things slightly inaccurately, because we
assumed that case mapping applies primarily or only to uppercase and
titlecase characters. I think this was more a matter of communication
(people tend to think of case mapping as something needed only for
uppercase characters), whereas it obviously applies more generally
(i.e., applying Unicode Default Case Folding will also map the code
points you mention here).

We could do something like this in the nickname spec...

OLD

    3.  Case Mapping Rule: Uppercase and titlecase characters MUST be
        mapped to their lowercase equivalents using Unicode Default Case
        Folding as defined in the Unicode Standard [Unicode] (at the time
        of this writing, the algorithm is specified in Chapter 3 of
        [Unicode7.0]).  In applications that prohibit conflicting
        nicknames, this rule helps to reduce the possibility of confusion
        by ensuring that nicknames differing only by case (e.g.,
        "stpeter" vs. "StPeter") would not be presented to a human user
        at the same time.

NEW

    3.  Case Mapping Rule: Unicode Default Case Folding MUST be applied,
        as defined in the Unicode Standard [Unicode] (at the time
        of this writing, the algorithm is specified in Chapter 3 of
        [Unicode7.0]).  The primary result of doing so is that uppercase
        characters are mapped to lowercase characters. In applications
        that prohibit conflicting nicknames, this rule helps to reduce
        the possibility of confusion by ensuring that nicknames
        differing only by case (e.g., "stpeter" vs. "StPeter") would not
        be presented to a human user at the same time.
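A minimal Python sketch of how an application that prohibits conflicting nicknames might apply the NEW rule. All function names are hypothetical, only the case-mapping step is shown (the full nickname profile has other rules, such as normalization, that are omitted here), and `restricted_case_map` models the narrower reading of the OLD text, in which only Uppercase_Letter (Lu) and Titlecase_Letter (Lt) characters are touched.

```python
import unicodedata


def default_case_fold(s: str) -> str:
    """NEW reading: apply Unicode Default Case Folding to the whole string."""
    return s.casefold()


def restricted_case_map(s: str) -> str:
    """Narrow reading of the OLD text: fold only Lu and Lt characters,
    leaving every other code point untouched."""
    return "".join(
        ch.casefold() if unicodedata.category(ch) in ("Lu", "Lt") else ch
        for ch in s
    )


def nicknames_conflict(a: str, b: str, case_map=default_case_fold) -> bool:
    """Hypothetical check: two nicknames conflict if they match after case mapping."""
    return case_map(a) == case_map(b)


# Differ only by uppercase letters: a conflict under either reading.
print(nicknames_conflict("StPeter", "stpeter"))                             # True
# U+017F LATIN SMALL LETTER LONG S is Ll, so the two readings disagree.
print(nicknames_conflict("\u017ftpeter", "stpeter"))                        # True
print(nicknames_conflict("\u017ftpeter", "stpeter", restricted_case_map))   # False
```

The last two lines show why the wording matters: the same pair of nicknames is a conflict under Default Case Folding but not under the restricted reading.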

Thanks for raising this issue.

Peter

On 9/29/15 3:28 PM, Tom Worster wrote:
> Peter, Alexey,
>
> I think there is an ambiguity in the specification of case mapping in RFC
> 7613 and draft-ietf-precis-nickname-19.
>
> RFC 7613 section 3.2.2 says
>
>     3.  Case-Mapping Rule: Uppercase and titlecase characters MUST be
>         mapped to their lowercase equivalents, preferably using Unicode
>         Default Case Folding as defined in the Unicode Standard [Unicode]
>         (at the time of this writing, the algorithm is specified in
>         Chapter 3 of [Unicode7.0], but the chapter number might change in
>         a future version of the Unicode Standard); see further discussion
>         in Section 3.4.
>
> But there are 55 code points in Unicode 7.0.0 that change under default
> case folding (counting the status C mappings listed below) whose General
> Category is neither Uppercase_Letter nor Titlecase_Letter, 12 of which
> are Lowercase_Letter. I suspect this stems from a confusion between
> Unicode case mapping and case folding. From the Unicode Case Mapping FAQ
> (http://unicode.org/faq/casemap_charprop.html):
>
>    Q: What is the difference between case mapping and case folding?
>
>    A: Case mapping or case conversion is a process whereby strings are
>    converted to a particular form--uppercase, lowercase, or titlecase--
>    possibly for display to the user. Case folding is mostly used for
>    caseless comparison of text, such as identifiers in a computer program,
>    rather than actual text transformation. Case folding in Unicode is
>    primarily based on the lowercase mapping, but includes additional
>    changes to the source text to help make it language-insensitive and
>    consistent. As a result, case-folded text should be used solely for
>    internal processing and generally should not be stored or displayed to
>    the end user.
>
> The purpose of the UsernameCaseMapped and Nickname PRECIS Profiles is
> internal string comparison, so I expect that Unicode case folding without
> exception was intended. But the text could be read as saying that we
> should remove from UCD CaseFolding.txt those code points whose General
> Category is neither Uppercase_Letter nor Titlecase_Letter.
>
> Similar text in draft-ietf-precis-nickname-19 leads to similar ambiguity.
>
> The nickname profile (still a draft) can be corrected or its algorithm
> clarified, but I'm not sure what to do about a Proposed Standard RFC.
> Errata? Can the case-mapping rule be changed in the IANA registry?
> https://www.iana.org/assignments/precis-parameters/profiles/UsernameCaseMapped.txt
> e.g., to "Apply Unicode default case folding"
>
> Tom
>
>
> Below are the lines in question from UCD CaseFolding.txt, each prefixed
> with its General Category.
>
> Ll; 00B5; C; 03BC; # MICRO SIGN
> Ll; 017F; C; 0073; # LATIN SMALL LETTER LONG S
> Mn; 0345; C; 03B9; # COMBINING GREEK YPOGEGRAMMENI
> Ll; 03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA
> Ll; 03D0; C; 03B2; # GREEK BETA SYMBOL
> Ll; 03D1; C; 03B8; # GREEK THETA SYMBOL
> Ll; 03D5; C; 03C6; # GREEK PHI SYMBOL
> Ll; 03D6; C; 03C0; # GREEK PI SYMBOL
> Ll; 03F0; C; 03BA; # GREEK KAPPA SYMBOL
> Ll; 03F1; C; 03C1; # GREEK RHO SYMBOL
> Ll; 03F5; C; 03B5; # GREEK LUNATE EPSILON SYMBOL
> Ll; 1E9B; C; 1E61; # LATIN SMALL LETTER LONG S WITH DOT ABOVE
> Ll; 1FBE; C; 03B9; # GREEK PROSGEGRAMMENI
> Nl; 2160; C; 2170; # ROMAN NUMERAL ONE
> Nl; 2161; C; 2171; # ROMAN NUMERAL TWO
> Nl; 2162; C; 2172; # ROMAN NUMERAL THREE
> Nl; 2163; C; 2173; # ROMAN NUMERAL FOUR
> Nl; 2164; C; 2174; # ROMAN NUMERAL FIVE
> Nl; 2165; C; 2175; # ROMAN NUMERAL SIX
> Nl; 2166; C; 2176; # ROMAN NUMERAL SEVEN
> Nl; 2167; C; 2177; # ROMAN NUMERAL EIGHT
> Nl; 2168; C; 2178; # ROMAN NUMERAL NINE
> Nl; 2169; C; 2179; # ROMAN NUMERAL TEN
> Nl; 216A; C; 217A; # ROMAN NUMERAL ELEVEN
> Nl; 216B; C; 217B; # ROMAN NUMERAL TWELVE
> Nl; 216C; C; 217C; # ROMAN NUMERAL FIFTY
> Nl; 216D; C; 217D; # ROMAN NUMERAL ONE HUNDRED
> Nl; 216E; C; 217E; # ROMAN NUMERAL FIVE HUNDRED
> Nl; 216F; C; 217F; # ROMAN NUMERAL ONE THOUSAND
> So; 24B6; C; 24D0; # CIRCLED LATIN CAPITAL LETTER A
> So; 24B7; C; 24D1; # CIRCLED LATIN CAPITAL LETTER B
> So; 24B8; C; 24D2; # CIRCLED LATIN CAPITAL LETTER C
> So; 24B9; C; 24D3; # CIRCLED LATIN CAPITAL LETTER D
> So; 24BA; C; 24D4; # CIRCLED LATIN CAPITAL LETTER E
> So; 24BB; C; 24D5; # CIRCLED LATIN CAPITAL LETTER F
> So; 24BC; C; 24D6; # CIRCLED LATIN CAPITAL LETTER G
> So; 24BD; C; 24D7; # CIRCLED LATIN CAPITAL LETTER H
> So; 24BE; C; 24D8; # CIRCLED LATIN CAPITAL LETTER I
> So; 24BF; C; 24D9; # CIRCLED LATIN CAPITAL LETTER J
> So; 24C0; C; 24DA; # CIRCLED LATIN CAPITAL LETTER K
> So; 24C1; C; 24DB; # CIRCLED LATIN CAPITAL LETTER L
> So; 24C2; C; 24DC; # CIRCLED LATIN CAPITAL LETTER M
> So; 24C3; C; 24DD; # CIRCLED LATIN CAPITAL LETTER N
> So; 24C4; C; 24DE; # CIRCLED LATIN CAPITAL LETTER O
> So; 24C5; C; 24DF; # CIRCLED LATIN CAPITAL LETTER P
> So; 24C6; C; 24E0; # CIRCLED LATIN CAPITAL LETTER Q
> So; 24C7; C; 24E1; # CIRCLED LATIN CAPITAL LETTER R
> So; 24C8; C; 24E2; # CIRCLED LATIN CAPITAL LETTER S
> So; 24C9; C; 24E3; # CIRCLED LATIN CAPITAL LETTER T
> So; 24CA; C; 24E4; # CIRCLED LATIN CAPITAL LETTER U
> So; 24CB; C; 24E5; # CIRCLED LATIN CAPITAL LETTER V
> So; 24CC; C; 24E6; # CIRCLED LATIN CAPITAL LETTER W
> So; 24CD; C; 24E7; # CIRCLED LATIN CAPITAL LETTER X
> So; 24CE; C; 24E8; # CIRCLED LATIN CAPITAL LETTER Y
> So; 24CF; C; 24E9; # CIRCLED LATIN CAPITAL LETTER Z
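A list like the one above can be regenerated approximately with a short Python sketch (an assumption; it is not part of either spec). It relies only on `unicodedata.category()` and `str.casefold()`; the helper name `folds_but_not_lu_or_lt` is hypothetical. Because Python bundles a newer Unicode version than 7.0, expect more than 55 hits: for example, since Unicode 8.0 the Cherokee small letters (category Ll) fold to their capital forms.

```python
import sys
import unicodedata
from collections import Counter
from typing import List


def folds_but_not_lu_or_lt(simple_only: bool = False) -> List[int]:
    """Return code points that str.casefold() changes even though their
    General Category is neither Lu (Uppercase_Letter) nor Lt (Titlecase_Letter).

    Assumption: str.casefold() applies full Default Case Folding (the C and F
    lines of CaseFolding.txt) for the Unicode version bundled with this Python.
    With simple_only=True, results that expand to more than one code point
    (the F lines) are skipped, which should leave the status C lines.
    """
    hits = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in ("Lu", "Lt"):
            continue
        folded = ch.casefold()
        if folded == ch or (simple_only and len(folded) != 1):
            continue
        hits.append(cp)
    return hits


hits = folds_but_not_lu_or_lt(simple_only=True)
print("Unicode", unicodedata.unidata_version, "->", len(hits), "code points")
print(Counter(unicodedata.category(chr(cp)) for cp in hits))
```

Calling it with `simple_only=False` also reports characters such as U+00DF LATIN SMALL LETTER SHARP S, whose full (status F) folding expands to "ss".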