Re: [precis] Applying the rules three times to get a stable output string?

William Fisher <william.w.fisher@gmail.com> Sat, 09 December 2017 22:09 UTC

In-Reply-To: <C31DFCC1-31BB-49E4-A9BD-071BF5AC6C02@gmx.de>
References: <C64B78C6-8109-4F36-BB76-EA8AB229FCE2@gmx.de> <CAHVjMKGmZK1DQJmbM-4Gb6W8NUbzG-qQXnXBScr6Yh+o==wxuw@mail.gmail.com> <C31DFCC1-31BB-49E4-A9BD-071BF5AC6C02@gmx.de>
From: William Fisher <william.w.fisher@gmail.com>
Date: Sat, 9 Dec 2017 15:09:20 -0700
Message-ID: <CAHVjMKEEndoJhMvMEQPPvvCS+t_4vkpp61iFoKrXNksrCB6ohA@mail.gmail.com>
To: Christian Schudt <christian.schudt@gmx.de>
Cc: precis@ietf.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/precis/hVKk3369xkUd7mJBDI5Z9t5O6Tg>
Subject: Re: [precis] Applying the rules three times to get a stable output string?
List-Id: Preparation and Comparison of Internationalized Strings <precis.ietf.org>

I did not come across any code points where IdentifierClass/Usernames
required more than one pass to reach a stable result. Only the
Nickname profile is affected, due to the interaction between NFKC and
the case-mapping and space-handling rules.
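
A minimal sketch of that interaction, for anyone who wants to reproduce it.
It is not my implementation and not a full Nickname profile: nickname_pass
is a hypothetical helper that assumes Python's str.lower() and
unicodedata.normalize() are close enough to the profile's case-mapping and
NFKC steps, and it skips the FreeformClass validity check entirely.

```python
import unicodedata


def nickname_pass(s: str) -> str:
    """One simplified pass of Nickname-style rules (illustrative only)."""
    # Map every Unicode space separator (category Zs) to ASCII SPACE.
    s = "".join(" " if unicodedata.category(c) == "Zs" else c for c in s)
    # Strip leading/trailing spaces and collapse interior runs of spaces.
    s = " ".join(part for part in s.split(" ") if part)
    # Case mapping (assumed here to be plain lowercasing).
    s = s.lower()
    # Normalization runs last, so it can reintroduce uppercase or spaces.
    return unicodedata.normalize("NFKC", s)


for cp in ("\u210c", "\u20a8", "\u00a8", "\u02dc"):
    first = nickname_pass(cp)
    second = nickname_pass(first)
    print(ascii(cp), "->", ascii(first), "->", ascii(second))
```

U+210C has no lowercase mapping but NFKC turns it into "H", and U+20A8
becomes "Rs", so the second pass is what lowercases them. U+00A8 and U+02DC
decompose under NFKC into a space followed by a combining mark, so the
second pass is what strips the leading space.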

My implementation applies an extra iteration for the Nickname profile.
The other profiles verify that the result is idempotent and raise a
DISALLOWED/not_idempotent error if it is not. I do not believe
there are legal inputs for Usernames that violate the idempotency
requirement, so this check is purely defensive.
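
Roughly, the strategy looks like the sketch below. The names enforce,
NotIdempotentError and case_then_nfkc are made up for illustration, and
case_then_nfkc is only a toy rule (lowercase, then NFKC) standing in for a
real profile. Verifying the result after the extra Nickname pass is an
assumption of the sketch.

```python
import unicodedata
from typing import Callable


class NotIdempotentError(ValueError):
    """Hypothetical stand-in for a DISALLOWED/not_idempotent error."""


def enforce(rule: Callable[[str], str], s: str, passes: int = 1) -> str:
    """Apply `rule` `passes` times, then verify the output is stable."""
    out = s
    for _ in range(passes):
        out = rule(out)
    # Defensive check: one more application must not change anything.
    if rule(out) != out:
        raise NotIdempotentError(f"unstable after {passes} pass(es): {s!r}")
    return out


def case_then_nfkc(s: str) -> str:
    # Toy rule: case mapping first, normalization second.
    return unicodedata.normalize("NFKC", s.lower())


# Nickname-style handling: one extra iteration before the check.
print(enforce(case_then_nfkc, "\u210c", passes=2))  # prints "h"

# Single-pass handling: the same input is rejected as not idempotent.
try:
    enforce(case_then_nfkc, "\u210c", passes=1)
except NotIdempotentError as exc:
    print("rejected:", exc)
```

With passes=1 this is the defensive check used for the non-Nickname
profiles; with passes=2 it mirrors the extra Nickname iteration.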


On Sat, Dec 9, 2017 at 2:27 PM, Christian Schudt
<christian.schudt@gmx.de> wrote:
> Great, thanks! These code points revealed some bugs :-). They should have been included in the Examples.
>
> Are there any known code points for the IdentifierClass / Usernames as well?
> Seems like all these code points are disallowed anyway.
>
> If not, implementations could save 1-2 iterations and only apply the "3-times" rule for FreeformClass.
>
>
>
>> On 09.12.2017 at 20:34, William Fisher <william.w.fisher@gmail.com> wrote:
>>
>> Where it makes a difference for NicknameCaseMapped:
>>
>> "\u210c"
>> "\u20a8"
>>
>> Where it makes a difference for Nickname due to spaces:
>>
>> "\u00a8"
>> "\u02dc"
>>
>>
>> On Sat, Dec 9, 2017 at 8:37 AM, Christian Schudt
>> <christian.schudt@gmx.de> wrote:
>>> Hi,
>>>
>>> RFC 8264 introduced these new sentences:
>>>
>>>   under certain circumstances, such as when Unicode
>>>   Normalization Form KC is used, performing Unicode normalization after
>>>   case mapping can still yield uppercase characters for certain code
>>>   points
>>>
>>>   Therefore, an implementation SHOULD apply the rules
>>>   repeatedly until the output string is stable
>>>
>>>
>>> I could imagine these sentences refer to code points of the "Unstable" category, but this category is unused.
>>>
>>> Are there any concrete code points or input strings which show this unstable behaviour?
>>> I am asking for some test vectors, i.e. an input string, which doesn’t have the expected output string after the first rule application, but after the second one.
>>>
>>> Thanks,
>>> — Christian