Re: [precis] PRECIS bidi rule for usernames

Peter Saint-Andre <stpeter@stpeter.im> Mon, 10 October 2016 03:12 UTC

To: Sam Whited <sam@samwhited.com>, William Fisher <william.w.fisher@gmail.com>
References: <CAHVjMKH6BGhhOoCC=1yv1y5njoq0jobbBPS6eWpB2vwtibASFA@mail.gmail.com> <CAHbk4RK4hVJzVWft9Vzn4Xfi+J_troCMDQi6eCHgFzCSaoDWiw@mail.gmail.com>
From: Peter Saint-Andre <stpeter@stpeter.im>
Message-ID: <c071125b-91d0-377b-3232-e803edd06c05@stpeter.im>
Date: Sun, 09 Oct 2016 21:12:52 -0600
In-Reply-To: <CAHbk4RK4hVJzVWft9Vzn4Xfi+J_troCMDQi6eCHgFzCSaoDWiw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/precis/ZNAkDtcy8oczKEWAJpOr6HDP59I>
Cc: precis@ietf.org
Subject: Re: [precis] PRECIS bidi rule for usernames

On 10/7/16 1:28 PM, Sam Whited wrote:
> On Fri, Oct 7, 2016 at 2:00 PM, William Fisher
> <william.w.fisher@gmail.com> wrote:
>> I've been working on a Python implementation of the PRECIS specification
>> (https://github.com/byllyfish/precis-i18n).
>
> Great timing; I was just this moment looking at your library and
> considering including it in some interoperability tests for a set of
> test vectors I'm working on.
>
>
>> Can someone please clarify whether:
>>
>> A. The "Bidi rule" is ONLY applied to strings that contain right-to-left
>> characters.
>> B. The "Bidi rule" is applied to ALL strings.
>
> The recent draft-ietf-precis-7613bis-03 clarifies this:
>
>        Directionality Rule: Apply the "Bidi Rule" defined in [RFC5893]
>        to strings that contain right-to-left characters (i.e., each of
>        the six conditions of the Bidi Rule must be satisfied); for
>        strings that do not contain right-to-left characters, there is no
>        special processing for directionality.
>
> I was apparently confused about this before too; I'm applying the Bidi
> Rule all the time in the Go implementation. I'll be sure to add a test
> for this in the test vectors when I publish them.
>
> I'm actually not convinced this is the correct behavior though; it
> seems confusing to me that usernames with RTL characters couldn't end
> with punctuation, but strings without them could.

There are plenty of RTL punctuation characters (e.g., U+05BE), and those
are allowed in RTL strings (even as the last character). RFC 5893 says
that an RTL label must end with a character of Bidi class R, AL, EN, or
AN (optionally followed by nonspacing marks), with the result that an RTL
string cannot end in "."; this helps to prevent confusion in the typical
presentation of domain names (which is the target use case for RFC 5893).
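
To make the difference concrete, here is a rough Python sketch of the
Directionality Rule as quoted above: the six conditions of the RFC 5893
Bidi Rule are checked only when the string contains a character of Bidi
class R, AL, or AN. The function names (contains_rtl, satisfies_bidi_rule,
directionality_ok) are made up for illustration and are not taken from any
of the libraries mentioned in this thread; the Bidi classes come from
Python's standard unicodedata module, so the results depend on the Unicode
version that ships with the interpreter. Treat it as an illustration under
those assumptions, not as a reference implementation.

```python
import unicodedata

# Bidi classes permitted by RFC 5893, Section 2 (conditions 2 and 5).
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(s):
    """Return the Bidi class of every code point in s."""
    return [unicodedata.bidirectional(ch) for ch in s]


def contains_rtl(s):
    """True if s has at least one character of class R, AL, or AN."""
    return any(c in ("R", "AL", "AN") for c in bidi_classes(s))


def satisfies_bidi_rule(s):
    """Check the six conditions of the RFC 5893 Bidi Rule (hypothetical helper)."""
    classes = bidi_classes(s)
    if not classes:
        return False
    first = classes[0]
    # Condition 1: the first character must be L, R, or AL.
    if first not in ("L", "R", "AL"):
        return False
    # Find the last character that is not a trailing nonspacing mark.
    end = len(classes)
    while end > 0 and classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1]
    if first in ("R", "AL"):
        # Conditions 2-4 apply to right-to-left strings.
        if not set(classes) <= RTL_ALLOWED:
            return False
        if last not in ("R", "AL", "EN", "AN"):
            return False
        return not ("EN" in classes and "AN" in classes)
    # Conditions 5-6 apply to left-to-right strings.
    if not set(classes) <= LTR_ALLOWED:
        return False
    return last in ("L", "EN")


def directionality_ok(s):
    """Apply the Bidi Rule only to strings that contain RTL characters."""
    if not contains_rtl(s):
        return True  # no special processing for directionality
    return satisfies_bidi_rule(s)
```

With this sketch, directionality_ok("juliet.") is accepted because the
string has no RTL characters, "\u05d0\u05d1." is rejected because "." has
Bidi class CS, and "\u05d0\u05d1\u05be" is accepted because U+05BE has
Bidi class R.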

> This violates the
> principle of least surprise.

It's perhaps not advisable for you and me to speculate about what might
surprise a user whose native language is represented in a right-to-left 
script.

Peter