Re: [precis] 2 questions from an app developer

John C Klensin, Tue, 03 November 2015 17:38 UTC


--On Tuesday, November 03, 2015 09:52 -0500 Tom Worster wrote:

> Now there's some traffic in the PRECIS list, I'd like to ask
> this question again, phrased differently.
> Afaict, the etiology of this implementer's non-minimal
> astonishment is:
> o Password is based on OpaqueString Profile
> o OpaqueString Profile is based on FreeformClass
> o FreeformClass uses Exceptions (F) from RFC 5892 sec. 2.6
> o Exceptions (F) disallows MIDDLE DOT except under CONTEXTO
> o CONTEXTO rule MIDDLE DOT in RFC 5892 A.3 says "Between 'l'
> (U+006C) characters only, used to permit the Catalan character
> ela geminada to be expressed."
> o Therefore, for example
>     ihαtePa§sωrdrul·lz    is valid
>     ihαtePa§sωrdrul·ze    is invalid
> Authoring a validation error message that helps the user
> understand and fix it was a challenge. "Password may not
> contain the · character except as part of a Catalan character
> ela geminada," is a cute easter egg[1] but not much use.

Of course, the nice, simple, message is "prohibited everywhere".
See below.

> I imagine IDNA would not want MIDDLE DOTs in domain names and
> some identifiers because of spoofing

Not just spoofing.  The rule that you cite serves to allow
MIDDLE DOT (U+00B7), which would normally be DISALLOWED as
Punctuation, to be used in that one case.  It is not, as you
seem to have inferred, a rule that restricts the use of a
character that would otherwise be allowed (PVALID).
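
To make the shape of that rule concrete, here is a minimal
sketch in Python of the single contextual check quoted above
from RFC 5892 A.3 (MIDDLE DOT is acceptable only when the code
points immediately before and after it are both U+006C).  The
function name is made up for illustration, and this is only one
rule, not a PRECIS or IDNA implementation.

```python
MIDDLE_DOT = "\u00b7"
SMALL_L = "\u006c"  # 'l'


def middle_dot_context_ok(s):
    """Return True if every MIDDLE DOT in s sits directly between two 'l's.

    Hypothetical helper: it models only the one contextual rule quoted
    from RFC 5892 A.3 and performs no other PRECIS or IDNA checks.
    """
    for i, ch in enumerate(s):
        if ch != MIDDLE_DOT:
            continue
        before = s[i - 1] if i > 0 else ""
        after = s[i + 1] if i + 1 < len(s) else ""
        if before != SMALL_L or after != SMALL_L:
            return False
    return True


if __name__ == "__main__":
    # The two example strings from the quoted message.
    for candidate in ("ih\u03b1tePa\u00a7s\u03c9rdrul\u00b7lz",
                      "ih\u03b1tePa\u00a7s\u03c9rdrul\u00b7ze"):
        print(candidate, middle_dot_context_ok(candidate))
```

Read this way, the default is "not allowed" and the check only
ever grants an exception; a string with no MIDDLE DOT at all
passes trivially.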

> but that concern is
> specific to that domain and surely not to passwords. 

Well, it depends.  Punctuation characters were DISALLOWED for
multiple reasons.  One is that many parsers treat many (but
certainly not all) of them as string terminators or introducers
of one flavor or another.  Another is that at least some
rendering arrangements, including those that use substitute
graphics when a preferred glyph for a code point is not
available, are fairly liberal about swapping them around.
Perhaps because a dot is so easily written, there are also a lot
of dot and dot-like characters, and even a lot of middle-type
ones, some of which are punctuation or other symbols and some of
which are letters or numbers.  I suppose that translates to
"spoofing problem", but spoofing was not the main IDNA
motivation.
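
As a rough illustration of how mixed that population is, the
following Python sketch prints the Unicode general category for
a handful of dot-like code points.  The sample list is my own
selection for illustration, not anything taken from the specs,
and the results depend on the version of the Unicode database
shipped with the Python in use.

```python
import unicodedata

# Illustrative sample only; not an exhaustive list of dot-like code points.
DOT_LIKE = [
    "\u00b7",  # MIDDLE DOT
    "\u0387",  # GREEK ANO TELEIA
    "\u2022",  # BULLET
    "\u22c5",  # DOT OPERATOR
    "\u30fb",  # KATAKANA MIDDLE DOT
    "\u0660",  # ARABIC-INDIC DIGIT ZERO
    "\ua78f",  # LATIN LETTER SINOLOGICAL DOT
]


def describe(ch):
    """Return (code point label, general category, name) for one character."""
    return (
        "U+%04X" % ord(ch),
        unicodedata.category(ch),
        unicodedata.name(ch, "<no name in this Unicode version>"),
    )


if __name__ == "__main__":
    for ch in DOT_LIKE:
        print(*describe(ch))
```

MIDDLE DOT itself reports as Po (other punctuation), DOT
OPERATOR as Sm (math symbol), and ARABIC-INDIC DIGIT ZERO, which
is drawn as a dot in many fonts, as Nd (decimal digit).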

Whether, in this day and age, the advantages of a password that
can't be typed without special knowledge outweigh the
disadvantages of a password that may be hard to use consistently
in contexts that require parsing or rendering, is a judgment
call.  Given that, maybe "not to passwords" is true, but
"surely" is not so clear.

> I don't
> know Catalan but I use MIDDLE DOTs for a variety of purposes,
> not quite daily but often enough to know it's been OPT-SHIFT-9
> since very early MacOS. It's a useful character so I suspect
> people will encounter this rule.

What I'm about to say is based on trying to adopt a very broad
i18n perspective.   I'm not questioning or trying to denounce
your choices, which I assume are reasonable.

Almost everyone has a candidate or two for "favorite character
outside the Basic Latin letter range" or, more generally,
"favorite character outside the normal letter range of my most
common use of my favorite script".  Such characters are useful
in their own right, for special emphasis, to add security by
obscurity in various places where one can be reasonably assured
that they won't be prominent in, e.g., password guessing files,
and so on.  There was a mini-stink when we started transitioning
to IDNA2008 about assorted currency symbols which people wanted
to use, not just in currency contexts.  

If the goal is global interoperability, then using them (or
counting on having them available) is a bad idea.  That applies
most strongly to identifier-type things, but even applies to
things like passwords: if you ever find yourself traveling to a
place where you need to use a keyboard or IME that is optimized
for a completely different script or set of assumptions, you
could find that you have "secured" yourself out of all ability
to access the relevant information.  That might have advantages
too, but you --and any users you are advising through error
messages or the like-- should probably understand the tradeoffs.

> RFCs 7564 and 7613 are done and dusted so my question is: did
> I decode the specs correctly?

I'll let someone else comment on that.   Based on comments at
several IETF meetings (more clear there than on-list), several
people and organizations had the main (or only) goal / success
criterion for PRECIS that they would end up having something
that could be applied by both implementers and users, and used
globally, without any deep understanding, evaluation of
circumstances, or specific localizations... ideally with
decisions compatible with what they have already been doing.
To the extent to which those are the criteria, every case you
and a few others have come up with -- sensitivity to the
difference between toLowerCase and toCaseFold, individual
characters that behave or are defined in a way that seems
astonishing, the nasty problems with Turkic languages written in
Latin script, asserted ambiguities in the specs that really
should be interpreted based on local context -- demonstrate that
this is harder and requires far more understanding than people
would like, i.e., that, by those criteria, PRECIS has simply
failed and we need to get used to, and document and explain, a
more complex world.
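
For readers who have not run into the lowercase versus case-fold
difference, a small Python sketch may help.  It uses Python's
built-in str.lower() and str.casefold() as stand-ins for the
Unicode toLowerCase and toCaseFold operations; both are
locale-independent, so this shows the default behavior only and
says nothing about what a Turkish-aware implementation would do.
The helper name and sample strings are mine.

```python
def mappings(s):
    """Return the default lowercase and case-fold forms of s side by side."""
    return {"lower": s.lower(), "casefold": s.casefold()}


# Illustrative samples: sharp s, capital I with dot above, plain capital I.
SAMPLES = ["Stra\u00dfe", "\u0130stanbul", "I"]

if __name__ == "__main__":
    for sample in SAMPLES:
        result = mappings(sample)
        # ascii() keeps combining marks visible as escapes.
        print(ascii(sample), ascii(result["lower"]), ascii(result["casefold"]))
```

Lowercasing leaves the sharp s alone while case folding expands
it to "ss", so the two operations disagree on whether "Straße"
and "STRASSE" match.  The default lowercase of U+0130 is "i"
followed by COMBINING DOT ABOVE (U+0307), and plain "I" maps to
"i" rather than to the dotless form a Turkish user would expect.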

Some of the problem is a disconnect between IETF priorities and
Unicode ones, with the latter emphasizing large blocks of
running text in known languages rather than strings that may not
be words or language-associated at all (most of our identifiers
and most good passwords fall into that category).  The rest is
more fundamental: several people have observed that the creation
stories of a number of religions explain the diversity and
incompatibilities among human languages (and, by extension,
writing systems) in terms of deliberate, divine, intervention to
make things difficult.  Whether one accepts those explanations
or not, it is clear that many of these problems did not
originate with the Internet, Unicode, or PRECIS. :-(

We do what we can and I have no delusions about achieving
perfection.  I just wish we could abandon the goal of
simplifying things to the point where application is globally
predictable for people who are not willing to understand the
issues and, instead, allow for localization where needed and do
a better job of explaining the issues and tradeoffs.