Re: [precis] rationale of rfc7613 decisions

Peter Saint-Andre <stpeter@stpeter.im> Tue, 02 May 2017 01:27 UTC

To: Nikos Mavrogiannopoulos <nmav@redhat.com>, precis@ietf.org
References: <1490885635.10364.10.camel@redhat.com> <5d02a0bc-5f53-a9fe-33fe-be0c66de24ee@stpeter.im> <1490948974.24162.5.camel@redhat.com>
From: Peter Saint-Andre <stpeter@stpeter.im>
Message-ID: <44146cf5-a5af-a0f2-5bb8-af06924f2c6d@stpeter.im>
Date: Mon, 01 May 2017 19:24:44 -0600
In-Reply-To: <1490948974.24162.5.camel@redhat.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/precis/7hUID3ANSWxgocrWygb_TnxHlBw>

Hi Nikos,

I will add some explanatory text to the relevant specs (IMHO it doesn't
all belong in 7613bis - some of it is more appropriate for 7564bis).
Details below.

On 3/31/17 2:29 AM, Nikos Mavrogiannopoulos wrote:
> On Thu, 2017-03-30 at 19:45 -0600, Peter Saint-Andre wrote:
> 
>>> I'm checking both rfc7564 and rfc7613, and I cannot find the
>>> rationale for the restrictions they impose. In particular:
>>>  1. why does rfc7613 restrict all spaces in passwords to U+0020?
>>
>>
> [...]
>>>  2. what is the purpose of "Contextual Rule Required" in section
>>> 4.3.2 of rfc7564?
>>
>> It's complicated, but in essence PRECIS is consistent with IDNA2008
>> here (see RFC 5891, RFC 5892, and RFC 5894). In particular, the code
>> points ZERO WIDTH JOINER (U+200D) and ZERO WIDTH NON-JOINER (U+200C)
>> are necessary to produce certain combinations of characters in
>> certain scripts (e.g., Arabic, Persian, and Indic scripts) but if
>> used in other contexts can have consequences that violate the
>> principle of least user astonishment.
> 
> I think that such issues warrant extensive discussion in an RFC
> like 7613. It is not apparent to me, for example, why that principle
> should apply to passwords (which are not visible).

It's not just about (visible or auditory) presentation. A user can be
astonished if, for example, they can use a character in a userid but not
in a password. We try to keep these differences to a minimum, while at
the same time meeting the needs of the underlying constructs (e.g.,
allowing more entropy in passwords than in userids).

For passwords, another aspect of the thinking behind RFC 4013 and RFC
7613 is that we want people to be able to reproduce a password on
multiple different systems (e.g., a mobile device and a desktop
machine). Introducing too many different characters (and RFC 7613 is
more liberal here than RFC 4013 was) will make that more difficult.

> I guess there are
> arguments for that, but they should be presented so that people can
> understand them and be convinced that RFC7613 is the way to go.

As I pointed out, in the IETF those who believe they want something
other than RFC 7613 really need to be the ones making a strong argument.
Rolling your own can be just as hazardous in internationalization as it
is in cryptography.

>>>  3. why freeform class doesn't allow "Old Hangul Jamo characters"?
>>
>> As explained in §2.9 of RFC 5892:
>>
>>    Elimination of conjoining Hangul Jamo from the set of PVALID
>>    characters results in restricting the set of Korean PVALID
>>    characters just to preformed, modern Hangul syllable characters.
>>
>> Here again PRECIS is consistent with IDNA2008.
> 
> As I am mostly concerned with passwords, my question is mainly why
> this is done for passwords. E.g., is it because the Hangul Jamo set
> is a deprecated set which may not be in use years from now, or is
> there another reason?

Those are archaic characters absent from modern Hangul. It's not as if
they may not be in use years from now - they haven't been used in quite
some time.

>>>  4. why freeform class doesn't allow ignorable characters?
>>
>> These are things like soft hyphen, certain joiners, specialized code
>> points for use within Unicode itself (e.g., language tags and
>> variation selectors), and so on. They were disallowed in RFC 4013 and
>> are disallowed in IDNA2008, too.
>>
>> By saying "PRECIS is consistent with IDNA2008" I'm not appealing to
>> authority or saying that consistency is necessarily a good thing.
>> Instead, defining as few string handling methods as possible helps
>> users because strings aren't handled differently in different
>> protocols and contexts (see §5.1 of RFC 7564). This has security
>> implications, too, because the more such methods exist the easier it
>> will be for attackers to trick users.
> 
> In the context of 'passwords', I see very little applicability of such
> attacks, though I may be wrong. The main concerns I see for passwords
> used for storage are compatibility, e.g., even with legacy software
> which did not follow these rules, and simplicity, so that software can
> follow the rules with an effort that is reasonable for the task (I
> find the effort RFC7613 requires for processing UTF-8 passwords
> disproportionately high compared to the effort needed for US-ASCII
> passwords).

You might not be the target audience for internationalized strings
(whether passwords or usernames or anything else).

>>> The context of that, is that I am trying to understand what would
>>> be the drawbacks from recommending a fixed normalization form
>>> (e.g., NFC), for passwords, in contrast to recommending rfc7613.
>>
>> Nikos, instead of asking us why the foregoing restrictions were made,
>> ask yourself why you would want to ignore them and whether you
>> understand internationalization well enough to independently craft
>> appropriate rules and guidelines for the RFC you're updating. Because
>> you actively work on security technologies, think of it this way:
>> would you want someone who doesn't understand all the issues to "just
>> use TLS" without specifying appropriate cipher suites (ignoring RFC
>> 7525) or certificate checking procedures (ignoring RFC 5280 and RFC
>> 6125)? The issues involved with internationalization are just as
>> complex (albeit in different ways) and the whole reason we developed
>> IDNA2008 and PRECIS is so that well-meaning folks like you don't shoot
>> yourselves in the foot.
> 
> I cannot disagree with that; however, providing rationale for the
> decisions is important, especially in documents which are developed
> in isolation from many existing protocols/practices. The current
> practice for PKCS#12 and PKCS#8 encrypted files is to pass in whatever
> you have, as long as it is UTF-8. Convincing developers to deploy
> thousands of lines of code for pre-processing such passwords would
> require highlighting the problems of the previous practice.

Implement or deploy? There are, of course, libraries for such things.

> RFC7613 unfortunately ignores that
> part completely, and I have no arguments when trying to convince people
> that this should be preferred.

I tend to agree with you that internationalization is not always
necessary. Protocol designers need to weigh the tradeoffs. The PRECIS
specifications don't tell people that they have to support
internationalized strings - instead, they give people a tool they can
use to support internationalized strings in the smartest, safest way
possible. If the community of PKCS#12 / PKCS#8 developers and users
doesn't see a compelling need for internationalized passwords, then
there's no strong reason for them to add this support to their
specifications and software. Stick with ASCII if that's the best course!

>> I strongly encourage you to use the PRECIS profile for passwords in
>> RFC7613, and we'd be happy to help you do so in the safest ways
>> possible.
> 
> I'm trying to make a list of items which make apparent why RFC7613 is
> needed. What I have now is:
> 
> "UTF-8, however, does not imply that strings conforming to it are
> unambiguously unique, since there can be various forms of the same
> string which may look identical to an observer while being
> represented by different byte strings.

For sure - and there are plenty of attacks here.
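
To make the ambiguity concrete, here is a minimal Python sketch. It is
not taken from any PRECIS specification, and the helper name
same_after_nfc is made up for illustration. It shows two strings that
render identically but differ as UTF-8 byte strings until both are
normalized to NFC:

```python
import unicodedata

def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The raw UTF-8 encodings differ even though both display as the same glyph.
print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False
# After normalization to NFC the two forms compare equal.
print(same_after_nfc(composed, decomposed))  # True
```

A password hashed from the raw bytes of one form would therefore not
match the other form, even though the user typed "the same" text.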

> Some issues are the following."
> 
> [The NFC argument is the easiest to explain]
>  * Various normalization forms, which result in different data for the
> same input.
> 
> [why spaces need to be merged to 0x20 is harder to sell]
>  * The Unicode standard includes a number of space characters which
> cannot be distinguished from each other, or have no width, resulting
> in different results when switching to a different input method
> 
> [Hangul Jamo even harder]
>  * There are deprecated alphabet sets, which are no longer in use(?)
> and may not be available as input methods in the future.
> 
> [contextual rule]
>  * Certain combinations of code points between certain scripts produce
> unexpected visible results. (the question here is why would one care
> for visible results on passwords which are not printed)

Those aren't bad summaries, but as mentioned I'll add some informational
text to the PRECIS-bis documents so that the explanations are clearer.
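
As a rough illustration of the space-merging and NFC items in the list
above, the following Python sketch maps every non-ASCII space (Unicode
general category Zs) to U+0020 and then applies NFC. The function name
is hypothetical, and this covers only those two steps: it deliberately
omits the other checks a real PRECIS implementation performs (such as
rejecting disallowed code points or empty strings), so a PRECIS library
is the right tool for real work.

```python
import unicodedata

def merge_spaces_and_normalize(password: str) -> str:
    """Hypothetical sketch: only two preparation steps, not full PRECIS."""
    # Map any space separator (general category Zs) to ASCII space U+0020.
    mapped = "".join(
        "\u0020" if unicodedata.category(ch) == "Zs" else ch
        for ch in password
    )
    # Apply Unicode Normalization Form C to the result.
    return unicodedata.normalize("NFC", mapped)

# NO-BREAK SPACE and IDEOGRAPHIC SPACE, as one input method might produce...
typed_on_phone = "correct\u00a0horse\u3000battery"
# ...versus plain ASCII spaces from another input method.
typed_on_desktop = "correct horse battery"

print(merge_spaces_and_normalize(typed_on_phone) == typed_on_desktop)  # True
```

This is the multi-device reproducibility concern in miniature: without
the mapping, the two inputs are different passwords.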

Peter
