Re: [precis] rationale of rfc7613 decisions

Nikos Mavrogiannopoulos <nmav@redhat.com> Fri, 31 March 2017 08:29 UTC

DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 8319464062
Message-ID: <1490948974.24162.5.camel@redhat.com>
From: Nikos Mavrogiannopoulos <nmav@redhat.com>
To: Peter Saint-Andre <stpeter@stpeter.im>, precis@ietf.org
Date: Fri, 31 Mar 2017 10:29:34 +0200
In-Reply-To: <5d02a0bc-5f53-a9fe-33fe-be0c66de24ee@stpeter.im>
References: <1490885635.10364.10.camel@redhat.com> <5d02a0bc-5f53-a9fe-33fe-be0c66de24ee@stpeter.im>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/precis/CbnmXB_3UDPTrESi1ZFhT_yt36s>
Subject: Re: [precis] rationale of rfc7613 decisions
Precedence: list

On Thu, 2017-03-30 at 19:45 -0600, Peter Saint-Andre wrote:

> > I'm checking both rfc7564 and rfc7613, and I cannot find the
> > rationale
> > of the restrictions being done. In particular:
> >  1. why rfc7613 restricts all spaces for passwords to U+0020?
> 
> 
[...]
> >  2. what is the purpose of "Contextual Rule Required" in section
> > 4.3.2
> > of rfc7564?
> 
> It's complicated, but in essence PRECIS is consistent with IDNA2008
> here
> (see RFC 5891, RFC 5892, and RFC 5894). In particular, the code
> points
> ZERO WIDTH JOINER (U+200D) and ZERO WIDTH NON-JOINER (U+200C) are
> necessary to produce certain combinations of characters in certain
> scripts (e.g., Arabic, Persian, and Indic scripts) but if used in
> other
> contexts can have consequences that violate the principle of least
> user astonishment.

I think that such issues warrant extensive discussion in an RFC
like 7613. It is not apparent to me, for example, why that principle
should apply to passwords (which are not visible). I guess there are
arguments for that, but they should be presented so that readers can
understand them and convince others that RFC7613 is the way to go.
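To make the question concrete for passwords, here is a minimal Python
sketch using only the standard library `unicodedata` module. The sample
strings and the helper name `utf8_nfc` are illustrative assumptions, not
taken from any RFC. It shows that a ZERO WIDTH NON-JOINER, which is
typically invisible in Latin text, survives NFC and changes the byte
string that would be fed to a key derivation function.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

plain = "secret"
with_zwnj = "sec" + ZWNJ + "ret"  # usually displays like "secret"


def utf8_nfc(s: str) -> bytes:
    """Normalize to NFC, then encode as UTF-8 (hypothetical helper)."""
    return unicodedata.normalize("NFC", s).encode("utf-8")


# General category of U+200C (a format character).
print(unicodedata.category(ZWNJ))
# The two byte strings differ even after NFC.
print(utf8_nfc(plain).hex())
print(utf8_nfc(with_zwnj).hex())
print(utf8_nfc(plain) == utf8_nfc(with_zwnj))
```

Whether this matters for a given password store is the open question
raised above; the sketch only shows that NFC alone does not remove the
difference.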

> >  3. why freeform class doesn't allow "Old Hangul Jamo characters"?
> 
> As explained in §2.9 of RFC 5892:
> 
>    Elimination of conjoining Hangul Jamo from the set of PVALID
>    characters results in restricting the set of Korean PVALID
>    characters just to preformed, modern Hangul syllable characters.
> 
> Here again PRECIS is consistent with IDNA2008.

As I am mostly concerned with passwords, my question is mainly why
this is done for passwords. E.g., is it because the Hangul Jamo set is
a deprecated set which may not be in use years from now, or is there
another reason?

> >  4. why freeform class doesn't allow ignorable characters?
> 
> These are things like soft hyphen, certain joiners, specialized code
> points for use within Unicode itself (e.g., language tags and
> variation
> selectors), and so on. They were disallowed in RFC 4013 and are
> disallowed in IDNA2008, too.
> 
> By saying "PRECIS is consistent with IDNA2008" I'm not appealing to
> authority or saying that consistency is necessarily a good thing.
> Instead, defining as few string handling methods as possible helps
> users
> because strings aren't handled differently in different protocols and
> contexts (see §5.1 of RFC 7564). This has security implications, too,
> because the more such methods exist the easier it will be for
> attackers to trick users.

In the context of passwords, I see very little applicability of such
attacks, though I may be wrong. The main concerns I see for passwords
used for storage are compatibility, e.g., even with legacy software
which did not follow these rules, and simplicity, so that software can
follow the rules with an effort that is reasonable for the task (I find
the effort RFC7613 requires for processing UTF-8 passwords
disproportionately large compared to the effort needed for US-ASCII
passwords).

> > The context of that, is that I am trying to understand what would
> > be
> > the drawbacks from recommending a fixed normalization form (e.g.,
> > NFC),
> > for passwords, in contrast to recommending rfc7613.
> 
> Nikos, instead of asking us why the foregoing restrictions were made,
> ask yourself why you would want to ignore them and whether you
> understand internationalization well enough to independently craft
> appropriate rules and guidelines for the RFC you're updating. Because
> you actively work on security technologies, think of it this way:
> would
> you want someone who doesn't understand all the issues to "just use
> TLS"
> without specifying appropriate cipher suites (ignoring RFC 7525) or
> certificate checking procedures (ignoring RFC 5280 and RFC 6125)? The
> issues involved with internationalization are just as complex (albeit
> in different ways) and the whole reason we developed IDNA2008 and
> PRECIS is so that well-meaning folks like you don't shoot yourselves
> in the foot.

I cannot disagree with that. However, providing a rationale for the
decisions is important, especially in documents which are developed
disconnected from many existing protocols and practices. The current
state in PKCS#12 and PKCS#8 encrypted files is to pass whatever you
have, as long as it is UTF-8. Convincing developers to deploy thousands
of lines of code for pre-processing such passwords would require
highlighting the problems of the previous practice. RFC7613
unfortunately ignores that part completely, and I have no arguments
when trying to convince people that it should be preferred.
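For comparison with the "pass whatever you have" practice, the mapping
and normalization steps of the RFC 7613 OpaqueString profile can be
approximated in a few lines of Python. This is a sketch under stated
assumptions, not a conforming implementation: the function name
`prepare_password` is hypothetical, and the FreeformClass check that
rejects disallowed code points (controls, ignorables, old Hangul jamo,
unassigned code points, and so on) is deliberately left out.

```python
import unicodedata


def prepare_password(password: str) -> bytes:
    """Approximate the OpaqueString mapping steps (hypothetical helper).

    Omitted on purpose: validation against the PRECIS FreeformClass,
    which needs Unicode-derived property tables.
    """
    if not password:
        raise ValueError("zero-length passwords are not allowed")
    # Additional mapping rule: any space of general category Zs
    # is mapped to U+0020.
    mapped = "".join(
        " " if unicodedata.category(ch) == "Zs" else ch for ch in password
    )
    # Normalization rule: NFC. No case mapping and no width mapping.
    return unicodedata.normalize("NFC", mapped).encode("utf-8")


# NO-BREAK SPACE and a decomposed "é" both end up in one canonical form.
print(prepare_password("pass\u00a0word"))
print(prepare_password("cafe\u0301") == "caf\u00e9".encode("utf-8"))
```

The part left out, code point validation, is where the table-driven
work described in RFC 7564 would come in, so this sketch says nothing
about the total implementation effort.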

> I strongly encourage you to use the PRECIS profile for passwords in
> RFC7613, and we'd be happy to help you do so in the safest ways
> possible.

I'm trying to make a list of items which make it apparent why RFC7613
is needed. What I have now is:

"UTF-8, however, does not imply that strings conforming to it are
unambiguously unique, since there can be various forms of the same
string which may look identical to an observer although they are
represented by different byte strings. Some issues are the following."

[The NFC argument is the easiest to explain]
 * Various normalization forms, which result in different data for the
same input.

[why spaces need to be merged to 0x20 is harder to sell]
 * The Unicode standard includes a number of space characters which
cannot be distinguished from each other, or which have no width, leading
to different results when switching to a different input method.

[Hangul Jamo even harder]
 * There are deprecated alphabet sets, which are no longer in use(?)
and may not be available as input methods in the future.

[contextual rule]
 * Certain combinations of code points between certain scripts produce
unexpected visible results. (The question here is why one would care
about visible results for passwords, which are not printed.)
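The first two items can be illustrated with a short Python sketch that
uses only the standard library `unicodedata` module. The sample strings
are arbitrary assumptions chosen for illustration. It shows two
encodings of the same visible word, and a few space code points that
NFC leaves untouched.

```python
import unicodedata

# "é" as one precomposed code point vs. "e" + COMBINING ACUTE ACCENT.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

print(composed.encode("utf-8").hex())
print(decomposed.encode("utf-8").hex())

# NFC brings both inputs to the same string.
same_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)
print(same_after_nfc)

# A few space code points; all have general category Zs.
spaces = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
    "IDEOGRAPHIC SPACE": "\u3000",
}
for name, ch in spaces.items():
    unchanged = unicodedata.normalize("NFC", ch) == ch
    print(name, unicodedata.category(ch), ch.encode("utf-8").hex(), unchanged)
```

NFC does not alter any of these spaces, so a recommendation of NFC
alone would not cover the second item.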

regards,
Nikos
