Re: [secdir] Secdir last call review of draft-klensin-idna-unicode-review-03

John C Klensin <john-ietf@jck.com> Wed, 11 September 2019 22:07 UTC

Date: Wed, 11 Sep 2019 18:07:24 -0400
From: John C Klensin <john-ietf@jck.com>
To: Christopher Wood <caw@heapingbits.net>
cc: secdir@ietf.org, iesg@ietf.org, draft-klensin-idna-unicode-review.all@ietf.org
Message-ID: <645388ABB5D92E33447C55DF@PSB>
In-Reply-To: <156816606075.22400.22167404102467671@ietfa.amsl.com>
References: <156816606075.22400.22167404102467671@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/secdir/nuUJBG9TGwJ6LHV3C-QivmH0DkA>

Christopher,

(since Last Call has ended, iesg added and the IETF list removed)

This has already been discussed in the context of another review,
but the tracker showed the review as being due by August 30,
the same date as Last Call closed, until yesterday.  It is
important that area reviews appear within the Last Call period
so that others in the community can comment on them.  Late
reviews, especially ones that arrive after the IESG evaluation
period starts, are also a bit disrespectful of authors who
struggle to get complete documents posted immediately after IETF
Last Call and before the IESG evaluation period begins so as to
have documents that are as complete as possible before ADs start
their reviews, and of those ADs who might already have started.

That said....

--On Tuesday, September 10, 2019 18:41 -0700 Christopher Wood
via Datatracker <noreply@ietf.org> wrote:

> Reviewer: Christopher Wood
> Review result: Has Nits
> 
> This document looks mostly good to go. I only have a few
> questions and some various editorial nits.
> 
> Questions:
> - Section 4, last paragraph: Will code points "considered
> unsafe" be labelled as such, and if so, where? In the derived
> property IANA tables? (Assuming those tables are kept.)

The document should be clear on this.  In particular, the last
sentence of Section 3.2, which, AFAICT, is the first comment in
it about "considered unsafe", says

	'The affected code points should be considered unsafe
	and identified as "under review" in the IANA tables
	until final derived properties are assigned.'

That seems very clear to me unless you think the document should
say "... identified as 'unsafe and under review' in the IANA
tables..." or something similar.  That, to me, is a tradeoff
against tediousness and against the fairly clear language in the
base IDNA specifications, as well as language in
draft-klensin-idna-5891bis (in Last Call and IESG evaluation
concurrent with this document) that spell out what a zone is
allowed to do with code points that IDNA considers PVALID.  The
intent is to give a registry a clear warning that the status of
a code point might change as reviews continue.   If there were
community consensus that the issue should be described in more
detail in this document, I assume we would happily change it.
But, because this issue did not come up during IETF Last Call,
there is no time for such a discussion and, IMO, a presumption
that the text is ok.

> - Section 5, second paragraph: How will the success of this
> document's proposed changes be measured in order to determine
> if further steps towards minimizing confusion are needed?

First of all, the nature of human languages and writing systems
and their evolution is such that, as characters other than those
from very simple scripts designed for easy recognition (e.g.,
Roman script as used in the early Roman Republic) are allowed in
identifiers (including domain name labels), aspirations for zero
confusion are up there with aspirations for perfect security
that cannot be broken by any mechanism now or in the future.
That makes "good enough" subjective, circumstantial, and, to
some extent, dependent on the dedication of the attackers and
resources available to them.    But this paragraph is not about
that.  It is about the observation that publishing non-normative
tables with IANA, rather than telling people what the rules and
algorithms are and expecting them to do their own computation or
to rely on tables that are not an IETF responsibility, has
resulted in a good deal of confusion (and some complaints) about
what the true and correct values are.  If those complaints now
stop, we are successful.  If they continue, then we conclude
that the IANA tables are a bad idea on balance and we drop them.  It
occurs to me that your comment about "unsafe and under review"
and the discussion above suggest that, if we decided to get rid
of the IANA tables in their current form, we'd need to find a
place to publish that information.  But let's cross that bridge
when and if we get to it.

> Nits:
> - Section 2, first paragraph, first sentence: It seems a comma
> is missing after [RFC3491] reference, i.e., "..., commonly
> known as "IDNA2003" [RFC3490] [RFC3491], ...".

Correct.  Working draft fixed.  Thanks, although I'm confident
the RFC Editor would have caught this.

> - Section 3,
> second paragraph: s/full Unicode versions/major Unicode
> versions?

IIRC, "full Unicode versions" is Unicode's preferred terminology.
We should check this.

> - Section 3.1: s/also concluded that maintain
> Unicode/also concluded that Unicode?

Yes.  Cut and paste error as that sentence was rewritten several
times.  Fixed in working draft.

> - Section 4, third
> paragraph: Is the requirement that changes which are
> "documented" redundant with the following "explained"
> requirement? (That is, perhaps just say "... must be
> documented and explained.")

It is actually not redundant although I understand why you might
read it that way.  Explanation on request if you or some AD
think it is important.

> - Security Considerations, second
> paragraph: Do "end users" include systems that process or
> interpret Unicode values? If not, it might help to specifically
> call them out, as problems may arise from misinterpretation
> there.

They do not.  "End user" refers to human beings and, perhaps
eventually, robots who are using the Internet as surrogates for
human beings.  One of the design goals of IDNA2008 (successfully
realized) was to create a situation in which there simply are no
ambiguities in Unicode strings as they pass through the
protocol.  As an overused example, as far as computer systems
processing or interpreting values are concerned, Latin small "a"
(U+0061) is quite distinct from Cyrillic small "а" (U+0430) in
all relevant encoding forms including what happens when those
code points are passed through the Punycode algorithm.   The
problem occurs only when those characters are displayed to
humans and they can't tell the difference.  Hence end users.  
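
For anyone who wants to see the overused example concretely, here is
a minimal sketch in Python.  It is illustrative only and not part of
the draft; the helper name `encodings` is made up for this message,
and the standard-library "punycode" codec is assumed to be an
adequate stand-in for the Punycode step.

```python
import unicodedata

LATIN_A = "\u0061"     # Latin small letter a
CYRILLIC_A = "\u0430"  # Cyrillic small letter a


def encodings(ch):
    """Return the name, code point, and several encoded forms of one character."""
    return {
        "name": unicodedata.name(ch),
        "code_point": f"U+{ord(ch):04X}",
        "utf8": ch.encode("utf-8"),
        "utf16": ch.encode("utf-16-be"),
        # Python's built-in codec for the Punycode algorithm.
        "punycode": ch.encode("punycode"),
    }


latin = encodings(LATIN_A)
cyrillic = encodings(CYRILLIC_A)

# Print the two characters side by side, one form per row.
for key in latin:
    print(f"{key:10} {latin[key]!r:28} {cyrillic[key]!r}")

# None of the forms coincide, so software has nothing to confuse.
assert all(latin[k] != cyrillic[k] for k in latin)
```

Every row differs, which is the point: the two characters only look
alike once they are rendered for a human.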

Processes that try to figure out what a human might confuse are
a different story, but they, necessarily, operate off
assumptions about the graphemes associated with particular code
points and not the code points themselves.  Hence they are just
not relevant to this document.

Thanks for the careful reading.
best,
   john
