Re: [I18ndir] Fwd: New Version Notification for draft-bray-unichars-04.txt

Asmus Freytag <asmusf@ix.netcom.com> Fri, 15 September 2023 21:01 UTC

Message-ID: <472ef154-3f4b-d6f0-dc48-8599a7896f13@ix.netcom.com>
Date: Fri, 15 Sep 2023 14:01:48 -0700
To: Tim Bray <tbray@textuality.com>, ART Area <art@ietf.org>, i18ndir@ietf.org
References: <169479938668.18742.9199862891950651366@ietfa.amsl.com> <CAHBU6ivzUV947N+n7AoYkCFT3ZfaLobCQ4fBXw3dvkqTT=LBAw@mail.gmail.com>
From: Asmus Freytag <asmusf@ix.netcom.com>
In-Reply-To: <CAHBU6ivzUV947N+n7AoYkCFT3ZfaLobCQ4fBXw3dvkqTT=LBAw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/-xgCeZ0Q4y6GpR4drYMqh6tdnLs>
Subject: Re: [I18ndir] Fwd: New Version Notification for draft-bray-unichars-04.txt
Precedence: list

This time I looked only at the diffs and sometimes a bit of adjacent text.

This first one is not major, but a small fix would avoid a contradiction 
in terms.

> The numbers assigned to Unicode characters are called “code points”;
This is backwards, as can be seen by "unassigned" code points. Easy to fix:

> The numbers to which Unicode characters are assigned are called “code 
> points”;

That matches the way the Unicode Standard talks about this process and 
makes "unassigned" code points no longer something that seemingly 
contradicts their definition.

The paragraph above it contains the same backwards description of 
assignment:

> each Unicode character is assigned an integer identifier in the range 
> U+0000-U+10FFFF.  These numbers are used to
A simple fix, also adding "unique" for precision:

> each Unicode character is assigned to a unique integer identifier in 
> the range U+0000-U+10FFFF.  These numbers are used to
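
To illustrate why the direction matters, here is a minimal Python sketch. 
The helper names are hypothetical, and the result for any particular 
code point depends on the Unicode version bundled with the interpreter. 
It shows that every integer in the range is a code point, whether or not 
a character has been assigned to it:

```python
import unicodedata

# Code points are the integers U+0000..U+10FFFF. They exist independently
# of whether a character has been assigned to them.
MAX_CODE_POINT = 0x10FFFF


def is_code_point(n: int) -> bool:
    """True if n lies in the Unicode code point range."""
    return 0 <= n <= MAX_CODE_POINT


def is_unassigned(n: int) -> bool:
    """True if the code point has General_Category Cn ("Unassigned").

    Cn covers reserved code points and noncharacters, as known to the
    Unicode database shipped with this Python build.
    """
    return unicodedata.category(chr(n)) == "Cn"


if __name__ == "__main__":
    # U+0041 has a character assigned to it; U+FFFF is a noncharacter.
    # U+0378 was unassigned in Unicode 15 (an assumption about your build).
    for cp in (0x0041, 0x0378, 0xFFFF):
        print(f"U+{cp:04X}", is_code_point(cp), is_unassigned(cp))
```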

---

New text on dealing with problematic code points:

I like the new section. However, I'm troubled by some of the wording 
related to RFC9413, which I believe can be misunderstood and even cited 
out of context to support some dangerous strategies.

RFC9413 states:

> However, ignoring faulty or ambiguous input is almost always the 
> incorrect solution to the problem.

Because silently ignoring individual code points can be used to evade 
detection of malicious input, this should not be understood as "ignoring 
faulty characters individually", but as "ignoring text fields with 
faulty characters".

The distinction is crucial, and is what gave rise to Unicode's 
recommendation on how to treat ill-formed UTF-8.

There are known attack techniques that rely on part of a string being 
discarded silently. One example is adding an unpaired surrogate to foil 
matching of known malicious content. If the surrogate is later 
discarded, the remaining string then represents an attack payload that 
escaped the defenses. This was the impetus for Unicode to add the 
recommendations you cite.
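
The attack can be sketched in a few lines of Python. This is only an 
illustration under assumptions: the blocklist, the payload and the 
function names are hypothetical, and real filters are more elaborate. It 
uses Python's standard `errors="ignore"` handler to stand in for any 
component that silently discards an unpaired surrogate:

```python
# Hypothetical blocklist used by an upstream filter.
BLOCKLIST = ("<script>",)

# The unpaired surrogate U+D800 splits the pattern the filter looks for.
PAYLOAD = "<scr\ud800ipt>alert(1)</script>"


def passes_blocklist(text: str) -> bool:
    """True if no blocklisted substring occurs in text."""
    return not any(bad in text for bad in BLOCKLIST)


def lossy_encode(text: str) -> bytes:
    # errors="ignore" silently drops anything that cannot be encoded,
    # including the unpaired surrogate.
    return text.encode("utf-8", errors="ignore")


def strict_encode(text: str) -> bytes:
    # The default errors="strict" raises UnicodeEncodeError, so the
    # whole field is rejected rather than quietly repaired.
    return text.encode("utf-8")


if __name__ == "__main__":
    print(passes_blocklist(PAYLOAD))   # the filter does not see the pattern
    print(lossy_encode(PAYLOAD))       # the pattern reappears downstream
    try:
        strict_encode(PAYLOAD)
    except UnicodeEncodeError as exc:
        print("rejected:", exc.reason)
```

The filter and the lossy encoder are each plausible on their own; the 
hole only appears when one component checks the string and another 
silently removes part of it afterwards.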

I'm not suggesting that you necessarily delve into too much detail 
here, but that you introduce the concept that a single ill-formed part 
of a string makes the whole text field ill-formed, and that the 
recommendation of RFC9413 should therefore never apply to single 
characters or code points in isolation.

Here's suggested text:

> In applying the recommendations of RFC9413 to text fields containing 
> ill-formed UTF-8, for example, the recommendations must be applied to 
> the field as a whole, not at the character or byte level. In fact, 
> silently ignoring an ill-formed part of a string is a known security 
> risk. Responding to that risk, [UNICODE] section 3.2 ....

The last paragraph is overselling RFC9413, because the phrasing 
conceivably implies that it contains guidance specific to code points, 
when it is more generically concerned with problematic input. It also 
doesn't flow particularly well.

You could move it to the head of Section 5, with a tweak.

> Problematic code points are an example of problematic input. 
> [RFC9413], "Maintaining Robust Protocols", provides a thorough 
> discussion of error-handling options when choosing a strategy for 
> dealing with problematic input. Different types of problematic code 
> points cause different issues.
>
> Noncharacters....

(I'm also suggesting adding a sentence to make the transition)

This way, you put RFC9413 in perspective before relying on it later in 
the text, and you also don't inadvertently set up a contrast between 
Unicode's recommendation and those of RFC9413. What Unicode does is 
specify the option for when you don't want to, or cannot, discard the 
whole text field. It also clarifies that, at the character or code point 
level, silently discarding part of the text is a big security no-no.
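
As a rough sketch of the two acceptable options, assuming a hypothetical 
`decode_field` helper and Python's built-in error handlers (I have not 
checked that the `"replace"` handler matches Unicode's recommended 
practice in every detail), a receiver either rejects the whole field or 
substitutes U+FFFD, but never drops the offending bytes:

```python
def decode_field(raw: bytes, *, allow_replacement: bool = False) -> str:
    """Decode a UTF-8 text field without silently discarding anything.

    By default an ill-formed field is rejected as a whole. With
    allow_replacement=True, each ill-formed sequence is replaced by
    U+FFFD so that the damage stays visible in the result.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        if allow_replacement:
            return raw.decode("utf-8", errors="replace")
        raise ValueError("ill-formed UTF-8; rejecting the whole field") from exc


if __name__ == "__main__":
    bad = b"<scr\xffipt>"  # 0xFF never occurs in well-formed UTF-8
    print(repr(decode_field(bad, allow_replacement=True)))
    try:
        decode_field(bad)
    except ValueError as exc:
        print("rejected:", exc)
    # What not to do: this quietly yields "<script>".
    print(repr(bad.decode("utf-8", errors="ignore")))
```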

With these fixes, OK to ship it.

A./
