Re: I-D Action:draft-klensin-net-utf8-04.txt

John C Klensin <> Fri, 05 October 2007 15:54 UTC

Date: Fri, 05 Oct 2007 11:54:30 -0400
From: John C Klensin <>
To: Stephane Bortzmeyer <>,
Subject: Re: I-D Action:draft-klensin-net-utf8-04.txt
Message-ID: <3DD121D8A8CB33BE639A9B9E@p3.JCK.COM>

--On Friday, 05 October, 2007 17:07 +0200 Stephane Bortzmeyer
<> wrote:

> On Fri, Oct 05, 2007 at 01:40:02AM -0400,
> <> wrote 
>  a message of 91 lines which said:
>> 	Title           : Unicode Format for Network Interchange
>> 	Author(s)       : J. Klensin, M. Padlipsky
>> 	Filename        : draft-klensin-net-utf8-04.txt
> I have read and studied this I-D and I find it basically OK,
> suitable for approval and very useful for the Internet, where
> internationalization is an important issue.
> I have some reservations, which are mostly details:
>> Section 1.1 [...] preferred to the double-byte encoding of
>> "extended ASCII" [RFC0698]
> This reference to a very obsolete system does not bring useful
> information. Delete it or move it to the interesting "History
> and Context" appendix.

Officially, RFC 698 is not obsolete, and it applies specifically
to NVT-like streams, which is why it seemed worth singling out. The
spec should have listed "obsoletes RFC 698", which means that
this can't be in a section that is purely informative.  I'll try
to figure out a better way to handle it, but am going to leave
the text more or less as is until I see further comments.  A
better alternative would be for someone to create an RFC titled
something like "implications of the character set policy" that
clears out internationalization cruft like RFC 698 and "extended
ASCII".  Opinions and volunteers would be welcome.

>> Section 2.1 [...] None of those uses is inappropriate for
>> streams of plain text.
> Isn't it a typo? It should be "appropriate".

Yes.  It was a typo.  Fixed in -05.  There are several other
typos, including syntax that omits closing single quotes, that
have been reported offlist and fixed in -05.

>> Section 3 [...] Recognition of the fact that some applications
>> implementations may rely on operating system libraries over
>> which they have little control and adherence to the
>> robustness principle suggests that receivers of such strings
>> should be prepared to receive unnormalized ones
> This is also a security issue. An attacker could deliberately
> send unnormalized text even if the specification says MUST. As
> such, it is worth a mention in the security considerations.

Reasonable idea.  Text added.
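
To illustrate the concern, here is a minimal Python sketch (not
text from the draft) of how a naive substring match misses an
unnormalized string and how normalizing both sides first catches
it.  It assumes NFC as the target form; the function names and
sample strings are hypothetical.

```python
import unicodedata


def naive_match(needle, haystack):
    """Compare code points as received, with no normalization."""
    return needle in haystack


def normalized_match(needle, haystack):
    """Normalize both sides to NFC before comparing (assumed target form)."""
    nfc_needle = unicodedata.normalize("NFC", needle)
    nfc_haystack = unicodedata.normalize("NFC", haystack)
    return nfc_needle in nfc_haystack


# Hypothetical filter term, stored precomposed: U+00E9 (e with acute).
blocked = "caf\u00e9"
# Same visible text sent unnormalized: "e" followed by U+0301 (combining acute).
evasive = "cafe\u0301"

print(naive_match(blocked, evasive))       # False: the filter is bypassed
print(normalized_match(blocked, evasive))  # True: normalization catches it
```

The point is only that a receiver cannot rely on the sender having
honored a MUST; it has to normalize (or reject) on its own side.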

>> Section 5.2 [...] internationalized domain names (IDNA
>> [RFC3490]) [...]  specific difficulties with IDNA in this
>> regard are discussed in [RFC4690]
> The two mentions of IDNA bring no value and really smell like
> a personal issue. Discussions of the UTF-8 RFC are
> understandable but other RFCs that talk about Unicode are not
> mentioned. Why specifically IDNA?

It isn't really an IDNA issue at all, but an issue with Unicode
versioning and libraries.  RFC 4690 contains the best discussion
of that subject of anything now published in the RFC series (the
discussion in draft-klensin-idnabis-issues is arguably even
better, but that document is in a sufficiently preliminary state
that having this one reference it would be unwise).

If you think it would significantly improve the document, I
could make that text say, e.g., that IDNA, SASLPrep, and
possibly other protocols are tied to Unicode 3.2 via Stringprep
and then point to RFC 4690.  But, either way, it is just a
comment about something we've done that is weak and should not
be repeated for Net-Unicode, plus an informative reference for
further reading.
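
As a rough illustration of what being tied to Unicode 3.2 means
in practice, the Python sketch below (an example under stated
assumptions, not part of the draft) uses the standard library's
stringprep tables, which follow RFC 3454, and its frozen Unicode
3.2 database.  The helper name is hypothetical, and the sample
character U+20B9 (INDIAN RUPEE SIGN, added to Unicode well after
3.2) is chosen only for illustration.

```python
import stringprep
import unicodedata


def unassigned_for_stringprep(ch):
    """True if ch falls in RFC 3454 table A.1 (unassigned in Unicode 3.2)."""
    return stringprep.in_table_a1(ch)


rupee = "\u20b9"  # assigned in later Unicode versions, not in 3.2

print(unassigned_for_stringprep(rupee))        # True
print(unassigned_for_stringprep("a"))          # False
print(unicodedata.ucd_3_2_0.category(rupee))   # Cn (unassigned in 3.2)
```

A Stringprep-based protocol therefore treats such a character as
unassigned no matter how current the host system's Unicode
tables are.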

>> Section 6 [...]
> A mention of firewalls and unnormalized UTF-8 streams could
> be useful. Something like "Firewalls and other systems
> interpreting UTF-8 streams should be developed with the clear
> knowledge that an attacker may deliberately send unnormalized
> text, for instance to avoid detection by naive text-matching
> systems."


>> Appendix A [...] whois [RFC0954]
> If it is the current version, it should be RFC3912. If it is
> the original one, which would make sense in an historical
> section, it should be RFC0812.

Here, I disagree.  RFC 954 was chosen because it was the last,
and clearest, version of the original spec.  RFC 3912 is
different in several respects and arguably introduces new
ambiguities.  I could make it "[RFC0812] [RFC0954]" if you think
that would improve clarity.