Re: I-D Action:draft-klensin-net-utf8-04.txt

Stephane Bortzmeyer <> Fri, 05 October 2007 15:08 UTC

Date: Fri, 5 Oct 2007 17:07:59 +0200
From: Stephane Bortzmeyer <>
Subject: Re: I-D Action:draft-klensin-net-utf8-04.txt
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Operating-System: Debian GNU/Linux 4.0
X-Kernel: Linux 2.6.18-4-686 i686
Organization: NIC France
User-Agent: Mutt/1.5.13 (2006-08-11)

On Fri, Oct 05, 2007 at 01:40:02AM -0400, <> wrote 
 a message of 91 lines which said:

> 	Title           : Unicode Format for Network Interchange
> 	Author(s)       : J. Klensin, M. Padlipsky
> 	Filename        : draft-klensin-net-utf8-04.txt

I have read and studied this I-D and I find it basically OK, suitable
for approval and very useful for the Internet, where
internationalization is an important issue.

I have some reservations, which are mostly details:

> Section 1.1 [...] preferred to the double-byte encoding of "extended
> ASCII" [RFC0698]

This reference to a very obsolete system does not provide useful
information. Delete it or move it to the interesting "History and
Context" appendix.

> Section 2.1 [...] None of those uses is inappropriate for streams of
> plain text.

Isn't this a typo? It should be "appropriate".

> Section 3 [...] Recognition of the fact that some applications
> implementations may rely on operating system libraries over which
> they have little control and adherence to the robustness principle
> suggests that receivers of such strings should be prepared to
> receive unnormalized ones

This is also a security issue: an attacker could deliberately send
unnormalized text even though the specification says MUST. It is
therefore worth mentioning in the Security Considerations section.
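
To illustrate the point, here is a minimal Python sketch (3.9 or
later) of a receiver following the robustness principle: it decodes
strictly as UTF-8 and, instead of trusting that the sender obeyed the
normalization MUST, checks and normalizes to NFC itself. The function
name receive_net_utf8_line is hypothetical and does not come from the
draft.

```python
import unicodedata


def receive_net_utf8_line(raw: bytes) -> tuple[str, bool]:
    """Hypothetical receiver helper: decode one line and normalize it.

    Returns the NFC form of the text and a flag telling whether the
    sender's input was already in NFC (useful for logging odd peers).
    """
    # Strict decoding: malformed UTF-8 raises UnicodeDecodeError
    # instead of being silently replaced.
    text = raw.decode("utf-8", errors="strict")
    was_nfc = unicodedata.is_normalized("NFC", text)
    # Do not assume the sender honored the MUST; normalize locally.
    return unicodedata.normalize("NFC", text), was_nfc


# "e" followed by U+0301 COMBINING ACUTE ACCENT: valid UTF-8, not NFC.
decomposed = "caf" + "e\u0301"
text, ok = receive_net_utf8_line(decomposed.encode("utf-8"))
print(text == "caf\u00e9", ok)  # True False
```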

> Section 5.2 [...] internationalized domain names (IDNA [RFC3490])
> [...]  specific difficulties with IDNA in this regard are discussed
> in [RFC4690]

The two mentions of IDNA bring no value and really smell like a
personal issue. Discussing the UTF-8 RFC is understandable, but other
RFCs dealing with Unicode are not mentioned. Why IDNA specifically?

> Section 6 [...]

A mention of firewalls and unnormalized UTF-8 streams could be
useful. Something like "Firewalls and other systems interpreting UTF-8
streams should be developed with the clear knowledge that an attacker
may deliberately send unnormalized text, for instance to avoid
detection by naive text-matching systems."
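
The evasion scenario fits in a few lines of Python. This is only a
sketch: the blocklist entry and both filter functions are hypothetical,
and NFC is used because that is the form the draft requires. A filter
matching raw bytes misses a decomposed spelling of the same word,
while one that normalizes before matching catches it.

```python
import unicodedata

# Hypothetical blocklist entry, stored in NFC ("é" as U+00E9).
BLOCKED = "caf\u00e9"


def naive_match(payload: bytes) -> bool:
    # Byte-for-byte search: finds only the exact encoding on the list.
    return BLOCKED.encode("utf-8") in payload


def normalizing_match(payload: bytes) -> bool:
    # Invalid bytes become U+FFFD rather than aborting the scan.
    text = payload.decode("utf-8", errors="replace")
    # Bring the text to NFC before searching.
    return BLOCKED in unicodedata.normalize("NFC", text)


# Attacker sends "e" + U+0301 instead of the precomposed U+00E9.
evasive = ("order a caf" + "e\u0301 now").encode("utf-8")
print(naive_match(evasive))        # False: the filter is bypassed
print(normalizing_match(evasive))  # True
```

NFC alone does not defeat every trick (compatibility characters, case
differences, confusable look-alikes), so a real filter may also need
NFKC or case folding.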

> Appendix A [...] whois [RFC0954]

If the current version is meant, it should be RFC3912. If the
original one is meant, which would make sense in a historical section,
it should be RFC0812.