Re: draft-klensin-net-utf8

John C Klensin <> Fri, 05 October 2007 16:58 UTC

Date: Fri, 05 Oct 2007 12:57:57 -0400
From: John C Klensin <>
To: "Clive D.W. Feather" <>
Subject: Re: draft-klensin-net-utf8
Message-ID: <99A43B12619CB831A2FDE27E@p3.JCK.COM>
In-Reply-To: <>
References: <3A8797AD0BB8B1EF4FAA7DE8@p3.JCK.COM> <>

--On Friday, 05 October, 2007 17:20 +0100 "Clive D.W. Feather"
wrote:

>> draft-klensin-net-utf8
> Section 2.1 item 3: "control characters" needs to be properly
> defined, since it's a SHOULD. is currently
> offline, but at the least you either need to make it explicit
> ranges (say U+0000 to U+001F and U+007F to U+009F), or base it
> on a Unicode character class.

Once upon a time, the criterion for the level of referencing and
specificity in an RFC was "everyone, or at least everyone
competent to use the spec, knows what one is talking about".
Being pedantic for its own sake or on principle was not
considered useful or desirable.

If anyone is writing code that uses Unicode and doesn't know
what a control character is, they are in much bigger trouble
than this document can fix.  And "C0" and "C1" are defined in
Unicode as the names of parts of blocks.
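
For readers who want the two proposed definitions side by side,
here is a minimal sketch in Python.  It is an illustration only,
not text from the draft: the function names and the
CONTROL_RANGES constant are hypothetical, and it assumes that
"a Unicode character class" means General Category Cc, as
reported by the standard unicodedata module.

```python
import unicodedata

# Explicit ranges from the quoted comment (assumed reading):
# C0 controls U+0000..U+001F, then DELETE and the C1 controls
# U+007F..U+009F.
CONTROL_RANGES = ((0x0000, 0x001F), (0x007F, 0x009F))


def is_control_by_range(code_point: int) -> bool:
    """True if the code point falls in one of the explicit ranges."""
    return any(low <= code_point <= high for low, high in CONTROL_RANGES)


def is_control_by_category(code_point: int) -> bool:
    """True if the code point has General Category Cc (Control)."""
    return unicodedata.category(chr(code_point)) == "Cc"


if __name__ == "__main__":
    # Report any code point on which the two definitions disagree.
    mismatches = [
        cp for cp in range(0x110000)
        if is_control_by_range(cp) != is_control_by_category(cp)
    ]
    print("disagreements:", len(mismatches))
```

Under those assumptions the two tests select the same set of
code points, so either form of definition should be workable.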

I'll make this change, but would encourage everyone to think
carefully about whether we really want to go in the direction I
think it represents.