Re: Form feed in Net-UTF8? (Was: FWD: Re: Comments on Unicode Format for Network Interchange

John C Klensin <> Mon, 08 October 2007 17:03 UTC

Date: Mon, 08 Oct 2007 13:03:45 -0400
From: John C Klensin <>
To: Bill McQuillan <>
Subject: Re: Form feed in Net-UTF8? (Was: FWD: Re: Comments on Unicode Format for Network Interchange
Cc: Apps-Discusssion <>

--On Monday, October 08, 2007 9:39 AM -0700 Bill McQuillan wrote:

> On Sun, 2007-10-07, Dave Crocker wrote:
>> John C Klensin wrote:
>>> While I could live with the text you propose above, it seems
>>> to me to lead onto a slippery slope.  Try substituting, for
>>> "page breaks" above, terms like "alert sounds", "visual
>>> emphasis such as highlighting or colored characters", or
>>> "flashing lines". FormFeeds are much more frequent in text,
>>> and do appear in RFCs, but the principles are, I think, the
>>> same.
>> Seems like the discussion is really distinguishing among
>> different classes of use, where each might be labeled
>> distinctly and have variants of permitted characters.
> It seems to me that this draft is trying to cover two separate
> areas of concern:
> 1 - How to express information (UTF-8, NFC, CRLF line-endings,
> no BOM).
> 2 - What information to express.
> Is it necessary to tell prospective users of Network Text
> which characters are allowed for their application? Wouldn't
> it be simpler to merely specify the encoding to be used and
> then mention that use of things like control characters should
> be specified within the protocol definition?
> Since the use of "text" occurs in many varied contexts, not
> all of which are merely for display, I think that alerting
> users to the problems of control characters and making it
> their responsibility to address them is sufficient.

Bill, Dave,

To me, the purpose of trying to develop an IETF specification is 
to promote interoperability.  If you send me encoded characters 
that I don't know how to interpret, or don't reliably know how 
you intend that I interpret, we have an interoperability 
problem.  That problem may be more or less severe depending on 
the range of possible interpretations and their consequences. 
The reason this document was developed was to reduce those 
interoperability problems and also to reduce the number of 
"N-squared" problems in which my receiving system is expected to 
understand all of the possible formats and interpretations that 
other systems can send out, even if they are explicitly 

I hope it is obvious that, if one identifies each body of text, 
in-band, with a description of what characters are used and how 
they are to be interpreted, then the net-utf8 format is not 
needed.  Everything that is needed is in the UTF8 and Unicode 
specs themselves.  I assume it is equally obvious that it makes 
little difference --at least to the utility of this spec-- 
whether such a description is explicit and local or provided in 
the form of some label that points to an external definition of 
a class of characters.
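
To make the "how to express information" items from the quoted text 
concrete (UTF-8, NFC, CRLF line-endings, no BOM), here is a minimal 
sketch in Python of a receiver-side check.  It is an illustration 
only and not the rule set of the draft: the function name 
check_network_text is hypothetical, and the sketch merely reports 
control characters such as Form Feed rather than deciding whether 
they are permitted, since that is the open question in this thread.

```python
import unicodedata

BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def check_network_text(data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper for illustration. It covers only the four
    "how to express" items (UTF-8, NFC, CRLF, no BOM) and lists any other
    C0 control characters without judging them.
    """
    problems = []

    if data.startswith(BOM):
        problems.append("starts with a BOM")
        data = data[len(BOM):]

    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["not valid UTF-8"]

    if unicodedata.normalize("NFC", text) != text:
        problems.append("not in NFC")

    # After removing every CRLF pair, any remaining CR or LF is a bare one.
    remainder = text.replace("\r\n", "")
    if "\r" in remainder or "\n" in remainder:
        problems.append("bare CR or LF")

    # Report C0 controls other than CR and LF; policy is left to the caller.
    controls = sorted(
        {f"U+{ord(c):04X}" for c in text if ord(c) < 0x20 and c not in "\r\n"}
    )
    if controls:
        problems.append("control characters: " + ", ".join(controls))

    return problems
```

A caller could pass the raw bytes of a message body and treat a 
non-empty result as a reason to reject or flag the text.  A body 
containing a Form Feed, for instance, would pass the encoding checks 
but be listed under "control characters", leaving the decision to the 
protocol that uses the text.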

It seems to me that we have three plausible paths from here:

(1) Go ahead with net-Unicode (net-utf8), with the understanding 
that some protocols and contexts will find it useful and others 
will not.

(2) Try to insist that every protocol that transmits Unicode 
"plain-text" explicitly identifies the forms it is using (or the 
category to which it belongs, after defining those categories).

(3) Decide that interoperability of text interpretation is not 
an issue and just drop these ideas.