Re: I-D.klensin-unicode-escapes

John C Klensin <john-ietf@jck.com> Fri, 02 February 2007 23:02 UTC

Date: Fri, 02 Feb 2007 18:02:13 -0500
From: John C Klensin <john-ietf@jck.com>
To: Frank Ellermann <nobody@xyzzy.claranet.de>, discuss@apps.ietf.org
Subject: Re: I-D.klensin-unicode-escapes
Message-ID: <6459D49DFD9478F24C914416@p3.JCK.COM>
In-Reply-To: <45C3B9DF.6DA@xyzzy.claranet.de>
References: <875A124D75A8B481E176CF06@p3.JCK.COM> <20070202113853.GW7742@finch-staff-1.thus.net> <45C33D0C.7BF@xyzzy.claranet.de> <20070202185104.GH68544@finch-staff-1.thus.net> <45C3B9DF.6DA@xyzzy.claranet.de>

--On Friday, 02 February, 2007 23:23 +0100 Frank Ellermann
<nobody@xyzzy.claranet.de> wrote:

> Clive D.W. Feather wrote:
> 
>  [U+NN]
>> Um, the wording I've used is almost identical to that in the
>> charmod document (section 1.3).
> 
> The wording is okay, but IMO your reversed mnemonic
> U+[[N]N]NNNN is better than only U+NN.  With readers you never
> know, some like me never look into the prose if the ABNF is
> apparently clear, while others also including me look for the
> examples before ever reading a single word of the prose or
> ABNF, and if I understood John correctly his approach is more
> like the opposite, he looks into the ABNF if prose and examples
> are hopeless... ;-)  

Pretty close.  I'll look at formal and semi-formal definitions
first, but only if they cover semantics in addition to syntax.
If they don't, I tend to focus on semantics and worry about the
syntax details later.  Probably too many years spent worrying
about formal definitions of programming languages.

>>> a royal PITA in conjunction with <quoted-string>, when it
>>> results in multiple backslashes.
>  
>> Um, every scheme has that problem, surely? See "&amp;#x1234;".
> 
> Yes, but it doesn't have to fight with putting
> <quoted-string>s into MIME parameter values and similar
> horrors, compare the RFC 3696 errata.
> 
> Escaping backslashes is a pain, the USEFOR WG needed some
> months^Wtime to figure this out.  And for 2831bis it strikes
> again.  Probably it's a matter of taste, I recall times when I
> desperately tried \\ or \\\\ or worse with sh or csh scripts.

I have to confess that an early pre-posting draft of
unicode-escapes-00 had a section that was supposed to be titled
something like
   "Recommendation for \UNNNNNNNN".
Processing various combinations of slashes, escapes, and named
characters got two slashes in the output, occasionally three,
and occasionally the escapes themselves, but never one.  I
imagine, given enough time, that I could figure it out, but I'm
really sympathetic to your concern above.
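
As a rough illustration of how the backslashes multiply, here is a
minimal Python sketch.  It is not taken from the draft: the `quote`
helper is hypothetical and assumes the usual quoted-string rule of
backslash-escaping both `\` and `"` before adding the surrounding
quotes.

```python
def quote(s: str) -> str:
    """Hypothetical helper: backslash-escape '\\' and '"', then wrap in quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


BACKSLASH = "\\"

# The escape as plain text: one backslash followed by U and eight hex digits.
escape = r"\U0001D11E"

once = quote(escape)   # one level of quoted-string
twice = quote(once)    # that quoted-string placed inside another one

for label, text in (("raw", escape), ("once", once), ("twice", twice)):
    print(f"{label:5} {text}  backslashes={text.count(BACKSLASH)}")
```

Under those assumptions the single backslash becomes two after one
level of quoting, and the second level also has to escape the inner
quote marks, so the count grows faster than simple doubling.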

>> the security section needs to explicitly point at the security
>> section of 3629; it's not enough to say "people should know
>> it".
> 
> Yes.  Only copying the same old UTF-8 security considerations
> again and again is boring, a distraction from "real" (specific
> and fresh) issues.

See other note about the distinction between a spec about the
use of Unicode characters and one about escapes.

        john