Re: [Json] On characters and code points

I agree 100% with Stephen Dolan.  -T

On Fri, Jun 7, 2013 at 9:09 AM, Stephen Dolan <stephen.dolan@cl.cam.ac.uk> wrote:

> I think it is useful to distinguish three kinds of codepoint:
>  (1) Those which are valid characters in a particular Unicode revision
>  (2) Those which are unallocated codepoints which may become valid
> characters in a later Unicode revision
>  (3) The noncharacter codepoints which will never be valid
>
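To make the three cases concrete, here is a minimal Python sketch. The names `is_noncharacter` and `classify` are hypothetical. It treats the 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of every plane) as case (3), and uses the Unicode database bundled with Python's `unicodedata` module to split the rest into cases (1) and (2). "Unassigned" therefore means unassigned in `unicodedata.unidata_version`, not necessarily in the latest Unicode release. Surrogate code points fit none of the three cases, so the sketch reports them separately.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 permanent noncharacters: U+FDD0..U+FDEF, plus
    U+xxFFFE and U+xxFFFF in each of the 17 planes."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(cp: int) -> str:
    """Sort a code point into the cases discussed above (a sketch)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # Check noncharacters first: unicodedata also reports them as "Cn".
    if is_noncharacter(cp):
        return "noncharacter"   # case (3): stable, never assigned
    category = unicodedata.category(chr(cp))
    if category == "Cs":
        return "surrogate"      # outside the three cases
    if category == "Cn":
        return "unassigned"     # case (2), relative to this Python's Unicode data
    return "assigned"           # case (1)


print(unicodedata.unidata_version, classify(0xFFFE), classify(0x41))
```

Because the noncharacter set is fixed by the Unicode stability policy, `is_noncharacter` can be hard-coded, whereas the assigned/unassigned split changes whenever the bundled Unicode data is upgraded.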
> (3) includes such beasts as U+FFFE (which you can only get by reading
> a UTF-16 byte order mark with the wrong byte order). The set (1)
> increases with every Unicode revision to include characters from (2),
> but (3) is stable (see
> http://unicode.org/policies/stability_policy.html).
>
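For illustration, a short Python check of the byte-order-mark case, using only the standard `codecs` constants. The point is that U+FFFE shows up when the two bytes of a BOM are read in the wrong order.

```python
import codecs

# A UTF-16 byte order mark (U+FEFF) written big-endian is the bytes FE FF.
bom_be = codecs.BOM_UTF16_BE

# Read with the correct byte order, it is U+FEFF ...
print(hex(ord(bom_be.decode("utf-16-be"))))  # 0xfeff
# ... but read little-endian (the wrong order) it becomes U+FFFE.
print(hex(ord(bom_be.decode("utf-16-le"))))  # 0xfffe
```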
> I think JSON should allow characters from (1) and (2) to avoid being
> dependent on a specific Unicode revision. I do not think (3) should be
> allowed; this would cause problems with many existing parsers which
> represent JSON strings using another system's native Unicode
> representation.
>
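Python's standard `json` parser accepts noncharacters, so a parser following this rule would need an extra check. Below is a minimal sketch, assuming a post-parse walk is acceptable (a production parser would more likely reject them in its tokenizer). `check_no_noncharacters` and `loads_strict` are hypothetical names. Unassigned code points (case 2) deliberately pass, so the check does not depend on any particular Unicode revision.

```python
import json
from typing import Any


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def check_no_noncharacters(value: Any, path: str = "$") -> None:
    """Raise ValueError if any string (object key or value) in a parsed
    JSON document contains a noncharacter."""
    if isinstance(value, str):
        for ch in value:
            if is_noncharacter(ord(ch)):
                raise ValueError(f"noncharacter U+{ord(ch):04X} in string at {path}")
    elif isinstance(value, list):
        for i, item in enumerate(value):
            check_no_noncharacters(item, f"{path}[{i}]")
    elif isinstance(value, dict):
        for key, item in value.items():
            check_no_noncharacters(key, f"{path} (key)")
            check_no_noncharacters(item, f"{path}.{key}")


def loads_strict(text: str) -> Any:
    value = json.loads(text)  # the stdlib parser itself accepts noncharacters
    check_no_noncharacters(value)
    return value
```

With this sketch, `loads_strict('["\\u0378"]')` succeeds (U+0378 is unassigned, i.e. case 2), while `loads_strict('"\\uFFFE"')` raises `ValueError`.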
> The argument about test suites does not seem compelling, as any such
> test suite testing behaviour of string functions with bad Unicode would
> also include invalidly encoded Unicode (such as overlong UTF-8
> sequences) which cannot be represented at all in JSON, even with
> escaping.
>
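To illustrate the point about overlong sequences, here is a small Python example using only the standard library. Python's UTF-8 codec rejects an overlong sequence, and a JSON escape can only name a code point, never a particular byte sequence.

```python
import json

# 0xC0 0xAF is an overlong two-byte encoding of "/" (U+002F); the shortest
# form is the single byte 0x2F, and a strict UTF-8 decoder must reject it.
overlong = b"\xc0\xaf"
try:
    overlong.decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)

# Whether written as "/", "\/" or "\u002f", a JSON string holds the code
# point U+002F, which always encodes to the single byte 0x2F.
s = json.loads('"\\u002f"')
print(s.encode("utf-8"))  # b'/'
```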
> Stephen
>
> On Fri, Jun 7, 2013 at 4:56 PM, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
> > <no hat>
> >
> > This may be a part of the spec where some people have to hold their
> > noses. The Unicode definition of "character" does not include
> > non-characters, and the code points for some of those non-characters
> > make sense in JSON strings. Bjoern has pointed out a good one: strings
> > used for test cases of other code. The issue is not just unpaired
> > surrogates. Do we *really* want to prohibit:
> >    { "End of data marker": "\uFFFF" }
> >
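For comparison, this is how one existing parser, Python's standard `json` module, handles that example. It shows one implementation's behavior, not what the spec requires. The unpaired-surrogate case is included as a contrast: it also parses, but unlike U+FFFF it has no UTF-8 encoding.

```python
import json

doc = '{ "End of data marker": "\\uFFFF" }'
value = json.loads(doc)            # accepted: no noncharacter check
marker = value["End of data marker"]
print(hex(ord(marker)))            # 0xffff

# U+FFFF is not a surrogate, so it has a well-formed UTF-8 encoding
# and round-trips through serialization.
print(marker.encode("utf-8"))      # b'\xef\xbf\xbf'
print(json.dumps(value))           # {"End of data marker": "\uffff"}

# An unpaired surrogate escape also parses, but cannot be encoded as UTF-8.
lone = json.loads('"\\uD800"')
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    print("unpaired surrogate: no UTF-8 encoding")
```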
> > Proposal:
> >
> > Remove the word "character" from the spec except in an explanatory
> > paragraph in Section 2.5 that says:
> >    All code points, even those that represent non-characters in the
> >    Unicode specification [UNICODE], are allowed in JSON strings.
> >
> > --Paul Hoffman
> > _______________________________________________
> > json mailing list
> > json@ietf.org
> > https://www.ietf.org/mailman/listinfo/json
>