Re: [Json] On characters and code points

I think it is useful to distinguish three cases of codepoint:
 (1) Those which are valid characters in a particular Unicode revision
 (2) Those which are unallocated codepoints which may become valid
characters in a later Unicode revision
 (3) The noncharacter codepoints which will never be valid

(3) includes such beasts as U+FFFE (which you normally only get by
reading a UTF-16 byte order mark with the wrong byte order). The set (1)
increases with every Unicode revision to include characters from (2),
but (3) is stable (see
http://unicode.org/policies/stability_policy.html).
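
To make the three cases concrete, here is a rough Python sketch. The
function names are mine, not from any spec. It relies on the standard
unicodedata module, so the split between (1) and (2) depends on whichever
Unicode revision your Python build ships with, while (3) is fixed.
Surrogates are reported separately, since they do not fit any of the
three cases above.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    # Case (3): U+FDD0..U+FDEF plus the last two code points of every plane
    # (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(cp: int) -> str:
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if is_noncharacter(cp):
        return "noncharacter"   # case (3), stable across revisions
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"      # outside the three cases discussed here
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned"     # case (2), may be assigned in a later revision
    return "assigned"           # case (1), in this build's Unicode revision
```

Counting the code points for which is_noncharacter() is true gives the
66 noncharacters covered by the stability policy.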

I think JSON should allow characters from (1) and (2) to avoid being
dependent on a specific Unicode revision. I do not think (3) should be
allowed - this would cause problems with many existing parsers which
represent JSON strings using another system's native Unicode
representation.
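
As a sketch of what that policy could look like on top of an existing
parser (here Python's json module; the helper names are hypothetical and
this is only one way to do it): unassigned code points pass through
untouched, noncharacters are rejected after parsing. Surrogates are not
handled here.

```python
import json


def has_noncharacter(s: str) -> bool:
    # Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane.
    return any(
        0xFDD0 <= ord(c) <= 0xFDEF or (ord(c) & 0xFFFE) == 0xFFFE for c in s
    )


def check(value):
    # Walk the parsed value and reject any string (key or value)
    # that contains a noncharacter code point.
    if isinstance(value, str):
        if has_noncharacter(value):
            raise ValueError("noncharacter code point in JSON string")
    elif isinstance(value, list):
        for item in value:
            check(item)
    elif isinstance(value, dict):
        for key, item in value.items():
            check(key)
            check(item)
    return value


def loads_no_noncharacters(text: str):
    return check(json.loads(text))
```

With this, a string containing "\u0378" (unassigned at the time of
writing) is accepted regardless of Unicode revision, while
{ "End of data marker": "\uFFFF" } raises ValueError.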

The argument about test suites does not seem compelling, as any such
test suite testing behaviour of string functions with bad Unicode would
also include invalidly-encoded Unicode (such as overlong UTF-8
sequences) which cannot be represented at all in JSON, even with
escaping.
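
A small illustration of that last point, using Python's json module and
its strict UTF-8 decoder as stand-ins (the helper name is mine). A \u
escape names a code point, not bytes, so it can produce U+FFFF but can
never produce the overlong byte sequence C0 AF for "/".

```python
import json


def decodes_as_utf8(data: bytes) -> bool:
    # True if the bytes are well-formed UTF-8 under Python's strict decoder.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# An escape can name a noncharacter code point...
end_marker = json.loads('"\\uFFFF"')

# ...but escaping "/" always yields the code point U+002F, whose UTF-8
# encoding is the single byte 2F, never the overlong form C0 AF.
slash_bytes = json.loads('"\\u002F"').encode("utf-8")

# The overlong form itself is simply not valid UTF-8.
overlong_slash = b"\xc0\xaf"
overlong_is_valid = decodes_as_utf8(overlong_slash)
```

So a test suite that needs overlong sequences has to carry them some
other way (hex or base64, say), whatever JSON decides about
noncharacters.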

Stephen

On Fri, Jun 7, 2013 at 4:56 PM, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
> <no hat>
>
> This may be a part of the spec where some people have to hold their noses. The Unicode definition of "character" does not include non-characters, and the code points for some of those non-characters make sense in JSON strings when those strings. Bjoern has pointed out a good one: strings used for test cases of other code. The issue is not just unpaired surrogates. Do we *really* want to prohibit:
>    { "End of data marker": "\uFFFF" }
>
> Proposal:
>
> Remove the word "character" from the spec except in an explanatory paragraph in Section 2.5 that says:
>    All code points, even those that represent non-characters in the Unicode specification [UNICODE], are allowed in JSON strings.
>
> --Paul Hoffman