Re: [Ltru] [apps-discuss] Fwd: Defining a CBOR tag for RFC 5646 Language Tags

John Cowan <> Thu, 15 May 2014 12:26 UTC

Date: Thu, 15 May 2014 08:06:50 -0400
From: John Cowan <>
To: Dave Cridland <>
Cc: LTRU Working Group <>, Doug Ewell <>

Dave Cridland scripsit:

> Of course, an invalid-UTF-8 based proposal simply means that it's no
> longer UTF-8 per se, and so needs itself to be tagged differently.

The whole point of the invalid-UTF-8 was that the less fussy decoders of
the day would quietly drop the hidden information and display the string.
That is much less likely to happen now.
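
To make the contrast concrete, here is a minimal Python sketch of a lenient
decoder next to a strict one.  The payload is an assumption for illustration
only: the byte 0xFF (which can never occur in well-formed UTF-8) stands in
for the hidden information and is not the encoding that was actually
proposed.  The function names are hypothetical.

```python
# Hypothetical payload: visible text plus one byte that is invalid in UTF-8.
# 0xFF is only a stand-in for "hidden information", not the real proposal.
DATA = b"hello\xff world"


def decode_lenient(raw: bytes) -> str:
    """Mimic a less fussy decoder: silently drop bytes that are not UTF-8."""
    return raw.decode("utf-8", errors="ignore")


def decode_strict(raw: bytes):
    """Mimic a strict decoder: reject the whole string on any invalid byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(repr(decode_lenient(DATA)))
    print(repr(decode_strict(DATA)))
```

With the lenient handler the string still displays, minus the extra byte;
with the strict default the whole string is rejected, which is the behaviour
that makes the trick unreliable today.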

> Exactly the same caveats apply to Plane 14 tagging, mind, 

Not so much.  Unicode renderers that don't understand language tags will
not display them (because stuff on plane E is never displayed per its
Unicode properties), or if the renderer doesn't even understand that,
will at worst generate boxes or other "character unknown" glyphs,
often only three of them (the tag introducer plus a two-letter code).
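
As a rough sketch of what Plane 14 tagging looks like in practice, the
Python below prefixes a string with U+E0001 LANGUAGE TAG followed by the tag
characters U+E0020..U+E007E, which mirror printable ASCII.  The helper names
are hypothetical, and the sketch assumes the tag sits at the start of the
string.  A two-letter tag such as "en" comes out as three code points, which
is presumably where the "three boxes" would come from.

```python
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
TAG_BASE = 0xE0000            # tag character = TAG_BASE + ASCII code point


def tag_prefix(lang: str) -> str:
    """Build the Plane 14 prefix for a language tag such as 'en' or 'fr-CA'."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language tag must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def strip_tags(text: str) -> str:
    """What a tag-unaware but well-behaved consumer does: ignore the block."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


def read_language(text: str):
    """Recover a leading language tag, or None if the string has none."""
    if not text.startswith(LANGUAGE_TAG):
        return None
    chars = []
    for c in text[1:]:
        if not 0xE0020 <= ord(c) <= 0xE007E:
            break
        chars.append(chr(ord(c) - TAG_BASE))
    return "".join(chars)


if __name__ == "__main__":
    tagged = tag_prefix("en") + "hello"
    print(len(tag_prefix("en")), read_language(tagged), strip_tags(tagged))
```

Unlike the invalid-UTF-8 approach, the tagged string is still well-formed
UTF-8, so a strict decoder accepts it and the tag survives a round trip.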

> The main consideration, I think, is what happens when a CBOR processor
> encounters a language-tagged string when it doesn't understand the concept.

Agreed, and if that is a serious problem, Plane E starts to look better.

John Cowan
[P]olice in many lands are now complaining that local arrestees are insisting
on having their Miranda rights read to them, just like perps in American TV
cop shows.  When it's explained to them that they are in a different country,
where those rights do not exist, they become outraged.  --Neal Stephenson