Re: [Ltru] [apps-discuss] Fwd: Defining a CBOR tag for RFC 5646 Language Tags

John Cowan <> Wed, 14 May 2014 17:38 UTC

Date: Wed, 14 May 2014 13:38:25 -0400
From: John Cowan <>
To: Dave Cridland <>
Cc: Randy Presuhn <>, LTRU Working Group <>, "" <>
Subject: Re: [Ltru] [apps-discuss] Fwd: Defining a CBOR tag for RFC 5646 Language Tags

Dave Cridland scripsit:

> Many years ago, Mark Crispin and Chris Newman had a proposal for embedding
> language tags in invalid UTF-8; I seem to recall they publicly renounced
> their proposal rather dramatically in favour of a Unicode Consortium
> proposal for embedding the language tags somewhere in Plane 14 - published
> as RFC 2482.

That's correct.  There is a character meaning "Language tag follows"
(U+E0001) and then there are tag versions of the 95 printable ASCII
characters, space included (U+E0020 through U+E007E), though in fact
only letters, digits, and hyphen would be useful.  U+E007F means
"No language tag".
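
To make the mapping concrete, here is a rough Python sketch of the scheme
as described above: each ASCII character of the language tag is shifted
up by 0xE0000 and the result is prefixed with U+E0001.  The function and
constant names are made up for illustration and do not come from RFC 2482
or any library, and restricting input to letters, digits, and hyphen is
an assumption based on the remark that only those would be useful.

```python
# Hypothetical names; a sketch of the Plane 14 tag mapping, not a reference implementation.
LANGUAGE_TAG = 0xE0001  # "Language tag follows"
TAG_OFFSET = 0xE0000    # tag character = ASCII code point + 0xE0000
TAG_FIRST = 0xE0020     # tag version of SPACE
TAG_LAST = 0xE007E      # tag version of TILDE


def encode_language_tag(tag: str) -> str:
    """Return U+E0001 followed by the Plane 14 tag version of each character."""
    # Assumption: accept only ASCII letters, digits, and hyphen.
    if not tag or any(not (c.isascii() and (c.isalnum() or c == "-")) for c in tag):
        raise ValueError("expected ASCII letters, digits, and hyphens only")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in tag)


def decode_language_tag(text: str) -> str:
    """Read the language tag from text that starts with U+E0001."""
    if not text or ord(text[0]) != LANGUAGE_TAG:
        raise ValueError("text does not start with U+E0001")
    chars = []
    for ch in text[1:]:
        cp = ord(ch)
        if not (TAG_FIRST <= cp <= TAG_LAST):
            break  # first non-tag character ends the language tag
        chars.append(chr(cp - TAG_OFFSET))
    return "".join(chars)
```

For example, decode_language_tag(encode_language_tag("fr") + "bonjour")
gives back "fr", since the first ordinary character ends the tag.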

> The fact it was all initiated in order to support the pressing needs of
> ACAP might give you some hints as to why it never really took off, but as a
> counter-proposal to language tags in metadata, it might be worth
> re-examining.

It might, but don't get your hopes up.

John Cowan
SAXParserFactory [is] a hideous, evil monstrosity of a class that should
be hung, shot, beheaded, drawn and quartered, burned at the stake,
buried in unconsecrated ground, dug up, cremated, and the ashes tossed
in the Tiber while the complete cast of Wicked sings "Ding dong, the
witch is dead."  --Elliotte Rusty Harold on xml-dev