Re: [Ltru] [apps-discuss] Fwd: Defining a CBOR tag for RFC 5646 Language Tags

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Fri, 16 May 2014 09:18 UTC

Message-ID: <5375D7C5.3060609@it.aoyama.ac.jp>
Date: Fri, 16 May 2014 18:17:57 +0900
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: Doug Ewell <doug@ewellic.org>, Dave Cridland <dave@cridland.net>
References: <20140515083955.665a7a7059d7ee80bb4d670165c8327d.b69c089194.wbe@email03.secureserver.net>
In-Reply-To: <20140515083955.665a7a7059d7ee80bb4d670165c8327d.b69c089194.wbe@email03.secureserver.net>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: http://mailarchive.ietf.org/arch/msg/ltru/7wbjt4PftpkfgLPqvUZ6KAEQrog
Cc: LTRU Working Group <ltru@ietf.org>
Precedence: list

On 2014/05/16 00:39, Doug Ewell wrote:
> Dave Cridland <dave at cridland dot net> wrote:
>
>> Of course, an invalid-UTF-8 based proposal simply means that it's no
>> longer UTF-8 per-se, and so needs itself to be tagged differently.

But this is highly counter-productive. US computer and Internet
technology was so successful, among other things, because everything was
ASCII. We are finally getting close to a place where (almost) everything
is UTF-8. Some of us knew already in 1997 (or even earlier) that this was
the direction to go. UTF-8 "variants" would have killed a lot of the
advantages of moving towards UTF-8.

>> Other than that, I don't see it's a bad idea from a technical
>> standpoint. The use of the word "invalid" probably scares people, but
>> I note that's really a shorthand for "not backwards compatible by
>> existing UTF-8 processors".

It was not such a bad idea on the level of "let's use a screw to hold
these two pieces of metal together". It was a very bad idea because much
of the (prospect of) success of a technology is, or has to be, measured
not by how many good pieces of technology you have (how many different
sizes of screws), but by how *few* of them you have.

> The proposal from 1997 ("MLSF") did call it an extra layer on top of
> UTF-8, and included lots of health warnings that it was not really
> UTF-8.

Only after quite a bit of pressure on the authors (including from me). 
See http://tools.ietf.org/rfcdiff?url2=draft-ietf-acap-mlsf-01.txt.

> That didn't remove the danger, though, because it looked so much
> like UTF-8. John's response about decoders was spot-on.

It was an extremely ugly chameleon mixing character encoding with
higher-level information, messing around with the structural cleanness
and heuristic detectability of UTF-8, and inviting all kinds of other
crazy kludges for other "UTF-8-but-not-quite" chimeras.

>> Exactly the same caveats apply to Plane 14 tagging, mind, and
>> moreover, we could invent our own - indeed, that's what we're doing by
>> having these arrays of (tag, string) tuples.

Plane 14 language tags are strictly within UTF-8 and also of course work 
with UTF-16. They are therefore quite a bit less bad than MLSF, but 
still bad enough.
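
To make the "strictly within UTF-8" point concrete, here is a minimal Python sketch. It assumes the usual Plane 14 layout (U+E0001 LANGUAGE TAG followed by tag characters U+E0020..U+E007E that mirror ASCII); the helper names plane14_prefix and strip_plane14 are made up for illustration. It shows that such a string passes a strict UTF-8 and UTF-16 round trip, and also that the tag has to be stripped again before display or comparison.

```python
LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000    # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def plane14_prefix(lang: str) -> str:
    """Spell a printable-ASCII language tag with Plane 14 tag characters."""
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang.lower())


def strip_plane14(text: str) -> str:
    """Drop every character in the Tags block (U+E0000..U+E007F)."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


tagged = plane14_prefix("ja") + "日本語"

# Ordinary, strict codecs accept the tagged string unchanged.
utf8 = tagged.encode("utf-8")
assert utf8.decode("utf-8") == tagged
assert tagged.encode("utf-16-le").decode("utf-16-le") == tagged

# The cost: the tag is in-band and must be removed before use.
assert strip_plane14(tagged) == "日本語"
```

Each tag character takes four bytes in UTF-8, so even a two-letter tag adds twelve bytes of in-band data that every consumer has to know how to skip.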

> As Mark knows, I never bought into the deprecation argument about how
> evil Plane 14 tag characters are. Handling them correctly just isn't
> that difficult.

There are hundreds of ideas that look "not that difficult" to implement. 
But usually, everything turns out to be more difficult than estimated, 
and what's more important, the combinations of the different ideas turn 
out to be the killer.

In some ways, plane 14 language tags were born dead. Putting them in 
plane 14 was an explicit decision that sent a clear message that there 
was no expectation that they would or should be used frequently.

> For CBOR, you may be better off with the tag/string
> tuples; the tags in that case are much easier to see and don't need to
> be stripped from the string for display or comparison. But if this
> "tagged text" model is too far out of step with the CBOR/JSON way of
> thinking, Plane 14 is out there.

As far as I understand, the tagged text model should work well, about as 
well as lang/xml:lang attributes for HTML and XML.
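
As a rough illustration of that model, the sketch below keeps the language tag out of band, as a list of (tag, string) pairs in plain Python. It is not a CBOR encoding and assumes nothing about the tag number under discussion; the names TaggedText, plain_text and runs_in are hypothetical.

```python
from typing import List, Tuple

# A run of text is a (language tag, string) pair; a text is a list of runs.
TaggedText = List[Tuple[str, str]]


def plain_text(runs: TaggedText) -> str:
    """Concatenate the strings; nothing has to be stripped first."""
    return "".join(text for _, text in runs)


def runs_in(runs: TaggedText, lang: str) -> List[str]:
    """Return the strings whose tag matches lang, ignoring case."""
    return [text for tag, text in runs if tag.lower() == lang.lower()]


greeting: TaggedText = [("en", "Hello, "), ("ja", "世界")]

assert plain_text(greeting) == "Hello, 世界"
assert runs_in(greeting, "JA") == ["世界"]
```

The strings stay ordinary UTF-8 text, and the language information sits next to them, much as a lang attribute sits on an element rather than inside its character data.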

Regards,   Martin.
