Re: [Ltru] Macrolanguage usage

On Fri, May 16, 2008 at 1:35 PM, Shawn Steele <Shawn.Steele@microsoft.com>
wrote:

> > That lets people like Google, which has a very serious and important issue
> > with backwards compatibility, support the vast amount of Mandarin/Arabic/...
> > that is tagged with "zh", and continue to tag Mandarin content with "zh". It
> > also lets people like Shawn's group at Microsoft, having no backwards
> > compatibility issues with "cmn" and "zh", shift over immediately.
> This seems to be oversimplifying the issue.  I presume that Google would
> also be interested in recognizing cmn tagged data and cmn requests when they
> see them?
>

Yes, clearly. Although even that takes time and effort to do; there is no
magic wand. (As I've said, the addition of non-predominant encompassed
language subtags like "yue" is important and useful for us. The addition of
the predominant encompassed language subtags just means a lot of work for no
additional benefit to us or our users. Following the de/gsw paradigm would
have been *much* simpler. But that is water under the bridge; we just have
to deal with the situation as it is, and try to give people guidance as to
the best way to handle them.)
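For illustration, here is a minimal sketch of what "recognizing cmn tagged data and cmn requests" could look like on the input side. It assumes a hypothetical service that keys its internal data on the macrolanguage form, so an incoming predominant encompassed subtag such as "cmn" is folded to "zh", while a non-predominant one such as "yue" is kept distinct. The `PREDOMINANT` table and `normalize_incoming` are illustrative assumptions, not a copy of the IANA registry or of anyone's actual implementation.

```python
# Hypothetical, partial table: predominant encompassed language -> macrolanguage.
# An illustrative assumption, not an export of the IANA subtag registry.
PREDOMINANT = {"cmn": "zh", "arb": "ar"}


def normalize_incoming(tag: str) -> str:
    """Fold a predominant encompassed language subtag to its macrolanguage.

    Only the primary language subtag is examined (and lowercased); script,
    region, and other subtags are preserved. Non-predominant subtags such
    as "yue" are left alone so they stay distinguishable from Mandarin.
    """
    subtags = tag.split("-")
    primary = subtags[0].lower()
    subtags[0] = PREDOMINANT.get(primary, primary)
    return "-".join(subtags)


print(normalize_incoming("cmn-Hans-CN"))  # zh-Hans-CN
print(normalize_incoming("yue-HK"))       # yue-HK
print(normalize_incoming("zh-TW"))        # zh-TW
```

Even this small step has to be applied everywhere tags enter the system, which is part of the "time and effort" mentioned above.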

> If it's "just" some internal database of languages and you have to map
> between the actual request/content names anyway, then I don't know why this
> working group would care if Google called their internal data "zh" or "abc"
> or whatever.
>

The working group might not, but we have a goal of using BCP 47 not only
externally, but in communication *within* Google among the many different
products and programs. I assume that's not a bad thing ;-)

> If the intent is interchange, then continuing to use "zh" when what is
> really meant is a large subset of zh (Mandarin) seems to perpetuate the
> existing ambiguity.
>

The issue is on output. If a program switches to "cmn" from "zh" on output,
then an external party who doesn't recognize "cmn" breaks. So that program
needs to output "zh" until it is certain that the recipients would all
recognize "cmn".
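The output rule above can be sketched as follows, assuming a hypothetical service that stores Mandarin content under "zh" and has some way (configuration or a handshake) of knowing which primary language subtags a given recipient recognizes. The `ENCOMPASSED` table and `choose_output_tag` are illustrative names, not an existing API.

```python
from __future__ import annotations

# Hypothetical reverse mapping: macrolanguage -> predominant encompassed language.
# Illustrative and partial; an assumption, not registry data.
ENCOMPASSED = {"zh": "cmn", "ar": "arb"}


def choose_output_tag(internal_tag: str, recipient_understands: set[str]) -> str:
    """Pick the tag to send for content stored under a macrolanguage tag.

    The more specific subtag (e.g. "cmn") is emitted only when the recipient
    is known to recognize it; otherwise the long-established macrolanguage
    subtag (e.g. "zh") is kept so that older recipients do not break.
    `recipient_understands` is assumed to hold lowercase primary subtags.
    """
    subtags = internal_tag.split("-")
    primary = subtags[0].lower()
    specific = ENCOMPASSED.get(primary)
    if specific is not None and specific in recipient_understands:
        subtags[0] = specific
    return "-".join(subtags)


legacy_peer = {"zh", "en"}
updated_peer = {"zh", "cmn", "yue", "en"}
print(choose_output_tag("zh-Hans", legacy_peer))   # zh-Hans
print(choose_output_tag("zh-Hans", updated_peer))  # cmn-Hans
```

Defaulting to "zh" when nothing is known about the recipient is the conservative choice for the long transition period, at the cost of continuing to send the less specific tag.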

We could remain silent on this issue in the spec, but that would just be
withholding useful advice for people in terms of "tagging wisely", advice
that would allow people to interoperate more effectively.

> Microsoft is also concerned with the backwards compatibility issues,
> however we recognize that existing "zh" tagged data is not necessarily
> Mandarin (even though it's likely).  We don't have language detection tools
> to try to guess what the application's resources or a web page actually has
> for a language.  For back-compat, you'll recall that I thought zh-cmn
> helped, which would seem to solve both problems.  (Google'd have the zh it
> wants and I'd have the specificity that I'm looking for.)
>

No, it doesn't solve the problem at all. There are many, many circumstances
where you want precise communication, not lookup and fallback. A recipient
that expects "zh" (and doesn't know about the new codes) will break when it
gets "zh-cmn" just as it will break when it gets "cmn": extlang or not makes
no difference. (Extlang really only has an effect with lookup; you think the
effect is positive, I think it is negative, but let's not discuss that here --
we've all agreed to look at that issue *after* this round.)
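To make the distinction concrete, the sketch below compares a recipient that matches tags exactly with one that uses a simplified form of RFC 4647 section 3.4 Lookup (progressive truncation, no wildcards or default value). The recipient's supported list is a hypothetical example. With exact matching, "zh-cmn" and "cmn" fail alike; only Lookup truncates "zh-cmn" back to "zh". The sketch takes no position on whether that fallback is desirable.

```python
from __future__ import annotations


def exact_match(tag: str, supported: set[str]) -> str | None:
    """Accept a tag only if it is literally in the supported set."""
    tag = tag.lower()
    return tag if tag in supported else None


def lookup(tag: str, supported: set[str]) -> str | None:
    """Simplified RFC 4647 Lookup: truncate subtags from the end until a match."""
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in supported:
            return candidate
        subtags.pop()
        # RFC 4647 also drops a single-character subtag (e.g. "x")
        # left at the end after truncation.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


supported = {"zh", "en"}  # hypothetical recipient that only knows the old codes
for received in ("zh", "zh-cmn", "cmn"):
    print(received, exact_match(received, supported), lookup(received, supported))
# zh zh zh
# zh-cmn None zh
# cmn None None
```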

>
> Presumably (unless it's only for internal use), Microsoft and Google can't
> have different definitions of "zh" for interchange.  If the definitions
> differ, then indexing of IIS served pages or requests from IE browsers would
> not necessarily provide the expected results.
>

Once both parties can handle both, it is not a problem. And where there is a
handshaking protocol it is not a problem. But there will be a *long*
transition period where only "zh" can be depended on to work.

> What happens for other language tags that change?  Take Serbian (Serbian
> remains the same, but data tagged at the region level will change.)  Needing
> to support zh + cmn isn't that different from other common scenarios.
>

Right, I've been saying all along that this is the same issue with any
predominant encompassed language.

>
> - Shawn
>
>

-- 
Mark