Re: [Ltru] Minor proofreading nits again

Mark Davis ☕ <> Mon, 18 July 2011 23:36 UTC

To: "Jukka K. Korpela" <>
List-Id: Language Tag Registry Update working group discussion list <>

*— Il meglio è l’inimico del bene —*

On Mon, Jul 18, 2011 at 02:39, Jukka K. Korpela <> wrote:

> 18.07.2011 11:57, "Martin J. Dürst" wrote:
>> There are certainly cases where there's more than the source and target
>> language and script involved. But on the other hand, there are also
>> cases where there's not really a target language.
> Yes; I was writing about what translation _may_ depend on. Now that I read
> the sentence “That is, for fully specifying such content, it is important to
> specify the source language and/or script,” I realize that it doesn’t say
> “may.” In fact, it’s somewhat odd—as the source language of transliterated
> or otherwise transformed text is supposed to be indicated using existing
> methods for identifying a language. When you use, say, the tag ru-Latn, you
> are saying that the text is in Russian, and there is no need for
> additionally specifying “source language.”
> I’d suggest that the sentence and the sentence after it in the Introduction
> be changed thusly:
> “In order to fully specify such content, the transformation needs to be
> specified in addition to the language. This may require the identification
> of the source script, the target script, and the specific transformation.”

I changed the working copy to the following. I reworded it a bit, because the
BCP 47 tags already supply the target language:

   In order to fully specify such content, the transformation needs to be
   specified in addition to the language.

   This may require the identification of the source script or source
   language, in addition to the main subtags in the language tag.

   It may also require the identification of the specific conventions used
   by transformation, such as the rules used by a UNGEGN transliteration.

How does that look?
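
To make the split between "the language" and "the transformation" concrete, here is a minimal Python sketch that separates a tag's main subtags from the subtags carried by a -t- extension. The function name and the sample tag are illustrative assumptions, not part of the working copy. The parsing is also deliberately simplified: it assumes a well-formed tag with no private-use section.

```python
def split_t_extension(tag):
    """Return (main_subtags, t_extension_subtags) for a language tag.

    Simplified sketch: assumes a well-formed tag without a private-use
    section, and treats the next single-character subtag (another
    extension singleton) as the end of the -t- extension.
    """
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return subtags, []
    start = subtags.index("t")
    end = len(subtags)
    for i in range(start + 1, len(subtags)):
        if len(subtags[i]) == 1:  # next extension singleton
            end = i
            break
    # Main subtags plus any other extensions; the -t- payload is separate.
    return subtags[:start] + subtags[end:], subtags[start + 1:end]


# Hypothetical tag: Cyrillic content transformed from Latin using
# UNGEGN rules. The exact field layout is an assumption for illustration.
main, transform = split_t_extension("und-Cyrl-t-und-latn-m0-ungegn-2007")
print(main)       # subtags describing the content itself
print(transform)  # subtags describing the source and the conventions
```

The first list corresponds to "the main subtags in the language tag", and the second holds the source language or script plus the specific conventions.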

>> An example would be what can currently be denoted by ja-Latn-hepburn. My
>> understanding is that such cases are also supposed to be covered by -t.
>> How would such cases look? How much more time and effort (than for a
>> variant subtag) would be required for registration?
> (I assume that you mean “jp,” not “ja.”)
> As far as I can see, jp-Latin-hepburn as such is unambiguous, because the
> Hepburn system does not depend on “target” language (or language context, as
> I would say).

Agreed. For those mechanisms that are only used with a specific source
script, the -t- extension is not needed.

Note: The correct code would be "ja-Latn-hepburn", but that doesn't affect
your main point.

Type: variant
Subtag: hepburn
Description: Hepburn romanization
Added: 2009-10-01
Prefix: ja-Latn
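
As a rough illustration of why "ja-Latn-hepburn" fits this record and "jp-Latin-hepburn" does not, the following Python sketch parses the record above and checks a tag against its Prefix field. The helper names are hypothetical, and the parser is simplified: it ignores folded continuation lines, which real registry records can contain, and it does not validate subtags against the registry.

```python
RECORD = """\
Type: variant
Subtag: hepburn
Description: Hepburn romanization
Added: 2009-10-01
Prefix: ja-Latn
"""


def parse_record(text):
    """Parse 'Name: value' lines into a dict of lists (fields may repeat)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        fields.setdefault(name.strip(), []).append(value.strip())
    return fields


def prefix_matches(tag, record):
    """True if the subtags before the variant start with a listed Prefix."""
    subtags = tag.lower().split("-")
    variant = record["Subtag"][0].lower()
    if variant not in subtags:
        return False
    before = subtags[:subtags.index(variant)]
    for prefix in record.get("Prefix", []):
        wanted = prefix.lower().split("-")
        if before[:len(wanted)] == wanted:
            return True
    return False  # also returned when the record lists no Prefix


record = parse_record(RECORD)
print(prefix_matches("ja-Latn-hepburn", record))
print(prefix_matches("jp-Latin-hepburn", record))
```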

> But in different countries, some modifications may be in use, or may have
> been in use.
> This raises an issue that doesn’t really fall under “minor proofreading
> nits” (sorry!). What does a subtag like “hepburn” really mean? A very
> specific system, or system with known variants, or loosely a set of systems
> that share some common properties? I think we need to lean toward a
> loose meaning, one that can be further clarified using additional subtags.
> This would imply that you cannot be absolutely sure that a particular
> character in a text labelled as jp-Latin-hepburn can be unambiguously
> interpreted—you may need to look at possible additional subtags or to assume
> that some default variant of Hepburn is used.

Agreed. That is the whole design philosophy of BCP 47: additional subtags
can be used to get a higher degree of specificity where the more specific
information is known or needed. That is why we allow the mechanism to have
multiple subtags, so that a greater or lesser degree of specificity can be
used.
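
One way to picture "greater or lesser degree of specificity" is as a chain of progressively shorter tags. The Python sketch below builds that chain by dropping trailing subtags and picks the most specific tag a consumer knows about. The function names are hypothetical, and this is only a simplified take on truncation-style matching: it assumes a tag without extension or private-use singletons.

```python
def specificity_chain(tag):
    """List the tag and its truncations, from most to least specific."""
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def most_specific_known(tag, known):
    """Return the most specific truncation of `tag` found in `known`."""
    known_lower = {k.lower() for k in known}
    for candidate in specificity_chain(tag):
        if candidate.lower() in known_lower:
            return candidate
    return None


print(specificity_chain("ja-Latn-hepburn"))
print(most_specific_known("ja-Latn-hepburn", {"ja", "ja-Latn"}))
```

A consumer that only understands "ja-Latn" can still handle text tagged "ja-Latn-hepburn", while one that knows the variant can use the extra detail.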

> I’m not aware of specifically language-dependent variants of Hepburn, for
> example, but I know that in Finnish, a national variant (e.g., with “š”
> instead of “sh”) has been recommended and used, though nowadays the global
> variant is more common. When the differences matter and need to be
> indicated, a particular named variant is needed, rather than destination
> language specifier.