Re: [Ltru] Minor proofreading nits again

"Jukka K. Korpela" <jkorpela@cs.tut.fi> Mon, 18 July 2011 09:39 UTC

18.07.2011 11:57, "Martin J. Dürst" wrote:

> There are certainly cases where there's more than the source and target
> language and script involved. But on the other hand, there are also
> cases where there's not really a target language.

Yes; I was writing about what translation _may_ depend on. Now that I
read the sentence “That is, for fully specifying such content, it is
important to specify the source language and/or script,” I realize that
it doesn’t say “may.” In fact, the sentence is somewhat odd, because the
source language of transliterated or otherwise transformed text is
supposed to be indicated using the existing methods for identifying a
language. When you use, say, the tag ru-Latn, you are saying that the
text is in Russian, and there is no need to additionally specify a
“source language.”
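
To make the point concrete, here is a minimal sketch of how a tag such
as ru-Latn already carries the language in its first subtag. The
function name split_tag and the returned dictionary layout are
assumptions made for illustration; the patterns are a simplification of
the language tag syntax and ignore extended language subtags,
extensions, private use, and grandfathered tags.

```python
import re


def split_tag(tag):
    """Split a simple language tag into language, script, region, variants.

    Hypothetical helper: handles only the language[-script][-region][-variant...]
    shape and stops at the first subtag it does not recognize.
    """
    parts = tag.split("-")
    result = {
        "language": parts[0].lower(),  # e.g. "ru": the text is in Russian
        "script": None,
        "region": None,
        "variants": [],
    }
    rest = parts[1:]
    # A script subtag is four letters, conventionally title-cased ("Latn").
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        result["script"] = rest.pop(0).title()
    # A region subtag is two letters or three digits.
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        result["region"] = rest.pop(0).upper()
    # Variant subtags are 5-8 alphanumerics, or 4 characters starting with a digit.
    for subtag in rest:
        if re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", subtag):
            result["variants"].append(subtag.lower())
        else:
            break  # anything else (extensions, private use) is ignored here
    return result


info = split_tag("ru-Latn")
print(info["language"], info["script"])
```

Under these assumptions, the language subtag alone identifies the
language of the text, and the script subtag only says how it is written.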

I’d suggest that this sentence and the one after it in the
Introduction be changed as follows:

“In order to fully specify such content, the transformation needs to be 
specified in addition to the language. This may require the 
identification of the source script, the target script, and the specific 
transformation.”

> An example would be what can currently be denoted by ja-Latn-hepburn. My
> understanding is that such cases are also supposed to be covered by -t.
> How would such cases look? How much more time and effort (than for a
> variant subtag) would be required for registration?

As far as I can see, ja-Latn-hepburn as such is unambiguous, because
the Hepburn system does not depend on a “target” language (or language
context, as I would say). But in different countries, some modifications
may be in use, or may have been in use.

This raises an issue that doesn’t really fall under “minor proofreading
nits” (sorry!). What does a subtag like “hepburn” really mean? A very
specific system, a system with known variants, or loosely a set of
systems that share some common properties? I think we need to lean
toward a loose meaning, one that can be further clarified using
additional subtags. This would imply that you cannot be absolutely sure
that a particular character in a text labelled as ja-Latn-hepburn can
be unambiguously interpreted; you may need to look at possible
additional subtags or to assume that some default variant of Hepburn is
used.

I’m not aware of specifically language-dependent variants of Hepburn,
for example, but I know that in Finnish usage, a national variant (e.g.,
with “š” instead of “sh”) has been recommended and used, though nowadays
the global variant is more common. When the differences matter and need
to be indicated, a particular named variant is needed, rather than a
destination-language specifier.

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/