Re: [Ietf-languages] From Arabic to Latin

Hugh Paterson III <sil.linguist@gmail.com> Mon, 15 January 2024 21:21 UTC

From: Hugh Paterson III <sil.linguist@gmail.com>
Date: Mon, 15 Jan 2024 13:21:05 -0800
Message-ID: <CAE=3Ky_-QFoF7YMv-oujbNBNbe2Rh7nQx=1n3GQCJFBojbxtWw@mail.gmail.com>
To: Doug Ewell <doug@ewellic.org>
Cc: IETF Languages Discussion <ietf-languages@iana.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/AXxSpanHtyVmyCMcxWa-0DGVxgM>

Thank you. I see you mention the subtag registry for some of these
transformations/transliterations. I’ll have a closer look. It seems that
it is possible to define some directly as IANA subtags and others in the
CLDR registry list. I hadn’t noticed that before.

All the best,
-Hugh


On Mon, Jan 15, 2024 at 12:38 PM Doug Ewell <doug@ewellic.org> wrote:

> I was waiting for Mark or someone else more closely connected with CLDR to
> respond, but here goes. Corrections and amplifications are of course
> welcome.
>
> Hugh Paterson III wrote:
>
> > My understanding is that RFC6[4]97 does not distinguish between
> > "transliteration, transcription, and translation".
>
> It does not, at least in part because the boundaries between these
> concepts can be hard to define. I recall Addison Phillips using the term
> “transmogrification” to observe this.
>
> > However, if I wanted to specify the type of transliteration method
> > used I would need to use [ar-t-ar-Latn].
>
> The part after the -t- singleton is the language (etc.) being transformed
> FROM. So it would be "ar-Latn-t-ar".
>
> You could make it "ar-Latn-t-ar-Arab" instead, to call extra attention to
> the change in script from Arabic to Latin. Balancing this against the
> normal guidelines about honoring Suppress-Script, keeping tags short, etc.
> is left as an exercise.
>
> > But this tag seems to be missing something. How would I construct a
> > valid transliteration tag using the CLDR indication for the Buckwalter
> > method?
>
> That would be "ar-Latn-t-ar-m0-buckwalt". I trust you have read enough of
> RFC 6497 and the CLDR file "transform.xml" to be able to find the subtag
> 'buckwalt' and be able to construct this tag.
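
The tag layout described above (target language first, then the -t- singleton, then the source language, then an optional m0 mechanism field) can be sketched in Python. This is only an illustration: the function name build_transform_tag and its parameters are hypothetical and not part of any library, and nothing is validated against the IANA or CLDR registries.

```python
from typing import Optional


def build_transform_tag(target: str, source: str,
                        mechanism: Optional[str] = None) -> str:
    """Compose 'target-t-source[-m0-mechanism]'.

    target    -- tag of the content as it is now, e.g. "ar-Latn"
    source    -- tag of what it was transformed FROM, e.g. "ar"
    mechanism -- optional CLDR mechanism subtag, e.g. "buckwalt"

    Hypothetical helper: it only joins the pieces and does not check
    that any subtag is registered.
    """
    parts = [target, "t", source]
    if mechanism:
        # "m0" is the field key that introduces the mechanism subtag.
        parts += ["m0", mechanism]
    return "-".join(parts)


print(build_transform_tag("ar-Latn", "ar"))
print(build_transform_tag("ar-Latn", "ar", "buckwalt"))
```

The two calls reproduce the tags given above, "ar-Latn-t-ar" and "ar-Latn-t-ar-m0-buckwalt". Case is left as supplied, since language tags are compared case-insensitively.
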
>
> > Questions:
> > 1. How would I construct a valid transliteration tag from Arabic in
> > Arabic script to Arabic in Latin script using the CLDR indication for
> > the Buckwalter method?
>
> As above.
>
> > 2. When would it be appropriate to use [ar-t-ar-Latn]
>
> recte: "ar-Latn-t-ar"
>
> > instead of just [ar-Latn]? -- in my case I want to specify which
> > transliteration system was used.
>
> If all you need to say is “Arabic language, Latin script” then you are
> probably fine with just "ar-Latn" — the predominant use of the Arabic
> script for that language means that Latin-script content pretty much had to
> have been transliterated. The same would go for, say, "en-Arab".
>
> If you need to specify a transliteration method, and that method is not
> one of the few which has its own variant subtag (e.g. 'hepburn'), then you
> must use the full syntax with the -t- extension.
>
> > 3. What programmatic ways are there to determine if the -t- tag
>
> “subtag” or “singleton”
>
> > is used to mean translation instead of just transliteration? Does
> > just checking for the same language tag on both sides of the -t- work
> > for all use cases? (e.g., sameLanguage-t-sameLanguage = transliteration
> > : sameLanguage-t-differentLanguage = translation)
>
> You would have to devise your own approach. The one you suggested might
> work. Extension T is not designed to support these distinctions, RFC 6497
> does not discuss them, and frankly I doubt there are many processes which
> depend on them.
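
A minimal sketch of the "same language on both sides of -t-" check, in Python, assuming well-formed tags. The function names source_subtags and same_language_transform are hypothetical. The result is a home-grown heuristic, not something Extension T defines, and it compares primary language subtags literally (so "ar" and "ara" would not match).

```python
import re
from typing import List, Optional

# Field keys inside the -t- extension are a letter plus a digit, e.g. "m0".
_TFIELD_KEY = re.compile(r"^[a-z][0-9]$")


def source_subtags(tag: str) -> Optional[List[str]]:
    """Return the source-language subtags that follow -t-, lowercased.

    Returns None when the tag has no -t- extension, and an empty list
    when the extension carries only fields (no source language).
    """
    subtags = tag.lower().split("-")
    for i in range(1, len(subtags)):
        if subtags[i] == "x":
            # Private use: nothing after "x" is an extension.
            return None
        if subtags[i] == "t":
            source = []
            for s in subtags[i + 1:]:
                # Stop at the next singleton or at a field key.
                if len(s) == 1 or _TFIELD_KEY.match(s):
                    break
                source.append(s)
            return source
    return None


def same_language_transform(tag: str) -> Optional[bool]:
    """True if the primary language matches on both sides of -t-.

    None means the question cannot be answered from the tag alone.
    """
    source = source_subtags(tag)
    if not source:
        return None
    return source[0] == tag.lower().split("-")[0]
```

Under this heuristic, "ar-Latn-t-ar-m0-buckwalt" would be treated as a same-language transform (a transliteration or transcription candidate) and "en-t-ja" as a cross-language one, while a plain "ar-Latn" yields None.
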
>
> > 4. I am dealing with several different systems of transliteration
> > within bibliographic data (in library and archival contexts). For
> > example, Wikipedia outlines over 20 systems of transliterating Arabic
> > script to Latin script. Is there a way to designate the specific
> > transliteration method used? For example, I see the Buckwalter Arabic
> > transliteration system mentioned in CLDR [5], but I don't see the
> > Library of Congress transliteration table [6] or, for that matter, the
> > BGN/PCGN 1956 System [7].
>
> A quick scan of transform.xml will reveal 'alaloc' for the former (not to
> be confused with the 'alalc97' variant subtag) and 'bgn' for the latter.
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
>
>