[Ietf-languages] From Arabic to Latin

Hugh Paterson III <sil.linguist@gmail.com> Mon, 08 January 2024 04:39 UTC

Greetings,

I am continuing my work with Latin text tagging. We have some cases where
texts are translations from other languages.

My understanding is that RFC6497 [1] adds the -t- extension to BCP-47. If I read
RFC6497 correctly, then [la-t-ar] would mean a Latin text transformed from
Arabic. My understanding is that RFC6497 does not distinguish between
"transliteration, transcription, and translation". As I am dealing with
both ancient and modern texts, one possible language tag for the first known
Latin translation of the Qur'an [into Medieval Latin by Robert of Ketton
(c. 1110 – 1160 AD)], *Lex Mahumet pseudoprophete* [2], would be [la-t-ar]. (If
I'm not applying RFC6497 correctly, please comment!)

Now, to apply language tags in a modern context: the website [
https://quran411.com/side-by-side-view?sn=1] provides the Qur'an in
Arabic script on the right, an English translation in the middle, and a
romanized transliteration on the left side. My understanding is that
RFC6497 would not necessarily be needed and the language content could be
accurately indicated in the following way: [ar-Latn::en::ar]. However, if
I wanted to specify the type of transliteration method used, I would need
to use [ar-t-ar-Latn]. But this tag seems to be missing something: it does
not name the method. How would I construct a valid transliteration tag
using the CLDR indication for the Buckwalter method?
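
A minimal sketch of how such a tag might be assembled, assuming my reading of RFC6497 is right: the tag of the content as it appears comes first, the source it was transformed from follows -t-, and the method goes in an "m0" field. Under that reading, romanized Arabic derived from Arabic script would be [ar-Latn-t-ar-...] rather than [ar-t-ar-Latn]. The function name build_transform_tag is hypothetical, and the value "buckwalt" is an assumption that should be checked against the CLDR transform.xml file [5].

```python
import re

# RFC 6497 field values are made of subtags of 3 to 8 alphanumerics.
_FIELD_SUBTAG = re.compile(r"^[A-Za-z0-9]{3,8}$")


def build_transform_tag(target, source, mechanism=None):
    """Return '<target>-t-<source>[-m0-<mechanism>]' (hypothetical helper).

    target    -- BCP 47 tag of the content as it appears, e.g. 'ar-Latn'
    source    -- BCP 47 tag of what it was transformed from, e.g. 'ar-Arab'
    mechanism -- optional value for the 'm0' field, e.g. 'buckwalt'
                 (assumed value; verify against CLDR transform.xml)
    """
    # RFC 6497 gives the canonical form of the extension in lowercase.
    tag = f"{target}-t-{source.lower()}"
    if mechanism:
        parts = mechanism.lower().split("-")
        if not all(_FIELD_SUBTAG.match(p) for p in parts):
            raise ValueError(f"not a valid m0 value: {mechanism!r}")
        tag += "-m0-" + "-".join(parts)
    return tag


print(build_transform_tag("la", "ar"))
print(build_transform_tag("ar-Latn", "ar-Arab", "buckwalt"))
```

This only checks the shape of the "m0" value; it does not validate the language tags themselves or confirm that the mechanism is registered.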

Questions:
1. How would I construct a valid transliteration tag from Arabic in
Arabic script to Arabic in Latin script using the CLDR indication for the
Buckwalter method?
2. When would it be appropriate to use [ar-t-ar-Latn] instead of just
[ar-Latn]? In my case, I want to specify which transliteration system was
used.
3. What programmatic ways are there to determine if the -t- tag is used to
mean translation instead of just transliteration? Does just checking for
the same language subtag on both sides of the -t- work for all use cases?
(e.g., sameLanguage-t-sameLanguage = transliteration;
sameLanguage-t-differentLanguage = translation)
4. I am dealing with several different systems of transliteration within
bibliographic data (in library and archival contexts). For example,
Wikipedia outlines over 20 systems of transliterating Arabic script to
Latin script [4]. Is there a way to designate the specific transliteration
method used? For example, I see the Buckwalter Arabic transliteration system
mentioned in CLDR [5], but I don't see the Library of Congress
transliteration table [6] or, for that matter, the BGN/PCGN 1956 System [7].
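
The check described in question 3 could be sketched as follows. This is only a heuristic under stated assumptions: the function name compare_t_languages and its return labels are hypothetical, and it compares nothing but the primary language subtags, so it cannot tell transliteration from transcription and may misjudge cases such as a macrolanguage code on one side and an individual language code on the other.

```python
import re

# RFC 6497 field keys ("tkey") are one letter followed by one digit, e.g. "m0".
_TKEY = re.compile(r"^[a-z][0-9]$")


def compare_t_languages(tag):
    """Compare the primary language before and after the -t- singleton."""
    subtags = tag.lower().split("-")
    target_lang = subtags[0]

    # Locate the "t" singleton; anything after "x" is private use, so stop there.
    t_index = None
    for i, sub in enumerate(subtags[1:], start=1):
        if sub == "x":
            break
        if sub == "t":
            t_index = i
            break
    if t_index is None:
        return "no-t-extension"

    # The source language tag runs until the first field key or next singleton.
    source = []
    for sub in subtags[t_index + 1:]:
        if _TKEY.match(sub) or len(sub) == 1:
            break
        source.append(sub)
    if not source:
        return "no-source-language"
    return "same-language" if source[0] == target_lang else "different-language"


for example in ("la-t-ar", "ar-Latn-t-ar-m0-buckwalt", "ar-Latn"):
    print(example, "->", compare_t_languages(example))
```

A "same-language" result would suggest a change of script or orthography, and "different-language" would suggest translation, but neither is guaranteed by the tag alone.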

Kind Regards,
- Hugh Paterson III

[1]. https://www.ietf.org/rfc/rfc6497.txt
[2]. https://en.wikipedia.org/wiki/Lex_Mahumet_pseudoprophete
[3]. https://quran411.com/side-by-side-view?sn=1
[4]. https://en.wikipedia.org/wiki/Romanization_of_Arabic
[5]. https://github.com/unicode-org/cldr/blob/maint/maint-41/common/bcp47/transform.xml
[6]. https://www.loc.gov/catdir/cpso/romanization/arabic.pdf
[7]. https://assets.publishing.service.gov.uk/media/637df1aae90e076b73e074c7/ROMANIZATION_OF_ARABIC_-_Nov_22.pdf