Re: [Ietf-languages] From Arabic to Latin

Doug Ewell <doug@ewellic.org> Mon, 15 January 2024 20:38 UTC

From: Doug Ewell <doug@ewellic.org>
To: Hugh Paterson III <sil.linguist@gmail.com>, IETF Languages Discussion <ietf-languages@iana.org>
Thread-Topic: [Ietf-languages] From Arabic to Latin
Thread-Index: AQHaQey/vCgmc3DKFkadZEjFLVmhmrDbQBGA
Date: Mon, 15 Jan 2024 20:38:28 +0000
Message-ID: <SJ0PR03MB6598830E6D413F07E9E9381DCA6C2@SJ0PR03MB6598.namprd03.prod.outlook.com>
References: <CAE=3Ky-8Gt-AWP0xeincs8heb0aYHuh0KWjxynRP1922nTYiVA@mail.gmail.com>
In-Reply-To: <CAE=3Ky-8Gt-AWP0xeincs8heb0aYHuh0KWjxynRP1922nTYiVA@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/nxFcRVtwrZXInWNthlu2bns3-7U>
Subject: Re: [Ietf-languages] From Arabic to Latin
X-BeenThere: ietf-languages@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: "Review of requests for language tag registration according to BCP 47 \(RFC 4646\)" <ietf-languages.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/ietf-languages>, <mailto:ietf-languages-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/ietf-languages/>
List-Post: <mailto:ietf-languages@ietf.org>
List-Help: <mailto:ietf-languages-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ietf-languages>, <mailto:ietf-languages-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Jan 2024 20:38:59 -0000

I was waiting for Mark or someone else more closely connected with CLDR to respond, but here goes. Corrections and amplifications are of course welcome.

Hugh Paterson III wrote:

> My understanding is that RFC6[4]97 does not distinguish between
> "transliteration, transcription, and translation".

It does not, at least in part because the boundaries between these concepts can be hard to define. I recall Addison Phillips using the term “transmogrification” to make this point.

> However, if I wanted to specify the type of transliteration method
> used I would need to use [ar-t-ar-Latn].

The part after the -t- singleton is the language (etc.) being transformed FROM. So it would be "ar-Latn-t-ar". 

You could make it "ar-Latn-t-ar-Arab" instead, to call extra attention to the change in script from Arabic to Latin. Balancing this against the normal guidelines about honoring Suppress-Script, keeping tags short, etc. is left as an exercise.
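
To make the ordering concrete, here is a minimal Python sketch that splits a tag at the -t- singleton into the part describing the content as it is now and the part describing what it was transformed from. The function name split_at_t is hypothetical, and the sketch does no validation of the subtags themselves.

```python
def split_at_t(tag):
    """Split a language tag at the -t- singleton.

    Returns (target, source): 'target' is everything before -t-,
    'source' is everything after it (including any fields such as m0),
    or None when the tag has no T extension.
    """
    subtags = tag.split("-")
    for i, subtag in enumerate(subtags):
        if i == 0:
            continue  # the primary language subtag is never a singleton
        lowered = subtag.lower()
        if lowered == "x":
            break  # everything after -x- is private use, so stop looking
        if lowered == "t":
            return "-".join(subtags[:i]), "-".join(subtags[i + 1:])
    return tag, None


print(split_at_t("ar-Latn-t-ar"))       # ('ar-Latn', 'ar')
print(split_at_t("ar-Latn-t-ar-Arab"))  # ('ar-Latn', 'ar-Arab')
print(split_at_t("ar-Latn"))            # ('ar-Latn', None)
```

Note that any other extension appearing before -t- would end up in the first element; a real parser would handle extensions separately.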

> But this tag seems to be missing something. How would I construct a
> valid transliteration tag using the CLDR indication for the Buckwalter
> method?

That would be "ar-Latn-t-ar-m0-buckwalt". I trust you have read enough of RFC 6497 and the CLDR file "transform.xml" to be able to find the subtag 'buckwalt' and be able to construct this tag.
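
As a rough illustration of how the pieces fit together, the following Python sketch assembles such a tag from a target, a source, and an optional mechanism. The helper build_transform_tag is hypothetical; it only concatenates strings and does not check the mechanism against transform.xml or the subtags against the registry.

```python
def build_transform_tag(target, source, mechanism=None):
    """Assemble target-t-source, optionally followed by an m0 field.

    target    -- tag for the content as it is now, e.g. "ar-Latn"
    source    -- tag for what it was transformed from, e.g. "ar"
    mechanism -- optional m0 value from CLDR transform.xml, e.g. "buckwalt"
    """
    tag = f"{target}-t-{source}"
    if mechanism is not None:
        tag += f"-m0-{mechanism}"
    return tag


print(build_transform_tag("ar-Latn", "ar"))              # ar-Latn-t-ar
print(build_transform_tag("ar-Latn", "ar", "buckwalt"))  # ar-Latn-t-ar-m0-buckwalt
```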

> Questions: 
> 1. How would I construct a valid transliteration tag from Arabic in
> Arabic script  to Arabic in Latin Script using the CLDR indication for
> the Buckwalter method?

As above.

> 2. When would it be appropriate to use [ar-t-ar-Latn]

recte: "ar-Latn-t-ar"

> instead of just [ar-Latn]? -- in my case I want to specify which
> transliteration system was used.

If all you need to say is “Arabic language, Latin script” then you are probably fine with just "ar-Latn" — the predominant use of the Arabic script for that language means that Latin-script content pretty much had to have been transliterated. The same would go for, say, "en-Arab".

If you need to specify a transliteration method, and that method is not one of the few that have their own variant subtag (e.g. 'hepburn'), then you must use the full syntax with the -t- extension.

> 3. What programmatic ways are there to determine if the -t- tag

“subtag” or “singleton”

> is used to mean translation instead of just transliteration? Does
> just checking for the same language tag on both sides of the -t- work
> for all use cases? (e.g., sameLangage-t-sameLangage = transliteration
> : sameLangage-t-differentLanguage = translation)

You would have to devise your own approach. The one you suggested might work. Extension T is not designed to support these distinctions, RFC 6497 does not discuss them, and frankly I doubt there are many processes which depend on them.
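
For what it is worth, the comparison suggested in the question could be sketched in Python roughly as follows. This is a homegrown heuristic, not anything defined by RFC 6497; the function name and the four return labels are invented for illustration. It compares only the primary language subtags, so it would treat, say, a macrolanguage code and one of its individual language codes as different languages.

```python
import re


def classify_transform(tag):
    """Heuristically compare the languages on both sides of -t-.

    Returns one of four illustrative labels:
      "no-transform"       -- no T extension present
      "unknown"            -- no usable language on one side (or 'und')
      "same-language"      -- suggests transliteration or transcription
      "different-language" -- suggests translation
    """
    subtags = tag.lower().split("-")
    if subtags[0] == "x":
        return "no-transform"  # wholly private-use tag
    try:
        t_index = subtags.index("t", 1)
    except ValueError:
        return "no-transform"
    if "x" in subtags[1:t_index]:
        return "no-transform"  # the 't' sits inside a private-use sequence
    if t_index + 1 >= len(subtags):
        return "unknown"
    source_lang = subtags[t_index + 1]
    # A letter followed by a digit (e.g. "m0") is a field key, not a language.
    if re.fullmatch(r"[a-z][0-9]", source_lang):
        return "unknown"
    target_lang = subtags[0]
    if "und" in (source_lang, target_lang):
        return "unknown"
    if source_lang == target_lang:
        return "same-language"
    return "different-language"


print(classify_transform("ar-Latn-t-ar-m0-buckwalt"))  # same-language
print(classify_transform("en-t-ar"))                   # different-language
print(classify_transform("ar-Latn"))                   # no-transform
```

Even a "same-language" result cannot separate transliteration from transcription, which is part of why the extension makes no attempt at these distinctions.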

> 4. I am dealing with several different systems of transliteration
> within bibliographic data (in library and archival contexts). For
> example, Wikipedia outlines over 20 systems of transliterating Arabic
> script to Latin script. Is there a way to designate the specific
> transliteration method used? For example, I see Buckwalter Arabic
> transliteration system mentioned in CLDR [5] but I don't see the
> Library of Congress' transliteration table [6] or for that matter the
> BGN/PCGN 1956 System [7].

A quick scan of transform.xml will reveal 'alaloc' for the former (not to be confused with the 'alalc97' variant subtag) and 'bgn' for the latter.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org