Re: [Ietf-languages] From Arabic to Latin

Mark Davis ☕️ <mark@macchiato.com> Tue, 16 January 2024 22:47 UTC

From: Mark Davis ☕️ <mark@macchiato.com>
Date: Tue, 16 Jan 2024 14:47:33 -0800
Message-ID: <CAJ2xs_HRCap1UoECMLyPcVcrnLmw7e7R34z_ChCSph87rRDDag@mail.gmail.com>
To: Doug Ewell <doug@ewellic.org>
Cc: Hugh Paterson III <sil.linguist@gmail.com>, IETF Languages Discussion <ietf-languages@iana.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/KWXPJCwWFgCRs01UmJBp7DaCm7c>

Doug has a very good answer (as usual).

Some small additions:

   1. Just to emphasize what Doug said: the valid m0 values are listed at
   https://github.com/unicode-org/cldr/blob/main/common/bcp47/transform.xml.
   You can file a CLDR ticket to add others.
   2. There are private-use -t extension key-values that you can use for
   your own m0 values, e.g. ar-t-en-x0-hugh-paterso. Of course, as with all
   private use, there is no expectation of general interoperability.
   3. If there are no other -t extension key-values, the presumption is
   that the transformation is a transliteration if the language subtags
   (source / destination) are the same and the scripts are different, and a
   translation if the language subtags are different. Hybrids can use
   -t-h0-hybrid. You can of course have a transformation that is even milder,
   such as en-US-t-en-ca to convert from Canadian English to US English.
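
As a rough illustration of the presumption in point 3, here is a minimal
Python sketch. The function names (parse_t, script_of, classify) and the
returned labels are my own inventions for this example, not part of RFC 6497
or CLDR. It assumes a well-formed tag, does not consult Suppress-Script or
any registry data, and applies the presumption even when other key-values
such as m0 are present, which goes slightly beyond what point 3 states.

```python
import re

# A -t- field key is one letter followed by one digit (m0, h0, x0, ...).
TKEY = re.compile(r"^[a-z][0-9]$")

def parse_t(tag):
    """Split a tag into (target subtags, source subtags, {key: [values]})."""
    subtags = tag.lower().split("-")
    if "t" not in subtags[1:]:
        return subtags, [], {}
    i = subtags.index("t", 1)
    target, source, fields, key = subtags[:i], [], {}, None
    for s in subtags[i + 1:]:
        if len(s) == 1:          # another singleton ends the -t- extension
            break
        if TKEY.match(s):
            key = s
            fields[key] = []
        elif key is None:
            source.append(s)     # still inside the source language tag
        else:
            fields[key].append(s)
    return target, source, fields

def script_of(subtags):
    """Return the explicit script subtag (four letters), or None."""
    for s in subtags[1:]:
        if len(s) == 4 and s.isalpha():
            return s
    return None

def classify(tag):
    """Heuristic label for a tag; labels are hypothetical names."""
    target, source, fields = parse_t(tag)
    if not source:
        return "none"
    if fields.get("h0") == ["hybrid"]:
        return "hybrid"
    if target[0] != source[0]:
        return "translation"
    if script_of(target) != script_of(source):
        return "transliteration"
    return "other"               # e.g. a regional conversion

print(classify("ar-Latn-t-ar"))     # transliteration
print(classify("en-US-t-en-ca"))    # other
```

A missing script on one side is treated as different from an explicit script
on the other, so "ar-Latn-t-ar" counts as a transliteration without the
source having to spell out "Arab".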



Mark


On Mon, Jan 15, 2024 at 12:39 PM Doug Ewell <doug@ewellic.org> wrote:

> I was waiting for Mark or someone else more closely connected with CLDR to
> respond, but here goes. Corrections and amplifications are of course
> welcome.
>
> Hugh Paterson III wrote:
>
> > My understanding is that RFC6[4]97 does not distinguish between
> > "transliteration, transcription, and translation".
>
> It does not, at least in part because the boundaries between these
> concepts can be hard to define. I recall Addison Phillips using the term
> “transmogrification” to observe this.
>
> > However, if I wanted to specify the type of transliteration method
> > used I would need to use [ar-t-ar-Latn].
>
> The part after the -t- singleton is the language (etc.) being transformed
> FROM. So it would be "ar-Latn-t-ar".
>
> You could make it "ar-Latn-t-ar-Arab" instead, to call extra attention to
> the change in script from Arabic to Latin. Balancing this against the
> normal guidelines about honoring Suppress-Script, keeping tags short, etc.
> is left as an exercise.
>
> > But this tag seems to be missing something. How would I construct a
> > valid transliteration tag using the CLDR indication for the Buckwalter
> > method?
>
> That would be "ar-Latn-t-ar-m0-buckwalt". I trust you have read enough of
> RFC 6497 and the CLDR file "transform.xml" to be able to find the subtag
> 'buckwalt' and be able to construct this tag.
>
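
The construction Doug describes can be sketched in Python as follows. The
helper name build_transform_tag is hypothetical. It only checks that the
mechanism value has the shape RFC 6497 allows for field values (subtags of
3 to 8 alphanumeric characters); it does not check the value against
transform.xml, and it does not validate the target or source tags.

```python
import re

# Shape of one subtag of a -t- field value: 3 to 8 alphanumerics.
_TVALUE = re.compile(r"^[A-Za-z0-9]{3,8}$")

def build_transform_tag(target, source, mechanism=None):
    """Return target-t-source, optionally followed by -m0-mechanism.

    target is the language tag of the resulting content, source is the
    tag of the content that was transformed FROM.
    """
    tag = f"{target}-t-{source}"
    if mechanism is not None:
        parts = mechanism.split("-")
        if not all(_TVALUE.match(p) for p in parts):
            raise ValueError(f"not a well-formed m0 value: {mechanism!r}")
        tag += "-m0-" + "-".join(parts)
    return tag

print(build_transform_tag("ar-Latn", "ar", "buckwalt"))
# ar-Latn-t-ar-m0-buckwalt
```

Passing the full name "buckwalter" would raise ValueError here, because it
is ten characters long; the value in transform.xml is the shortened
'buckwalt'.
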
> > Questions:
> > 1. How would I construct a valid transliteration tag from Arabic in
> > Arabic script  to Arabic in Latin Script using the CLDR indication for
> > the Buckwalter method?
>
> As above.
>
> > 2. When would it be appropriate to use [ar-t-ar-Latn]
>
> recte: "ar-Latn-t-ar"
>
> > instead of just [ar-Latn]? -- in my case I want to specify which
> > transliteration system was used.
>
> If all you need to say is “Arabic language, Latin script” then you are
> probably fine with just "ar-Latn" — the predominant use of the Arabic
> script for that language means that Latin-script content pretty much had to
> have been transliterated. The same would go for, say, "en-Arab".
>
> If you need to specify a transliteration method, and that method is not
> one of the few which has its own variant subtag (e.g. 'hepburn'), then you
> must use the full syntax with the -t- extension.
>
> > 3. What programmatic ways are there to determine if the -t- tag
>
> “subtag” or “singleton”
>
> > is used to mean translation instead of just transliteration? Does
> > just checking for the same language tag on both sides of the -t- work
> > for all use cases? (e.g., sameLangage-t-sameLangage = transliteration
> > : sameLangage-t-differentLanguage = translation)
>
> You would have to devise your own approach. The one you suggested might
> work. Extension T is not designed to support these distinctions, RFC 6497
> does not discuss them, and frankly I doubt there are many processes which
> depend on them.
>
> > 4. I am dealing with several different systems of transliteration
> > within bibliographic data (in library and archival contexts). For
> > example, Wikipedia outlines over 20 systems of transliterating Arabic
> > script to Latin script. Is there a way to designate the specific
> > transliteration method used? For example, I see Buckwalter Arabic
> > transliteration system mentioned in CLDR [5] but I don't see the
> > Library of Congress' transliteration table [6] or for that matter the
> > BGN/PCGN 1956 System [7].
>
> A quick scan of transform.xml will reveal 'alaloc' for the former (not to
> be confused with the 'alalc97' variant subtag) and 'bgn' for the latter.
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org