Re: [Ietf-languages] Fwd: Proposal for variant language subtags for German dialects

Sebastian Drude <drude@xs4all.nl> Wed, 13 September 2023 15:02 UTC

From: Sebastian Drude <drude@xs4all.nl>
To: 'Doug Ewell' <doug@ewellic.org>, 'Hugh Paterson III' <sil.linguist@gmail.com>, 'John Cowan' <cowan@ccil.org>
CC: 'Lisa Dücker' <lisa.duecker=40uni-marburg.de@dmarc.ietf.org>, ietf-languages@iana.org, Christian Galinski <christian.galinski@chello.at>
Date: Wed, 13 Sep 2023 12:01:55 -0300
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/fPErwWmu7_YlXtY62ViLb5KjnzE>

Dear all,


Some thoughts on this.

In ISO 639(-3), German is already an example, perhaps the most extreme one, of a language covered by too many identifiers for variants, many of which are (and were even 150 to 110 years ago, when the classical dialect surveys and maps were made) mutually intelligible, at least between neighbouring variants or between them and standard German.  So asking for even more ISO 639 entries for German varieties is most certainly a no-go.

I was under the impression that the "de/deu/ger" code element would cover German as a whole, including regional dialects, not just standard German (although it is not a macro-language code).  This is in accordance with ISO 639, clause 4.2.1.1: "Every language identifier in accordance with this document corresponds either to an individual language in its entirety with all its language varieties or to a language group."  (True, there is a discrepancy here between the Ethnologue https://www.ethnologue.com/language/deu/ and ISO https://iso639-3.sil.org/code/deu regarding this identifier; I take ISO as authoritative for us.)

As to the tags which are now requested, I am not convinced that these are needed by a large user group, because the subtle differences between the respective dialects probably concern mostly spoken language, i.e., the tags would mostly if not exclusively be used for multimedia recordings.  However, in recent times, since it became possible to make larger amounts of such recordings, the dialectal differences have diminished owing to constant, intense diglossic contact with standard German.  Speakers are no longer bilingual but master two variants (standard and regional) of the same language, and the formerly distinct dialects (many of which were once mutually unintelligible, at least between the more distant ones) have faded into regional variants of standard German.

As to ISO 21636, which has been mentioned here, its Part 2 (the description of the framework for identifying language varieties) is already published, and the two other parts (Part 1: vocabulary [terms and definitions], and Part 3: application) are in the final stages of being published; they passed the last (FDIS) ballot in ISO last month with unanimous approval.  I understand some of you have access to the ISO documents.

In ISO 21636, the different dimensions of linguistic variation which have been mentioned here (social class, time, degrees of formality, etc., besides the spatial (geographical = dialectal) variation) are cleanly kept apart (although they obviously influence one another).  For the (geographical) space dimension, it is possible, as Doug showed for the BCP 47 tags in his examples "en-US-engne-enboston" or "en-US-engne-ennwyork"*, either to have two identifiers, one for a broader and another for a narrower geographical variant, or to use only the most specific variant.  I would probably prefer the latter, because using the more specific one implies the broader one, and I do not believe that ISO or this group should engage in settling dialectological questions about which is the correct hierarchical subdivision of a language or regional variant of a language, i.e., which hierarchy of dialects and sub-dialects etc. is to be applied.  The user could include that somewhere else in their metadata, if it is relevant.
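
To make the two options concrete, here is a minimal Python sketch, not taken from BCP 47 or ISO 21636. It splits a tag of the simple shape language-region-variants and picks out the last variant. The function names split_variants and most_specific are hypothetical, the subtags engne and enboston are Doug's illustrative examples rather than registered subtags, and treating the last variant as the most specific one is an assumption based on the order used in those examples. Script, extlang, extension and private-use subtags are deliberately not handled.

```python
import re

# Variant subtag shape in RFC 5646: 5 to 8 alphanumerics,
# or 4 characters of which the first is a digit.
VARIANT = re.compile(r"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$")
# Region subtag shape: two letters or three digits.
REGION = re.compile(r"^(?:[A-Za-z]{2}|[0-9]{3})$")


def split_variants(tag):
    """Return (language, region, variants) for a tag like 'en-US-engne-enboston'.

    Simplified on purpose: only language, an optional region, and variants
    are recognised. Anything else raises ValueError.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    rest = parts[1:]
    region = None
    if rest and REGION.match(rest[0]):
        region = rest[0].upper()
        rest = rest[1:]
    variants = []
    for subtag in rest:
        if not VARIANT.match(subtag):
            raise ValueError("not a variant-shaped subtag: " + subtag)
        variants.append(subtag.lower())
    return language, region, variants


def most_specific(tag):
    """Return the last variant subtag, or None if the tag has no variants.

    Assumption: variants are ordered from broader to narrower.
    """
    _, _, variants = split_variants(tag)
    return variants[-1] if variants else None
```

With this sketch, "en-US-engne-enboston" and "en-US-enboston" both yield enboston as the most specific variant; which broader variety enboston belongs to would then be recorded elsewhere in the metadata, if at all.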

With some of you, I have chatted about how to implement / integrate the ISO 21636 framework into the BCP 47 language tags; a system of subtags in a specialized extension seems to be a suitable solution.  With regard to identifiers for linguistic variation, the next major step is to set up a system for registering individual values for each of the dimensions (possibly starting with the variant subtags which have already been established in the language tags by this group over the years).  Where such a system could be established is not yet clear; perhaps it could be integrated into the procedures of this group, so that consistency between ISO 21636 identifiers and BCP 47 (extension) subtags would be guaranteed.
Towards the end of this year I would like to take up this discussion (ISO 21636 and BCP47) again with those of you who are interested.

Best greetings,

Sebastian

* As to Doug's comment, "although I doubt New Yorkers would identify themselves as part of New England": their own political/cultural etc. identification is irrelevant here.  Often political/administrative borders (or the self-identification or allegiances of speakers) do not coincide with the isoglosses of geographical language varieties.  We are coding the latter.
The question is whether there is something like a broader New England variety of English, by which criteria it is defined, and whether the English of New York meets these criteria and so counts as part of it.  (That said, my understanding is that most current dialectological analyses of North American English would NOT include New York English within New England English; see for instance https://aschmann.net/AmEng or https://en.wikipedia.org/wiki/New_England_English.)

-- 
Museu P.E. Goeldi, CCH, Linguistica  ▪  Av. Perimetral, 1901
Terra Firme, CEP: 66077-530  ▪  Belém do Pará – PA  ▪  Brazil
drude@xs4all.nl  ▪  +55 (91) 3217 6024  ▪  +55 (91) 983733319

-----Original Message-----
From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of Doug Ewell
Sent: Tuesday, September 12, 2023 8:16 PM
To: Hugh Paterson III <sil.linguist@gmail.com>; John Cowan <cowan@ccil.org>
Cc: Lisa Dücker <lisa.duecker=40uni-marburg.de@dmarc.ietf.org>; ietf-languages@iana.org
Subject: Re: [Ietf-languages] Fwd: Proposal for variant language subtags for German dialects

There is work being done at present on a new standard (ISO 21636, “A Framework for Language Varieties”) which might cover a lot of what Hugh is talking about, whenever it is finalized and published.

Beyond that, it is always possible in BCP 47 to have both variant and private-use subtags at multiple levels, so that one could have some combination of:

en-US-engne
en-US-enboston *or* en-US-engne-enboston
en-US-ennwyork *or* en-US-engne-ennwyork (although I doubt New Yorkers would identify themselves as part of New England)
en-US-ennwyork-enmanhtn

and the beat goes on, with any of the lower-level variants in these examples being replaced by private-use subtags:

en-US-x-engne
en-US-engne-x-enboston
etc.
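
The split between the registered part of a tag and its private-use part can be sketched as follows. This is a minimal Python illustration, not a validating BCP 47 parser: the function name split_private_use is hypothetical, and the only rule it relies on is that the singleton "x" introduces private-use subtags and that everything after it is private.

```python
def split_private_use(tag):
    """Split a tag into (registered_part, private_subtags).

    Everything after the first single-letter subtag 'x' is private use.
    The subtags themselves are not validated.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered:
        i = lowered.index("x")
        return "-".join(subtags[:i]), subtags[i + 1:]
    return tag, []
```

For "en-US-engne-x-enboston" this separates "en-US-engne" from the private subtag enboston, so a consumer that does not know the private agreement can still fall back on the registered part.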

As always, there is a two-edged sword when it comes to private-use coding elements. If a private code achieves widespread use, that is a good indication that the thing being privately coded needs to be assigned a formal code; but then it becomes increasingly difficult for users to migrate away from the private code, which then might have to be supported indefinitely.

If it does not achieve widespread use, despite its existence being known, that probably means it was a wise decision not to assign a formal code.

Language can be micro-analyzed down to a very fine level of detail, but the question must always be asked whether there is a broad need to interchange identifiers for different sub-sub-varieties. Many New Yorkers can distinguish English as spoken in Manhattan vs. Brooklyn vs. Queens, but would content in these varieties (not only on computers, and not only spoken) actually be tagged separately, by more than one individual or small research body, if the possibility to do so without private-use were available?

I wish it were possible to know how people use BCP 47, and how people wish they could use it but either can’t, or don’t know that they can, similar to the way Unicode is able to use Google statistics to estimate which emoji are most popular.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org



From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of Hugh Paterson III
Sent: Tuesday, September 12, 2023 16:26
To: John Cowan <cowan@ccil.org>
Cc: Lisa Dücker <lisa.duecker=40uni-marburg.de@dmarc.ietf.org>; ietf-languages@iana.org
Subject: Re: [Ietf-languages] Fwd: Proposal for variant language subtags for German dialects

John,

Can you help clarify how many sub-levels are presumed in these subtags?  For example, if we assign a tag to ‘New England English’, does that preclude creating a tag for ‘New York’ and/or ‘Boston’ Englishes? If New York English has a tag, does that preclude a Manhattan English? If English is the language name and there is a hierarchy in the geographical units, does that create an assumption that the speech varieties are also hierarchical in their designation? Or is it all flat and exclusive within the context of the sub-language level?

Written varieties which have been debated in this forum (such as the German orthography) generally have dates associated with them, giving them a time depth, with dates usually based on implementation dates.  In contrast, oral records and oral realities “on the ground” do not change at discrete dates. For example, New York English “on the ground” doesn’t generally sound like Bernie Sanders or Christopher Walken.  That is, they represent Brooklyn/Queens New York English; they represent a certain time depth and social class which may have been dominant at one point for the speech they represent. However, even speakers of the same social class and racial background don’t sound the same today among younger generations. How do we model time depth for archival oral materials via subtags?
