Re: [Ietf-languages] Fwd: Proposal for variant language subtags for German dialects

Mark Davis ☕️ <mark@macchiato.com> Wed, 13 September 2023 18:27 UTC

References: <20230911135450.Horde.C3GF4Fl3isb4n4eJMJT5bTp@home.staff.uni-marburg.de> <ZP8OX/qk2NWz1tku@sources.org> <CAD2gp_T7+MXd2qr84z+39w5zgt38TM8L3gKA6FUcTe=C2j=VqA@mail.gmail.com> <CAE=3Ky8mtG7Q9SkOCGSpVXAojTJkJCwJWoDS56dEHqW9igHukw@mail.gmail.com> <CAD2gp_R1qoRST0LS=EV9rtvyW2Td7kLcx3GobFXea6kufG1jjw@mail.gmail.com> <CAE=3Ky_71yeh+-2rR6Qp+eBoLW7SHATQ8zY8tJtjy=_T+F=JVw@mail.gmail.com> <SJ0PR03MB6598211FDF90569B00022B7CCAF1A@SJ0PR03MB6598.namprd03.prod.outlook.com> <003701d9e653$3fec8700$bfc59500$@xs4all.nl>
In-Reply-To: <003701d9e653$3fec8700$bfc59500$@xs4all.nl>
From: Mark Davis ☕️ <mark@macchiato.com>
Date: Wed, 13 Sep 2023 11:26:40 -0700
Message-ID: <CAJ2xs_HwJN3k_=iYCCHuE01NUUC6Bf++4Y77xqgP=rTv-s1m1Q@mail.gmail.com>
To: Sebastian Drude <drude@xs4all.nl>
Cc: Doug Ewell <doug@ewellic.org>, Hugh Paterson III <sil.linguist@gmail.com>, John Cowan <cowan@ccil.org>, Lisa Dücker <lisa.duecker=40uni-marburg.de@dmarc.ietf.org>, ietf-languages@iana.org, Christian Galinski <christian.galinski@chello.at>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/OvD_p-6ii51q4MF_Mwih0QoNzNU>
Subject: Re: [Ietf-languages] Fwd: Proposal for variant language subtags for German dialects

I'd like to point out that BCP47 already allows for finer-grained
geographic associations via the sd key for geographic subdivisions of
countries. For example, de-u-sd-deby represents German as used in Bavaria;
en-u-sd-usma is English as used in Massachusetts. The script subtag Zxxx can
be used to indicate dialects that are primarily spoken.

   - The upside is that these designations are extensive (thousands), and
   likely to be better supported by general-purpose software than are
   variants.
   - The downside is that, as with country boundaries, there may not be a
   subdivision that reasonably corresponds to a particular dialect.
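
A minimal sketch of how software might pick the subdivision out of such a
tag, assuming only the Python standard library. The function name
subdivision_of is hypothetical, and the parsing covers just the
singleton/key structure of the -u- extension, not full BCP 47 validation:

```python
from typing import Optional


def subdivision_of(tag: str) -> Optional[str]:
    """Return the value of the 'sd' key in the -u- extension, or None.

    Hypothetical helper for illustration; it does not validate the tag.
    """
    subtags = tag.lower().split("-")
    in_u_extension = False
    for i, sub in enumerate(subtags):
        # A one-character subtag after the first position is a singleton
        # that starts an extension ('u', 't', ...) or private use ('x').
        if i > 0 and len(sub) == 1:
            if sub == "x":
                return None  # everything after 'x' is private use
            in_u_extension = (sub == "u")
            continue
        # Inside -u-, two-character subtags are keys; 'sd' is the
        # subdivision key, and its value follows immediately.
        if in_u_extension and sub == "sd":
            if i + 1 < len(subtags) and len(subtags[i + 1]) > 2:
                return subtags[i + 1]
            return None
    return None


if __name__ == "__main__":
    for example in ("de-u-sd-deby", "en-u-sd-usma", "de-Zxxx-u-sd-deby", "de-DE"):
        print(example, "->", subdivision_of(example))
```

The returned value begins with the two-letter region code (for example
"de" in "deby"), followed by the subdivision suffix.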


Mark


On Wed, Sep 13, 2023 at 8:02 AM Sebastian Drude <drude@xs4all.nl> wrote:

> Dear all,
>
>
> Some thoughts on this.
>
> In ISO 639(-3), German is already now an example, perhaps the most extreme
> one, for a language which is covered by too many identifiers for variants,
> many of which are (and were even 150 to 110 years ago, when the classical
> dialectal surveys and maps were made) mutually intelligible at least
> between neighbouring variants or between them and standard German.  So
> asking for even more ISO 639 entries for German varieties is most certainly
> a no-go.
>
> I was under the impression that the "de/deu/ger" code element would cover
> German as a whole, including regional dialects, not just standard German
> (although it is not a macro-language code).  This is in accordance with ISO
> 639, clause 4.2.1.1: "Every language identifier in accordance with this
> document corresponds either to an individual language in its entirety with
> all its language varieties or to a language group."  (True, there is a
> discrepancy here between the Ethnologue
> https://www.ethnologue.com/language/deu/ and ISO
> https://iso639-3.sil.org/code/deu regarding this identifier; I take ISO
> as authoritative for us.)
>
> As to the tags which are now requested, I am not convinced that these are
> needed by a large user group, because the subtle differences between the
> respective dialects probably concern mostly spoken language, i.e., the tags
> would mostly if not exclusively be used for multimedia recordings.
> However, in recent times, since larger amounts of such recordings can be
> made, the dialectal differences have diminished due to constant intense
> diglossic contact with standard German, so that speakers are not any longer
> bilingual but master two variants (standard and regional) of the same
> language, and where the former distinct dialects (many of them once were
> mutually unintelligible, at least between the more distant ones) have faded
> into regional variants of standard German.
>
> As to ISO 21636, which has been mentioned here, its Part 2 (the
> description of the framework for identifying language varieties) is already
> published, and the two other parts (Part 1: vocabulary [terms and
> definitions], and Part 3: application) are in the final stages of being
> published; they passed the last (FDIS) ballot in ISO last month with
> unanimous approval.  I understand some of you have access to the ISO
> documents.
>
> In ISO 21636, the different dimensions of linguistic variation which have
> been mentioned here (social class, time, degrees of formality, etc.,
> besides the spatial (geographical = dialectal) variation) are cleanly kept
> apart (although they obviously influence one another).  For the
> (geographical) space dimension, it is possible, as Doug showed for the
> BCP47 tags in his examples "en-US-engne-enboston" or
> "en-US-engne-ennwyork"*, to have two identifiers, one for a broader and
> another for a more narrow geographical variant, or to use the most specific
> variant.  I probably would prefer the latter, because using the more
> specific one implies the broader one, and I do not believe that ISO or this
> group should engage in settling dialectological questions about which is
> the correct hierarchical subdivision of a language or regional variant of a
> language, i.e., which hierarchy of dialects and sub-dialects etc. is to be
> applied.  The user could include that somewhere else in their metadata, if
> it is relevant.
>
> With some of you, I have chatted about how to implement / integrate the
> ISO 21636 framework into the BCP47 language tags; a system of subtags in a
> specialized extension seems to be a suitable solution.  With regard to
> identifiers for linguistic variation, the next major step is to set up a
> system for registering individual values for each of the dimensions
> (possibly starting with the variant subtags which have already been
> established in the language tags by this group over the years).  Where such
> a system could be established is not yet clear;  perhaps it could be
> integrated in the procedures of this group so that the homogeneity between
> ISO 21636 identifiers and BCP47 (extension) subtags would be guaranteed.
> Towards the end of this year I would like to take up this discussion (ISO
> 21636 and BCP47) again with those of you who are interested.
>
> Best greetings,
>
> Sebastian
>
> * as to Doug's comment: "although I doubt New Yorkers would identify
> themselves as part of New England": their own political/cultural etc.
> identification is irrelevant here.  Many times political/administrative
> borders (or the self-identification or allegiances of speakers) do not
> coincide with isoglosses of geographical language varieties.  We are coding
> the latter.
> The question is whether there is something like a broader New-England
> variety of English, by which criteria, and whether the English of New York
> does meet these criteria to be considered as being part of it.  (That said,
> my understanding is that most current dialectological analyses of North
> American English would NOT include NY-English within New-England English,
> see for instance https://aschmann.net/AmEng or
> https://en.wikipedia.org/wiki/New_England_English.)
>
> --
> Museu P.E. Goeldi, CCH, Linguistica  ▪  Av. Perimetral, 1901
> Terra Firme, CEP: 66077-530  ▪  Belém do Pará – PA  ▪  Brazil
> drude@xs4all.nl  ▪  +55 (91) 3217 6024  ▪  +55 (91) 983733319
>
> -----Original Message-----
> From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of Doug
> Ewell
> Sent: Tuesday, September 12, 2023 8:16 PM
> To: Hugh Paterson III <sil.linguist@gmail.com>; John Cowan <cowan@ccil.org>
> Cc: Lisa Dücker <lisa.duecker=40uni-marburg.de@dmarc.ietf.org>;
> ietf-languages@iana.org
> Subject: Re: [Ietf-languages] Fwd: Proposal for variant language subtags
> for German dialects
>
> There is work being done at present on a new standard (ISO 21636, “A
> Framework for Language Varieties”) which might cover a lot of what Hugh is
> talking about, whenever it is finalized and published.
>
> Beyond that, it is always possible in BCP 47 to have both variant and
> private-use subtags at multiple levels, so that one could have some
> combination of:
>
> en-US-engne
> en-US-enboston *or* en-US-engne-enboston
> en-US-ennwyork *or* en-US-engne-ennwyork (although I doubt New Yorkers
> would identify themselves as part of New England)
> en-US-ennwyork-enmanhtn
>
> and the beat goes on, with any of the lower-level variants in these
> examples being replaced by private-use subtags:
>
> en-US-x-engne
> en-US-engne-x-enboston
> etc.
>
> As always, there is a two-edged sword when it comes to private-use coding
> elements. If a private code achieves widespread use, that is a good
> indication that the thing being privately coded needs to be assigned a
> formal code; but then it becomes increasingly difficult for users to
> migrate away from the private code, which then might have to be supported
> indefinitely.
>
> If it does not achieve widespread use, despite its existence being known,
> that probably means it was a wise decision not to assign a formal code.
>
> Language can be micro-analyzed down to a very fine level of detail, but
> the question must always be asked whether there is a broad need to
> interchange identifiers for different sub-sub-varieties. Many New Yorkers
> can distinguish English as spoken in Manhattan vs. Brooklyn vs. Queens, but
> would content in these varieties (not only on computers, and not only
> spoken) actually be tagged separately, by more than one individual or small
> research body, if the possibility to do so without private-use were
> available?
>
> I wish it were possible to know how people use BCP 47, and how people wish
> they could use it but either can’t, or don’t know that they can, similar to
> the way Unicode is able to use Google statistics to estimate which emoji
> are most popular.
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
>
>
>
> From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of Hugh
> Paterson III
> Sent: Tuesday, September 12, 2023 16:26
> To: John Cowan <cowan@ccil.org>
> Cc: Lisa Dücker <lisa.duecker=40uni-marburg.de@dmarc.ietf.org>;
> ietf-languages@iana.org
> Subject: Re: [Ietf-languages] Fwd: Proposal for variant language subtags
> for German dialects
>
> John,
>
> Can you help clarify how many sub levels are presumed in these sub-tags?
> For example, if we say that we assign a tag to ‘New England English’ does
> that preclude creating a tag for ‘New York’ and/or ‘Boston’ Englishes? If
> New York English has a tag does that preclude a Manhattan English? If
> English is the language name and there is hierarchy in the geographical
> units does that create an assumption that the speech varieties are also
> hierarchical in their designation too? Or is it all flat and exclusive
> within the context of the sub-language level?
>
> Written varieties which have been debated in this forum (such as the
> German orthography) generally have dates associated giving them a time
> depth with dates usually based on implementation dates.  In contrast, oral
> records and oral realities “on the ground” do not change with discrete
> dates. For example, New York English “on the ground” doesn’t generally
> sound like Bernie Sanders or Christopher Walken.  That is, they represent
> Brooklyn/Queens New York English; they represent a certain time depth and
> social class which may have been dominant at one point for the speech they
> represent. However, even the same social class and racial backgrounds don’t
> sound the same today with younger generations. How do we model time depth
> for archival oral materials via sub tags?
>
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages@ietf.org
> https://www.ietf.org/mailman/listinfo/ietf-languages
>