Re: [Ietf-languages] Fwd: Proposal for variant language subtags for German dialects

Dear all,

Thank you for your comments and for your patience!

I will now try to address your concerns:

I have attached two maps that show the language regions referenced by
the tags that I have proposed. The first is the (German-language) map
from Wiesinger (1983), which is still widely referenced in German
dialectology today. Its language regions correspond to the language
tags I have proposed. The second is a map from Lameli (2022) that
shows those regions with their names translated into English.

I do not plan on using the dialect tags to document these varieties.
The aim of the proposed tags is to tag linguistic resources such as
linguistic data sets, language maps and atlases of regional varieties
in different databases. At the Research Center Deutscher Sprachatlas,
we host (and make available to researchers and the public) many
different resources, including several atlases that document
different varieties of German, such as the Nordbairischer
Sprachatlas (North Bavarian Language Atlas)
(https://regionalsprache.de/nordbsa.aspx) and the Schlesischer
Sprachatlas (Silesian Language Atlas)
(https://regionalsprache.de/schlessa.aspx). Those resources refer not
to the German language as a whole, but to regional varieties.

One way of making those resources available is via hosting them in a  
repository that is currently being set up (https://lingurep.dsa.info)  
and that is specialized in data that pertains to regional varieties of  
languages (especially but not exclusively German). Since this is a  
repository especially for dialect research, we will offer the  
possibility to search by language region. A lot of the data will be  
focused on language regions smaller than “de”, “nds” and “gsw” and we  
would like to offer our users the possibility to use a standardized  
tagset for tagging their data accordingly. A standardized set of  
dialect tags would ensure a smooth process.

We purposefully chose what Wiesinger (1983) calls “Dialektregionen”  
and “Dialektverbände” (plus the language region “Westdeutsch” as  
proposed by Lameli 2013) and not the most fine-grained dialect  
groupings in use in German dialectology, to keep the number of  
different tags at a manageable level. I am sure that once we get all
our own data into the repository (which may take some time yet), we
ourselves will use most, if not all, of the tags I have proposed.

The repository will also be open to host data by other dialectologists  
with some external requests already in our inboxes and we would like  
to have solutions at the ready for these external datasets. As I wrote  
as part of my original proposal, we also aim to incorporate the  
language variant tags into the metadata schema for the Federated  
Content Search by the Text+ project that works on the National  
Research Data Infrastructure in Germany.

Another project that the tags will be very useful for is a  
geo-referenced bibliography for areal linguistics (GOBA;  
https://regionalsprache.de/inhalte_bibliographie.aspx), which can be  
accessed through this link:  
https://regionalsprache.de/GOBA/Catalogue.aspx. With this  
bibliography, it is possible to search for literature by geographical  
data, e.g. language regions or coordinates. We would also like to make  
a standardized set of tags for language regions available for  
searching this database. So the language tags would not only benefit  
the Deutscher Sprachatlas, but the German-language community of  
variationist linguists as a whole.

With regard to the tags for the Alemannic varieties: I would very
much like for all the Alemannic varieties to be prefixed with “de”,  
since linguistically speaking, they are varieties of the German  
language. And as such, I would very much like for them to show up in a  
search for varieties of “de”. I do see the argument for tags like  
North Low German to be prefixed by “nds”, since Low German is a  
language distinct from High German.
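
To illustrate why the prefix matters for searching, here is a minimal sketch. It assumes that a search compares tags by prefix in the style of RFC 4647 basic filtering; how our repository search will actually be implemented is not settled, so this is only an assumption. The function names are hypothetical, and the sample tags are existing registered tags, not the ones I have proposed.

```python
def matches_range(language_range: str, tag: str) -> bool:
    """Basic filtering in the style of RFC 4647: a range matches a tag
    if they are equal, or if the range is a prefix of the tag and the
    next character in the tag is a hyphen (case-insensitive)."""
    r = language_range.lower()
    t = tag.lower()
    if r == "*":  # the wildcard range matches every tag
        return True
    return t == r or t.startswith(r + "-")


def filter_tags(language_range: str, tags: list[str]) -> list[str]:
    """Return the tags matched by the given language range, in order."""
    return [t for t in tags if matches_range(language_range, t)]


# Existing registered tags, used purely as sample data.
sample_tags = ["de", "de-CH", "de-1996", "gsw", "gsw-CH", "nds", "nds-DE"]

print(filter_tags("de", sample_tags))
print(filter_tags("gsw", sample_tags))
```

Under this kind of matching, a search for "de" finds only tags that begin with "de"; a variety tagged under "gsw" would not show up, which is the reason for my preference above.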

Please let me know of any more concerns and questions that I have  
missed, and I will try to answer them!

Thank you,
Lisa

References
Lameli, Alfred. 2013. Strukturen im Sprachraum. Analysen zur  
arealtypologischen Komplexität der Dialekte in Deutschland. Berlin: de  
Gruyter.
Lameli, Alfred. 2022. Syllable Structure Spatially Distributed:
Patterns of Monosyllables in German Dialects. Journal of Germanic
Linguistics 34 (3), 241–287.
Wiesinger, Peter. 1983. Die Einteilung der deutschen Dialekte.  
Dialektologie. Ein internationales Handbuch, ed. by Werner Besch,  
Ulrich Knoop, Wolfgang Putschke, & Herbert Ernst Wiegand, 807–900.  
Berlin: de Gruyter.

Quoting Mark Davis ☕️ <mark@macchiato.com>:

> I'd like to point out that BCP47 already allows for finer-grained
> geographic associations via the sd key for geographic subdivisions of
> countries. For example, de-u-sd-deby represents German as used in Bavaria;
> en-u-sd-usma is English as used in Massachusetts. The script tag Zxxx can
> be used for indicating dialects that are primarily spoken.
>
>    - The upside is that these designations are extensive (thousands), and
>    likely to be better supported by general-purpose software than are
>    variants.
>    - The downside is that like country boundaries, there may not be a
>    subdivision that reasonably associates with a particular dialect.
>
>
> Mark
>
>
> On Wed, Sep 13, 2023 at 8:02 AM Sebastian Drude <drude@xs4all.nl> wrote:
>
>> Dear all,
>>
>>
>> Some thoughts on this.
>>
>> In ISO 639(-3), German is already now an example, perhaps the most extreme
>> one, for a language which is covered by too many identifiers for variants,
>> many of which are (and were even 150 to 110 years ago, when the classical
>> dialectal surveys and maps were made) mutually intelligible at least
>> between neighbouring variants or between them and standard German.  So
>> asking for even more ISO 639 entries for German varieties is most certainly
>> a no-go.
>>
>> I was under the impression that the "de/deu/ger" code element would cover
>> German as a whole, including regional dialects, not just standard German
>> (although it is not a macro-language code).  This is in accordance with ISO
>> 639, clause 4.2.1.1: "Every language identifier in accordance with this
>> document corresponds either to an individual language in its entirety with
>> all its language varieties or to a language group."  (True, there is a
>> discrepancy here between the Ethnologue
>> https://www.ethnologue.com/language/deu/ and ISO
>> https://iso639-3.sil.org/code/deu regarding this identifier; I take ISO
>> as authoritative for us.)
>>
>> As to the tags which are now requested, I am not convinced that these are
>> needed by a large user group, because the subtle differences between the
>> respective dialects probably concern mostly spoken language, i.e., the tags
>> would mostly if not exclusively be used for multimedia recordings.
>> However, in recent times, since larger amounts of such recordings can be
>> made, the dialectal differences have diminished due to constant intense
>> diglossic contact with standard German, so that speakers are not any longer
>> bilingual but master two variants (standard and regional) of the same
>> language, and where the former distinct dialects (many of them once were
>> mutually unintelligible, at least between the more distant ones) have faded
>> into regional variants of standard German.
>>
>> As to ISO 21636, which has been mentioned here, its Part 2 (the
>> description of the framework for identifying language varieties) is already
>> published, and the two other parts (Part 1: vocabulary [terms and
>> definitions], and Part 3: application) are in the final stages of being
>> published; they passed the last (FDIS) ballot in ISO last month with
>> unanimous approval.  I understand some of you have access to the ISO
>> documents.
>>
>> In ISO 21636, the different dimensions of linguistic variation which have
>> been mentioned here (social class, time, degrees of formality, etc.,
>> besides the spatial (geographical = dialectal) variation) are cleanly kept
>> apart (although they obviously influence one another).  For the
>> (geographical) space dimension, it is possible, as Doug showed for the
>> BCP47 tags in his examples "en-US-engne-enboston" or
>> "en-US-engne-ennwyork"*, to have two identifiers, one for a broader and
>> another for a more narrow geographical variant, or to use the most specific
>> variant.  I probably would prefer the latter, because using the more
>> specific one implies the broader one, and I do not believe that ISO or this
>> group should engage in settling dialectological questions about which is
>> the correct hierarchical subdivision of a language or regional variant of a
>> language, i.e., which hierarchy of dialects and sub-dialects etc. is to be
>> applied.  The user could include that somewhere else in their metadata, if
>> it is relevant.
>>
>> With some of you, I have chatted about how to implement / integrate the
>> ISO 21636 framework into the BCP47 language tags; a system of subtags in a
>> specialized extension seems to be a suitable solution.  With regard to
>> identifiers for linguistic variation, the next greater point is to set up a
>> system for registering individual values for each of the dimensions
>> (possibly starting with the variant subtags which have already been
>> established in the language tags by this group over the years).  Where such
>> a system could be established is not yet clear;  perhaps it could be
>> integrated in the procedures of this group so that the homogeneity between
>> ISO21636 identifiers and BCP47 (extension) subtags would be guaranteed.
>> Towards the end of this year I would like to take up this discussion (ISO
>> 21636 and BCP47) again with those of you who are interested.
>>
>> Best greetings,
>>
>> Sebastian
>>
>> * as to Doug's comment: "although I doubt New Yorkers would identify
>> themselves as part of New England": their own political/cultural etc.
>> identification is irrelevant here.  Many times political/administrative
>> borders (or the self-identification or allegiances of speakers) do not
>> coincide with isoglosses of geographical language varieties.  We are coding
>> the latter.
>> The question is whether there is something like a broader New-England
>> variety of English, by which criteria, and whether the English of New York
>> does meet these criteria to be considered as being part of it.  (That said,
>> my understanding is that most current dialectological analyses of North
>> American English would NOT include NY-English within New-England English,
>> see for instance https://aschmann.net/AmEng or
>> https://en.wikipedia.org/wiki/New_England_English.)
>>
>> --
>> Museu P.E. Goeldi, CCH, Linguistica  ▪  Av. Perimetral, 1901
>> Terra Firme, CEP: 66077-530  ▪  Belém do Pará – PA  ▪  Brazil
>> drude@xs4all.nl  ▪  +55 (91) 3217 6024  ▪  +55 (91) 983733319
>>
>> -----Original Message-----
>> From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of Doug
>> Ewell
>> Sent: Tuesday, September 12, 2023 8:16 PM
>> To: Hugh Paterson III <sil.linguist@gmail.com>; John Cowan <cowan@ccil.org>
>> Cc: Lisa Dücker <lisa.duecker=40uni-marburg.de@dmarc.ietf.org>;
>> ietf-languages@iana.org
>> Subject: Re: [Ietf-languages] Fwd: Proposal for variant language subtags
>> for German dialects
>>
>> There is work being done at present on a new standard (ISO 21636, “A
>> Framework for Language Varieties”) which might cover a lot of what Hugh is
>> talking about, whenever it is finalized and published.
>>
>> Beyond that, it is always possible in BCP 47 to have both variant and
>> private-use subtags at multiple levels, so that one could have some
>> combination of:
>>
>> en-US-engne
>> en-US-enboston *or* en-US-engne-enboston
>> en-US-ennwyork *or* en-US-engne-ennwyork (although I doubt New Yorkers
>>    would identify themselves as part of New England)
>> en-US-ennwyork-enmanhtn
>>
>> and the beat goes on, with any of the lower-level variants in these
>> examples being replaced by private-use subtags:
>>
>> en-US-x-engne
>> en-US-engne-x-enboston
>> etc.
>>
>> As always, there is a two-edged sword when it comes to private-use coding
>> elements. If a private code achieves widespread use, that is a good
>> indication that the thing being privately coded needs to be assigned a
>> formal code; but then it becomes increasingly difficult for users to
>> migrate away from the private code, which then might have to be supported
>> indefinitely.
>>
>> If it does not achieve widespread use, despite its existence being known,
>> that probably means it was a wise decision not to assign a formal code.
>>
>> Language can be micro-analyzed down to a very fine level of detail, but
>> the question must always be asked whether there is a broad need to
>> interchange identifiers for different sub-sub-varieties. Many New Yorkers
>> can distinguish English as spoken in Manhattan vs. Brooklyn vs. Queens, but
>> would content in these varieties (not only on computers, and not only
>> spoken) actually be tagged separately, by more than one individual or small
>> research body, if the possibility to do so without private-use were
>> available?
>>
>> I wish it were possible to know how people use BCP 47, and how people wish
>> they could use it but either can’t, or don’t know that they can, similar to
>> the way Unicode is able to use Google statistics to estimate which emoji
>> are most popular.
>>
>> --
>> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
>>
>>
>>
>> From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of Hugh
>> Paterson III
>> Sent: Tuesday, September 12, 2023 16:26
>> To: John Cowan <cowan@ccil.org>
>> Cc: Lisa Dücker <lisa.duecker=40uni-marburg.de@dmarc.ietf.org>;
>> ietf-languages@iana.org
>> Subject: Re: [Ietf-languages] Fwd: Proposal for variant language subtags
>> for German dialects
>>
>> John,
>>
>> Can you help clarify how many sub levels are presumed in these sub-tags?
>> For example, if we say that we assign a tag to ‘New England English’ does
>> that preclude creating a tag for ‘New York’ and/or ‘Boston’ Englishes? If
>> New York English has a tag does that preclude a Manhattan English? If
>> English is the language name and there is hierarchy in the geographical
>> units does that create an assumption that the speech varieties are also
>> hierarchical in their designation too? Or is it all flat and exclusive
>> within the context of the sub-language level?
>>
>> Written varieties which have been debated in this forum (such as the
>> German orthography) generally have dates associated giving them a time
>> depth with dates usually based on implementation dates.  In contrast oral
>> records and oral realities “on the ground” do not change with discrete
>> dates. For example, New York English “on the ground” doesn’t generally
>> sound like Bernie Sanders or Christopher Walken.  That is, they represent
>> Brooklyn/Queens New York English; they represent a certain time depth and
>> social class which may have been dominant at one point for the speech they
>> represent. However, even the same social class and racial backgrounds don’t
>> sound the same today with younger generations. How do we model time depth
>> for archival oral materials via sub tags?
>>
>> _______________________________________________
>> Ietf-languages mailing list
>> Ietf-languages@ietf.org
>> https://www.ietf.org/mailman/listinfo/ietf-languages
>>

-- 
Lisa Dücker
Research Associate
regionalsprache.de (REDE)
Forschungszentrum Deutscher Sprachatlas
Philipps-Universität Marburg

she/her