RE: [Ltru] Re: extlang

From: Don Osborn [mailto:dzo@bisharat.net]

>> Collections are one thing and macrolanguages are another.  A macrolanguage
>> is a grouping of languages that can usefully be treated as a single
>> language in certain contexts.
>
> Thanks, John. The question I have is what are the contexts? IOW,
> are the various macrolanguage-sublanguage relationships, which seem
> to represent varying realities in terms of usage, categorizable into
> certain types?

Answering that means considering the reasons one person might have for grouping and the reasons another might have for splitting.

Splitting

The splitting in most or all cases we're dealing with (where macrolanguages are involved) comes from Ethnologue. The rationale is that, on the basis of analysis of the spoken language (the vast majority of languages catalogued being unwritten), including things like lexical similarity, comprehension testing, and assessment of usage or attitudes, the conclusion is drawn that development of literature in one form is unlikely to provide adequate communication for all of the communities involved. (That's the basic principle; it may or may not have been consistently and correctly applied in all instances.)

Note that Ethnologue has not always followed every claim of a split. A familiar example is Serbo-Croatian vs. Bosnian, Croatian and Serbian, for which Ethnologue adopted the split only in the latest edition. Or, a less familiar example: I know that, in Mexico, among Mixtec varieties, many communities are wont to make hyper-differentiations, saying that each village is different when researchers find no observable barrier to communication. In these cases, Ethnologue does not make distinctions the speakers themselves might indicate.

At the same time, the research should be looking for situations in which distinct spoken varieties can be bridged by a common literature, and that has not always happened. Sometimes that may have been because those doing the research didn't have that mindset. But in some cases the research has probably become dated, with conditions having changed in ways that make such merging more feasible.

Grouping

The reasons for grouping are various:

- The various communities share a single form of communication that is in de facto use.

- There is a common perception (whether among the communities themselves or among people outside the communities) that there is a single identity. (In some cases, the need for any split may have only lately come to light.)

- In spite of differences, policies or practice have been adopted to bring different varieties together with a common literary form.

- Someone gathering information resources in or about the language varieties in question doesn't have the need to differentiate between them. (E.g. a library that has neither the resources nor the demand for distinguishing the items -- especially likely if the common perception is that there's just one entity.)

Our familiar macrolanguage, Chinese, is an example of the first two: there is a written form that for the most part is shared, and for some time that has contributed to a perception that it's all one language.

The third is something that may relate to cases such as Fula, which you mentioned.

The last is one that certainly pertains to some of the entries in 639-2. There definitely are cases in which librarians have done one of the following:

- A group of undeveloped varieties that share a label (e.g. "Quechua") are treated as a single category for cataloguing purposes.

- An ID is assigned for a major, developed variety (e.g., Malay); but then there are also minor varieties (undeveloped, smaller communities) that share the same label, and the librarians group those with the major language for cataloguing purposes.

- A taxonomy (perhaps a folk taxonomy) exists in which varieties were grouped under a single label for some convenience, and this is used as the basis for cataloguing. (E.g., "Lahnda", "Bihari").

In cases like Quechua, Bihari or Lahnda in which librarians have adopted groupings of varieties at comparable levels of development, the option existed with the introduction of 639-3 to treat the existing 639-2 ID either as a collection or as a macrolanguage. The question was whether there was some reason to warrant the latter. Taking Quechua as an example, I know that there's some ethnic nationalist sentiment that promotes a single identity, and that there has also been some recent language development activity that treated "Quechua" as one. On that basis, I suggested that macrolanguage might be the better property to assign, and the JAC just went along. In the case of Bihari, I wasn't aware of any utility for treating it as a macrolanguage, so I suggested it be considered a collection, and the JAC went along. Who can judge whether these were the "right" decisions?

I think cases like Malay are unfortunate: it doesn't seem to me that there's a particular use for the grouping in software applications, but the grouping was done in other applications. Because of the latter, msa can't be treated as an individual-language ID, and I wouldn't like the idea of resources for a major language like Malay getting tagged with a collection ID. So, that leaves the macrolanguage alternative. Again, who can tell me if that was the right decision?

Now, as for IETF lang tags, I think it's reasonable to ask whether it makes sense to use the extlang mechanism in all these cases. For Chinese, extlangs were basically introduced years ago. Does it also make sense for Quechua, Lahnda, Malay and all the others to do the same? I don't know.

Peter
