Re: [Ltru] rechartering to handle 639-6 (was FW: Anomalyinupcomingregistry)

Mark Davis ⌛ <mark@macchiato.com> Wed, 15 July 2009 23:09 UTC

In-Reply-To: <207901ca059b$5e54dc90$0c00a8c0@CPQ86763045110>
References: <C683A5F6.F25A%kent.karlsson14@comhem.se> <8D97027965E89F488BC87B919382D9FD0510BEC2@ussdixms01.am.sony.com> <30b660a20907151031i6e8bbeedyc4d33ffa59cce113@mail.gmail.com> <203d01ca058c$7bd8da50$0c00a8c0@CPQ86763045110> <30b660a20907151415o36cdaf8br5d5ec951688ebdac@mail.gmail.com> <207901ca059b$5e54dc90$0c00a8c0@CPQ86763045110>
Date: Wed, 15 Jul 2009 16:03:35 -0700
Message-ID: <30b660a20907151603h20828ad0j7fe6e653f7dfb052@mail.gmail.com>
From: Mark Davis ⌛ <mark@macchiato.com>
To: debbie@ictmarketing.co.uk
Content-Type: multipart/alternative; boundary="0016e642dd64a5f0e3046ec69039"
Cc: LTRU Working Group <ltru@ietf.org>, "Broome, Karen" <Karen.Broome@am.sony.com>
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW: Anomalyinupcomingregistry)

Mark


On Wed, Jul 15, 2009 at 15:26, Debbie Garside <debbie@ictmarketing.co.uk> wrote:

>  Mark wrote:
>
> > A good example.
>
> *Thank you!*
>
> > Here's what I suggest you do:
>
> > The following are all expressible in BCP47, and need to be tossed out:
>
>    1. All those already with language codes (ang, enm).
>
> *These are all ISO 639-2/3/5 codes used within the hierarchical system of
> ISO 639-6 in order to relate the languages, so they are already within the
> LSR.  It is a simple procedure to remove them.*
>
> > 2.  All of the written vs non-written differences
> *Why?  *
>

Because that distinction is already expressible in BCP47, as I said.

>
> > 3.  All "written in <script>" variants
>
> *As previously discussed, those that are expressible within BCP 47, yes,
> the others, of which there are many, no.*
>
> > Most of the rest are associated with geographic designations.
>
> *Language does tend to be linked to where people live... I will agree with
> that... but...*
>
> >  Rather than have a hit&miss approach to those that seem important to the
> designers of 639-6, we'd be better off having some variant convention like
> the following.
>
>    - unlXXXXX is a UN/LOCODE minus the country designator (see for this
>    case, http://www.unece.org/cefact/locode/gb.htm)
>    - isoXXXXX is an ISO 3166-2 code (as regularized and stabilized by
>    CLDR). See for this case, http://en.wikipedia.org/wiki/ISO_3166-2:GB,
>
> > So toss out all those that could be expressed in this way (using case
> just to make the derivation clear):
>
> en-GB-unlHMA for Helmsdale
> en-GB-unlNCS for Newcastle
> en-GB-isoSCT for Scots
> en-GB-isoZET for Shetlandic
>
>
> *I don't think so... and here's why... Let us look at the Staffordshire
> dialect as a case in point...*
>
> *The Staffordshire dialect area, as designated within ISO 639-6, includes
> the whole of Staffordshire itself plus most of Cheshire, northern
> Shropshire, and parts of Southern Derbyshire, northwestern Warwickshire and
> northeastern Worcestershire.  It could be said that the "Potteries" dialect
> is within this dialect but I believe it is a distinct variant.  There is no
> way of defining the "Potteries", or the Staffordshire dialect area for that
> matter, via ISO or UN/LOCODE - it would be a complete mess.  The following
> example shows some distinctions within this dialect's vowel sounds:*
>
> *bait is pronounced like beat*
> *beat is pronounced like bait*
>
> *In fact, I believe that in this dialect the words "It seems the same" are
> pronounced "It sames the seem"!  Quite distinct and very apt to my comments,
> I think you will agree.*
>
> *Similarly, the Lincolnshire dialect area consists only of central
> Lincolnshire, while the Leicestershire area includes most of Leicestershire
> (other than a bit linked to the Nottinghamshire dialect) plus part of south
> Nottinghamshire and parts of western Lincolnshire.*
>
> *I think you get my drift...*
>

It doesn't matter. The use of a region code in BCP 47 means that the variant
is associated with that region. That does not mean that the region includes
all instances, nor that the entire region only uses that variant. And no
matter how you "define" a region (e.g. "*whole of Staffordshire itself plus
most of Cheshire, northern Shropshire, and parts of Southern Derbyshire,
northwestern Warwickshire and northeastern Worcestershire*"), that will be
the case.

> *Best regards*
>
> *Debbie*
>
>  ------------------------------
> *From:* mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] *On
> Behalf Of *Mark Davis ⌛
> *Sent:* 15 July 2009 22:15
> *To:* debbie@ictmarketing.co.uk
> *Cc:* Broome, Karen; Kent Karlsson; LTRU Working Group
> *Subject:* Re: [Ltru] rechartering to handle 639-6 (was FW:
> Anomalyinupcomingregistry)
>
> A good example. Here's what I suggest you do:
>
> The following are all expressible in BCP47, and need to be tossed out:
>
>    1. All those already with language codes (ang, enm).
>    2. All of the written vs non-written differences
>    3. All "written in <script>" variants
>
> Most of the rest are associated with geographic designations. Rather than
> have a hit&miss approach to those that seem important to the designers of
> 639-6, we'd be better off having some variant convention like the following.
>
>    - unlXXXXX is a UN/LOCODE minus the country designator (see for this
>    case, http://www.unece.org/cefact/locode/gb.htm)
>    - isoXXXXX is an ISO 3166-2 code (as regularized and stabilized by
>    CLDR). See for this case, http://en.wikipedia.org/wiki/ISO_3166-2:GB,
>
> So toss out all those that could be expressed in this way (using case just
> to make the derivation clear):
>
> en-GB-unlHMA for Helmsdale
> en-GB-unlNCS for Newcastle
> en-GB-isoSCT for Scots
> en-GB-isoZET for Shetlandic
> ...
>
> Then what you have left would be useful to review.
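
The unlXXXXX / isoXXXXX convention proposed above could be sketched roughly as
follows. This is only an illustration, not a registered BCP 47 mechanism: the
function names geo_variant and geo_tag are hypothetical, and the only rule
taken from BCP 47 itself is the well-formedness of a variant subtag (5 to 8
alphanumerics, or 4 characters beginning with a digit). Tags are
case-insensitive, so the sketch emits the variant in lowercase.

```python
import re

# BCP 47 (RFC 5646) variant subtag shape: 5-8 alphanumerics,
# or exactly 4 characters when the first one is a digit.
_VARIANT = re.compile(r"(?:[0-9a-z]{5,8}|[0-9][0-9a-z]{3})")


def geo_variant(prefix: str, code: str) -> str:
    """Build a variant subtag such as 'unlhma' (hypothetical convention).

    prefix: 'unl' for a UN/LOCODE without its country part,
            'iso' for an ISO 3166-2 code without its country part.
    code:   the place code, e.g. 'HMA' or 'SCT'.
    """
    if prefix not in ("unl", "iso"):
        raise ValueError(f"unknown prefix: {prefix!r}")
    subtag = (prefix + code).lower()
    if not _VARIANT.fullmatch(subtag):
        raise ValueError(f"not a well-formed variant subtag: {subtag!r}")
    return subtag


def geo_tag(language: str, region: str, prefix: str, code: str) -> str:
    """Assemble language-REGION-variant, e.g. 'en-GB-unlhma'."""
    return f"{language.lower()}-{region.upper()}-{geo_variant(prefix, code)}"


if __name__ == "__main__":
    print(geo_tag("en", "GB", "unl", "HMA"))  # Helmsdale
    print(geo_tag("en", "GB", "iso", "SCT"))  # Scots
```

A three-letter place code always yields a six-character variant, which fits
the 5-8 character limit; longer ISO 3166-2 subdivision codes would need
checking case by case.
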
>
> Mark
>
>
> On Wed, Jul 15, 2009 at 13:40, Debbie Garside <debbie@ictmarketing.co.uk> wrote:
>
>>  Mark wrote:
>>
>> >> (cf
>> http://en.wikipedia.org/wiki/List_of_dialects_of_the_English_language)
>>
>> A more in-depth version of en-GB dialects, representing the draft data
>> for ISO 639-6. I personally have to take full responsibility for
>> this particular piece of research, so corrections/suggestions etc. are most
>> welcome!
>>
>>   Code  Parent  Name
>>   gmcw  grmc    Germanic West
>>   nsea  gmcw    North Sea
>>   angl  nsea    Anglic
>>   ango  angl    Anglo Saxon
>>   meng  ango    Middle English
>>   enen  meng    Early Northern Middle English
>>   emsc  enen    Early Scots Northern Middle English
>>   msco  emsc    Middle Scots
>>   sco   engc    Scots
>>   scow  sco     Scots Written
>>   scol  scow    Scots Written Latin Script
>>   scos  sco     Scots Spoken
>>   sctl  sco     Scots-L
>>   sctw  sctl    Scots-L Written
>>   sotl  sctw    Scots-L Written Latin Script
>>   llan  sco     Lallans
>>   llaw  llan    Lallans Written
>>   llnl  llaw    Lallans Written Latin Script
>>   budo  sco     Buchan-Doric
>>   wbud  budo    Buchan-Doric Written
>>   lbud  wbud    Buchan-Doric Written Latin Script
>>   stld  scos    Shetlandic
>>   orcd  scos    Orcadian
>>   cthn  scos    Caithness
>>   helm  scos    Helmsdale
>>   bkil  scos    Black-Isle
>>   nirn  scos    Nairn
>>   mray  scos    Moray
>>   bchn  scos    Buchan
>>   abrd  scos    Aberdonian
>>   munh  scos    Mounth
>>   csct  scos    Central-Scots
>>   glca  scos    Glesca
>>   swst  scos    Southwest-Scots
>>   brsc  scos    Border-Scots
>>   ulla  llan    Ullans
>>   wull  ulla    Ullans Written
>>   ulll  wull    Ullans Written Latin Script
>>   sull  ulla    Ullans Spoken
>>   dngl  sull    Donegal
>>   drya  sull    Derry-Antrim
>>   cntd  sull    County-Down
>>   mlde  scos    Madeleine-Scots
>>   otgo  scos    Otago-Scots
>>   nubn  sco     Northumbrian
>>   nrbn  nubn    Northumbrian Spoken
>>   nond  nrbn    Northumberland
>>   grdi  nrbn    Geordie
>>   wrde  nrbn    Wearside
>>   teed  nrbn    Tees-Side
>>   dumc  nrbn    Durham-County
>>   swwe  nrbn    Swaledale-Wensleydale
>>   yrkm  nrbn    Yorkshire-Moors
>>   hmbn  nrbn    Humberside-N
>>   lxyn  nrbn    Lyne
>>   crle  nrbn    Carlisle
>>   vled  nrbn    Vale-Of-Eden
>>   lkln  nrbn    Lakeland-N
>>   lknd  nrbn    Lakeland-S
>>   esse  meng    Early Southern And South Western Middle English
>>   emse  meng    Early Midland And South Eastern Middle English
>>   emen  emse    Early Modern English
>>   aeng  eng     Anglo-English
>>   waen  aeng    Anglo-English Written
>>   laen  waen    Anglo-English Written Latin Script
>>   seng  aeng    Anglo-English Spoken
>>   aenn  seng    Anglo-English North Cluster
>>   naen  aenn    Northern Anglo-English
>>   nann  naen    Northern Anglo-English Northeast
>>   newc  nann    Newcastle
>>   sdld  nann    Sunderland
>>   ddbh  nann    Middlesborough
>>   tyne  nann    Tyneside
>>   naln  naen    Northern Anglo-English Lower North
>>   nalc  naln    Northern Anglo-English Lower North Central
>>   clsl  nalc    Carlisle
>>   shff  nalc    Sheffield
>>   cumb  nalc    Cumbria
>>   angy  nalc    Yorkshire
>>   wngy  angy    Yorkshire Western
>>   hlfx  wngy    Halifax
>>   hddf  wngy    Huddersfield
>>   brdd  wngy    Bradford
>>   lddd  wngy    Leeds
>>   yrkk  wngy    York
>>   lanc  nalc    Lancashire
>>   lacc  lanc    Lancashire Central
>>   hmbe  lanc    Humberside
>>   huuu  nalc    Hull
>>   gmby  nalc    Grimsby
>>   cang  seng    Anglo-English Central Cluster
>>   aenc  cang    Anglo-English West Central Cluster
>>   scou  aenc    Scouse
>>   nwme  aenc    North West Midlands English
>>   mctr  nwme    Manchester
>>   chsr  nwme    Cheshire
>>   shpn  nwme    Shropshire North
>>   drby  nwme    Derbyshire
>>   stff  nwme    Staffordshire
>>   pttr  nwme    Potteries
>>   wmen  cang    West Midlands English
>>   brmm  wmen    Birmingham
>>   blco  wmen    Black Country
>>   ecen  cang    East Central English
>>   cece  ecen    Central Midlands English
>>   nnle  cece    Northern Nottinghamshire-Leicester
>>   nttm  cece    Nottingham
>>   leic  cece    Leicester
>>   neme  ecen    North-East Midlands English
>>   lncn  neme    Lincoln
>>   eden  ecen    East Midlands English
>>   nttt  eden    Nottinghamshire-S
>>   slcn  eden    Lincolnshire-S
>>   snrt  eden    Northamptonshire-S
>>   cmbn  eden    Cambridgeshire-N
>>   grnk  cmbr    Grantham-Newark
>>   pbom  cmbr    Peterborough-Oakham
>>   crby  cmbr    Corby
>>   bnor  cmbr    Northamptonshire Borders
>>   saen  seng    Anglo-English South Cluster
>>   swen  saen    Anglo-English South Southwest
>>   uswe  swen    Upper Southwest English
>>   hrfd  uswe    Herefordshire
>>   shrp  uswe    Shropshire
>>   gloc  uswe    Gloucestershire
>>   wrrc  uswe    Worcestershire
>>   gowp  uswe    Gower Peninsula
>>   spem  uswe    South Pembrokeshire
>>   cswe  swen    Central Southwest English
>>   glct  cswe    Gloucestershire
>>   bucc  cswe    Buckinghamshire-S
>>   ofrd  cswe    Oxfordshire
>>   brke  cswe    Berkshire-S
>>   hmpe  cswe    Hampshire-S
>>   wilt  cswe    Wiltshire
>>   avnn  cswe    Avon
>>   sstt  cswe    Somerset
>>   brst  cswe    Bristol
>>   lswe  swen    Lower Southwest English
>>   dvon  lswe    Devon
>>   lswc  lswe    Cornwall
>>   eacr  lswc    East Cornwall
>>   wecr  lswc    West Cornwall
>>   anec  seng    Anglo-English East Cluster
>>   smen  anec    South Midlands English
>>   nthh  smen    Northamptonshire
>>   bedd  smen    Bedfordshire
>>   camb  smen    Cambridgeshire
>>   buck  smen    Buckinghamshire-NW
>>   eaan  anec    East Anglia
>>   sufn  eaan    Suffolk-NE
>>   nflk  eaan    Norfolk
>>   nfol  nflk    Norfolk Northern
>>   nnoe  nflk    Norfolk Eastern
>>   fenn  eaan    Fens
>>   snea  eaan    Southern-East Anglia
>>   hcen  anec    Home Counties English
>>   cckn  hcen    Cockney
>>   lndn  hcen    London Counties
>>   hant  hcen    Hampshire
>>   nrks  hcen    Berkshire
>>   bckk  hcen    Buckinghamshire
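
Reading each entry in the draft data above as "code, parent code, name", the
hierarchy can be walked with a simple parent lookup. The sketch below is only
an illustration: it copies a handful of rows from the list, and the names
PARENT and ancestors are hypothetical, not part of ISO 639-6 or BCP 47.

```python
# A few (code -> parent code) pairs copied from the draft list above.
PARENT = {
    "helm": "scos",  # Helmsdale -> Scots Spoken
    "scos": "sco",   # Scots Spoken -> Scots
    "sco": "engc",   # Scots -> engc (not itself listed in the excerpt)
    "pttr": "nwme",  # Potteries -> North West Midlands English
    "nwme": "aenc",  # -> Anglo-English West Central Cluster
    "aenc": "cang",  # -> Anglo-English Central Cluster
    "cang": "seng",  # -> Anglo-English Spoken
    "seng": "aeng",  # -> Anglo-English
    "aeng": "eng",   # -> eng
}


def ancestors(code, parent=PARENT):
    """Return the chain of parent codes, nearest first.

    Stops at the first code with no recorded parent and raises
    ValueError if the data contains a cycle.
    """
    chain = []
    seen = {code}
    while code in parent:
        code = parent[code]
        if code in seen:
            raise ValueError(f"cycle in hierarchy at {code!r}")
        seen.add(code)
        chain.append(code)
    return chain


if __name__ == "__main__":
    print(ancestors("helm"))
    print(ancestors("pttr"))
```

An application that sees only the atomic code "pttr" needs this table on hand
to discover that it is a kind of English, which is the relationship data
discussed elsewhere in this thread.
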
>>
>>
>> Best regards
>>
>> Debbie
>>
>>  ------------------------------
>> *From:* mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com]
>> *On Behalf Of *Mark Davis ⌛
>> *Sent:* 15 July 2009 18:31
>> *To:* Broome, Karen
>> *Cc:* Kent Karlsson; debbie@ictmarketing.co.uk; LTRU Working Group
>> *Subject:* Re: [Ltru] rechartering to handle 639-6 (was FW:
>> Anomalyinupcomingregistry)
>>
>> I think the difference among dialects is important, but 639-6 doesn't work
>> for incorporation into BCP47. The advantage of the BCP47 structure is that
>> it allows reasonable behavior for applications that don't recognize the
>> dialectal distinctions. That is, if we have:
>>
>> en-US-southern
>> en-US-newengla
>> en-US-philly (pronouncing "huge" as /judʒ/)
>> en-US-general
>> en-GB-scots
>> ...
>> (cf http://en.wikipedia.org/wiki/List_of_dialects_of_the_English_language
>> )
>>
>> Then applications that don't recognize the variants can fall back to en-US
>> and en-GB. If these all had different atomic primary language codes, then
>> applications would be forced to keep all the relationship data for all the
>> codes hanging around - which, frankly, they are not going to do.
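
The fallback described above can be sketched as truncation from the right, in
the style of the RFC 4647 "Lookup" scheme. This is a minimal illustration
under stated assumptions: the function name lookup and its parameters are
hypothetical, and variants such as "philly" are the unregistered examples
from this message, not real subtags.

```python
def lookup(tag, supported, default=None):
    """Return the closest supported tag by truncating subtags from the end.

    tag:       the requested language tag, e.g. 'en-US-philly'
    supported: the tags the application actually knows about
    default:   returned when nothing matches
    """
    # Language tags are case-insensitive, so compare in lowercase
    # but hand back the caller's own spelling of the supported tag.
    by_lower = {s.lower(): s for s in supported}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        # RFC 4647 section 3.4: a single-character subtag (such as 'x'
        # or an extension singleton) is removed together with the
        # subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


if __name__ == "__main__":
    known = ["en-US", "en-GB"]
    print(lookup("en-US-philly", known))
    print(lookup("en-GB-scots", known))
```

No per-dialect table is needed here: an application that has never heard of a
variant still lands on en-US or en-GB purely from the structure of the tag.
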
>>
>> Mark
>>
>>
>> On Wed, Jul 15, 2009 at 08:37, Broome, Karen <Karen.Broome@am.sony.com> wrote:
>>
>>>  For what it's worth, the film and television world does have a pretty
>>> heavy requirement for dialect distinctions. We also have a need to identify
>>> spoken and written variants. ISO 639-6 also provides a fixed-length tag,
>>> which can be advantageous in some situations. While I tend to see ISO 639-6
>>> as an interesting alternative to xml:lang and not necessarily something I'd
>>> use within xml:lang, I wanted to correct the assumption that dialect tagging
>>> is obscure and the distinction between spoken and written variants is not
>>> useful.
>>>
>>> Regards,
>>>
>>> Karen Broome
>>>
>>>
>>> -----Original Message-----
>>> From: ltru-bounces@ietf.org on behalf of Kent Karlsson
>>> Sent: Wed 7/15/2009 6:27 AM
>>> To: debbie@ictmarketing.co.uk; 'LTRU Working Group'
>>> Subject: Re: [Ltru] rechartering to handle 639-6 (was FW:
>>> Anomalyinupcomingregistry)
>>>
>>>
>>>  On 2009-07-15 09.00, "Debbie Garside" <debbie@ictmarketing.co.uk> wrote:
>>>
>>> > Well for starters, there are separate codes for Catalan and Valencian :-)
>>>
>>> So does BCP 47 (well, nearly):
>>>     ca
>>>     ca-valencia
>>>
>>> There is nothing in principle hindering a registration of a variant subtag
>>> specifically for "true" Catalan (no value judgement implied).
>>>
>>> > And, I rather like the way ISO 639-6 deals with variants of Chinese.
>>>
>>> 639-3 also deals with "variants" of Chinese (separate languages, really).
>>> How does 639-6 do it differently (apart from using 4-letter codes instead
>>> of 3-letter codes)?
>>>
>>> > Perhaps you would like to tell me how many of the 7000+ codes of ISO 639-3
>>> > will be used.  My guess is approximately 2-300 at present but over time
>>> > more and more.  The answer is the same for ISO 639-6.
>>> >
>>> > Essentially, all the reasons for including ISO 639-6 are the same as for
>>> > including ISO 639-3.  Unless of course, you think that ISO 639-3 is perfect
>>> > and defines all languages distinctly and that anything else cannot, is not,
>>> > and definitely is not a language.  Then of course you have to decide that
>>> > BCP 47 will only deal with languages and not dialects.
>>>
>>> BCP 47 does deal with dialects, using variant subtags. However, it is very,
>>> very far from systematic or comprehensive. It requires individual
>>> registration of each variant. I would venture to guess that that process
>>> will never result in a systematic or (in some sense) comprehensive set
>>> of variant subtags for dialects. On the other hand, the call for tagging
>>> dialects separately currently seems fairly small amongst the consumers of
>>> BCP 47, IMHO.
>>>
>>>     /kent k
>>>
>>> > Then, and only then,
>>> > may you exclude ISO 639-6.
>>> >
>>> >
>>> > Debbie
>>>
>>>
>>> _______________________________________________
>>> Ltru mailing list
>>> Ltru@ietf.org
>>> https://www.ietf.org/mailman/listinfo/ltru
>>
>