Re: [Ltru] rechartering to handle 639-6 (was FW:Anomalyinupcomingregistry)

"Debbie Garside" <debbie@ictmarketing.co.uk> Thu, 16 July 2009 12:58 UTC

Return-Path: <prvs=1448bb83d1=debbie@ictmarketing.co.uk>
X-Original-To: ltru@core3.amsl.com
Delivered-To: ltru@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 453C628C1F6 for <ltru@core3.amsl.com>; Thu, 16 Jul 2009 05:58:24 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.971
X-Spam-Level:
X-Spam-Status: No, score=-2.971 tagged_above=-999 required=5 tests=[AWL=0.727, BAYES_00=-2.599, GB_I_LETTER=-2, HTML_MESSAGE=0.001, J_CHICKENPOX_34=0.6, MIME_8BIT_HEADER=0.3]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ONOonj9XHq4A for <ltru@core3.amsl.com>; Thu, 16 Jul 2009 05:58:20 -0700 (PDT)
Received: from mx1.nexbyte.net (132.nexbyte.net [62.197.41.132]) by core3.amsl.com (Postfix) with ESMTP id 4571D28C1D1 for <ltru@ietf.org>; Thu, 16 Jul 2009 05:58:15 -0700 (PDT)
Received: from 145.nexbyte.net ([62.197.41.145]) by mx1.nexbyte.net (mx1.nexbyte.net [62.197.41.132]) (MDaemon PRO v9.6.6) with ESMTP id md50009639688.msg for <ltru@ietf.org>; Thu, 16 Jul 2009 14:18:25 +0100
X-Spam-Processed: mx1.nexbyte.net, Thu, 16 Jul 2009 14:18:25 +0100 (not processed: message from trusted or authenticated source)
X-MDRemoteIP: 62.197.41.145
X-Return-Path: prvs=1448bb83d1=debbie@ictmarketing.co.uk
X-Envelope-From: debbie@ictmarketing.co.uk
X-MDaemon-Deliver-To: ltru@ietf.org
Received: from Vickynew ([213.208.115.6]) by 145.nexbyte.net with MailEnable ESMTP; Thu, 16 Jul 2009 13:57:59 +0100
From: "Debbie Garside" <debbie@ictmarketing.co.uk>
To: "'Peter Constable'" <petercon@microsoft.com>, =?UTF-8?Q?'Mark_Davis_=E2=8C=9B'?= <mark@macchiato.com>
References: <C683A5F6.F25A%kent.karlsson14@comhem.se><8D97027965E89F488BC87B919382D9FD0510BEC2@ussdixms01.am.sony.com><30b660a20907151031i6e8bbeedyc4d33ffa59cce113@mail.gmail.com><203d01ca058c$7bd8da50$0c00a8c0@CPQ86763045110><30b660a20907151415o36cdaf8br5d5ec951688ebdac@mail.gmail.com> <207901ca059b$5e54dc90$0c00a8c0@CPQ86763045110> <DDB6DE6E9D27DD478AE6D1BBBB8357956B0B29A08D@NA-EXMSG-C117.redmond.corp.microsoft.com>
Date: Thu, 16 Jul 2009 13:48:46 +0100
Message-ID: <446901ca0613$c2efc840$0300a8c0@Vickynew>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000_446A_01CA061C.24B43040"
X-Mailer: Microsoft Office Outlook 11
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3350
In-Reply-To: <DDB6DE6E9D27DD478AE6D1BBBB8357956B0B29A08D@NA-EXMSG-C117.redmond.corp.microsoft.com>
thread-index: AcoFkdJKlRHl6NRYSMCYyB2rBR7NMgAAEVgQAAlt++AAFr+Q8A==
X-MDAV-Processed: mx1.nexbyte.net, Thu, 16 Jul 2009 14:18:27 +0100
Cc: 'LTRU Working Group' <ltru@ietf.org>, "'Broome, Karen'" <Karen.Broome@am.sony.com>
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW:Anomalyinupcomingregistry)
X-BeenThere: ltru@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
Reply-To: debbie@ictmarketing.co.uk
List-Id: Language Tag Registry Update working group discussion list <ltru.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/ltru>, <mailto:ltru-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/ltru>
List-Post: <mailto:ltru@ietf.org>
List-Help: <mailto:ltru-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ltru>, <mailto:ltru-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Jul 2009 13:09:18 -0000

Peter wrote:

 

> prototypes – the representative centers

 

Prototype could also be defined as the primary stimulus within a category.  

 

> (Cf. http://www.sil.org/silewp/abstract.asp?ref=2002-003, p. 12ff.)

 

Good paper!

 

Debbie 

 

   _____  

From: Peter Constable [mailto:petercon@microsoft.com] 
Sent: 16 July 2009 02:56
To: debbie@ictmarketing.co.uk; 'Mark Davis ⌛'
Cc: 'LTRU Working Group'; 'Broome, Karen'
Subject: RE: [Ltru] rechartering to handle 639-6 (was FW:Anomalyinupcomingregistry)

 

I believe I wrote several years back that it is not a good idea to try to define language entities in terms of their boundaries – or, by implication, by their extents (geographical or otherwise). Rather, they should be defined in terms of prototypes – the representative centers. (Cf. http://www.sil.org/silewp/abstract.asp?ref=2002-003, p. 12ff.)

 

 

Peter

 

From: ltru-bounces@ietf.org [mailto:ltru-bounces@ietf.org] On Behalf Of Debbie Garside
Sent: Wednesday, July 15, 2009 3:27 PM
To: 'Mark Davis ⌛'
Cc: 'LTRU Working Group'; 'Broome, Karen'
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW: Anomalyinupcomingregistry)

 

Mark wrote:

 

> A good example.

 

Thank you!

 

> Here's what I suggest you do:

> The following are all expressible in BCP47, and need to be tossed out:

> 1.  All those already with language codes (ang, enm).

These are all ISO 639-2/3/5 codes used within the hierarchical system of ISO 639-6 to relate the languages, so they are already within the Language Subtag Registry (LSR). It is a simple procedure to remove them.

 

> 2.  All of the written vs non-written differences

Why?  

 

> 3.  All "written in <script>" variants

 

As previously discussed: those that are expressible within BCP 47, yes; the others, of which there are many, no.

 

> Most of the rest are associated with geographic designations. 

 

Language does tend to be linked to where people live... I will agree with that... but...

 

> Rather than have a hit&miss approach to those that seem important to the designers of 639-6, we'd be better off having some variant convention like the following.

> *   unlXXXXX is a UN/LOCODE minus the country designator (see for this case, http://www.unece.org/cefact/locode/gb.htm)
> *   isoXXXXX is an ISO 3166-2 code (as regularized and stabilized by CLDR). See for this case, http://en.wikipedia.org/wiki/ISO_3166-2:GB

> So toss out all those that could be expressed in this way (using case just to make the derivation clear):

> en-GB-unlHMA for Helmsdale
> en-GB-unlNCS for Newcastle
> en-GB-isoSCT for Scots
> en-GB-isoZET for Shetlandic

 

 

I don't think so, and here's why. Let us look at the Staffordshire dialect as a case in point.

 

The Staffordshire dialect area, as designated within ISO 639-6, includes the whole of Staffordshire itself plus most of Cheshire, northern Shropshire, and parts of southern Derbyshire, northwestern Warwickshire and northeastern Worcestershire. It could be said that the "Potteries" dialect is within this dialect, but I believe it is a distinct variant. There is no way of defining the Potteries, or for that matter the Staffordshire dialect area, via ISO 3166-2 or UN/LOCODE codes; it would be a complete mess. The following example shows some distinctions within this dialect's vowel sounds:

 

bait is pronounced like beat

beat is pronounced like bait

 

In fact, I believe that in this dialect the words "It seems the same" are pronounced "It sames the seem"! Quite distinct, and very apt given my comments, I think you will agree.

 

Similarly, the Lincolnshire dialect area consists only of central Lincolnshire, while the Leicestershire area includes most of Leicestershire (other than a bit linked to the Nottinghamshire dialect) plus part of south Nottinghamshire and parts of western Lincolnshire.

 

I think you get my drift...

 

Best regards

 

Debbie


   _____  


From: mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] On Behalf Of Mark Davis ⌛
Sent: 15 July 2009 22:15
To: debbie@ictmarketing.co.uk
Cc: Broome, Karen; Kent Karlsson; LTRU Working Group
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW: Anomalyinupcomingregistry)

A good example. Here's what I suggest you do:

The following are all expressible in BCP47, and need to be tossed out:

1.	All those already with language codes (ang, enm). 
2.	All of the written vs non-written differences
3.	All "written in <script>" variants
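As a rough illustration, Mark's three criteria could be applied mechanically to rows of the draft list. The Python below is a hypothetical sketch, not part of ISO 639-6 or BCP 47: the sample rows are copied from Debbie's draft table in this thread, the mapping from 639-6 identifiers to existing ISO 639 codes (ango to ang, meng to enm) is an assumption, and matching on the words "Written", "Spoken" and "Written ... Script" in the names is only a heuristic.

```python
import re

# Sample rows from the draft ISO 639-6 list in this thread: (identifier, parent, name).
ROWS = [
    ("ango", "angl", "Anglo Saxon"),
    ("meng", "ango", "Middle English"),
    ("sco", "engc", "Scots"),
    ("scow", "sco", "Scots Written"),
    ("scol", "scow", "Scots Written Latin Script"),
    ("scos", "sco", "Scots Spoken"),
    ("helm", "scos", "Helmsdale"),
    ("pttr", "nwme", "Potteries"),
]

# Assumption: 639-6 identifiers that correspond to existing ISO 639 codes.
EXISTING_639 = {"ango": "ang", "meng": "enm", "sco": "sco"}

SCRIPT_RE = re.compile(r"\bWritten\s+\w+\s+Script$")  # e.g. "... Written Latin Script"
MODALITY_RE = re.compile(r"\b(?:Written|Spoken)$")    # e.g. "... Written", "... Spoken"


def triage(ident, name):
    """Return which of Mark's criteria (if any) a row falls under."""
    if ident in EXISTING_639:
        return "1: existing language code"
    if SCRIPT_RE.search(name):  # checked before MODALITY_RE on purpose
        return "3: written in <script>"
    if MODALITY_RE.search(name):
        return "2: written vs non-written"
    return "review"


def left_to_review(rows):
    """Identifiers not covered by criteria 1-3."""
    return [ident for ident, _parent, name in rows if triage(ident, name) == "review"]


if __name__ == "__main__":
    for ident, _parent, name in ROWS:
        print(f"{ident:5} {name:30} {triage(ident, name)}")
    print("left to review:", left_to_review(ROWS))
```

Debbie disputes criteria 2 and 3 in her reply, so this only shows what would remain if those criteria were accepted.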

Most of the rest are associated with geographic designations. Rather than have a hit&miss approach to those that seem important to the designers of 639-6, we'd be better off having some variant convention like the following.

*   unlXXXXX is a UN/LOCODE minus the country designator (see for this case, http://www.unece.org/cefact/locode/gb.htm)
*   isoXXXXX is an ISO 3166-2 code (as regularized and stabilized by CLDR). See for this case, http://en.wikipedia.org/wiki/ISO_3166-2:GB

So toss out all those that could be expressed in this way (using case just to make the derivation clear):

en-GB-unlHMA for Helmsdale
en-GB-unlNCS for Newcastle
en-GB-isoSCT for Scots
en-GB-isoZET for Shetlandic
...
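Mark's proposed convention is mechanical enough to sketch. The Python below is a minimal, hypothetical illustration: the `unl`/`iso` prefixes are only Mark's proposal (not registered subtags), the helper names are made up, and the only rule enforced is the RFC 5646 syntax for a variant subtag (5 to 8 alphanumerics, or a digit followed by 3 alphanumerics). The location codes are the ones given in the message above.

```python
import re

# RFC 5646 variant production: 5-8 alphanumerics, or a digit plus 3 alphanumerics.
VARIANT_RE = re.compile(r"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$")


def unlocode_variant(locode, country):
    """Hypothetical 'unl' variant from a UN/LOCODE such as 'GB HMA' (country dropped)."""
    code = locode.replace(" ", "").upper()
    if len(code) != 5 or not code.startswith(country.upper()):
        raise ValueError(f"{locode!r} is not a UN/LOCODE in {country}")
    return "unl" + code[2:]


def subdivision_variant(subdivision, country):
    """Hypothetical 'iso' variant from an ISO 3166-2 code such as 'GB-ZET'."""
    prefix = country.upper() + "-"
    code = subdivision.upper()
    if not code.startswith(prefix) or len(code) == len(prefix):
        raise ValueError(f"{subdivision!r} is not an ISO 3166-2 code in {country}")
    return "iso" + code[len(prefix):]


def make_tag(language, country, variant):
    """Assemble language-REGION-variant, rejecting variants BCP 47 syntax forbids."""
    if not VARIANT_RE.match(variant):
        raise ValueError(f"{variant!r} is not a syntactically valid variant subtag")
    return f"{language.lower()}-{country.upper()}-{variant}"


if __name__ == "__main__":
    print(make_tag("en", "GB", unlocode_variant("GB HMA", "GB")))       # Helmsdale
    print(make_tag("en", "GB", subdivision_variant("GB-ZET", "GB")))    # Shetlandic
```

Since ISO 3166-2 subdivision parts can be shorter than three characters, `iso` plus a short code may fall below the five-character minimum, which is why `make_tag` re-checks the syntax rather than trusting the prefix.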


Then what you have left would be useful to review.

Mark

On Wed, Jul 15, 2009 at 13:40, Debbie Garside <debbie@ictmarketing.co.uk> wrote:

Mark wrote:

 

>> (cf http://en.wikipedia.org/wiki/List_of_dialects_of_the_English_language)

 

Here is a more in-depth list of en-GB dialects, representing the draft data for ISO 639-6. I personally take full responsibility for this particular piece of research, so corrections, suggestions, etc. are most welcome!

 


Code  Parent  Name
----  ------  -----------------------------------------------
gmcw  grmc    Germanic West
nsea  gmcw    North Sea
angl  nsea    Anglic
ango  angl    Anglo Saxon
meng  ango    Middle English
enen  meng    Early Northern Middle English
emsc  enen    Early Scots Northern Middle English
msco  emsc    Middle Scots
sco   engc    Scots
scow  sco     Scots Written
scol  scow    Scots Written Latin Script
scos  sco     Scots Spoken
sctl  sco     Scots-L
sctw  sctl    Scots-L Written
sotl  sctw    Scots-L Written Latin Script
llan  sco     Lallans
llaw  llan    Lallans Written
llnl  llaw    Lallans Written Latin Script
budo  sco     Buchan-Doric
wbud  budo    Buchan-Doric Written
lbud  wbud    Buchan-Doric Written Latin Script
stld  scos    Shetlandic
orcd  scos    Orcadian
cthn  scos    Caithness
helm  scos    Helmsdale
bkil  scos    Black-Isle
nirn  scos    Nairn
mray  scos    Moray
bchn  scos    Buchan
abrd  scos    Aberdonian
munh  scos    Mounth
csct  scos    Central-Scots
glca  scos    Glesca
swst  scos    Southwest-Scots
brsc  scos    Border-Scots
ulla  llan    Ullans
wull  ulla    Ullans Written
ulll  wull    Ullans Written Latin Script
sull  ulla    Ullans Spoken
dngl  sull    Donegal
drya  sull    Derry-Antrim
cntd  sull    County-Down
mlde  scos    Madeleine-Scots
otgo  scos    Otago-Scots
nubn  sco     Northumbrian
nrbn  nubn    Northumbrian Spoken
nond  nrbn    Northumberland
grdi  nrbn    Geordie
wrde  nrbn    Wearside
teed  nrbn    Tees-Side
dumc  nrbn    Durham-County
swwe  nrbn    Swaledale-Wensleydale
yrkm  nrbn    Yorkshire-Moors
hmbn  nrbn    Humberside-N
lxyn  nrbn    Lyne
crle  nrbn    Carlisle
vled  nrbn    Vale-Of-Eden
lkln  nrbn    Lakeland-N
lknd  nrbn    Lakeland-S
esse  meng    Early Southern And South Western Middle English
emse  meng    Early Midland And South Eastern Middle English
emen  emse    Early Modern English
aeng  eng     Anglo-English
waen  aeng    Anglo-English Written
laen  waen    Anglo-English Written Latin Script
seng  aeng    Anglo-English Spoken
aenn  seng    Anglo-English North Cluster
naen  aenn    Northern Anglo-English
nann  naen    Northern Anglo-English Northeast
newc  nann    Newcastle
sdld  nann    Sunderland
ddbh  nann    Middlesborough
tyne  nann    Tyneside
naln  naen    Northern Anglo-English Lower North
nalc  naln    Northern Anglo-English Lower North Central
clsl  nalc    Carlisle
shff  nalc    Sheffield
cumb  nalc    Cumbria
angy  nalc    Yorkshire
wngy  angy    Yorkshire Western
hlfx  wngy    Halifax
hddf  wngy    Huddersfield
brdd  wngy    Bradford
lddd  wngy    Leeds
yrkk  wngy    York
lanc  nalc    Lancashire
lacc  lanc    Lancashire Central
hmbe  lanc    Humberside
huuu  nalc    Hull
gmby  nalc    Grimsby
cang  seng    Anglo-English Central Cluster
aenc  cang    Anglo-English West Central Cluster
scou  aenc    Scouse
nwme  aenc    North West Midlands English
mctr  nwme    Manchester
chsr  nwme    Cheshire
shpn  nwme    Shropshire North
drby  nwme    Derbyshire
stff  nwme    Staffordshire
pttr  nwme    Potteries
wmen  cang    West Midlands English
brmm  wmen    Birmingham
blco  wmen    Black Country
ecen  cang    East Central English
cece  ecen    Central Midlands English
nnle  cece    Northern Nottinghamshire-Leicester
nttm  cece    Nottingham
leic  cece    Leicester
neme  ecen    North-East Midlands English
lncn  neme    Lincoln
eden  ecen    East Midlands English
nttt  eden    Nottinghamshire-S
slcn  eden    Lincolnshire-S
snrt  eden    Northamptonshire-S
cmbn  eden    Cambridgeshire-N
grnk  cmbr    Grantham-Newark
pbom  cmbr    Peterborough-Oakham
crby  cmbr    Corby
bnor  cmbr    Northamptonshire Borders
saen  seng    Anglo-English South Cluster
swen  saen    Anglo-English South Southwest
uswe  swen    Upper Southwest English
hrfd  uswe    Herefordshire
shrp  uswe    Shropshire
gloc  uswe    Gloucestershire
wrrc  uswe    Worcestershire
gowp  uswe    Gower Peninsula
spem  uswe    South Pembrokeshire
cswe  swen    Central Southwest English
glct  cswe    Gloucestershire
bucc  cswe    Buckinghamshire-S
ofrd  cswe    Oxfordshire
brke  cswe    Berkshire-S
hmpe  cswe    Hampshire-S
wilt  cswe    Wiltshire
avnn  cswe    Avon
sstt  cswe    Somerset
brst  cswe    Bristol
lswe  swen    Lower Southwest English
dvon  lswe    Devon
lswc  lswe    Cornwall
eacr  lswc    East Cornwall
wecr  lswc    West Cornwall
anec  seng    Anglo-English East Cluster
smen  anec    South Midlands English
nthh  smen    Northamptonshire
bedd  smen    Bedfordshire
camb  smen    Cambridgeshire
buck  smen    Buckinghamshire-NW
eaan  anec    East Anglia
sufn  eaan    Suffolk-NE
nflk  eaan    Norfolk
nfol  nflk    Norfolk Northern
nnoe  nflk    Norfolk Eastern
fenn  eaan    Fens
snea  eaan    Southern-East Anglia
hcen  anec    Home Counties English
cckn  hcen    Cockney
lndn  hcen    London Counties
hant  hcen    Hampshire
nrks  hcen    Berkshire
bckk  hcen    Buckinghamshire
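To make Mark's point about relationship data concrete: with atomic identifiers, getting from a dialect back to something an application recognizes means following parent links. This hypothetical Python sketch copies one branch of the draft list above (Potteries up to Anglo-English and its parent eng); the function names are illustrative and the dictionary is deliberately incomplete.

```python
# Parent links for one branch of the draft list above (identifier -> parent).
# Only a handful of rows are copied; a real application would need every row.
PARENT = {
    "pttr": "nwme",  # Potteries
    "stff": "nwme",  # Staffordshire
    "nwme": "aenc",  # North West Midlands English
    "aenc": "cang",  # Anglo-English West Central Cluster
    "cang": "seng",  # Anglo-English Central Cluster
    "seng": "aeng",  # Anglo-English Spoken
    "aeng": "eng",   # Anglo-English
}


def ancestors(ident, parent=PARENT):
    """Walk parent links from ident up to the root, nearest ancestor first."""
    chain = []
    seen = {ident}
    while ident in parent:
        ident = parent[ident]
        if ident in seen:
            raise ValueError(f"cycle in parent links at {ident!r}")
        seen.add(ident)
        chain.append(ident)
    return chain


def nearest_known(ident, known, parent=PARENT):
    """Return ident itself or its closest ancestor that the application recognizes."""
    for candidate in [ident] + ancestors(ident, parent):
        if candidate in known:
            return candidate
    return None


if __name__ == "__main__":
    print(ancestors("pttr"))
    print(nearest_known("pttr", {"eng"}))
```

Without the `PARENT` table, an application that only knows `eng` can do nothing useful with `pttr`; carrying that table for every code is exactly the burden Mark expects applications to refuse.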

 

 

Best regards

 

Debbie

 


   _____  


From: mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] On Behalf Of Mark Davis ⌛
Sent: 15 July 2009 18:31
To: Broome, Karen
Cc: Kent Karlsson; debbie@ictmarketing.co.uk; LTRU Working Group
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW: Anomalyinupcomingregistry)

I think the difference among dialects is important, but 639-6 doesn't work for incorporation into BCP47. The advantage of the BCP47 structure is that it allows reasonable behavior for applications that don't recognize the dialectal distinctions. That is, if we have:

en-US-southern
en-US-newengla
en-US-philly (pronouncing "huge" as /judʒ/)
en-US-general
en-GB-scots
...
(cf http://en.wikipedia.org/wiki/List_of_dialects_of_the_English_language)

Then applications that don't recognize the variants can fall back to en-US and en-GB. If these all had different atomic primary language codes, then applications would be forced to keep all the relationship data for all the codes hanging around - which, frankly, they are not going to do.
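The fallback Mark describes corresponds to the truncation ("Lookup") scheme of RFC 4647, section 3.4: remove subtags from the end until something matches, also dropping any single-character subtag left dangling. A minimal Python sketch, assuming a plain list of available tags compared case-insensitively; the function names are illustrative, and RFC 4647's handling of the "*" wildcard is omitted.

```python
def lookup_fallbacks(tag):
    """Candidate tags in the order RFC 4647 Lookup truncation would try them."""
    subtags = tag.split("-")
    candidates = []
    while subtags:
        candidates.append("-".join(subtags))
        subtags.pop()
        # Do not leave a dangling single-character subtag (e.g. "x" or "u").
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return candidates


def lookup(tag, available, default=None):
    """Return the first available tag reached by truncating `tag`, ignoring case."""
    by_lower = {t.lower(): t for t in available}
    for candidate in lookup_fallbacks(tag):
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return default


if __name__ == "__main__":
    for tag in ["en-US-philly", "en-US-newengla", "en-GB-scots"]:
        print(tag, "->", lookup(tag, ["en-US", "en-GB", "fr"]))
```

Nothing in this code needs to know what `philly` or `scots` mean; the prefix of the tag already carries the relationship.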

Mark

On Wed, Jul 15, 2009 at 08:37, Broome, Karen <Karen.Broome@am.sony.com> wrote:

For what it's worth, the film and television world does have a pretty heavy requirement for dialect distinctions. We also have a need to identify spoken and written variants. ISO 639-6 also provides a fixed-length tag, which can be advantageous in some situations. While I tend to see ISO 639-6 as an interesting alternative to xml:lang and not necessarily something I'd use within xml:lang, I wanted to correct the assumption that dialect tagging is obscure and the distinction between spoken and written variants is not useful.

Regards,

Karen Broome 




-----Original Message-----
From: ltru-bounces@ietf.org on behalf of Kent Karlsson
Sent: Wed 7/15/2009 6:27 AM
To: debbie@ictmarketing.co.uk; 'LTRU Working Group'
Subject: Re: [Ltru] rechartering to handle 639-6 (was FW: Anomalyinupcomingregistry)

On 2009-07-15 09.00, "Debbie Garside" <debbie@ictmarketing.co.uk> wrote:

> Well for starters, there are separate codes for Catalan and Valencian :-)

So does BCP 47 (well, nearly):
    ca
    ca-valencia

There is nothing in principle hindering a registration of a variant subtag
specifically for "true" Catalan (no value judgement implied).
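Kent's point is that a dialect such as Valencian rides on a variant subtag after the language (and optional script and region). A hypothetical Python sketch of how a simple tag splits into those parts, following the RFC 5646 subtag shapes but deliberately ignoring extlang, extension, private-use and grandfathered tags, only accepting 2-3 letter language subtags, and not checking the registry:

```python
import re

LANG = re.compile(r"^[A-Za-z]{2,3}$")
SCRIPT = re.compile(r"^[A-Za-z]{4}$")
REGION = re.compile(r"^(?:[A-Za-z]{2}|[0-9]{3})$")
VARIANT = re.compile(r"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$")


def split_simple_tag(tag):
    """Split language[-script][-region](-variant)* into its parts (syntax only)."""
    parts = tag.split("-")
    if not LANG.match(parts[0]):
        raise ValueError(f"bad language subtag in {tag!r}")
    result = {"language": parts[0].lower(), "script": None, "region": None, "variants": []}
    i = 1
    if i < len(parts) and SCRIPT.match(parts[i]):
        result["script"] = parts[i].title()
        i += 1
    if i < len(parts) and REGION.match(parts[i]):
        result["region"] = parts[i].upper()
        i += 1
    for sub in parts[i:]:
        if not VARIANT.match(sub):
            raise ValueError(f"unsupported or invalid subtag {sub!r} in {tag!r}")
        result["variants"].append(sub.lower())
    return result


if __name__ == "__main__":
    print(split_simple_tag("ca-valencia"))
    print(split_simple_tag("en-GB-scots"))
```

Because variants are just trailing subtags, a dialect distinction like this needs no new primary language code, which is the contrast Kent draws with 639-6's separate codes.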

> And, I rather like the way ISO 639-6 deals with variants of Chinese.

639-3 also deals with "variants" of Chinese (separate languages, really).
How does 639-6 do it differently (apart from using 4-letter codes instead of
3-letter codes)?

> Perhaps you would like to tell me how many of the 7000+ codes of ISO 639-3
> will be used.  My guess is approximately 200-300 at present but over time more
> and more.  The answer is the same for ISO 639-6.
>
> Essentially, all the reasons for including ISO 639-6 are the same as for
> including ISO 639-3.  Unless of course, you think that ISO 639-3 is perfect
> and defines all languages distinctly and that anything else cannot, is not,
> and definitely is not a language.  Then of course you have to decide that
> BCP 47 will only deal with languages and not dialects.

BCP 47 does deal with dialects, using variant subtags. However, it is very
very far from systematic or comprehensive. It requires individual
registration of each variant. I would venture to guess that that process
will never result in a systematic or (in some sense) comprehensive set
of variant subtags for dialects. On the other hand, the call for tagging
dialects separately currently seems fairly small amongst the consumers of
BCP 47, IMHO.

    /kent k

> Then, and only then,
> may you exclude ISO 639-6.
>
>
> Debbie


_______________________________________________
Ltru mailing list
Ltru@ietf.org
https://www.ietf.org/mailman/listinfo/ltru



 

 


Internal Virus Database is out-of-date.
Checked by AVG.
Version: 7.5.560 / Virus Database: 270.12.26/2116 - Release Date: 15/05/2009 06:16


