Re: [Ltru] Fw: I-D Action:draft-burnett-pronunciation-alphabet-registry-00.txt

Leif Halvard Silli <lhs@malform.no> Wed, 16 December 2009 13:28 UTC

Date: Wed, 16 Dec 2009 14:28:04 +0100
From: Leif Halvard Silli <lhs@malform.no>
To: Peter Constable <petercon@microsoft.com>
Cc: LTRU Working Group <ltru@ietf.org>

Peter Constable, Wed, 16 Dec 2009 08:59:49 +0000:
> From: ltru-bounces@ietf.org [mailto:ltru-bounces@ietf.org] On Behalf 
> Of Randy Presuhn
> 
>> (FWIW, I don't see why a pronouncing alphabet would *necessarily* be 
>> a romanization system.)
> 
> Indeed: I met a Korean linguist about 10 years ago who was promoting 
> use of Hangul for general phonetic transcription.

Indeed. But whether we are talking about Romanization or about
transliteration/transcription into a non-Roman script, such as
Cyrillic, seems beside the point, doesn't it?

The pronunciation alphabet registry is meant to be used in documents
crafted according to the Pronunciation Lexicon Specification (PLS) or
the Speech Synthesis Markup Language (SSML), both of which recognize
IPA as one of the possible pronunciation alphabets. As the code for
IPA, they both use 'ipa'. However, 'ipa' is not registered in the
pronunciation alphabet registry. As we know, the subtag for IPA in the
language tag registry is 'fonipa'.

The real question is: Should they have relied on the language tag 
registry or on their own registry?

The Pronunciation Lexicon spec includes xml:lang as part of its
language: [1]

 ]]
The required xml:lang attribute allows identification of the language 
for which the pronunciation lexicon is relevant. IETF Best Current 
Practice 47 [BCP47] is the normative reference on the values of the 
xml:lang attribute.
Note that xml:lang specifies a single unique language for the entire 
PLS document. This does not limit the ability to create multilingual 
SRGS [SRGS] and SSML [SSML] documents. These documents may reference 
multiple pronunciation lexicons, possibly written for different 
languages.
 [[

Here is a simplification of one example in the PLS spec: [2]

<lexicon alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təmei̥ɾou̥</phoneme>
    <!-- IPA string is: "t&#x0259;mei&#x325;&#x027E;ou&#x325;" -->
  </lexeme>
</lexicon>

I guess that, in theory, they could have skipped the alphabet attribute
and used a language tag on the phoneme element instead, though that
could be very impractical and perhaps error-prone:

<lexicon xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme xml:lang="en-US-fonipa" >təmei̥ɾou̥</phoneme>
    <!-- IPA string is: "t&#x0259;mei&#x325;&#x027E;ou&#x325;" -->
  </lexeme>
</lexicon>
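
As a rough illustration of what a consumer of such a document would
then have to do, here is a minimal Python sketch that reads xml:lang
from each phoneme element and picks out variant subtags such as
'fonipa'. The function names and the length-based variant test are my
own simplifications for illustration; they are not defined by PLS, and
the test does not implement the full BCP 47 grammar.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The hypothetical variant of the PLS example discussed above.
PLS = """<lexicon xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme xml:lang="en-US-fonipa">təmei̥ɾou̥</phoneme>
  </lexeme>
</lexicon>"""


def variant_subtags(tag):
    """Return the subtags of a language tag that look like variants.

    Simplification (an assumption of this sketch): a variant is any
    subtag after the primary language that is 5-8 characters long, or
    4 characters long and starting with a digit. Scanning stops at the
    first singleton, where extensions or private use begin.
    """
    variants = []
    for sub in tag.lower().split("-")[1:]:
        if len(sub) == 1:
            break
        if 5 <= len(sub) <= 8 or (len(sub) == 4 and sub[0].isdigit()):
            variants.append(sub)
    return variants


def phoneme_alphabets(pls_text):
    """Pair each phoneme string with the variants found in its xml:lang."""
    root = ET.fromstring(pls_text)
    return [
        (ph.text, variant_subtags(ph.get(XML_LANG, "")))
        for ph in root.iter("phoneme")
    ]


if __name__ == "__main__":
    print(phoneme_alphabets(PLS))
```

Even in this small form, the alphabet has to be dug out of the language
tag on every phoneme element, which hints at why this route could be
impractical.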

But maybe they could have changed alphabet="ipa" to
alphabet="en-US-fonipa"? That would, at the very least, require a
rewrite of the spec. But I think this too would have been impractical
and prone to errors. E.g., what if xml:lang said "en-us" while the
alphabet attribute said "ru-ru-fonipa"?
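
To make the mismatch problem concrete, here is a small Python sketch of
the kind of check a processor would need if the alphabet attribute
carried a full language tag. The alphabet="en-US-fonipa" form is purely
the hypothetical one from this message, not something PLS defines, and
the function name is my own. The comparison is a case-insensitive
prefix match on whole subtags, similar in spirit to basic filtering in
RFC 4647.

```python
def alphabet_matches_lang(xml_lang, alphabet):
    """Check that a tag-style alphabet value starts with the xml:lang tag.

    Both values are compared case-insensitively, subtag by subtag, so
    "en-us" matches "en-US-fonipa" but not "ru-ru-fonipa".
    """
    lang = xml_lang.lower().split("-")
    alpha = alphabet.lower().split("-")
    return alpha[:len(lang)] == lang


if __name__ == "__main__":
    for xml_lang, alphabet in [
        ("en-US", "en-US-fonipa"),
        ("en-us", "ru-ru-fonipa"),
    ]:
        ok = alphabet_matches_lang(xml_lang, alphabet)
        print(xml_lang, alphabet, "consistent" if ok else "MISMATCH")
```

Every processor would have to run some such check and then decide what
to do on a mismatch, which the spec would also have to define.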

[1] http://www.w3.org/TR/pronunciation-lexicon/#S4.1
[2] http://www.w3.org/TR/pronunciation-lexicon/#S4.1.1
-- 
leif halvard silli