Re: [Slim] Moving forward on draft-ietf-slim-negotiating-human-language

Gunnar Hellström <gunnar.hellstrom@omnitor.se> Sun, 19 November 2017 10:41 UTC

To: Bernard Aboba <bernard.aboba@gmail.com>
Cc: slim@ietf.org
References: <CAOW+2dsZtuciPiKMfif=ZmUqBcUd9TyYtL5gPYDp7ZfLOHHDBA@mail.gmail.com> <6ebf2b8a-8699-27c1-87af-41acab4cb940@omnitor.se> <CAOW+2duq9qkXBy8S+a_GSpmPwypMGLfYL3V9ZZfkrDraSA+S1w@mail.gmail.com>
From: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
Message-ID: <7cf08f46-af12-a18b-b9e0-a0b14be7c816@omnitor.se>
Date: Sun, 19 Nov 2017 11:40:48 +0100
In-Reply-To: <CAOW+2duq9qkXBy8S+a_GSpmPwypMGLfYL3V9ZZfkrDraSA+S1w@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/jnShF9HGN3lYLSbzy6P_tqI_0Vk>
Subject: Re: [Slim] Moving forward on draft-ietf-slim-negotiating-human-language

On 2017-11-19 at 06:24, Bernard Aboba wrote:
> Gunnar said:
>
> "I earlier thought that an application needed to look into the 
> language subtag Description to find the word "sign" there in the text 
> string. That is not a good solution."
>
> [BA] Agreed. Also, as indicated in RFC 5646 Section 4.1.2 and the IANA 
> registry 
> (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), 
> sign languages may not always have a subtag of 'sgn':
>
>     Sign languages share a mode of communication rather than a linguistic
>     heritage.  There are many sign languages that have developed
>     independently, and the subtag 'sgn' indicates only the presence of a
>     sign language.  A number of sign languages also had grandfathered
>     tags registered for them during the RFC 3066 era.  For example, the
>     grandfathered tag "sgn-US" was registered to represent 'American Sign
>     Language' specifically, without reference to the United States.  This
>     is still valid, but deprecated: a document in American Sign Language
>     can be labeled either "ase" or "sgn-ase" (the 'ase' subtag is for the
>     language called 'American Sign Language').
>
<GH>Right. The language tag alone does not show whether it denotes a sign 
language. You need to consult the IANA registry.
> Gunnar also said:
>
> "A specific sign language can be identified by its existence in the IANA
> registry of language subtags according to BCP 47 [RFC5646] , and finding
> that the language subtag is found at least in two entries in the
> registry, once with the Type field "language" and once with the Type
> field "extlang" combined with the Prefix field value "sgn".
>
> So that should be the response to Dale's request to easily decide if a 
> language tag is for a sign language."
>
> [BA]  Looking at the IANA registry, having a Type field "extlang" 
> combined with the Prefix field value 'sgn' seems to be used as an 
> indicator of a sign language.  Do you think we can rely on this? 
> Currently this is only a SHOULD  in RFC 5646 Section 3.4:
>
>              3.  Sign languages SHOULD have an 'extlang' record with a 'Prefix' of 'sgn'.
<GH>Yes, it is a SHOULD. But all registered sign languages so far follow 
this rule, so I think it is solid. Do you think we need to weaken the 
start of our recommended procedure:

"A specific sign language can be identified...."   The "can" could be weakened, but I do not think we need to do so.
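The two-entry check in the proposed procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: SAMPLE_REGISTRY is a hand-abridged excerpt in the registry's record-jar layout, not the full IANA file, and the names parse_registry and is_sign_language are hypothetical helpers, not anything defined by the draft or by RFC 5646.

```python
from typing import Dict, List

# Assumption: abridged, illustrative excerpt in the registry's record-jar
# layout ("%%" separates records). A real application would load the full
# IANA language subtag registry instead.
SAMPLE_REGISTRY = """\
%%
Type: language
Subtag: ase
Description: American Sign Language
%%
Type: extlang
Subtag: ase
Description: American Sign Language
Preferred-Value: ase
Prefix: sgn
%%
Type: language
Subtag: en
Description: English
Suppress-Script: Latn
%%
Type: extlang
Subtag: yue
Description: Yue Chinese
Preferred-Value: yue
Prefix: zh
"""

Record = Dict[str, List[str]]


def parse_registry(text: str) -> List[Record]:
    """Split record-jar text into records mapping field name to values."""
    records: List[Record] = []
    current: Record = {}
    field = None
    for line in text.splitlines():
        if line.strip() == "%%":          # record separator
            if current:
                records.append(current)
            current, field = {}, None
            continue
        if not line.strip():
            continue
        if line[0].isspace() and field:   # folded continuation line
            current[field][-1] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue
        field = name.strip()
        current.setdefault(field, []).append(value.strip())
    if current:
        records.append(current)
    return records


def is_sign_language(subtag: str, records: List[Record]) -> bool:
    """True if the subtag has a 'language' record and also an 'extlang'
    record whose Prefix is 'sgn', as the proposed text describes."""
    wanted = subtag.lower()
    has_language = False
    has_sgn_extlang = False
    for rec in records:
        if wanted not in (s.lower() for s in rec.get("Subtag", [])):
            continue
        types = rec.get("Type", [])
        if "language" in types:
            has_language = True
        if "extlang" in types and "sgn" in rec.get("Prefix", []):
            has_sgn_extlang = True
    return has_language and has_sgn_extlang
```

With the sample data, is_sign_language("ase", parse_registry(SAMPLE_REGISTRY)) is true and the same call for "en" is false. Because the rule rests on a SHOULD in RFC 5646, a sign language registered without the 'sgn' extlang record would be missed by this check.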

> "My wording proposal starts with the obvious cases: a non-signed 
> language tag in audio media is spoken, and a non-signed language tag 
> in text media is written."
<GH>The three obvious cases also include the third one: a sign language 
tag in video media is signed.
> [BA] Assuming your suggested approach allows us to reliably determine non-sign languages, this seems solid.
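The three obvious cases can be written down as a small decision function. This is a minimal sketch, assuming the caller already knows whether the tag denotes a sign language; the names Modality and obvious_modality are hypothetical and appear nowhere in the draft. Every other combination is deliberately left undetermined.

```python
from enum import Enum
from typing import Optional


class Modality(Enum):
    SPOKEN = "spoken"
    WRITTEN = "written"
    SIGNED = "signed"


def obvious_modality(media_type: str, is_sign_language: bool) -> Optional[Modality]:
    """Return the modality for the three obvious cases, else None.

    media_type is the SDP media type of the m-line ("audio", "text",
    "video", "message", "application", ...).
    """
    media = media_type.lower()
    if media == "audio" and not is_sign_language:
        return Modality.SPOKEN    # non-signed language tag in audio media
    if media == "text" and not is_sign_language:
        return Modality.WRITTEN   # non-signed language tag in text media
    if media == "video" and is_sign_language:
        return Modality.SIGNED    # sign language tag in video media
    return None                   # ambiguous: needs some other indication
```

For example, obvious_modality("video", False) returns None, which is exactly the ambiguous case discussed next (a non-signed language tag in a video stream).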
> Gunnar further said:
> "But for other cases, like in video or message or application or multiplexed media, other indications must be used to
> understand the intended modality... I wish for the ambiguous cases we could use the script subtag -zxxx to indicate
> spoken modality and a real script subtag..."
> [BA] This is where I become uneasy, because without an explicit mechanism such as script subtags, there is the potential for ambiguity.
> Trying to reduce that ambiguity via heuristics could turn out to be a bad idea, compared with proceeding more cautiously
> by leaving behavior undefined for now and revisiting the situation later when we understand the problem better. For example:
> "Use for sending of a visual view of a speaking person may be indicated by the value "speaker" in an SDP
> Content attribute according to RFC 4796 [RFC4796] in a "video" media stream or another media carrying video (e.g. "message" or "application")."
> [BA] There are quite a few potential corner cases here. For example, if "en-US" language is included in an offer within a video m-line, should the answerer assume this implies a willingness to lip read US English if the value "speaker" is in the Content attribute?  What can be assumed if a value other than "speaker" is in the Content attribute? Might that represent something entirely different, such as the desire to receive captioning in US English? If so, how could the Offerer indicate both the capability of lipreading and the ability to handle captions? And what happens if the Answerer doesn't mimic the Content attribute in the Offer? Seems like there are some potential "gotchas" here.
<GH>Yes, I agree that the hint to use the "Content" attribute was not 
solid. One problem is that the attribute is defined to indicate only what 
is sent in the media, so we cannot use it to express the desired 
received modality. Therefore it cannot serve as a confirmation from the 
answering party either. It is a weak hint and may confuse more than it helps.
The idea, however, was to give an informative guide to a number of possible 
ways for applications to indicate modality when it is not one of the 
three obvious cases. We can delete the hint about the "Content" attribute.
> "Use of written modality in another media stream than
> "text", may be discriminated by use of a script subtag in the language
> tag, where that is appropriate."
> [BA] What if the language in question has script subtags suppressed? What if the Offerer includes a script subtag in the video m-line but also "speaker" in the Content attribute? Again, there could be quite a few corner cases lurking here.
<GH> The application must not create illogical combinations. But let us 
drop the "Content" attribute hint. I find the use of the script subtag 
much more solid for the otherwise ambiguous cases. RFC 5646 clearly 
allows even a suppressed script subtag to be used when it carries 
important distinguishing information. Section 4.1 of RFC 5646 says:

        "The script subtag SHOULD NOT be used to form language tags unless
        the script adds some distinguishing information to the tag."

That is true for our case. The problem is that we have met persistent 
resistance from the language experts against both using suppressed script 
subtags for written modality and using the -Zxxx script subtag for 
spoken modality.  Can we explain our case better and get a go-ahead from 
the language experts? That would result in a really simple set of rules.
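To show how simple the rules could become, here is a rough Python sketch of the script-subtag idea. It illustrates the wished-for rule only, which has not been agreed: "Zxxx" would mean spoken, any other script subtag would mean written, and a tag without a script subtag stays undetermined. The helper names script_subtag and modality_from_script are hypothetical, and the parser handles only ordinary tags with a 2 or 3 letter primary language subtag.

```python
from typing import Optional


def _is_alpha(subtag: str, length: int) -> bool:
    """True for an ASCII-letters-only subtag of exactly the given length."""
    return len(subtag) == length and subtag.isascii() and subtag.isalpha()


def script_subtag(tag: str) -> Optional[str]:
    """Return the script subtag of a language tag in title case, or None.

    Simplified: private-use ("x-..."), grandfathered ("i-...") and other
    tags without a 2 or 3 letter primary language subtag yield None.
    """
    parts = tag.split("-")
    if not (_is_alpha(parts[0], 2) or _is_alpha(parts[0], 3)):
        return None
    i = 1
    extlangs = 0
    # Skip up to three extended language subtags (3 letters each).
    while i < len(parts) and extlangs < 3 and _is_alpha(parts[i], 3):
        i += 1
        extlangs += 1
    # A script subtag, if present, comes next and is exactly 4 letters.
    if i < len(parts) and _is_alpha(parts[i], 4):
        return parts[i].title()
    return None


def modality_from_script(tag: str) -> Optional[str]:
    """Hypothetical rule: 'Zxxx' means spoken, another script means written."""
    script = script_subtag(tag)
    if script is None:
        return None          # no script subtag: modality stays undetermined
    if script == "Zxxx":     # ISO 15924 code for unwritten documents
        return "spoken"
    return "written"         # assumption: any other script implies writing
```

Under this sketch "en-Zxxx" in a video stream would read as spoken and "en-Latn" as written, while plain "en-US" would still need another indication.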

Gunnar

>   
>
> On Sat, Nov 18, 2017 at 2:33 PM, Gunnar Hellström 
> <gunnar.hellstrom@omnitor.se <mailto:gunnar.hellstrom@omnitor.se>> wrote:
>
>     Thanks Bernard for pushing for closing the last open issue.
>     On 2017-11-18 at 19:46, Bernard Aboba wrote:
>>     At this point, only a single Issue (43) remains open on
>>     draft-ietf-slim-negotiating-human-language:
>>     https://trac.ietf.org/trac/slim/ticket/43
>>     <https://trac.ietf.org/trac/slim/ticket/43>
>>
>>     This relates to the modality of a language indication.
>>
>>     Currently, Gunnar has suggested a modification to the text of
>>     Section 5.4 in order to address the issue:
>>     https://mailarchive.ietf.org/arch/msg/slim/A4b6Wpgh0Z0zpXKqpwF9bfdW35g
>>     <https://mailarchive.ietf.org/arch/msg/slim/A4b6Wpgh0Z0zpXKqpwF9bfdW35g>
>>
>>
>>     Can WG participants review this suggested change, so that we can
>>     determine how to move forward?
>>
>>     Currently, Section 5.4 states that:
>>
>>         The problem of knowing which language tags are signed and which are
>>         not is out of scope of this document.
>     I earlier thought that an application needed to look into the
>     language subtag Description to find the word "sign" there in the
>     text string. That is not a good solution. But when studying the
>     topic again in RFC 5646 I found that there is a consistent
>     machine-implementable way to assess if a language subtag is for a
>     sign language.
>     Therefore I included this text in the latest proposal for section
>     5.4 at the link that Bernard provided:
>
>     "
>
>     A specific sign language can be identified by its existence in the IANA
>     registry of language subtags according to BCP 47 [RFC5646] , and finding
>     that the language subtag is found at least in two entries in the
>     registry, once with the Type field "language" and once with the Type
>     field "extlang" combined with the Prefix field value "sgn".
>
>     "
>
>     So that should be the response to Dale's request to easily decide
>     if a language tag is for a sign language.
>
>     The next topic in issue 43 is harder: assessing whether a language
>     tag is for the spoken or the written modality of a language.
>     My wording proposal starts with the obvious cases: a non-signed
>     language tag in audio media is spoken, and a non-signed language
>     tag in text media is written.  But for other cases, like in video
>     or message or application or multiplexed media, other indications
>     must be used to understand the intended modality. The proposed
>     text mentions a few, and leaves to applications to decide which
>     mechanisms to use for such cases. I wish that for the ambiguous
>     cases we could use the script subtag -zxxx to indicate spoken
>     modality and a real script subtag even on language subtags where
>     script subtags are suppressed, because that would satisfy issue 43
>     nicely and make section 5.4 much shorter and clearer. But we have
>     had resistance against that solution.
>
>     The proposed text might be a bit long and detailed. I am prepared
>     to agree on a shortened version if there are any proposals. I
>     think, though, that content for 5.4 along the lines of my proposal
>     is what satisfies issue 43 and also the recent comments that
>     section 5.4 is too restrictive.
>
>     /Gunnar
>
>
>
>     -- 
>     -----------------------------------------
>     Gunnar Hellström
>     Omnitor
>     gunnar.hellstrom@omnitor.se <mailto:gunnar.hellstrom@omnitor.se>
>     +46 708 204 288
>
>

-- 
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288