Re: [Slim] Issue 43: How to know the modality of a language indication?

Paul Kyzivat <pkyzivat@alum.mit.edu> Sun, 15 October 2017 19:27 UTC

To: Bernard Aboba <bernard.aboba@gmail.com>
Cc: Gunnar Hellström <gunnar.hellstrom@omnitor.se>, slim@ietf.org
References: <CAOW+2dtSOgp3JeiSVAttP+t0ZZ-k3oJK++TS71Xn7sCOzMZNVQ@mail.gmail.com> <p06240606d607257c9584@172.20.60.54> <fb9e6b79-7bdd-9933-e72e-a47bc8c93b58@omnitor.se> <CAOW+2dtteOadptCT=yvfmk01z-+USfE4a7JO1+u_fkTp72ygNA@mail.gmail.com> <da5cfaea-75f8-3fe1-7483-d77042bd9708@alum.mit.edu> <b2611e82-2133-0e77-b72b-ef709b1bba3c@omnitor.se> <1b0380ef-b57d-3cc7-c649-5351dc61f878@alum.mit.edu> <CAOW+2dtVE5BDmD2qy_g-asXvxntif4fVC8LYO4j7QLQ5Kq2E+g@mail.gmail.com>
From: Paul Kyzivat <pkyzivat@alum.mit.edu>
Message-ID: <3fc6d055-08a0-2bdb-f6e9-99b94efc49df@alum.mit.edu>
Date: Sun, 15 Oct 2017 15:27:40 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/4rRWfS7VZreBqwuspZ4tIE57gqg>
List-Id: Selection of Language for Internet Media <slim.ietf.org>

On 10/15/17 1:49 PM, Bernard Aboba wrote:
> Paul said:
> 
> "For the software to know must mean that it will behave differently for 
> a tag that represents a sign language than for one that represents a 
> spoken or written language. What is it that it will do differently?"
> 
> [BA] In terms of behavior based on the signed/non-signed distinction, in 
> -17 the only reference appears to be in Section 5.4, stating that 
> certain combinations are not defined in the document (but that 
> definition of those combinations was out of scope):

I'm asking whether this is a distinction without a difference. I'm not 
asking whether this makes a difference in the *protocol*, but whether in 
the end it benefits the participants in the call in any way. For instance:

- does it help the UA to decide how to alert the callee, so that the
   callee can better decide whether to accept the call or instruct the
   UA about how to handle the call?

- does it allow the UA to make a decision whether to accept the media?

- can the UA use this information to change how to render the media?

And if there is something like this, will the UA be able to do this 
generically based on whether the media is sign language or not, or will 
the UA need to already understand *specific* sign language tags?
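Whether a UA could do this generically depends on tag metadata. In the IANA Language Subtag Registry, individual sign languages such as "ase" (American Sign Language) carry a "Macrolanguage: sgn" field, so in principle a UA could classify tags without hard-coding every specific sign language. A minimal sketch of that idea (the registry excerpt and function name here are illustrative, not the full registry or an agreed API):

```python
# Sketch: classify a BCP 47 language tag as sign vs. non-sign language
# using the "Macrolanguage: sgn" field from the IANA Language Subtag
# Registry. Only a tiny excerpt of the registry is inlined here; a real
# UA would parse the full registry file.

REGISTRY_EXCERPT = {
    # primary subtag -> macrolanguage (None if the registry lists none)
    "en":  None,   # English (spoken/written)
    "ase": "sgn",  # American Sign Language
    "bfi": "sgn",  # British Sign Language
    "sgn": None,   # the sign-language macrolanguage subtag itself
}

def is_sign_language(tag: str) -> bool:
    """True if the tag's primary language subtag denotes a sign language."""
    primary = tag.split("-")[0].lower()
    if primary == "sgn":  # e.g. older "sgn-US"-style tags
        return True
    return REGISTRY_EXCERPT.get(primary) == "sgn"

print(is_sign_language("ase"))    # True
print(is_sign_language("en-US"))  # False
```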

E.g., a UA serving a deaf person might automatically introduce a sign 
language interpreter into an incoming audio-only call. If the incoming 
call has both audio and video, then the video *might* be for conveying 
sign language, or not. If not, then the UA will still want to bring in a 
sign language interpreter. But is knowing that the call generically 
contains sign language sufficient to decide against bringing in an 
interpreter? Or must that depend on it being a sign language that the 
user can use? If the UA is configured for all the specific sign 
languages the user can deal with, then there is no need to recognize 
other sign languages generically.
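For concreteness, an incoming offer using the hlang-send/hlang-recv 
attributes from the draft might look like the following (a hypothetical 
example; addresses, ports, and payload types are made up). The "ase" tag 
on the video stream is what would tell the UA that the video is intended 
to carry sign language rather than a view of a speaking person:

```
v=0
o=- 0 0 IN IP4 198.51.100.1
s=-
c=IN IP4 198.51.100.1
t=0 0
m=audio 49170 RTP/AVP 0
a=hlang-send:en
a=hlang-recv:en
m=video 51372 RTP/AVP 31
a=hlang-send:ase
a=hlang-recv:ase
```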

	Thanks,
	Paul

>       5.4
>       <https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-17#section-5.4>.
>       Undefined Combinations
> 
> 
> 
>     The behavior when specifying a non-signed language tag for a video
>     media stream, or a signed language tag for an audio or text media
>     stream, is not defined in this document.
> 
>     The problem of knowing which language tags are signed and which are
>     not is out of scope of this document.
> 
> 
> 
> On Sun, Oct 15, 2017 at 10:13 AM, Paul Kyzivat <pkyzivat@alum.mit.edu> wrote:
> 
>     On 10/15/17 2:24 AM, Gunnar Hellström wrote:
> 
>         Paul,
>         Den 2017-10-15 kl. 01:19, skrev Paul Kyzivat:
> 
>             On 10/14/17 2:03 PM, Bernard Aboba wrote:
> 
>                 Gunnar said:
> 
>                 "Applications not implementing such specific notations
>                 may use the following simple deductions.
> 
>                 - A language tag in audio media is supposed to indicate
>                 spoken modality.
> 
>                 [BA] Even a tag with "Sign Language" in the description??
> 
>                 - A language tag in text media is supposed to indicate 
>                 written modality.
> 
>                 [BA] If the tag has "Sign Language" in the description,
>                 can this document really say that?
> 
>                 - A language tag in video media is supposed to indicate
>                 visual sign language modality except for the case when
>                 it is supposed to indicate a view of a speaking person
>                 mentioned in section 5.2 characterized by the exact same
>                 language tag also appearing in an audio media specification.
> 
>                 [BA] It seems like an over-reach to say that a spoken
>                 language tag in video media should instead be
>                 interpreted as a request for Sign Language.  If this
>                 were done, would it always be clear which Sign Language
>                 was intended?  And could we really assume that both
>                 sides, if negotiating a spoken language tag in video
>                 media, were really indicating the desire to sign?  It
>                 seems like this could easily result in
>                 interoperability failure.
> 
> 
>             IMO the right way to indicate that two (or more) media
>             streams are conveying alternative representations of the
>             same language content is by grouping them with a new
>             grouping attribute. That can tie together an audio with a
>             video and/or text. A language tag for sign language on the
>             video stream then clarifies to the recipient that it is sign
>             language. The grouping attribute by itself can indicate that
>             these streams are conveying language.
> 
>         <GH>Yes, and that is proposed in
>         draft-hellstrom-slim-modality-grouping, with two kinds of
>         grouping: one kind tells that two or more languages in
>         different streams are alternatives with the same content, and
>         a priority order is assigned to them to guide the selection
>         of which one to use during the call. The other kind tells
>         that two or more languages in different streams are desired
>         together, with the same language content but different
>         modalities (such as captioned telephony, with the same
>         content provided in both speech and text; sign language
>         interpretation, where you see the interpreter; or possibly
>         spoken language interpretation, with the languages provided
>         in different audio streams). I hope that that draft can be
>         progressed. I see it as a needed complement to the pure
>         language indications per media.
> 
> 
>     Oh, sorry. I did read that draft but forgot about it.
> 
>         The discussion in this thread is more about how an
>         application would easily know that e.g. "ase" is a sign
>         language and "en" is a spoken (or written) language, and also
>         about what kinds of languages are allowed and indicated by
>         default in each media type. It was not at all about falsely
>         using language tags in the wrong media type, as Bernard
>         understood my wording. It was rather about limiting what
>         modalities are used in each media type, and how to know the
>         modality in cases that are not evident, e.g. the
>         "application" and "message" media types.
> 
> 
>     What do you mean by "know"? Is it for the *UA* software to know, or
>     for the human user of the UA to know? Presumably a human user that
>     cares will understand this if presented with the information in some
>     way. But typically this isn't presented to the user.
> 
>     For the software to know must mean that it will behave differently
>     for a tag that represents a sign language than for one that
>     represents a spoken or written language. What is it that it will do
>     differently?
> 
>              Thanks,
>              Paul
> 
> 
>         Right now we have returned to a very simple rule: we define
>         only the use of spoken language in audio media, written
>         language in text media, and sign language in video media.
>         We have discussed other uses, such as a view of a speaking
>         person in video, text overlay on video, a sign language
>         notation in text media, written language in message media,
>         written language in WebRTC data channels, and signed,
>         written, and spoken language in bucket media, maybe declared
>         as application media. We do not define these cases. They are
>         just not defined, not forbidden. They may be defined in the
>         future.
> 
>         My proposed wording in section 5.4 attracted too many
>         misunderstandings, so I gave up on it. I think we can live
>         with 5.4 as it is in version -16.
> 
>         Thanks,
>         Gunnar
> 
> 
> 
>             (IIRC I suggested something along these lines a long time ago.)
> 
>                  Thanks,
>                  Paul
> 
>             _______________________________________________
>             SLIM mailing list
>             SLIM@ietf.org
>             https://www.ietf.org/mailman/listinfo/slim
> 
> 