Re: [Slim] Issue 43: How to know the modality of a language indication?

Bernard Aboba <bernard.aboba@gmail.com> Sun, 15 October 2017 23:22 UTC

From: Bernard Aboba <bernard.aboba@gmail.com>
Date: Sun, 15 Oct 2017 16:21:34 -0700
To: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
Cc: Paul Kyzivat <pkyzivat@alum.mit.edu>, slim@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/nud9Rtu7qO210bFMl0I18jrrntg>
Subject: Re: [Slim] Issue 43: How to know the modality of a language indication?

Paul said:

""- can the UA use this information to change how to render the media?"

[BA] If the video is used for signing, an application might infer an
encoder preference for frame rate over resolution (e.g., in WebRTC,
RTCRtpParameters.degradationPreference = "maintain-framerate").

See:
https://rawgit.com/w3c/webrtc-pc/master/webrtc.html#dom-rtcrtpparameters-degradationpreference
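
For illustration, a minimal sketch in TypeScript (the helper name is mine;
it assumes access to the RTCRtpSender that carries the signing video):

    // Ask the encoder to preserve frame rate rather than resolution when the
    // video stream is known to carry sign language.
    async function preferFramerateForSigning(sender: RTCRtpSender): Promise<void> {
      const params = sender.getParameters();
      // degradationPreference (see the draft linked above) tells the encoder
      // what to sacrifice first under bandwidth or CPU pressure.
      params.degradationPreference = "maintain-framerate";
      await sender.setParameters(params);
    }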

On Sun, Oct 15, 2017 at 2:22 PM, Gunnar Hellström <gunnar.hellstrom@omnitor.se> wrote:

> On 2017-10-15 at 21:27, Paul Kyzivat wrote:
>
>> On 10/15/17 1:49 PM, Bernard Aboba wrote:
>>
>>> Paul said:
>>>
>>> "For the software to know must mean that it will behave differently for
>>> a tag that represents a sign language than for one that represents a spoken
>>> or written language. What is it that it will do differently?"
>>>
>>> [BA] In terms of behavior based on the signed/non-signed distinction, in
>>> -17 the only reference appears to be in Section 5.4, which states that
>>> certain combinations are not defined in the document (and that defining
>>> those combinations is out of scope):
>>>
>>
>> I'm asking whether this is a distinction without a difference. I'm not
>> asking whether this makes a difference in the *protocol*, but whether in
>> the end it benefits the participants in the call in any way.
>>
> <GH>Good point; I was on my way to make a similar comment earlier today.
> The difference it makes for an application to "know" what modality a language
> tag represents in the position where it is used seems to matter only for
> imagined functions that are out of scope for the protocol specification.
>
>> For instance:
>>
>> - does it help the UA to decide how to alert the callee, so that the
>>   callee can better decide whether to accept the call or instruct the
>>   UA about how to handle the call?
>>
> <GH>Yes, for a regular human user-to-user call, the result of the
> negotiation must be presented to the participants, so that they can start
> the call with the agreed language and modality.
> That presentation could be exactly the description from the language tag
> registry, and then no "knowledge" is needed from the application. It is
> more likely, though, that the application has its own string for presenting
> the negotiated language and modality, and that string will be shown. But it
> is still found by a table lookup from the language tag to a string for the
> language name, so no real knowledge is needed.
> We have said many times that the way the application tells the user the
> result of the negotiation is out of scope for the draft, but it is good to
> discuss it and know that it can be done.
> A similar mechanism is also needed for configuration of the user's
> language preference profile, discussed further below.
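
As a minimal sketch of such a table lookup (in TypeScript; the tags and
display strings below are just examples, not taken from the registry):

    // Map negotiated language tags to the application's own display strings.
    // This is a pure lookup; no modality "knowledge" is involved, and unknown
    // tags fall back to the raw tag itself.
    const displayNames: Record<string, string> = {
      "en": "English",
      "ase": "American Sign Language",
    };

    function describeNegotiatedLanguage(tag: string): string {
      return displayNames[tag.toLowerCase()] ?? tag;
    }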
>
>>
>> - does it allow the UA to make a decision whether to accept the media?
>>
> <GH>No, the media should be accepted regardless of the result of the
> language negotiation.
>
>>
>> - can the UA use this information to change how to render the media?
>>
> <GH>Yes, for the specialized text notation of sign language that we have
> discussed but currently placed out of scope, a very special rendering
> application is needed. The modality would be recognized by a script subtag
> on a sign language tag used in text media. However, I think it would be
> best to also use it with a specific text subtype, so that the rendering can
> be controlled by invoking a "codec" for that rendering.
>
>>
>> And if there is something like this, will the UA be able to do this
>> generically based on whether the media is sign language or not, or will the
>> UA need to already understand *specific* sign language tags?
>>
> <GH>Applications will need to have localized versions of the names of the
> different sign languages, and also of spoken and written languages, to be
> used when setting preferences and announcing the results of the
> negotiation. It might be overkill to have such localized names for all
> languages in the IANA language subtag registry, so the application will
> need to be able to handle localized names for a subset of the registry.
> With good design, however, this is just an automatic translation between a
> language tag and a corresponding name, so it does not in fact require any
> "knowledge" of what modality is used with each language tag.
> The application can ask for the configuration:
> "Which languages do you want to offer to send in video?"
> "Which languages do you want to offer to send in text?"
> "Which languages do you want to offer to send in audio?"
> "Which languages do you want to be prepared to receive in video?"
> "Which languages do you want to be prepared to receive in text?"
> "Which languages do you want to be prepared to receive in audio?"
>
> And for each question, provide a list of language names to select from.
> When a selection is made, the corresponding language tag is placed in the
> profile for negotiation.
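
A sketch of what could come out of those questions: a per-media preference
profile rendered as SDP attribute lines. The attribute names and tags below
are assumptions for illustration, not quotations from the draft.

    // Per-media language preferences collected from the questions above.
    interface MediaLanguageProfile {
      media: "audio" | "video" | "text";
      send: string[];   // tags the user offers to send in this media
      recv: string[];   // tags the user is prepared to receive
    }

    // Render one media entry as SDP attribute lines (names assumed here).
    function toSdpAttributes(p: MediaLanguageProfile): string[] {
      const lines: string[] = [];
      if (p.send.length > 0) lines.push(`a=hlang-send:${p.send.join(" ")}`);
      if (p.recv.length > 0) lines.push(`a=hlang-recv:${p.recv.join(" ")}`);
      return lines;
    }

    // Example: a user who signs in ASL on video and writes English in text.
    const profile: MediaLanguageProfile[] = [
      { media: "video", send: ["ase"], recv: ["ase"] },
      { media: "text", send: ["en"], recv: ["en"] },
    ];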
>
> If the application provides the whole IANA language registry to the user
> for each question, then there is a possibility that the user by mistake
> selects a language that requires a different modality from the one the
> question was about. If the application is to limit the lists provided for
> each question, then it will need some kind of knowledge about which
> language tags suit each modality (and media).
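
If the lists are to be limited, that "knowledge" could be as small as a
locally maintained hint table; a sketch (the entries are assumptions,
nothing in the draft defines such a table):

    // Hint table mapping language tags to the media they suit.
    const mediaHint: Record<string, Array<"audio" | "video" | "text">> = {
      "en": ["audio", "text"],
      "ase": ["video"],
    };

    // Offer only tags hinted as suitable for the media a question is about;
    // an application could equally choose to include unknown tags as well.
    function tagsForQuestion(
      media: "audio" | "video" | "text",
      registrySubset: string[],
    ): string[] {
      return registrySubset.filter(tag => mediaHint[tag]?.includes(media));
    }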
>
>
>
>> E.g., a UA serving a deaf person might automatically introduce a sign
>> language interpreter into an incoming audio-only call. If the incoming call
>> has both audio and video, then the video *might* be for conveying sign
>> language, or not. If not, then the UA will still want to bring in a sign
>> language interpreter. But is knowing that the call generically contains sign
>> language sufficient to decide against bringing in an interpreter? Or must
>> that depend on it being a sign language that the user can use? If the UA is
>> configured for all the specific sign languages that the user can deal with,
>> then there is no need to recognize other sign languages generically.
>>
> <GH>We are talking about specific language tags here and knowing what
> modality they are used for. The user needs to specify which sign languages
> they prefer to use. The callee application can be made to look for gaps
> between what the caller offers and what the callee can accept, deduce from
> that which type of conversion, and between which languages, is needed, and
> invoke that conversion as a relay service. That invocation can be made
> completely table-driven, with corresponding translation profiles for the
> available relay services. But it is more likely to be done by having some
> knowledge about which languages are sign languages and which are spoken
> languages, and sending the call to the relay service to try to sort out
> whether they can handle the translation.
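
A very small sketch of that table-driven gap check (the relay profile shape
and all names here are assumptions for illustration):

    // Profile of what a relay service can translate between.
    interface RelayProfile {
      name: string;
      from: string[];  // tags the relay can receive, e.g. ["ase"]
      to: string[];    // tags the relay can produce, e.g. ["en"]
    }

    // If caller and callee already share a tag there is no gap and no relay
    // is needed; otherwise pick the first relay that bridges the two sides.
    function findRelay(
      callerOffers: string[],
      calleeAccepts: string[],
      relays: RelayProfile[],
    ): RelayProfile | undefined {
      if (callerOffers.some(tag => calleeAccepts.includes(tag))) return undefined;
      return relays.find(r =>
        callerOffers.some(tag => r.from.includes(tag)) &&
        calleeAccepts.some(tag => r.to.includes(tag)));
    }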
>
>>
>>
> So, the answer is: no, the application does not really have any
> knowledge about which modality a language tag represents in the position
> where it is used. If the user chooses to indicate very rare language tags
> for a medium, then a match will just become very unlikely.
>
> Where does this discussion take us? Should we modify section 5.4 again?
>
> Thanks
> Gunnar
>
>>     Thanks,
>>     Paul
>>
>>>       5.4.  Undefined Combinations
>>>       <https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-17#section-5.4>
>>>
>>>
>>>
>>>     The behavior when specifying a non-signed language tag for a video
>>>     media stream, or a signed language tag for an audio or text media
>>>     stream, is not defined in this document.
>>>
>>>     The problem of knowing which language tags are signed and which are
>>>     not is out of scope of this document.
>>>
>>>
>>>
>>> On Sun, Oct 15, 2017 at 10:13 AM, Paul Kyzivat <pkyzivat@alum.mit.edu>
>>> wrote:
>>>
>>>     On 10/15/17 2:24 AM, Gunnar Hellström wrote:
>>>
>>>         Paul,
>>>         On 2017-10-15 at 01:19, Paul Kyzivat wrote:
>>>
>>>             On 10/14/17 2:03 PM, Bernard Aboba wrote:
>>>
>>>                 Gunnar said:
>>>
>>>                 "Applications not implementing such specific notations
>>>                 may use the following simple deductions.
>>>
>>>                 - A language tag in audio media is supposed to indicate
>>>                 spoken modality.
>>>
>>>                 [BA] Even a tag with "Sign Language" in the description??
>>>
>>>                 - A language tag in text media is supposed to indicate
>>>                 written modality.
>>>
>>>                 [BA] If the tag has "Sign Language" in the description,
>>>                 can this document really say that?
>>>
>>>                 - A language tag in video media is supposed to indicate
>>>                 visual sign language modality except for the case when
>>>                 it is supposed to indicate a view of a speaking person
>>>                 mentioned in section 5.2 characterized by the exact same
>>>                 language tag also appearing in an audio media
>>> specification.
>>>
>>>                 [BA] It seems like an over-reach to say that a spoken
>>>                 language tag in video media should instead be
>>>                 interpreted as a request for Sign Language.  If this
>>>                 were done, would it always be clear which Sign Language
>>>                 was intended?  And could we really assume that both
>>>                 sides, if negotiating a spoken language tag in video
>>>                 media, were really indicating the desire to sign?  It
>>>                 seems like this could easily result in interoperability
>>>                 failure.
>>>
>>>
>>>             IMO the right way to indicate that two (or more) media
>>>             streams are conveying alternative representations of the
>>>             same language content is by grouping them with a new
>>>             grouping attribute. That can tie together an audio with a
>>>             video and/or text. A language tag for sign language on the
>>>             video stream then clarifies to the recipient that it is sign
>>>             language. The grouping attribute by itself can indicate that
>>>             these streams are conveying language.
>>>
>>>         <GH>Yes, and that is proposed in
>>>         draft-hellstrom-slim-modality-grouping, with two kinds of
>>>         grouping. One kind tells that two or more languages in
>>>         different streams are alternatives with the same content, and a
>>>         priority order is assigned to them to guide the selection of
>>>         which one to use during the call. The other kind tells that two
>>>         or more languages in different streams are desired together,
>>>         with the same language content but different modalities (such
>>>         as captioned telephony with the same content provided in both
>>>         speech and text, sign language interpretation where you see the
>>>         interpreter, or possibly spoken language interpretation with
>>>         the languages provided in different audio streams). I hope that
>>>         draft can be progressed. I see it as a needed complement to the
>>>         pure language indications per media.
>>>
>>>
>>>     Oh, sorry. I did read that draft but forgot about it.
>>>
>>>         The discussion in this thread is more about how an application
>>>         would easily know that, e.g., "ase" is a sign language and "en"
>>>         is a spoken (or written) language, and also about what kinds of
>>>         languages are allowed and indicated by default in each media
>>>         type. It was not at all about falsely using language tags in
>>>         the wrong media type, as Bernard understood my wording. It was
>>>         rather about limiting which modalities are used in each media
>>>         type and how to know the modality in cases that are not
>>>         evident, e.g. the "application" and "message" media types.
>>>
>>>
>>>     What do you mean by "know"? Is it for the *UA* software to know, or
>>>     for the human user of the UA to know? Presumably a human user that
>>>     cares will understand this if presented with the information in some
>>>     way. But typically this isn't presented to the user.
>>>
>>>     For the software to know must mean that it will behave differently
>>>     for a tag that represents a sign language than for one that
>>>     represents a spoken or written language. What is it that it will do
>>>     differently?
>>>
>>>              Thanks,
>>>              Paul
>>>
>>>
>>>         Right now we have returned to a very simple rule: we define
>>>         only the use of spoken language in audio media, written
>>>         language in text media, and sign language in video media.
>>>         We have discussed other uses, such as a view of a speaking
>>>         person in video, text overlay on video, a sign language
>>>         notation in text media, written language in message media,
>>>         written language in WebRTC data channels, and signed, written
>>>         and spoken language in bucket media, maybe declared as
>>>         application media. We do not define these cases. They are just
>>>         not defined, not forbidden. They may be defined in the future.
>>>
>>>         My proposed wording in section 5.4 caused too many
>>>         misunderstandings, so I gave it up. I think we can live with
>>>         5.4 as it is in version -16.
>>>
>>>         Thanks,
>>>         Gunnar
>>>
>>>
>>>
>>>             (IIRC I suggested something along these lines a long time
>>> ago.)
>>>
>>>                  Thanks,
>>>                  Paul
>>>
>>>
>>>
>>
> --
> -----------------------------------------
> Gunnar Hellström
> Omnitor
> gunnar.hellstrom@omnitor.se
> +46 708 204 288
>
>