Re: [Slim] Moving forward on draft-ietf-slim-negotiating-human-language

Gunnar Hellström <gunnar.hellstrom@omnitor.se> Tue, 21 November 2017 19:45 UTC

To: Bernard Aboba <bernard.aboba@gmail.com>
Cc: slim@ietf.org
From: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
Message-ID: <e70a716e-0e43-c39e-9ac2-71b9d4afe20a@omnitor.se>
Date: Tue, 21 Nov 2017 20:45:04 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/bicZKPuCM6N4GEd3t3AZeAPOXbk>

On 2017-11-21 at 18:06, Bernard Aboba wrote:
> Paul said:
>
> "When using lip sync, is there any necessity to put the language tag 
> on the video?"
>
> [BA] Good point.
>
> "   By including a language tag for spoken language in an audio
>    description and using the "lip sync" grouping mechanism defined
>    in [RFC5888] to group it with a video media stream it is possible
>    to indicate synchronized audio and video so as to support lip
>    reading."
>
> [BA] This seems like an improvement.
<GH>I do not think that an indication of lip sync grouping can be assumed 
to mean that the user promises to be visible in the video. I suspect 
that most products implementing lip sync grouping do it for all calls, 
regardless of whether the user wants to provide or see lips in sync. 
Still, it is a good feature to use if you want to see a speaker.
The 'hlang' attribute in a video description is, on the other hand, a 
clear indication that you want to provide or receive language in the 
video media stream.
Therefore I think we should either return to saying that a spoken/written 
language tag in a video media description means a view of the speaker 
when there is also a lip sync grouping, or drop the dependency on lip 
sync grouping altogether. (There is a risk that we introduce tricky 
corner cases by coupling lip sync and language use. What if, in further 
work, we agree on a way to indicate written captions in MPEG-4 video, 
and want to indicate that in a product that always provides lip sync 
grouping? That would cause conflicts.)

Randall recently commented that the use of text captions in the video 
stream is a far-fetched use case. MPEG-4 has caption elements defined, 
and captions can be carried in media declared as video, but it may well 
be that this is rarely or never used in conversational calls. If we can 
agree on that, we could simply return to saying that a spoken/written 
language tag in a video description means a view of a speaker, and drop 
the requirement to link it to the language in the audio stream.

Gunnar
>
> On Tue, Nov 21, 2017 at 8:44 AM, Paul Kyzivat <pkyzivat@alum.mit.edu 
> <mailto:pkyzivat@alum.mit.edu>> wrote:
>
>     On 11/21/17 10:59 AM, Bernard Aboba wrote:
>
>         [BA] LGTM.  Do you recall what the objection was to the term
>         "spoken/written language"?
>
>         Gunnar had said:
>
>         By including a language tag for spoken language in a video
>         description and using the "lip sync" grouping mechanism
>         defined in [RFC5888] it is possible to indicate synchronized
>         audio and video so as to support lip reading.
>
>
>     When using lip sync, is there any necessity to put the language
>     tag on the video? ISTM that is irrelevant, as long as it is on the
>     synced audio media. ISTM it would be better to say:
>
>        By including a language tag for spoken language in an audio
>        description and using the "lip sync" grouping mechanism defined
>        in [RFC5888] to group it with a video media stream it is possible
>        to indicate synchronized audio and video so as to support lip
>        reading.
>
>             Thanks,
>             Paul
>
>
>     _______________________________________________
>     SLIM mailing list
>     SLIM@ietf.org <mailto:SLIM@ietf.org>
>     https://www.ietf.org/mailman/listinfo/slim
>     <https://www.ietf.org/mailman/listinfo/slim>
>
>
>
>
> _______________________________________________
> SLIM mailing list
> SLIM@ietf.org
> https://www.ietf.org/mailman/listinfo/slim

-- 
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288