Re: [Slim] Moving forward on draft-ietf-slim-negotiating-human-language

Gunnar Hellström <gunnar.hellstrom@omnitor.se> Tue, 21 November 2017 22:05 UTC

To: Paul Kyzivat <pkyzivat@alum.mit.edu>, slim@ietf.org
From: =?UTF-8?Q?Gunnar_Hellstr=c3=b6m?= <gunnar.hellstrom@omnitor.se>
Message-ID: <7bb1e6dd-f926-56d5-2c7c-082563b3de95@omnitor.se>
Date: Tue, 21 Nov 2017 23:05:39 +0100
In-Reply-To: <4efeeff4-bcd7-661e-93c9-23e6fec3704b@alum.mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/Rw_Q3OrLpSc-B4drVaXB5-x8emQ>

On 2017-11-21 at 21:53, Paul Kyzivat wrote:
> On 11/21/17 2:45 PM, Gunnar Hellström wrote:
>> On 2017-11-21 at 18:06, Bernard Aboba wrote:
>>> Paul said:
>>>
>>> "When using lip sync, is there any necessity to put the language tag 
>>> on the video?"
>>>
>>> [BA] Good point.
>>>
>>> "   By including a language tag for spoken language in an audio
>>>    description and using the "lip sync" grouping mechanism defined
>>>    in [RFC5888] to group it with a video media stream it is possible
>>>    to indicate synchronized audio and video so as to support lip
>>>  reading."
>>>
>>> [BA] This seems like an improvement.
>> <GH>I do not think that an indication of lip-sync grouping can be 
>> assumed to mean that the user promises to be seen in video. I suspect 
>> that most products implementing lip-sync grouping do it for all 
>> calls, regardless of whether the user wants to provide or see lips 
>> in sync.
>
> I wonder if they do it even when it isn't true.
<GH> The purpose of our protocol is to be used for conversational 
services, i.e. real-time calls. For those, I simply assume that if the 
manufacturer of a device has taken the effort to implement RFC 5888 
lip sync, it is a technical feature that is active in all calls of the 
device, regardless of how the users intend to use the media. All users 
appreciate good sync, so if the device is capable of providing good 
sync without sacrificing other factors too much, it will be used.
It is not really realistic to turn lip sync on or off depending on the 
language preference settings.

>
>
> For instance, consider a movie that was created in English. So the 
> actors are speaking English, the audio is labeled as English, and 
> there is lip sync.
>
> Now take the same movie and dub it in Spanish. Now the audio ought to 
> be labeled as Spanish. But lip sync should no longer be indicated. Is 
> that what happens in practice?
<GH>The example is from a streaming application. I think lip sync will 
be activated anyway if the streaming service supports it. RFC 5888 is 
not meant to handle only the literal lip-sync use case; it indicates 
when good synchronization can be supported technically. RFC 5888 says: 
"Note that LS semantics apply not only to a video stream that has to be 
synchronized with an audio stream; the playout of two streams of the 
same type can be synchronized as well." That clearly indicates that it 
is not intended for the lip-sync use alone.
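
To make the mechanism concrete, here is a minimal SDP sketch of an RFC 
5888 LS group tying an audio stream to a video stream, with a 
spoken-language tag on the audio as in Paul's proposed text. The 
addresses, ports, and payload types are invented for illustration, and 
the hlang-send attribute name is taken from the draft under discussion:

```
v=0
o=alice 2890844526 2890844527 IN IP4 198.51.100.1
s=-
c=IN IP4 198.51.100.1
t=0 0
a=group:LS 1 2
m=audio 49170 RTP/AVP 0
a=mid:1
a=hlang-send:en
m=video 49174 RTP/AVP 31
a=mid:2
```

As argued above, a device may emit the a=group:LS line for every 
audio+video call, so its presence says nothing about whether a speaking 
face is actually shown in the video.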
>
> Of course, a totally deaf English speaking lip reader will understand 
> both equally well. But there may not be anything in the signaling to 
> indicate that he will have English lip motion.
<GH>Right, that is a good reason for keeping the spoken language tag in 
the video description, and not requiring the same language to appear in 
the audio media description.

So, in summary, I am afraid that the good idea of using RFC 5888 lip 
sync as an indication that we mean a view of a speaking person in video 
has side effects as bad as those of my withdrawn proposal to use the 
"speaker" value of the SDP content attribute from RFC 4796 [RFC4796] 
for the same purpose. What is left to do for now is to define no way to 
indicate the language of captions in the video stream, and instead to 
use spoken/written language tags in video as an indication of a view of 
a speaker. This is also supported by Randall's observation that 
captions in video are hardly used in conversational calls.
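
Under that approach, the indication would be a spoken-language tag 
placed directly in the video media description. A hypothetical sketch 
(invented port and payload type; hlang-recv as named in the draft), 
offering to receive video showing a person speaking English, e.g. for 
lip reading:

```
m=video 49174 RTP/AVP 31
a=hlang-recv:en
```

A sign-language tag in the same position would instead indicate sign 
language in the video stream, which is what leaves the spoken-language 
case free to carry the "view of a speaker" meaning.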

I will create a new 5.4 proposal in another mail.

Gunnar

>
> The above isn't assuming that the content is being created explicitly 
> to accommodate deaf people. If the goal is specifically to focus on 
> that then some different policies might help.


>
>     Thanks,
>     Paul
>
>> But it is a good feature to use if you desire to see a speaker.
>> The 'hlang' attribute in a video description is, on the other hand, 
>> a clear indication that you want to provide or receive language in 
>> the video media stream.
>> Therefore I think we should return either to saying that a 
>> spoken/written language tag in a video media description means a view 
>> of the speaker if there is also a lip-sync grouping, or even to 
>> skipping the dependency on lip-sync grouping.  (There is a risk that 
>> we introduce tricky corner cases by bundling lip sync and language 
>> use. What if, in further work, we agree on a way to indicate written 
>> captions in MPEG-4 video, and want to indicate that in a product that 
>> always provides lip-sync grouping? That would cause conflicts.)
>>
>> Randall recently commented that the use of text captions in the 
>> video stream is a far-fetched use case. MPEG-4 has caption elements 
>> defined, and captions can be provided in media declared as video, 
>> but it may be right that they are rarely or never used in 
>> conversational calls. If we can agree on that, we could simply 
>> return to saying that a spoken/written language tag in a video 
>> description means a view of a speaker, and skip the requirement to 
>> link it to the language in the audio stream.
>>
>> Gunnar
>>>
>>> On Tue, Nov 21, 2017 at 8:44 AM, Paul Kyzivat <pkyzivat@alum.mit.edu 
>>> <mailto:pkyzivat@alum.mit.edu>> wrote:
>>>
>>>     On 11/21/17 10:59 AM, Bernard Aboba wrote:
>>>
>>>         [BA] LGTM.  Do you recall what the objection was to the term
>>>         "spoken/written language"?
>>>
>>>         Gunnar had said:
>>>
>>>         By including a language tag for spoken language in a video
>>>         description and using the "lip sync" grouping mechanism
>>>         defined in [RFC5888] it is possible to indicate synchronized
>>>         audio and video so as to support lip reading.
>>>
>>>
>>>     When using lip sync, is there any necessity to put the language
>>>     tag on the video? ISTM that is irrelevant, as long as it is on the
>>>     synced audio media. ISTM it would be better to say:
>>>
>>>        By including a language tag for spoken language in an audio
>>>        description and using the "lip sync" grouping mechanism defined
>>>        in [RFC5888] to group it with a video media stream it is 
>>> possible
>>>        to indicate synchronized audio and video so as to support lip
>>>        reading.
>>>
>>>             Thanks,
>>>             Paul
>>>
>>>
>>>     _______________________________________________
>>>     SLIM mailing list
>>>     SLIM@ietf.org <mailto:SLIM@ietf.org>
>>>     https://www.ietf.org/mailman/listinfo/slim
>>>     <https://www.ietf.org/mailman/listinfo/slim>
>>>
>>>
>>>
>>>
>>
>> -- 
>> -----------------------------------------
>> Gunnar Hellström
>> Omnitor
>> gunnar.hellstrom@omnitor.se
>> +46 708 204 288
>>
>>
>>
>>
>

-- 
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288