Re: [Slim] Proposed 5.4 text

On 2017-11-23 at 03:02, Bernard Aboba wrote:
> On Nov 22, 2017, at 3:47 PM, Gunnar Hellström 
> <gunnar.hellstrom@omnitor.se> wrote:
>>
>> <GH>Yes, I agree completely that speech recognition is a reality for 
>> real-time captioning today. But do you know implementations that 
>> transmit it as part of a video media stream? What coding?
>
> Here is an example that used machine learning to provide realtime 
> captions in multiple languages:
> https://azure.microsoft.com/en-us/blog/live-real-time-captions-with-azure-media-services-and-player/?cdn=disable

<GH>Yes, a nice example. It is not from our application area of 
conversational calls, it is not initiated by SDP, and it does not use a 
video media stream; it uses a multiplexed, TCP-based RTMP message stream 
with interleaved video chunks and text chunks. Still, it reminds us that 
something similar could be set up in a conversational call with SDP, 
using MPEG-4 video media according to RFC 3640, 4337 or 6381.

That shows that there are cases that our current draft supports even 
less. A video/mp4 medium can contain video, audio and text. If it is 
used in a conversational call, we would need to collect 'hlang' 
attributes for all three modalities in the video media description, and 
possibly negotiate them all. That goes against one of our basic 
statements in section 3:

    (Negotiating multiple simultaneous languages within a media stream is
    out of scope of this document.)

So, ok, let us for now assume that we limit the application to 
traditional conversational calls with one medium and one language per 
stream, and no multiplexing.
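
To make the limited case concrete, here is a rough Python sketch of how 
the language attributes line up when each media description carries 
exactly one modality. The SDP offer, the port and payload numbers, and 
the helper name languages_per_media are made up for illustration. The 
attribute names hlang-send and hlang-recv, with space-separated language 
tags, are my assumption of how the draft spells out 'hlang'. The trailing 
asterisk option is not handled.

```python
# Sketch only: the SDP offer and the helper name are illustrative assumptions.
SDP_OFFER = """\
m=audio 49170 RTP/AVP 0
a=hlang-send:es
a=hlang-recv:es
m=text 45020 RTP/AVP 103
a=rtpmap:103 t140/1000
a=hlang-send:es
a=hlang-recv:es
"""


def languages_per_media(sdp):
    """Collect hlang-send / hlang-recv language tags for each m= line."""
    media = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...; only the media type is kept.
            media.append(
                {"media": line[2:].split()[0], "hlang-send": [], "hlang-recv": []}
            )
        elif line.startswith(("a=hlang-send:", "a=hlang-recv:")) and media:
            # Attribute value: language tags separated by spaces.
            name, _, value = line[2:].partition(":")
            media[-1][name] = value.split()
    return media


result = languages_per_media(SDP_OFFER)
for entry in result:
    print(entry["media"], entry["hlang-send"], entry["hlang-recv"])
```

With one modality per media description, each tag list refers 
unambiguously to that modality. A single video/mp4 description carrying 
video, audio and text would give the same flat lists no way to say which 
tag belongs to which modality, which is the multiplexing case we are 
leaving out.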

Gunnar

> _______________________________________________
> SLIM mailing list
> SLIM@ietf.org
> https://www.ietf.org/mailman/listinfo/slim

-- 
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288