Re: [clue] 2 draft covering real-time text relevant for conferencing/ multiple participants and telepresence

Arnoud van Wijk <arnoud.vanwijk@realtimetext.org> Mon, 06 June 2011 09:57 UTC

From: Arnoud van Wijk <arnoud.vanwijk@realtimetext.org>
Organization: R3TF
To: "Mike Hammer (hmmr)" <hmmr@cisco.com>
Cc: clue@ietf.org
Subject: Re: [clue] 2 draft covering real-time text relevant for conferencing/ multiple participants and telepresence

Hi Mike,
Good questions, actually.
We have to visualize how Real-Time Text (RTT) can be included in 
telepresence.
Comments inline.

> Is that keyboard left and keyboard right?  :)

Actually, not a bad comment. If you have a conference with, say, 3 people, 
the camera will zoom/focus on the speaker based on microphone activity. 
The same thing is also possible when the speaker uses a keyboard: when 
typing with RTT, the camera will zoom/focus on the typing person. Think of 
3 wireless keyboards here.
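
To make that concrete, here is a very rough Python sketch of such a 
focus-selection step; the Participant class and pick_focus() are just my 
own illustration, not anything from the drafts:

import time
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    last_audio: float = 0.0   # timestamp of last voice activity
    last_typing: float = 0.0  # timestamp of last RTT keystroke

def pick_focus(participants):
    """Return the participant with the most recent activity, voice or text."""
    return max(participants, key=lambda p: max(p.last_audio, p.last_typing))

# Example: three participants, each with a wireless keyboard.
alice, bob, carol = Participant("Alice"), Participant("Bob"), Participant("Carol")
alice.last_audio = time.time() - 5.0   # Alice spoke 5 seconds ago
bob.last_typing = time.time()          # Bob is typing RTT right now
print(pick_focus([alice, bob, carol]).name)   # -> "Bob"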
> I took a quick look through the referenced ID and did not see reference
> to audio and video feeds in a multi-party situation.  With TP, the
> system may be designed to select one of multiple cameras to display when
> there are more participants than displays.  So, is your intent that when
> a 'texter' is 'speaking' that the keyboard input should direct the
> camera on him/her to be the one selected for display?

Yes. :-) For example.
> How do you identify the text-speaker from someone just doing email?
Assign a hot key for RTT/TP activity, for example. That key can also serve 
as a "raise hand" signal to the conference floor manager.
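
A tiny sketch of what that hot key could trigger; the event and field names 
here are invented, and the floor-control transport is left abstract:

def on_hotkey(participant_id, send_floor_request):
    # The same key press marks the participant as an active text-speaker
    # and raises a hand towards the floor manager.
    request = {"type": "floor-request",
               "participant": participant_id,
               "reason": "rtt-turn"}
    send_floor_request(request)
    return request

# Example: the "floor manager transport" is just print() here.
on_hotkey("arnoud", send_floor_request=print)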

> Does the rate of texting translate to volume for selection of video
> feed?
We have slow typists and fast typists. I think the duration of activity can 
be used in parallel to volume, yes. But I think we have to test these kinds 
of scenarios to see what is most convenient/realistic. Or, even when 3 
people type: have a 3-pane mosaic screen and let the person who types the 
longest or the most text grow to full screen. I am just brainstorming a bit 
here.
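
Still brainstorming, a small sketch of treating typed characters in a 
sliding window as the text analogue of audio volume; the TypingMeter and 
choose_layout names and the 10-second window are just assumptions for 
illustration:

from collections import deque
import time

WINDOW = 10.0  # seconds of typing history to consider

class TypingMeter:
    def __init__(self):
        self.events = deque()  # (timestamp, chars_typed)

    def add(self, chars, now=None):
        self.events.append((now or time.time(), chars))

    def score(self, now=None):
        now = now or time.time()
        while self.events and now - self.events[0][0] > WINDOW:
            self.events.popleft()
        return sum(chars for _, chars in self.events)

def choose_layout(meters):
    """Return ('fullscreen', name) for the busiest typist, else ('mosaic', None)."""
    scores = {name: m.score() for name, m in meters.items()}
    busiest = max(scores, key=scores.get)
    if scores[busiest] == 0:
        return ("mosaic", None)
    return ("fullscreen", busiest)

meters = {"Alice": TypingMeter(), "Bob": TypingMeter(), "Carol": TypingMeter()}
meters["Carol"].add(42)        # Carol has typed the most recently
print(choose_layout(meters))   # -> ('fullscreen', 'Carol')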
> Is the text intended to be overwritten on the screen, or is there a
> separate screen?

Both, depending on what is preferred by the users. I think in discussions 
where the camera moves between several users, the RTT can be used as an 
overlay on the video or at the bottom. But at the same time you can put the 
text on a separate screen, and when the camera/audio is locked on a certain 
user, that user's id is added to the RTT on the common screen, for example.
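
As a toy illustration of rendering the same RTT both ways (the render_rtt() 
helper is purely hypothetical):

def render_rtt(text, user_id, locked_on=None):
    overlay = text                          # caption shown over/under the video
    if locked_on == user_id:
        shared = "[%s] %s" % (user_id, text)  # labeled line for the common text screen
    else:
        shared = text
    return overlay, shared

print(render_rtt("I would like to add something", "arnoud", locked_on="arnoud"))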
> If there is a separate display screen, do all 'text-speakers' get equal
> time, or does an algorithm select which one is displayed there?

All get equal time if a separate display is used for text. The text preview 
draft does give an example of multiple users with RTT, as in Figure 3, for 
example: 
https://datatracker.ietf.org/doc/draft-hellstrom-textpreview/?include_text=1
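
For the equal-time case, the simplest possible sketch is just an equal 
split of the text display among the active text-speakers (illustrative 
only):

def split_text_display(height_rows, text_speakers):
    """Give every active text-speaker an equal share of the text screen."""
    if not text_speakers:
        return {}
    share = height_rows // len(text_speakers)
    return {name: share for name in text_speakers}

print(split_text_display(24, ["Alice", "Bob", "Carol"]))  # 8 rows each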

> If there is a conflict between an audio feed and a text feed for
> attention, how do you decide which video is displayed?

That depends on the users participating. I encounter this issue myself: I 
have a hard time inserting myself into a discussion because my text gets 
overlooked or takes more time to be noticed.
If you allow user tags per stream identifying the speaker, you could 
perhaps add a flag indicating that this person talks by text.
In that case the RTT may get the highest priority in the media stream 
selection. The RTT user must then be fully aware of this.
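
A rough sketch of that tagging idea, with invented field names; the point 
is only that an active text-speaker ranks above an active audio speaker in 
the selection:

from dataclasses import dataclass

@dataclass
class StreamTag:
    user: str
    media: str             # "audio", "video" or "text"
    talks_by_text: bool = False
    active: bool = False   # currently speaking or typing

def select_for_display(tags):
    """Rank streams: active RTT speakers first, then active audio, then the rest."""
    def rank(tag):
        if tag.active and tag.talks_by_text:
            return 0
        if tag.active:
            return 1
        return 2
    return sorted(tags, key=rank)

tags = [
    StreamTag("Bob", "audio", active=True),
    StreamTag("Arnoud", "text", talks_by_text=True, active=True),
    StreamTag("Carol", "video"),
]
print([t.user for t in select_for_display(tags)])  # -> ['Arnoud', 'Bob', 'Carol']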

> Are there signers or text to audio speakers possible?

Sure, we have to incorporate them in the scenarios and work out the 
details. It is all about behavior and conference handling/protocol between 
the people and the system. But a signer can wave to the camera, and with 
the rapid development of motion sensors like the Kinect and similar, that 
is not really an obstacle technically. We just need to add it to the 
possible system behavior.
> Lots of questions.

Much appreciated. Questions will help. I still have to think more about 
Total Conversation with TP, but the addition of RTT is and will be SO 
important!
I also like that telepresence can be used to enable remote sign language 
and speech-to-text interpretation services.
The biggest problem is that a remote interpreter cannot see who is talking 
and who the person is. With video added, the interpreter can see the body 
language and tell whether the speaker is joking, angry or frustrated. Such 
things are always very well expressed in sign. :-)

> I think you should contribute to CLUE.

Yes, I agree. The more I learn about what CLUE wants to do, the more I see 
how important this work is, and how important it is to inCLUdE persons who 
use alternative modes of communication besides voice.
> However, I would question bolting on a solution after the fact with
> separate drafts.
>
> Perhaps, all input and output devices should be integrated into the base
> drafts.

I agree. As a standard, all CLUE work should use Total Conversation. Then 
we have audio and video as already described, but we also include 
Real-Time Text!
Use cases can include signing users (waving to the camera or to a sensor to 
get focus on the signing user), typing text to get focus/"raise a hand", 
and a remote speech-to-text or sign language interpreter participating in 
the conference session, etc.
(You do not need to be deaf or speech-impaired; you can also have a remote 
Spanish-to-English interpreter participating, with output via text and 
audio.)
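
To show what the Total Conversation baseline could look like on the wire, 
here is a minimal sketch that builds a session description with audio, 
video and real-time text (T.140 carried over RTP as in RFC 4103); all 
addresses, ports and payload type numbers are made up:

def total_conversation_sdp(audio_port=49170, video_port=51372, text_port=53000):
    """Build an example SDP body offering audio, video and real-time text."""
    return "\r\n".join([
        "v=0",
        "o=- 0 0 IN IP4 192.0.2.1",
        "s=Total Conversation example",
        "c=IN IP4 192.0.2.1",
        "t=0 0",
        f"m=audio {audio_port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
        f"m=video {video_port} RTP/AVP 96",
        "a=rtpmap:96 H264/90000",
        f"m=text {text_port} RTP/AVP 100 98",   # real-time text (RFC 4103)
        "a=rtpmap:98 t140/1000",
        "a=rtpmap:100 red/1000",                # redundancy for loss resilience
        "a=fmtp:100 98/98/98",
    ]) + "\r\n"

print(total_conversation_sdp())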

cheers

Arnoud
> Mike
>
>
>
> -----Original Message-----
> From: clue-bounces@ietf.org [mailto:clue-bounces@ietf.org] On Behalf Of
> Arnoud van Wijk
> Sent: Wednesday, June 01, 2011 5:46 AM
> To: clue@ietf.org
> Subject: [clue] 2 draft covering real-time text relevant for
> conferencing/ multiple participants and telepresence
>
> Hi all,
> I am getting a clue about CLUE. :-)
> This is an excellent WG covering what we need for a good telePRESENCE
> without hurdles.
> If I communicate remotely with other persons, I'd like to "forget" that I
> participate remotely. There should be no limitations in the ability to
> communicate.
>
> The focus is at this moment on audio and video; let's make real-time text
> also a standard medium to be used in all the CLUE work and scenarios.
> The use of Total Conversation, where audio, video and real-time text are
> presented and available simultaneously, will actually optimize the
> communication with others.
> As a deaf person, I need lipreading and real-time text together; others
> need sign language; and others use real-time text as support when the
> language used in the conference is not their native language, so they can
> have the text in their own language.
> Or even to talk and discuss with others via real-time text while
> listening to the main conference.
>
> More about real-time text can be found at http://www.realtimetext.org,
> but most of you are already familiar with it.
>
> It is not only for persons with a hearing disability; it is for all
> breathing humans who want to communicate (and for a few robots out there
> :-) ).
> But using Total Conversation will remove the biggest Internet
> (telephony/conferencing) communication hurdle for persons who are deaf,
> hard of hearing or have a speech impairment. And that is not a joke.
>
> We have now submitted 2 drafts that I feel are quite valuable for
> CLUE.
>
> Text media handling in RTP based real-time conferences
> https://datatracker.ietf.org/doc/draft-hellstrom-text-conference/
>
> and
> Presentation of Text Conversation in real-time and en-bloc form
> https://datatracker.ietf.org/doc/draft-hellstrom-textpreview/
>
> How would you feel about continuing these drafts under the CLUE banner?
> I am looking forward to your feedback and comments and well...anything
> that you throw at me and my fellow authors :-)
>
> Thank you.
>
> Sincerely
>
> Arnoud van Wijk
>
> PS: we can also help with more insight on Total Conversation regarding
> video quality, camera position and the angle of view (for example, the
> head for lipreading; the upper body and sufficient area around the person
> for sign language, etc.).