Re: [MMUSIC] draft-gellens-negotiating-human-language-02

Gunnar Hellstrom <gunnar.hellstrom@omnitor.se> Mon, 11 March 2013 21:45 UTC

Message-ID: <513E504F.1010209@omnitor.se>
Date: Mon, 11 Mar 2013 22:44:47 +0100
From: Gunnar Hellstrom <gunnar.hellstrom@omnitor.se>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3
MIME-Version: 1.0
To: mmusic@ietf.org
References: <p0624060ecd63af26fe28@dhcp-42ec.meeting.ietf.org>
In-Reply-To: <p0624060ecd63af26fe28@dhcp-42ec.meeting.ietf.org>
Subject: Re: [MMUSIC] draft-gellens-negotiating-human-language-02

Before this discussion found its home in mmusic, we discussed topics 
quite similar to the ones you, Dale, have brought up now.

It was about what the parameters needed to express, and whether SDP or 
SIP was the right place. And, in the case of SIP, whether RFC 3840 / 
3841 could be a suitable mechanism for routing and for decisions based 
on the parameters.

Here is part of that discussion that we need to capture.


I see some complications that might be needed in order to reflect 
reality. At the very least, they should be discussed.

I also see some different ways to specify this.

The complications to discuss are:

*1. Level of preference.*

There may be a need to specify levels of preference for languages. 
I might strongly prefer to speak English, but have some useful 
capability in French. I want to indicate both the preference and the 
capability, and the difference between them, so that I get English 
whenever possible, but still get the call connected even if English is 
not available at all and French is.

I would assume that two levels are sufficient, but that can be 
discussed: preferred and capable.
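
For comparison, the simplest form is the pure preference ordering that 
the draft already allows. A minimal sketch, assuming the draft's 
alternative of reusing the existing 'a=lang' attribute from RFC 4566 
(where multiple lang attributes are listed from most to least 
important), could look like this for the audio stream:

    m=audio 49170 RTP/AVP 0
    a=lang:en
    a=lang:fr

That expresses "English preferred, French acceptable" by order alone; 
my point is that order may not be enough once preference and mere 
capability have to be distinguished across modalities.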
>
> The draft already proposes that languages be listed in order of 
> preference, which should handle the example you mention: you list 
> English first and French second.  The called party selects English if 
> it is capable and falls back to French if English is not and French 
> is.  This seems much simpler and is a common way of handling 
> situations where there is a preference.  It would be good to keep the 
> mechanism as simple as possible.
> Yes, I am afraid of complicating this beyond the point where users no 
> longer manage to get their settings right.
> Still, I do not think that the order alone is a sufficient indicator 
> of the level of preference. You may want to indicate capability for 
> one modality but preference for another (as in my example: capability 
> for ASL, but preference for talking and reading).

If you have a capability for ASL but preference for talking and reading, 
you could initially offer two media streams: a voice with English and a 
text with English.  If accepted, you have your preferred 
communications.  If those are rejected you could then offer video with 
ASL.  Would that handle the case?
No, video is still very valuable for judging the emergency case, or for 
seeing a friend. So, if you support it, you want to offer it. But the 
decision on languages and modalities may end up with video not being 
important for the language communication.
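
To make this concrete, here is a minimal sketch of such an offer, again 
assuming the draft's alternative of reusing 'a=lang' (RFC 4566), with 
T.140 real-time text (RFC 4103) and 'ase' as the language tag for 
American Sign Language:

    m=audio 49170 RTP/AVP 0
    a=lang:en
    m=text 11000 RTP/AVP 98
    a=rtpmap:98 t140/1000
    a=lang:en
    m=video 51372 RTP/AVP 96
    a=rtpmap:96 H264/90000
    a=lang:ase

All three streams are offered; video stays available for visual 
assessment even if the negotiation concludes that audio and text will 
carry the language communication.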


>
>
>>>> *2. Directionality*
>>>> There is a need for a direction of the language preference. 
>>>> "Transmit, receive or both". or   "Produce, perceive or both". That 
>>>> is easy to understand for the relay service examples.
>>>> A hard-of-hearing user may declare:
>>>>
>>>> Text, capable, produce, English
>>>> Text, prefer, perceive, English
>>>> Audio, prefer, produce, English
>>>> Audio, capable, perceive, English    (tricky: a typical 
>>>> hard-of-hearing user may benefit from receiving audio, even though 
>>>> it is not usable enough for reliable perception. I do not want to 
>>>> make this endlessly complex, but I see a need for refined 
>>>> expressions here)
>>>> video, capable, both, ASL
>>>>
>>>> This should be understood to mean that the user prefers to speak 
>>>> and get text back, and benefits from getting voice in parallel with 
>>>> the text. ASL signing can be an alternative if the other party has 
>>>> a corresponding capability or preference.
>>>>
>>>
>>> The draft does support this (and even mentions some of these 
>>> specific uses) because it proposes an SDP media attribute, and media 
>>> can be specified to be send, receive, or both.
>> No, that is not the same. You want the media to flow, but by the 
>> parameter you want to indicate your preference for how to use it.  
>> You do not want to turn off incoming audio just because you prefer to 
>> talk but read text.
>
> Yes, I see, thanks for the clarification.  Does this need to be part 
> of the session setup?  If you establish all media streams that you 
> wish to use, can you then just use them as you prefer?  I will consult 
> with the NENA accessibility committee on this.
No, there are specific services that provide assistance in one direction 
but not the other. The information is needed to decide which assisting 
service to invoke. One such service is captioned telephony, which adds 
rapidly produced speech-to-text in parallel with the voice; that is all 
it provides. A user may have a very strong preference for getting 
exactly that service, but could accept, with much lower preference, a 
direct conversation with the far end in combined text and voice.
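
Purely as an illustration of the expressiveness I mean (the attribute 
names below are hypothetical, invented for this example and not taken 
from the draft), the hard-of-hearing declaration above might render as:

    m=audio 49170 RTP/AVP 0
    a=humintlang-send:en
    a=humintlang-recv:en
    m=text 11000 RTP/AVP 98
    a=rtpmap:98 t140/1000
    a=humintlang-recv:en
    m=video 51372 RTP/AVP 96
    a=rtpmap:96 H264/90000
    a=humintlang-send:ase
    a=humintlang-recv:ase

Even such a directional split does not by itself capture the 
preferred-versus-capable distinction (audio receive is only "capable", 
text receive is "preferred"), which is exactly the refinement I am 
asking about.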
>
>
>>>> I think it would be useful to move most of the introduction to a 
>>>> structured use case chapter and express the different cases 
>>>> according to a template. That can then be used to test whether 
>>>> proposed approaches will work.
>>>
>>> I'm not sure I fully understand what you mean by "structured" in 
>>> "structured use case" or "template."  Can you be more specific?
>> I mean just a simple template for how the use case descriptions are 
>> written.
>>
>> E.g.
>> A title indicating which case we have.
>> Description of the calling user and its capabilities and preferences.
>> Description of the answering user and its capabilities and preferences.
>> Description of a possible assisting service and its capabilities and 
>> preferences.
>> Description of the calling user's indications.
>> Description of the answering user's indications.
>> The resulting decision and outcome.
>>>
>>>
>>>> *3. Specify language and modality at the SIP media feature tag 
>>>> level instead.*
>>>> There could be some benefits to declaring these parameters at the 
>>>> SIP media feature tag level instead of the SDP level.
>>>> A call center can then register its capabilities already at 
>>>> SIP REGISTER time, and the caller preferences / callee 
>>>> capabilities mechanism from RFC 3840 / 3841 can be used to select 
>>>> modalities and languages and route the call to the best capable 
>>>> person, or combination of person and assisting interpreting service.
>>>
>>> Maybe, but one advantage of using SDP is that the ESInet can take 
>>> language and media needs into account during policy-based routing.  
>>> For example, in some European countries emergency calls placed by a 
>>> speaker of language x in country y may be routed to a PSAP in a 
>>> country where x is the native language.  Or, there might be regional 
>>> or national text relay or sign language interpreter services as 
>>> opposed to PSAP-level capabilities.
>> Is there a complete specification for how policy-based routing is 
>> intended to work? Where?
>> Does it not use RFC 3840/3841?
>> That procedure is already supported by SIP servers. Using SDP 
>> requires new SIP server programming.
>
> NENA has a document under development.  I thought it was able to take 
> SDP into account but I'll look into it, and I'm sure Brian will have 
> something to say.
Yes, I think I have seen that. But it needs to come into the IETF 
before it can be referenced.
>
>
>>>> But, on the other hand, then we need a separate specification of 
>>>> what modality the parameters indicate, because the language tags 
>>>> only distinguish between signed and other languages, and "other" 
>>>> seems to mean either spoken or written without any difference.
>>>>
>>>
>>> The SDP media already indicates the type (audio, video, text).
>> Yes, convenient. But there is no knowledge about the parameters until 
>> call time. It could be better to know the callee capabilities in 
>> advance, if available. Then middle boxes can do the routing instead 
>> of the far end. There may be many terminals competing for the call, 
>> and the comparison of which one should get it should be done by a SIP 
>> server instead of an endpoint.
>
> I think call time is the right time.  For emergency calls, it isolates 
> the decision making about how to process calls requiring text, sign 
> language, foreign language, etc. to the ESInet and PSAPs, which is I 
> think the right place.  The processing rules in the ESInet can then be 
> changed without involving any carrier.  The capabilities of an entity 
> may vary based on dynamic factors (time of day, load, etc.) so the 
> decision as to how to support a need may be best made by the ESInet or 
> PSAP in the case of an emergency call, or called party for 
> non-emergency calls.  For example, at some times or under some loads, 
> emergency calls may be routed to a specific PSAP that is not the 
> geographically indicated one.  Likewise, a non-emergency call to a 
> call center may be routed to a center in a country that has support 
> for the language or media needed.
The decision is of course made at call time. With the RFC 3840/3841 
method, the different agents and services available register their 
availability and capabilities when they go on duty, and unregister when 
they stop, so that their information is available at call time.
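
As an illustration of that flow, here is a minimal sketch using the 
"language" media feature tag from RFC 3840 (the addresses and the exact 
tag value used for ASL are just examples), registered by an interpreting 
agent when going on duty:

    REGISTER sip:relay.example.com SIP/2.0
    ...
    Contact: <sip:agent7@relay.example.com>
      ;audio;video;language="en,sgn-US"

A caller, or a server acting on the caller's behalf, can then steer the 
call with the RFC 3841 caller-preferences mechanism:

    Accept-Contact: *;language="sgn-US"

so that a SIP proxy can score and route among the registered agents 
without inspecting SDP.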

>
> Further, it is often the case that the cost of relay, interpretation, 
> or translation services is affected by which entity invokes the service.
Yes, that is a complicating policy issue.
>
>
>>>> *4. Problem that 3GPP specifies that only the UAs specify and act 
>>>> on these parameters.*
>>>> I think it is a problem that 3GPP inserted the restriction that 
>>>> the language and modality negotiation shall only concern the 
>>>> involved UAs.
>>>> It would be more natural for a service provider between them to 
>>>> detect the differences and make the decision to invoke a relay 
>>>> service in the relay case.
>>>> How do you propose to solve that? Let the service provider act 
>>>> as a B2BUA, which can then behave as both a UA and a service 
>>>> provider?
>>>
>>> What do you mean by "service provider?"  In the case of a voice 
>>> service provider such as a cellular carrier or a VoIP provider, I 
>>> think this should be entirely transparent.  The voice service 
>>> provider knows it is an emergency call and routes to an ESInet.  It 
>>> is then up to the ESInet and the PSAPs to handle the call as they wish.
>> It can be a service provider offering just the function of advanced 
>> call setup based on language preferences. The same types of 
>> decisions, call connections and assisting-service invocations are 
>> needed in everyday calls as in emergency calls. But it can also be a 
>> service provider for emergency services with whom the user is 
>> registered. That provider can make decisions on the call, e.g. 
>> detect that it is an emergency call requiring an interpreter, and 
>> therefore connect to both the PSAP and the interpreter at the same 
>> time to save time.
>
> I think it's best to make these decisions at the end, not the middle.  
> In the case of emergency calls, the ESInet can route to a particular 
> PSAP, the PSAP may bridge in translation or interpretation services, 
> etc.  In the case of non-emergency calls, the call center may support 
> some capabilities locally at some hours but route to a different call 
> center at other times.
The end is not decided until you have evaluated the alternative possible 
ends and decided who has the right capability and preference.



There is another issue with using SDP for decisions. SIP MESSAGE is 
included in the set of methods to handle in emergency calls in RFC 6443. 
It can be used within sessions to carry text messages if other media are 
used as well. It is not a favored way to have text communication, but it 
is possible. SIP MESSAGE carries no SDP. I know that the 3GPP sections 
about emergency calling in TS 22.101 point towards using MSRP for text 
messaging, so it should not be an issue for 3GPP. Can we leave SIP 
MESSAGE out of the discussion and aim at solving this only for real-time 
conversational media? I am not urging us to solve it for SIP MESSAGE; I 
just wanted to point out that consequence of basing the mechanism on SDP.






Will there be a possibility for remote participation on Thursday? I am 
sorry that I am not there, but I would like to participate if possible.
/Gunnar

------------------------------------------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46708204288
On 2013-03-11 16:57, Randall Gellens wrote:
> [[[ resending without Cc list ]]]
>
> Hi Dale,
>
> At 11:00 AM -0500 2/25/13, Dale R. Worley wrote:
>
>>  (It's not clear to me what the proper mailing list is to discuss this
>>  draft.  From the headers of the messages, it appears that the primary
>>  list is ietf@ietf.org, but the first message in this thread about that
>>  draft already has a "Re:" in the subject line, so the discussion
>>  started somewhere else.)
>
> There has been some discussion among those listed in the CC header of 
> this message.  I think the mmusic list is probably the right place to 
> continue the discussion and was planning on doing so more formally 
> with the next revision of the draft.
>
> By the way, the draft was updated and is now at -02: 
> http://www.ietf.org/internet-drafts/draft-gellens-negotiating-human-language-02.txt
>
> There is a face-to-face discussion Thursday 11:30-1:00 at The 
> Tropicale (the cafe in the Caribe Royal).  Please let me know if you 
> can make it.
>
>>  (Also, it's not clear why Randall's messages are coming through in
>>  HTML.)
>
> My apologies; I have gotten in the habit when replying to messages 
> that have style to allow Eudora to send my reply styled as well.
>
>
>>  But onward to questions of substance:
>>
>>  - Why SDP and not SIP?
>>
>>  I'd like to see a more thorough exploration of why language
>>  negotiation is to be handled in SDP rather than SIP.  (SIP, like HTTP,
>>  uses the Content-Language header to specify languages.)  In principle,
>>  specifying data that may be used in call-routing should be done in the
>>  SIP layer, but it's well-accepted in the SIP world that call routing
>>  may be affected by the SDP content as well (e.g., media types).
>
> I think it fits more naturally in SDP since the language is related to 
> the media, e.g., English for audio and ASL for video.
>
>
>>  And some discussion and comparison should be done with the SIP/HTTP
>>  Content-Language header (used to specify the language of the
>>  communications) and the SIP Accept-Language header (used to specify
>>  the language of text components of SIP messages), particularly given
>>  that Accept-Language has a different set of language specifiers and a
>>  richer syntax for specifying preferences.  In any case, preference
>>  should be given to reusing one of the existing syntaxes for specifying
>>  language preferences.
>
> I think the semantics of Content-Language and Accept-Language are 
> different from the semantics here, especially when setting up a 
> session with, as an example, an audio stream using English and a video 
> stream using ASL.  (But I can see clients using a default value to set 
> both the SDP language attribute and the HTTP Content-Language, unless 
> configured differently.)
>
> As for reusing existing mechanisms, the draft does contain two 
> alternative proposals, one to re-use the existing 'language' SDP 
> attribute, and one to define a new attribute.
>
>>  - Dependency between media descriptions?
>>
>>     Another example would be a user who is able to speak but is deaf or
>>     hard-of-hearing and requires a voice stream plus a text stream
>>     (known as voice carry over).  Making language a media attribute
>>     allows the standard session negotiation mechanism to handle this by
>>     providing the information and mechanism for the endpoints to make
>>     appropriate decisions.
>>
>>  This scenario suggests that there might be dependency or interaction
>>  between language specifications for different media descriptions.
>>  Whether this is needed should be determined and documented.
>>
>>  - Specifying preference levels?
>>
>>     For example, some users may be able to speak several languages, but
>>     have a preference.
>>
>>  This might argue for describing degrees of preference using "q"
>>  parameters (as in the SIP Accept-Language header).
>>
>>  - Expressing multiple languages in answers
>>
>>     (While it is true that a conversation among multilingual people
>>     often involves multiple languages, it does not seem useful enough
>>     as a general facility to warrant complicating the desired semantics
>>     of the SDP attribute to allow negotiation of multiple simultaneous
>>     languages within an interactive media stream.)
>>
>>  Why shouldn't an answer be able to indicate multiple languages?  At
>>  the least, this might provide the offerer with useful information.
>
> You raise good questions that I think need more discussion.  I am 
> hoping to keep the work as simple as possible and not add additional 
> complexity, which argues for not solving every aspect of the problem, 
> but only those that must be solved immediately.
>
>>
>>  - Reusing a=lang
>>
>>  Searching, I can only find these descriptions of the use of
>>  "a=lang:...":
>>
>>      RFC 4566
>>      draft-saintandre-sip-xmpp-chat
>>      draft-gellens-negotiating-human-language
>>
>>  So it looks like "a=lang:..." is entirely unused at present and is
>>  safe to be redefined.
>
>
>
>
>