Re: [MMUSIC] draft-gellens-negotiating-human-language-02

Flemming Andreasen <fandreas@cisco.com> Thu, 14 March 2013 13:57 UTC

To: Gunnar Hellstrom <gunnar.hellstrom@omnitor.se>
Cc: mmusic@ietf.org

On 3/11/13 5:44 PM, Gunnar Hellstrom wrote:
> Before this discussion got its home in mmusic, we discussed topics
> quite similar to the ones you and Dale have brought up now.
>
> It was about what the parameters needed to express, and whether SDP or
> SIP was the right place. And, in the case of SIP, whether RFC 3840 /
> 3841 could be a suitable mechanism for routing and for decisions based
> on the parameters.
>
I agree more discussion is needed on this. There seem to be two 
problems considered in the draft:
1) Routing of a request to an answerer that has the language 
capabilities the caller desires.
2) Negotiation of the language properties to use on a per-stream basis 
once the call has been routed to a particular answerer.

Problem 1 seems to fall in the RFC 3840/3841 space, whereas problem 2 is 
more of an SDP issue.
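
To make the split concrete, here is a rough sketch. The first part is
speculative: RFC 2987 registers a "language" media feature tag, but
whether it can be carried in RFC 3841 caller preferences this way
("+language") is something I would want to verify.

Problem 1, routing via caller preferences in the INVITE:

   Accept-Contact: *;+language="en,fr";require;explicit

Problem 2, per-stream properties in the SDP offer, here reusing the
existing RFC 4566 'lang' attribute (one of the draft's alternatives);
"ase" is the language subtag for American Sign Language:

   m=audio 49170 RTP/AVP 0
   a=lang:en
   m=video 51372 RTP/AVP 31
   a=lang:ase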

-- Flemming


> Here is part of that discussion that we need to capture.
>
>
> I see some complications that might be needed in order to reflect
> reality. At the least, they should be discussed.
>
> I also see some different ways to specify it.
>
> The complications to discuss are:
>
> *1. Level of preference.*
>
> There may be a need to specify levels of preference for languages.  I
> might strongly prefer to speak English, but have some useful capability
> in French. I want to express both that preference and that capability,
> so that I get English whenever possible, but still get the call
> connected if English is not available at all and French is.
>
> I would assume that two levels are sufficient, but that can be 
> discussed:  Preferred and capable.
>>
>> The draft already proposes that languages be listed in order of 
>> preference, which should handle the example you mention: you list 
>> English first and French second.  The called party selects English if 
>> it is capable and falls back to French if English is not and French 
>> is.  This seems much simpler and is a common way of handling 
>> situations where there is a preference.  It would be good to keep the 
>> mechanism as simple as possible.
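
[For reference, RFC 4566 already allows multiple 'lang' attributes,
with the order indicating importance, so if the draft reuses 'lang' the
English-then-French preference is simply:

   m=audio 49170 RTP/AVP 0
   a=lang:en
   a=lang:fr
]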
> Yes, I am afraid of complicating this beyond the point where users no 
> longer manage to get their settings right.
> Still, I do not think that order alone is a sufficient indicator of 
> level of preference. You may want to indicate capability for one 
> modality but preference for another (as in my example: capability for 
> ASL, but preference for talking and reading).
>
>> If you have a capability for ASL but a preference for talking and 
>> reading, you could initially offer two media streams: voice with 
>> English and text with English.  If accepted, you have your preferred 
>> communications.  If those are rejected, you could then offer video 
>> with ASL.  Would that handle the case?
> No, video is still very valuable for judging the emergency case, or 
> for seeing a friend. So, if you support it, you want to offer it. But 
> the decision on languages and modalities may end up with video not 
> being important for language communication.
>
>
>>
>>
>>>>> *2. Directionality*
>>>>> There is a need for a direction on the language preference: 
>>>>> "transmit, receive, or both", or "produce, perceive, or both". 
>>>>> That is easy to understand for the relay service examples.
>>>>> A hard-of-hearing user may declare:
>>>>>
>>>>> Text, capable, produce, English
>>>>> Text, prefer, perceive, English
>>>>> Audio, prefer, produce, English
>>>>> Audio, capable, perceive, English    (tricky: a typical 
>>>>> hard-of-hearing user may benefit from receiving audio even though 
>>>>> it is not usable enough for reliable perception. I do not want 
>>>>> this to become endlessly complex, but I see a need for refined 
>>>>> expressions here)
>>>>> Video, capable, both, ASL
>>>>>
>>>>> This should be understood to mean that the user prefers to speak 
>>>>> and get text back, and benefits from getting voice in parallel 
>>>>> with text.  ASL signing can be an alternative if the other party 
>>>>> has a corresponding capability or preference.
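
[Interjecting an illustration: nothing in -02 can express the table
above. As a purely hypothetical sketch -- these attribute names are
invented just to make the discussion concrete -- direction-qualified
language attributes would be needed, e.g.:

   m=audio 49170 RTP/AVP 0
   a=langsend:en        (produce spoken English)
   a=langrecv:en        (perceive spoken English)
   m=text 50200 RTP/AVP 98
   a=langrecv:en        (perceive written English)

The parenthesized notes are annotations, not syntax, and even this
sketch does not capture the preferred-versus-capable distinction.]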
>>>>>
>>>>
>>>> The draft does support this (and even mentions some of these 
>>>> specific uses) because it proposes an SDP media attribute, and 
>>>> media can be specified to be send, receive, or both.
>>> No, that is not the same. You want the media to flow, but with the 
>>> parameter you want to indicate your preference for how to use it.  
>>> You do not want to turn off incoming audio just because you prefer 
>>> to talk and to read text.
>>
>> Yes, I see, thanks for the clarification.  Does this need to be part 
>> of the session setup?  If you establish all media streams that you 
>> wish to use, can you then just use them as you prefer?  I will 
>> consult with the NENA accessibility committee on this.
> No, there are specific services that provide assistance in one 
> direction but not the other. The information is needed to decide what 
> assisting service to invoke. One such service is captioned telephony, 
> which adds rapidly produced speech-to-text in parallel with the voice; 
> it provides just that. A user may have a very strong preference for 
> getting exactly that service, but could accept, with much lower 
> preference, a direct conversation with the far end in combined text 
> and voice.
>>
>>
>>>>> I think it would be useful to move most of the introduction to a 
>>>>> structured use-case chapter and express the different cases 
>>>>> according to a template. That can then be used to test whether 
>>>>> proposed approaches will work.
>>>>
>>>> I'm not sure I fully understand what you mean by "structured" in 
>>>> "structured use case" or "template."  Can you be more specific?
>>> I mean just a simple template for how the use case descriptions are 
>>> written.
>>>
>>> E.g.
>>> A title indicating what case we have.
>>> Description of the calling user and its capabilities and preferences.
>>> Description of the answering user and its capabilities and 
>>> preferences.
>>> Description of a possible assisting service and its capabilities 
>>> and preferences.
>>> Description of the calling user's indications.
>>> Description of the answering user's indications.
>>> The resulting decision and outcome.
>>>>
>>>>
>>>>> *3. Specify language and modality at the SIP media feature tag 
>>>>> level instead.*
>>>>> There could be some benefits to declaring these parameters at the 
>>>>> SIP media feature tag level instead of the SDP level.
>>>>> A call center can then register its capabilities already at 
>>>>> SIP REGISTER time, and the caller-preferences / callee-capabilities 
>>>>> mechanism from RFC 3840 / 3841 can be used to select modalities 
>>>>> and languages and route the call to the best capable person, or 
>>>>> combination of person and assisting interpreter.
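
[A sketch of what Gunnar describes, using RFC 3840 feature parameters
on the Contact of a REGISTER; the "+language" tag again assumes the
RFC 2987 "language" media feature tag can be used here, which should
be verified:

   Contact: <sip:agent17@callcenter.example.com>
            ;audio;video;text;+language="en,fr,ase"

A proxy could then match this against Accept-Contact preferences in
the INVITE per RFC 3841.]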
>>>>
>>>> Maybe, but one advantage of using SDP is that the ESInet can take 
>>>> language and media needs into account during policy-based routing.  
>>>> For example, in some European countries emergency calls placed by a 
>>>> speaker of language x in country y may be routed to a PSAP in a 
>>>> country where x is the native language.  Or, there might be 
>>>> regional or national text relay or sign language interpreter 
>>>> services as opposed to PSAP-level capabilities.
>>> Is there a complete specification for how policy-based routing is 
>>> supposed to work? Where?
>>> Does it not use RFC 3840/3841?
>>> That procedure is already supported by SIP servers; using SDP 
>>> requires new SIP server programming.
>>
>> NENA has a document under development.  I thought it was able to take 
>> SDP into account but I'll look into it, and I'm sure Brian will have 
>> something to say.
> Yes, I think I have seen that. But it needs to come into the IETF for 
> it to be possible to refer to.
>>
>>
>>>>> But, on the other hand, we then need a separate specification of 
>>>>> what modality the parameters indicate, because the language tags 
>>>>> only distinguish between signed and other languages, and "other" 
>>>>> seems to mean either spoken or written, without any difference.
>>>>>
>>>>
>>>> The SDP media already indicates the type (audio, video, text).
>>> Yes, convenient. But there is no knowledge of the parameters until 
>>> call time. It could be better to know the callee capabilities in 
>>> advance, if available; then middleboxes can do the routing instead 
>>> of the far end. There may be many terminals competing for the call, 
>>> and the comparison of who should get it should be done by a SIP 
>>> server instead of an endpoint.
>>
>> I think call time is the right time.  For emergency calls, it 
>> isolates the decision making about how to process calls requiring 
>> text, sign language, foreign language, etc., to the ESInet and PSAPs, 
>> which I think is the right place.  The processing rules in the ESInet 
>> can then be changed without involving any carrier.  The capabilities 
>> of an entity may vary based on dynamic factors (time of day, load, 
>> etc.) so the decision as to how to support a need may be best made by 
>> the ESInet or PSAP in the case of an emergency call, or called party 
>> for non-emergency calls.  For example, at some times or under some 
>> loads, emergency calls may be routed to a specific PSAP that is not 
>> the geographically indicated one.  Likewise, a non-emergency call to 
>> a call center may be routed to a center in a country that has support 
>> for the language or media needed.
> The decision is of course made at call time. With the RFC 3840/3841 
> method, the different agents and services available register their 
> availability and capabilities when they go on duty, and unregister 
> when they stop, so that their information is available at call time.
>
>>
>> Further, it is often the case that the cost of relay, interpretation, 
>> or translation services is affected by which entity invokes the service.
> Yes, that is a complicating policy issue.
>>
>>
>>>>> *4. Problem that 3GPP specifies that only the UAs specify and act 
>>>>> on these parameters.*
>>>>> I think it is a problem that 3GPP inserted the restriction that 
>>>>> the language and modality negotiation shall concern only the 
>>>>> involved UAs.
>>>>> It would be more natural for a service provider between them to 
>>>>> detect the differences and make the decision to invoke a relay 
>>>>> service in the relay case.
>>>>> How do you propose to solve that? Let the service provider act as 
>>>>> a B2BUA, which can then behave as both a UA and a service provider?
>>>>
>>>> What do you mean by "service provider?"  In the case of a voice 
>>>> service provider such as a cellular carrier or a VoIP provider, I 
>>>> think this should be entirely transparent.  The voice service 
>>>> provider knows it is an emergency call and routes to an ESInet.  It 
>>>> is then up to the ESInet and the PSAPs to handle the call as they wish.
>>> It can be a service provider whose sole function is advanced call 
>>> invocation based on language preferences. The same types of 
>>> decisions, call connections, and assisting-service invocations are 
>>> needed in everyday calls as in emergency calls. But it can also be a 
>>> service provider for emergency services with which the user is 
>>> registered. Such a provider can make decisions on the call, e.g., 
>>> detect that it is an emergency call requiring an interpreter, and 
>>> therefore connect to both the PSAP and the interpreter at the same 
>>> time to save time.
>>
>> I think it's best to make these decisions at the end, not the 
>> middle.  In the case of emergency calls, the ESInet can route to a 
>> particular PSAP, the PSAP may bridge in translation or interpretation 
>> services, etc.  In the case of non-emergency calls, the call center 
>> may support some capabilities locally at some hours but route to a 
>> different call center at other times.
> The end is not decided until you have evaluated the alternative 
> possible ends and decided who has the right capability and preference.
>
>
>
> There is another issue with using SDP for decisions. SIP MESSAGE is 
> included in the set of methods to handle in emergency calls in RFC 
> 6443. It can be used within sessions to carry text messages if other 
> media are used as well. It is not a favored way to have text 
> communication, but it is possible. SIP MESSAGE carries no SDP.  I know 
> that the 3GPP sections about emergency calling in TS 22.101 point 
> towards using MSRP for text messaging, so it should not be an issue 
> for 3GPP.  Can we leave SIP MESSAGE out of the discussion and aim at 
> solving this only for real-time conversational media?  I am not urging 
> that we solve it for SIP MESSAGE; I just want to point out that 
> consequence of basing the mechanism on SDP.
>
> Will there be a possibility for remote participation on Thursday? I am 
> sorry I am not there, but I would like to participate if possible.
> /Gunnar
>
> ------------------------------------------------------------------------
> Gunnar Hellström
> Omnitor
> gunnar.hellstrom@omnitor.se
> +46708204288
> On 2013-03-11 16:57, Randall Gellens wrote:
>> [[[ resending without Cc list ]]]
>>
>> Hi Dale,
>>
>> At 11:00 AM -0500 2/25/13, Dale R. Worley wrote:
>>
>>>  (It's not clear to me what the proper mailing list is to discuss this
>>>  draft.  From the headers of the messages, it appears that the primary
>>>  list is ietf@ietf.org, but the first message in this thread about that
>>>  draft already has a "Re:" in the subject line, so the discussion
>>>  started somewhere else.)
>>
>> There has been some discussion among those listed in the CC header of 
>> this message.  I think the mmusic list is probably the right place to 
>> continue the discussion and was planning on doing so more formally 
>> with the next revision of the draft.
>>
>> By the way, the draft was updated and is now at -02: 
>> http://www.ietf.org/internet-drafts/draft-gellens-negotiating-human-language-02.txt
>>
>> There is a face-to-face discussion Thursday 11:30-1:00 at The 
>> Tropicale (the cafe in the Caribe Royal).  Please let me know if you 
>> can make it.
>>
>>>  (Also, it's not clear why Randall's messages are coming through in
>>>  HTML.)
>>
>> My apologies; I have gotten into the habit, when replying to messages 
>> that have style, of allowing Eudora to send my reply styled as well.
>>
>>
>>>  But onward to questions of substance:
>>>
>>>  - Why SDP and not SIP?
>>>
>>>  I'd like to see a more thorough exploration of why language
>>>  negotiation is to be handled in SDP rather than SIP.  (SIP, like HTTP,
>>>  uses the Content-Language header to specify languages.)  In principle,
>>>  specifying data that may be used in call-routing should be done in the
>>>  SIP layer, but it's well-accepted in the SIP world that call routing
>>>  may be affected by the SDP content as well (e.g., media types).
>>
>> I think it fits more naturally in SDP since the language is related 
>> to the media, e.g., English for audio and ASL for video.
>>
>>
>>>  And some discussion and comparison should be done with the SIP/HTTP
>>>  Content-Language header (used to specify the language of the
>>>  communications) and the SIP Accept-Language header (used to specify
>>>  the language of text components of SIP messages), particularly given
>>>  that Accept-Language has a different set of language specifiers and a
>>>  richer syntax for specifying preferences.  In any case, preference
>>>  should be given to reusing one of the existing syntaxes for specifying
>>>  language preferences.
>>
>> I think the semantics of Content-Language and Accept-Language are 
>> different from the semantics here, especially when setting up a 
>> session with, as an example, an audio stream using English and a 
>> video stream using ASL.  (But I can see clients using a default value 
>> to set both the SDP language attribute and the HTTP Content-Language, 
>> unless configured differently.)
>>
>> As for reusing existing mechanisms, the draft does contain two 
>> alternative proposals, one to re-use the existing 'language' SDP 
>> attribute, and one to define a new attribute.
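
[Concretely: alternative one would put the negotiation semantics on the
existing attribute, e.g. "a=lang:en", while alternative two would mint
a new attribute of the same shape -- e.g. a hypothetical
"a=humintlang:en" -- leaving RFC 4566 'lang' untouched as a purely
descriptive attribute.]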
>>
>>>  - Dependency between media descriptions?
>>>
>>>     Another example would be a user who is able to speak but is deaf or
>>>     hard-of-hearing and requires a voice stream plus a text stream
>>>     (known as voice carry over).  Making language a media attribute
>>>     allows the standard session negotiation mechanism to handle this by
>>>     providing the information and mechanism for the endpoints to make
>>>     appropriate decisions.
>>>
>>>  This scenario suggests that there might be dependency or interaction
>>>  between language specifications for different media descriptions.
>>>  Whether this is needed should be determined and documented.
>>>
>>>  - Specifying preference levels?
>>>
>>>     For example, some users may be able to speak several languages, but
>>>     have a preference.
>>>
>>>  This might argue for describing degrees of preference using "q"
>>>  parameters (as in the SIP Accept-Language header).
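
[For reference, Accept-Language reuses the RFC 2616 q-value mechanism,
with weights from 0 to 1; e.g.:

   Accept-Language: en, fr;q=0.4

where "en" defaults to q=1.0 and is thus preferred over French.]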
>>>
>>>  - Expressing multiple languages in answers
>>>
>>>     (While it is true that a conversation among multilingual people
>>>     often involves multiple languages, it does not seem useful enough
>>>     as a general facility to warrant complicating the desired semantics
>>>     of the SDP attribute to allow negotiation of multiple simultaneous
>>>     languages within an interactive media stream.)
>>>
>>>  Why shouldn't an answer be able to indicate multiple languages?  At
>>>  the least, this might provide the offerer with useful information.
>>
>> You raise good questions that I think need more discussion.  I am 
>> hoping to keep the work as simple as possible and not add additional 
>> complexity, which argues for not solving every aspect of the problem, 
>> but only those that must be solved immediately.
>>
>>>
>>>  - Reusing a=lang
>>>
>>>  Searching, I can only find these descriptions of the use of
>>>  "a=lang:...":
>>>
>>>      RFC 4566
>>>      draft-saintandre-sip-xmpp-chat
>>>      draft-gellens-negotiating-human-language
>>>
>>>  So it looks like "a=lang:..." is entirely unused at present and is
>>>  safe to redefine.
>>
>