Re: [MMUSIC] draft-gellens-negotiating-human-language-02

Francois Audet <francois.audet@skype.net> Mon, 11 March 2013 22:24 UTC

From: Francois Audet <francois.audet@skype.net>
To: Gunnar Hellstrom <gunnar.hellstrom@omnitor.se>, "mmusic@ietf.org" <mmusic@ietf.org>
Thread-Topic: [MMUSIC] draft-gellens-negotiating-human-language-02
Thread-Index: AQHOHqHkW+vZghnZxUGfZiQe0zOl0ZihDnRw
Date: Mon, 11 Mar 2013 22:23:50 +0000
Message-ID: <ba5960bc7f784aeab46571825f8ef969@DFM-DB3MBX15-06.exchange.corp.microsoft.com>
References: <p0624060ecd63af26fe28@dhcp-42ec.meeting.ietf.org> <513E504F.1010209@omnitor.se>
In-Reply-To: <513E504F.1010209@omnitor.se>
Accept-Language: en-US
Content-Language: en-US
Subject: Re: [MMUSIC] draft-gellens-negotiating-human-language-02
List-Id: Multiparty Multimedia Session Control Working Group <mmusic.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/mmusic>, <mailto:mmusic-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/mmusic>
List-Post: <mailto:mmusic@ietf.org>
List-Help: <mailto:mmusic-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/mmusic>, <mailto:mmusic-request@ietf.org?subject=subscribe>

I thought about this a while ago. I guess this is a contact center scenario.

While it would certainly be possible to express your caller preference using q-values, it seems the wrong way to approach the problem. Really, what you want is for the called resource to be in a specific language. Therefore, I think the URI itself should carry the information needed to map the call to the appropriate agent. It's similar to 1-800-help-product versus 1-800-aide-produit, but with a SIP URI. You could use a parameter, e.g., sip:help@example.com?lang=en, or it could be in the URI itself, e.g., sip:help?example.com/en or sip:en@example.com.
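(For illustration only: the q-value style I am setting aside is the kind of thing the SIP Accept-Language header expresses, e.g.

    Accept-Language: en;q=1.0, fr;q=0.5

whereas the URI approach bakes the language choice into the address itself, so no per-call preference negotiation is needed.)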

The calling party would consult the directory and select the desired language. This approach also eliminates the confusing "feature" of having an answer in a different language than you expect...

My 2 cents.

From: mmusic-bounces@ietf.org [mailto:mmusic-bounces@ietf.org] On Behalf Of Gunnar Hellstrom
Sent: Monday, March 11, 2013 2:45 PM
To: mmusic@ietf.org
Subject: Re: [MMUSIC] draft-gellens-negotiating-human-language-02

Before this discussion got its home in mmusic, we discussed topics quite similar to those you, Dale, brought up now.

It was about what needed to be expressed by the parameters and whether SDP or SIP was the right place. And, in the case of SIP, whether RFC 3840/3841 could be a suitable mechanism for routing and for decisions on the parameters.

Here is part of that discussion that we need to capture.


I see some complications that might need to be addressed in order to reflect reality. At the least they should be discussed.

And I am also seeing some different ways to specify it.

The complications to discuss are:

1. Level of preference.

There may be a need to specify levels of preference for languages.  I might strongly prefer to talk English, but have some useful capability in French. I want to convey both that preference and that capability, with the distinction between them, so that I get English whenever possible, but still get the call connected if English is not available at all but French is.

I would assume that two levels are sufficient, but that can be discussed:  Preferred and capable.


The draft already proposes that languages be listed in order of preference, which should handle the example you mention: you list English first and French second.  The called party selects English if it is capable and falls back to French if English is not and French is.  This seems much simpler and is a common way of handling situations where there is a preference.  It would be good to keep the mechanism as simple as possible.
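For illustration, with the draft's alternative of reusing the existing 'lang' attribute, that order could be expressed with multiple attribute lines, whose order RFC 4566 already defines as most important first (a sketch; port and payload type are placeholders):

    m=audio 49170 RTP/AVP 0
    a=lang:en
    a=lang:fr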
Yes, I am afraid of complicating this beyond the point where users no longer manage to get their settings right.
Still, I do not think that ordering alone is sufficient as a preference-level indicator. You may want to indicate capability for one modality but preference for another (as in my example: capability for ASL, but preference for talking and reading).

If you have a capability for ASL but preference for talking and reading, you could initially offer two media streams: a voice with English and a text with English.  If accepted, you have your preferred communications.  If those are rejected you could then offer video with ASL.  Would that handle the case?
No, video is still very valuable for assessing an emergency, or for seeing a friend. So if you support it, you want to offer it. But the decision on languages and modalities may end up with video not being important for language communication.

2. Directionality
There is a need for a direction on the language preference: "transmit, receive or both", or "produce, perceive or both". That is easy to understand from the relay service examples.
A hard-of-hearing user may declare:

Text, capable, produce, English
Text, prefer, perceive, English
Audio, prefer, produce, English
Audio, capable, perceive, English  (tricky: a typical hard-of-hearing user may benefit from receiving audio even though it is not usable enough for reliable perception. I do not want to make this endlessly complex, but I see a need for refined expressions here)
Video, capable, both, ASL

This should be understood to mean that the user prefers to speak and get text back, and benefits from getting voice in parallel with the text.  ASL signing can be an alternative if the other party has a corresponding capability or preference.


The draft does support this (and even mentions some of these specific uses) because it proposes an SDP media attribute, and media can be specified to be send, receive, or both.
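For comparison, here is a sketch of what the direction attributes can express for the hard-of-hearing example, again reusing 'lang' (ports and payload types are placeholders): the text stream is marked receive-only and the audio stream send-only from the offerer's point of view.

    m=text 49172 RTP/AVP 98
    a=rtpmap:98 t140/1000
    a=lang:en
    a=recvonly
    m=audio 49170 RTP/AVP 0
    a=lang:en
    a=sendonly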
No, that is not the same. You want the media to flow, but with the parameter you want to indicate your preference for how to use it.  You do not want to turn off incoming audio just because you prefer to talk but read text.

Yes, I see, thanks for the clarification.  Does this need to be part of the session setup?  If you establish all media streams that you wish to use, can you then just use them as you prefer?  I will consult with the NENA accessibility committee on this.
No, there are specific services that provide service in one direction but not the other. The information is needed to decide which assisting service to invoke. One such service is captioned telephony, which adds rapidly produced speech-to-text in parallel with the voice. They provide just that. A user will have a very strong preference for that service, but could accept, with much lower preference, a direct conversation with the far end in combined text and voice.



I think it would be useful to move most of the introduction to a structured use-case chapter and express the different cases according to a template. That can then be used to test whether proposed approaches will work.

I'm not sure I fully understand what you mean by "structured" in "structured use case" or "template."  Can you be more specific?
I mean just a simple template for how the use case descriptions are written.

E.g.
A title indicating what case we have.
Description of the calling user and its capabilities and preferences.
Description of the answering user and its capabilities and preferences.
Description of a possible assisting service and its capabilities and preferences.
Description of the calling user's indications.
Description of the answering user's indications.
The resulting decision and outcome.

3. Specify language and modality at the SIP media feature tag level instead.
There could be some benefits to declaring these parameters at the SIP media feature tag level instead of the SDP level.
A call center can then register its capabilities already at SIP REGISTER time, and the caller preferences / callee capabilities mechanism from RFC 3840/3841 can be used to select modalities and languages and route the call to the most capable person, or combination of person and assisting interpreter.
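For a rough illustration of what that could look like, assuming the 'language' feature tag from the RFC 3840 grammar carries the language list (exactly how modality and preference level would be encoded alongside it is the open question), a call center agent might register and a caller might express a preference roughly like this:

    REGISTER sip:example.com SIP/2.0
    Contact: <sip:agent42@pc.example.com>;audio;text;language="en,fr"

    INVITE sip:help@example.com SIP/2.0
    Accept-Contact: *;language="fr";require;explicit

A proxy implementing RFC 3841 can then prefer registered contacts whose feature sets match the caller's stated preferences.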

Maybe, but one advantage of using SDP is that the ESInet can take language and media needs into account during policy-based routing.  For example, in some European countries emergency calls placed by a speaker of language x in country y may be routed to a PSAP in a country where x is the native language.  Or, there might be regional or national text relay or sign language interpreter services as opposed to PSAP-level capabilities.
Is there a complete specification for how policy-based routing is thought to work? Where?
Does it not use RFC 3840/3841?
That procedure is already supported by SIP servers. Using SDP requires new SIP server programming.

NENA has a document under development.  I thought it was able to take SDP into account but I'll look into it, and I'm sure Brian will have something to say.
Yes, I think I have seen that. But it needs to come into IETF to be possible to refer to.



But, on the other hand, we then need a separate specification of what modality the parameters indicate, because the language tags only distinguish between signed and other languages, and "other" seems to mean either spoken or written without any distinction.


The SDP media already indicates the type (audio, video, text).
Yes, convenient. But there is no knowledge of the parameters until call time. It could be better to know the callee capabilities in advance, if available. Then middle boxes can do the routing instead of the far end. There may be many terminals competing for the call, and the comparison of who should get it should be done by a SIP server instead of an endpoint.

I think call time is the right time.  For emergency calls, it isolates the decision making about how to process calls requiring text, sign language, foreign language, etc. to the ESInet and PSAPs, which I think is the right place.  The processing rules in the ESInet can then be changed without involving any carrier.  The capabilities of an entity may vary based on dynamic factors (time of day, load, etc.) so the decision as to how to support a need may be best made by the ESInet or PSAP in the case of an emergency call, or by the called party for non-emergency calls.  For example, at some times or under some loads, emergency calls may be routed to a specific PSAP that is not the geographically indicated one.  Likewise, a non-emergency call to a call center may be routed to a center in a country that has support for the language or media needed.
The decision is of course made at call time. With the RFC 3840/3841 method, the different agents and services available register their availability and capabilities when they go on duty, and unregister when they stop, so that their information is available at call time.



Further, it is often the case that the cost of relay, interpretation, or translation services is affected by which entity invokes the service.
Yes, that is a complicating policy issue.



4. Problem: 3GPP specifies that only the UAs specify and act on these parameters.
I think it is a problem that 3GPP inserted the restriction that the language and modality negotiation shall concern only the involved UAs.
It would be more natural for a service provider between them to detect the differences and make the decision to invoke a relay service in the relay case.
How do you propose to solve that? Let the service provider behave as a B2BUA, who then can behave as both a UA and a service provider?

What do you mean by "service provider?"  In the case of a voice service provider such as a cellular carrier or a VoIP provider, I think this should be entirely transparent.  The voice service provider knows it is an emergency call and routes to an ESInet.  It is then up to the ESInet and the PSAPs to handle the call as they wish.
It can be a service provider whose only function is advanced call invocation based on language preferences. The same types of decisions, call connections, and assisting-service invocations are needed in everyday calls as in emergency calls. But it can also be a service provider for emergency services with which the user is registered. They can make decisions on the call, e.g., detect that it is an emergency call requiring an interpreter and therefore connect to both the PSAP and the interpreter at the same time to save time.

I think it's best to make these decisions at the end, not the middle.  In the case of emergency calls, the ESInet can route to a particular PSAP, the PSAP may bridge in translation or interpretation services, etc.  In the case of non-emergency calls, the call center may support some capabilities locally at some hours but route to a different call center at other times.
The end is not decided until you have evaluated the alternative possible ends and decided who has the right capability and preference.



There is another issue with using SDP for decisions. SIP MESSAGE is included in the set of methods to handle in emergency calls in RFC 6443. It can be used within sessions to carry text messages if other media are used as well. It is not a favored way to have text communication, but it is possible. SIP MESSAGE has no SDP.  I know that the 3GPP sections about emergency calling in TS 22.101 point towards using MSRP for text messaging, so it should not be an issue for 3GPP. Can we exclude SIP MESSAGE from the discussion and aim at solving this only for real-time conversational media?  I do not urge solving it for SIP MESSAGE; I just wanted to point out that consequence of basing the mechanism on SDP.

Will there be a possibility for remote participation on Thursday? I am sorry I am not there, but I would like to participate if possible.
/Gunnar
________________________________
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46708204288
On 2013-03-11 16:57, Randall Gellens wrote:
[[[ resending without Cc list ]]]

Hi Dale,

At 11:00 AM -0500 2/25/13, Dale R. Worley wrote:


 (It's not clear to me what the proper mailing list is to discuss this
 draft.  From the headers of the messages, it appears that the primary
 list is ietf@ietf.org, but the first message in this thread about that
 draft already has a "Re:" in the subject line, so the discussion
 started somewhere else.)

There has been some discussion among those listed in the CC header of this message.  I think the mmusic list is probably the right place to continue the discussion and was planning on doing so more formally with the next revision of the draft.

By the way, the draft was updated and is now at -02: http://www.ietf.org/internet-drafts/draft-gellens-negotiating-human-language-02.txt

There is a face-to-face discussion Thursday 11:30-1:00 at The Tropicale (the cafe in the Caribe Royal).  Please let me know if you can make it.


 (Also, it's not clear why Randall's messages are coming through in
 HTML.)

My apologies; I have gotten into the habit, when replying to messages that have style, of allowing Eudora to send my reply styled as well.



 But onward to questions of substance:

 - Why SDP and not SIP?

 I'd like to see a more thorough exploration of why language
 negotiation is to be handled in SDP rather than SIP.  (SIP, like HTTP,
 uses the Content-Language header to specify languages.)  In principle,
 specifying data that may be used in call-routing should be done in the
 SIP layer, but it's well-accepted in the SIP world that call routing
 may be affected by the SDP content as well (e.g., media types).

I think it fits more naturally in SDP since the language is related to the media, e.g., English for audio and ASL for video.



 And some discussion and comparison should be done with the SIP/HTTP
 Content-Language header (used to specify the language of the
 communications) and the SIP Accept-Language header (used to specify
 the language of text components of SIP messages), particularly given
 that Accept-Language has a different set of language specifiers and a
 richer syntax for specifying preferences.  In any case, preference
 should be given to reusing one of the existing syntaxes for specifying
 language preferences.

I think the semantics of Content-Language and Accept-Language are different from the semantics here, especially when setting up a session with, as an example, an audio stream using English and a video stream using ASL.  (But I can see clients using a default value to set both the SDP language attribute and the HTTP Content-Language, unless configured differently.)

As for reusing existing mechanisms, the draft does contain two alternative proposals: one to re-use the existing 'lang' SDP attribute, and one to define a new attribute.
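For concreteness, the re-use alternative applied to the audio-English / video-ASL example above might look roughly like this (a sketch; ports and payload types are placeholders, and 'ase' is the language subtag for American Sign Language):

    m=audio 49170 RTP/AVP 0
    a=lang:en
    m=video 51372 RTP/AVP 96
    a=rtpmap:96 H264/90000
    a=lang:ase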


 - Dependency between media descriptions?

    Another example would be a user who is able to speak but is deaf or
    hard-of-hearing and requires a voice stream plus a text stream
    (known as voice carry over).  Making language a media attribute
    allows the standard session negotiation mechanism to handle this by
    providing the information and mechanism for the endpoints to make
    appropriate decisions.

 This scenario suggests that there might be dependency or interaction
 between language specifications for different media descriptions.
 Whether this is needed should be determined and documented.

 - Specifying preference levels?

    For example, some users may be able to speak several languages, but
    have a preference.

 This might argue for describing degrees of preference using "q"
 parameters (as in the SIP Accept-Language header).

 - Expressing multiple languages in answers

    (While it is true that a conversation among multilingual people
    often involves multiple languages, it does not seem useful enough
    as a general facility to warrant complicating the desired semantics
    of the SDP attribute to allow negotiation of multiple simultaneous
    languages within an interactive media stream.)

 Why shouldn't an answer be able to indicate multiple languages?  At
 the least, this might provide the offerer with useful information.

You raise good questions that I think need more discussion.  I am hoping to keep the work as simple as possible and not add additional complexity, which argues for not solving every aspect of the problem, but only those that must be solved immediately.



 - Reusing a=lang

 Searching, I can only find these descriptions of the use of
 "a=lang:...":

     RFC 4566
     draft-saintandre-sip-xmpp-chat
     draft-gellens-negotiating-human-language

 So it looks like "a=lang:..." is entirely unused at the present and is
 safe to be redefined.