Re: [MMUSIC] draft-gellens-negotiating-human-language-02

Randall Gellens <> Thu, 14 March 2013 14:41 UTC

Date: Thu, 14 Mar 2013 07:41:37 -0700
To: Flemming Andreasen <>, Gunnar Hellstrom <>
From: Randall Gellens <>

At 9:56 AM -0400 3/14/13, Flemming Andreasen wrote:

>  On 3/11/13 5:44 PM, Gunnar Hellstrom wrote:
>>  Before this discussion got its home in mmusic, 
>> we discussed topics quite similar to the ones 
>> you, Dale, brought up now.
>>  It was about what needed to be expressed by 
>> the parameters, and whether SDP or SIP was the 
>> right place. And in the case of SIP, whether 
>> RFC 3840/3841 could be a suitable mechanism for 
>> routing and decisions on the parameters.
>  I agree more discussion is needed on this. 
> There seem to be two problems considered in 
> the draft:
>  1) Routing of a request to an answerer that has 
> the language capabilities the caller desires.
>  2) Negotiation of the language properties to 
> use on a per-stream basis once the call has 
> been routed to a particular answerer.
>  Problem 1 seems to fall in the RFC 3840/3841 
> space, whereas problem 2 is more of an SDP 
> issue.
>  -- Flemming

Language and media need to be considered together 
when choosing how to process a call. For example, 
a call requesting video with ASL, or text in 
English, may be handled differently than a call 
that requests only voice with English.
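
A minimal sketch of what such an offer could carry (addresses, ports, and payload types are placeholders; 'ase' is used here as an illustrative language tag for American Sign Language; the 'a=lang' attribute is one of the two alternatives the draft discusses):

```
v=0
o=- 20518 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=video 49170 RTP/AVP 96
a=rtpmap:96 H264/90000
a=lang:ase
m=text 49172 RTP/AVP 98
a=rtpmap:98 t140/1000
a=lang:en
```

An ESInet or PSAP seeing this offer has both the media types and the languages available when deciding how to handle the call.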

>>  Here is part of that discussion that we need to capture.
>>  I see some complications that might be needed 
>> in order to reflect reality. At least they 
>> should be discussed.
>>  And I am also seeing some different ways to specify it.
>>  The complications to discuss are:
>>  1. Level of preference.
>>  There may be a need for specifying levels of 
>> preference for languages.  I might strongly 
>> prefer to talk English, but have some useful 
>> capability in French. I want to display that 
>> preference and that capability with that 
>> difference, so that I get English whenever 
>> possible, but get the call connected even if 
>> English is not available at all but French is.
>>  I would assume that two levels are sufficient, 
>> but that can be discussed:  Preferred and 
>> capable.
>>>  The draft already proposes that languages be 
>>> listed in order of preference, which should 
>>> handle the example you mention: you list 
>>> English first and French second.  The called 
>>> party selects English if it is capable and 
>>> falls back to French if English is not and 
>>> French is.  This seems much simpler and is a 
>>> common way of handling situations where there 
>>> is a preference.  It would be good to keep 
>>> the mechanism as simple as possible.
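
As a sketch of that preference-order proposal (assuming, per the draft, that repeated language attributes on a media description are read in descending order of preference; port and payload numbers are placeholders):

```
m=audio 49170 RTP/AVP 0
a=lang:en
a=lang:fr
```

The answerer would select English if it is capable of it, and fall back to French otherwise.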
>>>  Yes, I am afraid of complicating this beyond 
>>> the point where users no longer manage to get 
>>> their settings right.
>>>  Still, I do not think that order alone is 
>>> sufficient as a level-of-preference indicator. 
>>> You may want to indicate capability for one 
>>> modality but preference for another (as in my 
>>> example: capability for ASL, but preference 
>>> for talking and reading).
>>  If you have a capability for ASL but 
>> preference for talking and reading, you could 
>> initially offer two media streams: a voice 
>> with English and a text with English.  If 
>> accepted, you have your preferred 
>> communications.  If those are rejected you 
>> could then offer video with ASL.  Would that 
>> handle the case?
>>  No, video is still very valuable for judging 
>> the emergency case, or for seeing a friend. So 
>> if you support it, you want to offer it. But 
>> the decision on languages and modalities may 
>> end up with video not being important for 
>> language communication.
>>>>>>  2. Directionality
>>>>>>  There is a need for a direction of the 
>>>>>> language preference. "Transmit, receive or 
>>>>>> both". or   "Produce, perceive or both". 
>>>>>> That is easy to understand for the relay 
>>>>>> service examples.
>>>>>>  A hard-of-hearing user may declare:
>>>>>>  Text, capable, produce, English
>>>>>>  Text, prefer, perceive, English
>>>>>>  Audio, prefer, produce, English
>>>>>>  Audio, capable, perceive, English 
>>>>>> (tricky: a typical hard-of-hearing user may 
>>>>>> benefit from receiving audio, while it is 
>>>>>> not usable enough for reliable perception. 
>>>>>> I do not want to make this eternally 
>>>>>> complex, but I see a need for refined 
>>>>>> expressions here)
>>>>>>  video, capable, both, ASL 
>>>>>>  This should be understood as: the user 
>>>>>> prefers to speak and get text back, and 
>>>>>> benefits from getting voice in parallel 
>>>>>> with text.  ASL signing can be an 
>>>>>> alternative if the other party has a 
>>>>>> corresponding capability or preference.
>>>>>  The draft does support this (and even 
>>>>> mentions some of these specific uses) 
>>>>> because it proposes an SDP media attribute, 
>>>>> and media can be specified to be send, 
>>>>> receive, or both.
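
A sketch of how direction is already expressed per stream in SDP (placeholders as above): a caller who speaks English but reads text back could mark the streams as

```
m=audio 49170 RTP/AVP 0
a=sendonly
a=lang:en
m=text 49172 RTP/AVP 98
a=recvonly
a=lang:en
```

Note that these attributes govern the media flow itself, which is the limitation Gunnar raises next.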
>>>>  No, that is not the same. You want the media 
>>>> to flow, but by the parameter you want to 
>>>> indicate your preference for how to use it. 
>>>> You do not want to turn off incoming audio 
>>>> just because you prefer to talk but read 
>>>> text.
>>>  Yes, I see, thanks for the clarification. 
>>> Does this need to be part of the session 
>>> setup?  If you establish all media streams 
>>> that you wish to use, can you then just use 
>>> them as you prefer?  I will consult with the 
>>> NENA accessibility committee on this.
>>  No, there are specific services that provide 
>> service in one direction but not the other. 
>> The information is needed to decide what 
>> assisting service to invoke. One such service 
>> is captioned telephony, which adds rapidly 
>> created speech-to-text in parallel with the 
>> voice. They provide just that. A user may have 
>> a very strong preference for getting exactly 
>> that service, but could accept, with much 
>> lower preference, a direct conversation with 
>> the far end in combined text and voice.
>>>>>>  I think it would be useful to move most 
>>>>>> of the introduction to a structured 
>>>>>> use-case chapter and express the different 
>>>>>> cases according to a template. That can 
>>>>>> then be used to test whether proposed 
>>>>>> approaches will work.
>>>>>  I'm not sure I fully understand what you 
>>>>> mean by "structured" in "structured use 
>>>>> case" or "template."  Can you be more 
>>>>> specific?
>>>>  I mean just a simple template for how the 
>>>> use case descriptions are written.
>>>>  E.g.
>>>>  A title indicating what case we have.
>>>>  Description of the calling user and its capabilities and preferences.
>>>>  Description of the answering user and its capabilities and preferences.
>>>>  Description of a possible assisting service 
>>>> and its capabilities and preferences.
>>>>  Description of the calling user's indications.
>>>>  Description of the answering user's indications.
>>>>  The resulting decision and outcome.
>>>>>>  3.  Specify language and modality at the 
>>>>>> SIP media tag level instead.
>>>>>>  There could be some benefits to declaring 
>>>>>> these parameters at the SIP media tag 
>>>>>> level instead of the SDP level.
>>>>>>  A call center can then register its 
>>>>>> capabilities already at SIP REGISTER 
>>>>>> time, and the caller-preferences / 
>>>>>> callee-capabilities mechanism from RFC 
>>>>>> 3840/3841 can be used to select modalities 
>>>>>> and languages and route the call to the 
>>>>>> best capable person, or combination of 
>>>>>> person and assisting interpreting service.
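
As a hedged sketch of that mechanism (hypothetical addresses; most headers omitted; 'ase' again stands in as an illustrative sign-language tag): RFC 3840 defines 'language', 'video', and 'text' media feature tags that an agent can register, and RFC 3841 lets a caller needing ASL steer routing with Accept-Contact:

```
REGISTER sip:example.com SIP/2.0
Contact: <sip:agent@198.51.100.7>
         ;language="en,ase";video;text

INVITE sip:psap@example.com SIP/2.0
Accept-Contact: *;language="ase";video;require;explicit
```

A proxy applying RFC 3841 matching would then prefer registered contacts whose feature tags cover ASL over video.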
>>>>>  Maybe, but one advantage of using SDP is 
>>>>> that the ESInet can take language and media 
>>>>> needs into account during policy-based 
>>>>> routing.  For example, in some European 
>>>>> countries emergency calls placed by a 
>>>>> speaker of language x in country y may be 
>>>>> routed to a PSAP in a country where x is 
>>>>> the native language.  Or, there might be 
>>>>> regional or national text relay or sign 
>>>>> language interpreter services as opposed to 
>>>>> PSAP-level capabilities.
>>>>  Is there a complete specification for how 
>>>> policy-based routing is thought to work? 
>>>> Where?
>>>>  Does it not use RFC 3840/3841?
>>>>  That procedure is already supported by SIP 
>>>> servers. Using SDP requires new SIP server 
>>>> programming.
>>>  NENA has a document under development.  I 
>>> thought it was able to take SDP into account 
>>> but I'll look into it, and I'm sure Brian 
>>> will have something to say.
>>  Yes, I think I have seen that. But it needs to 
>> come into the IETF to be possible to refer to.
>>>>>>  But, on the other hand, then we need a 
>>>>>> separate specification of what modality 
>>>>>> the parameters indicate, because the 
>>>>>> language tags only distinguish between 
>>>>>> signed and other languages, and "other" 
>>>>>> seems to mean either spoken or written 
>>>>>> without any difference.
>>>>>  The SDP media already indicates the type (audio, video, text).
>>>>  Yes, convenient. But there is no knowledge 
>>>> of the parameters until call time. It could 
>>>> be better to know the callee capabilities in 
>>>> advance, if available. Then middleboxes can 
>>>> do the routing instead of the far end. There 
>>>> may be many terminals competing for the 
>>>> call, and the comparison of who should get 
>>>> it should be done by a SIP server instead of 
>>>> an endpoint.
>>>  I think call time is the right time.  For 
>>> emergency calls, it isolates the decision 
>>> making about how to process calls requiring 
>>> text, sign language, foreign language, etc. 
>>> to the ESInet and PSAPs, which is I think the 
>>> right place.  The processing rules in the 
>>> ESInet can then be changed without involving 
>>> any carrier.  The capabilities of an entity 
>>> may vary based on dynamic factors (time of 
>>> day, load, etc.) so the decision as to how to 
>>> support a need may be best made by the ESInet 
>>> or PSAP in the case of an emergency call, or 
>>> called party for non-emergency calls.  For 
>>> example, at some times or under some loads, 
>>> emergency calls may be routed to a specific 
>>> PSAP that is not the geographically indicated 
>>> one.  Likewise, a non-emergency call to a 
>>> call center may be routed to a center in a 
>>> country that has support for the language or 
>>> media needed.
>>  The decision is of course made at call time. 
>> With the RFC 3840/3841 method, the different 
>> agents and services available register their 
>> availability and capabilities when they go on 
>> duty, and unregister when they stop, so that 
>> their information is available at call time.
>>>  Further, it is often the case that the cost 
>>> of relay, interpretation, or translation 
>>> services is affected by which entity invokes 
>>> the service.
>>  Yes, that is a complicating policy issue.
>>>>>>  4. Problem that 3GPP specifies that it is 
>>>>>> the UAs only who specify and act on these 
>>>>>> parameters.
>>>>>>  I think it is a problem that 3GPP inserted 
>>>>>> the restriction that the language and 
>>>>>> modality negotiation shall only bother the 
>>>>>> involved UAs.
>>>>>>  It would be more natural for a service 
>>>>>> provider between them to detect the 
>>>>>> differences and make the decision to 
>>>>>> invoke a relay service in the relay case.
>>>>>>  How do you propose to solve that? Let the 
>>>>>> service provider behave as a B2BUA, who 
>>>>>> then can behave as both a UA and a service 
>>>>>> provider?
>>>>>  What do you mean by "service provider?"  In 
>>>>> the case of a voice service provider such 
>>>>> as a cellular carrier or a VoIP provider, I 
>>>>> think this should be entirely transparent. 
>>>>> The voice service provider knows it is an 
>>>>> emergency call and routes to an ESInet.  It 
>>>>> is then up to the ESInet and the PSAPs to 
>>>>> handle the call as they wish.
>>>>  It can be a service provider for just the 
>>>> function of making advanced call invocation 
>>>> based on language preferences. The same 
>>>> types of decisions, call connections, and 
>>>> assisting-service invocations are needed in 
>>>> everyday calls as in emergency calls. But it 
>>>> can also be a service provider for emergency 
>>>> services with which the user is registered. 
>>>> They can make decisions on the call, e.g., 
>>>> detect that it is an emergency call 
>>>> requiring an interpreter, and therefore 
>>>> connect to both the PSAP and the interpreter 
>>>> at the same time to save time.
>>>  I think it's best to make these decisions at 
>>> the end, not the middle.  In the case of 
>>> emergency calls, the ESInet can route to a 
>>> particular PSAP, the PSAP may bridge in 
>>> translation or interpretation services, etc. 
>>> In the case of non-emergency calls, the call 
>>> center may support some capabilities locally 
>>> at some hours but route to a different call 
>>> center at other times.
>>  The end is not decided until you have 
>> evaluated the alternative possible ends and 
>> decided who has the right capability and 
>> preference.
>>  There is another issue with using SDP for 
>> decisions. SIP MESSAGE is included in the set 
>> of methods to handle in emergency calls in RFC 
>> 6443. It can be used within sessions to carry 
>> text messages if other media are used as well. 
>> It is not the favored way to have text 
>> communication, but it is possible. SIP MESSAGE 
>> has no SDP.  I know that the 3GPP sections 
>> about emergency calling in TS 22.101 point 
>> towards using MSRP for text messaging, so it 
>> should not be an issue for 3GPP. Can we leave 
>> SIP MESSAGE out of the discussion and aim at 
>> solving this only for real-time conversational 
>> media?  I do not urge solving it for SIP 
>> MESSAGE; I just wanted to point out that 
>> consequence of basing the mechanism on SDP.
>>  Will there be a possibility for remote 
>> participation on Thursday? I am sorry I am not 
>> there, but would like to participate if 
>> possible.
>>  /Gunnar
>>  Gunnar Hellström
>>  Omnitor
>>  <>
>>  +46708204288
>>  On 2013-03-11 16:57, Randall Gellens wrote:
>>>  [[[ resending without Cc list ]]]
>>>  Hi Dale,
>>>  At 11:00 AM -0500 2/25/13, Dale R. Worley wrote:
>>>>   (It's not clear to me what the proper mailing list is to discuss this
>>>>   draft.  From the headers of the messages, it appears that the primary
>>>>   list is 
>>>> <>, but the 
>>>> first message in this thread about that
>>>>   draft already has a "Re:" in the subject line, so the discussion
>>>>   started somewhere else.)
>>>  There has been some discussion among those 
>>> listed in the CC header of this message.  I 
>>> think the mmusic list is probably the right 
>>> place to continue the discussion and was 
>>> planning on doing so more formally with the 
>>> next revision of the draft.
>>>  By the way, the draft was updated and is now 
>>> at -02: 
>>> <>
>>>  There is a face-to-face discussion Thursday 
>>> 11:30-1:00 at The Tropicale (the cafe in the 
>>> Caribe Royal).  Please let me know if you can 
>>> make it.
>>>>   (Also, it's not clear why Randall's messages are coming through in
>>>>   HTML.)
>>>  My apologies; I have gotten in the habit when 
>>> replying to messages that have style to allow 
>>> Eudora to send my reply styled as well.
>>>>   But onward to questions of substance:
>>>>   - Why SDP and not SIP?
>>>>   I'd like to see a more thorough exploration of why language
>>>>   negotiation is to be handled in SDP rather than SIP.  (SIP, like HTTP,
>>>>   uses the Content-Language header to specify languages.)  In principle,
>>>>   specifying data that may be used in call-routing should be done in the
>>>>   SIP layer, but it's well-accepted in the SIP world that call routing
>>>>   may be affected by the SDP content as well (e.g., media types).
>>>  I think it fits more naturally in SDP since 
>>> the language is related to the media, e.g., 
>>> English for audio and ASL for video.
>>>>   And some discussion and comparison should be done with the SIP/HTTP
>>>>   Content-Language header (used to specify the language of the
>>>>   communications) and the SIP Accept-Language header (used to specify
>>>>   the language of text components of SIP messages), particularly given
>>>>   that Accept-Language has a different set of language specifiers and a
>>>>   richer syntax for specifying preferences.  In any case, preference
>>>>   should be given to reusing one of the existing syntaxes for specifying
>>>>   language preferences.
>>>  I think the semantics of Content-Language and 
>>> Accept-Language are different from the 
>>> semantics here, especially when setting up a 
>>> session with, as an example, an audio stream 
>>> using English and a video stream using ASL. 
>>> (But I can see clients using a default value 
>>> to set both the SDP language attribute and 
>>> the HTTP Content-Language, unless configured 
>>> differently.)
>>>  As for reusing existing mechanisms, the draft 
>>> does contain two alternative proposals, one 
>>> to re-use the existing 'language' SDP 
>>> attribute, and one to define a new attribute.
>>>>   - Dependency between media descriptions?
>>>>      Another example would be a user who is able to speak but is deaf or
>>>>      hard-of-hearing and requires a voice stream plus a text stream
>>>>      (known as voice carry over).  Making language a media attribute
>>>>      allows the standard session negotiation mechanism to handle this by
>>>>      providing the information and mechanism for the endpoints to make
>>>>      appropriate decisions.
>>>>   This scenario suggests that there might be dependency or interaction
>>>>   between language specifications for different media descriptions.
>>>>   Whether this is needed should be determined and documented.
>>>>   - Specifying preference levels?
>>>>      For example, some users may be able to speak several languages, but
>>>>      have a preference.
>>>>   This might argue for describing degrees of preference using "q"
>>>>   parameters (as in the SIP Accept-Language header).
>>>>   - Expressing multiple languages in answers
>>>>      (While it is true that a conversation among multilingual people
>>>>      often involves multiple languages, it does not seem useful enough
>>>>      as a general facility to warrant complicating the desired semantics
>>>>      of the SDP attribute to allow negotiation of multiple simultaneous
>>>>      languages within an interactive media stream.)
>>>>   Why shouldn't an answer be able to indicate multiple languages?  At
>>>>   the least, this might provide the offerer with useful information.
>>>  You raise good questions that I think need 
>>> more discussion.  I am hoping to keep the 
>>> work as simple as possible and not add 
>>> additional complexity, which argues for not 
>>> solving every aspect of the problem, but only 
>>> those that must be solved immediately.
>>>>   - Reusing a=lang
>>>>   Searching, I can only find these descriptions of the use of
>>>>   "a=lang:...":
>>>>       RFC 4566
>>>>       draft-saintandre-sip-xmpp-chat
>>>>       draft-gellens-negotiating-human-language
>>>>   So it looks like "a=lang:..." is entirely unused at the present and is
>>>>   safe to be redefined.
>>  _______________________________________________
>>  mmusic mailing list
>>  <>
>> <>

Randall Gellens
Opinions are personal;    facts are suspect;    I speak for myself only
-------------- Randomly selected tag: ---------------
A diva who specializes in risque arias is an off-coloratura soprano...