Re: [MMUSIC] draft-gellens-mmusic-negotiating-human-language-01

Gunnar Hellstrom <gunnar.hellstrom@omnitor.se> Tue, 15 October 2013 02:45 UTC


Randall and all,

Attached is a version of the draft on negotiating human language, with an 
added discussion of using Caller Preferences/Callee Capabilities versus 
SDP, including a proposed encoding based on RFC 3840/3841.

I am afraid this is getting a bit complex, both the SDP specification 
and the SIP one, but I see this structure as necessary to express all 
kinds of user needs and relay-service types.

I also realize that the declarations in SDP about media capabilities are 
a good source of information for routing decisions, so a merge of the 
SDP- and SIP-based methods might be of interest.

For example, it does not make sense to send a call with video to a 
callee that cannot handle video.

Sorry for not providing a nicely formatted draft.
Use cases have been added, along with the sections on encoding according 
to RFC 3840 and 3841.

RFC 3840/3841 include a concept of implicit capabilities. If that could 
be extended to cover the contents of SDP, we could possibly combine the 
SIP- and SDP-based versions into something workable.

You have mentioned a couple of times that we could take a simple 
approach and not complicate matters with asymmetric capabilities or 
preferences in different directions. I tend to agree that this must 
still be an option. A coarse selection of service could be done anyway; 
selecting the exact performance of a relay service, including the 
asymmetric aspects, would then need to be handled in some other way.

Regards
Gunnar
------------------------------------------------------------------------

On 2013-07-24 02:48, Gunnar Hellstrom wrote:
> On 2013-07-24 02:05, Paul Kyzivat wrote:
>> On 7/23/13 6:35 PM, Gunnar Hellstrom wrote:
>>>
>>> On 2013-07-23 20:00, Paul Kyzivat wrote:
>>>> I just reviewed this new version. I have a number of comments:
>>> Good comments, Paul.
>>>>
>>>> * Section 5:
>>>>
>>>> You don't go into the details of the O/A rules. But I infer you assume
>>>> that the answer will be a subset of the offer, with the priorities
>>>> possibly changed. This is a common strategy with SDP O/A, but it is
>>>> limiting because it doesn't allow the answerer to indicate
>>>> capabilities it has that weren't mentioned by the offerer. Knowing
>>>> that could be useful, especially if a mismatch is going to result in
>>>> recruiting a relay service.
>>>>
>>>> I suggest that the answer allow the answerer to indicate all of its
>>>> capabilities and preferences, and at the same time indicate the
>>>> choices made (if a choice was made).
>>> <GH>Yes, very useful to indicate all.
>>>>
>>>> * Section 7.1:
>>>>
>>>> Instead of humanintlang-send and humanintlang-recv, how about just
>>>> giving the lang tag and a send/recv/sendrecv parameter with it? This
>>>> would be more concise when sendrecv can be used.
>>>>
>>>> Suggest using a parameter (q) to indicate relative preference for
>>>> languages, rather than ordering. This can show degree of preference,
>>>> and can show when multiple languages have the same preference.
>>>>
>>> <GH>Have you seen my request for an indication of the level of preference
>>> between the different modalities? A possibility to indicate, for 
>>> example,
>>> that British Sign Language is highly preferred, but English text is 
>>> possible.
>>> The q-values could be used to encode such a relative level of
>>> preference even between modalities: a q=1 for BSL in video, and a q=0.1
>>> for English in text. Right?
>>
>> That would work within a single medium.
>>
>> But I don't think we can use that mechanism to trade off between 
>> different media. For that I think some other, more complex, mechanism 
>> would be needed.
>>
>> Or else we would need a more complex definition of which things are 
>> being prioritized against one another in order to use the q-values.
> I proposed a simpler mechanism in a recent mail. It was to insert a 
> character at the end of the humintlang tag: "!" for highly preferred, 
> nothing for manageable, and "#" for
> a non-preferred but possible fall-back language and modality.
>
> I think it will be very important for most users to have a chance to 
> indicate this three-level preference order between different media as 
> well as between different languages within one medium.
>
> I am a bit hesitant about the type of coding I proposed, and saw your 
> q-value proposal as something cleaner. When coding, you could make 
> sure that the last-resort medium gets q-values under 0.4, the strongly 
> preferred between 0.7 and 1, and the manageable ones between 0.4 and 0.6.
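The q-value banding suggested in the quoted paragraph (last resort under 0.4, manageable 0.4-0.6, strongly preferred 0.7-1) could be mechanized roughly as below. The band edges and representative values follow the mail's suggestion only; nothing here is standardized.

```python
# Map the proposed three-level preference scheme onto q-value bands:
# strongly preferred 0.7-1.0, manageable 0.4-0.6, last resort below 0.4.
# Band edges and representative q-values are taken from the mail's
# suggestion, not from any standard.

LEVEL_TO_Q = {"preferred": 1.0, "manageable": 0.5, "last-resort": 0.1}

def q_to_level(q):
    """Classify a q-value back into the three-level scheme."""
    if q >= 0.7:
        return "preferred"
    if q >= 0.4:
        return "manageable"
    return "last-resort"

# BSL over video highly preferred, English text as a fall-back:
prefs = {("video", "bfi"): LEVEL_TO_Q["preferred"],
         ("text", "en"): LEVEL_TO_Q["last-resort"]}
assert q_to_level(prefs[("video", "bfi")]) == "preferred"
```

This preserves the "!"/nothing/"#" three-level idea while staying within the ordinary q-value syntax, which was the attraction of the q-value proposal.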
>
> You can see the earlier motivation for the preference between 
> modalities in the archive at:
> http://www.ietf.org/mail-archive/web/mmusic/current/msg11871.html
>
>
>>
>>>> Example:
>>>>
>>>> a=humanintlang:EN;sendrecv;q=1,ES;recv;q=.5
>>>>
>>>> To accommodate my comment about section 5, perhaps the answer can
>>>> "mark" the alternative(s) that have been chosen for use, if any. (I'm
>>>> not sure I like this, but for now, let's assume a "*" suffix on the
>>>> direction in the answer.) E.g., assuming the offer above, then an
>>>> answer like:
>>>>
>>>> a=humanintlang:EN;sendrecv*;q=1,FR;recv;q=.5
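Paul's proposed syntax is compact enough to parse mechanically. A sketch of such a parser follows; the syntax itself, including the "*" chosen-marker in answers, is only a discussion proposal, not part of the draft.

```python
# Parse the proposed "a=humanintlang" value, e.g.
# "EN;sendrecv*;q=1,FR;recv;q=.5", where "*" marks the alternative
# chosen in an answer. The syntax is a discussion proposal only.

def parse_humanintlang(value):
    """Split comma-separated lang;direction[*];q=N entries into dicts."""
    entries = []
    for item in value.split(","):
        lang, direction, qpart = item.split(";")
        entries.append({
            "lang": lang,
            "dir": direction.rstrip("*"),
            "chosen": direction.endswith("*"),
            "q": float(qpart.split("=", 1)[1]),
        })
    return entries

answer = parse_humanintlang("EN;sendrecv*;q=1,FR;recv;q=.5")
assert answer[0]["chosen"] and answer[0]["dir"] == "sendrecv"
assert answer[1] == {"lang": "FR", "dir": "recv", "chosen": False, "q": 0.5}
```

One attraction of this shape is that an answerer can list capabilities the offer never mentioned (here FR) while still marking which alternative it actually selected, which is exactly the O/A behavior suggested for section 5.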
>>>>
>>>> Also in this section, the following:
>>>>
>>>>    Note that while signed language tags are used with a video 
>>>> stream to
>>>>    indicate sign language, a spoken language tag for a video stream in
>>>>    parallel with an audio stream with the same spoken language tag
>>>>    indicates a request for a supplemental video stream to see the
>>>>    speaker.
>>>>
>>>> might be ambiguous in the presence of many streams. It might be good
>>>> to define a grouping framework (RFC 5888) usage to group an audio
>>>> stream with a video stream that contains the speaker(s) of the audio.
>>> <GH>This is both a threat and a solution. There are already other
>>> reasons to group media in SDP. We risk ending up with very
>>> complex coding in SDP to get this correctly specified in an environment
>>> where there are also other reasons to group m-lines. This is why I am
>>> still wondering whether the whole thing would fit better as tags at the
>>> SIP level.
>>
>> I notice Henning just asked about this today, in a different forum.
>>
>> Perhaps some things are better done at the sip level, and others at 
>> the SDP level.
>>
>> It's also possible that correlations between different media streams 
>> should be left to other mechanisms. E.g., CLUE could signal better 
>> correlations between audio and video.
> Yes, but here we have three media: video, audio, and text. CLUE needs to 
> take text on board.
>
> But you are right: if this mechanism is moved to the SIP level, then we 
> get the complex task of indicating which m-lines with the same medium 
> the humintlang tags for that medium relate to. Simplifications are 
> needed, so that we do not end up specifying another CLUE.
>
> I think that part of the background for this draft is that the 
> mechanism shall be usable in 3GPP Multimedia Telephony. That might be 
> a reason why it ended up specified in SDP.
>
> Gunnar
>>
>>     Thanks,
>>     Paul
>>
>>>>
>>>> Also in this section, the following:
>>>>
>>>>    Clients acting on behalf of end users are expected to set one or 
>>>> both
>>>>    'humintlang-send' and 'humintlang-recv' attributes on each media
>>>>    stream in an offer when placing an outgoing session but ignore the
>>>>    attributes when receiving incoming calls.  Systems acting on behalf
>>>>    of call centers and PSAPs are expected to take into account the
>>>>    values when processing inbound calls.
>>>>
>>>> Are you intending to exclude clients taking a more active role? (E.g.,
>>>> Isn't it possible that a client receiving an incoming call in a
>>>> language it doesn't support might bridge in a relay service?)
>>> <GH>I have asked for deletion of this limiting paragraph.
>>>>
>>>> Should a client that can't do anything useful with the language info
>>>> still indicate its language preferences in the answer? I would think
>>>> this would still be helpful. It might allow the caller, or a
>>>> middlebox, to compensate in some way. What I suggest above allows 
>>>> that.
>>> <GH>Yes, I prefer that view. The answering party shall answer if it can.
>>> Middleboxes or the offering party may invoke services to
>>> cover the gaps.
>>>>
>>>> * Section 7.2 (Advisory vs Required):
>>>>
>>>> Rather than "*" suffix, would be cleaner to have a separate indicator.
>>>> And it seems like this should not be on a particular language tag. If
>>>> you speak both Spanish and English, you may want the call to succeed
>>>> if the callee can do one or the other, and otherwise fail. A tag per
>>>> language doesn't do that.
>>>
>>> <GH>Yes, tricky.  I am getting the feeling that we should start a draft
>>> describing usage in parallel with this one, so that use cases can be
>>> verified.
>>>
>>>>
>>>> Need more discussion for how to denote this.
>>>>
>>>>     Thanks,
>>>>     Paul
>>>>
>>> Gunnar
>>>> _______________________________________________
>>>> mmusic mailing list
>>>> mmusic@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/mmusic