RE: [Speechsc] SPEECHSC vs 3GPP

"BRANDT,MARC (HP-France,ex2)" <marc.brandt@hp.com> Tue, 17 December 2002 20:24 UTC

Message-ID: <468579AFDE99E74DB926952FCDE3D657017CA7AC@dumas.grenoble.hp.com>
From: "BRANDT,MARC (HP-France,ex2)" <marc.brandt@hp.com>
To: speechsc@ietf.org
Subject: RE: [Speechsc] SPEECHSC vs 3GPP
Date: Tue, 17 Dec 2002 21:19:26 +0100

<-- sorry if dup: I had email problems posting to the list and was
    not sure of the result; also, this copy is in ASCII -->

Some comments on this thread below.

<layers>

I agree with the decomposition or unbundling model.
Although there is no need to support a multitude of protocol-stack profiles
(aka speechsc over everything), I believe speechsc needs to enable some
separation in terms of protocol layers and service interfaces (this is also
a good lesson from previous models like OSI and TCP/IP, which are somewhat
widely adopted in the SIP, HTTP, SOAP and other work as well).
Separation of the control pipe and the media pipe is also at the heart of
modern telecoms.

<command semantic>

I suspect that in addition to the command/response protocol model for
speechsc, there needs to be an event-based model as well (like pub/sub),
typically to support the detection of resources, or for better efficiency
in terms of media resource processing (modern programming has always shown
interest in polling as well as asynchronous models).
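To make the two interaction styles concrete, here is a minimal Python sketch of a controller offering both a blocking command/response call and pub/sub event registration (the class, verb and event names are invented for illustration, not part of any speechsc proposal):

```python
from collections import defaultdict

class ResourceController:
    """Toy controller mixing command/response with pub/sub events."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # event name -> callbacks

    def command(self, verb, **params):
        """Synchronous command/response: the caller blocks for the result."""
        # A real implementation would send the request over the control
        # channel and wait; here we just return a canned response.
        return {"verb": verb, "status": "200 OK", "params": params}

    def subscribe(self, event, callback):
        """Pub/sub: register interest in asynchronous resource events."""
        self._subscribers[event].append(callback)

    def publish(self, event, data):
        """Invoked when the resource emits an event, e.g. on detection."""
        for cb in self._subscribers[event]:
            cb(data)

ctrl = ResourceController()
seen = []
ctrl.subscribe("resource-available", seen.append)
ctrl.publish("resource-available", {"type": "ASR"})   # async notification
print(ctrl.command("RECOGNIZE", grammar="digits")["status"])  # 200 OK
print(seen)  # [{'type': 'ASR'}]
```

The point is only that the same control session can carry both styles; the wire encoding is left open.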

<extensions>

I also noticed one important point in the discussion, regarding extensions
and openness to support evolutions of the resource control semantics.
Speechsc would benefit from not relying on new specifications being created
each time a new resource control feature becomes available; this is, IMO,
well covered in the speechsc requirements.

But going further and leaving this knowledge at the application level would
really enable speechsc to become a framework for supporting disparate media
resource control semantics with few protocol changes. Additionally,
'payloads' or 'specific resource profiles' could be described any time a new
one is added, as standard extensions (e.g. like the model for RTP payloads).

This also leaves room for application-specific extensions and
differentiation while not violating the standards, enabling a
'programmable' approach when new media resources are created, described and
used by application servers. An application may need to invoke a brand-new
control function without having to rewrite the speechsc protocol layer.

On that point, I tend to consider that adding verbs to the protocol might
be more of a specification burden than using a descriptive means that can
evolve, at the price of some efficiency maybe. So I can agree that some
verbs could be put at the core of the protocol (like we have for HTTP, SIP
and so on), and the rest carried more as a semantic payload (which can be
standardized as well; see the XML payloads by OASIS).

For instance, if I want to build a speech resource that translates streams
from voice to voice, will I need other verbs? A new protocol? Or if I build
a resource that combines functions, and so on.
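As a sketch of that split (the verb set, version string and payload elements below are invented for illustration), the protocol layer could validate only a few core verbs and treat resource-specific semantics as an opaque but well-formed payload:

```python
from xml.etree import ElementTree as ET

# A small, stable verb set at the protocol core (hypothetical).
CORE_VERBS = {"SETUP", "CONTROL", "TEARDOWN"}

def make_request(verb, payload_xml):
    """Build a request: core verb on the start line, semantics in the payload."""
    if verb not in CORE_VERBS:
        raise ValueError("unknown core verb: %s" % verb)
    # The payload is checked only for well-formedness; its meaning belongs
    # to the resource profile / application, not to the protocol layer.
    ET.fromstring(payload_xml)
    return "%s speechsc/0.1\r\n\r\n%s" % (verb, payload_xml)

# A new resource feature (say, voice-to-voice translation) needs only a
# new payload, not a new verb or a new protocol:
req = make_request("CONTROL", '<translate from="en" to="fr"/>')
print(req.splitlines()[0])  # CONTROL speechsc/0.1
```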

<efficient>

This is where we might think of proposing a framework with a way to create
optimized interactions. This is already done in some protocols where, for
instance, you can use abbreviated fields or encoded fields instead of full
XML. Or, by analogy, when using a framework for interpreted or compiled
languages. But the protocol should certainly not be a hack just to support
efficiency (one can also bet on Moore's law).
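SIP itself already takes this route with its compact header forms; the same idea in a toy Python sketch (this abbreviation table is invented, not taken from any specification):

```python
# Map long, readable field names to compact forms for the wire.
ABBREV = {"content-type": "c", "session-id": "s", "resource-class": "r"}
EXPAND = {v: k for k, v in ABBREV.items()}       # inverse mapping

def compact(headers):
    """Shrink headers for the wire without changing their meaning."""
    return {ABBREV.get(k, k): v for k, v in headers.items()}

def expand(headers):
    """Restore the readable form on receipt."""
    return {EXPAND.get(k, k): v for k, v in headers.items()}

full = {"content-type": "text/xml", "session-id": "42"}
wire = compact(full)
print(wire)                  # {'c': 'text/xml', 's': '42'}
assert expand(wire) == full  # round-trips losslessly
```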

<media resources scope>

I also agree with the point on the multitude of possible multimedia
resources that will have to be controlled by an application. This was
initially discussed at the requirements phase, if I remember well, with the
outcome that the initial focus of the first delivery of speechsc shall be
limited in order to avoid the full-picture syndrome and no protocol at the
end (and there is SPEECH in speechsc). I believe that other IETF groups are
also holding worthwhile discussions in this domain.

For instance, I would like to understand the opinions of the group on the
mmusic status from the last IETF. What about 'XML Schema for Media Control'
in the mmusic minutes?
http://www1.ietf.org/mail-archive/working-groups/mmusic/current/msg01105.html

draft-levin-mmusic-xml-media-control-00.txt
http://www.ietf.org/internet-drafts/draft-levin-mmusic-xml-media-control-00.txt

Of course, each time we broaden the scope we ease programmable approaches
and thus wide developer adoption, but often at the price of the efficiency
provided by limited-scope approaches, really targeted at and tuned for
specific resources (and manufacturers ;-).

<underlying techno candidates>

Now, in terms of technology, I guess there are advantages in the likes of
SIP, SOAP and XML (the latter already widely used at the speech grammar or
synthesis level in MRCP packets, for instance), with all the extensibility
and programmability that they provide. For instance, SIP can be extended;
see the SIPPING and SIMPLE work for an open framework allowing other
semantics to be built on top of it. SOAP clearly provides a good invocation
model for a 'programmable' framework.

<finally>

One value-add of speechsc could then be to keep this programmability and
openness while delivering efficiency in the targeted application profiles
(optimizing connection set-up, traffic, reuse of media paths and so on),
e.g. providing new verbs for these kinds of core functions while using
descriptive services for the upper application/media resource functions.
Refer to the speechsc requirements: reuse of transport connections across
sessions, piggybacking of responses on requests in the reverse direction,
caching of state across requests... these are functions that deserve
standard treatment across a whole bunch of resources (core protocol).

Speechsc would then be completely independent of the media resource
semantics, and only aware of the semantics of 'controlling' such resources
for the best application experience. A TTS resource would be
speechsc-compliant, of class TTS, with such-and-such features defined in
the programmable-layer payload (plus room for vendor extensions).

One-size protocol does not fit all layers.
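To illustrate such a class description, a TTS resource's profile payload might look like the following (the schema, feature names and namespace are entirely hypothetical):

```python
from xml.etree import ElementTree as ET

profile = ET.fromstring("""
<resource-profile class="TTS" compliance="speechsc/0.1">
  <feature name="ssml" version="1.0"/>
  <feature name="mark-events"/>
  <vendor-extension xmlns:x="urn:example:acme" name="x:emotive-voice"/>
</resource-profile>
""")

# The protocol layer needs only the class and compliance level; the
# feature list and vendor extensions live in the descriptive payload.
print(profile.get("class"))        # TTS
print(profile.get("compliance"))   # speechsc/0.1
print([f.get("name") for f in profile.findall("feature")])
```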

Marc Brandt      - mailto:Marc.Brandt@hp.com
Hewlett-Packard  - OpenCall Business Unit 
5, av. r. chanas - eybens - 38053 grenoble cedex 9 - france
tel  : +33  4 7614 1088 (hp 779-1088)
fax : +33  4 7614 4323 (hp 779-4323)
https://ecardfile.com/id/Marc+Brandt
http://www.hp.com/communications/opencall/

-----Original Message-----
From: Jean Philippe Longeray [mailto:jean-philippe.longeray@netcentrex.net]
Sent: Tuesday, December 17, 2002 11:00 AM
To: brian.wyld@eloquant.com; 'Skip Cave'; speechsc@ietf.org
Cc: eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Brian,

Living in France, I'm very attached to the OSI model defined by ITU-T.

I think it is really important to make some distinctions between the
transport and application protocols. SIP is a little bit poor for data
transport, but it exists and was clearly chosen by 3GPP and 3GPP2.

In my opinion, SPEECHSC could be something very close to MRCP, transported
by the SIP INFO method. Speechsc, like MRCP, must only define the media
control part and the way it can be transported by SIP (and optionally by
other protocols like H.225.0, RTSP, H.248, ...).

All media resources could be controlled by the same protocol (SPEECHSC).
On the streaming side, RTP/RTCP is of course engaged. I'm sure that a
multimedia VoIP core with optional peripheral gateways is the Next-Gen
architecture for Telephony.

An extension of MRCP could be the answer. It is a great protocol, isn't it?
And it already works over RTSP (Nuance, Speechworks, Telisma...).

Find below some extensions of MRCP that SPEECHSC could cover:
- speaker verification,
- speaker identification,
- announcement, voice recording, 
- tones detection, tones generation,
- fax,
- audio conferencing,
- video conferencing,
- chat


SPEECHSC could be a multimedia protocol, not only for speech but also for
video, data, fax... 3G!


Best regards.




Jean-Philippe LONGERAY 
R&D Director - Service NODE

NetCentrex 

Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net

-----Original Message-----
From: Brian Wyld [mailto:brian.wyld@eloquant.com]
Sent: Tuesday, December 17, 2002 09:47
To: 'Jean Philippe Longeray'; 'Skip Cave'; speechsc@ietf.org
Cc: eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Messieurs

Some interesting discussion here - to ease the job of the protocol eval doc
editor :-) perhaps someone would like to do a protocol analysis section for
3GPP H.248 (maybe just to rule it out?) - Jean-Philippe perhaps?

My 2c on SPEECHSC/whatever: I think there is a first question to resolve in
my mind.
 Q1: What is the best model for SPEECHSC?
 - A layer OVER a media signalling protocol (SIP, RTSP, etc., depending on
this lower layer for media and session control, just like MRCP/RTSP
currently does)
    -> in which case, what is the encapsulation mechanism? RTSP has ANNOUNCE
messages; what does SIP provide for this sort of bundling?
    -> and what is the "best" protocol to layer over?
 - An extension to an existing media signalling protocol (e.g., add MRCP
"verbs" as new ones in RTSP, or add them as new SIP commands...)
 - A new protocol incorporating media signalling, session control and
resource control (e.g. Web services extensions)

As for the identification and resolution of resource servers, this is for me
a separate functionality to SPEECHSC itself, and there are already multiple
mechanisms existing (SLP, UDDI, etc) for service location and discovery.

Brian
-----Original Message-----
From: speechsc-admin@ietf.org [mailto:speechsc-admin@ietf.org] On Behalf Of
Jean Philippe Longeray
Sent: Tuesday, December 17, 2002 08:33
To: Skip Cave; speechsc@ietf.org
Cc: eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Hi Skip,

You're right, I didn't say anything different.

Like MRCP, SPEECHSC is a media command protocol. SIP is not only a
streaming protocol; it can be used as a transport protocol, like HTTP,
X.224, ... If SIP transports SDP, it becomes a streaming protocol, but why
wouldn't it be possible to transport SPEECHSC messages in SIP content?

In your document, something is missing: you need something to find a
Resource Server (ASR, SVI, TTS), and I propose to use SIP soft-switching.
This softswitch could be inserted between your Application Execution Server
and all the other voice resources (ASR, TTS, SVI, but also audio/video
streaming, conferencing, ...).

I think that draft-robinson-mrcp-sip-00 is a good example of what I mean.
Do you agree, Eric?

Best regards.

Jean-Philippe LONGERAY 
R&D Director - Service NODE

NetCentrex 

Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net

-----Original Message-----
From: Skip Cave [mailto:skip.cave@intervoice.com]
Sent: Monday, December 16, 2002 20:47
To: speechsc@ietf.org
Cc: jean-philippe.longeray@netcentrex.net; eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Eric, Jean,

It's good that we agree. I believe that there has been some confusion in the
past that SpeechSC is a media streaming protocol. We need to list the basic
issues to make sure that we clear up that misconception: 

1) SpeechSC is NOT a media streaming protocol.
2) The SpeechSC protocol is strictly a command/response protocol, carrying
commands and returning responses from application servers to speech servers.
The SpeechSC protocol will never be a media transport protocol, and will
never carry any type of media. 
3) Even though the SpeechSC protocol is not a media transport protocol, the
SpeechSC protocol can be used to COMMAND speech servers to set up streaming
with another server using some type of streaming protocol (like SIP). Which
streaming protocol is used will be determined as part of SpeechSC's work.

For example, in my attached architecture diagram, the SpeechSC protocol
allows an Application Server to command an ASR server to set up a SIP
session between the ASR server and a telephony platform (see attached
figure). Note that there is a SpeechSC command/control session between the
Application Server and the ASR Server, but there is no streaming media going
between the Application and ASR Servers. There IS a standard SIP session
between the Speech Server and the Telephony platform, which was set up by
commands given in the SpeechSC protocol. This SIP session does NOT carry any
commands other than standard SIP setup/teardown commands.

An example of the command/response sequence in a SpeechSC command stream
would be:

1) Request from the Application Server to Directory Services for an ASR
server.
2) Reply from Directory Services to the Application Server giving info on a
specific ASR Server.
3) Command from the Application Server to the ASR Server to set up a
specific command/response session for a call (one command session per call
context).
4) Response from the ASR Server to the Application Server acknowledging the
completion of the session set-up.
5) Command from the Application Server to the selected ASR Server to set up
a SIP session with a specific Telephony Server. The App Server gives the
ASR Server the address of the Telephony Server so the ASR Server can set up
the SIP session.
(The ASR Server sets up the SIP session to the Telephony Server.)
6) Response from the ASR Server indicating successful SIP session setup.
7) Command from the Application Server to the ASR Server to set up grammars
and start recognition on the ASR Server.
8) Response from the ASR Server to the Application Server reporting a
grammar match or timeout.
etc.

Again, this is shown in my attached diagram.
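The sequence can also be written out as a compact message trace; a sketch in Python (the message names and hosts are invented for illustration, not proposed syntax):

```python
# Hypothetical SpeechSC control trace; the SIP leg carries media signalling
# only, never SpeechSC commands.
trace = [
    ("AppServer -> Directory", "LOCATE resource=ASR"),
    ("Directory -> AppServer", "200 OK server=asr1.example.com"),
    ("AppServer -> ASR",       "SETUP call-id=42"),
    ("ASR -> AppServer",       "200 OK session established"),
    ("AppServer -> ASR",       "CONNECT sip:media@telephony.example.com"),
    # (the ASR server now sets up the SIP session to the telephony platform)
    ("ASR -> AppServer",       "200 OK SIP session up"),
    ("AppServer -> ASR",       "RECOGNIZE grammar=digits"),
    ("ASR -> AppServer",       "RESULT match=1234"),
]
for hop, message in trace:
    print("%-24s %s" % (hop, message))
```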

Skip Cave 
Sr. Principal Engineer
Intervoice Inc.


>>> "Eric Burger" <eburger@snowshore.com> 12/16/02 08:29AM >>>
From a personal perspective, the MRCP over SIP proposal was what pushed me
over the edge to fix MRCP.  I would be hard pressed to try to convince the
IESG that there is a need for MRCP/RTSP, MRCP/SIP, MRCP/foo, ...  We need to
pick the one that makes the most sense.

-----Original Message-----
From: Jean Philippe Longeray [mailto:jean-philippe.longeray@netcentrex.net]
Sent: Monday, December 16, 2002 3:12 AM
To: Skip Cave; Eric Burger
Cc: speechsc@ietf.org
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Hi Skip,

Nice to have some news from Intervoice...

I agree. SIP will never provide multimedia control functionalities, but I'm
sure it's a great protocol for transport (and mandatory for 3G). The WG has
to define a couple of protocols, like MRCP/RTSP. What do you think of
SPEECHSC/SIP, which could be very close to MRCP/SIP?

Regards.


Jean-Philippe LONGERAY 
R&D Director - Service NODE

NetCentrex 

Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net

-----Original Message-----
From: Skip Cave [mailto:skip.cave@intervoice.com]
Sent: Friday, December 13, 2002 19:38
To: jean-philippe.longeray@netcentrex.net; eburger@snowshore.com
Cc: speechsc@ietf.org
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Jean, Eric,

I don't think SIP will support the separated media/control requirements I
posted earlier. You will need a control protocol, and a separate media
protocol. I expect that it will take a new protocol to meet these
requirements.

Skip Cave
Sr. Principal Engineer
Intervoice Inc. 


>>> "Jean Philippe Longeray" <jean-philippe.longeray@netcentrex.net>
12/13/02 01:25AM >>>
Thanks, Eric, for this analysis.

I agree. H.248 doesn't seem to be the right answer for MRFC/MRFP, even if
special packages can be provided.

In my opinion, SIP is the correct answer for the transport layer (since
it's used everywhere in 3GPP and 3GPP2), and SPEECHSC could (should?) be
used for resource control.

Concerning SPEECHSC, section 3.3 "Avoid Duplicating Existing Protocols", I
would like to add some remarks:

In case you would like to insert a routing mechanism (a SIP soft-switch)
between the Media Processing Entity / Application Server and the Resource
Server (ASR, SI/SV, TTS, Announcement server), it could be interesting to
have a single transport protocol, like SIP, instead of several incompatible
protocols (RTSP, for example) for such close functionalities. I think it is
easier to add some redundancy, rather than conserving "old" protocols like
RTSP.

It seems very important to make distinctions between each layer of the
model. Something like UDP/SIP/SPEECHSC, TCP/SIP/SPEECHSC or
SCTP/SIP/SPEECHSC could be an answer.



Regards.


Jean-Philippe LONGERAY
R&D Director - Service NODE

NetCentrex

Jean-philippe.longeray@netcentrex.net
<mailto:Jean-philippe.longeray@netcentrex.net>
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net



-----Original Message-----
From: Eric Burger [mailto:eburger@snowshore.com]
Sent: Friday, December 13, 2002 03:13
To: Jean Philippe Longeray
Cc: IETF SPEECHSC (E-mail)
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


The Mp interface is (1) not the right concept and (2) is itself (IMHO) not
the correct choice for 3GPP's needs, either.

With respect to (1), Mp is trying to be an analog to the MGC/MG
decomposition for a media server, where the MRFC is a "media server
controller" and the MRFP is a "media server [processor]".  The types of
resources are bearer packet processors (e.g., tone detection, prompt
playing, and recording).  The protocol is a low-level device control
protocol (e.g., allocate a resource, allocate an RTP port, connect the port
to the resource, wait for a signal, etc.).  speechsc is a higher-level
protocol, concerned with things like 'establish session' and 'recognize
speech'.

In fact, early in the days of MRCP/speechsc, people wanted to extend the
speechsc scope to do device control.  The answer has consistently been to
use H.248 for device control.

With respect to (2), AFAIK, no one has ever built a MRFC.  I believe this is
because unlike a media gateway, where there are definite decomposition
benefits, there are really few if any benefits to decomposing the MRF.  In
fact, there are clear benefits to using the native application interface
(SIP), rather than the native gateway interface (H.248) for interfacing the
AS and CSCF to the MRF.

-----Original Message-----
From: Jean Philippe Longeray [mailto:jean-philippe.longeray@netcentrex.net]
Sent: Tuesday, December 10, 2002 5:53 AM
To: IETF SPEECHSC (E-mail)
Subject: [Speechsc] SPEECHSC vs 3GPP


Hi,

Did you ever compare SPEECHSC and the MRFC/MRFP interface in the 3GPP
(TS 24.229 Rel-5) architecture?

It looks like SPEECHSC is very close to the Mp interface (H.248).

Could SPEECHSC work with 3GPP (www.3gpp.org), 3GPP2 (www.3gpp2.org),
3G.IP (www.3gip.org), MWIF (www.mwif.org)?

Regards.

Jean-Philippe LONGERAY
R&D Director - Service NODE

NetCentrex

Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc