RE: [Speechsc] SPEECHSC vs 3GPP

"BRANDT,MARC (HP-France,ex2)" <marc.brandt@hp.com> Tue, 17 December 2002 13:03 UTC

Message-ID: <468579AFDE99E74DB926952FCDE3D657064BB0CF@dumas.grenoble.hp.com>
From: "BRANDT,MARC (HP-France,ex2)" <marc.brandt@hp.com>
To: speechsc@ietf.org
Subject: RE: [Speechsc] SPEECHSC vs 3GPP
Date: Tue, 17 Dec 2002 13:39:35 +0100

Some comments on this thread (apologies if I repeat ideas already
discussed in previous meetings; it was not obvious from the mailing list
archive).
 
<layers>
 
I agree with the decomposition or unbundling model.
Although there is no need to support a multitude of protocol stack
profiles (aka speechsc over everything), I believe speechsc needs to
enable some separation in terms of protocol layers and service interfaces
(this is also a lesson I take from previous models like OSI and TCP/IP,
one that is widely applied in the SIP, HTTP, SOAP and other work as
well).

Separation of the control pipe and the media pipe is also at the heart of
modern telecoms.
 
<command semantic>
 
I suspect that, in addition to the command/response protocol model for
speechsc, there needs to be an event-based model as well (something like
publish/subscribe), typically to support the detection of resources, or
for better efficiency in media resource processing (modern programming
has always needed asynchronous models as well as polling).
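To make this concrete, here is a purely hypothetical exchange (not a
proposal for the wire format, just loosely modelled on the MRCP
request/event split):

    C->S: RECOGNIZE 42                (command)
    S->C: 42 IN-PROGRESS              (immediate response)
    S->C: START-OF-SPEECH 42          (asynchronous event)
    S->C: RECOGNITION-COMPLETE 42     (asynchronous event with the result)

The first two lines fit the command/response model; the last two only
make sense if the protocol also offers an event/notification leg (or a
subscription to such events).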
 
<extensions>
 
I also noticed one important point in the discussion regarding
extensions and openness to support evolution of the resource control
semantics. Speechsc would benefit from not requiring a new specification
each time a new resource control feature becomes available; this is, in
my opinion, well covered in the speechsc requirements.
 
But going further and keeping this knowledge at the application level
would really enable speechsc to become a framework for supporting
disparate media resource control semantics with few protocol changes.
Additionally, 'payloads' or 'specific resource profiles' could be
described as standard extensions whenever a new one is added (much like
the model for RTP payload formats).

 
This also leaves room for application-specific extensions and
differentiation without violating the standard, enabling a
'programmable' approach when new media resources are created, described
and used by application servers.
An application may need to invoke a brand new control function without
having to rewrite the speechsc protocol layer.
 
On that point, I tend to consider that adding verbs to the protocol is
more of a specification burden than using descriptive means that can
evolve, perhaps at the price of some efficiency. So I can agree that some
verbs could be put at the core of the protocol (as we have for HTTP, SIP
and so on), and the rest carried as a semantic payload (which can be
standardized as well; see the XML payloads defined by OASIS).
 
For instance, if I want to build a speech resource that translates
streams from voice to voice, will I need new verbs? A new protocol? The
same question arises for a resource that combines functions, and so on.
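A rough sketch of what I mean (entirely hypothetical syntax, only to
make the idea concrete): keep one generic core verb and put the knowledge
of the new resource in a descriptive payload:

    CONTROL 17
    Resource-Class: voice-translator      (not known to the core protocol)
    Content-Type: application/resource-control+xml

    <control resource="voice-translator">
      <param name="source-language">fr</param>
      <param name="target-language">en</param>
      <start/>
    </control>

The voice-to-voice translator then needs a new payload description
(standardized, vendor or application specific), but no new verb and no
new protocol revision.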
 
<efficient>
 
This is where we might think of proposing a framework with a way to
create optimized interactions. This is already done in some protocols,
where for instance you can use abbreviated fields or encoded fields
instead of full XML, by analogy with frameworks for interpreted versus
compiled languages. But the protocol should certainly not be a hack just
to gain efficiency (one can also bet on Moore's law).
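SIP already does something like this with its compact header forms (f
for From, v for Via, ...), so the same message can be sent verbose or
abbreviated. One could imagine the same trick for a speechsc payload,
e.g. (hypothetical field names):

    verbose:      Resource-Class: recognizer
                  Confidence-Threshold: 0.5

    abbreviated:  rc: recognizer
                  ct: 0.5

but again, this kind of optimization should not drive the design.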
 
<media resources scope>
 
I also agree with the point about the multitude of possible multimedia
resources that will have to be controlled by an application. This was
initially discussed during the requirements phase, if I remember well,
with the conclusion that the focus of the first speechsc deliverable
should be limited, to avoid the full-picture syndrome and ending up with
no protocol at all (and there is SPEECH in speechsc). I believe that
other IETF groups are also holding worthwhile discussions in this domain.
 
For instance, I would like to understand the opinions of the group on
the mmusic status from the last IETF:
what about the "XML Schema for Media Control" item in the mmusic minutes,
http://www1.ietf.org/mail-archive/working-groups/mmusic/current/msg01105.html
 
and draft-levin-mmusic-xml-media-control-00.txt,
http://www.ietf.org/internet-drafts/draft-levin-mmusic-xml-media-control-00.txt
 
Of course, each time we broaden the scope we ease programmable
approaches and thus wide developer adoption, but often at the price of
the efficiency provided by limited-scope approaches that are really
targeted at and tuned for specific resources (and manufacturers ;-).
 
<underlying techno candidates>
 
Now, in terms of technology, I guess there are advantages in the likes
of SIP, SOAP and XML (the latter already widely used at the speech
grammar or synthesis level in MRCP packets, for instance), with all the
extensibility and programmability they provide.
For instance, SIP can be extended; see the SIPPING and SIMPLE work
providing an open framework for other semantics to be built on top of it.
SOAP clearly provides a good invocation model for a 'programmable'
framework.
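As an illustration, an invocation in a SOAP-style framework could be as
small as this (the envelope is standard SOAP 1.1; the operation and the
speechsc namespace are invented for the example):

    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <Recognize xmlns="urn:example:speechsc">
          <grammarRef>http://appserver.example.com/grammars/date.grxml</grammarRef>
        </Recognize>
      </soap:Body>
    </soap:Envelope>

which shows how little of the resource semantics needs to live in the
protocol itself.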
 
<finally>
 
One value-add of speechsc could then be to keep this programmability and
openness while delivering efficiency in the targeted application profiles
(optimizing connection setup, traffic, reuse of media paths and so
on...), e.g. providing new verbs for this kind of core function while
using descriptive services for the upper application/media-resource
functions.
Refer to the speechsc requirements: re-use of transport connections
across sessions, piggybacking of responses on requests in the reverse
direction, caching of state across requests... these are functions that
deserve standard treatment across a whole range of resources (the core
protocol).
 
Speechsc would then be completely independent of the media resource
semantics, and only aware of the semantics of 'controlling' such
resources for the best application experience. A TTS resource would be
speechsc compliant as a resource of class TTS, with such-and-such
features defined in the programmable-layer payload (plus room for vendor
extensions).
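For example, such a resource could describe itself along these lines
(hypothetical markup, just to show the split between the core class and
the payload-level features):

    <resource class="tts" speechsc-version="1.0">
      <feature name="ssml"/>
      <feature name="mark-events"/>
      <vendor-extension xmlns="urn:example:acme-tts">
        <emotion-control/>
      </vendor-extension>
    </resource>

The core protocol only needs to know that it is talking to a resource of
class 'tts'; everything below that level stays in the descriptive
payload.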
 
One size of protocol does not fit all layers.
 

Marc Brandt      -  mailto:Marc.Brandt@hp.com
Hewlett-Packard  -  OpenCall Business Unit - http://www.hp.com/go/opencall/
5, av. r. chanas - eybens - 38053 grenoble cedex 9 - france
tel : +33 4 7614 1088 (hp 779-1088)
fax : +33 4 7614 4323 (hp 779-4323)
https://ecardfile.com/id/Marc+Brandt
http://www.hp.com/communications/opencall/


-----Original Message-----
From: Jean Philippe Longeray [mailto:jean-philippe.longeray@netcentrex.net]
Sent: Tuesday, December 17, 2002 11:00 AM
To: brian.wyld@eloquant.com; 'Skip Cave'; speechsc@ietf.org
Cc: eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Brian,
 
Living in France, I'm very attached to the OSI model defined by ITU-T. 
 
I think it is really important to distinguish between the transport and
the application protocol.
SIP is a little bit poor for data transport, but it exists and has
clearly been chosen by 3GPP and 3GPP2.
 
In my opinion, SPEECHSC could be something very close to MRCP and
transported by the SIP INFO method.
Speechsc, like MRCP, should only define the media control part and the
way it can be transported by SIP (and optionally by other protocols like
H.225.0, RTSP, H.248, ...).
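To illustrate (a rough sketch only, with most SIP headers trimmed and
the MRCP body simplified):

    INFO sip:asr@mediaserver.example.com SIP/2.0
    Call-ID: a84b4c76e66710
    CSeq: 3 INFO
    Content-Type: application/mrcp

    RECOGNIZE 12345 MRCP/1.0
    ...

    <grammar> ... </grammar>

The SIP dialog gives us routing and softswitching (and 3GPP alignment),
while the media control semantics stay in the body.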
 
All media resources could be controlled by the same protocol (SPEECHSC).
On the streaming side, RTP/RTCP is of course involved. I'm sure that a
multimedia VoIP core with optional peripheral gateways is the
next-generation architecture for telephony.
 
An extension of MRCP could be the answer. It is a great protocol, isn't
it? And it already works over RTSP (Nuance, SpeechWorks, Telisma...).
 
Find below some extensions of MRCP that SPEECHSC could cover:
- speaker verification,
- speaker identification,
- announcement, voice recording, 
- tones detection, tones generation,
- fax,
- audio conferencing,
- video conferencing,
- chat
 
 
SPEECHSC could be a multimedia protocol, not only for speech but also
for video, data, fax... 3G!
 
 
Best regards.
 
 
 
 
Jean-Philippe LONGERAY 
R&D Director - Service NODE
 
NetCentrex 
 
Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net
 

-----Original Message-----
From: Brian Wyld [mailto:brian.wyld@eloquant.com]
Sent: Tuesday, December 17, 2002 09:47
To: 'Jean Philippe Longeray'; 'Skip Cave'; speechsc@ietf.org
Cc: eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Messieurs
 
Some interesting discussion here - to ease the job of the protocol eval doc
editor :-) perhaps someone would like to do a protocol analysis section for
3GPP H.248 (maybe just to rule it out?) - Jean-Philippe perhaps?
 
My 2c on SPEECHSC (or whatever we call it) - I think there is a first
question to resolve in my mind:
 Q1: what is the best model for SPEECHSC?
 - a layer OVER a media signalling protocol (SIP, RTSP, etc., depending on
this lower layer for media and session control, just as MRCP/RTSP currently
does)
    -> in which case, what is the encapsulation mechanism? RTSP has ANNOUNCE
messages; what does SIP provide for this sort of bundling? (see the sketch
after this list)
    -> and what is the "best" protocol to layer over?
 - an extension to an existing media signalling protocol (e.g., add the MRCP
"verbs" as new methods in RTSP, or as new SIP methods...)
 - a new protocol incorporating media signalling, session control and
resource control (e.g., Web services extensions)
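To make the first option concrete, this is roughly how MRCP rides over
RTSP today (a simplified sketch with most headers trimmed):

    ANNOUNCE rtsp://mediaserver.example.com/recognizer RTSP/1.0
    CSeq: 5
    Content-Type: application/mrcp

    RECOGNIZE 12345 MRCP/1.0
    ...

For SIP, the equivalent bundling would presumably be a message body
carried on some method (INFO?), which is exactly the encapsulation
question above.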
 
As for the identification and resolution of resource servers, this is
for me a functionality separate from SPEECHSC itself, and multiple
mechanisms already exist (SLP, UDDI, etc.) for service location and
discovery.
 
Brian

-----Original Message-----
From: speechsc-admin@ietf.org [mailto:speechsc-admin@ietf.org] On Behalf Of
Jean Philippe Longeray
Sent: Tuesday, December 17, 2002 08:33
To: Skip Cave; speechsc@ietf.org
Cc: eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Hi Skip,
 
You're right; I wasn't saying anything different.
 
Like MRCP, SPEECHSC is a media command protocol. SIP is not only a
streaming protocol, it can also be used as a transport protocol, like
HTTP, X.224, ... If SIP transports SDP it becomes a streaming protocol,
but why do you think it is not possible to transport SPEECHSC messages in
the SIP content?
 
In your document, something is missing: you need something to find a
Resource Server (ASR, SVI, TTS), and I propose to use SIP softswitching.
This softswitch could be inserted between your Application Execution
Server and all the other voice resources (ASR, TTS, SVI, but also
audio/video streaming, conferencing, ...).
 
 
I think that draft-robinson-mrcp-sip-00 is a great example of what I
mean. Do you agree, Eric?
 
Best regards.
 
Jean-Philippe LONGERAY 
R&D Director - Service NODE
 
NetCentrex 
 
Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net
 

-----Original Message-----
From: Skip Cave [mailto:skip.cave@intervoice.com]
Sent: Monday, December 16, 2002 20:47
To: speechsc@ietf.org
Cc: jean-philippe.longeray@netcentrex.net; eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Eric, Jean,
 
It's good that we agree. I believe there has been some confusion in the
past, with SpeechSC being taken for a media streaming protocol. We need
to list the basic issues to make sure we clear up that misconception:
 
1) SpeechSC is NOT a media streaming protocol.
2) The SpeechSC protocol is strictly a command/response protocol, carrying
commands from application servers to speech servers and returning
responses. The SpeechSC protocol will never be a media transport protocol,
and will never carry any type of media.
3) Even though the SpeechSC protocol is not a media transport protocol, it
can be used to COMMAND speech servers to set up streaming with another
server using some type of streaming protocol (such as SIP). Which streaming
protocol is used will be determined as part of the SpeechSC group's work.
 
For example, in my attached architecture diagram, the SpeechSC protocol
allows an Application Server to command an ASR server to set up a SIP
session between the ASR server and a telephony platform (see the attached
figure). Note that there is a SpeechSC command/control session between the
Application Server and the ASR Server, but no streaming media goes between
the Application and ASR Servers. There IS a standard SIP session between
the Speech Server and the Telephony platform, which was set up by commands
given in the SpeechSC protocol. This SIP session does NOT carry any
commands other than standard SIP setup/teardown messages.
 
An example of the command/response sequence in a SpeechSC command stream
would be:
 
1) Request from the Application Server to Directory Services for an ASR
server.
2) Reply from Directory Services to the Application Server giving info on
a specific ASR Server.
3) Command from the Application Server to the ASR Server to set up a
specific command/response session for a call (one command session per call
context).
4) Response from the ASR Server to the Application Server acknowledging
completion of the session set-up.
5) Command from the Application Server to the selected ASR Server to set
up a SIP session with a specific Telephony Server. The Application Server
gives the ASR Server the address of the Telephony Server so the ASR Server
can set up the SIP session.
   (The ASR Server sets up the SIP session to the Telephony Server.)
6) Response from the ASR Server indicating successful SIP session setup.
7) Command from the Application Server to the ASR Server to set up
grammars and start recognition on the ASR Server.
8) Response from the ASR Server to the Application Server reporting a
grammar match or timeout.
etc.
 
Again, this is shown in my attached diagram.
 
Skip Cave 
Sr. Principal Engineer
Intervoice Inc.


>>> "Eric Burger" <eburger@snowshore.com> 12/16/02 08:29AM >>>
From a personal perspective, the MRCP over SIP proposal was what pushed me
over the edge to fix MRCP.  I would be hard pressed to try to convince the
IESG that there is a need for MRCP/RTSP, MRCP/SIP, MRCP/foo, ...  We need to
pick the one that makes the most sense.

-----Original Message-----
From: Jean Philippe Longeray [mailto:jean-philippe.longeray@netcentrex.net]
Sent: Monday, December 16, 2002 3:12 AM
To: Skip Cave; Eric Burger
Cc: speechsc@ietf.org
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Hi Skip,

Nice to have some news from Intervoice...

I agree. SIP will never provide multimedia control functionalities, but
I'm sure it's a great protocol for transport (and mandatory for 3G). The
WG has to define a protocol pairing, like MRCP/RTSP. What do you think of
SPEECHSC/SIP, which could be very close to MRCP/SIP?

Regards.


Jean-Philippe LONGERAY 
R&D Director - Service NODE

NetCentrex 

Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net

-----Original Message-----
From: Skip Cave [mailto:skip.cave@intervoice.com]
Sent: Friday, December 13, 2002 19:38
To: jean-philippe.longeray@netcentrex.net; eburger@snowshore.com
Cc: speechsc@ietf.org
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


Jean, Eric,

I don't think SIP will support the separated media/control requirements I
posted earlier. You will need a control protocol, and a separate media
protocol. I expect that it will take a new protocol to meet these
requirements.

Skip Cave
Sr. Principal Engineer
Intervoice Inc. 


>>> "Jean Philippe Longeray" <jean-philippe.longeray@netcentrex.net>
12/13/02 01:25AM >>>
Thanks, Eric, for this analysis.

I agree. H.248 doesn't seem to be the right answer for MRFC/MRFP, even if
special packages can be provided.

In my opinion, SIP is the correct answer for the transport layer (since
it's used everywhere in 3GPP and 3GPP2), and SPEECHSC could (should?) be
used for resource control.

Concerning SPEECHSC section 3.3, "Avoid Duplicating Existing Protocols",
I would like to add some remarks:

In case you would like to insert a routing mechanism (a SIP soft-switch)
between the Media Processing Entity / Application Server and the Resource
Server (ASR, SI/SV, TTS, announcement server), it could be interesting to
have a single transport protocol, like SIP, instead of several
incompatible protocols (RTSP for example) for such closely related
functionalities. I think it is easier to accept some redundancy than to
keep "old" protocols like RTSP.

It seems very important to make distinctions between each layer of the
model.
Something like UDP/SIP/SPEECHSC, TCP/SIP/SPEECHSC or SCTP/SIP/SPEECHSC
could be an answer.



Regards.


Jean-Philippe LONGERAY
R&D Director - Service NODE

NetCentrex

Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net



-----Original Message-----
From: Eric Burger [mailto:eburger@snowshore.com]
Sent: Friday, December 13, 2002 03:13
To: Jean Philippe Longeray
Cc: IETF SPEECHSC (E-mail)
Subject: RE: [Speechsc] SPEECHSC vs 3GPP


The Mp interface is (1) not the right concept and (2) itself (IMHO) not
the correct choice for 3GPP's needs, either.

With respect to (1), Mp is trying to be an analog to the MGC/MG
decomposition for a media server, where the MRFC is a "media server
controller" and the MRFP is a "media server [processor]".  The types of
resources are bearer packet processors (e.g., tone detection, prompt
playing, and recording).  The protocol is a low-level device control
protocol (e.g., allocate a resource, allocate an RTP port, connect the port
to the resource, wait for a signal, etc.).  speechsc is a higher-level
protocol, concerned with things like 'establish session' and 'recognize
speech'.
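As a caricature of the difference in level (pseudo-operations, not real
syntax on either side):

    Mp / H.248 style:  add an RTP termination, add an ASR resource
                       termination, connect them in a context, arm an
                       event, wait for the notification.

    speechsc style:    "recognize against this grammar and tell me when
                       you have a result."

One is device plumbing; the other is a service request.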

In fact, early in the days of MRCP/speechsc, people wanted to extend the
speechsc scope to do device control.  The answer has consistently been to
use H.248 for device control.

With respect to (2), AFAIK, no one has ever built an MRFC.  I believe this is
because unlike a media gateway, where there are definite decomposition
benefits, there are really few if any benefits to decomposing the MRF.  In
fact, there are clear benefits to using the native application interface
(SIP), rather than the native gateway interface (H.248) for interfacing the
AS and CSCF to the MRF.

-----Original Message-----
From: Jean Philippe Longeray [mailto:jean-philippe.longeray@netcentrex.net]
Sent: Tuesday, December 10, 2002 5:53 AM
To: IETF SPEECHSC (E-mail)
Subject: [Speechsc] SPEECHSC vs 3GPP


Hi,

Have you ever compared SPEECHSC with the MRFC/MRFP interface in the 3GPP
(TS 24.229 Rel-5) architecture?

It looks like SPEECHSC is very close to the Mp interface (H.248).

Could SPEECHSC work with 3GPP (www.3gpp.org), 3GPP2 (www.3gpp2.org),
3G.IP (www.3gip.org), MWIF (www.mwif.org)?

Regards.

Jean-Philippe LONGERAY
R&D Director - Service NODE

NetCentrex

Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc