RE: [Speechsc] SPEECHSC vs 3GPP
"BRANDT,MARC (HP-France,ex2)" <marc.brandt@hp.com> Tue, 17 December 2002 20:24 UTC
Received: from www1.ietf.org (ietf.org [132.151.1.19] (may be forged)) by ietf.org (8.9.1a/8.9.1a) with ESMTP id PAA22594 for <speechsc-archive@odin.ietf.org>; Tue, 17 Dec 2002 15:24:18 -0500 (EST)
Received: (from mailnull@localhost) by www1.ietf.org (8.11.6/8.11.6) id gBHKQrk01695 for speechsc-archive@odin.ietf.org; Tue, 17 Dec 2002 15:26:53 -0500
Received: from ietf.org (odin.ietf.org [132.151.1.176]) by www1.ietf.org (8.11.6/8.11.6) with ESMTP id gBHKQqv01692 for <speechsc-web-archive@optimus.ietf.org>; Tue, 17 Dec 2002 15:26:52 -0500
Received: from www1.ietf.org (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id PAA22585 for <speechsc-web-archive@ietf.org>; Tue, 17 Dec 2002 15:23:46 -0500 (EST)
Received: from www1.ietf.org (localhost.localdomain [127.0.0.1]) by www1.ietf.org (8.11.6/8.11.6) with ESMTP id gBHKKLv01493; Tue, 17 Dec 2002 15:20:21 -0500
Received: from ietf.org (odin.ietf.org [132.151.1.176]) by www1.ietf.org (8.11.6/8.11.6) with ESMTP id gBHKJav01445 for <speechsc@optimus.ietf.org>; Tue, 17 Dec 2002 15:19:36 -0500
Received: from gremg1.net.external.hp.com (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id PAA22422 for <speechsc@ietf.org>; Tue, 17 Dec 2002 15:16:28 -0500 (EST)
Received: from loire.grenoble.hp.com (loire.grenoble.hp.com [15.128.14.199]) by gremg1.net.external.hp.com (Postfix) with ESMTP id EE9AB16B for <speechsc@ietf.org>; Tue, 17 Dec 2002 21:19:27 +0100 (MET)
Received: by loire.grenoble.hp.com with Internet Mail Service (5.5.2655.55) id <Y0AKC0J9>; Tue, 17 Dec 2002 21:19:27 +0100
Message-ID: <468579AFDE99E74DB926952FCDE3D657017CA7AC@dumas.grenoble.hp.com>
From: "BRANDT,MARC (HP-France,ex2)" <marc.brandt@hp.com>
To: speechsc@ietf.org
Subject: RE: [Speechsc] SPEECHSC vs 3GPP
Date: Tue, 17 Dec 2002 21:19:26 +0100
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2655.55)
Content-Type: text/plain; charset="iso-8859-1"
Sender: speechsc-admin@ietf.org
Errors-To: speechsc-admin@ietf.org
X-BeenThere: speechsc@ietf.org
X-Mailman-Version: 2.0.12
Precedence: bulk
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=unsubscribe>
List-Id: Speech Services Control Working Group <speechsc.ietf.org>
List-Post: <mailto:speechsc@ietf.org>
List-Help: <mailto:speechsc-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/speechsc>, <mailto:speechsc-request@ietf.org?subject=subscribe>
<-- sorry if dup, I had an email problem posting to the list and was not sure of the result; also, this one is in ASCII -->

Some comments on this thread below.

<layers>
I agree with the decomposition or unbundling model. Although there is no need to support a multitude of protocol stack profiles (aka speechsc over everything), I believe speechsc needs to enable some separation in terms of protocol layers and service interfaces (this is also a good lesson learned from previous models like OSI and TCP/IP, which are widely reflected in the SIP, HTTP, SOAP and other work as well). Separation of the control pipe and the media pipe is also at the heart of modern telecoms.

<command semantic>
I suspect that in addition to the command/response protocol model for speechsc, there needs to be an event-based model as well (like pub/sub), typically to support the detection of resources, or for better efficiency in terms of media resource processing (modern programming has always shown interest in polling as well as asynchronous models).

<extensions>
I also noticed one important point in the discussion, regarding extensions and openness to support evolution of the resource control semantics. Speechsc would benefit from not relying on new specifications being created each time a new resource control feature becomes available; this is, in my opinion, well covered in the speechsc requirements. But going further and leaving this knowledge at the application level would really enable speechsc to become a framework for supporting disparate media resource control semantics with few protocol changes. Additionally, 'payloads' or 'specific resource profiles' could be described as standard extensions any time a new one is added (e.g. like the model for RTP payloads). This also leaves room for application-specific extensions and differentiation while not violating standards, enabling a 'programmable' approach when new media resources are created, described and used by app servers.
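A minimal sketch of the pub/sub event model Marc suggests above, alongside the command/response channel. The class name, event names, and payload fields are purely hypothetical illustrations, not taken from any speechsc draft:

```python
from collections import defaultdict


class ResourceEventBus:
    """Toy publish/subscribe bus: an application server subscribes to
    resource events instead of polling the speech servers."""

    def __init__(self):
        # event type -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        # Deliver the event to every subscriber of this event type.
        for callback in self._subscribers[event_type]:
            callback(payload)


# An application server learns of a new TTS resource without polling
# (hypothetical event name and payload fields).
bus = ResourceEventBus()
seen = []
bus.subscribe("resource-available", seen.append)
bus.publish("resource-available", {"type": "TTS", "host": "tts1.example.com"})
print(seen)  # [{'type': 'TTS', 'host': 'tts1.example.com'}]
```

In a real protocol the subscription and notification would of course be messages on the wire, not in-process callbacks; the sketch only shows the asynchronous model as a complement to command/response.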
An application may need to invoke a brand new control function without having to rewrite the speechsc protocol layer. On that point, I tend to consider that adding verbs to the protocol might be more of a specification burden than using a descriptive means which can be evolved, at the price of some efficiency maybe. So I can agree that some verbs could be put at the core of the protocol (like we have for HTTP, SIP and so on), and the rest handled more as a semantic payload (which can be standardized as well; see the XML payloads by OASIS). For instance, if I want to build a speech resource that translates streams from voice to voice, will I need other verbs? A new protocol? Or what if I build a resource that combines functions, and so on?

<efficient>
This is where we might think of proposing a framework with a way to create optimized interactions. This is already done in some protocols where, for instance, you can use abbreviated fields, or encoded fields instead of full XML. Or, by analogy, when using a framework for interpreting or compiling languages. But the protocol shall certainly not be a hack just to support efficiency (one can also bet on Moore's law).

<media resources scope>
I also agree with the point on the multitude of possible multimedia resources that will have to be controlled by an application. This was initially discussed at the requirements phase, if I remember well, with the outcome that the initial focus of the first speechsc delivery should be limited in order to avoid the full-picture syndrome and no protocol at the end (and there is SPEECH in speechsc). I believe that other IETF groups are also holding worthwhile discussions in this domain.
For instance, I would like to understand the opinions of the group on the mmusic status from the last IETF. What about "XML Schema for Media Control" in the mmusic minutes:
http://www1.ietf.org/mail-archive/working-groups/mmusic/current/msg01105.html
http://www.ietf.org/internet-drafts/draft-levin-mmusic-xml-media-control-00.txt

Of course, each time we broaden the scope we ease programmable approaches and thus wide developer adoption, but often at the price of the efficiency provided by limited-scope approaches, really targeted at and tuned for specific resources (and manufacturers ;-).

<underlying techno candidates>
Now in terms of technology, I guess there are advantages in the likes of SIP, SOAP and XML (already widely used at the speech grammar or synthesis level in the MRCP packets, for instance), with all the extensibility and programmability that they provide. For instance, SIP can be extended; see the SIPPING and SIMPLE work, which provides an open framework for other semantics to be built on top of it. SOAP clearly provides a good invocation model for a 'programmable' framework.

<finally>
One value-add of speechsc could then be to keep this programmability and openness while delivering efficiency in the targeted application profiles (optimizing connection setup, traffic, reuse of media paths and so on), e.g. providing new verbs for these kinds of core functions while using descriptive services for upper application/media resource functions. Refer to the speechsc requirements: reuse of transport connections across sessions, piggybacking of responses on requests in the reverse direction, caching of state across requests... these are functions that deserve standard treatment across a whole bunch of resources (core protocol). Speechsc would then be completely independent of the media resource semantics, and only aware of the semantics of 'controlling' such resources for the best application experience.
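A sketch of the "small core verb set plus descriptive payload" idea discussed above: the core protocol would carry a generic control verb, and the resource-specific semantics would live entirely in a descriptive document. All element and attribute names here are hypothetical, invented for illustration, and not from any speechsc or mmusic draft:

```python
import xml.etree.ElementTree as ET


def build_control_payload(resource_class, command, params):
    """Build a hypothetical descriptive control payload. A brand-new
    resource class needs a new payload description, not new protocol
    verbs, so the protocol layer stays unchanged."""
    root = ET.Element("control", {"resource-class": resource_class})
    cmd = ET.SubElement(root, "command", {"name": command})
    for key, value in params.items():
        ET.SubElement(cmd, "param", {"name": key}).text = value
    return ET.tostring(root, encoding="unicode")


# Marc's voice-to-voice translation example: no new verbs, only a new
# resource class described in the payload.
payload = build_control_payload(
    "voice-translator", "start",
    {"source-lang": "fr", "target-lang": "en"})
print(payload)
```

The trade-off Marc notes applies directly: this descriptive form is extensible without respecifying the protocol, but it is more verbose than dedicated binary or abbreviated-field encodings.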
A TTS resource would be speechsc compliant, of class TTS, with such and such features defined in the programmable-layer payload (+ room for vendor extensions). One size protocol does not fit all layers.

Marc Brandt - mailto:Marc.Brandt@hp.com
Hewlett-Packard - OpenCall Business Unit
5, av. r. chanas - eybens - 38053 grenoble cedex 9 - france
tel : +33 4 7614 1088 (hp 779-1088) fax : +33 4 7614 4323 (hp 779-4323)
https://ecardfile.com/id/Marc+Brandt
http://www.hp.com/communications/opencall/

-----Original Message-----
From: Jean Philippe Longeray [mailto:jean-philippe.longeray@netcentrex.net]
Sent: Tuesday, December 17, 2002 11:00 AM
To: brian.wyld@eloquant.com; 'Skip Cave'; speechsc@ietf.org
Cc: eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP

Brian,

Living in France, I'm very attached to the OSI model defined by ITU-T. I think it is really important to make a distinction between transport and application protocols. SIP is a little bit poor for data transport, but it exists and has clearly been chosen by 3GPP and 3GPP2. In my opinion, SPEECHSC could be something very close to MRCP, transported by the SIP INFO method. Speechsc, like MRCP, must only define the media control part and the way it can be transported by SIP (and optionally by other protocols like H.225.0, RTSP, H.248, ...). All media resources could be controlled by the same protocol (SPEECHSC). On the streaming side, RTP/RTCP is of course engaged. I'm sure that a multimedia VoIP core with optional peripheral gateways is the next-gen architecture for telephony. An extension of MRCP could be the answer. It is a great protocol, isn't it? And it already works over RTSP (Nuance, Speechworks, Telisma...).
Find below some extensions of MRCP that SPEECHSC could cover:
- speaker verification,
- speaker identification,
- announcement, voice recording,
- tone detection, tone generation,
- fax,
- audio conferencing,
- video conferencing,
- chat.

SPEECHSC could be a multimedia protocol, not only for speech but also for video, data, fax... 3G!

Best regards.

Jean-Philippe LONGERAY
R&D Director - Service NODE
NetCentrex
Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net

-----Original Message-----
From: Brian Wyld [mailto:brian.wyld@eloquant.com]
Sent: Tuesday, December 17, 2002 09:47
To: 'Jean Philippe Longeray'; 'Skip Cave'; speechsc@ietf.org
Cc: eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP

Messieurs,

Some interesting discussion here - to ease the job of the protocol eval doc editor :-) perhaps someone would like to do a protocol analysis section for 3GPP H.248 (maybe just to rule it out?) - Jean-Philippe perhaps?

My 2c on the SPEECHSC/whatever - I think there is a first question to resolve in my mind.

Q1: what is the best model for SPEECHSC?
- A layer OVER a media signalling protocol (SIP, RTSP, etc., depending on this lower layer for media and session control, just like MRCP/RTSP currently does)
  -> in which case, what is the encapsulation mechanism? RTSP has ANNOUNCE messages; what does SIP provide for this sort of bundling?
  -> and what is the "best" protocol to layer over?
- An extension to an existing media signalling protocol (e.g., add MRCP "verbs" as new ones in RTSP, or add them as new SIP commands...)
- A new protocol incorporating media signalling, session control and resource control (e.g., Web services extensions)

As for the identification and resolution of resource servers, this is for me a separate functionality from SPEECHSC itself, and there are already multiple existing mechanisms (SLP, UDDI, etc.) for service location and discovery.
Brian

-----Original Message-----
From: speechsc-admin@ietf.org [mailto:speechsc-admin@ietf.org] On Behalf Of Jean Philippe Longeray
Sent: Tuesday, December 17, 2002 08:33
To: Skip Cave; speechsc@ietf.org
Cc: eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP

Hi Skip,

You're right, I didn't say anything different. Like MRCP, SPEECHSC is a media command protocol. SIP is not only a streaming protocol; it can be used as a transport protocol, like HTTP, X.224, ... If SIP transports SDP, it becomes a streaming protocol, but why do you think it is not possible to transport SPEECHSC messages in SIP content? In your document, something is missing: you need something to find a Resource Server (ASR, SVI, TTS), and I propose to use SIP softswitching. This softswitch could be inserted between your Application Execution Server and all other voice resources (ASR, TTS, SVI, but also audio/video streaming, conferencing, ...). I think that draft-robinson-mrcp-sip-00 is a great example of what I mean. Do you agree, Eric?

Best regards.

Jean-Philippe LONGERAY
R&D Director - Service NODE
NetCentrex
Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net

-----Original Message-----
From: Skip Cave [mailto:skip.cave@intervoice.com]
Sent: Monday, December 16, 2002 20:47
To: speechsc@ietf.org
Cc: jean-philippe.longeray@netcentrex.net; eburger@snowshore.com
Subject: RE: [Speechsc] SPEECHSC vs 3GPP

Eric, Jean,

It's good that we agree. I believe that there has been some confusion in the past that SpeechSC is a media streaming protocol. We need to list the basic issues to make sure that we clear up that misconception:

1) SpeechSC is NOT a media streaming protocol.

2) The SpeechSC protocol is strictly a command/response protocol, carrying commands and returning responses from application servers to speech servers.
The SpeechSC protocol will never be a media transport protocol, and will never carry any type of media.

3) Even though the SpeechSC protocol is not a media transport protocol, it can be used to COMMAND speech servers to set up streaming with another server using some type of streaming protocol (like SIP). Which streaming protocol is used will be determined as part of SpeechSC's work.

For example, in my attached architecture diagram, the SpeechSC protocol allows an Application Server commanding an ASR server to set up a SIP session between the ASR server and a telephony platform (see attached figure). Note that there is a SpeechSC command/control session between the Application Server and the ASR Server, but there is no streaming media going between the Application and ASR Servers. There IS a standard SIP session between the speech server and the telephony platform, which was set up by commands given in the SpeechSC protocol. This SIP session does NOT carry any commands other than standard SIP setup/teardown commands.

An example of the command/response sequence in a SpeechSC command stream would be:

- Request from the Application Server to Directory Services for an ASR server.
- Reply from Directory Services to the Application Server giving info on a specific ASR Server.
- Command from the Application Server to the ASR Server to set up a specific command/response session for a call (one command session per call context).
- Response from the ASR Server to the Application Server acknowledging the completion of the session setup.
- Command from the Application Server to the selected ASR Server to set up a SIP session with a specific Telephony Server. The Application Server gives the ASR Server the address of the Telephony Server so the ASR Server can set up the SIP session. (The ASR Server sets up the SIP session to the Telephony Server.)
- Response from the ASR Server indicating successful SIP session setup.
- Command from the Application Server to the ASR Server to set up grammars and start recognition on the ASR Server.
- Response from the ASR Server to the Application Server reporting a grammar match or timeout from the ASR Server.
- etc.

Again, this is shown in my attached diagram.

Skip Cave
Sr. Principal Engineer
Intervoice Inc.

>>> "Eric Burger" <eburger@snowshore.com> 12/16/02 08:29AM >>>
From a personal perspective, the MRCP over SIP proposal was what pushed me over the edge to fix MRCP. I would be hard pressed to try to convince the IESG that there is a need for MRCP/RTSP, MRCP/SIP, MRCP/foo, ... We need to pick the one that makes the most sense.

-----Original Message-----
From: Jean Philippe Longeray [mailto:jean-philippe.longeray@netcentrex.net]
Sent: Monday, December 16, 2002 3:12 AM
To: Skip Cave; Eric Burger
Cc: speechsc@ietf.org
Subject: RE: [Speechsc] SPEECHSC vs 3GPP

Hi Skip,

Nice to have some news from Intervoice... I agree. SIP will never provide multimedia control functionality, but I'm sure it's a great protocol for transport (and mandatory for 3G). The WG has to define a couple of protocols, like MRCP/RTSP. What do you think of SPEECHSC/SIP, which could be very close to MRCP/SIP?

Regards.

Jean-Philippe LONGERAY
R&D Director - Service NODE
NetCentrex
Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net

-----Original Message-----
From: Skip Cave [mailto:skip.cave@intervoice.com]
Sent: Friday, December 13, 2002 19:38
To: jean-philippe.longeray@netcentrex.net; eburger@snowshore.com
Cc: speechsc@ietf.org
Subject: RE: [Speechsc] SPEECHSC vs 3GPP

Jean, Eric,

I don't think SIP will support the separated media/control requirements I posted earlier. You will need a control protocol, and a separate media protocol. I expect that it will take a new protocol to meet these requirements.

Skip Cave
Sr. Principal Engineer
Intervoice Inc.

>>> "Jean Philippe Longeray" <jean-philippe.longeray@netcentrex.net> 12/13/02 01:25AM >>>
Thanks Eric for this analysis, I agree.
H.248 doesn't seem to be the right answer for MRFC/MRFP, even if special packages can be provided. In my opinion, SIP is the correct answer for the transport layer (since it's used everywhere in 3GPP and 3GPP2), and SPEECHSC could (should?) be used for resource control.

Concerning SPEECHSC section 3.3, "Avoid Duplicating Existing Protocols", I would like to add some remarks: in case you would like to insert a routing mechanism (a SIP soft-switch) between the Media Processing Entity / Application Server and the Resource Server (ASR, SI/SV, TTS, Announcement server), it could be interesting to have a single transport protocol, like SIP, instead of several incompatible protocols (RTSP, for example) for closely related functionalities. I think it is easier to add some redundancy, rather than conserving "old" protocols like RTSP. It seems to be very important to make a distinction between each layer of the model. Something like UDP/SIP/SPEECHSC or TCP/SIP/SPEECHSC or SCTP/SIP/SPEECHSC could be an answer.

Regards.

Jean-Philippe LONGERAY
R&D Director - Service NODE
NetCentrex
Jean-philippe.longeray@netcentrex.net <mailto:Jean-philippe.longeray@netcentrex.net>
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net

-----Original Message-----
From: Eric Burger [mailto:eburger@snowshore.com]
Sent: Friday, December 13, 2002 03:13
To: Jean Philippe Longeray
Cc: IETF SPEECHSC (E-mail)
Subject: RE: [Speechsc] SPEECHSC vs 3GPP

The Mp interface is (1) not the right concept and (2) itself (IMHO) not the correct choice for 3GPP's needs, either. With respect to (1), Mp is trying to be an analog to the MGC/MG decomposition for a media server, where the MRFC is a "media server controller" and the MRFP is a "media server [processor]". The types of resources are bearer packet processors (e.g., tone detection, prompt playing, and recording).
The protocol is a low-level device control protocol (e.g., allocate a resource, allocate an RTP port, connect the port to the resource, wait for a signal, etc.). speechsc is a higher-level protocol, concerned with things like 'establish session' and 'recognize speech'. In fact, early in the days of MRCP/speechsc, people wanted to extend the speechsc scope to do device control. The answer has consistently been to use H.248 for device control.

With respect to (2), AFAIK, no one has ever built an MRFC. I believe this is because, unlike a media gateway, where there are definite decomposition benefits, there are really few if any benefits to decomposing the MRF. In fact, there are clear benefits to using the native application interface (SIP), rather than the native gateway interface (H.248), for interfacing the AS and CSCF to the MRF.

-----Original Message-----
From: Jean Philippe Longeray [mailto:jean-philippe.longeray@netcentrex.net]
Sent: Tuesday, December 10, 2002 5:53 AM
To: IETF SPEECHSC (E-mail)
Subject: [Speechsc] SPEECHSC vs 3GPP

Hi,

Did you ever compare SPEECHSC and the MRFC/MRFP interface in the 3GPP (TS 24.229 Rel5) architecture? It looks like SPEECHSC is very close to the Mp interface (H.248). Could SPEECHSC work with 3GPP (www.3gpp.org), 3GPP2 (www.3gpp2.org), 3G.IP (www.3gip.org), MWIF (www.mwif.org)?

Regards.

Jean-Philippe LONGERAY
R&D Director - Service NODE
NetCentrex
Jean-philippe.longeray@netcentrex.net
+ 33 4 72 53 61 33 - + 33 4 72 53 61 30
Mobile: + 33 6 76 48 34 95
http://www.netcentrex.net

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc
- [Speechsc] SPEECHSC vs 3GPP Jean Philippe Longeray
- RE: [Speechsc] SPEECHSC vs 3GPP Jean Philippe Longeray
- RE: [Speechsc] SPEECHSC vs 3GPP Eric Burger
- RE: [Speechsc] SPEECHSC vs 3GPP Jean Philippe Longeray
- RE: [Speechsc] SPEECHSC vs 3GPP Skip Cave
- RE: [Speechsc] SPEECHSC vs 3GPP Jean Philippe Longeray
- RE: [Speechsc] SPEECHSC vs 3GPP Eric Burger
- RE: [Speechsc] SPEECHSC vs 3GPP Skip Cave
- RE: [Speechsc] SPEECHSC vs 3GPP Jean Philippe Longeray
- RE: [Speechsc] SPEECHSC vs 3GPP Brian Wyld
- RE: [Speechsc] SPEECHSC vs 3GPP Jean Philippe Longeray
- RE: [Speechsc] SPEECHSC vs 3GPP BRANDT,MARC (HP-France,ex2)
- RE: [Speechsc] SPEECHSC vs 3GPP Eric Burger
- RE: [Speechsc] SPEECHSC vs 3GPP Brian Marquette
- RE: [Speechsc] SPEECHSC vs 3GPP Brian Eberman
- RE: [Speechsc] SPEECHSC vs 3GPP Eric Burger
- RE: [Speechsc] SPEECHSC vs 3GPP BRANDT,MARC (HP-France,ex2)
- RE: [Speechsc] SPEECHSC vs 3GPP Brian Wyld