[Speechsc] Continuous speech recognition in MRCP

Tomáš Valenta <tomas.valenta@speechtech.cz> Wed, 09 March 2011 15:44 UTC

Dear SpeechSC list members,

At our company we are implementing TTS and ASR solutions using MRCP. For
ASR, we would like to use the protocol not only to recognize short
utterances against a simple grammar, following the scheme

C->S: RECOGNIZE
S->C: IN-PROGRESS
S->C: START-OF-INPUT
S->C: RECOGNITION-COMPLETE (result)

but also for continuous speech recognition (e.g. dictation lasting
minutes or tens of minutes) with immediate results. Unfortunately, the
MRCPv2 specification draft defines no such approach. We have been
considering the following scheme:

C->S: RECOGNIZE (continuous)
S->C: IN-PROGRESS
S->C: START-OF-INPUT
S->C: IN-PROGRESS (partial_result_1)
...
S->C: IN-PROGRESS (partial_result_n)
C->S: STOP
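
To make the intended client behaviour concrete, here is a minimal Python
sketch of how a client might consume such a message sequence. The tuple
representation of messages and the function name collect_partial_results
are illustrative assumptions only; nothing here is MRCPv2 wire format or
an existing API.

```python
from typing import Iterable, List, Optional, Tuple

# (direction, message name, optional payload). This tuple shape is an
# illustrative assumption, not the MRCPv2 wire format.
Message = Tuple[str, str, Optional[str]]


def collect_partial_results(messages: Iterable[Message]) -> List[str]:
    """Return the partial results seen between RECOGNIZE and STOP."""
    partials: List[str] = []
    recognizing = False
    for direction, name, payload in messages:
        if direction == "C->S" and name == "RECOGNIZE":
            recognizing = True
        elif direction == "C->S" and name == "STOP":
            break  # the client ends the continuous session
        elif (recognizing and direction == "S->C"
              and name == "IN-PROGRESS" and payload is not None):
            # An IN-PROGRESS message carrying a payload is treated as a
            # partial result; one without a payload is the plain
            # acknowledgement of RECOGNIZE.
            partials.append(payload)
    return partials


session = [
    ("C->S", "RECOGNIZE", "continuous"),
    ("S->C", "IN-PROGRESS", None),
    ("S->C", "START-OF-INPUT", None),
    ("S->C", "IN-PROGRESS", "partial_result_1"),
    ("S->C", "IN-PROGRESS", "partial_result_2"),
    ("C->S", "STOP", None),
]
print(collect_partial_results(session))
```

In a dictation client, each appended partial result would be shown to the
user as soon as it arrives rather than collected into a list.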

Imagine a dictation application in which the user immediately sees what
he or she has said. The recognizer could be located on a remote
machine.

The (continuous) parameter of the RECOGNIZE request could be a grammar
type (in effect, a specification of a built-in language model) or a
header value. Do you find this approach a clean solution? Or do you think
continuous speech recognition should not be part of the MRCPv2
specification?
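
As a rough illustration of the two options, the Python sketch below
builds an MRCPv2-style RECOGNIZE request as text. Everything specific in
it is an assumption made for illustration: the helper build_recognize,
the grammar URIs "builtin:dictation" and "builtin:dictation/continuous",
the vendor parameter "com.example.continuous=true", and the channel
identifier. Whether a server would accept Vendor-Specific-Parameters on
RECOGNIZE in this way is also an assumption, not something the draft
promises.

```python
from typing import Optional

CRLF = "\r\n"


def build_recognize(request_id: int, channel_id: str, grammar_uri: str,
                    vendor_param: Optional[str] = None) -> str:
    """Build an MRCPv2-style RECOGNIZE request as text (illustrative only)."""
    body = grammar_uri + CRLF
    headers = [f"Channel-Identifier: {channel_id}"]
    if vendor_param is not None:
        # Option 1: signal continuous mode through a header value.
        headers.append(f"Vendor-Specific-Parameters: {vendor_param}")
    headers.append("Content-Type: text/uri-list")
    headers.append(f"Content-Length: {len(body.encode('utf-8'))}")
    rest = CRLF.join(headers) + CRLF + CRLF + body

    # The message-length field counts the whole message, including its own
    # digits, so iterate until the value stops changing.
    length = 0
    while True:
        message = f"MRCP/2.0 {length} RECOGNIZE {request_id}{CRLF}{rest}"
        total = len(message.encode("utf-8"))
        if total == length:
            return message
        length = total


# Option 1: an ordinary grammar reference plus a hypothetical header value.
via_header = build_recognize(1, "32AECB23433801@speechrecog",
                             "builtin:dictation",
                             "com.example.continuous=true")

# Option 2: the grammar type itself (hypothetical URI) selects continuous mode.
via_grammar = build_recognize(2, "32AECB23433801@speechrecog",
                              "builtin:dictation/continuous")

print(via_header)
```

The header variant keeps the grammar reference unchanged, while the
grammar variant needs no new header; which of the two is cleaner is
exactly the question above.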

Kindest regards and thanks for comments,
Tomas Valenta

PS. Originally we started discussing this topic with Arsen Chaloyan 
(author of UniMRCP) here:
https://groups.google.com/d/topic/unimrcp/pSaDbhHPh3M/discussion