RE: [Dmsp] Comments on draft-engelsma-dmsp-01.txt

"Burger, Eric" <> Tue, 18 April 2006 21:47 UTC

Subject: RE: [Dmsp] Comments on draft-engelsma-dmsp-01.txt
Date: Tue, 18 Apr 2006 17:47:20 -0400
From: "Burger, Eric" <>
To: "Chris Cross" <>


> ________________________________________
> From: Chris Cross [] 
> Sent: Tuesday, March 21, 2006 3:25 PM
> To:
> Subject: Re: [Dmsp] Comments on draft-engelsma-dmsp-01.txt
> Eric,
> Thanks for your comments. It takes a bit of work to wade through a
> spec this size, and I appreciate the effort.
> "Burger, Eric" <> wrote on 03/20/2006 08:48:35 PM:
> > User-Agent field in SIG_INIT: it says it is for advertising
> > capabilities, but it is just a string identifying the GUA. A better
> > mechanism is needed to advertise capabilities.
> Open to suggestion here. The intent is to provide an efficient
> one-turn init event.

I would offer choosing among a set of protocol data elements conveying
whatever capabilities need to be conveyed, a parameterized string, or a
structured string.  I would offer that creating PDEs makes extensions
problematic and structured strings are doomed to interoperability
failure.  That leaves a string with (extensible) parameters.

> > Translating the real result into a DMSP result will be error-prone
> > and is guaranteed to not supply what the application desires. What is
> > the use case? It is not a VoiceXML browser in the handset; that is
> > what MRCPv2 is for. It is inconceivable that it is a network-based
> > browser using a handset ASR engine; if the handset has the power to
> > run ASR, it most likely has the power to run a VoiceXML browser.
> >
> > For that matter, what does the GUA do with recognition results? Is it
> > to populate fields or to help in low-confidence situations? If the
> > former, then it isn't worth having confidence scores - there should
> > not be more than one value. If the latter, what does the interaction
> > look like? I am asking, because presumably the VoiceXML interpreter
> > will go into its "I did not get that" portion of the form. I am
> > assuming that the goal is to allow the user to visually pick from a
> > list of results. I was thinking that it might be more compact to have
> > the GUA send the VUA the user's pick by reference, but that is too
> > much state to carry around (which pick of which result are we
> > referring to?). Thus the current model where the GUA pushes down the
> > result string is a good way to go.
> Don't assume that the application author will only want to
> handle n-best results in the voice modality. He may prompt
> the user with "what did you say?" and pop up a list to choose
> from. The same argument goes for the interpretation and/or
> recognition results. There are all kinds of creative things
> that the GUA can do with that information.
> MRCP by definition does not support dialog level application
> programming. So your assertion that there won't be VoiceXML
> in a handset is incorrect. DMSP is designed to support a
> couple of broad use cases: Interaction Manager and
> peer-to-peer configurations. The latter includes an X+V
> multimodal browser where the VoiceXML is rendered by a
> remote VoiceXML server. Turn your assertion around: are
> there devices that could support a VoiceXML interpreter but
> not ASR/TTS?

That was my point exactly.  We violently agree here.
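To illustrate the push-by-value model we seem to agree on: if the VUA pushes the full n-best strings down, the GUA can build its "what did you say?" pick list with no shared state at all. The field names and threshold below are invented for illustration, not from the draft:

```python
# Hypothetical shape of an n-best recognition result pushed from the
# VUA to the GUA as plain strings (DMSP's actual wire encoding differs).
nbest = [
    {"utterance": "boston", "confidence": 0.92},
    {"utterance": "austin", "confidence": 0.71},
    {"utterance": "houston", "confidence": 0.40},
]

def choices_for_display(results, threshold=0.3):
    """Return the utterances worth showing in a GUA pick list.

    Because the full strings were pushed (rather than a pick-by-
    reference handle), the GUA needs no record of which recognition
    turn produced them in order to render this list.
    """
    return [r["utterance"] for r in results if r["confidence"] >= threshold]

print(choices_for_display(nbest))  # all three clear the threshold here
```

If the GUA instead sent back "the user chose item 2 of result 7", both sides would have to keep that numbering alive, which is exactly the state problem noted above.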

> > Strings: most of the strings are or will need to be Unicode. For
> > example, arbitrary form text data can easily be non-Western. I
> > expect international URIs to end up as Unicode or UTF-16. If every
> > byte counts, then I would offer selecting the charset in SIG_INIT or
> > SIG_VXML_START, with a default to UTF-8.
> Every byte counts, so UTF-8 is probably the default. Maybe
> string encoding is part of the initial session negotiation?

Sounds like a plan.

> > DOM keydown, keyup, keypress events: I don't have the DOM reference
> > handy. Do these refer to actual keyboard presses or ink strokes? If
> > the former, who would use a key-by-key protocol for a distributed,
> > web-oriented stimulus protocol?
> Others in the multimodal community, such as some OMA members,
> have pressed for this level of granularity (no pun intended.)
> I don't think a key-by-key protocol is practical on a real
> network, and it is generally not necessary in dialog-level
> interaction.

Ouch!  Hehehe.  Agreed :)
