RE: [Dmsp] Comments on draft-engelsma-dmsp-01.txt

"Burger, Eric" <> Tue, 21 March 2006 20:38 UTC

To: "Brian Marquette" <>
List-Id: Distributed Multimodal Synchronization Protocol <>

I would offer that XML with SIGCOMP, TLS/gzip, or another
transport-layer compression scheme will be on the same order of size as
a binary protocol; once you start passing n-best lists, I will bet
megabucks that the binary protocol will be LARGER than XML.
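
A claim like this is easy to measure rather than argue. The sketch below
is only a measuring harness: the XML element names, the binary layout,
and the sample n-best list are all made up for illustration (they are
neither EMMA nor the DMSP encoding), and zlib stands in for whatever
transport-layer compression is actually deployed. Substitute real
messages to get numbers that mean something.

```python
import struct
import zlib

# Hypothetical n-best list: (utterance text, confidence) pairs.
N_BEST = [
    ("Maple Street", 0.82),
    ("Maple Avenue", 0.74),
    ("Maple Court", 0.61),
    ("Mabel Street", 0.40),
]


def encode_xml(results):
    """Encode the list as a small, made-up XML document."""
    items = "".join(
        f'<interpretation confidence="{conf:.2f}">{text}</interpretation>'
        for text, conf in results
    )
    return f"<result>{items}</result>".encode("utf-8")


def encode_binary(results):
    """Encode the list in a made-up binary layout: a 2-byte count, then
    per item a 4-byte float confidence and a 2-byte length-prefixed
    UTF-8 string."""
    out = [struct.pack("!H", len(results))]
    for text, conf in results:
        raw = text.encode("utf-8")
        out.append(struct.pack("!fH", conf, len(raw)) + raw)
    return b"".join(out)


xml_bytes = encode_xml(N_BEST)
bin_bytes = encode_binary(N_BEST)
xml_deflated = zlib.compress(xml_bytes, 9)

print("XML raw:      ", len(xml_bytes))
print("XML deflated: ", len(xml_deflated))
print("binary raw:   ", len(bin_bytes))
```

Per-message compression of very short messages carries fixed overhead,
so the comparison is most informative with realistic list lengths and
with the compression context the transport would really use.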

On the n-best responses, I would offer that if you are interested in
n-best, you are interested in EMMA.  "Translating" it to DMSP is
guaranteed to miss some critical parameter for somebody's application,
is error prone, uses resources unnecessarily, and does not have any

-----Original Message-----
From: Brian Marquette [] 
Sent: Tuesday, March 21, 2006 3:22 PM
To: Burger, Eric;
Subject: RE: [Dmsp] Comments on draft-engelsma-dmsp-01.txt


As for some of your points, the main use case here is a thin client
handset that is attempting to run a Voice and Visual application. The
VoiceXML browser is in the network and is most likely using MRCP for
interactions with ASR and TTS.  However, the connection from the thin
client to the Voice server is over packet data, typically 2G to 2.5G.
So latency will become a huge issue if we try to communicate with XML.

The result processing summary you wrote is basically correct. There are
a couple of use cases there, one for simple navigation and form filling,
and another for selecting from an n-best result. For example, you might
be using a map application and be looking for "Maple". The result should
allow the user to see the n-best list and visually select which address
he intended. 


-----Original Message-----
From: Burger, Eric [] 
Sent: Monday, March 20, 2006 6:49 PM
Subject: [Dmsp] Comments on draft-engelsma-dmsp-01.txt

Section 3.

Binary encoding - blech. Will anyone use XML if all of the normative
text describes the binary encoding? Conversely, given how much easier it
is to generate, parse, and debug XML, would it not be better to have the
normative text use XML, and have the mapping of tags to binary values in
the appendix?

Seems very VoiceXML-centric, to the point it may only work with
VoiceXML. Is that OK?

User-Agent field in SIG_INIT: says for advertising capabilities, but it
is just a string identifying the GUA. A better mechanism is to advertise

RESULT: is there any reason not to simply tunnel EMMA or NLSML?
Translating the real result into a DMSP result will be error-prone and
is guaranteed to not supply what the application desires. What is the
use case? It is not a VoiceXML browser in the handset; that is what
MRCPv2 is for. It is inconceivable that it is a network-based VoiceXML
browser using a handset ASR engine; if the handset has the power to run
ASR, it most likely has the power to run a VoiceXML browser.

For that matter, what does the GUA do with recognition results? Is it to
populate fields or to help in low-confidence situations? If the former,
then it isn't worth having confidence scores - there should not be more
than one value. If the latter, what does the interaction look like? I am
asking, because presumably the VoiceXML interpreter will go into its "I
did not get that" portion of the form. I am assuming that the goal is to
allow the user to visually pick from a list of results. I was thinking
that it might be more compact to have the GUA send the VUA the correct
pick by reference, but that is too much state to carry around (which
pick of which result are we referring to). Thus the current model where
the GUA pushes down the result string is a good way to go.
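
The trade-off between the two ways of reporting the user's pick can be
sketched as follows. This is an illustration only: the Candidate type,
the function names, and the result id are hypothetical and do not come
from the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    confidence: float


def pick_by_value(n_best, chosen_index):
    """GUA side: return the chosen string itself, so the VUA needs no
    record of which result list the pick came from."""
    return n_best[chosen_index].text


def pick_by_reference(result_id, chosen_index):
    """Alternative: a compact (result id, index) pair. To resolve it,
    the VUA must keep every outstanding n-best list keyed by result id."""
    return (result_id, chosen_index)


n_best = [Candidate("Maple Street", 0.82), Candidate("Maple Avenue", 0.74)]

print(pick_by_value(n_best, 1))
print(pick_by_reference(7, 1))
```

The by-reference form is smaller on the wire, but it is only meaningful
while both sides agree on what result 7 was.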

SIG_VXML_START: which is not really going to be used, SIG_INIT or

Can Dispatch: Which is more likely, a series of "can you do this?" or
"what can you do?" If the latter, then it would be better to have a
single OPTIONS message. If the former, then the mechanism as described
is OK.
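
To make the two query styles concrete, here is a minimal sketch. The
event names, the options() function, and the supported set are
assumptions for illustration; the draft's actual message formats are not
modeled.

```python
# Hypothetical set of event types one agent is able to handle.
SUPPORTED_EVENTS = frozenset({"click", "focus", "keypress"})


def can_dispatch(event_type):
    """Per-event query: one round trip answers one "can you do this?"."""
    return event_type in SUPPORTED_EVENTS


def options():
    """Single capability query: one round trip answers "what can you do?"."""
    return sorted(SUPPORTED_EVENTS)


wanted = ["click", "keydown", "focus"]

# Style 1: one request per event type, so len(wanted) round trips.
per_event = {e: can_dispatch(e) for e in wanted}

# Style 2: one request, then filter locally.
supported = set(options())
one_shot = {e: e in supported for e in wanted}

print(per_event)
print(one_shot)
```

Both styles yield the same answer; they differ in how many round trips
it takes to get it, which matters on a high-latency link.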

Get/Set Cookies: security and privacy considerations

Strings: most of the strings are or will need to be Unicode. For
example, arbitrary form text data can easily be non-Western. Likewise,
expect internationalized URIs to end up as Unicode or UTF-16. If every byte
counts, then I would offer selecting the charset in SIG_INIT or
SIG_VXML_START, with a default to UTF-8.
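
Why the charset choice matters when every byte counts can be seen by
counting encoded bytes. The sample strings below are arbitrary choices
for illustration; UTF-16 is measured without a byte order mark.

```python
SAMPLES = {
    "ASCII street name": "Maple Street",
    # Three CJK characters, written as escapes to keep this file ASCII.
    "Japanese place name": "\u6771\u4eac\u90fd",
}


def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }


for label, text in SAMPLES.items():
    print(label, encoded_sizes(text))
```

The ASCII string takes 12 bytes in UTF-8 and 24 in UTF-16, while the CJK
string takes 9 bytes in UTF-8 and 6 in UTF-16, so neither encoding is
smaller for all content.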

DOM keydown, keyup, keypress events: I don't have the DOM reference
handy. Do these refer to actual keyboard presses or ink strokes? If so,
who would use a key-by-key protocol for a distributed, web-oriented
stimulus protocol?

General: It is much easier to build parsers when all of the fixed-length
data items come first. Take Table 36, for example. Having the Error Code
follow Correlation means I can immediately figure out the status without
having to parse the Node and Location fields. I might not care about
them, depending on the error. If I do care, there is no harm in having
the Error Code up front.
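
A minimal sketch of the fixed-fields-first layout, under stated
assumptions: the field widths (4-byte Correlation, 2-byte Error Code)
and the 2-byte length prefix on the Node and Location strings are
invented for illustration and are not taken from Table 36.

```python
import struct

# Hypothetical fixed header: 4-byte correlation, 2-byte error code.
FIXED = struct.Struct("!IH")


def pack_string(s):
    """Encode a string as a 2-byte length followed by UTF-8 bytes."""
    raw = s.encode("utf-8")
    return struct.pack("!H", len(raw)) + raw


def build(correlation, error_code, node, location):
    """Fixed-length fields first, variable-length strings after."""
    return (
        FIXED.pack(correlation, error_code)
        + pack_string(node)
        + pack_string(location)
    )


def peek_status(message):
    """Read correlation and error code without touching the strings."""
    return FIXED.unpack_from(message, 0)


def parse_strings(message):
    """Parse Node and Location only when the caller cares about them."""
    offset = FIXED.size
    fields = []
    for _ in range(2):
        (length,) = struct.unpack_from("!H", message, offset)
        offset += 2
        fields.append(message[offset:offset + length].decode("utf-8"))
        offset += length
    return tuple(fields)


msg = build(42, 3, "form1/city", "line 12")
print(peek_status(msg))
print(parse_strings(msg))
```

With this ordering the error code sits at a constant offset, so a
receiver can branch on it before deciding whether the strings are worth
decoding.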

Need to explain how a loop could occur (Section 4.4)

Dmsp mailing list
