RE: [Dmsp] Comments on draft-engelsma-dmsp-01.txt

"Burger, Eric" <> Tue, 28 March 2006 02:17 UTC

Subject: RE: [Dmsp] Comments on draft-engelsma-dmsp-01.txt
Date: Mon, 27 Mar 2006 21:17:16 -0500
From: "Burger, Eric" <>
To: "Engelsma Jonathan-QA2678" <>, "Brian Marquette" <>

We went through this ad nauseam in lemonade, but we woke up and realized
that NONE of this will work on current handsets, so that objection
should be taken off the table.  You are going to be doing firmware
updates / new handsets to do even the basic binary capability anyway, so
might as well do the right thing rather than hack something for handsets
that will only be used in the lab.

Remember, the average time for a draft to go from -00 to RFC is 3 years.
That is two whole generations of phones...

-----Original Message-----
From: Engelsma Jonathan-QA2678 [] 
Sent: Monday, March 27, 2006 3:46 PM
To: Brian Marquette; Burger, Eric
Subject: RE: [Dmsp] Comments on draft-engelsma-dmsp-01.txt


Sorry for not responding sooner, I was out of the office last week
without email access.  Thanks for all the excellent feedback.  

Yes, SIGCOMP or something equivalent would be desirable. For voice
content that returns a fair amount of string data (e.g., n-best results)
this would likely yield more compact messages than the current binary
format, even when encoded in an XML format.

However, while this is fairly straightforward to support from the server
side, things aren't quite so easy on the terminal side.  For low to
mid-tier handsets, SIGCOMP would have to be supported as a firmware
service.  While that may very well happen in the future, there are
millions of handsets out there today that do not support it, and many
more that will ship in the future that won't either.  In terms of
alternatives, some of Motorola's handsets provide a proprietary zip
capability to J2ME developers, but in general, as far as I know this
sort of capability is not widely available to middleware and/or
application developers across handsets.  A third option would be to
implement a compression library and include it with the multimodal
application; however, this increases the application's footprint, which
is not a good thing in terms of over-the-air downloads, not to mention
the impact on performance (at least in the case of Java).

We are open to SIGCOMP, gzip, etc., where possible, but one of our
requirements is to be able to implement the protocol on the terminal
without requiring compression.  We realize there are trade-offs here, as
Eric has pointed out, but we see enormous utility in keeping the
protocol simple and compact enough to be implemented on handsets and
wireless networks that are widely available today.


-----Original Message-----
From: Brian Marquette [] 
Sent: Tuesday, March 21, 2006 6:45 PM
To: Burger, Eric
Subject: RE: [Dmsp] Comments on draft-engelsma-dmsp-01.txt

I am VERY interested in SIGCOMP, and I know Jonathan Engelsma at
Motorola is as well. We should definitely strongly consider that as an
option.  I will also do some reading on EMMA. Currently we are using
NLSML, which I think EMMA replaces.


-----Original Message-----
From: Burger, Eric []
Sent: Tuesday, March 21, 2006 1:39 PM
To: Brian Marquette
Subject: RE: [Dmsp] Comments on draft-engelsma-dmsp-01.txt

I would offer that XML with SIGCOMP, TLS/gzip, or another
transport-layer compression scheme will be on the same order of size as
a binary protocol; when you start passing n-best lists I will bet
megabucks that the binary protocol will be LARGER than XML.
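
The size comparison above can be checked by measurement rather than by
argument. The following is a minimal sketch under stated assumptions: the
XML shape, the binary layout, and the sample n-best list are all
hypothetical (none of them is EMMA, NLSML, or the encoding in the DMSP
draft), and zlib's DEFLATE stands in for SIGCOMP or TLS-level
compression, which may behave differently on real traffic.

```python
import struct
import zlib

# Hypothetical n-best list: (utterance text, confidence) pairs.
N_BEST = [
    ("Maple Street", 0.81),
    ("Maple Avenue", 0.77),
    ("Maple Court", 0.64),
    ("Marble Street", 0.52),
    ("Naples Street", 0.41),
]


def to_xml(results):
    """Illustrative XML shape only; not EMMA, NLSML, or DMSP syntax."""
    items = "".join(
        '<result confidence="%.2f">%s</result>' % (conf, text)
        for text, conf in results
    )
    return ("<nbest>%s</nbest>" % items).encode("utf-8")


def to_binary(results):
    """Assumed layout: uint16 count, then per entry a float32
    confidence, a uint16 length, and the UTF-8 text bytes."""
    out = [struct.pack("!H", len(results))]
    for text, conf in results:
        data = text.encode("utf-8")
        out.append(struct.pack("!fH", conf, len(data)) + data)
    return b"".join(out)


def sizes(results):
    """Return the byte count of each encoding for comparison."""
    xml = to_xml(results)
    return {
        "xml": len(xml),
        "xml_deflate": len(zlib.compress(xml, 9)),
        "binary": len(to_binary(results)),
    }


if __name__ == "__main__":
    print(sizes(N_BEST))
```

Which encoding wins depends on the list length and on how repetitive the
strings are, so the numbers are only meaningful for representative
recognition results.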

On the n-best responses, I would offer that if you are interested in
n-best, you are interested in EMMA.  "Translating" it to DMSP is
guaranteed to miss some critical parameter for somebody's application,
is error prone, uses resources unnecessarily, and does not have any

-----Original Message-----
From: Brian Marquette []
Sent: Tuesday, March 21, 2006 3:22 PM
To: Burger, Eric;
Subject: RE: [Dmsp] Comments on draft-engelsma-dmsp-01.txt


As for some of your points, the main use case here is a thin client
handset that is attempting to run a Voice and Visual application.
The VoiceXML browser is in the network and is most likely using MRCP for
interactions with ASR and TTS.  The connection from the thin client to
the Voice server, however, is over packet data, typically 2G to 2.5G, so
latency will become a huge issue if we try to communicate with XML.

The result processing summary you wrote is basically correct. There are
a couple of use cases there, one for simple navigation and form filling,
and another for selecting from an n-best result. For example, you might
be using a map application and be looking for "Maple". The result should
allow the user to see the n-best list and visually select which address
he intended. 


-----Original Message-----
From: Burger, Eric []
Sent: Monday, March 20, 2006 6:49 PM
Subject: [Dmsp] Comments on draft-engelsma-dmsp-01.txt

Section 3.

Binary encoding - blech. Will anyone use XML if all of the normative
text describes the binary encoding? Conversely, given how much easier it
is to generate, parse, and debug XML, would it not be better to have the
normative text use XML, and have the mapping of tags to binary values in
the appendix?

Seems very VoiceXML-centric, to the point it may only work with
VoiceXML. Is that OK?

User-Agent field in SIG_INIT: says for advertising capabilities, but it
is just a string identifying the GUA. A better mechanism is to advertise

RESULT: is there any reason not to simply tunnel EMMA or NLSML?
Translating the real result into a DMSP result will be error-prone and
is guaranteed to not supply what the application desires. What is the
use case? It is not a VoiceXML browser in the handset; that is what
MRCPv2 is for. It is inconceivable that it is a network-based VoiceXML
browser using a handset ASR engine; if the handset has the power to run
ASR, it most likely has the power to run a VoiceXML browser.

For that matter, what does the GUA do with recognition results? Is it to
populate fields or to help in low-confidence situations? If the former,
then it isn't worth having confidence scores - there should not be more
than one value. If the latter, what does the interaction look like? I am
asking, because presumably the VoiceXML interpreter will go into its "I
did not get that" portion of the form. I am assuming that the goal is to
allow the user to visually pick from a list of results. I was thinking
that it might be more compact to have the GUA send the VUA the correct
pick by reference, but that is too much state to carry around (which
pick of which result are we referring to). Thus the current model where
the GUA pushes down the result string is a good way to go.

SIG_VXML_START: which is not really going to be used, SIG_INIT or

Can Dispatch: Which is more likely, a series of "can you do this?" or
"what can you do?" If the latter, then it would be better to have a
single OPTIONS message. If the former, then the mechanism as described
is OK.

Get/Set Cookies: security and privacy considerations

Strings: most of the strings are or will need to be Unicode. For
example, arbitrary form text data can easily be non-Western. Likewise,
expect international URIs to end up as Unicode or UTF-16. If every byte
counts, then I would offer selecting the charset in SIG_INIT or
SIG_VXML_START, with a default to UTF-8.
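
To make the "every byte counts" point concrete, here is a small sketch
that measures the same strings in two encodings. The sample strings and
the helper names are illustrative assumptions, and nothing here reflects
how the draft actually negotiates a charset.

```python
# Hypothetical form-field values: one Western, one non-Western.
SAMPLES = {
    "western": "Maple Street",
    "japanese": "東京都千代田区",
}


def encoded_sizes(text):
    """Byte length of the text in UTF-8 and in UTF-16 (big-endian, no BOM)."""
    return {cs: len(text.encode(cs)) for cs in ("utf-8", "utf-16-be")}


def smallest_charset(text):
    """Pick whichever of the two encodings is shorter for this text."""
    result = encoded_sizes(text)
    return min(result, key=result.get)


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(name, encoded_sizes(text), smallest_charset(text))
```

ASCII text takes one byte per character in UTF-8 and two in UTF-16,
while CJK characters in the Basic Multilingual Plane take three bytes in
UTF-8 and two in UTF-16, which is why a per-session choice with a UTF-8
default can save bytes in either direction.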

DOM keydown, keyup, keypress events: I don't have the DOM reference
handy. Do these refer to actual keyboard presses or ink strokes? If so,
who would use a key-by-key protocol for a distributed, web-oriented
stimulus protocol?

General: Much easier to build parsers that have all of the fixed-length
data items up front. Take Table 36, for example. Having the Error Code
follow Correlation means I can immediately figure out the status without
having to parse the Node and Location fields. I might not care,
depending on the error. If I do care, there is no harm in having the
Error Code up front.
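
A sketch of what this buys the parser, assuming a hypothetical layout:
the field widths, the length-prefixed string format, and the function
names below are illustrative and are not taken from Table 36 of the
draft.

```python
import struct

# Assumed layout (not from the draft):
#   fixed part:    correlation (uint32), error code (uint16)
#   variable part: node, location; each a uint16 length + UTF-8 bytes
FIXED = struct.Struct("!IH")


def encode(correlation, error_code, node, location):
    """Build a message with the fixed-length fields first."""
    body = b""
    for text in (node, location):
        data = text.encode("utf-8")
        body += struct.pack("!H", len(data)) + data
    return FIXED.pack(correlation, error_code) + body


def peek_status(message):
    """Read correlation and error code without touching the strings."""
    return FIXED.unpack_from(message, 0)


def parse_strings(message):
    """Parse node and location only when the caller needs them."""
    offset = FIXED.size
    fields = []
    for _ in range(2):
        (length,) = struct.unpack_from("!H", message, offset)
        offset += 2
        fields.append(message[offset:offset + length].decode("utf-8"))
        offset += length
    return tuple(fields)


if __name__ == "__main__":
    msg = encode(7, 0, "field1", "form.vxml#main")
    correlation, error_code = peek_status(msg)
    if error_code != 0:
        print(correlation, error_code, parse_strings(msg))
```

With the Error Code behind the variable-length fields, `peek_status`
would have to walk both strings first; with it up front, the status is
available at a fixed offset.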

Need to explain how a loop could occur (Section 4.4)

Dmsp mailing list
