[mmox] Compact Binary Serialization

Catherine Pfeffer <cathypfeffer@gmail.com> Tue, 24 February 2009 17:01 UTC


> There needs to be support for a binary serialization that does not
> contain embedded type or key references, but instead use explicit
> external schema. Thus, to describe a "quaternion" you would simply
> specify four fp16 (or fp32) values in sequence, with no specific type
> information. This can extend to describing general entity property
> values: the type information can be carried by the schema for the
> entity, and does not need to be encoded in the actual data. Allowing
> external schema leads to significant bandwidth savings during
> transmission.

Wow, this one is violent!

OK, let's examine it in detail:

1) Example of the quaternion.

As I understand it, a quaternion is currently implemented as:
   <array>
        <float>0.5</float>  <!-- x -->
        <float>-0.5</float>  <!-- y -->
        <float>0.5</float>  <!-- z -->
        <float>0.5</float>  <!-- s -->
   </array>

(in the order x, y, z, s rather than the mathematical order s, x, y, z;
that's one of Second Life's oddities)

Bandwidth could be saved with something like:

   <array>
        <float repeat="4">0.5,-0.5,0.5,0.5</float>  <!-- x, y, z, s -->
   </array>

(If you ask me why this is enclosed within <array> tags when we could
remove these tags without losing any information, I think the answer is:
"because it leads to cleaner C++ code for accessing the data".
Correct me if I'm wrong here, Infinity.)

Such a scheme could save bandwidth (57 bytes versus roughly 87 bytes for
this quaternion, ignoring whitespace and comments), while still remaining
compatible with the other serializations (though no longer in the
straightforward 1:1 way they are now).

Even more bandwidth could be saved with constructs like:
   <array>
        <floats>0.5,-0.5,0.5,0.5</floats>  <!-- x, y, z, s -->
   </array>

because you don't really need to state in the data that there are four
floats; it's obvious from the number of commas. That would be 48 bytes
versus roughly 87, including the useless <array> tags. Nearly half.
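
For anyone who wants to check these byte counts, here is a small Python
sketch. It assumes the documents are sent with no whitespace or comments
between tags and that every character takes one byte (ASCII); the variable
names are mine. The totals come out within a byte of the figures quoted
above.

```python
# Three ways of writing the same quaternion, with whitespace and
# comments stripped (an assumption about what goes over the wire).
per_element = (
    "<array>"
    "<float>0.5</float><float>-0.5</float>"
    "<float>0.5</float><float>0.5</float>"
    "</array>"
)
with_repeat = '<array><float repeat="4">0.5,-0.5,0.5,0.5</float></array>'
with_floats = "<array><floats>0.5,-0.5,0.5,0.5</floats></array>"

# Byte length of each form once encoded as ASCII.
sizes = {
    "per_element": len(per_element.encode("ascii")),
    "with_repeat": len(with_repeat.encode("ascii")),
    "with_floats": len(with_floats.encode("ascii")),
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

The exact totals shift by a few bytes depending on how whitespace and tag
spelling are counted, but the ratio between the forms stays about the same.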


2) Moving the type information to the schema.

Yes, that's the "ordinary way" of encoding data in XML.

Ordinary way:
     XML schema or DTD says: we have the avatar's nose orientation
     quaternion here. That's four fp32.
     XML data over the wire:
     <nose-orientation>0.5,-0.5,0.5,0.5</nose-orientation>

LLSD (simplifying):
     LLIDL says: nose orientation is an array of four fp32s
     LLSD data over the wire: <floats>0.5,-0.5,0.5,0.5</floats>

(The nose orientation example might be silly; I had no time to choose a
better one.)
(I have used <floats> rather than four <float>s because the latter was
tedious to type.)
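
To make the "ordinary way" concrete, here is a rough Python sketch of a
receiver that takes all type information from an external schema. The
SCHEMA dictionary and the decode() helper are hypothetical stand-ins for a
real XML schema or DTD, not anything defined by LLSD or LLIDL.

```python
import xml.etree.ElementTree as ET

# Hypothetical external schema: element name -> (converter, expected count).
SCHEMA = {
    "nose-orientation": (float, 4),
}

def decode(xml_text):
    """Decode one element using type information held only in SCHEMA."""
    element = ET.fromstring(xml_text)
    convert, count = SCHEMA[element.tag]
    values = [convert(item) for item in element.text.split(",")]
    if len(values) != count:
        raise ValueError(
            f"{element.tag}: expected {count} values, got {len(values)}"
        )
    return element.tag, values

name, quat = decode("<nose-orientation>0.5,-0.5,0.5,0.5</nose-orientation>")
print(name, quat)
```

The data on the wire says only what the value means; the receiver cannot
interpret it without the schema.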

**** Jon, you are touching something at the very core of LLSD here. ****

Where ordinary XML markup would say "this is nose orientation", XML
serialization of LLSD says "this is four floats".

While such choices seem weird in an XML file, they make a lot of sense for
binary serialization, where the type of the data that follows can be
specified in a single byte: "the next four bytes are a floating point
number". It would be impossible to declare "next comes the nose's
orientation; for the number of bytes involved, look at the schema" in a
binary file using a single byte.
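
A rough Python sketch of the difference, using the struct module. The "f"
tag byte and the big-endian fp32 layout are made up for illustration and
are not the actual LLSD binary format.

```python
import struct

quat = (0.5, -0.5, 0.5, 0.5)

# Self-describing: a one-byte type tag ("f", made up for this sketch)
# precedes every big-endian 32-bit float.
tagged = b"".join(b"f" + struct.pack(">f", v) for v in quat)

# Schema-driven: the receiver already knows "four fp32", so only the
# raw values are sent.
untagged = struct.pack(">4f", *quat)

def read_tagged(data):
    """Walk the buffer, using each tag byte to decide how to read on."""
    values, offset = [], 0
    while offset < len(data):
        tag = data[offset:offset + 1]
        if tag != b"f":
            raise ValueError(f"unknown type tag {tag!r}")
        (value,) = struct.unpack_from(">f", data, offset + 1)
        values.append(value)
        offset += 5  # 1 tag byte + 4 payload bytes
    return values

print(len(tagged), len(untagged))
```

Under these assumptions the tagged form costs one extra byte per value, but
it can be read without any schema at hand; the untagged form is smaller and
meaningless without one.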

That does not mean it is impossible to put semantics in the binary file sent
over the wire, but you have to do it with maps, i.e. with the binary
serialization equivalent of:
   <map>
        <key>nose-orientation</key>
        <array>
             <floats>0.5,-0.5,0.5,0.5</floats>
        </array>
   </map>

Have I correctly summed up the whole reasoning behind that, Infinity?

Compare the bandwidth difference with:
      <nose-orientation>0.5,-0.5,0.5,0.5</nose-orientation>
No need to comment on that.

I see this as another example of "Yeah, that's strange XML, but it maps 1:1
to a nice and natural binary serialization" :-(.

Now back to your question: "should we move the type information to the
schema, as everyone does in XML?". The answer is probably: "no, because we
want to keep a simple 1:1 mapping with the binary serialization".

Now, could we make partial exceptions for common types like vectors or
quaternions? Perhaps. I already suggested:
         <integers>127,255,127</integers>
         <floats>100.0,75.3,28.6</floats>
         <floats>0.5,-0.5,0.5,0.5</floats>

but it could also take other forms that carry more semantics, like:
         <colour>127,255,127</colour>
         <vector>100.0,75.3,28.6</vector>
         <quat>0.5,-0.5,0.5,0.5</quat>

although I see the second solution as harder to fit in the existing LLIDL,
which works at a lower level of semantics.

(I intentionally did not store the colour in a vector, because that's
another SL-ism).
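
To show that the first kind of shorthand stays mechanically convertible to
the per-element form, here is a rough Python sketch. The SHORTHAND table
and the expand() helper are hypothetical, and I am assuming that a
standalone <floats> or <integers> element stands for a whole array.

```python
import xml.etree.ElementTree as ET

# Hypothetical shorthand tags and the per-element tag each one expands to.
SHORTHAND = {"floats": "float", "integers": "integer"}

def expand(xml_text):
    """Rewrite <floats>a,b,c</floats> as <array><float>a</float>...</array>."""
    compact = ET.fromstring(xml_text)
    item_tag = SHORTHAND[compact.tag]
    array = ET.Element("array")
    for item in compact.text.split(","):
        ET.SubElement(array, item_tag).text = item.strip()
    return ET.tostring(array, encoding="unicode")

print(expand("<integers>127,255,127</integers>"))
print(expand("<floats>100.0,75.3,28.6</floats>"))
```

The expansion needs nothing beyond the tag name, which is why this variant
fits more easily with a low-level description than <colour> or <quat>
would.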


I hope that helps.

-- 
Cathy