Re: [mmox] The Structure and Interpretation of LLSD Messages [Was: Re: unefficient binary serialization ?]

"Meadhbh Hamrick (Infinity)" <infinity@lindenlab.com> Wed, 25 February 2009 07:06 UTC

Message-Id: <497C7FD9-BF36-458F-98C0-BEECAE611E1A@lindenlab.com>
From: "Meadhbh Hamrick (Infinity)" <infinity@lindenlab.com>
To: Morgaine <morgaine.dinova@googlemail.com>
In-Reply-To: <e0b04bba0902242245y7ce6cc81i5d67d02b73f65440@mail.gmail.com>
Date: Tue, 24 Feb 2009 23:06:37 -0800
References: <ebe4d1860902241754s7942179ajd4a29dde4e1d1bdb@mail.gmail.com> <49A4A8FA.607@gmail.com> <024355C7-0753-451F-9542-4478D27A2802@lindenlab.com> <e0b04bba0902242245y7ce6cc81i5d67d02b73f65440@mail.gmail.com>
Cc: mmox@ietf.org
Subject: Re: [mmox] The Structure and Interpretation of LLSD Messages [Was: Re: unefficient binary serialization ?]

i looked at Google PB when i was writing mobile phone software and  
went back to using DBUS.

but honestly... that was a radically different problem space.

Google PBs kinda sorta fall down the same way i believe representing
PDUs with message-specific DTDs / schemas falls down. you have to
define a message-specific .proto file and run it through the protocol
buffer compiler, which generates the stub. i'm a bit of a CORBA fan, so
this doesn't scare me. the part that scares me is that you have to use
integers for unique names. this is important thusly...

imagine you have a .proto file like:
message Person {
   required string name = 1;
   required int32 id = 2;
   optional string email = 3;

   enum PhoneType {
     MOBILE = 0;
     HOME = 1;
     WORK = 2;
   }

   message PhoneNumber {
     required string number = 1;
     optional PhoneType type = 2 [default = HOME];
   }

   repeated PhoneNumber phone = 4;
}
and the next rev of the protocol is:
message Person {
   required string name = 1;
   required int32 id = 2;
   optional string email = 3;

   enum PhoneType {
     MOBILE = 0;
     HOME = 1;
     WORK = 2;
   }

   enum PhoneUsage {
     VOICE = 0;
     FAX = 1;
     MODEM = 2;
   }
   message PhoneNumber {
     required string number = 1;
     optional PhoneType type = 2 [default = HOME];
     optional PhoneUsage usage = 3 [default = VOICE];
   }
   repeated PhoneNumber phone = 4;
}

the problem is that if an endpoint expecting the former consumes the
latter, the serialization engine barfs, and you (in the application
layer) never receive the message, so you can't even try to pick it
apart manually (which would kind of defeat the purpose of the
serialization anyway.)

however, it _is_ an open source project, so we could probably create a  
version of the library that adds a "fail silently" mode.
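
as a rough illustration of what a "fail silently" mode could mean, here is a small python sketch. it is a toy: a message is just a list of (field number, value) pairs, which is NOT the real Protocol Buffers wire format or API, and the names KNOWN_FIELDS_V1, decode_strict and decode_lenient are made up for this example.

```python
# Toy illustration only: this is NOT the Protocol Buffers wire format or API.
# A "message" here is just a list of (field_number, value) pairs.

# Hypothetical field table for the first revision of PhoneNumber.
KNOWN_FIELDS_V1 = {1: "number", 2: "type"}


def decode_strict(pairs, known_fields):
    """Reject the whole message if any field number is unknown."""
    decoded = {}
    for field_number, value in pairs:
        if field_number not in known_fields:
            raise ValueError(f"unknown field number {field_number}")
        decoded[known_fields[field_number]] = value
    return decoded


def decode_lenient(pairs, known_fields):
    """Fail silently: keep the known fields and set the unknown ones aside."""
    decoded, skipped = {}, []
    for field_number, value in pairs:
        name = known_fields.get(field_number)
        if name is None:
            skipped.append((field_number, value))
        else:
            decoded[name] = value
    return decoded, skipped


# A PhoneNumber as the second revision would produce it (field 3 is new).
v2_message = [(1, "555-0100"), (2, "HOME"), (3, "FAX")]

decoded, skipped = decode_lenient(v2_message, KNOWN_FIELDS_V1)
print(decoded)  # the fields the old endpoint understands
print(skipped)  # the field it has never heard of
```

the strict decoder throws the whole message away, while the lenient one hands the application the parts it understands and leaves the rest for it to inspect or ignore.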

there are also semantics for extending messages, so maybe.. i'd have  
to look at it a little more before passing judgement.
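
for comparison, here is a minimal python sketch of the "processing expectations" described in the quoted message below: conversions that have no semantic meaning return the type's default instead of raising an error. the Value and Message classes are hypothetical stand-ins, not the real LLSD library, and the conversion rules are deliberately simplified. treating a missing key as an undefined value is also an assumption of this sketch.

```python
from datetime import datetime, timezone


class Value:
    """Hypothetical stand-in for an LLSD value (not the real LLSD library)."""

    def __init__(self, raw=None):
        self.raw = raw  # None plays the role of "undefined"

    def as_integer(self):
        # bool is tested first because bool is a subclass of int in Python.
        if isinstance(self.raw, bool):
            return int(self.raw)
        if isinstance(self.raw, int):
            return self.raw
        if isinstance(self.raw, str):
            try:
                return int(self.raw)
            except ValueError:
                return 0
        # Dates, undefined values and anything else have no meaningful
        # integer form in this sketch, so the default (0) is returned.
        return 0


class Message:
    """Hypothetical map-shaped message."""

    def __init__(self, members):
        self.members = members

    def at(self, key):
        # Assumption: a missing key yields an undefined value, not an error.
        return Value(self.members.get(key))


pdu = Message({
    "foo": 13,
    "bar": datetime(2008, 12, 15, 13, 17, 23, 235000, tzinfo=timezone.utc),
    "baz": 93,
    "qux": 161,
})

foo = pdu.at("foo").as_integer()       # a real integer, passed through
bar = pdu.at("bar").as_integer()       # a date, so the default comes back
missing = pdu.at("quux").as_integer()  # absent member, default again

# The parameter check happens at the application layer, not in the parser.
bar_is_acceptable = 40 <= bar <= 100
print(foo, bar, missing, bar_is_acceptable)  # prints: 13 0 0 False
```

the point of the design is that the presentation layer never fails the message; the application's own parameter checks decide what to do with the defaulted value.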


On Feb 24, 2009, at 10:45 PM, Morgaine wrote:

> Excellent background on LLSD, Infinity!  It raises a question:
>
> On Wed, Feb 25, 2009 at 5:41 AM, Meadhbh Hamrick (Infinity) <infinity@lindenlab.com> wrote:
>
> while we have a metric tonne of code that uses the existing LLSD  
> DTD, ultimately, any additional serialization format that preserves  
> these processing expectations could be wedged into the system.  
> serialization formats that did not support these processing
> expectations would cause considerable heartburn.
>
>
> One serialization format that many have mentioned here is Google's  
> Protocol Buffers.  While I have barely started to examine PB myself,  
> it is very clear that this serialization is extraordinarily  
> efficient on-the-wire and also at both ends of the wire, as well as  
> catering for the continual change that is inherent in large,  
> evolving systems.  An interest in extending the 3-way serialization
> of XXSD to Protocol Buffers follows naturally.
>
> The question that is raised by the quoted paragraph is whether  
> Protocol Buffers "preserves these processing expectations".  If it  
> does then extending the set of serializations to PB is work that  
> could be done by those interested in it, as an open project, and  
> picked up by other users of XXSD if they wish.  Potentially, the  
> acceleration could be quite significant, and hence the benefit to  
> Limited Capability Clients very important.
>
> On the other hand, if it does not "preserve these processing  
> expectations" then there could be a problem.
>
> Do Lindens (or anyone else here) have any insight on this?
>
> Morgaine.
>
> On Wed, Feb 25, 2009 at 5:41 AM, Meadhbh Hamrick (Infinity) <infinity@lindenlab.com> wrote:
> but here's one reason we like LLSD...
>
> imagine you have two systems that are interoperating... one is owned
> by me, the other is owned by catherine.
>
> cathy's customers are all banks and insurance companies and are very
> conservative, so cathy wants to err on the side of caution in terms
> of when she deploys new features.
>
> i on the other hand throw caution to the wind because my customers  
> are all first person shooter fanatics between the ages of 13 and 21  
> who demand the latest features as soon as possible (otherwise they  
> use david's server)
>
> it's not hard to see we're eventually going to introduce version
> skew, which will make it difficult for cathy and me to participate in
> the unregulated business of marketing virtual life insurance
> policies to my customers. in this example, i'm trying to point out
> that both the option to retain version parity with cathy and the
> option to upgrade to the latest version are not without drawbacks.
>
> now.. imagine the first version of a PDU has three members and the
> second version adds a fourth. for simplicity's sake, let's say
> they're all integers.
>
> if we used a DTD to define the validation for the message, the first
> version of the message might look like:
>
> <?xml version="1.0"?>
> <!DOCTYPE pdutastic PUBLIC "-//MMOX//Poorly Defined Derivatives ML  
> PDU 1.0//EN" "http://example.com/pdu/1.0/pdutastic.dtd">
> <pdutastic>
>  <foo type="integer">13</foo>
>  <bar type="integer">42</bar>
>  <baz type="integer">93</baz>
> </pdutastic>
>
> while the new version would look like this:
>
> <?xml version="1.0"?>
> <!DOCTYPE pdutastic PUBLIC "-//MMOX//Poorly Defined Derivatives ML  
> PDU 1.1//EN" "http://example.com/pdu/1.1/pdutastic.dtd">
> <pdutastic>
>  <foo type="integer">13</foo>
>  <bar type="integer">42</bar>
>  <baz type="integer">93</baz>
>  <qux type="integer">161</qux>
> </pdutastic>
>
> so when i (expecting the new version) consume the old version, i do  
> not see qux. but my parser doesn't complain because instead of the  
> DTD of the new PDU, i see a reference to the old PDU and say to  
> myself... "it's okay... it's just an older version," and i do  
> whatever i have to do to continue processing in the absence of the  
> qux member i was expecting. even if i'm doing validation, it's okay,  
> i'm pulling the DTD from the old definition, so we're cool.
>
> now... when cathy consumes the new PDU, something similar happens.
> she sees a DTD she's not familiar with, but it's all good.. she
> grabs the DTD from the URL specified in the DOCTYPE and moves on with
> her life, hoping that the new version of the PDU didn't change the
> semantics of the three values (foo, bar and baz) she's familiar with.
>
> and then one day she receives this from david:
>
> <?xml version="1.0"?>
> <!DOCTYPE pdutastic PUBLIC "-//MMOX//Poorly Defined Derivatives ML  
> PDU 1.2//EN" "http://example.com/pdu/1.2/pdutastic.dtd">
> <pdutastic>
>  <foo type="integer">13</foo>
>  <bar type="date">2008-12-15T13:17:23.235Z</bar>
>  <baz type="integer">93</baz>
>  <qux type="integer">161</qux>
> </pdutastic>
>
> aha! david, trying to steal my core demographic away from me, has
> upgraded to the most recent beta version of the software (now with
> 13% more explosions)
>
> but this is now a serious problem for cathy. the "shape" of the  
> structured data now does not match. what does she do? how does she  
> convert that date into an integer?
>
> and this is the question we keep running into as we deploy "loosely  
> coupled" systems. how do we fail gracefully and ensure robustness in  
> the presence of PDU schema mismatches?
>
> the way linden answered this question was to develop LLSD, an  
> abstract type system with conversions and defaults.
>
> the way we would probably have defined the message above would have  
> been something like:
>
> <?xml version="1.0"?>
> <llsd>
>  <map>
>    <key>foo</key>
>    <integer>13</integer>
>    <key>bar</key>
>    <date>2008-12-15T13:17:23.235Z</date>
>    <key>baz</key>
>    <integer>93</integer>
>    <key>qux</key>
>    <integer>161</integer>
>  </map>
> </llsd>
>
> we don't include the doctype id 'cause our DTD never changes. in  
> this example the LLSD message is shorter 'cause we don't include the  
> DOCTYPE id, but even after removing the DOCTYPE id and changing the  
> outer element to llsd, the compressed version of the LLSD message is  
> only seven bytes larger. (an increase from 154 bytes to 161 bytes,  
> an efficiency loss of 4.5%) however, i can guarantee you that it  
> took a lot less time to code the parser because.. well.. we already
> had a parser that knew how to unwrap LLSD messages.
>
> but we haven't got to the interesting part yet.
>
> there is a processing expectation with LLSD. we assume we're
> de-serializing a message into a dynamic language like ECMAScript or
> Python. for languages like C++ and Java we create a class that
> emulates this feature. so.. all LLSD PDUs are de-serialized into
> these LLSD objects, and inside our application(s), it's these
> objects that get passed around. so we might have code that looks like:
>
> public void foo( String message ) {
>        LLSD llsd;
>        int bar;
>
>        llsd = LLSD.From( message, LLSD.XML_SERIALIZATION );
>
>        System.err.printf( "hey! foo was %d.\n", (int) llsd.at( "foo" ).asInteger() );
>
>        // an "illegal" conversion (say, date to integer) yields the default, 0
>        bar = (int) llsd.at( "bar" ).asInteger();
>
>        if( ( bar > 100 ) || ( bar < 0 ) ) {
>                System.err.println( "bar is not a percentage" );
>                System.haltAndCatchFire();
>        }
>
>        if( bar < 40 ) {
>                System.err.println( "sorry, this percent margin is way too small for me to be interested." );
>                System.haltAndCatchFire();
>        }
>                .
>                .
>                .
> }
>
> forgive the javaisms, the alternative was smalltalk and well...  
> let's not go there.
>
> for the sake of argument let's say we had already determined it was  
> in XML format by looking at the Content-Type: header from the request.
>
> so what we're doing here is not out of the ordinary... we create an  
> instance of the LLSD class from the serialization.
>
> and we print some logging messages.
>
> and we then try to see if bar is between 0 and 100.
>
> but the object representing "bar" is a date, not an integer. what
> happens?
>
> in our current implementation of LLSD, we attempt to convert, but  
> notice that the conversion from date to integer has no semantic  
> meaning.  because the conversion is "illegal," the default value is  
> returned.
>
> the result of running this fragment would be something like
>
>        hey! foo was 13.
>        sorry, this percent margin is way too small for me to be  
> interested.
>
> the important feature here is that we did not allow the  
> "presentation layer" to create an error. we did the best we could to  
> interpret the sense of the message, even to the point of making a  
> nonsensical conversion. we allow this to happen because we assume  
> that at the application layer, we're going to do parameter checking.  
> so the reasoning was... if we're going to be doing parameter  
> checking anyway, why would we want to allow the presentation layer  
> to fail a message that would have generated a result that could be  
> checked at the application layer?
>
> i hope you've enjoyed this installment of "the structure and
> interpretation of LLSD messages." and i hope it helps explain our
> motivations for designing LLSD and its serializations the way we
> did... and while we have a metric tonne of code that uses the
> existing LLSD DTD, ultimately, any additional serialization format
> that preserves these processing expectations could be wedged into
> the system. serialization formats that did not support these
> processing expectations would cause considerable heartburn.
>
> -cheers
> -meadhbh
>
> On Feb 24, 2009, at 6:12 PM, Jon Watte wrote:
>
> Catherine Pfeffer wrote:
> That's precisely all the HUGE strength of LLSD. Having information  
> about what the data is intermingled with the data itself makes it  
> very flexible and future-proof.
> But that is simply not necessary. If you transmit a reference to the  
> schema you use when you initiate connection, then all future  
> transmissions during that connection session can use the knowledge  
> of that schema to highly optimize traffic. Network traffic is  
> actually one of the scarce resources when you start scaling up  
> really large virtual worlds. There are cases where an entire office  
> (20 people or more) share a single T1 connection to the Internet,  
> and expect to have a good virtual world experience for multiple  
> users at the same time.
>
>
> Honestly, here, the bandwidth is not lost, compared to the advantages.
>
>
> I think it is, because the size difference is quite significant.
>
> In summary, their binary serialization is as flexible as XML data.
> Perhaps I'm wrong, but to me it's a unique case in all computer
> science of a binary format that would be so flexible.
>
> Don't overextend yourself :-) AIFF/RIFF has pretty much the same  
> flexibility, as does TLV (also used in OSCAR), a number of "binary  
> XML" proposals, and a number of protocols that have all done more or  
> less the same thing ever since computers were interconnected on a  
> larger scale in the '80s.
>
> Sincerely,
>
> jw
>
> _______________________________________________
> mmox mailing list
> mmox@ietf.org
> https://www.ietf.org/mailman/listinfo/mmox