Re: [mmox] The Structure and Interpretation of LLSD Messages [Was: Re: unefficient binary serialization ?]

"Meadhbh Hamrick (Infinity)" <infinity@lindenlab.com> Wed, 25 February 2009 17:09 UTC

Message-Id: <06CF9E4E-6C20-46CB-B092-68F702C2E185@lindenlab.com>
From: "Meadhbh Hamrick (Infinity)" <infinity@lindenlab.com>
To: Ewen Cheslack-Postava <ewencp@gmail.com>
In-Reply-To: <e17956060902242327x712880c8h184d41dc57dec206@mail.gmail.com>
Date: Wed, 25 Feb 2009 09:10:12 -0800
References: <ebe4d1860902241754s7942179ajd4a29dde4e1d1bdb@mail.gmail.com> <49A4A8FA.607@gmail.com> <024355C7-0753-451F-9542-4478D27A2802@lindenlab.com> <e0b04bba0902242245y7ce6cc81i5d67d02b73f65440@mail.gmail.com> <497C7FD9-BF36-458F-98C0-BEECAE611E1A@lindenlab.com> <e17956060902242327x712880c8h184d41dc57dec206@mail.gmail.com>
Cc: mmox@ietf.org
Subject: Re: [mmox] The Structure and Interpretation of LLSD Messages [Was: Re: unefficient binary serialization ?]

i'm relying on rusty neurons to make this assertion, but i think  
protocol buffers will fail if you parse a message with an additional  
required member.

but this isn't that big of a deal since we like optional members in  
LLSD.

but i'm also at a bit of a loss for figuring out how to represent  
variant records with GPB.

which is to say, i'm sure it can be done and i'm pretty sure it "won't  
be pretty" in the sense that to support variants, you may have to do  
things that avoid using the innate structure representation of GPB.

and at that point, unless google (or someone else) is opening up an  
LLSD service, maybe the thing to do would be to have a protocol  
gateway that takes LLSD style PDUs in one side and emits GPB messages  
out the other.
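
for what it's worth, here is a rough sketch of what the GPB-emitting side of such a gateway might look like in Python. it assumes the LLSD PDU has already been unwrapped into a plain dict, and the field numbers in FIELD_NUMBERS are invented for the example, not taken from any real .proto file. the byte layout follows the published protocol buffers wire format (varints and length-delimited fields); everything else is illustrative.

```python
# sketch of a one-way gateway: LLSD-style map in, GPB wire bytes out.
# FIELD_NUMBERS is a made-up schema, used only for illustration.
FIELD_NUMBERS = {"foo": 1, "bar": 2, "baz": 3, "qux": 4}  # hypothetical

WIRETYPE_VARINT = 0
WIRETYPE_LENGTH_DELIMITED = 2


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_field(number: int, value) -> bytes:
    """Encode one member: ints as varints, anything else as a UTF-8 string."""
    if isinstance(value, bool):
        value = int(value)
    if isinstance(value, int):
        tag = encode_varint((number << 3) | WIRETYPE_VARINT)
        # negative values go out as 64-bit two's complement
        return tag + encode_varint(value & 0xFFFFFFFFFFFFFFFF)
    payload = str(value).encode("utf-8")
    tag = encode_varint((number << 3) | WIRETYPE_LENGTH_DELIMITED)
    return tag + encode_varint(len(payload)) + payload


def gateway(llsd_map: dict) -> bytes:
    """Translate an LLSD-style map; members the schema doesn't know are dropped."""
    return b"".join(
        encode_field(FIELD_NUMBERS[key], value)
        for key, value in llsd_map.items()
        if key in FIELD_NUMBERS
    )
```

note that an integer bar and a date bar come out with different wire types under the same field number. a real .proto field has a single declared type, which is where the variant problem starts to bite.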

On Feb 24, 2009, at 11:27 PM, Ewen Cheslack-Postava wrote:

> Maybe I'm just overlooking something, but how is this a problem?  It
> looks like you added an enum and an optional field using that enum
> type to the PhoneNumber message type.  Why would the serialization
> engine fail?  It ignores key value pairs it doesn't understand.
> The encoding provides sufficient type/size information to skip the
> field without understanding its contents. In fact, it will even
> collect those fields up and deliver them to you so you can try to use
> them. Protocol Buffers was designed (or probably more accurately,
> evolved) to support backwards and forwards compatibility.  This is
> especially critical for google because they do slow rollouts of
> updated services, so they, at a minimum, have to be compatible between
> consecutive versions of their protocols.
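
a toy illustration of the skipping described above, written against the published wire format rather than the real library. parse_known and its set of known field numbers are names made up for this sketch, and groups (wire types 3 and 4) are left out.

```python
def read_varint(buf: bytes, pos: int):
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_known(buf: bytes, known: set):
    """Split a message into fields we understand and fields we just carry along."""
    seen, unknown = {}, []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        # the wire type alone says how many bytes to consume
        if wire_type == 0:      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:    # length-delimited (strings, bytes, sub-messages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:
            seen[number] = value
        else:
            unknown.append((number, wire_type, value))
    return seen, unknown
```

fed a PhoneNumber carrying a third field, a reader that only knows fields 1 and 2 still gets both of them back, with field 3 set aside in the unknown list.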
>
> That being said, you certainly do have to be careful about
> compatibility when updating a message type or creating a new message
> that might be upgraded in the future.  For instance, there are quite a
> few people who use optional by default so that it's possible to remove
> fields if they are no longer necessary.  But this is true for all
> serializations, so I wouldn't count that as a negative mark against
> protocol buffers.  In fact, I'd say it's an argument for it that it can
> handle it (as can LLSD, Thrift, XML, etc).
>
> -Ewen
>
> On Tue, Feb 24, 2009 at 11:06 PM, Meadhbh Hamrick (Infinity)
> <infinity@lindenlab.com> wrote:
>> imagine you have a .proto file like:
>>
>> message Person {
>>  required string name = 1;
>>  required int32 id = 2;
>>  optional string email = 3;
>>
>>  enum PhoneType {
>>    MOBILE = 0;
>>    HOME = 1;
>>    WORK = 2;
>>  }
>>
>>  message PhoneNumber {
>>    required string number = 1;
>>    optional PhoneType type = 2 [default = HOME];
>>  }
>>
>>  repeated PhoneNumber phone = 4;
>> }
>>
>> and the next rev of the protocol is:
>>
>> message Person {
>>  required string name = 1;
>>  required int32 id = 2;
>>  optional string email = 3;
>>
>>  enum PhoneType {
>>    MOBILE = 0;
>>    HOME = 1;
>>    WORK = 2;
>>  }
>>
>>  enum PhoneUsage {
>>    VOICE = 0;
>>    FAX = 1;
>>    MODEM = 2;
>>  }
>>
>>  message PhoneNumber {
>>    required string number = 1;
>>    optional PhoneType type = 2 [default = HOME];
>>    optional PhoneUsage usage = 3 [default = VOICE];
>>  }
>>
>>  repeated PhoneNumber phone = 4;
>> }
>>
>> the problem is that if an endpoint expecting the former consumes the
>> latter, the serialization engine barfs, and you (in the application
>> layer) don't receive the message in order to try to pick it apart
>> manually (which kind of defeats the purpose of the serialization.)
>>
>> however, it _is_ an open source project, so we could probably create a
>> version of the library that adds a "fail silently" mode.
>>
>> there are also semantics for extending messages, so maybe.. i'd have to
>> look at it a little more before passing judgement.
>>
>> On Feb 24, 2009, at 10:45 PM, Morgaine wrote:
>>
>> Excellent background on LLSD, Infinity!  It raises a question:
>>
>> On Wed, Feb 25, 2009 at 5:41 AM, Meadhbh Hamrick (Infinity)
>> <infinity@lindenlab.com> wrote:
>>
>> while we have a metric tonne of code that uses the existing LLSD DTD,
>> ultimately, any additional serialization format that preserves these
>> processing expectations could be wedged into the system. serialization
>> formats that did not support these processing expectations would cause
>> considerable heartburn.
>>
>>
>> One serialization format that many have mentioned here is Google's
>> Protocol Buffers.  While I have barely started to examine PB myself, it
>> is very clear that this serialization is extraordinarily efficient
>> on-the-wire and also at both ends of the wire, as well as catering for
>> the continual change that is inherent in large, evolving systems.  An
>> interest in extending the 3-way serialization of XXSD to Protocol
>> Buffers follows naturally.
>>
>> The question that is raised by the quoted paragraph is whether Protocol
>> Buffers "preserves these processing expectations".  If it does then
>> extending the set of serializations to PB is work that could be done by
>> those interested in it, as an open project, and picked up by other users
>> of XXSD if they wish.  Potentially, the acceleration could be quite
>> significant, and hence the benefit to Limited Capability Clients very
>> important.
>>
>> On the other hand, if it does not "preserve these processing
>> expectations" then there could be a problem.
>>
>> Do Lindens (or anyone else here) have any insight on this?
>>
>> Morgaine.
>>
>> On Wed, Feb 25, 2009 at 5:41 AM, Meadhbh Hamrick (Infinity)
>> <infinity@lindenlab.com> wrote:
>> but here's one reason we like LLSD...
>>
>> imagine you have two systems that are interoperating... one is owned by
>> me, the other is owned by catherine.
>>
>> cathy's customers are all banks and insurance companies and are very
>> conservative, so cathy wants to err on the side of caution in terms of
>> when she deploys new features.
>>
>> i on the other hand throw caution to the wind because my customers are
>> all first person shooter fanatics between the ages of 13 and 21 who
>> demand the latest features as soon as possible (otherwise they use
>> david's server)
>>
>> it's not hard to see we're eventually going to introduce version skew
>> which will make it difficult for cathy and me to participate in the
>> unregulated business of marketing virtual life insurance policies to my
>> customers. in this example, i'm trying to point out that both the option
>> to retain version parity with cathy and the option to upgrade to the
>> latest version are not without drawbacks.
>>
>> now.. imagine we have a PDU with three members while the second version
>> adds a member. for simplicity's sake, let's say they're all integers.
>>
>> if we used a DTD to define the validation for the message, the first
>> version of the message might look like:
>>
>> <?xml version="1.0"?>
>> <!DOCTYPE pdutastic PUBLIC "-//MMOX//Poorly Defined Derivatives ML PDU 1.0//EN"
>>   "http://example.com/pdu/1.0/pdutastic.dtd">
>> <pdutastic>
>>  <foo type="integer">13</foo>
>>  <bar type="integer">42</bar>
>>  <baz type="integer">93</baz>
>> </pdutastic>
>>
>> while the new version would look like this:
>>
>> <?xml version="1.0"?>
>> <!DOCTYPE pdutastic PUBLIC "-//MMOX//Poorly Defined Derivatives ML PDU 1.1//EN"
>>   "http://example.com/pdu/1.1/pdutastic.dtd">
>> <pdutastic>
>>  <foo type="integer">13</foo>
>>  <bar type="integer">42</bar>
>>  <baz type="integer">93</baz>
>>  <qux type="integer">161</qux>
>> </pdutastic>
>>
>> so when i (expecting the new version) consume the old version, i do not
>> see qux. but my parser doesn't complain because instead of the DTD of
>> the new PDU, i see a reference to the old PDU and say to myself... "it's
>> okay... it's just an older version," and i do whatever i have to do to
>> continue processing in the absence of the qux member i was expecting.
>> even if i'm doing validation, it's okay, i'm pulling the DTD from the
>> old definition, so we're cool.
>>
>> now... when cathy consumes the new PDU, something similar happens. she
>> sees a DTD she's not familiar with, but it's all good.. she grabs the
>> DTD from the URL specified in the DOCTYPE and moves on with her life,
>> hoping that the new version PDU didn't change the semantics of the three
>> values (foo, bar and baz) she's familiar with.
>>
>> and then one day she receives this from david:
>>
>> <?xml version="1.0"?>
>> <!DOCTYPE pdutastic PUBLIC "-//MMOX//Poorly Defined Derivatives ML PDU 1.2//EN"
>>   "http://example.com/pdu/1.2/pdutastic.dtd">
>> <pdutastic>
>>  <foo type="integer">13</foo>
>>  <bar type="date">2008-12-15T13:17:23.235Z</bar>
>>  <baz type="integer">93</baz>
>>  <qux type="integer">161</qux>
>> </pdutastic>
>>
>> aha! david, trying to steal my core demographic away from me, has
>> upgraded to the most recent beta version of the software (now with 13%
>> more explosions)
>>
>> but this is now a serious problem for cathy. the "shape" of the
>> structured data now does not match. what does she do? how does she
>> convert that date into an integer?
>>
>> and this is the question we keep running into as we deploy "loosely
>> coupled" systems. how do we fail gracefully and ensure robustness in the
>> presence of PDU schema mismatches?
>>
>> the way linden answered this question was to develop LLSD, an abstract
>> type system with conversions and defaults.
>>
>> the way we would probably have defined the message above would have been
>> something like:
>>
>> <?xml version="1.0"?>
>> <llsd>
>>  <map>
>>    <key>foo</key>
>>    <integer>13</integer>
>>    <key>bar</key>
>>    <date>2008-12-15T13:17:23.235Z</date>
>>    <key>baz</key>
>>    <integer>93</integer>
>>    <key>qux</key>
>>    <integer>161</integer>
>>  </map>
>> </llsd>
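
to make the "one parser for every message" point concrete, here is a minimal sketch in Python that unwraps the XML above into a plain dict. it only handles the element types that appear in this thread (map, integer, date, string); the function names are invented for the example and this is not linden's parser.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_date(text: str) -> datetime:
    # handles the "YYYY-MM-DDTHH:MM:SS.fffZ" shape used in the example
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def unwrap(element):
    """Turn one LLSD XML element into a plain Python value (subset only)."""
    if element.tag == "map":
        children = list(element)
        # children alternate: <key>, value, <key>, value, ...
        return {
            key.text: unwrap(value)
            for key, value in zip(children[0::2], children[1::2])
        }
    if element.tag == "integer":
        return int(element.text)
    if element.tag == "date":
        return parse_date(element.text)
    if element.tag == "string":
        return element.text or ""
    raise ValueError(f"element <{element.tag}> is not handled in this sketch")


def from_xml(message: str):
    root = ET.fromstring(message)  # expects <llsd> as the outer element
    return unwrap(root[0])
```

the type of each member travels with the value, so the same few lines unwrap any version of the PDU without fetching a new DTD.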
>>
>> we don't include the doctype id 'cause our DTD never changes. in this
>> example the LLSD message is shorter 'cause we don't include the DOCTYPE
>> id, but even after removing the DOCTYPE id and changing the outer
>> element to llsd, the compressed version of the LLSD message is only
>> seven bytes larger (an increase from 154 bytes to 161 bytes, an
>> efficiency loss of 4.5%). however, i can guarantee you that it took a
>> lot less time to code the parser because.. well.. we already had a
>> parser that knew how to unwrap LLSD messages.
>>
>> but we haven't got to the interesting part yet.
>>
>> there is a processing expectation with LLSD. we assume we're
>> de-serializing a message into a dynamic language like ECMAScript or
>> Python. For languages like C++ and Java we create a class that emulates
>> this feature. so.. all LLSD PDUs are de-serialized into these LLSD
>> objects, and inside our application(s), it's these objects that get
>> passed around. so we might have code that looks like:
>>
>> public void foo( String message ) {
>>        // de-serialize the XML PDU into a generic LLSD object
>>        LLSD llsd = LLSD.From( message, LLSD.XML_SERIALIZATION );
>>
>>        System.err.printf( "hey! foo was %d.\n",
>>                (int) llsd.at( "foo" ).asInteger() );
>>
>>        // an illegal conversion yields the default value, not an error
>>        int bar = (int) llsd.at( "bar" ).asInteger();
>>
>>        if( ( bar > 100 ) || ( bar < 0 ) ) {
>>                System.err.println( "bar is not a percentage" );
>>                System.haltAndCatchFire();
>>        }
>>
>>        if( bar < 40 ) {
>>                System.err.println( "sorry, this percent margin is way too "
>>                        + "small for me to be interested." );
>>                System.haltAndCatchFire();
>>        }
>>                .
>>                .
>>                .
>> }
>>
>> forgive the javaisms, the alternative was smalltalk and well... let's
>> not go there.
>>
>> for the sake of argument let's say we had already determined it was in
>> XML format by looking at the Content-Type: header from the request.
>>
>> so what we're doing here is not out of the ordinary... we create an
>> instance of the LLSD class from the serialization.
>>
>> and we print some logging messages.
>>
>> and we then try to see if bar is between 0 and 100.
>>
>> but the object representing "bar" is a date, not an integer. what
>> happens?
>>
>> in our current implementation of LLSD, we attempt to convert, but notice
>> that the conversion from date to integer has no semantic meaning.
>> because the conversion is "illegal," the default value is returned.
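
the "convert or fall back to the default" rule can be sketched in a few lines of Python. this follows the behaviour described above (date to integer is illegal, so the default comes back); the conversion table is a simplification for illustration, not the official LLSD one, and as_integer is a made-up name.

```python
from datetime import datetime


def as_integer(value) -> int:
    """Convert to an integer where that makes sense; otherwise return 0."""
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        try:
            return int(value)
        except (ValueError, OverflowError):  # nan or infinity
            return 0
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            return 0
    # dates, maps, missing members: no meaningful integer, so use the default
    return 0


# david's PDU, already unwrapped into a dict
pdu = {"foo": 13, "bar": datetime(2008, 12, 15, 13, 17, 23, 235000)}

foo = as_integer(pdu.get("foo"))
bar = as_integer(pdu.get("bar"))      # the date quietly becomes the default
qux = as_integer(pdu.get("qux"))      # a missing member does too
```

the presentation layer never raises here; it is left to the application to notice that a bar of 0 is below the margin it cares about.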
>>
>> the result of running this fragment would be something like
>>
>>        hey! foo was 13.
>>        sorry, this percent margin is way too small for me to be interested.
>>
>> the important feature here is that we did not allow the "presentation
>> layer" to create an error. we did the best we could to interpret the
>> sense of the message, even to the point of making a nonsensical
>> conversion. we allow this to happen because we assume that at the
>> application layer, we're going to do parameter checking. so the
>> reasoning was... if we're going to be doing parameter checking anyway,
>> why would we want to allow the presentation layer to fail a message that
>> would have generated a result that could be checked at the application
>> layer?
>>
>> i hope you've enjoyed this installment of "the structure and
>> interpretation of LLSD messages." and i hope it helps explain our
>> motivations for designing LLSD and its serializations the way we did...
>> and while we have a metric tonne of code that uses the existing LLSD
>> DTD, ultimately, any additional serialization format that preserves
>> these processing expectations could be wedged into the system.
>> serialization formats that did not support these processing expectations
>> would cause considerable heartburn.
>>
>> -cheers
>> -meadhbh
>>
>> On Feb 24, 2009, at 6:12 PM, Jon Watte wrote:
>>
>> Catherine Pfeffer wrote:
>> That's precisely all the HUGE strength of LLSD. Having information about
>> what the data is intermingled with the data itself makes it very
>> flexible and future-proof.
>>
>> But that is simply not necessary. If you transmit a reference to the
>> schema you use when you initiate connection, then all future
>> transmissions during that connection session can use the knowledge of
>> that schema to highly optimize traffic. Network traffic is actually one
>> of the scarce resources when you start scaling up really large virtual
>> worlds. There are cases where an entire office (20 people or more) share
>> a single T1 connection to the Internet, and expect to have a good
>> virtual world experience for multiple users at the same time.
>>
>>
>> Honestly, here, the bandwidth is not lost, compared to the advantages.
>>
>> I think it is, because the size difference is quite significant.
>>
>> In summary, their binary serialization is as flexible as XML data.
>> Perhaps I'm wrong, but to me it's a unique case in all computer science
>> of a binary format that would be so flexible.
>>
>> Don't overextend yourself :-) AIFF/RIFF has pretty much the same
>> flexibility, as does TLV (also used in OSCAR), a number of "binary XML"
>> proposals, and a number of protocols that have all done more or less the
>> same thing ever since computers were interconnected on a larger scale in
>> the '80s.
>>
>> Sincerely,
>>
>> jw
>>
>> _______________________________________________
>> mmox mailing list
>> mmox@ietf.org
>> https://www.ietf.org/mailman/listinfo/mmox