Re: [mmox] The Structure and Interpretation of LLSD Messages [Was: Re: unefficient binary serialization ?]
Ewen Cheslack-Postava <ewencp@gmail.com> Wed, 25 February 2009 07:26 UTC
From: Ewen Cheslack-Postava <ewencp@gmail.com>
To: "Meadhbh Hamrick (Infinity)" <infinity@lindenlab.com>
Cc: mmox@ietf.org
Subject: Re: [mmox] The Structure and Interpretation of LLSD Messages [Was: Re: unefficient binary serialization ?]
Maybe I'm just overlooking something, but how is this a problem? It
looks like you added an enum and an optional field using that enum
type to the PhoneNumber message type. Why would the serialization
engine fail? It ignores key-value pairs it doesn't understand.
The encoding provides sufficient type/size information to skip the
field without understanding its contents. In fact, it will even
collect those fields up and deliver them to you so you can try to use
them. Protocol Buffers was designed (or probably more accurately,
evolved) to support backwards and forwards compatibility. This is
especially critical for Google because they do slow rollouts of
updated services, so they, at a minimum, have to be compatible between
consecutive versions of their protocols.
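
To make the skipping concrete, here is a minimal Python sketch of how a reader built against the older PhoneNumber definition could walk the wire format. It is hand-rolled for illustration and is not the Protocol Buffers library: the function names (read_varint, read_field, parse_phone_number) and the sample bytes are assumptions made up for this example. The only facts it relies on are that every field starts with a key holding the field number and a wire type, and that the wire type alone says how many bytes the value occupies.

```python
# Illustrative sketch only: a tiny reader for the protobuf wire format.
# All names here are hypothetical; this is not the protobuf library API.

def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def read_field(buf, pos):
    """Read one field; return (field_number, wire_type, raw_value, next_pos)."""
    key, pos = read_varint(buf, pos)
    field_number, wire_type = key >> 3, key & 0x07
    if wire_type == 0:               # varint
        value, pos = read_varint(buf, pos)
    elif wire_type == 1:             # fixed 64-bit
        value, pos = buf[pos:pos + 8], pos + 8
    elif wire_type == 2:             # length-delimited (strings, sub-messages)
        length, pos = read_varint(buf, pos)
        value, pos = buf[pos:pos + length], pos + length
    elif wire_type == 5:             # fixed 32-bit
        value, pos = buf[pos:pos + 4], pos + 4
    else:
        raise ValueError(f"unsupported wire type {wire_type}")
    return field_number, wire_type, value, pos


def parse_phone_number(buf):
    """Parse with knowledge of the OLD schema only (fields 1 and 2)."""
    known = {"number": None, "type": 1}   # type defaults to HOME = 1
    unknown = []                          # fields this reader does not know
    pos = 0
    while pos < len(buf):
        field_number, wire_type, value, pos = read_field(buf, pos)
        if field_number == 1 and wire_type == 2:
            known["number"] = value.decode("utf-8")
        elif field_number == 2 and wire_type == 0:
            known["type"] = value
        else:
            # Skipped without understanding it, but kept for the caller.
            unknown.append((field_number, wire_type, value))
    return known, unknown


# A message written by the NEWER schema: number = "555-1234",
# type = MOBILE (0), plus the new enum field with tag 3 set to FAX (1).
new_message = bytes([0x0A, 0x08]) + b"555-1234" + bytes([0x10, 0x00, 0x18, 0x01])

known, unknown = parse_phone_number(new_message)
print(known)     # fields the old reader understands
print(unknown)   # the tag-3 field, collected rather than rejected
```

The old reader never needs the PhoneUsage definition: the key byte for tag 3 says "varint", so it knows exactly how far to advance, and the value is handed back in the unknown list instead of causing a parse failure.
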
That being said, you certainly do have to be careful about
compatibility when updating a message type or creating a new message
that might be upgraded in the future. For instance, there are quite a
few people who use optional by default so that it's possible to remove
fields if they are no longer necessary. But this is true for all
serializations, so I wouldn't count that as a negative mark against
Protocol Buffers. In fact, I'd say it's an argument in their favor that
they can handle it (as can LLSD, Thrift, XML, etc.).
-Ewen
On Tue, Feb 24, 2009 at 11:06 PM, Meadhbh Hamrick (Infinity)
<infinity@lindenlab.com> wrote:
> imagine you have a .proto file like:
>
> message Person {
>   required string name = 1;
>   required int32 id = 2;
>   optional string email = 3;
>
>   enum PhoneType {
>     MOBILE = 0;
>     HOME = 1;
>     WORK = 2;
>   }
>
>   message PhoneNumber {
>     required string number = 1;
>     optional PhoneType type = 2 [default = HOME];
>   }
>
>   repeated PhoneNumber phone = 4;
> }
>
> and the next rev of the protocol is:
>
> message Person {
>   required string name = 1;
>   required int32 id = 2;
>   optional string email = 3;
>
>   enum PhoneType {
>     MOBILE = 0;
>     HOME = 1;
>     WORK = 2;
>   }
>
>   enum PhoneUsage {
>     VOICE = 0;
>     FAX = 1;
>     MODEM = 2;
>   }
>
>   message PhoneNumber {
>     required string number = 1;
>     optional PhoneType type = 2 [default = HOME];
>     optional PhoneUsage usage = 3 [default = VOICE];
>   }
>
>   repeated PhoneNumber phone = 4;
> }
>
> the problem is that if an endpoint expecting the former consumes the latter,
> the serialization engine barfs. and you (in the application layer) don't
> receive the message in order to try to pick it apart manually (which kind of
> defeats the purpose of the serialization.)
> however, it _is_ an open source project, so we could probably create a
> version of the library that adds a "fail silently" mode.
> there are also semantics for extending messages, so maybe.. i'd have to look
> at it a little more before passing judgement.
>
> On Feb 24, 2009, at 10:45 PM, Morgaine wrote:
>
> Excellent background on LLSD, Infinity! It raises a question:
>
> On Wed, Feb 25, 2009 at 5:41 AM, Meadhbh Hamrick (Infinity)
> <infinity@lindenlab.com> wrote:
>
> while we have a metric tonne of code that uses the existing LLSD DTD,
> ultimately, any additional serialization format that preserves these
> processing expectations could be wedged into the system. serialization
> formats that did not support these processing expectations would cause
> considerable heart burn.
>
>
> One serialization format that many have mentioned here is Google's Protocol
> Buffers. While I have barely started to examine PB myself, it is very clear
> that this serialization is extraordinarily efficient on-the-wire and also at
> both ends of the wire, as well as catering for the continual change that is
> inherent in large, evolving systems. An interest in extending the 3-way
> serialization of XXSD to Protocol Buffers follows naturally.
>
> The question that is raised by the quoted paragraph is whether Protocol
> Buffers "preserves these processing expectations". If it does then
> extending the set of serializations to PB is work that could be done by
> those interested in it, as an open project, and picked up by other users of
> XXSD if they wish. Potentially, the acceleration could be quite
> significant, and hence the benefit to Limited Capability Clients very
> important.
>
> On the other hand, if it does not "preserve these processing expectations"
> then there could be a problem.
>
> Do Lindens (or anyone else here) have any insight on this?
>
> Morgaine.
>
>
>
>
>
>
>
> On Wed, Feb 25, 2009 at 5:41 AM, Meadhbh Hamrick (Infinity)
> <infinity@lindenlab.com> wrote:
> but here's one reason we like LLSD...
>
> imagine you have two systems that are interoperating... one is owned by me,
> the other is owned by catherine.
>
> cathy's customers are all banks and insurance companies and are very
> conservative, so cathy wants to err on the side of caution in terms of when
> she deploys new features.
>
> i on the other hand throw caution to the wind because my customers are all
> first person shooter fanatics between the ages of 13 and 21 who demand the
> latest features as soon as possible (otherwise they use david's server)
>
> it's not hard to see we're eventually going to introduce version skew which
> will make it difficult for cathy and i to participate in the unregulated
> business of marketing virtual life insurance policies to my customers. in
> this example, i'm trying to point out that both the option to retain
> version parity with cathy and the option to upgrade to the latest version
> are not without drawbacks.
>
> now.. imagine we have a PDU with three members while the second version adds
> a member. for simplicity's sake, let's say they're all integers.
>
> if we used the DTD to define the validation for the message, the first version
> of the message might look like:
>
> <?xml version="1.0"?>
> <!DOCTYPE pdutastic PUBLIC "-//MMOX//Poorly Defined Derivatives ML PDU
> 1.0//EN" "http://example.com/pdu/1.0/pdutastic.dtd">
> <pdutastic>
> <foo type="integer">13</foo>
> <bar type="integer">42</bar>
> <baz type="integer">93</baz>
> </pdutastic>
>
> while the new version would look like this:
>
> <?xml version="1.0"?>
> <!DOCTYPE pdutastic PUBLIC "-//MMOX//Poorly Defined Derivatives ML PDU
> 1.1//EN" "http://example.com/pdu/1.1/pdutastic.dtd">
> <pdutastic>
> <foo type="integer">13</foo>
> <bar type="integer">42</bar>
> <baz type="integer">93</baz>
> <qux type="integer">161</qux>
> </pdutastic>
>
> so when i (expecting the new version) consume the old version, i do not see
> qux. but my parser doesn't complain because instead of the DTD of the new
> PDU, i see a reference to the old PDU and say to myself... "it's okay...
> it's just an older version," and i do whatever i have to do to continue
> processing in the absence of the qux member i was expecting. even if i'm
> doing validation, it's okay, i'm pulling the DTD from the old definition, so
> we're cool.
>
> now... when cathy consumes the new PDU, something similar happens. she sees
> a DTD she's not familiar with, but it's all good.. she grabs the DTD from
> the URL specified in the DOCTYPE and moves on with her life, hoping that the new
> version PDU didn't change the semantics of the three values (foo, bar and
> baz) she's familiar with.
>
> and then one day she receives this from david:
>
> <?xml version="1.0"?>
> <!DOCTYPE pdutastic PUBLIC "-//MMOX//Poorly Defined Derivatives ML PDU
> 1.2//EN" "http://example.com/pdu/1.2/pdutastic.dtd">
> <pdutastic>
> <foo type="integer">13</foo>
> <bar type="date">2008-12-15T13:17:23.235Z</bar>
> <baz type="integer">93</baz>
> <qux type="integer">161</qux>
> </pdutastic>
>
> aha! david, trying to steal my core demographic away from me has upgraded to
> the most recent beta version of the software (now with 13% more explosions)
>
> but this is now a serious problem for cathy. the "shape" of the structured
> data now does not match. what does she do? how does she convert that date
> into an integer?
>
> and this is the question we keep running into as we deploy "loosely coupled"
> systems. how do we fail gracefully and ensure robustness in the presence of
> PDU schema mismatches?
>
> the way linden answered this question was to develop LLSD, an abstract type
> system with conversions and defaults.
>
> the way we would probably have defined the message above would have been
> something like:
>
> <?xml version="1.0"?>
> <llsd>
> <map>
> <key>foo</key>
> <integer>13</integer>
> <key>bar</key>
> <date>2008-12-15T13:17:23.235Z</date>
> <key>baz</key>
> <integer>93</integer>
> <key>qux</key>
> <integer>161</integer>
> </map>
> </llsd>
>
> we don't include the doctype id 'cause our DTD never changes. in this
> example the LLSD message is shorter 'cause we don't include the DOCTYPE id,
> but even after removing the DOCTYPE id and changing the outer element to
> llsd, the compressed version of the LLSD message is only seven bytes larger.
> (an increase from 154 bytes to 161 bytes, an efficiency loss of 4.5%)
> however, i can guarantee you that it took a lot less time to code the parser
> because.. well.. we already had a parser that knew how to unwrap LLSD
> messages.
>
> but we haven't got to the interesting part yet.
>
> there is a processing expectation with LLSD. we assume we're de-serializing
> a message into a dynamic language like ECMAScript or Python. For languages
> like C++ and Java we create a class that emulates this feature. so.. all
> LLSD PDUs are de-serialized into these LLSD objects, and inside our
> application(s), it's these objects that get passed around. so we might have
> code that looks like:
>
> public void foo( String message ) {
>     // LLSD is the wrapper class described above, not a standard Java class
>     LLSD llsd = LLSD.From( message, LLSD.XML_SERIALIZATION );
>
>     System.err.printf( "hey! foo was %d.\n", (int) llsd.at( "foo" ).asInteger() );
>
>     // read bar once; the checks below treat it as a percentage
>     int bar = (int) llsd.at( "bar" ).asInteger();
>
>     if( ( bar > 100 ) || ( bar < 0 ) ) {
>         System.err.println( "bar is not a percentage" );
>         System.haltAndCatchFire();
>     }
>
>     if( bar < 40 ) {
>         System.err.println( "sorry, this percent margin is way too small for me to be interested." );
>         System.haltAndCatchFire();
>     }
>     .
>     .
>     .
> }
>
> forgive the javaisms, the alternative was smalltalk and well... let's not go
> there.
>
> for the sake of argument let's say we had already determined it was in XML
> format by looking at the Content-Type: header from the request.
>
> so what we're doing here is not out of the ordinary... we create an instance
> of the LLSD class from the serialization.
>
> and we print some logging messages.
>
> and we then try to see if bar is between 0 and 100.
>
> but the object representing "bar" is a date, not an integer. what happens?
>
> in our current implementation of LLSD, we attempt to convert, but notice
> that the conversion from date to integer has no semantic meaning. because
> the conversion is "illegal," the default value is returned.
>
> the result of running this fragment would be something like
>
> hey! foo was 13.
> sorry, this percent margin is way too small for me to be interested.
>
> the important feature here is that we did not allow the "presentation layer"
> to create an error. we did the best we could to interpret the sense of the
> message, even to the point of making a nonsensical conversion. we allow this
> to happen because we assume that at the application layer, we're going to do
> parameter checking. so the reasoning was... if we're going to be doing
> parameter checking anyway, why would we want to allow the presentation layer
> to fail a message that would have generated a result that could be checked
> at the application layer?
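
The "convert or fall back to a default" expectation can be sketched in a few lines of Python. This is only an illustration of the behaviour described in the message above, not Linden Lab's implementation: the names LLSDValue, LLSDMap, at and as_integer are made up for the example, and only the single rule discussed here is modelled (an integer converts to itself; anything else, including a date or a missing key, yields the default 0). A real LLSD implementation defines conversions for more type pairs.

```python
# Illustrative sketch only: models the "illegal conversion returns the
# default" rule described above. Class and method names are hypothetical.
import xml.etree.ElementTree as ET

LLSD_XML = """<?xml version="1.0"?>
<llsd>
<map>
<key>foo</key>
<integer>13</integer>
<key>bar</key>
<date>2008-12-15T13:17:23.235Z</date>
<key>baz</key>
<integer>93</integer>
<key>qux</key>
<integer>161</integer>
</map>
</llsd>"""


class LLSDValue:
    """One typed value; tag is the LLSD element name, text its contents."""

    def __init__(self, tag=None, text=None):
        self.tag = tag
        self.text = text

    def as_integer(self):
        # Only integer -> integer is treated as meaningful in this sketch.
        if self.tag == "integer":
            try:
                return int(self.text)
            except (TypeError, ValueError):
                return 0
        # Date, missing value, or anything else: return the default
        # instead of raising, leaving validation to the application.
        return 0


class LLSDMap:
    """A map parsed from <llsd><map>...</map></llsd>."""

    def __init__(self, xml_text):
        root = ET.fromstring(xml_text)
        children = list(root.find("map"))
        # Children alternate: <key>, value, <key>, value, ...
        self._items = {
            key_el.text: LLSDValue(value_el.tag, value_el.text)
            for key_el, value_el in zip(children[0::2], children[1::2])
        }

    def at(self, key):
        # A missing key behaves like an undefined value, not an error.
        return self._items.get(key, LLSDValue())


message = LLSDMap(LLSD_XML)
print("foo:", message.at("foo").as_integer())        # an integer, converts
print("bar:", message.at("bar").as_integer())        # a date, gives default
print("missing:", message.at("nope").as_integer())   # absent, gives default

# Parameter checking stays in the application layer:
bar = message.at("bar").as_integer()
if bar < 40:
    print("margin too small, rejecting at the application layer")
```

The parse step never fails on the unexpected date; the application-level range check is what rejects the message, which is the division of labour argued for above.
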
>
> i hope you've enjoyed this installment of "the structure and interpretation
> of LLSD messages." and i hope it helps explain our motivations for designing
> LLSD and its serializations the way we did... and while we have a metric
> tonne of code that uses the existing LLSD DTD, ultimately, any additional
> serialization format that preserves these processing expectations could be
> wedged into the system. serialization formats that did not support these
> processing expectations would cause considerable heart burn.
>
> -cheers
> -meadhbh
>
> On Feb 24, 2009, at 6:12 PM, Jon Watte wrote:
>
> Catherine Pfeffer wrote:
> That's precisely all the HUGE strength of LLSD. Having information about
> what the data is intermingled with the data itself makes it very flexible
> and future-proof.
> But that is simply not necessary. If you transmit a reference to the schema
> you use when you initiate connection, then all future transmissions during
> that connection session can use the knowledge of that schema to highly
> optimize traffic. Network traffic is actually one of the scarce resources
> when you start scaling up really large virtual worlds. There are cases where
> an entire office (20 people or more) share a single T1 connection to the
> Internet, and expect to have a good virtual world experience for multiple
> users at the same time.
>
>
> Honestly, here, the bandwidth is not lost, compared to the advantages.
>
>
> I think it is, because the size difference is quite significant.
>
> In summary, their binary serialization is as flexible as XML data. Perhaps
> I'm wrong, but to me it's a unique case in all of computer science of a binary
> format that is so flexible.
>
> Don't overextend yourself :-) AIFF/RIFF has pretty much the same
> flexibility, as does TLV (also used in OSCAR), a number of "binary XML"
> proposals, and a number of protocols that have all done more or less the
> same thing ever since computers were interconnected on a larger scale in the
> '80s.
>
> Sincerely,
>
> jw
>
> _______________________________________________
> mmox mailing list
> mmox@ietf.org
> https://www.ietf.org/mailman/listinfo/mmox
>