Re: My slides from the Sunday XML session

"Timur Shemsedinov" <timur.shemsedinov@gmail.com> Wed, 19 December 2007 17:39 UTC

Message-ID: <248bcd790712031849m75f878b0nd61bb1962d2280b7@mail.gmail.com>
Date: Tue, 4 Dec 2007 04:49:34 +0200
From: "Timur Shemsedinov" <timur.shemsedinov@gmail.com>
To: tbray@textuality.com, Tim.Bray@sun.com, discuss@apps.ietf.org
Subject: Re: My slides from the Sunday XML session
In-Reply-To: <517bf110712021756v5f5d6774ya2489dcaa092e6@mail.gmail.com>
References: <517bf110712021756v5f5d6774ya2489dcaa092e6@mail.gmail.com>

Hello,

2007-12-03, Tim Bray <tbray@textuality.com>:
> Online at  http://www.tbray.org/tmp/IETF70.pdf

Thanks for the overview. In addition, I want to note that it is possible
to combine plain text, a binary format and an XML-like hierarchical
language in one approach, so that the weaknesses of each are offset by
the advantages of the others.

My team at the Research Institute of System Technologies has been using
various approaches and representation forms for a long time: XML, JSON,
binary, and even our own language for structured data called USP (since
1998). USP had the same problems with parsing speed and syntactic excess
as XML, so it gave us only an insignificant advantage in speed (20-25%)
and size (15-20%). However, being fond of optimization, we asked
ourselves how this language could be improved. The structure we have now
is so simple and obvious that I don't know why it took us so long.

Example (some lines may be wrapped by the mail client; each numbered
entry is a single line):
1: Node1:Class1 Field1[value1] Field2[Value2] Link1<url1> Set1{item1,item2,item3}
2: SubNode1 Field3[value3] Set2{} Set3{item4,item5} Link2<url2> Field4[value4]
2: SubNode2 Field5[value5] Field6[value6]
3: SubSubNode1 Set4{item6,item7} Field7[value7] Field8[value8] Link3<url3>
2: SubNode3 Field9[value9]
3: SubSubNode2:Class2,Class3 Field10[value10] Link4<url4>

Comments:
- Each node of the tree is on its own line
- The number at the beginning of a line is the hierarchy level of the
node, i.e. this example encodes the tree:
   Node1
     SubNode1
     SubNode2
       SubSubNode1
     SubNode3
       SubSubNode2
- Numbering the hierarchy levels gives us two advantages at once. First,
the parser does not have to count open/close blocks, as in:
<tag1><tag2><tag1></tag1></tag2><tag2></tag2></tag1>
or
(...(...(...)...)...(...)...)
Second, it lets us skip an entire branch of the tree with a single
substring search
- This format is not inferior to a binary one in brevity. For example,
here is a binary analogue:
[Level 4 bytes][Name][0x00][Class][0x00][FieldName][0x00][Value][0x00][0x13][0x10]...
So full "binarization" gives no advantage in size or speed, nor in
brevity or simplicity of parsing.
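
To make the comments above concrete, here is a minimal Python sketch of
a reader for the example. It is an illustration only, not our actual
implementation. The names parse_line, parse and branch_end are
hypothetical, and I assume well-formed input with no escaping inside
[...], {...} or <...>. One caveat on skipping a branch: the next line at
the same level may belong to a different parent, so the sketch uses one
regular-expression search for "a line whose level is the same or
smaller" rather than a plain substring search.

```python
import re

# The example from this message, one node per line (mail wrapping undone).
EXAMPLE = """\
1: Node1:Class1 Field1[value1] Field2[Value2] Link1<url1> Set1{item1,item2,item3}
2: SubNode1 Field3[value3] Set2{} Set3{item4,item5} Link2<url2> Field4[value4]
2: SubNode2 Field5[value5] Field6[value6]
3: SubSubNode1 Set4{item6,item7} Field7[value7] Field8[value8] Link3<url3>
2: SubNode3 Field9[value9]
3: SubSubNode2:Class2,Class3 Field10[value10] Link4<url4>
"""

# Assumed token grammar: Name[value], Name{a,b,c} or Name<url>.
TOKEN = re.compile(
    r"(\w+)(?:\[(?P<field>[^\]]*)\]|\{(?P<set>[^}]*)\}|<(?P<link>[^>]*)>)"
)


def parse_line(line):
    """Turn '2: Name:Class Field[v] ...' into (level, node dict)."""
    level_text, rest = line.split(":", 1)
    head, _, body = rest.strip().partition(" ")
    name, _, classes = head.partition(":")
    node = {
        "name": name,
        "classes": classes.split(",") if classes else [],
        "fields": {}, "sets": {}, "links": {}, "children": [],
    }
    for m in TOKEN.finditer(body):
        key = m.group(1)
        if m.group("field") is not None:
            node["fields"][key] = m.group("field")
        elif m.group("set") is not None:
            items = m.group("set")
            node["sets"][key] = items.split(",") if items else []
        else:
            node["links"][key] = m.group("link")
    return int(level_text), node


def parse(text):
    """Build the tree using only the level numbers; no bracket counting."""
    roots, stack = [], []  # stack[i] is the open node at level i + 1
    for line in text.splitlines():
        if not line.strip():
            continue
        level, node = parse_line(line)
        del stack[level - 1:]  # close every node at this level or deeper
        (stack[-1]["children"] if stack else roots).append(node)
        stack.append(node)
    return roots


def branch_end(text, start):
    """Offset just past the branch whose first line begins at `start`."""
    level = int(text[start:text.index(":", start)])
    levels = "|".join(str(n) for n in range(1, level + 1))
    # The branch ends at the next line whose level is <= this node's level.
    m = re.compile(rf"^(?:{levels}):", re.MULTILINE).search(text, start + 1)
    return m.start() if m else len(text)
```

For example, with start = EXAMPLE.index("2: SubNode2"), the slice
EXAMPLE[start:branch_end(EXAMPLE, start)] covers SubNode2 and
SubSubNode1, and the text after it begins at "2: SubNode3".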

This approach has been tested in applications where this structure is
used not only as an input data format but also as the in-memory data
representation. We also use this format for our network protocols, and
we have even built a database management system which stores data in
this format, placing each node in one or more memory pages. The DBMS
handles indexing, search and fast binding of the data of the required
nodes. So the stored data travels in network packets and lands directly
in the client application's memory, without extra repacking.

-Timur