Re: [Json] JData: A general-purpose data storage and interchange format - Draft 1

Qianqian Fang <fangqq@gmail.com> Tue, 04 June 2019 17:21 UTC

From: Qianqian Fang <fangqq@gmail.com>
To: json@ietf.org
References: <72cccaa7-d2d6-e7ce-57ee-a86a98626d36@gmail.com>
Message-ID: <d04c6d64-6c3a-65d9-d0be-fcb9cf451baa@gmail.com>
Date: Tue, 04 Jun 2019 13:20:57 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0
MIME-Version: 1.0
In-Reply-To: <72cccaa7-d2d6-e7ce-57ee-a86a98626d36@gmail.com>
Content-Type: multipart/alternative; boundary="------------5A4BA8492B17B58D59DFD79C"
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/HHpZt-GiO8VQJy9H6Ga58EJcOfU>
Subject: Re: [Json] JData: A general-purpose data storage and interchange format - Draft 1

FYI, after a series of deliberate updates, I now feel comfortable tagging
"Draft-1" of the JData specification.

https://github.com/fangq/jdata/blob/Draft_1/JData_specification.md


The major changes include

1. switch from column-major to row-major order to serialize N-D arrays 
(in both annotated and direct form)
2. define _ArrayData_ as 2-D arrays for better grouping of similar data 
types for space-efficiency
3. add support for generic byte-streams, non-string keyed maps, and 
weighted graphs
4. fix the incorrect UBJSON extended syntax for N-D array storage
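
To illustrate item 1, here is a minimal Python sketch of how the two
orderings differ for a 2x3 array. The helper names, the "uint8" type
string, and the flat layout of "_ArrayData_" are assumptions made for
illustration only; the Draft-1 specification is the authority on the
actual layout (item 2 above describes "_ArrayData_" as a 2-D array).

```python
import json


def flatten_row_major(rows):
    """Flatten a 2-D list so the last index varies fastest (C order)."""
    return [value for row in rows for value in row]


def flatten_column_major(rows):
    """Flatten a 2-D list so the first index varies fastest (MATLAB/Fortran order)."""
    return [row[j] for j in range(len(rows[0])) for row in rows]


def annotate_array(rows, type_name):
    """Hypothetical helper: wrap a 2-D list in the annotation keywords.

    Assumption: _ArrayData_ is written here as a flat row-major list;
    check the specification for the exact form it requires.
    """
    return {
        "_ArrayType_": type_name,
        "_ArraySize_": [len(rows), len(rows[0])],
        "_ArrayData_": flatten_row_major(rows),
    }


matrix = [[1, 2, 3],
          [4, 5, 6]]

print(flatten_row_major(matrix))     # [1, 2, 3, 4, 5, 6]
print(flatten_column_major(matrix))  # [1, 4, 2, 5, 3, 6]
print(json.dumps(annotate_array(matrix, "uint8")))
```

The output is still ordinary JSON, so a reader that does not know the
annotation keywords can parse it as a plain object.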


I also noticed recently that the MATLAB Production Server (mps), included
in MATLAB R2018a and later, has adopted a data annotation scheme very
similar to the one in JSONLab/JData: it uses "mwsize", "mwtype", and
"mwdata" instead of "_ArraySize_", "_ArrayType_", and "_ArrayData_". Its
serialization of complex arrays is also defined very similarly.
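
As a rough sketch of how close the two naming schemes are, the Python
snippet below renames the three keys. The function name and the sample
record are hypothetical, invented for illustration. The snippet only
renames keys; it does not convert type names or element ordering, which
may differ between the two formats.

```python
# Key correspondence as listed in this message.
MPS_TO_JDATA = {
    "mwsize": "_ArraySize_",
    "mwtype": "_ArrayType_",
    "mwdata": "_ArrayData_",
}


def rename_mps_keys(node):
    """Hypothetical helper: recursively rename mw* keys to JData keywords.

    Values are copied unchanged; only dictionary keys are touched.
    """
    if isinstance(node, dict):
        return {MPS_TO_JDATA.get(key, key): rename_mps_keys(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rename_mps_keys(item) for item in node]
    return node


# Made-up sample record, not taken from any real server output.
sample = {"result": {"mwsize": [2, 2], "mwtype": "double", "mwdata": [1, 2, 3, 4]}}
print(rename_mps_keys(sample))
```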

Eventually, all software that intends to handle complex arrays and data
structures will have to consider additional serialization schemes for
such data, and I believe JData can make this process easier.


Qianqian


On 5/13/19 7:52 PM, Qianqian Fang wrote:
>
> Dear list,
>
> (I am new to this mailing list; I apologize if this is not the right 
> place to post proposals for new JSON-based specifications. In that 
> case, I would appreciate it if you could point me in the right direction.)
>
> I am a researcher/professor working at a university. A big part of my 
> work, aside from teaching, involves writing computing software and 
> processing medical image data. Over the past 10 years, I have gradually 
> migrated the software I wrote, most of it open-source and some of it 
> funded by the NIH, to use JSON for input/output. I really love 
> this format because it is human-readable, easy to manipulate, and 
> compact, with parsers widely available.
>
> In 2011, I wrote a JSON encoder/decoder MATLAB toolbox 
> <https://www.mathworks.com/matlabcentral/fileexchange/33381-jsonlab-a-toolbox-to-encode-decode-json-files>, 
> called JSONLab <https://github.com/fangq/jsonlab>, and the toolbox has 
> grown a small user community since. In 2013, I added support for 
> UBJSON <http://ubjson.org>, a simple binary JSON 
> format, to my toolbox. Around 2015, I came to feel strongly that a 
> combination of text and binary JSON is well capable of handling the wide 
> variety of scientific data that I, and many of my colleagues, handle 
> on a daily basis. Compared to the more "advanced" and "feature-rich" 
> data formats such as HDF5, CDF and NetCDF, JSON/UBJSON has the clear 
> advantage of being simple, highly readable, and requiring much 
> less programming overhead to implement. Many other less complicated but 
> still somewhat "opaque" imaging data formats, such as DICOM, Analyze 7.5 
> and NIfTI, could also benefit from a more human-readable version if one 
> can find a data mapping to JSON/UBJSON.
>
> So I started a project <https://github.com/fangq/jdata/commits/master> 
> called "JData" to use JSON constructs to map common data structures, 
> such as N-D arrays, hashes, tables, trees, graphs, etc., as the 
> foundation to store/interchange scientific data in a more readable and 
> easy-to-operate fashion (many of these are already supported in 
> JSONLab). After much procrastination, I finally finished the first 
> draft of this specification, and would like your thoughts.
>
> The current draft of the specification can be found here
>
> https://github.com/fangq/jdata/blob/master/JData_specification.md
>
> The repository dedicated to the development and maintenance of this 
> specification is
>
> https://github.com/fangq/jdata
>
> The overall idea is to define complex data structures using a set of 
> dedicated "name" tags in JSON/UBJSON without changing the syntax of 
> the format. This keeps the generated files JSON/UBJSON-compatible, so 
> they can be readily parsed by most existing parsers.
>
> Currently, this specification supports the following major features:
>
>  1. N-D arrays with and without data compression
>  2. Trees, tables, hashes, graphs, linked lists
>  3. Inline metadata and metadata node append-able to all elements
>  4. Data grouping tags similar to HDF5
>  5. Indexing and query interface
>  6. Referencing and link support
>  7. Dual text <-> binary interface
>
> The keyword names were chosen to minimize conflicts with other JSON 
> features that are under development (such as JSON-LD and JSON Schema).
>
> Since this is an early draft, I am sure there are typos and minor 
> issues that I overlooked. What I would like to hear from this community is:
>
>  1. Well, what do you think? Is this a project that you would consider
>     useful (in general and for the research community)?
>  2. Are there any major loopholes in the design of the specification?
>     I am new to writing a specification from scratch, and I don't want
>     to miss anything important from the start.
>  3. If there is value in continuing to develop this specification/file
>     format, what is the typical pathway for such development? Which
>     communities or groups are appropriate for discussing ideas and
>     getting suggestions?
>
> Again, I have no experience writing an RFC or specification from 
> scratch, so please be gentle. I appreciate your guidance and 
> pointers.
>
> Qianqian
>