[Json] JData: A general-purpose data storage and interchange format

Qianqian Fang <fangqq@gmail.com> Mon, 13 May 2019 23:52 UTC

To: json@ietf.org
From: Qianqian Fang <fangqq@gmail.com>
Message-ID: <72cccaa7-d2d6-e7ce-57ee-a86a98626d36@gmail.com>
Date: Mon, 13 May 2019 19:52:15 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/z7cUcdABZSS_sgvhVExWUW5y1Yc>
Subject: [Json] JData: A general-purpose data storage and interchange format

Dear list,

(I am new to this mailing list, so I apologize if this is not the right
place to post proposals for new JSON-based specifications. If so, I would
appreciate it if you could point me in the right direction.)

I am a researcher/professor working at a university. A big part of my
work, aside from teaching, involves writing computing software and
processing medical image data. Over the past 10 years, I have gradually
migrated the software I wrote, most of it open-source and some of it
funded by the NIH, to use JSON for input/output. I really love this
format because it is human-readable, easy to manipulate, and compact,
with parsers widely available.

In 2011, I wrote a JSON encoder/decoder MATLAB toolbox 
<https://www.mathworks.com/matlabcentral/fileexchange/33381-jsonlab-a-toolbox-to-encode-decode-json-files>, 
called JSONLab <https://github.com/fangq/jsonlab>, and the toolbox has 
grown a small user community since. In 2013, I added support for UBJSON
<http://ubjson.org>, a simple binary JSON format, to my toolbox. Around
2015, I came to feel strongly that a combination of text and binary JSON
is well capable of handling the wide variety of scientific data that I,
and many of my colleagues, handle on a daily basis.
Compared to more "advanced" and "feature-rich" data formats such as
HDF5, CDF and NetCDF, JSON/UBJSON has the clear advantages of being
simple, highly readable, and requiring much lower programming overhead
to implement. Many other less complicated but still somewhat "opaque"
imaging data formats, such as DICOM, Analyze 7.5 and NIfTI, could also
benefit from a more human-readable version if one can find a data
mapping to JSON/UBJSON.

So I started a project <https://github.com/fangq/jdata/commits/master> 
called "JData" to use JSON constructs to map common data structures, 
such as N-D arrays, hashes, tables, trees, graphs, etc., as the foundation
to store/interchange scientific data in a more readable and 
easy-to-operate fashion (many of these are already supported in 
JSONLab). After much procrastination, I finally finished the first draft 
of this specification, and would like your thoughts.

The current draft of the specification can be found here:

https://github.com/fangq/jdata/blob/master/JData_specification.md

The repository dedicated to the development and maintenance of this
specification is:

https://github.com/fangq/jdata

The overall idea is to define complex data structures using a set of
dedicated "name" tags in JSON/UBJSON without changing the syntax of the
format. This keeps the generated files JSON/UBJSON-compatible, so they
can be readily parsed by most existing parsers.
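
As a rough illustration of this idea (not text from the draft itself),
the Python sketch below stores a 2-D array as an ordinary JSON object
whose "name" tags carry the type, the dimensions, and the flattened
values. The tag names "_ArrayType_", "_ArraySize_" and "_ArrayData_",
the helper functions, and the "image" key are assumptions made for this
example; please consult the draft specification for the actual keywords.
The sketch also assumes a non-empty, rectangular array.

```python
import json

# Tag names assumed for illustration only; the draft specification is the
# authority on the real keywords.
TYPE_TAG, SIZE_TAG, DATA_TAG = "_ArrayType_", "_ArraySize_", "_ArrayData_"


def shape_of(nested):
    """Return the dimensions of a non-empty, rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


def flatten(nested):
    """Flatten a nested list into a single list in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    out = []
    for item in nested:
        out.extend(flatten(item))
    return out


def encode_array(nested, dtype="double"):
    """Wrap an N-D array in a plain JSON object using the assumed tags."""
    return {TYPE_TAG: dtype, SIZE_TAG: shape_of(nested), DATA_TAG: flatten(nested)}


def decode_array(obj):
    """Rebuild the nested list from the tagged object."""
    def build(flat, dims):
        if len(dims) == 1:
            return list(flat)
        step = len(flat) // dims[0]
        return [build(flat[i * step:(i + 1) * step], dims[1:])
                for i in range(dims[0])]
    return build(obj[DATA_TAG], obj[SIZE_TAG])


original = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# The annotated array is still ordinary JSON, so the standard parser reads it.
text = json.dumps({"image": encode_array(original)})
parsed = json.loads(text)
restored = decode_array(parsed["image"])
```

A parser that knows nothing about the tags still loads the file as a
regular object; only a tag-aware reader needs the decoding step.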

Currently, this specification supports the following major features:

 1. N-D arrays with and without data compression
 2. Trees, tables, hashes, graphs, linked lists
 3. Inline metadata and metadata node append-able to all elements
 4. Data grouping tags similar to HDF5
 5. Indexing and query interface
 6. Referencing and link support
 7. Dual text <-> binary interface

The keyword names were chosen to minimize conflict with other JSON
features that are under development (such as JSON-LD and JSON Schema).

I am sure there are typos and minor issues that I overlooked in this
early draft. What I would like to hear from this community is:

 1. Well, what do you think? Is this a project that you would consider
    useful (in general and for the research community)?
 2. Are there any major loopholes in the design of the specification? I
    am new to writing a specification from scratch, and I don't want to
    miss anything important from the start.
 3. If there is value in continuing to develop this specification/file
    format, what is the typical pathway for such development? What is
    the appropriate community/group to discuss ideas and get suggestions?

Again, I have no experience writing an RFC or specification from
scratch, so please be gentle. I appreciate your guidance and pointers.

Qianqian