Re: [Json] Advice on registering JSON Lines (not JSON) as IANA Media Type

Stefan Hagen <stefan@dilettant.eu> Wed, 30 December 2020 19:35 UTC

From: Stefan Hagen <stefan@dilettant.eu>
Mime-Version: 1.0 (1.0)
Date: Wed, 30 Dec 2020 20:35:45 +0100
Message-Id: <5DE9D26C-7F3E-4448-9B2E-675FC840D507@dilettant.eu>
References: <92962f86-1e03-aaae-4b7d-bbb76c88ac6c@crockford.com>
Cc: "Hlavina, Wratko (NIH/NLM/NCBI) [E]" <whlavina@ncbi.nlm.nih.gov>, nico@cryptonector.com, json@ietf.org
In-Reply-To: <92962f86-1e03-aaae-4b7d-bbb76c88ac6c@crockford.com>
To: Douglas Crockford <douglas@crockford.com>
X-Mailer: iPad Mail (18C66)
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/AkBM6LNv5hgEJX-oBGcuxoy71KQ>
Subject: Re: [Json] Advice on registering JSON Lines (not JSON) as IANA Media Type
X-BeenThere: json@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "JavaScript Object Notation \(JSON\) WG mailing list" <json.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/json>, <mailto:json-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/json/>
List-Post: <mailto:json@ietf.org>
List-Help: <mailto:json-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/json>, <mailto:json-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 30 Dec 2020 19:35:56 -0000

> On 30.12.2020 at 20:03, Douglas Crockford <douglas@crockford.com> wrote:
> 
> 
> Anything that is not JSON should not be called JSON. It should have a less confusing name.
> 
> 
> 
> On 2020-12-30 8:58 AM, Hlavina, Wratko (NIH/NLM/NCBI) [E] wrote:
>> Hello, Mr. Crockford and Mr. Williams,
>>  
>> I understand you are listed as the authors for the "application/json" and "application/json-seq" IANA Media Types, respectively.
>> I would like to ask for your advice/help with a related file format, JSON Lines:
>>  
>> https://jsonlines.org
>>  
>> I think there is value in having this format registered as a Standards Tree IANA Media Type.
>> Per the RFC6838 process, this requires Expert Review and IETF/IESG approval.
>> As I am not a member of those organizations, how can I encourage such a registration?
>>  
>> Motivation:
>>  
>> Unfortunately, JSON Lines is not valid JSON (technically) and is different from JSON Text Sequences.
>> However, JSON Lines is a frequently used file format; for example, it is used by many database products, including cloud services such as AWS Athena and Snowflake.
>>  
>> Since it is not valid JSON, using "application/json" as media type leads to processing failures and mishandling.
>> Since it uses the newline as the separator, without the RS (U+001E, INFORMATION SEPARATOR TWO) record separator, "application/json-seq" is not a substitute media type, and the ecosystem of tools does not, in general, support the JSON Text Sequences format.
>>  
>> In principle, good JSON programming libraries should allow streamed processing of JSON content, both in emitting it and in reading it, but in practice, libraries for JSON tend to require an entire JSON object to be held in memory.
>> Since HTTP returns one response per request, using "application/json" as the media type implies a single JSON text per response; this is problematic for large data.
>>  
>> In my experience, JSON Lines has become a very useful and conventional file format, since it works well with Unix text utilities while remaining compatible with many JSON tools.
>>  
>> Cf.:
>> RFC6838
>> https://www.iana.org/assignments/media-types/application/json
>> https://www.iana.org/assignments/media-types/application/json-seq
>> https://www.iana.org/assignments/media-types/application/ld+json
>> https://stackoverflow.com/questions/51690624/json-lines-mime-type
>> https://github.com/wardi/jsonlines/issues/9
>>  
>> --
>> Wratko HLAVINA
>> Sequence Curation, Organization, Enhancements (Technical Program Manager)
>> NCBI Building 45 Floor 4 Room AS13D-121
>> Slack: whlavina / Phone: 301-402-9730 / FAX: 301-480-2484 / Calendar: https://bit.ly/2QU2EGB
>>  
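
The point in the quoted message that JSON Lines is not valid JSON is easy to check. The following Python sketch uses only the standard json module; the payload and both function names are hypothetical, made up for illustration. It shows that json.loads rejects a two-record JSON Lines payload as a single JSON text (what an "application/json" consumer expects), while parsing it line by line succeeds.

```python
import json

# Hypothetical two-record JSON Lines payload: one JSON text per line.
payload = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

def parses_as_single_json_text(text: str) -> bool:
    """Return True if `text` is exactly one JSON text, as application/json expects."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        # For JSON Lines input this is typically "Extra data" after the first record.
        return False

def parse_json_lines(text: str) -> list:
    """Parse each non-empty line as its own JSON text."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

print(parses_as_single_json_text(payload))  # False
print(parse_json_lines(payload))  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```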

Well, these are JSON texts separated by newline characters.
I think the original JSON text sequences proposal started exactly like this (with newlines); at least, that is how I remember our e-mail discussions. Then, not too surprisingly, waves of ivory-tower discussion injected the long-forgotten RS into the picture.
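
A minimal Python sketch of "JSON texts separated by newline characters", assuming the conventions described at jsonlines.org (one JSON value per line, "\n" as the separator). The function names are hypothetical. Compact serialization keeps each record on a single line, and the reader yields records one at a time, which is the streaming property the original request was after.

```python
import io
import json
from typing import Any, Iterable, Iterator, TextIO

def write_json_lines(records: Iterable[Any], out: TextIO) -> None:
    """Write one compact JSON text per line, each terminated by "\\n"."""
    for record in records:
        # json.dumps without `indent` emits no raw newlines; a newline inside
        # a string value is escaped as the two characters backslash and "n".
        out.write(json.dumps(record) + "\n")

def read_json_lines(src: TextIO) -> Iterator[Any]:
    """Yield records one at a time instead of building one large parsed value."""
    for line in src:
        if line.strip():  # lenient choice of this sketch: skip blank lines
            yield json.loads(line)

buf = io.StringIO()
write_json_lines([{"msg": "line one\nline two"}, [1, 2, 3]], buf)
buf.seek(0)
for record in read_json_lines(buf):
    print(record)
```

Skipping blank lines is a tolerance added here for convenience, not something the format requires.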

Having reread the RFC (RFC 7464), I suggest not reusing the json-seq media type in this case, as that specification assumes a parser can skip ahead to the RS character preceding each JSON text, which these newline-separated JSON streams do not provide.
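
For contrast, a sketch of the RFC 7464 framing referred to above: each JSON text is preceded by RS (0x1E) and followed by a line feed, and a reader that hits a damaged text can drop it and resume at the next RS. The function names are hypothetical, and the sketch omits the RFC's additional rules for detecting possibly truncated top-level numbers, true, false, and null.

```python
import json
from typing import Any, List

RS = "\x1e"  # ASCII RECORD SEPARATOR, placed before each JSON text in json-seq

def encode_json_seq(records: List[Any]) -> str:
    """Frame each JSON text as RS + text + LF."""
    return "".join(RS + json.dumps(r) + "\n" for r in records)

def decode_json_seq(data: str) -> List[Any]:
    """Split on RS and parse each text, skipping texts that fail to parse."""
    results = []
    for chunk in data.split(RS):
        if not chunk.strip():
            continue  # text before the first RS, or consecutive RS characters
        try:
            results.append(json.loads(chunk))
        except json.JSONDecodeError:
            continue  # drop the damaged text and resynchronize at the next RS
    return results

# The middle record is damaged; the reader recovers at the following RS.
stream = RS + '{"a": 1}\n' + RS + '{"b": \n' + RS + '{"c": 3}\n'
print(decode_json_seq(stream))  # [{'a': 1}, {'c': 3}]
```

This RS-based recovery is what a plain newline-separated stream has no marker for.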

I suggest instead requesting a new media type from IANA, and I would not object to having it start with text/json-

Please, all of you, enjoy a healthy and wonderfully non-semantic Year version 2021,
Stefan
> _______________________________________________
> json mailing list
> json@ietf.org
> https://www.ietf.org/mailman/listinfo/json