Re: [Rfc-markdown] How to cite this DOI?

Carsten Bormann <cabo@tzi.org> Sat, 16 October 2021 00:46 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/rfc-markdown/V30a3gOuGB9QRAca0aNQ41BWBKU>

On 15. Oct 2021, at 23:02, John Levine <johnl@taugh.com> wrote:
> 
> It appears that Carsten Bormann  <cabo@tzi.org> said:
>> On 15. Oct 2021, at 12:36, Lars Eggert <lars@eggert.org> wrote:
>>> 
>>> I have no idea. The whole DOI thing seems pretty sloppily put together; unfortunately it's what the publishing industry adopted though.
>> 
>> That was easy:
>> 
>> https://www.doi.org/doi_handbook/2_Numbering.html#2.2.1
>> 
>>>> The DOI name is case-insensitive and can incorporate any printable characters from the legal graphic characters of Unicode. Further constraints on character use (e.g. use of language-specific alphanumeric characters) can be defined for an application by the ISO 26324 Registration Authority.
>> 
>> You can’t make this stuff up.
> 
> How soon they forget.  DOIs are an instance of the handle system, whose syntax is defined in RFC 3651.  It does indeed allow any UTF-8 encoded
> Unicode 2.0 character, but since the handle system also had its own retrieval protocol, quoting wasn't a problem.

Does “printable character” include ZWNJ?  If not, how do I get a correct representation of the Auf‌lage (*) of a digital object?
The term “case-insensitive” doesn’t even mean anything without a locale.
(Well, you know all this, but I want to remind everyone why this choice just boggles the mind.)
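
To make both complaints concrete, here is a minimal Python sketch. It assumes that Python's own notions of "printable" and of case mapping are a fair stand-in for what the DOI handbook might mean, which the handbook does not actually say; the variable names are made up for illustration.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# ZWNJ is a format character (general category Cf), so Python's
# str.isprintable() does not count it as printable.
zwnj_category = unicodedata.category(ZWNJ)
zwnj_printable = ZWNJ.isprintable()
print(zwnj_category, zwnj_printable)      # Cf False

# With and without the ZWNJ, the two strings look alike but differ.
with_zwnj = "Auf" + ZWNJ + "lage"
without_zwnj = "Auflage"
print(with_zwnj == without_zwnj)          # False
print(len(with_zwnj), len(without_zwnj))  # 8 7

# "Case-insensitive" depends on which mapping you pick.
# Python's str methods are locale-independent; a Turkish reader would
# expect "I" to lower-case to dotless ı (U+0131), not "i".
print("I".lower())                        # i
print("ß".upper())                        # SS

# Comparing via lower() and via casefold() gives different answers.
same_by_lower = "ß".lower() == "SS".lower()
same_by_casefold = "ß".casefold() == "SS".casefold()
print(same_by_lower, same_by_casefold)    # False True
```

So two DOI names that one implementation treats as equal may well be distinct in another, depending on which of these comparisons it happened to choose.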

> Except that nobody ever used the handle protocol so the only popular application of handles, DOI, made an expedient
> choice to use http instead, turning the handles into URLs.  Unfortunately, the DOI people are librarians, not
> computer standards experts and may have missed the fine points of what goes into a URL.

The URL representation is not the problem here.

> Having said that, what's the problem with that URL?  RFC 1738 specifically says that unencoded parentheses in a URL are allowed.
> I tried pasting it into Brave (based on Chrome), Safari, and Firefox, and it worked fine.  If you have some tool that barfs
> on them, you can always percent encode them.

Parentheses are just fine in a URL (e.g., I use them all the time on cbor.me [1]).
Everything is wrong about using anything but [-A-Za-z0-9.]+ in the components of a DOI [2]; Elsevier would not have had to do that.
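
As a rough illustration of the two points above, the following Python sketch checks a DOI against the conservative character set and percent-encodes the rest when building a URL. The DOI used here is invented (it is not the one from this thread), and the helper names are hypothetical; splitting at the first "/" into prefix and suffix is an assumption about what "components" means.

```python
import re
from urllib.parse import quote

# The conservative character set suggested above.
CONSERVATIVE = re.compile(r"[-A-Za-z0-9.]+")

def is_conservative_doi(doi: str) -> bool:
    """True if both prefix and suffix use only [-A-Za-z0-9.]."""
    prefix, sep, suffix = doi.partition("/")
    return bool(sep) and all(
        CONSERVATIVE.fullmatch(part) for part in (prefix, suffix)
    )

def doi_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Build a resolver URL, percent-encoding everything except "/"
    and the characters quote() always leaves alone."""
    return resolver + quote(doi, safe="/")

# Hypothetical DOI in the parenthesized style discussed in this thread.
EXAMPLE_DOI = "10.5555/s0000-0000(21)00000-0"

print(is_conservative_doi("10.5555/abc-123.x"))  # True
print(is_conservative_doi(EXAMPLE_DOI))          # False
print(doi_url(EXAMPLE_DOI))
# https://doi.org/10.5555/s0000-0000%2821%2900000-0
```

The parentheses survive either way in a URL; the check only shows how little it would have taken to avoid the question entirely.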

                    .oOo.

So the payload of this message is that I have added the parentheses for the YAML prelude in 1.5.10, but we have to be a bit careful about what is allowed directly in {{}} (right now, once you go outside sequences of XML namechars, weird things will happen).  Because the RFC editor surely wouldn't accept anything else as a reference anchor, I don’t think this is a crippling limitation.
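
For anyone who wants to check ahead of time whether an anchor stays within "sequences of XML namechars", here is a small Python sketch of the NameChar production from XML 1.0 (fifth edition). It only implements that production; it is not how kramdown-rfc parses {{}}, and the function name is hypothetical.

```python
import re

# NameStartChar ranges from XML 1.0 (fifth edition).
_NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits, and a few combining/connector ranges.
_NAME_CHAR = _NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"
NAMECHARS = re.compile("[" + _NAME_CHAR + "]+")

def is_namechar_sequence(anchor: str) -> bool:
    """True if anchor is a non-empty sequence of XML NameChars."""
    return NAMECHARS.fullmatch(anchor) is not None

print(is_namechar_sequence("RFC3651"))              # True
print(is_namechar_sequence("10.5555/x(1)"))         # False: "/" and parentheses
print(is_namechar_sequence("Auf\u200clage"))        # True: ZWNJ is a NameChar
```

Note that this is a check on sequences of NameChars, not on a full XML Name, so a leading digit or hyphen passes.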

If you ever need more bugs 𐲃𐲡 in your DOI, just holler.

Grüße, Carsten

(*) Literally “print run”, but often used in place of “edition”. Look closely at how this does not have an fl ligature, even in a font that has them.
[1]: http://cbor.me/?bytes=82(01-82(02-03))
[2]: https://en.wikipedia.org/wiki/Principle_of_least_astonishment