[Tools-discuss] Bibxml7: anchor normalization, xml2rfc: caching with anchors

Carsten Bormann <cabo@tzi.org> Wed, 15 May 2019 05:28 UTC

From: Carsten Bormann <cabo@tzi.org>
Date: Wed, 15 May 2019 07:28:28 +0200
Message-Id: <2FA9E921-3C89-4400-B47B-5889AA18A140@tzi.org>
To: tools-discuss <tools-discuss@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/l2MRxNISoYjw8Q-_GydhZTQOj10>

I just tried to work around a problem in kramdown-rfc’s “stand_alone: false” mode (this is where kramdown-rfc doesn’t do the reference handling by itself but instead generates entity declarations as if it were 1986).

I noticed that

https://xml2rfc.tools.ietf.org/public/rfc/bibxml7/reference.DOI.10.1145_1282427.1282421.xml?anchor=foo

generates

<reference anchor="FOO">

Hmm, ID/IDREF are case-sensitive in XML, so this doesn’t quite work for an <xref target="foo"/>.
➔ bug1
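
To make the mismatch concrete, here is a minimal sketch in Python. The tiny document is made up for illustration, and the check is a plain string comparison rather than a real ID/IDREF validation, but it shows why the upcased anchor leaves the xref dangling:

```python
import xml.etree.ElementTree as ET

# Made-up miniature document: the xref asks for "foo",
# but the fetched reference carries the upcased anchor "FOO".
doc = ET.fromstring(
    '<rfc>'
    '<xref target="foo"/>'
    '<reference anchor="FOO"/>'
    '</rfc>'
)

# Collect all anchors, then list xref targets that match none of them.
anchors = {ref.get("anchor") for ref in doc.iter("reference")}
dangling = [x.get("target") for x in doc.iter("xref")
            if x.get("target") not in anchors]

print(dangling)
```

The target "foo" ends up in the dangling list, because "foo" and "FOO" are different names.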

Independent of what anchor I specify, xml2rfc creates a cache file called

reference.DOI.10.1145_1282427.1282421.xml

So if I have another document that references

https://xml2rfc.tools.ietf.org/public/rfc/bibxml7/reference.DOI.10.1145_1282427.1282421.xml?anchor=bar

xml2rfc still uses the cache entry with

<reference anchor="FOO">

➔ bug2

Bug 1 actually has another interesting facet:

https://xml2rfc.tools.ietf.org/public/rfc/bibxml7/reference.DOI.10.1145_1282427.1282421.xml?anchor=DOI.10.1145_637201.637236

creates

<reference anchor='DOI101145_637201637236' >

Whoa, what happened to my dots?

➔ bug1b

So can we make anchor normalization on the bibxml server less aggressive?
No upcasing (bug1), no removal of valid ID/IDREF characters (bug1b).
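
As a rough sketch of what "less aggressive" could mean (this is not the bibxml server's actual code; the function name normalize_anchor, the ASCII-only character set, and the choice to substitute "_" instead of dropping characters are all my assumptions):

```python
import re

# ASCII subset of the characters allowed in an XML name without colons.
# Assumption for brevity: the real production also admits many non-ASCII
# characters, which this sketch would replace.
_INVALID = re.compile(r"[^A-Za-z0-9._-]")


def normalize_anchor(raw: str) -> str:
    """Hypothetical normalizer: keep case, keep dots, touch only what is invalid."""
    cleaned = _INVALID.sub("_", raw)
    # A name must start with a letter or underscore, not a digit, "." or "-".
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


print(normalize_anchor("foo"))
print(normalize_anchor("DOI.10.1145_637201.637236"))
```

With this, both anchors from the examples above come back unchanged; only characters that cannot appear in an ID at all (a slash, say) would be replaced.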

Also, we’d need to fix the caching in xml2rfc (bug2).
(Kramdown-rfc’s [1.2.12, unreleased] cache file names for URLs with anchors currently look like this:
reference.DOI.10.1145_1282427.1282421--anchor=foo.xml
)
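
A sketch of how such a cache name could be derived, so that ?anchor=foo and ?anchor=bar no longer share one cache entry. This is neither xml2rfc's nor kramdown-rfc's actual code; the function name cache_name and the sanitizing rule for the query string are assumptions on my part:

```python
import posixpath
import re
from urllib.parse import urlsplit


def cache_name(url: str) -> str:
    """Hypothetical cache file name that keeps the query string in the name."""
    parts = urlsplit(url)
    base = posixpath.basename(parts.path)
    if not parts.query:
        return base
    # Assumption: replace anything that is awkward in a file name.
    query = re.sub(r"[^A-Za-z0-9._=&-]", "_", parts.query)
    stem, ext = posixpath.splitext(base)
    return f"{stem}--{query}{ext}"


BASE = ("https://xml2rfc.tools.ietf.org/public/rfc/bibxml7/"
        "reference.DOI.10.1145_1282427.1282421.xml")

print(cache_name(BASE + "?anchor=foo"))
print(cache_name(BASE + "?anchor=bar"))
```

The first call yields the file name shown above, and the second yields a different one, so the two documents would each get their own cache entry.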

Grüße, Carsten