Re: [Tools-discuss] Inexplicable differences in datatracker bibxml files

Robert Sparks <rjsparks@nostrum.com> Sat, 20 February 2021 18:14 UTC

To: Carsten Bormann <cabo@tzi.org>, tools-discuss <tools-discuss@ietf.org>
References: <916F2622-1AD1-4005-93E0-845A0C5F63DD@tzi.org>
From: Robert Sparks <rjsparks@nostrum.com>
Message-ID: <3b50aba7-7ad4-d5d9-62f3-fd3a5f93c9f3@nostrum.com>
Date: Sat, 20 Feb 2021 12:13:51 -0600
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/rE1gIOptmhwMynR8qwd-6_TT3E4>
Subject: Re: [Tools-discuss] Inexplicable differences in datatracker bibxml files

Might be a good sprint project.

What's happening can be seen from line 77 here:

https://trac.tools.ietf.org/tools/ietfdb/browser/trunk/ietf/doc/urls.py#L77

(though you might also look through what happens with line 76)

lines 875-907 here:

https://trac.tools.ietf.org/tools/ietfdb/browser/trunk/ietf/doc/views_doc.py#L875

and this template:

https://trac.tools.ietf.org/tools/ietfdb/browser/trunk/ietf/templates/doc/bibxml.xml

The crux is that the view uses the DocHistory record, rather than the
Document object, if a rev is present.

It should either look at the most recent DocHistory (matching current 
rev) if a rev is not present, or stick with the Document object if rev 
is present. (I lean towards the latter).
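
A minimal sketch of the second option, reading it as "use the Document
object whenever the requested rev is the current one". The classes Doc and
DocHist and the function pick_record are hypothetical stand-ins for
illustration, not the datatracker's actual models or view code:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class DocHist:
    """Stand-in for a DocHistory record (one per recorded state of a rev)."""
    name: str
    rev: str


@dataclass
class Doc:
    """Stand-in for the Document object, which is always at the current rev."""
    name: str
    rev: str
    # Assumed to be ordered oldest to newest.
    history: List[DocHist] = field(default_factory=list)


def pick_record(doc: Doc, rev: Optional[str]) -> Optional[Union[Doc, DocHist]]:
    """Choose the object that the bibxml template should render.

    With no rev, or with a rev equal to the current one, the Document
    itself is returned, so both URLs render from the same object.
    Only an older rev falls back to a history record.
    """
    if rev is None or rev == doc.rev:
        return doc
    for hist in reversed(doc.history):  # newest matching record wins
        if hist.rev == rev:
            return hist
    return None  # unknown rev; a real view would answer with a 404
```

With this shape, the unversioned URL and the URL naming the current rev
go through the same branch, so their output cannot diverge.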

The difference in rendering is a consequence of there being a link from
the Document object to its Submission object, but not from the
DocHistory object. If the template can get to a Submission object,
it renders the author records from that. It would probably be worth
trying to find the Submission object matching the DocHistory
object, if one exists, and making that available to the template.

This is related to the long-standing tension we have with older
documents' DocumentAuthor sets being extracted from .txt by heuristics.
The Submission objects have a bit of (or at least the potential for a
bit of) manual grooming at submission time, and keep some details, like
affiliation at time of submission, that are not as easy to calculate from
the DocumentAuthor set. When we have the Submission objects, we should
use that data. Only when we don't should we fall back to the
DocumentAuthor set.
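
The preference order can be sketched roughly as follows. The Author class,
the dictionary keyed by (name, rev), and both function names are
assumptions made for illustration; in the datatracker the lookup would be
a database query against the real models, whose fields are not shown here:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Author:
    fullname: str
    affiliation: str = ""


# Hypothetical store: submission-time author lists keyed by (draft name, rev).
SubmissionIndex = Dict[Tuple[str, str], List[Author]]


def find_submission_authors(index: SubmissionIndex,
                            name: str, rev: str) -> Optional[List[Author]]:
    """Return the authors recorded at submission time, or None if no
    submission is known for this exact name and rev."""
    return index.get((name, rev))


def authors_for(name: str, rev: str, index: SubmissionIndex,
                document_authors: List[Author]) -> List[Author]:
    """Prefer submission-time authors; fall back to the DocumentAuthor set."""
    from_submission = find_submission_authors(index, name, rev)
    if from_submission is not None:
        return from_submission
    return document_authors
```

Because the lookup is keyed by name and rev rather than by following a
link from one object type, it would work the same whether the view holds
a Document or a history record.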

Also - it would be worth changing the template to avoid, where possible,
the whitespace differences between the different branches that you noted.
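
To check whether two renderings differ only in whitespace, one option is
to compare them after dropping whitespace-only text between elements.
This is a sketch using Python's standard xml.etree.ElementTree; the
function name normalized is made up for this example, and it assumes the
whitespace in question sits between tags, not inside text content:

```python
import xml.etree.ElementTree as ET


def normalized(xml_text: str) -> str:
    """Serialize the XML with whitespace-only text and tails removed.

    Element text that contains real content (such as a title) is left
    untouched; only indentation made of spaces, tabs, or newlines goes.
    """
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.text is not None and not el.text.strip():
            el.text = None
        if el.tail is not None and not el.tail.strip():
            el.tail = None
    return ET.tostring(root, encoding="unicode")
```

Two files for which normalized() returns the same string differ only in
indentation; a difference in attributes, such as the extra initials and
surname, still shows up.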

RjS




On 2/20/21 10:30 AM, Carsten Bormann wrote:
> The files coming from
> "https://datatracker.ietf.org/doc/bibxml3/draft-#{namepart}/xml"
> (newest version) and
> "https://datatracker.ietf.org/doc/bibxml3/draft-#{namepart}-#{version}/xml"
> (specific version), when #{version} is the most recent one, are almost the same, except:
>
> --- reference.I-D.arkko-farrell-arch-model-t-7258-additions.xml	2021-02-20 17:10:23.000000000 +0100
> +++ reference.I-D.draft-arkko-farrell-arch-model-t-7258-additions-01.xml	2021-02-20 17:10:23.000000000 +0100
> @@ -2,10 +2,10 @@
>   <reference anchor="I-D.arkko-farrell-arch-model-t-7258-additions">
>      <front>
>         <title>RFC 7258 additions due to evolving Internet thread model</title>
> -      <author fullname="Jari Arkko">
> +      <author initials="J." surname="Arkko" fullname="Jari Arkko">
>   	 <organization>Ericsson</organization>
>         </author>
> -      <author fullname="Stephen Farrell">
> +      <author initials="S." surname="Farrell" fullname="Stephen Farrell">
>   	 <organization>Trinity College Dublin</organization>
>         </author>
>         <date month="August" day="20" year="2020" />
>
> This is a `diff -ub`, because the other inexplicable difference is that draft-…-nn has more of its pretty-XML indentation replaced by HTAB characters, while draft-… (without version number) is somewhat cleaner, but not entirely clean.
>
> (If the files were the same, I could use hardlinks and would need half as many fetches.)
>
> Grüße, Carsten