Re: [Tools-discuss] iddiff vs rfcdiff

Bob Briscoe <ietf@bobbriscoe.net> Fri, 09 December 2022 23:01 UTC

Message-ID: <0bd77d8f-e8ce-5682-da31-8d11609e0ca2@bobbriscoe.net>
Date: Fri, 09 Dec 2022 22:59:48 +0000
To: Robert Sparks <rjsparks@nostrum.com>
References: <04F0583F-98AC-4623-AB21-FF11B3365B74@juniper.net> <f17e74d4-8b9a-c9f2-870f-d1e32f96d44d@bobbriscoe.net> <fdf1997d-64bd-4063-e217-28e48c881b5b@nostrum.com>
Cc: John Scudder <jgs@juniper.net>, Tools Team Discussion <tools-discuss@ietf.org>
From: Bob Briscoe <ietf@bobbriscoe.net>
In-Reply-To: <fdf1997d-64bd-4063-e217-28e48c881b5b@nostrum.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/Ti0TecUKdBC42YpUNz5LAsN31Vo>
Subject: Re: [Tools-discuss] iddiff vs rfcdiff

Robert,

Thanks for the explanations. Pls see [BB] inline

On 09/12/2022 15:52, Robert Sparks wrote:
>
>
> On 12/8/22 5:25 PM, Bob Briscoe wrote:
>> All,
>>
>> John's critique of iddiff as 'significantly noisier' than rfcdiff
>> seems like an understatement to me.
>>
>> rfcdiff was a shining example of the best diff tool I've ever 
>> encountered.
>> In contrast, iddiff is a considerable regression (see end for 
>> details). So I shall not be using iddiff in its current form.
>>
>> I've installed rfcdiff, which I shall now use locally.
>> On Ubuntu it can be installed with
>>     sudo apt install rfcdiff
>> Or the sources are here: https://sources.debian.org/src/rfcdiff/1.45-1.1/
>>
>> Questions for whoever was involved in this change:
>> Q1. What was the logic behind the apparent complete removal of 
>> rfcdiff from the ietf's servers?
> The webservice that we have been running was based on a 
> now-unsupported python2.7 framework (pyht) that has been becoming 
> increasingly difficult to manage (and it has security issues). We 
> began work many months ago, discussing it on this list (but see below).
>> Given the iddiff page supports a number of fairly archaic diff formats
> The discussion on this list, and the logs showing how the existing 
> tool was used indicated we needed to continue to support all but one 
> of the formats (abdiff).
>> , why not also support the side-by-side rfcdiff that I believe nearly 
>> everyone used (until it was recently removed)?
>
> That was not removed. Can you detail the trouble you're having using 
> it please?
>
> See, for example:
> https://author-tools.ietf.org/iddiff?url1=draft-ietf-elegy-rfc8989bis-00&url2=draft-ietf-elegy-rfc8989bis-01&difftype=--html
>
> The datatracker links to the side-by-side format by default.
>

[BB] Sry, I made that sound like a /format/ was missing. I meant that the
side-by-side format from rfcdiff is missing (as opposed to the one from iddiff).

What I'm really saying is that, while iddiff beds in, rfcdiff needs to 
be accessible as a choice for those of us that are trying to do work 
with these tools.
I understand that there's a catch-22:
1) no-one will shift to a new tool unless it replaces the old one;
2) but you don't necessarily find bugs until there's a large user base.
However, surely the new tool can be made the default, while still 
providing access to the old tool for a transition period.

In this case though, I can't see how anyone could have been using iddiff
without having noticed this problem. Indeed, I've now discovered that,
on 26 Jul 2022, Carsten told this list he had noticed that the output of
iddiff was significantly larger than that of rfcdiff:
https://mailarchive.ietf.org/arch/msg/tools-discuss/7VWgfUbfJe-UEGO38az6JpjEVQw/

>> Q2. Was there discussion on this choice somewhere, please? Other than 
>> John's posting below, I could not find discussion on the archive of 
>> this list, nor in other fora after considerable web searching effort.
>
> It is here, on this list. I admit that finding a klaxon level 
> announcement of the change or a single thread is not easy, but there 
> are threads discussing the functionality, and we had quite a bit of 
> input. It was also noted in the discussion of the transition from 
> tools.ietf.org. I've been in discussions about this with chairs and the
> IESG and the tools team (in the open meetings, like the one next 
> Tuesday) for so long it felt like visibility was higher than it 
> obviously was, and I'll continue to push to make sure fewer people are 
> surprised by such things going forward.
>

[BB] Don't beat yourself up about not announcing it strongly enough - 
that wasn't the problem - I don't actually follow the tools list anyway, 
other than when I post something. I just use the search facility over 
the archive. I couldn't find the reasoning for the changeover that you 
have just given.

>>
>> The main problem with iddiff:
>> Once there has been a change in the length of a line of text, at 
>> best, iddiff (incorrectly) highlights all the alterations in 
>> word-wrap in the para from there onward. At worst, it just highlights 
>> everything in the rest of the para even if there are no further 
>> changes. In contrast, rfcdiff (correctly) ignores differences in wrap 
>> (except where a hyphen has been added or removed).
> Thanks - please continue to point to such things. If you're willing, 
> add them as issues at https://github.com/ietf-tools/iddiff, but 
> sending them to this list will work just as well.

Just done so.
See: https://github.com/ietf-tools/iddiff/issues/44
(I wanted to check first on this list whether I was misunderstanding 
something)
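
To illustrate the wrap-insensitive behaviour described above, here is a minimal Python sketch. It is not the algorithm used by rfcdiff or iddiff; it only assumes that a paragraph can be compared as a sequence of words, so that a change in where lines break produces no difference at all. The function name changed_words and the sample text are hypothetical.

```python
import difflib


def changed_words(old: str, new: str) -> list[tuple[str, list[str], list[str]]]:
    """Return (tag, old_words, new_words) for each span that differs."""
    # str.split() with no argument splits on any run of whitespace,
    # newlines included, so where a line wraps has no effect.
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# One word is inserted, which pushes every later line break along.
old = "The quick brown fox jumps\nover the lazy dog and then\nruns away."
new = "The quick brown fox jumps over\nthe extremely lazy dog and\nthen runs away."
print(changed_words(old, new))  # [('insert', [], ['extremely'])]
```

A line-by-line comparison of the same two paragraphs would mark all three lines as changed. This sketch does not handle the hyphen case mentioned above (a hyphen added or removed at a wrap point), which would need extra handling.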

Q3. Unless iddiff is going to be fixed fast, I suggest the author tools
are reverted to rfcdiff until iddiff is fixed. iddiff is not really
usable until its output is closer to that of rfcdiff.

Regards


Bob

PS. There ought to be some functional testing here, as well as build 
testing:
https://github.com/ietf-tools/iddiff#tests
   e.g. to check for regressions by testing the output against that of 
rfcdiff for various pairs of I-Ds.
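
A rough sketch of what such a functional check could look like, in Python. Everything here is an assumption for illustration: the harness takes two callables that each return a "diff size" for a pair of texts, and flags pairs where the candidate reports far more than the reference. In a real test the callables would wrap the actual tools; the two stand-ins below are built on difflib and are not iddiff or rfcdiff.

```python
import difflib
from typing import Callable, Iterable

# A diff-size function returns how many words a tool reports as changed.
DiffSize = Callable[[str, str], int]


def line_level_size(old: str, new: str) -> int:
    """Stand-in for a wrap-sensitive tool: words on added/removed lines."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(len(line[2:].split()) for line in diff
               if line.startswith(("+ ", "- ")))


def word_level_size(old: str, new: str) -> int:
    """Stand-in for a wrap-insensitive tool: words in differing spans."""
    a, b = old.split(), new.split()
    ops = difflib.SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in ops if tag != "equal")


def find_regressions(pairs: Iterable[tuple[str, str, str]],
                     candidate: DiffSize, reference: DiffSize,
                     tolerance: float = 2.0) -> list[str]:
    """Names of pairs where candidate exceeds tolerance x reference."""
    noisy = []
    for name, old, new in pairs:
        # max(..., 1) avoids a zero threshold when the reference sees no change.
        if candidate(old, new) > tolerance * max(reference(old, new), 1):
            noisy.append(name)
    return noisy


pairs = [
    ("rewrap-only", "alpha beta\ngamma delta", "alpha beta gamma\ndelta"),
    ("real-edit", "alpha beta", "alpha gamma"),
]
print(find_regressions(pairs, line_level_size, word_level_size))
# ['rewrap-only']
```

The tolerance value is arbitrary. A real suite would choose it, and the set of I-D pairs, from observed output of both tools.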

>>
>>
>> Bob
>>
>> PS. I did find one case where iddiff improved on rfcdiff. Where a URL 
>> had been extended, iddiff just highlighted the extended part, while 
>> rfcdiff highlighted each whole URL.
>>
>>
>> On 28/09/2022 18:00, John Scudder wrote:
>>> Hi All,
>>>
>>> I don’t know that this rises to the level of a reportable bug, but I wanted to mention that iddiff is significantly noisier than rfcdiff for some comparisons. The one I’m looking at now is between RFC 8829 and draft-uberti-rtcweb-rfc8829bis-03. Rfcdiff produces a 10 page diff (I’m using the rough metric of producing the diff, then telling my browser to print it), whereas iddiff produces a 37 page diff.
>>>
>>> Some specifics about this particular case —
>>> - I think iddiff is reporting each page break? (Shows context lines from above/below the page break, but no changes.)
>>> - rfcdiff is also cleaner about rewrapped paragraphs, for instance (see the paragraph that begins “balanced:” for example, or Section 7.2).
>>> - iddiff produces a giant changed-text block for the references section, I can’t discern what it thinks is different.
>>>
>>> Needless (?) to say it seems worth striving for the quieter output rfcdiff produces.
>>>
>>> $0.02,
>>>
>>> —John
>>
>> -- 
>> ________________________________________________________________
>> Bob Briscoe                               http://bobbriscoe.net/
>>
>> ___________________________________________________________
>> Tools-discuss mailing list - Tools-discuss@ietf.org - https://www.ietf.org/mailman/listinfo/tools-discuss

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/