Re: [Xml-sg-cmt] WeasyPrint Update

Robert Sparks <rjsparks@nostrum.com> Wed, 29 June 2022 13:56 UTC

From: Robert Sparks <rjsparks@nostrum.com>
To: xml-sg-cmt@ietf.org
References: <299a8995-589b-8b9d-8526-21f919afb122@staff.ietf.org> <546a3330-f75e-6733-ab64-e8853ca3dd49@nostrum.com>
In-Reply-To: <546a3330-f75e-6733-ab64-e8853ca3dd49@nostrum.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml-sg-cmt/eWcYEOJ72VCdMuWpKQH2u_RVtmk>
Subject: Re: [Xml-sg-cmt] WeasyPrint Update

Fwiw, I started with the visual differencer at diff-checker.com, which 
provides an overlay and a slider between original/changed. Here's an 
example where it gets really different (because of table/figure 
rendering, I think) at the bottom of page 16 of RFC 8779, at least with 
what we have so far.

[slider view: image not preserved]

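For anyone who wants to script this kind of page comparison rather than eyeball it, here is a minimal sketch in Python. It assumes each page has already been rasterized (by whatever renderer you prefer) into a same-sized grid of grayscale pixel values; the function names are hypothetical and this is not the tool mentioned above, just an illustration of finding where two renderings of a page start to diverge.

```python
def diff_rows(page_a, page_b, threshold=0):
    """Return indices of pixel rows where two grayscale page rasters differ.

    page_a and page_b are lists of rows, each row a list of 0-255 ints.
    Rasterizing the PDFs is assumed to happen elsewhere.
    """
    if len(page_a) != len(page_b):
        raise ValueError("pages must have the same height")
    changed = []
    for y, (row_a, row_b) in enumerate(zip(page_a, page_b)):
        if len(row_a) != len(row_b):
            raise ValueError("pages must have the same width")
        # A row counts as changed if any pixel differs by more than threshold.
        if any(abs(a - b) > threshold for a, b in zip(row_a, row_b)):
            changed.append(y)
    return changed


def first_divergence(page_a, page_b, threshold=0):
    """Return the first differing row index, or None if the pages match."""
    rows = diff_rows(page_a, page_b, threshold)
    return rows[0] if rows else None
```

A nonzero threshold can help ignore small anti-aliasing differences, so that only real layout shifts (such as a table or figure moving) show up.
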
On 6/29/22 8:48 AM, Robert Sparks wrote:
> I'm not sure we can do a real comparison without running these through 
> the pdfaPilot step, which essentially rewrites the pdf.
>
> Alice - is it easy to script running all the things at [3] through 
> pdfaPilot? If not, could you run 8779 through (as that's the 
> semi-random one I chose to look at first).
>
> I'm already seeing the differences in figure/table layout that can 
> affect where pagebreaks lie.
>
> Most of the other differences I see are in indentation, spacing 
> between paragraphs, etc - makes me wonder if the css is being honored 
> as intended. These add up over pages to change the overall length of 
> the document in pages (though the pagebreak algorithm change makes 
> that unavoidable). Again, I'm curious to see if these go away when run 
> through pdfaPilot.
>
>
> On 6/28/22 8:28 PM, Kesara Rathnayake wrote:
>> Hi all,
>>
>> I have draft PR [1] for the WeasyPrint update.
>> This updates WeasyPrint from 52.5 to 55.0.
>> Since WeasyPrint 53.0, they have moved the PDF generation from cairo 
>> to pydyf [2].
>> I have generated PDFs from RFC 8650 to RFC 9260 [3].
>>
>> There are some differences from my random checks.
>>
>> Let me know your thoughts.
>>
>> Note that these PDFs haven't gone through the pdfaPilot step to 
>> convert to PDF/A-3 with the XML source file embedded.
>>
>> [1] https://github.com/ietf-tools/xml2rfc/pull/802
>> [2] https://github.com/CourtBouillon/pydyf
>> [3] https://devbox.amsl.com/weasyprint55/
>>
>> Cheers,
>> Kesara