Re: [Xml-sg-cmt] WeasyPrint Update

Alice Russo <arusso@amsl.com> Thu, 14 July 2022 22:08 UTC

From: Alice Russo <arusso@amsl.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Date: Thu, 14 Jul 2022 15:08:32 -0700
References: <299a8995-589b-8b9d-8526-21f919afb122@staff.ietf.org> <546a3330-f75e-6733-ab64-e8853ca3dd49@nostrum.com> <64B87EC4-12DF-466F-960F-1A91A6C615B7@amsl.com>
To: "xml-sg-cmt@ietf.org" <xml-sg-cmt@ietf.org>
In-Reply-To: <64B87EC4-12DF-466F-960F-1A91A6C615B7@amsl.com>
Message-Id: <42053699-2A20-4362-8F87-9669882E6EF5@amsl.com>
X-Mailer: Apple Mail (2.3273)
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml-sg-cmt/erlnHrItxBZa9uXabTpednz5GGU>
Subject: Re: [Xml-sg-cmt] WeasyPrint Update

Hi CMT,
I wrote:
> That said, to Kesara's point about looking at PDFs of more recent RFCs (produced by WeasyPrint 52.5), I'll do some comparing of a few recent ones vs. files in [3] and report back.

-- Summary
Nothing egregious; seems fine to proceed w/ WeasyPrint 55.

-- Details
In the WeasyPrint 55 output:
- More lines fit per page.
- Fewer words fit per line (e.g., in D, see p. 8).
- The align="center" attribute on artwork seems to be ignored (e.g., in G, see Figures 17, 18, 19, and more). More on this below.*
- No significant increase in page breaks within tables (looking at F & G).

-- Files reviewed
Links to draftable are included below in case you want to see for yourself. Each file on the right side is from https://devbox.amsl.com/weasyprint55/ and has "wp55" added to the filename. (Yes, draftable shows changes that seem extraneous, e.g., commas.)

A) RFC 9243 - https://draftable.com/compare/VqtpDHhKWYUF
B) RFC 9245 - https://draftable.com/compare/xxVjWbERSmSl
C) RFC 9247 - https://draftable.com/compare/dNPCXOWEuefM
D) RFC 9251 - https://draftable.com/compare/QenFfvZPKdXk
E) RFC 9259 - https://draftable.com/compare/NuZWMeuFZosS

Also looked at 2 recent RFCs that contain many tables to see how page breaks within tables were affected.
F) RFC 9162 (19 tables) - https://draftable.com/compare/NkRRapcgoPrh
G) RFC 9174 (20 tables) - https://draftable.com/compare/cogvEOOBLLEN

Focusing on G. In rfc9174wp55.pdf:
- Table 1 (p. 22) is a small table that ideally wouldn't contain a page break. The oddity is that an extra line has been added within the table.
- Table 17 contains a page break.
- On the flipside, Table 5 doesn't contain a break, which is an improvement.

* Seems to ignore the align="center" for artwork.
Will add an issue to GitHub. This seems to be a current issue affecting some files. It was apparently not introduced by WP55, because it's also in the output from 52.5. For example, with artwork align="center":
In D (Section 9.1), the artwork is left-aligned in the PDF and HTML (i.e., align="center" only works in the text output). Guessing that it's supposed to work for all 3 outputs - https://authors.ietf.org/en/rfcxml-vocabulary#align
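
In case it helps when checking other files for the same alignment issue, here is a rough sketch of how the affected artwork could be listed from the XML source. It is only an illustration: the SAMPLE document and the centered_artwork name are made up for this example, and it assumes the source parses with Python's standard xml.etree.ElementTree (a real RFCXML file with external entities might need a different parser).

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature RFCXML document, for illustration only.
SAMPLE = """<rfc>
  <middle>
    <section>
      <figure anchor="fig-1"><artwork align="center">+---+</artwork></figure>
      <figure anchor="fig-2"><artwork align="left">+---+</artwork></figure>
      <artwork anchor="art-3" align="center">+---+</artwork>
    </section>
  </middle>
</rfc>"""


def centered_artwork(rfcxml_text):
    """Return a label for every <artwork> element that has align="center"."""
    root = ET.fromstring(rfcxml_text)
    # ElementTree has no parent pointers, so build a child -> parent map
    # in order to report the anchor of an enclosing <figure>.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    labels = []
    for artwork in root.iter("artwork"):
        if artwork.get("align") != "center":
            continue
        label = artwork.get("anchor")
        parent = parent_of.get(artwork)
        if label is None and parent is not None and parent.tag == "figure":
            label = parent.get("anchor")
        labels.append(label or "(no anchor)")
    return labels


if __name__ == "__main__":
    print(centered_artwork(SAMPLE))
```

Each label returned is a place to check in the PDF and HTML outputs for left-aligned artwork.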

--- Effect of pdfaPilot

Re: the topic that was raised earlier of whether pdfaPilot is changing the appearance of the PDF. What I see here confirms this statement:
>> It's not likely to change the appearance

FYI, the number of changes found by draftable is approx. the same whether comparing (1) the published PDF vs. the wp55 PDF or (2) the pre-pdfaPilot PDF vs. the wp55 PDF.

Examples of 1 vs. 2:
RFC 9243: 242 changes vs. 231 changes
RFC 9259: 152 changes vs. 155 changes
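
To put "approx. the same" in numbers, here is a small sketch that computes the relative gap between the two counts for each RFC. The relative_gap helper and the choice to divide by the larger count are my own assumptions for illustration, not something draftable reports.

```python
# Change counts reported by draftable:
# (1) published PDF vs. wp55 PDF, (2) pre-pdfaPilot PDF vs. wp55 PDF.
COUNTS = {
    "RFC 9243": (242, 231),
    "RFC 9259": (152, 155),
}


def relative_gap(count_1, count_2):
    """Absolute difference between two counts, as a fraction of the larger one."""
    return abs(count_1 - count_2) / max(count_1, count_2)


if __name__ == "__main__":
    for rfc, (count_1, count_2) in COUNTS.items():
        print(f"{rfc}: {relative_gap(count_1, count_2):.1%}")
```

For these two RFCs the gap comes out at roughly 4.5% and 1.9%, which fits the reading that pdfaPilot has little effect on the comparison.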

Also, FWIW, here are the results of comparing the pre-pdfaPilot (a.k.a. "before") PDF vs. the published PDF:
RFC 9243: 9 changes (https://draftable.com/compare/eISLHOHsEmOi)
RFC 9245: 0 changes (https://draftable.com/compare/ZcjIjfRfIzih)
RFC 9247: 0 changes (https://draftable.com/compare/ydAiGQDmWJRj)
RFC 9251: 5 changes (https://draftable.com/compare/OGivyxUtBSzn)
RFC 9259: 4 changes (https://draftable.com/compare/lDqDrTwGmGap)

Thanks,
Alice

On 6/28/22 8:28 PM, Kesara Rathnayake wrote:
> Hi all,
> 
> I have draft PR [1] for the WeasyPrint update.
> This updates WeasyPrint from 52.5 to 55.0.
> Since WeasyPrint 53.0, they have moved the PDF generation from cairo to pydyf [2].
> I have generated PDFs from RFC 8650 to RFC 9260 [3].
> 
> There are some differences from my random checks.
> 
> Let me know your thoughts.
> 
> Note that these PDFs haven't gone through the pdfaPilot step to convert to PDF/A-3 with the XML source file embedded.
> 
> [1] https://github.com/ietf-tools/xml2rfc/pull/802
> [2] https://github.com/CourtBouillon/pydyf
> [3] https://devbox.amsl.com/weasyprint55/
> 
> Cheers,
> Kesara