Re: [Tools-discuss] RFC PDF crashes viewer

Erik Auerswald <auerswal@unix-ag.uni-kl.de> Wed, 01 July 2020 08:20 UTC

Date: Wed, 01 Jul 2020 10:20:48 +0200
From: Erik Auerswald <auerswal@unix-ag.uni-kl.de>
To: John Levine <johnl@taugh.com>
Cc: tools-discuss@ietf.org
Message-ID: <20200701082048.GB32199@unix-ag.uni-kl.de>
References: <20200626203214.GA16307@unix-ag.uni-kl.de> <20200630171731.D0E901BDB9F8@ary.qy>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <20200630171731.D0E901BDB9F8@ary.qy>
User-Agent: Mutt/1.5.21 (2010-09-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/X1N_pMcTYXURNox95zVot0wXIMU>

Hi John,

On Tue, Jun 30, 2020 at 01:17:31PM -0400, John Levine wrote:
> In article <20200626203214.GA16307@unix-ag.uni-kl.de> you write:
> >I agree with Carsten that this is a bug in evince, but evince works fine
> >for most PDF files out there, including many that Firefox cannot display
> >correctly.  Since MuPDF produces error and warning messages, and evince
> >crashes, it seems to me as if it were likely that the PDF versions of
> >recent RFCs do have issues.
> 
> The PDF versions of our RFCs are in the PDF/A-3U profile of PDF. PDF/A
> is intended to be bitrot-resistant, with no external references and
> such, and the various subprofiles allow various options.  We take
> the output of xml2rfc and run it through a commercial package from
> Callas software to embed the XML and make it PDF/A compliant.

Thanks for the detailed information. :-)

> In my experience, open source software that handles complex formats
> like PDF is pretty flaky around the edges, particularly when dealing
> with features like embedded files that aren't very common.

In my experience, free and open source software is often of higher
quality than non-free software.  YMMV.

> All of our PDFs work fine in Acrobat and the Apple and Google viewers
> so I doubt that the files are bad.
> 
> As a workaround, if you run the XML through "xml2rfc --pdf" you'll get
> a version without the embedded file which is less likely to trip over
> bugs in the processing software.

Thanks, and thanks again for your follow-up email showing the additional
xml2rfc options used.  I think this is quite useful: the XML file is
supposed to be the canonical information source, so IMHO everybody
should have access to the tools and settings needed to work from the
XML.
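
For illustration, working from the XML might look roughly like the
sketch below.  It is not the RFC Editor's actual procedure: the download
URL pattern, the output file name, and the helper names "run" and
"rfc_pdf_from_xml" are my assumptions, and the additional xml2rfc
options from John's other email are not reproduced here.  It only prints
the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: build a PDF without the embedded XML from an RFC's XML source.
# Assumptions: curl and xml2rfc (with PDF support) are installed, and the
# XML is published at the URL pattern below.

# run: print a command; execute it only when DRY_RUN is set to 0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# rfc_pdf_from_xml NAME: fetch NAME.xml and render NAME-local.pdf.
# (Hypothetical helper; the "-local" suffix avoids overwriting a
# downloaded NAME.pdf in the same directory.)
rfc_pdf_from_xml() {
    rfc=$1
    run curl -fsSLO "https://www.rfc-editor.org/rfc/${rfc}.xml" &&
    run xml2rfc --pdf -o "${rfc}-local.pdf" "${rfc}.xml"
}

rfc_pdf_from_xml rfc8798
```

Running it with DRY_RUN=0 in the environment executes the two commands
instead of just printing them.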

Other workarounds are to use Okular, Firefox, MuPDF, or gv instead of
evince.  Both MuPDF and gv produce error/warning messages, but (I think)
they do display the PDF contents correctly.

Yet another workaround is to use the free software pdftk to transform
the PDF:

    # assumes the unpacked attachment is named rfc8798.xml
    pdftk rfc8798.pdf unpack_files
    pdftk rfc8798.pdf cat output rfc8798-no_xml.pdf
    pdftk rfc8798-no_xml.pdf attach_files rfc8798.xml output rfc8798-pdftk.pdf

The result "rfc8798-pdftk.pdf" works fine in evince and does not produce
error/warning messages in MuPDF (gv still produces error messages).
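
The pdftk steps above could be wrapped in a small script, roughly as
sketched below.  The helper names "run" and "repack_pdf" and the
"-attachments" directory are hypothetical.  To avoid guessing the name
of the embedded file, the sketch unpacks into a separate directory and
re-attaches whatever pdftk put there.  It only prints the commands
unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: detach and re-attach a PDF's embedded files with pdftk.
# Assumption: pdftk is installed and the input name ends in ".pdf".

# run: print a command; execute it only when DRY_RUN is set to 0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# repack_pdf FILE.pdf: write FILE-no_xml.pdf and FILE-pdftk.pdf.
repack_pdf() {
    in=$1
    base=${in%.pdf}
    dir="${base}-attachments"
    run mkdir -p "$dir" &&
    # Save the embedded files (the XML, for an RFC) into $dir.
    run pdftk "$in" unpack_files output "$dir" &&
    # Rewrite the pages into a new PDF without the attachments.
    run pdftk "$in" cat output "${base}-no_xml.pdf" &&
    # Attach the unpacked files again; the glob is expanded only now,
    # i.e. after the unpack step has filled $dir in a real run.
    run pdftk "${base}-no_xml.pdf" attach_files "$dir"/* \
        output "${base}-pdftk.pdf"
}

repack_pdf rfc8798.pdf
```

I have only described the manual steps above as tested; treat the
script as a starting point and check the result in your viewer.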

Thanks,
Erik
-- 
[T]he most dangerous enemy of a better solution is an existing codebase
that is just good enough.
                        -- Eric S. Raymond