[Tools-discuss] Re: PDF [Re: Tools team meeting tomorrow 9 July 2024 1800 UTC]

Robert Sparks <rjsparks@nostrum.com> Tue, 09 July 2024 13:31 UTC

On 7/8/24 10:16 PM, Carsten Bormann wrote:
> Hi Robert,
>
> Thank you for responding to this.
>
> On 8. Jul 2024, at 23:29, Robert Sparks <rjsparks@nostrum.com> wrote:
>>
>> On 7/8/24 1:34 PM, Carsten Bormann wrote:
>>> On 8. Jul 2024, at 20:27, Robert Sparks <rjsparks@nostrum.com> wrote:
>>>> The agenda and the beginnings of notes for the meeting are available at https://notes.ietf.org/tools-team-20240709. Details will continue to be added up to and during the meeting.
>>> This discusses expensive resources (“endpoints”) for building PDF forms of documents.
>>>
>>> Could we maybe stop pdfizing things in weird ways and simply build the correct PDF form on I-D submission?
>>> The current state is just such a waste of time.
>> This glosses over a lot, so let's dig a little:
>>
>> - by "correct PDF form" I assume you mean what xml2rfc would produce given xml input.
> Yes.
> Something good enough to judge what the RFC editor will ultimately publish.
>
>> What about drafts that are still being submitted as text only
> I personally don’t care.  OK, you have to do *something*.
>
>> - do we stop providing any pdf form of those? What about drafts and RFCs from more than 5 years ago?
> So do something.  PDF-printing the .TXT seems to have worked so far.
>
>> - The expensive part is running the PDF-creating software (currently weasyprint), and moving that expense to draft posting time, rather than the first time someone accesses the PDF document, just moves the expense around.
> The expense is trivial (*).  The difference is whether I have to wait while the PDF is generated on demand or it can be provided immediately (or is already on my laptop via rsync).  I sure value my time more than that trivial expense.
>
>> It might even make the total expense higher if no one ever asks for the PDF of a particular version of a draft.
> That is true, but probably not relevant (*).
>
>> - One sharp edge is running the pdf creating software over provided svg (even with our badly constructed restrictions on the SVG). We know of extant submissions that cause the pdf generation call to fail to return.
> I can imagine that.  No soup for these drafts then (fall back to PDF-printing the TXT).
>
> Grüße, Carsten
>
> (*) conservative estimate: 2000 active I-Ds, each submitted 10 times per year, at 20 s of CPU per ingestion.
> That is about 111 CPU hours per year, or < $10/yr on AWS at retail prices?
> You writing your response already cost more...

And then there is the cost of maintaining the store: _updating_ the
store when something fundamental changes that demands that we rerender
the PDFs (the basis becomes _all versions of all drafts in the archive_
at that point), and the time it takes to move the collective set of
bits repeatedly when bots scrape them and individuals/bots do new blind
full rsyncs. All of that can be optimized downwards and towards the
edge, but dismissing this as a conversation about <$10/yr is incorrect.
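
For reference, the arithmetic behind the quoted estimate can be checked
with a short sketch. The three input figures come from the quoted
footnote; the function name is hypothetical, and the per-hour price
ceiling is only what "< $10/yr" implies, not a quoted AWS rate. It
covers ingestion-time rendering only, not the store, rerender, or
transfer costs described above.

```python
# Inputs taken from the quoted estimate in this thread.
ACTIVE_DRAFTS = 2000
SUBMISSIONS_PER_DRAFT_PER_YEAR = 10
CPU_SECONDS_PER_RENDER = 20


def cpu_hours(documents: int, renders_per_document: int, seconds_per_render: float) -> float:
    """Total CPU hours for rendering each document the given number of times."""
    return documents * renders_per_document * seconds_per_render / 3600


hours = cpu_hours(ACTIVE_DRAFTS, SUBMISSIONS_PER_DRAFT_PER_YEAR, CPU_SECONDS_PER_RENDER)

# Price per CPU hour below which the "< $10/yr" claim holds (derived, not quoted).
implied_price_ceiling = 10 / hours

print(f"{hours:.1f} CPU hours per year")
print(f"implied ceiling: ${implied_price_ceiling:.2f} per CPU hour")
```

The same function applies to a full rerender by passing the number of
stored versions as `documents` and 1 as `renders_per_document`; that
count is not given in this thread, so it is left as a parameter.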

Fwiw, the _current_ implementation (in place for many years) computes
these on demand _in the HTTP transaction that asks for them_ if they
haven't been computed already, and stores them in a disk-backed cache.
These builds, and the time used to shovel the results, compete with the
other things users are asking the datatracker to do and are a major
part of the slowness when the datatracker is slow (adding more to the
collective expense of making/keeping them, as we make all of our
contributors wait to do other work). Yes, that can (and must) be
redesigned to make better use of modern web technologies, and my
initial question is about whether we invest in a redesign that moves
this to places that help more people more of the time, or if we are far
enough along that people can build these locally, closer to the origin.
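
As a rough illustration of the on-demand pattern described above (not
the datatracker's actual code), a minimal sketch of "render on first
request, then serve from a disk-backed cache" might look like the
following. The names `cached_pdf` and `fake_render` are hypothetical,
and `fake_render` merely stands in for the real, expensive renderer.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_pdf(name: str, render: Callable[[str], bytes], cache_dir: Path) -> bytes:
    """Return the PDF for `name`, rendering it only on a cache miss."""
    key = hashlib.sha256(name.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.pdf"
    if path.exists():
        return path.read_bytes()  # cache hit: no rendering cost
    # Cache miss: the expensive step runs inside the caller's request,
    # so the requester (and anything sharing the worker) waits for it.
    data = render(name)
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)  # rename into place so readers never see a partial file
    return data


render_calls = []


def fake_render(name: str) -> bytes:
    """Placeholder renderer that records how often it is invoked."""
    render_calls.append(name)
    return b"%PDF-placeholder " + name.encode("utf-8")


with tempfile.TemporaryDirectory() as tmpdir:
    first = cached_pdf("draft-example-00", fake_render, Path(tmpdir))
    second = cached_pdf("draft-example-00", fake_render, Path(tmpdir))

print(len(render_calls))  # the renderer ran once; the second call hit the cache
```

Moving the `render` call to submission time, or to a background worker,
changes who waits for it, but not whether the work is done.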

Your argument to keep them is to promote reviewing all the formats as if
they are going to become RFCs (at least for XML source). That would be
better done _before_ submission, wouldn't it?

