Re: [art] Request for review of draft-diaz-lzip-07

worley@ariadne.com Tue, 02 May 2023 19:18 UTC

From: worley@ariadne.com
To: Antonio Diaz Diaz <antonio@gnu.org>
Cc: art@ietf.org
In-Reply-To: <644A6E12.3030001@gnu.org> (antonio@gnu.org)
Sender: worley@ariadne.com
Date: Tue, 02 May 2023 15:16:01 -0400
Message-ID: <87y1m68rym.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/art/QdofBf5RgSA9O-ybsxFGszTQMRE>
Subject: Re: [art] Request for review of draft-diaz-lzip-07

I'm OK with all of your points, except the following, which may or may
not be significant:

Antonio Diaz Diaz <antonio@gnu.org> writes:
> Maybe I could change the phrase to something like:
>
> "Lzip uses a simplified form of the LZMA stream format to maximize 
> interoperability and provides an accurate and robust 3 factor integrity 
> checking."

I'd say "increase" or "improve" rather than "maximize".  I mean, are you
asserting it is not possible to improve interoperability any further?

>> This depends on exactly what is meant by data communications.
>
> I think I have used the same meaning as the RFCs for gzip, brotli, and zstd, 
> as all of them use a similar wording:

Hmmm.  I still don't like the usage, but clearly this isn't the place to
deal with my problem.

> Thanks. I have changed the phrase above to:
>
>     A lzip file consists of one or more independent "members" (compressed
>     data sets).
>
> Note that there is nothing in a gzip/lzip file but members. A file with no 
> members is an empty file, not a lzip file.

Actually, that's a tricky bit of semantics.  A compression scheme could
define the compression of an empty file to be an empty file.  But since
you've tightened up the wording, it's clear that lzip doesn't do that.

>>     Member size (8 bytes)
>>        Total size of the member, including header and trailer.  This
>>        field acts as a distributed index, allows the verification of
>>        stream integrity, and facilitates the safe recovery of undamaged
>>        members from multimember files.
>>
>> You might want to explain how this "facilitates the safe recovery of
>> undamaged members" -- I would assume that the recovery process is to
>> advance through the Lzip file looking for an ID, then seeing if decoding
>> can proceed successfully from that point.
>
> It basically allows to locate all the members efficiently (index), unless 
> some header or member size is corrupt, in which case it still helps in 
> validating the ID in the headers of the undamaged members.

Ugh, OK, it took me a while to realize that you can index the members of
an lzip file efficiently, but you have to start *at the end* and work
backward.
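
To make the backward walk concrete, here is a minimal sketch. It assumes the member layout as I understand it from the draft: a 6-byte header beginning with the ID string "LZIP", and a 20-byte trailer whose last 8 bytes are the little-endian member size. The name index_members is hypothetical, and the sketch does not handle trailing data or attempt any recovery once a size or ID fails to check out.

```python
import struct

MAGIC = b"LZIP"
HEADER_SIZE = 6    # ID string (4) + version (1) + coded dictionary size (1)
TRAILER_SIZE = 20  # CRC32 (4) + data size (8) + member size (8)
MIN_MEMBER = HEADER_SIZE + TRAILER_SIZE


def index_members(buf):
    """Return [(offset, size), ...] in file order, found by walking backward.

    Assumed layout; this is an illustration, not the reference implementation.
    """
    members = []
    end = len(buf)
    while end > 0:
        if end < MIN_MEMBER:
            raise ValueError("truncated member ending at offset %d" % end)
        # The member size field occupies the last 8 bytes of the trailer.
        (size,) = struct.unpack_from("<Q", buf, end - 8)
        start = end - size
        if size < MIN_MEMBER or start < 0:
            raise ValueError("implausible member size %d at offset %d" % (size, end - 8))
        # The ID at the computed start cross-checks the size field.
        if buf[start:start + 4] != MAGIC:
            raise ValueError("no ID string at computed offset %d" % start)
        members.append((start, size))
        end = start
    members.reverse()
    return members
```

Each step reads only one trailer and one header, so the index costs one pair of small reads per member, no matter how large the compressed data is.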

> Overflowing the data size field should not have bad effects in the decoding 
> itself. It will simply show a truncated decompressed size, just as "gzip -l" 
> has done until recently (but with a 32-bit data size field). Limiting the 
> "member size" field to 2 PiB is the easiest way of preventing the overflow, 
> but it can be prevented by other means, or not prevented at all.

Hmmm, it seems you can go one of two ways.  One is to be strict, where
the data size and member size must match in order for the member to be
valid.  That is what the current text specifies.  The other is to allow
overflow/truncation of one or both of the fields.  But you'd have to
change the text to make that valid, and you'd also need to check that
the reference implementation handles that correctly upon decoding.
(Note this could become a subtle point of incompatibility in decoding.)

My reflex is to have the strict requirement, but really there's little
or no loss of validation checking if the decoder only checks the lower
32 bits of the two quantities.  So there's no practical reason not to
change the text to the overflow/truncation version.
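
The difference between the two policies can be sketched as follows. This is only an illustration: size_field_matches and its bits parameter are hypothetical names, not anything taken from the draft or from the reference implementation.

```python
def size_field_matches(actual, stored, bits=None):
    """Compare a size counted during decoding with the value stored in the trailer.

    bits=None -> strict policy: the two values must be equal.
    bits=N    -> truncation policy: only the low N bits are compared.
    """
    if bits is None:
        return actual == stored
    mask = (1 << bits) - 1
    return (actual & mask) == (stored & mask)
```

Under the strict policy a size that has wrapped past the width of the field makes the member invalid; under the truncation policy the same member passes, which is where two decoders could end up disagreeing.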

Dale