Re: [Cellar] On the multiplicity of Info elements

Dave Rice <dave@dericed.com> Fri, 15 January 2016 07:28 UTC

Archived-At: <http://mailarchive.ietf.org/arch/msg/cellar/-skl36QQ5JQr_Qcz4d7boGyzlro>

> On Jan 14, 2016, at 12:54 PM, Sebastian G. <bastik> wrote:
> 
> 12.01.2016, 14:36 Moritz Bunkus:
>> Hey,
>> 
>>> From a sample set of 59,881 Matroska files uploaded to archive.org
>>> only 119 of them include CRC-32 Elements. Almost 0.2%! Additionally
>>> the webm Document Type lists the CRC-32 Element as unsupported. So I
>>> think it’s true that it’s rarely bothered with. The implementation of
>>> CRC-32 Elements does add many complications and perhaps demand has
>>> been too low to resolve them. Moritz may have comments here.
>> 
>> That's pretty much spot on. The CRC32 element was added comparatively
>> late; a lot of muxers had already been created by that time.
>> 
>> Most requests I've received were more concerned with error recovery
>> than with error detection. This is something that CRC32 wasn't meant
>> for; additionally CRC32 elements weren't meant to be written in the
>> cluster elements which make up the bulk of a Matroska file's data.
>> 
>> Yes, there is a feature request[1] for adding support in mkvmerge and
>> mkvinfo. It has been created in 2010, but except for those two people
>> mentioned in the bug I haven't really received any other requests for
>> such a functionality.
>> 
>> Kind regards,
>> mosu
>> 
>> [1] https://github.com/mbunkus/mkvtoolnix/issues/543
>> 
> 
> Hi,
> 
> to said ticket you added two entries [2][3] which mention a problem with
> verifying checksums.
> 
> Would that be much easier to handle if the CRC covered the data that
> should be present in the file but is omitted because default values
> aren't written, compared to creating a checksum of the data that
> actually gets written and verifying said checksum?
> 
> I had a rather strong opinion towards including a CRC that covers the
> elements that are written. I'm just curious and if something can be
> solved elegantly rather than having to invest too much resources into
> it, it might be worth reconsidering. After all it would be defined
> within a proper specification.

Matroska files with CRC-32 Elements store a checksum of the data actually stored in the Master-element, not of the data as interpreted (with default values filled in).

For instance, in the test file suite, test2.mkv has a CRC-32 Element in the EBML Header Element with a value of 0x67277B3A. The rest of the Master-element stores DocType, DocTypeVersion, and DocTypeReadVersion. The other EBML Header Elements (EBMLVersion, EBMLReadVersion, EBMLMaxIDLength, and EBMLMaxSizeLength) are not stored but are interpreted as their defined default values. The data as stored matches the checksum; the data as interpreted does not.

I think switching the CRC to cover interpreted data rather than stored data would present numerous challenges.
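
The difference between stored and interpreted data can be sketched in Python. This is only an illustration under stated assumptions: that the CRC-32 Element (ID 0xBF) is the first child of its Master-element, covers the remaining bytes of that Master-element, and is stored little-endian. The helper names and the version numbers are hypothetical, so the result is not the 0x67277B3A value from test2.mkv.

```python
import struct
import zlib

def element(element_id: bytes, payload: bytes) -> bytes:
    """Encode an EBML element using a one-byte size field (payload < 127 bytes)."""
    if len(payload) >= 127:
        raise ValueError("sketch only handles one-byte sizes")
    return element_id + bytes([0x80 | len(payload)]) + payload

def crc32_element(covered: bytes) -> bytes:
    """Build a CRC-32 element (ID 0xBF) over the bytes it is assumed to cover."""
    return element(b"\xBF", struct.pack("<I", zlib.crc32(covered) & 0xFFFFFFFF))

def verify(master_payload: bytes) -> bool:
    """Check a Master-element payload whose first child is a CRC-32 element."""
    if master_payload[:2] != b"\xBF\x84":
        return False
    (expected,) = struct.unpack("<I", master_payload[2:6])
    return zlib.crc32(master_payload[6:]) & 0xFFFFFFFF == expected

# Children actually stored in the header (version numbers are illustrative).
stored = (
    element(b"\x42\x82", b"matroska")  # DocType
    + element(b"\x42\x87", b"\x02")    # DocTypeVersion
    + element(b"\x42\x85", b"\x02")    # DocTypeReadVersion
)
# The same header as a reader interprets it, with the defaults written out.
interpreted = (
    element(b"\x42\x86", b"\x01")      # EBMLVersion, default 1
    + element(b"\x42\xF7", b"\x01")    # EBMLReadVersion, default 1
    + element(b"\x42\xF2", b"\x04")    # EBMLMaxIDLength, default 4
    + element(b"\x42\xF3", b"\x08")    # EBMLMaxSizeLength, default 8
    + stored
)

header_payload = crc32_element(stored) + stored
print(verify(header_payload))                       # checks the stored bytes
print(verify(crc32_element(stored) + interpreted))  # checks the interpreted bytes
```

A checksum over interpreted data would additionally require every writer and reader to agree on one serialization of the defaulted elements (order, size-field width, and so on), which the stored-bytes approach avoids.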

> Back then when I offered the bounty I expected matroska to be used a lot
> more often in the future by other people than me. My usecase for
> checksums was indeed just error recovery. However I am not against
> extending the potential usecases to have applications use it to detect
> errors.

A few use cases:

If I have a Matroska file in long-term storage and migrate it to new storage (hard drive to new hard drive, LTO tape, some future format, etc.), the CRC values can be verified to ensure that the data was transferred authentically and that the data on the new form of storage is intact.

Alternatively, if the data is moved across a network, the received file can be verified to ensure that it is still valid. This is possible with FLAC files, as they have an MD5 in the header and CRCs per frame.

Also, a Matroska file in an archive may need to be changed, for instance to add contextual metadata or a supporting attachment. The file could be edited in place without affecting other Level 1 elements. After the edit, the file could still be verified to ensure that the Level 1 elements unaffected by the edit have maintained authenticity. A whole-file checksum wouldn’t work in this case, since the whole file is intended to be different before and after the change, but the parts that aren’t involved in the change can still be verified.
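
A small Python sketch of that last point, using a dictionary of byte strings as a hypothetical stand-in for the payloads of three Level 1 elements (real files carry the CRC-32 inside each Master-element; the names and contents here are made up for illustration):

```python
import hashlib
import zlib

# Hypothetical payloads of three Level 1 elements before the edit.
before = {
    "Info": b"title=interview",
    "Tracks": b"video and audio track headers",
    "Tags": b"creator=unknown",
}
# An in-place edit that only touches the Tags element.
after = dict(before, Tags=b"creator=unknown;collection=oral-history")

def per_element_crcs(elements):
    """CRC-32 of each element payload, keyed by element name."""
    return {name: zlib.crc32(data) & 0xFFFFFFFF for name, data in elements.items()}

def whole_file_digest(elements):
    """A single digest over everything, standing in for a whole-file checksum."""
    return hashlib.md5(b"".join(elements.values())).hexdigest()

crcs_before = per_element_crcs(before)
crcs_after = per_element_crcs(after)

# Elements whose checksum survived the edit can still be verified individually.
still_verifiable = [name for name in before if crcs_before[name] == crcs_after[name]]
print(still_verifiable)
print(whole_file_digest(before) == whole_file_digest(after))
```

The whole-file digest changes as soon as any element is edited, while the per-element values for Info and Tracks stay usable.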

As discussed, CRC Elements can provide an authoritative basis for selecting one copy of an Identically-Recurring Element if there are many in a damaged file.
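
As a rough Python sketch of that selection (the function names and the sample payload are hypothetical, and each copy is modelled simply as a payload paired with the CRC-32 value stored alongside it):

```python
import zlib

def crc_ok(payload: bytes, stored_crc: int) -> bool:
    """True when the payload still matches the CRC-32 stored with it."""
    return zlib.crc32(payload) & 0xFFFFFFFF == stored_crc

def pick_valid_copy(copies):
    """Return the first copy whose CRC validates, or None if all are damaged."""
    for payload, stored_crc in copies:
        if crc_ok(payload, stored_crc):
            return payload
    return None

original = b"TimecodeScale=1000000"
crc = zlib.crc32(original) & 0xFFFFFFFF

# The first copy has one flipped bit ('0' became '8'); the second is intact.
damaged = b"TimecodeScale=1008000"
copies = [(damaged, crc), (original, crc)]

print(pick_valid_copy(copies))
```

Without the CRC, a reader that finds two differing copies has no principled way to decide which one to trust.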

There’s also a use case with secure or legal recordings. The CameraV project, https://guardianproject.info/apps/camerav/, uses Matroska and checksums to allow verifiable recordings.

> [2] https://github.com/mbunkus/mkvtoolnix/issues/543#issuecomment-171036571
> [3] https://github.com/mbunkus/mkvtoolnix/issues/543#issuecomment-171422587
> 
> --
> Sebastian
> 