Re: [Cellar] On the multiplicity of Info elements

Steve Lhomme <slhomme@matroska.org> Tue, 05 January 2016 08:20 UTC

In-Reply-To: <568AC10F.9030303@gmx.de>
References: <CAHUoETLC4dQQ7=TOuTXZ3aDjKCCJgz2s-8Gb33MoSAP3hgRQiQ@mail.gmail.com> <BEA72D66-EA3D-4CF0-987D-836E95287F39@dericed.com> <20151230091811.GA19636@bunkus.org> <CAOXsMFLCbe-W=h+tQpdRa8Nh0jz=xdbZTXEmoXsgQTbA=4OPCQ@mail.gmail.com> <C0E5EBA2-2A56-46F9-A049-629EFB11F280@dericed.com> <CAOXsMF+gc0d2LEisfHm0jnjDGQKcYquEMBt7FnZ_uuSNF=C0iw@mail.gmail.com> <568AC10F.9030303@gmx.de>
Date: Tue, 05 Jan 2016 09:20:00 +0100
Message-ID: <CAOXsMFKJJhzU-3CYqguDePY42T+Vvhx9ytAfvoM6xyqaZY+N4g@mail.gmail.com>
From: Steve Lhomme <slhomme@matroska.org>
To: "Sebastian G. <bastik>" <bastik.public.mailinglist@gmx.de>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <http://mailarchive.ietf.org/arch/msg/cellar/DFTlkmgkLgktS-UNggaw6njwjAs>
Cc: cellar@ietf.org
Subject: Re: [Cellar] On the multiplicity of Info elements

2016-01-04 19:59 GMT+01:00 Sebastian G. <bastik>
<bastik.public.mailinglist@gmx.de>:
> 04.01.2016, 08:11 Steve Lhomme:
>> 2016-01-03 16:53 GMT+01:00 Dave Rice <dave@dericed.com>:
>>>
>>>> On Jan 2, 2016, at 3:47 AM, Steve Lhomme <slhomme@matroska.org>
>>>> wrote:
>>>>
>>>> 2015-12-30 10:18 GMT+01:00 Moritz Bunkus <moritz@bunkus.org>:
>>>>> Hey,
>>>>>
>>>>> I only remember the discussion around Tracks being multiple,
>>>>> not particularly for the other ones. Our intent way back when
>>>>> was to allow muxers to write multiple instances of _the same
>>>>> information_ in different places in order to make the file more
>>>>> resilient against damage or incomplete downloads with protocols
>>>>> like BitTorrent.
>>>>
>>>> Yes, that's the idea for the Track Info as it's vital to the
>>>> usability of the file, as well as the Segment Info. I'm not sure
>>>> it's used in practice though. Since the goal of CELLAR is
>>>> archiving solutions it may still make sense.
>>>
>>> Perhaps declaring that an Element may be repeated but must be
>>> repeated identically should be a new EBML Element Attribute, so
>>> there can be a distinction between the repeatability of Segment/Info
>>> and the repeatability of SimpleBlock.
>>
>> That might be good. After all, not all elements make sense as
>> repeated ones. For example in Matroska you don't want a Cluster
>> (timestamped data) to be repeated.
>>
>>>>> The same reasoning could be applied to Info. Both elements are
>>>>> absolutely crucial to playback; the other level 1 elements save
>>>>> for the clusters simply aren’t.
>>>
>>> But what should happen when the reader finds differences in
>>> repeated-but-should-be-identical elements?
>>
>> Good question. Maybe repeated elements should have a CRC? If a CRC
>> is wrong (or not found) the parser could look for a copy.
>
> I like the CRC idea for repeated elements, but it still does not define
> how players should behave if they encounter two elements, even with
> valid CRCs, not matching each other.

Also, what about repeated elements that are not master elements? You'd
have no way of telling which is the best version. So repeated elements
should probably be master elements. Maybe a CRC should be mandatory too
(I'm not sure whether real-life files already follow this rule). That's
the only way a parser would be able to tell which version is correct,
as far as I can see.
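
As a rough illustration of the CRC check described above, here is a
minimal Python sketch. It assumes the master element's payload begins
with a CRC-32 child (ID 0xBF, 4-byte value stored little-endian) that
is computed with the common IEEE CRC-32 over the payload bytes that
follow it. The names crc_is_valid and with_crc are made up for this
example, and this is not a full EBML parser.

```python
import struct
import zlib

CRC32_ID = 0xBF  # EBML CRC-32 element ID


def crc_is_valid(payload: bytes) -> bool:
    """Check a master element payload that starts with a CRC-32 child.

    Assumed layout: ID byte 0xBF, size byte 0x84 (4 bytes), then the CRC
    stored little-endian, covering every payload byte after the CRC child.
    """
    if len(payload) < 6 or payload[0] != CRC32_ID or payload[1] != 0x84:
        return False  # no CRC-32 child where this sketch expects one
    (stored,) = struct.unpack("<I", payload[2:6])
    return zlib.crc32(payload[6:]) == stored


def with_crc(children: bytes) -> bytes:
    """Prefix already-encoded child elements with a CRC-32 child."""
    crc = struct.pack("<I", zlib.crc32(children))
    return bytes([CRC32_ID, 0x84]) + crc + children
```

A parser could run crc_is_valid on each copy it finds and keep the
first one that passes.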

> There would have to be a recommendation. "Use always the first
> occurrence of an element." or "If an element occurs repeated and its
> values differ, the last occurrence is the one that should be used."

Not necessarily. Bogus data could be in the first one. The goal is not
to write a version of the element and then an updated version later in
the file. It has to be the same data. If you want to clear the first
version, you should overwrite it with the EBML Void element. When the
file is originally written, repeated elements should be exactly the
same. That means elements that contain a file offset relative to a
child element would not be equal. Luckily, in Matroska offset positions
are always relative to the Segment, which is a Level-0 element.

http://www.matroska.org/technical/specs/notes.html#Position_References
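
To illustrate clearing a copy with a Void element as mentioned above,
here is a small Python sketch that builds a Void element occupying
exactly the same number of bytes as the element it replaces. It assumes
the Void ID 0xEC and the usual EBML variable-length size coding; the
function name void_element is hypothetical, and writing the bytes back
into the file is left out.

```python
def void_element(total_size: int) -> bytes:
    """Build an EBML Void element that occupies exactly total_size bytes."""
    if total_size < 2:
        raise ValueError("a Void element needs at least 2 bytes (ID + size)")
    if total_size <= 128:
        # 1-byte ID + 1-byte size field; the data length fits in 7 bits
        # (at most 126, since the all-ones value is reserved).
        data_len = total_size - 2
        header = bytes([0xEC, 0x80 | data_len])
    else:
        # 1-byte ID + 8-byte size field (marker byte 0x01, then 56 bits).
        data_len = total_size - 9
        header = bytes([0xEC, 0x01]) + data_len.to_bytes(7, "big")
    # The content of a Void element is ignored, so zero bytes are enough.
    return header + bytes(data_len)
```

Overwriting in place with the same total size keeps every later
Segment-relative position in the file unchanged.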

> Obviously tools that create such files are violating the specifications
> since it should not be allowed to create repeated elements with
> differing values. On the other hand, it should be hard to break playback.
> I prefer uniform behavior among players.
>
>>> For a scenario of two differing Info Elements, VLC and FFmpeg use
>>> different Info Elements. Which use is correct? Since the purpose of
>>> repeated-identical elements is resilience, a deviation between the
>>> two could be expected, so we should suggest how the reader should
>>> respond.
>>
>> It was designed for recovery tools. It may not be good to change
>> players for such cases. It would make them more complex. (unless an
>> elegant/easy solution is found).
>
> For differences due to transmission errors a CRC for repeated elements
> seems a good solution.
>
> Players have to do something with repeated elements. I don't know what
> they do, but there should be a recommended way they should handle such
> cases. If a player breaks, that is OK, as long as the file was violating
> the specs. A player should behave in an expected way.

IMO what makes sense from a player's point of view is to read an
element. If it has a CRC, the CRC is broken, and the semantics say the
element can be repeated, then the player should look for a valid copy.
Otherwise it shouldn't have to wonder which element to use if it
encounters another one. IMO repeated elements only make sense if
there's a CRC (whichever form it may take).
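
The player behaviour described above could be sketched roughly like
this in Python. The function pick_copy and its parameters are
hypothetical: copies are the payloads of the element in file order,
repeatable reflects what the semantics say about the element, and
is_valid stands in for whatever CRC check is used.

```python
from typing import Callable, Optional, Sequence


def pick_copy(
    copies: Sequence[bytes],
    repeatable: bool,
    is_valid: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Return the payload a player would use, or None if none is usable."""
    if not copies:
        return None
    # Only a repeatable element is allowed to fall back to later copies.
    candidates = copies if repeatable else copies[:1]
    for payload in candidates:
        if is_valid(payload):
            return payload  # first valid copy wins; later ones are ignored
    return None


# Toy usage: the first copy is damaged, so the second one is chosen.
chosen = pick_copy(
    [b"damaged", b"intact", b"intact"],
    repeatable=True,
    is_valid=lambda payload: payload == b"intact",
)
```

With a valid first copy the loop stops immediately, so the common case
costs a player nothing extra.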

Another rule for repeatable elements: the element MUST be unique at
that level (not multiple).

>>>>>> SeekHead, Info, Cluster, Tracks, and Tags are multiple.
>>>>>
>>>>> SeekHead and Cluster must be multiple. SeekHead in order to
>>>>> allow moving a SeekHead to the end of the file while still
>>>>> referencing it from the start (so that normal players will
>>>>> still find it quickly). Cluster for obvious reasons.
>>>>>
>>>>>> And Cues, Attachments, and Chapters are non-multiple.
>>>>>
>>>>> I have no idea why Tags is multiple and these three aren't.
>>>>>
>>>>> To me the following would make sense:
>>>>>
>>>>> - Info, Tracks – multiple but only if each instance contains
>>>>> the same information
>>>>>
>>>>> - SeekHead, Cluster – multiple without restrictions
>>>>>
>>>>> - Attachments, Chapters, Cues, Tags – single
>>>
>>> I can understand Attachments and Tags being multiple as it could
>>> allow attachments or tags to be added to a file without having to
>>> re-write too many bytes.
>>
>> Yes. But then there should be a way for the player to know about
>> these beforehand. Good players scan Matroska files beforehand anyway
>> (unless it's live streaming).
>>
>
> I agree with having a mechanism for players to know about them beforehand.
>
> --
> Sebastian

-- 
Steve Lhomme
Matroska association Chairman