Re: [Cellar] Matroska Elements to support frame side data

Steve Lhomme <> Mon, 12 November 2018 15:38 UTC

Cc: Tobias Rapp <>

While double-checking this, I found that WebM supports AlphaMode and 
all the BlockAdditions that go with it.

I have no idea how this can even work, except that it must be tied to 
VP8 and VP9 internals. So it is also extra data added per frame. I'd be 
interested in a sample that makes use of it.

WebM doesn't even support MaxBlockAdditionID, so technically the value 
is 0 for them. I just hope they don't use BlockAddID 0, since it is a 
forbidden value. If not, it's probably a value of 1 rather than 2.

I found this sample, which uses a value of 1.

So we can assume 1 is really for "codec interpreted data" and all the 
others are freely interpreted.
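A minimal sketch of how a demuxer might classify a BlockAddID under the convention assumed here (0 forbidden, 1 codec-interpreted, every other value described by the file itself). The function name and the returned labels are hypothetical and are not part of the Matroska specification or any library.

```python
def classify_block_add_id(block_add_id: int) -> str:
    """Classify a BlockAddID under the convention assumed in this thread (hypothetical helper)."""
    if block_add_id < 0:
        raise ValueError("BlockAddID is an unsigned integer")
    if block_add_id == 0:
        # 0 refers to the main Block itself, so it must never tag a BlockMore.
        raise ValueError("BlockAddID 0 is forbidden")
    if block_add_id == 1:
        # 1: opaque data passed to the codec, interpreted according to the CodecID.
        return "codec"
    # Any other value has no fixed meaning and must be described elsewhere in the file.
    return "file-defined"
```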

In addition to 
Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDName and 
Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDType, we may 
also add a 
Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDExtraData 
element, which could hold extra information for the given type, for 
example the kind of timecode format used by a timecode mapping.
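The proposed track-level mapping could be modelled roughly as below, with a lookup from a Block's BlockAddID to the mapping that describes it. This is a sketch assuming BlockAddIDValue is matched against BlockAddID; the class, field and function names are hypothetical, BlockAddIDExtraData is only a proposal at this point, and the type identifiers and extra data bytes are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BlockAdditionMapping:
    """One BlockAdditionMapping element, as proposed (hypothetical model)."""
    value: int                          # BlockAddIDValue, matches BlockAddID in Blocks
    name: str = ""                      # BlockAddIDName, human-readable label
    type_id: int = 0                    # BlockAddIDType, identifies the data format
    extra_data: Optional[bytes] = None  # proposed BlockAddIDExtraData, type-specific


def find_mapping(mappings: List[BlockAdditionMapping],
                 block_add_id: int) -> Optional[BlockAdditionMapping]:
    """Return the mapping describing a BlockMore's BlockAddID, or None if undeclared."""
    for mapping in mappings:
        if mapping.value == block_add_id:
            return mapping
    return None


# Placeholder identifiers and extra data, for illustration only.
track_mappings = [
    BlockAdditionMapping(3, "rawcooked", 0x1234567),
    BlockAdditionMapping(4, "timecode", 0x890ABCD, extra_data=b"\x01"),
]
```

A demuxer would call `find_mapping` once per BlockMore and skip additions whose ID has no mapping or whose type it does not understand.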

MaxBlockAdditionID is there to signal, in the header of the file, that Blocks will carry extra data, since there was no other way of knowing. It's not used in WebM, but they have AlphaMode (which we will need to keep as well).
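Assuming MaxBlockAdditionID is read as the highest BlockAddID a track's Blocks may use (0 meaning no BlockAdditions at all), a validator could flag undeclared values as sketched below. Under that reading, a WebM file that stores alpha at BlockAddID 1 without writing MaxBlockAdditionID (so the value is effectively 0) gets flagged, which is the inconsistency noted above. The helper name is hypothetical.

```python
from typing import Iterable, List


def undeclared_add_ids(max_block_addition_id: int,
                       block_add_ids: Iterable[int]) -> List[int]:
    """Return BlockAddID values not covered by the track's MaxBlockAdditionID.

    0 is always reported because it is forbidden for BlockMore.
    """
    bad = {i for i in block_add_ids if i == 0 or i > max_block_addition_id}
    return sorted(bad)
```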

As for hardcoding BlockAddIDType values in the specs, we can do it, but we need to allow more values anyway.
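One way to hardcode some BlockAddIDType values while still allowing others is a small registry with a fallback. The two entries reuse the example type numbers from the mapping examples quoted below (0 for codec complement data, 1 for alpha layer data); the registry and function names are hypothetical and nothing here is normative.

```python
# Example BlockAddIDType values from this thread; illustrative, not normative.
HARDCODED_TYPES = {
    0: "codec complement data",
    1: "alpha layer data",
}


def describe_block_add_id_type(type_id: int) -> str:
    """Name a BlockAddIDType, keeping values outside the table legal."""
    known = HARDCODED_TYPES.get(type_id)
    if known is not None:
        return known
    # Unlisted types are not an error: a demuxer that does not recognise one
    # can simply skip the matching BlockAdditional data.
    return f"unregistered type 0x{type_id:X}"
```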

On 12/11/2018 00:15, Dave Rice wrote:
> Hi Steve,
> Initially I looked at the BlockAdditions Element as a possibility to store side data, but the definition didn’t seem to allow it. Since the specification defers to the codec ("Interpreted by the codec as it wishes"), there doesn’t seem to be any way for the Matroska specification to define a BlockAddID value (except for 0, which is reserved as a reference to the main block).
> However, this approach works for me if others accept the break in reverse compatibility.
> Several other definitions would require updates. Currently we have:
> BlockAdditions: "Contain additional blocks to complete the main one. An EBML parser that has no knowledge of the Block structure could still see and use/skip these data."
>   BlockAddID: "An ID to identify the BlockAdditional level."
>   BlockAdditional: "Interpreted by the codec as it wishes (using the BlockAddID)."
> AlphaMode: "Alpha Video Mode. Presence of this Element indicates that the BlockAdditional Element could contain Alpha data."
> Notes about what definition updates would be needed:
> BlockAdditions’s definition would have to be rewritten as the data wouldn’t necessarily “complete” the main block but might alternatively supplement it or describe it.
> BlockAddID should include an enumerated list of integers and reference a registry on what each value means.
> BlockAdditional’s definition would have to be updated, since only in some cases would the BlockAdditional content be interpreted by the codec.
> AlphaMode’s definition seems to imply that one of the BlockAdditional Elements contains alpha but gives no way to identify which one, so this is the largest reverse compatibility issue. Are there Matroska demuxers which would try to use timecode or rawcooked data as if it was alpha data?
> So IIUC BlockAddID=1 is reserved for the context of the associated codec mapping (as done in A_WAVPACK4). Then we reserve BlockAddID=2 for alpha, and 0 is reserved since BlockAddID uses 0 to reference the main block. Then any other type of data for storage in BlockAdditional would use the next available unsigned integer from 3 and would require a BlockAdditionMapping. I suggest that BlockAddID = 0, 1 and 2 would not require a BlockAdditionMapping since the specification would reserve them for a particular purpose.
> But if we already define BlockAddID for 0, 1, and 2 in the specification, then why not continue and reserve 3 for rawcooked and 4 for timecode and so on? That would avoid the need for storing that data in BlockAdditionMapping, eliminate the need for new BlockAdditionMapping elements, and then demuxers could understand if they can use that data by simply checking the BlockAddID rather than looking up the corresponding value in BlockAddIDName.
> Also, what is the need for MaxBlockAdditionID? And what is the reason to keep the BlockAddID values low, simply to keep them to a single byte?
> Dave
>> On Nov 11, 2018, at 10:40 AM, Steve Lhomme <> wrote:
>> I would go with the Track way as well. Primarily because storing a
>> string (which pretty much never changes) in each Block is a huge
>> waste.
>> Adding extra data per Block is already supported using BlockAdditions.
>> There's already BlockAddID, which corresponds to Moritz' BlockMetadataID,
>> and BlockAdditional, which corresponds to BlockMetadataString /
>> BlockMetadataBinary / BlockMetadataUInteger / BlockMetadataSInteger /
>> BlockMetadataFloat. It's like a codec, where the CodecID defines how
>> the data in the binary blob should be interpreted.
>> The current system states that the additions are left to the codec to
>> interpret. It was originally designed to hold the lossless complement
>> to lossy versions of Musepack, so in that case it's really meant to be
>> passed to the codec. I think we can expand this system by keeping this
>> behaviour as the default (albeit not used anywhere) and allowing
>> different ones on demand.
>> There's also an AlphaMode that also uses BlockAdditions to store the
>> alpha track, with pretty much no info on how to do it.
>> As noted, timecode may be a separate track (as originally intended) if
>> it is not related to the video frames (i.e. the timestamps don't
>> match).
>> This would look like this:
>> - Musepack lossless complement:
>>   Segment\Tracks\TrackEntry\MaxBlockAdditionID: 1
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDValue: 1 (same as BlockAddID) (default)
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDName: "complement" (default)
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDType: 0 (Codec Complement data) (default)
>>   Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAddID: 1
>>   Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAdditional: lossless part interpreted by the codec
>> - Alpha layer:
>>   Segment\Tracks\TrackEntry\MaxBlockAdditionID: 2
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDValue: 2 (same as BlockAddID)
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDName: "alpha"
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDType: 1 (Alpha layer data)
>>   Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAddID: 2
>>   Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAdditional: alpha mask to apply on the video track
>> - RawCooked DPX data:
>>   Segment\Tracks\TrackEntry\MaxBlockAdditionID: 3
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDValue: 3 (same as BlockAddID)
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDName: "rawcooked"
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDType: 0x1234567 (rawcooked identifier)
>>   Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAddID: 3
>>   Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAdditional: DPX data defined by RawCooked
>> - Timecode storage:
>>   Segment\Tracks\TrackEntry\MaxBlockAdditionID: 3
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDValue: 3 (same as BlockAddID)
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDName: "timecode"
>>   Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDType: 0x890ABCD (SMPTE TC identifier, can be another ID for a different kind of timecode)
>>   Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAddID: 3
>>   Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAdditional: Timecode storage
>> That means the alpha mode would not be backward compatible with
>> existing files, because it requires non-default values. But I don't
>> think anyone ever used this improperly defined feature.
>> The value of MaxBlockAdditionID is kept low on purpose. BlockAddID 1
>> was always for codec complement and thus 2 for the AlphaMode. But we
>> don't need to go much higher than that now that we have a mapping. If
>> there are both Timecode and RawCooked data, it would look like this:
>> Segment\Tracks\TrackEntry\MaxBlockAdditionID: 4
>> Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDValue: 3
>> Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDName: "rawcooked"
>> Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDType: 0x1234567 (rawcooked identifier)
>> Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDValue: 4
>> Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDName: "timecode"
>> Segment\Tracks\TrackEntry\BlockAdditionMapping\BlockAddIDType: 0x890ABCD (SMPTE TC identifier, can be another ID for a different kind of timecode)
>> Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAddID: 3
>> Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAdditional: DPX data defined by RawCooked
>> Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAddID: 4
>> Segment\Cluster\BlockGroup\BlockAdditions\BlockMore\BlockAdditional: Timecode storage
>> On Mon, 5 Nov 2018 at 10:29, Moritz Bunkus
>> <> wrote:
>>> Hey,
>>>> On further thought about this suggestion, I think it could make it
>>>> difficult to tell whether a file has a particular type of side data. For
>>>> instance, if only a few Clusters somewhere in the Segment contain a
>>>> certain type of side data, it would require parsing every Cluster to know
>>>> what types of side data are available. This uncertainty wouldn’t be an
>>>> issue if the side data was itself a Track.
>>> It's not entirely necessary to use a full track for side data. We can simply
>>> signal the presence of side data in the track headers and refer to it from
>>> the side data in the block groups. This would also mean we only have to
>>> store the string identifying the side data type once (in the track headers)
>>> instead of in each block.
>>> (I'll use "BlockMetadata" as the basis for all element names
>>> here. Initially I proposed "FrameMetadata", but "BlockMetadata" is fine
>>> with me, too.)
>>> For example:
>>> Tracks
>>> +- TrackEntry
>>>    +- TrackBlockMetadata (Master)
>>>       +- TrackBlockMetadataType (String, required)
>>>       +- TrackBlockMetadataID (Unsigned Integer, required)
>>> …
>>> Cluster
>>> +- BlockGroup
>>>    +- BlockMetadata (Master)
>>>       +- BlockMetadataID (Unsigned Integer, required, refers to existing
>>>          TrackBlockMetadataID in track headers)
>>>       +- BlockMetadataString (Unicode String, optional)
>>>       +- BlockMetadataBinary (Binary, optional)
>>>       +- BlockMetadataUInteger (Unsigned Integer, optional)
>>>       +- BlockMetadataSInteger (Signed Integer, optional)
>>>       +- BlockMetadataFloat (Float, optional)
>>> with the restriction that exactly one of (BlockMetadataString,
>>> BlockMetadataBinary, BlockMetadataUInteger, BlockMetadataSInteger,
>>> BlockMetadataFloat) must exist.
>>> Advantages as I see them:
>>> • Less overhead (no repeated string storage required)
>>> • Quicker parsing (no repeated string parsing required)
>>> • Presence of meta data is known upfront
>>> • Not using a full-blown track for meta data would alleviate the need to
>>>   specify how all those track and block features (e.g. BlockDuration,
>>>   TrackDefaultDuration…) apply to a "meta data track".
>>> Kind regards,
>>> mosu
>> -- 
>> Steve Lhomme
>> Matroska association Chairman
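Moritz's BlockMetadata proposal quoted above carries two consistency rules: exactly one value child per BlockMetadata, and a BlockMetadataID that refers to a TrackBlockMetadataID declared in the track headers. A minimal check for both is sketched below. The element names come from the proposal; the Python class, its field names and the function are hypothetical, and the EBML parsing that would fill them in is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Optional, Set, Union

Value = Union[str, bytes, int, float]


@dataclass
class BlockMetadata:
    """A parsed BlockMetadata master element from the proposal (hypothetical model)."""
    metadata_id: int                     # BlockMetadataID
    string: Optional[str] = None         # BlockMetadataString
    binary: Optional[bytes] = None       # BlockMetadataBinary
    uinteger: Optional[int] = None       # BlockMetadataUInteger
    sinteger: Optional[int] = None       # BlockMetadataSInteger
    float_value: Optional[float] = None  # BlockMetadataFloat


def check_block_metadata(meta: BlockMetadata, track_metadata_ids: Set[int]) -> Value:
    """Enforce the proposal's rules and return the single stored value."""
    if meta.metadata_id not in track_metadata_ids:
        raise ValueError(f"BlockMetadataID {meta.metadata_id} not declared in the track headers")
    present = [v for v in (meta.string, meta.binary, meta.uinteger,
                           meta.sinteger, meta.float_value) if v is not None]
    if len(present) != 1:
        raise ValueError(f"expected exactly one value element, found {len(present)}")
    if meta.uinteger is not None and meta.uinteger < 0:
        raise ValueError("BlockMetadataUInteger must not be negative")
    return present[0]
```

Because the set of declared IDs comes from the track headers, a reader knows up front which metadata types can appear, which is the advantage listed in the proposal.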