Re: [Cellar] Timestamps again

Steve Lhomme <> Sun, 13 June 2021 08:08 UTC


On 2021-06-11 23:31, Timothy B. Terriberry wrote:
> Hello CELLAR friends,

Hi Tim,

> Michael Richardson asked me to review draft-cellar-matroska, which has 
> been taking me longer than I had hoped, but I am making progress. 
> However, I did run across one issue that might require some discussion 
> on the list to resolve, so I thought I would send it sooner rather than 
> later.
> Namely, I was skeptical that the procedures outlined in Section 26.5 
> would be able to reversibly translate between audio sample numbers and 
> scaled timestamps. So I wrote a program to check.

I think this section is outdated. This is still an ongoing subject, as we 
have been trying to get some form of sample accuracy in Matroska.
See the (lengthy) discussion at issue 422 [1].

One possibility is to use the existing TrackTimestampScale element which 
wasn't designed for that but works pretty well. It's backward compatible 
although there is a high chance a lot of readers don't handle it properly.

A more solid solution would be to write the proper sampling rate of the 
source (but that becomes complicated in case of Variable Frame Rate as 
found nowadays in video games).

> I set up the program to evaluate all audio sample rates between 8 kHz 
> and 192 kHz in units of 1 Hz, and for each sample rate to check all 
> sample numbers between 0 and the product of the sample rate and the 
> TimestampScale, divided by the gcd of that number with 1,000,000,000 
> (basically, for this number, all of the divisions are exact, and beyond 
> that number the remainders should repeat). This is actually quite a lot 
> of cases to check exhaustively and would take quite a long time to 
> complete. However, it may be able to generate counter-examples very 
> quickly.
> I checked both strategies outlined in the text:
> (a) Truncate TimestampScale and round all other divisions, and
> (b) Use a slightly smaller TimestampScale and truncate the division in 
> the raw to absolute timestamp conversion (as suggested in paragraph 5).
> In fact, the program (attached), was able to find counter examples for 
> both strategies very quickly.
> For strategy (a), since 8,000 divides 1,000,000,000 exactly (giving a 
> TimestampScale of 125,000), all of the divisions for all sample numbers 
> are exact (whether or not you round), and everything is fine. However, 
> the first counter example was found for 8,001 Hz (where the 
> TimestampScale is 124,984), with sample number 165,781.
> The raw timestamp computed in the muxer is thus
>    (1,000,000,000*165,781 + 4000)/8001 = 20,720,034,996 ns.
> The absolute timestamp is then
>    (20,720,034,996 + 62,492)/124,984 = 165,782.
> This is approximately the same as the sample number, which given the 
> choice of TimestampScale, is to be expected, and is in fact the first 
> sample number where the match is not exact. However, the raw 
> timestamp computed in the Reader becomes
>    165,782*124,984 = 20,720,097,488 ns.
> This produces a sample number of
>    (20,720,097,488*8,001 + 500,000,000)/1,000,000,000 = 165,782.
> This is off by one (in the positive direction) from the original.
> For strategy (b), the first counter example was for 8 kHz (where the 
> "slightly smaller" TimestampScale is 124,999), with sample number 62,501.
> The raw timestamp computed in the muxer is thus
>   (1,000,000,000*62,501 + 4000)/8000 = 7,812,625,000 ns.
> The absolute timestamp is then (truncating, as strategy (b) specifies)
>    7,812,625,000/124,999 = 62,501.
> Here the match with the sample number is still exact. However, the raw 
> timestamp computed in the Reader becomes
>    62,501*124,999 = 7,812,562,499 ns.
> This produces a sample number of
>    (7,812,562,499*8,000 + 500,000,000)/1,000,000,000 = 62,500.
> This is also off by one from the original, but in the negative direction.
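
For reference, the two round trips quoted above can be reproduced with a 
short script. This is only a sketch in Python, not Tim's attached program: 
the names div_round and round_trip and the truncate_ticks flag are made up 
for illustration, and it assumes integer arithmetic with round-half-up 
division everywhere except where strategy (b) truncates.

```python
NS_PER_SECOND = 1_000_000_000


def div_round(n, d):
    """Integer division of non-negative n by d, rounded half up."""
    return (n + d // 2) // d


def round_trip(sample, rate, scale, truncate_ticks=False):
    """Sample number -> scaled timestamp -> sample number.

    rate is the sample rate in Hz, scale is the TimestampScale in ns.
    truncate_ticks=True models strategy (b), which truncates the
    raw-to-absolute division instead of rounding it.
    """
    # Muxer side: sample number to raw timestamp in nanoseconds.
    raw_ns = div_round(NS_PER_SECOND * sample, rate)
    # Muxer side: raw timestamp to the stored tick count.
    ticks = raw_ns // scale if truncate_ticks else div_round(raw_ns, scale)
    # Reader side: stored tick count back to nanoseconds.
    reader_ns = ticks * scale
    # Reader side: nanoseconds back to a sample number.
    return div_round(reader_ns * rate, NS_PER_SECOND)


# Strategy (a): 8,001 Hz, TimestampScale 124,984.
print(round_trip(165_781, 8_001, 124_984))
# Strategy (b): 8,000 Hz, TimestampScale 124,999, truncated ticks.
print(round_trip(62_501, 8_000, 124_999, truncate_ticks=True))
```

With these inputs the first call comes back one sample too high and the 
second one sample too low, matching the numbers in the quoted text.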

I think the TrackTimecodeScale solves both these examples, because it 
uses a floating point estimation before rounding the numbers. It allows 
finer granularity for each sample. For audio-only files it's even 
clearer because no matter what TimestampScale you pick, the 
TrackTimecodeScale is directly related to the sampling frequency, in 
floating point.
The drawback is that timestamps are only accurate per packed block of 
data; it is then up to the decoder to determine the timestamp for each 
sample in a packed block.
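
To illustrate that last step, here is a minimal sketch in Python of how a 
decoder might derive per-sample timestamps from a block timestamp. It is 
an assumption for illustration only, not taken from any existing reader: 
the function name sample_timestamps_ns and its parameters are 
hypothetical, and it assumes the block timestamp is already in 
nanoseconds and that the sample rate is constant within the block.

```python
from fractions import Fraction

NS_PER_SECOND = 1_000_000_000


def sample_timestamps_ns(block_timestamp_ns, sample_rate, samples_in_block):
    """Timestamps in ns for each sample of one packed block.

    Offsets are computed exactly with Fraction and rounded once per
    sample, so rounding errors do not accumulate across the block.
    """
    step = Fraction(NS_PER_SECOND, sample_rate)  # exact duration of one sample
    return [block_timestamp_ns + round(i * step)
            for i in range(samples_in_block)]


# First three samples of a block starting at 1000 ns, at 8,001 Hz.
print(sample_timestamps_ns(1000, 8_001, 3))
```

Only the block timestamp has to survive the container rounding here; the 
offsets inside the block are recomputed from the sample rate.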

I made a program [2] to check the tricky cases where a mix of track 
types/frequencies forces some compromise on the TimestampScale and thus on 
the rounding precision. I ran it with 8001 Hz and there is no error on 
any possible tick of a Cluster, no matter the video frequency.

As seen in this libavformat patch [3], the TrackTimecodeScale is 
TimestampScale * FrameRate. This is very different from what is 
explained in sections 26.5 and 26.6 (which are now 27.5 and 27.6), where 
it's normally around 1. I think both sections have to be rewritten to 
clarify this (issue #517 [4]). Both the old way of using 
TrackTimecodeScale and the new way should be mentioned since old files 
may use the old way, although on the player side it will make no difference.

A big side note: this method cannot work in WebM because it doesn't have 
TrackTimecodeScale. Using a rational number would also not work there, 
as it would not be supported. So we need to cover both methods in our 

I think we should explain the possibilities and limits of both methods 
(with and without TrackTimecodeScale) when solving issue #517. We should 
cover both muxing and demuxing, and explain what rounding should be used.

> I have attached the program's source code so that you can check whether 
> or not I made any mistakes and modify the parameters to generate your 
> own counter-examples (or test approaches to resolving the issue).