Re: [Cellar] Rationale Numbers for timestamps

Dave Rice <dave@dericed.com> Sun, 15 January 2017 16:24 UTC

From: Dave Rice <dave@dericed.com>
In-Reply-To: <CAOXsMF+B3ruoqqF8DKCjLp74OqVA4d2Zhb+txH28t6S_eY4PxQ@mail.gmail.com>
Date: Sun, 15 Jan 2017 11:24:21 -0500
Content-Transfer-Encoding: quoted-printable
Message-Id: <34FB7CEC-68BF-47C7-AC12-BD27191C034F@dericed.com>
References: <CAOXsMFKNXNcvRzsYkQw23dz=5e6A4TGbq=rO=3KW8rokLUOeEQ@mail.gmail.com> <3363d485-83af-d46b-3bca-3b0359ca9bb2@mediaarea.net> <CAOXsMF+B3ruoqqF8DKCjLp74OqVA4d2Zhb+txH28t6S_eY4PxQ@mail.gmail.com>
To: Steve Lhomme <slhomme@matroska.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/mTprgjNqVbe20e6hyYxns8ZnVwY>
Cc: Jerome Martinez <jerome@mediaarea.net>, cellar@ietf.org
Subject: Re: [Cellar] Rationale Numbers for timestamps
List-Id: Codec Encoding for LossLess Archiving and Realtime transmission <cellar.ietf.org>

Hi Jerome, Steve,

> On Jul 27, 2016, at 3:28 PM, Steve Lhomme <slhomme@matroska.org> wrote:
> 
> 2016-07-27 6:13 GMT+02:00 Jerome Martinez <jerome@mediaarea.net>:
>> On 27/07/2016 04:35, Steve Lhomme wrote:
>>> 
>>> As the topic often resurfaces maybe it's time we discuss it in the
>>> context of CELLAR.
>>> 
>>> There have been talks about ways to do it in the Matroska mailing
>>> before in this thread:
>>> 
>>> https://lists.matroska.org/pipermail/matroska-devel/2012-October/004294.html
>>> 
>>> https://lists.matroska.org/pipermail/matroska-devel/2012-October/004295.html
>>> 
>>> https://lists.matroska.org/pipermail/matroska-devel/2012-October/004296.html
>>> 
>>> https://lists.matroska.org/pipermail/matroska-devel/2012-October/004299.html
>>> 
>>> An element to handle it was added (still in specdata.xml but commented)
>>> 
>>> https://lists.matroska.org/pipermail/matroska-devel/2012-September/004289.html
>>> and later removed
>>> 
>>> https://lists.matroska.org/pipermail/matroska-devel/2012-December/004317.html
>>> 
>>> So far there was nothing conclusive but it seems it's not possible to
>>> do it properly in a backward compatible way.
>>> 
>>> The constraints are simple:
>>> - backward compatible, so the 16 bits timestamp value in the
>>> (Simple)Block must remain unchanged. This value is multiplied by the
>>> scale (global and/or per track) to get the actual timestamp.
>>> - There could be some acceptable rounding between the two coexisting
>>> systems.
>>> - The rounding should happen in the current system, the rationale
>>> should always be exact (provided the source is exact)
>>> - A lot of Matroska elements are based on the timestamp scales. So
>>> they may be adjusted accordingly
>>> - We must minimize the elements where the same information is found
>>> twice. Basically only scale elements should be added. Maybe some
>>> elements for the Cluster timestamp might be added as well if
>>> necessary.
>>> - because of the 16 bits range per Cluster differences between a
>>> rationale system and the ns system must be tight.
>>> - we still want Clusters to be 4-5s long to minimize the scope of
>>> errors and buffering before playback. It may also correspond to some
>>> GOP values.
>>> - sample accuracy for audio would be good.
>>> 
>>> Let the math begin.
>> 
>> 
>> Rationale Numbers for timestamps are usually for US style fixed framerate,
>> so I think my answer in a previous email still applies.
>> https://mailarchive.ietf.org/arch/msg/cellar/WQCExNvAZy4goibE3_chnFB1w0c
>> Please comment about which need would not be fulfilled with this proposal.
>> 
>> Copy/paste:
>> 
>> I don't like the idea to have an incompatible change with such feature,
>> because this is not a "must have" for 99% of people, and Matroska "v5"
>> will not be adopted with that.
>> I think that we can have a workaround for having additional 0.99% happy
>> without the need of a break in compatibility:
>> - lot of people want to know the exact frame rate of a fixed frame rate
>> stream, people muxing from tapes know that the frame rate is fixed
>> (hardware limitation) so can set this element in the header.
> 
> Let's rewind a decade or more. The timestamps were originally using
> floating points on purpose. At the time all sources were analog (film
> or video tape). And in the analog world the clock is never perfect.
> You cannot expect to have 30000/1001 frames per second and 48000
> samples a second for hours without some drift. And I'd rather have the
> timing as accurate as the analog source rather than clean it for the
> digital world.

It’s true that analog frame rates behave more like floating-point values. A videotape may stretch unevenly from use, so that frames don’t start at exactly regular intervals. However, these discrepancies complicate digitization, and most videotape digitization uses a hardware time base corrector to buffer the analog video and retransmit it according to a more precise clock, so the resulting digital file will already have had its frame rate ‘cleaned’.

To Jerome: by “set this element in the header”, do you mean within the Info Element, similar to how it contains a nominal Duration Element? Or do you mean within the associated Track Element? I’m assuming per track.

> In the context of digitizing for archivists I don't know if that's
> something that's taken in account. Dave ? For example how is that
> solved for movies from the early days where a guy would have to turn a
> handle to advance the film. It's definitely not a very precise/clean
> clock.

For videotapes that are stretched or shrunken in ways that affect presentation timing, these deviations are not the intent of the recording, so using a time base corrector to fix the frame rate is appropriate. Likewise, in the case of film shrinkage the projector should still present at 24 fps, although damage is more likely. So I think following the intent of the format, whether 30000/1001 or 24 fps, is correct.

For hand-cranked silent film, the film prints would usually be distributed with recommended frame rates, so the projectionist is ‘trying’ to crank in order to present at 18 fps (or as directed). If that film is digitized to a Matroska file, then I think the Matroska file should also be labelled for 18 fps projection, just as the film print would have been. If there’s a desire to replicate the imperfect cranking of a human operator, then that could be accomplished by the player and not the file itself. Should I file a ticket for VLC to accept input from a USB-attached hand crank?

>> putting
>> frame rate info in the header with numerator and denominator is enough
>> for doing the difference between 30000/1001 and 29970/1000, and the
>> current millisecond timestamp is enough (a demuxer aware of the
>> framerate element can "correct" the timecode from e.g. 0.033s to
>> 1001/30000 second; an old player just play the file with 1 ms
>> precision). We could add some rules in case a timecode does not fit the
>> frame rate (e.g. if timecode is 0.031, frame rate info is considered not
>> applicable)
> 
> In video the issue is not too big. But if we want sample precision for
> audio, the fraction is a lot bigger (up to 192000 fps with modern pro
> hardware). And with a timestamp scale shared among tracks that could
> be problematic. If we want sample precision for audio and video we
> could end up we'd lose a big range of the 16 bits available per
> Cluster. Each of the 65536 Block timestamp would be 1/192,000s
> (0.0052ms) giving a max Cluster duration of 0.341333s (65536/192000).
> If we want to precision for a 1001/30000 track and a 1/192000 track we
> would need a fraction of 1001/(30000*192000) (0.000173ms) resulting in
> a Cluster max duration of 0.011s.

Yes, this approach is not scalable. One could have 24000/1001 video, 30000/1001 video, 44100 Hz audio, and 192000 Hz audio in the same file.
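To put rough numbers on this, here is a small Python sketch (not anything from the spec) that reproduces the two Cluster durations Steve quotes and then works out the coarsest tick that would be exact for all four rates I listed. The helper names (common_tick, max_cluster_seconds) are made up for illustration, and I am assuming the same 65536 Block timestamp values per Cluster that Steve uses.

```python
from fractions import Fraction
from math import gcd

# Assumption: 65536 Block timestamp values per Cluster, as in the quoted figures.
BLOCK_TICKS = 2 ** 16


def common_tick(periods):
    """Largest tick (in seconds) that divides every given period exactly."""
    tick = periods[0]
    for p in periods[1:]:
        # gcd of two fractions a/b and c/d is gcd(a*d, c*b) / (b*d)
        tick = Fraction(
            gcd(tick.numerator * p.denominator, p.numerator * tick.denominator),
            tick.denominator * p.denominator,
        )
    return tick


def max_cluster_seconds(tick):
    """Longest span the Block timestamps of one Cluster can cover."""
    return BLOCK_TICKS * tick


# Steve's two cases: 192 kHz audio alone, then the 1001/(30000*192000) tick.
audio_only = max_cluster_seconds(Fraction(1, 192000))
quoted_mix = max_cluster_seconds(Fraction(1001, 30000 * 192000))

# The four rates in one file: frame/sample periods in seconds.
four_rates = common_tick([
    Fraction(1001, 24000),
    Fraction(1001, 30000),
    Fraction(1, 44100),
    Fraction(1, 192000),
])

print(float(audio_only))
print(float(quoted_mix))
print(four_rates, float(max_cluster_seconds(four_rates)))
```

Under these assumptions the four-rate case needs a tick of 1/141120000 s, which leaves a Cluster well under a millisecond long.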

> If the muxer knows that a track will always pack samples by an amount
> X (1152 in MP3 for example) we can regain some duration. But that's
> not always the case. Some codecs like Vorbis pack only a small amount
> of samples per frame and the amount can vary between frames (In Vorbis
> I, legal frame sizes are powers of two from 64 to 8192 samples). The
> worst case scenario (I know of) of Vorbis at 192 KHz, frame precision
> of the timestamp means we can only have a Cluster duration of 21.84s
> (65536*64/192000). Mixed with NTSC video that's
> 65536*64*1001/30000*192000 = 0.72s max duration for sample precision
> per frame (not sample but it's close enough). That's still 21 video
> frames in that Cluster. Not too bad.
> 
>> - people having variable frame rate with very precise timestamp need to
>> know it before starting the muxing and can use a Timecodescale element
>> of 1 (so precision of 1 nanosecond, usually enough when there is no
>> fixed frame rate).
> 
> I can think of 3 different variable frame rate use cases:
> 
> - mixing video (25/29fps) and film sources (24fps) into one final video
> - a camera dropping some frames or only storing images when they
> change. In this case there's likely a stable main clock like in
> constant frame rate.
> - a camera that would work like the eye. When there's
> adrenaline/danger you capture images faster. In this case there's no
> real clock. The timestamps can happen at anytime.

I understand the issues with adding a frame rate numerator and denominator to the track header, since multiple timestamps could round to the same increment of the frame rate. However, what if rounding to the nearest increment of the frame rate were enabled by one of the reserved flag bits of the Block Header (such as bit 1 of flags = Enable TimeScale Alignment)?

Thus if the frame rate of the track header is 120000/1001 (and assuming the default TimecodeScale of 1000000, i.e. timecodes in milliseconds), then

If Matroska timecode is 4 and Enable TimeScale Alignment is 0, then it is at 4 / (1000000000 / TimecodeScale) seconds.
If Matroska timecode is 4 and Enable TimeScale Alignment is 1, then it is at 0 / 120000 seconds (nearest increment of the rational frame rate).

If Matroska timecode is 17 and Enable TimeScale Alignment is 0, then it is at 17 / (1000000000 / TimecodeScale) seconds.
If Matroska timecode is 17 and Enable TimeScale Alignment is 1, then it is at 2002 / 120000 seconds (nearest increment of the rational frame rate).

A reader that ignores the rational frame rate and the Enable TimeScale Alignment bit would still decode to the accuracy supported by the TimecodeScale; a player that uses both could round the timecode to the nearest increment of the frame rate.
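As a rough illustration of what a player might do, here is a Python sketch of that rounding. The function name block_time and its align parameter are hypothetical; align stands in for the proposed Enable TimeScale Alignment bit, which does not exist in Matroska today. I am also assuming a TimecodeScale of 1000000 (millisecond timecodes) for the 120000/1001 example.

```python
from fractions import Fraction

NS_PER_SECOND = 1_000_000_000


def block_time(timecode, timecode_scale, frame_rate, align):
    """Return the presentation time in seconds as an exact Fraction.

    `align` is a stand-in for the hypothetical Enable TimeScale Alignment bit.
    """
    # Existing behaviour: timecode * TimecodeScale nanoseconds.
    raw = Fraction(timecode * timecode_scale, NS_PER_SECOND)
    if not align:
        return raw
    # Snap to the nearest whole frame of the track's rational frame rate.
    # round() on a Fraction returns an int; exact ties go to the even index.
    frame_index = round(raw * frame_rate)
    return frame_index / frame_rate


rate = Fraction(120000, 1001)  # frames per second from the track header
for tc in (4, 17):
    print(tc, block_time(tc, 1_000_000, rate, False), block_time(tc, 1_000_000, rate, True))
```

With these inputs, timecode 4 snaps to frame 0 and timecode 17 snaps to frame 2, i.e. two frame durations of 1001/120000 s each. An older player simply never takes the align branch.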

Such a feature would not be useful for mixed-timebase content (for example, a documentary that mixes 24000/1001 and 30000/1001 frame rates in one track).

>> Which real life scenario can not deal with such workaround?
> 
> The question is: why do we want rationale number if we're going to
> allow some form of rounding between the tracks that match the
> timestamp scale precisely and the one that don't. IMO if there's
> rounding let's just keep what we have.
> 
>> Compared to the concerns in:
>> https://lists.matroska.org/pipermail/matroska-devel/2012-October/004294.html
>> It is per track so it resolves most concern
>> For PCM, the corresponding video framerate is sometimes used and it would
>> not be resolved correctly, right (i.e. at 30000/1001 fps, there are
>> sometimes 1601 samples and sometimes 1602 samples and it is not possible to
>> do the difference with the 16-bit timestamp having the millisecond
>> precision)
>> Note that my proposal is similar to the one in the mail having criticism
>> about TimecodeScaleDenominator (proposal of TrackTimeBaseNumerator and
>> TrackTimeBaseDenominator)

[…]

Best Regards,
Dave Rice