Re: [Cellar] AV1 seeking

Steve Lhomme <slhomme@matroska.org> Mon, 16 July 2018 08:01 UTC

In-Reply-To: <62c29889-49f5-6634-049a-a2d73315bb3c@googlemail.com>
References: <CAOXsMFKTNCxYcviYS0h_VYjegV3RZFvZ7AV7GhdCq=oeGmgMuQ@mail.gmail.com> <62c29889-49f5-6634-049a-a2d73315bb3c@googlemail.com>
From: Steve Lhomme <slhomme@matroska.org>
Date: Mon, 16 Jul 2018 10:01:47 +0200
Message-ID: <CAOXsMFKgA2PN-SUFZdCauXmOzyU-T0cHP6nxVeNOVdC1z1uWhg@mail.gmail.com>
To: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
Cc: Codec Encoding for LossLess Archiving and Realtime transmission <cellar@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/IQ1Gd77jJkIgAk1Po_zb7rlbM7o>
Subject: Re: [Cellar] AV1 seeking

Hi,

2018-07-16 0:19 GMT+02:00 Andreas Rheinhardt
<andreas.rheinhardt@googlemail.com>:
>> In MP4 they have [initial_presentation_delay_minus_one] in the
>> CodecPrivate. I did not understand it so far because it's not found in
>> the AV1 spec. But it seems to guarantee that to read frame 'f' you
>> need to decode X frame before that one. In our case that would be 5 to
>> have at least a decoded.
>>
> You completely misunderstood this field. It is the equivalent of the
> max_num_reorder_frames value from H.264. Let me explain it in MPEG
> terminology (with b-frames) as you probably have way more experience
> with this: Consider a decoder that can only decode one frame per unit of
> time and a stream like this (left to right is decoding order; the
> numbers are presentation order):
> I0 P2 B1 ...
> If one displayed the leading I frame immediately after decoding it, one
> would not have the right frame to display at time 1, because at that
> time only I0 and P2 have been decoded, not B1. Therefore one has to
> decode I0 and P2 before one outputs the first frame, and
> max_num_reorder_frames would be 1. The typical b-pyramid would require
> decoding the first three frames before the display of the first frame
> and max_num_reorder_frames would be 2.
> The [initial_presentation_delay_minus_one] is the AV1 analogue of this.
> This number is not an upper bound for the amount of temporal units
> between delayed RAP and recovery point/for the amount of frames shared
> between a GOP. Just look at this example:
> MPEG example
> I0 P5 B1 B2 B3 B4 I10 B6 B7 B8 B9
> AV1 example (I kept the MPEG naming with P and B to make it easier to
> compare with the above; furthermore the pointer *x denotes that a frame
> is showable and x without * is a frame header that outputs *x via
> show_existing_frame;  square brackets are the delimiters of temporal
> units; I10 is the delayed RAP frame)
> [I0] [*P5 B1] [B2] [B3] [B4] [P5] [*I10 B6] [B7] [B8] [B9] [I10] ...
> This stream can have [initial_presentation_delay_minus_one] equal to 1,

A value of 1 means a 2-frame delay. Would that mean B6 needs *I10 and P5?

> yet in order to seek to [I10] one has to decode the temporal unit [*I10
> B6] (or at least the decodable keyframe in it) which is four temporal
> units in front of [I10].

That seems more like it.
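
To double-check the reorder numbers in the quoted examples, here is a
minimal Python sketch. The function name `max_reorder_frames` is made up
for illustration (it is not from H.264 or AV1), and the b-pyramid order
I0 P4 B2 B1 B3 is my own assumption of a "typical" one. For each frame it
counts how many frames come earlier in decoding order but later in
presentation order, which is how I read max_num_reorder_frames:

```python
def max_reorder_frames(decode_order):
    """decode_order: presentation numbers listed in decoding order."""
    worst = 0
    for i, pts in enumerate(decode_order):
        # Frames already decoded that must still be displayed after this one.
        held_back = sum(1 for earlier in decode_order[:i] if earlier > pts)
        worst = max(worst, held_back)
    return worst


print(max_reorder_frames([0, 2, 1]))        # I0 P2 B1 -> 1
print(max_reorder_frames([0, 4, 2, 1, 3]))  # assumed b-pyramid -> 2
# I0 P5 B1 B2 B3 B4 I10 B6 B7 B8 B9 -> 1
print(max_reorder_frames([0, 5, 1, 2, 3, 4, 10, 6, 7, 8, 9]))
```

The last stream has a reorder depth of 1 even though I10 sits four
frames ahead of B9, so the value says nothing about the distance between
the delayed RAP and its recovery point.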

In the Frame presentation timing paragraph (E.4.7) it says:

InitialPresentationDelay =  Removal [ initial_display_delay_minus_1 ]
+ TimeToDecode [ initial_display_delay_minus_1 ]

and

PresentationTime[ 0 ] = InitialPresentationDelay
PresentationTime[ j ] = InitialPresentationDelay + (
frame_presentation_time[ j ] - frame_presentation_time[ 0 ] ) * DispCT

or in constant frame rate mode

PresentationTime[ 0 ] = InitialPresentationDelay
PresentationTime[ j ] = PresentationTime[ j - 1 ] + (
num_ticks_per_picture_minus_1 + 1 ) * DispCT
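
To make the two formulas concrete, here is a rough Python sketch. The
function names and the sample numbers are mine, not from the AV1 spec;
the parameter names mirror the spec variables, and all times are assumed
to be integers in one common unit:

```python
def presentation_times(initial_presentation_delay,
                       frame_presentation_time, disp_ct):
    """PresentationTime[j] from per-frame frame_presentation_time values."""
    first = frame_presentation_time[0]
    return [initial_presentation_delay + (t - first) * disp_ct
            for t in frame_presentation_time]


def presentation_times_fixed_interval(initial_presentation_delay, num_frames,
                                      num_ticks_per_picture_minus_1, disp_ct):
    """PresentationTime[j] when every picture lasts the same number of ticks."""
    if num_frames <= 0:
        return []
    times = [initial_presentation_delay]
    for _ in range(1, num_frames):
        step = (num_ticks_per_picture_minus_1 + 1) * disp_ct
        times.append(times[-1] + step)
    return times


# Assumed sample values: delay of 80 units, 40 units per display tick.
print(presentation_times(80, [0, 1, 2, 3], 40))          # [80, 120, 160, 200]
print(presentation_times_fixed_interval(80, 4, 0, 40))   # [80, 120, 160, 200]
```

In both cases the delay shows up in every PresentationTime[ j ], not
just the first one.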


And our `Block` timestamp derives directly from this
[PresentationTime]. The delay is carried over to every frame. So it
does seem like we need to globally shift these timestamps for the
Track, possibly with `CodecDelay`.

Basically each frame has a timestamp to which this delay has to be
added (it can't be negative), while `CodecDelay` is a positive value
that needs to be subtracted from the timestamps. It is available in
WebM, because Opus needs it.

Let's assume CodecDelay = InitialPresentationDelay

The `Block` timestamps would be stored like this:
Block[ 0 ] = 0
Block[ 1 ] = frame_presentation_time[ 1 ] * DispCT
Block[ 2 ] = frame_presentation_time[ 2 ] * DispCT
...

And on output of the demuxer we would get:
Block[ 0 ] = -InitialPresentationDelay
Block[ 1 ] = -InitialPresentationDelay + frame_presentation_time[ 1 ] * DispCT
Block[ 2 ] = -InitialPresentationDelay + frame_presentation_time[ 2 ] * DispCT
...

With the timestamps stored this way `CodecDelay` cannot be used: it
shifts them in the wrong direction. But it's very close to what we need.

It's not the SeekPreRoll either because the actual timestamp of each
frame is affected, not just when seeking (although it may be needed
for delayed RAP).

We also need to figure out whether the Blocks need to be stored with
or without the delay. If that's with the delay then we don't even need
to care about it. The frames will just not start at 0 but that's
already the case in the original stream.

But that may just be the tricky part here, the reference point. The
PresentationTime[ 0 ] doesn't start at 0 because some frames had to
be decoded before getting actual usable data out of the decoder.
But this is really the first frame to display so it would actually be
0 on the output of the demuxer. So it does seem exactly like what
CodecDelay does:
"CodecDelay is The codec-built-in delay in nanoseconds. This value
must be subtracted from each block timestamp in order to get the
actual timestamp."

The `Block` timestamp would be PresentationTime[ 0 ] which isn't 0.
But on the output Block[ 0 ] should really give 0. So we would have
something like this in the file:
Block[ 0 ] = InitialPresentationDelay
Block[ 1 ] = InitialPresentationDelay + frame_presentation_time[ 1 ] * DispCT
Block[ 2 ] = InitialPresentationDelay + frame_presentation_time[ 2 ] * DispCT
...

And on the output of the demuxer we would have the following, when
`CodecDelay` is [InitialPresentationDelay]:
Block[ 0 ] = 0
Block[ 1 ] = frame_presentation_time[ 1 ] * DispCT
Block[ 2 ] = frame_presentation_time[ 2 ] * DispCT
...
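
Here is a small Python sketch of the two storage layouts discussed
above. The helper names and sample values are hypothetical, and it
assumes frame_presentation_time[ 0 ] is 0, as the listings do. The only
Matroska behaviour it relies on is that `CodecDelay` is subtracted from
each stored `Block` timestamp:

```python
def stored_block_timestamps(frame_presentation_time, disp_ct,
                            initial_presentation_delay, include_delay):
    """Timestamps as written in the file, with or without the AV1 delay."""
    offset = initial_presentation_delay if include_delay else 0
    # Assumes frame_presentation_time[0] == 0.
    return [offset + t * disp_ct for t in frame_presentation_time]


def demuxer_output(stored_timestamps, codec_delay):
    """CodecDelay is subtracted from every stored Block timestamp."""
    return [ts - codec_delay for ts in stored_timestamps]


delay, disp_ct, fpt = 80, 40, [0, 1, 2]  # assumed sample values

without = stored_block_timestamps(fpt, disp_ct, delay, include_delay=False)
print(without, demuxer_output(without, delay))
# [0, 40, 80] [-80, -40, 0]

with_delay = stored_block_timestamps(fpt, disp_ct, delay, include_delay=True)
print(with_delay, demuxer_output(with_delay, delay))
# [80, 120, 160] [0, 40, 80]
```

Only the second layout gives a first frame at 0 on the demuxer output.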

It does look like what the internal AV1 display delay is trying to achieve.

The `CodecDelay` also has this note on proper muxing:
The value SHOULD be small so the muxing of tracks with the same actual
timestamp are in the same Cluster.

That is because muxing is done based on the stored `Block` timestamp.
It usually doesn't take CodecDelay into account (but it could/should).
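
As an illustration of what taking it into account could look like, here
is a minimal Python sketch. The function `mux_order`, the tuple layout
and the track numbers are all hypothetical, not how any existing muxer
works. It orders blocks by stored timestamp minus the track's
`CodecDelay` instead of by the stored timestamp alone:

```python
def mux_order(blocks, codec_delay):
    """Order blocks by actual timestamp (stored timestamp minus CodecDelay).

    blocks: list of (track_number, stored_timestamp) pairs.
    codec_delay: dict mapping track_number to its CodecDelay (default 0).
    """
    return sorted(blocks,
                  key=lambda b: (b[1] - codec_delay.get(b[0], 0), b[0]))


# Assumed example: track 1 is video with a delay of 80, track 2 has none.
blocks = [(1, 80), (1, 120), (1, 160), (2, 0), (2, 40), (2, 80)]

print(sorted(blocks, key=lambda b: (b[1], b[0])))
# [(2, 0), (2, 40), (1, 80), (2, 80), (1, 120), (1, 160)]

print(mux_order(blocks, {1: 80}))
# [(1, 80), (2, 0), (1, 120), (2, 40), (1, 160), (2, 80)]
```

With the second ordering, blocks that share the same actual timestamp
end up next to each other, which is what the note about Clusters is
after.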

So I'll update my AV1 codec mapping to say that `CodecDelay` is
[InitialPresentationDelay] and to explain how to use it.