Re: [Hls-interest] Image-based subtitles and trickplay tracks

"Law, Will" <wilaw@akamai.com> Tue, 26 May 2020 21:52 UTC

From: "Law, Will" <wilaw@akamai.com>
To: "Weil, Nicolas" <nicoweil@elemental.com>, "May, Bill" <Bill.May@disneystreaming.com>, Roger Pantos <rpantos=40apple.com@dmarc.ietf.org>
CC: "hls-interest@ietf.org" <hls-interest@ietf.org>
Thread-Topic: [Hls-interest] Image-based subtitles and trickplay tracks
Thread-Index: AdYziudffSZypB1NTpWZomh1td5GPwAA+MmA
Date: Tue, 26 May 2020 21:52:36 +0000
Message-ID: <BB855961-4DA6-49D5-BFEC-EF8B85AEC241@akamai.com>
References: <89b3e7538c6d4477a97260da0a970e89@EX13D02EUB003.ant.amazon.com>
In-Reply-To: <89b3e7538c6d4477a97260da0a970e89@EX13D02EUB003.ant.amazon.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/hls-interest/5nknzU-o-yKHK83CuZDOAjK4two>
Subject: Re: [Hls-interest] Image-based subtitles and trickplay tracks

Some additional data for this thread, from a study of Apple device playback against the Akamai CDN that our Client Optimization Team conducted last month. You can see that for this ATV playback session, over the time period monitored, there were 22K requests in total, of which 98% were small range requests for thumbnails/trickplay.

Every request has a fulfilment cost at some point. Tiled images would lower the request rate against the edge, in this case by an order of magnitude. From a CDN perspective, we would support the development of an optional tiled-image-based thumbnail solution for HLS.
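
As a rough illustration of that order-of-magnitude estimate, here is a minimal sketch of the arithmetic. The 22K and 98% figures come from the study above; the 10-thumbnails-per-tile grid and the best-case assumption that every thumbnail in a fetched tile would otherwise have been requested individually are assumptions made purely for illustration.

```python
def tiled_request_count(thumbnail_requests: int, thumbs_per_tile: int) -> int:
    """Best-case request count when each tile image carries several thumbnails."""
    # Integer ceiling division: a partially used tile still costs one request.
    return -(-thumbnail_requests // thumbs_per_tile)

total_requests = 22_000      # "22K requests in total"
thumbnail_share = 0.98       # "98% were small range requests"
thumbnail_requests = round(total_requests * thumbnail_share)
other_requests = total_requests - thumbnail_requests

# Assumption: a 5x2 grid, i.e. 10 thumbnails per tile image.
tiled = tiled_request_count(thumbnail_requests, thumbs_per_tile=10)

print(thumbnail_requests)          # 21560 thumbnail requests today
print(tiled)                       # 2156 tile requests in the best case
print(other_requests + tiled)      # 2596 requests for the whole session
```

Real savings would depend on the tile layout and on how users scrub, so the numbers above are an upper bound on the benefit rather than a measurement.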

Cheers
Will


[Inline image: image001.png]

From: "Weil, Nicolas" <nicoweil@elemental.com>
Date: Tuesday, May 26, 2020 at 12:22 PM
To: "May, Bill" <Bill.May@disneystreaming.com>, Roger Pantos <rpantos=40apple.com@dmarc.ietf.org>
Cc: "hls-interest@ietf.org" <hls-interest@ietf.org>
Subject: Re: [Hls-interest] Image-based subtitles and trickplay tracks

Comments inline.

From: May, Bill <Bill.May@disneystreaming.com>
Sent: Tuesday, May 26, 2020 9:31 AM
To: Roger Pantos <rpantos=40apple.com@dmarc.ietf.org>
Cc: Weil, Nicolas <nicoweil@elemental.com>; hls-interest@ietf.org
Subject: RE: [Hls-interest] Image-based subtitles and trickplay tracks


On May 22, 2020, at 11:17 AM, Roger Pantos <rpantos=40apple.com@dmarc.ietf.org> wrote:

Hello Nicolas. Thanks for bringing these up. I have some questions:

On May 20, 2020, at 3:12 PM, Weil, Nicolas <nicoweil@elemental.com> wrote:

Hello,

We often see two image-related topics cause interoperability problems because they are not currently covered by the HLS spec. Aligning implementations around an official specification for these two points would be great:

Image-based subtitle tracks
For workflow and charset reasons, some content owners don't include text-based subtitles in the live channel sources that they provide to distributors, but rather image-based subtitles (like DVB-Sub). While it's possible to convert these subtitles to IMSC1 Image Profile as per DASH-IF IOP section 6.4.4, there is no equivalent IMSC1 Image Profile support in the HLS RFC, which means that companies will continue to rely on proprietary forks of the HLS RFC to support these use cases. Even if it weren't supported by Apple players, it would be tremendously helpful for interoperability in the rest of the HLS ecosystem.

I'd like to understand how widely validated the Image Profile of IMSC1 has been. Can anyone volunteer some examples where it’s been commercially deployed successfully? (Specifically IMSC1, vs. some other fork of TTML.)
[NW] IMSC1 Image Profile is now supported by ATSC3, DASH-IF IOP (with support in dash.js) and IMF (with support in IMFTool <https://github.com/IMFTool/IMFTool>, whose development was initially sponsored by Netflix and other studios).

Another question: there are rules, in the US at least, that require the ability to change font sizes, colors, etc. And, TBH, these are changes that help people worldwide.

How would you meet those requirements with bitmapped subtitles? Wouldn’t it be better to work to eliminate bitmapped subtitles completely?
[NW] I believe these font size/color change requirements can be satisfied with IMSC1 Text Profile, which has been supported in rfc8216bis since 2017.
As much as I’d like to get rid of bitmap subtitles, sometimes the content owners cannot provide anything other than bitmaps in the source feed. And it’s very challenging to apply a reliable OCR pass to them for all target languages (Latin/Cyrillic/Asian/… charsets). IMSC1 Image Profile has decent industry support, and the Text Profile is already supported in HLS, so I would expect it to be a natural extension for HLS to also support the Image Profile.

Image-based trickplay tracks
For player resource optimization reasons, using a video track as a trickplay artefact is not always possible, and many player providers recommend image thumbnail tracks instead of special low-framerate video tracks. DASH-IF IOP section 6.2.6 covers this use case, but there is no equivalent support in the HLS RFC. There is the Image Media Playlists HLS extension proposal from Roku/Disney/WarnerMedia at https://github.com/image-media-playlist/spec, but its relevance/adoption is currently limited by the fact that it's not part of the RFC. Same logic here: even if not supported by Apple players, which don't need it as they can leverage I-frame tracks, it would be super useful for the rest of the HLS ecosystem to get this officially into the RFC.

I'd like to better understand what’s driving this. Is the limitation essentially one of not being able to support an AVC decoder for I-frame display?

If that’s the case then it seems that putting JPEG images into fMP4 containers and using EXT-X-I-FRAME-STREAM-INF would be a smaller extension to HLS, both in terms of departure from the existing approach and less new spec to invent.

One of the things I don’t love about the image-media-playlist spec is that it doesn’t follow the regular HLS timing model, where the media presentation time is defined in the media data itself. Instead it relies on precise synchronization of the EXTINF values, which seems like a recipe for long-term accumulation of floating point error, and is difficult to achieve with multiple geographically-dispersed packagers for live.
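
One way such error could accumulate is sketched below. It is an illustration under stated assumptions, not a description of any particular packager: the 30000/1001 fps source, the 250-frame image spacing, the three-decimal EXTINF formatting, and the function names are all hypothetical.

```python
from fractions import Fraction

# Hypothetical live stream: each image covers 250 frames at 30000/1001 fps.
FRAME_DURATION = Fraction(1001, 30000)
IMAGE_DURATION = 250 * FRAME_DURATION      # exactly 8.341666... seconds

def extinf_text(duration: Fraction, decimals: int = 3) -> str:
    """Format a duration as a decimal string, as it might appear after #EXTINF:."""
    return f"{float(duration):.{decimals}f}"

def drift_after(n_images: int, decimals: int = 3) -> float:
    """Seconds by which the summed EXTINF values differ from the exact timeline."""
    written = float(extinf_text(IMAGE_DURATION, decimals))
    summed = 0.0
    for _ in range(n_images):              # what a client does when walking the playlist
        summed += written
    exact = n_images * IMAGE_DURATION      # rational arithmetic, no rounding
    return summed - float(exact)

print(extinf_text(IMAGE_DURATION))         # the value a packager would write
print(drift_after(10_000))                 # drift after roughly 23 hours of images
```

Under these assumptions each image contributes about a third of a millisecond of rounding error, which adds up to a few seconds over a day. A packager that carries the remainder forward from one EXTINF to the next would avoid most of this.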


The limitation is exactly that. A second decoder (AVC or HEVC) is not available on many devices. This also makes mid-fragment switching difficult and switching between codecs impossible.

The image-media-playlist spec does rely somewhat on floating point, but no more so than a seek-to-date or seek-to-time does in a regular HLS playlist. I’m not sure that anyone is asking for precise millisecond switching from these images to regular AV.

I see 2 solutions to this problem: give a PTS/timescale in the HLS playlist (something like we did for transport stream to WebVTT timing, but in the playlist), or wrap the JPEG in some sort of wrapper with timing (fMP4?). It would be good, if that is the route, to have guidance from Apple on what specification to use.
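
To make the first option concrete, here is a minimal sketch of how an integer PTS plus a timescale yields an exact presentation time for each image. No such playlist tag exists today, so how the values would be carried is left open; the 90 kHz timescale (the MPEG-2 transport stream clock), the 30000/1001 fps source, the 250-frame image spacing, and the function names are assumptions for illustration.

```python
from fractions import Fraction

TIMESCALE = 90_000           # ticks per second (MPEG-2 TS clock rate)
TICKS_PER_FRAME = 3_003      # one frame at 30000/1001 fps, in 90 kHz ticks
FRAMES_PER_IMAGE = 250       # hypothetical thumbnail spacing

def image_pts(index: int, first_pts: int = 0) -> int:
    """PTS of the index-th image, computed with integer arithmetic only."""
    return first_pts + index * FRAMES_PER_IMAGE * TICKS_PER_FRAME

def presentation_seconds(pts: int, timescale: int = TIMESCALE) -> Fraction:
    """Exact presentation time in seconds as a rational number."""
    return Fraction(pts, timescale)

pts = image_pts(10_000)
print(pts)                                  # 7507500000
print(presentation_seconds(pts))            # 250250/3
```

Because each image's time is derived from its own integer PTS rather than from a running sum of durations, independent packagers that agree on the first PTS and the timescale would compute identical times.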

The thing about JPEGs is that they are easy; almost any software decodes them; wrapping them in fMP4 doesn’t make it easier or better.

[NW] Thumbnails in the DASH-IF IOP are simple JPEG images, which makes them easy to produce and to manipulate on the service side (like aggregating several live thumbnails into tiles of thumbnails when a program transitions from live to VOD). Using the same simple image container would allow direct interoperability with DASH, without requiring an additional CMAF+DASH specification cycle. As regards HLS, I was hoping that the use of EXT-X-PROGRAM-DATE-TIME would become mandatory, as per the preliminary LL-HLS specification. That would give us the millisecond-accurate time reference that we need to avoid drift if we keep images in a simple image container.
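
A minimal sketch of how a client might combine a wall-clock reference with tiled thumbnails follows. It assumes the first image's wall-clock time is known (for example from EXT-X-PROGRAM-DATE-TIME), that every image covers the same duration, and that thumbnails are packed row by row into fixed-size tiles; the function name, parameters, and grid layout are hypothetical and not taken from any specification.

```python
from datetime import datetime, timedelta, timezone

def locate_thumbnail(seek_time: datetime, first_image_time: datetime,
                     image_duration_ms: int, columns: int, rows: int,
                     thumb_width: int, thumb_height: int) -> tuple[int, int, int]:
    """Return (tile index, x offset, y offset) of the thumbnail covering seek_time."""
    # Integer milliseconds since the first image; no floating point involved.
    elapsed_ms = (seek_time - first_image_time) // timedelta(milliseconds=1)
    if elapsed_ms < 0:
        raise ValueError("seek_time precedes the first image")
    image_index = elapsed_ms // image_duration_ms
    tile_index, slot = divmod(image_index, columns * rows)
    row, column = divmod(slot, columns)      # assumption: row-major packing
    return tile_index, column * thumb_width, row * thumb_height

first = datetime(2020, 5, 26, 21, 0, 0, tzinfo=timezone.utc)
seek = first + timedelta(minutes=5, seconds=7)

# One image every 2 s, packed into 5x4 tiles of 320x180 thumbnails.
print(locate_thumbnail(seek, first, 2000, 5, 4, 320, 180))   # (7, 960, 360)
```

Anchoring the lookup to an absolute timestamp means a client can find the right thumbnail without summing per-image durations, which is the drift concern raised earlier in the thread.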