Re: [Hls-interest] Image-based subtitles and trickplay tracks

Roger Pantos <rpantos@apple.com> Wed, 27 May 2020 18:15 UTC

From: Roger Pantos <rpantos@apple.com>
Message-id: <0BE81246-FBFE-4782-A309-CE26BC5AF05B@apple.com>
Date: Wed, 27 May 2020 11:15:28 -0700
In-reply-to: <89b3e7538c6d4477a97260da0a970e89@EX13D02EUB003.ant.amazon.com>
Cc: "hls-interest@ietf.org" <hls-interest@ietf.org>
To: "Weil, Nicolas" <nicoweil@elemental.com>
References: <89b3e7538c6d4477a97260da0a970e89@EX13D02EUB003.ant.amazon.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/hls-interest/YgUHXriySCi1l3zcnF1KMGvqvcM>
Subject: Re: [Hls-interest] Image-based subtitles and trickplay tracks


> On May 26, 2020, at 12:21 PM, Weil, Nicolas <nicoweil@elemental.com> wrote:
> 
> Comments inline. 
>  
> From: May, Bill <Bill.May@disneystreaming.com>
> Sent: Tuesday, May 26, 2020 9:31 AM
> To: Roger Pantos <rpantos=40apple.com@dmarc.ietf.org>
> Cc: Weil, Nicolas <nicoweil@elemental.com>; hls-interest@ietf.org
> Subject: RE: [Hls-interest] Image-based subtitles and trickplay tracks
>  
>  
> 
> On May 22, 2020, at 11:17 AM, Roger Pantos <rpantos=40apple.com@dmarc.ietf.org> wrote:
>  
> Hello Nicolas. Thanks for bringing these up. I have some questions:
>  
> 
> On May 20, 2020, at 3:12 PM, Weil, Nicolas <nicoweil@elemental.com> wrote:
>  
> Hello,
>  
> We are often seeing two image-related topics causing interoperability problems as they are not currently covered by the HLS spec. Normalizing the implementations around an official specification for these two points would be great:
>  
> Image-based subtitles tracks
> For workflow and character-set reasons, some content owners don't include text-based subtitles in the live channel sources that they provide to distributors, but rather image-based subtitles (like DVB-Sub). While it's possible to transform these subtitles into IMSC1 Image Profile as per DASH-IF IOP section 6.4.4, there is no equivalent IMSC1 Image Profile support in the HLS RFC, which means that companies will continue to rely on proprietary forks of the HLS RFC to support these use cases. Even if it weren't supported by Apple players, it would be tremendously helpful for interoperability in the rest of the HLS ecosystem.
>  
> I'd like to understand how widely validated the Image Profile of IMSC1 has been. Can anyone volunteer some examples where it’s been commercially deployed successfully? (Specifically IMSC1, vs. some other fork of TTML.)
> [NW] IMSC1 Image Profile is now supported by ATSC 3.0, DASH-IF IOP (with support in dash.js) and IMF (with support in IMFTool <https://github.com/IMFTool/IMFTool>, whose development was initially sponsored by Netflix and other studios).
It’s certainly a measure of faith that those standards include the IMSC1 image profile but that wasn’t what I asked. Do you (or anyone else) know of anyone who has successfully launched a commercial or wide scale IMSC1 image deployment, based on DASH or anything else? We’ve often found devils in the details that are not discovered until something has been completely built and qualified.
>  
> Another question: there are requirements, in the US at least, for the ability to change font sizes, colors, etc. And, TBH, these are changes that help people worldwide.
>  
> How would you meet those requirements with bitmapped subtitles? Wouldn't it be better to work to eliminate bitmapped subtitles completely?
> 
> [NW] I believe these font size/color change requirements can be satisfied with IMSC1 Text Profile which is supported in rfc8216bis since 2017. 
> As much as I'd like to get rid of bitmap subtitles, sometimes the content owners cannot provide anything other than bitmaps in the source feed, and it's very challenging to apply a reliable OCR pass to them for all target languages (Latin/Cyrillic/Asian/… charsets). IMSC1 Image Profile has decent industry support, and the Text Profile is already supported in HLS, so supporting the Image Profile as well would seem a natural extension for HLS.
>  
> Image-based trickplay tracks
> For player resource optimization reasons, using a video track as a trickplay artefact is not always possible, and a lot of player providers recommend image thumbnail tracks instead of special low-framerate video tracks. DASH-IF IOP section 6.2.6 covers this use case, but there is no equivalent support in the HLS RFC. There is the Image Media Playlists HLS extension proposal from Roku/Disney/WarnerMedia at https://github.com/image-media-playlist/spec, but its relevance and adoption are currently limited by the fact that it's not part of the RFC. Same logic here: even if it isn't supported by Apple players, which don't need it because they can leverage I-frame tracks, it would be super useful for the rest of the HLS ecosystem to get this officially into the RFC.
>  
> I'd like to better understand what’s driving this. Is the limitation essentially one of not being able to support an AVC decoder for i-frame display? 
>  
> If that’s the case then it seems that putting JPEG images into fMP4 containers and using EXT-X-I-FRAME-STREAM-INF would be a smaller extension to HLS, both in terms of departure from the existing approach and less new spec to invent.
>  
> One of the things I don’t love about the image-media-playlist spec is that it doesn’t follow the regular HLS timing model, where the media presentation time is defined in the media data itself. Instead it relies on precise synchronization of the EXTINF values, which seems like a recipe for long-term accumulation of floating point error, as well as difficult to achieve with multiple geographically-dispersed packagers for live.
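The accumulation concern can be illustrated with a minimal sketch. It assumes a hypothetical thumbnail every 50 frames of 29.97 fps video (150150 ticks on the 90 kHz clock, about 1.668 s) and a packager that writes each EXTINF rounded to a fixed number of decimals; a player that derives each image's start time by summing those values drifts away from the exact tick-based timeline. Note that this models decimal rounding of each EXTINF value, which is one concrete source of the error; the names and parameters are illustrative only.

```python
from fractions import Fraction

TIMESCALE = 90_000                 # 90 kHz clock used by MPEG-TS and HLS
FRAME_TICKS = 3003                 # one frame of 29.97 fps video (1001/30000 s) in 90 kHz ticks
INTERVAL_TICKS = 50 * FRAME_TICKS  # hypothetical: one thumbnail every 50 frames (~1.668 s)


def extinf_value(ticks: int, decimals: int = 3) -> str:
    """Format a duration the way a packager might write it into an EXTINF tag."""
    return f"{ticks / TIMESCALE:.{decimals}f}"


def drift_after(n_entries: int, decimals: int = 3) -> Fraction:
    """Seconds of error after summing n rounded EXTINF values, versus exact tick arithmetic."""
    exact = Fraction(n_entries * INTERVAL_TICKS, TIMESCALE)
    # Fraction parses the decimal string exactly, so only the EXTINF rounding is measured.
    summed = sum(Fraction(extinf_value(INTERVAL_TICKS, decimals)) for _ in range(n_entries))
    return exact - summed
```

Under these assumptions, about two hours of entries (`drift_after(4316)`) come out roughly 1.44 s short, and writing six decimals only shrinks that to about 1.4 ms; counting integer ticks, as the media timestamps do, never drifts.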
>  
>  
> The limitation is exactly that. A second decoder (AVC or HEVC) is not available on many devices. This also makes mid-fragment switching difficult and switching between codecs impossible.
>  
> The image-media-playlist spec does rely somewhat on floating point, but no more so than a seek to date or seek to time does in a regular HLS playlist. I'm not sure that anyone is asking for precise millisecond switching from these images to regular AV.
>  
> I see 2 solutions to this problem: give a PTS/timescale in the HLS playlist (something like what we did for transport stream to WebVTT timing, but in the playlist), or wrap the JPEG in some sort of wrapper with timing (fMP4?). If that is the route, it would be good to have guidance from Apple on which specification to use.
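The "transport stream to WebVTT timing" mechanism mentioned above is the X-TIMESTAMP-MAP header from RFC 8216, which pairs an integer 90 kHz MPEG-TS timestamp with a local cue time. The sketch below shows that arithmetic using exact integers and fractions. A playlist-level equivalent for image tracks would need new attribute names that no specification currently defines, so only the existing WebVTT case is shown; the function names are hypothetical.

```python
from fractions import Fraction

MPEG_TS_CLOCK = 90_000  # MPEG-2 TS timestamps tick at 90 kHz
PTS_WRAP = 1 << 33      # PTS fields are 33 bits wide and wrap around


def parse_cue_time(text: str) -> Fraction:
    """Parse a WebVTT timestamp ('hh:mm:ss.ttt' or 'mm:ss.ttt') into exact seconds."""
    parts = text.strip().split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    minutes = int(parts[-2])
    seconds = Fraction(parts[-1])  # exact decimal value, no binary rounding
    return hours * 3600 + minutes * 60 + seconds


def parse_timestamp_map(header: str) -> tuple[int, Fraction]:
    """Parse 'X-TIMESTAMP-MAP=MPEGTS:<ticks>,LOCAL:<cue time>' into (ticks, seconds)."""
    _, _, value = header.strip().partition("=")
    fields = dict(part.split(":", 1) for part in value.split(","))
    return int(fields["MPEGTS"]), parse_cue_time(fields["LOCAL"])


def cue_to_pts(cue_seconds: Fraction, mpegts: int, local: Fraction) -> int:
    """Map a WebVTT cue time onto the 90 kHz PTS timeline of the associated TS media."""
    offset_ticks = round((cue_seconds - local) * MPEG_TS_CLOCK)
    return (mpegts + offset_ticks) % PTS_WRAP
```

For example, with `X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000`, a cue at `00:00:10.000` lands on PTS 1800000. Because the anchor is an integer tick count, nothing accumulates across cues.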
>  
> The thing about JPEGs is that they are easy; almost any software decodes them; wrapping them in FMP4 doesn’t make it easier or better.
>  
> [NW] Thumbnails in the DASH-IF IOP are simple JPEG images, which makes them easy to produce and to manipulate on the service side (like aggregating several live thumbnails into tiles of thumbnails when a program transitions from Live to VOD). Using the same simple image container would allow direct interoperability with DASH, without requiring an additional CMAF+DASH specification cycle.
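Aggregating thumbnails into tiles means a player has to find one thumbnail inside a larger image. The following is a minimal sketch assuming a hypothetical layout: thumbnails packed row-major into fixed-size grids, one thumbnail per fixed integer-millisecond interval. It is not the exact signaling of DASH-IF IOP or the image-media-playlist proposal; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileGrid:
    """Hypothetical layout: thumbnails packed row-major into fixed-size tile images."""
    columns: int
    rows: int
    thumb_width: int
    thumb_height: int
    interval_ms: int  # time between consecutive thumbnails (assumed constant)

    @property
    def per_tile(self) -> int:
        return self.columns * self.rows

    def locate(self, t_ms: int) -> tuple[int, int, int]:
        """Return (tile_index, x, y) for the thumbnail covering presentation time t_ms."""
        if t_ms < 0:
            raise ValueError("presentation time must be non-negative")
        n = t_ms // self.interval_ms                # thumbnail index across the presentation
        tile_index, cell = divmod(n, self.per_tile) # which tile image, and which cell in it
        row, col = divmod(cell, self.columns)       # row-major position inside the tile
        return tile_index, col * self.thumb_width, row * self.thumb_height
```

With a 5x4 grid of 320x180 thumbnails every 2000 ms, `TileGrid(5, 4, 320, 180, 2000).locate(45_000)` selects tile 1 at pixel offset (640, 0). Regrouping live thumbnails into such tiles for VOD only changes `columns` and `rows`, not the per-image JPEGs.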

I’m more concerned with finding an approach that works well technically than hitting a predefined interop target. DASH could presumably consume jpeg-in-fmp4 just as easily as HLS.

> As regards HLS, I was hoping that the use of EXT-X-PROGRAM-DATE-TIME would become mandatory, as per the preliminary LL-HLS specification. That would give us the millisecond-accurate time reference that we need to avoid drift if we keep images in a simple image container.
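The EXT-X-PROGRAM-DATE-TIME approach amounts to giving each image its own absolute anchor instead of summing durations. A minimal sketch, assuming every image entry carries its own EXT-X-PROGRAM-DATE-TIME tag (which the current spec does not require) and representing entries as hypothetical (tag line, URI) pairs, looks like this. Its precision is still bounded by whatever resolution the packager writes into the date-time, typically milliseconds.

```python
import bisect
from datetime import datetime


def parse_pdt(tag_line: str) -> datetime:
    """Parse an '#EXT-X-PROGRAM-DATE-TIME:<ISO 8601 date-time>' line into an aware datetime."""
    value = tag_line.split(":", 1)[1].strip()
    if value.endswith("Z"):
        # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def image_for_instant(entries: list[tuple[str, str]], instant: datetime) -> str:
    """Return the URI of the image whose PDT is the latest one at or before `instant`.

    Each lookup uses the entry's own absolute timestamp, so no durations are summed.
    """
    starts = [parse_pdt(tag) for tag, _ in entries]
    i = bisect.bisect_right(starts, instant) - 1
    if i < 0:
        raise ValueError("instant precedes the first image in the playlist")
    return entries[i][1]
```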

I can tell you from direct experience that you need much more than millisecond-accurate timestamps to avoid user-visible floating point drift over hours of playback.  Floating-point is a poor solution for media timing.


Roger.