Re: [Hls-interest] Image-based subtitles and trickplay tracks

Nigel Megitt <nigel.megitt@bbc.co.uk> Fri, 29 May 2020 16:09 UTC

From: Nigel Megitt <nigel.megitt@bbc.co.uk>
To: "Weil, Nicolas" <nicoweil@elemental.com>, "May, Bill" <Bill.May@disneystreaming.com>
CC: "hls-interest@ietf.org" <hls-interest@ietf.org>
Date: Fri, 29 May 2020 16:08:54 +0000
Message-ID: <36B9F7F1-C37D-4FD4-921A-FFAE958AD791@bbc.co.uk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/hls-interest/-MJiPqqjRXjlU9VBP6FvDdZrkG8>
Subject: Re: [Hls-interest] Image-based subtitles and trickplay tracks

A somewhat off-topic point, but something I'd like to pick up on:

On 26/05/2020, 13:22, "Weil, Nicolas" <nicoweil@elemental.com> wrote:

>     Another question: there are requirements in the US at least that require the ability to change font sizes, colors, etc.   And, TBH, these are changes that help people world-wide.
    
That isn't the way it looks from this side of the Atlantic! Subtitles globally carry a lot of localised cultural idioms, and allowing unconstrained modification of settings like this, or making it seem okay for the client device or player code to decide, can cause well-thought-through authorial choices to be discarded in ways that break the experience for viewers.

For example, in the UK colours are used to indicate changes of speaker, sometimes multiple times per line, and in France I understand they are used to indicate different sources or types of sound.

In examining the claimed benefits of those options, I have so far been unable to find good published evidence for all of those customisations, though I did commission research for the BBC that indicates some user preferences around customising text size.

The point is, please don't design technical solutions on the _assumption_ that those US requirements are needed or wanted globally. Of course I'm not saying there's anything wrong with designing technical solutions to _accommodate_ such requirements.

Returning to the main thread: the BBC does broadcast bitmap subtitles according to the DVB specifications mentioned elsewhere, and that is considered a reasonable accessibility solution on televisions for audience members who cannot hear the sound. I realise this does not answer Roger's question about the use of image-based subtitles conforming to the IMSC Image Profile specifically.

One important factor in favour of bitmap subtitles is that the client-side work needed to combine the subtitles with the video image is minimised, which can help with synchronisation requirements. For example, there is no need to install fonts or to spend processor cycles laying out and rasterising text. For lower-end devices, this can be a helpful part of the solution.
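
To make that client-side cost concrete: once a subtitle has been delivered as a pre-rendered bitmap, putting it on screen reduces to a per-pixel alpha blend. The sketch below assumes straight (non-premultiplied) RGBA subtitle pixels, an RGB frame held as nested lists, and a hypothetical `composite` helper; real devices may do this in a hardware overlay or on a GPU rather than in software.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def composite(frame: List[List[RGB]], subtitle: List[List[RGBA]],
              x0: int, y0: int) -> List[List[RGB]]:
    """Blend an RGBA subtitle bitmap onto an RGB frame at (x0, y0).

    Uses the 'source over' operator with straight alpha; pixels that
    fall outside the frame are clipped. No fonts, layout or text
    rasterisation are involved: the bitmap is already rendered.
    """
    out = [row[:] for row in frame]  # copy so the input frame is untouched
    for dy, sub_row in enumerate(subtitle):
        y = y0 + dy
        if not 0 <= y < len(out):
            continue
        for dx, (r, g, b, a) in enumerate(sub_row):
            x = x0 + dx
            if not 0 <= x < len(out[y]) or a == 0:
                continue  # clipped, or fully transparent subtitle pixel
            fr, fg, fb = out[y][x]
            # result = src * alpha + dst * (1 - alpha), rounded, in integers
            out[y][x] = (
                (r * a + fr * (255 - a) + 127) // 255,
                (g * a + fg * (255 - a) + 127) // 255,
                (b * a + fb * (255 - a) + 127) // 255,
            )
    return out


if __name__ == "__main__":
    black = [[(0, 0, 0)] * 4 for _ in range(2)]
    sub = [[(255, 255, 255, 255), (255, 255, 0, 128)]]  # opaque white, half-yellow
    print(composite(black, sub, 1, 1)[1])
```

The per-pixel work is a handful of integer multiplies, which is the property that makes bitmaps attractive on low-end devices.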

Nigel



On 26/05/2020, 13:22, "Weil, Nicolas" <nicoweil@elemental.com> wrote:

    Comments inline.

    From: May, Bill <Bill.May@disneystreaming.com>
    Sent: Tuesday, May 26, 2020 9:31 AM
    To: Roger Pantos <rpantos=40apple.com@dmarc.ietf.org>
    Cc: Weil, Nicolas <nicoweil@elemental.com>; hls-interest@ietf.org
    Subject: RE: [Hls-interest] Image-based subtitles and trickplay tracks

    On May 22, 2020, at 11:17 AM, Roger Pantos <rpantos=40apple.com@dmarc.ietf.org> wrote:

    Hello Nicholas. Thanks for bringing these up. I have some questions:

    On May 20, 2020, at 3:12 PM, Weil, Nicolas <nicoweil@elemental.com> wrote:

    Hello,

    We are often seeing two image-related topics causing interoperability problems, as they are not currently covered by the HLS spec. Normalizing the implementations around an official specification for these two points would be great:

    Image-based subtitle tracks
    For workflow reasons and charset reasons, some content owners don't include text-based subtitles in the live channel sources that they provide to distributors, but rather image-based subtitles (like DVB-Sub). While it's possible to transform these subtitles into IMSC1 Image Profile as per DASH-IF IOP section 6.4.4, there is no equivalent IMSC1 Image Profile support in the HLS RFC, which means that companies will continue to rely on proprietary forks of the HLS RFC to support these use cases. Even if it wasn't supported by Apple players, it would be tremendously helpful for interoperability in the rest of the HLS ecosystem.
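
For comparison with what HLS already allows: IMSC1 registers the codec strings `stpp.ttml.im1t` (Text Profile) and `stpp.ttml.im1i` (Image Profile), and HLS accepts the former for IMSC1 subtitles carried in fMP4. Below is a minimal sketch of how Image Profile subtitles might be signalled in a multivariant playlist, assuming (hypothetically, since the RFC does not allow it today) that `stpp.ttml.im1i` were accepted in the same places. The group ID, URIs, bandwidth and video/audio codec strings are illustrative only, and the subtitle media playlist itself is not shown.

```python
from typing import List

# Hypothetical for HLS: only the Text Profile string ("stpp.ttml.im1t")
# is used by HLS today; "im1i" is IMSC1's Image Profile codec string.
IMSC1_IMAGE_CODEC = "stpp.ttml.im1i"


def subtitle_rendition(group_id: str, name: str, language: str, uri: str) -> str:
    """One EXT-X-MEDIA line describing a subtitle rendition."""
    return (f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="{group_id}",'
            f'NAME="{name}",LANGUAGE="{language}",AUTOSELECT=YES,URI="{uri}"')


def variant(bandwidth: int, codecs: List[str], subs_group: str, uri: str) -> str:
    """An EXT-X-STREAM-INF tag plus its URI line, referencing a subtitle group."""
    codec_list = ",".join(codecs)
    return (f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},CODECS="{codec_list}",'
            f'SUBTITLES="{subs_group}"\n{uri}')


def build_multivariant() -> str:
    """Assemble a tiny illustrative multivariant playlist."""
    lines = [
        "#EXTM3U",
        subtitle_rendition("subs", "English", "en", "subs/en/index.m3u8"),
        variant(2_000_000, ["avc1.64001f", "mp4a.40.2", IMSC1_IMAGE_CODEC],
                "subs", "video/index.m3u8"),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_multivariant())
```

The point of the sketch is that, at the multivariant level, the change would be small: the same rendition and variant tags, with a different codec string.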

    I'd like to understand how widely validated the Image Profile of IMSC1 has been. Can anyone volunteer some examples where it's been commercially deployed successfully? (Specifically IMSC1, vs. some other fork of TTML.)

    [NW] IMSC1 Image Profile is now supported by ATSC3, DASH-IF IOP (with support in dash.js) and IMF (with support in IMFTool <https://github.com/IMFTool/IMFTool>, whose development was initially sponsored by Netflix and other studios).

    Another question: there are requirements in the US at least that require the ability to change font sizes, colors, etc. And, TBH, these are changes that help people world-wide.

    How would you meet those requirements with bitmapped subtitles? Wouldn't it be better to work to eliminate bitmapped subtitles completely?

    [NW] I believe these font size/color change requirements can be satisfied with IMSC1 Text Profile, which has been supported in rfc8216bis since 2017.
    As much as I'd like to get rid of bitmap subtitles, sometimes the content owners cannot provide anything other than bitmaps in the source feed. And it's very challenging to apply a reliable OCR pass to them for all target languages (Latin/Cyrillic/Asian/… charsets). IMSC1 Image Profile has decent industry support, and the Text Profile is already supported in HLS, so I would expect it to be a natural extension for HLS to also support the Image Profile.
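
The difference relied on here is that Text Profile styling is data the client can rewrite, while a bitmap has its size and colour baked in. Below is a minimal sketch, assuming a TTML document whose sizes are single percentage values in `tts:fontSize`; the `scale_font_sizes` helper is hypothetical, leaves other units (such as `c` or `px`) untouched, and deliberately does not alter authored `tts:color` values.

```python
import re
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
TTS_NS = "http://www.w3.org/ns/ttml#styling"
FONT_SIZE = f"{{{TTS_NS}}}fontSize"  # Clark notation for tts:fontSize
PERCENT = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s*$")

# Keep the familiar prefixes when the document is serialized again.
ET.register_namespace("", TTML_NS)
ET.register_namespace("tts", TTS_NS)


def scale_font_sizes(ttml: str, factor: float) -> str:
    """Multiply every single-value percentage tts:fontSize by `factor`."""
    root = ET.fromstring(ttml)
    for el in root.iter():
        size = el.get(FONT_SIZE)
        if size is None:
            continue
        m = PERCENT.match(size)
        if m:  # other units and two-value sizes are left as authored
            el.set(FONT_SIZE, f"{float(m.group(1)) * factor:g}%")
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <head><styling><style xml:id="s1" tts:fontSize="100%" tts:color="yellow"/></styling></head>
  <body><div><p style="s1">Hello</p></div></body>
</tt>"""

if __name__ == "__main__":
    print(scale_font_sizes(SAMPLE, 1.5))
```

Scaling only size, and only by a user-chosen factor, is one way to accommodate a size preference without overriding authored colours.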

    Image-based trickplay tracks
    For player resource optimization reasons, using a video track as a trickplay artefact is not always possible, and a lot of player providers recommend the use of image thumbnail tracks instead of special low-framerate video tracks. DASH-IF IOP section 6.2.6 covers this use case, but there is no equivalent support in the HLS RFC. There is the Image Media Playlists HLS extension proposal from Roku/Disney/WarnerMedia at https://github.com/image-media-playlist/spec, but its relevance/adoption is currently limited by the fact that it's not part of the RFC. Same logic here: even if it is not supported by Apple players, which don't need it as they can leverage I-frame tracks, it would be super useful for the rest of the HLS ecosystem to get this officially into the RFC.
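
A player consuming an image-only media playlist has to map a scrub position to a thumbnail URI. Below is a minimal sketch, assuming a simplified playlist in which each image URI is preceded by an `#EXTINF` duration and each image's start time is the running sum of the preceding durations; it ignores tiling and any other tags the image-media-playlist proposal defines, and the sample playlist and file names are hypothetical.

```python
import bisect
from typing import List, Tuple


def parse_image_playlist(text: str) -> List[Tuple[float, str]]:
    """Return (start_time, uri) pairs by accumulating #EXTINF durations."""
    entries: List[Tuple[float, str]] = []
    t, pending = 0.0, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,[<title>]" -> duration in seconds
            pending = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line and not line.startswith("#") and pending is not None:
            entries.append((t, line))
            t += pending  # start times come purely from summed durations
            pending = None
    return entries


def thumbnail_at(entries: List[Tuple[float, str]], position: float) -> str:
    """Pick the image whose interval contains `position`, clamped to the ends."""
    starts = [start for start, _ in entries]
    i = bisect.bisect_right(starts, position) - 1
    return entries[max(i, 0)][1]


PLAYLIST = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
thumb_0001.jpg
#EXTINF:10.0,
thumb_0002.jpg
#EXTINF:10.0,
thumb_0003.jpg
#EXT-X-ENDLIST
"""

if __name__ == "__main__":
    print(thumbnail_at(parse_image_playlist(PLAYLIST), 15.0))
```

Because each start time depends on every earlier `#EXTINF`, any error in those values carries forward to all later images.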

    I'd like to better understand what's driving this. Is the limitation essentially one of not being able to support an AVC decoder for i-frame display?

    If that's the case, then it seems that putting JPEG images into fMP4 containers and using EXT-X-I-FRAME-STREAM-INF would be a smaller extension to HLS, both in terms of departure from the existing approach and less new spec to invent.

    One of the things I don't love about the image-media-playlist spec is that it doesn't follow the regular HLS timing model, where the media presentation time is defined in the media data itself. Instead it relies on precise synchronization of the EXTINF values, which seems like a recipe for long-term accumulation of floating point error, as well as difficult to achieve with multiple geographically-dispersed packagers for live.
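
Roger's floating point concern can be illustrated in a few lines. This is a minimal sketch, assuming an illustrative EXTINF value of 0.1 s (chosen because it has no exact binary representation): accumulating it as a `float` drifts away from the exact total, whereas accumulating it as a `fractions.Fraction` (a swapped-in exact representation, not something the HLS spec prescribes) does not.

```python
from fractions import Fraction


def total_float(duration: str, count: int) -> float:
    """Accumulate a decimal EXTINF value `count` times using binary floats."""
    t = 0.0
    for _ in range(count):
        t += float(duration)  # each addition may round
    return t


def total_exact(duration: str, count: int) -> Fraction:
    """Accumulate the same value exactly as a rational number."""
    t = Fraction(0)
    for _ in range(count):
        t += Fraction(duration)  # Fraction("0.1") == Fraction(1, 10)
    return t


if __name__ == "__main__":
    n = 36_000  # one hour of 0.1 s steps
    approx, exact = total_float("0.1", n), total_exact("0.1", n)
    print(approx, exact, float(exact) - approx)
```

The drift per step is tiny, but it only ever accumulates, and independent packagers rounding differently will not accumulate it identically.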

    The limitation is exactly that. A second decoder (AVC or HEVC) is not available on many devices. This also makes mid-fragment switching difficult and makes switching between codecs impossible.

    The image-media-playlist spec does rely somewhat on floating point; no more so than a seek to date or seek to time does in a regular HLS playlist, however. I'm not sure that anyone is asking for precise millisecond switching from these images to regular AV.

    I see 2 solutions to this problem: give a PTS/timescale in the HLS playlist (something like we did for transport stream to WebVTT timing, but in the playlist), or wrap the JPEG in some sort of wrapper with timing (fMP4?). If that is the route, it would be good to have guidance from Apple on what specification to use.
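
The first option, carrying an integer PTS and timescale per image in the playlist, avoids accumulated error because each image's time is computed independently. Below is a minimal sketch, assuming a hypothetical tag `#EXT-X-IMAGE-PTS:PTS=<ticks>,TIMESCALE=<hz>`; this tag name and syntax are invented here for illustration and are not defined by the RFC. The 90 kHz timescale mirrors the MPEG-TS clock already used by the WebVTT `X-TIMESTAMP-MAP` mechanism, and PTS rollover is ignored.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple

TAG = "#EXT-X-IMAGE-PTS:"  # hypothetical tag, invented for this sketch


def parse_attrs(attr_list: str) -> Dict[str, str]:
    """Split a simple KEY=VALUE,KEY=VALUE list (no quoted commas)."""
    return dict(item.split("=", 1) for item in attr_list.split(","))


def image_times(text: str) -> List[Tuple[Fraction, str]]:
    """Return (presentation_time_seconds, uri) from per-image PTS, not EXTINF sums."""
    out: List[Tuple[Fraction, str]] = []
    pending: Optional[Fraction] = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(TAG):
            a = parse_attrs(line[len(TAG):])
            # exact rational seconds: ticks / ticks-per-second
            pending = Fraction(int(a["PTS"]), int(a["TIMESCALE"]))
        elif line and not line.startswith("#") and pending is not None:
            out.append((pending, line))
            pending = None
    return out


SAMPLE = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-IMAGE-PTS:PTS=900000,TIMESCALE=90000
#EXTINF:10.01,
img_a.jpg
#EXT-X-IMAGE-PTS:PTS=1800900,TIMESCALE=90000
#EXTINF:10.0,
img_b.jpg
#EXT-X-ENDLIST
"""

if __name__ == "__main__":
    for t, uri in image_times(SAMPLE):
        print(uri, t, float(t))
```

Here an error in one image's `#EXTINF` has no effect on the next image's time, which is the property both this option and the fMP4 wrapper option aim for.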

    The thing about JPEGs is that they are easy; almost any software decodes them; wrapping them in fMP4 doesn't make it easier or better.

    [NW] Thumbnails in the DASH-IF IOP are simple JPEG images, which makes them easy to produce and to manipulate on the service side (like aggregating several live thumbnails into tiles of thumbnails when a program is transitioned from Live to VOD). Using the same simple image container would allow direct interoperability with DASH, without requiring an additional CMAF+DASH specification cycle. As regards HLS, I was hoping that the use of EXT-X-PROGRAM-DATE-TIME would become mandatory, as per the preliminary LL-HLS specification. That would give us the millisecond-accurate time reference that we need to avoid drift if we keep images in a simple image container.
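
Tiled thumbnails and EXT-X-PROGRAM-DATE-TIME as the time reference combine naturally: if a tile sheet is anchored to a wall-clock time, the thumbnail for any instant can be located with integer arithmetic. Below is a minimal sketch, assuming a tile sheet whose first thumbnail starts at the sheet's EXT-X-PROGRAM-DATE-TIME, a fixed thumbnail interval, row-major ordering and a fixed thumbnail size; all of these layout parameters, and the `locate_in_tile` helper, are assumptions for illustration, not taken from DASH-IF IOP or the image-media-playlist proposal.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def elapsed_ms(start: datetime, target: datetime) -> int:
    """Whole milliseconds from start to target, using integer arithmetic only."""
    delta = target - start
    total_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return total_us // 1000


def locate_in_tile(sheet_start: datetime, interval_ms: int, cols: int, rows: int,
                   thumb_w: int, thumb_h: int,
                   target: datetime) -> Optional[Tuple[int, int]]:
    """Return the (x, y) pixel offset of the thumbnail covering `target`.

    Offsets are derived from the sheet's program date-time alone, so no
    per-image durations are accumulated. Returns None if `target` falls
    outside this sheet.
    """
    ms = elapsed_ms(sheet_start, target)
    if ms < 0:
        return None
    index = ms // interval_ms
    if index >= cols * rows:
        return None
    row, col = divmod(index, cols)  # row-major layout (assumed)
    return col * thumb_w, row * thumb_h


if __name__ == "__main__":
    # Hypothetical 5x4 sheet of 160x90 thumbnails, one every 2 s,
    # whose EXT-X-PROGRAM-DATE-TIME is 2020-05-26T13:00:00Z.
    start = datetime(2020, 5, 26, 13, 0, 0, tzinfo=timezone.utc)
    print(locate_in_tile(start, 2000, 5, 4, 160, 90,
                         start + timedelta(seconds=13)))  # (160, 90)
```

Anchoring every sheet to its own program date-time means an error in one sheet cannot propagate into the next.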