Re: [Hls-interest] Image-based subtitles and trickplay tracks

Alex Giladi <alex.giladi@gmail.com> Fri, 29 May 2020 16:22 UTC

From: Alex Giladi <alex.giladi@gmail.com>
Date: Fri, 29 May 2020 10:21:48 -0600
Message-ID: <CAF-MBSJVm5_csH+K0OdJ5R8i4zaCt55pqVfOGiFZjkpVLQ5Hvg@mail.gmail.com>
To: Nigel Megitt <nigel.megitt@bbc.co.uk>
Cc: "Weil, Nicolas" <nicoweil@elemental.com>, "May, Bill" <Bill.May@disneystreaming.com>, "hls-interest@ietf.org" <hls-interest@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/hls-interest/REwn7mbw7olEJBtyqai1-yrMUMs>
Subject: Re: [Hls-interest] Image-based subtitles and trickplay tracks

Hi Nigel,
This is very interesting!
How much bandwidth do the DVB subtitles take per language?
Best,
Alex.

On Fri, May 29, 2020 at 10:09 AM Nigel Megitt <nigel.megitt@bbc.co.uk> wrote:

> A somewhat off-topic point, but something I'd like to pick up on:
>
> On 26/05/2020, 13:22, "Weil, Nicolas" <nicoweil@elemental.com> wrote:
>
> >     Another question: there are requirements, in the US at least, for
> >     the ability to change font sizes, colors, etc. And, TBH, these are
> >     changes that help people worldwide.
>
> That isn't the way it looks from this side of the Atlantic! Subtitles
> globally have a lot of localised cultural idioms, and allowing
> unconstrained modification of settings like this, or making it seem okay
> for the client device or player code to decide, can cause
> well-thought-through authorial choices to be discarded in ways that break
> the experience for viewers.
>
> For example in the UK colours are used to indicate changes of speaker,
> sometimes multiple times per line, and in France I understand they are used
> to indicate different sources or types of sound.
>
> In examining the claim about the benefit of those options, I've so far
> been unable to find good published evidence demonstrating the benefit of
> all of those customisations, but I did commission research for the BBC that
> indicates some user preferences around customising text size.
>
> The point is, please don't design technical solutions on the _assumption_
> that those US requirements are needed or wanted globally. Of course I'm not
> saying there's anything wrong with designing technical solutions to
> _accommodate_ such requirements.
>
> Returning to the main thread, the BBC does broadcast bitmap subtitles
> according to the DVB specifications mentioned elsewhere, and that is
> considered a reasonable accessibility solution for televisions, for
> audience members who cannot hear the sound. I realise this does not answer
> Roger's question about the use of image-based subtitles based on the IMSC
> Image Profile specifically.
>
> One important factor in favour of bitmap subtitles is that the client-side
> work needed to modify the image to include the subtitles is minimised,
> which can help with synchronisation requirements. For example, there is no
> question of installing fonts, using processor cycles to lay out and
> rasterise text, etc. For lower-end devices, this can be a helpful part of
> the solution.
>
> Nigel
>
> On 26/05/2020, 13:22, "Weil, Nicolas" <nicoweil@elemental.com> wrote:
>
>     Comments inline.
>
>     From: May, Bill <Bill.May@disneystreaming.com>
>     Sent: Tuesday, May 26, 2020 9:31 AM
>     To: Roger Pantos <rpantos=40apple.com@dmarc.ietf.org>
>     Cc: Weil, Nicolas <nicoweil@elemental.com>; hls-interest@ietf.org
>     Subject: RE: [Hls-interest] Image-based subtitles and trickplay tracks
>
>     On May 22, 2020, at 11:17 AM, Roger Pantos
>     <rpantos=40apple.com@dmarc.ietf.org> wrote:
>
>     Hello Nicolas. Thanks for bringing these up. I have some questions:
>
>     On May 20, 2020, at 3:12 PM, Weil, Nicolas <nicoweil@elemental.com>
>     wrote:
>
>     Hello,
>
>     We often see two image-related topics causing interoperability
>     problems, as they are not currently covered by the HLS spec.
>     Standardizing implementations around an official specification for
>     these two points would be great:
>
>     Image-based subtitle tracks
>     For workflow and charset reasons, some content owners don't include
>     text-based subtitles in the live channel sources that they provide to
>     distributors, but rather image-based subtitles (like DVB-Sub). While
>     it's possible to transform these subtitles into IMSC1 Image Profile as
>     per DASH-IF IOP section 6.4.4, there is no equivalent IMSC1 Image
>     Profile support in the HLS RFC, which means that companies will
>     continue to rely on proprietary forks of the HLS RFC to support these
>     use cases. Even if it weren't supported by Apple players, it would be
>     tremendously helpful for interoperability in the rest of the HLS
>     ecosystem.
>
>     I'd like to understand how widely validated the Image Profile of
>     IMSC1 has been. Can anyone volunteer some examples where it's been
>     commercially deployed successfully? (Specifically IMSC1, vs. some
>     other fork of TTML.)
>
>     [NW] IMSC1 Image Profile is now supported by ATSC3, DASH-IF IOP (with
>     support in dash.js) and IMF (with support in IMFTool
>     <https://github.com/IMFTool/IMFTool>, whose development was initially
>     sponsored by Netflix and other studios).
>
>     Another question: there are requirements, in the US at least, for the
>     ability to change font sizes, colors, etc. And, TBH, these are changes
>     that help people worldwide.
>
>     How would you meet those requirements with bitmapped subtitles?
>     Wouldn't it be better to work to eliminate bitmapped subtitles
>     completely?
>
>     [NW] I believe these font size/color change requirements can be
>     satisfied with IMSC1 Text Profile, which has been supported in
>     rfc8216bis since 2017.
>     As much as I'd like to get rid of bitmap subtitles, sometimes the
>     content owners cannot provide anything other than bitmaps in the
>     source feed. And it's very challenging to apply a reliable OCR pass to
>     them for all target languages (Latin/Cyrillic/Asian/… charsets). IMSC1
>     Image Profile has decent industry support, and the Text Profile is
>     already supported in HLS, so I would expect supporting the Image
>     Profile as well to be a natural extension for HLS.
>
>     Image-based trickplay tracks
>     For player resource optimization reasons, using a video track as a
>     trickplay artefact is not always possible, and a lot of player
>     providers recommend image thumbnail tracks instead of special
>     low-framerate video tracks. DASH-IF IOP section 6.2.6 covers this use
>     case, but there is no equivalent support in the HLS RFC. There is the
>     Image Media Playlists HLS extension proposal from
>     Roku/Disney/WarnerMedia here:
>     <https://github.com/image-media-playlist/spec>, but its
>     relevance/adoption is currently limited by the fact that it's not part
>     of the RFC. Same logic here: even if it is not supported by Apple
>     players, which don't need it as they can leverage I-frame tracks, it
>     would be super useful for the rest of the HLS ecosystem to make this
>     officially part of the RFC.
>
>     I'd like to better understand what's driving this. Is the limitation
>     essentially one of not being able to support an AVC decoder for
>     I-frame display?
>
>     If that's the case, then it seems that putting JPEG images into fMP4
>     containers and using EXT-X-I-FRAME-STREAM-INF would be a smaller
>     extension to HLS, both in terms of departure from the existing
>     approach and in terms of new spec to invent.
>
>     One of the things I don't love about the image-media-playlist spec is
>     that it doesn't follow the regular HLS timing model, where the media
>     presentation time is defined in the media data itself. Instead it
>     relies on precise synchronization of the EXTINF values, which seems
>     like a recipe for long-term accumulation of floating-point error, as
>     well as being difficult to achieve with multiple
>     geographically-dispersed packagers for live.
>
>     The limitation is exactly that. A second decoder (AVC or HEVC) is not
>     available on many devices. This also makes mid-fragment switching
>     difficult and makes switching between codecs impossible.
>
>     The image-media-playlist spec does rely somewhat on floating point; no
>     more so than a seek to date or seek to time does in a regular HLS
>     playlist, however. I'm not sure that anyone is asking for precise
>     millisecond switching from these images to regular AV.
>
>     I see two solutions to this problem: give a PTS/timescale in the HLS
>     playlist (something like we did for transport stream to WebVTT
>     timing, but in the playlist), or wrap the JPEG in some sort of
>     container with timing (fMP4?). If that is the route, it would be good
>     to have guidance from Apple on which specification to use.
>
>     The thing about JPEGs is that they are easy; almost any software
>     decodes them. Wrapping them in fMP4 doesn't make them easier or
>     better.
>
>     [NW] Thumbnails in the DASH-IF IOP are simple JPEG images, which makes
>     them easy to produce and to manipulate on the service side (for
>     example, aggregating several live thumbnails into tiles of thumbnails
>     when a program transitions from live to VOD). Using the same simple
>     image container would allow direct interoperability with DASH,
>     without requiring an additional CMAF+DASH specification cycle. As
>     regards HLS, I was hoping that the use of EXT-X-PROGRAM-DATE-TIME
>     would become mandatory, as per the preliminary LL-HLS specification.
>     That would give us the millisecond-accurate time reference that we
>     need to avoid drift if we keep images in a simple image container.
>
> --
> Hls-interest mailing list
> Hls-interest@ietf.org
> https://www.ietf.org/mailman/listinfo/hls-interest
>
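A small illustration of the timing concern discussed in the thread (EXTINF-based timing versus timestamps carried with the media or per segment). This is a sketch under assumed parameters, not taken from the thread or from any specification: 23.976 fps content (24000/1001) cut into hypothetical 50-frame segments, with EXTINF written to three decimal places. It compares segment start times derived three ways: by summing the rounded EXTINF values, from integer ticks in a timescale (the kind of PTS/timescale approach Bill suggests), and from a per-segment millisecond timestamp of the kind EXT-X-PROGRAM-DATE-TIME provides, assuming the packager derives each timestamp from exact media time.

```python
from fractions import Fraction

# Assumed example parameters (hypothetical, not from the thread):
# 23.976 fps content (24000/1001) cut into 50-frame segments, with
# EXTINF written to three decimal places.
TIMESCALE = 24000            # media ticks per second
FRAME_DURATION = 1001        # ticks per frame at 24000/1001 fps
FRAMES_PER_SEGMENT = 50
SEGMENT_TICKS = FRAMES_PER_SEGMENT * FRAME_DURATION


def true_start(n: int) -> Fraction:
    """Exact start time (seconds) of segment n, computed from integer ticks."""
    return Fraction(n * SEGMENT_TICKS, TIMESCALE)


def extinf_value() -> Fraction:
    """EXTINF duration as a client parses it from the playlist text."""
    text = f"{SEGMENT_TICKS / TIMESCALE:.3f}"  # would appear as "#EXTINF:2.085,"
    return Fraction(text)


def start_from_extinf(n: int) -> Fraction:
    """Start of segment n obtained by summing the rounded EXTINF values."""
    return n * extinf_value()


def start_from_pdt(n: int) -> Fraction:
    """Start of segment n from a per-segment timestamp rounded to whole
    milliseconds, measured relative to the first segment (assumption:
    each timestamp is derived from exact media time, then rounded)."""
    return Fraction(round(true_start(n) * 1000), 1000)


for n in (1, 360, 3600):
    extinf_drift = float(start_from_extinf(n) - true_start(n))
    pdt_error = float(start_from_pdt(n) - true_start(n))
    print(f"segment {n:5d}: EXTINF drift {extinf_drift:+.4f} s, "
          f"per-segment timestamp error {pdt_error:+.4f} s")
```

Under these assumptions each written EXTINF value (2.085) is 1/2400 s shorter than the true segment duration (2.0854166… s), so the EXTINF-derived start time drifts by 1.5 s after 3600 segments (about 2 h 5 min). The per-segment timestamp error stays within 0.5 ms without accumulating, and the integer-tick times are exact. In this example the drift comes from rounding in the written EXTINF value rather than from binary floating-point arithmetic itself.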