Re: [AVTCORE] Magnus Westerlund's Discuss on draft-ietf-payload-rtp-ttml-03: (with DISCUSS and COMMENT)

Magnus Westerlund <magnus.westerlund@ericsson.com> Mon, 21 October 2019 13:28 UTC

From: Magnus Westerlund <magnus.westerlund@ericsson.com>
To: "james.sandford@bbc.co.uk" <james.sandford@bbc.co.uk>, "iesg@ietf.org" <iesg@ietf.org>
CC: "draft-ietf-payload-rtp-ttml@ietf.org" <draft-ietf-payload-rtp-ttml@ietf.org>, "roni.even@huawei.com" <roni.even@huawei.com>, "avtcore-chairs@ietf.org" <avtcore-chairs@ietf.org>, "avt@ietf.org" <avt@ietf.org>
Date: Mon, 21 Oct 2019 13:27:54 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/b_XcBRCWvneObJb3axpiW74l3F0>

Hi James, 

Please see inline for responses and comments. 


On Wed, 2019-10-16 at 15:17 +0000, James Sandford wrote:
> Thank you for the detailed review, Magnus. I've responded to your points
> inline.
> 
> Regards,
> James
> 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > James and WG,
> > 
> > I have a couple of issues on which I would like your feedback: should they
> > be corrected before proceeding to publication or not? Note that they are for
> > discussion, and in cases where things have already been discussed and there
> > is consensus, please reference that so that I can take it into consideration
> > when we resolve these.
> > 
> > 1. Section 4.1:
> >        Timestamp:
> >        The RTP Timestamp encodes the time of the text in the packet.
> > 
> >        As timed text is media that has a duration, from a start time to an
> >        end time, and the RTP timestamp is a single time tick in the chosen
> >        clock resolution, the above text is not clear. I would think the
> >        start time of the document would be the most useful value to include?
> 
> Suggested updated text for the Timestamp definition: 
> 
>     The RTP Timestamp encodes the start time of the document in User Data
>     Words.

Okay, and this corresponds to "E" mentioned below? 

> 
> >        I think the text in 4.2.1.2, combined with the above, attempts to
> >        imply that the RTP timestamp will be the 0 reference for the
> >        time-expression?
> > 
> >        I think this needs a bit more clarification. Not having studied
> >        TTML2/1 in detail, I might be missing important details. But some
> >        more information on how the document timebase:media time line
> >        connects to the RTP timestamp appears necessary.
> 
> Suggested updated text for the last sentence of Paragraph 2 of 4.2.1.2: 
> 
>     Computed TTML media times are offset relative to E in accordance with
>     Section I.2 of [TTML2].
> 
> I'm hesitant to expand on the calculation of timing beyond that in this
> document. It is discussed at length in TTML2 which is included as a normative
> reference in this document. I'd like to avoid blurring the scope of this
> document into that of TTML2 if possible.

Okay, I am fine with keeping that in the TTML2 specifications. However, the
linking of the timescales must be clear, and to me it wasn't clear that the RTP
Timestamp value is equal to the instant E that is the basis for the document. 
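To illustrate the linkage under discussion: if the suggested text is adopted, so that the RTP timestamp carries the instant E and computed TTML media times are offsets relative to E, a receiver could place a cue on the RTP clock roughly as in the minimal sketch below. It handles only a small, assumed subset of TTML time expressions (clock-time `hh:mm:ss(.fff)` and offset-time in `h`, `m`, `s`, `ms`; no frames, ticks, or SMPTE time base), and `parse_media_time` and `cue_rtp_timestamp` are hypothetical names, not part of the draft.

```python
import re
from fractions import Fraction

# Assumed subset of TTML time expressions: clock-time and offset-time (h, m, s, ms only).
_CLOCK_TIME = re.compile(r"^(\d{2,}):(\d{2}):(\d{2}(?:\.\d+)?)$")
_OFFSET_TIME = re.compile(r"^(\d+(?:\.\d+)?)(ms|h|m|s)$")
_METRIC_SECONDS = {"h": Fraction(3600), "m": Fraction(60), "s": Fraction(1), "ms": Fraction(1, 1000)}


def parse_media_time(expr: str) -> Fraction:
    """Return a (subset) TTML time expression as exact seconds."""
    match = _CLOCK_TIME.match(expr)
    if match:
        hours, minutes, seconds = match.groups()
        return int(hours) * 3600 + int(minutes) * 60 + Fraction(seconds)
    match = _OFFSET_TIME.match(expr)
    if match:
        return Fraction(match.group(1)) * _METRIC_SECONDS[match.group(2)]
    raise ValueError(f"unsupported time expression: {expr!r}")


def cue_rtp_timestamp(e_rtp_ts: int, expr: str, clock_rate: int) -> int:
    """RTP clock instant of a media time that is offset relative to E (the RTP timestamp)."""
    ticks = round(parse_media_time(expr) * clock_rate)
    return (e_rtp_ts + ticks) % (1 << 32)  # RTP timestamps are 32-bit and wrap around
```

For example, with E carried as RTP timestamp 1000 on a 1 kHz clock, a cue at `00:00:01.500` (or `1500ms`) maps to RTP timestamp 2500.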

This also leads to another warning that likely needs to be in the RTP payload
format. As the document "start time" is not strongly correlated with the
transmission time of the RTP packet, this should be mentioned, since RTCP
measurements based on the RTP timestamp field will contain significant error
components. 

Section 5.2 of RFC 8088 says this: 

   RTP payload formats with a timestamp definition that results in no or
   little correlation between the media time instance and its
   transmission time cause the RTCP jitter calculation to become
   unusable due to the errors introduced on the sender side.  A common
   example is a payload format for a video codec where the RTP timestamp
   represents the capture time of the video frame, but frames are large
   enough that multiple RTP packets need to be sent for each frame
   spread across the framing interval.  It should be noted whether or
   not the payload format has this property.
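A small numeric sketch of the effect described above, using the interarrival jitter estimator from RFC 3550, Section 6.4.1 (J += (|D| - J)/16). The scenario is hypothetical: a 1 kHz clock, documents whose RTP timestamps are their start times E, each sent a varying time ahead of E over a path with a constant 50 ms delay. With timestamps equal to E, the estimate reflects only the sender's varying lead time; with timestamps equal to the send time it is zero.

```python
from typing import Sequence


def rfc3550_jitter(rtp_ts: Sequence[int], arrival: Sequence[int]) -> float:
    """Interarrival jitter per RFC 3550, Section 6.4.1 (inputs in RTP clock units, no wraparound)."""
    jitter = 0.0
    for i in range(1, len(rtp_ts)):
        # D(i-1, i) = (R_i - R_{i-1}) - (S_i - S_{i-1})
        d = (arrival[i] - arrival[i - 1]) - (rtp_ts[i] - rtp_ts[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


# Hypothetical 1 kHz scenario: documents start every 2 s, are sent a varying
# amount of time before their start, and the network delay is constant.
doc_start = [1000, 3000, 5000, 7000, 9000]
send_lead = [500, 100, 900, 300, 700]
send_time = [e - lead for e, lead in zip(doc_start, send_lead)]
arrival = [t + 50 for t in send_time]

jitter_from_doc_start = rfc3550_jitter(doc_start, arrival)  # inflated by sender-side variation
jitter_from_send_time = rfc3550_jitter(send_time, arrival)  # zero: the network adds no jitter
```

With these numbers the estimate based on document start times comes out at roughly 125 ticks (125 ms), even though the path itself is jitter-free.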

> 
> > 2. A Discuss Discuss: Timed Text is directly associated with one or more
> > video and audio streams and requires synchronization with these other media
> > streams to function correctly. This leads to two questions.
> > 
> >        First of all, is "application" actually the right top-level media
> >        type for TTML (application/ttml+xml)? When using SDP, that forces the
> >        use of a separate RTP session unless one has BUNDLE. Many media types
> >        that, like this one, are associated with other media have been
> >        registered under all relevant top-level media types.
> 
> I think we need to be careful with assumptions here. Timed Text MAY be
> associated with Video and/or Audio streams, but there is no requirement to do
> so, just as there is no requirement for Video to be associated with Audio or
> vice versa.
> 
> I'm cautious about opening a can of worms with regard to media types. But my
> personal opinion is that TTML is not video and it is not audio. It may be
> associated with them, but it is not those. It therefore shouldn't have
> registered types for video, audio, etc. If the association of different
> top-level media types is currently difficult, that is an issue I believe
> should be addressed outside of the scope of this document.

Okay; if the need exists, additional media types can be registered. It very
much depends on the RTP usage environment in which TTML2 is used. In an
environment that supports multiple media types in the same RTP session, this is
not an issue: the m= line, or equivalent, for declaring media sources will
simply say that there is one TTML2 media source using the top-level type
application. 

For the older style of one media type per RTP session, the implication of
application/ttml+xml is that one needs an extra RTP session in parallel if one
doesn't already have an application/foo media type in the multimedia session.

I just want to ensure that people intending to use this don't throw a fit when
they realize that their system specification suddenly has to add another RTP
session. 




> 
> >        Secondly, this payload format may need some references to mechanisms
> >        in RTP and signalling whose purpose is to associate media streams. I
> >        also assume that there are interesting localization cases, where
> >        different languages have different timelines for the text and for
> >        how long it is shown, as there are different traditions in different
> >        countries and languages for how subtitles are made.
> > 
> >        This may also point to the need to discuss the "pick one out of n"
> >        mechanism that a manifest may need.
> 
> I believe this is outside of the scope of this document in the same way it is
> for audio equivalents and other text formats.

That depends on whether the very basic mechanisms are sufficient. I do see that
RTCP CNAME works when all media sources from one endpoint are to be
synchronized. In more advanced cases, Lip Synchronization (LS) grouping
signalling should work as long as all TTML2 tracks belong to a single LS group
and there is no overlap. 

However, I think interesting cases arise if one has multiple TTML2 media
sources, one for each language provided, potentially combined with multiple
audio tracks for different dubbings. The question is whether a manifest
description works for this case. I am uncertain whether SDP can actually express
the relations between all these media sources properly. Maybe someone else on
the AVTCORE mailing list can answer this. 
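For the basic case mentioned above, where all media sources come from one endpoint and share an RTCP CNAME, a receiver relates the timelines through RTCP Sender Reports, each of which pairs an NTP wallclock time with an RTP timestamp for the same instant. A minimal sketch of mapping a TTML RTP timestamp onto a video stream's RTP timeline this way; the class and function names are hypothetical, and conversion from the 64-bit NTP format, clock drift, and LS grouping are left out.

```python
from dataclasses import dataclass


@dataclass
class SenderReport:
    """The parts of an RTCP SR (plus SDES CNAME) needed for cross-stream alignment."""
    cname: str
    ntp_seconds: float   # SR NTP timestamp, already converted to seconds
    rtp_timestamp: int   # RTP timestamp for the same instant as ntp_seconds
    clock_rate: int      # RTP clock rate of this media source


def _signed_delta(a: int, b: int) -> int:
    """a - b for 32-bit RTP timestamps, interpreting the difference modulo 2**32."""
    delta = (a - b) & 0xFFFFFFFF
    return delta - (1 << 32) if delta >= (1 << 31) else delta


def to_wallclock(sr: SenderReport, rtp_ts: int) -> float:
    """Wallclock time (seconds) of an RTP timestamp, extrapolated from one SR."""
    return sr.ntp_seconds + _signed_delta(rtp_ts, sr.rtp_timestamp) / sr.clock_rate


def text_to_video_ts(text_sr: SenderReport, video_sr: SenderReport, text_rtp_ts: int) -> int:
    """Video RTP timestamp for the same wallclock instant as a TTML RTP timestamp."""
    if text_sr.cname != video_sr.cname:
        raise ValueError("different CNAMEs: RTCP alone does not relate these timelines")
    wall = to_wallclock(text_sr, text_rtp_ts)
    offset = round((wall - video_sr.ntp_seconds) * video_sr.clock_rate)
    return (video_sr.rtp_timestamp + offset) & 0xFFFFFFFF
```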

> 
> > 3. Section 7.1:
> > 
> >        It may be appropriate to use the same Synchronization Source and
> >        Clock Rate as the related media.
> > 
> >        Using the same SSRC as another media stream in the same RTP Session
> >        is a no-no. If you meant to use multiple RTP sessions and associate
> >        them by using the same SSRC in the different sessions, yes, it works,
> >        but it is not recommended. This points to the need for a clearer
> >        discussion of how to achieve linkage and of the reasons why using the
> >        same RTP timestamp may or may not be useful.
> 
> Thanks for spotting this. It should refer to the Reference Clock, not
> Synchronization Source.
> 
> > 4. Fragmentation:
> >        I think the fragmentation of a TTML document across multiple RTP
> >        payloads is somewhat insufficiently described. I have the impression
> >        that it is hard to do anything more clever than to fill each RTP
> >        payload up to the MTU limitation and send them out in sequence.
> >        However, I think there should be a firm requirement that the packets
> >        of a single document use consecutive RTP sequence numbers. Also, the
> >        reassembly process appears to have two parts for detecting what
> >        belongs together: the same timestamp, and the marker bit set on the
> >        last packet of a document. A receiver can lose the last packet of
> >        the previous document and still know that it has received everything
> >        for the following document. However, if there are multiple losses,
> >        inspection of the reassembled document will be necessary to
> >        determine whether the correct beginning is present. I have the
> >        impression that a proper section discussing these matters of
> >        fragmentation and reassembly is necessary for good interoperability
> >        and function.
> 
> This has been discussed in response to Adam Roach's Discuss.

Yes, Adam brought up one aspect that I hadn't thought about: the splitting of
multi-byte characters. My focus was on the transport aspects, that is, having a
clear description of the behavior of senders and receivers when TTML documents
get split over multiple RTP packets. 
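A minimal sketch of the sender and receiver behavior discussed in point 4, under explicitly assumed rules that the draft does not state: a document's fragments use consecutive sequence numbers and share one RTP timestamp, only the last fragment has the marker bit set, and distinct documents have distinct timestamps. The receiver walks backwards from each marker packet. When exactly one packet before the document is lost, it can still prove completeness if the packet before that is a non-final fragment of an earlier document; otherwise it reports the document as undetermined (`None`), which is where inspecting the reassembled document would come in. Decoding happens only after concatenation, since a byte-level split can cut a multi-byte character, which is the point Adam raised. `RtpPacket`, `fragment`, and `reassemble` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RtpPacket:
    seq: int        # 16-bit RTP sequence number
    timestamp: int  # RTP timestamp; assumed identical for all fragments of one document
    marker: bool    # assumed set only on the last fragment of a document
    payload: bytes


def fragment(document: str, timestamp: int, first_seq: int, max_payload: int) -> List[RtpPacket]:
    """Split one TTML document into consecutively numbered packets, each filled up to max_payload."""
    data = document.encode("utf-8")
    # Byte-level split: a multi-byte UTF-8 character may straddle two packets.
    chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)] or [b""]
    return [
        RtpPacket(seq=(first_seq + n) & 0xFFFF, timestamp=timestamp,
                  marker=(n == len(chunks) - 1), payload=chunk)
        for n, chunk in enumerate(chunks)
    ]


def reassemble(received: List[RtpPacket]) -> Dict[int, Optional[str]]:
    """Map each timestamp with a received marker packet to its document text, or None if unproven."""
    by_seq = {p.seq: p for p in received}
    result: Dict[int, Optional[str]] = {}
    for last in received:
        if not last.marker:
            continue
        parts = [last.payload]
        seq = last.seq
        while True:
            prev = by_seq.get((seq - 1) & 0xFFFF)
            if prev is not None and prev.timestamp == last.timestamp and not prev.marker:
                parts.append(prev.payload)  # same document: keep walking backwards
                seq = prev.seq
                continue
            if prev is not None:
                complete = True  # prev is the final fragment of an earlier document
            else:
                # One packet is missing. If the packet before it is a non-final fragment
                # of another document, the lost packet must be that document's tail.
                before = by_seq.get((seq - 2) & 0xFFFF)
                complete = (before is not None and before.timestamp != last.timestamp
                            and not before.marker)
            break
        # Decode only after concatenation so that split characters are rejoined.
        result[last.timestamp] = b"".join(reversed(parts)).decode("utf-8") if complete else None
    return result
```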


> 
> > 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > A. Section 6.
> >        To my understanding, a TTML document basically cannot be encoded more
> >        efficiently. A poor generator can create unnecessarily verbose XML
> >        that could be shorter, but there is no possibility here to trade off
> >        media quality for a lower bit rate. I think that should be made more
> >        explicit in Section 6.
> 
> Again, this has been discussed in response to Adam's Discuss.
> 
> > B. Section 7.
> >        Wouldn't using 90 kHz be the better default? 1 kHz is the minimum
> >        rate at which RTCP reports work decently. However, if the timed text
> >        is primarily going to be synchronized with video, 90 kHz does ensure
> >        that (sub-)frame-precise timing can be expressed. I don't see any
> >        need for raster-line-specific timing for timed text, so the SMPTE
> >        27 MHz clock is not needed. And using a non-default rate for
> >        subtitling radio etc. appears fine.
> 
> Could you justify a 90k default? Such a high rate isn't appropriate for most
> timed text use cases. As stated in my response to "2." above, there is no
> requirement for TTML to be used alongside video so defining a default rate
> based on video would be inappropriate. Just as we don't expect video to use a
> 48k rate or audio a 90k rate. The default for TTML should be appropriate to
> timed text and should not make assumptions about specific implementations.

Sure, leave the 1 kHz clock in as the default. However, if one synchronizes with
other media, using the same timebase has its benefits, as it enables precise
synchronization between the two, for example to get the subtitles to
synchronize with the correct video frame. Otherwise, I see that one-frame errors
could occur. Not that they are necessarily very visible to the user, but
especially at scene cuts, a subtitle that shows up on the last frame of the
previous scene rather than on the first frame of the new one can be visible.
Thus, I think you should consider a recommendation that, in cases where one
wants to synchronize the TTML with other media, matching clock frequencies be
selected for the RTP timestamp to avoid these issues. 
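The one-frame error can be reproduced with a few lines of arithmetic. This sketch assumes 30000/1001 fps video on a 90 kHz clock, a sender that rounds cue times to the nearest tick of the text clock, a receiver that shows a cue on whichever frame is on screen at the cue instant, and timelines that are already aligned (for example through RTCP). The constants and function names are illustrative only.

```python
from fractions import Fraction

VIDEO_CLOCK = 90_000  # RTP clock rate of the (assumed) video stream
FRAME_TICKS = 3003    # one frame at 30000/1001 fps on the 90 kHz clock


def rescale(ticks: int, from_rate: int, to_rate: int) -> int:
    """Convert a timestamp value between clock rates, rounding to the nearest tick."""
    return round(Fraction(ticks * to_rate, from_rate))


def frame_on_screen(video_ticks: int) -> int:
    """Index of the video frame being displayed at a 90 kHz instant."""
    return video_ticks // FRAME_TICKS


def subtitle_frame(cue_frame: int, text_clock: int) -> int:
    """Frame on which a cue meant to start at cue_frame appears after passing through text_clock."""
    intended = cue_frame * FRAME_TICKS                     # exact start of the intended frame
    carried = rescale(intended, VIDEO_CLOCK, text_clock)  # what the TTML RTP timestamp can express
    return frame_on_screen(rescale(carried, text_clock, VIDEO_CLOCK))
```

With a 1 kHz text clock, a cue meant for frame 1 (about 33.37 ms) is carried as 33 ms and first appears on frame 0, the last frame of the previous scene in the scene-cut case; with a matching 90 kHz clock, every cue lands on its intended frame.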


> 
> > C. Repair operations and relation to documents. Based on basic properties of
> > TTML documents, I do think repair operations should target single
> > documents, as there are likely seconds between documents, while the
> > fragments of a document will be sent within a rather short interval. That
> > recommendation would be good to include.
> 
> Also discussed in response to Adam's Discuss.

Fine
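On targeting repair at single documents: assuming RTP/AVPF with RFC 4585 Generic NACK is the repair mechanism, and that the receiver already knows one document's sequence-number range (for example from the timestamp change and the marker bit), the missing fragments of that document could be requested in a single feedback message along the lines of this sketch. The helper names are hypothetical, and 16-bit sequence number wraparound is not handled.

```python
import struct
from typing import Iterable, List


def missing_in_document(first_seq: int, last_seq: int, received: Iterable[int]) -> List[int]:
    """Sequence numbers missing from one document's fragment run (no wraparound assumed)."""
    have = set(received)
    return [s for s in range(first_seq, last_seq + 1) if s not in have]


def generic_nack_fci(lost: List[int]) -> bytes:
    """Pack lost sequence numbers into RFC 4585 Generic NACK FCI entries (16-bit PID, 16-bit BLP)."""
    pending = sorted(set(lost))  # assumes no 16-bit wraparound within the list
    out = bytearray()
    i = 0
    while i < len(pending):
        pid = pending[i]
        blp = 0
        j = i + 1
        # The least significant BLP bit reports PID + 1, the most significant PID + 16.
        while j < len(pending) and pending[j] - pid <= 16:
            blp |= 1 << (pending[j] - pid - 1)
            j += 1
        out += struct.pack("!HH", pid, blp)
        i = j
    return bytes(out)
```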

Cheers

Magnus Westerlund 


----------------------------------------------------------------------
Network Architecture & Protocols, Ericsson Research
----------------------------------------------------------------------
Ericsson AB                 | Phone  +46 10 7148287
Torshamnsgatan 23           | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden | mailto: magnus.westerlund@ericsson.com
----------------------------------------------------------------------