Re: [AVTCORE] [Phishing Risk] Re: [External] Re: Comments on draft-ietf-avtcore-rtp-vvc-05.txt

Martin Pettersson M <martin.m.pettersson@ericsson.com> Tue, 10 November 2020 14:04 UTC

From: Martin Pettersson M <martin.m.pettersson@ericsson.com>
To: "Mo Zanaty (mzanaty)" <mzanaty@cisco.com>, Ye-Kui Wang <yekui.wang@bytedance.com>, "'Sanchez de la Fuente, Yago'" <yago.sanchez@hhi.fraunhofer.de>
CC: 'Stephan Wenger' <stewe@stewe.org>, "avt@ietf.org" <avt@ietf.org>
Thread-Topic: [AVTCORE] [Phishing Risk] Re: [External] Re: Comments on draft-ietf-avtcore-rtp-vvc-05.txt
Thread-Index: AQHWs81Nk1K4Xs+O1EWjarimimqgaam6xl6wgASViwCAAZNHgIAAfSOw
Date: Tue, 10 Nov 2020 14:04:36 +0000
Message-ID: <HE1PR0702MB3642F43A812FF436CC06ADDBCAE90@HE1PR0702MB3642.eurprd07.prod.outlook.com>
References: <HE1PR0702MB36425058B8AEE97940A0C736CA110@HE1PR0702MB3642.eurprd07.prod.outlook.com> <3584D9AA-D447-4F5C-9302-AD07629B838D@stewe.org> <1eaa01d6b3b5$64584fd0$2d08ef70$@bytedance.com> <EBD3EEA4-4A18-4063-8F98-A3A4A2FF7ECD@hhi.fraunhofer.de> <205101d6b3cd$42453df0$c6cfb9d0$@bytedance.com> <HE1PR0702MB36428650A8ECE9D1B2AD3C33CAED0@HE1PR0702MB3642.eurprd07.prod.outlook.com> <067801d6b661$c31a4510$494ecf30$@bytedance.com> <83BF2A3B-A479-43CE-B730-ABAEB043A268@cisco.com>
In-Reply-To: <83BF2A3B-A479-43CE-B730-ABAEB043A268@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/8w1xai9kXEiIeN7gxlx5LZFdW3Y>
Subject: Re: [AVTCORE] [Phishing Risk] Re: [External] Re: Comments on draft-ietf-avtcore-rtp-vvc-05.txt
X-BeenThere: avt@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Audio/Video Transport Core Maintenance <avt.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/avt>, <mailto:avt-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/avt/>
List-Post: <mailto:avt@ietf.org>
List-Help: <mailto:avt-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/avt>, <mailto:avt-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 10 Nov 2020 14:04:55 -0000

Hi Mo, Ye-Kui, All,



I think it sounds like a good idea to add proper frame marking for general GDR. GDR together with decoding units (DUs) are two of the main enablers for low-delay video, which is becoming increasingly important in the emerging 5G landscape. Your idea of how the frame marking can be specified for GDR makes sense to me.



Regarding using GDR for FIR response, I think there are scenarios where it would be useful to allow a GDR picture as a response to a FIR message. Below are a few examples:

1.      In (5G) scenarios where it would be of interest to keep the E2E delay as low as possible, e.g. in remote controlling of robots, vehicles, etc. Although the initial join delay in general is higher when using GDR pictures compared to IRAP pictures, the E2E delay of the stream is significantly lower. See for example the analysis made by Huawei in JVET-N0114<http://phenix.it-sudparis.eu/jvet/doc_end_user/documents/14_Geneva/wg11/JVET-N0114-v1.zip> where the E2E delay when using GDR pictures is about a third compared to when IRAP pictures are used.
2.      In scenarios where there are multiple receivers of the same low-delay stream, e.g. in a multiparty call or in an online teaching scenario. In the case where a new receiver would like to tune in to the stream, it sends a FIR message. If the response to the FIR message is to send an IDR picture in the stream to all receivers, this could cause a visible artifact (intra pulsing or glitch) for all other receivers. If instead a GDR picture is sent in the stream, the joining receiver can tune in without the quality being compromised for the other receivers.
3.      In scenarios where the bandwidth would not be sufficient to send an IRAP picture. In this case an IRAP picture could of course be encoded with a reduced number of bits, which on the other hand may significantly degrade the quality at the beginning of the stream. The IRAP picture could also be sent over a longer duration; however, this would cause a longer E2E delay.

Best regards,

Martin





From: Mo Zanaty (mzanaty) <mzanaty@cisco.com>
Sent: 10 November 2020 07:33
To: Ye-Kui Wang <yekui.wang@bytedance.com>; Martin Pettersson M <martin.m.pettersson@ericsson.com>; 'Sanchez de la Fuente, Yago' <yago.sanchez@hhi.fraunhofer.de>
Cc: 'Stephan Wenger' <stewe@stewe.org>; avt@ietf.org
Subject: Re: [AVTCORE] [Phishing Risk] Re: [External] Re: Comments on draft-ietf-avtcore-rtp-vvc-05.txt



Hi Ye-Kui, Martin, all,



I recall GDR discussions during early versions of frame marking. This was before GDR was elevated to a normative part of VVC. GDR implementations at that time were highly implementation specific, often not even using the SEI messages defined in AVC/HEVC. There was also a related intra refresh technique arguably identical to GDR but spanning many more frames, more for error resilience than switching points, and again not using SEI and highly implementation specific. Accommodating these GDR flavors in consistent and compact frame marking signaling was not pushed for in frame marking at that time.



Now that GDR is a normative part of VVC, it may be more important to signal it properly in frame marking. The single I bit is clearly insufficient. Additional bits are needed to signal GDR info (e.g. start, middle, end). If the WG wants to add GDR info at this late stage, I can propose some changes. This would certainly need yet another WGLC. So I would first like to hear from the WG and chairs about adding GDR info, then we can figure out the changes needed.



To be clear, this is to signal general GDR, not the special degenerate case of a single frame (count=0) GDR which is essentially an IDR. If folks only care about the latter, that is a much simpler editorial fix similar to what YK suggests below with additional restrictions about count=0. But signaling general GDR requires substantive changes.
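The distinction between general GDR and the degenerate count=0 case can be sketched as follows. This is only an illustration, not proposed frame marking syntax: the function name `gdr_phase` and the labels it returns are hypothetical, and it assumes the recovery point is located at the POC of the GDR picture plus the recovery count, with pictures compared by POC.

```python
def gdr_phase(gdr_poc: int, recovery_poc_cnt: int, poc: int) -> str:
    """Classify a picture relative to a GDR period (hypothetical labels).

    Assumes the recovery point POC equals the GDR picture's POC plus
    recovery_poc_cnt.
    """
    recovery_poc = gdr_poc + recovery_poc_cnt
    if recovery_poc_cnt == 0 and poc == gdr_poc:
        # Degenerate single-frame GDR: fully refreshed at once, like an IDR.
        return "single"
    if poc == gdr_poc:
        return "start"
    if poc == recovery_poc:
        return "end"
    if gdr_poc < poc < recovery_poc:
        return "middle"
    return "outside"


print(gdr_phase(100, 4, 102))
```

A single I bit can distinguish at most "single" from everything else, which is why start/middle/end would need additional bits.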



Best regards,

Mo





From: Ye-Kui Wang <yekui.wang@bytedance.com>
Date: Monday, November 9, 2020 at 1:30 AM
To: 'Martin Pettersson M' <martin.m.pettersson@ericsson.com>, "'Sanchez de la Fuente, Yago'" <yago.sanchez@hhi.fraunhofer.de>, "Mo Zanaty (mzanaty)" <mzanaty@cisco.com>
Cc: Stephan Wenger <stewe@stewe.org>, "avt@ietf.org" <avt@ietf.org>
Subject: RE: [AVTCORE] [Phishing Risk] Re: [External] Re: Comments on draft-ietf-avtcore-rtp-vvc-05.txt



Hi Martin, All,



[Explicitly adding Mo, for at least the question on draft-ietf-avtext-framemarking.]



Regarding FIR response: The key point on whether we should allow GDR as a response, I think, is whether additionally allowing it would be an improvement compared to not allowing it. In other words, is there a scenario wherein responding with a GDR picture is better than responding with an IDR picture? If yes, then what you suggested (to allow a receiver to request whether the Decoder Refresh Point needs to be an IDR picture, a GDR picture or either of the two) sounds like the right approach.



Regarding frame marking of the I bit: To me, the phrase “can be decoded” in the semantics means “can be correctly decoded”, similar to the use of “decodable” in the VVC text itself, wherein “decodable” without being preceded by “correctly” often actually refers to “correctly decodable”. And indeed, with this interpretation, all intra coded pictures qualify, including a GDR picture with recovery_poc_cnt equal to 0, but also including an intra picture that is not any of the following: VPX keyframe, H.264 IDR, H.265 IRAP, H.266 IRAP/GDR. … Hmm, but then the setting of the bit in the VVC RTP payload format draft is not correct: “The I bit MUST be 1 when the NAL unit type is 7-9 (inclusive), otherwise it MUST be 0.”



Maybe the authors (e.g., Mo) of draft-ietf-avtext-framemarking can help comment on the intention? I.e., in the following text:



   o  I: Independent Frame (1 bit) - MUST be 1 for frames that can be
      decoded independent of temporally prior frames, e.g. intra-frame,
      VPX keyframe, H.264 IDR [RFC6184<https://tools.ietf.org/html/rfc6184>], H.265 IDR/CRA/BLA/IRAP
      [RFC7798<https://tools.ietf.org/html/rfc7798>]; otherwise MUST be 0.



Is the phrase “can be decoded” intended to mean “can be correctly decoded”, or is this bit intended to mark a random access point? If the latter, then the wording should be changed to be something like the following:



   o  R: Random-accessible Picture (1 bit) - MUST be 1 for a picture that can be used as a random access point, e.g., VPX keyframe, H.264 IDR picture or picture associated with a recovery point SEI message [RFC6184<https://tools.ietf.org/html/rfc6184>], H.265 IRAP picture or picture associated with a recovery point SEI message [RFC7798<https://tools.ietf.org/html/rfc7798>]; otherwise MUST be 0.



BTW, I replaced “IDR/CRA/BLA/IRAP” above with “IRAP”, as IRAP can be any of IDR/CRA/BLA.



In any case, either draft-ietf-avtcore-rtp-vvc-05 or draft-ietf-avtext-framemarking needs to be changed to be aligned with each other.



BR, YK



From: Martin Pettersson M <martin.m.pettersson@ericsson.com>
Sent: Friday, November 6, 2020 6:53
To: Ye-Kui Wang <yekui.wang@bytedance.com>; 'Sanchez de la Fuente, Yago' <yago.sanchez@hhi.fraunhofer.de>
Cc: 'Stephan Wenger' <stewe@stewe.org>; avt@ietf.org
Subject: RE: [AVTCORE] [Phishing Risk] Re: [External] Re: Comments on draft-ietf-avtcore-rtp-vvc-05.txt



Thanks Ye-Kui, Yago and Stephan for all the good comments. Please see my responses below.



FIR message for CRA pictures



Thanks Ye-Kui for clarifying why CRA pictures were not allowed as a response to a FIR message for HEVC. I agree that it would not be necessary for a sender to send a CRA picture instead of an IDR picture as a response to a FIR message (although I don’t see the harm in allowing it).



FIR message for VVC GDR pictures



> Firstly, regarding the suggestion of changing “Upon reception of a FIR, a sender must send an IDR picture.” to “Upon reception of a FIR, a sender must
> send an IDR, a CRA or a GDR picture.” Herein, my hesitation is because a FIR, as the name indicates, requests a full intra picture that immediately
> stops any prediction from earlier pictures, hoping that once this is received by the receiver, all pictures and all picture areas are correct. GDR won’t do
> that. And note that we are also in a low delay environment; using GDR would need the receiver to wait much longer (than using IDR) to have correct full
> pictures. Note that RFC 5104 also mentions that using GDR the user experience would not be as good. My personal opinion is, if using GDR were really
> acceptable, we would have added that in RFC 7798 (the HEVC RTP payload format), even though in HEVC GDR is indicated by the recovery point SEI
> message instead of by a NAL unit type as in VVC.



I think that the key aspect for GDR in VVC compared to the previous standards is that the GDR picture is now normative and fully specified in VVC, as pointed out by Stephan. Thus, every VVC decoder would be required to be able to tune in at a GDR picture, which is far from guaranteed for HEVC and AVC where a non-normative SEI message is used. Moreover, when tuning in at a GDR picture in VVC, the recovery point picture and the following pictures are guaranteed to be an exact match to what would result if the decoding had started at an IRAP picture prior to the GDR picture. For HEVC and AVC, the exact match is not guaranteed. Thus, I would expect the user experience to be much better for VVC than for HEVC and AVC.



I agree with you, Ye-Kui, that there may be an expectation in some circumstances from the receiver sending the FIR message that the response from the sender should be a picture that may be directly decoded and displayed. As a suggestion to address the expectations of the receiver, an option could be to allow a receiver to request whether the Decoder Refresh Point needs to be an IDR picture, a GDR picture or either of the two. This may be specified either as a parameter in the FIR message or as a new request message, e.g. a refresh request message.
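To make the suggestion concrete, here is a minimal sketch of how a sender might act on such a request. Everything in it is hypothetical: neither RFC 5104 nor the draft defines a `RefreshPreference` parameter, and the policy for the "either" case (prefer GDR on low-delay streams) is just one possible choice.

```python
from enum import Enum


class RefreshPreference(Enum):
    """Hypothetical receiver preference carried with a refresh request."""
    IDR_ONLY = "idr"
    GDR_ONLY = "gdr"
    EITHER = "either"


def choose_refresh(pref: RefreshPreference, low_delay_stream: bool) -> str:
    """Pick the Decoder Refresh Point type the sender will produce."""
    if pref is RefreshPreference.IDR_ONLY:
        return "IDR"
    if pref is RefreshPreference.GDR_ONLY:
        return "GDR"
    # Receiver accepts either: assume the sender favors GDR when it is
    # trying to keep the end-to-end delay low, and IDR otherwise.
    return "GDR" if low_delay_stream else "IDR"


print(choose_refresh(RefreshPreference.EITHER, low_delay_stream=True))
```

A receiver that needs a directly displayable picture would send the IDR-only preference, which preserves today's FIR behavior.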



Frame marking for GDR pictures



Ye-Kui:

> Thirdly, adding GDR to the I bit constraint confuses both the name and the semantics of the bit, “independent of temporally prior frames”. To me, if
> an indication of GDR is important, that should be included in draft-ietf-avtext-framemarking, preferably using a separate indication, and if so, it should
> also include GDR in AVC and HEVC indicated by the recovery point SEI message, and then this draft would carry that over in the same way as it carries over
> the I bit. (BTW, Shuai, we should update the status of the [FrameMarking] reference.)



Yago:

> As for the I bit I agree with Ye-Kui that it seems cleaner to me that if an indication of GDR is desirable, a separate indication could be included in
> draft-ietf-avtext-framemarking instead of reusing the existing one that states that a frame is independent. However, should we allow or disallow GDR
> pictures with ph_recovery_poc_cnt equal to 0 for this case?



I agree that it may cause confusion and that it would make sense to have a separate indication for GDR pictures in draft-ietf-avtext-framemarking as suggested.



But if we decide to not mark GDR pictures with the I-bit, then I think the semantics for the I-bit in draft-ietf-avtext-framemarking need to be rephrased as I think it would otherwise cause confusion the other way.



“I: Independent Frame (1 bit) - MUST be 1 for frames that can be decoded independent of temporally prior frames”



As it is stated now, in my interpretation, it suggests that GDR pictures must have the I-bit set since a GDR picture can be decoded (i.e. the decoding could be started) without using any temporally prior frames. In particular, for a GDR picture with recovery_poc_cnt equal to 0, the GDR picture can be fully decoded without dependency on any other frame.



BR, Martin





From: avt <avt-bounces@ietf.org> On Behalf Of Ye-Kui Wang
Sent: 6 November 2020 00:42
To: 'Sanchez de la Fuente, Yago' <yago.sanchez@hhi.fraunhofer.de>
Cc: 'Stephan Wenger' <stewe@stewe.org>; avt@ietf.org; 'Martin Pettersson M' <martin.m.pettersson=40ericsson.com@dmarc.ietf.org>
Subject: Re: [AVTCORE] [Phishing Risk] Re: [External] Re: Comments on draft-ietf-avtcore-rtp-vvc-05.txt



Hi Yago,



Regarding GDR pictures with ph_recovery_poc_cnt equal to 0, my opinion is similar to that for CRA pictures, i.e., for an encoder responding to a FIR, there is no reason to encode an intra refresh picture as CRA or as GDR with ph_recovery_poc_cnt equal to 0 instead of as IDR.



BR, YK



From: Sanchez de la Fuente, Yago <yago.sanchez@hhi.fraunhofer.de>
Sent: Thursday, November 5, 2020 13:12
To: Ye-Kui Wang <yekui.wang@bytedance.com>
Cc: Stephan Wenger <stewe@stewe.org>; Martin Pettersson M <martin.m.pettersson=40ericsson.com@dmarc.ietf.org>; shuaiizhao(Shuai Zhao) <shuaiizhao@tencent.com>; avt@ietf.org
Subject: [Phishing Risk] Re: [AVTCORE] [External] Re: Comments on draft-ietf-avtcore-rtp-vvc-05.txt



Dear Ye-Kui, all,



After reading Ye-Kui’s email I thought again about my previous email, and even using GDR pictures with ph_recovery_poc_cnt equal to 0 would be questionable as a response to a FIR, since in that case the same picture could have a NAL unit type of IDR. By the same reasoning Ye-Kui gave for CRA, there is no good reason for the encoder to encode an intra refresh picture as a GDR picture with ph_recovery_poc_cnt equal to 0 instead of as IDR if it could be an IDR.



As for the I bit I agree with Ye-Kui that it seems cleaner to me that if an indication of GDR is desirable, a separate indication could be included in draft-ietf-avtext-framemarking instead of reusing the existing one that states that a frame is independent. However, should we allow or disallow GDR pictures with ph_recovery_poc_cnt equal to 0 for this case?



Best regards,

Yago Sánchez



---

Department Video Coding & Analytics

Group Multimedia Communications

Fraunhofer HHI - Heinrich Hertz Institute
Einsteinufer 37, 10587 Berlin, Germany
http://www.hhi.fraunhofer.de/ip/mc

Tel.: +49 30 310 02663

yago.sanchez@hhi.fraunhofer.de







   On 5. Nov 2020, at 21:51, Ye-Kui Wang <yekui.wang@bytedance.com> wrote:



   Hi Martin, Stephan, All,



   I was also hesitating to say yes when I first saw most of the suggestions, so much so that I was hoping Stephan et al would reply and address them 😊



   Great that Stephan did respond. Thanks!



   Now a few additional comments from my side.



   Firstly, regarding the suggestion of changing “Upon reception of a FIR, a sender must send an IDR picture.” to “Upon reception of a FIR, a sender must send an IDR, a CRA or a GDR picture.” Herein, my hesitation is because a FIR, as the name indicates, requests a full intra picture that immediately stops any prediction from earlier pictures, hoping that once this is received by the receiver, all pictures and all picture areas are correct. GDR won’t do that. And note that we are also in a low delay environment; using GDR would need the receiver to wait much longer (than using IDR) to have correct full pictures. Note that RFC 5104 also mentions that using GDR the user experience would not be as good. My personal opinion is, if using GDR were really acceptable, we would have added that in RFC 7798 (the HEVC RTP payload format), even though in HEVC GDR is indicated by the recovery point SEI message instead of by a NAL unit type as in VVC.



   Secondly, on adding CRA. Herein my hesitation is because CRA is not really supposed to be used in low-delay conversational application environments. If as an encoder you don’t plan to have some associated leading pictures encoded for a CRA picture, there is no reason for the encoder to encode an intra refresh picture as CRA instead of as IDR. That’s why we did not allow CRA as a response to FIR in RFC 7798.



   Thirdly, adding GDR to the I bit constraint confuses both the name and the semantics of the bit, “independent of temporally prior frames”. To me, if an indication of GDR is important, that should be included in draft-ietf-avtext-framemarking, preferably using a separate indication, and if so, it should also include GDR in AVC and HEVC indicated by the recovery point SEI message, and then this draft would carry that over in the same way as it carries over the I bit. (BTW, Shuai, we should update the status of the [FrameMarking] reference.)



   However, adding the GDR abbreviation is good, and I think we should also add a brief description of GDR into clause 1.1.2 (Systems and Transport Interfaces).



   BR, YK



   From: avt <avt-bounces@ietf.org> On Behalf Of Stephan Wenger
   Sent: Thursday, November 5, 2020 10:50
   To: Martin Pettersson M <martin.m.pettersson=40ericsson.com@dmarc.ietf.org>; shuaiizhao(Shuai Zhao) <shuaiizhao@tencent.com>; avt@ietf.org
   Subject: [External] Re: [AVTCORE] Comments on draft-ietf-avtcore-rtp-vvc-05.txt



   Hi Martin,



   Thanks for those suggested changes, which I think would consistently implement the option to react to a “Full Intra Request” (FIR) with a gradual decoder refresh (GDR) series of pictures.



   As to whether we should allow GDR as a reaction to a FIR: I’m a bit torn here.  Arguments can be made either way, please see below.  Would others in the WG please weigh in?



   On one hand, the argument (somewhat rephrased here) that VVC’s GDR pictures are a fully specified replacement for traditional “all intra” pictures is a good one.  I concur that this is somewhat new in VVC, compared to older video coding standards.  Pretty much all of those could do some form of GDR (even good old H.261 and MPEG-2), but things were clumsy, results were not guaranteed, or one had to rely on SEI messages and similar exotics for implementation.  In the environments where FIR matters—video conferencing mostly—no one ever used GDR in any context except in those ca. 1990 H.261-based systems which didn’t implement full intra pictures at all, and relied on intra macroblock walk-around during the initial communication setup.



   On the other hand, there’s a reason why FIR until now was consistently interpreted as a requirement of sending a single “all intra” picture (whatever that translates to in the various video coding standards and technologies).  That reason was related to the architecture of the MCUs that were around when RFC 5104 was written, back in the 2005-2008 timeframe.  What people requested then was that the internal architecture of an MCU should stay as independent of the codec in use as possible.  For FIR, that means: if an MCU sends out a FIR to a sending endpoint, it expects exactly one intra picture at the earliest opportunity that can be used to sync in added decoders of unknown state.  That logic would now have to change to receive either a single IDR picture or a series of pictures that make up a GDR.  A transcoding MCU would have to go further and include the decoding of those multiple pictures with all the tricky (though now fully specified!) stuff that goes on in VVC, before transcoding.



   Stephan





   From: avt <avt-bounces@ietf.org> on behalf of Martin Pettersson M <martin.m.pettersson=40ericsson.com@dmarc.ietf.org>
   Date: Tuesday, November 3, 2020 at 08:38
   To: "shuaiizhao(Shuai Zhao)" <shuaiizhao@tencent.com>, "avt@ietf.org" <avt@ietf.org>
   Subject: [AVTCORE] Comments on draft-ietf-avtcore-rtp-vvc-05.txt



   Hi,



   Thanks for the good progress on the VVC RTP payload format. Below are some suggested modifications for your consideration:



   1.   In section 3.2, add “GDR                       Gradual Decoding Refresh”



   2.   In section 8.4, change “Upon reception of a FIR, a sender must send an IDR picture.” to “Upon reception of a FIR, a sender must send an IDR, a CRA or a GDR picture.”



   Motivation:

   One of the versatile features in VVC is its support for low-latency coding where the GDR picture is a key component to achieve low latency. Compared to AVC and HEVC where GDR is signaled in an SEI message with optional support by the decoder, the GDR picture in VVC is a normative part of the specification and the decoder must be able to tune in at a GDR picture. Therefore it makes sense to allow a sender to respond with a GDR picture upon receiving a FIR. Note also that a gradual decoding refresh point is mentioned as a possible Decoder Refresh Point in response to the FIR command in https://tools.ietf.org/html/rfc5104.



   Sending a CRA picture as a response to FIR would be fine as well in my opinion. I don’t see the reason to exclude that.





   3.   In section 9.1, change “The I bit MUST be 1 when the NAL unit type is 7-9 (inclusive), otherwise it MUST be 0.” to “The I bit MUST be 1 when the NAL unit type is 7-10 (inclusive), otherwise it MUST be 0.”



   In section 9.2, change “The I bit MUST be 1 when the NAL unit type is 7-9 (inclusive), otherwise it MUST be 0.” to “The I bit MUST be 1 when the NAL unit type is 7-10 (inclusive), otherwise it MUST be 0.”



   Motivation:

   NAL unit type 10 is GDR_NUT.



   In https://tools.ietf.org/id/draft-ietf-avtext-framemarking-09.html the I bit is specified as:

   I: Independent Frame (1 bit) - MUST be 1 for frames that can be decoded independent of temporally prior frames, e.g. intra-frame, VPX keyframe, H.264 IDR [RFC6184], H.265 IDR/CRA/BLA/RAP [RFC7798]; otherwise MUST be 0.



   The GDR picture is typically not fully refreshed in one frame, but it does not need prior temporal pictures to start the decoding process, i.e. a bitstream that starts with a GDR picture in VVC is a valid bitstream.
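The difference between the current draft text (types 7-9) and the proposed text (types 7-10) can be illustrated with a short sketch. The function names and the `include_gdr` flag are hypothetical; the constant names follow the VVC specification, and the header parsing assumes the two-byte VVC NAL unit header in which the type occupies the upper five bits of the second byte.

```python
# VVC NAL unit type values for the range discussed above.
IDR_W_RADL, IDR_N_LP, CRA_NUT, GDR_NUT = 7, 8, 9, 10


def nal_unit_type(nal_header: bytes) -> int:
    """Extract the 5-bit NAL unit type from a two-byte VVC NAL unit header."""
    if len(nal_header) < 2:
        raise ValueError("a VVC NAL unit header is two bytes long")
    return (nal_header[1] >> 3) & 0x1F


def i_bit(nut: int, include_gdr: bool = False) -> int:
    """I bit per the draft text (7-9) or, if include_gdr, the proposal (7-10)."""
    upper = GDR_NUT if include_gdr else CRA_NUT
    return 1 if IDR_W_RADL <= nut <= upper else 0


# Header for a GDR_NUT NAL unit in layer 0 with nuh_temporal_id_plus1 = 1.
header = bytes([0x00, (GDR_NUT << 3) | 1])
nut = nal_unit_type(header)
print(nut, i_bit(nut), i_bit(nut, include_gdr=True))
```

Under the current text a GDR picture gets I = 0, and under the proposed text it gets I = 1; all other types are unaffected.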



   Best regards,

   Martin Pettersson