Re: [AVTCORE] [New-wg-docs] I-D Action: draft-ietf-avtcore-rtp-vvc-00.txt(Internet mail)

"Roni Even (A)", Mon, 23 March 2020 05:38 UTC

From: "Roni Even (A)"
To: "shuaiizhao(Shuai Zhao)", "Sanchez de la Fuente, Yago"
Date: Mon, 23 Mar 2020 05:37:58 +0000

Hi Shuai Zhao,
You can add a new co-author if he contributed text to the document.
Roni Even
AVTCore co-chair

From: shuaiizhao(Shuai Zhao)
Sent: Tuesday, March 17, 2020 4:16 AM
To: Sanchez de la Fuente, Yago
Cc: shuaiizhao(Shuai Zhao); Roni Even (A)
Subject: Re: [AVTCORE] [New-wg-docs] I-D Action: draft-ietf-avtcore-rtp-vvc-00.txt(Internet mail)

Dear Chair,

Yago has provided very valuable comments on this work and has already integrated the improvements into his working draft. Is there any objection to adding him as a co-author so that he can submit a new WG draft?

Shuai Zhao

From: "Sanchez de la Fuente, Yago"
Date: Monday, March 16, 2020 at 12:34
To: "shuaiizhao(Shuai Zhao)"
Subject: Re: [AVTCORE] [New-wg-docs] I-D Action: draft-ietf-avtcore-rtp-vvc-00.txt(Internet mail)

Dear Shuai, all,

I have reviewed the draft and have the following comments, all of an editorial or non-normative nature:

1) [Page 5] The introduction to BDOF reads: “Bi-directional optical flow (BDOF) is a similar method to DMVR but at 4x4 sub-block level. Another difference is that DMVR is based on block matching while BDOF derives MVs with equations.” However, BDOF does not modify the MVs; it adds a sample-wise offset in the weighted prediction step. The text should rather read something like: “Bi-directional optical flow (BDOF) adds a sample-wise offset at 4x4 sub-block level that is derived with equations based on gradients of the referenced blocks.”

2) [Page 6] The text about the constraint flags reads: “It further optionally includes constraint flags, which indicate that the video bitstream will be constraint…”. The syntax elements for the constraints are not optional, though; only setting them is optional. Therefore, I would remove the word “optionally” and clarify that setting the constraints is optional.

3) [Page 7] The text about the SPS mentions CLVS and random access. I think it would be useful to state that a CLVS in VVC might start with a GDR NUT. I am not sure how frequently this will be used, but we could make it clear here so that there is no expectation of an IDR or CRA in the bitstream, as neither is required for VVC.

4) [Page 7] It could be useful to make clear that a picture might refer to more than one APS. Note that the APS for the LMCS must be the same for all slices in the picture, but different slices within a picture might refer to different ALF-APSs, and more than one of these APS NUTs might be present for an AU.

5) [Page 7] The text about profile, tier and level reads: “The profile, tiler and level syntax structures in DCI, VPS and SPS contain profile, tier, level information for all layers that refer to the DCI, for layers associated with one or more output layer sets specified by the VPS, and for the lowest layer among the layers that refers to the SPS, respectively.” I think the word “lowest” is not accurate. If we have SPS sharing and the PTL information is included in an SPS, the information applies to any layer referring to that SPS, not only to the lowest layer among those referring to it. I would rewrite it as follows: “… with one or more output layer sets specified by the VPS, and for any layer that refers to the SPS, respectively.”

6) [Page 7] I think it would be beneficial to write something about the picture header (PH), especially since VVC has two modes for conveying that information: one in which the PH is carried in a NAL unit of its own, and one in which the PH is carried within the VCL NAL unit itself.

7) [Page 8] There is again some text about the profile, tier and level structure optionally including the constraint flags (similar to comment 2).

8) [Page 9] Regarding spatial scalability, the text reads: “Then, the resampling process for inter-layer prediction is performed at the block-level, by modifying the existing interpolation process for motion compensation. It means that no additional resampling process is needed to support scalability.” The wording “by modifying” sounds to me as if something particular is done for this case, although it is stated later that no additional resampling process is needed. I would rather write “without modifying” and perhaps note that the same interpolation process as for Reference Picture Resampling is used.

9) [Page 13] Definitions for CLVSS pictures and IRAP pictures are missing.

10) [Page 16] The abbreviation DONB is defined but not used anywhere, so it should be removed.

Best regards,
Yago Sánchez

Department Video Coding & Analytics
Group Multimedia Communications

Fraunhofer HHI - Heinrich Hertz Institute
Einsteinufer 37, 10587 Berlin, Germany

Tel.: +49 30 310 02663

On 26. Feb 2020, at 07:24, shuaiizhao(Shuai Zhao) wrote:

Dear experts,

We have uploaded “draft-ietf-avtcore-rtp-vvc-00”, which addresses all the comments from the IETF 106 meetings. Please have a read and let us know your comments.

Shuai Zhao

From: New-wg-docs
Date: Tuesday, February 25, 2020 at 22:06
Subject: [New-wg-docs] I-D Action: draft-ietf-avtcore-rtp-vvc-00.txt(Internet mail)

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Audio/Video Transport Core Maintenance WG of the IETF.

        Title           : RTP Payload Format for Versatile Video Coding (VVC)
        Authors         : Shuai Zhao
                          Stephan Wenger
        Filename        : draft-ietf-avtcore-rtp-vvc-00.txt
        Pages           : 38
        Date            : 2020-02-25

   This memo describes an RTP payload format for the video coding
   standard ITU-T Recommendation [H.266] and ISO/IEC International
   Standard [ISO23090-3], both also known as Versatile Video Coding
   (VVC) and developed by the Joint Video Experts Team (JVET).  The RTP
   payload format allows for packetization of one or more Network
   Abstraction Layer (NAL) units in each RTP packet payload as well as
   fragmentation of a NAL unit into multiple RTP packets.  The payload
   format has wide applicability in videoconferencing, Internet video
   streaming, and high-bitrate entertainment-quality video, among other
