Re: [AVTCORE] [External] Re: [Internet] Comments (was: FW: [jvet] FW: WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”)

Dr Hendry <dr.hendry@lge.com> Tue, 31 August 2021 18:45 UTC

Thread-Topic: [External] Re: [AVTCORE] [Internet] Comments (was: FW: [jvet] FW: WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”)
References: <OF49502BBF.C31D0A88-ON0525873D.007EAFEC-0525873D.007EAFEC@lge.com> <OF49502BBF.C31D0A88-ON0525873D.007EAFEC-0525873D.007EAFEE@lge.com> <047101d79ad1$15f418c0$41dc4a40$@bytedance.com> <CD5ACE1E-FD86-4538-868C-71893F5303D7@hhi.fraunhofer.de> <OFBD6BBEF8.5C1FEAAA-ON05258742.0066FCEE-05258742.0066FCEE@lge.com>
In-Reply-To: <CD5ACE1E-FD86-4538-868C-71893F5303D7@hhi.fraunhofer.de>
Accept-Language: en-US, de-DE
MIME-Version: 1.0
Reply-To: Dr Hendry <dr.hendry@lge.com>
From: Dr Hendry <dr.hendry@lge.com>
To: Sanchez de la Fuente Yago <yago.sanchez@hhi.fraunhofer.de>
Cc: Ye-Kui Wang <yekui.wang@bytedance.com>, Stephan Wenger <stewe@stewe.org>, avt@ietf.org, "shuaiizhao(Shuai Zhao)" <shuaiizhao@tencent.com>
Message-ID: <OFBD6BBEF8.5C1FEAAA-ON05258742.0066FCEE-05258742.0066FCF0@lge.com>
Date: Tue, 31 Aug 2021 13:44:54 -0500
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/uf6dUSEl_bjPkexMm4DeKsl4ACQ>
Subject: Re: [AVTCORE] [External] Re: [Internet] Comments (was: FW: [jvet] FW: WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”)
Precedence: list

Hi Yago, all,

Sorry for the late response.

Considering what we have so far (comments from Stephan, Ye-Kui, and you), I think what you suggested is good. Thanks!

best regards,

Hendry

---------- Original Message ----------

From : "Sanchez de la Fuente Yago" <yago.sanchez@hhi.fraunhofer.de>
To : Ye-Kui Wang <yekui.wang@bytedance.com>
Cc : Dr Hendry Principal Research Engineer(dr.hendry), Stephan Wenger <stewe@stewe.org>, avt@ietf.org, "shuaiizhao(Shuai Zhao)" <shuaiizhao@tencent.com>
Date : 2021/08/27 02:26:40 [GMT-07:00]
Subject : Re: [External] Re: [AVTCORE] [Internet] Comments (was: FW: [jvet] FW: WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”)

Hi Hendry, all,

Thanks Hendry for reviewing the draft.

With respect to comments 1 and 3 I agree with Stephan’s response.

With respect to comment 2, I agree with Ye-Kui that there might be cases in which aggregating NAL units of multiple pictures of the same AU is useful, while in other cases, as Hendry suggests, not having APs that contain pictures of different layers from the same AU might simplify MANE operation. I think the issue is similar to that of sub-pictures. In my opinion, the best approach would be to add an informative note covering both sub-pictures and pictures in layered bitstreams. The note would explain that if a system envisions sub-picture-level or picture-level modifications, for example by removing sub-pictures or pictures of a particular layer, a good design choice on the sender’s side would be to aggregate only NAL units belonging to the same sub-picture or to the same picture of a particular layer.
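A minimal sketch of the sender-side choice described above, assuming the sender already knows, for each NAL unit, which access unit and which layer it belongs to. The `Nalu` record and the `group_for_aggregation` function are hypothetical names used only for illustration; they do not come from the draft.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class Nalu:
    au: int         # hypothetical access-unit counter assigned by the sender
    layer_id: int   # nuh_layer_id of the NAL unit
    payload: bytes  # NAL unit bytes (header included)


def group_for_aggregation(nalus: List[Nalu]) -> List[List[Nalu]]:
    """Split NAL units (given in decoding order) into AP candidates.

    Each candidate holds NAL units of one layer within one access unit,
    so a MANE that drops a layer can drop whole packets.
    """
    # groupby only merges consecutive items, which preserves decoding order.
    return [list(run) for _, run in groupby(nalus, key=lambda n: (n.au, n.layer_id))]
```

The same idea would extend to sub-pictures by adding a sub-picture identifier to the grouping key, provided the sender has that information available.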

Would this make sense?

Best regards,

Yago Sánchez

---

Department Video Communication and Applications

Group Multimedia Communications

Fraunhofer HHI - Heinrich Hertz Institute
Einsteinufer 37, 10587 Berlin, Germany
http://www.hhi.fraunhofer.de/ip/mc

Tel.: +49 30 310 02663

yago.sanchez@hhi.fraunhofer.de

On 27. Aug 2021, at 01:21, Ye-Kui Wang <yekui.wang@bytedance.com> wrote:
All,

Just to express my opinion now with a better understanding of Hendry’s intent.

To me, as long as the design allows encapsulating (some or all) NAL units of only one picture, or (some or all) NAL units of only one subpicture, into one AP, that is sufficient for those use cases in which picture-level or subpicture-level handling by MANEs is needed, because the sender can do just what is needed. While disallowing the encapsulation of (some or all) NAL units from more than one picture of an AU into one AP would prevent “stupid” encoder/encapsulation behavior (the value of and need for this require some justification), it would also rule out any use cases in which encapsulating (some or all) NAL units from more than one picture is useful.

BR, YK

From: Dr Hendry <dr.hendry@lge.com>
Sent: Thursday, August 26, 2021 16:04
To: Ye-Kui Wang <yekui.wang@bytedance.com>
Cc: 'Stephan Wenger' <stewe@stewe.org>; avt@ietf.org; 'shuaiizhao(Shuai Zhao)' <shuaiizhao@tencent.com>; yago.sanchez@hhi.fraunhofer.de
Subject: RE: [External] Re: [AVTCORE] [Internet] Comments (was: FW: [jvet] FW: WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”)

Hi Stephan, Ye-Kui, all,

Thanks!
Please find my responses next to your comments, tagged with [HD]

best regards,
Hendry

---------- Original Message ----------

From : "Ye-Kui Wang" <yekui.wang@bytedance.com>
Subject : RE: [External] Re: [AVTCORE] [Internet] Comments (was: FW: [jvet] FW: WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”)

Hi Hendry,

Thanks for reviewing the draft! Here is my response regarding your second comment.

The current wording of the sentence “An AP aggregates NAL units of one access unit.” is indeed a bit unclear. The intent was to say that an AP can only contain NAL units from one AU, not that an AP always contains all NAL units of an AU. Thus what you wanted is allowed. We just need to reword a bit to clarify this intent.
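The intended reading (an AP never crosses an AU boundary, but need not carry the whole AU) can be illustrated with a minimal greedy packing sketch. It assumes a 2-byte payload header and a 2-byte size field per aggregated NAL unit, and no DONL fields; `pack_aps` and `max_payload` are hypothetical names, not part of the draft.

```python
from typing import List, Optional, Tuple


def pack_aps(nalus: List[Tuple[int, bytes]], max_payload: int) -> List[List[bytes]]:
    """Greedily group (au_index, nal_unit) pairs, given in decoding order.

    A group is closed when the next NAL unit belongs to another access unit
    or would push the AP payload size past max_payload. A NAL unit that is
    too large on its own ends up alone and would need fragmentation instead.
    """
    groups: List[List[bytes]] = []
    current: List[bytes] = []
    current_au: Optional[int] = None
    size = 2  # assumed 2-byte payload header
    for au, nal in nalus:
        cost = 2 + len(nal)  # assumed 2-byte size field plus the NAL unit
        if current and (au != current_au or size + cost > max_payload):
            groups.append(current)
            current, size = [], 2
        current.append(nal)
        current_au = au
        size += cost
    if current:
        groups.append(current)
    return groups
```

With three 10-byte NAL units in one AU and a 30-byte limit, the first two share a group and the third gets its own, so one AU is spread over two APs while no AP mixes AUs.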

[HD] I have the same understanding as well from reading the current text, but agree that the new wording is better than the current one (i.e., an AP can only contain NAL units from one AU).
However, the issue I tried to raise is not that an AP always contains all NAL units of an AU. My concern is that having an AP contain NAL units from multiple picture units (PUs) may cause extra work for a MANE when one or more layers need to be removed.

On the other hand, I had a brief discussion with Ye-Kui about this topic before replying to this email, and we discussed the case where the encoder wants to put multiple pictures (or even all pictures of an AU) into one AP. That is certainly a possibility, but I think it is unlikely nowadays that we will have multiple pictures in one AP, considering the resolutions of videos exchanged over the network.

Responding to Stephan's remark that unpacking and re-packing APs is a trivial operation compared to other stuff a MANE needs to do: I think it is not only about unpacking and re-packing, but also about the need to check every AP in the case of layer dropping / removal. In such a case, a MANE cannot tell, just from checking the packet header, which APs need treatment because one or more of their aggregation units need to be dropped.
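The point about header-only inspection can be sketched as follows. This is a minimal illustration, not text from the draft: it assumes the two-byte VVC NAL unit header layout (F, Z, 6-bit LayerId, 5-bit Type, 3-bit TID), assumes type value 28 for aggregation packets, and assumes no DONL field is present. All function names are hypothetical.

```python
import struct
from typing import List, Optional

AP_TYPE = 28  # assumed payload type value for aggregation packets


def layer_id(header: bytes) -> int:
    """nuh_layer_id: the low 6 bits of the first header byte."""
    return header[0] & 0x3F


def nal_type(header: bytes) -> int:
    """nal_unit_type: the high 5 bits of the second header byte."""
    return header[1] >> 3


def split_ap(payload: bytes) -> List[bytes]:
    """Return the NAL units carried in an AP payload (no DONL assumed)."""
    units, pos = [], 2  # skip the 2-byte payload header
    while pos + 2 <= len(payload):
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        units.append(payload[pos:pos + size])
        pos += size
    return units


def drop_layers(payload: bytes, max_layer: int) -> Optional[bytes]:
    """Keep only NAL units with layer id <= max_layer; None if none remain."""
    if nal_type(payload) != AP_TYPE:
        # Other packet types: the first two bytes are enough to decide.
        return payload if layer_id(payload) <= max_layer else None
    # AP: every aggregation unit has to be visited.
    kept = [u for u in split_ap(payload) if layer_id(u) <= max_layer]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # a lone NAL unit can travel as a single NAL unit packet
    body = b"".join(struct.pack("!H", len(u)) + u for u in kept)
    # Simplification: the original payload header is reused; a real MANE
    # might have to recompute its fields from the remaining units.
    return payload[:2] + body
```

If the sender keeps each AP within one layer, the aggregation-packet branch never has to rewrite a packet: every AP is either forwarded unchanged or dropped whole.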

@Stephan: With such a clarification of the intent, which is easy, the subpicture part of Hendry’s 2nd comment would also be resolved.

[HD] After seeing Stephan's comment about the need for more information to be signalled if we want to go further down to the subpicture level, I agree that this signalling is not trivial. It seems good to me if we just stop at the PU level.

BR, YK

From: Stephan Wenger <stewe@stewe.org>
Subject: [External] Re: [AVTCORE] [Internet] Comments (was: FW: [jvet] FW: WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”)

Hi Hendry,

Regarding your first comment, I propose not to act on it. IETF RFCs tend to tolerate a lot more explanatory text and redundancies than video coding specs tend to do—in fact, explanatory text is usually encouraged. The redundancy you mentioned is not uncommon, and, to me, improves readability of the text.

[HD] I think it is OK. If it is easier to understand that way, we should keep it. I also checked how it was written in the payload format for HEVC and saw that this is actually inherited from that spec; thus, it is a good idea to keep it, since people may already be familiar with it.

Ye-Kui and Yago, can you please also take a look at the second comment? My hunch is that Hendry’s “minimum” solution is the design intention, hence we may have a terminology issue here.

I’m not quite sure what to do about sub-pictures. Perhaps an informative note like the following: “If a sender employs sub-pictures and the system envisions sub-picture-based compressed domain modification of the NAL unit stream in a MANE, for example by removing sub-pictures, a good design choice on the sender’s side would be to aggregate NAL units belonging to individual sub-pictures in their own respective aggregation packet.”. Is that what you are after? I don’t mind adding such a note, but will remark that such sub-picture processing would certainly require signaling, and that signaling is currently undefined. I think such signaling would be non-trivial and would likely require something similar to the CLUE framework, defining which would be heavy lifting indeed. Also, unpacking and re-packing APs is a trivial operation compared to other stuff a MANE needs to do, hence I’m not overly worried.

[HD] Please see my response to Ye-Kui's email above.

Regarding the third comment, I believe our design intention is aligned with your preference. I think we will implement that as proposed.

[HD] Sounds good.

AVT WG, how does the above sound?

S.
From: "shuaiizhao(Shuai Zhao)" <shuaiizhao@tencent.com>
Date: Monday, August 23, 2021 at 09:11
To: "avt@ietf.org" <avt@ietf.org>
Cc: Stephan Wenger <stewe@stewe.org>, Dr Hendry <dr.hendry@lge.com>
Subject: Re: [Internet][AVTCORE] Comments (was: FW: [jvet] FW: WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”)

Thanks, Hendry, for your comments. To make communication with other reviewers easier, I have copied the content of the xlsx file here.

Comment 1

Type: General / Editorial

Section: 4.3.2

Location: Figure 5 and 6

Comment: The difference between the first aggregation unit and the remaining aggregation units is that the first aggregation unit may contain DONL (Conditional).

Proposed change: Move the illustration of DONL (Conditional) to Figure 4. With that change, Figure 5 and Figure 6 are the same, so one of them can be removed and the text can be made shorter.

Comment 2

Type: Technical

Section: 4.3.2

Location: Page 24 / Line 1

Comment: Currently it is specified that "An AP aggregates NAL units of one access unit". It seems that this allows an AP to contain NAL units from multiple pictures when there are multiple layers in the stream. Further, it also allows an AP to contain some NAL units of a picture and some NAL units of the following picture. This is not desirable, since it requires a MANE to perform extra work when it needs to drop a certain picture.

Further, VVC has subpictures that may be independently coded, which means they can be extracted as well. Consider making life easier for a MANE to extract / drop NAL units based on subpicture as well.

Proposed change: At minimum, add a constraint that an AP can aggregate NAL units of only one picture unit, instead of one access unit.

Further, consider adding a constraint that an AP can aggregate NAL units of only one subpicture, if present.

Comment 3

Type: Technical

Section: 4.3.3

Location: Page 28 & 29

Comment: The semantics of the P bit in the FU header say that, when it is equal to 1, the FU contains the last NAL unit of a coded picture. Is it the last VCL NAL unit, or simply the last NAL unit (i.e., possibly a non-VCL NAL unit)?

Note that the last NAL unit in a picture unit can be a non-VCL NAL unit as well (e.g., a suffix SEI NAL unit or a suffix APS NAL unit). If the last NAL unit is a non-VCL NAL unit (e.g., a suffix NAL unit), which may be dropped, it may burden a MANE, since the MANE may need to update the previous packet, containing the NAL unit that immediately precedes the dropped NAL unit in decoding order, to change the value of the P bit from 0 to 1.

Proposed change: Clarify the semantics of the P bit.

If it can be equal to 1 only for the FU containing the last fragment of the last VCL NAL unit of a picture, add an explanation that there may be packet(s) containing non-VCL NAL units associated with the picture as well.

Otherwise, if it can be equal to 1 for either a VCL or a non-VCL NAL unit, add an explanation that, when that NAL unit is dropped / not forwarded to the receiver and the NAL unit that immediately precedes it is also contained in FUs, the value of the P bit in the FU header of the last FU of that preceding NAL unit needs to be updated to be equal to 1.

Preference: The P bit can be equal to 1 for the last FU of the last VCL NAL unit of a picture.
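To make the burden described in comment 3 concrete, here is a minimal sketch of the rewrite a MANE would have to perform under the second interpretation (P may mark a non-VCL NAL unit that is later dropped). It assumes a one-byte FU header laid out as S, E, P, then a 5-bit FuType, placed directly after the 2-byte payload header; `FuHeader`, `parse_fu_header`, and `set_p_bit` are hypothetical names.

```python
from typing import NamedTuple


class FuHeader(NamedTuple):
    start: bool        # S: first fragment of the NAL unit
    end: bool          # E: last fragment of the NAL unit
    last_in_pic: bool  # P: the bit whose semantics comment 3 asks about
    fu_type: int       # type of the fragmented NAL unit


def parse_fu_header(value: int) -> FuHeader:
    """Decode the assumed S|E|P|FuType(5) layout from one byte."""
    return FuHeader(bool(value & 0x80), bool(value & 0x40),
                    bool(value & 0x20), value & 0x1F)


def set_p_bit(fu_payload: bytes) -> bytes:
    """Return a copy of an FU payload with the P bit set to 1.

    Byte 2 is taken to be the FU header, after the 2-byte payload header.
    """
    patched = bytearray(fu_payload)
    patched[2] |= 0x20
    return bytes(patched)
```

Under the stated preference (P equal to 1 only for the last FU of the last VCL NAL unit), dropping a trailing suffix NAL unit would not require touching any earlier packet, so a helper like `set_p_bit` would not be needed for that case.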

Best,

Shuai

From: avt <avt-bounces@ietf.org> on behalf of Dr Hendry <dr.hendry@lge.com>
Reply-To: Dr Hendry <dr.hendry@lge.com>
Date: Monday, August 23, 2021 at 9:04 AM
To: "avt@ietf.org" <avt@ietf.org>
Cc: "stewe@stewe.org" <stewe@stewe.org>
Subject: [Internet][AVTCORE] Comments (was: FW: [jvet] FW: WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”)

Dear sir,

Upon reviewing the available draft RTP payload format for VVC (draft-ietf-avtcore-rtp-vvc-10), I have sent several comments to the editors but was requested to send them to you instead.

Please find the comments in the attached file.

best regards,

Hendry

---------- Original Message ----------

From : Stephan Wenger <stewe@stewe.org>
To : Dr Hendry Principal Research Engineer(dr.hendry)
Cc : shuai.zhao@ieee.org, yago.sanchez@hhi.fraunhofer.de, yekui.wang@bytedance.com
Date : 2021/08/20 15:49:22 [GMT-07:00]
Subject : Re: [jvet] FW: [AVTCORE] WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”
Hi Hendry, could you put the content of the xls into plain ascii and send to avt@ietf.org? At this point, public evidence of review is important.
Tnx, S.

Sent from my iPhone

On Aug 20, 2021, at 15:34, Dr Hendry <dr.hendry@lge.com> wrote:
Dear Stephan, all,

After reviewing the draft, I have a few comments on it. I am not familiar with the process, but I hope the comments can be addressed.

Best regards,

Hendry

---------- Original Message ----------

From : Stephan Wenger <stewe@stewe.org>
To : jvet@lists.rwth-aachen.de
Date : 2021/08/16 22:57:24 [GMT-07:00]
Subject : [jvet] FW: [AVTCORE] WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”
All:

Please see below the announcement for the Working Group Last Call (WGLC) of the VVC RTP payload format in the IETF. The WGLC is the first of a two-step approval process. If you have interest and bandwidth, please comment as described below.

Stephan

From: avt <avt-bounces@ietf.org> on behalf of Bernard Aboba <bernard.aboba@gmail.com>
Date: Monday, August 16, 2021 at 13:30
To: IETF AVTCore WG <avt@ietf.org>
Subject: [AVTCORE] WGLC on “RTP Payload Format for Versatile Video Coding (VVC)”
This is an announcement of WG last call on "RTP Payload Format for Versatile Video Coding (VVC)”.
 
The document is available for inspection here:
https://datatracker.ietf.org/doc/draft-ietf-avtcore-rtp-vvc/
 
 
WG Last Call will end on August 30, 2021.
 
In response, please state one of the following:
 
* I support advancing the document to Proposed Standard
 
* I object to advancement to Proposed Standard, due to Issues
described below <Issue description or link>.
 
Bernard Aboba
 
For the Chairs
_______________________________________________
jvet mailing list -- jvet@lists.rwth-aachen.de
To unsubscribe send an email to jvet-leave@lists.rwth-aachen.de
https://lists.rwth-aachen.de/postorius/lists/jvet.lists.rwth-aachen.de

