Re: [AVTCORE] Comments on draft-ietf-avtcore-rtp-vvc-03(Internet mail)

Jonathan Lennox <jonathan.lennox42@gmail.com> Thu, 29 October 2020 20:14 UTC

References: <0ad901d6acbe$d47e9ec0$7d7bdc40$@bytedance.com> <C0E701AC-A4D6-4369-840D-BA66758961C2@tencent.com>
In-Reply-To: <C0E701AC-A4D6-4369-840D-BA66758961C2@tencent.com>
From: Jonathan Lennox <jonathan.lennox42@gmail.com>
Date: Thu, 29 Oct 2020 16:13:47 -0400
Message-ID: <CAKx+b+a_g1YH_-vHr7bxJoLV9KkayHsiHnZ2_4MxmXiRJsjpUA@mail.gmail.com>
To: "shuaiizhao(Shuai Zhao)" <shuaiizhao@tencent.com>
Cc: "avt@ietf.org" <avt@ietf.org>, Bernard Aboba <bernard.aboba@gmail.com>, Ye-Kui Wang <yekui.wang@bytedance.com>
Content-Type: multipart/alternative; boundary="0000000000009a290805b2d4eb36"
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/IeNfDQ8shdEFTF1fnMDZpVOb43E>
Subject: Re: [AVTCORE] Comments on draft-ietf-avtcore-rtp-vvc-03(Internet mail)

This is fine, I have no objection.  Welcome, Ye-Kui!

On Wed, Oct 28, 2020 at 1:06 PM shuaiizhao(Shuai Zhao) <
shuaiizhao@tencent.com> wrote:

> Dear Chairs,
>
>
>
> We thank Ye-Kui for his detailed comments. All of the co-authors agree
> to these changes.
>
>
>
> We request that Ye-Kui be added as a co-author of this draft, and we will
> provide a new version that addresses those comments.
>
>
>
> I would therefore like to ask the AVTCORE chairs whether we have consensus
> to do so.
>
>
>
> Best,
>
> Shuai Zhao
>
>
>
>
>
> *From: *avt <avt-bounces@ietf.org> on behalf of Ye-Kui Wang <
> yekui.wang@bytedance.com>
> *Date: *Tuesday, October 27, 2020 at 17:12
> *To: *'shuai zhao' <shuai.zhao@ieee.org>, "avt@ietf.org" <avt@ietf.org>
> *Cc: *'Jonathan Lennox' <jonathan.lennox@8x8.com>
> *Subject: *[AVTCORE] Comments on draft-ietf-avtcore-rtp-vvc-03(Internet
> mail)
>
>
>
> Thanks Shuai for integrating my earlier comments!
>
>
>
> I read some parts of draft-ietf-avtcore-rtp-vvc-03, and have the following
> comments and suggested changes:
>
> 1)      In Abstract, “ISO23090-3” should be replaced with “23090-3”.
>
> 2)      In paragraph 1 of Section 1, the last sentence, “[H.266] is
> reported to provide significant coding efficiency gains over H.265 and
> earlier video codec formats.” should be replaced with “VVC is reported to
> provide significant coding efficiency gains over HEVC [HEVC], a.k.a. H.265,
> and earlier video codecs.”
>
> 3)      In paragraph 2 of Section 1, “This memo specifices …” should be
> “This memo specifies …”.
>
> 4)      In Section 1.1 and its subsections, all instances of “[VVC]” and
> “[HEVC]” should be replaced with “VVC” and “HEVC”, respectively.
>
> 5)      In Section 1.1, paragraph 1, replace “ITU- T” with “ITU-T”
> (remove the unnecessary space after ‘-’).
>
> 6)      In Section 1.1.1, paragraph 1 in the subsection of “Motion
> prediction and coding”, replace “Sub- block” with “Sub-block” (remove the
> unnecessary space after ‘-’).
>
> 7)      In Section 1.1.1, paragraph 1 in the section of “Intra prediction
> and intra-coding”, in “a 6-most -probable-mode scheme”, remove the
> unnecessary space after “most”.
>
> 8)      In Section 1.1.2, add “(informative)” at the end of the section
> title.
>
> 9)      In Section 1.1.2, change the subsection title “Decoding
> Capability Information” to “Decoding capability information” for
> consistency.
>
> 10)   In Section 1.1.2, the subsection of “Video parameter set”, replace
> “TThe ideo parameter set (VPS)” with “The video parameter set (VPS)”.
>
> 11)   In Section 1.1.2, change the subsection title “Picture Header” to
> “Picture header” for consistency.
>
> 12)   In Section 1.1.2, the subsection of “Picture Header”, first
> sentence, change “A Picture Header” to “A picture header” for consistency.
>
> 13)   In Section 1.1.2, change the subsection title “Sub-Profiles” to
> “Sub-profiles” for consistency.
>
> 14)   In Section 1.1.2, change the subsection title “Constraint Fields”
> to “General constraint fields”.
>
> 15)   In Section 1.1.2, the subsection of “General constraint fields”,
> change “(more of which are flags)” to “(most of which are flags)”.
>
> 16)   In Section 1.1.2, the subsection of “Temporal scalability support”,
> remove the editor’s note, as the text description is good enough.
>
> 17)   In Section 1.1.2, change the subsection title “Picture reference
> resampling (RPR)” to “Reference picture resampling (RPR)”.
>
> 18)   In Section 1.1.2, the subsection of “Reference picture resampling
> (RPR)”, replace the editor’s note with the following description text for
> RPR:
>
>
>
> In AVC and HEVC, the spatial resolution of pictures cannot change unless a
> new sequence using a new SPS starts, with an IRAP picture. VVC enables
> picture resolution change within a sequence at a position without encoding
> an IRAP picture, which is always intra-coded. This feature is sometimes
> referred to as reference picture resampling (RPR), as the feature needs
> resampling of a reference picture used for inter prediction when that
> reference picture has a different resolution than the current picture being
> decoded. RPR allows resolution change without the need of coding an IRAP
> picture, which causes a momentary bit rate spike in streaming or video
> conferencing scenarios, e.g., to cope with network condition changes.  RPR
> can also be used in application scenarios wherein zooming of the entire
> video region or some region of interest is needed.
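
A minimal sketch of the resampling condition described above, for illustration only. It is not taken from the draft or from the VVC text: the 14-bit fixed-point ratio and the names `Picture`, `scale_factors`, and `needs_resampling` are assumptions of this example.

```python
from dataclasses import dataclass
from typing import Tuple

FIXED_POINT_SHIFT = 14  # assumed precision for this sketch


@dataclass(frozen=True)
class Picture:
    width: int   # in luma samples
    height: int  # in luma samples


def needs_resampling(ref: Picture, cur: Picture) -> bool:
    """A reference picture is resampled when its resolution differs from the current picture."""
    return (ref.width, ref.height) != (cur.width, cur.height)


def scale_factors(ref: Picture, cur: Picture) -> Tuple[int, int]:
    """Return (horizontal, vertical) ref/cur ratios in fixed point, rounded to nearest."""
    one = 1 << FIXED_POINT_SHIFT
    hor = (ref.width * one + cur.width // 2) // cur.width
    ver = (ref.height * one + cur.height // 2) // cur.height
    return hor, ver


if __name__ == "__main__":
    ref = Picture(1920, 1080)  # earlier picture, coded at full resolution
    cur = Picture(960, 540)    # current picture, coded after a resolution drop
    print(needs_resampling(ref, cur), scale_factors(ref, cur))
```

In this model a ratio of `1 << 14` means equal resolution, so no IRAP picture is needed at the switch; only the reference is rescaled for inter prediction.
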
>
>
>
> 19)   In Section 1.1.2, the subsection of “Spatial, SNR, and multiview
> scalability”, replace “all those forms of scalability are supported
> natively …” with “all those forms of scalability are supported in the first
> version of VVC, natively …”.
>
> 20)   In Section 1.1.2, the subsection of “Spatial, SNR, and multiview
> scalability”, remove the last sentence that says “Scalability support can
> be implemented in a single decoding "loop" and is widely considered a
> comparatively lightweight operation.” Most readers would understand this as
> the same single-loop decoding concept as in H.264/SVC, which is then
> incorrect. I think the key point intended to be said is the same as what is
> said in the subsequent description of spatial scalability (i.e., the
> support in VVC does not need a resampling filtering module as in the
> scalable extensions of AVC and HEVC, but only needs some high-level syntax
> changes).
>
> 21)   In Section 1.1.2, change the subsection title of “Spatial
> Scalability” to “Spatial scalability” for consistency.
>
> 22)   In Section 1.1.2, the subsection of “Spatial scalability”, 1st
> sentence, remove “in the "main" profile of VVC,” as the spatial scalability
> support is in a separate profile, although also in VVC version 1.
>
> 23)   In Section 1.1.2, after the subsection of “SNR scalability”, add a
> subsection of “Multiview scalability”, with the following description text:
> The first version of VVC also supports multiview scalability, wherein a
> multi-layer bitstream carries layers representing multiple views, and one
> or more of the represented views can be output at the same time.
>
> 24)   In Section 1.1.2, the subsection of “SEI Message”, change the title
> to be “SEI messages” for consistency, and add at the end of the last
> sentence “but in a companion specification [VSEI]”, and add the VSEI spec
> (H.274) reference to the list of references. The reference to the VSEI spec
> should be added as some applications that use this RTP payload format may
> use some of the SEI messages specified in the VSEI spec.
>
> 25)   In Section 1.1.3, change the section title to be “High-Level
> Picture Partitioning (informative)”, and replace the basically empty
> section body with the following description text:
>
>
>
> VVC inherited the concept of tiles and wavefront parallel processing (WPP)
> from HEVC, with some minor to moderate differences. The basic concept of
> slices was kept in VVC but is designed in an essentially different form. VVC
> is the first video coding standard that includes subpictures as a feature,
> which provide the same functionality as HEVC motion-constrained tile sets
> (MCTSs) but are designed differently, for better coding efficiency
> and friendlier use in application systems. More details of
> these differences are described below.
>
> Tiles and WPP
>
>
>
> As in HEVC, a picture in VVC can be split into tile rows and tile columns,
> in-picture prediction across tile boundaries is disallowed, and so on.
> However, the syntax for signaling of tile partitioning has been simplified,
> by using a unified syntax design for both the uniform and the non-uniform
> mode. In addition, signaling of entry point offsets for tiles in the SH is
> optional in VVC while it is mandatory in HEVC. The WPP design in VVC has
> two differences compared to HEVC: i) The CTU row delay is reduced from two
> CTUs to one CTU; ii) Signaling of entry point offsets for WPP in the SH is
> optional in VVC while it is mandatory in HEVC.
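
The effect of the reduced CTU row delay can be sketched with a toy wavefront schedule. This is a simplified model, not anything specified in VVC or HEVC: it assumes every CTU takes one time step, one thread per CTU row, and that a CTU may start once its left neighbour is done and the row above has completed `ctu_delay` CTUs counted from the same column. The function names are hypothetical.

```python
from typing import List


def wpp_start_times(rows: int, cols: int, ctu_delay: int) -> List[List[int]]:
    """Earliest start step of each CTU under the simplified wavefront model."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:
                # The left neighbour in the same row must be finished.
                ready = start[r][c - 1] + 1
            if r > 0:
                # The row above must be `ctu_delay` CTUs ahead (clamped at row end).
                dep_col = min(c + ctu_delay - 1, cols - 1)
                ready = max(ready, start[r - 1][dep_col] + 1)
            start[r][c] = ready
    return start


def wpp_total_steps(rows: int, cols: int, ctu_delay: int) -> int:
    """Number of time steps until the last CTU of the picture is finished."""
    return wpp_start_times(rows, cols, ctu_delay)[-1][-1] + 1


if __name__ == "__main__":
    print("two-CTU delay:", wpp_total_steps(4, 8, 2))
    print("one-CTU delay:", wpp_total_steps(4, 8, 1))
```

Under these assumptions each additional CTU row costs `ctu_delay` extra steps, so halving the delay shortens the ramp-up of the wavefront.
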
>
>
>
> Slices
>
>
>
> In VVC, the conventional slices based on CTUs (as in HEVC) or macroblocks
> (as in AVC) have been removed. The main reasoning behind this architectural
> change is as follows. The advances in video coding since 2003 (the
> publication year of AVC v1) have been such that slice based error
> concealment has become practically impossible, due to the ever-increasing
> number and efficiency of in-picture and inter-picture prediction
> mechanisms. An error-concealed picture is the decoding result of a
> transmitted coded picture for which there is some data loss (e.g., loss of
> some slices) of the coded picture or a reference picture for at least some
> part of the coded picture is not error-free (e.g., that reference picture
> was an error-concealed picture). For example, when one of the multiple
> slices of a picture is lost, it may be error-concealed using an
> interpolation of the neighboring slices. While advanced video coding
> prediction mechanisms provide significantly higher coding efficiency, they
> also make it harder for machines to estimate the quality of an
> error-concealed picture, which was already a hard problem with the use of
> simpler prediction mechanisms. Advanced in-picture prediction mechanisms
> also cause the coding efficiency loss due to splitting a picture into
> multiple slices to be more significant. Furthermore, network conditions
> have become significantly better while at the same time techniques for
> dealing with packet losses have significantly improved. As a result, very
> few implementations have recently used slices for maximum transmission unit
> size matching. Instead, substantially all applications where low-delay
> error resilience is required (e.g., video telephony and video conferencing)
> rely on system/transport-level error resilience (e.g., retransmission,
> forward error correction) and/or picture-based error resilience tools
> (feedback based error resilience, insertion of IRAPs, scalability with
> higher protection level of the base layer, and so on). Considering all the
> above, nowadays it is very rare that a picture that cannot be correctly
> decoded is passed to the decoder, and when such a rare case occurs, the
> system can afford to wait for an error-free picture to be decoded and
> available for display without resulting in frequent and long periods of
> picture freezing seen by end users.
>
>
>
> Slices in VVC have two modes: rectangular slices and raster-scan slices.
> A rectangular slice, as indicated by its name, covers a rectangular region
> of the picture. Typically, a rectangular slice consists of a number of
> complete tiles. However, it is also possible that a rectangular slice is a
> subset of a tile and consists of one or more consecutive, complete CTU rows
> within a tile. A raster-scan slice consists of one or more complete tiles
> in tile raster scan order, hence the region covered by a raster-scan slice
> need not be rectangular, although it may happen to have the shape of a
> rectangle. The concept of slices in VVC is therefore
> strongly linked to or based on tiles instead of CTUs (as in HEVC) or
> macroblocks (as in AVC).
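
To illustrate why a raster-scan slice may or may not be rectangular, here is a small sketch. The tile-index representation (tiles numbered row by row across a grid of `tile_cols` columns) and both function names are assumptions made for this example, not syntax from the VVC specification.

```python
from typing import List, Tuple


def raster_scan_slice_tiles(first_tile: int, num_tiles: int,
                            tile_cols: int) -> List[Tuple[int, int]]:
    """Return (tile_row, tile_col) of each tile in a raster-scan slice."""
    if num_tiles < 1:
        raise ValueError("a slice contains at least one complete tile")
    return [divmod(idx, tile_cols)
            for idx in range(first_tile, first_tile + num_tiles)]


def is_rectangular(tiles: List[Tuple[int, int]]) -> bool:
    """True if the tiles exactly fill their bounding box in the tile grid."""
    rows = [r for r, _ in tiles]
    cols = [c for _, c in tiles]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return len(set(tiles)) == height * width


if __name__ == "__main__":
    # Picture split into a grid with 4 tile columns.
    print(is_rectangular(raster_scan_slice_tiles(0, 8, 4)))  # two full tile rows
    print(is_rectangular(raster_scan_slice_tiles(2, 4, 4)))  # wraps across a row end
```

A run of tiles that wraps from the end of one tile row into the start of the next covers an L-shaped or staircase region, which is the non-rectangular case mentioned above.
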
>
>
>
> Subpictures
>
>
>
> VVC is the first video coding standard that includes the support of
> subpictures as a feature. Each subpicture consists of one or more complete
> rectangular slices that collectively cover a rectangular region of the
> picture. A subpicture may be either specified to be extractable (i.e.,
> coded independently of other subpictures of the same picture and of earlier
> pictures in decoding order) or not extractable. Regardless of whether a
> subpicture is extractable or not, the encoder can control whether in-loop
> filtering (including deblocking, SAO, and ALF) is applied across the
> subpicture boundaries individually for each subpicture.
>
>
>
> Functionally, subpictures are similar to the motion-constrained tile sets
> (MCTSs) in HEVC. They both allow independent coding and extraction of a
> rectangular subset of a sequence of coded pictures, for use cases like
> viewport-dependent 360° video streaming optimization and region of interest
> (ROI) applications.
>
>
>
> There are several important design differences between subpictures and
> MCTSs. First, the subpictures feature in VVC allows motion vectors of a
> coding block to point outside of the subpicture even when the subpicture is
> extractable, by applying sample padding at subpicture boundaries in this
> case, as is done at picture boundaries. Second, additional changes were
> introduced for the selection and derivation of motion vectors in the merge
> mode and in the decoder side motion vector refinement process of VVC. This
> allows higher coding efficiency compared to the non-normative motion
> constraints applied at encoder-side for MCTSs. Third, rewriting of SHs (and
> PH NAL units, when present) is not needed when extracting one or more
> extractable subpictures from a sequence of pictures to create a
> sub-bitstream that is a conforming bitstream. In sub-bitstream extractions
> based on HEVC MCTSs, rewriting of SHs is needed. Note that in both HEVC
> MCTSs extraction and VVC subpictures extraction, rewriting of SPSs and PPSs
> is needed. However, typically there are only a few parameter sets in a
> bitstream, while each picture has at least one slice, therefore rewriting
> of SHs can be a significant burden for application systems. Fourth, slices
> of different subpictures within a picture are allowed to have different NAL
> unit types. Fifth, VVC specifies HRD and level definitions for subpicture
> sequences, thus the conformance of the sub-bitstream of each extractable
> subpicture sequence can be ensured by encoders.
>
>
>
> If needed, I can work with the authors to integrate these suggested
> changes, if agreed, into the next version of the draft.
>
>
>
> BR, YK
>
>
>
> *From:* shuai zhao <shuai.zhao@ieee.org>
> *Sent:* Monday, October 26, 2020 21:23
> *To:* avt@ietf.org
> *Cc:* Jonathan Lennox <jonathan.lennox@8x8.com>; bernard.aboba@gmail.com;
> Ye-Kui Wang <yekui.wang@bytedance.com>
> *Subject:* [External] Fwd: New Version Notification for
> draft-ietf-avtcore-rtp-vvc-03.txt
>
>
>
> Thanks Ye-Kui for providing valuable comments. In this version I have made
> the following changes:
>
>
>
> ·         Updates on VVC coding tool up to Section 1.1.3
>
> ·         Add a PRP section with editor’s notes
>
> ·         Make Section 1.1.2 naming consistent per Yekui’s comments.
>
>
>
> Shuai
>
>
>
>
>
>
> Begin forwarded message:
>
>
>
> *From: *internet-drafts@ietf.org
>
> *Subject: New Version Notification for draft-ietf-avtcore-rtp-vvc-03.txt*
>
> *Date: *October 26, 2020 at 21:15:04 PDT
>
> *To: *"Shuai Zhao" <shuai.zhao@ieee.org>, "Yago Sanchez" <
> yago.sanchez@hhi.fraunhofer.de>, "Stephan Wenger" <stewe@stewe.org>
>
>
>
>
> A new version of I-D, draft-ietf-avtcore-rtp-vvc-03.txt
> has been successfully submitted by Shuai Zhao and posted to the
> IETF repository.
>
> Name:                  draft-ietf-avtcore-rtp-vvc
> Revision:              03
> Title:                     RTP Payload Format for Versatile Video Coding
> (VVC)
> Document date:               2020-10-27
> Group:                  avtcore
> Pages:                  44
> URL:
> https://www.ietf.org/archive/id/draft-ietf-avtcore-rtp-vvc-03.txt
> Status:
> https://datatracker.ietf.org/doc/draft-ietf-avtcore-rtp-vvc/
> Htmlized:
> https://datatracker.ietf.org/doc/html/draft-ietf-avtcore-rtp-vvc
> Htmlized:       https://tools.ietf.org/html/draft-ietf-avtcore-rtp-vvc-03
> Diff:
> https://www.ietf.org/rfcdiff?url2=draft-ietf-avtcore-rtp-vvc-03
>
> Abstract:
>   This memo describes an RTP payload format for the video coding
>   standard ITU-T Recommendation H.266 and ISO/IEC International
>   Standard ISO23090-3, both also known as Versatile Video Coding (VVC)
>   and developed by the Joint Video Experts Team (JVET).  The RTP
>   payload format allows for packetization of one or more Network
>   Abstraction Layer (NAL) units in each RTP packet payload as well as
>   fragmentation of a NAL unit into multiple RTP packets.  The payload
>   format has wide applicability in videoconferencing, Internet video
>   streaming, and high-bitrate entertainment-quality video, among other
>   applications.
>
>
>
>
> Please note that it may take a couple of minutes from the time of submission
> until the htmlized version and diff are available at tools.ietf.org.
>
> The IETF Secretariat
>
>
>
>