[xrblock] Video loss concealment support in draft-ietf-xrblock-rtcp-xr-concsec

Qin Wu <bill.wu@huawei.com> Tue, 16 October 2012 06:31 UTC

Message-ID: <DD2E3ADF9AA44E5BAFBDD56E4AC818F6@china.huawei.com>
From: Qin Wu <bill.wu@huawei.com>
To: xrblock@ietf.org
Date: Tue, 16 Oct 2012 14:30:51 +0800

Hi,
To support video loss concealment, I would like to propose the following changes to draft-ietf-xrblock-rtcp-xr-concsec:
1. Abstract:
OLD TEXT:
"
This document defines an RTP Control Protocol(RTCP)
Extended Report (XR) Block that allows the
reporting of Concealed Seconds metrics for a range
of RTP applications primarily for audio
applications of RTP.

"
NEW TEXT:
"
This document defines an RTP Control Protocol (RTCP)
Extended Report (XR) Block that allows the
reporting of Concealed Seconds metrics for a
range of RTP applications.

"
2. Section 1.1 Editor's Note
OLD TEXT:

"

At any instant, the audio output at a receiver may be classified as
either 'normal' or 'concealed'.  'Normal' refers to playout of audio
payload received from the remote end, and also includes locally
generated signals such as announcements, tones and comfort noise.
Concealment refers to playout of locally-generated signals used to
mask the impact of network impairments such as lost packets or to
reduce the audibility of jitter buffer adaptations.



      Editor's Note: For video application, the output at a receiver
      should also be classified as either normal or concealed.  Should
      this paragraph be clear about this?


"

NEW TEXT:

"

At any instant, the media output at a receiver may be classified as
either 'normal' or 'concealed'.  'Normal' refers to playout of media
payload received from the remote end, and also includes locally
generated signals such as announcements, tones and comfort noise.
Concealment refers to playout of locally-generated signals used to
mask the impact of network impairments such as lost packets or to
reduce discontinuities in the media playout (e.g., the audibility
of jitter buffer adaptations).

"

3. Section 1.4 Editor's Note

OLD TEXT:

"

This metric is primarily applicable to audio applications of RTP.
EDITOR'S NOTE: are there metrics for concealment of transport errors
for video.

"

NEW TEXT:

"

These metrics are primarily applicable to audio applications of RTP.
In addition, they can be used to report concealment of transport
errors in video applications of RTP.

"

4. Section 2.1 Editor's Note

OLD TEXT:

"

2.1.  Standards Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   In addition, the following terms are defined:


      Editor's Note: For Video loss concealment, at least the following
      four methods are used,i.e., Frame freeze,inter-frame
      extrapolation, interpolation, Noise insertation, should this
      section consider giving definition of these four methods for video
      loss concealment?


"

NEW TEXT:

"

2.1.  Standards Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   In addition, the following terms are defined:

   Frame freeze

      The impaired video frame is not displayed; instead, the
      previously displayed frame is "frozen" for the duration of the
      loss event.

   Inter-frame extrapolation

      If an area of the video frame is damaged by loss, the same area
      from the previous frame(s) can be used to estimate what the
      missing pixels would have been.  This can work well in a scene
      with no motion but can be very noticeable if there is
      significant movement from one frame to another.  Simple decoders
      may simply re-use the pixels that were in the missing area, while
      more complex decoders may try to use several frames to do a more
      complex extrapolation.

   Interpolation

      A decoder may use the undamaged pixels in the image to estimate
      what the missing block of the image should have contained.

   Noise insertion

      A decoder may insert random pixel values, which would generally
      be less noticeable than a blank rectangle in the image.



"
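To make the four definitions concrete, here is a rough Python sketch of each method applied to a toy frame. It is only an illustration, not text for the draft: the frame representation (rows of 8-bit luma values), the `lost` set of damaged pixel coordinates, and all function names are my own assumptions, and real decoders work on macroblocks with motion information rather than single pixels.

```python
import random

# Assumption: a frame is a list of rows of 8-bit luma values, and `lost`
# is a set of (row, col) coordinates damaged by packet loss.
Frame = list[list[int]]


def frame_freeze(previous: Frame) -> Frame:
    """Frame freeze: drop the impaired frame and show the previous one again."""
    return [row[:] for row in previous]


def extrapolate(previous: Frame, damaged: Frame, lost: set) -> Frame:
    """Simplest inter-frame extrapolation: re-use the co-located pixels
    of the previous frame for the missing area."""
    out = [row[:] for row in damaged]
    for r, c in lost:
        out[r][c] = previous[r][c]
    return out


def interpolate(damaged: Frame, lost: set) -> Frame:
    """Interpolation: estimate each lost pixel from the undamaged
    4-neighbours in the same frame."""
    rows, cols = len(damaged), len(damaged[0])
    out = [row[:] for row in damaged]
    for r, c in lost:
        neighbours = [
            damaged[rr][cc]
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if 0 <= rr < rows and 0 <= cc < cols and (rr, cc) not in lost
        ]
        if neighbours:  # leave the pixel untouched if no neighbour is usable
            out[r][c] = sum(neighbours) // len(neighbours)
    return out


def insert_noise(damaged: Frame, lost: set, rng: random.Random) -> Frame:
    """Noise insertion: fill the missing area with random pixel values."""
    out = [row[:] for row in damaged]
    for r, c in lost:
        out[r][c] = rng.randrange(256)
    return out
```

For a 3x3 frame whose centre pixel is lost, `extrapolate` restores the centre from the previous frame while `interpolate` averages the four surrounding pixels of the current frame.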

5. Section 3, 1st paragraph, Editor's Note

OLD TEXT:

"

   This sub-block provides a description of potentially audible
   impairments due to lost and discarded packets at the endpoint,
   expressed on a time basis analogous to a traditional PSTN T1/E1
   errored seconds metric.

      Editor's Note: Should impairment also cover video application?



"

NEW TEXT:

"

   This sub-block provides a description of potential network
   impairments due to lost and discarded packets at the endpoint,
   expressed on a time basis analogous to a traditional PSTN T1/E1
   errored seconds metric.



"

6. Section 3.2, Packet Loss Concealment method definition, Editor's Note

OLD TEXT:

"

   Packet Loss Concealment Method (plc): 2 bits

      This field is used to identify the packet loss concealment method
      in use at the receiver, according to the following code:

         bits 014-015

            0 = silence insertion

            1 = simple replay, no attenuation

            2 = simple replay, with attenuation

            3 = enhanced


            Other values reserved

               Editor's Note 1 : In the packet loss concealment
               methods,"Enhanced" is defines as one new Packet loss
               Concealment method?  However it is not clear what this
               packet loss concealment method looks like?

               Editor's Note 2: For Video loss concealment, there are a
               range of methods used, for example:

                  (i) Frame freeze In this case the impaired video frame
                  is not displayed and the previously displayed frame is
                  hence "frozen" for the duration of the loss event

                  (ii) Inter-frame extrapolation If an area of the video
                  frame is damaged by loss, the same area from the
                  previous frame(s) can be used to estimate what the
                  missing pixels would have been.  This can work well in
                  a scene with no motion but can be very noticeable if
                  there is significant movement from one frame to
                  another.  Simple decoders may simply re-use the pixels
                  that were in the missing area, more complex decoders
                  may try to use several frames to do a more complex
                  extrapolation.

                  (iii) Interpolation A decoder may use the undamaged
                  pixels in the image to estimate what the missing block
                  of image should have

                  (iv) Noise insertion A decoder may insert random pixel
                  values - which would generally be less noticeable than
                  a blank rectangle in the image.

               Therefore more text required in the future draft to
               discuss Techniques for Video Loss Concealment method in
               this document.



"

NEW TEXT:

"

   Packet Loss Concealment Method (plc): 4 bits

      This field is used to identify the packet loss concealment method
      in use at the receiver, according to the following code:

         bits 011-014

            0 = silence insertion (audio)

            1 = simple replay, no attenuation (audio)

            2 = simple replay, with attenuation (audio)

            3 = enhanced (audio)

            4 = frame freeze (video)

            5 = inter-frame extrapolation (video)

            6 = interpolation (video)

            7 = noise insertion (video)

            Other values reserved

"
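As a sanity check on the widened field, the following Python sketch shows how an implementation might read and write the proposed 4-bit plc value. It assumes that "bits 011-014" follows the usual RFC diagram convention (bit 0 is the most significant bit of the first 32-bit word of the block); the names `PlcMethod`, `set_plc`, `get_plc` and `is_video` are hypothetical and not taken from the draft.

```python
from enum import IntEnum
from typing import Optional


class PlcMethod(IntEnum):
    """Code points from the proposed NEW TEXT (0-3 audio, 4-7 video)."""
    SILENCE_INSERTION = 0
    SIMPLE_REPLAY_NO_ATTENUATION = 1
    SIMPLE_REPLAY_WITH_ATTENUATION = 2
    ENHANCED = 3
    FRAME_FREEZE = 4
    INTER_FRAME_EXTRAPOLATION = 5
    INTERPOLATION = 6
    NOISE_INSERTION = 7


# Assumption: bits are numbered from the MSB of a 32-bit word, so a field
# occupying bits 11..14 sits 32 - (11 + 4) = 17 bits above the LSB.
PLC_FIRST_BIT = 11
PLC_WIDTH = 4
PLC_SHIFT = 32 - (PLC_FIRST_BIT + PLC_WIDTH)
PLC_MASK = (1 << PLC_WIDTH) - 1


def set_plc(word: int, method: PlcMethod) -> int:
    """Return `word` with the plc field replaced by `method`."""
    cleared = word & ~(PLC_MASK << PLC_SHIFT) & 0xFFFFFFFF
    return cleared | (int(method) << PLC_SHIFT)


def get_plc(word: int) -> Optional[PlcMethod]:
    """Extract the plc field; reserved values (8-15) yield None."""
    value = (word >> PLC_SHIFT) & PLC_MASK
    try:
        return PlcMethod(value)
    except ValueError:
        return None


def is_video(method: PlcMethod) -> bool:
    """True for the video concealment code points (4-7)."""
    return method >= PlcMethod.FRAME_FREEZE
```

A receiver that sees a reserved value would get `None` from `get_plc` and can decide locally how to treat the report.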

7. Section 3.2, Unimpaired Seconds, Editor's Note

OLD TEXT:

"

      Normal playout of comfort noise or other silence concealment
      signal during periods of talker silence, if VAD [VAD] is used,
      shall be counted as unimpaired seconds.

         Editor's Note: It should be clear that VAD does not apply to
         video.


"

NEW TEXT:

"

      For speech applications, normal playout of comfort noise or other
      silence concealment signal during periods of talker silence, if
      VAD [VAD] is used, shall be counted as unimpaired seconds.

"

8. Section 3.2, Concealed Seconds, Editor's Note

OLD TEXT:

"

      Equivalently, a concealed second is one in which some Loss-type
      concealment has occurred.  Buffer adjustment-type concealment
      SHALL not cause Concealed Seconds to be incremented, with the
      following exception.  An implementation MAY cause Concealed
      Seconds to be incremented for 'emergency' buffer adjustments made
      during talkspurts.

      Loss-type concealment is reactive insertion or deletion of samples
      in the audio playout stream due to effective frame loss at the
      audio decoder.  "Effective frame loss" is the event in which a
      frame of coded audio is simply not present at the audio decoder
      when required.  In this case, substitute audio samples are
      generally formed, at the decoder or elsewhere, to reduce audible
      impairment.

      Buffer Adjustment-type concealment is proactive or controlled
      insertion or deletion of samples in the audio playout stream due
      to jitter buffer adaptation, re-sizing or re-centering decisions
      within the endpoint.

      Because this insertion is controlled, rather than occurring
      randomly in response to losses, it is typically less audible than
      loss-type concealment.  For example, jitter buffer adaptation
      events may be constrained to occur during periods of talker
      silence, in which case only silence duration is affected, or
      sophisticated time-stretching methods for insertion/deletion
      during favorable periods in active speech may be employed.  For
      these reasons, buffer adjustment-type concealment MAY be exempted
      from inclusion in calculations of Concealed Seconds and Severely
      Concealed Seconds.

         Editor's Note: In this document, two kind of concealments are
         defined: a.  Loss-type concealment b.  Buffer Adjustment-type
         concealment Loss-type concealment is applicable to both audio
         and video.  However Buffer Adjustment-type concealment is
         usually applied to audio.  Should this section be clear about
         this?
"

NEW TEXT:

"

      Equivalently, a concealed second is one in which some Loss-type
      concealment has occurred.  Buffer adjustment-type concealment
      is usually applied to audio applications and SHALL NOT cause
      Concealed Seconds to be incremented, with the
      following exception.  An implementation MAY cause Concealed
      Seconds to be incremented for 'emergency' buffer adjustments made
      during talkspurts.

      Loss-type concealment is reactive insertion or deletion of samples
      in the media playout stream due to effective frame loss at the
      media decoder.  "Effective frame loss" is the event in which a
      frame of coded media is simply not present at the media decoder
      when required.  In this case, substitute media samples are
      generally formed, at the decoder or elsewhere, to reduce audible or perceivable
      impairment.

      Buffer Adjustment-type concealment is proactive or controlled
      insertion or deletion of samples in the audio playout stream due
      to jitter buffer adaptation, re-sizing or re-centering decisions
      within the endpoint.  Because this insertion is controlled, rather
      than occurring randomly in response to losses, it is typically
      less audible than loss-type concealment.  For example, jitter
      buffer adaptation events may be constrained to occur during
      periods of talker silence, in which case only silence duration is
      affected, or sophisticated time-stretching methods for
      insertion/deletion during favorable periods in active speech may
      be employed.  For these reasons, buffer adjustment-type
      concealment MAY be exempted from inclusion in calculations of
      Concealed Seconds and Severely Concealed Seconds.

"
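The counting rule above can be sketched in Python as follows. This is only my reading of the text, under stated assumptions: concealment events are given as (timestamp in seconds, type) pairs, one-second intervals are aligned to the start of the reporting interval, and the names `Concealment` and `count_concealed_seconds` are hypothetical. Severely Concealed Seconds are not covered because their threshold is not part of this proposal.

```python
from enum import Enum


class Concealment(Enum):
    LOSS = "loss"                             # reactive, due to effective frame loss
    BUFFER_ADJUSTMENT = "buffer"              # proactive jitter buffer adaptation
    EMERGENCY_BUFFER_ADJUSTMENT = "emergency"  # adjustment made during a talkspurt


def count_concealed_seconds(events, duration_s, count_emergency=False):
    """Count one-second intervals that contain concealment.

    Loss-type concealment always marks its second as concealed.  Ordinary
    buffer adjustments never do.  'Emergency' buffer adjustments are counted
    only when the implementation opts in (the MAY in the text).
    """
    concealed = set()
    for timestamp, kind in events:
        if not 0 <= timestamp < duration_s:
            continue  # outside the reporting interval
        counts = kind is Concealment.LOSS or (
            count_emergency and kind is Concealment.EMERGENCY_BUFFER_ADJUSTMENT
        )
        if counts:
            concealed.add(int(timestamp))  # index of the one-second interval
    return len(concealed)
```

For example, with loss-type events at 0.2 s, 0.7 s and 3.9 s, a normal buffer adjustment at 1.5 s and an emergency adjustment at 2.1 s in a 5-second interval, the function returns 2 by default and 3 with `count_emergency=True`.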



Regards!

-Qin