Re: [codec] #16: Multicast?

"Raymond (Juin-Hwey) Chen" <> Tue, 18 May 2010 02:36 UTC

From: "Raymond (Juin-Hwey) Chen" <>
To: Koen Vos <>
Date: Mon, 17 May 2010 19:32:44 -0700

Hi Koen,

I agree with Roman that this 1X versus 3X debate has reached a 
point where it is almost irrelevant.  What's really relevant now is 
the question of whether the IETF codec should have a low-delay mode 
with a codec frame size of 5 to 10 ms, and for that question you 
previously wrote on May 6:

> Anyway, I think we're all aligned here: ultra-low delay is 
> important and has been part of the requirements from day one. Has 
> anyone disagreed?

After this email of yours, I have not seen anyone explicitly 
disagree.  In fact, of the 6 intended applications listed in the 
WG charter, draft-ietf-codec-requirements-00 lists 2 of them having 
< 10 ms codec delay as a requirement and 3 of them having < 10 ms 
codec delay as "desirable" or "highly desirable".  It seems to me 
that the consensus in the WG is that we need a low-delay mode with 
a 5 to 10 ms codec frame size to address delay-sensitive 
applications, and we need another higher-delay mode with perhaps a 
20 ms frame size to get higher bit-rate efficiency for applications 
that are less sensitive to delay.  If we agree with that (and I 
thought you already agreed on May 6), then whether the frame size 
multiplier is 1X or 3X doesn't really matter and shouldn't affect 
the codec requirements or the direction of the codec design. 

With that said, I don't want you to think I impolitely ignored your 
comments in your email below, so I will respond to them in-line.  
(For those who are not interested, please ignore the rest of this 
email.)

You wrote:
> Quoting "Raymond (Juin-Hwey) Chen":
>> I didn't come up with this 3*(codec frame size) delay number for IP
>> phones myself.  A very senior technical lead in Broadcom's IP phone
>> chips group told me that

> Still:
> - You've presented no satisfactory explanation for your theory

[Raymond]: I already explained and analyzed it in my previous emails.  
The codec-dependent delay = (codec frame size) + (codec look-ahead) + 
(codec filtering delay) + (processing delay to encode and decode one 
frame) + (transmission delay to send one frame of compressed bit-
stream) + (multi-tasking delay due to processor not immediately 
available to start encoding and decoding operations as soon as one 
frame of input audio samples or received bit-stream is ready) + 
(possibly miscellaneous other delays)

Although the processing delay is short (but not zero) for a PC with a 
2 to 3 GHz CPU, for IP phones the clock speed of the processor is much 
lower, resulting in significantly higher processing delay than in a 
PC.  Also, the multi-tasking delay can be quite significant, and it 
is dependent on the codec frame size in IP phones.  All these delay 
components here and there can easily add up to 3X codec frame size.
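
The sum of delay components listed above can be sketched as a small 
budget calculation.  This is only an illustration: the class name, 
field names, and every millisecond value below are hypothetical and 
are not taken from Broadcom's lab measurements.  They are chosen 
merely to show how the individual terms could add up to about 3X the 
codec frame size.

```python
from dataclasses import dataclass, fields


@dataclass
class CodecDelayBudget:
    """Codec-dependent one-way delay components, all in milliseconds.

    The fields mirror the terms of the sum in the email; the class
    itself is a hypothetical helper for illustration only.
    """
    frame_size: float      # codec frame size
    look_ahead: float      # codec look-ahead
    filtering: float       # codec filtering delay
    processing: float      # time to encode and decode one frame
    transmission: float    # time to send one frame of compressed bit-stream
    multitasking: float    # wait until the processor is free to encode/decode
    other: float = 0.0     # miscellaneous other delays

    def total_ms(self) -> float:
        # Codec-dependent delay is the plain sum of all components.
        return sum(getattr(self, f.name) for f in fields(self))

    def frame_multiple(self) -> float:
        # Express the total as a multiple of the codec frame size.
        return self.total_ms() / self.frame_size


# Assumed (made-up) numbers for an IP phone running a 20 ms frame codec.
ip_phone = CodecDelayBudget(
    frame_size=20.0,
    look_ahead=5.0,
    filtering=1.0,
    processing=10.0,
    transmission=10.0,
    multitasking=14.0,
)

print(f"total = {ip_phone.total_ms()} ms, "
      f"multiple = {ip_phone.frame_multiple()}X frame size")
```

With these assumed values the total is 60 ms, i.e. 3X the 20 ms frame 
size.  Shrinking the processing and multi-tasking terms, as a fast PC 
might, pulls the multiple down toward 2X, but the frame size term 
alone keeps it above 1X.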

If you have a copy of the 1995 Elsevier book "Speech Coding and 
Synthesis" edited by W.B. Kleijn and K.K. Paliwal, please go read 
Section 2 and see Fig. 1 of my chapter 6 on low-delay coding of 
speech.  It should help you understand what I have been saying.  
Unfortunately, due to copyright restrictions, I cannot scan those 
pages and attach them to this email.

> - Nor any verifiable empirical evidence

[Raymond]: The one-way delay measurement was done at the lab of 
Broadcom's IP phone chip group.  I don't know how I can make that 
"verifiable" to you.  

> - It comes from an anonymous source

[Raymond]: I didn't just say that I heard it from someone who worked 
at some (unnamed) company; I specifically said this info came from 
Broadcom's IP phone group.  Is that not sufficient?

> - It has changed over time (going from 5x to 3x)

[Raymond]: I admit that this was my mistake due to a misunderstanding 
of what that senior technical lead I mentioned above told me. 
Initially he said the real-world one-way delay was about 5X codec 
frame size. I thought he meant codec-dependent delay; that's why I 
initially used 5X.  Later he clarified to me that when he said 5X, he 
included codec-independent delay, and the codec-dependent delay 
component was really 3X, so I corrected myself to 3X.

> - It goes against accepted wisdom

[Raymond]: No, the commonly accepted wisdom back in the late 1980s and 
early 1990s (when several dozen low-delay speech coding papers were 
published) was 3X codec frame size, although some might have used 2X 
as the best-case scenario. Since then the CPU of PCs has increased in 
speed by more than 30X (in the early 1990s a popular CPU was the 90 MHz 
Pentium) and the network speed also increased a lot, so I can believe 
you that PC-based softphones could potentially reach below 2X as you 
said, but IP phone processors are much slower than the CPUs of PCs, 
making it harder to get below 2X.  In any case, 1X was never the 
commonly accepted wisdom, since it is theoretically impossible.

> So I'd say the burden of proof is on you.

[Raymond]: Broadcom currently ships more than half of the world's IP 
phone chips, and Broadcom's IP phone group has also measured the one-
way codec-dependent delay carefully in the lab with carefully chosen 
signals, equipment, and methodology, so I don't see why Broadcom's 
measurement is any less valid than Cullen's, to the point that the 
burden of proof is on us.  I don't believe Broadcom's IP phone 
implementation is particularly bad in delay compared with others, 
either (otherwise we would be unlikely to be the market leader).   In any 
case, I didn't say that I believe Cullen's results must be wrong.  I 
was just surprised at the discrepancy of the measurement results from 
these two different sources and would like to find out why and 
possibly reconcile the seemingly contradictory results.  That's why I 
asked Cullen to share more details about his measurements.