Re: [codec] #16: Multicast?

So, we have consensus on 
1) low delay mode
2) low complexity mode (whatever this means)
3) a technical understanding of how latency adds up on different platforms

The remaining discussion seems to be on which kind of platforms to support:
a) fast PCs and softphones
b) embedded, slow CPUs in hardphones (+ conference servers)

It is also partly about how to make money with phone calls in the future:
a) Selling services in addition to offering calls via softphones
b) Selling devices that help to conduct calls

I know that there is a long tradition in standardization of fortifying one's own market position as
early as possible. Thus, it is quite understandable that a) and b) are competing. However, wouldn't
it make sense to support both a) and b) and let future market forces decide?

With best regards,

 Christian

---------------------------------------------------------------
Dr.-Ing. Christian Hoene
Interactive Communication Systems (ICS), University of Tübingen 
Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532 
http://www.net.uni-tuebingen.de/

>-----Original Message-----
>From: Sandy (Alexander) MacInnis [mailto:macinnis@broadcom.com]
>Sent: Thursday, May 27, 2010 4:55 PM
>To: Christian Hoene; 'Herve Taddei'; 'Koen Vos'; Raymond (Juin-Hwey) Chen
>Cc: codec@ietf.org
>Subject: RE: [codec] #16: Multicast?
>
>Everyone,
>
>Sorry for stepping in here... full disclosure: I'm not a speech coding expert, and I work at Broadcom,
>where Raymond works.
>
>I too would like to end this discussion; it seems to have diverged from a discussion of the
>requirements for the CODEC algorithm to have a mode with low algorithmic delay, which AFAIK is already
>agreed anyway, to some rather tangential discussions related to, but not really addressing, real time
>scheduling of the algorithm on a processor.
>
>The point from Raymond that is the head of this particular discussion trail is RTS, i.e. real time
>scheduling. I know his note about that is long; it might be worth reading it again.
>
>It's not a fair assumption that 100% of a shared resource - in this instance, a processor - can be
>spent performing real-time-scheduled tasks. If there is a set of RT (real time) tasks that have
>different periods, and periods = deadlines, all being scheduled on the same processor, the best you
>can do is less than 100%. How close you can get depends on the details; it might be e.g. 68%, or it
>could be significantly less; there's a lot of literature on this. If the system is optimally designed
>for the purposes of RTS, i.e. all other tasks are treated as non-real time and have lower priority
>than all real time tasks, there are no priority inversions, task switching is very efficient, etc. the
>RTS performance can come close to theory, but if any of these assumptions is not true, it can be
>significantly worse.
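
A minimal sketch of one well-known result from that literature: the Liu and Layland utilization bound for rate-monotonic scheduling of n periodic tasks whose deadlines equal their periods, U(n) = n(2^(1/n) - 1), which falls towards ln 2 (about 69%) as n grows. That the figure quoted above refers to this particular bound is an assumption, and the function names are illustrative only.

```python
def rm_utilization_bound(n_tasks):
    """Liu-Layland bound: total utilization below this is always schedulable
    under rate-monotonic priorities (sufficient, not necessary)."""
    if n_tasks < 1:
        raise ValueError("need at least one task")
    return n_tasks * (2.0 ** (1.0 / n_tasks) - 1.0)


def fits_rm_bound(tasks):
    """tasks is a list of (compute_time, period) pairs in the same time unit."""
    utilization = sum(compute / period for compute, period in tasks)
    return utilization <= rm_utilization_bound(len(tasks))


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 100):
        print(n, round(rm_utilization_bound(n), 4))
```

A task set that exceeds the bound is not necessarily unschedulable; the bound only says that nothing can be guaranteed without a more detailed analysis.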
>
>If the total RT demands are only a very small fraction of the total shared resource, i.e. processor
>cycles, it tends to be easier to perform the scheduling and ensure that it works correctly. Such a
>scenario may be more important than RTS indicates if the system is not well designed for real time
>operation, i.e. a PC. And, such systems draw MUCH more power than well-designed embedded products.
>Conversely, low power and modest clock rates are good design principles for embedded products, even
>those that are wall (mains) powered. E.g. someone noted leakage power at 65nm - have you looked at
>40nm? It just keeps getting worse. Designing for slower max clock rate saves substantial power.
>
>There are good reasons why a common convention of real time scheduling is the assumption that period =
>deadline. As Raymond noted, other design assumptions are possible, but they have their own problems.
>
>Note also, as Raymond pointed out, that RTS also applies to intermediate points in the end-end system,
>such as gateways. Such a device may have very powerful processors, and if so, it should be for the
>specific purpose of performing a large number of RT tasks, loading the processors as much as can be
>guaranteed.
>
>I would hope that this committee is not planning to be in the position of dictating that all
>implementations of the algorithm require a processor that is so fast that the system can guarantee
>that service latency is much less than the period of an audio frame. And if not, then a reasonable
>assumption is that, in general, the deadline of service latency does equal the period of an audio
>frame. That assumption is part of one of the upper-limit calculations from Raymond.
>
>--Sandy
>
>
>-----Original Message-----
>From: codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] On Behalf Of Christian Hoene
>Sent: Thursday, May 27, 2010 5:50 AM
>To: 'Herve Taddei'; 'Koen Vos'; Raymond (Juin-Hwey) Chen
>Cc: codec@ietf.org
>Subject: Re: [codec] #16: Multicast?
>
>Guys,
>
>all I want to do is to find an end to this discussion. Sometimes it helps to come up with some
>provocative statements.
>The example of hearing aids was mentioned only as an example of a device that might include a phone in
>the future. If you will, I forgot to mention the tooth-phone
>http://www.wired.com/science/discoveries/news/2002/06/53302
>
>Anyhow, shall we consider that
>
>a) devices have plenty of power to do the coding (<2% needed for the codec)
>b) devices are already fully loaded and need all time for encoding and decoding (100% needed for codec
>and signal processing)
>xy) between X% and Y%.
>
>What is the minimal complexity of the codec? Enough to run at 100% load on a
>a) 10 MIPS DSP
>b) 50 MIPS DSP
>c) 8000 MIPS DSP
>d) 60 MHz Pentium
>e) 1 GHz Pentium
>
>How to measure the complexity?
>a) with gettimeofday on a reference PC setup
>b) with ITU-T G.191 library
>c) With a reference virtual machine
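
A rough sketch of what option a) could look like: time many encoder calls with a wall-clock timer and express the average per-frame time as a fraction of the frame duration. `time.perf_counter` stands in for gettimeofday here; the 20 ms frame duration and the `dummy_encode` workload are placeholders and not taken from this thread.

```python
import time

FRAME_MS = 20.0  # assumed frame duration, for illustration only


def dummy_encode(frame):
    # Placeholder workload; a real encoder call would go here.
    return sum(sample * sample for sample in frame)


def cpu_load_fraction(encode, frame, frame_ms=FRAME_MS, repeats=200):
    """Average wall-clock time per encoded frame, divided by the frame duration."""
    start = time.perf_counter()
    for _ in range(repeats):
        encode(frame)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return (elapsed_ms / repeats) / frame_ms


if __name__ == "__main__":
    load = cpu_load_fraction(dummy_encode, [0] * 320)
    print("load: {:.4%} of one core".format(load))
```

Such a figure depends on the reference machine, the compiler and whatever else is running, which is the usual argument for options b) or c).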
>
>Decide and continue...
>
>Christian
>
>
>>-----Original Message-----
>>From: Herve Taddei [mailto:herve.taddei@huawei.com]
>>Sent: Thursday, May 27, 2010 1:54 PM
>>To: 'Christian Hoene'; 'Koen Vos'; 'Raymond (Juin-Hwey) Chen'
>>Cc: codec@ietf.org
>>Subject: RE: [codec] #16: Multicast?
>>
>>Hi Christian,
>>
>>> Nevertheless, in the end, my position is simple. The lower the computational
>>> complexity, the smaller the battery. This statement will remain true for future
>>> semiconductor technologies, too. Thus, a low complexity mode and low delay mode is
>>> advisable for small, portable battery-powered devices such as wireless headsets,
>>> hearing aids, or wireless sensor nodes. Or other kinds of devices that have
>>> problems with heat dissipation.
>>Are all those devices of interest for the codec under development? For
>>example is it your plan to make use of the IETF codec in hearing aids? In
>>that case, perhaps requirements for hearing aids should be discussed and
>>added to the requirements. But I am not sure it is really relevant for an
>>audio codec designed specifically for use over the Internet.
>>
>>Best regards
>>
>>Herve Taddei
>>
>>> -----Original Message-----
>>> From: codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] On Behalf Of
>>> Christian Hoene
>>> Sent: Thursday, May 27, 2010 12:45 PM
>>> To: 'Koen Vos'; 'Raymond (Juin-Hwey) Chen'
>>> Cc: codec@ietf.org
>>> Subject: Re: [codec] #16: Multicast?
>>>
>>> Hello Koen and Raymond,
>>>
>>> Yesterday, I had a brief look at ITU-T G.114
>>> http://www1.cs.columbia.edu/~andreaf/new/documents/other/T-REC-G.114-200305.pdf
>>> It might help in your discussion...
>>>
>>> Regarding Moore's law and so on: we should keep in mind that it cannot continue
>>> forever. Already today, due to quantum tunneling and subthreshold leakage current,
>>> very small semiconductor structures consume increasing amounts of energy. Thus, it
>>> might not always be advisable to use the latest technology if power consumption
>>> shall be low.
>>>
>>> It is known that "CMOS circuits dissipate power by charging the various load
>>> capacitances (mostly gate and wire capacitance, but also drain and some source
>>> capacitances) whenever they are switched. The charge moved is the capacitance
>>> multiplied by the voltage change. Multiply by the switching frequency on the load
>>> capacitances to get the current used, and multiply by voltage again to get the
>>> characteristic switching power dissipated by a CMOS device: P = CV²f".
>>>
>>> C is the capacitance (depending on the size of the structure)
>>> V is the voltage
>>> f is the switching frequency
>>> P is the power
>>>
>>> Thus, at a given voltage, the energy needed for a calculation (e.g., the encoding
>>> and decoding) does not decrease if the calculation is done faster or slower.
>>>
>>> In order to save power in mobile devices, dynamic frequency scaling and dynamic
>>> voltage scaling change the frequency and/or the voltage. If power consumption needs
>>> to be reduced, the device reduces voltage and frequency and thus the calculation
>>> takes longer. Thus, even if the CPU can do the encoding/decoding at full speed and
>>> in a fraction of the frame duration, it is not always advisable to do it like that.
>>> Instead, if energy supply is limited, then calculations shall be slowed down.
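
The argument can be checked with a small calculation based on P = CV²f. This is only a sketch: it treats C as an effective capacitance switched once per clock cycle, ignores leakage entirely, and all numeric values are made up for illustration.

```python
def switching_power(c_farads, v_volts, f_hz):
    """Dynamic CMOS switching power, P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hz


def task_energy(c_farads, v_volts, f_hz, cycles):
    """Energy for a task of a fixed cycle count; algebraically C * V^2 * cycles."""
    runtime_s = cycles / f_hz
    return switching_power(c_farads, v_volts, f_hz) * runtime_s


if __name__ == "__main__":
    c = 1e-9        # assumed effective switched capacitance (farads)
    cycles = 2e6    # assumed cycles needed to code one frame
    # Same voltage, different clock: runtime changes, energy does not.
    print(task_energy(c, 1.2, 200e6, cycles), task_energy(c, 1.2, 100e6, cycles))
    # Lower voltage and clock together: slower, but less energy per frame.
    print(task_energy(c, 0.9, 100e6, cycles))
```

In this model the frequency cancels out of the energy, so the saving comes only from the lower voltage that a lower clock rate permits. Leakage, which is not modelled, works in the opposite direction because it accumulates over the longer runtime.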
>>>
>>> My arguments above support Raymond's position that devices with low processing
>>> power will be used and that this adds to the transmission delay. However, I am not
>>> an expert in system or chip design and thus, I might have missed a few details or
>>> tradeoffs.
>>>
>>> Nevertheless, in the end, my position is simple. The lower the computational
>>> complexity, the smaller the battery. This statement will remain true for future
>>> semiconductor technologies, too. Thus, a low complexity mode and low delay mode is
>>> advisable for small, portable battery-powered devices such as wireless headsets,
>>> hearing aids, or wireless sensor nodes. Or other kinds of devices that have
>>> problems with heat dissipation.
>>>
>>> With best regards,
>>>
>>>  Christian Hoene
>>>
>>>
>>>
>>> >-----Original Message-----
>>> >From: codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] On Behalf Of Koen Vos
>>> >Sent: Thursday, May 27, 2010 6:43 AM
>>> >To: Raymond (Juin-Hwey) Chen
>>> >Cc: codec@ietf.org
>>> >Subject: Re: [codec] #16: Multicast?
>>> >
>>> >Quoting "Raymond (Juin-Hwey) Chen":
>>> >> My point is that we should not expect that future IP phones or gateways
>>> >> will operate at a very low percentage point of the processor load just
>>> >> because Moore's Law can improve processor speed over time.
>>> >
>>> >In other words, future manufacturers won't spend a few dimes on
>>> >reducing delay, even though today they're happy to add several dollars
>>> >to the price just to enable wideband?  That's a statement about the
>>> >relative importance of delay.
>>> >
>>> >For the discussion about transmission delay vs. frame size, see e.g.
>>> >http://www.ietf.org/mail-archive/web/codec/current/msg01477.html
>>> >
>>> >koen.
>>> >
>>> >
>>> >
>>> >> Hi Koen,
>>> >>
>>> >> In-line below...
>>> >>
>>> >> You wrote:
>>> >>> The essence, if I understand you correctly, is that there still exist
>>> >>> low-end platforms with barely enough processing power to run a VoIP
>>> >>> call.  If such platforms use a naive FIFO scheduler, they'll create up
>>> >>> to one frame of processing delay for encoder and decoder each, on top
>>> >>> of the frame of buffering delay.
>>> >>
>>> >> [Raymond]: It doesn't have to be low-end platforms.  I wouldn't consider
>>> >> high-density VoIP gateways "low-end".  What matters is whether the
>>> >> processor is heavily loaded (i.e. busy at a high percentage of time)
>>> >> with real-time tasks (and thus is just fast enough). I think this is
>>> >> true for typical implementations of IP phones and VoIP gateways.
>>> >>
>>> >> I also wouldn't use the term "a naïve FIFO scheduler" to describe the
>>> >> "run to completion" real-time scheduler that I talked about in my last
>>> >> email, because that term seems to imply that it is a very simple-minded
>>> >> and inferior approach used by an inexperienced person who doesn't know
>>> >> anything better.  My understanding from talking to the three senior
>>> >> technical leads of Broadcom is that the reality is when you have many
>>> >> real-time tasks that you need to handle concurrently, using a
>>> >> prioritized interrupt-driven scheduler is just way too complex and
>>> >> messy, and it doesn't even guarantee that you will get a lower delay if
>>> >> you do go through the trouble.  In contrast, the kind of "run to
>>> >> completion" real-time scheduler that I talked about is a more elegant
>>> >> solution as it simplifies the scheduling problem substantially and also
>>> >> allows you to have more efficient utilization of the processor.
>>> >>
>>> >> Other than these two points, your understanding of my main point is
>>> >> correct.
>>> >>
>>> >>> The good news is that Moore's law will continue to drive down the
>>> >>> fraction of platforms with such processing delay problems.
>>> >>
>>> >> [Raymond]: This may be true for PCs but probably not true in general.
>>> >> A PC is a general-purpose computing device that has to handle numerous
>>> >> possible tasks, and a voice phone call takes only a very small fraction
>>> >> of the worst-case computational power requirement of a PC.  In contrast,
>>> >> for special-purpose dedicated hardware devices such as IP phones or
>>> >> VoIP gateways, it would make no sense to use a processor that is many
>>> >> times faster than the worst-case computational power requirement.  For
>>> >> the sake of cost and power efficiency, the designers of such special-purpose
>>> >> devices will want to use a processor that's just slightly faster
>>> >> than required, because then they can use the cheapest and/or lowest
>>> >> power-consuming processor that's fast enough to get the job done.
>>> >> If they choose to use a processor much faster than is required, then
>>> >> competitors using processors just fast enough can have lower costs
>>> >> and power consumption and can take market share away from them.
>>> >>
>>> >> A case in point: after its first appearance several decades ago, 8-bit
>>> >> microprocessors are still widely used in many devices today despite the
>>> >> several orders of magnitude of speed improvement provided by Moore's
>>> >> Law, because those devices just don't need anything faster, so using
>>> >> anything faster would be a waste of money and power consumption.
>>> >>
>>> >> My point is that we should not expect that future IP phones or gateways
>>> >> will operate at a very low percentage point of the processor load just
>>> >> because Moore's Law can improve processor speed over time. Therefore,
>>> >> don't expect the 3X multiplier for codec frame size to go down much
>>> >> below where it is now.
>>> >>
>>> >> In fact, if in addition to a VoIP call, a PC is heavily loaded with a
>>> >> lot of other concurrent tasks, many of which may be real-time tasks
>>> >> (e.g. video, playing/burning CD/DVD, networking, etc.), then it will be
>>> >> difficult for the PC to have small encoding and decoding RTS delays (d2
>>> >> and d5 in my delay analysis).  In this case, the codec frame size
>>> >> multiplier will be closer to 3X than to 1X, unless you are willing to
>>> >> let the voice stream occasionally run out of real time and produce an
>>> >> audible glitch (which is not acceptable from the voice quality
>>> >> perspective).  If you agree with this and agree that a PC sometimes
>>> >> does get very heavily loaded, then if you don't want the voice stream
>>> >> to run out of real time, the worst-case codec-dependent delay for
>>> >> a PC can still be around 3X the codec frame size.
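
The 1X to 3X range can be written out as a small calculation, using the summary given earlier in this thread: one frame of buffering plus up to one frame of processing delay each for encoder and decoder. This is a simplified sketch; look-ahead, packetization and network delay are left out, and the function and parameter names are hypothetical.

```python
def codec_dependent_delay_ms(frame_ms, enc_proc_ms=0.0, dec_proc_ms=0.0):
    """Frame buffering plus encoder and decoder processing delay, in milliseconds.

    Each processing delay must lie between 0 and one frame duration,
    matching the deadline = period assumption.
    """
    for delay in (enc_proc_ms, dec_proc_ms):
        if not 0.0 <= delay <= frame_ms:
            raise ValueError("processing delay must be within one frame")
    return frame_ms + enc_proc_ms + dec_proc_ms


if __name__ == "__main__":
    frame = 20.0  # assumed frame size in ms
    print(codec_dependent_delay_ms(frame))                # lightly loaded: about 1X
    print(codec_dependent_delay_ms(frame, frame, frame))  # fully loaded: 3X
```

With a 20 ms frame this gives 20 ms at one extreme and 60 ms at the other, so halving the frame size saves between 10 ms and 30 ms depending on processor load.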
>>> >>
>>> >>> I'm a bit surprised by your analysis of "packet transmission delay",
>>> >>> as it has little bearing on our multiplier (ie the change in delay as
>>> >>> a function of frame size). See old posts.
>>> >>
>>> >> [Raymond]: I am not sure I understand what you are saying.  You probably
>>> >> misunderstood the goal of my analysis. I mentioned in my last email that
>>> >> my delay analysis aimed to derive the lower and upper bounds of the
>>> >> codec-dependent one-way delay as functions of both the codec frame size
>>> >> AND the packet size.  That "packet transmission delay" does depend on
>>> >> the packet size, so it should be included.  Also, including it doesn't
>>> >> increase the lower bound of the delay (and the codec frame size
>>> >> multiplier there); it only affects the upper bound.
>>> >>
>>> >> Or, are you saying the "packet transmission delay" depends on the packet
>>> >> size, not the codec frame size, and therefore is not codec-dependent?
>>> >> Well, we know the packet size should be a positive integer multiple of
>>> >> the codec frame size.  Once the codec frame size is determined, there
>>> >> are only limited choices of packet sizes you can use, so in this sense
>>> >> the packet size does depend on the codec frame size.  Therefore, the
>>> >> "packet transmission delay" indirectly depends on the choice of the
>>> >> codec.
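
A small sketch of that dependency: once the frame size is fixed, the possible packet durations are its positive integer multiples. The 60 ms cap on packet duration is a made-up limit used only to keep the lists finite.

```python
def allowed_packet_durations_ms(frame_ms, max_packet_ms):
    """All packet durations that are positive integer multiples of the frame size."""
    if frame_ms <= 0:
        raise ValueError("frame size must be positive")
    count = int(max_packet_ms // frame_ms)
    return [k * frame_ms for k in range(1, count + 1)]


if __name__ == "__main__":
    for frame in (10, 20, 30):
        print(frame, allowed_packet_durations_ms(frame, 60))
```

A 10 ms codec can be packetized at 10, 20, ..., 60 ms, while a 30 ms codec is limited to 30 or 60 ms under the same cap, so the smallest achievable packet size is set by the codec.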
>>> >>
>>> >> Best Regards,
>>> >>
>>> >> Raymond
>>> >>
>>> >>
>>> >>
>>> >
>>> >_______________________________________________
>>> >codec mailing list
>>> >codec@ietf.org
>>> >https://www.ietf.org/mailman/listinfo/codec
>>>
>
>