Re: Power consumption reduction in edge and core routers in the internet

Curtis Villamizar <curtis@occnc.com> Sat, 08 September 2012 22:57 UTC

To: Balaji Venkat <balajivenkat299@gmail.com>
From: Curtis Villamizar <curtis@occnc.com>
Subject: Re: Power consumption reduction in edge and core routers in the internet
In-reply-to: Your message of "Sun, 09 Sep 2012 02:15:27 +0530." <BB69859C-0287-4904-B41E-CFDA28E26A62@gmail.com>
Date: Sat, 08 Sep 2012 18:57:00 -0400
Cc: Shankar Raman <mjsraman@gmail.com>, "rtgwg@ietf.org" <rtgwg@ietf.org>

In message <BB69859C-0287-4904-B41E-CFDA28E26A62@gmail.com>
Balaji Venkat writes:
 
> Dear Curtis,
>  
> We used nroffedit for preparing these drafts and forgot to delete the
> default references that got stuck in there.
>  
> We intended these to be serious material.
>  
> If you could comment on the specifics of why you disagree with our
> proposals it would be most beneficial.
>  
> Regards
> Balaji


Without too lengthy a response I have to say that I don't think your
approach is going to yield much if any benefit.

First, reducing power in the core and reducing it at the edge are very
different problems.

There are a few things that can be done in the core.  For example:

  1.  Constraining the number of forwarding entries so that forwarding
      can move from the existing external TCAM back into the
      forwarding chip, using a less power-hungry method than TCAM.

  2.  If possible, eliminating the very large buffer on one side of
      the switching fabric, making it small enough to use on-chip RAM,
      and putting the bulk of the buffering on the other side.  Some
      think that putting enough overspeed in the fabric and a large
      buffer on the output side may be a solution.  Others favor a VOQ
      approach.  The extreme memory bandwidth needed to go to external
      DRAM is the culprit.  As on-chip SRAM sizes increase over time
      this may become possible.  Putting external DRAM on both sides
      of a fabric is a big power drain.

The first item above is supported by using MPLS with a BGP-free core
(no IP forwarding except for the small number of prefixes needed for
control traffic, and then just a set of MPLS labels to traverse the
topology).  Comments on LDP vs RSVP-TE to /dev/null please (not
important here).  Comments on IPFRR are also not relevant, as IPFRR
doesn't address this.
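
To illustrate why a BGP-free core keeps the forwarding table small,
here is a minimal sketch of label forwarding as a single exact-match
lookup.  Everything in it is an assumption made for illustration: the
names LabelEntry, lfib and forward, the label values and the interface
names are hypothetical and do not come from any particular router.

```python
from typing import NamedTuple, Optional


class LabelEntry(NamedTuple):
    """One row of a hypothetical label forwarding table."""
    out_label: Optional[int]  # None means pop the label (illustrative convention)
    out_interface: str


# One exact-match entry per label in use on this node.  The table size
# follows the number of label switched paths crossing the node, not the
# number of prefixes in the global routing table.
lfib = {
    100: LabelEntry(out_label=200, out_interface="core-1"),
    101: LabelEntry(out_label=None, out_interface="edge-7"),
}


def forward(in_label: int) -> Optional[LabelEntry]:
    """Single exact-match lookup; no longest-prefix match is involved."""
    return lfib.get(in_label)
```

An exact-match table of this kind can be held in ordinary on-chip
memory, which is the property the first item relies on.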

For edge, forwarding speeds are much lower (about 3 decimal orders of
magnitude, give or take one) and buffering requirements are smaller.
On-chip buffering on one side is clearly feasible but needs to be done
right.  At the very low end, on-chip buffering on both sides might be
feasible.  Access is further out than edge, and on-chip buffering
should be sufficient there.  With IP, if global routing is needed, then
an external TCAM is generally used, though some chips will use external
DRAM (realizing that multiple accesses would be needed for a trie
approach).  I'm far less familiar with where the state of the art is
for edge and access.
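
The point about multiple accesses for a trie can be made concrete with
a minimal sketch of longest-prefix match on a one-bit-per-level binary
trie.  This is an illustration only, not how any particular chip does
it: the names TrieNode, insert and lookup are hypothetical, addresses
are given as bit strings, and each node visit stands in for one memory
access.  Real designs typically use multibit strides to cut the number
of accesses.

```python
class TrieNode:
    """Node of a binary trie; one child per bit value."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


def insert(root, prefix_bits, next_hop):
    """Install next_hop for the prefix given as a string of '0'/'1'."""
    node = root
    for bit in prefix_bits:
        i = int(bit)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop


def lookup(root, addr_bits):
    """Return (next_hop, node_visits) for the longest matching prefix."""
    node = root
    best = root.next_hop
    visits = 1  # reading the root counts as the first access
    for bit in addr_bits:
        node = node.children[int(bit)]
        if node is None:
            break
        visits += 1
        if node.next_hop is not None:
            best = node.next_hop  # remember the longest match so far
    return best, visits
```

With prefixes "10" and "1011" installed, looking up an address that
starts with "1011" walks five nodes before the match is final, whereas
a TCAM would answer in a single access.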

For large networks, parallel links provide a further potential for
power reduction.  Generally some traffic must be run for monitoring
even if the link is unused.  However, in principle on-chip power
islands could be used to eliminate the leakage current that unused
forwarding engines would draw, while keeping active the minimal
circuitry needed for OAM or a link layer equivalent.  I don't know of
any chip that does this.  Generally that would mean taking component
links out of a parallel set of links (for example a wide LAG of
100Gb/s links).
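
As a rough sketch of the arithmetic involved in taking component links
out of a parallel set, the following estimates how many members of a
LAG could be idled at a given offered load.  The function names, the
80% utilization ceiling and the rule of keeping at least one link
active are assumptions chosen for illustration, not values from any
deployment.

```python
import math


def active_links_needed(offered_gbps, link_gbps=100.0,
                        max_utilization=0.8, min_active=1):
    """Smallest number of links that carries the load under the ceiling."""
    if offered_gbps < 0 or link_gbps <= 0 or not 0 < max_utilization <= 1:
        raise ValueError("invalid load, link speed or utilization ceiling")
    needed = math.ceil(offered_gbps / (link_gbps * max_utilization))
    # Keep a minimum number of links up even when the load is zero.
    return max(min_active, needed)


def links_to_idle(total_links, offered_gbps, **kwargs):
    """Number of component links that could be powered down."""
    return max(0, total_links - active_links_needed(offered_gbps, **kwargs))
```

For example, with these assumed values an 8 x 100Gb/s LAG carrying
250Gb/s would need 4 active links, leaving 4 as candidates to idle.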

Powering down the laser is very problematic in long haul (LH).  The LH
optical systems monitor and balance power levels on the optical fiber.
Lighting up additional wavelengths on a WDM system can take minutes
before power levels are adjusted correctly and the transmit and receive
sides and all the optical amplifiers along the way are sufficiently in
sync to bring bit error rates down.  LH optics and modulation are a
black art, particularly ultra LH (ULH) and subsea (no OEO for up to
6000km), and right now shutting lasers down can be disruptive.  The
optical amplifier pump lasers have to be powered at about the same
levels regardless of how many optical channels are carried.

This does not apply to edge and access.  Between edge and access, the
physical topology is often a set of logical rings over WDM with ROADM
add/drop.  Taking down one side of each ring would not be viable.  If
there are parallel links or other redundancy, then shutting down lasers
in short haul (less than 40 or less than 100 km) is feasible.

Moving "high touch" functions such as deep packet inspection and
filtering to the edge helps.  For core, where a provider peering might
require filtering, keeping the rule set very small can keep it on
chip.  Adding still more TCAM or DRAM for filtering drives power up.
If a card must be all things, then it would help to allow those
components, and the forwarding chip SERDES facing them, to be powered
down when not needed (internal to core, for example, rather than
peering).

So I really think that you haven't started by stating a valid problem
other than recognizing that reducing power would be a good thing.

Maybe you should start over with one draft stating the problem.
Identify where the power goes in a provider core network or content
provider data center (or other extremely large data center) or in a
service provider edge and access.  Is it forwarding logic?  Is it chip
externals such as forwarding TCAM or DRAM, buffer SERDES/DRAM, filter
TCAM or DRAM, switching fabric, etc.?  What is the contribution of
optical components driving the line?  (hint: don't look here).

Then once you know where the power goes, try to fix it.  IMHO you'll
end up with a very different approach.

Well - I suppose that wasn't as concise as it could be.

Curtis


> On 09-Sep-2012, at 1:32 AM, Curtis Villamizar <curtis@occnc.com> wrote:
>  
> > 
> > In message <CC71048E.E133%balajivenkat299@gmail.com>
> > balaji venkat Venkataswami writes:
> > 
> >> Dear all,
> >> 
> >> After due consultation with the chairs and IRTF folks we have been
> >> instructed to ask for discussion on the following drafts that deal with
> >> power consumption reduction in edge and core devices in the internet.
> >> 
> >> We deal with the problem in a hierarchical approach. One within an
> >> Autonomous System and the other between Autonomous Systems and hence for the
> >> whole internet.
> >> 
> >> We duly submit to you this work that we have done for your kind review and
> >> consideration.
> >> 
> >> We deal with solutions for unicast and multicast at the AS level and the
> >> intra-AS level. Multicast solutions mostly in the intra-AS level.
> >> 
> >> We submit this work and request you to comment on it.
> >> 
> >> Multicast intra-AS level
> >> * mjsraman-pce-power-replic
> >> <http://tools.ietf.org/html/draft-mjsraman-pce-power-replic>  (timeline)
> >> <http://www.arkko.com/tools/lifecycle/draft-mjsraman-pce-power-replic-timing.html>
> >> * mjsraman-pim-ecmp-redirect-power-replic-cap
> >> <http://tools.ietf.org/html/draft-mjsraman-pim-ecmp-redirect-power-replic-cap>  (timeline)
> >> <http://www.arkko.com/tools/lifecycle/draft-mjsraman-pim-ecmp-redirect-power-replic-cap-timing.html>
> >> Unicast AS level
> >> * mjsraman-rtgwg-bgp-power-path
> >> <http://tools.ietf.org/html/draft-mjsraman-rtgwg-bgp-power-path>  (timeline)
> >> <http://www.arkko.com/tools/lifecycle/draft-mjsraman-rtgwg-bgp-power-path-timing.html>
> >> * mjsraman-rtgwg-inter-as-psp
> >> <http://tools.ietf.org/html/draft-mjsraman-rtgwg-inter-as-psp>  (timeline)
> >> <http://www.arkko.com/tools/lifecycle/draft-mjsraman-rtgwg-inter-as-psp-timing.html>
> >> * mjsraman-rtgwg-inter-as-psp-protect
> >> <http://tools.ietf.org/html/draft-mjsraman-rtgwg-inter-as-psp-protect>  (timeline)
> >> <http://www.arkko.com/tools/lifecycle/draft-mjsraman-rtgwg-inter-as-psp-protect-timing.html>
> >> Unicast intra-AS level
> >> * mjsraman-rtgwg-intra-as-psp-te-leak
> >> <http://tools.ietf.org/html/draft-mjsraman-rtgwg-intra-as-psp-te-leak>  (timeline)
> >> <http://www.arkko.com/tools/lifecycle/draft-mjsraman-rtgwg-intra-as-psp-te-leak-timing.html>
> >> * mjsraman-rtgwg-ospf-power-topo
> >> <http://tools.ietf.org/html/draft-mjsraman-rtgwg-ospf-power-topo>  (timeline)
> >> <http://www.arkko.com/tools/lifecycle/draft-mjsraman-rtgwg-ospf-power-topo-timing.html>
> >> Multicast intra-AS level
> >> * mjsraman-rtgwg-pim-power
> >> <http://tools.ietf.org/html/draft-mjsraman-rtgwg-pim-power>  (timeline)
> >> <http://www.arkko.com/tools/lifecycle/draft-mjsraman-rtgwg-pim-power-timing.html>
> >> 
> >> Thanks again to the WG and IETF chairs for letting us share our work with
> >> this group.
> >> 
> >> Thanks and regards
> >> Shankar and team
> > 
> > 
> > My first reaction was that this was just a really bad idea from some
> > students who didn't understand very much about how provider networks
> > were built.
> > 
> > But looking at some of the citations I'm wondering if this whole thing
> > is just a joke.
> > 
> >  draft-mjsraman-rtgwg-ospf-power-topo-01
> > 
> >   [EVILBIT]  Bellovin, S., "The Security Flag in the IPv4 Header",
> >              RFC 3514, April 1 2003.
> > 
> >   [RFC5513]  Farrel, A., "IANA Considerations for Three Letter
> >              Acronyms", RFC 5513, April 1 2009.
> > 
> >   [RFC5514] Vyncke, E., "IPv6 over Social Networks", RFC 5514, April
> >             1 2009.
> > 
> > I'm really hoping the whole thing is a joke, but unfortunately I do
> > think that you are seriously proposing this.  Just confirming first.
> > 
> > Curtis