Re: [armd] address resolution requirement from hosts to overlay edge nodes. Any opinion?

Igor Gashinsky <igor@yahoo-inc.com> Fri, 17 February 2012 07:16 UTC

Date: Thu, 16 Feb 2012 23:16:05 -0800 (PST)
From: Igor Gashinsky <igor@yahoo-inc.com>
X-X-Sender: igor@netops1.corp.bf1.yahoo.com
To: Dino Farinacci <dino@cisco.com>
In-Reply-To: <5EC573B7-3DF4-42AC-A3C5-BEA3C2AB8A1D@cisco.com>
Message-ID: <alpine.LRH.2.00.1202162246110.23977@netops1.corp.bf1.yahoo.com>
References: <CA+-tSzzNeLP4N=Nv1EeBML51KTpmxPP3NWut+vnaWFy8RtUViA@mail.gmail.com> <7AE6A4247B044C4ABE0A5B6BF427F8E291E1A5@dfweml503-mbx> <CA+-tSzyvoDfwnKc7Yt65abQWSqMg2jF0iQax=wcYkmwtNGxZng@mail.gmail.com> <60C093A41B5E45409A19D42CF7786DFD522A9BE1F1@EUSAACMS0703.eamcs.ericsson.se> <CA+-tSzwZVYyEO62ngYGojwSrkSBBY2SWr93PDQmAp7a3y_7TMQ@mail.gmail.com> <CAL3FGfy0iyo_TTr-iuSzQuqRm8Li753UFWQsk=RGWh_nCdPMMw@mail.gmail.com> <CA+-tSzwFWBWd0_QZ4CqgQmjTUaXnBafNVdk8oZvK6oRTCR4Jqg@mail.gmail.com> <CAL3FGfwx=n9kKjwcARg6-ge2a-t-R+7RmR=d-qRJx=TdzNHMAQ@mail.gmail.com> <alpine.LRH.2.00.1202141450530.7083@netops1.corp.bf1.yahoo.com> <5EC573B7-3DF4-42AC-A3C5-BEA3C2AB8A1D@cisco.com>
User-Agent: Alpine 2.00 (LRH 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Mailman-Approved-At: Fri, 17 Feb 2012 07:00:02 -0800
Cc: Thomas Narten <narten@us.ibm.com>, "armd@ietf.org" <armd@ietf.org>
Subject: Re: [armd] address resolution requirement from hosts to overlay edge nodes. Any opinion?

On Wed, 15 Feb 2012, Dino Farinacci wrote:

:: > I've so far stayed pretty quiet on this, but, on this, I have to strongly 
:: > disagree. It is not FUD that multicast doesn't scale well inside large 
:: 
:: There are others who probably wish to not divulge their proprietary 
:: scaling numbers who would disagree with you.

Would anybody be willing to speak about orders of magnitude here? We had 
to do a *lot* of nasty, ugly, hacky-ish things to get mcast to scale to 
the 10's of Ks (S,G) scale, but, when we looked at what it would take to 
make the next order-of-magnitude jump to 100k+, that's when the 
cost/complexity tradeoff simply made it not practical -- I'd love to hear 
if anybody is actually doing 100k+ (S,G) of random-ish traffic profile.

:: > datacenters -- it is a simple fact, and I speak as an operator of what 
:: > several of my vendors called the 2nd largest multicast deployment they 
:: > have ever seen, with many 10's of thousands (S,G) entries. 
:: 
:: It depends what you are comparing 10,000 to. Comparing to unicast 
:: numbers would not be comparing apples with apples. Remember the 
:: granularity of a multicast route is much finer than a unicast route 
:: because it wants to conserve bandwidth and build good distribution 
:: trees.
:: 
:: It is a simple bandwidth versus state tradeoff and 10,000 is pretty large.

You are absolutely right.. but the tradeoff is mostly dependent on the 
*degree of replication* for all those mcast routes vs the state that they 
cost. For my network, the *average* degree of replication is actually 
quite small (on the order of 2-3 hosts per group, taking up a tiny 
fraction of the bandwidth), so trading that amount of bandwidth for state 
was a no brainer. However, if you have very few groups with a very high 
replication factor (say, 1000 hosts), then you would arrive at a very 
different conclusion, but, then you likely don't have a multicast scaling 
problem, since then you likely only have a few hundred (S,G) entries.. So, 
do people have some other datapoints on degree of replication, amount of 
bw saved by mcast vs state on large datacenter (ie not-designed-for-video) 
multicast networks?
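
A back-of-the-envelope sketch of that tradeoff, in Python. This is only an
illustration under stated assumptions, not a measurement: the profile names,
the 30,000 and 300 entry counts, and the model itself (every group sends at
the same rate, and the alternative to multicast is the source unicasting one
copy per receiver) are hypothetical choices picked to match the rough figures
mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical multicast traffic profile (illustrative numbers only)."""
    name: str
    sg_entries: int           # number of (S,G) entries, i.e. the state cost
    receivers_per_group: int  # average degree of replication

    def copies_saved(self) -> int:
        # Assumption: without multicast the source unicasts one copy per
        # receiver; with multicast it sends one copy per group. So each
        # group saves (receivers - 1) copies at the source.
        return self.sg_entries * (self.receivers_per_group - 1)

    def saved_per_entry(self) -> float:
        # Bandwidth benefit (in copies) bought by each unit of state.
        return self.copies_saved() / self.sg_entries


# Assumed stand-ins for the two cases discussed in the text.
many_small = Profile("many small groups", sg_entries=30_000, receivers_per_group=3)
few_large = Profile("few large groups", sg_entries=300, receivers_per_group=1_000)

for p in (many_small, few_large):
    print(f"{p.name}: {p.sg_entries} (S,G) entries, "
          f"{p.saved_per_entry():.0f} copies saved per entry")
```

Under these assumptions each (S,G) entry in the small-group profile buys only
a couple of copies of saved bandwidth, while each entry in the large-group
profile buys hundreds, which is why the two cases lead to different
conclusions about spending state.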

:: Many think a data center with 10,000 hosts is large too. I know you 
:: can one-mag-up us Igor, but 10,000 is a large number for enterprise 
:: sites.

I guess that brings us to the crux of the question -- ARMD = Address 
Resolution for *Massive* numbers of hosts in the Data center (from the 
charter), so, what is the order of magnitude for massive? :)

To me (and perhaps I'm in a very small minority here?), massive implies 
cloud-scale datacenters, so 10-20k physical hosts *per cluster* and 
somewhere around 400-500k VMs is the absolute *minimum* for what I would 
consider to be massive (and, really, I think 100k physical, 2M+ logical is 
what I believe a realistic aiming point these days should be). 

If that's the case, then we are looking at about two orders of magnitude 
higher than what most enterprises do.. so, what problem space do we want 
to solve? 

Thanks,
-igor

PS as you can probably tell, Dino and I have had this discussion before, 
quite a few times :)

--------------------+----------------------+------------------
   Igor Gashinsky   | Network Architecture | Yahoo! Inc.
 igor@yahoo-inc.com |  cell 917.807.2213   | Do You... Yahoo?
--------------------+----------------------+------------------