Re: [armd] soliciting typical network designs for ARMD

Warren Kumari <warren@kumari.net> Wed, 31 August 2011 19:14 UTC

On Aug 11, 2011, at 11:40 PM, Vishwas Manral wrote:

> Hi Linda/ Anoop,
>  
> Here is the example of the design I was talking about, as defined by google.

Just a clarification -- s/as defined by google/as described by someone who happens to work for google/

W

> http://www.ietf.org/id/draft-wkumari-dcops-l3-vmmobility-00.txt
>  
> Thanks,
> Vishwas
> On Tue, Aug 9, 2011 at 2:50 PM, Anoop Ghanwani <anoop@alumni.duke.edu> wrote:
> 
> >>>>
> (though I think if there was a standard way to map Multicast MAC to Multicast IP, they could probably use such a standard mechanism).
> >>>>
> 
> They can do that, but then this imposes requirements on the
> equipment to be able to do multicast forwarding, and even if it does,
> because of pruning requirements the number of groups would be
> very large.  The average data center switch probably won't handle
> that many groups.
> 
> On Tue, Aug 9, 2011 at 2:41 PM, Vishwas Manral <vishwas.ietf@gmail.com> wrote:
> Hi Anoop,
>  
> From what I know they do not use Multicast GRE (I hear the extra 4 bytes in the GRE header is a proprietary extension).
>  
> I think a directory-based mechanism is what is used (though I think if there was a standard way to map Multicast MAC to Multicast IP, they could probably use such a standard mechanism).
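
For context on the mapping question, here is a minimal Python sketch of the one standard mapping that does exist, which runs in the opposite direction: IPv4 multicast group to Ethernet multicast MAC (RFC 1112). It is not the MAC-to-IP mapping asked for above and not anything described in this thread; the function name is made up for illustration. It does show that the existing mapping is many-to-one, so it cannot simply be inverted.

```python
import ipaddress

def multicast_ip_to_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast MAC (RFC 1112).

    The MAC is the fixed prefix 01:00:5e followed by the low 23 bits of the
    group address, so 32 different groups share each MAC address.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF            # keep only the low 23 bits
    mac = (0x01005E << 24) | low23          # prepend the 01:00:5e prefix
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))

# Two distinct groups that collide on the same MAC address.
print(multicast_ip_to_mac("224.1.1.1"))
print(multicast_ip_to_mac("225.129.1.1"))
```

Because the upper 5 variable bits of the group address are discarded, going from a MAC back to a single IP group needs extra state, for example a directory.
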
>  
> Thanks,
> Vishwas
> On Tue, Aug 9, 2011 at 2:03 PM, Anoop Ghanwani <anoop@alumni.duke.edu> wrote:
> Hi Vishwas,
> 
> How do they get multicast through the network in that case?
> Are they planning to use multicast GRE, or just use directory
> based lookups and not worry about multicast applications
> for now?
> 
> Anoop
> 
> On Tue, Aug 9, 2011 at 1:27 PM, Vishwas Manral <vishwas.ietf@gmail.com> wrote:
> Hi Linda,
>  
> The data packets can be tunnelled at the ToR in, say, a GRE packet, and the core is a Layer-3 core (except for the downstream ports). So we could have encapsulation/decapsulation of L2 over GRE at the ToR.
>  
> The very same thing can be done at the hypervisor layer too, in which case the entire DC network would look like a Layer-3 flat network including the ToR to server link and the hypervisor would do the tunneling.
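
The L2-over-GRE encapsulation described above can be sketched roughly as follows. This is a minimal illustration, not any operator's implementation: it builds only the GRE header (RFC 2784) with protocol type 0x6558 (Transparent Ethernet Bridging) and, optionally, the standard 4-byte Key field from RFC 2890. The outer IP header, which a ToR or hypervisor would also add, is omitted, and the function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

GRE_PROTO_TEB = 0x6558  # Transparent Ethernet Bridging: payload is an Ethernet frame
GRE_FLAG_KEY = 0x2000   # K bit (RFC 2890): a 4-byte Key field follows the base header

def gre_encapsulate(frame: bytes, key: Optional[int] = None) -> bytes:
    """Prepend a GRE header to a raw Ethernet frame (outer IP header not included)."""
    flags = GRE_FLAG_KEY if key is not None else 0
    header = struct.pack("!HH", flags, GRE_PROTO_TEB)
    if key is not None:
        header += struct.pack("!I", key)
    return header + frame

def gre_decapsulate(packet: bytes) -> Tuple[Optional[int], bytes]:
    """Strip the GRE header and return (key or None, inner Ethernet frame)."""
    if len(packet) < 4:
        raise ValueError("packet too short for a GRE header")
    flags, proto = struct.unpack("!HH", packet[:4])
    if proto != GRE_PROTO_TEB:
        raise ValueError("payload is not a bridged Ethernet frame")
    if flags & ~GRE_FLAG_KEY:
        raise ValueError("only the optional Key field is handled in this sketch")
    if flags & GRE_FLAG_KEY:
        if len(packet) < 8:
            raise ValueError("packet too short for the GRE Key field")
        (key,) = struct.unpack("!I", packet[4:8])
        return key, packet[8:]
    return None, packet[4:]

# Round trip: a dummy frame (zeroed MACs, EtherType 0x0800, short payload).
frame = bytes(12) + b"\x08\x00" + b"payload"
key, inner = gre_decapsulate(gre_encapsulate(frame, key=42))
print(key, inner == frame)
```

The same two functions would apply whether the tunnel endpoint is the ToR or the hypervisor; only the place where they run changes.
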
>  
> I am not sure if you got the points above or not. I know cloud OS companies that provide the service and have big announced customers.
>  
> Thanks,
> Vishwas
> On Tue, Aug 9, 2011 at 11:51 AM, Linda Dunbar <dunbar.ll@gmail.com> wrote:
> Vishwas,
>  
> In my mind, bullet 1) in the list refers to ToR switches whose downstream ports (facing servers) run Layer 2 and whose uplink ports run IP Layer 3.
>  
> Have you seen data center networks where the ToR switches' downstream ports (i.e. facing servers) have IP routing enabled, even though the physical links are Ethernet?
> If yes, we should definitely include it in the ARMD draft.
>  
> Thanks,
> Linda
> On Tue, Aug 9, 2011 at 12:58 PM, Vishwas Manral <vishwas.ietf@gmail.com> wrote:
> Hi Linda,
> I am unsure what you mean by this, but:
> 	• layer 3 all the way to TOR (Top of Rack switches),
> We can also have a hierarchical network, with the core totally Layer-3 (and having separate routing), with the hosts still in a large Layer-3 subnet. Another aspect could be to have a totally Layer-3 network.
>  
> The difference between them is the link between the servers and the ToR.
>  
> Thanks,
> Vishwas
> On Tue, Aug 9, 2011 at 10:22 AM, Linda Dunbar <dunbar.ll@gmail.com> wrote:
> During the 81st IETF ARMD WG discussion, it was suggested that it is necessary to document typical data center network designs so that address resolution scaling issues can be properly described. Many data center operators have expressed that they can't openly reveal their detailed network designs. Therefore, we only want to document anonymous designs without too much detail. During the journey of establishing ARMD, we have come across the following typical data center network designs:
> 	• layer 3 all the way to TOR (Top of Rack switches),
> 	• large layer 2 with hundreds (or thousands) of ToRs being interconnected by Layer 2. This design will have thousands of hosts under the L2/L3 boundary router(s).
> 	• Clos design with thousands of switches. This design will have thousands of hosts under the L2/L3 boundary router(s).
> We have heard that each of the designs above has its own problems. ARMD problem statements might need to document DC problems under each typical design.
> Please send feedback to us (either to the armd email list or to the ARMD chairs, Benson & Linda) to indicate if we have missed any typical data center network designs.
>  
> Your contribution can greatly accelerate the progress of ARMD WG.
>  
> Thank you very much.
>  
> Linda & Benson
> 
>