Re: [Dcrouting] draft-przygienda-rift-03
Tony Przygienda <tonysietf@gmail.com> Thu, 11 January 2018 17:40 UTC
In-Reply-To: <CA+b+ERnOc7V7+OL2wsfZsRsdSpjeSQmQQdH7SX_WLbySaVtxKw@mail.gmail.com>
References: <CA+b+ERnOc7V7+OL2wsfZsRsdSpjeSQmQQdH7SX_WLbySaVtxKw@mail.gmail.com>
From: Tony Przygienda <tonysietf@gmail.com>
Date: Thu, 11 Jan 2018 09:40:04 -0800
Message-ID: <CA+wi2hNbhXuXLKPD_0FL2csv1o9d37hF0XFex632z1skXUji+w@mail.gmail.com>
To: Robert Raszuk <robert@raszuk.net>
Cc: rift@ietf.org, spring@ietf.org, dcrouting@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/dcrouting/70GWB8ElrXRVB_c-g7auty-kASk>
Subject: Re: [Dcrouting] draft-przygienda-rift-03
Robert, productive points, thanks for raising them ... I go a bit in
depth

1. I saw no _real_ use-cases for SID in DC so far to be frank (once you
run RIFT). The only one that comes up regularly is egress engineering
and that IMO is equivalent to SID=leaf address (which could be a HV
address of course once you have RIFT all the way down to the server) so
really, what's the point of having a SID? It's probably much smarter to
use IBGP & so on in the overlay to do this kind of synchronization if
needed, since labels/SIDs become very useful in the overlay to
distinguish lots of stuff there like VPNs/services which you'd carry
e.g. in MPLSoUDP. In the underlay just use the destination v4/v6
address.

Having said that, discussion always to be had if you pay me dinner ;--)
and I know _how_ we can do SIDs in RIFT since I thought it through but
again, no _real_ use case so far. And if your only concern is to "shape
towards a prefix" we have PGP in the draft which doesn't need new
silicon ;-P

And then ultimately, yes, if you really, really want a SID per prefix
everywhere then you'll carry SIDs to everywhere, since unicast SIDs are
really just a glorified way to say "I have this non-aggregatable 20 bit
IP host address", which architecturally is a very interesting
proposition in terms of scaling (but then again, no accounting for
taste and RFC1925 clause 3 applies) ... Your LSDB will still be much
smaller, your SPF will still be simple on the leaf in RIFT, but your FIB
will blow up and anything changing on a leaf shakes all other leafs
(unless you start to run policies to control distribution, @ which
point in time you start to baby-sit your fabric @ high OPEX).

One of the reasons to do per-prefix SID would be non-ECMP anycast (where
SIDs _are_ in fact useful) but if you read the RIFT draft carefully you
will observe that RIFT can do anycast without need for ECMP, i.e. true
anycast in a sense, and with that having an anycast SID serves no real
purpose in RIFT and is actually generally much harder to do since you
need globally unique label blocks and so on ...

2. Horizontal links on CLOSes are normally not used that way in all I
saw, since your blocking goes to hell unless you provision some kind of
really massive parallel links between ToRs _and_ understand your load.
We _could_ build RIFT that way but you give up balancing through the
fabric and the loop-free property in a sense (that's a longish
discussion and scaling since now you have prefixes showing up in all
kinds of crazy places instead of default). If I see enough demand, we
get there ... Otherwise RFC1925 clause 10 and 5.

3. PS1: Yes, lots of things "could" be done and then we "could" build a
protocol to do that and RFC1925 clause 7 and 8 applies. Such horizontal
links, unless provisioned correctly, will pretty much just ruin your
blocking/loss on the fabric is the experience (which the math supports).
In a sense if you know your big flows you can build a specialized
topology to do the optimal distribution (MPLS tunnels anyone ;-) but the
point of fabric is that it's a fabric (i.e. load agnostic, cheap, no
OPEX and easily scalable). Otherwise a good analogy would be that you
like to build special RAM chips for the type of data structures you are
storing and we know how well that scales over time. We know now that
within 3-4 years characteristics of DC flows flip upside down without a
sweat when people go from server/client to microservices, from servers
to containers and so on and so on. So if you can't predict your load all
the time you need a _regular_ topology where _regular_ is more of a
mathematical than a protocol discussion. Fabric analogy of "buy more RAM
chips in Fry's and just stick them in" applies here. So RIFT is done
largely to serve a well-known structure called a "lattice" (with some
restrictions) since we need an "up" and "down".
Things like hypercubes, toroidal meshes and so on and so on exist but
CLOS won for a very good reason in history for that kind of problem
(once you move to NUMA other things win ;-) And if you know your loads
and you can heft the OPEX and you like to play with protocols generally
and if you can support the scale in terms of leaf FIB sizes, flooding,
slower convergence & so on & so on, then you run a flat IGP on some kind
of stuff that you build that doesn't even have to be regular in any
sense. We spent many years solving THAT problem obviously and doing
something like RIFT to replace normal IGP is of limited interest IMO
(albeit certain aspects having to do with modern implementation
techniques may get us there one day, but it's a much less pressing
problem than solving specialized DC routing well, IMO again).

3. PS2: RIFT cannot build an "unsupported topology" no matter how you
cable (that's the point of it) or rather we have miscabling detection
and do not form adjacencies, as you will see when you read the draft
carefully. That's your "flash red light" and it comes included for free
with my compliments ;-) ... Otherwise RFC1925 clause 10.

Otherwise, if you have concrete charter points you'd like to add, be
more specific in your asks and we see what the list thinks after ...

thanks

--- tony

On Thu, Jan 11, 2018 at 1:30 AM, Robert Raszuk <robert@raszuk.net> wrote:

> Hi,
>
> I have one little question/doubt on scalability point of RIFT ...
>
> Assume that someone would like to signal IPv6 prefix SID for Segment
> Routing in the underlay within RIFT.
>
> Wouldn't it result in amount of protocol state in full analogy to massive
> deaggregation - which as of today is designed to be very careful and
> limited operation only at moments of failure(s) ?
>
> I sort of find it a bit surprising that RIFT draft does not provide
> encoding for SID distribution when it is positioned as an alternative to
> other protocols (IGPs or BGP) which already provide ability to carry all
> types of SIDs.
>
> Cheers,
> Robert.
>
> PS1: Horizontal links which were discussed could be installed to offload
> from fabric transit massive amount of data (ex: storage mirroring) directly
> between leafs or L3 TORs and not to be treated as "backup".
>
> PS2: Restricting any protocol to specific topologies seems like pretty
> slippery slope to me. In any case if protocol does that it should also
> contain self detection mechanism of "unsupported topology" and flash red
> light in any NOC.
>
> _______________________________________________
> Dcrouting mailing list
> Dcrouting@ietf.org
> https://www.ietf.org/mailman/listinfo/dcrouting
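
The FIB argument in point 1 of the reply can be put into rough numbers.
The sketch below is a back-of-envelope model only, not anything from the
RIFT draft: the Fabric class, its field names and the example sizes are
all hypothetical, and it assumes a healthy fabric in which a leaf
otherwise holds just its local prefixes plus a single default route.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fabric:
    """Hypothetical fabric dimensions, chosen only for illustration."""
    leaves: int
    prefixes_per_leaf: int


def leaf_fib_default_only(fabric: Fabric) -> int:
    # Assumption: a leaf installs its own prefixes plus one default
    # route pointing north, and nothing else while the fabric is healthy.
    return fabric.prefixes_per_leaf + 1


def leaf_fib_per_prefix_sid(fabric: Fabric) -> int:
    # Assumption: one non-aggregatable entry per remote prefix is
    # installed on top of the local prefixes and the default route.
    remote = (fabric.leaves - 1) * fabric.prefixes_per_leaf
    return fabric.prefixes_per_leaf + 1 + remote


def leaves_touched_by_change(fabric: Fabric, per_prefix_sid: bool) -> int:
    # Number of *other* leaves that must update their FIB when a prefix
    # changes on one leaf, under the same simplified model.
    return fabric.leaves - 1 if per_prefix_sid else 0


if __name__ == "__main__":
    fabric = Fabric(leaves=1000, prefixes_per_leaf=4)
    print("default only   :", leaf_fib_default_only(fabric))
    print("per-prefix SID :", leaf_fib_per_prefix_sid(fabric))
    print("leaves touched :", leaves_touched_by_change(fabric, True))
```

With these made-up sizes the per-leaf FIB grows from a handful of
entries to roughly one entry per prefix in the fabric, and any change on
one leaf reaches every other leaf, which is the scaling concern the
reply describes. Real numbers depend on the platform, on policy, and on
how much deaggregation is active.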