Re: [Dcrouting] draft-przygienda-rift-03

Tony Przygienda <tonysietf@gmail.com> Thu, 11 January 2018 18:14 UTC

To: Robert Raszuk <robert@raszuk.net>
Cc: rift@ietf.org, spring@ietf.org, dcrouting@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/dcrouting/mjby979oLrtWYRiUmCxNR1FhTmg>

Having said that, there are interesting use cases where we talk of "some
binding per node", but that's more of a KV store corner: people think it's
very useful to have the spines push stuff down to all nodes or (at the cost
of stability) to have a node push some stuff up that then gets pushed down
to all other nodes. That's more of a "per leaf node" information case and
pretty good unless leafs go crazy doing dynamic re-assignment (but we can
dampen that in the spines @ every level if needed). It's not _per prefix_,
though, which is what you seemed to ask about ....
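The spine-side dampening of leaf re-assignments mentioned above could look roughly like this. It is a minimal, hypothetical sketch that borrows exponential-decay flap damping from BGP (RFC 2439); the class name, penalty values, thresholds and half-life are all assumptions for illustration, not anything specified in the RIFT draft.

```python
from dataclasses import dataclass

# Hypothetical parameters, loosely modeled on BGP route flap damping (RFC 2439);
# none of these names or values come from the RIFT draft.
PENALTY_PER_CHANGE = 1000.0
SUPPRESS_THRESHOLD = 2000.0
REUSE_THRESHOLD = 750.0
HALF_LIFE_S = 15.0


@dataclass
class DampedKey:
    """Damping state a spine might keep for one leaf-originated KV key."""
    penalty: float = 0.0
    last_update: float = 0.0
    suppressed: bool = False

    def _decay(self, now: float) -> None:
        # Exponential decay: the penalty halves every HALF_LIFE_S seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_S)
        self.last_update = now

    def on_change(self, now: float) -> bool:
        """Record a leaf re-assignment; return True if it may be flooded south."""
        self._decay(now)
        self.penalty += PENALTY_PER_CHANGE
        if self.penalty >= SUPPRESS_THRESHOLD:
            self.suppressed = True
        # Once suppressed, changes stay held back until may_reuse() clears it.
        return not self.suppressed

    def may_reuse(self, now: float) -> bool:
        """Un-suppress the key once its penalty has decayed below REUSE_THRESHOLD."""
        self._decay(now)
        if self.suppressed and self.penalty < REUSE_THRESHOLD:
            self.suppressed = False
        return not self.suppressed
```

With these made-up numbers, three changes within one second push the penalty past the suppress threshold, so the third change is held back; the key is only flooded again after roughly a minute of quiet, once the penalty has decayed below the reuse threshold.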

--- tony

On Thu, Jan 11, 2018 at 9:40 AM, Tony Przygienda <tonysietf@gmail.com>
wrote:

> Robert, productive points, thanks for raising them ... I'll go a bit in
> depth.
>
> 1. To be frank, I have seen no _real_ use cases for SIDs in the DC so far
> (once you run RIFT). The only one that comes up regularly is egress
> engineering, and that IMO is equivalent to SID=leaf address (which could
> of course be an HV address once you have RIFT all the way down to the
> server), so really, what's the point of having a SID? It's probably much
> smarter to use IBGP & so on in the overlay to do this kind of
> synchronization if needed, since labels/SIDs become very useful in the
> overlay to distinguish lots of stuff there like VPNs/services, which you'd
> carry e.g. in MPLSoUDP. In the underlay just use the destination v4/v6
> address. Having said that, there's always a discussion to be had if you
> buy me dinner ;--) and I know _how_ we can do SIDs in RIFT since I thought
> it through, but again, no _real_ use case so far. And if your only concern
> is to "shape towards a prefix", we have PGP in the draft, which doesn't
> need new silicon ;-P And then ultimately, yes, if you really, really want
> a SID per prefix everywhere, then you'll carry SIDs everywhere, since
> unicast SIDs are really just a glorified way to say "I have this
> non-aggregatable 20-bit IP host address", which architecturally is a very
> interesting proposition in terms of scaling (but then again, there's no
> accounting for taste and RFC1925 clause 3 applies) ... Your LSDB will
> still be much smaller and your SPF will still be simple on the leaf in
> RIFT, but your FIB will blow up and anything changing on a leaf shakes all
> other leafs (unless you start to run policies to control distribution, @
> which point in time you start to baby-sit your fabric @ high OPEX). One of
> the reasons to do per-prefix SIDs would be non-ECMP anycast (where SIDs
> _are_ in fact useful), but if you read the RIFT draft carefully you will
> observe that RIFT can do anycast without the need for ECMP, i.e. true
> anycast in a sense, and with that, an anycast SID serves no real purpose
> in RIFT and is actually generally much harder to do, since you need
> globally unique label blocks and so on ...
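A minimal sketch of the FIB arithmetic behind "your FIB will blow up". It assumes a hypothetical fabric of 64 leaves that each own one /24, a simplified view in which a RIFT leaf holds only its own prefix plus a northbound default, and an arbitrary SID base of 16000; none of these values come from the draft. It also shows why exact-match 20-bit labels do not summarize the way IP prefixes do.

```python
import ipaddress

# Hypothetical fabric: 64 leaves, each owning one /24 carved from 10.0.0.0/16.
NUM_LEAVES = 64
LEAF_PREFIXES = [ipaddress.ip_network(f"10.0.{i}.0/24") for i in range(NUM_LEAVES)]
DEFAULT = ipaddress.ip_network("0.0.0.0/0")

# Simplified view of RIFT southbound routing (an assumption for this sketch):
# a leaf keeps its own prefix plus a default route pointing north,
# regardless of fabric size.
def leaf_fib_default(leaf: int) -> set:
    return {LEAF_PREFIXES[leaf], DEFAULT}

# Hypothetical per-prefix SIDs. MPLS labels are exact-match 20-bit values, so
# even consecutive label numbers cannot be summarized the way IP prefixes can:
# each leaf needs one label entry per remote leaf prefix.
SID_BASE = 16000  # arbitrary label base chosen for illustration
SIDS = {prefix: SID_BASE + i for i, prefix in enumerate(LEAF_PREFIXES)}

def leaf_fib_per_prefix_sid(leaf: int) -> set:
    own = LEAF_PREFIXES[leaf]
    return {own, DEFAULT} | {SIDS[p] for p in LEAF_PREFIXES if p != own}

# Number of other leaves whose FIB must change when one leaf's prefix/SID changes
# (0 in the default-only model, since other leaves never see the prefix).
def leaves_disturbed(per_prefix_sid: bool) -> int:
    return NUM_LEAVES - 1 if per_prefix_sid else 0

if __name__ == "__main__":
    # The same leaf prefixes do summarize as IP routes, e.g. at a spine.
    print(list(ipaddress.collapse_addresses(LEAF_PREFIXES)))
    print(len(leaf_fib_default(0)), len(leaf_fib_per_prefix_sid(0)))
    print(2 ** 20)  # size of the 20-bit MPLS label space
```

In this toy model a leaf's FIB stays at 2 entries with default routing but grows to 65 with per-prefix SIDs, and it grows linearly with every leaf added, while the 64 IP prefixes themselves collapse into a single 10.0.0.0/18.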
>
> 2. In everything I've seen, horizontal links in CLOS fabrics are normally
> not used that way, since your blocking goes to hell unless you provision
> some kind of really massive parallel links between ToRs _and_ understand
> your load. We _could_ build RIFT that way, but you give up balancing
> through the fabric and, in a sense, the loop-free property (that's a
> longish discussion) as well as scaling, since now you have prefixes
> showing up in all kinds of crazy places instead of a default. If I see
> enough demand, we'll get there ... Otherwise RFC1925 clauses 10 and 5.
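To make the "understand your load" point concrete, here is back-of-the-envelope arithmetic with made-up numbers (200 Gb/s of ToR-to-ToR traffic, 100G links, 8 spine uplinks, a 50% utilization target). It assumes ideal, even spreading over parallel links, which real ECMP hashing only approximates.

```python
import math

def link_utilization(demand_gbps: float, links: int, link_gbps: float) -> float:
    """Per-link utilization, assuming demand spreads evenly over parallel links."""
    return demand_gbps / (links * link_gbps)

def links_needed(demand_gbps: float, link_gbps: float, target_util: float) -> int:
    """Smallest number of parallel links that keeps utilization at or below target."""
    return math.ceil(demand_gbps / (link_gbps * target_util))

# Hypothetical scenario: 200 Gb/s of storage mirroring between two ToRs.
DEMAND_GBPS = 200.0
LINK_GBPS = 100.0

# Through the fabric: ECMP over 8 uplinks toward the spines.
via_fabric = link_utilization(DEMAND_GBPS, links=8, link_gbps=LINK_GBPS)
# Over one horizontal ToR-to-ToR link.
via_horizontal = link_utilization(DEMAND_GBPS, links=1, link_gbps=LINK_GBPS)
# Parallel horizontal links needed to stay at or below 50% utilization.
horizontal_links = links_needed(DEMAND_GBPS, LINK_GBPS, target_util=0.5)

print(via_fabric, via_horizontal, horizontal_links)  # 0.25 2.0 4
```

The horizontal bundle only works if it is sized for a load known in advance: doubling the demand to 400 Gb/s doubles the horizontal links needed to 8, while the fabric uplinks in this idealized model simply go to 50% utilization.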
>
> 3. PS1: Yes, lots of things "could" be done and then we "could" build a
> protocol to do that, and RFC1925 clauses 7 and 8 apply. The experience is
> that such horizontal links, unless provisioned correctly, will pretty much
> just ruin blocking/loss on the fabric (which the math supports). In a
> sense, if you know your big flows you can build a specialized topology to
> do the optimal distribution (MPLS tunnels anyone ;-) but the point of a
> fabric is that it's a fabric (i.e. load agnostic, cheap, no OPEX and
> easily scalable). Otherwise a good analogy would be building special RAM
> chips for the type of data structures you are storing, and we know how
> well that scales over time. We know now that within 3-4 years the
> characteristics of DC flows flip upside down without a sweat when people
> go from server/client to microservices, from servers to containers and so
> on and so on. So if you can't predict your load all the time, you need a
> _regular_ topology, where _regular_ is more of a mathematical than a
> protocol discussion. The fabric analogy of "buy more RAM chips at Fry's
> and just stick them in" applies here. So RIFT is done largely to serve a
> well-known structure called a "lattice" (with some restrictions), since we
> need an "up" and a "down". Things like hypercubes, toroidal meshes and so
> on and so on exist, but CLOS won for a very good reason in history for
> that kind of problem (once you move to NUMA other things win ;-) And if
> you know your loads, can heft the OPEX, like to play with protocols
> generally and can support the scale in terms of leaf FIB sizes, flooding,
> slower convergence & so on & so on, then you run a flat IGP on whatever
> you build, and it doesn't even have to be regular in any sense. We
> obviously spent many years solving THAT problem, and doing something like
> RIFT to replace a normal IGP is of limited interest IMO (albeit certain
> aspects having to do with modern implementation techniques may get us
> there one day, but it's a much less pressing problem than solving
> specialized DC routing well, IMO again).
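The "up" and "down" of the lattice can be illustrated with a small path check. This is a hypothetical sketch, not RIFT's forwarding logic: it assigns each node an assumed height (leaf 0, spine 1, top of fabric 2) and accepts only paths that climb and then descend, rejecting horizontal hops and "valleys". Because such a path has at most one climb and one descent, its length is bounded, which is the intuition behind the loop-free property mentioned in point 2.

```python
from typing import Dict, List

# Hypothetical 3-level fabric; "height" grows northbound (leaf=0, spine=1, top=2).
# This numbering is an assumption for the sketch, not the draft's encoding.
HEIGHT: Dict[str, int] = {
    "leaf1": 0, "leaf2": 0, "leaf3": 0, "leaf4": 0,
    "spine1": 1, "spine2": 1,
    "tof1": 2,
}

def is_up_then_down(path: List[str]) -> bool:
    """True if the path only climbs northbound, then only descends southbound.

    Heights strictly increase during the climb and strictly decrease during
    the descent, so a valid path has bounded length. A hop between two nodes
    at the same height (a horizontal link) is rejected here.
    """
    going_down = False
    for a, b in zip(path, path[1:]):
        step = HEIGHT[b] - HEIGHT[a]
        if step == 0:
            return False      # horizontal hop: outside the up/down model
        if step > 0 and going_down:
            return False      # climbing again after descending: a "valley"
        if step < 0:
            going_down = True
    return True
```

For example, leaf1, spine1, tof1, spine2, leaf3 passes, while leaf1, spine1, leaf2, spine2, leaf3 fails because it climbs again after descending.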
>
> 3. PS2: RIFT cannot build an "unsupported topology" no matter how you
> cable it (that's the point of it), or rather, if you read the draft
> carefully, we have miscabling detection and do not form adjacencies.
> That's your "flash red light" and it comes included for free with my
> compliments ;-) ... Otherwise RFC1925 clause 10.
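A hypothetical sketch of the kind of check that turns miscabling into a refused adjacency. It assumes, purely for illustration, that each neighbor advertises a system ID and a level (height, leaf = 0) and that adjacencies form only between nodes exactly one level apart; the draft's actual rules are more involved and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Hello:
    """Hypothetical subset of what a neighbor advertises on a link."""
    system_id: int
    height: int  # assumed northbound height; leaf = 0

def adjacency_direction(local: Hello, remote: Hello) -> Optional[str]:
    """Return 'north'/'south' if an adjacency may form, else None (miscabled).

    Simplified assumption for this sketch: adjacencies only form between
    nodes exactly one level apart.
    """
    if remote.system_id == local.system_id:
        return None  # cable looped back to the same node
    diff = remote.height - local.height
    if diff == 1:
        return "north"
    if diff == -1:
        return "south"
    return None  # same level or skipped level: refuse, and flag it to the NOC
```

A leaf cabled directly to a top-of-fabric node, or to another leaf, gets None in this model, which is where an implementation would raise the "red light" instead of bringing the adjacency up.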
>
> Otherwise, if you have concrete charter points you'd like to add, be more
> specific in your asks and we'll see what the list thinks after ...
>
> thanks
>
> --- tony
>
>
> On Thu, Jan 11, 2018 at 1:30 AM, Robert Raszuk <robert@raszuk.net> wrote:
>
>> Hi,
>>
>> I have one little question/doubt on the scalability of RIFT ...
>>
>> Assume that someone would like to signal an IPv6 prefix SID for Segment
>> Routing in the underlay within RIFT.
>>
>> Wouldn't it result in an amount of protocol state fully analogous to
>> massive deaggregation, which as of today is designed to be a very careful
>> and limited operation, only at moments of failure(s)?
>>
>> I sort of find it a bit surprising that the RIFT draft does not provide
>> an encoding for SID distribution when it is positioned as an alternative
>> to other protocols (IGPs or BGP) which already provide the ability to
>> carry all types of SIDs.
>>
>> Cheers,
>> Robert.
>>
>> PS1: The horizontal links that were discussed could be installed to
>> offload massive amounts of data (ex: storage mirroring) from fabric
>> transit directly between leafs or L3 TORs, and not be treated as "backup".
>>
>> PS2: Restricting any protocol to specific topologies seems like a pretty
>> slippery slope to me. In any case, if a protocol does that, it should also
>> contain a self-detection mechanism for "unsupported topology" and flash a
>> red light in any NOC.
>>
>