Re: [Dcrouting] draft-przygienda-rift-03

Tony Przygienda <tonysietf@gmail.com> Thu, 11 January 2018 17:40 UTC

In-Reply-To: <CA+b+ERnOc7V7+OL2wsfZsRsdSpjeSQmQQdH7SX_WLbySaVtxKw@mail.gmail.com>
References: <CA+b+ERnOc7V7+OL2wsfZsRsdSpjeSQmQQdH7SX_WLbySaVtxKw@mail.gmail.com>
From: Tony Przygienda <tonysietf@gmail.com>
Date: Thu, 11 Jan 2018 09:40:04 -0800
Message-ID: <CA+wi2hNbhXuXLKPD_0FL2csv1o9d37hF0XFex632z1skXUji+w@mail.gmail.com>
To: Robert Raszuk <robert@raszuk.net>
Cc: rift@ietf.org, spring@ietf.org, dcrouting@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/dcrouting/70GWB8ElrXRVB_c-g7auty-kASk>
Subject: Re: [Dcrouting] draft-przygienda-rift-03
List-Id: "Routing in the Data Center: discussions about problems, requirements and potential solutions." <dcrouting.ietf.org>

Robert, productive points, thanks for raising them ... I'll go a bit in depth

1. To be frank, I have seen no _real_ use cases for SIDs in the DC so far
(once you run RIFT). The only one that comes up regularly is egress
engineering, and that IMO is equivalent to SID=leaf address (which could be
a HV address of course once you have RIFT all the way down to the server),
so really, what's the point of having a SID? It's probably much smarter to
use IBGP & so on in the overlay to do this kind of synchronization if
needed, since labels/SIDs become very useful in the overlay to distinguish
lots of stuff there like VPNs/services which you'd carry e.g. in MPLSoUDP.
In the underlay just use the destination v4/v6 address. Having said that,
discussion always to be had if you pay me dinner
;--) and I know _how_ we can do SIDs in RIFT since I thought it through but
again, no _real_ use case so far. And if your only concern is to "shape
towards a prefix" we have PGP in the draft which doesn't need new silicon
;-P And then ultimately, yes, if you really, really want a SID per prefix
everywhere then you'll carry SIDs everywhere, since unicast SIDs are
really just a glorified way to say "I have this non-aggregatable 20-bit IP
host address", which architecturally is a very interesting proposition in
terms of scaling (but then again, there's no accounting for taste and
RFC1925 clause 3 applies) ... Your LSDB will still be much smaller and your
SPF will still be simple on the leaf in RIFT, but your FIB will blow up and
anything changing on a leaf shakes all other leaves (unless you start to
run policies to control distribution, at which point you start to baby-sit
your fabric at high OPEX). One of the reasons to do per-prefix SIDs would
be non-ECMP anycast (where SIDs _are_ in fact useful), but if you read the
RIFT draft carefully you will observe that RIFT can do anycast without the
need for ECMP, i.e. true anycast in a sense. With that, having an anycast
SID serves no real purpose in RIFT and is actually generally much harder to
do since you need globally unique label blocks and so on ...
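
To put rough numbers on the FIB argument in point 1, here is a minimal
sketch in Python. The fabric dimensions (1000 leaves, 48 prefixes per leaf)
and the helper name leaf_fib_entries are made up for illustration, and the
model assumes a leaf otherwise needs only its local prefixes plus a default
route; none of this comes from the draft.

```python
# Illustrative arithmetic only: the fabric dimensions and the helper below
# are assumptions for this sketch, not values or rules from the RIFT draft.

MPLS_LABEL_BITS = 20  # an MPLS label is 20 bits wide


def leaf_fib_entries(num_leaves: int, prefixes_per_leaf: int,
                     per_prefix_sid: bool) -> int:
    """Rough count of forwarding entries a single leaf has to hold."""
    local = prefixes_per_leaf  # prefixes attached to this leaf
    if per_prefix_sid:
        # Non-aggregatable SIDs: every leaf learns every other leaf's entries.
        return local + (num_leaves - 1) * prefixes_per_leaf
    # Assumed baseline: local prefixes plus one default route northbound.
    return local + 1


if __name__ == "__main__":
    print(leaf_fib_entries(1000, 48, per_prefix_sid=False))  # 49
    print(leaf_fib_entries(1000, 48, per_prefix_sid=True))   # 48000
    print(2 ** MPLS_LABEL_BITS)                               # 1048576
```

Under these assumptions the per-leaf state grows with the size of the whole
fabric instead of staying constant, and any change on one leaf touches an
entry on every other leaf.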

2. Horizontal links on CLOSes are not normally used that way from all I
have seen, since your blocking goes to hell unless you provision some kind
of really massive parallel links between ToRs _and_ understand your load.
We _could_ build RIFT that way but you give up balancing through the fabric
and the loop-free property in a sense (that's a longish discussion), and
scaling as well, since now you have prefixes showing up in all kinds of
crazy places instead of the default. If I see enough demand, we'll get
there ... Otherwise RFC1925 clauses 10 and 5.

3. PS1: Yes, lots of things "could" be done and then we "could" build a
protocol to do that, and RFC1925 clauses 7 and 8 apply. The experience
(which the math supports) is that such horizontal links, unless provisioned
correctly, will pretty much just ruin your blocking/loss on the fabric. In
a sense if you know your big flows you can build a specialized topology to
do the optimal distribution (MPLS tunnels anyone ;-) but the point of
fabric is that it's a fabric (i.e. load agnostic, cheap, no OPEX and easily
scalable). Otherwise a good analogy would be that you like to build special
RAM chips for the type of data structures you are storing and we know how
well that scales over time. We know now that within 3-4 years
characteristics of DC flows flip upside down without a sweat when people go
from server/client to microservices, from servers to containers and so on
and so on. So if you can't predict your load all the time you need a
_regular_ topology where _regular_ is more of a mathematical than a
protocol discussion. Fabric analogy of "buy more RAM chips in Fry's and
just stick them in" applies here. So RIFT is done largely to serve a
well-known structure called a "lattice" (with some restrictions) since we
need an "up" and "down". Things like hypercubes, toroidal meshes and so on
and so on exist but CLOS won for a very good reason in history for that
kind of problem (once you move to NUMA other things win ;-) And if you
know your loads and you can heft the OPEX and you like to play with
protocols generally and you can support the scale in terms of leaf FIB
sizes, flooding, slower convergence & so on & so on, then you run a flat
IGP on some kind of stuff that you build that doesn't even have to be
regular in any sense. We spent many years solving THAT problem obviously,
and doing something like RIFT to replace a normal IGP is of limited
interest IMO (albeit certain aspects having to do with modern
implementation techniques may get us there one day, but it's much less of a
pressing problem than solving specialized DC routing well, IMO again).

4. PS2: RIFT cannot build an "unsupported topology" no matter how you cable
(that's the point of it), or rather we have miscabling detection and do not
form adjacencies, as you will see when you read the draft carefully. That's
your "flash red light" and it comes included for free with my compliments
;-) ... Otherwise RFC1925 clause 10.
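
As a rough illustration of the "no adjacency on miscabling" idea, here is a
minimal sketch in Python. The rule used (accept a link only when the two
levels differ by at most one) is a simplifying assumption for this example,
not the adjacency rules as the draft specifies them; Node,
may_form_adjacency and miscabled_links are hypothetical names.

```python
# Simplified sketch: the level rule below is an assumption made for
# illustration, not the adjacency rules specified in the RIFT draft.
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    level: int  # 0 = leaf, higher values are closer to the top of the fabric


def may_form_adjacency(a: Node, b: Node) -> bool:
    """Accept a link only if the levels differ by at most one (assumed rule)."""
    return abs(a.level - b.level) <= 1


def miscabled_links(links: Iterable[Tuple[Node, Node]]) -> List[Tuple[str, str]]:
    """Return the links on which no adjacency would come up."""
    return [(a.name, b.name) for a, b in links if not may_form_adjacency(a, b)]


if __name__ == "__main__":
    leaf1, spine1, super1 = Node("leaf1", 0), Node("spine1", 1), Node("super1", 2)
    cabling = [(leaf1, spine1), (spine1, super1), (leaf1, super1)]
    # The leaf-to-superspine cable skips a level, so it is reported.
    print(miscabled_links(cabling))  # [('leaf1', 'super1')]
```

A rejected link simply never carries routing state, so the rest of the
fabric keeps its regular shape and the list of rejected links is what an
operator would alarm on.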

Otherwise, if you have concrete charter points you'd like to add, please be
more specific in your asks and we'll see what the list thinks afterwards ...

thanks

--- tony


On Thu, Jan 11, 2018 at 1:30 AM, Robert Raszuk <robert@raszuk.net> wrote:

> Hi,
>
> I have one little question/doubt on scalability point of RIFT ...
>
> Assume that someone would like to signal IPv6 prefix SID for Segment
> Routing in the underlay within RIFT.
>
> Wouldn't it result in amount of protocol state in full analogy to massive
> deaggregation - which as of today is designed to be very careful and
> limited operation only at moments of failure(s) ?
>
> I sort of find it a bit surprising that RIFT draft does not provide
> encoding for SID distribution when it is positioned as an alternative to
> other protocols (IGPs or BGP) which already provide ability to carry all
> types of SIDs.
>
> Cheers,
> Robert.
>
> PS1: Horizontal links which were discussed could be installed to offload
> from fabric transit massive amount of data (ex: storage mirroring) directly
> between leafs or L3 TORs and not to be treated as "backup".
>
> PS2: Restricting any protocol to specific topologies seems like pretty
> slippery slope to me. In any case if protocol does that it should also
> contain self detection mechanism of "unsupported topology" and flash red
> light in any NOC.