Re: [arch-d] Centralization or diversity

Toerless Eckert <tte@cs.fau.de> Tue, 07 January 2020 16:53 UTC

Date: Tue, 7 Jan 2020 17:53:01 +0100
From: Toerless Eckert <tte@cs.fau.de>
To: Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com>
Cc: Andrew Campling <andrew.campling@419.consulting>, "architecture-discuss@ietf.org" <architecture-discuss@ietf.org>
Message-ID: <20200107165301.GP8801@faui48f.informatik.uni-erlangen.de>
References: <LO2P265MB0573A1353911BFDD554DE5C8C2760@LO2P265MB0573.GBRP265.PROD.OUTLOOK.COM> <CAKKJt-dtX4kceJqY4kaB-vg4rs0uEn01SyyzR4m+UAO_0bZ=7g@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAKKJt-dtX4kceJqY4kaB-vg4rs0uEn01SyyzR4m+UAO_0bZ=7g@mail.gmail.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Archived-At: <https://mailarchive.ietf.org/arch/msg/architecture-discuss/rUElxN5EwTYaiNEAc0jj_DpKLlw>
Subject: Re: [arch-d] Centralization or diversity

On Tue, Jan 07, 2020 at 10:22:56AM -0600, Spencer Dawkins at IETF wrote:
> I see that this topic is fragmenting into different threads, which is fine,
> so I'm not sure where I should insert this comment, but when we're talking
> about "centralized" and "decentralized", it's worth noting that
> "decentralized" fault tolerance can trip over implementation errors that
> are common across many devices owned or operated by a centralized entity.
> One of the largest SS7 outages in the US happened in 1990, when a one-line
> error that was present in a large number of switches caused >50 percent
> call failures across the entire ATT network.

Do I hear a slight criticism/concern about most networks' current SDN ==
centralized intelligence & failure dependency? Or is it just me?

> So, implementation diversity matters, not just decentralization.

I think I had mentioned this already at the very beginning of the
discussion.

Btw: The historic, often-cited cause for Internet service outages spanning
as much as half of Germany was broken RADIUS/accounting servers.
Of course, when there was volume accounting you could not make money
when those servers were broken, but even when the accounting models
changed, these issues/dependencies persisted, if I remember correctly.

In other words: design-wise, you need to make sure you have distributed
resilience all the way up to, let's say, the "cash registers", so that
services that still work under failure are also still accounted and
charged for.
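
One possible reading of the "cash register" point, as a minimal sketch
only: the edge node spools usage records locally while the central
accounting server is unreachable, keeps serving, and drains the spool
later. All names here (edge_node, account_usage, flush_spool,
SPOOL_CAPACITY) are hypothetical, and the server is a stand-in boolean,
not a real RADIUS client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SPOOL_CAPACITY 4 /* hypothetical; a real node would size and persist this */

struct usage_record { int subscriber_id; long bytes; };

struct edge_node {
    struct usage_record spool[SPOOL_CAPACITY];
    size_t spooled;   /* records waiting for the central server */
    size_t delivered; /* records the central server has accepted */
};

/* Stand-in for the central accounting server (assumption, not a real API). */
static bool central_accept(bool server_up) { return server_up; }

/* Account for usage. Returns true if service may proceed. */
bool account_usage(struct edge_node *n, struct usage_record rec, bool server_up)
{
    assert(n != NULL);
    if (central_accept(server_up)) {
        n->delivered++;
        return true;
    }
    if (n->spooled < SPOOL_CAPACITY) {
        n->spool[n->spooled++] = rec; /* serve now, charge later */
        return true;
    }
    return false; /* spool full: only now does accounting block service */
}

/* Drain spooled records while the server accepts them; returns the count. */
size_t flush_spool(struct edge_node *n, bool server_up)
{
    size_t sent = 0;
    assert(n != NULL);
    while (n->spooled > 0 && central_accept(server_up)) {
        n->spooled--;
        n->delivered++;
        sent++;
    }
    return sent;
}
```

The design choice in this sketch is that a central accounting failure
degrades to delayed charging instead of denied service, until the local
spool runs out.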

Aka: distributed "routing" and other lower-layer resilience
is not sufficient in a commercial environment. And I cannot remember
that DoD/ARPANET ever tried to fund research into these commercial
issues.

> There's a nice write-up at
> http://users.csc.calpoly.edu/~jdalbey/SWE/Papers/att_collapse.html, that
> matches my recollection of reports at the time.
> 
> For me, the money quote was
> 
> When the destination switch received the second of the two closely timed
> messages while it was still busy with the first (buffer not empty, line 7),
> the program should have dropped out of the if clause (line 7), processed
> the incoming message, and set up the pointers to the database (line 11).
> Instead, because of the break statement in the else clause (line 10), the
> program dropped out of the case statement entirely and began doing optional
> parameter work which overwrote the data (line 13). Error correction
> software detected the overwrite and shut the switch down while it could
> reset. Because every switch contained the same software, the resets
> cascaded down the network, incapacitating the system.
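
For readers without the write-up at hand, the control flow described in
the quote can be sketched roughly as follows. This is an illustration
under assumptions, not the actual switch software: every name
(handle_buggy, handle_fixed, struct result, the flags) is made up, and
the "overwrite" is modelled as a boolean. The point is only that in C a
break inside an if/else nested in a case leaves the whole switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum msg_kind { INCOMING_MESSAGE, OTHER_MESSAGE };

struct result {
    bool status_sent;      /* "in service" reported to the status map */
    bool pointers_set;     /* message processed, database pointers set up */
    bool data_overwritten; /* optional parameter work ran without pointers */
};

static void optional_parameter_work(struct result *r)
{
    assert(r != NULL);
    if (!r->pointers_set)
        r->data_overwritten = true; /* modelled damage from stale pointers */
}

struct result handle_buggy(enum msg_kind kind, bool sender_out_of_service,
                           bool write_buffer_empty)
{
    struct result r = { false, false, false };
    switch (kind) {
    case INCOMING_MESSAGE:
        if (sender_out_of_service) {
            if (write_buffer_empty)
                r.status_sent = true;
            else
                break; /* BUG: exits the switch, not just the if/else */
        }
        r.pointers_set = true; /* skipped when the break above fires */
        break;
    default:
        return r;
    }
    optional_parameter_work(&r);
    return r;
}

struct result handle_fixed(enum msg_kind kind, bool sender_out_of_service,
                           bool write_buffer_empty)
{
    struct result r = { false, false, false };
    switch (kind) {
    case INCOMING_MESSAGE:
        if (sender_out_of_service && write_buffer_empty)
            r.status_sent = true;
        /* no break here: fall out of the if and keep processing */
        r.pointers_set = true;
        break;
    default:
        return r;
    }
    optional_parameter_work(&r);
    return r;
}
```

With the sender out of service and the write buffer not empty,
handle_buggy reaches the optional parameter work without pointers set,
while handle_fixed processes the message first.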

Yes. I think the low-level analysis of newer incidents of this type
cannot be explained that easily anymore. A few years back, there was a big
breakdown at Google caused by too many conflicting automation systems, if I
remember correctly. Try to explain in detail how that worked...

Cheers
    Toerless

> Best,
> 
> Spencer
> 
> On Wed, Nov 13, 2019 at 12:57 AM Andrew Campling
> <andrew.campling@419.consulting> wrote:
> 
> > "Martin Thomson" <mt@lowentropy.net> wrote on Tue,
> > 05 November 2019 22:58:
> >
> > The draft specifically calls out the notion of a single point of failure
> > being a problem.  But my experience with centralized services is that they
> > aren't centralized in the fault tolerance sense.  If I look at the big
> > services, that scale is only achieved with careful distributed systems
> > design.  Name any modern service of even modest scale and you generally
> > find excellent fault tolerance.
> >
> > I thought that the document made it quite clear that it wasn't
> > specifically referring to a single point of failure in a technical, fault
> > tolerance sense.  In fact it made this clear by, for example, also
> > highlighting issues such as "administrative or governance system can become
> > weak through too much power or imagined power concentrated in one place".
> >
> > Finally, I don't like the emphasis on DNS in this document.  It only
> > serves to sensationalize.
> >
> > I thought that the reference to DNS was particularly helpful given one of
> > the potential side-effects of the push behind DoH could be to centralise
> > what is currently a highly decentralised system.  I agree with the comment
> > in section 4 that "where such centralised points are created, they will
> > eventually fail, or they will be misused through surveillance or legal
> > actions regardless of the best efforts of the Internet community.  The best
> > defense to data leak is to avoid creating that data store to begin with".
> >
> > In addition, noting the references to RFC 1958 and RFC3935, I believe that
> > it would be prudent for RFC8484 to be reviewed accordingly.
> >
> >
> > *Andrew*
> >
> >
> > _______________________________________________
> > Architecture-discuss mailing list
> > Architecture-discuss@ietf.org
> > https://www.ietf.org/mailman/listinfo/architecture-discuss
> >



-- 
---
tte@cs.fau.de