Re: [arch-d] Centralization or diversity

Spencer Dawkins at IETF <> Tue, 07 January 2020 16:23 UTC

References: <LO2P265MB0573A1353911BFDD554DE5C8C2760@LO2P265MB0573.GBRP265.PROD.OUTLOOK.COM>
In-Reply-To: <LO2P265MB0573A1353911BFDD554DE5C8C2760@LO2P265MB0573.GBRP265.PROD.OUTLOOK.COM>
From: Spencer Dawkins at IETF <>
Date: Tue, 7 Jan 2020 10:22:56 -0600
Message-ID: <>
To: Andrew Campling <>
Cc: "" <>
Content-Type: multipart/alternative; boundary="000000000000eb39d0059b8f314a"
Archived-At: <>
Subject: Re: [arch-d] Centralization or diversity

I see that this topic is fragmenting into different threads, which is fine,
so I'm not sure where I should insert this comment. But when we're talking
about "centralized" and "decentralized", it's worth noting that
"decentralized" fault tolerance can trip over implementation errors that
are common across many devices owned or operated by a centralized entity.
One of the largest SS7 outages in the US happened in 1990, when a one-line
error that was present in a large number of switches caused >50 percent
call failures across the entire AT&T network.

So, implementation diversity matters, not just decentralization.

There's a nice write-up at, that
matches my recollection of reports at the time.

For me, the money quote was

When the destination switch received the second of the two closely timed
messages while it was still busy with the first (buffer not empty, line 7),
the program should have dropped out of the if clause (line 7), processed
the incoming message, and set up the pointers to the database (line 11).
Instead, because of the break statement in the else clause (line 10), the
program dropped out of the case statement entirely and began doing optional
parameter work which overwrote the data (line 13). Error correction
software detected the overwrite and shut the switch down while it could
reset. Because every switch contained the same software, the resets
cascaded down the network, incapacitating the system.
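
For anyone who hasn't been bitten by this, here is a minimal C sketch of the
control-flow trap the quote describes. It is not the actual switch code: the
function names, the struct, and the flags are all hypothetical, and the line
numbers in the quote refer to the write-up's listing, not to this sketch. The
only thing it is meant to show is that a break inside an if/else that sits
inside a case leaves the whole switch, not just the if.

```c
#include <stdbool.h>

/* Hypothetical outcome record; field names are illustrative only. */
struct outcome {
    bool message_processed;  /* incoming message was handled */
    bool pointers_set_up;    /* database pointers were initialised */
    bool data_overwritten;   /* optional work ran without valid pointers */
};

enum message_kind { INCOMING_MESSAGE, OTHER_MESSAGE };

/* Stand-in for the work after the switch; only safe once pointers exist. */
static void optional_parameter_work(struct outcome *o)
{
    if (!o->pointers_set_up)
        o->data_overwritten = true;
}

/* Buggy shape: the break in the else clause exits the switch. */
struct outcome handle_buggy(enum message_kind kind, bool buffer_empty)
{
    struct outcome o = { false, false, false };

    switch (kind) {
    case INCOMING_MESSAGE:
        if (buffer_empty) {
            /* idle path: nothing queued ahead of this message */
        } else {
            break;  /* meant to leave the if; in C it leaves the switch */
        }
        o.message_processed = true;
        o.pointers_set_up = true;
        break;
    default:
        return o;   /* other message kinds are out of scope for this sketch */
    }

    optional_parameter_work(&o);
    return o;
}

/* Fixed shape: no break in the else clause, so processing always happens. */
struct outcome handle_fixed(enum message_kind kind, bool buffer_empty)
{
    struct outcome o = { false, false, false };

    switch (kind) {
    case INCOMING_MESSAGE:
        if (buffer_empty) {
            /* idle path: nothing queued ahead of this message */
        }
        o.message_processed = true;
        o.pointers_set_up = true;
        break;
    default:
        return o;
    }

    optional_parameter_work(&o);
    return o;
}
```

Calling handle_buggy(INCOMING_MESSAGE, false) models the "second message
arrives while the buffer is not empty" case: the message is never processed,
the pointers are never set up, and the optional work reports an overwrite.
handle_fixed with the same arguments processes the message normally.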



On Wed, Nov 13, 2019 at 12:57 AM Andrew Campling
<> wrote:

> "Martin Thomson" wrote on Tue, 05 November 2019 22:58:
>
> > The draft specifically calls out the notion of a single point of failure
> > being a problem.  But my experience with centralized services is that they
> > aren't centralized in the fault tolerance sense.  If I look at the big
> > services, that scale is only achieved with careful distributed systems
> > design.  Name any modern service of even modest scale and you generally
> > find excellent fault tolerance.
>
> I thought that the document made it quite clear that it wasn’t
> specifically referring to a single point of failure in a technical, fault
> tolerance sense.  In fact it made this clear by, for example, also
> highlighting issues such as “administrative or governance system can become
> weak through too much power or imagined power concentrated in one place”.
>
> > Finally, I don't like the emphasis on DNS in this document.  It only
> > serves to sensationalize.
>
> I thought that the reference to DNS was particularly helpful given one of
> the potential side-effects of the push behind DoH could be to centralise
> what is currently a highly decentralised system.  I agree with the comment
> in section 4 that “where such centralised points are created, they will
> eventually fail, or they will be misused through surveillance or legal
> actions regardless of the best efforts of the Internet community.  The best
> defense to data leak is to avoid creating that data store to begin with”.
>
> In addition, noting the references to RFC 1958 and RFC 3935, I believe that
> it would be prudent for RFC 8484 to be reviewed accordingly.
>
> Andrew