Re: [DNSOP] How Slack didn't turn on DNSSEC - is there an appetite for Clarifications RFC?

Petr Špaček <pspacek@isc.org> Fri, 03 December 2021 09:35 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/nhni7hIIH6laGmcEj7FRd22gE-o>

On 30. 11. 21 19:38, John Levine wrote:
> This blog post has been making the rounds. Since it is about a
> sequence of DNS operational failures, it seems somewhat relevant here.
> 
> https://slack.engineering/what-happened-during-slacks-dnssec-rollout/
> 
> tl;dr first try was rolled back due to what turned out to be an unrelated failure at some ISP
> 
> second try was rolled back when they found they had a CNAME at a zone
> apex, which they had never noticed until it caused DNSSEC validation
> errors.
> 
> third try was rolled back when they found random-looking failures that
> they eventually tracked down to bugs in Amazon's Route 53 DNS server.
> They had a wildcard with A but not AAAA records. When someone did an
> AAAA query, the response was wrong and said there were no records at
> all, not just no AAAA records. This caused failures at 8.8.8.8 clients
> since Google does aggressive NSEC, not at 1.1.1.1 because Cloudflare
> doesn't.
> 
> They also got some bad advice, e.g., yes the .COM zone adds and
> deletes records very quickly, but that doesn't mean you can unpublish
> a DS and just turn off DNSSEC because its TTL is a day. Their tooling
> somehow didn't let them republish the DNSKEY at the zone apex that
> matched the DS, only a new one that didn't.
> 
> It is clear from the blog post that this is a fairly sophisticated
> group of ops people, who had a reasonable test plan, a bunch of test
> points set up in dnsviz and so forth.  Neither of these bugs seem
> very exotic, and could have been caught by routine tests.
> 
> Can or should we offer advice on how to do this better, sort of like
> RFC 8901 but one level of DNS expertise down?

During DNS OARC 36, there was discussion about an opportunity to write 
a "Clarifications" RFC.

Personally, I think the current RFCs are clear about what NSEC(3) bitmaps 
should look like, but I can also see that making sense of the various 
types of "NSEC lies", "type shotguns", etc. and their consequences 
requires far more research than reading RFC 4034.
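
To make the bitmap point concrete, here is a minimal sketch of the Type 
Bit Maps encoding from RFC 4034, section 4.1.2, written in Python. The 
function names are hypothetical and the example is only illustrative: it 
does not reproduce the actual Route 53 responses, it just shows why a 
bitmap that omits A at a name that really has A records leads a validator 
doing aggressive NSEC to conclude that no A record exists.

```python
# Sketch of the NSEC Type Bit Maps field (RFC 4034, section 4.1.2).
# Function names are illustrative assumptions, not any library's API.

A, MX, AAAA, RRSIG, NSEC = 1, 15, 28, 46, 47  # IANA RR type numbers


def encode_type_bitmap(rr_types):
    """Encode RR type numbers as window blocks: window, length, bitmap."""
    windows = {}
    for rr_type in set(rr_types):
        window, offset = divmod(rr_type, 256)
        bitmap = windows.setdefault(window, bytearray(32))
        # Bit 0 is the most significant bit of the first octet.
        bitmap[offset // 8] |= 0x80 >> (offset % 8)
    out = bytearray()
    for window in sorted(windows):
        # Trailing zero octets are omitted from each window block.
        bitmap = bytes(windows[window]).rstrip(b"\x00")
        out += bytes([window, len(bitmap)]) + bitmap
    return bytes(out)


def type_present(bitmap, rr_type):
    """Return True if rr_type is set in an encoded Type Bit Maps field."""
    window, offset = divmod(rr_type, 256)
    i = 0
    while i + 2 <= len(bitmap):
        win, length = bitmap[i], bitmap[i + 1]
        block = bitmap[i + 2:i + 2 + length]
        if win == window:
            octet = offset // 8
            return octet < len(block) and bool(block[octet] & (0x80 >> (offset % 8)))
        i += 2 + length
    return False


# A wildcard that has A but no AAAA should list A (plus RRSIG and NSEC).
correct = encode_type_bitmap([A, RRSIG, NSEC])
# A bitmap that wrongly omits A claims there is no A record at that name.
wrong = encode_type_bitmap([RRSIG, NSEC])

print(correct.hex(), type_present(correct, A), type_present(correct, AAAA))
print(wrong.hex(), type_present(wrong, A))
```

A resolver that caches the second bitmap and applies aggressive NSEC can 
answer later A queries with NODATA without asking the authoritative 
server again, whereas a resolver that does not reuse NSEC records would 
simply re-query and get the A record.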

Are there people on this list interested in working on this?

I can contribute ideas and some proto-text, but as you can see for 
yourself, someone would need to translate my text into proper English, 
so I should not be the document editor.

-- 
Petr Špaček