[DNSOP] Lameness, registries, and enforcement was Re: [Ext] Lameness terminology (was: Status of draft-ietf-dnsop-terminology-bis)

Edward Lewis <edward.lewis@icann.org> Fri, 04 May 2018 12:37 UTC

From: Edward Lewis <edward.lewis@icann.org>
To: Mark Andrews <marka@isc.org>, Bill Woodcock <woody@pch.net>
CC: David Huberman <david.huberman@icann.org>, Shane Kerr <shane@time-travellers.org>, "dnsop@ietf.org" <dnsop@ietf.org>
Date: Fri, 04 May 2018 12:37:06 +0000
Message-ID: <53696FE1-9E92-4CC8-8A54-5AF5F4251590@icann.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/Iz4G-b3gRIVSgSaGFjU9bWsERpU>

This isn't about terminology but about the recurring debate over a registry's responsibility here.

It's simple to state a policy that says:

If a registered NS record does not function properly, the registrant will be notified and the NS record will be removed from the DNS until such time that it functions properly.

Nice, simple, clean.  Sounds like something a responsible registry would do.  But it sits on top of an iceberg of issues.

Issue 1: define "function properly".  That can be done.  Lame, non-responsive, and so on.  But as I said privately to the original poster, the "science" of bad responses is vastly different from the "science" of no response.  (I recall from my experimentation that for some addresses, I could repeat the question over 10 times [some seconds apart], maybe 13, and still get back a "first" response from the address.  I used the id field to tell the queries apart.  To this day, I am astonished by that.)
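
To illustrate the id-field trick mentioned above, here is a minimal sketch in Python.  It is not the tooling I used; the function names (build_query, message_id, match_attempt) are made up for this example, and it only builds and matches packets, leaving the sending and timing to the reader.  Each repeated query gets its own ID, so a late answer can be tied back to the attempt it belongs to.

```python
import struct

def build_query(name: str, query_id: int, qtype: int = 2) -> bytes:
    """Build a minimal DNS query (QTYPE 2 = NS by default, class IN)."""
    # Header: ID, flags (0 = standard query, RD clear), QDCOUNT=1, rest 0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by the zero-length root label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".") if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def message_id(packet: bytes) -> int:
    """Return the 16-bit ID from the first two octets of a DNS message."""
    return struct.unpack("!H", packet[:2])[0]

def match_attempt(outstanding: dict, packet: bytes):
    """Return the attempt number this packet answers, or None if unknown.

    `outstanding` maps query ID -> attempt number (a hypothetical
    bookkeeping structure for this sketch).
    """
    return outstanding.pop(message_id(packet), None)
```

With attempts 1..13 sent under 13 different IDs, a response whose ID maps to attempt 1 is the "first" response, no matter how many retries went out before it arrived.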

Issue 2: how is the registrant notified, and what constitutes "success" in notifying the registrant?  Is an email to the NOC contact enough?  A robo-call?  What if the contact information is inaccurate?  These questions need answers in order to tell whether the registry is properly implementing its own policy.

Issue 3: determining the state of the service.  This is tougher than it seems.  Multiple vantage points, sampling over time, setting a threshold for how many failed responses per time quantum constitute failure, yadda, yadda, yadda.  Keep in mind, the NS record may be part of an anycast cloud and, if the registry is hitting one instance, that one might be affected by a spurious traffic flood.
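
A rough sketch of what such a decision rule might look like, in Python.  Everything here is an assumption made for illustration: the Probe record, the 300-second quantum, the 0.8 failure ratio, the three bad quanta, and the quorum of two vantage points are placeholders, not recommended values.  The point is only that one vantage point having a bad hour should not be enough to declare a server dead.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set

class Probe(NamedTuple):
    vantage: str      # hypothetical label for where the probe was sent from
    timestamp: float  # seconds since some epoch
    answered: bool    # True if a usable response came back

def failing_vantages(probes: Iterable[Probe], quantum: float = 300.0,
                     threshold: float = 0.8,
                     min_bad_quanta: int = 3) -> Set[str]:
    """Vantage points that saw at least `min_bad_quanta` bad time quanta.

    A quantum is "bad" when its failure ratio is >= `threshold`.
    """
    counts = defaultdict(lambda: [0, 0])  # (vantage, bucket) -> [failed, total]
    for p in probes:
        cell = counts[(p.vantage, int(p.timestamp // quantum))]
        cell[0] += 0 if p.answered else 1
        cell[1] += 1
    bad = defaultdict(int)
    for (vantage, _bucket), (failed, total) in counts.items():
        if failed / total >= threshold:
            bad[vantage] += 1
    return {v for v, n in bad.items() if n >= min_bad_quanta}

def looks_down(probes: Iterable[Probe], quorum: int = 2, **kwargs) -> bool:
    """Declare failure only if at least `quorum` vantage points agree."""
    return len(failing_vantages(probes, **kwargs)) >= quorum
```

Even this toy has five tunable knobs, and each one is a place where a false positive (or a scripting error) can creep in.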

My concern is the liability for false positives in failure testing.  I've been at the wrong end of such a test, where the registry had failures on their end and pointed the finger at us.  (IPv6 was the subject of the test.)  Even if the customer impact of that was low, we spent a lot of resources poring over logs, contacting service providers, and tracing the routes, only to find the error was a scripting error by the registry.  I traced that down by meeting the tester -in person- and going over the test results.

Issue 4: If the registry pulls the NS record, the operator can't test their changes until the registry re-tests.  This makes operating the registration harder: the tech doing the work has to either engage the registry tech support "live" (language barrier included) or suspend completing the ticket until the registry gets around to the next test.

Issue 5: Even if the registry pulls the offending NS record, it might still be in the authoritative set, meaning caches will still have it present.  I.e., pulling the NS record at the parent is trumped by the child.  (This assumes some other NS is working, making the authoritative set visible.)
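
A toy model of that effect, in Python.  It assumes a cache that prefers the child's authoritative NS set over the parent's referral once it can reach the child; real resolvers differ in how strictly they do this.  The function name and the server names in the usage note are hypothetical.

```python
def ns_seen_by_cache(parent_ns, child_ns, working):
    """NS names a child-preferring cache may end up holding.

    parent_ns: NS names in the parent's delegation (after any removal)
    child_ns:  NS names in the child's own authoritative NS set
    working:   NS names that currently answer queries
    """
    reachable = set(parent_ns) & set(working)
    if not reachable:
        # No delegated server answers, so the child's set is never seen.
        return set(parent_ns)
    # A working server hands out the child's NS set, which replaces the
    # referral data in the cache.
    return set(child_ns)
```

For example, if the registry has pulled ns2 so the parent lists only ns1, while the child still lists ns1 and ns2 and ns1 is working, the function returns both names: the pulled server is back in the cache.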

Philosophically, in DNS, once a delegation is made, it's the child's.  For better or worse, the protocol doesn't equip the registry to "coach" the child well.  Any work done towards that is "fighting entropy".  It can be done, but consumes energy (instead of producing it).