[DNSOP] review of draft-ietf-dnsop-no-response-issue-05

Matthew Pounsett <matt@conundrum.com> Mon, 26 September 2016 22:12 UTC

From: Matthew Pounsett <matt@conundrum.com>
Date: Mon, 26 Sep 2016 18:12:35 -0400
Message-ID: <CAAiTEH9Rbw4u3gV9GULQ-8WdoPHf3SXQMTRY+CtfUGrNQSGAWw@mail.gmail.com>
To: dnsop <dnsop@ietf.org>
Content-Type: multipart/alternative; boundary=001a1146fefadf067e053d706c8c
Archived-At: <https://mailarchive.ietf.org/arch/msg/dnsop/z5OqfuJIgwssxsqCqDOFnazIgME>
Subject: [DNSOP] review of draft-ietf-dnsop-no-response-issue-05

My first impression of this document is that it is still in need of some
extreme editing – mostly for grammar and syntax, but also for clarity and
readability.  I've included many of the early problems I found in a list of
nits at the end of this email, but at two or three errors per paragraph
(and occasionally per sentence) it's just too much to cover this way, so I
stopped noting nits after the first couple of sections.  I suggest such
sweeping changes to later sections that any nits I noted there may end up
being rewritten anyway.

The draft still suffers from inconsistent approaches to the content.  In
any given section it might shift from a problem description, to
admonishment, to advice for workarounds, and back again without a clear or
consistent structure.  This makes it difficult to figure out what the
intended takeaway is in places.  I've tried to point out specifics below,
and suggest better approaches where I can.

Don't let my criticism of the document give you the wrong impression.  I
think this is useful work, and I'm very appreciative of the work that has
gone into it so far.  I just happen to think that there is still work left
to be done.


As for the meat of the matter...


In the Introduction, in the paragraph which begins "Unless a nameserver is
under attack" would it make sense to replace "broken delegations" with
"lame delegations"?   We have a term for delegating to a name server which
is not authoritative for the delegation... we should probably use it.
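For readers less familiar with the term, here is a minimal sketch of what
"lame" looks like on the wire, assuming the simplified definition above: a
server named in the delegation answers, but not authoritatively.  The
function name and the sample headers are hypothetical; only the 12-byte
header layout and the AA and RCODE fields come from RFC 1035.  A server that
does not answer at all, which is the subject of the draft, is not covered
here.

```python
import struct

AA_MASK = 0x0400     # Authoritative Answer bit in the DNS header flags (RFC 1035)
RCODE_MASK = 0x000F  # response code, low four bits of the flags field


def looks_lame(response: bytes) -> bool:
    """Return True if a response from a delegated server is not authoritative.

    Simplified assumption: a non-lame server answers a query for the
    delegated zone with RCODE 0 (NOERROR) or 3 (NXDOMAIN) and the AA bit set.
    """
    if len(response) < 12:
        raise ValueError("truncated DNS header")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    rcode = flags & RCODE_MASK
    authoritative = bool(flags & AA_MASK)
    return not (authoritative and rcode in (0, 3))


# Hypothetical headers: a response with AA set, and a REFUSED response (RCODE 5).
good = struct.pack("!HHHHHH", 0x1234, 0x8400, 1, 1, 0, 0)
refused = struct.pack("!HHHHHH", 0x1234, 0x8005, 1, 0, 0, 0)
```

With these sample headers, looks_lame(good) is False and looks_lame(refused)
is True.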


Section 2, "Consequences," seems out of place to me.  It makes a number of
assertions that seem to follow no particular flow or pattern.  The line of
reasoning or evidence that led to these assertions has not appeared yet in
the document, and so they seem unsupported.  If I'm interpreting the
author's intent correctly, perhaps this section should be split up and
interspersed into the current section 3 as the direct or indirect
consequences of each type of dropped response.


The way section 3 is titled and introduced I expect to just see a survey of
commonly dropped queries, but the content seems to be inconsistent as to
whether it's just a survey, or a survey and advice to client authors about
working around problems, or just advice without a clear description of the
problem behaviour.  It seems to me that if the section is intended to be
advice to authors, then it should be introduced that way.  Also,
clarification for server authors on how they should respond–instead of
dropping queries–should probably come before any advice on how to work
around dropped queries.

If the goal here is to help implementors fix what's broken, then I would
suggest each subsection start with a clear description of an observed
broken behaviour, followed up with a description of how that behaviour
should change (possibly only as a short summary with external references,
as we don't want to reproduce whole RFCs here) and then follow that with
advice for client authors for working around existing brokenness.  Sticking
to a pattern–either the one I've suggested or some other–for every query
type would make the entire section easier to read.  It will also make it
easier for the working group to go through the section item by item and
discuss whether we agree on the problem descriptions and proposed
remediation.

RFC 2119 language is conspicuously absent from this section and SHOULD be
added.


Section 4, "Remediating", is very problematic.  I think the biggest issues
stem from two false assertions:
1) "TLD operators are being asked to do this as they...have access to a
large numbers of nameserver names as well as contact details for the
registrants of those nameservers."

This is incorrect.  Or, at least, frequently incorrect.  TLD registries
fall into one of two categories: thick or thin.   Thick registries do have
this information, but a significant number of thin registries exist, which
delegate the responsibility for maintaining registrant contact information
to the registrar.  All thin registries know is that a delegation should
exist, which name servers are currently supposed to be authoritative for
the subzone, and which registrar asked them to create the delegation.
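To make the distinction concrete, here is a rough sketch of the data each
kind of registry holds, assuming the simplified model described above.  The
class, field names, and sample values are hypothetical illustrations, not any
registry's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Delegation:
    """What a registry holds for one delegation (hypothetical model)."""
    zone: str
    nameservers: List[str]
    registrar: str
    # Populated by thick registries only; thin registries leave this to the registrar.
    registrant_contact: Optional[str] = None


def contact_holder(d: Delegation) -> str:
    """Name the party that holds the registrant's contact details."""
    return "registry" if d.registrant_contact is not None else "registrar"


# Hypothetical records for a thick and a thin registry.
thick = Delegation("example.thick", ["ns1.example.net"], "Registrar A",
                   "hostmaster@example.net")
thin = Delegation("example.thin", ["ns1.example.net"], "Registrar B")
```

In this model contact_holder(thin) returns "registrar": the thin registry
knows the zone, the name servers, and the registrar, and nothing else.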

This false assertion leads the author down the path of making the TLD
registry responsible for contacting registrants.  The third paragraph
backpedals a bit on the question of who should be doing what, but then the
rest of the section goes on to talk about the registry anyway.

A lot of text could be removed from the first few paragraphs by removing
the need to soften the initial statement that this responsibility lies with
the TLDs.

2) The second false assertion is the implied assertion that RFC 1033 says
anything at all about a zone operator's responsibility to remove
delegations that lead to badly behaving name servers.

To provide context ... the title of RFC 1033 is "DOMAIN ADMINISTRATORS
OPERATIONS GUIDE," and the first sentence of the referenced COMPLAINTS
section is, "these are the suggested steps you should take if you are
having problems that you believe are caused by someone else's name server."
The specifically referenced step is to "ask the parent authorities to
excommunicate the domain."  None of this is normative language directing a
parent operator to do anything.

RFC 1033 appears to be nothing more than a "how to" guide for new DNS
operators of the late '80s.  Its main goal seems to be to give a
prospective operator an overview of the DNS, and instructions for how to
get a DNS server up and running for the first time.  We shouldn't be
looking to it for standards guidance any more than we should look to an
early (or any) edition of DNS & BIND.

Even if the normative language implied in this draft existed in an RFC
somewhere, it is legally impossible for most TLDs to follow the directions
this draft gives.  Any registry that falls under the "gTLD" category is
contractually prohibited from directly contacting its registrants, and none
of the registries or registrars for gTLDs would have a legal leg to stand
on if they tried to remove a delegation due to a misbehaving name server.
Even if the name server belonged to the registrant of the delegation this
would be problematic, but in many cases the registrant and the DNS operator
are not the same.  Attempting to remove a registrant's delegation because
of the actions of a third party would have any gTLD's legal department up
in arms in an instant.

Removing the above two assertions obviates a lot of the current content of
section 4, and leaves only descriptions of testing that MAY be implemented,
but that testing is already discussed in more detail in section 9.

If this section remains, removing any suggestion of an imperative from it
is essential to this document being viable.


Section 5, "Firewalls," seems to be more of section 3 and should be merged
there.  If the author wanted to separate the misbehaviour of name servers
from the misbehaviour of middle boxes that would be reasonable, but then
this should at least appear immediately after section 3, without some other
subject sitting between them.


Same for section 6.  This just seems to be a special case of the firewall
problem.


And again for section 7.  This definitely belongs in section 3 as just
another case of a misbehaving name server.


The third paragraph of section 8 fails to address recursive servers
handling unknown types.  This may just be an oversight if the intent was to
limit the paragraph to authoritative servers.  If that's the case, the
first sentence of the paragraph should be rephrased.  It's important to set
the reader's expectations for scope as early in the subject as possible.

Section 8 as a whole probably belongs closer in the document to section 3,
as it is entirely advice for implementors.  It could probably be split up
into section 3 in much the same way I suggested with section 2.




Nits:

– Introduction:

"Failure to respond to a query is indistinguishable from a packet loss."
This seems awkward to me.  I'd change it to either "from packet loss" or
"from a lost packet."

In the following sentence, "a analysis of query response patterns will results"
contains two errors.  It should be "an analysis of query response patterns
will result..."

"Servers which fail to respond to queries to remain results in developers
being hesitant to deploy new standards."   I'm not sure what "to remain" is
doing in there.   Should this be: "Servers which fail to respond to queries
result in developers being hesitant to deploy new standards."  ?

– Consequences

"Lack of following the relevant RFCs has lead to various consequences.
Some as a direct result and some from recursive servers try to work around
the non compliance."  This is a sentence fragment; replace that period with
a comma.

"Fixing known issues know"   Fixing known issues now?

"Wide spread non response to EDNS queries has lead to recursive servers
having to assume EDNS may not supported and fallback to plain DNS is
required."  I think that "is" is a typo for "as".  This also reads like
fallback is another thing they must assume.  May I suggest: "Widespread
non-response to EDNS queries leads recursive servers to assume EDNS may not
be supported, and to fall back to plain DNS as a result."

"3. Common queries kinds that result in non responses" ... "Common kinds of
queries that result in non response"?