[secdir] Review of draft-vixie-dns-rpz-04

Paul Wouters <paul@nohats.ca> Mon, 01 July 2019 03:21 UTC

Date: Sun, 30 Jun 2019 23:21:20 -0400
From: Paul Wouters <paul@nohats.ca>
To: draft-vixie-dnsop-dns-rpz@ietf.org, ise-chairs@ietf.org, secdir <secdir@ietf.org>
Message-ID: <alpine.LRH.2.21.1906302319040.11225@bofh.nohats.ca>
Archived-At: <https://mailarchive.ietf.org/arch/msg/secdir/TS12mR9FJLggIsohDL-uLBwuSqU>

I have reviewed this document at the request of the Independent Stream Editor.

Note that as the document itself states, if something was designed from scratch,
different design decisions would be made, and several improvements could be
implemented. But the goal of this document is to describe a de facto industry
standard that has been implemented by multiple vendors. As such, I believe it
is best that this document is published to ensure interoperability. It will
also allow the IETF community to start on a bis document to improve on the
shortcomings of this implementation which thankfully are already described in
the document.

This document facilitates nation-state-wide censorship. Since code is already out
there, we cannot prevent this anymore, and the best way forward is to write
a bis document without these flaws (for which I've already offered ideas and
a willingness to contribute). As such, I think this document
should be published either as-is, or with just minor clarifications required
for the existing code base to interoperate in the short term. Long term, the goal
would be for the bis implementation to spread further.

In general, I found some of the text very difficult to read, despite
having extensive DNS experience and having written DNS RFCs. But I don't
think at this time we should attempt to change that for this document.

    If an RR found in an RPZ is meaningless or unusable for
    response policy purposes, then the containing RRset SHOULD be
    ignored,

Is there a difference in behaviour between the QNAME not being present in
the RPZ and this "ignore"? It is not clear what "ignore" means if there is no
other RRset covering the QNAME: return NXDOMAIN, NODATA, or SERVFAIL?

Version and format are mentioned before they are explained. Where is this version?

    A single resource record (RR) consisting of a CNAME whose target is
    the root domain (.) will cause a response of NXDOMAIN to be returned.

What should happen when there are two RRs with either same or different
RDATA?

I am a little concerned with overloading CNAMEs to "." and "*.",
as those queries do have real answers in the root zone (NSEC proof
of non-existence) and would have preferred to see this land in a
specifically reserved zone instead. But that will need to be done for
the bis version.

Precedence is mentioned before it is explained where or how it is
encoded. (similar to version)

rpz-drop. / rpz-passthru. use yet _another_ zone. It would have been
nicer if all these CNAME targets lived within one zone, as that makes
handling special zones easier in general. It also seems likely that if
any of this is hand edited, the trailing dot might be forgotten, leading
to weird things? The document explains this design weakness as well.
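
To make the trailing-dot hazard concrete: in standard master-file syntax
(RFC 1035), a name without a trailing dot is relative and gets the zone
origin appended. The following is a minimal sketch only; the helper
`qualify` and the origin `rpz.example.` are made up for illustration and
are not taken from the draft or from any implementation.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name the way a master-file parser would.

    Names ending in "." are absolute; anything else is relative to the
    origin. (Hypothetical helper; "@" handling is left out.)
    """
    if name.endswith("."):
        return name
    if origin == ".":
        return name + "."
    return name + "." + origin


origin = "rpz.example."  # hypothetical policy zone name
intended = qualify("rpz-passthru.", origin)
mistyped = qualify("rpz-passthru", origin)  # trailing dot forgotten
print(intended)  # rpz-passthru.
print(mistyped)  # rpz-passthru.rpz.example.
```

The mistyped target no longer sits under a top level domain starting with
"rpz-", so it would presumably be handled as an ordinary CNAME rewrite
rather than as the passthru action.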

What is the chance of bogus RPZ lookalike code causing interference? For
example, what happens if I add this to my nohats.ca. zone:

    nohats.ca IN CNAME rpz-passthru.

That is, can someone spoof a domain onto an RPZ list they do not control? In theory, this
would not be possible. But in practice, a lot of DNS code is shared between RPZ code and
regular DNS code.

    The special RPZ encodings which are not to be taken as Local Data are
    CNAMEs with targets that are:
    + "." (NXDOMAIN action),
    + "*." (NODATA action),
    + a top level domain starting with "rpz-",
    + a child of a top level domain starting with "rpz-".
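
Read literally, the list above amounts to roughly the following check.
This is a sketch only: the function name and the return labels are
invented for illustration and do not come from the draft, and the input
is assumed to be a fully qualified CNAME target.

```python
def classify_cname_target(target: str) -> str:
    """Classify a fully qualified CNAME target per the four cases above."""
    name = target.lower()
    if not name.endswith("."):
        raise ValueError("expected a fully qualified name")
    if name == ".":
        return "NXDOMAIN"
    if name == "*.":
        return "NODATA"
    # The last label is the top level domain; a child of such a TLD
    # shares that last label, so one test covers both remaining cases.
    tld = name.rstrip(".").split(".")[-1]
    if tld.startswith("rpz-"):
        return "RPZ-SPECIAL"
    return "LOCAL-DATA"


for t in (".", "*.", "rpz-passthru.", "foo.rpz-example.", "rpz-example.com."):
    print(t, classify_cname_target(t))
```

Under this reading, "rpz-example.com." is ordinary Local Data because its
top level domain is "com", while anything at or below a hypothetical
"rpz-example." top level domain would be taken as a special encoding.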

Is there any other known usage of CNAMEs pointing to "." or "*."?

Claiming a CNAME target cannot start with "rpz-" or a 2nd level domain
cannot start with "rpz-" is surely a very dangerous hack, and infringes
upon other parties that might not be aware of these restrictions,
possibly causing deployment issues. What if someone has the domain
"rpz-example.com."? Or what if ICANN delegates "rpz-example."? While
these should not be affected as LHS (versus RDATA), it seems this is a
disaster waiting to happen.

This is really a bad case of squatting on the root zone name
space. It would have been much better if a clear longer string that
wouldn't match an expected real use domain or prefix was picked, eg
"response-zone-data." and "*.response-zone-data."

At least for now, no TLDs start with "rpz-". A quick test does show an
unfortunate namespace clash at the second level with things like the
Religionspädagogisches Zentrum and others:

rpz-aurich.de
rpz-bayern.de
rpz-bilder.de
rpz-bochum.de
rpz-bonn.de
rpz-bremen.de
rpz-dresden.de
rpz-ekhn.de
rpz-heavyequipmentsales.de
rpz-heilsbronn.de
rpz-igb.de
rpz-immobilien.de
rpz-kassel.de
rpz-kirchheimbolanden.de
rpz-kusel.de
rpz-manualdoproprietario.com.br
rpz-nord.de
rpz-solutions.co.uk
rpz-speyer.de
rpz-sued.de
rpz-test.co.uk
rpz-tester.co.uk
rpz-testers.co.uk
rpz-testing.co.uk
rpz-tests.co.uk
rpz-unterfranken.de
rpz-valve.co.uk
rpz-valve.uk
rpz-valvesolutions.co.uk
rpz-valvetest.co.uk
rpz-valvetest.uk
rpz-valvetesters.co.uk
rpz-valvetesting.co.uk
rpz-web.de
rpz-zlin.cz
rpz-zwickau.de

If anything, this shows it is important to get this document out so that we can work on the bis
document that will have its own restricted sub-zone dedicated for this where it is not possible
that QNAME/RDATA targets would get mixed up and accidentally fail on RPZ.

    Wildcards are not valid as CNAME targets in ordinary DNS zones.

How does using this "illegal" construct work in the various DNS software that
underlies the RPZ implementation? Is there a danger of RPZ software being
implemented incorrectly if the underlying DNS library
is very strict? I think not, as this wildcard use is pretty fundamental
to RPZ, so I assume the whole thing just won't work if this is the case.
But a side-effect could be that the DNS library is modified to allow
wildcards in CNAME RDATA, causing real DNS to mistakenly be allowed to
do this. Would that cause operational issues with regular DNS?

But again, this is more something to be thought about for the next version
of RPZ.

The 5 listed trigger types are:
- Client IP Address
- QNAME
- Response IP Address
- NSDNAME
- NSIP

I find it confusing that this is a mix of "C style defines" and "multiple
English words". I would probably have used CLIENT_IP and RESPONSE_IP
(and NS_NAME and NS_IP), and not NSDNAME (as that confusingly looks
like NS and DNAME, two existing RRtypes).

The explanation of 4.1 saying it can be used to quarantine a compromised
client is probably a feature that should be added to the Introduction
text, which otherwise focuses mostly on access prevention of legitimate
clients.

IP address encoding should probably get its own subsection in Section
4 before describing the first trigger type.
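
For readers without the draft at hand, the IPv4 encoding can be sketched
as below. This assumes the form used in the draft's examples, where the
prefix length comes first and the address octets follow in reverse order
ahead of the trigger label (for example "rpz-ip" or "rpz-client-ip").
The function name is hypothetical and IPv6 is left out.

```python
import ipaddress


def encode_ipv4_trigger(prefix: str, trigger_label: str = "rpz-ip") -> str:
    """Build the owner name (relative to the policy zone) for an IPv4 prefix.

    Assumed layout: <prefixlen>.<octets reversed>.<trigger label>
    """
    net = ipaddress.ip_network(prefix, strict=True)
    if net.version != 4:
        raise ValueError("IPv4 only in this sketch")
    octets = str(net.network_address).split(".")
    return ".".join([str(net.prefixlen)] + octets[::-1] + [trigger_label])


print(encode_ipv4_trigger("192.0.2.0/24"))
# 24.0.2.0.192.rpz-ip
print(encode_ipv4_trigger("192.0.2.1/32", "rpz-client-ip"))
# 32.1.2.0.192.rpz-client-ip
```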

I find NSDNAME confusing, because it appears to be something with NS
and DNAME, but it really is just "Nameserver trigger". Maybe NS_NAME
would have been more clear, but I don't know how much this term is now
embedded into existing reference implementations.

Is there a reason nameserver names cannot be blocked using QNAME
triggers? Why is this a specific different type of trigger? Wouldn't
blocking ns.nohats.ca as an IP work equally well? I guess similarly with
the NSIP trigger? I'm sure there is a good reason for this, but I can't
tell that from any of the present text.

It would help to add some text stating that the line order of an RPZ zone file
is not considered a precedence order at all when processing RPZ rules. The
"order" of the rules is in the implementation, not based on the order
of the lines in an RPZ zone file.
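
A small sketch of what "order is in the implementation" means: the
winning rule is chosen by a fixed sort key, so shuffling the lines of the
zone file changes nothing. The ranking used here (configured zone order
first, then trigger type in the order the five types are listed in this
review) is an assumption made for illustration; the draft's Section 5 has
more detailed rules. The tuple layout and names are hypothetical.

```python
# Assumed ranking of trigger types, lowest value wins.
TRIGGER_RANK = {"client-ip": 0, "qname": 1, "response-ip": 2, "nsdname": 3, "nsip": 4}


def best_match(matching_rules):
    """Pick one rule from those that matched, ignoring their file order.

    Each rule is a (zone_order, trigger_type, action) tuple, where
    zone_order is the position of the policy zone in the resolver
    configuration (lower wins).
    """
    return min(matching_rules, key=lambda r: (r[0], TRIGGER_RANK[r[1]]))


rules = [
    (0, "nsip", "drop"),
    (0, "qname", "nxdomain"),
    (1, "client-ip", "passthru"),
]
# Same winner regardless of the order in which the rules were read.
assert best_match(rules) == best_match(list(reversed(rules)))
print(best_match(rules))  # (0, 'qname', 'nxdomain')
```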

I'm a little concerned about NSDNAME triggers, since those can cover
part of an RRset (where an RRset is normally covered by one RRSIG in
DNSSEC). Does this type of trigger remove/modify the individual RRs
only, or does one hit within an RRset match the entire RRset? E.g.,
if ns1.nohats.ca is listed in the NSDNAME set to be blocked, does it
affect ns0.nohats.ca if those two are the NS records for nohats.ca? Or
in plain English, if a domain served by multiple NS records has one
NS record that matches an RPZ block, will it just block that one
nameserver or block the entire domain? This is probably the only issue
that I would insist on clarifying in the document, as I can see different
implementations doing this differently, making RPZ zones inconsistent
across implementations.
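
The two readings can be put side by side in a short sketch. Both
functions are hypothetical and only illustrate the ambiguity; neither is
claimed to be what the draft or any existing implementation does.

```python
def per_server(ns_names, blocked):
    """Reading A: only the listed nameservers are affected."""
    return [ns for ns in ns_names if ns not in blocked]


def whole_rrset(ns_names, blocked):
    """Reading B: a single hit applies the policy to the whole NS RRset."""
    return [] if any(ns in blocked for ns in ns_names) else list(ns_names)


ns_set = ["ns0.nohats.ca.", "ns1.nohats.ca."]
blocked = {"ns1.nohats.ca."}
print(per_server(ns_set, blocked))   # ['ns0.nohats.ca.']
print(whole_rrset(ns_set, blocked))  # []
```

With the same policy zone, reading A leaves the domain resolvable via
ns0 while reading B blocks it entirely.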

There is a lot of text on what to do when there are conflicting RPZ
directives, by specifying a complex ordering mechanism (eg Section
5). Another approach could be to not allow such conflicts to happen, eg
refuse to add an RPZ rule that would be the equivalent of "unreachable
code". My guess is that this is complicated and possibly too resource
intensive to enforce in code? But perhaps that can/should be spelled
out more clearly?

    An implementation SHOULD include a configuration option such
    as "recursive-only no" to relax this restriction.

I assume this means "An RPZ implementation"? Please clarify that.
I'm confused why this option is sometimes needed. Can you explain the
use of this option better? Ideally, it would also explain why the SHOULD is
not a MUST. And what is the expected interaction with forwarder
statements?

    Also by default, RPZ policies are applied only while responding to
    DNS requests that do not request DNSSEC metadata (DO=0) or for which
    no DNSSEC metadata exists. An implementation MAY include a
    configuration option such as "break-dnssec yes" to relax this
    restriction.

This comes as a big surprise to me so far into the document. Definitely,
a few sentences about DNSSEC interaction should go into the
Introduction.

Also, wouldn't this be an attack vector? If you are doing Client IP
address RPZ blocks, wouldn't a malicious client just set DO=1 to avoid
having its malicious DNS resolve requests blocked? This seems like a
big shortcoming (and I'm saying that while being relieved this is
there because it means it won't break my own DNS, since I always use
DNSSEC). Something a bis document should fix up properly.
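
The default quoted above can be written as a one-line predicate, which
makes the bypass easy to see. This is a sketch of one reading of the
quoted sentence; the function and parameter names are invented, with
`break_dnssec` standing in for the "break-dnssec yes" option.

```python
def rpz_applies(do_bit: bool, dnssec_data_exists: bool, break_dnssec: bool = False) -> bool:
    """Return True if RPZ policy would be applied to this request."""
    if break_dnssec:
        return True
    # Default: apply only when DO=0 or when there is no DNSSEC metadata.
    return (not do_bit) or (not dnssec_data_exists)


print(rpz_applies(do_bit=False, dnssec_data_exists=True))   # True
print(rpz_applies(do_bit=True, dnssec_data_exists=True))    # False
print(rpz_applies(do_bit=True, dnssec_data_exists=False))   # True
print(rpz_applies(do_bit=True, dnssec_data_exists=True, break_dnssec=True))  # True
```

Under this reading, setting DO=1 sidesteps the policy only for answers
that actually carry DNSSEC metadata.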

    If a policy rule matches and results in a modified answer, then that
    modified answer will include in its additional section the SOA RR of
    the policy zone whose rule was used to generate the modified answer.

This is also something that comes out of nowhere and pretty late, and probably
deserves a mention in the introduction. I also do not see any of this in
the examples, so it would be good to add an example DNS answer that shows
such an additional section SOA RR.

    DISABLED
       SHOULD be implemented and causes any rule of the zone, when
       selected as a "best match", to have no effect, except to log what
       would otherwise have happened, provided sufficient logging is
       enabled.

I prefer the name PERMISSIVE, in the same way that SELinux has enforcing,
permissive and disabled modes. It is not intuitive that "disabled" still
causes all RPZ queries and processing to happen. I assume it is too
late now to change this as there are some implementations out there now,
but again something to think of for the bis document.
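
A sketch of the behaviour described in the quoted text, to show why
"disabled" is a misleading name: the rule has already been matched and
the rewritten answer already computed by the time the flag is looked at.
All names here are hypothetical.

```python
import logging

log = logging.getLogger("rpz-sketch")  # hypothetical logger name


def apply_rule(original_answer, rewritten_answer, zone_disabled: bool):
    """Return the answer to send after a rule was selected as best match."""
    if zone_disabled:
        # Matching still happened; only the effect is suppressed.
        log.info("rule matched; would have returned %r", rewritten_answer)
        return original_answer
    return rewritten_answer


print(apply_rule("real answer", "NXDOMAIN", zone_disabled=True))   # real answer
print(apply_rule("real answer", "NXDOMAIN", zone_disabled=False))  # NXDOMAIN
```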

    The master also SHOULD be configured as necessary to send
    NOTIFY messages to each slave. Because minimal data latency is
    critical both to the prevention of crime and abuse and to the
    withdrawal of erroneous or outdated policy, a DNS RPZ producer SHOULD
    also make every effort to minimize data latency, including ensuring
    that NOTIFY messages are sent in a timely manner after each change of
    the DNS RPZ on the master server.

I think this paragraph is arguing for MUSTs while stating SHOULDs? Right
now the wording seems to contradict the stated importance.

    For example, a DNS RPZ might include a QNAME
    policy rule for "BAD.EXAMPLE.COM" as well as a Response IP Address
    policy rule for 192.0.2.1.

Clarify that these are special domains and IP addresses that cannot occur
in the wild? Possibly mention the example/documentation RFC 5737. What
we want to avoid is someone copying this as an example for a new RPZ zone
and changing 192.0.2.1 into some other actual valid public IP that's in
use on the internet.
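
As an illustration of how such a copy-and-paste mistake could be caught,
here is a minimal check against the three IPv4 documentation blocks
reserved by RFC 5737. The function name is hypothetical.

```python
import ipaddress

# IPv4 blocks reserved for documentation by RFC 5737.
DOC_NETS = [
    ipaddress.ip_network(n)
    for n in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")
]


def is_documentation_address(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside an RFC 5737 block."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and any(ip in net for net in DOC_NETS)


print(is_documentation_address("192.0.2.1"))  # True
print(is_documentation_address("8.8.8.8"))    # False
```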

    Implementations which include this optimization SHOULD provide a
    configuration switch (for example, "qname-wait-recurse") to turn it
    on and off.

Why is this? The text clearly demonstrates that in some cases this path
never needs to be traversed (eg on hitting a Client IP Address trigger). Why
SHOULD implementations have an override for this? (I found out why near the
end in Section 12, see my comments there)

    The default value of the switch MAY be on or off.

This MAY is curious, as a switch MUST be on or off? Probably language
without the word MAY makes more sense in this context.

Section 9.3 seems dangerous, especially as some TLDs are known
to accidentally make orphaned glue authoritative data in their
TLD zone. Malicious parties could specifically seek these out
knowing it will bypass RPZ. If I could manage to get an A record for
nohats.ca. into the ca. TLD zone, would any RPZ restrictions based on
NS nohats.ca. automatically be skipped?

I find the start of 9.4 confusing. It would seem those implementations
are completely vulnerable to DNS spoofing and lack any support of
DNSSEC. Those resolvers should be considered completely broken and this
document should not facilitate those servers at all. Separate from that,
I understand there are child-sticky and parent-sticky resolver policies and
that part does fit into this section properly. Perhaps a slightly more
accurate rewrite of this first paragraph would fix this.

I feel Section 9.5 could be better implemented with a new keyword
(another CNAME target) that would mean both uses are disallowed. But as
this document is describing existing implementations, that can be taken
forward to the bis document.

Section 10 is a very useful section for work on a bis document. While
I normally would be tempted to not put this into an RFC (similar to
the Implementation Considerations that are removed prior to publication),
in this case I agree it should be left in here to help the DNSOP WG
write a bis document.

Section 12.2 is obviously the one that concerns me most. A successor
version of RPZ must support DNSSEC in such a way that updated clients can
use RPZ yet their answers cannot be withheld - this prevents nation-state
censorship using mandatory RPZ.

Section 12.4 is a little odd. It claims that for counter-intelligence
purposes, the early abort of recursion leaks information to the attacker
and therefore there should be an option to enable/disable that. It seems
either the code should never abort early, or one just has to live with the
information leak in favour of performance. The individual administrator
is probably not able to set this option in a way that is favourable for the
internet at large anyway, and if such an administrator makes a personal choice,
they will most likely always prefer optimising the CPU usage of RPZ.

I see no mention of the TTL of RPZ data. Should there be some advice
about what TTLs to use for RPZ data?

Is there any (bad) interaction between query name minimisation and RPZ?

The document could do with a Human Rights consideration section, as the
document specifies a method to implement censorship. But I think the
goal is to publish and immediately improve on this in such a way that it
cannot be abused for nation state censorship, so I am okay with not doing
it for this version, which is mostly documentation of existing process.

NITS:

"An RPZ need not support query access since query access is never required."

This is a bit confusing, since an RPZ is a zone, and what else is a
zone used for? (does this mean you can only take the entire RPZ, and
not query it "live" ?)

Paul
