Re: [v6ops] draft-palet-v6ops-464xlat-opt-cdn-caches concerns, including a potential major security vulnerability

JORDI PALET MARTINEZ <jordi.palet@consulintel.es> Sun, 03 November 2019 19:43 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/v6ops/TM5O8dI598HalCHihaYiH9OK4N8>

Hi Erik,

 

First of all, thanks a lot for your detailed comments, and sorry for coming back so late; the last few months have been really crazy, non-stop …

 

See below, in-line.

 

 

On 12/8/19 16:30, "Erik Nygren" <erik+ietf@nygren.org> wrote:

 

Some comments on draft-palet-v6ops-464xlat-opt-cdn-caches from the perspectives of having a $dayjob at a CDN....

This comes down to three general concerns, plus a major security concern:

 

1) Will this do more harm than good?  (ie, creating brokenness which motivates content to switch back to IPv4-only)
 

=>    If we are able to resolve the issues, which is the goal, then there should be more “good” and no harm at all.

 

2) Stateless EAMT just won't work due to the typical rate of change of A/AAAA pairings in the DNS
 

=>    I expect the changes in the A/AAAA to follow DNS TTLs, right? Otherwise, the cached information at the DNS proxy, or even at the hosts, will also break “regular” usage, even if this optimization is not in place.

 

3) The IPv4:IPv6 associations may not be adequately 1:1 and client behavior may be hard to anticipate
 

=>    I’m not sure you’re reading the latest version of the document (draft-palet-v6ops-464xlat-opt-cdn-caches-03). In that version, we anticipated that the EAMT must be extended with the FQDN, so that an IPv4 and an IPv6 address are matched only if they belong to the same FQDN.
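To make the FQDN-extended EAMT idea concrete, here is a minimal sketch, not taken from the draft. It assumes the DNS proxy in the CPE sees the A and AAAA answers, and it simplifies to one address per record type per name. The names `EamtEntry`, `FqdnEamt`, `observe_a`, `observe_aaaa` and `translate` are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv6Address
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EamtEntry:
    fqdn: str          # hypothetical extra column: the name both records share
    ipv4: IPv4Address  # taken from the A answer
    ipv6: IPv6Address  # taken from the AAAA answer


class FqdnEamt:
    """Toy EAMT that pairs A and AAAA answers only within one FQDN."""

    def __init__(self) -> None:
        self._a: Dict[str, IPv4Address] = {}
        self._aaaa: Dict[str, IPv6Address] = {}

    @staticmethod
    def _norm(fqdn: str) -> str:
        # DNS names are case-insensitive; drop the trailing root dot.
        return fqdn.lower().rstrip(".")

    def observe_a(self, fqdn: str, addr: str) -> None:
        self._a[self._norm(fqdn)] = IPv4Address(addr)

    def observe_aaaa(self, fqdn: str, addr: str) -> None:
        self._aaaa[self._norm(fqdn)] = IPv6Address(addr)

    def entries(self) -> List[EamtEntry]:
        # An entry exists only when the same name produced both records.
        return [EamtEntry(name, self._a[name], self._aaaa[name])
                for name in self._a if name in self._aaaa]

    def translate(self, ipv4: str) -> Optional[IPv6Address]:
        target = IPv4Address(ipv4)
        for entry in self.entries():
            if entry.ipv4 == target:
                return entry.ipv6
        return None  # no pairing: use the regular 464XLAT path


eamt = FqdnEamt()
eamt.observe_a("cdn.example.", "192.0.2.10")
eamt.observe_aaaa("CDN.example", "2001:db8::10")
eamt.observe_a("v4only.example", "192.0.2.20")  # no AAAA seen for this name
```

In this sketch `eamt.translate("192.0.2.10")` yields the paired IPv6 address, while `eamt.translate("192.0.2.20")` yields `None` because no AAAA was ever seen for the same name.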

 

4) Poisoning risks/vulnerability  (a major security consideration, which may make the DNS approach not viable)

 

=>    How is that different from regular DNS cache poisoning? I don’t see the difference.

 

TL;DR is that the security issue (an attacker being able to poison the EAMT cache, which can also happen non-maliciously) may mean the DNS-based approach is far too dangerous to deploy.

 

 

For #1  (the risk/concern factor)...

 

I really worry that while the DNS association option is a clever hack, it will turn out to be highly fragile and problematic in reality.  It only takes some corner-cases that are missed or one bad implementation for it to do much more harm than good.  (I wrote this even before I realized the security issue listed below.)

 

One of the biggest challenges I'm already seeing in getting video content dual-stacked in some parts of the world is consumer electronics with poor IPv6 implementations interacting badly with buggy CPE/ISP/home-router configurations.  Even if IPv6 is generally faster (and may be more available if a good happy eyeballs implementation is used), it only takes a relatively small number of subscribers with some Smart TV + ISP configuration that breaks in the face of dual-stack content to cause a content provider to get enough calls to switch the content back to IPv4-only.  I've seen this happen in at least both the Nordics and East Asia.  I worry that this optimization feature could introduce an entirely new class of problems like this if poorly implemented.  If the only recourse content providers have is to switch their content back to being IPv4-only (as that *may* make the difference between content appearing broken to IPv4 users and appearing to work), then this could hinder IPv6 usage much more than it helps.

 

=>    I don’t see this being a problem, for a simple reason: this will require a CPE firmware update. If I’m the ISP updating the firmware, I will make sure, before the upgrade, that it works; otherwise this optimization will not be enabled. If the upgrade is done by the customers on their own, this kind of problem can happen with any buggy update, not just with this optimization, and of course in that case it is the customer who knows that something broke after the upgrade …

 

One approach that might help this would be to have hostname "allowed" lists, allowing ISPs to turn this on for select content.  That could still be problematic if the "allowed" list of hostnames overlaps on the same IPs as other content. It might at least provide a way for ISPs to use this feature with their own STB+content combinations, such as for their own "walled garden" use-cases where they control both the IPv4-only devices and the content being delivered.  (Getting robust IPv6 support onto the devices would obviate the need for this, however.)  Another possibility might be some opt-in mechanism?  (eg, if an additional DNS record is looked up that indicates that this translation is supported?)

 

=>    We have something similar in the latest version of the document as well, so again I’m not sure you’re reading the latest one.

 

For #2  (the stateless/stateful factor)...

 

The A/AAAA IPs returned in DNS can be very dynamic.  For example, many CDNs have DNS TTLs in the 20 second to 300 second range.  At least for Akamai, the assignments for which addresses are returned for A and AAAA lookups are done independently.  As such, frequent churn in associations should be expected.

 

If the churn only impacts new TCP connections then this part might work some (much?) of the time (minus other caveats below).

 

=>    Isn’t this the same impact that happens today if the TTL expires in the middle of an existing TCP connection? Again, I don’t see a difference versus the problem created by low DNS TTLs (without the optimization).

 

However if the translation is stateless then this is going to be breaking TCP connections all the time and will make this solution unusable.

It sounds like the current EAM Table (EAMT) lookups are stateless?

 

=>    Same as above. The EAMT entries will expire at the same time as the DNS TTLs, so I don’t see a difference.
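As an illustration of EAMT entries that lapse together with the DNS data they were learned from, here is a small sketch. It is an assumption, not text from the draft, that an entry lives for the shorter of the A and AAAA TTLs. The class `TtlEamt` and its methods are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
from ipaddress import IPv4Address, IPv6Address
from typing import Dict, Optional, Tuple


class TtlEamt:
    """Toy EAMT whose entries lapse when the DNS data behind them lapses."""

    def __init__(self) -> None:
        # IPv4 address -> (IPv6 address, absolute expiry time in seconds)
        self._table: Dict[IPv4Address, Tuple[IPv6Address, float]] = {}

    def add(self, a: str, ttl_a: int, aaaa: str, ttl_aaaa: int,
            now: float) -> None:
        # Assumption: the pairing is only as fresh as the shorter-lived record.
        expiry = now + min(ttl_a, ttl_aaaa)
        self._table[IPv4Address(a)] = (IPv6Address(aaaa), expiry)

    def lookup(self, a: str, now: float) -> Optional[IPv6Address]:
        key = IPv4Address(a)
        hit = self._table.get(key)
        if hit is None:
            return None
        ipv6, expiry = hit
        if now >= expiry:
            del self._table[key]  # lazy expiry, performed on lookup
            return None
        return ipv6


eamt = TtlEamt()
# A record with a 20 s TTL, AAAA record with a 300 s TTL, learned at t = 0.
eamt.add("192.0.2.10", 20, "2001:db8::10", 300, now=0.0)
```

With these values the entry is still returned at t = 19 and is gone at t = 20, which is the point at which a purely stateless translator would stop mapping packets of any flow still using the old address.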

 

This means that each time an association is removed or changed it will break all of the long-lived connections using that association.  That seems likely to result in lots of timeouts and rebuffering  (ie, as streams will start playing and then will fail when the association changes or disappears and the connection breaks).

 

=>    Mmmm … do you mean that the EAMT entry, instead of expiring based on the TTL, needs to check whether there is a live TCP connection and, in that case, be kept for as long as that TCP connection exists? I’m not sure that’s so easy to implement. My reading is that the mission of the EAMT is to create “state”. NAT46 is stateless if there is a /64 for the translation; it is a stateful NAT44 + stateless NAT46 if there is no such specific prefix. When we add the EAMT we have state (this A must be translated to this specific AAAA), but we may need to better define how to handle it.

 

=>    However, if there is no optimization and the DNS TTL expires in the middle of a TCP connection and the CDN has changed the A or AAAA, the problem is the same.

 

I think a requirement for this solution to work would be for the EAMT table lookups to result in stateful NAT entries on the home router.    (ie, the lookups are just used to create stateful NAT entries for the 4-tuple for the duration of that session/connection.) 

 

=>    I think I got that …
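The stateful variant described above (EAMT lookups used only to create per-connection NAT entries) could look roughly like this. It is only a sketch under stated assumptions: the class `PinnedTranslator` and its fields are hypothetical, the protocol number is left out of the flow key, and a return value of `None` stands for "use the regular 464XLAT path".

```python
from ipaddress import IPv4Address, IPv6Address
from typing import Dict, Optional, Tuple

# (source IPv4, source port, destination IPv4, destination port)
FourTuple = Tuple[IPv4Address, int, IPv4Address, int]


class PinnedTranslator:
    """Consults the EAMT only for new flows, then pins the result."""

    def __init__(self) -> None:
        self.eamt: Dict[IPv4Address, IPv6Address] = {}
        self.sessions: Dict[FourTuple, IPv6Address] = {}

    def translate(self, src: str, sport: int, dst: str,
                  dport: int) -> Optional[IPv6Address]:
        key = (IPv4Address(src), sport, IPv4Address(dst), dport)
        pinned = self.sessions.get(key)
        if pinned is not None:
            return pinned  # existing flow keeps its original mapping
        mapped = self.eamt.get(key[2])
        if mapped is not None:
            self.sessions[key] = mapped  # pin for the life of the flow
        return mapped

    def close(self, src: str, sport: int, dst: str, dport: int) -> None:
        # Called when the connection ends (FIN/RST or idle timeout).
        self.sessions.pop((IPv4Address(src), sport, IPv4Address(dst), dport),
                          None)


nat = PinnedTranslator()
nat.eamt[IPv4Address("192.0.2.10")] = IPv6Address("2001:db8::10")
first = nat.translate("192.168.1.5", 40000, "192.0.2.10", 443)
del nat.eamt[IPv4Address("192.0.2.10")]  # association expires mid-stream
same_flow = nat.translate("192.168.1.5", 40000, "192.0.2.10", 443)
new_flow = nat.translate("192.168.1.5", 40001, "192.0.2.10", 443)
```

Here the flow that started before the association expired keeps its IPv6 destination, while a new flow opened afterwards is no longer optimized.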

 

 

For #3 (IPv4:IPv6 associations may not be adequately 1:1 and client behavior may be hard to anticipate)

This issue has a longer subset of concerns/issues.  The doc seems to touch on a bunch of them, but I worry if there are lots of lurking corner-cases, each one of which could cause problems.  Without a good way to fix or disable bad implementations, each one could be a major problem:

 

=>    Again, same question, as I’m not sure you have read the latest version (it was submitted only a few days before your email, so you may have downloaded a previous version when preparing these comments). The way we handle this is by disallowing the optimization when the association is not 1:1.

 

* Not all clients are good at obeying DNS TTLs.  Some are really bad  (eg, Java will sometimes just do a single DNS lookup and use it for the duration of the process lifetime).  Many major browsers also will use a connection for the duration of its lifetime even if the TTL has changed.  In the best case (with stateful translation) this just results in more IPv4 due to fewer cache entries.  In a bad case (without stateful translation) it could mean that connections are pretty much guaranteed to break when they pass their TTLs and the EAMT entries time out but the clients keep using them.

 

* The aliasing issue (where the IPv4:IPv6 associations between hostnames are far from 1:1) is real and will result in brokenness.  The doc has a bunch of heuristics that help, but given how clients often don't obey TTLs there may not be enough information available to detect and fix enough of these cases.  It seems like there are lots of possibilities for timing and race conditions for cache entries.   As an example, I know of one case where IPv4 is shared by lots of hostnames and SNI is used to determine which TLS certificate to hand out, but on the IPv6 side the address determines which TLS certificate to return.  This or similar may be fairly common as IPv4 is much more address-constrained.  The result is that a bad association will result in clients seeing the wrong certificate.  There are lots of similar but subtler corner-cases here.

 

* Related to the previous, it will be tricky to figure out how to handle a set of IPv4 addresses from A records and sets of IPv6 addresses from AAAA records.  It will be important to make sure to have some permutations of the association to avoid hotspotting.

 

* The aliasing issue may result in effectively dual-stacking some IPv4-only content which would also be a serious problem.  (This is related to #4 below as well.)

 

=>    I think all those cases are resolved, because in each of them the optimization is disabled.

 

For #4 (major security risk / vulnerability):

 

It seems like a malicious actor can cause one client (eg, a browser visiting web sites) to create lots of bad associations by returning A/AAAA associations unrelated to them.  If done for IPv4-only content and in a consistent manner, this seems like it can be used to hijack this content.  This vulnerability may be a death knell for this DNS approach, at least without a hostname allowlist.  

 

* Take an IPv4-only site "www.example.com" which returns an IPv4 address "192.0.2.2" (with a fairly short DNS TTL)

* Malicious attacker tricks a client in the same network to resolve "hostile.example.net" (eg, as a single-pixel ad) which has {"2001:db8::2:2","192.0.2.2"} but with long DNS TTLs.  The attacker doesn't need to have any relationship to the IPv4 address to perform this attack.

* Now clients (even dual-stack clients) trying to visit "192.0.2.2" will have their traffic sent to the attacker-controlled IPv6 address ("2001:db8::2:2"), which could be a middlebox for surveillance, or something to try and phish the user, or even a DDoS target.

 

=>    As we are disabling the optimization when the same IPv4 address points to different names, I don’t think this is a problem.
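The rule "disable the optimization when the same IPv4 address points to different names" can be sketched against the example above. This is a toy model with hypothetical names (`GuardedEamt`, `observe`, `translate`), no TTL handling, and the assumption that the CPE only learns about names that are actually resolved through its DNS proxy.

```python
from ipaddress import IPv4Address, IPv6Address
from typing import Dict, Optional, Set


class GuardedEamt:
    """Toy EAMT that refuses to map an IPv4 address shared by several names."""

    def __init__(self) -> None:
        self._names: Dict[IPv4Address, Set[str]] = {}
        self._v6: Dict[IPv4Address, IPv6Address] = {}

    def observe(self, fqdn: str, a: str, aaaa: Optional[str] = None) -> None:
        ipv4 = IPv4Address(a)
        self._names.setdefault(ipv4, set()).add(fqdn.lower().rstrip("."))
        if aaaa is not None:
            self._v6[ipv4] = IPv6Address(aaaa)

    def translate(self, a: str) -> Optional[IPv6Address]:
        ipv4 = IPv4Address(a)
        # Optimization is allowed only while exactly one name uses this IPv4.
        if len(self._names.get(ipv4, ())) != 1:
            return None
        return self._v6.get(ipv4)


eamt = GuardedEamt()
# Addresses and names are the ones from the example in this thread.
eamt.observe("hostile.example.net", "192.0.2.2", "2001:db8::2:2")
before = eamt.translate("192.0.2.2")  # only one name known so far
eamt.observe("www.example.com", "192.0.2.2")  # IPv4-only name, same address
after = eamt.translate("192.0.2.2")
```

In this model the mapping is withdrawn as soon as the second name is observed (`after` is `None`), but it is in effect for as long as only the first name has been seen (`before` is the IPv6 address), so the outcome depends on which lookups the proxy has seen and when.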

 

This also potentially is a problem when the same IPv4 address has a mixture of dual-stack and IPv4-only hostnames pointing to it.

Using DNS to remove EAMT entries might help a little here (eg, if IPv4-only hostnames with NODATA are found for AAAA lookups), but given that this is also a security vulnerability and given other issues (eg, widely differing DNS client behaviors and TTLs) it seems highly likely that this is very exploitable.

 

Best regards, Erik

 

 


