Re: [ietf-privacy] [saag] Fwd: WGLC for draft-ietf-tzdist-service-05

Daniel Kahn Gillmor <dkg@fifthhorseman.net> Fri, 30 January 2015 02:13 UTC

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Daniel Migault <mglt.ietf@gmail.com>, saag@ietf.org, ietf-privacy@ietf.org, Eliot Lear <lear@cisco.com>
In-Reply-To: <CADZyTkkCrvTam_ba7Tq6A-cHAVZn+ktKqwWsr_PNQaz2jyTkUQ@mail.gmail.com>
References: <CADZyTkkLu6qQ9LCqDkTHA9o+-YVvQuaUp33kqkAt=PRaQS-Jew@mail.gmail.com> <CADZyTkkCrvTam_ba7Tq6A-cHAVZn+ktKqwWsr_PNQaz2jyTkUQ@mail.gmail.com>
User-Agent: Notmuch/0.18.2 (http://notmuchmail.org) Emacs/24.4.1 (x86_64-pc-linux-gnu)
Date: Thu, 29 Jan 2015 21:13:36 -0500
Message-ID: <874mr9aucv.fsf@alice.fifthhorseman.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg="pgp-sha512"; protocol="application/pgp-signature"
Archived-At: <http://mailarchive.ietf.org/arch/msg/ietf-privacy/UkOg4bm4_9KHQnTUdbqpiDkCUqA>
Cc: Time Zone Data Distribution Service <tzdist@ietf.org>
Subject: Re: [ietf-privacy] [saag] Fwd: WGLC for draft-ietf-tzdist-service-05

Hi Daniel and Eliot--

On Wed 2015-01-28 14:24:28 -0500, Daniel Migault wrote:
> Our document describing Time Zone Data Distribution Service
> <http://tools.ietf.org/html/draft-ietf-tzdist-service-05> [1] is close to
> be finalized and we would like to proceed to cross area review.
>
> We would greatly appreciate to get review by February 11.
 [...]
> [1] http://tools.ietf.org/html/draft-ietf-tzdist-service-05

Thanks for your work on this.  This is the first time i've seen this
draft; apologies for not looking at it earlier.

I'm only subscribed to saag@ietf.org (and ietf-privacy, which is idle
lately, but i've included here because some of my review touches on
privacy), so this post might not make it through to tzdist@ietf.org --
feel free to forward it as needed.

I did a quick skim here with my security and privacy hats on, and have a
few comments:

(privacy) Privacy Considerations section is missing
===================================================

There is *no* "Privacy Considerations" section in the draft at all.
Please read RFC 6973 for guidance in conducting a privacy review of the
protocol.  The act of querying these servers leaks something about the
location of the person doing the query, at least, and may leak
information about other locations that they're interested in.  It's also
possible that regular attempts to query this information will provide a
linkable trail of the user, which could then be (mis)used without their
knowledge or permission.

Here's an attempt at a quick analysis, though i haven't thought through
the protocol in detail.  I hope you'll do your own analysis, and you're
welcome to take any of mine:

Implausibly: if the average user is interested in 5 timezones, and there
are 774 known zones ("find /usr/share/zoneinfo -type f | wc"), and those
interests were evenly distributed across the zones for every user, then
the set of requests to update an individual's preferred timezones yields
nearly 50 bits of entropy, far more than enough to distinguish every
individual human from each other.
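
A rough sketch of that arithmetic, using the 774 zones and 5 interests
quoted above.  The uniform-distribution model and the population figure
are assumptions for illustration only.  Counting unordered sets of 5
distinct zones gives about 41 bits, and counting ordered picks gives
about 48; either is well above the roughly 33 bits needed to single out
one person.

```python
import math

ZONES = 774      # count quoted above from /usr/share/zoneinfo
INTERESTS = 5    # assumed number of zones per user

# Unordered set of 5 distinct zones, every set equally likely.
unordered_bits = math.log2(math.comb(ZONES, INTERESTS))

# Ordered picks with repetition allowed: an upper bound.
ordered_bits = INTERESTS * math.log2(ZONES)

# Bits needed to distinguish one person among an assumed 8 billion.
population_bits = math.log2(8e9)

print(f"unordered sets: {unordered_bits:.1f} bits")
print(f"ordered picks:  {ordered_bits:.1f} bits")
print(f"needed:         {population_bits:.1f} bits")
```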

More plausibly: timezone interest is probably less than 5 for most
people, and it isn't evenly distributed: the people who are interested
in America/New_York are more likely to be interested in
America/Los_Angeles than in Arctic/Longyearbyen.  But anyone with an
unusual set of TZs can probably be identified (perhaps uniquely) by any
provider they talk to just by what TZs they ask for.

Since §4.1.4 says "Clients SHOULD poll for changes, using an appropriate
conditional request, at least once a day", a malicious provider intent
on surveilling its users and with a mechanism to do so would have a
daily checkin.  I imagine this as some kind of background system service
looking for updates.  The daily checkin could be used to track a user's
movements around the network, if their device is not stationary.  The
time of checkin could also be used as a linking mechanism, if the
machine polls with rigid regularity.
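
One way a client might blur the timing signal is to randomize the gap
between polls while never waiting longer than a day.  This is only a
sketch: the function name and the 25% spread are hypothetical, and it
does nothing to hide the IP address or the set of zones requested.

```python
import random

DAY = 86400  # seconds

def next_poll_delay(rng=random, spread=0.25):
    """Pick the next delay uniformly between DAY * (1 - spread) and DAY.

    The delay never exceeds one day, so the client still polls at least
    once a day, but not at a fixed offset.
    """
    return rng.uniform(DAY * (1 - spread), DAY)

if __name__ == "__main__":
    for _ in range(3):
        print(round(next_poll_delay()))
```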

Are there strategies that someone interested in preserving their
anonymity from a tzdata provider should take to remain anonymous?  If
so, what are they?


(privacy) HTTP pipelining?
==========================

Clients requesting multiple unusual TZs together are more easily
identifiable to servers than clients who request only one.  Should
clients request all their interested TZs at once, or spread out their
polling updates over time?  HTTP pipelining is clearly more efficient;
but what are the privacy implications if you have a system service that
does this?

(privacy) HTTP Cookies?
=======================

The choice of HTTP transport also allows for servers to set cookies in
clients -- should clients accept and re-transmit cookies from the
server?  What are the privacy implications?


(privacy) Tracking via ETag?
============================

Also, conditional requests seem to be encouraged via the use of an ETag
header.  It looks to me like a provider who wants to track its users
individually (even in the absence of cookies) could use a cache of
personalized ETags to do so.

For example, the first time any client requests TZ X (with no
If-None-Match request header), the server mints a new ETag Y, generates
a new client ID Z, and records:

 * Client ID Z
 * the requested TZ X
 * the new ETag Y
 * the time of issuance
 * the IP address
 * any other interesting metadata

When a request comes in for TZ X with an If-None-Match: Y header, the
server can link the two requests and record them both with client ID Z.

When the underlying data for the TZ actually changes, the server mints a
new ETag (for the new version of TZ X), but associates it with the same
client ID Z.
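
The bookkeeping above is small enough to sketch.  This is a toy model of
a misbehaving server, not anything from the draft; the class and method
names are hypothetical, and the data-change step is split into its own
method for brevity.

```python
import secrets
import time

class TrackingServer:
    """Hypothetical server that hands each client its own ETag."""

    def __init__(self):
        self.etag_to_client = {}  # ETag Y -> client ID Z
        self.log = []             # (client ID, zone, time, IP address)

    def handle(self, zone, ip, if_none_match=None):
        client = self.etag_to_client.get(if_none_match)
        if client is None:
            # No (or unknown) If-None-Match: mint client ID Z and ETag Y.
            client = secrets.token_hex(8)
            etag = secrets.token_hex(8)
            self.etag_to_client[etag] = client
            status = 200
        else:
            # Known ETag: link this request to the earlier ones.
            etag = if_none_match
            status = 304
        self.log.append((client, zone, time.time(), ip))
        return status, etag

    def zone_changed(self, old_etag):
        """Mint a new ETag for new data, tied to the same client ID."""
        client = self.etag_to_client[old_etag]
        new_etag = secrets.token_hex(8)
        self.etag_to_client[new_etag] = client
        return new_etag
```

Two requests from different IP addresses end up under one client ID as
long as the second carries the ETag issued to the first.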


(privacy) Logging policy for distribution servers?
==================================================

There is also no mention of recommended logging policy for the servers,
no attempt to address data minimization or the risks to trackable users
based on normal server logs.

(privacy) Authenticated clients are trackable
=============================================

The Security Considerations section says:

   Servers MAY require some form of authentication or authorization of
   clients (including secondary servers) to restrict which clients are
   allowed to access their service, or provide better identification of
   errant clients.  As such, servers MAY require HTTP-based
   authentication as per [RFC7235].

Clients who make authenticated connections to servers are eminently
trackable by those servers.  What are the privacy implications for those
clients?


(privacy) network observers tracking clients
============================================

Someone passively observing the network could also potentially track the
clients of a given server via traffic analysis, even if the server is
not cooperating.  First, the attacker could get a stash of all the data
that the server has, noting the size of each zone under each supported
format.

When a new request is made for a zone, the attacker can observe the size
of the query and the size of the response and guess with high
probability which zone was requested.
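
A toy version of that guess, assuming the observer has already built a
table of response sizes.  The sizes below are made up, and the function
name and tolerance parameter are hypothetical.

```python
def guess_zone(observed_size, size_table, tolerance=0):
    """Return the zones whose known size is within tolerance of observed_size."""
    return sorted(
        zone for zone, size in size_table.items()
        if abs(size - observed_size) <= tolerance
    )

# Made-up response sizes in bytes, for illustration only.
SIZES = {
    "America/New_York": 3519,
    "America/Los_Angeles": 2845,
    "Arctic/Longyearbyen": 2251,
}

print(guess_zone(3519, SIZES))       # exact match on size
print(guess_zone(2850, SIZES, 10))   # allow for small framing differences
```

Zones whose responses happen to share a size are indistinguishable to
this observer, which is why padding is one of the usual countermeasures
against this kind of analysis.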

If the clients poll once a day on a schedule (i.e. exactly every 86400
seconds) then the network observer may be able to track updates and
determine when a client interested in a particular zone does an update.

What mechanisms could a client (and server?) use to frustrate such a
network-based attacker to keep a given client's identity anonymous?


(security/privacy) HTTP redirection
====================================

What if the server sends an HTTP redirection (e.g. via HTTP response 301
or 302) --  should the client follow it?  What if it is to a cleartext
HTTP resource?  What are the security and privacy consequences of
following these redirections off-origin?


(security) Consequences of accepting bad TZ updates?
====================================================

I'm glad that the Security Considerations recognizes that reliable TZ
data is vital -- but no example is given of what a data compromise might
look like.  Is it worth providing a couple of examples of bad outcomes?
are we talking about missed appointments?  or crashing software?  or
something else?

(security) why not require TLS on both sides?
=============================================

you've got that the service MUST operate over https, but the clients
only SHOULD try https first.  Why allow for cleartext access at all?
Why not say that both clients and servers MUST support HTTPS?

I see https://tools.ietf.org/wg/tzdist/trac/ticket/7 suggests that there
is consensus that you don't want "mandatory to use", but i don't know
where the discussion is, or why you don't want it.


(security) Provider-to-Provider TLS
===================================

Connections between "Secondary Providers" and "Root Providers" seem
different from the connections between Clients and Providers.  If you
can't mandate HTTPS for all clients for some reason, what about at least
mandating that the caching infrastructure requires TLS for all
provider-to-provider connections?  The secondary provider will need a
TLS stack anyway (as a server), so it should be able to do TLS on the
upstream side.


(security) DNS compromise leaves only cleartext
===============================================

If a network-based attacker can filter network traffic, they can simply
drop all outbound _timezones._tcp.example.com DNS queries, and then when
the client gives up and falls back, they can allow through (or provide
their own, if DNSSEC isn't involved) responses to the cleartext
_timezone._tcp.example.com query.

This immediately puts the network attacker in the position of being able
to dictate timezone information to a client willing to fall back to
cleartext.
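
A small simulation of that downgrade, assuming a client that tries the
TLS service label first and then falls back to the cleartext one.  The
resolver functions and host names are hypothetical stand-ins; no real
DNS is involved.

```python
def discover(lookup_srv, domain):
    """Try the TLS service label first, then the cleartext one."""
    for label, scheme in (("_timezones._tcp", "https"),
                          ("_timezone._tcp", "http")):
        target = lookup_srv(f"{label}.{domain}")
        if target is not None:
            return scheme, target
    return None

def honest_resolver(name):
    records = {
        "_timezones._tcp.example.com": "tz.example.com",
        "_timezone._tcp.example.com": "tz.example.com",
    }
    return records.get(name)

def filtering_resolver(name):
    # Attacker drops the secure query and forges the cleartext answer.
    if name.startswith("_timezones."):
        return None
    return "attacker.example.net"

print(discover(honest_resolver, "example.com"))
print(discover(filtering_resolver, "example.com"))
```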



(security) no-DNSSEC fallback checks are ambiguous
==================================================

The Security Considerations currently say:

   In the absence of a secure DNS option, clients SHOULD check that the
   target FQDN returned in the SRV record matches the original service
   domain that was queried.  If the target FQDN is not in the queried
   domain, clients SHOULD verify with the user that the SRV target FQDN
   is suitable for use before executing any connections to the host.

What does "matches" mean here?  the second sentence suggests that it
means "shares some sort of a suffix with" -- but which part?  If i query
for an SRV of _timezones._tcp.tz.example.com, and it replies with an
FQDN of bar.example.com, is that OK?  what about x.y.z.bar.example.com?
If DNSSEC isn't available, the attacker can still point this response to
any IP address of their choice, right?
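
To make the ambiguity concrete, here are three readings of "matches"
that a client author might plausibly implement.  None of them comes
from the draft; the function names are hypothetical.

```python
def _norm(name):
    return name.rstrip(".").lower()

def same_name(queried_domain, target):
    """Strictest reading: target equals the queried domain."""
    return _norm(target) == _norm(queried_domain)

def within_domain(queried_domain, target):
    """Target is the queried domain or a name underneath it."""
    q, t = _norm(queried_domain), _norm(target)
    return t == q or t.endswith("." + q)

def shares_parent(queried_domain, target):
    """Loosest reading: target is at or below the queried domain's parent."""
    parent = _norm(queried_domain).split(".", 1)[-1]
    return within_domain(parent, target)

queried = "tz.example.com"
for target in ("bar.example.com", "x.y.z.bar.example.com"):
    print(target,
          same_name(queried, target),
          within_domain(queried, target),
          shares_parent(queried, target))
```

For both targets in the question, the first two readings reject and the
third accepts, so two conforming clients could disagree.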

What does "verify with the user" mean if this is a TZdata service, which
is presumably running automatically on the computer to keep this
information up-to-date?  most such services have no user interaction at
all.

If there is a UI, what options would the user be given in such a case?
Is this a popup dialog box that says "you asked for timezone data
updates from tz.example.com -- is it ok to get it from whatever.example
instead?"  What users can make sense of this dialog?  What information
would a fully-technically-cognizant user (a deep wizard) use to answer
it sensibly?  What would a normal user use?

If DNSSEC *is* available, is it OK if the record points outside the
zone?  what if it points to a non-signed zone?


(security) Conflicts between Providers?
========================================

The draft implies that a client might fetch data from multiple
providers.  What should the client do if two providers provide
conflicting information about the same TZ?

(security) use examples of certificate validation
==================================================

The combination of SRV records and X.509 certificate validation and
(maybe) DNSSEC is a tricky subject.  you've referenced RFC 6125, but i
don't think that's enough.

Do you mean to suggest that the certificate should use a SRVName
subjectAltName (RFC4985)?  or should it use a DNSName subjectAltName
with the name sent in the SRV query?  or a DNSName subjectAltName
with the FQDN returned in the SRV response?

Providing an example would make it clearer what you mean.  For example:
 
   If a client looks up SRV for _timezones._tcp.example.com, and gets a
   response of tz.example.net, then the certificate should (a) be valid,
   and (b) have either a subjectAltName DNSName of tz.example.net or a
   subjectAltName SRVName of _timezones._tcp.example.com (or both).

(please adjust to taste, i don't mean to tell you what the right choice
is here, it's an ugly problem)
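
Part (b) of that example could be sketched as below.  Chain validation
(part (a)) is out of scope here, the function and parameter names are
hypothetical, and the exact SRVName encoding defined in RFC 4985 is not
modeled: names are compared as plain strings, written as in the example
above.

```python
def names_acceptable(san_dns_names, san_srv_names, srv_query, srv_target):
    """Accept if the cert names the SRV target (DNSName) or the SRV query (SRVName)."""
    dns = {n.lower() for n in san_dns_names}
    srv = {n.lower() for n in san_srv_names}
    return srv_target.lower() in dns or srv_query.lower() in srv

query = "_timezones._tcp.example.com"
target = "tz.example.net"

print(names_acceptable(["tz.example.net"], [], query, target))
print(names_acceptable([], [query], query, target))
print(names_acceptable(["www.example.net"], [], query, target))
```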

(security) Statically-signed data vs. transport security
========================================================

The security of the transmission process seems to rely entirely on
transport security.

If there is a compromise in transmission between the Root provider and
the secondary provider, or a compromise of any provider, the client has
no way of knowing that they're getting bad data.

tzdata changes infrequently enough that it seems like it could be signed
with an offline key, making compromise of running systems much less
fruitful.  But this only works if the client can verify the offline
signature.

Have you considered any mechanism that the client could use to verify
the tz update based on data itself, without depending solely on
transport security?

I see this question tangentially raised here:

  https://www.ietf.org/mail-archive/web/tzdist/current/msg00102.html

but it's answered only in the "we still need TLS" way (which i agree
with).  Is any work done (or planned) on providing signed/verifiable
data?


(security) TLS best-practices?
===============================

I'm glad that you've got TLS as a MUST for servers.  Is it worth making
a normative reference to the UTA's TLS best-practices document?

  https://tools.ietf.org/html/draft-ietf-uta-tls-bcp




Sorry this got long, and that this is more in the form of questions than
patches.  I hope i haven't repeated too much of what the tzdist WG has
already discussed -- please feel free to point me to relevant
discussions that i may have missed.

            --dkg