Re: [ietf-privacy] [saag] Fwd: WGLC for draft-ietf-tzdist-service-05

Just following up on my own email: the working group is advised to take
privacy considerations quite seriously.  Daniel referenced RFC 6973, and
even though we have considered some of these issues, I refer you to an
article in today's Wall Street Journal[1] that highlights how easy it is
to correlate information with individuals and how important a role
location plays in that.

Eliot
[1]
http://www.wsj.com/articles/metadata-can-expose-persons-identity-even-when-name-isnt-1422558349?mod=WSJ_hp_EditorsPicks

On 1/30/15 6:24 AM, Eliot Lear wrote:
> Thank you, Daniel, for your prompt review.  The working group and draft
> editor shall address your comments prior to advancing this document.
> N.B.: some discussion has already occurred in this area, even though it
> is not covered in the draft.
>
> Eliot
>
>
> On 1/30/15 3:13 AM, Daniel Kahn Gillmor wrote:
>> Hi Daniel and Eliot,
>>
>> On Wed 2015-01-28 14:24:28 -0500, Daniel Migault wrote:
>>> Our document describing Time Zone Data Distribution Service
>>> <http://tools.ietf.org/html/draft-ietf-tzdist-service-05> [1] is close to
>>> being finalized and we would like to proceed to cross-area review.
>>>
>>> We would greatly appreciate receiving reviews by February 11.
>>  [...]
>>> [1] http://tools.ietf.org/html/draft-ietf-tzdist-service-05
>> Thanks for your work on this.  This is the first time I've seen this
>> draft; apologies for not looking at it earlier.
>>
>> I'm only subscribed to saag@ietf.org (and ietf-privacy, which has been
>> idle lately, but which I've included here because some of my review
>> touches on privacy), so this post might not make it through to
>> tzdist@ietf.org; feel free to forward it as needed.
>>
>> I did a quick skim here with my security and privacy hats on, and have a
>> few comments:
>>
>> (privacy) Privacy Considerations section is missing
>> ===================================================
>>
>> There is *no* "Privacy Considerations" section in the draft at all.
>> Please read RFC 6973 for guidance in conducting a privacy review of the
>> protocol.  The act of querying these servers leaks something about the
>> location of the person doing the query, at least, and may leak
>> information about other locations that they're interested in.  It's also
>> possible that regular attempts to query this information will provide a
>> linkable trail of the user, which could then be (mis)used without their
>> knowledge or permission.
>>
>> Here's an attempt at a quick analysis, though I haven't thought through
>> the protocol in detail.  I hope you'll do your own analysis, and you're
>> welcome to take any of mine:
>>
>> Implausibly: if the average user is interested in 5 timezones, there
>> are 774 known zones ("find /usr/share/zoneinfo -type f | wc -l"), and
>> those interests were evenly distributed across the zones for every
>> user, then the set of requests to update an individual's preferred
>> timezones yields roughly 41 bits of entropy (nearly 48 if the order of
>> the requests is also visible), far more than enough to distinguish
>> every individual human from every other.
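The arithmetic behind this estimate can be checked in a few lines of Python. The 774-zone count and the five-zones-per-user figure come from the estimate itself. The 8-billion population figure is an outside assumption, used only for comparison.

```python
import math

ZONES = 774        # zone count from the /usr/share/zoneinfo estimate
PER_USER = 5       # assumed number of zones each user cares about
POPULATION = 8e9   # assumption: rough world population, for comparison only

# An unordered set of 5 distinct zones chosen from 774.
bits_set = math.log2(math.comb(ZONES, PER_USER))
# The same 5 zones, if the order of the requests is also observable.
bits_ordered = math.log2(math.perm(ZONES, PER_USER))
# Bits needed to give every person on Earth a unique label.
bits_needed = math.log2(POPULATION)

print(f"unordered set: {bits_set:.1f} bits")
print(f"ordered:       {bits_ordered:.1f} bits")
print(f"needed:        {bits_needed:.1f} bits")
```

Under these assumptions, an unordered set of five zones carries about 41 bits (about 48 if request order is also visible). That is comfortably above the roughly 33 bits needed to single out one person among 8 billion.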
>>
>> More plausibly: the number of timezones of interest is probably fewer
>> than 5 for most people, and interest isn't evenly distributed: the
>> people who are interested in America/New_York are more likely to be
>> interested in America/Los_Angeles than in Arctic/Longyearbyen.  But
>> anyone with an unusual set of TZs can probably be identified (perhaps
>> uniquely) by any provider they talk to just by which TZs they ask for.
>>
>> Since §4.1.4 says "Clients SHOULD poll for changes, using an appropriate
>> conditional request, at least once a day", a malicious provider intent
>> on surveilling its users would get a daily check-in from every client
>> that follows this advice.  I imagine this as some kind of background
>> system service looking for updates.  The daily check-in could be used to
>> track a user's movements around the network if their device is not
>> stationary.  The time of check-in could also be used as a linking
>> mechanism if the machine polls with rigid regularity.
>>
>> Are there strategies that someone interested in preserving their
>> anonymity from a tzdata provider should take to remain anonymous?  If
>> so, what are they?
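One client-side strategy is to randomize the polling schedule so that the check-in time stops being a stable identifier. This is an assumption here; the draft does not specify it. In this minimal sketch, the delay before each poll is drawn uniformly between 12 and 24 hours. The client still polls at least once a day, as §4.1.4 asks, but the time of day drifts unpredictably. The function name and the 50% spread are hypothetical choices.

```python
import random
from typing import Optional

DAY = 86400  # seconds; §4.1.4 asks clients to poll at least once a day


def next_poll_delay(spread: float = 0.5,
                    rng: Optional[random.Random] = None) -> float:
    """Hypothetical helper: seconds to wait before the next poll.

    The delay is drawn uniformly from [DAY * (1 - spread), DAY], so polls
    still happen at least once a day, but their time of day keeps drifting.
    """
    if not 0.0 <= spread < 1.0:
        raise ValueError("spread must be in [0, 1)")
    # Default to the OS entropy source so the schedule has no guessable seed.
    rng = rng if rng is not None else random.SystemRandom()
    return rng.uniform(DAY * (1.0 - spread), DAY)
```

Randomized timing addresses only the timing signal. It does nothing about the other linking vectors raised in this review: the set of zones requested, cookies, ETags, and authentication.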
>>
>>
>> (privacy) HTTP pipelining?
>> ==========================
>>
>> Clients requesting multiple unusual TZs together are more easily
>> identifiable to servers than clients who request only one.  Should
>> clients request all their TZs of interest at once, or spread out their
>> polling updates over time?  HTTP pipelining is clearly more efficient,
>> but what are the privacy implications if you have a system service that
>> does this?
>>
>> (privacy) HTTP Cookies?
>> =======================
>>
>> The choice of HTTP transport also allows for servers to set cookies in
>> clients -- should clients accept and re-transmit cookies from the
>> server?  What are the privacy implications?
>>
>>
>> (privacy) Tracking via ETag?
>> ============================
>>
>> Also, conditional requests seem to be encouraged via the use of an ETag
>> header.  It looks to me like a provider who wants to track its users
>> individually (even in the absence of cookies) could use a cache of
>> personalized ETags to do so.
>>
>> For example, the first time any client requests TZ X (with no
>> If-None-Match request header), the server mints a new ETag Y, generates
>> a new client ID Z, and records:
>>
>>  * Client ID Z
>>  * the requested TZ X
>>  * the new ETag Y
>>  * the time of issuance
>>  * the IP address
>>  * any other interesting metadata
>>
>> When a request comes in for TZ X with an If-None-Match: Y header, the
>> server can link the two requests and record them both with client ID Z.
>>
>> When the underlying data for the TZ actually changes, the server mints a
>> new ETag (for the new version of TZ X), but associates it with the same
>> client ID Z.
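The tracking scheme just described can be sketched as a toy server. Everything here is hypothetical: the class name, the in-memory tables, and the 16-hex-digit ETags. The sketch only shows that per-client ETags are enough to link requests from different IP addresses, including across a data update.

```python
import itertools
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class TrackingServer:
    """Toy model of a server abusing ETags as per-client identifiers."""
    current_version: dict = field(default_factory=dict)  # tz -> data version
    etag_owner: dict = field(default_factory=dict)       # ETag -> client ID
    etag_covers: dict = field(default_factory=dict)      # ETag -> (tz, version)
    log: list = field(default_factory=list)              # (client, tz, ETag, time, IP)
    next_id: itertools.count = field(default_factory=itertools.count)

    def _mint(self, client_id, tz):
        # A fresh, unique ETag for this client, even though the data is shared.
        etag = secrets.token_hex(8)
        self.etag_owner[etag] = client_id
        self.etag_covers[etag] = (tz, self.current_version.get(tz, 0))
        return etag

    def get(self, tz, ip, if_none_match=None):
        """Handle a request for zone `tz`; return (status, ETag, client ID)."""
        client_id = self.etag_owner.get(if_none_match)
        if client_id is None:
            client_id = next(self.next_id)  # no known ETag: a "new" client Z
        version = self.current_version.get(tz, 0)
        if self.etag_covers.get(if_none_match) == (tz, version):
            status, etag = 304, if_none_match  # unchanged: keep the same ETag
        else:
            status, etag = 200, self._mint(client_id, tz)  # new ETag, same ID
        self.log.append((client_id, tz, etag, time.time(), ip))
        return status, etag, client_id
```

In this model, the client ID survives both a change of IP address and the change of ETag when the zone data is updated. Only a client that discards its cached ETag, and so gives up conditional requests, escapes the linkage.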
>>
>>
>> (privacy) Logging policy for distribution servers?
>> ==================================================
>>
>> There is also no mention of a recommended logging policy for the
>> servers, and no attempt to address data minimization or the risks that
>> normal server logs pose to trackable users.
>>
>> (privacy) Authenticated clients are trackable
>> =============================================
>>
>> The Security Considerations section says:
>>
>>    Servers MAY require some form of authentication or authorization of
>>    clients (including secondary servers) to restrict which clients are
>>    allowed to access their service, or provide better identification of
>>    errant clients.  As such, servers MAY require HTTP-based
>>    authentication as per [RFC7235].
>>
>> Clients who make authenticated connections to servers are eminently
>> trackable by those servers.  What are the privacy implications for those
>> clients?
>>
>>
>> (privacy) network observers tracking clients
>> ============================================
>>
>> Someone passively observing the network could also potentially track the
>> clients of a given server via traffic analysis, even if the server is
>> not cooperating.  First, the attacker could get a stash of all the data
>> that the server has, noting the size of each zone under each supported
>> format.
>>
>> When a new request is made for a zone, the attacker can observe the size
>> of the query and the size of the response and guess with high
>> probability which zone was requested.
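The attacker's lookup step can be sketched as follows. The sketch assumes the attacker has already built a table of response sizes per zone and can estimate a constant per-response overhead for headers and TLS framing. The function names are hypothetical, and the zone sizes used to exercise the sketch are made up for illustration.

```python
from collections import defaultdict


def build_size_index(zone_sizes):
    """Invert {zone: response body size} into {size: [zones]}."""
    index = defaultdict(list)
    for zone, size in zone_sizes.items():
        index[size].append(zone)
    return dict(index)


def guess_zone(index, observed_size, overhead=0):
    """Candidate zones for an observed response size.

    `overhead` is the attacker's estimate of the fixed per-response cost
    (headers, TLS framing), subtracted before the lookup.
    """
    return sorted(index.get(observed_size - overhead, []))
```

A unique size gives the zone away outright. A size shared by several zones still narrows the request to a few candidates.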
>>
>> If the clients poll once a day on a schedule (i.e. exactly every 86400
>> seconds) then the network observer may be able to track updates and
>> determine when a client interested in a particular zone does an update.
>>
>> What mechanisms could a client (and server?) use to frustrate such a
>> network-based attacker to keep a given client's identity anonymous?
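One possible server-side mitigation for the size side channel is to pad every response up to a fixed bucket size, so that many zones share each observable length. The draft does not propose this. The minimal sketch below assumes a 4096-byte bucket and a filler byte that the response format tolerates; both are arbitrary, hypothetical choices.

```python
def padded_length(length: int, bucket: int = 4096) -> int:
    """Round `length` up to the next multiple of `bucket` (at least one bucket)."""
    if bucket <= 0:
        raise ValueError("bucket must be positive")
    return max(bucket, -(-length // bucket) * bucket)  # ceiling division


def pad(body: bytes, bucket: int = 4096, filler: bytes = b" ") -> bytes:
    """Append filler so that len(result) == padded_length(len(body), bucket)."""
    if len(filler) != 1:
        raise ValueError("filler must be a single byte")
    return body + filler * (padded_length(len(body), bucket) - len(body))
```

Padding only blurs response sizes. It does nothing for request sizes or for the timing pattern of the polls.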
>>
>>
>> (security/privacy) HTTP redirection
>> ====================================
>>
>> What if the server sends an HTTP redirection (e.g., via HTTP response 301
>> or 302)?  Should the client follow it?  What if it is to a cleartext
>> HTTP resource?  What are the security and privacy consequences of
>> following these redirections off-origin?
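One conservative answer is for clients to follow a redirect only when it stays on HTTPS and on the same origin (scheme, host, port). This is offered only as an assumption, not as what the draft requires. Below is a sketch of that hypothetical policy, using the standard library's URL parser.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """(scheme, host, port) triple used for the same-origin comparison."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # .hostname is already lowercased by urlsplit; fill in default ports.
    return scheme, (parts.hostname or ""), parts.port or DEFAULT_PORTS.get(scheme)


def may_follow_redirect(current_url, location):
    """Hypothetical policy: follow only HTTPS redirects that stay on the same origin."""
    target = urljoin(current_url, location)  # Location may be relative
    here = origin(current_url)
    return here[0] == "https" and origin(target) == here
```

A stricter client could refuse all redirects. A looser one could allow HTTPS redirects to other hosts while still refusing downgrades to cleartext.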
>>
>>
>> (security) Consequences of accepting bad TZ updates?
>> ====================================================
>>
>> I'm glad that the Security Considerations section recognizes that
>> reliable TZ data is vital, but no example is given of what a data
>> compromise might look like.  Is it worth providing a couple of examples
>> of bad outcomes?  Are we talking about missed appointments?  Crashing
>> software?  Something else?
>>
>> (security) why not require TLS on both sides?
>> =============================================
>>
>> You've specified that the service MUST operate over HTTPS, but clients
>> only SHOULD try HTTPS first.  Why allow cleartext access at all?
>> Why not say that both clients and servers MUST support HTTPS?
>>
>> I see https://tools.ietf.org/wg/tzdist/trac/ticket/7 suggests that there
>> is consensus that you don't want "mandatory to use", but I don't know
>> where that discussion is, or why you don't want it.
>>
>>
>> (security) Provider-to-Provider TLS
>> ===================================
>>
>> Connections between "Secondary Providers" and "Root Providers" seem
>> different from the connections between Clients and Providers.  If you
>> can't mandate HTTPS for all clients for some reason, what about at least
>> mandating that the caching infrastructure requires TLS for all
>> provider-to-provider connections?  The secondary provider will need a
>> TLS stack anyway (as a server), so it should be able to do TLS on the
>> upstream side.
>>
>>
>> (security) DNS compromise leaves only cleartext
>> ===============================================
>>
>> If a network-based attacker can filter network traffic, they can simply
>> drop all outbound DNS queries for _timezones._tcp.example.com (the TLS
>> service) and then, when the client gives up, allow through (or, if
>> DNSSEC isn't involved, forge) responses to _timezone._tcp.example.com
>> (the cleartext service).
>>
>> This immediately puts the network attacker in the position of being able
>> to dictate timezone information to a client willing to fall back to
>> cleartext.
>>
>>
>>
>> (security) no-DNSSEC fallback checks are ambiguous
>> ==================================================
>>
>> The Security Considerations currently say:
>>
>>    In the absence of a secure DNS option, clients SHOULD check that the
>>    target FQDN returned in the SRV record matches the original service
>>    domain that was queried.  If the target FQDN is not in the queried
>>    domain, clients SHOULD verify with the user that the SRV target FQDN
>>    is suitable for use before executing any connections to the host.
>>
>> What does "matches" mean here?  The second sentence suggests that it
>> means "shares some sort of a suffix with", but which part?  If I query
>> for an SRV of _timezones._tcp.tz.example.com, and it replies with an
>> FQDN of bar.example.com, is that OK?  What about x.y.z.bar.example.com?
>> If DNSSEC isn't available, the attacker can still point this response to
>> any IP address of their choice, right?
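To make the ambiguity concrete, here is one possible reading of "matches": the SRV target must equal the queried service domain or lie beneath it. This is an assumption, not the draft's definition. Under that reading, neither bar.example.com nor x.y.z.bar.example.com would be accepted for _timezones._tcp.tz.example.com. The helper names are hypothetical.

```python
def service_domain(srv_query: str) -> str:
    """'_timezones._tcp.tz.example.com' -> 'tz.example.com'."""
    labels = srv_query.rstrip(".").lower().split(".")
    if len(labels) < 3 or not (labels[0].startswith("_") and labels[1].startswith("_")):
        raise ValueError(f"not an SRV owner name: {srv_query!r}")
    return ".".join(labels[2:])  # drop the _service and _proto labels


def target_in_domain(srv_query: str, target: str) -> bool:
    """One possible reading of 'matches': target equals, or is under, the service domain."""
    domain = service_domain(srv_query)
    target = target.rstrip(".").lower()
    # The leading dot prevents 'eviltz.example.com' from matching 'tz.example.com'.
    return target == domain or target.endswith("." + domain)
```

A looser "same registrable domain" reading would accept bar.example.com, but it needs the Public Suffix List and is not shown. Without DNSSEC, neither reading constrains the address the target resolves to.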
>>
>> What does "verify with the user" mean if this is a TZ data service, which
>> is presumably running automatically on the computer to keep this
>> information up-to-date?  Most such services have no user interaction at
>> all.
>>
>> If there is a UI, what options would the user be given in such a case?
>> Is this a popup dialog box that says "you asked for timezone data
>> updates from tz.example.com -- is it ok to get it from whatever.example
>> instead?"  What users can make sense of this dialog?  What information
>> would a fully-technically-cognizant user (a deep wizard) use to answer
>> it sensibly?  What would a normal user use?
>>
>> If DNSSEC *is* available, is it OK if the record points outside the
>> zone?  What if it points to an unsigned zone?
>>
>>
>> (security) Conflicts between Providers?
>> ========================================
>>
>> The draft implies that a client might fetch data from multiple
>> providers.  What should the client do if two providers provide
>> conflicting information about the same TZ?
>>
>> (security) use examples of certificate validation
>> ==================================================
>>
>> The combination of SRV records and X.509 certificate validation and
>> (maybe) DNSSEC is a tricky subject.  You've referenced RFC 6125, but I
>> don't think that's enough.
>>
>> Do you mean to suggest that the certificate should use a SRVName
>> subjectAltName (RFC4985)?  or should it use a DNSName subjectAltName
>> with the name sent in the SRV query?  or a DNSName subjectAltName
>> with the FQDN returned in the SRV response?
>>
>> Providing an example would make it clearer what you mean.  For example:
>>  
>>    If a client looks up SRV for _timezones._tcp.example.com, and gets a
>>    response of tz.example.net, then the certificate should (a) be valid,
>>    and (b) have either a subjectAltName DNSName of tz.example.net or a
>>    subjectAltName SRVName of _timezones._tcp.example.com (or both).
>>
>> (Please adjust to taste; I don't mean to tell you what the right choice
>> is here.  It's an ugly problem.)
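The example rule can also be expressed as code. This is a sketch over already-parsed subjectAltName data. The `CertNames` container and the helper names are hypothetical, and chain validation (part (a)) is assumed to have happened elsewhere. Only exact, case-insensitive matching is shown, with no wildcards. One detail is worth noting: RFC 4985 SRVNames (RFC 6125's SRV-IDs) omit the protocol label. The SRVName corresponding to _timezones._tcp.example.com is therefore _timezones.example.com.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertNames:
    """Hypothetical container for a validated certificate's subjectAltNames."""
    dns_names: frozenset = frozenset()
    srv_names: frozenset = frozenset()  # RFC 4985 form, e.g. '_timezones.example.com'


def srv_name_for(srv_query: str) -> str:
    """'_timezones._tcp.example.com' -> '_timezones.example.com' (drop _proto)."""
    service, _proto, *rest = srv_query.rstrip(".").lower().split(".")
    return ".".join([service, *rest])


def name_ok(cert: CertNames, srv_query: str, srv_target: str) -> bool:
    """Part (b) of the example: DNSName of the SRV target, or SRVName of the query."""
    target = srv_target.rstrip(".").lower()
    dns = {n.rstrip(".").lower() for n in cert.dns_names}
    srv = {n.rstrip(".").lower() for n in cert.srv_names}
    return target in dns or srv_name_for(srv_query) in srv
```

Accepting either name form matches the "(or both)" wording of the example. A specification could equally require one specific form.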
>>
>> (security) Statically-signed data vs. transport security
>> ========================================================
>>
>> The security of the transmission process seems to rely entirely on
>> transport security.
>>
>> If there is a compromise in transmission between the Root provider and
>> the secondary provider, or a compromise of any provider, the client has
>> no way of knowing that they're getting bad data.
>>
>> tzdata changes infrequently enough that it seems like it could be signed
>> with an offline key, making compromise of running systems much less
>> fruitful.  But this only works if the client can verify the offline
>> signature.
>>
>> Have you considered any mechanism that the client could use to verify
>> the tz update based on data itself, without depending solely on
>> transport security?
>>
>> I see this question tangentially raised here:
>>
>>   https://www.ietf.org/mail-archive/web/tzdist/current/msg00102.html
>>
>> but it's answered only in the "we still need TLS" way (which I agree
>> with).  Has any work been done (or is any planned) on providing
>> signed/verifiable data?
>>
>>
>> (security) TLS best-practices?
>> ===============================
>>
>> I'm glad that you've got TLS as a MUST for servers.  Is it worth making
>> a normative reference to the UTA's TLS best-practices document?
>>
>>   https://tools.ietf.org/html/draft-ietf-uta-tls-bcp
>>
>>
>>
>>
>> Sorry this got long, and that this is more in the form of questions than
>> patches.  I hope I haven't repeated too much of what the tzdist WG has
>> already discussed; please feel free to point me to relevant
>> discussions that I may have missed.
>>
>>             --dkg
