Re: [perpass] draft-josefsson-email-received-privacy

ned+perpass@mrochek.com Wed, 28 October 2015 01:24 UTC

MIME-version: 1.0
Content-transfer-encoding: 7bit
Content-type: TEXT/PLAIN; CHARSET="us-ascii"
From: ned+perpass@mrochek.com
Message-id: <01PSEZ3X9I8000HE89@mauve.mrochek.com>
Date: Tue, 27 Oct 2015 09:31:36 -0700
In-reply-to: "Your message dated Mon, 26 Oct 2015 09:35:22 +0100" <87pp02rprp.fsf@latte.josefsson.org>
References: <871tcl3f03.fsf@latte.josefsson.org> <20151024224621.15562.qmail@ary.lan> <0c5701d10f8f$882e4a10$988ade30$@huitema.net> <87pp02rprp.fsf@latte.josefsson.org>
To: Simon Josefsson <simon@josefsson.org>
Archived-At: <http://mailarchive.ietf.org/arch/msg/perpass/VqQ8Y1NOYablB6i6SI7vUTEjqZA>
Cc: 'John Levine' <johnl@taugh.com>, perpass@ietf.org, Christian Huitema <huitema@huitema.net>
Subject: Re: [perpass] draft-josefsson-email-received-privacy
Precedence: list

> "Christian Huitema" <huitema@huitema.net> writes:

> > On Saturday, October 24, 2015 3:46 PM, John Levine wrote:
> >> ...
> >> My suggestion would be to start by finding people who have experience
> >> in large mail systems (Ned would be a good start if he has the time),
> >> and then state clearly what you're trying to do.  It looks like it's
> >> identifying and minimizing the amount of PII collected, reported (to
> >> downstream consumers), and logged (for internal users) for incoming
> >> mail.  Once you've done that, it'd be quite interesting to try and see
> >> what gets collected, and what the tradeoffs are if you don't collect
> >> it or don't report or log it.
> >
> > That sounds like a reasonable plan. Let's start, then. What about having
> > interested parties meet at a bar in Yokohama, say Monday evening, and start
> > drafting the first solution? I would be happy to pay the first round of
> > drinks, if that speeds up consensus...

> Sounds like a good idea, thanks for suggesting it.  I'm hoping Linus
> will be able to make it since I won't be there.

> I am concerned about taking on a too wide scope.

There are possible issues either way. On the one hand, a wider scope has the
potential of getting too wide and losing focus on what can be accomplished. But
on the other hand, Received: fields do not exist in isolation, and if you don't
understand their context you're not going to be able to craft viable privacy
recommendations for them.

> I think it may be
> possible to reach closure on improving the Received header wrt privacy.
> I'm not against taking on the larger problems mentioned above, however I
> feel that it should not stand in the way of progress of fixing some
> identified problems of the Received header, and I worry that a wider
> task may not reach closure.

Not to put too fine a point on it, but you're a ways away from pinning down the
actual problem you want to solve here.

I already pointed out that Received: fields are, to a large extent, irrelevant
as far as state actors with access to transaction logs are concerned. This
leaves a rather different set of privacy concerns than the ones cited in the
current draft.

There's also the issue of submit versus relay. As far as I'm concerned you have
yet to elaborate a meaningful privacy case for routine suppression of
information in Received: fields generated by relay operations.

> To have a starting point for a problem statement for what I'd like to
> focus on, the Received header, I would propose:

> 1) The Problem: SMTP agents add IP addresses and timestamp information
>    about its clients in the Received header, in a way that clients
>    aren't able to influence.

First, you continue to fail to distinguish between SMTP (relay) and SUBMIT
(submission) here. These are separate protocols for a lot of very good reasons,
and you need to deal with them separately.

Second, the inability of clients to influence the generation of Received:
fields is actually a feature, not a bug. And unless you are prepared to
substantiate a claim based on actual evidence of widely deployed email clients
acting in the interest of their users' privacy, it's almost certainly going to
be a privacy win as well.

Or, to put this another way, which do you think is easier to change: The
behavior of thousands of different clients currently in the hands of billions
of users, or the behavior of a handful of service providers?

> 2) Deployment Consideration: Received headers are useful for mail loop
>    detection and debugging by humans, and must continue to serve that
>    purpose as good possible, so as long as it doesn't violate the
>    privacy problem identified in 1).

This list is far from complete - the obvious omission is the use of Received:
fields for spam checks. Another is identifying sources of delays - transaction
logs are better for this but they only work within an administrative domain.
It's also the case that nobody - including all the email experts in the IETF -
is going to be able to create anything like a complete list of the uses of
Received: header fields out there.

Like it or not, this is what happens when something has been widely deployed
for over 25 years.
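
The mail loop check mentioned in point 2 is commonly done by counting
Received: fields. A minimal sketch, assuming a threshold of 100 (RFC 5321
suggests a large rejection threshold, normally at least 100); the names
`MAX_RECEIVED` and `looks_like_loop` are hypothetical:

```python
from email import message_from_string

# Assumed threshold; real deployments choose their own value.
MAX_RECEIVED = 100


def looks_like_loop(raw_message, limit=MAX_RECEIVED):
    """Return True if the message carries at least `limit` Received: fields."""
    msg = message_from_string(raw_message)
    # get_all() returns every occurrence of the field, or the default if none.
    return len(msg.get_all("Received", [])) >= limit
```

Note that this check depends only on the number of Received: fields, not
on anything they contain.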

And despite the huge variability of the Received: fields that are generated,
it's entirely possible to write robust processing software for them. (The
assorted tricks for doing this are well known in the community.)
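
One such trick, sketched here as an illustration and not taken from any
particular implementation: the date-time in a Received: field follows the
final semicolon, so splitting on the last ';' avoids parsing the free-form
clauses before it. This assumes no semicolon appears in a trailing comment.
The function name `received_timestamp` is hypothetical;
`parsedate_to_datetime` is from Python's standard `email.utils`.

```python
from email.utils import parsedate_to_datetime


def received_timestamp(value):
    """Return the datetime from a Received: field value, or None."""
    # Everything after the last ';' is taken to be the date-time.
    _, sep, stamp = value.rpartition(";")
    if not sep:
        return None  # no semicolon at all: malformed field
    try:
        return parsedate_to_datetime(stamp.strip())
    except (TypeError, ValueError):
        # Older Python versions raise TypeError on unparseable dates,
        # newer ones raise ValueError; treat both as "no timestamp".
        return None
```

Returning None instead of raising lets the caller decide whether a
malformed field is an error or just a signal, e.g. for a spam check.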

> 3) Work-together Consideration: If there are deployment of related ideas
>    already, we want to follow and describe them as long as it solves the
>    problem in 1) and retain the useful properties in 2).

The problem you're going to run into here is that ISPs/MSPs are generally not
very forthcoming about their plans. And folks like me who work with various
different ISPs/MSPs can't talk about the specifics of what's going on due to
nondisclosure arrangements.

> 4) Simplicity: Pick the simplest solution that meets the requirements.

Extensibility often trumps simplicity in the email space. It certainly does/did
in the design of Received: fields, and that's another thing that's going to
have to be dealt with by any specification that's written.

> I'd like to get away from the focus on SMTP submissions, the privacy
> violation affects all SMTP relaying.

You *really* need to substantiate this claim.

> Submission may be the most
> critical part, but there are other situations when you don't want your
> client's IP address or time of communication to leak.

Before I get into the timestamp issue, I want to reiterate John's excellent
point:

  You know how crypto people feel
  when someone shows up with a wonderful new crypto scheme?  And then
  when the someone says well, just tell me what's wrong with it?  Mail
  is a lot like that.  It's much more complex and subtle than it
  appears, even to people who've used it casually for a long time.

There was a time when email was truly a heterogeneous store and forward system
spanning many protocols, message formats, gateways, and lots of other junk.
Messages were often delayed, sometimes by substantial amounts of time, and
routinely transited multiple administrative domains before being delivered. And
there were probably some things that could be discovered by looking at
Received: field timestamps at that point.

Those days are long gone. These days the majority of email messages are handled
in their entirety by a small number of large service providers. Of the
remainder, most are intra-domain business email handled by enterprise mail
systems like Microsoft Exchange. Tied to that is the inter-domain email from
those enterprise email systems. The vast majority of messages now only transit
one or two administrative domains.

All of these systems employ sophisticated queuing systems and storage
technologies and are highly optimized to transfer massive amounts of
email with exceptionally low latency. (Low latency is not just a user
experience win, it's also a win in terms of not needing as much queue storage.)
And by massive I mean tens of thousands of messages a second or more,
delivering to millions of users (or more), scattered across data centers all
over the world.

And even the tiny fraction of systems that are left are typically using
high-quality open source software like Exim, Postfix, or sendmail. These
packages also offer very good performance.

The result is that typical email delivery times have been driven down into the
tens of seconds range; a couple of minutes at most. And if there's more of a
delay, it's usually at the point of final delivery, which has no privacy
implications that I can see.

And when the times are higher than this, it's usually deliberate. For example,
I noticed when I did the testing I reported in a previous message that Yahoo
seems to consistently insert a three minute delay. (It's probably part of their
spam checks.)

And since the Received: fields are, modulo clock skew (now usually a nonissue
due to the widespread use of NTP) and coding errors, going to be bracketed by
the Date: field and the delivery timestamp, the amount of information conveyed
by the timestamps in normal operation is quite low, especially in terms of
privacy. (Their utility arises when things aren't normal, e.g., when you want
to know the reason for an abnormal delay or you use mistakes in their
construction as a spam indicator.)
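
To make the bracketing concrete, here is a minimal sketch that measures how
far each Received: timestamp lies from the Date: field. The sample message,
host names, and the function name `seconds_since_sent` are made up for
illustration, and the code assumes every timestamp carries a numeric zone
offset other than -0000.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical two-hop message; Received: fields are prepended, so the
# topmost one is the most recent.
RAW = (
    "Received: from relay.example by mx.example; Tue, 27 Oct 2015 16:31:50 +0000\n"
    "Received: from client.example by relay.example; Tue, 27 Oct 2015 09:31:40 -0700\n"
    "Date: Tue, 27 Oct 2015 09:31:36 -0700\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def seconds_since_sent(raw_message):
    """Seconds between Date: and each Received: stamp, oldest hop first."""
    msg = message_from_string(raw_message)
    sent = parsedate_to_datetime(msg["Date"])
    offsets = []
    for value in reversed(msg.get_all("Received", [])):
        # The date-time follows the last semicolon of the field.
        stamp = value.rpartition(";")[2].strip()
        hop = parsedate_to_datetime(stamp)
        # Aware datetimes subtract correctly across different zones.
        offsets.append((hop - sent).total_seconds())
    return offsets


print(seconds_since_sent(RAW))
```

In normal operation the values are small, non-negative, and increasing;
anything else points at clock skew, a coding error, or a forged field.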

The possible exception to this is the time zone. I'm having trouble coming
up with a scenario where the time zone of the Received: field provides
more insight than the one in the Date: field, but I suppose it's possible.
But if there's actually reason to care about this it can be dealt with simply
by saying "use UTC".
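
A minimal sketch of what "use UTC" could look like when generating the
timestamp, assuming the input carries a numeric zone offset other than
-0000; the function name `to_utc_stamp` is hypothetical:

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_utc_stamp(stamp):
    """Re-express an RFC 5322 date-time string in UTC (+0000)."""
    dt = parsedate_to_datetime(stamp)
    # astimezone() keeps the same instant and only changes the zone shown.
    return format_datetime(dt.astimezone(timezone.utc))


print(to_utc_stamp("Tue, 27 Oct 2015 09:31:36 -0700"))
# prints: Tue, 27 Oct 2015 16:31:36 +0000
```

The instant is unchanged, so delay measurements still work; only the local
zone offset is no longer visible.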

Now, this raises the question of what, if anything, to do about the Date: field.
It's required, and if the client doesn't insert it the submit server will. (Or
the message may be rejected as spam. And sending out bogus Date: field values
is *very* likely to trigger spam detectors.)

This makes it impossible to use regular email to send messages with an
indeterminate submission time, but with all due respect, that's not something
the regular email service is designed to provide, and given the low standard
deviation of end-to-end message transfer times with the use of extensions like
FUTURERELEASE, it's not something the regular email service can provide
in any meaningful way anyhow.

If you want to build such a service on top of regular email, it would have
to be done as some sort of remailer. Which is entirely possible to do, but
is entirely out of scope of what's being proposed here.

tl;dr Unless I'm missing something, timestamps are not a privacy concern.

				Ned
