Re: [Pearg] Call for adoption: draft-learmonth-pearg-safe-internet-measurement-02.txt

Eric Rescorla <> Mon, 27 May 2019 13:35 UTC

From: Eric Rescorla <>
Date: Mon, 27 May 2019 06:34:47 -0700
Message-ID: <>
To: Sara Dickinson <>
Cc: "" <>

I have reviewed this document and while I think some of the advice
here is potentially useful, I don't think the recommendations really
match what's current practice or what's practical. As such, I don't
think it should be adopted without quite a bit more work.

As background, here is a partial list of the kinds of studies that
are commonly done:

- A/B tests of browser behavior where we randomize users into
  a control and treatment group and then measure some parameter
  (e.g., connection success rate) under normal browsing usage.
  This kind of A/B test is standard practice for many browser
  features, at least for Chrome and Firefox. See:

- Forced tests where we have the browser perform some action
  (e.g., connect to a server) with various parameters and
  measure differences between the parameter settings.
  This is less common, but we used these during the roll out
  of TLS 1.3 and are currently using these for DoH.

- Studies where we use an ad network to initiate some behavior
  in the browser. This is a pretty common approach when you don't
  control a browser measurement platform. See, for instance:

- Wide-scale scans (e.g.,
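
The first two study types above both depend on stable randomization into branches and per-branch outcome tallies. A minimal sketch of that pattern, with invented names and parameters (no real browser telemetry API is being shown):

```python
import random

# Hypothetical sketch of an A/B study harness. The branch names,
# treatment fraction, and metric (connection success rate) are
# illustrative assumptions, not any vendor's actual system.

def assign_branch(client_id: int, treatment_fraction: float = 0.5) -> str:
    """Deterministically assign a client to 'control' or 'treatment'."""
    rng = random.Random(client_id)  # seeded so assignment is stable per client
    return "treatment" if rng.random() < treatment_fraction else "control"

def record_result(results: dict, branch: str, success: bool) -> None:
    """Tally one connection attempt's outcome for the client's branch."""
    stats = results.setdefault(branch, {"attempts": 0, "successes": 0})
    stats["attempts"] += 1
    stats["successes"] += int(success)

def success_rate(results: dict, branch: str) -> float:
    """Fraction of successful attempts observed in a branch."""
    stats = results[branch]
    return stats["successes"] / stats["attempts"]
```

A forced test differs only in that the measured action is triggered by the harness rather than by normal browsing, but the branch-assignment and tallying shape is the same.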

I don't really want to get into a long debate about whether any
particular study type is appropriate. Rather, these are common study
types and so if the advice in this document is to be useful, then it
needs to reasonably match what people do -- or at least have a much
stronger argument that people should change what they do than is
offered here.

The text in this draft leans pretty heavily on getting consent,
either direct consent (including all users of the shared network)
or "proxy consent".

However, many of these kinds of studies don't really lend themselves
to detailed consent from individual users of the browser -- let alone
from every user on the network they are on. As a concrete example,
ad-type studies don't generally get any kind of consent at all.  For
instance, here's the experimental setup for APNIC's DNSSEC study:

    The experiment uses an online advertisement campaign to deliver
    the test code to end systems. When the end system is passed an ad
    that is carrying the experiment the system runs embedded Adobe
    Flash code. The code is executed when the ad is passed to the
    user, and does not rely on a user "click" or any other user
    trigger action. The active code interrogates one of two experiment
    controllers by performing a URL fetch. The contents of the fetched
    experiment control URL are a dynamically generated sequence of
    four URLs. These four URLs are the substance of the test setup.

It's worth noting at this point that the Web is a platform for running
remote code, and by browsing you're opting into that, and ad studies
just leverage that behavior.
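
The control flow of the quoted setup can be sketched roughly as below. The controller address and fetch function are stand-ins (the real APNIC experiment ran as Flash code delivered in an ad), so treat this as an illustration of the shape, not the implementation:

```python
# Sketch of the ad-delivered measurement flow: fetch a controller URL,
# receive a batch of test URLs, fetch each one, and record which
# fetches succeeded. All names here are hypothetical.

from typing import Callable, Dict, List

def run_experiment(fetch: Callable[[str], str],
                   controller_url: str) -> Dict[str, bool]:
    """Fetch the controller URL, then attempt each test URL it returns."""
    # Assume the controller returns a newline-separated list of test
    # URLs (four of them, in the experiment quoted above).
    test_urls: List[str] = fetch(controller_url).splitlines()
    outcomes: Dict[str, bool] = {}
    for url in test_urls:
        try:
            fetch(url)             # e.g. a DNSSEC-signed vs. unsigned name
            outcomes[url] = True   # fetch succeeded
        except Exception:
            outcomes[url] = False  # fetch failed, e.g. validation rejected it
    return outcomes
```

The measurement signal comes from comparing which of the dynamically generated URLs the client could and could not fetch.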

As another example, Mozilla's Shield Studies system is generally
opt-out, with a specific opt-in for when the study collects
more sensitive data:

    Shield Studies are available on all channels. Individual studies
    can be opt-out or opt-in depending on the nature of the study and
    the data being collected. Opt-out Shield Studies can only collect
    Type 1 and 2 data. This is the same type of basic interaction data
    we collect by default as defined in our privacy policy. There may
    be instances where we want to collect data that is not covered by
    our default data collection policies. In those instances you will
    be prompted to opt-in to the study. A complete description of the
    study and any additional data that would be collected will be
    disclosed before enrolling. A great example of this is Pioneer,
    which is actually an opt-in Shield Study currently in the wild.

As above, in the cases where it's opt-in, the primary question is
about the user's private data, and it's when that data is implicated
that we get more consent. There's no practical way to get consent from
other users of the same network. So, the consent section of this draft
doesn't seem to really apply well.

Now, obviously, one could argue that this kind of study should follow
a different set of practices, but I don't think that's really
right. There seem to be two core issues here:

- The effects of various changes on the user or the network they are on.

- The data collection inherent in doing the study.

WRT the first point, as a general matter, modern browsers
auto-update, so the user has generally opted into regularly getting
whatever new code the vendor thinks makes the best browser. We use
studies to determine whether a given change is good but in many cases
the alternative is just to roll the change out to everyone,
effectively doing the study on the whole population. So, either
we can have that effect on everyone or only on a limited subset
(what we do now). We run a lot of experiments and having explicit
consent for each one would make that much harder, resulting in
testing much more on the full population.

Similarly, on the topic of data collection, browsers report back quite
a bit of technical data about their behavior. In both Chrome and
Firefox, this is on by default, though you can turn it off. As
suggested by the quote above on Shield studies, many studies just
gather this kind of data (and often data that the browser would
already report back) and Mozilla, at least, has a pretty
well-developed framework for determining what kind of data requires
what level of consent.
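
The gating logic described in the Shield quote above is essentially a threshold on the sensitivity of the data being requested. A rough sketch, where the category names and the Type 1/2 cutoff are loosely modeled on the quoted policy (the real framework is more detailed):

```python
# Illustrative consent gating by data category. The four categories
# and the opt-out threshold are assumptions drawn from the Shield
# Studies quote above, not Mozilla's actual schema.

from enum import IntEnum
from typing import Iterable

class DataCategory(IntEnum):
    TECHNICAL = 1      # e.g. crash counts, feature timings
    INTERACTION = 2    # e.g. UI feature usage
    WEB_ACTIVITY = 3   # e.g. pages or hosts visited
    SENSITIVE = 4      # e.g. account or personal data

OPT_OUT_MAX = DataCategory.INTERACTION  # Types 1-2 may ship opt-out

def required_consent(categories: Iterable[DataCategory]) -> str:
    """Return 'opt-out' if all requested data is Type 1/2, else 'opt-in'."""
    worst = max(categories)  # most sensitive category requested
    return "opt-out" if worst <= OPT_OUT_MAX else "opt-in"
```

The point is that the consent requirement scales with the most sensitive data item a study collects, rather than being uniform across all studies.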

S 3.4.

   When deciding on the data to collect, assume that any data collected
   might become public.  There are many ways that this could happen,
   through operation security mistakes or compulsion by a judicial

This seems like it might be a good practice under some circumstances,
but impractical in many other cases. Moreover, given the enormous
amounts of personal data that users routinely hand to Web sites,
this seems like an unreasonably high standard.

S 3.5.
   For all data collected, consider whether or not it is really needed.

This section (and Section 4) seem really undeveloped. As a comparison
point, consider Mozilla's Data Collection principles
I'm not saying
that those principles are perfect, but they're fairly complete and
thought through. If you're going to publish an RFC with proposed
principles for what data is collected, I would expect it to be
comparably thorough.

On Tue, May 21, 2019 at 3:23 AM Sara Dickinson <> wrote:

> Hi All,
> This email starts a two week Call for Adoption of
> draft-learmonth-pearg-safe-internet-measurement
> The draft is available at:
> Please review this draft to see if you think it is suitable for adoption
> by PEARG and send comments to the list, clearly stating your view.
> Please also indicate if you are willing to contribute text, review, etc.
> This call for adoption ends on 4th April 2019.
> Sara.
> --
> Pearg mailing list