Re: Migrating some high-entropy HTTP headers to Client Hints.

Yoav Weiss <yoav@yoav.ws> Thu, 11 April 2019 21:50 UTC

References: <CAKXHy=eHiMtXi8vkDYtADHdU0tnUfd3p+Wfy7vSkLgT7cA1W0w@mail.gmail.com> <f042d223-85ee-fd16-74fa-7d6d993f817f@gmail.com> <4d321ba1-f6f1-05c3-5b76-24f6a9b89525@afilias.info>
In-Reply-To: <4d321ba1-f6f1-05c3-5b76-24f6a9b89525@afilias.info>
From: Yoav Weiss <yoav@yoav.ws>
Date: Thu, 11 Apr 2019 17:47:33 -0400
Message-ID: <CACj=BEiNPVgd7SvhBCqaTyp7c0Jg8QcjXnQt5Fk7Te367hB_pQ@mail.gmail.com>
To: Ronan Cremin <rcremin@afilias.info>
Cc: Thomas Peterson <hidinginthebbc@gmail.com>, Mike West <mkwst@google.com>, HTTP Working Group <ietf-http-wg@w3.org>
Subject: Re: Migrating some high-entropy HTTP headers to Client Hints.
Archived-At: <https://www.w3.org/mid/CACj=BEiNPVgd7SvhBCqaTyp7c0Jg8QcjXnQt5Fk7Te367hB_pQ@mail.gmail.com>

Hey Ronan,


On Thu, Apr 11, 2019 at 8:11 AM Ronan Cremin <rcremin@afilias.info> wrote:

> Hi,
>
> My name is Ronan Cremin, I help to build a device recognition product
> widely used in the web analytics, publishing and advertising industries.
> Full disclosure: my employer profits from analysis of UA strings, though
> moving the same information to client hints is not expected to impact
> this materially.
>
> One concern over moving UA string information to Client Hints is that
> the information required to publish device-specific responses arrives
> only in the second request from the client. This imposes a performance
> penalty on publishers that serve a device-tailored HTML document. As
> Mike mentioned, RWD notwithstanding, many publishers employ
> device-specific responses as envisaged in RFC1945, usually to tailor the
> experience to a class of device e.g. smartphone, tablet, desktop and so
> on.


The viewport Client Hint can provide such a distinction, but it exposes more
bits than are actually needed, so it's not a great option to expose by
default, without an opt-in.
From your description, maybe exposing another tri-state hint by default
would be enough to cover the use case, and maybe it wouldn't expose too many
bits about the user in the process.
Can you open an issue on https://github.com/WICG/ua-client-hints describing
your use case?
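To put rough numbers on the bits comparison above: a hint with n equally likely values reveals at most log2(n) bits, and a skewed real-world distribution reveals less. A minimal sketch, assuming a hypothetical three-valued device-class hint and, for contrast, a viewport width reported in whole CSS pixels over a made-up range of 320 to 2560; the sample population is also made up for illustration:

```python
import math
from collections import Counter

def max_bits(num_values: int) -> float:
    """Upper bound on identifying bits from a hint with num_values possible values."""
    return math.log2(num_values)

def shannon_bits(observations: list[str]) -> float:
    """Empirical Shannon entropy, in bits, of the observed hint values."""
    counts = Counter(observations)
    total = len(observations)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical tri-state device-class hint: at most log2(3) bits.
print(f"tri-state hint: <= {max_bits(3):.2f} bits")
# Hypothetical viewport width in whole CSS pixels, 320..2560 inclusive.
print(f"viewport width: <= {max_bits(2560 - 320 + 1):.2f} bits")
# A made-up, skewed population leaks less than the uniform bound.
sample = ["mobile"] * 6 + ["desktop"] * 3 + ["tablet"] * 1
print(f"sample entropy: {shannon_bits(sample):.2f} bits")
```

Real fingerprinting risk also depends on how hints combine with each other and with the IP address, which per-hint numbers like these do not capture.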

If we were to conclude that something like that is privacy-safe, I guess
the main problem would be to define where the line is drawn between a phone
and a tablet, and between a tablet and a laptop.
I suspect a standard definition of those borders is likely to become stale
fairly quickly...
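Drawing those lines is the hard part. A minimal sketch, assuming a hypothetical hint derived from viewport width with made-up cutoffs of 600 and 1024 CSS pixels; these numbers come from no specification, and any real definition would have to choose (and periodically revisit) its own:

```python
from enum import Enum

class DeviceClass(Enum):
    PHONE = "phone"
    TABLET = "tablet"
    DESKTOP = "desktop"

# Hypothetical cutoffs in CSS pixels; not taken from any spec.
PHONE_MAX_WIDTH = 600
TABLET_MAX_WIDTH = 1024

def classify(viewport_width: int) -> DeviceClass:
    """Map a viewport width to one of three coarse device classes."""
    if viewport_width < 0:
        raise ValueError("viewport width must be non-negative")
    if viewport_width <= PHONE_MAX_WIDTH:
        return DeviceClass.PHONE
    if viewport_width <= TABLET_MAX_WIDTH:
        return DeviceClass.TABLET
    return DeviceClass.DESKTOP

for width in (375, 768, 1024, 1025, 1440):
    print(width, classify(width).value)
```

A large phone in landscape or a narrow laptop window lands on whichever side of a cutoff the constants dictate, so the borders encode a snapshot of today's hardware.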

> Publishers endeavour to fit everything required for the first screen
> of content into this first response, so a delay to this impacts
> performance. The last time I checked more than 80% of the top 100
> websites used this technique.
>

When was that? Do you have data you can point us to?


>
> Web analytics might also be impacted. Most web analytics solutions
> support a JavaScript-free integration approach based on linking a single
> pixel image hosted by the analytics platform. The ability to do this is
> impacted for the same reason—the information required for analytics
> becomes available only on the second request from the client.
>

I'm not sure that's a winning argument, as it sounds like those analytics
vendors exploit the current UA string to extract bits of information from
passive requests.
The current proposal will enable them to do the same (with the same number
of RTTs), but only after an explicit opt-in to receive that data from the
browser. An opt-in that can be monitored by the browser, extensions and
privacy researchers.
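The opt-in flow can be sketched as a toy client: hints are withheld until a response from an origin carries an `Accept-CH` header naming them, and are then attached to later requests to that origin. This is a minimal simulation, assuming the short hint names from Mike's proposal (`UA`, `Model`, `Platform`, `Arch`) and made-up hint values; it is not the draft's exact header syntax or any browser's behavior, and it keeps opt-ins only in memory:

```python
from dataclasses import dataclass, field

# Hypothetical hint values this toy client could reveal; the names follow the
# proposal's short forms and are assumptions, not the draft's exact header names.
AVAILABLE_HINTS = {
    "UA": "ExampleBrowser 73",
    "Model": "ExamplePhone",
    "Platform": "ExampleOS",
    "Arch": "ARM",
}

@dataclass
class ToyClient:
    # origin -> hint names that origin opted into via Accept-CH
    opt_ins: dict[str, set[str]] = field(default_factory=dict)

    def build_request_headers(self, origin: str) -> dict[str, str]:
        """Attach only the hints this origin has previously asked for."""
        wanted = self.opt_ins.get(origin, set())
        return {name: value for name, value in AVAILABLE_HINTS.items() if name in wanted}

    def process_response_headers(self, origin: str, headers: dict[str, str]) -> None:
        """Record an Accept-CH opt-in (a comma-separated list of hint names).

        Simplification: header-name lookup here is case-sensitive.
        """
        accept_ch = headers.get("Accept-CH")
        if accept_ch is None:
            return
        names = {token.strip() for token in accept_ch.split(",") if token.strip()}
        self.opt_ins[origin] = names & set(AVAILABLE_HINTS)

client = ToyClient()
origin = "https://example.com"
first = client.build_request_headers(origin)  # no hints before opt-in
client.process_response_headers(origin, {"Accept-CH": "Model, Platform"})
second = client.build_request_headers(origin)  # only the requested hints
print(first)
print(second)
```

The first request to an origin carries no hints, which is the first-request cost Ronan raises; afterwards the hints ride along without extra round trips, and the `Accept-CH` opt-in is visible to anyone inspecting the responses.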


> Has thought been given to the performance impact of the proposal?


Yes.


> Yoav mentions this issue in his Client Hints infrastructure document
> (https://github.com/yoavweiss/client-hints-infrastructure) but I haven't
> seen any attempt to quantify the impact.


As indicated in the document you linked to, we currently don't have a great
way to make fingerprinting-bits-exposing Client Hints opt-in while still
sending them on the very first request.
That's unfortunate, and we hope to improve on that in the future.
At the same time, the User-Agent string exposes many bits of entropy,
so it is a privacy hole we're interested in closing.


>
> Regards,
> Ronan
>
> On 29/11/2018 12:08, Thomas Peterson wrote:
> > I would propose that all Accept* headers are included in Client Hints
> > as all can be used for some level of fingerprinting, e.g. Accept can
> > be used to distinguish between desktop browsers (which typically send
> > html/xml MIME types) and cURL/wget, which by default send '*/*'. Many
> > user agents also do their own guesswork on response bodies anyway
> > (such as looking at the magic number) to determine content type or
> > encoding, so the impact of a "failed negotiation" of content can be
> > limited.
> >
> > Also, is there a particular reason why Sec-CH-Lang omits quality values?
> >
> >
> > Regards
> >
> >
> > On 29/11/2018 10:22, Mike West wrote:
> >> Hey folks,
> >>
> >> Section 9.7 of RFC7231
> >> <https://tools.ietf.org/html/rfc7231#section-9.7> rightly notes that
> >> some of the content negotiation headers user agents deliver in HTTP
> >> requests create substantial fingerprinting surface. I think it would
> >> be beneficial if we took steps to reduce their prevalence on the
> >> wire, and Client Hints looks like a reasonable infrastructure on top
> >> of which to build.
> >>
> >> `User-Agent` and `Accept-Language` seem like particularly tasty and
> >> low-hanging fruit, and I've sketched out two proposals as proofs of
> >> concept:
> >>
> >> *   `User-Agent` could be represented as ~four distinct hints: `UA`,
> >> `Model`, `Platform`, and `Arch`:
> >> https://github.com/mikewest/ua-client-hints is a high-level
> >> explainer, and https://tools.ietf.org/html/draft-west-ua-client-hints
> >> a sketchy ID for the new headers.
> >>
> >> *   `Accept-Language` could be represented as a `Lang` hint:
> >> https://github.com/mikewest/lang-client-hint is a high-level
> >> explainer, https://tools.ietf.org/html/draft-west-lang-client-hint an
> >> equally sketchy ID for the new header.
> >>
> >> I'd appreciate y'all's feedback. Thanks!
> >>
> >> -mike
> >
>
>
>
>
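On the quality-value question in Thomas's quoted message: an `Accept-Language` value such as `fr-CH, fr;q=0.9, en;q=0.8` (RFC 7231, section 5.3.5) carries both an order and explicit weights. A minimal sketch, assuming a simplified parser that does not validate language-range syntax, showing what remains if a hint keeps only the ordered list of tags:

```python
def parse_accept_language(value: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language value into (tag, q) pairs, highest q first.

    Simplified: language-range syntax is not validated; a missing q
    parameter defaults to 1.0, as in RFC 7231.
    """
    entries = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                q = float(raw)
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"q-value out of range: {q}")
        entries.append((tag, q))
    # sorted() is stable, so equal weights keep their original order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)

header = "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5"
weighted = parse_accept_language(header)
ordered_tags = [tag for tag, _ in weighted]  # what an order-only hint would keep
print(weighted)
print(ordered_tags)
```

With an order-only representation, headers that differ only in their weights, such as `en, fr;q=0.9` and `en, fr;q=0.1`, collapse to the same list.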