Benjamin Kaduk's No Objection on draft-ietf-httpbis-client-hints-14: (with COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Tue, 19 May 2020 21:00 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: "The IESG" <iesg@ietf.org>
Cc: draft-ietf-httpbis-client-hints@ietf.org, httpbis-chairs@ietf.org, ietf-http-wg@w3.org, Mark Nottingham <mnot@mnot.net>, mnot@mnot.net
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <158992178960.5956.2137971544232835817@ietfa.amsl.com>
Date: Tue, 19 May 2020 13:56:30 -0700
Archived-At: <https://www.w3.org/mid/158992178960.5956.2137971544232835817@ietfa.amsl.com>

Benjamin Kaduk has entered the following ballot position for
draft-ietf-httpbis-client-hints-14: No Objection

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-httpbis-client-hints/



----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Section 1

   There are thousands of different devices accessing the web, each with
   different device capabilities and preference information.  These
   device capabilities include hardware and software characteristics, as
   well as dynamic user and user agent preferences.  Historically,

nit: should "user-agent" be hyphenated?

   applications that wanted to allow the server to optimize content
   delivery and user experience based on such capabilities had to rely
   on passive identification (e.g., by matching the User-Agent header

nit: it feels like "allow the server" would be something that involves
granting permission or the client sending an active signal (as proposed
by this document), as opposed to just the application that "wanted the
server to optimize" and had to make do with such limited signal as was
already available.

   field (Section 5.5.3 of [RFC7231]) against an established database of
   user agent signatures), use HTTP cookies [RFC6265] and URL

nit: hyphenate user-agent again, used as an adjective.

   o  User agent detection cannot reliably identify all static
      variables, cannot infer dynamic user agent preferences, requires
      external device database, is not cache friendly, and is reliant on

nit: singular/plural mismatch ("an external device database" or
"external device databases")

   o  Cookie-based approaches are not portable across applications and
      servers, impose additional client-side latency by requiring
      JavaScript execution, and are not cache friendly.

(I think I missed a step in why a cookie-based approach inherently
requires JavaScript execution, though maybe it doesn't matter.)

   Proactive content negotiation (Section 3.4.1 of [RFC7231]) offers an
   alternative approach; user agents use specified, well-defined request
   headers to advertise their capabilities and characteristics, so that

Chasing the reference, it's not clear that it supports quite so strong
a statement: in addition to the explicit negotiation fields, it also
allows using implicit characteristics such as client IP address and
User-Agent.

Section 2.1

   access of third parties to those same header fields.  Without such an
   opt-in, user agents SHOULD NOT send high-entropy hints, but MAY send
   low-entropy ones [CLIENT-HINTS-INFRASTRUCTURE].

It looks like the reference only defines a registry for low-entropy
hints, and we are inferring that any hints not listed in that table are
to be treated as "high-entropy".  Perhaps we could reword both
directions of this directive to refer only to the registry of
low-entropy hints (e.g., "SHOULD NOT send hints that are not listed in
[registry]")?
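
To make the suggested rewording concrete, here is a minimal sketch of a
registry-only check. The registry contents and the function name are
hypothetical stand-ins (the real list lives in
[CLIENT-HINTS-INFRASTRUCTURE] and is not reproduced here), and the
sketch treats the SHOULD NOT as a hard rule for simplicity.

```python
# Hypothetical stand-in for the low-entropy registry; not the real list.
LOW_ENTROPY_HINTS = frozenset({"sec-ch-example-low"})


def hints_to_send(available, opted_in):
    """Return the hint header names a user agent would send.

    available: iterable of hint header field names the user agent supports
    opted_in:  set of names the origin requested via Accept-CH
    """
    opted = {name.lower() for name in opted_in}
    selected = []
    for name in available:
        key = name.lower()
        # Registry-listed hints may go out without opt-in; anything not
        # listed is withheld unless the origin opted in.
        if key in LOW_ENTROPY_HINTS or key in opted:
            selected.append(name)
    return selected
```

Note that the check never needs a notion of "high-entropy": everything
is expressed relative to the registry.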

   Implementers need to be aware of the passive fingerprinting
   implications when implementing support for Client Hints, and follow
   the considerations outlined in the Security Considerations
   (Section 4) section of this document.

side note: in some sense the Accept-CH mechanism transforms it from a
passive to an active fingerprinting mechanism.

Section 2.2

   information in them.  When doing so, and if the resource is
   cacheable, the server MUST also generate a Vary response header field
   (Section 7.1.4 of [RFC7231]) to indicate which hints can affect the
   selected response and whether the selected response is appropriate
   for a later request.

side note: I suspect the answer I want is already present with a
detailed reading of RFC 7231, but I wonder if it's worth saying
something here about whether the Vary response header could/should
include registered client hint header field names that were not present
in the request in question.
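
As an illustration of the question, the sketch below assumes one
possible reading: the server lists every hint its selection logic
consulted, whether or not that hint was present in the request. The
function name is hypothetical, and this reading is an assumption rather
than something the quoted text states.

```python
def build_vary(consulted_hints, existing_vary=""):
    """Merge consulted hint names into a Vary field value.

    Names are compared case-insensitively, existing order is preserved,
    and an existing "*" is left untouched.
    """
    names = [n.strip() for n in existing_vary.split(",") if n.strip()]
    if "*" in names:
        return "*"
    seen = {n.lower() for n in names}
    for hint in consulted_hints:
        if hint.lower() not in seen:
            names.append(hint)
            seen.add(hint.lower())
    return ", ".join(names)
```

Under this reading, a response selected because Sec-CH-Example was
absent would still carry "Vary: Sec-CH-Example".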

Section 3.1

   Based on the Accept-CH example above, which is received in response
   to a user agent navigating to "https://example.com", and delivered
   over a secure transport, a user agent will have to persist an Accept-
   CH preference bound to "https://example.com".  It will then use it

What level of requirement is implied by "will have to" here?  IIUC, it's
just that "if anything is persisted, it must be keyed on" but with no
obligation to do any persistence.  If so, perhaps a wording like "any
persisted Accept-CH preference will be bound to" would be better?

   for navigations to e.g. "https://example.com/foobar.html", but not to
   e.g. "https://foobar.example.com/".  It will similarly use the
   preference for any same-origin resource requests (e.g. to

nit: comma after "e.g." (throughout).

   "https://example.com/image.jpg") initiated by the page constructed
   from the navigation's response, but not to cross-origin resource
   requests (e.g. "https://thirdparty.com/resource.js").  This
   preference will not extend to resource requests initiated to
   "https://example.com" from other origins (e.g. from navigations to
   "https://other-example.com/").

Perhaps thirdparty.example and other.example, to stay within the BCP32
space?
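
The origin binding described in the quoted text can be sketched as
follows. This is a rough model under stated assumptions, not the
draft's algorithm: the class and method names are hypothetical, the
Accept-CH value is split on commas rather than parsed as a structured
field, and a navigation is modelled as a request whose page URL equals
the request URL.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return the (scheme, host, port) tuple for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


class AcceptCHStore:
    """Hypothetical store of Accept-CH preferences keyed by origin."""

    def __init__(self):
        self._prefs = {}

    def persist(self, response_url, accept_ch_value):
        origin = origin_of(response_url)
        # Only keep preferences delivered over a secure transport.
        if origin[0] != "https":
            return
        hints = {t.strip().lower() for t in accept_ch_value.split(",") if t.strip()}
        self._prefs[origin] = hints

    def hints_for(self, request_url, page_url):
        # The preference applies only when the request target and the
        # page that initiated it share the opted-in origin.
        target = origin_of(request_url)
        if target != origin_of(page_url):
            return set()
        return set(self._prefs.get(target, set()))
```

With a preference persisted for "https://example.com", this returns the
hints for "https://example.com/image.jpg" requested by an example.com
page, and an empty set for "https://thirdparty.example/resource.js" or
for example.com resources requested from "https://other.example/".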

Section 3.2

   When selecting a response based on one or more Client Hints, and if
   the resource is cacheable, the server needs to generate a Vary
   response header field ([RFC7234]) to indicate which hints can affect
   the selected response and whether the selected response is
   appropriate for a later request.

Is BCP 14 language appropriate here?

   Above example indicates that the cache key needs to include the Sec-
   CH-Example header field.

nit: please add the article "the" to make this a complete sentence.
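
For readers less familiar with what "the cache key needs to include"
means in practice, a simplified sketch follows. The function name and
the tuple-shaped key are hypothetical; real caches normalize header
values more carefully than this.

```python
def cache_key(method, url, request_headers, vary_value):
    """Build a cache key from the request plus the fields named in Vary.

    Returns None when Vary is "*", since such a response cannot be
    matched to a later request from the headers alone.
    """
    headers = {k.lower(): v.strip() for k, v in request_headers.items()}
    varied = sorted(n.strip().lower() for n in vary_value.split(",") if n.strip())
    if "*" in varied:
        return None
    # A missing header is recorded as None so that "absent" and "empty"
    # stay distinct.
    return (method.upper(), url, tuple((n, headers.get(n)) for n in varied))
```

Two requests for the same URL that differ only in their Sec-CH-Example
value then map to different keys, as do a request with the header and
one without it.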

Section 4

While I don't expect that I can tell the major browser vendors anything
new about the privacy considerations of client hints, I do think that we
should give some guidance to implementors of other HTTP clients, who may
not have such extensive depth of knowledge, on the general landscape in
which this mechanism is set.  The subsections hereof do a great job
covering a lot of relevant details and specific factors to consider;
thank you!  I think it may also be appropriate to have some more generic
lead-in text, noting that in the worst case, merely converting a passive
fingerprinting mechanism to an active fingerprinting mechanism with
server opt-in does not actually provide any privacy benefit (the worst
case being when all servers ask for all the data and clients accede)!
While we might hope that the need to jump through an extra hoop to
access fingerprinting information might dissuade some servers from
asking for it, it seems imprudent to assume that it will happen, so in
order to obtain real privacy benefit there need to be some additional
policy controls in the client and in what hints are defined/implemented.
As I mentioned, we already have a lot of the details for how to apply
such policy controls, and the limitation to define only hints that
expose information already available by other means; what I'd like to
see is the high-level picture that ties them together.

Section 4.1

   upon it.  The header-based opt-in means that we can remove passive
   fingerprinting vectors, such as the User-Agent string (enabling
   active access to that information through User-Agent Client Hints
   [4]), or otherwise expose information already available through

I think this [4] is the same as [UA-CH].

Also, use of the first person ("we") is somewhat unusual in RFC style.

   Therefore, features relying on this document to define Client Hint
   headers MUST NOT provide new information that is otherwise not
   available to the application via other means, such as existing
   request headers, HTML, CSS, or JavaScript.

As written, this is a fairly weird condition.  What constitutes
"available to the application via other means"?  Does "put up an
interstitial until the user provides the information in question" count?

   o  Entropy - Exposing highly granular data can be used to help
      identify users across multiple requests to different origins.
      Reducing the set of header field values that can be expressed, or
      restricting them to an enumerated range where the advertised value
      is close but is not an exact representation of the current value,

nit: "close to" seems like it would scan better.

   Different features will be positioned in different points in the
   space between low-entropy, non-sensitive and static information (e.g.
   user agent information), and high-entropy, sensitive and dynamic
   information (e.g. geolocation).  User agents need to consider the
   value provided by a particular feature vs these considerations, and
   MAY have different policies regarding that tradeoff on a per-feature
   basis.

How about on a per-origin basis (and, e.g., domain reputation)?  An
"entropy budget" where an origin that asks for too many distinct hints
won't get all of them?
(I also wonder if a descriptive "may wish to have" is better than the
normative "MAY", here.)
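
A per-origin "entropy budget" could look roughly like the sketch below.
Everything in it is hypothetical: the function name, the idea of
assigning each hint a cost in bits, and the first-come ordering are
illustrative choices, not anything the draft defines.

```python
def apply_entropy_budget(requested, cost_bits, budget_bits):
    """Grant requested hints in order until the origin's budget is spent.

    requested:   hint names in the order the origin asked for them
    cost_bits:   mapping of lowercase hint name to an assumed entropy cost
    budget_bits: total cost this origin is allowed
    """
    granted = []
    remaining = budget_bits
    for hint in requested:
        cost = cost_bits.get(hint.lower())
        # Hints with no assigned cost are treated as unaffordable.
        if cost is None or cost > remaining:
            continue
        granted.append(hint)
        remaining -= cost
    return granted
```

A user agent could vary budget_bits per origin (for example, by
reputation), which would cover both halves of the question above.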

   o  Implementers SHOULD restrict delivery of some or all Client Hints
      header fields to the opt-in origin only, unless the opt-in origin
      has explicitly delegated permission to another origin to request
      Client Hints header fields.

Am I reading things right that this document does not define any such
delegation mechanisms but is just admitting the possibility of such
mechanisms being defined in the future?  I'd suggest clarifying up in
§2.1 with a parenthetical (akin to the "outlined below" note about the
opt-in mechanism).

   Implementers SHOULD support Client Hints opt-in mechanisms and MUST
   clear persisted opt-in preferences when any one of site data,
   browsing history, browsing cache, cookies, or similar, are cleared.

Who is the target audience for this SHOULD?  If it's just "people
implementing this document", it seems ineffectual, and if it's any
broader scope it seems unenforceable.

Section 4.3

   Research into abuse of Client Hints might look at how HTTP responses
   that contain Client Hints differ from those with different values,

nit: what are "responses that contain Client Hints"?  We have discussed
Accept-CH header fields in responses, and client hints in requests, but
the only mention I recall of hints in responses was in the Vary header
field, and it's not clear that that is what was intended.

Section 5

   While HTTP header compression schemes reduce the cost of adding HTTP
   header fields, sending Client Hints to the server incurs an increase
   in request byte size.  Servers SHOULD take that into account when

nit: I wonder if this would be more clear as:

% Sending Client Hints to the server incurs an increase in request byte
% size.  Some of this increase can be mitigated by HTTP header
% compression schemes, but each new hint will still lead to some
% increased bandwidth usage.  Servers SHOULD [...]

Section 7.1

I'm not sure I understand why [FETCH] is listed as a normative
reference.

I find it amusing that we reference both 7231 and 7234 for Vary, though
to my untrained eye the current references both seem appropriate in
their respective locations.

Section 7.2

If [CLIENT-HINTS-INFRASTRUCTURE] is to be the source of truth for
low-entropy (and, by deduction, high-entropy) hints, it seems like it
should be normative.