Re: Benjamin Kaduk's No Objection on draft-ietf-httpbis-client-hints-14: (with COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Fri, 19 June 2020 02:11 UTC

Date: Thu, 18 Jun 2020 19:10:36 -0700
From: Benjamin Kaduk <kaduk@mit.edu>
To: Mark Nottingham <mnot@mnot.net>
Cc: Yoav Weiss <yoav@yoav.ws>, The IESG <iesg@ietf.org>, draft-ietf-httpbis-client-hints@ietf.org, httpbis-chairs@ietf.org, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Message-ID: <20200619021036.GW11992@kduck.mit.edu>
References: <158992178960.5956.2137971544232835817@ietfa.amsl.com> <CACj=BEiezqmP5AszaCC=jt5igYudGs-QQeejEr-2PFqvKDUbyw@mail.gmail.com> <BE79CBD3-AD98-4A54-9596-9318E46752A2@mnot.net>
In-Reply-To: <BE79CBD3-AD98-4A54-9596-9318E46752A2@mnot.net>
Subject: Re: Benjamin Kaduk's No Objection on draft-ietf-httpbis-client-hints-14: (with COMMENT)
Archived-At: <https://www.w3.org/mid/20200619021036.GW11992@kduck.mit.edu>

Thanks, Mark. If I had known those ("Vary") bits of the HTTP spec, I would
have made different suggestions :)

-Ben

On Fri, Jun 19, 2020 at 12:05:59PM +1000, Mark Nottingham wrote:
> Just adding some detail --
> 
> > On 17 Jun 2020, at 6:47 pm, Yoav Weiss <yoav@yoav.ws> wrote:
> > 
> > Thanks for reviewing and apologies for the delayed reply :/
> > 
> > Comments addressed below and incorporated into https://github.com/httpwg/http-extensions/pull/1220
> > Your review would be appreciated :)
> > 
> > On Tue, May 19, 2020 at 10:56 PM Benjamin Kaduk via Datatracker <noreply@ietf.org> wrote:
> > Benjamin Kaduk has entered the following ballot position for
> > draft-ietf-httpbis-client-hints-14: No Objection
> > 
> > When responding, please keep the subject line intact and reply to all
> > email addresses included in the To and CC lines. (Feel free to cut this
> > introductory paragraph, however.)
> > 
> > 
> > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> > for more information about IESG DISCUSS and COMMENT positions.
> > 
> > 
> > The document, along with other ballot positions, can be found here:
> > https://datatracker.ietf.org/doc/draft-ietf-httpbis-client-hints/
> > 
> > 
> > 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > Section 1
> > 
> >    There are thousands of different devices accessing the web, each with
> >    different device capabilities and preference information.  These
> >    device capabilities include hardware and software characteristics, as
> >    well as dynamic user and user agent preferences.  Historically,
> > 
> > nit: should "user-agent" be hyphenated?
> > 
> > In web specifications it typically isn't. RFC 7231 also doesn't seem to hyphenate it.
> 
> Yes. "User Agent" is a concept; "User-Agent" is an HTTP header field.
> 
> 
> >    applications that wanted to allow the server to optimize content
> >    delivery and user experience based on such capabilities had to rely
> >    on passive identification (e.g., by matching the User-Agent header
> > 
> > nit: it feels like "allow the server" would be something that involves
> > granting permission or the client sending an active signal (as proposed
> > by this document), as opposed to just the application that "wanted the
> > server to optimize" and had to make do with such limited signal as was
> > already available.
> > 
> > OK. Removing "allow the".
> >  
> > 
> >    field (Section 5.5.3 of [RFC7231]) against an established database of
> >    user agent signatures), use HTTP cookies [RFC6265] and URL
> > 
> > nit: hyphenate user-agent again, used as an adjective.
> > 
> > TIL: compound adjective
> > Done!
> 
> The problem is that this confuses the reader between the concept and the header field. In general, I'd prefer that the RFC Editor handle this level of detail regarding English usage, so that we don't needlessly go back-and-forth.
> 
> 
> >    o  User agent detection cannot reliably identify all static
> >       variables, cannot infer dynamic user agent preferences, requires
> >       external device database, is not cache friendly, and is reliant on
> > 
> > nit: singular/plural mismatch ("an external device database" or
> > "external device databases")
> > 
> > Done 
> > 
> >    o  Cookie-based approaches are not portable across applications and
> >       servers, impose additional client-side latency by requiring
> >       JavaScript execution, and are not cache friendly.
> > 
> > (I think I missed a step in why a cookie-based approach inherently
> > requires javascript execution, though maybe it doesn't matter.)
> > 
> > Essentially, if you want to dynamically set your cookies based on client-side information, you need JavaScript to do that.
> > 
> > 
> >    Proactive content negotiation (Section 3.4.1 of [RFC7231]) offers an
> >    alternative approach; user agents use specified, well-defined request
> >    headers to advertise their capabilities and characteristics, so that
> > 
> > Chasing the reference, it's not clear that it supports quite this strong
> > of a statement: in addition to the explicit negotiation fields, it also
> > allows using implicit characteristics such as client IP address and
> > User-Agent.
> > 
> > Would ending that section with the following work? 
> > ", so that servers can select (or formulate) an appropriate response, based on those request headers (or on other, implicit characteristics)."
> > 
> > 
> > Section 2.1
> > 
> >    access of third parties to those same header fields.  Without such an
> >    opt-in, user agents SHOULD NOT send high-entropy hints, but MAY send
> >    low-entropy ones [CLIENT-HINTS-INFRASTRUCTURE].
> > 
> > It looks like the reference only defines a registry for low-entropy
> > hints, and we are inferring that any hints not listed in that table are
> > to be treated as "high-entropy".  Perhaps we could reword both
> > directions of this directive to refer only to the registry of
> > low-entropy hints (e.g., "SHOULD NOT send hints that are not listed in
> > [registry]")?
> > 
> > Makes sense.
> >  
> > 
> >    Implementers need to be aware of the passive fingerprinting
> >    implications when implementing support for Client Hints, and follow
> >    the considerations outlined in the Security Considerations
> >    (Section 4) section of this document.
> > 
> > side note: in some sense the Accept-CH mechanism transforms it from a
> > passive to an active fingerprinting mechanism.
> > 
> > Good point! Removed "passive" here.
> >  
> > 
> > Section 2.2
> > 
> >    information in them.  When doing so, and if the resource is
> >    cacheable, the server MUST also generate a Vary response header field
> >    (Section 7.1.4 of [RFC7231]) to indicate which hints can affect the
> >    selected response and whether the selected response is appropriate
> >    for a later request.
> > 
> > side note: I suspect the answer I want is already present with a
> > detailed reading of RFC 7231, but I wonder if it's worth saying
> > something here about whether the Vary response header could/should
> > include registered client hint header field names that were not present
> > in the request in question.
> > 
> > https://tools.ietf.org/html/rfc7231#section-7.1.4 implies that Vary can be set to header names that are missing from the request. ("or lack thereof") 
> > I'm not sure we should mention that explicitly here.
> 
> Our general practice in this sort of situation is to mention setting Vary *if* the response is cacheable, to remind the reader, but _not_ to make it a requirement, since that requirement is already made by HTTP.
> 
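
The "or lack thereof" point above can be illustrated with a minimal sketch of how a cache might derive a secondary key from a Vary value. This is not taken from the draft or from RFC 7231; the function name `secondary_cache_key` and the key layout are assumptions made for illustration. The only detail borrowed from the thread is the Sec-CH-Example field name.

```python
from typing import Mapping, Optional, Tuple

# One (field-name, value-or-None) pair per field listed in Vary.
KeyPart = Tuple[str, Optional[str]]


def secondary_cache_key(
    vary: str, request_headers: Mapping[str, str]
) -> Optional[Tuple[KeyPart, ...]]:
    """Build a secondary cache key from a Vary field value (hypothetical helper).

    A field that is named in Vary but absent from the request is recorded
    as None, so "hint absent" and "hint present" select different entries.
    Returns None for "Vary: *", which a cache cannot match on its own.
    """
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    if "*" in names:
        return None
    # Header field names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    return tuple((name, lowered.get(name)) for name in sorted(set(names)))
```

Under these assumptions, a request without Sec-CH-Example and a request with `Sec-CH-Example: 1` yield different keys, so a server can list a hint in Vary even when the request at hand did not carry it.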
> > 
> > Section 3.1
> > 
> >    Based on the Accept-CH example above, which is received in response
> >    to a user agent navigating to "https://example.com", and delivered
> >    over a secure transport, a user agent will have to persist an Accept-
> >    CH preference bound to "https://example.com".  It will then use it
> > 
> > What level of requirement is implied by "will have to" here?  IIUC, it's
> > just that "if anything is persisted, it must be keyed on" but with no
> > obligation to do any persistence.  If so, perhaps a wording like "any
> > persisted Accept-CH preference will be bound to" would be better?
> > 
> > The normative requirement in the paragraph above it is SHOULD.
> > I'll modify the wording to your suggested one.
> >  
> > 
> >    for navigations to e.g. "https://example.com/foobar.html", but not to
> >    e.g. "https://foobar.example.com/".  It will similarly use the
> >    preference for any same-origin resource requests (e.g. to
> > 
> > nit: comma after "e.g." (throughout).
> > 
> > OK
> >  
> > 
> >    "https://example.com/image.jpg") initiated by the page constructed
> >    from the navigation's response, but not to cross-origin resource
> >    requests (e.g. "https://thirdparty.com/resource.js").  This
> >    preference will not extend to resource requests initiated to
> >    "https://example.com" from other origins (e.g. from navigations to
> >    "https://other-example.com/").
> > 
> > Perhaps thirdparty.example and other.example, to stay within the BCP32
> > space?
> > 
> > Done
> >  
> > 
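
The origin scoping described in the Section 3.1 text above can be sketched as follows. This is a rough illustration under stated assumptions, not the algorithm from the draft or from [CLIENT-HINTS-INFRASTRUCTURE]: the helper names (`origin`, `persist_accept_ch`, `hints_for_request`) and the in-memory dictionary are hypothetical, and delegation to other origins is not modelled.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit

Origin = Tuple[str, str, Optional[int]]
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Origin:
    """Reduce a URL to a (scheme, host, port) tuple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def persist_accept_ch(store: Dict[Origin, List[str]], response_url: str,
                      accept_ch: str) -> None:
    """Remember an Accept-CH value, keyed on the responding origin.

    Assumption: only responses delivered over https are persisted.
    """
    o = origin(response_url)
    if o[0] != "https":
        return
    store[o] = [t.strip() for t in accept_ch.split(",") if t.strip()]


def hints_for_request(store: Dict[Origin, List[str]], page_url: str,
                      request_url: str) -> List[str]:
    """Return the hints to attach to a request made by the page at page_url.

    For a navigation, pass the target URL as both page_url and request_url.
    """
    page_origin = origin(page_url)
    if origin(request_url) != page_origin:
        return []  # cross-origin request: no hints in this sketch
    return list(store.get(page_origin, []))
```

With a preference stored for "https://example.com", this sketch attaches hints to "https://example.com/image.jpg" when requested by a page from that origin, but not to "https://thirdparty.example/resource.js", and not to "https://example.com/image.jpg" when requested from "https://other.example/".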
> > Section 3.2
> > 
> >    When selecting a response based on one or more Client Hints, and if
> >    the resource is cacheable, the server needs to generate a Vary
> >    response header field ([RFC7234]) to indicate which hints can affect
> >    the selected response and whether the selected response is
> >    appropriate for a later request.
> > 
> > Is BCP 14 language appropriate here?
> > 
> > Indeed. Changed to SHOULD. 
> 
> As per above, we try not to restate requirements that are already specified elsewhere.
> 
> 
> > 
> > 
> >    Above example indicates that the cache key needs to include the Sec-
> >    CH-Example header field.
> > 
> > nit: please add the article "the" to make this a complete sentence.
> > 
> > Yup
> >  
> > 
> > Section 4
> > 
> > While I don't expect that I can tell the major browser vendors anything
> > new about the privacy considerations to client hints, I do think that we
> > should give some guidance to implementors of other HTTP clients, who may
> > not have such extensive depth of knowledge, on the general landscape in
> > which this mechanism is set.  The subsections hereof do a great job
> > covering a lot of relevant details and specific factors to consider;
> > thank you!  I think it may also be appropriate to have some more generic
> > lead-in text, noting that in the worst case, merely converting a passive
> > fingerprinting mechanism to an active fingerprinting mechanism with
> > server opt-in does not actually provide any privacy benefit (the worst
> > case being when all servers ask for all the data and clients accede)!
> > While we might hope that the need to jump through an extra hoop to
> > access fingerprinting information might dissuade some servers from
> > asking for it, it seems imprudent to assume that it will happen, so in
> > order to obtain real privacy benefit there need to be some additional
> > policy controls in the client and in what hints are defined/implemented.
> > As I mentioned already, we already have a lot of the details for how to
> > apply such policy controls, and limitations to only define hints that
> > expose information already available in other means; what I'd like to
> > see is the high-level picture that ties them together.
> > 
> > 
> > OK. Added something. I'd appreciate your review to see if it matches what you had in mind.
> >  
> > Section 4.1
> > 
> >    upon it.  The header-based opt-in means that we can remove passive
> >    fingerprinting vectors, such as the User-Agent string (enabling
> >    active access to that information through User-Agent Client Hints
> >    [4]), or otherwise expose information already available through
> > 
> > I think this [4] is the same as [UA-CH].
> > 
> > It's pointing to a specific section of UA-CH. I'm not sure if this is critical.
> >  
> > 
> > Also, use of the first person ("we") is somewhat unusual in RFC style.
> > 
> > Changed.
> >  
> > 
> >    Therefore, features relying on this document to define Client Hint
> >    headers MUST NOT provide new information that is otherwise not
> >    available to the application via other means, such as existing
> >    request headers, HTML, CSS, or JavaScript.
> > 
> > As written, this is a fairly weird condition.  What constitutes
> > "available to the application via other means"?  Does "put up an
> > interstitial until the user provides the information in question" count?
> > 
> > Changed to "not made available to the application by the user agent"
> >  
> > 
> >    o  Entropy - Exposing highly granular data can be used to help
> >       identify users across multiple requests to different origins.
> >       Reducing the set of header field values that can be expressed, or
> >       restricting them to an enumerated range where the advertised value
> >       is close but is not an exact representation of the current value,
> > 
> > nit: "close to" seems like it would scan better.
> > 
> > Yup
> >  
> > 
> >    Different features will be positioned in different points in the
> >    space between low-entropy, non-sensitive and static information (e.g.
> >    user agent information), and high-entropy, sensitive and dynamic
> >    information (e.g. geolocation).  User agents need to consider the
> >    value provided by a particular feature vs these considerations, and
> >    MAY have different policies regarding that tradeoff on a per-feature
> >    basis.
> > 
> > How about on a per-origin basis (and, e.g., domain reputation)?  An
> > "entropy budget" where an origin that asks for too many distinct hints
> > won't get all of them?
> > 
> > Those are definitely policies that user agents can apply (e.g. one concrete proposal that looks a lot like your "entropy budget" is https://github.com/bslassey/privacy-budget)
> >  
> > (I also wonder if a descriptive "may wish to have" is better than the
> > normative "MAY", here.)
> > 
> > Sure. 
> > 
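
The "entropy budget" idea raised above could look roughly like the sketch below. It is purely illustrative: it is not the mechanism of the linked privacy-budget proposal, and the function name `grant_hints`, the per-hint bit costs, and the hint names used with it are all hypothetical.

```python
from typing import Dict, List


def grant_hints(requested: List[str], cost_bits: Dict[str, float],
                budget_bits: float) -> List[str]:
    """Grant requested hints in order until the origin's budget is used up.

    cost_bits maps a hint name to an assumed entropy cost in bits.
    Hints with no known cost are treated as infinitely expensive and are
    therefore never granted.
    """
    granted: List[str] = []
    spent = 0.0
    for name in requested:
        cost = cost_bits.get(name, float("inf"))
        if spent + cost <= budget_bits:
            granted.append(name)
            spent += cost
    return granted
```

For example, with assumed costs of 1, 4, and 2 bits for three hypothetical hints and a budget of 3 bits, the first and third hints are granted and the second is withheld. A real policy would also need to decide how the budget persists across requests and how costs are estimated, which this sketch leaves open.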
> >    o  Implementers SHOULD restrict delivery of some or all Client Hints
> >       header fields to the opt-in origin only, unless the opt-in origin
> >       has explicitly delegated permission to another origin to request
> >       Client Hints header fields.
> > 
> > Am I reading things right that this document does not define any such
> > delegation mechanisms but is just admitting the possibility of such
> > mechanisms being defined in the future?  I'd suggest clarifying up in
> > §2.1 with a parenthetical (akin to the "outlined below" note about the
> > opt-in mechanism).
> > 
> > Added an "(as outlined in {{CLIENT-HINTS-INFRASTRUCTURE}})" clarification to 2.1
> > 
> > 
> >    Implementers SHOULD support Client Hints opt-in mechanisms and MUST
> >    clear persisted opt-in preferences when any one of site data,
> >    browsing history, browsing cache, cookies, or similar, are cleared.
> > 
> > Who is the target audience for this SHOULD?  If it's just "people
> > implementing this document", it seems ineffectual, and if it's any
> > broader scope it seems unenforceable.
> > 
> > Removed the SHOULD here as it's already defined elsewhere that high entropy hints require an opt-in.
> > Also changed "implementers" to "user agents".
> > 
> > 
> > Section 4.3
> > 
> >    Research into abuse of Client Hints might look at how HTTP responses
> >    that contain Client Hints differ from those with different values,
> > 
> > nit: what are "responses that contain Client Hints"?  We have discussed
> > Accept-CH header fields in responses, and client hints in requests, but
> > the only mention I recall of hints in responses was in the Vary header
> > field, and it's not clear that that is what was intended.
> > 
> > Good catch! Changed to "responses to requests that contain Client Hints". 
> > 
> > 
> > Section 5
> > 
> >    While HTTP header compression schemes reduce the cost of adding HTTP
> >    header fields, sending Client Hints to the server incurs an increase
> >    in request byte size.  Servers SHOULD take that into account when
> > 
> > nit: I wonder if this would be more clear as:
> > 
> > % Sending Client Hints to the server incurs an increase in request byte
> > % size.  Some of this increase can be mitigated by HTTP header
> > % compression schemes, but each new hint will still lead to some
> > % increased bandwidth usage.  Servers SHOULD [...]
> > 
> > Changed. 
> > 
> > Section 7.1
> > 
> > I'm not sure I understand why [FETCH] is listed as a normative
> > reference.
> > 
> > Moved it to be informative.
> >  
> > 
> > I find it amusing that we reference both 7231 and 7234 for Vary, though
> > to my untrained eye the current references both seem appropriate in
> > their respective locations.
> > 
> > Section 7.2
> > 
> > If [CLIENT-HINTS-INFRASTRUCTURE] is to be the source of truth for
> > low-entropy (and, by deduction, high-entropy) hints, it seems like it
> > should be normative.
> > 
> > Moved. 
> 
> --
> Mark Nottingham   https://www.mnot.net/
>