Re: [lamps] draft-ietf-lamps-rfc3709bis-01 security, reliability, and privacy considerations

Thanks for the review, Daniel.

Thank you also for taking the time to explain in such detail the many
issues and challenges involved in retrieving, caching, storing and
preserving data.

I would like to respond to this on a conceptual level rather than
attempting to comment on each point.

I don't think every specification that allows data to be referenced by
a URI is obliged to educate readers about, and specify, every aspect and
issue related to retrieving, caching, storing and preserving data. The
question is not so much what the potential problems are as what this
document must say about them.

I'm not convinced that this document is the place to regulate or advise
on caching or storage. My main motivation for this position is that the
potential use cases are not specific enough for us to give meaningful,
specific guidance. Any guidance will therefore be so generic that it
belongs in generic documents. Implementers can choose their approach to
this issue based on the context in which the certificates are issued and
used.

Obviously things become much easier if all data is embedded, but this
comes at the cost of larger size.

Another argument for not giving specific guidance here is that caching
may also be handled at another layer that is completely independent of
this specification and its implementation, such as generic HTTP caching
by the application or the network.

On all other issues, I include comments below.

On 2022-03-25 21:14, Daniel Kahn Gillmor wrote:
> I did a quick skim of
> https://www.ietf.org/archive/id/draft-ietf-lamps-rfc3709bis-01.html and
> observed that while the Security Considerations section describes some
> of the "phone home" or "web bug" concerns about logotypes-by-reference,
> they aren't particularly well fleshed out.
>
> The current text is:
>
>   > Logotype data is fetched from a server when it is needed. By
>   > watching activity on the network, an observer can determine which
>   > clients are making use of certificates that contain particular
>   > logotype data. This observation can potentially introduce privacy
>   > issues. Since clients are expected to locally cache logotype data,
>   > network traffic to the server containing the logotype data will not
>   > be generated every time the certificate is used. In cases where
>   > logotype data is not cashed, monitoring would reveal usage
>   > frequency. In cases where logotype data is cached, monitoring would
>   > reveal when a certain logotype image or audio sequence is used for
>   > the first time.
>
> This and at least one other paragraph in the Security Considerations
> section might belong better in a separate Privacy Considerations
> section.

I have no opinion or preference here.

> Below i unpack a bit why these "web bugs" features are not well-fleshed
> out in the draft as it stands.  This spec is surprisingly twisty and
> complex, so i'm not sure that this covers everything:
>
>  a) there are at least two different things that a relying party might
>     need to make a network call when trying to render a certificate:
>
>     - an indirect LogotypeData object
>     - the image or audio file referenced from a LogotypeData object
>
>     If the client is going to cache these things to avoid leaking
>     information on the network, it presumably needs to cache both types
>     of object.
>
>  b) to minimize the number of network requests the relying party's cache
>     should be indexed by cryptographic digest of the objects it
>     holds. Since a single object might be referenced by multiple
>     cryptographic digests, a maximally privacy-protective local cache
>     should probably contain one index per supported digest algorithm.
>
>  c) If any one of the elements in logotypeHash is already in the relying
>     party's local cache, the privacy-preserving thing for the client to
>     do is to ignore all logotypeURI elements, even if some of the
>     logotypeHash objects are not found in the cache.
>
>  d) note that the *lack* of a network request for a given URI can also
>     be used for fingerprinting (depending on how the client deals with
>     newly-encountered certs, this could be analogous to HSTS
>     fingerprinting, for example:
>     https://datatracker.ietf.org/doc/html/rfc6797#section-14.9)
>
>  e) If a subscriber asks the issuer to include an image or audio via
>     URL, or if they ask the issuer for an indirect logotype, the issuer
>     can't be sure that any logo data suggested by the subscriber is
>     going to be available for the lifetime of the certificate at the URL
>     proposed.  Furthermore, the issuer has no control over privacy
>     policies that govern metadata collection by the host of any remote
>     resource.  This draft should encourage an issuer who cares about
>     potential privacy risks for its relying parties to copy any
>     referenced data to a server that it already controls and can make
>     reasonable guarantees about privacy, reliability, etc.  Note also
>     that if the subscriber asks the issuer to refer to an indirect
>     LogotypeData object, and the issuer decides to host that data itself
>     in addition to hosting the underlying image or audio resource, the
>     logotypeData object will need to be rewritten to change the
>     underlying URLs (and will therefore have a different digest than the
>     requested object).  This at least makes the privacy situation
>     comparable to CA-controlled OCSP, rather than subscriber-controlled
>     resources.
>
>  f) i didn't see any guidance to relying parties about how to handle a
>     mismatch between the MIME type referenced in the mediaType field of
>     the logotypeData and the Content-Type HTTP header of the retrieved
>     resource.  Consider a polymorphic bytestream that could be
>     interpreted by either a png renderer or a pdf renderer.  As the spec
>     is currently written, it looks like a client could accept a
>     Content-Type from the https server that doesn't match the mediaType
>     field, and it might accept the HTTP header as ground truth.
>
>  g) the draft should offer additional guidance for a CA that wants to
>     demonstrate a commitment to not planting web bugs in its
>     certificates.  here are several approaches, including:
>
>       - Always using direct LogotypeInfo objects (no network access)
>
>       - Self-host referenced objects (as in (e))
>       
>       - Consolidate hosted objects (all objects with the same hash are
>         always at the same URI, to avoid path-based fingerprinting)
>
>       - ensure that encoded non-data URIs are as low-entropy as
>         possible, to limit the amount of individualized tracking that
>         could be done (a distinct URL per cert would be the most
>         dangerous)
>
>  h) advise relying parties on how to constrain their HTTP resource
>     fetches to minimize fingerprinting (e.g., no cookies, no e-tags,
>     anonymized user-agent, no client certificates, and so on) -- is
>     there some sort of anonymous HTTP client profile we could point to?
>     (note that this might contradict the guidance about performing
>     "appropriate security controls" for privacy-sensitive logos)
>
> A few additional concerns about the draft (sorry this is all smushed
> into a single e-mail -- if there is an issue tracker for this draft,
> please reference it in subsequent versions of the draft so that these
> notes can be broken out separately):
>
>  i) not clear how to map the elements in the logotypeHash sequence of
>     hashes to the elements in the logotypeURI sequence of URIs.  Should
>     every reference within a given logotypeDetails refer to the same
>     underlying object?  (this is related to (c) above).

The text says: "Both direct and indirect addressing accommodate
alternative URIs to obtain exactly the same item"
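
To illustrate how a relying party could act on that reading (this also
touches (b) and (c)), here is a minimal sketch and nothing more. It
assumes every hash and every URI in one logotypeDetails names the same
object. LogotypeCache, resolve and the fetch callable are hypothetical
names, and algorithms are given as hashlib names rather than the OIDs
carried in the certificate.

```python
import hashlib

# Assumption: digest algorithms this client supports, as hashlib names.
SUPPORTED = ("sha256", "sha512")


class LogotypeCache:
    """Local cache with one index entry per supported digest algorithm."""

    def __init__(self):
        self._by_digest = {}

    def lookup(self, hashes):
        # A hit on any one hash is enough, since all hashes name the same item.
        for alg, digest in hashes:
            data = self._by_digest.get((alg, digest))
            if data is not None:
                return data
        return None

    def store(self, data):
        for alg in SUPPORTED:
            self._by_digest[(alg, hashlib.new(alg, data).digest())] = data


def matches(data, hashes):
    """True if at least one supported hash is present and all of them match."""
    checked = [
        hashlib.new(alg, data).digest() == digest
        for alg, digest in hashes
        if alg in SUPPORTED
    ]
    return bool(checked) and all(checked)


def resolve(cache, hashes, uris, fetch):
    """Return the object from the cache, else try each alternative URI in order.

    fetch is a caller-supplied callable taking a URI and returning bytes,
    or None if the object is unavailable there.
    """
    cached = cache.lookup(hashes)
    if cached is not None:
        return cached  # no network request at all
    for uri in uris:
        data = fetch(uri)
        if data is not None and matches(data, hashes):
            cache.store(data)
            return data
    return None
```

With this shape, a second certificate that references the same object
under a different URI, or with only a different supported digest
algorithm, is served from the cache without any network request.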

>  j) this might be an ASN.1 confusion on my part, but the index
>     parameters within LogotypeImageInfo object seem off to me.  they
>     start with [0] for the "type" field, but then language is [4] and
>     there are four fields in between (fileSize, xSize, ySize, and
>     resolution).  Shouldn't language be [5] or am i misunderstanding
>     ASN.1?

Tags do not need to be in an unbroken sequence. The tag is just an
identifier, not an index. And not all items must be tagged.

I don't remember exactly, but [4] may have been chosen because
resolution is optional.

The key is that there is no ambiguity for the parser about which field
each integer that is present specifies.
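
As a rough illustration, the sketch below walks the contents of a
LogotypeImageInfo and identifies each field by its tag alone. It is a
toy, not a real DER decoder. It assumes IMPLICIT tagging, short-form
lengths only, and that resolution is a choice between [1] numBits and
[2] tableSize, as I recall the module. The function names are made up
for this example.

```python
UNIVERSAL_INTEGER = 0x02


def context_tag(number):
    """Identifier octet for a primitive, context-specific tag [number]."""
    return 0x80 | number


def read_tlvs(content):
    """Split DER content octets into (tag, value) pairs (short lengths only)."""
    items = []
    pos = 0
    while pos < len(content):
        tag = content[pos]
        length = content[pos + 1]
        if length & 0x80:
            raise ValueError("long-form lengths are not handled in this sketch")
        items.append((tag, content[pos + 2 : pos + 2 + length]))
        pos += 2 + length
    return items


def parse_image_info(content):
    """Map each element to its field using the tag as an identifier."""
    fields = {}
    # The three mandatory, untagged INTEGERs are told apart by order.
    untagged = iter(["fileSize", "xSize", "ySize"])
    for tag, value in read_tlvs(content):
        if tag == context_tag(0):
            fields["type"] = int.from_bytes(value, "big", signed=True)
        elif tag == UNIVERSAL_INTEGER:
            fields[next(untagged)] = int.from_bytes(value, "big", signed=True)
        elif tag == context_tag(1):
            fields["numBits"] = int.from_bytes(value, "big", signed=True)
        elif tag == context_tag(2):
            fields["tableSize"] = int.from_bytes(value, "big", signed=True)
        elif tag == context_tag(4):
            fields["language"] = value.decode("ascii")
        else:
            raise ValueError("unexpected tag 0x%02x" % tag)
    return fields


# fileSize=1234, xSize=64, ySize=32, no resolution, language="en"
example = bytes.fromhex("020204d2" "020140" "020120" "8402656e")
print(parse_image_info(example))
```

The optional resolution can be absent or present and the parser still
knows that the [4] element is the language, which is why the tag numbers
do not have to be consecutive.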

>  k) this draft implies a range of types of logotype: subjects, issuers,
>     loyalty schemes, etc.  It's pretty easy to see this as expanding
>     beyond use of logos (e.g. photo IDs for an S/MIME cert).  i'd hope
>     to see some considerations either explicitly ruling out non-logo use
>     of these objects, or at least addressing the privacy implications of
>     using them for non-logo use.

There is already non-logo use. See section 4.4.3, which comes from one
of the former spin-off RFCs (6170) that is now merged into this
document.

>
>  l) when fetching an indirect logotype, what Content-Type should the
>     relying party expect for the LogotypeData?   I'm assuming it's
>     supposed to be DER-encoded?

You are right. This is not specified, nor is the encoding of the present
data. This should be fixed.

I would suggest application/octet-stream and DER encoding.

>  m) no security considerations for folding polymorphic image renderers
>     into certificate handling code?  we have recent evidence that there
>     can be pretty disastrous bugs lurking there:
>     https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html
>     guidance about ensuring narrow, well-audited codepaths would be
>     great.

I don't feel qualified to answer this one. If there is something that
should be said, then we should say that.

> Sorry this review is so long and yet still incomplete.  There are a lot
> of moving parts in this draft.
>
> i welcome comments, feedback, questions about anything that's unclear here.
>
>    --dkg
>
> PS the Security Considerations section contains the word "cashed" when i
>    think it means "cached"