[hrpc] Late Review of draft-irtf-hrpc-guidelines-16

Eric Rescorla <ekr@rtfm.com> Mon, 26 December 2022 17:28 UTC

From: Eric Rescorla <ekr@rtfm.com>
Date: Mon, 26 Dec 2022 09:28:02 -0800
Message-ID: <CABcZeBNqQCbNwTQeVuLdQvQLTbULiSqEwKcJBOdU99T7kMsUCQ@mail.gmail.com>
To: hrpc@irtf.org, draft-irtf-hrpc-guidelines@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/hrpc/I2mP4Wehcj3UTSGqh4uwH6QoA4Y>
Subject: [hrpc] Late Review of draft-irtf-hrpc-guidelines-16

Hi folks,

I finally got a chance to give this document a reread and have
quite a few comments. I recognize that it's inconvenient to
receive them this late in the process and I'll defer to Colin
as to how best to handle them.


FRAMING
The framing of this document centers the conduct of a "human rights
review" (Section 3) and the document talks about a Human Rights Review
Team. The implication is that there is some body (HRPC?) that stands
outside the ordinary IETF protocol design and development process and
somehow reviews protocols for their impact on human rights. It seems
to me that the available evidence suggests this is not a great model,
for reasons ranging from the social (nobody likes outsiders coming in
and telling you you're doing it wrong) to the practical (it fosters a
shallow form of engagement that is insufficient to address complex
technical problems). My one experience with the application of this
document's methodology
(https://datatracker.ietf.org/doc/html/draft-martini-hrpc-quichr-00)
seemed to me to follow this pattern.

As a comparison point, I think it's useful to consider security review
and the methodology described in RFC 3552. At the time of the writing
of 3552, there was already broad consensus that Security
Considerations needed to be part of protocol development and
documentation and fairly broad agreement on how to do that analysis
(what threats should be considered, available technologies, etc.), as
well as an entity (the Security ADs) who was specifically empowered to
enforce security requirements. Despite that, there have been
persistent tensions between participants in different IETF areas due
to differing views of what security tradeoffs are appropriate. I think
experience clearly indicates that reviews of nearly finished protocols
are of limited usefulness here; what has worked much better is where
security expertise is baked into the process early, as with QUIC.

It seems to me that there are at least two reasons why protocols might
have human rights properties that are less good than we might like:

1. Protocol designers don't care as much about human rights issues
   as one might want.
2. Protocol design is a matter of compromise, so we end up with
   designs that try to balance competing requirements and therefore
   necessarily have suboptimal properties on some dimensions.

The majority of the text here seems to be devoted to justifying why
various technical features are necessary for human rights, which
implies that the problem is (1) but in my experience, (2) is much more
common. It's of course true that people have different judgements
about what's important, but in many cases it's simply a hard problem
to provide privacy, censorship resistance, etc.

To that end, I think this document should be targeted in a way that is
useful to protocol designers who are interested in human rights so
that it can be useful in the protocol development process. This
primarily means helping them identify issues and providing a path to
solutions. I don't think that's currently the case. In part, this is a
matter of adjusting the framing, but also of the content, as discussed
below.


CONTENT OVERALL
I found much of the material here very abstract and not of much use to
a working protocol designer. Let's take the discussion of anonymity
and pseudonymity in S 4.14 and S 4.15 (see more below as well.)  The
basic problem from a designer's perspective is that there are many
reasons that protocols need various kinds of identifiers and yet those
degrade privacy. So, the question is what technologies are available
to improve privacy while also providing the necessary protocol
functionality. This section is of little help in addressing that problem.

Second, much of the content in this document seems to be just about generic
properties that I think there is broad consensus protocols should
strive for (e.g., connectivity, reliability, etc.) that have been
labelled here as important for human rights. I'm not saying that that
isn't true, but in the context of this document, I don't think it's
very helpful, for three reasons:

1. As I said, these are properties that we generally aim for in
   any case, and so are likely to be covered by others.

2. The discussion here is fairly shallow and not really of much
   help in achieving these properties.

3. It distracts from the properties that are more distinctively
   about human rights and might not otherwise be considered.

IMO this document would be improved by focusing specifically on those
distinctive human rights properties and providing a much more in-depth
treatment. To return to the discussion of anonymity and pseudonymity,
it would be very useful to provide a discussion of the problem of IP
identifiers in protocols and of various ways to address the associated
privacy issues in different contexts. For instance, if you wanted to
talk about DNS, this could include material on proxying, caching, PIR,
etc. (as well as a discussion of the tension between the privacy
afforded by caching and the exposure of your queries to the recursive
resolver). This would be far more useful to protocol designers than
what is currently there.



DETAILED COMMENTS
S 4.1.
   Question(s): Does your protocol add application-specific functions to
   intermediary nodes?  Could this functionality be added to end nodes
   instead of intermediary nodes?

There are plenty of functions which *can* be performed at end nodes
but which are much better performed at what one might consider
intermediaries. The question is where the function is best performed.

   Is your protocol optimized for low bandwidth and high latency
   connections?  Could your protocol also be developed in a stateless
   manner?

   Explanation: The end-to-end principle [Saltzer] holds that certain
   functions can and should be performed at 'ends' of the network.
   [RFC1958] states "that in very general terms, the community believes
   that the goal is connectivity [...] and the intelligence is end to
   end rather than hidden in the network."  Generally speaking, it is
   easier to attain reliability of data transmissions with computation
   at endpoints rather than at intermediary nodes.

A few points here:

1. I don't see that the E2E principle has a lot to do with connectivity
and low bandwidth, so perhaps you need some rewrite here.

2. ISTM that the statement that opens this paragraph kind of
overgeneralizes the E2E argument. There are plenty of cases where
end-to-end systems are possible but behave much worse. To take one
example, P2P delivery systems have much worse reliability and
performance than centralized CDN-type systems.

   Example: Encrypting connections, like done with HTTPS, can add a
   significant network overhead and consequently make web resources less
   accessible to those with low bandwidth and/or high latency
   connections.  [HTTPS-REL] Encrypting traffic is a net positive for
   privacy and security, and thus protocol designers can acknowledge the
   tradeoffs of connectivity made by such decisions.

HTTPS actually does not really add significant network overhead for
most traffic. What it does--and what this citation says--is prevent
caching by intermediaries (which, incidentally, makes it an odd
example to provide in the context of this section). You should rewrite
this text. It would also be nice to provide a better reference than a
blog post, e.g., one that provided at-scale measurements and not
just anecdotes.


S 4.2.
   It is important here to draw a distinction between random degradation
   and malicious degradation.  Many current attacks against TLS
   [RFC8446], for example, exploit TLS' ability to gracefully downgrade
   to non-secure cipher suites -- from a functional perspective, this is
   useful; from a security perspective, this can be disastrous.  As with

I'm not sure what this paragraph is referring to. Are you saying that
there are a lot of current attacks on cipher suite negotiation in the
wild? If so, that seems like a surprising result, and I would expect
to see a citation for it.

Alternately, perhaps you're referring to attacks like FREAK or Logjam,
and "many current" means "a number of papers have been published
recently". If that's what you mean, then this is pretty confusing
because (1) those papers are now fairly old, (2) you cite TLS 1.3,
which is designed to resist those attacks, and (3) many
implementations removed the offending cipher suites because of this
work.

In either case, this text really needs a rewrite.
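For concreteness, point (3) is mostly a configuration matter these
days. A minimal sketch (my own illustration, not from the draft;
example.com is a placeholder) of a client that simply refuses to
negotiate legacy parameters, leaving nothing for a downgrade attack to
downgrade to:

    import socket
    import ssl

    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse SSLv3/TLS 1.0/1.1
    # Restrict the TLS 1.2 suites offered; TLS 1.3 suites are configured
    # separately and contain no weak options.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20:!aNULL:!eNULL")

    with socket.create_connection(("example.com", 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
            print(tls.version(), tls.cipher())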


   useful; from a security perspective, this can be disastrous.  As with
   confidentiality, the growth of the Internet and fostering innovation
   in services depends on users having confidence and trust [RFC3724] in
   the network.

It's not clear to me that this is true, given that we know that the
Internet grew very quickly in a setting where there was very little
encryption (recall that HTTPS usage was well under 50% in 2014). Do
you have a citation for this claim?

   Example: In the modern IP stack structure, a reliable transport layer
   requires an indication that transport processing has successfully
   completed, such as given by TCP's ACK message [RFC0793], and not
   simply an indication from the IP layer that the packet arrived.

This is kind of an odd sentence. I mean, it's true that this is how
TCP is architected, but one could imagine a system which just
acknowledged the IP packet, and that would probably work almost as
well. I agree that the E2E argument suggests that this is not as good,
but then that's not what this section is about.
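To illustrate the distinction the quoted text is after (my own sketch;
the address, port, and ACK format are made up), an end-to-end
completion signal means the receiver itself confirms processing, not
that the network reports delivery:

    import socket

    # Stop-and-wait sketch: the sender treats the data as delivered only
    # when the receiver echoes an application-level ACK.
    def send_reliably(data, dest=("127.0.0.1", 9999), retries=5, timeout=1.0):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.sendto(data, dest)
            try:
                ack, _ = sock.recvfrom(16)
                if ack == b"ACK":
                    return True   # end-to-end confirmation received
            except socket.timeout:
                continue          # no confirmation; retransmit
        return False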


S 4.3.

   Question(s): If your protocol impacts packet handling, does it use
   user data (packet data that is not included in the header)?  Is it
   making decisions based on the payload of the packet?  Does your
   protocol prioritize certain content or services over others in the
   routing process?  Is the protocol transparent about the
   prioritization that is made (if any)?

   Explanation: Content agnosticism refers to the notion that network
   traffic is treated identically regardless of payload, with some
   exceptions where it comes to effective traffic handling, for instance
   where it comes to delay-tolerant or delay-sensitive packets, based on
   the header.  If there is any prioritization based on the content or
   metadata of the protocol, the protocol should be transparent about
   such information and reasons thereof.


I don't think that this is a very helpful framing around content
agnosticism, for a number of reasons.

1. It's possible to do traffic class discrimination based on data
   that is solely in the headers, and as more traffic is encrypted,
   this will increasingly be the main way in which people do it, so
   focusing on the payload doesn't help much (see the sketch after
   this list).

2. There are at least arguably valid reasons for providing
   differential treatment for different traffic classes. This is
   handled here in a sort of brief aside around "delay-tolerant", but
   that's of course not the only reason (e.g., DoS). I'm not as
   sympathetic to these arguments as some others, but I think they
   deserve a more thorough treatment than provided in this section.

3. In general, it's not really the *protocol* which provides
   differential treatment. It's an operational practice. The
   protocol can either enable or prevent it.
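As a sketch of point 1 (my own example; the field names and rules are
invented), here is a classifier that never looks at the payload, only
at what an on-path element still sees once the payload is encrypted:

    from dataclasses import dataclass

    @dataclass
    class PacketHeaders:
        protocol: str   # "tcp" / "udp"
        dst_port: int
        dscp: int       # DiffServ code point from the IP header

    def classify(h: PacketHeaders) -> str:
        if h.dscp == 46:                              # EF, commonly voice
            return "low-latency"
        if h.protocol == "udp" and h.dst_port == 443:
            return "quic"                             # inferred from port
        if h.dst_port in (80, 443):
            return "web"
        return "best-effort"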


   Example: Content agnosticism prevents payload-based discrimination
   against packets.  This is important because changes to this principle
   can lead to a two-tiered Internet, where certain packets are
   prioritized over others on the basis of their content.  Effectively
   this would mean that although all users are entitled to receive their
   packets at a certain speed, some users become more equal than others.

I think it's important to distinguish between user identity and user
behavior. If, for instance, a carrier prioritizes traffic to their
own video service over some other service, it's not that they
are giving Alice more bandwidth than Bob based on their status.
This might or might not be objectionable, but it's not as simple as
"some users become more equal than others".

I suppose you could say that the carrier is delivering its own
service's packets at a certain speed but not the other video service's,
but those entities are not what people typically think of when they
think of "users", and of course in that case the video service is not
even a customer of the carrier, so the relationships are complicated.


S 4.4.

   In the current IETF policy [RFC2277], internationalization is aimed
   at user-facing strings, not protocol elements, such as the verbs used
   by some text-based protocols.  (Do note that some strings are both
   content and protocol elements, such as identifiers.)  Given the
   IETF's mission to make the Internet a global network of networks,
   [RFC3935] developers should ensure that protocols work with languages
   apart from English and character sets apart from Latin characters.
   It is therefore crucial that at the very least, the content carried
   by the protocol can be in any script, and that all scripts are
   treated equally.

This text (and in particular "at the very least") seems to imply that
it's the opinion of this draft (and hence HRPC) that identifiers
should in fact be internationalized even though that is not current
IETF practice, but it doesn't actually say so. I don't think that form
of I18N is a particularly good idea, but in any case, this text should
be clearer about whether it's disagreeing with that policy.
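For reference (my own sketch, not the draft's), the way identifiers
are internationalized today is that IDNA maps a Unicode label onto an
ASCII protocol element, so the wire format stays ASCII. Python's
stdlib codec (which implements the older IDNA 2003 rules) shows the
mapping:

    # 'bücher' becomes the ASCII label 'xn--bcher-kva' on the wire.
    label = "bücher"
    encoded = label.encode("idna")
    print(encoded)                  # b'xn--bcher-kva'
    print(encoded.decode("idna"))   # 'bücher'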


S 4.7.

   Question(s): Does your protocol support heterogeneity by design?
   Does your protocol allow for multiple types of hardware?  Does your
   protocol allow for multiple types of application protocols?  Is your
   protocol liberal in what it receives and handles?

See here:
https://www.ietf.org/archive/id/draft-iab-protocol-maintenance-10.html


   Example: Heterogeneity is inevitable and needs be supported by
   design.  Multiple types of hardware must be allowed for (e.g.,
   transmission speeds differing by at least 7 orders of magnitude,
   various computer word lengths, and hosts ranging from memory-starved
   microprocessors up to massively parallel supercomputers).

This graf seems either trivial or wrong. Yes, the Internet needs to
work over a wide range of hardware scenarios (trivial) but we
routinely design protocols that are intended to work only in
relatively modern computing environments, so the implication here that
everything we do has to work on every kind of computer is just
wrong. Indeed, that's why we see WGs/protocols that are targeted
specifically for resource-constrained environments.


S 4.8.

   Question(s): Is your protocol written in such a way that it would be
   easy for other protocols to be developed on top of it, or to interact
   with it?  Does your protocol impact permissionless innovation?  (See
   Open Standards)

This seems like it applies to a fairly narrow set of protocols
(e.g., transport protocols). For instance, what does it mean for
a new protocol to be "developed on top" of Certificate Transparency
or PKIX?


   Example: WebRTC generates audio and/or video data.  In order to
   ensure that WebRTC can be used in different locations by different
   parties, it is important that standard Javascript application
   programming interfaces (APIs) are developed to support applications
   from different voice service providers.  Multiple parties will have
   similar capabilities, in order to ensure that all parties can build
   upon existing standards these need to be adaptable, and allow for
   permissionless innovation.

I have no idea what this paragraph means. WebRTC *consists of*
standardized APIs. Are you just stating that fact? Arguing for
something new?

S 4.9/4.10.
I don't think it's that useful to talk about integrity as separate
from authentication, but if you want to, you should talk about
authentication first, as integrity isn't really useful without
authentication.

   Alice wants to communicate with Bob.  Alice sends data to Bob.
   Corinne intercepts the data sent to Bob.  Corinne reads and alters
   the message to Bob.  Bob can see that the data did not come from
   Alice.

This is not quite correct. What Bob can see is that he is unable
to verify that the data came from Alice, not that it did not come
from her. Consider, for instance, the case where you do a DNS
resolution and some intermediary strips the RRSIG. The data is
still authentic, just not verifiable.
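To make the distinction concrete (my own sketch; the function and the
three result strings are invented), here is what a receiver can
actually conclude:

    import hmac, hashlib
    from typing import Optional

    # With a MAC present and wrong -> "invalid". With no MAC at all (e.g.,
    # an on-path element stripped it) -> merely "unverifiable": the data
    # may still be authentic, the receiver just cannot tell.
    def check(key: bytes, message: bytes, tag: Optional[bytes]) -> str:
        if tag is None:
            return "unverifiable"
        expected = hmac.new(key, message, hashlib.sha256).digest()
        return "verified" if hmac.compare_digest(expected, tag) else "invalid"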


S 4.11.
This section seems to conflate two different questions:

- Technical: does the protocol provide confidentiality
  (i.e., is it encrypted)

- Policy: are there mechanisms to address sharing by
  entities which can see the data

For good or bad, IETF protocols typically treat these policy questions
as out of scope. I think this section would be a lot clearer if you
focused on the technical questions.


S 4.14/S 4.15
I don't think separating out pseudonymity from anonymity is useful in
this context. For instance, it's weird to talk about ODoH in the
context of pseudonymity, when it's actually about providing anonymous
access (there's no pseudonymous identifier).

Protocols can contain identifying information that varies along at
least two axes:

- Resolution (e.g., k in k-anonymity)
- Temporal stability

At one extreme, we have complete anonymity in a system like a mixnet,
and at the other we have a system with permanent identifiers that are
scoped to a single person (e.g., something like social security
numbers). In the middle, we have a pile of different types of
identifying information with different properties, e.g.,

- IP addresses (low k, fairly stable)
- Fingerprinting (low to high k, somewhat stable)
- QUIC connection identifiers (k=1, not that stable)

Trying to cram this into a pseudonymity vs. anonymity framework
doesn't seem that useful. For instance, is a QUIC CID a pseudonym?
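To make the "resolution" axis concrete (my own sketch; the observed
values are made up), you can think of k as the size of the smallest
crowd sharing the same identifier value:

    from collections import Counter

    # k for an identifier: the smallest group of users indistinguishable
    # by that identifier's value.
    def k_for_identifier(values):
        return min(Counter(values).values())

    client_ips   = ["203.0.113.7", "203.0.113.7", "198.51.100.2"]  # low k, stable
    nat_pool_ips = ["192.0.2.1", "192.0.2.1", "192.0.2.1"]         # higher k
    quic_cids    = ["a1", "b2", "c3"]                              # k = 1, short-lived

    print(k_for_identifier(client_ips),
          k_for_identifier(nat_pool_ips),
          k_for_identifier(quic_cids))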

I would rewrite these sections something like the following:

- Talk about the various threats and forms of leakage, including:
  identification, linkage, and reidentification

- Describe the protocol situations where some form of persistent
  identity is required and at which time scales, e.g., connection
  identifiers, user continuity identifiers, communication addresses,
  etc.

- Describe technical mechanisms for providing privacy against
  the backdrop of those protocol mechanisms. For instance:

  + TLS 1.3 encrypts the PSK identity to avoid allowing outsiders
    to link multiple connections to the same user

  + ODoH provides privacy in the face of the fact that the IP
    address (needed for communication) is identifying

  + Cookie isolation mechanisms prevent third party tracking via
    what is otherwise an identifier (the cookie)

I think this will be a lot clearer.


S 4.16
I don't think that the discussion of new user-level identifiers is that
useful in the context of censorship resistance. It's of course true
that user identifiers can be used to target people visiting certain
sites but this is more about anonymity (see above) than censorship.

   Example: Identifiers of content exposed within a protocol might be
   used to facilitate censorship, as in the case of Application Layer
   based censorship, which affects protocols like HTTP.  In HTTP, denial
   or restriction of access can be made apparent by the use of status
   code 451, which allows server operators to operate with greater
   transparency in circumstances where issues of law or public policy
   affect their operation [RFC7725].

   If a protocol potentially enables censorship, protocol designers
   should strive towards creating error codes that capture different
   scenarios (blocked due to administrative policy, unavailable because
   of legal requirements, etc.) to minimize ambiguity for end-users.

This set of paragraphs seems pretty confusing. Specifically, I don't
think it's really the case that HTTP "facilitates" censorship. HTTP is
a fairly typical client/server protocol, and the censorship in this
case consists of governments telling the servers not to serve a given
piece of content, so there's nothing specific about HTTP which really
"enables" or "facilitates" censorship (it's of course true that if
it's in the clear then censorship is easy, but 451 is designed to
operate in cases where the data is encrypted over HTTPS).
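For what it's worth, the 451 mechanism itself is straightforward; a
minimal sketch (my own, with a placeholder URL in the Link header) of
a server signalling legal blocking explicitly per RFC 7725:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class LegalBlockHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b"Unavailable for legal reasons.\n"
            self.send_response(451)
            self.send_header("Link",
                             '<https://example.org/legal-demand>; rel="blocked-by"')
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("127.0.0.1", 8451), LegalBlockHandler).serve_forever()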

It's of course true that there are P2P protocols which are more
resistant to this kind of blocking at a particular server, but given
that they have very little deployment, it's not clear that they in
practice have better censorship properties; in fact, due to their bad
privacy properties, it's possible they have much worse properties for
targeting people who try to retrieve disfavored content.

It's a serious omission that this section doesn't talk about the
primary forms of censorship we see now, which are focused on various
kinds of blocking, either of DNS or of connections to specific
servers. You should add discussions of these techniques as well as
countermeasures such as Tor, ECH, DoH, etc.
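To illustrate the kind of countermeasure I mean (my own sketch; the
name being resolved is just an example), a DoH lookup keeps the query
away from an on-path DNS blocker:

    import json
    import urllib.request

    # Resolve a name via Cloudflare's DoH JSON interface instead of the
    # local (possibly filtered) resolver.
    req = urllib.request.Request(
        "https://cloudflare-dns.com/dns-query?name=example.com&type=A",
        headers={"Accept": "application/dns-json"},
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.load(resp)
    print([rr["data"] for rr in answer.get("Answer", [])])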


S 4.17.
I'm really struggling with how to operationalize the guidance in this
section. Suppose we were writing an analysis like this for (say)
SMTP: what would it say?


S 4.20.
I agree that you're identifying a useful tension here between
anonymity and addressing misbehavior, but the discussion here just
says "yeah, there's a tension", which seems unhelpful.  Do you have
any guidance for how protocols could be designed to address both of
these cases? Examples seem pretty thin on the ground, but I would at
least cite message franking as implemented in Facebook Messenger.
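Roughly, the franking idea (this is a very simplified sketch of my
own; real designs also bind sender and context metadata and have the
platform countersign the commitment) is that the sender commits to the
plaintext with a one-time key that travels inside the encrypted
payload, so a recipient can later report the message verifiably
without weakening anyone else's encryption:

    import os, hmac, hashlib

    def frank(plaintext: bytes):
        # The franking key and plaintext are E2E-encrypted to the
        # recipient; the platform only ever sees and stores the commitment.
        franking_key = os.urandom(32)
        commitment = hmac.new(franking_key, plaintext, hashlib.sha256).digest()
        return franking_key, commitment

    def verify_report(plaintext: bytes, franking_key: bytes,
                      commitment: bytes) -> bool:
        recomputed = hmac.new(franking_key, plaintext, hashlib.sha256).digest()
        return hmac.compare_digest(recomputed, commitment)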