Re: [hrpc] first screening of RFC7230 for Human Rights leads

Daniel Kahn Gillmor <dkg@fifthhorseman.net> Wed, 04 February 2015 03:57 UTC

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: hrpc@article19.io
In-Reply-To: <54D0FCD7.60302@article19.org>
References: <54D0FCD7.60302@article19.org>
User-Agent: Notmuch/0.18.2 (http://notmuchmail.org) Emacs/24.4.1 (x86_64-pc-linux-gnu)
Date: Tue, 03 Feb 2015 18:20:45 -0500
Message-ID: <87siem8tv6.fsf@alice.fifthhorseman.net>
Subject: Re: [hrpc] first screening of RFC7230 for Human Rights leads

Hi Niels--

Thanks for this effort!  A few thoughts from me below, spurred by your
initial commentary.  I apologize in advance that (when trying to read
critically at least) i adopt something of a devil's advocate position.
My comments are supportive in that i'd like to see this HR analysis
proceed from a strong and well-thought-through perspective.

On Tue 2015-02-03 11:52:39 -0500, Niels ten Oever wrote:
> I went through RFC7230 [0] over the last days and found some hooks
> that might be interesting.
 [...]
> This is relevant for our research because it helps us to look both at
> what is organized in protocols and, maybe even more, at what is not
> regulated or standardized, which is what makes the Internet an
> enabling environment for freedom of expression.

The idea here i think might be framed as the network being
"content-neutral".  This is supported in some sense by the satirical
April Fools' Day RFC 3514, which introduces the "evil bit":

 https://tools.ietf.org/html/rfc3514

>>   Because NAT [RFC3022] boxes modify packets, they SHOULD set the evil
>>   bit on such packets.  "Transparent" http and email proxies SHOULD set
>>   the evil bit on their reply packets to the innocent client host.
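
For what it's worth, RFC 3514 even gives the bit a concrete home: the
single reserved high-order bit of the IPv4 flags/fragment-offset word.
A toy sketch in Python, purely for illustration -- none of this is real
firewall code:

    # RFC 3514: the evil bit is the reserved high-order bit of the
    # 16-bit IPv4 "flags + fragment offset" field.
    EVIL_BIT = 0x8000

    def set_evil(flags_frag: int) -> int:
        """mark a packet as sent with evil intent"""
        return flags_frag | EVIL_BIT

    def is_evil(flags_frag: int) -> bool:
        """benign packets keep the bit clear; a firewall can just
        drop anything where it is set"""
        return bool(flags_frag & EVIL_BIT)

    # a packet with only DF (0x4000) set is innocent...
    assert not is_evil(0x4000)
    # ...until a NAT box or "transparent" proxy dutifully flags its
    # own meddling, as the text quoted above requires.
    assert is_evil(set_evil(0x4000))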

There are more serious RFCs that take the idea of a content-neutral or
content-agnostic network as a given, too (for one thing, it's a common
engineering practice to layer designs by introducing deliberate
agnosticism about the layers above or below the system being specified).

Unfortunately, not all proposed standards hold this line on content
neutrality:

  https://tools.ietf.org/html/draft-nottingham-safe-hint-05

(of course, the above draft isn't an official RFC yet -- should we be
comparing drafts that didn't make it with drafts that ended up becoming
RFCs?)

With a corpus as large as the RFCs, how should this project avoid
confirmation bias?  If we look for the things we want to find, it seems
like we should probably also make sure to look for things we *don't*
want to find, to see whether they're there too. (see also: Biblical
Exegesis ☺)


> 1.
> In the abstract, HTTP is described as a protocol for distributed and
> collaborative information systems. This has clear technical
> implications, but it could also describe equality of nodes, which
> might make this translatable into rights implications. I'm especially
> thinking about the right to receive and impart information and ideas
> through any media and regardless of frontiers. A distributed system
> affirms and enables that by design.
>
> Perhaps a good start is to note the words that keep on coming back in
> a rights context, and word mine all RFCs for these words?
>
> Some of these words could be: connectivity, distributed,
> collaborative, reliable, scalable, caching
>
> This could be one way of selecting new and more RFCs, and perhaps
> auto-grouping them per theme.
>
> 2.
> [self-descriptive message payloads] -> This means that both content
> and description are to be defined by the author/host system, so people
> can categorize, frame and describe their content themselves, instead
> of it being auto categorized.

i find it a bit funny that in 1. above you're proposing auto-grouping
the RFCs themselves (presumably after fetching them via HTTP), and in
2. here you're saying that the protocol is designed in opposition to
"auto categorization".

I'm wary of the term "auto" in places like this, because i think it
removes agency.  who is doing the categorization?  even if it's
"automatic", it's under the control of someone.

I think the point you're trying to make is that the definition of HTTP
represents a communication between peers, and peers get to have the
conversation without having to talk to (or get approval from) anyone
else if they don't want to.

 (technically, this might not be entirely true: DNS and (for HTTPS) the
   X.509 certificate authority cartel are in some ways mandatory
   "brokers" when negotiating the creation of an HTTP session, even if
   they don't get a say about the content of the communication once the
   session is created)


> 3.
> [QUOTE]
>    Likewise, servers do not need to be aware of each client's purpose: an HTTP request can be
>    considered in isolation rather than being associated with a specific type of client
>    or a predetermined sequence of application steps.
> [/QUOTE]
>
> This supports innovation, development, and flexibility. Would this
> have any rights implications? Or is this just a very practical way of
> defining a broadly used protocol? Could be linked to 'IP
> disinterestness' (as mentioned above), which creates space for freedom
> of expression and freedom of assembly, by supplying tools but not
> defining the way in which it needs to be done.

This property is usually called "statelessness" for the server (and not
in a "smash the state" sense!).  Statelessness is a useful property from
a technical perspective because it means the server can crash and, when
it comes back up, not have to worry about what happened to the client in
the meantime (the client can just carry on as it was).

In practice, of course, everyone wants to introduce state because it
makes certain kinds of workflows (e.g. multipage forms, logged-in
accounts, widespread user surveillance) much more convenient.  Hence
cookies and other similar mechanisms.

But the point of defining HTTP as a stateless protocol is so that
servers *can* be implemented statelessly, for those who have engineering
constraints that preclude keeping state on the server side (e.g. a
machine with no way to write internal storage).
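
To make the contrast concrete, here's a toy pair of request handlers
(hypothetical names, nothing from RFC 7230 itself): the stateless one
computes its answer purely from the request, so a crash between requests
loses nothing; the stateful one keeps a session table that has to live
somewhere and survive somehow.

    # stateless: everything needed to answer is in the request itself
    def handle_stateless(request: dict) -> dict:
        page = int(request.get("page", "1"))
        return {"status": 200, "body": "results for page %d" % page}

    # stateful: the server has to remember each client between requests
    SESSIONS = {}  # session id -> accumulated per-user state

    def handle_stateful(request: dict) -> dict:
        sid = request.get("cookie", "anon")
        state = SESSIONS.setdefault(sid, {"visits": 0})
        state["visits"] += 1
        return {"status": 200, "body": "visit number %d" % state["visits"]}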

NFS (the "network filesystem") is another protocol that has jumped
through many hoops to keep "statelessness" for the server.  see:

 https://tools.ietf.org/html/rfc1094#section-1.3

However, NFS as of version 4 has gradually acquired server-side state
(that is, state shared between the client and the server), while
retaining some mechanisms aimed at easing this requirement for servers
that fit certain profiles:

  https://tools.ietf.org/html/rfc3530#section-8.14


otoh, aiming for statelessness itself doesn't have to be motivated
purely by technical goals.  For example, if you design a protocol that
*requires* the server to maintain state about its users (e.g. internet
relay chat (IRC) servers retain state about who is connected and what
channels they're connected to), you make it impossible for someone who
*doesn't* want to track their users to implement the protocol in a
non-tracking way.

Whether the push for statelessness in some internet protocols derives in
part from this urge to safeguard against ubiquitous surveillance is
pretty hard to say, of course.


> 4.
> [QUOTE]
> An HTTP "client" is a
>    program that establishes a connection to a server for the purpose of
>    sending one or more HTTP requests.
> [/QUOTE]
>
> Interestingly, interaction starts with a request from a client.
> The primacy of every action lies with the client, which could point to
> sovereignty, autonomy and/or freedom of choice. Are all services
> based on a request? Are all protocols initiated by request? Would be
> interesting to have a statement about the primacy of the client. How
> does this relate to cookies, consent, etc? In other words: does all
> automation start with clients?

I'm not sure this is anything but a technical label.  In the
client/server network communications model, the server is defined as
being the "listener".  the client is the one that initiates a
connection.

Not all protocols are client/server, though the stuff in the IETF tends
to be client-server because it's simpler to describe.
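
In socket-API terms the asymmetry is just this (a Python sketch; the two
halves run on different hosts, and the addresses and ports are
placeholders):

    import socket

    # on the server host: the passive "listener", waiting for whoever shows up
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 8080))
    srv.listen()
    conn, addr = srv.accept()

    # on the client host: the initiator, which picks the peer and the moment
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect(("server.example", 8080))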

peer-to-peer protocols like bittorrent aren't client/server, for
example.  But i don't think the IETF has ever even tried to standardize
bittorrent. And while the protocol at a high level might not be
client/server, each individual communication that happens during a
bittorrent session (i'm not sure i'm using the right BT terms here -- i
don't know much about the protocol) is probably using a client/server
model, where one peer (the client at that moment) sends a message to
another peer (the server at that moment).

TCP itself supports a "simultaneous open" mode, where neither side is
the client or the server:

  https://tools.ietf.org/html/rfc793#page-32

But there are very few attempts to use simultaneous open in the wild (i
think that STUN or TURN might use it, but i don't recall the details).
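
At the socket level, a simultaneous open is (roughly) both peers
skipping listen()/accept() entirely and calling connect() toward each
other from pre-agreed ports at the same moment; if the two SYNs cross,
a single connection results.  A sketch of one side (addresses and ports
are placeholders; the peer runs the mirror image):

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("0.0.0.0", 5000))          # my agreed-upon local port
    s.connect(("peer.example", 5001))  # the peer connects back to my 5000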

Some lower-level protocols like Ethernet (also not standardized by the
IETF) are by definition broadcast -- everyone in a given broadcast
domain receives every message, and the recipients are just expected to
filter out traffic that isn't aimed at them.

The IETF has some protocols like IP multicast that enable subscription
mechanisms that might take advantage of this lower-level broadcast
technique:

 https://tools.ietf.org/html/rfc1112
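
The "subscription" part is visible right in the sockets API: a receiver
has to explicitly join a group before the network will deliver anything
to it.  A sketch (the group address and port are arbitrary choices):

    import socket, struct

    MCAST_GRP, MCAST_PORT = "224.1.1.1", 5007

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MCAST_PORT))
    # membership request: "please deliver what's addressed to this group"
    mreq = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    data, sender = sock.recvfrom(1024)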

> Obviously not, because one can scan for open ports, ping user agents,
> etc.

by scanning for open ports, you're acting as a (TCP or UDP) client.  I'm
not sure what you mean by "ping user agents".  the traditional ping
(ICMP echo request) happens at the IP layer, from host to host, where
"user agent" has no definition that i know of.

> But then one can configure a user agent to respond to this or not
> (like putting robots.txt on your webserver signals that you don't want
> your site to be indexed by Google).

i think this text might confuse some people because of the terms used.
in a googlebot→webserver connection, the "user agent" is googlebot, and
the "origin server" is the web server.  Putting robots.txt in your web
server is a configuration of the origin server.  And robots.txt is
purely advisory, encouraging (hoping?) that the connecting user agents
will respect the request.
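
And "purely advisory" really is the whole mechanism -- a robots.txt is
just a plain-text wish that a polite crawler may or may not honor, e.g.:

    User-agent: *
    Disallow: /private/

Nothing in HTTP itself enforces it.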

> This seems to point to deepening of this topic:
>
> [QUOTE]
>    The implementation diversity of HTTP means that not all user agents
>    can make interactive suggestions to their user or provide adequate
>    warning for security or privacy concerns.
> [/QUOTE]
>
> Even though security and privacy concerns are valid (one does not want
> to give away more information than necessary), this could also fit
> within a freedom of expression context where a user is free to hold an
> opinion (and thus not hold or impart others!).

If we're reading this with respect to human rights, i'd be more inclined
to take it from a disability rights perspective; you can't specify
something into the protocol that assumes that the end user has a visual
display that works for them, or is physically capable of selecting
choices from a presented menu, etc.


> 5.
> In the client request, Accept-Language is defined. Perhaps we can
> relate this to the research into IDNs (see draft) and/or use this
> [QUOTE] to show the ambition of the Internet community to
>    reflect the diversity of users and to be in line with Article 2 of
>    the Universal Declaration of Human Rights which clearly stipulates
>    that 'everyone is entitled to all rights and freedoms [..], without
>    distinction of any kind, such as [..] language [..].
> [/QUOTE from ID]

I agree with this view.  You can also argue it from the reverse, which
is that traditionally, Internet protocols concerned themselves only with
characters expressible by US-ASCII (which limits them to languages using
the Latin alphabet), and the story of protocol development has been one
of expansion that covers more of the diversity of human communications
for the content that the protocol transmits.

Interestingly, though, most protocols retain the ASCII-only simplicity
for protocol messages themselves.  For example, HTTP headers are all
defined in ASCII.  HTML tags are all named in ASCII, even when the
content of the page is entirely in ideograms.  And the framing messages
(e.g. EHLO, DATA, etc) in SMTP are still ASCII and will probably always
be.  There's a subtle substrate of linguistic dominance threaded in
there if you want to go looking for it.
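
A made-up exchange shows how thin the accommodation is: the method, the
header names, and even the path have to be squeezed into ASCII; only the
payload gets to be in the user's own script.

    GET /%D9%85%D8%B1%D8%AD%D8%A8%D8%A7 HTTP/1.1
    Host: example.org
    Accept-Language: ar

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=utf-8

    <html><body><p>مرحبا بالعالم</p></body></html>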

For that matter, the RFCs themselves are all written in English (or some
weird and formalized approximation thereof).


> 6.
> Caching is crucial for enabling better access to information in areas
> with slow connections. Could we state that through caching access to
> information is improved? Further research to be done in RFC7234.

Caching is also a place where intermediaries are introduced into what
would otherwise be a peering relationship, though.  Caching proxies can
modify content, spoof content outright, or refuse to serve content.
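
RFC 7234's control knobs at least acknowledge the tension: an origin
server can ask intermediaries not to meddle with a response header like

    Cache-Control: no-transform, max-age=3600

but, much like robots.txt above, that request is only as good as the
intermediary's willingness to obey it (absent end-to-end encryption).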

the httpbis working group regularly fends off proposals for
machine-in-the-middle (MitM) caching proxies for https, which come
complete with arguments very similar to "enabling better access to
information":

https://tools.ietf.org/html/draft-loreto-httpbis-explicitly-auth-proxy

actually says "possibility to enhance delivery performance" :/

I consider these arguments to be dangerous to the idea of network
security, and i'm glad that the IETF has avoided standardizing this sort
of thing so far.


> 7.
> What (social) requirements could be meant here? :
>
> [QUOTE]
>    Additional (social) requirements are placed on implementations,
>    resource owners, and protocol element registrations when they apply
>    beyond the scope of a single communication.
> [/QUOTE]

i'm having a hard time parsing this myself, but i think they're saying
"most of the requirements we state in this document have to do with what
happens explicitly in a single connection.  some other requirements,
though, have scope larger than a single communication, such as how to
add a new element to this protocol, whether clients should open multiple
connections to a given server, or whether a server should publish URIs
for its own resources that it would fail to parse on subsequent
connections."  These larger-scoped requirements are the "social"
requirements.

> 8.
> Strong point for slow and/or unstable connections, supporting
> connectivity where the connection is bad. Excellent protection of
> the right to receive and impart info. I think we could frame this
> under connectivity as well.
>
> [QUOTE]
> 6.3.1.  Retrying Requests
>
>    Connections can be closed at any time, with or without intention.
>    Implementations ought to anticipate the need to recover from
>    asynchronous close events.
> [/QUOTE]

this is definitely about the ethic of trying to connect, and robust
communications in general.  Without this baseline assumption, most
internet standards would be even worse than the (admittedly not very
good) experience we've come to expect.  But it's not necessarily about
supporting connections where the underlying links might be bad.  I'd
argue that it's more about responsible handling (and awareness) of error
conditions.
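
One concrete shape this takes in clients: retry only when a retry can't
change the meaning of what already happened, and otherwise surface the
failure.  A toy sketch (the send() callable and its error type are
placeholders, not any particular library's API):

    # idempotent methods per RFC 7231
    IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

    def send_with_retry(send, method, url, attempts=3):
        for attempt in range(attempts):
            try:
                return send(method, url)
            except ConnectionError:
                last_try = (attempt == attempts - 1)
                if method not in IDEMPOTENT or last_try:
                    raise  # surface the failure instead of hiding it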

Postel's law, which was taken as gospel for many years (and is named
after Jon Postel, the first RFC editor, and the author of numerous early
RFCs), emphasizes connectivity in a formulation that usually runs
something like this: "Be liberal in what you receive, and conservative
in what you send".

But within the tech security community, Postel's law is now under attack
(or at least, heavy revision).  In particular, it is understood to often
lead to buggy, non-predictable implementations that are likely to harbor
security vulnerabilities (e.g. imagine if a TCP implementation accepted
packets that had a "close enough" sequence number, instead of requiring
a correct match).  Modern security-conscious standards are much more
likely to adopt Postel's law in a more minimalist form, encouraging
implementors to drop or reject ill-formed input, while dealing
gracefully with the resulting failure conditions.
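
As a toy illustration of the two postures (the header grammar here is
simplified, not RFC 7230's actual ABNF):

    # "liberal": quietly repair whatever shows up on the wire
    def parse_field_liberal(line):
        name, _, value = line.partition(":")
        return name.strip().lower(), value.strip()

    # minimalist reading of Postel: reject ill-formed input outright,
    # and let the caller deal with the failure explicitly
    def parse_field_strict(line):
        name, sep, value = line.partition(":")
        if not sep or not name or name != name.strip() or " " in name:
            raise ValueError("malformed header field: %r" % line)
        return name.lower(), value.strip()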

This results in less "papering over" of failures from the remote peer,
while still providing robust communications.  Maybe the underlying ethics
here are (a) transparency and (b) robustness?  Both of these are
user-centric notions -- the user should not be misled by the tools, and
the tools should not disobey the user.

           --dkg