Re: [Pearg] Research Group Last Call for "A Survey of Worldwide Censorship Techniques"

Thank you for the review, EKR. I'll work through these and put them in issues in the repo as I do, likely not until next week.

--
Joseph Lorenzo Hall, Senior Vice President, Strong Internet
hall@isoc.org | +1-703-483-9504
internetsociety.org | @internetsociety
pgp: https://josephhall.org/gpg-key
3CA28D7B9F6DDBD34B1016075F86698740A9A871
________________________________
From: Pearg <pearg-bounces@irtf.org> on behalf of Eric Rescorla <ekr@rtfm.com>
Sent: Friday, May 29, 2020 1:28:59 PM
To: Christopher Wood <caw@heapingbits.net>
Cc: pearg@irtf.org <pearg@irtf.org>
Subject: Re: [Pearg] Research Group Last Call for "A Survey of Worldwide Censorship Techniques"

Document: draft-irtf-pearg-censorship-03.txt

S 2.
    We describe three elements of Internet censorship: prescription,
    identification, and interference.  The document contains three major
    sections, each corresponding to one of these elements.  Prescription
    is the process by which censors determine what types of material they
    should block, i.e., deciding to block a list of pornographic
    websites.  Identification is the process by which censors classify
    specific traffic to be blocked or impaired, i.e., blocking or
    impairing all webpages containing "sex" in the title or traffic to
    www.sex.example.  Interference is the process by which censors

I'm not finding this distinction super clear. It seems to me like
deciding to block www.sex.example and "impairing...traffic to
www.sex.example" are pretty similar.

Not trying to force a new model on you, but I tend to think of this
process as:

1. Decide what kind of material you want to block (e.g., material about
   sex). [This is what I expected prescription to be]

2. Determine what the conceptual indicia of that material are (this might
   be "contains the string sex" or "is one of these websites A, B, C")

3. Determine precisely how you identify the indicia on the network
   (DNS, SNI, etc.)

4. Determine how to block the communications once identified.
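
As a rough illustration, the four steps can be sketched as a toy Python pipeline. Everything in it is hypothetical and not defined by the draft: the `Indicia` and `Observation` types, the fields a censor is assumed to observe, and the mapping from observed field to blocking action.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


# Steps 1-2 (hypothetical): having decided to block "material about sex",
# the censor turns that decision into concrete indicia.
@dataclass(frozen=True)
class Indicia:
    keywords: FrozenSet[str]
    domains: FrozenSet[str]


# What an on-path censor is assumed to observe for one flow.
@dataclass(frozen=True)
class Observation:
    dns_qname: Optional[str] = None
    tls_sni: Optional[str] = None
    http_title: Optional[str] = None


def identify(obs: Observation, ind: Indicia) -> bool:
    """Step 3: look for the indicia in specific network fields."""
    for name in (obs.dns_qname, obs.tls_sni):
        if name is not None and name.lower().rstrip(".") in ind.domains:
            return True
    if obs.http_title is not None:
        title = obs.http_title.lower()
        return any(word in title for word in ind.keywords)
    return False


def interfere(obs: Observation) -> str:
    """Step 4: choose an interference method from the fields the flow exposes (toy policy)."""
    if obs.dns_qname is not None:
        return "forge-dns-response"
    if obs.tls_sni is not None:
        return "inject-rst"
    return "drop-packets"


def censor(obs: Observation, ind: Indicia) -> str:
    return interfere(obs) if identify(obs, ind) else "allow"
```

Keeping steps 2 and 3 separate shows that one indicium, such as a domain, can be matched in several places (DNS, SNI). Each of those places comes with its own interference options.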

S 3.1.
   Internet censorship necessarily takes place over a network.  Network

You say this but then later you give an example of endpoint censorship.

I feel like this section might be better if it tried to impose some
hierarchy here. From my perspective, there are four main categories
of control points:

- The network itself [ISPs, Backbone, Institutional ISPs]
- The services side of communications [hosting providers, cloud
  providers, CDNs, etc.]
- Services which are necessary to communicate but not really
  part of it [CAs, DNS, etc.]
- The client endpoint side

That might make this easier to read.

    o  Certificate Authorities for Public-Key Infrastructures (PKIs):
       Authorities that issue cryptographically secured resources can be
       a significant point of control.  Certificate Authorities that
       issue certificates to domain holders for TLS/HTTPS (the Web PKI)
       or Regional/Local Internet Registries (RIRs) that issue Route
       Origination Authorizations (ROAs) to BGP operators can be forced
       to issue rogue certificates that may allow compromise, i.e.., by
       allowing censorship software to engage in identification and
       interference where not possible before.  This may allow, for
       example, adversarial traffic routing or TLS interception.

This is true, though CT is intended to make this much harder. I would
add that CAs can be forced to revoke certificates, which is a threat
that CT does not attempt to prevent.

    o  Services: Application service providers can be pressured, coerced,
       or legally required to censor specific content or data flows.
       Service providers naturally face incentives to maximize their
       potential customer base and potential service shutdowns or legal
       liability due to censorship efforts may seem much less attractive
       than potentially excluding content, users, or uses of their
       service.  Services have increasingly become focal points of
       censorship discussions, as well as the focus of discussions of
       moral imperatives to use censorship tools.

It's a little hard to know if this text is talking about Facebook
or AWS. Maybe split?

      At all levels of the network hierarchy, the filtration mechanisms
      used to detect undesirable traffic are essentially the same: a censor
      either directly sniffs transmitting packets and identifies
      undesirable content, and then uses a blocking or shaping mechanism to
      prevent or impair access, or requests that an actor ancillary to the
      censor, such as a private entity, perform these functions.

I don't think this claim is right. For instance where do registry
takedowns or DNS blacklists fit in. Neither of them "directly sniffs
transmitting packets" (a weird phrase as-is).

S 3.2.1.
   Tradeoffs: Request Identification is a technically straight-forward
   identification method that can be easily implemented at the Backbone
   or ISP level.  The hardware needed for this sort of identification is
   cheap and easy-to-acquire, making it desirable when budget and scope
   are a concern.  HTTPS will encrypt the relevant request and response
   fields, so pairing with transport identification (see Section 3.3.1)
   is necessary for HTTPS filtering.  However, some countermeasures such

I think you just want to remove "such"

   can trivially defeat simple forms of HTTP Request Header
   Identification.  For example, two cooperating endpoints - an
   instrumented web server and client - could encrypt or otherwise
   obfuscate the "host" header in a request, potentially thwarting
   techniques that match against "host" header values.

This is true, but seems kinda limited.
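
A minimal sketch of the kind of naive Host-header matcher the quoted text has in mind, assuming a plaintext HTTP/1.1 request and a hypothetical one-entry blocklist. Case changes do not evade it. A value that the two cooperating endpoints have agreed to obfuscate does. The reversed string below is only a stand-in for whatever encoding they might use.

```python
from typing import Optional

BLOCKED_HOSTS = {"www.sex.example"}  # hypothetical blocklist


def host_header(request: bytes) -> Optional[str]:
    """Return the lower-cased Host header value of a raw HTTP/1.1 request."""
    head = request.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("latin-1").lower()
    return None


def should_block(request: bytes) -> bool:
    host = host_header(request)
    # Drop any ":port" suffix before comparing against the blocklist.
    return host is not None and host.split(":")[0] in BLOCKED_HOSTS


plain = b"GET / HTTP/1.1\r\nHost: www.sex.example\r\n\r\n"
obfuscated = b"GET / HTTP/1.1\r\nHost: elpmaxe.xes.www\r\n\r\n"
```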

3.2.4.1.

   In encrypted connections using Transport Layer Security (TLS), there
   may be servers that host multiple "virtual servers" at a given
   network address, and the client will need to specify in the
   (unencrypted) Client Hello message which domain name it seeks to
   connect to (so that the server can respond with the appropriate TLS
   certificate) using the Server Name Indication (SNI) TLS extension
   [RFC6066].  Since SNI is often sent in the clear, censors and

I understand that the cert field is also used for blocking.
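
For reference, here is a minimal sketch of pulling the SNI out of a cleartext ClientHello. This is roughly what a censor's identification step has to do. The sketch assumes the whole ClientHello arrives in a single TLS record and does no real validation. The function name is hypothetical. On the certificate point: in TLS 1.2 the server's Certificate message is also sent in the clear, whereas TLS 1.3 encrypts it.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the host_name from the SNI extension of a ClientHello, or None.

    Assumes the entire ClientHello is in one TLS record (a simplification)."""
    try:
        if record[0] != 0x16:  # content type 22 = handshake
            return None
        body = record[5:]  # skip content type, legacy version, record length
        if body[0] != 0x01:  # handshake type 1 = ClientHello
            return None
        pos = 4 + 2 + 32  # handshake header, legacy_version, random
        pos += 1 + body[pos]  # legacy_session_id
        (n,) = struct.unpack_from("!H", body, pos)
        pos += 2 + n  # cipher_suites
        pos += 1 + body[pos]  # legacy_compression_methods
        (ext_total,) = struct.unpack_from("!H", body, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", body, pos)
            pos += 4
            if ext_type == 0:  # server_name (RFC 6066)
                # Skip the 2-byte list length, then read name_type and name length.
                name_type, name_len = struct.unpack_from("!BH", body, pos + 2)
                if name_type == 0:  # host_name
                    return body[pos + 5 : pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None
```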

   Domain fronting has been one popular way to avoid identification by
   censors [Fifield-2015].  To avoid identification by censors,
   applications using domain fronting put a different domain name in the
   SNI extension than the one encrypted by HTTPS.  The visible SNI would
   indicate an unblocked domain, while the blocked domain remains hidden

I would say "than in the Host: header, which is protected by HTTPS"
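
A toy model of why fronting works, using hypothetical names. The on-path censor can act only on the SNI. The fronting CDN routes on the Host header inside the encrypted request. So only a party that terminates TLS can see the mismatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontedRequest:
    sni: str          # sent in the cleartext ClientHello
    host_header: str  # sent inside the HTTPS-protected request


def _norm(domain: str) -> str:
    return domain.lower().rstrip(".")


def censor_sees(req: FrontedRequest) -> str:
    return _norm(req.sni)


def cdn_routes_to(req: FrontedRequest) -> str:
    return _norm(req.host_header)


def is_fronted(req: FrontedRequest) -> bool:
    # Only a party that terminates TLS (e.g., the CDN) can evaluate this.
    return censor_sees(req) != cdn_routes_to(req)
```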

S 3.3.1.

   Of the various shallow packet inspection methods, Transport Header
   Identification is the most pervasive, reliable, and predictable type
   of identification.  Transport headers in TCP/IP or QUIC contain a few
   invaluable pieces of information that must be transparent for traffic
   to be successfully routed: destination and source IP address and
   port.  Destination and Source IP are doubly useful, as not only does
   it allow a censor to block undesirable content via IP blocklisting,

Saying "TCP/IP or QUIC" is weird as QUIC runs over UDP. And of course
UDP has its own headers, as does RTP. I would just say "Transport headers"

   Header identification is trivial to implement, but is difficult to
   implement in backbone or ISP routers at scale, and is therefore
   typically implemented with DPI.  Blocklisting an IP is equivalent to

Does this mean "using DPI" or "along with DPI"?

   Header identification is trivial to implement, but is difficult to
   implement in backbone or ISP routers at scale, and is therefore
   typically implemented with DPI.  Blocklisting an IP is equivalent to
   installing a /32 route on a router.  However, due to limited flow

The IPv6 people will be mad.
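
To illustrate the IPv6 point, the host route corresponding to blocklisting a single address is a /32 in IPv4 but a /128 in IPv6. Below is a minimal sketch using Python's standard `ipaddress` module. The blocklist is hypothetical and drawn from the documentation ranges 192.0.2.0/24 and 2001:db8::/32.

```python
import ipaddress
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical blocklist, using documentation addresses (RFC 5737, RFC 3849).
BLOCKLIST = [
    ipaddress.ip_network("192.0.2.10/32"),     # one IPv4 host
    ipaddress.ip_network("2001:db8::10/128"),  # one IPv6 host
]


def host_route(addr: str) -> Network:
    """The single-host route equivalent to blocklisting `addr`."""
    ip = ipaddress.ip_address(addr)
    return ipaddress.ip_network(f"{ip}/{ip.max_prefixlen}")  # /32 or /128


def is_blocked(dst: str) -> bool:
    ip = ipaddress.ip_address(dst)
    # Membership tests across address families simply return False.
    return any(ip in net for net in BLOCKLIST)
```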

   Port-blocking is generally not useful because many types of content
   share the same port and it is possible for censored applications to
   change their port.  For example, most HTTP traffic goes over port 80,
   so the censor cannot differentiate between restricted and allowed web
   content solely on the basis of port.  Another example is HTTPS port
   443, which in addition to handling secure web traffic now also
   carries DNS-over-HTTPS [RFC8484] traffic; this can frustrate
   techniques that rely on cleartext DNS over port 53 for censorship,
   parental control, and other uses [SSAC-109-2020].  Port allowlisting

True, but I feel like this is actually a different category of thing,
because *this* section is about shallow blocking, whereas DNS-based
blocking of specific domains is definitely not shallow.

   conjunction with other identification mechanisms.  For example, a
   censor could block the default HTTPS port, port 443, thereby forcing
   most users to fall back to HTTP.  An important counter-example is
   that port 25 (SMTP) has long been blocked on residential ISPs'
   networks, ostensibly to reduce the potential for email spam, but also
   prohibiting residential ISP customers to run their own email servers.

This "ostensibly" seems unnecessarily sarcastic. It's not clear
why ISPs would care that much (price discrimination?). Anyway,
if you want to make this point, I think you could say something
more neutral that had the same impact.

S 3.3.2.
I'm having trouble with your taxonomy here because a lot of the
protocol identification work seems like it's more application
layer than transport layer, but you have it in 3.3.

S 4.1.1.
There is some good text in this section, but I feel like it's
not doing a good job of distinguishing three attack modalities:

- Lying by the nameserver
- On-path interception by an attacker
- off-path cache poisoning

This has been such a persistent point of confusion with DoH that
I think it's worth clarifying.

>   than ideal censorship mechanism.  Additionally, the above mechanisms
>   rely on DNSSEC not being deployed or DNSSEC validation not being
>   active on the client or recursive resolver (neither of which are hard
>   to imagine given limited deployment of DNSSEC and limited client
>   support for DNSSEC validation).

This is not quite correct. If you want to redirect a user then you
need to contend with DNSSEC, but if you just want to block them,
then you can just provide a record which won't validate and that
will have the same effect.
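
The asymmetry can be sketched as a small decision function. This is a simplified model with hypothetical names. A validating resolver that receives a forged answer that fails validation treats it as bogus and fails the lookup, typically with SERVFAIL, and that still blocks the user. Redirection succeeds only if the forged answer is not rejected, for example because nobody validates or the zone is unsigned.

```python
from enum import Enum


class Outcome(Enum):
    GENUINE = "resolved to the real address"
    REDIRECTED = "resolved to the censor's address"
    FAILED = "lookup failed, e.g. SERVFAIL"


def lookup_outcome(forged: bool, passes_validation: bool, validating: bool) -> Outcome:
    """Simplified model of what the client ends up with.

    passes_validation: True if a validator would not reject the forged answer
    (e.g., the zone is unsigned)."""
    if not forged:
        return Outcome.GENUINE
    if validating and not passes_validation:
        return Outcome.FAILED  # bogus answer rejected, but the site is still unreachable
    return Outcome.REDIRECTED


def censor_blocks(outcome: Outcome) -> bool:
    return outcome is not Outcome.GENUINE


def censor_redirects(outcome: Outcome) -> bool:
    return outcome is Outcome.REDIRECTED
```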

S 4.2.2.
I note that packet dropping is often how you implement traffic
shaping.
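
For instance, a token-bucket policer throttles a flow purely by dropping packets that exceed a configured rate. Below is a minimal sketch. The parameters are hypothetical: rate and burst are in bytes, and time is in seconds.

```python
class TokenBucketPolicer:
    """Drops any packet that would exceed `rate` bytes/s beyond a `burst`-byte allowance."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate     # hypothetical: bytes per second allowed
        self.burst = burst   # hypothetical: bucket capacity in bytes
        self.tokens = burst  # start with a full bucket
        self.last = 0.0      # time of the previous packet, in seconds

    def admit(self, now: float, size: int) -> bool:
        """Return True to forward the packet, False to drop it."""
        # Refill for the elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

TCP treats those drops as congestion, so the sender backs off. That reaction is what makes dropping an effective throttle.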

S 4.2.3.
   Packet injection, generally, refers to a man-in-the-middle (MITM)
   network interference technique that spoofs packets in an established
   traffic stream.  RST packets are normally used to let one side of TCP
   connection know the other side has stopped sending information, and
   thus the receiver should close the connection.  RST Packet Injection
   is a specific type of packet injection attack that is used to
   interrupt an established stream by sending RST packets to both sides
   of a TCP connection; as each receiver thinks the other has dropped
   the connection, the session is terminated.  QUIC is not vulnerable to
   these types of injection attacks (See [I-D.ietf-quic-transport] for
   more details).

This isn't totally true. QUIC is not vulnerable to this once the
connection is set up, but it *is* vulnerable during setup.

The two paragraphs in Trade-offs are confusing because you don't
start out with the threat model. There are three threat models
here:

- On-path and fast
- On-path and slow
- Off-path

In general, the argument you seem to be making is that on-path
and fast is expensive but on-path and slow is cheap. I'm not
sure the blind injection stuff is that relevant: you would
need to know the host/port coordinates which are fairly expensive
to obtain. Do we know if anyone uses totally blind RSTs
for censorship? My impression is that it's primarily useful
for long-running connections like BGP.
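
Some rough arithmetic on the cost of blind RSTs, as a sketch under stated assumptions. Before RFC 5961, a RST was accepted if its sequence number fell anywhere in the receive window W. Sweeping the 32-bit sequence space therefore takes about 2^32 / W packets per candidate 4-tuple. RFC 5961 requires an exact match; an in-window but inexact RST only elicits a challenge ACK. That pushes the worst case toward 2^32. The port range used here is an assumption, and OS defaults vary.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits


def blind_rsts_needed(window: int, candidate_ports: int = 1, exact_match: bool = False) -> int:
    """Worst-case spoofed RSTs to reset one connection without seeing its traffic.

    window: receiver window in bytes (pre-RFC 5961, any in-window RST is accepted).
    candidate_ports: number of guesses needed for an unknown (e.g., ephemeral) port.
    exact_match: model RFC 5961, where only an exact RCV.NXT match resets."""
    per_tuple = SEQ_SPACE if exact_match else (SEQ_SPACE + window - 1) // window
    return per_tuple * candidate_ports


EPHEMERAL_PORTS = 65535 - 49152 + 1  # IANA dynamic range (assumed)
```

Take a 64 KiB window with the 16,384-port IANA range as the unknown. That comes to 2^30, about 1.07 billion packets, or roughly 18 minutes at an assumed 1 million packets per second. That seems consistent with blind RSTs fitting long-lived connections better than short web fetches.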

S 4.3.2.

   Trade-offs: The impact to a network disconnection in a region is huge
   and absolute; the censor pays for absolute control over digital
   information with all the benefits the Internet brings;

This seems backwards. You pay by losing all the benefits.

   this is never
   a long-term solution for any rational censor and is normally only
   used as a last resort in times of substantial unrest.

This seems unnecessarily judgemental. What's rational depends on
incentives.

Also, this section seems weird because it doesn't talk about
fake BGP announcements for *other* sites, which we know is
common and doesn't have anywhere near the impact you discuss
here.
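
The reason a bogus more-specific announcement has a much narrower impact than a shutdown is longest-prefix matching. Routers prefer the most specific covering prefix, so the bogus route captures traffic only for the targeted addresses. Below is a toy sketch. The documentation prefixes (RFC 5737) and documentation ASNs (RFC 5398) are hypothetical stand-ins.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[ipaddress.IPv4Network, str]  # (prefix, origin AS)


def best_origin(rib: List[Route], dst: str) -> Optional[str]:
    """Longest-prefix match: the origin AS of the most specific covering prefix."""
    addr = ipaddress.ip_address(dst)
    covering = [(net, origin) for net, origin in rib if addr in net]
    if not covering:
        return None
    return max(covering, key=lambda route: route[0].prefixlen)[1]


RIB = [
    (ipaddress.ip_network("203.0.113.0/24"), "AS64500"),  # legitimate announcement
    (ipaddress.ip_network("203.0.113.0/25"), "AS64511"),  # bogus more-specific
]
```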

S 6.
I would remove this entirely. I don't think it really belongs here.

-Ekr

On Wed, May 20, 2020 at 10:00 AM Christopher Wood <caw@heapingbits.net> wrote:
This is the research group last call for the "A Survey of Worldwide Censorship Techniques" (draft-irtf-pearg-censorship) draft available here:

   https://datatracker.ietf.org/doc/draft-irtf-pearg-censorship/

Please review the document and send your comments to the list by June 5, 2020. Feedback may also be sent to the GitHub repository located here:

   https://github.com/IRTF-PEARG/rfc-censorship-tech

Thanks,
Chris, on behalf of the chairs

--
Pearg mailing list
Pearg@irtf.org
https://www.irtf.org/mailman/listinfo/pearg