[OPSAWG] draft-mm-wg-effect-encrypt-13 review

Kyle Rose <krose@krose.org> Tue, 09 January 2018 21:16 UTC
From: Kyle Rose <krose@krose.org>
Date: Tue, 09 Jan 2018 16:16:45 -0500
Message-ID: <CAJU8_nXdpbz-k=oDkKE0bjJ28N-6NDspqHDXsSFqY6jaJOSfDQ@mail.gmail.com>
To: ietf@ietf.org, Kathleen Moriarty <kathleen.moriarty.ietf@gmail.com>, "MORTON, ALFRED C (AL)" <acmorton@att.com>, Brandon Williams <bowill@akamai.com>, Warren Kumari <warren@kumari.net>, Paul Hoffman <paul.hoffman@vpnc.org>, opsawg@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/opsawg/mRqi18wEb5Fky5MAm_J763rdPyY>
Subject: [OPSAWG] draft-mm-wg-effect-encrypt-13 review
Brandon Williams and I reviewed this draft with Kathleen and Al via the
IETF etherpad during last call. The link is here:

https://etherpad.tools.ietf.org/p/krose-review-draft-mm-wg-effect-encrypt-13

I have attempted to cull the discussion down to something consumable on the
mailing list. Apologies in advance for formatting issues. Note that I have
not subsequently added any comments to the etherpad: I'll respond to this
thread instead with any follow-ups.


   this mode is deployed.  IPsec with authentication has many useful
   applications and usage has increased for infrastructure applications
   such as for virtual private networks between data centers.

KR> The above paragraph doesn't seem to have a clear point. It's mostly
about opportunistic security, but has a few unrelated points inserted
(e.g., the final sentence about IPsec with authentication).

KM> Looks like final sentence provides contrast with the previous sentence
"OS has been implemented as NULL Authentication with IPsec..."


   the application user, and hosting service providers lease computing,
   storage, and communications systems in datacenters.  In practice,
   many companies perform two or more service provider roles, but may be
   historically associated with one.

KR> Honestly, all of section 1.1 seems to be a grab bag lacking a thesis,
maybe motivating the vague title "Additional Background". I think the
information there could be organized in a better way to answer the question
posed in the reader's mind after the introduction, which is: give me some
examples of monitoring/manipulation for operability that have been defeated
by encryption, with evidence that they aren't solvable without additional
telemetry or cooperation with middleboxes. The following section is closer
to what I wanted after the intro.

KM> Initially, we had some examples in the draft, but were asked to remove
them.  I take your point and am in the process of reworking this text to
make it more cohesive.  The problem in much of this draft was the number of
contributions, so your sweep of the draft for points like this is very
helpful.

AM> As I read it, section 1.1 is describing changes to the communication
landscape re: encryption. The exception is the very last paragraph, which
could be moved above section 1.1. I agree that the purpose of section 1.1
could be made clearer in the introductory paragraph. I do think that
it's useful background to highlight these changes in the encryption
landscape, since that's what motivates the draft.

KM> Thanks.  It is also to ensure the draft is not read as being just about
TLS, so I am making that point more clear as well.  Moving the last
paragraph to the introduction makes sense.


   Following the Snowden revelations, application service providers
   responded by encrypting traffic between their data centers (IPsec) to
   prevent passive monitoring from taking place unbeknownst to them
   (Yahoo, Google, etc.).  Large mail service providers also began to

KR> IIRC, companies were already doing this for infrastructure traffic on
the public internet: it was their own private backbones on which they
started using encryption universally.

KM> Yes, good point.  I clarified the text to distinguish that, but was
careful not to say that pretty much all infrastructure traffic over the
Internet was encrypted, even though I think that's in line with reality - at
least where I worked, which included an ISP and a big financial data
provider early on.

   The EFF reported [EFF2014] several network service providers taking
   steps to prevent the use of SMTP over TLS by breaking STARTTLS

KR> I think it's important to use the phrase "downgrade attack" when
describing something like this. The word "downgrade" first appears in the
references list.

KM> Done, thanks.


   (section 3.2 of [RFC7525]), essentially preventing the negotiation
   process resulting in fallback to the use of clear text.  In other
   cases, some service providers have relied on middle boxes having
   access to clear text for the purposes of load balancing, monitoring
   for attack traffic, meeting regulatory requirements, or for other
   purposes.  These middle box implementations, whether performing
   functions considered legitimate by the IETF or not, have been
   impacted by increases in encrypted traffic.  Only methods keeping
   with the goal of balancing network management and PM mitigation in
   [RFC7258] should be considered in solution work resulting from this
   document.

KR> I feel like this section could be better organized by:
 * Moving the examples to 1.1 as a bulleted list of sample situations in
which network operators attempted to and/or succeeded in defeating
encryption to preserve existing operational mechanisms, or in which
performance suffered for users (whether of the encrypted flows or of other
flows impacted by encrypted flows).

KM> Interesting point, but we'd need more examples.  I'll think about this
more and chat with Al in case he has ideas.  For now, I went with Brandon's
easier suggestion, but moving to this would be nice for the document
readers.

AM> Although I see how these examples could be part of the background, I
think those who will eventually remove their objections will prefer the
reduced emphasis on these examples where they are (in section 2). In one
view, the entire memo is background, since nothing new is proposed.

KR>
 * Using this section as an introduction to the methodology for cataloging
operational mechanisms depending on cleartext traffic monitoring, with the
various caveats on what will be considered (e.g., only mechanisms required
heretofore for operability), and for describing the approach to seeking
mitigations and/or substitutions.

KM> Hmm, interesting point.  I'll have to think about this more as it could
be a lot of work at this stage.

AM> Unfortunately, we've already implemented many AD-level suggestions on
the organization of Section 2.
We're at the stage of "what can everybody live with", and re-re-re-org
falls out now, IMO.


   Network service providers use various techniques to operate, manage,
   and secure their networks.  The following subsections detail the
   purpose of each technique and which protocol fields are used to
   accomplish each task.  In response to increased encryption of these
   fields, some network service providers may be tempted to undertake
   undesirable security practices in order to gain access to the fields
   in unencrypted data flows.  To avoid this situation, ideally new
   methods could be developed to accomplish the same goals without
   service providers having the ability to see session data.

BW> I think the above paragraph is the core point of the section;
describing what the whole of section 2 is about.  The previous paragraphs,
while important information, don't seem to belong in this section. Perhaps
a separate section focused on observed bad behavior would be better.

KM> That would get at both your point and Kyle's, thanks.

AM> I agree this last paragraph could be moved up (after the definition of
Network SP).
The Snowden and EFF paragraphs would be best positioned as footnotes to the
first sentence "Network service providers use various techniques ...", but
we don't have that mechanism available.

Also, the neutral exposition that we've been asked to provide a million
times actually comes from multiple perspectives expressed in contributions
that we would combine in a balanced way, without value judgements (no good
or bad).
Where we lack balance, we lack specific contributions.

(So, I think the new -14 section 1.2 does not have an appropriate title.)


   heuristics grows, and accuracy suffers.  For example, the traffic
   patterns between server and browser are dependent on browser supplier
   and version, even when the sessions use the same server application
   (e.g., web e-mail access).  It remains to be seen whether more
   complex inferences can be mastered to produce the same monitoring
   accuracy.

KR> This might be too formal of an approach for this doc, but it might be
possible to construct a taxonomy of layers of metadata made unavailable by
encryption at each layer to show the completeness/comprehensiveness of the
survey. So, for instance:
 * Protocol and port number are still available as a way of characterizing
traffic over the public internet even if the payload is encrypted, but this
information is lost if (e.g.) the traffic is traversing an IPsec tunnel or
if radically different kinds of traffic all use port 443/tcp without any
other way to distinguish between them.
 * TCP is open to optimization/measurement even if using TLS, except when
tunneled encrypted: congestion signals (like rexmits) previously
transparent to the middlebox, for instance, are then lost.
 * Encrypting the payload defeats attempts to survey traffic by user agent
(if there's no other way to distinguish, e.g., by fingerprinting).
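
A minimal sketch of how such a taxonomy could be written down, assuming only
the three kinds of metadata named in the bullets above. The layer names and
the set contents are illustrative assumptions, not text from the draft:

```python
from enum import Enum


class Protection(Enum):
    CLEARTEXT = "cleartext"   # nothing encrypted
    TLS = "tls"               # payload encrypted above TCP
    TUNNEL = "tunnel"         # whole inner packet encrypted (e.g. IPsec tunnel)


# What an on-path observer can still read at each level (illustrative only).
VISIBLE = {
    Protection.CLEARTEXT: {"protocol_and_port", "tcp_signals", "user_agent"},
    Protection.TLS: {"protocol_and_port", "tcp_signals"},
    Protection.TUNNEL: set(),
}


def lost(protection):
    """Metadata that was readable in cleartext but is hidden at this level."""
    return VISIBLE[Protection.CLEARTEXT] - VISIBLE[protection]


print(sorted(lost(Protection.TLS)))
print(sorted(lost(Protection.TUNNEL)))
```

A table of this shape makes it easy to check whether every operational
technique in the survey has been mapped to the layer at which it stops
working.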

KM> I think this would be a really helpful follow-on document.  I'd be
willing to work on it if you're game.  I've been thinking about something
similar, specific to TLS, but it should be broadened.


   It is important to note that the push for encryption by application
   providers has been motivated by the application of the described
   techniques.  Some application providers have noted degraded
   performance and/or user experience when network-based optimization or
   enhancement of their traffic has occurred, and such cases may result
   in additional operator troubleshooting, as well.

KR> Observation: additionally, I think you'll encounter the argument that
the responsibility for diagnosing bad interactions between applications and
networks falls on the application owner rather than the network operator.
Basically, I feel like the desire among protocol designers is for operators
to provide a pipe with certain key characteristics that interact well with
established transport protocol mechanisms, and otherwise to leave the
traffic alone and let the application developers do what they want to
within the expected constraints. If that's infeasible (e.g., in edge cases,
or with respect to new technologies that interact badly with existing
transports, such as the loss=congestion assumption of TCP that interacts
badly with wifi), that's precisely the case that needs to be made by this
document.

KM> We have encountered this argument already.  It's a tough one as SPs
have the SLAs with customers, so they are the first call.  Many don't know
how to get in touch with app providers.  I understand the application
developers' perspective, but also see that there has to be some ability to
troubleshoot.  Sure, providers could wrap the protocols for transport to
provide some way of measuring, but information is lost.  IPv6 with flow
identifiers is another way to do it, but you might not be able to
prioritize a call or protocol that has little tolerance for delay over one
that does for instance.  And I realize that app providers just want all
traffic to have the same priority, but emergency calls are important.

BW> I think the point made by the document is correct though: operators are
nearly always the first call, not the application provider.

KM> We were asked to remove text that said that.  I agree that it is the
case as the providers have the SLAs and you don't typically have a number
for App providers.

BW> The operators are looking for ways to demonstrate that they did not
cause the problem (or determine that they did) for efficient hand-off to
the correct party for resolution. There are certainly problems with an approach
that changes the behavior of the protocol, but it's difficult to argue with
the diagnostic need.

AM> Using Netflix as an example, the first source of problems they mention
is the network when addressing the question "Why doesn't Netflix work?":
    "If Netflix isn’t working, you may be experiencing a network
connectivity issue, an issue with your device, or an issue with your
Netflix app or account."
    from
https://help.netflix.com/en/node/461?ui_action=kb-article-popular-categories
They previously had even stronger wording, something like "First, make sure
your network connection meets the Netflix requirements ... URL"
One of the causes of re-buffering is CDN-related pauses when accessing the
next segment: completely hidden from users so far.
An additional frequent cause: the unlicensed WiFi network owned and operated
by the customer.

Another way to look at this strategy: App providers are transferring as
much overhead cost to the network operators as possible (troubleshooting
customer problems is expensive - rolling a truck negates months of
revenue), while preserving as much value/control/revenue as they can for
themselves. The greed-thingy plays poorly over time.
A user-focused strategy would be to form partnerships for troubleshooting
of shared customers, but that might result in exposing the real causes and
some would rather hide for now, it seems.


   For example, browser fingerprints are comprised of many
   characteristics, including User Agent, HTTP Accept headers, browser
   plug-in details, screen size and color details, system fonts and time
   zone.  A monitoring system could easily identify a specific browser,
   and by correlating other information, identify a specific user.

KR> Subsections of 2.1 cover the following in what feels like arbitrary and
inconsistent order: technique description, justification for the technique,
reason why the technique is bad (for privacy), how the technique is
defeated by protocol designers, and examples of the technique. It really
reads like a laundry list rather than a systematic analysis of the problems
faced, the metadata required for diagnostics, and how these techniques are
defeated by encryption.

KM> Hmm, this section was reorganized by others in the last IESG review, so
that's probably part of the problem.  I'll read through and see what I can
do to help it out more. It cleared a discuss to make the changes.
** didn't tackle this one and will go back to it.

AM> To re-iterate: this isn't the optimization phase. We've done 10 months
of that. We've reached the "what can you live with" phase, IMO.


   packet is able to provide stateless load balancing.  This ability
   confers great reliability and scaleability advantages even if the
   flow remains in a single POP, because the load balancing system is
   not required to keep state of each flow.  Even more importantly,
   there's no requirement to continuously synchronize such state among
   the pool of load balancers.

KR> An important point is that an integrated load balancer repurposing
limited existing bits in transport flow state must maintain and synchronize
per-flow state occasionally: using the sequence number as a cookie only
works for so long given that there aren't that many bits available to
divide across a pool of machines.

KM> I added in this point, but have to check back on flow of text.


   Current protocols, such as TCP, allow the development of stateless
   integrated load balancers by availing such load balancers of
   additional plain text information in client-to-server packets.  In
   case of TCP, such information can be encoded by having server-
   generated sequence numbers (that are ACK'd by the client), segment
   values, lengths of the packet sent, etc.

KR> Is it worth mentioning that the use of some of these mechanisms for
load balancing negates some of the security assumptions associated with
those primitives (e.g., that an off-path attacker guessing valid sequence
numbers for a flow is hard)?
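
The trade-off raised in the last two comments can be sketched as follows.
This is an illustration under assumptions, not any particular load
balancer's scheme: `make_isn`, `server_from_ack` and `guess_space` are
hypothetical names, and the choice to put the server id in the top bits of
the 32-bit sequence number is made up for the example.

```python
import secrets

SEQ_BITS = 32  # width of a TCP sequence number


def make_isn(server_id: int, id_bits: int) -> int:
    """Pack a server id into the top id_bits of an initial sequence number."""
    if not 0 <= server_id < (1 << id_bits):
        raise ValueError("server id does not fit in id_bits")
    random_bits = SEQ_BITS - id_bits
    return (server_id << random_bits) | secrets.randbits(random_bits)


def server_from_ack(ack_number: int, id_bits: int) -> int:
    """Recover a server id from the ACK number, with no per-flow table."""
    return (ack_number & 0xFFFFFFFF) >> (SEQ_BITS - id_bits)


def guess_space(id_bits: int) -> int:
    """Number of sequence values left for an off-path attacker to guess."""
    return 1 << (SEQ_BITS - id_bits)


isn = make_isn(server_id=5, id_bits=8)
print(server_from_ack(isn, 8))              # id recovered statelessly
print(server_from_ack(isn + (1 << 24), 8))  # after 2**24 bytes the id bits change
print(guess_space(8), "of", 1 << SEQ_BITS)
```

With 8 bits given to the server id, the pool is capped at 256 machines, the
encoded id stops being readable once the sequence number has advanced by
2**24 bytes (so state must be kept or refreshed eventually), and the
unpredictable part of the sequence number shrinks from 32 bits to 24.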

KM> I added the above in as it may offer some balance to the discussion.

KR> A dedicated mechanism for storing load balancer state, such as QUIC's
proposed connection ID, is strictly better from the load balancer's point
of view, and is probably even better from a privacy perspective than
bolting it on to an unrelated transport signal because it can be tightly
controlled by one of the endpoints and rotated to avoid roving client
linkability: in other words, being a specific, separate signal, it can be
governed in a way that is finely targeted at that specific use-case. (I'm
thinking the advantages of separate mechanisms belong in a different part
of the doc; this section is more like the problem statement than the
solution statement.)

KM> This (above) needs to be reworded to be neutral and this does go
towards solution space, which we were trying to avoid. How about:

Another possibility is a dedicated mechanism for storing load balancer
state, such as QUIC's proposed connection ID to provide visibility to the
load balancer.  An identifier could be used for tracking purposes, but this
may provide an option that is an improvement over bolting it on to an
unrelated transport signal. This method allows for tight control by one of
the endpoints and can be rotated to avoid roving client linkability: in
other words, being a specific, separate signal, it can be governed in a way
that is finely targeted at that specific use-case.
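
To make the "dedicated, rotatable identifier" idea concrete, here is a
minimal sketch. It is not how QUIC encodes connection IDs; the key, the
10-byte layout and the function names `make_cid` and `route` are all
assumptions made for illustration. The endpoint masks its server id with a
keyed hash of a fresh nonce, so each new ID looks unrelated to the previous
one, while a load balancer holding the key can still route without per-flow
state.

```python
import hashlib
import hmac
import os

KEY = b"hypothetical key shared by server and load balancer"


def _mask(nonce: bytes, key: bytes) -> int:
    """16-bit mask derived from the nonce with a keyed hash."""
    return int.from_bytes(hmac.new(key, nonce, hashlib.sha256).digest()[:2], "big")


def make_cid(server_id: int, key: bytes = KEY) -> bytes:
    """Issue a fresh 10-byte identifier: 8-byte nonce + 2-byte masked server id."""
    if not 0 <= server_id < (1 << 16):
        raise ValueError("server id must fit in 16 bits")
    nonce = os.urandom(8)
    return nonce + (server_id ^ _mask(nonce, key)).to_bytes(2, "big")


def route(cid: bytes, key: bytes = KEY) -> int:
    """Load balancer side: recover the server id from the identifier alone."""
    nonce, masked = cid[:8], cid[8:]
    return int.from_bytes(masked, "big") ^ _mask(nonce, key)


first, rotated = make_cid(513), make_cid(513)
print(route(first), route(rotated))  # both map to the same server
print(first != rotated)              # but the identifiers differ on the wire
```

Because the identifier is separate from the transport's own signals, the
endpoint can rotate it whenever it wants without touching sequence numbers
or other fields that have their own security assumptions.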


   In future Network Function Virtualization (NFV) architectures, load
   balancing functions are likely to be more prevalent (deployed at
   locations throughout operators' networks)[.  NFV environments will
   require some type of identifier (IPv6 flow identifiers, the Proposed
    QUIC connection ID, etc.) for managing]
   traffic using encrypted tunnels.[  The shift to increased encryption
   will have an impact to visibility of flow information and will require
   adjustments to perform similar load balancing functions within an NFV.]

KR> I'm not sure what architecture this paragraph is discussing: are you
talking about encrypted tunnels between NFV nodes? Is this something
obvious to people involved in NFV? A diagram (or informational reference)
would be helpful to me here.

KM> I see your point, the language here could be more clear. Do the above
adjustments (ed: in []) help?


2.2.2.  Differential Treatment based on Deep Packet Inspection (DPI)
   ...
   These effects and potential alternative solutions have been discussed
   at the accord BoF [ACCORD] at IETF95.

KR> This section is labeled DPI, but really, the underlying issue is what
you stated in the first paragraph: different kinds of traffic have
different QoS needs, yet a network provider can't rely on a voluntary
signal from an untrusted device to decide on QoS or every packet is simply
going to be marked "high importance" and so we're back to treating all
traffic equivalently. I'd argue against one of the memes I heard at the
accord BoF, that it's down to latency vs. throughput, by pointing out that
some applications (e.g., live video with low hand-wave latency) need both.

Even after reading this, I'm still skeptical of the need for any more
granularity than flow, and using AQM on a per-flow (e.g., 5-tuple) or
flow-aggregate (some subset of the 5-tuple) to prevent an application or
user from consuming resources unfairly. What, for instance, prevents a
carrier from privileging VoIP traffic by looking at endpoints? Would there
be a way for someone else to masquerade non-VoIP traffic as VoIP traffic
given this kind of setup? This is the kind of question that I need answered
by this doc.

BW> It might be useful to note in this section that QUIC and H2 both
combine multiple micro-flows, possibly of different types, within a single
encrypted transport-layer flow. They share this with IPsec tunnels and the
like. IOW, the increased use of encrypted aggregating encapsulation can
hide even the most basic representation of a flow from the
differentiated service element. This same concern applies to load balancing
elements discussed in section 2.2.1.
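
A small sketch of the aggregation point, assuming a simplified packet record
whose field names and addresses are invented for the example. A per-flow
element keyed on the 5-tuple sees three flows when the streams are sent
directly, and only one when the same streams share a single encrypted
tunnel:

```python
from collections import namedtuple

# "inner" is what the application is really doing; the network cannot see it.
Packet = namedtuple("Packet", "src dst proto sport dport inner")


def flow_key(pkt):
    """Classic 5-tuple flow identifier."""
    return (pkt.src, pkt.dst, pkt.proto, pkt.sport, pkt.dport)


def count_flows(packets, key=flow_key):
    """Number of distinct flows a per-flow scheduler would track."""
    return len({key(p) for p in packets})


direct = [
    Packet("192.0.2.10", "198.51.100.1", "udp", 40000, 5004, "voip"),
    Packet("192.0.2.10", "198.51.100.2", "tcp", 40001, 443, "video"),
    Packet("192.0.2.10", "198.51.100.3", "tcp", 40002, 443, "bulk"),
]
tunneled = [
    Packet("192.0.2.10", "203.0.113.5", "udp", 4500, 4500, kind)
    for kind in ("voip", "video", "bulk")
]

print(count_flows(direct))    # three distinguishable flows
print(count_flows(tunneled))  # one opaque flow
```

Any per-flow fairness or prioritization then applies to the tunnel as a
whole, which is the loss of granularity described above.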

KM> **  Want to talk with Al on this set of comments.

AM> We were asked not to refer to QUIC, for various reasons (e.g., still
under development).

There will always be areas where the network can make the best decision,
because of the information available to the network operators (and the lack
of that same info at end-points).

When network resources are constrained, only the network can manage
priorities.
This has been organized according to applications that can be identified,
but there can be other solutions requiring cooperation between user devices
and the network according to subscription to a special service (QCI above).


2.2.3.  Network Congestion Management

   For User Plane Congestion Management (3GPP UPCON) - ability to
   understand content and manage network during congestion.  Mitigating
   techniques such as deferred download, off-peak acceleration, and
   outbound roamers.

KR> This seems like a special case of 2.2.2.

KM> Al - is there a reason this shouldn't get moved into

AM> I think there is some text missing here.
The text seems to have been one list item in old section 7.2, dating back
to version 11.  The list description was:

"7.2.  Effect of Encrypted Transport Headers

   When the Transport Header is encrypted, it prevents the following
   mobile network features from operating:
       <and then a list of many items> "

I suggest to delete this text, but...

Kathleen - if you delete this section, Please leave the section header
marked "Blank - to be deleted" to keep the section numbering as-is,
and the diffs/comments will still correlate easily.  Thanks!


2.2.4.  Performance-enhancing Proxies

   Due to the characteristics of the mobile link, performance-enhancing
   TCP proxies may perform local retransmission at the mobile edge.  In
   TCP, duplicated ACKs are detected and potentially concealed when the
   proxy retransmits a segment that was lost on the mobile link without
   involvement of the far end (see section 2.1.1 of [RFC3135] and
   section 3.5 of [I-D.dolson-plus-middlebox-benefits]).

BW> Starting the first paragraph in this way suggests that such use cases
are for mobile links only, which is not correct. Performance enhancing
proxies of this sort can be used on any long RTT path to improve
performance over a constrained uplink.

KM> How about:  Performance-enhancing TCP proxies may perform local
retransmission at the network edge; this also applies to mobile networks.

   This optimization at network edges measurably improves real-time
   transmission over long delay Internet paths or networks with large
   capacity-variation (such as mobile/cellular networks).

AM> FYI - The following sentence was added here in the -14pre version I
sent on Nov 19:

        However, such optimizations can also cause problems with
        performance, for example if the characteristics of some packet
        streams begin to vary significantly from those considered in the
        proxy design.

This was intended to address one of Mark Nottingham's comments.


   An application-type-aware network edge (middlebox) can further
   control pacing, limit simultaneous HD videos, or prioritize active
   videos against new videos, etc.

KR> Observation: This subsection provides the first really compelling
argument I've seen for exposing flow metadata to the path. On long paths,
physics gets in the way of tight control feedback loops. If nothing else,
this should provide motivation for protocol designers and operators to
break down the characteristics of different kinds of flows, determine where
control points are needed in each of them, and figure out how to implement
those.

I think there is this conceit among protocol designers that quality
problems can all be solved at the endpoints without any cooperation from
path elements; the really killer arguments are examples of where that
cannot possibly be the case. ECN is a great example of this, and is a
signal explicitly targeted at middleboxes with opt-in by the endpoints: it
allows a middlebox to report congestion without dropping packets, which
produces measurably better QoS for the user.
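
The opt-in nature of ECN can be sketched in a few lines. The codepoint
values are those defined in RFC 3168; the function name `on_congestion` and
the all-or-nothing decision are simplifications for illustration (a real AQM
marks or drops with some probability).

```python
# ECN codepoints in the low two bits of the IP TOS / Traffic Class byte.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def on_congestion(tos: int):
    """Decide what a congested queue does with one packet.

    Returns ("drop", None) if the endpoints did not opt in to ECN,
    otherwise ("forward", new_tos) with the ECN field set to CE.
    """
    if tos & 0b11 == NOT_ECT:
        return ("drop", None)
    return ("forward", (tos & 0xFC) | CE)


print(on_congestion(0x00))          # endpoints did not opt in: packet is lost
print(on_congestion(0xB8 | ECT_0))  # opted in: marked, DSCP bits untouched
```

The middlebox learns nothing about the payload here; it acts only on a
two-bit signal that the endpoints chose to expose.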

KM> Ack, thanks.  You're not looking for additional text here, is that
right?  If so, what are you thinking should be added?


   Content replication in caches (for example live video, DRM protected
   content) is used to most efficiently utilize the available limited
   bandwidth and thereby maximize the user's Quality of Experience
   (QoE).  Especially in mobile networks, duplicating every stream
   through the transit network increases backhaul cost for live TV.  The
   Enhanced Multimedia Broadcast/Multicast Services (3GPP eMBMS) -
   trusted edge proxies facilitate delivering same stream to different
   users, using either unicast or multicast depending on channel
   conditions to the user.

KR> There are on-going efforts to support multicast inside carrier networks
while preserving end-to-end security: AMT, for instance, allows CDNs to
deliver a single (potentially encrypted) copy of a live stream to a carrier
network over the public internet and for the carrier to then distribute
that live stream as efficiently as possible within its own network using
multicast.

KM> Text added, thanks.


   Alternate approaches such as blind caches [I-D.thomson-http-bc] are
   being explored to allow caching of encrypted content; however, they
   still need to intercept the end-to-end transport connection.

KM> [s/need to intercept the end-to-end transport connection/require
cooperation between the content owners/CDNs and blind caches and fall
outside the scope of what is covered in this document/

Content delegation solves a data visibility problem with the delegated
cache, the impact remains for the use case where HTTPS encryption limits
visibility to offload from congested links.]

KR> This last point isn't strictly speaking true: many proposals (including
I believe Martin's) require cooperation between content owners/CDNs and
these blind caches. From Martin's draft:
   q( This document describes a method for conditionally delegating the
   hosting of secure content to the same server.  This delegation allows
   a client to send a request for an "https" resource via a proxy rather
   than insisting on an end-to-end TLS connection.  This enables shared
   caching for a limited set of "https" resources, as selected by the
   server. )

BW> I'm not sure that use cases where there is explicit cooperation between
the content provider and the cache are necessarily relevant for this
document, since in those cases the cache is an extension of the content
provider (by some definition) and the cache will most likely not be
inhibited by increased encryption. The more relevant caching case is one
meant for network offload on the receiver side where there is no explicit
cooperation between the content provider and the cache. That's the case
where the use of HTTPS inhibits the cache's ability to offload from
congested links. IOW, content delegation solves a data visibility problem
with the delegated cache; it does not solve a problem introduced to the
cache through the use of encryption.


2.2.6.  Content Compression

   In addition to caching, various applications exist to provide data
   compression in order to conserve the life of the user's mobile data
   plan and optimize delivery over the mobile link.  The compression
   proxy access can be built into a specific user level application,
   such as a browser, or it can be available to all applications using a
   system level application.  The primary method is for the mobile
   application to connect to a centralized server as a proxy, with the
   data channel between the client application and the server using
   compression to minimize bandwidth utilization.  The effectiveness of
   such systems depends on the server having access to unencrypted data
   flows.

KR> Observation: given the side channels exposed by data compression that
is blind to content, the inability to compress arbitrary payloads is likely
to be regarded as a feature of encryption. (Though I recognize this is a
catalog, not an endorsement.) Furthermore, in most cases eliminating
compression is still 2-competitive with compression, so I'm not sure it's a
really compelling use-case.
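
The side channel mentioned above can be shown with a toy example, assuming a
compressor that mixes a secret with attacker-influenced bytes in one stream.
The secret value and the helper name `compressed_len` are made up for the
example:

```python
import zlib

SECRET = b"session=4f9a1c7e2b8d3a60"  # hypothetical secret in the same stream


def compressed_len(attacker_bytes: bytes) -> int:
    """Size an on-path observer would see after content-blind compression."""
    return len(zlib.compress(b"Cookie: " + SECRET + b"\r\n" + attacker_bytes))


right = compressed_len(b"session=4f9a1c7e2b8d3a60")  # guess matches the secret
wrong = compressed_len(b"session=zyxwvutsrqponmlk")  # same length, no match
print(right, wrong, right < wrong)
```

Both guesses have the same uncompressed length, but the matching one
compresses better because it repeats the secret, so the output size alone
leaks information even if the result is subsequently encrypted.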

BW> Per-object content compression might not be a compelling use case here.
Aggregated data stream content compression that spans objects and data
sources is compelling, though. If there is a network element close to the
receiver that sees all content destined for the receiver and can treat it
all as part of a unified compression scheme (e.g., through the use of a
shared segment store), it will often be much more effective at providing
data offload.

KM> Thanks, we'll add this text (modified) to make those helpful points
clear.

How about:
    Aggregated data stream content compression that spans objects and data
sources that can be treated as part of a unified compression scheme (e.g.,
through the use of a shared segment store) is often effective at providing
data offload when there is a network element close to the receiver that has
access to see all the content.
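
A minimal sketch of a shared segment store, assuming fixed-size chunks and a
truncated SHA-256 digest as the reference. Both choices, and the class name
`SegmentStore`, are assumptions for illustration rather than a description
of any deployed product:

```python
import hashlib

CHUNK = 16  # bytes per segment (hypothetical)


class SegmentStore:
    """Remembers segments already seen so later copies can be sent as references."""

    def __init__(self):
        self.seen = {}  # digest -> segment bytes

    @staticmethod
    def _digest(chunk: bytes) -> bytes:
        return hashlib.sha256(chunk).digest()[:8]

    def encode(self, data: bytes):
        """Sender side: emit ("raw", bytes) once, ("ref", digest) afterwards."""
        out = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            digest = self._digest(chunk)
            if digest in self.seen:
                out.append(("ref", digest))
            else:
                self.seen[digest] = chunk
                out.append(("raw", chunk))
        return out

    def decode(self, tokens) -> bytes:
        """Receiver side: rebuild the data, learning new segments as they arrive."""
        parts = []
        for kind, value in tokens:
            if kind == "raw":
                self.seen[self._digest(value)] = value
                parts.append(value)
            else:
                parts.append(self.seen[value])
        return b"".join(parts)


sender, receiver = SegmentStore(), SegmentStore()
logo = bytes(range(64))                 # content shared by two objects
page1 = logo + b"first page body!"
page2 = logo + b"second page body"

t1, t2 = sender.encode(page1), sender.encode(page2)
print([kind for kind, _ in t2])         # shared segments travel as references
print(receiver.decode(t1) == page1, receiver.decode(t2) == page2)
```

The savings come from spanning objects and sources. If each flow is
encrypted end to end, the element sees unrelated ciphertext and no segment
ever repeats.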


   Another form of content filtering is called parental control, where
   some users are deliberately denied access to age-sensitive content as
   a feature to the service subscriber.  Some sites involve a mixture of
   universal and age-sensitive content and filtering software.  In these
   cases, more granular (application layer) metadata may be used to
   analyze and block traffic.  Methods that accessed cleartext
   application-layer metadata no longer work when sessions are
   encrypted.  This type of granular filtering could occur at the
   endpoint.  However, the lack of ability to efficiently manage
   endpoints as a service reduces providers' ability to offer parental
   control.

KR> It might be worth discussing the typical opt-in strategy for these
things in the presence of TLS, adding a new intercept CA to willing
clients, which has the downside that it potentially exposes every https
connection to an active MitM.

BW> +1

KM> OK, we hadn't done that before since the option doesn't change, but you
make a good point, so I'll add in text.  Thanks.

I added the following:

    This method is also used by other types of network providers enabling
    traffic inspection, but not modification.

    Content filtering via a proxy can also utilize an intercepting
    certificate where the client's session is terminated at the proxy
    enabling for cleartext inspection of the traffic.  A new session
    is created from the intercepting device to the client's
    destination, this is an opt-in strategy for the client. Changes to
    TLSv1.3 do not impact this more invasive method of interception,
    where this has the potential to expose every HTTPS session to an
    active man in the middle (MitM).

KR> Random comment: especially with respect to government content
filtering, I'm worried that the IETF's current approach of playing chicken
with regulators on end-to-end encryption is going to result in
normalization of intercept CAs, which will be strictly worse than a
compromise solution in which a subset of traffic can be inspected (but not
modified) with the user's knowledge and consent (e.g., distinct optics in
the browser). I wouldn't like either outcome, frankly, but it would be nice
if we had a game plan for what to do for user privacy if intercept CAs
become a requirement for using the web in large parts of the world
(something we might be one "crisis" away from), and an honest evaluation of
the alternatives. Fundamentally, I don't like it when discussion gets shut
down because people want to bury their heads in the sand in the name of
ideology.</rant>

BW> +1. I also note that this concern applies to some of the other
performance related use cases too.

KM> I think the real argument here is a control one between the application
and management folks and not security/privacy even though that's what is
often discussed.  This is all about control.


   In addition, mobile network operators often sell tariffs that allow
   free-data access to certain sites, known as 'zero rating'.  A session
   to visit such a site incurs no additional cost or data usage to the
   user.  This feature is impacted if encryption hides the details of
   the content domain from the network.

KR> There's the related issue that zero-rating by-implementation typically
applies only to direct connections to a particular endpoint (e.g., by IP):
if a user accidentally tunnels traffic from Spotify through a corporate
VPN, that traffic won't be zero-rated, encrypted tunnel or not. (This goes
back to the taxonomy of metadata layers comment I made near the top.)
Carriers aren't going to trust e.g., a Host header for zero-rating, because
that provides a simple way to tunnel traffic for free: consequently,
determination of zero-rating will always involve some hard-to-impersonate
credential, like an IP address or server certificate in the public trust
web.
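
To illustrate the point above, here is a minimal sketch of a zero-rating
decision keyed on the outer destination address rather than on anything the
client can freely assert. Every name and prefix in it is hypothetical
(documentation address ranges), and it is not meant to describe any carrier's
actual implementation.

```python
import ipaddress
from typing import Optional

# Hypothetical zero-rated prefixes (documentation ranges, not real services).
ZERO_RATED_PREFIXES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8:1::/48"),
]


def is_zero_rated(dst_ip: str, host_header: Optional[str] = None) -> bool:
    """Decide zero-rating from the outer destination address only.

    host_header is accepted but deliberately ignored: it is client-supplied,
    so trusting it would let anyone tunnel arbitrary traffic for free.
    """
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in net for net in ZERO_RATED_PREFIXES)


# Direct connection to a zero-rated endpoint.
direct = is_zero_rated("203.0.113.10")
# Same service reached through a VPN: the carrier sees the VPN gateway's
# address, so the traffic is billed whether or not the tunnel is encrypted.
via_vpn = is_zero_rated("198.51.100.7", host_header="music.example")
```

With these assumed prefixes, the direct connection is zero-rated and the
tunneled one is not, regardless of what the Host header claims.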

KM> Not sure what to add here, any ideas, AL?


   When RTSP stream content is encrypted, the 5-tuple information within
   the payload is not visible to these ALG implementations, and
   therefore they cannot provision their associated middleboxes with
   that information.

KR> I would argue that this is a protocol design issue. This was originally
a problem with firewalls and NATs, with content inspection as a hack to
work around the protocol/network impedance mismatch. I'm not the only one
who would argue the right solution today is to design protocols to not
require linkage across connections by middleboxes that do basic filtering.

KM> I think we are in agreement here for solution direction, but the
document specifically tries to avoid solutions.  This example has been raised
in the IESG by Warren and the apps side hadn't considered his view of it
previously.  It would be good for protocols to have these considerations in
their designs, they were mostly thinking it didn't matter and were
end-to-end.  But poor video streaming sessions are an issue.  Not sure we
should add any text here???


2.3.4.  HTTP Header Insertion

   Some mobile carriers use HTTP header insertion (see section 3.2.1 of
   [RFC7230]) to provide information about their customers to third
   parties or to their own internal systems [Enrich].  Third parties use
   the inserted information for analytics, customization, advertising,
   to bill the customer, or to selectively allow or block content.  HTTP
   header insertion is also used to pass information internally between
   a mobile service provider's sub-systems, thus keeping the internal
   systems loosely coupled.  When HTTP connections are encrypted, mobile
   network service providers cannot insert headers to accomplish the
   functions above.

KR> See my first comment re: compression. I'm dithering on how best to
present these cases that are going to trigger some folks. ;-)

KM> Yes, this one is a hot button.  For the compression one, I clarified
with added text and a use case that Brandon provided.


3.1.  Management Access Security
   ...
   Application service providers, by their very nature, control the
   application endpoint.  As such, much of the information gleaned from
   sessions are still available on that endpoint.  However, when a gap
   exists in the application's logging and debugging capabilities, this
   has led the application service provider to access data-in-transport
   for monitoring and debugging.

BW> How is DLP part of the management access discussion? It seems like a
separate use case to me. The above two paragraphs seem out of place in this
section.

KM> Good point, I moved this to the SP Content monitoring of Applications
subsection and added a bullet for DLP.


   Overlay networks (e.g.  VXLAN, Geneve, etc.) may be used to indicate
   desired isolation, but this is not sufficient to prevent deliberate
   attacks that are aware of the use of the overlay network.  It is
   possible to use an overlay header in combination with IPsec, but this
   adds the requirement for authentication infrastructure and may reduce
   packet transfer performance.  Additional extension mechanisms to
   provide integrity and/or privacy protections are being investigated
   for overlay encapsulations.  Section 7 of [RFC7348] describes some of
   the security issues possible when deploying VXLAN on Layer 2
   networks.  Rogue endpoints can join the multicast groups that carry
   broadcast traffic, for example.

BW> I'm a little confused about the overall point of this section. I think
that it might be "Hosted environments sometimes use content inspection to
differentiate between management traffic and service traffic." but I don't
think this point is very clearly stated. Or is there a different central
point?

KM> It was supposed to be management access, but text got added and the DLP
doesn't fit, so that has been moved.  I can reach out to Alia who
contributed the VXLAN text to work that better into the intent of this
section as I can see your point on lack of flow.


   Data center operators may also maintain packet recordings in order to
   be able to investigate attacks, breach of internal processes, etc.
   In some industries, organizations may be legally required to maintain
   such information for compliance purposes.  Investigations of this

KR> I think you'll get a "[citation needed]" from folks on the TLS mailing
list.

KM> I suspect this is one where, if you have the recorded traffic, you have to
maintain it for chain of custody in investigation handling.  I'll have to figure
out if there is anything that would require the capture; I suspect not, but I
could be wrong.


3.2.  Hosted Applications

   Organizations are increasingly using hosted applications rather than
   in-house solutions that require maintenance of equipment and
   software.  Examples include Enterprise Resource Planning (ERP)
   solutions, payroll service, time and attendance, travel and expense
   reporting among others.  Organizations may require some level of
   management access to these hosted applications and will typically
   require session encryption or a dedicated channel for this activity.

KR> I'm not sure how encryption of the management session is relevant to
this doc. The way I've framed this document in my mind is "What information
from flows is being used for network management by entities other than the
endpoints, who have a need-to-know for the cleartext payload?", where
"management" includes things like compliance and satisfying regulatory
requirements.

KM> Leaving as-is for now.  While there is no impact since these sessions
are already encrypted, it is one more connection point to using encryption
successfully?
Al?


3.2.2.  Mail Service Providers
   ...
   STARTTLS ought to have zero effect on anti-SPAM efforts for SMTP
   traffic.  Anti-SPAM services could easily be performed on an SMTP
   gateway, eliminating the need for TLS decryption services.  The
   impact to Anti-SPAM service providers should be limited to a change
   in tools, where middle boxes were deployed to perform these
   functions.

KR> Here you're discussing a potential change to the operational technique,
which doesn't match the rest of the subsections.

KM> You're right.  This text came in from Stephen when he was the sponsoring
AD.  The point is valid, but doesn't fit exactly, but I'm hesitant to remove
it.


3.3.  Data Storage

BW> I'm having trouble with the Data Storage section at large. For the most
part, it seems to be describing use cases for the deployment of encryption,
as opposed to things that people are doing today that would be harder if
the network flows were encrypted.

KM> Flows were encrypted in recent years per customer demand and the
engineers worked to ensure monitoring was possible, improving logging,
etc.  I'll try to make that more clear in the introduction.  This is a
positive example, at least it was meant that way and for completeness.


3.3.1.  Object-level Encryption

KR> End-to-end or object encryption seems like a better description here:
host-level encryption implies that anything on the originating or target
host has access to it, when (for instance) it could be encrypted to a TPM
resident key or an SGX enclave key. The distinguishing question seems to be
"Do middleboxes/intermediate nodes have access to the cleartext?"

KM> You're right.  The term host-level is an internal one and shouldn't be
in the document.  I'll fix that, thanks.  No, middleboxes don't have access
in the EMC use cases at least.  This is specific to object level encryption.


3.3.1.1.  Monitoring for Hosted Storage

BW> This is one of the few subsections that seems to be describing a
method in current use that will be made more difficult if all the data
flows are encrypted.


3.3.2.1.  Monitoring Session Flows for DAR Solutions

   Monitoring for transport of data to storage platforms, where object
   level encryption is performed close to or on the storage platform are
   similar to those described in the section on Monitoring for Hosted
   Storage.  The primary difference for these solutions is the possible
   exposure of sensitive information, which could include privacy
   related data, financial information, or intellectual property if
   session encryption via TLS is not deployed.  Session encryption is
   typically used with these solutions, but that decision would be based
   on a risk assessment.

BW> What would be a monitoring use case in current use that will be made
more difficult due to the increased use of encryption? The previous
sentence seems to be suggesting that session encryption is already
prevalent, so I would think that people deploying DAR solutions have
already come up with different monitoring approaches.

KM> The storage engineers improved monitoring capabilities through logging,
etc.  This was a requirement, so they made it happen with full agreement on
direction.


   There are use cases where DAR or disk-level
   encryption is required.  Examples include preventing exposure of data
   if physical disks are stolen or lost.

KR> I don't see how these last two sentences are relevant, as they have nothing
to do with the network flows.

KM> I'm happy to remove.  Do they help a reader who is not familiar with
the technology to understand the layers of encryption used at all or is it
better to remove the sentence?


   In the case where TLS is in
   use, monitoring and the exposure of data is limited to a 5-tuple.

KR> This is an example of implicit use of the taxonomy I referred to
earlier. I feel like this should be used systematically throughout the
survey (i.e., what metadata does your technique rely on, and why?).

KM> That was the intent, but we didn't get that information from enough
contributors.  I provided the above in a very early draft.  We'll have to
go back through and see if that's possible at this point.  I know it would
be helpful.


3.3.3.1.  Monitoring Of IPSec for Data Replication Services

   Monitoring for data replication services are described in this
   subsection.

   Monitoring of data flows between data centers may be performed for
   security and operational purposes and would typically concentrate
   more on operational aspects since these flows are essentially virtual
   private networks (VPN) between data centers.  Operational
   considerations include capacity and availability monitoring.  The
   security monitoring may be to detect anomalies in the data flows,
   similar to what was described in the "Monitoring for Hosted Storage
   Section".  If IPsec tunnel mode is in use, monitoring is limited to a
   2-tuple, or with transport mode, a 5-tuple.
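
As a rough illustration of what "limited to a 2-tuple" means for capacity
monitoring, the sketch below aggregates byte counts by whichever header fields
a monitor can see. The Packet record, the field names, and the sample
addresses are all hypothetical; the addresses stand in for the two tunnel
endpoints.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int
    size: int


FIVE_TUPLE = ("src_ip", "dst_ip", "protocol", "src_port", "dst_port")
TWO_TUPLE = ("src_ip", "dst_ip")


def bytes_per_key(packets, fields):
    """Sum packet sizes, grouping by the header fields the monitor can see."""
    totals = defaultdict(int)
    for p in packets:
        key = tuple(getattr(p, f) for f in fields)
        totals[key] += p.size
    return dict(totals)


# Hypothetical traffic between two data center gateways.
packets = [
    Packet("192.0.2.1", "198.51.100.1", "tcp", 40001, 3260, 1200),
    Packet("192.0.2.1", "198.51.100.1", "tcp", 40002, 3260, 800),
    Packet("192.0.2.1", "198.51.100.1", "tcp", 40001, 3260, 500),
]

per_flow = bytes_per_key(packets, FIVE_TUPLE)   # individual flows stay distinct
per_tunnel = bytes_per_key(packets, TWO_TUPLE)  # everything collapses together
```

With only the 2-tuple available, total volume between the sites is still
measurable, but the per-flow breakdown is gone.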

BW> What monitoring is done when encryption is not in use? Is there
something being done that requires access to higher layer protocols? If
this traffic is already most-often encrypted, then maybe the use case isn't
relevant for this document.

KM> It's just here for completeness

   Security monitoring in the enterprise may also be performed at the
   endpoint with numerous current solutions that mitigate the same
   problems as some of the above mentioned solutions.  Since the
   software agents operate on the device, they are able to monitor
   traffic before it is encrypted, monitor for behavior changes, and
   lock down devices to use only the expected set of applications.
   Session encryption does not affect these solutions.  Some might argue
   that scaling is an issue in the enterprise, but some large
   enterprises have used these tools effectively.

KR> This is another example of mixing proposed solutions in among the
problem statement. I would argue for a clear separation, which may mean
that this document needs to have a single-minded focus on "here are the
problems and here's how enterprises currently address them."

BW> Also, enterprises increasingly allow BYOD programs for their employees,
and such programs make it more difficult to ensure that adequate
endpoint-based defenses are active. This is especially true when the area
of risk in question is the above #5 "track misuse and abuse by employees".
Note too that endpoint-based defenses can be less effective when the device
is already compromised, in which case detection of the compromised device
and effective remediation can be made more effective through the additional
use of an on-path element.

KM> [made some subsequent edits to this section]


4.1.3.2.  TCP Pipelining/Session Multiplexing

   TCP Pipelining/Session Multiplexing used mainly by middle boxes today
   allow for multiple end user sessions to share the same TCP
   connection. [This raises several points of interest with an
            increased use of encryption.  TCP session multiplexing should still
            be possible when TLS or TCPcrypt is in use since the TCP header
            information is exposed leaving the 5-tuple accessible.  The use of
            TCP session multiplexing with an IP layer encryption, e.g. IPsec,
            that only exposes a 2-tuple would not be possible.  Troubleshooting
            capabilities with encrypted sessions from the middlebox may limit
            troubleshooting to the use of logs from the end points performing
            the TCP multiplexing or from the middleboxes prior to any
            additional encryption that may be added to tunnel the TCP
            multiplexed traffic.]
   KM> I deleted the following, after adding in some of this content above
where I think it makes more sense:
   [Today's network troubleshooter often relies upon session
   decryption to tell which packet belongs to which end user, since the
   logs are currently inadequate for the analysis performed.]

KR> It's not clear to me how these two sentences are related.

KM> Good point.  I think deleting the second sentence is important as it's
a general point and not specific to this section.  E2E with FS should
prevent the use of this multiplexing as I read it, so that is an impact
that should be fine to document on its own. [edits subsequently above;
maybe best viewed from the etherpad]


   Increased use of HTTP/2 will likely further increase the prevalence
   of session multiplexing, both on the Internet and in the private data
   center.[  HTTP pipelining requires both the client and server to
   participate and visibility of packets once encrypted will hide the use
   of HTTP pipelining for any monitoring that takes place outside of the
   endpoint or proxy solution.  Visibility for middleboxes includes
   anything exposed by TLS and the 5-tuple.
   Note: left the text like this as SNI encryption will be optional, so it
   may also be exposed and this could vary by version used.]

KR> And further complicate analysis of cleartext payloads from individual
packets.


4.2.  Techniques for Monitoring Internet Session Traffic
   ...
   (PII), or personal health information (PHI).  Various techniques are
   used to intercept HTTP/TLS sessions for DLP and other purposes, and
   are described in "Summarizing Known Attacks on TLS and DTLS"
   [RFC7457].  Note: many corporate policies allow access to personal
   financial and other sites for users without interception.  Another
   option is to terminate a TLS session prior to the point where
   monitoring is performed.

KR> The last two sentences seem like a non-sequitur.

KM> Leaving as-is for now.


5.4.  Botnets

   Botnet detection and mitigation is complex and may involve hundreds
   or thousands of hosts with numerous Command and Control (C&C)
   servers.  The techniques and data used to monitor and detect each may
   vary.  Connections to C&C servers are typically encrypted, therefore
   a move to an increasingly encrypted Internet may not affect the
   detection and sharing methods used.

KR> This is one of a general category of traffic that is intentionally
protected from interference by the application.

BW> For that reason, this one almost seems like a counter example ...
botnets encrypt to evade many of the earlier described methods, therefore
many of the earlier described methods are already inadequate. On the other
hand, maybe the point is that increased use of encryption for botnet C&C
demands new methods to handle many of the previously described use cases.

KM> Well, from an incident investigation standpoint, many advanced
operators have found detection methods that don't hinder their capabilities
when traffic is encrypted.  Maybe I need to make this text more clear as I
think I may have written it, not positive.

I think I'm going to leave the text alone as I think it reads as a positive
example, one which techniques could be applied to other areas.  The DoS
section talks a little about use of fingerprinting traffic already, which
is one common technique.


5.7.  Further work

   Although incident response work will continue, new methods to prevent
   system compromise through security automation and continuous
   monitoring [SACM] may provide alternate approaches where system
   security is maintained as a preventative measure.

KR> Not clear how the unknowns relate to the purpose of this document.
Being sarcastic for a minute, I'm interpreting this as "Any cleartext
metadata just *might* be used in the future for some kind of enterprise
security monitoring!"

KM> Hmm, it's meant to say endpoints (which you control) should be used and
technology like what is expected out of SACM will help with automating
this.  We are open to text suggestions.


6.1.  IP Flow Information Export
   ...
   The collection of IPFIX data itself, of course, provides a point of
   centralization for potentially business- and privacy-critical
   information.  The IPFIX File Format specification [RFC5655]
   recommends encryption for this data at rest, and the IP Flow
   Anonymization specification [RFC6235] defines a metadata format for
   describing the anonymization functions applied to an IPFIX dataset,
   if anonymization is employed for data sharing of IPFIX information
   between enterprises or network operators.

KR> I don't understand how IPFIX relates to the purpose of this document.
Each of the IEs should be covered by one of the functional categories
described earlier in the document.

KM> Leaving as-is for now.  Was added per Benoit and Brian Trammell for
completeness on network management and encryption.


6.4.  Content Length, BitRate and Pacing

   Although block ciphers utilise padding, this makes a
   negligible difference.  Bitrate and pacing are generally application
   specific, and do not change much when the content is encrypted.
   Multiplexed formats (such as HTTP/2 and QUIC) may however incorporate
   several application streams over one connection, which makes the
   bitrate/pacing no longer application-specific.
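
For a sense of scale on the padding claim, here is a small worked sketch. It
assumes PKCS#7-style padding with a 16-byte block, which is an assumption made
for illustration and not something the draft specifies; other padding schemes
and AEAD ciphers behave differently.

```python
def padded_length(plaintext_len: int, block_size: int = 16) -> int:
    """Length after PKCS#7-style padding (assumed scheme).

    This style always adds between 1 and block_size bytes, so an input that
    is already block-aligned still grows by one full block.
    """
    return (plaintext_len // block_size + 1) * block_size


small = padded_length(1500)         # a typical full-size packet payload
aligned = padded_length(1600)       # already a multiple of 16
large = padded_length(1_000_000)    # a 1 MB object
overhead = large - 1_000_000        # never more than one block
```

Under that assumption the added length never exceeds one block, which is why
an observer can still estimate content length closely from ciphertext size.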

KR> Are the four items listed here all the application-level flow
information available to the network for encrypted flows? I think I need
some more comprehensive top-down analysis to be confident of that. It seems
like this could be folded into the metadata taxonomy, if you decide to go
that route.

This section is also weird in that it doesn't describe a problem caused by
encrypted flows. There's a lack of parallelism in structure between the
other sections and this one.

KM> Good point, but not my area and am open to suggestions.  Is it just
another example where things are ok when encrypted?  I think a few of those
are a good thing.


7.  Impact on Mobility Network Optimizations and New Services

   This section considers the effects of transport level encryption on
   existing forms of mobile network optimization techniques, as well as
   potential new services.  The material in this section assumes
   familiarity with mobile network concepts, specifications, and
   architectures.

KR> Good warning, but the entire section is very specific to a particular
set of technologies, while the rest of the document is much more general.
This feels like a case of "writing to what you know", which is fine in
principle, but it feels out of place in this document to use so much jargon
and to discuss metrics in the context of a single technology when many of
these KPIs have equivalents outside of 3GPP.


   c.  Performance-enhancing proxy with low RTT determines the
       responsiveness of TCP flow control, and enables faster adaptation
       in a delay & capacity varying network due to user mobility.  Low
       RTT permits use of a smaller send window, which makes the flow
       control loop more responsive to changing mobile network
       conditions.
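
The RTT/window relationship in item c can be made concrete with the standard
bandwidth-delay product: the amount of data that must be in flight to keep a
path full is the rate multiplied by the RTT. The sketch below is only
arithmetic; the 20 Mbit/s rate and the two RTT values are made-up example
numbers, not measurements from any network.

```python
def window_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: rate (bit/s) x RTT (ms) / 8000."""
    return rate_bps * rtt_ms // 8000


RATE_BPS = 20_000_000  # hypothetical 20 Mbit/s radio link

end_to_end = window_bytes(RATE_BPS, 100)  # assumed 100 ms RTT to the origin
via_proxy = window_bytes(RATE_BPS, 20)    # assumed 20 ms RTT to a nearby proxy
```

With these example numbers, the sender needs five times less data in flight
on the short leg, so a change in radio conditions is reflected in the control
loop after 20 ms rather than 100 ms.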

KR> Again, as with section 2.2.4, this section provides the clearest and
most convincing arguments for the need for middlebox cooperation on flows.