[TLS] ALPN concerns

Brian Smith <brian@briansmith.org> Tue, 05 November 2013 22:21 UTC

Hi all,

First, I apologize for sending this message at the last minute. I know
that the delay has caused some people some inconvenience and/or
consternation. I actually feel like I still don't have enough
information to make this email as helpful as it could be, but I think
waiting longer to collect more information is going to cause more harm
than good. I will follow up with more information as soon as I have
it.

I have been watching the efforts of my Google counterparts, who I work
on NSS with, regarding their deployment of ALPN and TLS 1.2.  I am
very concerned about the issues that they've run into where many web
servers are failing to handshake when the ClientHello message is
larger than 255 bytes. See [1] and [2] for some (not all) of the
information. Just yesterday, I got some numbers from Kurt Roeckx, who
relayed them from Ivan Ristic, that around 2.9% of the web servers
surveyed on the internet have this problem [3]. This is higher than
all the TLS 1.2/1.1/1.0 version intolerance measured in that survey
*combined*. That is well above the threshold where web browser makers
become very concerned.

Note that the failure mode for this is "the server does not respond."
This is the worst possible failure mode because it can only be dealt
with by implementing some kind of timeout for the handshake that
falls back to a handshake with a smaller ClientHello. At Mozilla we
have extensive experience with this kind of timeout-based fallback
logic and we hate it due to too many false positives on poor networks.
It means that people on very poor connections lose the ability to
negotiate any TLS extensions, including in particular the SNI
extension and elliptic-curve-related extensions. In fact, we hate this
timeout-based fallback so much that we just removed it. (AFAICT, we
were the only browser to still be implementing it.) We don't want to
bring it back. Even if we were to decide to bring it back and accept
the false positives, a look at the list that Adam published ([1]
again) shows that we'd be impacting some very important sites with
horrible latency, which would almost definitely be unacceptable for
us.
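
To illustrate why a timeout cannot tell a broken server from a slow
network, here is a minimal sketch of timeout-based fallback. It is not
Firefox or NSS code: handshake_with_fallback, fake_server,
HandshakeTimeout, the 2-second timeout, and the hello sizes are all
hypothetical, and the "server" is simulated rather than contacted over
a network.

```python
class HandshakeTimeout(Exception):
    """Raised by the (hypothetical) handshake callable when no reply arrives in time."""


def handshake_with_fallback(do_handshake, full_hello, small_hello, timeout_s):
    """Try the full ClientHello; on timeout, retry with the smaller one.

    Returns (result, fell_back). The caller cannot learn *why* the first
    attempt timed out, which is the source of the false positives.
    """
    try:
        return do_handshake(full_hello, timeout_s), False
    except HandshakeTimeout:
        return do_handshake(small_hello, timeout_s), True


def fake_server(delays_s, hangs_on_large_hello, limit=255):
    """Simulated server: one reply delay per attempt, optional size intolerance."""
    delays = iter(delays_s)

    def do_handshake(hello, timeout_s):
        delay = next(delays)
        if hangs_on_large_hello and len(hello) > limit:
            raise HandshakeTimeout()  # server never responds
        if delay > timeout_s:
            raise HandshakeTimeout()  # reply arrives too late
        return len(hello)  # stand-in for a negotiated session

    return do_handshake


full = b"\x00" * 300   # illustrative ClientHello with all extensions
small = b"\x00" * 200  # illustrative stripped-down ClientHello

# Intolerant server: fallback is needed and happens.
broken = handshake_with_fallback(fake_server([0.1, 0.1], True), full, small, 2.0)
# Healthy server, one slow round trip: fallback happens anyway (false positive).
slow = handshake_with_fallback(fake_server([5.0, 0.1], False), full, small, 2.0)
# Healthy server, fast network: full handshake succeeds.
fast = handshake_with_fallback(fake_server([0.1], False), full, small, 2.0)
```

In this sketch the "broken" and "slow" cases are indistinguishable to
the client: both end up with the small hello, and the slow-network user
loses the extensions even though the server would have accepted them.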

We also know from prior experience that it is very difficult to get
people to deploy newer firmware and software updates to fix these
issues. Google tried a long time ago when similar issues were causing
False Start to fail, with the same failure mode. Adam Langley did
extensive lobbying to try to get these issues resolved, but they
weren't successful at convincing enough sites to do so. Consequently,
Google had to disable False Start for almost every website except
google.com, which was a major (30%, according to them) performance hit
for the affected sites. Note that Google also tried a blacklist
approach for dealing with False Start and that wasn't successful
either.

Consequently, it is very hard for me to see how I can interpret the
current information I have as anything other than "Firefox must make
sure its ClientHello stays under 256 bytes for some indefinite, but
almost certainly long, period of time." Right now, Firefox is doing
fine at staying under that limit. However, if we add support for ALPN
we're likely to go over that limit frequently. Further, even if we
could squeeze the ALPN extension in *now*, it may mean that we
effectively cannot deploy any more TLS extensions going forward, for
some indefinite period of time. That is unacceptable to me.
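
To make the size budget concrete, here is a rough sketch of how many
bytes an ALPN extension adds to a ClientHello, assuming the wire layout
in the ALPN draft (2-byte extension type, 2-byte extension length,
2-byte protocol list length, then one length byte per protocol name).
The base ClientHello sizes and the protocol names below are
illustrative assumptions, not measurements from Firefox, and the sketch
assumes the ClientHello already carries an extensions block.

```python
ALPN_EXTENSION_TYPE = 16  # extension number assigned to ALPN


def alpn_extension(protocols):
    """Encode an ALPN extension as it would appear in a ClientHello."""
    name_list = b""
    for name in protocols:
        raw = name.encode("ascii")
        if not 1 <= len(raw) <= 255:
            raise ValueError("protocol names must be 1..255 bytes")
        name_list += bytes([len(raw)]) + raw  # 1-byte length prefix per name
    body = len(name_list).to_bytes(2, "big") + name_list
    return (ALPN_EXTENSION_TYPE.to_bytes(2, "big")
            + len(body).to_bytes(2, "big")
            + body)


def fits_limit(base_hello_len, protocols, limit=255):
    """True if adding ALPN keeps the ClientHello at or under `limit` bytes."""
    return base_hello_len + len(alpn_extension(protocols)) <= limit


# Hypothetical example: two advertised protocols cost 22 bytes.
offered = ["http/1.1", "spdy/3"]
added = len(alpn_extension(offered))
ok_small_base = fits_limit(230, offered)  # 230 + 22 = 252
ok_large_base = fits_limit(240, offered)  # 240 + 22 = 262
```

Every additional protocol name costs its length plus one byte, so the
list grows quickly once several HTTP/2 draft identifiers are offered
alongside existing ones.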

In particular, you may have seen the draft that EKR just posted [4].
Note how that draft adds significant amounts of information to the
ClientHello. When I talked about this with Patrick McManus and EKR, my
initial conclusion was that we can probably squeeze just enough
information into the ClientHello to stay under this artificial
256-byte limit, but only if we don't do ALPN. Or, put another way,
currently it looks like my choice is between the upcoming TLS 1.3
one-roundtrip handshake and ALPN, at least for an indefinite and
currently-unbounded period of time.

In the HTTPbis working group session earlier today, we noted that
besides certificate-related issues that are slowing HTTPS adoption,
there are significant performance concerns that also slow HTTPS
adoption. I view the TLS 1.3 one-roundtrip handshake as a way of
greatly improving the default level of TLS-induced latency, to help in
this regard. That is, I think that we can design the TLS 1.3
one-roundtrip handshake in a way that is safe and bulletproof enough
that we can have it enabled in the default configuration of all web
servers without requiring additional system administrator
configuration. In other words, we'll be able to do with a single
OpenSSL update what we haven't been able to do successfully enough
with False Start. I think that would be a huge win in the fight to
make encryption as pervasive as surveillance is.

I want to do whatever I can do to ensure we get that win, even if it
means not doing ALPN. We are very fortunate that every major browser,
and many server-side implementations, already implemented an
alternative to ALPN that avoids the compatibility risk: NPN. I don't
think NPN is perfect. But, NPN *works*, and it works *today*, and it
has been demonstrated to not cause any compatibility issues. So, from
a technical standpoint, NPN is the zero-risk, zero-effort choice and
thus greatly preferable.

Here at IETF 88 we're spending a lot of effort to figure out what to
do about the pervasive passive eavesdropping problem by effectively
trying to figure out how to encrypt as much of the internet as is
feasible. I am having a hard time squaring this with the effect that
switching from NPN to ALPN would have. We're effectively asking web
browsers to *stop* encrypting information in the handshake that they
are currently encrypting. It seems backwards. I think it is well worth
considering, given what we've all read in the news between IETF 86 and
IETF 88, whether we should be extra, especially careful about leaking
anything in cleartext that we can possibly encrypt. Further, I think
we should be careful to avoid setting precedent for leaking more and
more cleartext information in protocols that are designed to be
secure. I hate slippery-slope arguments as much as anybody, but we've
actually already seen this happen in the HTTP WG, where people are now
suggesting that ALPN be used to negotiate additional information
beyond HTTP 1.x vs. HTTP 2.0. I feel like saying that it is OK to
provide a cleartext outlet for this information is likely going to be
a mistake analogous to how the TLS 1.1 spec was wrong in saying "This
leaves a small timing channel, since MAC performance depends to some
extent on the size of the data fragment, but it is not believed to be
large enough to be exploitable" regarding the verification of MACs.
Just because we cannot find a problem now doesn't mean we won't find a
problem later. We should default to the safer choice.

Unfortunately, being at odds with the IETF TLS working group by
preferring to do NPN and not ALPN is a sucky situation to be in--for
me personally, and for my organization, and for others on this working
group. So, I would like to find some way for us to reconsider whether
NPN is really so bad that we (the TLS WG and implementers) must
immediately pay the costs of resolving the significant issues with
ALPN now, when we don't understand the full compatibility impact of
ALPN on current implementations and/or on our ability to deploy TLS
1.3 in the future.

I am very interested in what other implementers think about the issues
I brought up:

1. Is ALPN causing too many compatibility issues now?
2. Is it going to be difficult to deploy much-needed TLS 1.3
functionality later if we deploy ALPN first?
3. Is it really a good idea to have TLS encrypt less application data
instead of more, especially considering that IETF and TLS 1.3 are
moving in the other direction overall?
4. Is it really problematic, other than the unfortunate politics, for
web browsers to carry on using NPN, either in addition to or instead
of ALPN?

Thanks for your consideration.

Cheers,
Brian
-- 
Mozilla Networking/Crypto/Security (Necko/NSS/PSM)

[1] https://www.imperialviolet.org/2013/10/07/f5update.html
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=923696
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=733647#c48
[4] http://tools.ietf.org/html/draft-rescorla-tls13-new-flows-00#section-5.1