Re: [Uta] MTA-STS-03 review

Viktor Dukhovni <ietf-dane@dukhovni.org> Wed, 22 March 2017 19:18 UTC

From: Viktor Dukhovni <ietf-dane@dukhovni.org>
In-Reply-To: <CANtKdUfOZYSr_SuGHdDHHgrF8J5VjEWwVw_7KC2xS5DrCKhu-w@mail.gmail.com>
Date: Wed, 22 Mar 2017 15:18:30 -0400
Reply-To: uta@ietf.org
Message-Id: <46C421CB-1189-4493-B322-5A214D6A6EE9@dukhovni.org>
References: <4C0807DA-4852-4DAC-80ED-8A25371CFFAA@dukhovni.org> <CANtKdUfOZYSr_SuGHdDHHgrF8J5VjEWwVw_7KC2xS5DrCKhu-w@mail.gmail.com>
To: uta@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/uta/MXrQmeSeIqmFjbnpeJFDgcGA_VM>
Subject: Re: [Uta] MTA-STS-03 review
Precedence: list

> On Mar 22, 2017, at 8:39 AM, Daniel Margolis <dmargolis@google.com> wrote:
> 
>>>     o  Policy Domain: The domain for which an STS Policy is defined.
>>>        (For example, when sending mail to "alice@example.com", the policy
>>>        domain is "example.com".)
>> 
>> Perhaps this should mention that more generally this is the nexthop
>> domain for SMTP delivery, and when mail routing is preëmpted by
>> explicit relays via local policy, the Policy Domain is the
>> domain name of the explicit relay (prior to any MX lookups if
>> applicable).
> 
> We mention this explicitly in section 3.4. 

I think that the *terminology* section needs a reasonably complete
definition, or at least a forward-reference.
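A minimal sketch of the Policy Domain rule described above, assuming the recipient address and any locally configured relay are available as plain strings (the function name `policy_domain` is hypothetical):

```python
from typing import Optional


def policy_domain(recipient: str, explicit_relay: Optional[str] = None) -> str:
    """Return the STS Policy Domain for one delivery (hypothetical helper).

    Without an explicit relay, the policy domain is the SMTP nexthop, i.e.
    the recipient's domain.  When local policy routes mail via an explicit
    relay, it is the relay's domain name, taken before any MX lookup.
    """
    if explicit_relay:
        return explicit_relay.rstrip(".").lower()
    _, _, domain = recipient.rpartition("@")
    return domain.rstrip(".").lower()
```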

>>>     DANE requires DNSSEC [RFC4033]
>>>     for authentication; the mechanism described here instead relies on
>>>     certificate authorities (CAs) and does not require DNSSEC.  For a
>>>     thorough discussion of this trade-off, see the section _Security_
>>>     _Considerations_.
>> 
>> The Security Considerations section (once again this is not MarkDown)
>> will need to properly discuss the downgrade exposure of STS on first
>> contact, and this trade-off probably should be mentioned here up-front.
> 
> We mention this (briefly) already, but maybe don't do it justice. In
> particular, while we say that "[t]he sender policy cache is designed
> to resist this attack," we don't call out explicitly the implied benefit
> of longer max_ages and longer caching.

And I think it does not give sufficient coverage to MITM exposure on
first contact.
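To make the first-contact exposure concrete, here is a hypothetical sketch of a sender-side policy cache (class and field names are assumptions, not from the draft, and an expired entry is assumed to simply stop applying). An attacker who suppresses policy discovery before anything is cached faces no STS constraint at all, while a cached policy keeps applying until `max_age` seconds after it was fetched, so a longer `max_age` widens the protected window:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedPolicy:
    mode: str          # "enforce" or "report"
    max_age: int       # lifetime in seconds, taken from the policy
    fetched_at: float  # time the policy was fetched, in seconds

    def expired(self, now: float) -> bool:
        return now >= self.fetched_at + self.max_age


class PolicyCache:
    def __init__(self) -> None:
        self._by_domain: Dict[str, CachedPolicy] = {}

    def store(self, domain: str, policy: CachedPolicy) -> None:
        self._by_domain[domain] = policy

    def lookup(self, domain: str, now: float) -> Optional[CachedPolicy]:
        policy = self._by_domain.get(domain)
        if policy is None or policy.expired(now):
            # First contact (or a lapsed entry): nothing here constrains an
            # attacker who blocks or rewrites policy discovery.
            return None
        return policy
```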

>>>     In addition, SMTP STS provides an optional report-only mode, enabling
>>>     soft deployments to detect policy failures.
>> 
>> It is easy to define a couple of new "certificate usage" code-points
>> for DANE that similarly signal "soft-fail" (with reporting).  Would
>> the group strongly support a draft along those lines?  I am not sure
>> that a merely "tamper-evident" operating mode is a good idea as a
>> persistent operating state, and I think that the ability to have
>> one or more (backup, i.e. worse preference) MX hosts without TLSA
>> records adequately addresses the need for incremental deployment.
> 
> Agreed. It's a difference from DANE (necessitated in part by the fact
> that DANE policies are per-MX rather than, as with STS, per-domain,
> so DANE has a different mechanism to allow incremental rollouts). I
> would not suggest it's better or worse, but supporting soft failures
> in this manner seems the easiest way to support testing a policy in
> the STS model, no?

Well, my question is whether the group wants to see additional
"certificate usage" code-points in DANE to support "report"
mode, not a suggestion to change the STS specification...  The
STS "report" mode is roughly all one can do, though of course
a more complex policy could express per-MX "enforce" bits, but
I think that would be a mistake for STS.

So if anyone has definite views on adding "soft fail" for DANE TLSA,
please speak up!
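For comparison, the per-domain STS modes reduce to roughly the following outcome when authentication of an MX host fails. This is a hypothetical sketch: the `Action` names are illustrative, and mapping "enforce" failures to "defer" reflects a preference for queueing over bouncing rather than anything the draft mandates:

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    DELIVER_AND_REPORT = "deliver and report"  # "report" mode: tamper-evident only
    DEFER = "defer"                            # "enforce" mode: keep the message queued


def on_auth_result(sts_mode: str, authenticated: bool) -> Action:
    """One mode covers the whole policy domain, unlike DANE's per-MX TLSA records."""
    if authenticated:
        return Action.DELIVER
    if sts_mode == "report":
        return Action.DELIVER_AND_REPORT
    if sts_mode == "enforce":
        return Action.DEFER
    raise ValueError(f"unknown STS mode: {sts_mode!r}")
```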

>> Here, as discussed on the list, serious consideration should be given
>> to changing the semantics from validating the MX hostname to specifying
>> the names allowed in the server's certificate.  This simplifies MX processing
>> (which remains unmodified) and merely changes the conditions under which an
>> MX host is considered suitably authenticated per the policy.
>> 
>> Which, for example, allows the MX hosts to share a single certificate
>> with the destination domain (and not the MX hostname) as its DNS SAN.
>> If the policy lists that (shared) name as what's expected in the certificate,
>> then authentication succeeds when the MX host's certificate matches one of
>> the allowed names.
> 
> That's interesting. I think it's not actually that big a change--we just
> get rid of the MX host filtering and instead just apply the logic to
> validation.

Correct.

>> Furthermore, I think that using "*.example.com" in what are essentially
>> (rfc6125) reference identifiers creates confusion with similar-looking, but
>> semantically distinct wildcard names in certificates.  My suggestion was
>> ".example.com" for sub-domain matching, with the policy optionally specifying
>> whether only single-label prefixes are accepted, or whether in fact multi-label
>> prefixes are also valid.
> 
> Well, I think our matching is a subset of the matching defined in RFC6125
> section 6.4.3. We could specify that we support the full policy; it seemed
> worth a simplification (i.e. we don't support 'something*.domain.com', only
> '*.domain.com') to make implementation easier, but at cost of a semantic
> difference. If we get rid of the wildcard, I suppose it is indeed more obvious
> that we're just doing only suffix matching. Was that your point?

Observe that with the current "mx" constraints these are compared only
with the literal MX host names, which are in turn (as literals) compared
with the certificate.  So the literal MX hostname is on the one hand
compared against the STS policy (with potential wildcards) and on the other
hand the server certificate (with potential wildcards).

If there's a switch to "san", the "san" values are compared directly against
the certificate, which introduces the possibility of wildcard-to-wildcard
comparison.

Thus, my point is that there are two types of names with potential
wildcards in this context.  There are DNS SAN entries in certificates
(presented identifiers) whose semantics are (partly) defined in RFC6125.
RFC6125 leaves room for application-specific behaviour with respect to CN-ID
and exactly which kinds of wild-cards, if any, are supported.  Then
there are also "reference identifiers" which RFC6125 does not envision
as anything other than plain literals.

This draft casually treats "mx" constraints as though they were presented
identifiers from the certificate, and I think this could lead to user
confusion.  That could, however, be explained by noting that both kinds
of patterns are compared against the MX hostname, and are then subject
to similar interpretation.
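The point about the current "mx" design can be sketched as follows, with hypothetical helper names and an assumed RFC 7672-style wildcard rule (a "*" only as the entire leftmost label, matching exactly one label). Both the policy's "mx" patterns and the certificate's DNS names are only ever compared with the literal MX hostname, so two wildcards never meet:

```python
from typing import List


def wildcard_match(pattern: str, hostname: str) -> bool:
    """Compare a possibly-wildcarded name with a literal hostname.

    Only "*" as the entire leftmost label is treated as a wildcard, and it
    matches exactly one label; any other pattern is compared literally.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        head, sep, rest = hostname.partition(".")
        return bool(head) and sep == "." and rest == pattern[2:]
    return pattern == hostname


def mx_design_ok(mx_host: str, policy_mx: List[str], cert_names: List[str]) -> bool:
    # Comparison 1: the literal MX hostname against the policy's "mx" patterns.
    if not any(wildcard_match(p, mx_host) for p in policy_mx):
        return False
    # Comparison 2: the same literal MX hostname against the certificate's names.
    # Policy patterns and certificate names are never compared with each other.
    return any(wildcard_match(n, mx_host) for n in cert_names)
```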

If however, the design is switched to "san" constraints, then there
should be more care to distinguish "reference identifiers" from
"presented identifiers".

We tried to avoid confusion in Postfix by using ".example.com" as the
wildcard form for reference identifiers.  So in the "san" design there
would need to be some discussion of how to match reference identifiers
against presented identifiers when the reference identifier is a wildcard
form and the presented identifier may also be a wildcard form.
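By contrast, under a "san" design the reference identifiers meet the presented identifiers directly. One possible rule, sketched here with hypothetical names and the Postfix-style ".example.com" reference form, reads ".example.com" as "exactly one label below example.com"; a presented "*.X" is then acceptable only when every name it covers qualifies, i.e. when X is example.com itself. This illustrates the question rather than how the draft or Postfix would necessarily define it:

```python
from typing import Iterable


def _norm(name: str) -> str:
    return name.lower().rstrip(".")


def _one_label_below(name: str, parent: str) -> bool:
    head, sep, rest = name.partition(".")
    return bool(head) and sep == "." and rest == parent


def reference_matches_presented(reference: str, presented: str) -> bool:
    ref, pres = _norm(reference), _norm(presented)
    if ref.startswith("."):
        # Reference ".example.com": any name exactly one label below example.com.
        parent = ref[1:]
        if pres.startswith("*."):
            # Wildcard-to-wildcard: "*.X" covers exactly the one-label children
            # of X, so it is acceptable only when X is the reference's parent.
            return pres[2:] == parent
        return _one_label_below(pres, parent)
    if pres.startswith("*."):
        # Literal reference against a wildcard certificate name (RFC 6125 style).
        return _one_label_below(ref, pres[2:])
    return ref == pres


def san_design_ok(reference_ids: Iterable[str], presented_ids: Iterable[str]) -> bool:
    presented = list(presented_ids)
    return any(reference_matches_presented(r, p) for r in reference_ids for p in presented)
```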

>> FWIW, I would have chosen something less fancy than JSON for
>> what is clearly a simple array of attribute/value pairs.
> 
> I seem to recall that we started with simple key-value pairs and
> moved to JSON because people variously seemed to think this was
> easier (since parsing libraries are common). I would also consider
> the vague possibility that in future iterations we wish to add
> fields.

Which is a good reason to have name/value pairs rather than a simple
ordered list of values with per-index semantics, but perhaps the
instinct to choose JSON is over-engineering, and forces MTAs to
include novel (to the MTA SMTP stack) technologies.  A much
simpler (and already common in MTAs) encoding would certainly
suffice.  I realize it is late in the evolution of the spec, but
perhaps there is still time to reconsider.

This is not critical; I'd likely have the transformation from
JSON to something more natural happen in the software that imports
data into the policy cache, so that the cache is more MTA friendly.
Even if policy processing is out of the main MTA executable (Postfix
is not monolithic like Sendmail or Exim) there is still a new JSON
library dependency on the MTA package as a whole, and MTAs are part
of base-system images for various operating system distributions
(Postfix is part of "base" in NetBSD for example).  So I really
do think that JSON should be reconsidered.
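A minimal sketch of the import-time transformation described above, assuming the policy is a flat JSON object with fields such as "mode", "mx" and "max_age". The "name: value" output format and the function name are hypothetical choices, not part of the draft:

```python
import json


def policy_json_to_kv(text: str) -> str:
    """Flatten a JSON STS policy into "name: value" lines for an MTA cache."""
    policy = json.loads(text)
    if not isinstance(policy, dict):
        raise ValueError("policy must be a JSON object")
    lines = []
    for name in sorted(policy):
        value = policy[name]
        if isinstance(value, list):
            # One line per list element keeps the format trivially parsable.
            lines.extend(f"{name}: {item}" for item in value)
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)
```

With this split, only the importer needs a JSON parser; the MTA itself reads plain lines.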

>> Here, it would be prudent to mention rfc6125, and be explicit about
>> the acceptable wildcard patterns in that certificate.  See, for example:
>> 
>>    https://tools.ietf.org/html/rfc7672#section-3.2.3
>> 
>> which limits certificate wildcards to the entire first label only.  I
>> would like to avoid supporting more general wildcards in new
>> specifications.
> 
> We specify this in 4.1 (as you discovered below), but maybe that's too late.

Repetition is good, not everyone reads each RFC from cover to cover.
Try to make each section as self-contained as reasonably possible,
or provide references to other relevant content where applicable.

>> The reference to rfc6125 is rather unfortunate here, since it defines
>> semantics for presented identifiers, not reference identifiers.
> 
> If this field is used to validate the server certificate (and not
> the MX hostnames) as you suggested above, that would entail validating
> the presented identifier, which I think resolves your comment.

So this draft should probably align with RFC7672 in permitting only
full-label wildcards (*.example.com) and not partial-label wildcards
(such as mail*.example.com) in presented identifiers of servers that
publish STS policy.  Separately, you should define the allowed wildcard
forms for "mx" (or "san") and perhaps even consider allowing multi-label
matches for those (another policy field?).  Some domains have MX hosts
like:

	mx01.<site1>.example.com
	mx02.<site1>.example.com
	mx01.<site2>.example.com
	...

and might not want to "freeze" all the sites in the policy.  They
may be content with ".example.com" covering more than one label.
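A sketch of the two separate rules suggested here, with hypothetical function names and a hypothetical `multi_label` policy field: presented identifiers may use "*" only as the entire leftmost label (as in RFC 7672), while a ".example.com" reference identifier may optionally accept multi-label prefixes such as mx01.<site1>.example.com:

```python
def presented_wildcard_allowed(presented: str) -> bool:
    """RFC 7672-style check: "*" is only allowed as the entire leftmost label."""
    labels = presented.lower().rstrip(".").split(".")
    if any("*" in label for label in labels[1:]):
        return False
    return labels[0] == "*" or "*" not in labels[0]


def reference_suffix_match(reference: str, hostname: str, multi_label: bool = False) -> bool:
    """Match a reference identifier against a literal hostname.

    ".example.com" matches names below example.com: one label deep by
    default, or any depth when multi_label is set (a hypothetical extra
    policy field).  Other reference identifiers must match exactly.
    """
    ref = reference.lower().rstrip(".")
    host = hostname.lower().rstrip(".")
    if not ref.startswith("."):
        return ref == host
    if not host.endswith(ref):
        return False
    labels = host[: -len(ref)].split(".")
    if not all(labels):
        return False  # empty prefix or an empty label
    return multi_label or len(labels) == 1
```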

>> Here, if the proposal to switch from "mx" to "san" is taken up, the
>> peer certificate is no longer restricted to authenticate the MX hostname,
>> rather, one of its presented identifiers needs to match one of the reference
>> identifiers from the "san" policy attribute.  Either way some discussion
>> is appropriate of how to perform matching when both the reference and
>> the presented identifiers are "wildcards".
> 
> Ah, good point--this is a bit of additional complexity implied by your
> suggested change.

As explained above, if there's a switch to "san", the comparison of "san"
values is directly against the certificate, which introduces the possibility
of wildcard-to-wildcard comparison.

> One option would be to merely disallow wildcards in the "mx" (or "san")
> field.

That's probably too restrictive.  If anything, I'd be more supportive
of disallowing wildcards in the certificate!  Certificate wildcards
encourage poor practices: with all eggs in one basket, when the shared
certificate fails, all the MX hosts fail; and wildcard certs enable
redirection attacks between services that share the same certificate
and application protocol (say HTTP).

> Another: Specify our own matching rules for wildcards on both
> identifiers. It's not really that complicated, but I don't favor
> making people write this and possibly make mistakes.

[ FWIW, already implemented in Postfix,
  http://www.postfix.org/postconf.5.html#smtp_tls_secure_cert_match ]

>>>     When a message fails to deliver due to an "enforce" policy, a
>>>     compliant MTA MUST check for the presence of an updated policy at the
>>>     Policy Domain before permanently failing to deliver the message.
>>>     This allows implementing domains to update long-lived policies on the
>>>     fly.
>> 
>> This seems to suggest the contrary, i.e. that it might be appropriate
>> to bounce on policy failure when a more fresh policy still fails.
>> However certificate problems (expiration most typically) are transient,
>> and a receiving system operator should/may notice the problem before
>> outstanding messages expire and bounce.  So I would urge implementors
>> to queue, rather than bounce, on authentication failure.
> 
> Rather, the point of this was to encourage implementors to _not_ bounce
> until at least updating the policy once, not to encourage them to bounce
> _after_ such an update. So I think we agree, but the wording may be
> suboptimal.

Yes, please clarify.
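The intended behaviour, as clarified in this exchange, might be sketched as follows; `attempt` and `refresh_policy` are hypothetical callbacks standing in for the MTA's delivery attempt and policy fetch. After a failure under "enforce", the sender re-fetches the policy once and retries; if authentication still fails, the message stays queued rather than bouncing (the normal queue lifetime would eventually bounce it, which is outside this sketch):

```python
from typing import Callable


def deliver_under_enforce(
    attempt: Callable[[], bool],
    refresh_policy: Callable[[], bool],
) -> str:
    """Hypothetical delivery outcome under an "enforce" STS policy.

    attempt() returns True when the TLS handshake authenticates the MX host
    under the currently cached policy; refresh_policy() returns True when a
    newer policy was fetched for the Policy Domain.
    """
    if attempt():
        return "delivered"
    # Authentication failed: look for an updated policy before giving up.
    if refresh_policy() and attempt():
        return "delivered"
    # Still failing: treat it as transient (e.g. an expired certificate the
    # receiving operator may fix) and keep the message queued; do not bounce.
    return "deferred"
```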

>>> Step 0:  If the sending MTA is DANE-capable, the destination is
>>> DNSSEC signed, and one or more of the MX hosts have DANE TLSA
>>> records, then DANE TLSA preempts STS (more downgrade-resistant,
>>> works on first contact, ...).
>> 
>> This is another reason to avoid STS-based MX filtering, with DNSSEC
>> signed destinations the security of the MX records is sufficiently
>> established, and we need to be able to proceed with DANE MX by MX,
>> using STS policy only for MX hosts with no TLSA records.
> 
> With the suggested change to use STS only to validate the server
> certificates and not to do MX filtering, this is obviated, no? I
> don't really favor specifying whether senders should do an AND or
> an OR in applying DANE and STS when both are present, and with
> that change it no longer implies any incompatibilities in MX selection.

Well, this text goes away, and the decision moves down to the TLS
handshake step, when deciding whether the host is adequately
authenticated.  I would argue that an STS policy of "report"
should not preempt a non-soft-fail DANE policy.

Note that DANE has one, perhaps non-obvious, "feature" in this space.
When TLSA records are present, but are all "unusable", authentication
is not enforced, but STARTTLS is still required.  Thus, while the STS
"report" mode also tolerates cleartext transmission, DANE TLSA with
"unusable" TLSA records does not.  Here too, I would not like to see
STS "report" downgrade DANE to cleartext when both are published.

Perhaps this draft need not be fully prescriptive about how the
two mechanisms interact, but raising the issue is warranted, so
that implementors make sensible decisions, and/or provide users
with appropriate controls.
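One possible per-MX combination rule consistent with the points above, sketched with hypothetical names: the DANE and STS requirements are OR-ed so that neither weakens the other. An STS "report" policy then cannot override usable TLSA records, and TLSA records that are all "unusable" still force STARTTLS. Which validation method actually authenticates the host when both apply is deliberately left out:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TlsRequirement:
    require_starttls: bool
    require_authentication: bool


def requirement_for_mx(has_tlsa: bool, tlsa_usable: bool, sts_mode: str) -> TlsRequirement:
    """Hypothetical per-MX rule; sts_mode is "enforce", "report" or "none"."""
    # DANE: any TLSA records, even all-"unusable" ones, make STARTTLS mandatory;
    # authentication is enforced only when some TLSA record is usable.
    dane_starttls = has_tlsa
    dane_auth = has_tlsa and tlsa_usable
    # STS: only "enforce" imposes requirements; "report" merely reports.
    sts_enforce = sts_mode == "enforce"
    return TlsRequirement(
        require_starttls=dane_starttls or sts_enforce,
        require_authentication=dane_auth or sts_enforce,
    )
```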

>>>                  "max_age": 123456
>> 
>> Note, the "max_age" here is barely over a day, which is not consistent with
>> the recommended policy lifetimes.  Even "report" policies should probably
>> have better downgrade protection through longer lifetimes (say a week or more).
> 
> I think that max_age was chosen because of the sequential nature of
> the numbers and not its value. ;) But point taken.

Add another digit or two, and you're set.  An appealing looking number
would be 90 days or 7776000 seconds.
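The arithmetic behind these figures, as a quick check (the variable names are illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def days(seconds: int) -> float:
    return seconds / SECONDS_PER_DAY


example_max_age = 123456            # value used in the draft's example policy
one_week = 7 * SECONDS_PER_DAY      # 604800
ninety_days = 90 * SECONDS_PER_DAY  # 7776000

print(f"{example_max_age} s = {days(example_max_age):.2f} days")  # 123456 s = 1.43 days
print(f"one week = {one_week} s, 90 days = {ninety_days} s")
```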

>> Should that be "complaint" or "compliant"? :-)

I hope you'll fix the typo.

-- 
	Viktor.
