Re: [Uta] MTA-STS-03 review

Daniel Margolis <dmargolis@google.com> Wed, 22 March 2017 12:40 UTC

In-Reply-To: <4C0807DA-4852-4DAC-80ED-8A25371CFFAA@dukhovni.org>
References: <4C0807DA-4852-4DAC-80ED-8A25371CFFAA@dukhovni.org>
From: Daniel Margolis <dmargolis@google.com>
Date: Wed, 22 Mar 2017 13:39:54 +0100
Message-ID: <CANtKdUfOZYSr_SuGHdDHHgrF8J5VjEWwVw_7KC2xS5DrCKhu-w@mail.gmail.com>
To: uta@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/uta/kOjcYM11IAtTDoLoK01H-Y3zJpE>
Subject: Re: [Uta] MTA-STS-03 review

Thanks, Viktor!

Some comments inline, of course. ;)

On Wed, Mar 22, 2017 at 6:04 AM, Viktor Dukhovni <ietf-dane@dukhovni.org>
wrote:

>
> >    While such _opportunistic_ encryption protocols provide a high
>
> While "_" may work as italics in MarkDown, you'll need to change all
> these to RFC XML (where IIRC <i>...</i> may be supported, but don't
> take my word for it, check).
>

(This is just some mistake in our markdown->xml transformation. Beats me,
but good catch.)

>
> >    o  Policy Domain: The domain for which an STS Policy is defined.
> >       (For example, when sending mail to "alice@example.com", the policy
> >       domain is "example.com".)
>
> Perhaps this should mention that more generally this is the nexthop
> domain for SMTP delivery, and when mail routing is preëmpted by
> explicit relays via local policy, the Policy domain is the
> domain name of the explicit relay (prior to any MX lookups if
> applicable).
>

We mention this explicitly in section 3.4.

>
> >  DANE requires DNSSEC [RFC4033]
> >    for authentication; the mechanism described here instead relies on
> >    certificate authorities (CAs) and does not require DNSSEC.  For a
> >    thorough discussion of this trade-off, see the section _Security_
> >    _Considerations_.
>
> The Security Considerations section (once again this is not MarkDown)
> will need to properly discuss the downgrade exposure of STS on first
> contact, and this trade-off probably should be mentioned here up-front.
>

We mention this (briefly) already, but maybe don't do it justice. In
particular, while we say that "[t]he sender policy cache is designed to
resist this attack," we don't call out explicitly the implied benefit of
longer max_ages and longer caching.

> >    In addition, SMTP STS provides an optional report-only mode, enabling
> >    soft deployments to detect policy failures.
>
> It is easy to define a couple of new "certificate usage" code-points
> for DANE that similarly signal "soft-fail" (with reporting).  Would
> the group strongly support a draft along those lines?  I am not sure
> that a merely "tamper-evident" operating mode is a good idea as a
> persistent operating state, and I think that the ability to have
> one or more (backup, i.e. worse preference) MX hosts without TLSA
> records adequately addresses the need for incremental deployment.
>

Agreed. It's a difference from DANE (necessitated in part by the fact that
DANE policies are per-MX rather than, as with STS, per-domain, so DANE has
a different mechanism to allow incremental rollouts). I would not suggest
it's better or worse, but supporting soft failures in this manner seems the
easiest way to support testing a policy in the STS model, no?

> Given earlier discussion on the list, while new "id" values can give
> expedited signals to refresh the policy, it may be appropriate here
> (but perhaps later in the document as you see fit) to describe a
> recommended approach to proactively refresh policies prior to their
> pending expiration.  This reduces the probability of "gaps" in policy
> coverage if the remote HTTPS service is unavailable just as a policy
> is expiring.
>

Agreed.
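One way to read the proactive-refresh suggestion, sketched in Go under stated assumptions: schedule a background fetch once two thirds of max_age has elapsed, and keep retrying while the cached policy is still valid. The 2/3 threshold, the retry interval, and the policyTimes/nextRefresh names are hypothetical choices for illustration; the draft does not prescribe them.

```go
package main

// policyTimes is a hypothetical view of a cached policy, in Unix seconds.
type policyTimes struct {
	FetchedAt int64 // when the policy was last fetched successfully
	MaxAge    int64 // the policy's "max_age", in seconds
}

// expiresAt is when the cached policy stops being usable.
func expiresAt(p policyTimes) int64 { return p.FetchedAt + p.MaxAge }

// nextRefresh returns when to attempt the next background fetch.
// lastAttempt is the time of the most recent refresh attempt (pass
// FetchedAt if none has been made since the last successful fetch).
// Assumption, not from the draft: first try once two thirds of max_age
// has elapsed; if that attempt failed, retry every retryEvery seconds,
// never scheduling past expiry.
func nextRefresh(p policyTimes, lastAttempt, retryEvery int64) int64 {
	first := p.FetchedAt + p.MaxAge*2/3
	if lastAttempt < first {
		return first
	}
	next := lastAttempt + retryEvery
	if exp := expiresAt(p); next > exp {
		return exp
	}
	return next
}
```

Leaving the last third of the lifetime for retries is what narrows the "gap" Viktor describes: a short HTTPS outage near the refresh point no longer lets the policy lapse.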

> Here, as discussed on the list, serious consideration should be given
> to changing the semantics from validating the MX hostname to specifying
> the names allowed in the server's certificate.  This simplifies MX
> processing (which remains unmodified) and merely changes the conditions
> under which an MX host is considered suitably authenticated per the policy.
>
> This, for example, allows the MX hosts to share a single certificate
> with the destination domain (and not the MX hostname) as its DNS SAN.
> If the policy lists that (shared) name as what's expected in the
> certificate, then authentication succeeds when the MX host's certificate
> matches one of the allowed names.
>

That's interesting. I think it's not actually that big a change: we just
get rid of the MX host filtering and instead apply the same logic to
certificate validation.

Downsides:

1. Misconfigurations _may_ be less obvious without connecting to the SMTP
server. That is, today, you can spot some kinds of misconfigurations by
merely looking at the DNS (so for example my test tool here
http://sts.x.af0.net/ first looks at DNS, and only then tries SMTP
connections).
2. As you say, it requires some changes to server certificate validation,
but not much.

I don't consider #1 compelling. So this seems fairly reasonable to me, but
I will let others chime in.

I've been quite hesitant to entertain any "business logic" changes at this
point if we could avoid it, since we already have some code written, but
like I said, I don't think this is actually a very large change.

>
> It should be made clear that the names here are always in A-label form
> and not U-label form (as they also are in DNS names in certificates).
>

This is specified in 3.2.

> Furthermore, I think that using "*.example.com" in what are essentially
> (rfc6125) reference identifiers creates confusion with similar-looking, but
> semantically distinct wildcard names in certificates.  My suggestion was
> ".example.com" for sub-domain matching, with the policy optionally
> specifying whether only single-label prefixes are accepted, or whether in
> fact multi-label prefixes are also valid.
>

Well, I think our matching is a subset of the matching defined in RFC6125
section 6.4.3. We could specify that we support the full policy; it seemed
worth a simplification (i.e. we don't support 'something*.domain.com', only
'*.domain.com') to make implementation easier, but at the cost of a semantic
difference. If we get rid of the wildcard, I suppose it is indeed more
obvious that we're doing only suffix matching. Was that your point?
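To make the leading-dot proposal concrete, here is a minimal Go sketch of reference-identifier matching: an entry like "mx1.example.com" matches only itself, while ".example.com" matches names below example.com, with a flag choosing whether only single-label prefixes are accepted. The function name and the flag are hypothetical, and names are assumed to already be A-labels.

```go
package main

import "strings"

// matchesReference reports whether hostname satisfies the policy entry ref.
// Hypothetical semantics, following the ".example.com" proposal:
//   - "mx1.example.com" matches only that exact name;
//   - ".example.com" matches any name strictly below example.com;
//     if singleLabel is true, exactly one extra label is allowed.
//
// Both inputs are assumed to be A-labels; comparison is ASCII case-insensitive.
func matchesReference(hostname, ref string, singleLabel bool) bool {
	host := strings.ToLower(strings.TrimSuffix(hostname, "."))
	ref = strings.ToLower(strings.TrimSuffix(ref, "."))
	if !strings.HasPrefix(ref, ".") {
		return host == ref // no leading dot: exact match only
	}
	if !strings.HasSuffix(host, ref) {
		return false // the leading dot also enforces a label boundary
	}
	prefix := strings.TrimSuffix(host, ref)
	if prefix == "" || strings.HasSuffix(prefix, ".") {
		return false // no label before the suffix, or an empty label
	}
	return !singleLabel || !strings.Contains(prefix, ".")
}
```

Because no "*" appears anywhere, the syntax cannot be confused with certificate wildcards, which is the distinction Viktor is after.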

> FWIW, I would have chosen something less fancy than JSON for
> what is clearly a simple array of attribute/value pairs. ESMTP
> already encodes such optional A/V pairs as space separated lists
> of attr=value with xtext encoding as needed (it is not needed for
> any plausible value of the above attributes).
>

I seem to recall that we started with simple key-value pairs and moved to
JSON because people variously seemed to think this was easier (since
parsing libraries are common). I would also consider the vague possibility
that in future iterations we wish to add fields.

De novo, I myself have no real opinion here, because certainly either would
work, but of course I'm loath to make changes again. I'd appreciate feedback
from more readers, though, as these style discussions are always a bit
subjective. I assume that for Postfix policy fetching is a background
process anyway, so it isn't (I would hope) too big an
implementation difference for you, right?

> We're not going to add a JSON decoder to the Postfix SMTP client,
> but if this specification retains JSON, then the code that does
> background policy retrieval will need to transform JSON into
> something more friendly to traditional C-based MTA implementations.
>
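As an illustration of the transformation Viktor mentions, a background fetcher written in Go could decode the policy body with the standard encoding/json package and emit a flat, space-separated attr=value line for a C-based SMTP client. The struct mirrors the fields in the draft's example policy (section 10.1); the output format, including repeating "mx" once per host, is a hypothetical choice.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stsPolicy mirrors the JSON fields used in the draft's example policy.
type stsPolicy struct {
	Version string   `json:"version"`
	Mode    string   `json:"mode"`
	MX      []string `json:"mx"`
	MaxAge  int64    `json:"max_age"`
}

// flattenPolicy decodes a JSON policy body and renders it as one
// space-separated line of attr=value pairs (a hypothetical format).
// No xtext encoding is applied; as noted above, plausible values of
// these attributes never need it.
func flattenPolicy(body []byte) (string, error) {
	var p stsPolicy
	if err := json.Unmarshal(body, &p); err != nil {
		return "", err
	}
	parts := []string{"version=" + p.Version, "mode=" + p.Mode}
	for _, mx := range p.MX {
		parts = append(parts, "mx="+mx)
	}
	parts = append(parts, fmt.Sprintf("max_age=%d", p.MaxAge))
	return strings.Join(parts, " "), nil
}
```

For the example policy this yields "version=STSv1 mode=report mx=mx1.example.com mx=mx2.example.com max_age=123456", which a C client can split on spaces and "=" without a JSON library.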
> > 4.  Policy Validation
>
> Are we validating the policy or using the policy to authenticate
> the MX host?  I think the name of this section is suboptimal.
>
> >    When sending to an MX at a domain for which the sender has a valid
> >    and non-expired SMTP MTA-STS policy, a sending MTA honoring SMTP STS
> >    MUST validate:
>
> s/validate/check/ or /ensure/.
>
> >    1.  That the recipient MX matches the "mx" pattern from the recipient
> >        domain's policy.
>
> See above, perhaps "mx" becomes "san" and the check is deferred until
> it is time to verify the host's certificate.
>
> >    2.  That the recipient MX supports STARTTLS and offers a valid PKIX
> >        based TLS certificate.
>
> Here, it would be prudent to mention rfc6125, and be explicit about
> the acceptable wildcard patterns in that certificate.  See, for example:
>
>    https://tools.ietf.org/html/rfc7672#section-3.2.3
>
> which limits certificate wildcards to the entire first label only.  I
> would like to avoid supporting more general wildcards in new
> specifications.
>

We specify this in 4.1 (as you discovered below), but maybe that's too
late.
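The certificate-side restriction cited from RFC 7672 section 3.2.3 (a wildcard only as the entire first label) can be sketched in Go as below. This is a minimal illustration, not a complete RFC 6125 implementation. The function name is hypothetical, names are assumed to be A-labels, and a reference name containing "*" is simply rejected, which sidesteps the wildcard-on-both-sides question raised later in the thread.

```go
package main

import "strings"

// matchPresented reports whether a DNS-ID presented in a certificate matches
// the reference name being checked, following the RFC 7672 section 3.2.3
// restriction: the only wildcard allowed is a leftmost label that is exactly
// "*", and it matches exactly one label. Partial-label wildcards such as
// "mx*.example.com" never match. Reference names are treated as literal
// hostnames, so a reference containing "*" is rejected. Both names are
// assumed to be A-labels; comparison is ASCII case-insensitive.
func matchPresented(presented, reference string) bool {
	p := strings.ToLower(strings.TrimSuffix(presented, "."))
	r := strings.ToLower(strings.TrimSuffix(reference, "."))
	if strings.Contains(r, "*") {
		return false
	}
	if !strings.Contains(p, "*") {
		return p == r
	}
	rest, ok := strings.CutPrefix(p, "*.")
	if !ok || strings.Contains(rest, "*") {
		return false // wildcard is not a whole first label, or appears twice
	}
	label, parent, found := strings.Cut(r, ".")
	return found && label != "" && parent == rest
}
```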

>
> >    This section does not dictate the behavior of sending MTAs when
> >    policies fail to validate; in particular, validation failures of
> >    policies which specify "report" mode MUST NOT be interpreted as
> >    delivery failures, as described in the section _Policy_
> >    _Application_.
>
> The first sentence is unfortunate: you are in fact specifying behaviour
> in the very next sentence.  It might be better to add a forward-reference
> to a section that provides a detailed explanation and not start with a
> disclaimer.
>

Heh. Thanks. ;)

> > 4.1.  MX Matching
> >
> >    When delivering mail for the Policy Domain to a recipient MX host,
> >    the sender validates the MX match against the "mx" pattern from the
> >    applied policy.  The semantics for these patterns are those found in
> >    section 6.4 of [RFC6125].
>
> Assuming this text survives, replace:
>
>         validates the MX match against the "mx" pattern
>
> with something like
>
>         ensures that the MX host name is consistent with the "mx"
>         property of the STS policy (only matching MX hosts are
>         used in delivery attempts).
>
> The reference to rfc6125 is rather unfortunate here, since it defines
> semantics for presented identifiers, not reference identifiers.
>

If this field is used to validate the server certificate (and not the MX
hostnames) as you suggested above, that would entail validating the
presented identifier, which I think resolves your comment.

> Here, if the proposal to switch from "mx" to "san" is taken up, the
> peer certificate is no longer restricted to authenticate the MX hostname;
> rather, one of its presented identifiers needs to match one of the reference
> identifiers from the "san" policy attribute.  Either way some discussion
> is appropriate of how to perform matching when both the reference and
> the presented identifiers are "wildcards".
>

Ah, good point--this is a bit of additional complexity implied by your
suggested change.

One option would be to merely disallow wildcards in the "mx" (or "san")
field. Assuming people do not have full (non-wildcarded) hostnames in their
server certificates, this should work fine. (IOW, for Google, we would
change "*.aspmx.l.google.com" to "mail.google.com" or some other suitable
name.) Upside: it is easy to use existing validation code to check (e.g.
https://golang.org/src/crypto/tls/common.go?s=12153:20652#L426). Downside:
I guess this would be cumbersome if someone for some reason has a fully
specified hostname as the CN/SAN in their server cert.

Another: Specify our own matching rules for wildcards on both identifiers.
It's not really that complicated, but I don't favor making people write
this and possibly make mistakes.

> >    When a message fails to deliver due to an "enforce" policy, a
> >    compliant MTA MUST check for the presence of an updated policy at the
> >    Policy Domain before permanently failing to deliver the message.
> >    This allows implementing domains to update long-lived policies on the
> >    fly.
>
> This seems to suggest the contrary, i.e. that it might be appropriate
> to bounce on policy failure when a more fresh policy still fails.
> However certificate problems (expiration most typically) are transient,
> and a receiving system operator should/may notice the problem before
> outstanding messages expire and bounce.  So I would urge implementors
> to queue, rather than bounce, on authentication failure.
>

Rather, the point of this was to encourage implementors to _not_ bounce
until at least updating the policy once, not to encourage them to bounce
_after_ such an update. So I think we agree, but the wording may be
suboptimal.
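A minimal Go sketch of the behaviour both sides seem to agree on: a report-mode failure never stops delivery, and an enforce-mode failure triggers one policy refresh and, if the MX still fails, leaves the message queued rather than bouncing it. The mode/action types and the refreshAndRecheck callback are hypothetical; the callback stands in for re-reading the TXT record and fetching a new policy when its "id" has changed.

```go
package main

// mode is the STS policy mode.
type mode string

const (
	modeReport  mode = "report"
	modeEnforce mode = "enforce"
)

// action is what the sender does with a delivery attempt.
type action int

const (
	deliver       action = iota // proceed with delivery
	deliverReport               // proceed, but record the failure for reporting
	deferRetry                  // leave the message queued and retry later
)

// onPolicyCheck decides what to do after checking an MX against the cached
// policy (ok is the result of that check). Any mode other than "report" is
// treated as enforce here. An enforce failure never bounces in this sketch:
// it defers, and the normal queue lifetime decides whether the message
// eventually bounces.
func onPolicyCheck(m mode, ok bool, refreshAndRecheck func() bool) action {
	switch {
	case ok:
		return deliver
	case m == modeReport:
		return deliverReport // report-mode failures are not delivery failures
	case refreshAndRecheck():
		return deliver // a newer policy accepts this MX
	default:
		return deferRetry
	}
}
```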

> Step 0:  If the sending MTA is DANE-capable, the destination is
> DNSSEC signed, and one or more of the MX hosts have DANE TLSA
> records, then DANE TLSA preempts STS (more downgrade-resistant,
> works on first contact, ...).
>
> This is another reason to avoid STS-based MX filtering: with DNSSEC-signed
> destinations the security of the MX records is sufficiently
> established, and we need to be able to proceed with DANE MX by MX,
> using STS policy only for MX hosts with no TLSA records.
>

With the suggested change to use STS only to validate the server
certificates and not to do MX filtering, this is obviated, no? I don't
really favor specifying whether senders should do an AND or an OR in
applying DANE and STS when both are present, and with that change it no
longer implies any incompatibilities in MX selection.
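Viktor's "Step 0" can be sketched per MX host in Go as follows. This illustrates his proposal (DANE where TLSA records exist, STS only for the remaining MX hosts), not text the draft currently contains. The types, field names and the three-way authMethod classification are hypothetical.

```go
package main

// authMethod says how a given MX host's TLS session will be authenticated.
type authMethod string

const (
	authDANE authMethod = "dane" // TLSA records validated via DNSSEC
	authSTS  authMethod = "sts"  // PKIX validation plus the STS policy
	authNone authMethod = "none" // opportunistic TLS only
)

// mxHost is a hypothetical summary of what the sender learned about one MX.
type mxHost struct {
	Name    string
	HasTLSA bool // a DNSSEC-validated TLSA RRset was found for this host
}

// chooseAuth applies "Step 0": when the sender validates DNSSEC and the
// destination's MX RRset is signed, DANE takes precedence host by host,
// and the STS policy is consulted only for MX hosts without TLSA records.
// haveSTS reports whether a valid, unexpired STS policy is cached.
func chooseAuth(daneCapable, mxSigned, haveSTS bool, hosts []mxHost) map[string]authMethod {
	out := make(map[string]authMethod, len(hosts))
	for _, h := range hosts {
		switch {
		case daneCapable && mxSigned && h.HasTLSA:
			out[h.Name] = authDANE
		case haveSTS:
			out[h.Name] = authSTS
		default:
			out[h.Name] = authNone
		}
	}
	return out
}
```

Note that no MX host is filtered out here; the policy only decides how each host is authenticated, which is what makes the per-host mix possible.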

>
> >    5.  Upon message retries, a message MAY be permanently failed
> >        following first checking for the presence of a new policy (as
> >        indicated by the "id" field in the "mta-sts" TXT record).
>
> I think this is a mistake.  Please do not turn transient errors into
> permanent failures.
>

See above; the intent is not to specify that this should be a perm-fail,
but rather that one should not perm-fail without first checking for a new
policy. I agree that the wording needs to change, however.

> Finally, some discussion is appropriate of the fact that given the weaker
> downgrade-resistance of STS vis-à-vis DANE, implementations that support
> both should apply DANE first and STS only in its absence.  Indeed with
> the MX RRset validated via DNSSEC the "mx" policy element is not really
> needed to prevent MX record forgery; however, that depends on the outcome
> of the "mx" vs. "san" discussion.
>
> There should probably be a recommendation that sending systems implement
> DANE in addition to STS.  It is much easier to use a validating resolver
> than to sign one's own domain, and so the barrier to adoption of outbound
> DANE is significantly lower.  I'll have some additional implementation
> news to report shortly...
>

I'm good with putting that here (unlike my objection above); it seems
appropriate to note the tradeoffs in Security Considerations in some form.

>
> > 10.1.  Example 1
> >
> >    The owner of "example.com" wishes to begin using STS with a policy
> >    that will solicit reports from receivers without affecting how the
> >    messages are processed, in order to verify the identity of MXs that
> >    handle mail for "example.com", confirm that TLS is correctly used,
> >    and ensure that certificates presented by the recipient MX validate.
> >
> >    STS policy indicator TXT RR:
> >
> >         mta-sts.example.com.  IN TXT "v=STSv1; id=20160831085700Z;"
> >
> >    STS Policy JSON served as the response body at [1]
> >
> >               {
> >                 "version": "STSv1",
> >                 "mode": "report",
> >                 "mx": ["mx1.example.com", "mx2.example.com"],
> >                 "max_age": 123456
> >               }
>
> Note, the "max_age" here is barely over a day, which is not consistent with
> the recommended policy lifetimes.  Even "report" policies should probably
> have better downgrade protection through longer lifetimes (say a week or
> more).
>

I think that max_age was chosen because of the sequential nature of the
numbers and not its value. ;) But point taken.
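For scale, a small Go check of the arithmetic: a max_age of 123456 seconds is 34h17m36s (about 1.4 days), while the week suggested above is 604800 seconds. The helper and constant names are illustrative only.

```go
package main

import "time"

// maxAgeDuration converts a policy's "max_age" (in seconds) to a Duration.
func maxAgeDuration(seconds int64) time.Duration {
	return time.Duration(seconds) * time.Second
}

// oneWeek is the lifetime suggested in the review as a reasonable minimum.
const oneWeek = 7 * 24 * time.Hour
```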

> Should that be "complaint" or "compliant"? :-)  I would expect that
> sane implementations will use cached policies first, and only bother
> with refreshes based on TXT record changes or refresh timers, with updates
> in the background.  Synchronous updates of policy at message delivery
> time are exceedingly unappealing, at least to me.
>

It's presented as such here only for ease of following the logic, not to
suggest that one should prefer a synchronous fetch.
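The non-blocking pattern described above might look roughly like the Go sketch below: lookups return whatever policy is cached and, when the TXT record's "id" differs, queue a background fetch instead of fetching synchronously. The policyCache type is hypothetical, stores only the policy "id" to stay short, and leaves out the HTTPS fetcher that drains the queue.

```go
package main

import "sync"

// policyCache is a hypothetical cache keyed by policy domain. Lookups never
// block on HTTPS: they return the cached entry (if any) and, when the
// observed TXT "id" differs from the cached one, queue a background fetch.
type policyCache struct {
	mu      sync.Mutex
	ids     map[string]string // domain -> "id" of the cached policy
	pending map[string]bool   // domains with a fetch already queued
	queue   chan string       // drained by a background fetcher (not shown)
}

func newPolicyCache(queueLen int) *policyCache {
	return &policyCache{
		ids:     make(map[string]string),
		pending: make(map[string]bool),
		queue:   make(chan string, queueLen),
	}
}

// lookup returns the cached policy id for domain and whether one exists.
// txtID is the "id" seen in the domain's TXT record for this delivery.
func (c *policyCache) lookup(domain, txtID string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	id, ok := c.ids[domain]
	if (!ok || id != txtID) && !c.pending[domain] {
		select {
		case c.queue <- domain:
			c.pending[domain] = true
		default: // queue full: try again on a later delivery
		}
	}
	return id, ok
}

// store records a successfully fetched policy; called by the fetcher.
func (c *policyCache) store(domain, id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.ids[domain] = id
	delete(c.pending, domain)
}
```

On first contact the lookup returns nothing and delivery proceeds without a policy, which is exactly the first-contact exposure discussed earlier in the thread.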

> > func certMatches(connection, mx) {
> >   // Return if the server certificate from "connection" matches the "mx" host.
>
> This could use a detailed discussion of what "matches" means, or a
> back-reference to a section that does.  Covering wildcard use in
> either or both of the reference and presented identifiers, and
> possible additional text if "mx" becomes "san".
>

As discussed above, I'm not a huge fan of doing
wildcards-on-both-sides...but let's conclude on that discussion first.

>
> What's happening with "status" here?  I think the first assignment goes...
>

Man, who wrote this code? ;)

> There are potential complications when an MX host
> accepts a proper subset of the envelope recipients
> and tempfails the rest, or fails for reasons unrelated
> to policy, ... So a simple boolean here is not enough.
> Delivery to backup MX hosts continues so long as
> recipients that are neither delivered nor permanently
> rejected remain.  STS policy failure is just one way
> that (all the) recipients may remain unresolved.
>

Well, the goal is not to implement a complete SMTP server here in
pseudocode. ;)

For clarity I'd probably keep that elided, unless you think it introduces
confusion.
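For readers who want the fuller picture Viktor describes, here is a hedged Go sketch of per-recipient bookkeeping: each MX attempt may resolve some recipients and leave others pending, and delivery moves on to the next MX only while recipients remain unresolved. An STS policy failure is modelled as an attempt that resolves nobody. The attemptMX callback and the state names are hypothetical.

```go
package main

// rcptState tracks one envelope recipient across MX attempts.
type rcptState int

const (
	pending   rcptState = iota // not yet resolved; try the next MX
	delivered                  // accepted by some MX
	rejected                   // permanently rejected (5xx)
)

// attemptMX is a hypothetical delivery attempt against one MX host. It
// returns a new state for each recipient it resolved; recipients it leaves
// out (tempfailed, or an STS policy failure) stay pending.
type attemptMX func(mx string, rcpts []string) map[string]rcptState

// deliverAll walks the MX list in preference order and stops once no
// recipient is pending. Anything still pending afterwards stays queued.
func deliverAll(mxs, rcpts []string, try attemptMX) map[string]rcptState {
	state := make(map[string]rcptState, len(rcpts))
	for _, r := range rcpts {
		state[r] = pending
	}
	for _, mx := range mxs {
		var todo []string
		for _, r := range rcpts {
			if state[r] == pending {
				todo = append(todo, r)
			}
		}
		if len(todo) == 0 {
			break
		}
		for r, s := range try(mx, todo) {
			// Only update recipients we know about that are still pending.
			if cur, ok := state[r]; ok && cur == pending {
				state[r] = s
			}
		}
	}
	return state
}
```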

Attachment: smime.p7s

[Uta] MTA-STS-03 review Viktor Dukhovni
Re: [Uta] MTA-STS-03 review Daniel Margolis
Re: [Uta] MTA-STS-03 review Viktor Dukhovni
Re: [Uta] MTA-STS-03 review Daniel Margolis
Re: [Uta] MTA-STS-03 review Jeremy Harris
Re: [Uta] MTA-STS-03 review Daniel Margolis
Re: [Uta] MTA-STS-03 review David Illsley
Re: [Uta] MTA-STS-03 review Viktor Dukhovni
Re: [Uta] MTA-STS-03 review Alberto Bertogli
Re: [Uta] MTA-STS-03 review Viktor Dukhovni
Re: [Uta] MTA-STS-03 review Daniel Margolis
Re: [Uta] MTA-STS-03 review Alberto Bertogli
Re: [Uta] MTA-STS-03 review Viktor Dukhovni
Re: [Uta] MTA-STS-03 review Daniel Margolis
Re: [Uta] MTA-STS-03 review Viktor Dukhovni
Re: [Uta] MTA-STS-03 review Daniel Margolis
Re: [Uta] MTA-STS-03 review Viktor Dukhovni
Re: [Uta] MTA-STS-03 review Daniel Margolis

Re: [Uta] MTA-STS-03 review

Attachment: smime.p7s