Re: [dmarc-ietf] indeterminisim of ARC-Seal b= value

Brandon Long <blong@google.com> Thu, 30 March 2017 23:00 UTC

In-Reply-To: <CAL0qLwbYCD2nsj62HxjqZ=Wt5oK8W8kbTJ+H5GiMN5M7rMSRAA@mail.gmail.com>
References: <CANtLugO_D1Mz_v_341pc5O1mZ7RhOTrFA3+Ob5-onp72+5uRfA@mail.gmail.com> <20170324212304.85346.qmail@ary.lan> <CANtLugOK4tXqA3ztYwchYsc8+t6KhyNj6mvgEu2wzvwKm_rK7A@mail.gmail.com> <CAL0qLwbYCD2nsj62HxjqZ=Wt5oK8W8kbTJ+H5GiMN5M7rMSRAA@mail.gmail.com>
From: Brandon Long <blong@google.com>
Date: Thu, 30 Mar 2017 16:00:19 -0700
Message-ID: <CABa8R6us82aKUozO-kPfdeBXNTM-8GC8nGCM-b8zhkRCDni9JQ@mail.gmail.com>
To: "Murray S. Kucherawy" <superuser@gmail.com>
Cc: Gene Shuman <gene@valimail.com>, "dmarc@ietf.org" <dmarc@ietf.org>, John Levine <johnl@taugh.com>
Content-Type: multipart/alternative; boundary="001a11443ef63e86a4054bfaa858"
Archived-At: <https://mailarchive.ietf.org/arch/msg/dmarc/oon6ZtV6y_vzmkVX-MFoS4TvYYA>
Subject: Re: [dmarc-ietf] indeterminisim of ARC-Seal b= value

On Thu, Mar 30, 2017 at 1:21 AM, Murray S. Kucherawy <superuser@gmail.com>
wrote:

> I have to catch up on the rest of this thread still, but I wanted to chime
> in here to get started:
>
> On Sun, Mar 26, 2017 at 5:23 PM, Gene Shuman <gene@valimail.com> wrote:
>
>> Ah that had slipped my mind & is a good point.  However, I think the
>> issue here is generally that ARC is a more complex protocol than DKIM and
>> therefore it's more important to reduce ambiguity & increase
>> standardization while we have the chance.  I think this is generally a good
>> idea from a security perspective, however this is mostly relevant with
>> respect to testing & validation, as ensuring cross-compatibility is a much
>> bigger challenge.  It's even more important than it was with DKIM to have a
>> test suite that can verify signing behavior.  If we don't agree on any sort
>> of standard, a test suite will need to select a preferred format for the
>> ARC headers & will fail all implementations that don't meet this form.
>> We've discussed this with Murray, and he agrees with this conclusion.
>>
>
> I agree that it's impossible to design a signer test suite that confirms
> correct output, without doing crypto checks of its own with a known-good
> verifier, unless we nail down the syntax the output will use.  For this to
> work, we'd have to mandate a lot of things, including at least these:
>
> - the order of the tags as presented to the hash algorithm
>

This could mean two different things: either "alphabetize them in the
header", as was done for the current test suite, or a canonicalization
step applied when the header is passed to signing.  The latter seems
unlikely to be what we'd want.  Given that we have to sign a whole bunch
of ARC headers, we probably don't want to interpret each one before
signing it; signing the string as-is is easier.
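
To make the first reading concrete, here is a minimal sketch of "fix the
tag order in the header itself".  The function names and the sample
ARC-Seal value are hypothetical, and nothing here is taken from the test
suite or any implementation.  It only shows that the two orderings give
different strings, and therefore different signed bytes when the string is
signed as-is.

```python
def parse_tags(value):
    """Split a DKIM-style tag list into (name, value) pairs, keeping order."""
    pairs = []
    for item in value.split(";"):
        if not item.strip():
            continue  # tolerate a trailing ';'
        # Split on the first '=' only, so base64 padding in values survives.
        name, _, val = item.partition("=")
        pairs.append((name.strip(), val.strip()))
    return pairs


def emit_alphabetized(pairs):
    """Write the tags in plain alphabetical order."""
    return "; ".join(f"{n}={v}" for n, v in sorted(pairs))


def emit_instance_first(pairs):
    """Write i= first, then the remaining tags alphabetically."""
    ordered = sorted(pairs, key=lambda p: (p[0] != "i", p[0]))
    return "; ".join(f"{n}={v}" for n, v in ordered)


# Hypothetical seal with an empty b=, as it looks before signing.
seal = "i=1; cv=none; a=rsa-sha256; d=example.org; s=sel; t=12345; b="
tags = parse_tags(seal)
alphabetized = emit_alphabetized(tags)
instance_first = emit_instance_first(tags)
```

Either form is easy for a signer to produce; the point is only that a
test suite comparing output byte for byte has to pick exactly one.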

I admit that I immediately disliked the alphabetized list upon seeing it.
My main issue was with not having i= at the front; the rest just seemed
ugly (i.e., b/bh tend to be at the end in implementations because they are
long).  That's probably not a valid argument against it, though I do think
that i= at the front is useful for visual grouping, since you don't have
to scan the full header to find it.


> - which tags will be present (note that many are not required, including
> "t=")
> - the specific values they will all contain
> - for ARC-Message-Signature, which canonicalization will be used
>

Does this also mean that my signer has to support all four combinations of
canonicalization?  Right now, it only does relaxed/relaxed.  I have no
intention of ever generating anything with simple; I could just fail all
of those tests, I guess.


> - the spacing between the tags; since "relaxed" header canonicalization
> compresses spaces but does not add them, "a=foo;b=foo" is not the same as
> "a=foo; b=foo", but "a=foo;\r\n\tb=foo" is
> - similarly, how signature fields will be wrapped (if at all)
>

Agreed.  I think some of this also comes down to what implementers think
looks good.  I also think dkimpy uses certain spacing so that it can wrap
more easily, as opposed to, say, Google's impl, which hard-codes most of
it and has more specific, hard-coded wrapping.
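
The spacing point can be checked with a small sketch of "relaxed" header
canonicalization, following the steps in RFC 6376, section 3.4.2.  The
function name is hypothetical, and the value is assumed to be passed
without its trailing CRLF.

```python
import re


def relaxed_header(name, value):
    """Apply DKIM "relaxed" header canonicalization to one header field.

    `value` is the raw field value, possibly folded, without the final CRLF.
    """
    value = re.sub(r"\r\n(?=[ \t])", "", value)  # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value)        # collapse WSP runs to one SP
    value = value.strip(" ")                     # drop WSP after ':' and at end
    return f"{name.strip().lower()}:{value}\r\n"


folded = relaxed_header("ARC-Seal", "a=foo;\r\n\tb=foo")
spaced = relaxed_header("ARC-Seal", "a=foo; b=foo")
packed = relaxed_header("ARC-Seal", "a=foo;b=foo")
```

Here `folded` and `spaced` come out identical, while `packed` differs
from both, because canonicalization shrinks whitespace but never inserts
it.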


> - what signing key will be used
> - the body content to be signed
>
> - the header content to be signed
>

These are implied by the test already, so no big deal.


> - the set of header fields that will be signed (which becomes "h=")
>

And the order in which they're signed.  For example, our DKIM impl,
because it was also a DomainKeys impl, can choose headers in either
direction, but it actually chooses them "backwards" for DKIM, which
requires a fairly nasty loop that is also worst case (O(n^2)).  For ARC,
I chose to pick them bottom up, which matches the search order and means
the dual loop is usually O(n) instead of O(n^2).  Granted, it's a silly
optimization, and it doesn't help with the other implementations we also
have to handle.
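
As an illustration of bottom-up selection (a sketch only, not the Google
code described above), one way to avoid rescanning the header list for
every h= entry is to index the headers once and then pop the bottom-most
unused instance for each name.  The function name and sample headers are
hypothetical.

```python
from collections import defaultdict


def select_signed_headers(headers, h_names):
    """Pick header instances for the names in h=, bottom-most instance first.

    headers: list of (name, value) in message order, top to bottom.
    h_names: field names in the order they appear in h=.
    An h= entry with no remaining instance selects nothing.
    """
    by_name = defaultdict(list)
    for index, (name, _) in enumerate(headers):
        by_name[name.lower()].append(index)  # ascending, so last is bottom-most

    selected = []
    for name in h_names:
        remaining = by_name[name.lower()]
        if remaining:
            selected.append(headers[remaining.pop()])
    return selected


message = [
    ("Received", "second hop"),
    ("Received", "first hop"),
    ("From", "a@example.org"),
    ("Subject", "hi"),
]
picked = select_signed_headers(
    message, ["from", "received", "received", "subject", "subject"]
)
```

This runs in one pass over the headers plus one pass over h=, at the
cost of building the index first.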

dkimpy just has a fixed list of fields, and adds them mostly in order.
There's also the choice of whether to include a header at all: we removed
Content-Type from our DKIM list at one point while trying to work around
Exchange's DKIM-breaking behavior, for example.  Some implementations
double-add Subject etc. to prevent duplicates; others don't.
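
A rough sketch of how these choices show up in h= follows.  The field
list, the `oversign` parameter, and the function name are all
hypothetical, and this is not how dkimpy or any other implementation
mentioned here actually does it.

```python
def build_h_list(headers, sign_fields, oversign=()):
    """Build the list of names that would go into h=.

    headers: list of (name, value) in message order.
    sign_fields: fixed list of field names the signer is willing to cover.
    oversign: names that get one extra h= entry, so that a header of that
        name added after signing would break the signature.
    """
    present = [name.lower() for name, _ in headers]
    oversign = {name.lower() for name in oversign}
    h_list = []
    for field in sign_fields:
        field = field.lower()
        count = present.count(field)  # one entry per existing instance
        if field in oversign:
            count += 1                # extra entry guards against additions
        h_list.extend([field] * count)
    return h_list


msg = [("From", "a@example.org"), ("Subject", "hi"), ("Content-Type", "text/plain")]
with_ct = build_h_list(msg, ["From", "Subject", "Date", "Content-Type"],
                       oversign=["Subject"])
without_ct = build_h_list(msg, ["From", "Subject", "Date"])
h_tag = ":".join(with_ct)
```

Two signers that differ only in `sign_fields` or `oversign` produce
different h= values for the same message, which is exactly what a
byte-for-byte signer test would have to pin down.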

We can probably come to agreement on all of these things... well, maybe.
I don't know how many potential implementations we'd curtail by doing so.
Or maybe some of these won't be specified in the spec, but only for the
test suite, so they'll be options for signers but not requirements (e.g.,
the list of headers to protect against duplicates should be N for testing).

There's also the fact that the current drafts basically incorporate all of
this from DKIM without specifying it; if we do otherwise, we'll need a lot
more exposition.  We'll also most likely end up with things that aren't
specified explicitly in the spec but are required by the test suite (which
may be better than the alternative: something ambiguous in the spec that
isn't made explicit anywhere else).

Also, I'm not clear on the statement that these aren't relying on DKIM
libraries.  The dkimpy ARC impl clearly depends on code shared with its
DKIM impl, I think Paul mentioned that the AOL one re-used code from
their Java DKIM impl, and I know the Google impl shares a bunch of code.
I don't think any of this precludes these changes, but they aren't free
either.

Brandon