Re: [Uta] Consensus call for proposed changes to draft-ietf-uta-rfc6125bis-10

Patrik Fältström <paf@paftech.se> Fri, 10 February 2023 07:50 UTC

From: Patrik Fältström <paf@paftech.se>
To: Peter Saint-Andre <stpeter@stpeter.im>
Cc: uta@ietf.org, John C Klensin <john-ietf@jck.com>
Date: Fri, 10 Feb 2023 17:49:43 +1000
Archived-At: <https://mailarchive.ietf.org/arch/msg/uta/9HG70x2K03gDl3LyE5nkKVQwek0>

On 8 Feb 2023, at 12:31, Peter Saint-Andre wrote:

> Thanks for taking the time to provide such a detailed message, and my apologies for the delayed reply. Comments inline.

And my responses...

> On 2/2/23 6:59 AM, Patrik Fältström wrote:
>> On 2 Feb 2023, at 9:58, Peter Saint-Andre wrote:
>>
>>> On 2/1/23 6:17 AM, Corey Bonnell wrote:
>>>
>>>> I think it would be unfortunate if the usage of terms that are defined in
>>>> RFC 5890 is not aligned with their definitions.
>>>>
>>>> If we are not opposed to introducing new terminology to the document, then I
>>>> suggest the following:
>>>>
>>>> 1.	Replace all instances of "A-label" with the term "P-label" from the
>>>> CABF Baseline Requirements [1]: "P-Label: A XN-Label that contains valid
>>>> output of the Punycode algorithm (as defined in RFC 3492, Section 6.3) from
>>>> the fifth and subsequent positions."
>>>> 2.	For U-label:
>>>> 	a. Punt and call it "Unicode representation" instead (this is what
>>>> the CABF Baseline Requirements does, although that may not be appropriate
>>>> for this document).
>>>> 	b. Create a new term that is defined as "A non-LDH label that
>>>> contains valid output of the decoding algorithm for Punycode (as defined in
>>>> RFC 3492, Section 6.2)." and use this new term instead of "U-label".
>>>>
>>>> I'd be happy to work on concrete text to this effect if there's agreement
>>>> this is a good path to resolve the issue.
>>>
>>> I would very much like to hear what John Klensin and Patrik Fältström (cc'd) think about this proposal.
>>>
>>> As noted in my other message <https://mailarchive.ietf.org/arch/msg/uta/92tKoHT3Kjll1o_mCYQYQT8xON4/> I'm not immediately comfortable with referencing a CA/Browser Forum document instead of RFC 5890.
>>>
>>> Having looked at Corey's proposal more closely, I'm doubly unsure because (a) it is not fully clear to me how the P-label construct differs from the A-label construct in RFC 5890 and (b) coming up with new DNS-related terminology in a late-stage document about certificate validation just seems like a bad idea (e.g., I'm not sure how to get proper review) even if it were necessary (which I'm not sure it is).
>>
>> Thanks for being brought into this discussion Peter.
>>
>> I had a read of the document and have these direct comments:
>>
>>>     delegated domain:  A domain name or host name that is explicitly
>>>        configured for communicating with the source domain, either by the
>>>        human user controlling the client or by a trusted administrator.
>>>        For example, an IMAP server at mail.example.net could be a
>>>        delegated domain for a source domain of example.net associated
>>>        with an email address of user@example.net.
>>
>> This might be confusing as it is using the term "delegated" and give indeed an example where "mail.example.net" might (or might not) be delegated from "example.net", while the administrator of an imap server at a specific domain name might have no similarities at all with the MX record of the domain to which email is to be sent to end up in the named IMAP server.
>>
>> So I think a better example is to either use the term "delegated" when it really talks about DNS delegation, OR, you use a different term but have an example where you can have:
>>
>> - IMAP server: imap.example.se.
>> - MX target: mx.example.net.
>> - Email domain: example.com.
>
> Although you might be right that "delegated domain" is less than ideal, it's the term we used in RFC 6125. As a result, a number of specifications that cite RFC 6125 also use the term, so it seems inadvisable to change terminology now.

I talked about two things:

1. The terminology, which seems to be screwed up (which implies the IETF did the wrong thing with RFC 6125). You can only solve that by using more words, so that it is very, very clear what you are talking about, and not "just" pick one of the two definitions of "delegated" without explaining what you mean. I.e., I do not think the term "delegated" can easily be used here. You could just as well reference the DNS RFCs and use that definition of "delegated", right?

2. That you use the same domain names in all cases. That is a simple situation; in reality it is much more complicated, specifically when you start talking about spoofing, phishing, and what strings you really have in the DN in your certificates.

> The original idea was not DNS delegation at the nameserver level, but service delegation at the application level such as one finds in this document (e.g., in order to retrieve email for addresses at example.net, one configures one's email client to connect to the server at imap.example.net).

But that is not how it works. The name of the IMAP server might very well be a domain name not related to the domain that the MX record has as its source.

If your examples are as simple as this, people will start guessing domain names. Not good.

> At the least, it seems reasonable for us to explain this in more detail so that the reader doesn't confuse this perhaps bespoke notion of service delegation with the perhaps more established notion of DNS delegation.

Agreed, you must explain much better what is going on here.

>>>     derived domain:  A domain name or host name that a client has derived
>>>        from the source domain in an automated fashion (e.g., by means of
>>>        a [DNS-SRV] lookup).
>>
>> Also MX?
>
> I don't see why not. If DNS SRV records had existed from the beginning of time, it seems that email protocols would have used SRV rather than MX, right?

My point was that because you refer only to DNS-SRV when defining "derived domain", it is not clear that MX records can also result in a "derived domain". As written, the list of ways to get a derived domain is fixed, with only one method: SRV.

Can a CNAME also result in a derived domain?

If you have a mapping rule for a Unicode Code Point from A -> B, does that count as a "derived domain"?

>> What is then the difference or similarity between an MX related derivation of one domain name from another and an SRV related derivation?
>
> It seems to me that they are functionally equivalent. But I am not a DNS expert or email expert, so (leaving aside various nuances) I might be missing some essential difference.
>
>> Can a delegated domain also be derived?
>
> Not really. The idea is that a delegated domain is explicitly configured client-side whereas a derived domain is obtained in an automated fashion via DNS. So they are two different constructs that play two different roles in protocols.

Definitely not clear in the text.

Specifically, IMAP does have some "de facto" mechanisms out there to configure the client automatically, so "explicitly configured in the client" is not always "explicitly configured" but rather "will become configured when auto-configuration of the IMAP client happens".

So, to be honest, I do not understand what you mean by "explicitly configured client-side" as compared with "derived configuration", if you exclude IMAP. If you do not exclude IMAP, then you have to explain in more detail in which of the IMAP situations you have a derived domain and in which you do not.

See for example <https://www.icdsoft.com/en/kb/view/1698_automatic_e_mail_configuration_autodiscover_autoconfig>, but you can also google auto-configuration of IMAP and see what you find.

>>>     source domain:  The FQDN that a client expects an application service
>>>        to present in the certificate.  This is typically input by a human
>>>        user, configured into a client, or provided by reference such as a
>>>        URL.  The combination of a source domain and, optionally, an
>>>        application service type enables a client to construct one or more
>>>        reference identifiers.
>>
>> I presume you also include domain names that one at a time is created using a search list construction in a DNS stub resolver?
>
> If I understand you correctly, I would say that we have not had a theory about how domain names are created (e.g., using a suffix search list). And it's not clear to me that we need to have such a theory here.

A search list is when you, for example, type in "mail" and your stub resolver says you should try, for example, "mail.example.com" because the search list includes "example.com". So my question is whether mail.example.com is included even if only "mail" is configured. I.e., mail.example.com is not configured; "mail" is.
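
To illustrate what a search list does to a configured name, here is a minimal Python sketch. The function name is hypothetical, and the logic is deliberately simplified: real stub resolvers have further rules (for example "ndots" thresholds) that are ignored here.

```python
def search_list_candidates(name: str, search_list: list[str]) -> list[str]:
    """Return the FQDNs a stub resolver might try for a configured name.

    Simplified illustration only: a name ending in "." is treated as
    already fully qualified; anything else is combined with each suffix.
    """
    if name.endswith("."):
        return [name]
    # Append each search suffix, normalising to a single trailing dot.
    return [f"{name}.{suffix.rstrip('.')}." for suffix in search_list]
```

With a search list of `["example.com"]`, the configured string "mail" yields the candidate "mail.example.com.", which is a name the user never typed and which may or may not be what the certificate check should treat as the source domain.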

>> I.e. what you talk about is really a FQDN?
>
> That is the intent - no bare hostnames or, more generally, no domain names that do not include all labels.

Ok, then say so.

>> I think this is a good thing, but hope people to understand what this implies.
>>
>> I hate search lists and relative domain names.
>>
>>>     The DNS name conforms to one of the following forms:
>>>
>>>     1.  A "traditional domain name", i.e., a FQDN that conforms to
>>>         "preferred name syntax" as described in Section 3.5 of
>>>         [DNS-CONCEPTS] and for which all of its labels are "LDH labels"
>>>         as described in [IDNA-DEFS].  Informally, such labels are
>>>         constrained to [US-ASCII] letters, digits, and the hyphen, with
>>>         the hyphen prohibited in the first character position.
>>>         Additional qualifications apply (refer to the above-referenced
>>>         specifications for details), but they are not relevant here.
>>>
>>>     2.  An "internationalized domain name", i.e., a DNS domain name that
>>>         includes at least one label containing appropriately encoded
>>>         Unicode code points outside the traditional US-ASCII range and
>>>         conforming to the processing and validity checks specified for
>>>         "IDNA2008" in [IDNA-DEFS] and the associated documents.  In
>>>         particular, it contains at least one U-label or A-label, but
>>>         otherwise may contain any mixture of NR-LDH labels, A-labels, or
>>>         U-labels.
>>
>> This is confusing
>
> What specifically do you think is confusing? We tried to get it right, but clearly didn't succeed...

It is confusing because here you start to look at what an A-label is, what a U-label is, etc., and repeat the definitions.

If I were you, I would just say "domain name according to IDNA2008"...and then "...with the following clarifications..."

And then you go through things like:

- How to do comparison
- Whether you allow mapping of characters / series of characters
- How to handle code points not allowed by IDNA2008

>> and it seems people misunderstand the big changed we went through in the IETF from IDNA2003 to IDNA2008.
>>
>> In IDNA2008 we have:
>>
>> - Got rid of mapping, i.e. mapping like case folding is something happening in application layer, and have nothing to do with "domain names".
>> - Have a 1:1 mapping between A-label and U-label.
>> - In theory because of this can have A-label and U-label for domain names that include by IDNA2008 not allowed Unicode code points (or not allowed code point by other policy rules, for example the ones a registry have).
>>
>> I stronly recommend you have similar rules here. Separate potential mapping from comparison of domain names which in turn must be separated from policy for what code points are allowed.
>
> When you say "have similar rules here", are you suggesting that we define such rules outside the context of IDNA2008 (e.g., in a way that would be valid for both IDNA2008 and IDNA2003 + UTS-46?) I think it would be a challenge to get that right and I'm not confident that a document about certificate matching is the correct place to do so.

Good, I like this, you only(!) want to do certificate matching.

Then you should _only_ compare the domain names themselves, and you can explicitly remind people that what is to be compared are the A-labels, and that domain names are always compared in a case-insensitive manner.

Done.
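
As an illustration of how small that comparison is, here is a minimal Python sketch. The function name is hypothetical, and it assumes both names have already been converted to A-label (ASCII) form before they reach the comparison; it does no mapping and no IDNA validity checking.

```python
def dns_names_equal(a: str, b: str) -> bool:
    """Compare two domain names that are already in A-label (ASCII) form."""
    if not (a.isascii() and b.isascii()):
        raise ValueError("expected names in A-label (ASCII) form")

    def labels(name: str) -> list[str]:
        # Drop one trailing dot so "example.com." equals "example.com".
        if name.endswith("."):
            name = name[:-1]
        # Input is ASCII only, so lower() folds exactly A-Z to a-z.
        return [label.lower() for label in name.split(".")]

    # Names match only if they have the same labels in the same order.
    return labels(a) == labels(b)
```

Non-ASCII input is rejected rather than silently converted, so any conversion from U-labels stays a separate, earlier step.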

Now, if you want to do tricks other than comparing domain names, you have to explicitly talk about:

- Mapping from one code point to others: what rules you allow, if any, and whether this can be application specific or not. Note that this mapping happens between the "user interface" and the processing of the domain names in the application. You never ever try to map back again. Mapping is really a one-way transformation. Do NOT repeat the IDNA2003 mistake!

- Whether you allow non-approved code points (i.e. Punycode-encoded strings that include UNASSIGNED or even DISALLOWED code points). This is so that people who have violated IDNA2008 and, for example, have emoji in their domain names can get a certificate.

But you should think very carefully about whether you really, really want to include the mapping part.

>> All of the above can be replaced by just saying that "A domain name is to be compared using case insensitive matching according to what DNS uses, and this because of this include domain names that have A-Labels in them" and reference IDNA2008.
>
> It seems that we should at least say that U-labels need to be converted to A-labels first, no? Or do you think that is implied by referencing the DNS rules (which don't allow U-labels natively)?

As U-labels and A-labels are 1:1, it does not matter. "Comparing domain names" is a well-known algorithm. People should look at the DNS RFCs to understand how to do that. If you reinvent that algorithm you are on thin ice.

But yes, I understand why you would like to write more words. Just be careful.

People might think you are talking about comparing as in "comparing what is displayed", which is something completely different, as display order (for example) is different from the order of characters in a string. Think of Bidi, for example.
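
To make the 1:1 relationship concrete, here is a minimal Python sketch that uses the standard library's "punycode" codec. The function names are hypothetical. It performs only the Punycode step plus the "xn--" prefix; it does none of the IDNA2008 validity checks and no mapping, so it shows the round trip and is not a complete IDNA implementation.

```python
ACE_PREFIX = "xn--"

def to_a_label(u_label: str) -> str:
    # Punycode-encode the Unicode label and add the ACE prefix.
    # No IDNA2008 validation and no mapping is performed here.
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    # Strip the ACE prefix (case-insensitively) and Punycode-decode the rest.
    if not a_label.lower().startswith(ACE_PREFIX):
        raise ValueError("label does not start with the ACE prefix")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
```

For example, the label "bücher" becomes "xn--bcher-kva", and decoding "xn--bcher-kva" gives back "bücher". Both functions work on the stored character order of the string, which is unrelated to how a Bidi label is displayed.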

>> It *might* also include wording about:
>>
>> - If a domain name include unicode characters, and case folding equivalent approximate matching is expected by the client, mapping from one unicode character to another must take place before the A-label is created from the U-label. And reference section 4.2 in RFC 5894.
>
> Thanks for the reminder about that section.
>
>> Do not come up with your own words please!
>
> Agreed.
>
>> - If a domain name include code points that are DISALLOWED according to IDNA2008 or any other policy, for example a registry, it MUST be defined in this document whether it SHOULD be allowed to do a comparison of the domain names or not. If a label include 0x00 bytes for example (which is normally never allowed in any protocol) should such a lable be able to get a "match" when the domain name is to be compared?
>
> It seems like a bad idea to match on DISALLOWED code points! But see below.

I know some people might want to.

>> Please be specific in the general case!
>>
>>>     A wildcard in a presented identifier can only match exactly one label
>>>     in a reference identifier.  Note that this is not the same as DNS
>>>     wildcard matching, where the "*" label always matches at least one
>>>     whole label and sometimes more.  See [DNS-CONCEPTS], Section 4.3.3
>>>     and [DNS-WILDCARDS].
>>
>> Wow, wildcards in DNS is hairy. I know some people knows this, be careful, as wildcards in DNS is very different from (so far) wildcards in certificates.
>
> I believe we included that text only to note that the wildcard matching for certificates is more constrained that for DNS. Do you think that further clarifications are needed?

Note that comparison of DNS records involves not only the label but also the type and class. Then you have rules such as that you cannot have a CNAME and other records at the same name. As a simplified example, this impacts the matching algorithm for wildcards, specifically if you have a wildcard like *.example.com but also a CNAME at foo.example.com. The result is that a search for an A record for foo.example.com does not match the wildcard.

So when you talk about "dns-wildcards" you must think about what you mean and what you want.
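
For contrast with the DNS behaviour, the certificate-side rule quoted above ("can only match exactly one label") can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes that the wildcard is only honoured as the complete leftmost label and that both names are already in ASCII form; neither assumption is meant as a statement of what the draft requires.

```python
def cert_wildcard_matches(presented: str, reference: str) -> bool:
    """Certificate-style matching where "*" stands for exactly one label."""
    # Normalise case and any trailing dot(s), then split into labels.
    p = presented.lower().rstrip(".").split(".")
    r = reference.lower().rstrip(".").split(".")
    # Exactly one label per wildcard means the label counts must be equal.
    if len(p) != len(r):
        return False
    if p[0] == "*":
        # The wildcard covers the leftmost label; the rest must be identical.
        return p[1:] == r[1:]
    return p == r
```

Nothing in this sketch consults record types, classes, or what other names exist in the zone, which is exactly where DNS wildcard processing differs.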

>>>     An IP-ID matches based on an octet-for-octet comparison of the bytes
>>>     of the reference identity with the bytes contained in the iPAddress
>>>     subjectAltName.  Because the iPAddress field does not include the IP
>>>     version, a helpful heuristic for implementors is to distinguish IPv4
>>>     addresses from IPv6 addresses by their length.
>>
>> Why "octet by octet"?
>
> Do you suggest some other text? Specifically do you have in mind "bit by bit" perhaps?

Either the 32/128 bit values match or not. :-)

But I get it... I just thought the wording was weird. You compare the 32- or 128-bit values. Whether that is done octet by octet or 64 bits at a time depends on how your implementation does the comparison on the CPU, right?

>> The field include either a 32 bit or 128 bit field. If what is compared have different length, the match is False. If the length is the same, the values are compared. If they are the same, the match is True, otherwise False.
>
> We were trying to be more precise about what "the same" means, but as we know it can be a challenge to get that right.

Either the 32- or 128-bit values are the same or they are not. You do not have to deal with byte order or endianness either. :-)
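
A minimal Python sketch of that rule, using the standard `ipaddress` module. The function name is hypothetical, and it assumes the reference identity arrives as a textual IP address while the presented value is the raw content of the iPAddress field.

```python
import ipaddress

def ip_id_matches(reference: str, presented: bytes) -> bool:
    """Compare a reference IP address with raw iPAddress bytes."""
    # .packed is 4 bytes for IPv4 and 16 bytes for IPv6, in network order.
    ref = ipaddress.ip_address(reference).packed
    # Different lengths (IPv4 vs. IPv6) can never match.
    if len(ref) != len(presented):
        return False
    # Same length: the values are either identical or they are not.
    return ref == presented
```

How the equality test is carried out underneath (octet by octet or otherwise) is left to the runtime; the outcome only depends on the length and the value.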

>>>     If the identifier is an SRV-ID, then the application service name
>>>     MUST be matched in a case-insensitive manner, in accordance with
>>>     [DNS-SRV].  Note that the _ character is prepended to the service
>>>     identifier in DNS SRV records and in SRV-IDs (per [SRVNAME]), and
>>>     thus does not need to be included in any comparison.
>>
>> Please reference one place in this document where case sensitivity is explained. Do not repeat text.
>
> Noted.
>
>>> 7.3.  Internationalized Domain Names
>>>
>>>     As specified under Section 6, matching of internationalized domain
>>>     names is performed on A-labels, not U-labels.  As a result, potential
>>>     confusion caused by the use of visually similar characters in domain
>>>     names is likely mitigated in certificate matching as described in
>>>     this document.
>>>
>>>     As with URIs and URLs, there are in practice at least two primary
>>>     approaches to internationalized domain names: "IDNA2008" (see
>>>     [IDNA-DEFS] and the associated documents) and an alternative approach
>>>     specified by the Unicode Consortium in [UTS-46].  (At this point the
>>>     transition from the older "IDNA2003" technology is mostly complete.)
>>
>> Not really...it is neither one or the other.
>>
>> The basis for all domain names is what is defined in DNS, and that is IDNA2008.
>>
>> The differences from UTS-46 are specifically two things:
>>
>> - UTS-46 also include rules for mapping that IDNA2008 does not include. The mapping that might be performed according to UTS-46 is "out of scope" for IDNA2008.
>>
>> - What code points are allowed in the ultimate domain name is slightly different.
>>
>> But, we have people using domain names (i.e. in the wild) which are neither allowed in UTS-46 or IDNA2008.
>>
>> And, then there are people using the algorithm in IDNA2008 applied to versions of Unicode that IETF have not approved yet.
>>
>> So, once again, not "either or". It is "a little bit of everything".
>
> I see what you mean. However, that makes it more difficult to specify recommended behavior.
>
> As one example, it seems possible that these differences could lead to someone using domain names in the wild that include DISALLOWED code point (e.g., because the definition of which code points are DISALLOWED can vary across Unicode versions). Thus if we say that applications MUST NOT match on DISALLOWED code points, behavior could be inconsistent.

Correct.

>>>     Differences in specification, interpretation, and deployment of these
>>>     technologies can be relevant to Internet services that are secured
>>>     through certificates (e.g., some top-level domains might allow
>>>     registration of names containing Unicode code points that typically
>>>     are discouraged, either formally or otherwise).  Although there is
>>>     little that can be done by certificate matching software itself to
>>>     mitigate these differences (aside from matching exclusively on
>>>     A-labels), the reader needs to be aware that the handling of
>>>     internationalized domain names is inherently complex and can lead to
>>>     significant security vulnerabilities if not properly implemented.
>>>
>>>     Relevant security considerations for handling of internationalized
>>>     domain names can be found in [IDNA-DEFS], Section 4.4, [UTS-36], and
>>>     [UTS-39].
>
> Does that text seem correct or appropriate?

It is wrong to talk about A-labels in there.

This is about two different applications that compare two strings possibly getting different results depending on whether the two strings are allowed in each application. One application might reject a string while the other allows it. If that is the case, the application that rejects the string might, because of the rejection, claim that the domain names "do not match" even if they do, simply because the only error it can report is "fail": the application does not distinguish between "no match" and "rejected string".
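
One way to picture the distinction is a comparison that returns three outcomes instead of two. The Python sketch below is purely illustrative: the names `MatchResult`, `compare_names`, and the `is_acceptable` policy callback are hypothetical, and the policy itself (which strings an application accepts) is left to the caller.

```python
from enum import Enum
from typing import Callable

class MatchResult(Enum):
    MATCH = "match"
    NO_MATCH = "no match"
    REJECTED = "rejected string"

def compare_names(presented: str, reference: str,
                  is_acceptable: Callable[[str], bool]) -> MatchResult:
    # Policy check first: a rejected string is reported as such,
    # rather than being folded into "no match".
    if not (is_acceptable(presented) and is_acceptable(reference)):
        return MatchResult.REJECTED
    # Plain case-insensitive comparison; A-label form is assumed.
    if presented.lower() == reference.lower():
        return MatchResult.MATCH
    return MatchResult.NO_MATCH
```

Two applications with different `is_acceptable` policies can then disagree visibly (one says "rejected string", the other "match") instead of one of them silently reporting a mismatch.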

> Do you have opinions on Corey's suggestion to use P-labels instead of U-labels and to reference the CA/Browser Forum specifications?
>
> https://mailarchive.ietf.org/arch/msg/uta/r5uJRGUzCC55XH4XSnwtMB2YWPA/

Yes and no.

I *think* what you are after is to have A-Labels and U-Labels as defined in RFC 5890, Section 2.3.2.1, BUT to encourage applications, at the stage of comparing strings, to allow non-IDNA-valid strings to be compared as well. Or rather, IF an application does allow non-IDNA-valid strings, they should be compared as if the strings were IDNA-valid.

This implies that you still have the symmetry requirement that an A-Label and a U-Label can be converted back and forth between each other. You also still have the requirement that "mapping" is something that has absolutely nothing to do with the comparison that is specified in this document.

You only say that applications, when comparing strings, should not fail the match because of rejection of DISALLOWED code points. That rejection should have happened when the string "entered the system", at the same time as mapping should have been done.

That said, in the security considerations section you can point out that doing a "fail" at the comparison stage due to non-PVALID code points might very well be what an application wants to do, all depending on the application context.

   Patrik