Re: [precis] [media-types] Internet media type application/pkcs8-encrypted rev 2
Sean Leonard <dev+ietf@seantek.com> Tue, 24 November 2015 09:56 UTC
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
References: <F2EFBAE2-DD4E-4E33-A8FB-B4402ABBF086@seantek.com> <5641BCB5.4060409@it.aoyama.ac.jp> <F0194F3F-46B7-401F-A2E8-118EFBE1C39C@seantek.com> <5653BDDB.9020203@it.aoyama.ac.jp>
From: Sean Leonard <dev+ietf@seantek.com>
Message-ID: <565433DF.50304@seantek.com>
Date: Tue, 24 Nov 2015 01:54:39 -0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <5653BDDB.9020203@it.aoyama.ac.jp>
Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg="sha-256"; boundary="------------ms030004020303030104030100"
Archived-At: <http://mailarchive.ietf.org/arch/msg/precis/u93GtGhmq3HyuBOQGafWjHDNglk>
Cc: "media-types@iana.org" <media-types@iana.org>, "precis@ietf.org" <precis@ietf.org>
Subject: Re: [precis] [media-types] Internet media type application/pkcs8-encrypted rev 2
On 11/23/2015 5:31 PM, Martin J. Dürst wrote:
> Hello Sean,
>
> I have cc'ed the precis mailing list because some of what I'll write
> below is relevant for the discussion you have started there. This is
> also the reason why I'm keeping most of the previous context.
>
> On 2015/11/11 00:25, Sean Leonard wrote:
>> Hello Martin,
>>
>> On Nov 10, 2015, at 1:45 AM, Martin J. Dürst <duerst@it.aoyama.ac.jp>
>> wrote:
>>
>>> Hello Sean,
>>>
>>> I have a few questions re. your registration below.
>>>
>>> On 2015/11/05 14:57, Sean Leonard wrote:
>>>> Hello:
>>>>
>>>> To keep this moving, trying a different thing. Please review.
>>>>
>>>> Sean
>>>>
>>>> *****
>>>>
>>>> Type name: application
>>>>
>>>> Subtype name: pkcs8-encrypted
>>>>
>>>> Required parameters: N/A
>>>>
>>>> Optional parameters:
>>>> charset: When the private key encryption algorithm incorporates a
>>>> “password” that is an octet string, a mapping between user input
>>>> and the octet string is desirable. PKCS #5 [RFC2898] Section 3
>>>> recommends "that applications follow some common text encoding
>>>> rules"; it then suggests, but does not recommend, ASCII and UTF-8.
>>>> This parameter specifies the charset that a recipient SHOULD
>>>> attempt first when mapping user input to the octet string. It has
>>>> the same semantics as the charset parameter from text/plain, except
>>>> that it only applies to the user’s input of the password. There is
>>>> no default value.
>>>
>>> Why does it say "This parameter specifies the charset that a
>>> recipient SHOULD attempt *first*" here? Can't that encoding just be
>>> specified as such?
>>>
>>> At least for future, similar efforts, it would be extremely
>>> desirable to not leave character encoding open like this, but just
>>> to nail it down to UTF-8.
>>
>> There seems to be something of a “cultural disconnect” between the
>> security people and the I18N/UI/UX people.
>>
>> The I18N/UI/UX people want well-defined interfaces that work with
>> users “in their own language”, whether that language is visual,
>> aural, tactile, symbolic, pictorial, etc. Invariably this involves
>> Unicode and a large character repertoire such as 💩 and 大便所.
>>
>> In contrast, the security people find open-ended things like Unicode
>> to be anathema and would much rather restrict the range of inputs to
>> a small and preferably uniformly distributed set of values. And there
>> are good reasons for that, because when you introduce bias into
>> cryptographic protocols, it turns out that it is a lot easier to
>> cryptanalyze the results.
>>
>> The common security protocols that I have seen that take passwords
>> hand-wave about character sets and encodings and define the password
>> to be an octet string. This is great for universality but bad for
>> human input. PBKDF2 (PKCS #5, on which this PKCS #8
>> EncryptedPrivateKeyInfo registration is based) is a leading example
>> of the “octet string” approach. Ultimately, the algorithms don’t care
>> what encoding it’s in, as long as they get a blob of bits (octets).
>>
>> My knowledge of implementations of PKCS #5/#8/#12 suggests that there
>> are many applications out there that give zero thought to the
>> encoding issue, which means that they will take user input “As-Is”,
>> i.e., in the current code page.
>>
>> Note that PKCS #12 defines the input to this structure as a UTF-16LE
>> encoded character string, *with* a terminating U+0000 NULL character
>> (i.e., the octets 00 00). This is really “weird” except of course for
>> the fact that Microsoft invented it and then shipped it without too
>> much thought, in which case, all weirdness can be explained.
>>
>> It is a design criterion that if you extract such an
>> EncryptedPrivateKeyInfo blob from a PKCS #12 file, you should be
>> able to process it.
>> If you specify UTF-8 as the one, single, true
>> encoding of the password for application/pkcs8-encrypted, that can’t
>> happen.
>
> That's just fine, in this specific case. I have explicitly prefaced my
> remark above with "At least in the future".
>
> But if we know that the password is encoded in UTF-16LE, then why
> doesn't your registration just say "This parameter specifies the
> charset" rather than the handwavy "This parameter specifies the
> charset that a recipient SHOULD attempt *first*". See below...
>
>
>> Furthermore, UTF-8 is not uniformly distributed across the octet
>> range. If your users are in US-English they are highly likely to have
>> octets in 20-7E. Octets in 00-1F will be pretty rare. And if you
>> choose scalar values randomly in Unicode (regardless of assignment),
>> you will see a *lot* of F0-F4 but virtually none in 00-7F. And in
>> spite of all this, octets F5-FF will *never* appear in UTF-8.
>>
>> It turns out that we have a pretty good source of uniformity and
>> universality: characters in the US-ASCII range 20-7E. Many password
>> input boxes will only accept US-ASCII and so users’ non-US-English
>> keyboards will switch to US-ASCII mode for the purpose of providing
>> input to such boxes. What matters is not so much the specific
>> characters, so much as a reasonable selection of arbitrary buttons
>> that a user can push *across a wide range of devices*. This ends up
>> giving you 5-6 bits of entropy per user input. So the need for UTF-8
>> or any particular encoding is actually not as great as some people
>> perceive.
>
> My comment was specifically trying to say: If you use something more
> than US-ASCII, make it UTF-8. I think that's also the general policy
> of the IETF. As for entropy, the entropy needs to be measured over the
> whole string. It's clear that in UTF-8 bytes, a password in the ASCII
> range is shorter than a similar-length (in terms of characters)
> password in a non-Latin script.
> The entropy of each byte will be lower, but the entropy of the overall
> password should be about the same.
>
> Something that's very important for passwords is how easy they are to
> remember for actual people. It should be obvious that it's easier for
> somebody to remember a password in the language/script they use every
> day than in some foreign gibberish.

Actually, the most important criterion for passwords is that they are inputtable: capable of being input into a computing system. The passphrase "éclairs sont si délicieuX" is nice and memorable, but if you cannot put in "é" (or "X") in the computer systems where you're going to use the thing, you're screwed.

Viewed through this lens, many cryptographic systems are defined to accept secret input as octet strings, where neither the mathematical algorithms nor the protocols inherently restrict the range of the octets. Whether "é" maps to E9, C3 A9, E9 00, or 82 [code page 437] is relevant because if the user tries to input é but the math part doesn't get the original octets on the input, it won't work.

Scenarios where PKCS #8 blobs might be accessed at device boot-time are possible. In those scenarios, the character set/encoding of the system may well be something other than UTF-8. Regardless of the encoding, the boot-time input methods may be severely restricted. (There is a recent thread on the Unicode mailing list, related to passwords, that raises this issue.)

PKCS #8/PKCS #5 mention ASCII and UTF-8 as complementary possibilities--this registration does not attempt to disturb that.

The handwavy text, "This parameter specifies the charset that a recipient SHOULD attempt *first*", recognizes the mathematical reality. Suppose an Evildoer (pick your preferred one) encodes all of its private keys with the parameter charset=us-ascii. But the actual input has some character ñ that is clearly outside the range.
When the Good Guys™ try to break the encryption, if they only attempt characters in the us-ascii range, they are never going to decrypt the thing. Ditto for UTF-8 encoding if the correct input (meaning the input that actually results in successfully decrypting the private key payload) contains FFh somehow, or some overlong encoding.

If this parameter were baked into the PKCS #8 format, perhaps a stronger claim could be made in the specification text. (Compare with PKCS #12.) But as it stands, the parameter is just a "hint"...and for a long time, implementations will probably not bother to put this parameter in.

>
>> Overall I think that a standard such as IEEE 802.11 strikes a
>> reasonable balance. (See 802.11-2012 Annex M.4, which is informative,
>> but is pretty much the worldwide de-facto standard practice.) In
>> 802.11, the input to PBKDF2 is between 8-63 ASCII-encoded characters
>> in the range 20-7E, or 64 hexadecimal characters that convert
>> directly to 32 octets.
>
> So it's up to 63 ASCII characters but only up to 32 octets that may
> e.g. be used for UTF-8? That doesn't strike me as a reasonable
> balance; it puts a much stronger length limitation on some scripts
> outside ASCII.

Take a look at 802.11-2012 Annex M.4. Here is a direct link: <http://standards.ieee.org/about/get/802/802.11.html>.

When the input is 64 hexadecimal characters, PBKDF2 is bypassed: the resulting 32 octets are used directly as the RSNA pre-shared key (PSK). The advantage of defining the protocol that way is that a user interface (aka, password prompt) for Wi-Fi devices only needs to have one text box field: the implementation then auto-detects which path to use based on the length. Otherwise, all Wi-Fi devices on the planet would have to have user interfaces designed with additional dropdowns, checkboxes, radio buttons, etc. to pick between the (ASCII) password and direct input of the PSK.
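
The single-text-box behaviour described above can be sketched roughly as follows. This is an illustration only, not text from 802.11: the function name `wifi_psk` is hypothetical, and the PBKDF2 parameters (HMAC-SHA1, the SSID as salt, 4096 iterations, 32 octets of output) are the ones commonly cited for the passphrase mapping, so treat them as assumptions to be checked against Annex M.4.

```python
import hashlib
import string

def wifi_psk(user_input: str, ssid: bytes) -> bytes:
    """Map one text-box input to a 32-octet PSK (hypothetical helper)."""
    # Path 1: exactly 64 hexadecimal characters bypass PBKDF2 and are
    # converted directly to the 32 octets of the PSK.
    if len(user_input) == 64:
        if any(ch not in string.hexdigits for ch in user_input):
            raise ValueError("64-character input must be hexadecimal")
        return bytes.fromhex(user_input)
    # Path 2: a passphrase of 8 to 63 characters in the range 20-7E.
    if not 8 <= len(user_input) <= 63:
        raise ValueError("passphrase must be 8 to 63 characters")
    if any(not 0x20 <= ord(ch) <= 0x7E for ch in user_input):
        raise ValueError("passphrase must be ASCII in the range 20-7E")
    # Assumed parameters: HMAC-SHA1, SSID as salt, 4096 rounds, 32 octets.
    return hashlib.pbkdf2_hmac(
        "sha1", user_input.encode("ascii"), ssid, 4096, 32
    )
```

The length check is what lets a single field serve both paths. Anything outside 20-7E is rejected here only because the sketch has no defined behaviour for it, which mirrors the gap in the standard rather than resolving it.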
(It stands to reason that every password input for every SSID has a corresponding valid 64-hexadecimal-character PSK.)

802.11-2012 does not define an algorithm for cases when the passphrase has characters that encode outside of the range of 32 to 126. I do not know what other implementations do worldwide if non-ASCII characters are entered. As some have observed (on the Unicode mailing list), some operating systems, e.g., iOS, always enforce Latin script for password box entry.

>
>> ***
>> To answer your questions directly:
>>> Why does it say "This parameter specifies the charset that a
>>> recipient SHOULD attempt *first*" here?
>>> Can't that encoding just be specified as such?
>>
>>
>> The parameter is not cryptographically protected so it is subject to
>> tampering or substitution. Furthermore, a good-faith but naïve sender
>> may put some encoding (e.g., UTF-8) but not have the means to verify
>> that the encoding actually works, because the user did not supply the
>> password. Basically it’s a good-faith first effort, but this
>> parameter can’t meaningfully restrict what the sender or receiver
>> attempt to do.
>
> That essentially applies to any single parameter in any single media
> type registration, and in much more of what the IETF does. Yet this is
> virtually never called out, because otherwise, IETF documents would be
> full of such stuff and very hard to read.

Perhaps. First of all, such warnings about "not being cryptographically protected" tend to show up in many Security Considerations sections, of which I have seen many in the IETF. Second of all, the content inside PKCS #8 EncryptedPrivateKeyInfo *is* cryptographically protected (to a certain extent, depending on the algorithms being used), but an implementation really needs to be vigilant not to treat the stuff outside, i.e., the media type parameters, as the same as on the inside.

>
>> Also, I am not sure how to specify the NULL suffix in the PKCS
>> #12-extracted case.
>
> That may suggest that you are going down the wrong path here.
>
>> I suppose it could just be “+0” or something.
>>
>>>
>>>> ualg: When the charset is a Unicode-based encoding, this parameter
>>>> is a space-delimited list of Unicode algorithms that a recipient
>>>> SHOULD first attempt to apply to the Unicode user input in
>>>> succession, in order to derive the octet string. The list of
>>>> algorithm keywords is defined by [UNICODE]. “Tailored operations”
>>>> are operations that are sensitive to language, which must be
>>>> provided as an input parameter. If a tailored operation is called
>>>> for, the exclamation mark followed by the [BCP47] language tag
>>>> specifies the language. For example, "toNFD toNFKC_Casefold!tr"
>>>> first applies Normalization Form D, followed by Normalization Form
>>>> KC with Case Folding in the Turkish language, according to
>>>> [UNICODE] and [UAX31]. The default value of this parameter is
>>>> empty, and leaves the matter of whether to normalize, case fold, or
>>>> apply other transformations unspecified.
>>>
>>> "When the charset is": Is this the charset parameter, or the actual
>>> encoding of the password?
>>
>> Admittedly this was vague. First draft. I am not sure what it should
>> be. Per PKCS #5, the "Actual Encoding" is just an octet string of
>> arbitrary length.
>>
>> I would limit this to cases when the charset parameter is present and
>> defined. Makes it easier.
>>
>>>
>>> What is a “Unicode algorithm”?
>>
>> Conformance Clause D17.
>
> Well, this, via the term "Named Unicode Algorithm" points to table 3.1
> (page 93 in Unicode V 8.0).
>
>
>>> Reading on and looking at the examples, the intent becomes clearer,
>>> at least to somebody who has seen things such as toNFD and toNFKC and
>>> Casefold, but I hope we can avoid "specification by example" here.
>>
>> In fairness, “toNFD” and “toNFKC” are not defined terms. However, NFD
>> (D118) and NFKC (D121) are.
>
> Yes, but not as (Named) Unicode Algorithms.
>
>> I would rather not create Yet Another Registry of things.
>
> I'd agree in principle.
>
>> The terms are in fact defined in [UNICODE] in the conformance clauses.
>
> Yes, but there are many other things defined there, too.

Yes...

If you meant to refer to [UNICODE] Table 3.1 (page 93, Unicode V8.0) as a canonical list of names...I do not find that satisfactory because the only entries for normalization are "Normalization" and "Identifier Normalization", but the most important kind of transformation is to pick which Normalization Form you want: NFC, NFD, NFKC, or NFKD. (I am under the impression that NFKC is the "Best" for passwords, but the PRECIS profile for passwords uses NFC. Not sure exactly why but I will leave that one alone.)

Really, algorithms to do character transformations, prohibitions, substitutions, etc. in the Internet context, would seem to fall under PRECIS. Therefore maybe the proper pre-existing "registry" of things is just PRECIS profiles.

Developing from my last PRECIS e-mail on the topic, I am leaning towards a single parameter:

pw-mapping

with special values:

  *pkcs12 = UTF-16LE with U+0000 NULL terminator
  *precis = PRECIS password profile, i.e., OpaqueString from Section 4 of RFC 7613 (always UTF-8)
  *precis-XXX = PRECIS profile named by XXX
  *hex = hexadecimal input: as with 802.11, the input is mapped to 0-9, A-F, and then converted directly to octets. If there are an odd number of hex digits, the final digit 0 is appended, or an error condition may be raised.
  *dtmf = The characters "0"-"9", "A"-"D", "*", and "#", which map to their corresponding ASCII codes. (This is to support restricted-input devices, i.e., telephones and telephone-like equipment.)

Otherwise, pw-mapping is a charset.

The parameter previously called "ualg" is interesting but maybe it's a bit over-specified. If *precis-XXX can do the job, I am fine with specifying "pw-mapping" only, and removing "ualg".
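
To make the proposed pw-mapping values concrete, here is a rough Python sketch of how a recipient might turn user input into the octet string. Everything in it is illustrative: the function name `map_password` is hypothetical, the *precis branch applies only NFC normalization and UTF-8 encoding as a stand-in for the full OpaqueString rules, *precis-XXX is left unimplemented, and charset names are simply handed to Python's codec lookup. The *pkcs12 branch follows the description in this message (UTF-16LE plus terminator); RFC 7292 Appendix B.1 describes the password as a big-endian BMPString, so the byte order should be verified before relying on it.

```python
import unicodedata

HEX_DIGITS = set("0123456789ABCDEF")
DTMF_CHARS = set("0123456789ABCD*#")

def map_password(user_input: str, pw_mapping: str) -> bytes:
    """Derive the password octet string (hypothetical helper)."""
    if pw_mapping == "*pkcs12":
        # As described in this message: UTF-16LE with a U+0000 terminator.
        return (user_input + "\x00").encode("utf-16-le")
    if pw_mapping == "*precis":
        # Simplified stand-in for OpaqueString: NFC, then UTF-8.
        return unicodedata.normalize("NFC", user_input).encode("utf-8")
    if pw_mapping == "*hex":
        digits = user_input.upper()
        if any(d not in HEX_DIGITS for d in digits):
            raise ValueError("input is not hexadecimal")
        if len(digits) % 2:
            digits += "0"  # pad an odd number of digits with a final 0
        return bytes.fromhex(digits)
    if pw_mapping == "*dtmf":
        if any(ch not in DTMF_CHARS for ch in user_input):
            raise ValueError("input is not a DTMF character string")
        return user_input.encode("ascii")
    if pw_mapping.startswith("*"):
        # e.g. *precis-XXX: would need a real PRECIS implementation.
        raise NotImplementedError(pw_mapping)
    # Otherwise, pw-mapping is a charset name.
    return user_input.encode(pw_mapping)
```

With this sketch, the same "é" yields E9 under iso-8859-1, C3 A9 under utf-8, and 82 under cp437, which are the octet variants listed earlier in this message; a mismatch between the sender's and the recipient's mapping therefore produces a different key.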
Personally I don't see why PRECIS prohibits control characters like HT in passwords. It seems to me that those sorts of characters are legit for password purposes. But that is just my personal view and I don't care enough about HT, BEL, ENQ, or similar in this context to fight about it. :)

>
>> My usability perception is that if people really want to use Unicode
>> in their passwords, canonicalization is a very useful property to
>> preserve. Case folding/case mapping are not so useful, as most
>> systems like to have case-sensitive passwords for greater entropy,
>> but “most systems” is not “all systems” so we shouldn’t preclude the
>> use of case algorithms. As for other algorithms such as line
>> breaking, character segmentation, Hangul syllable name generation,
>> etc., the short answer is “I don’t know”. (These are all reasons why
>> people stick with ASCII passwords, by the way.)
>
> Line breaking, character segmentation, Hangul syllable name
> generation,... are completely irrelevant for passwords and passphrases.
>
> Also, many algorithms come with options or parameters.
>
>
>>> Also, if there is indeed a list of algorithm identifiers in
>>> [UNICODE], then it would be good to give a Section number. Is the
>>> intent that each and every algorithm named somewhere in [UNICODE] is
>>> implemented? My rough guess would be that the average password input
>>> implementation implements only the identity transform. [I would of
>>> course be positively surprised if I were wrong.]
>>
>> See above; main thing that worries me is Normalization Forms.
>>
>>>
>>> Also, references for [UNICODE], [BCP47], and [UAX31] should be given
>>> so that this registration is self-contained.
>>
>> Ok.
>>
>> Another possibility is that this registration goes back to “rev 1”,
>> i.e., no optional parameters about the character encoding at all. I
>> think that is perfectly defensible. But it is not particularly
>> i18n-friendly.
>
> I'm not sufficiently familiar with the format and the actual use
> cases, but my suggestion would be to check what's actually out there
> in the field (such as the Microsoft UTF-16LE including final NULL),
> and select or create a list of parameters/algorithms (with a registry
> if it turns out to be needed). To that, add a way to reference PRECIS,
> even if it's not currently used, because that includes the
> expertise/recommendations of experts.
>
> The current proposal is just essentially saying: Unicode may define some
> of the pieces you may want to use here, and may have labels for them,
> so just give it a try. I'm not at all sure this will help
> interoperability, except by similar accidents like the Microsoft one
> that you described above.

Right. Well, maybe PRECIS to the rescue? David Wheeler's aphorism is that all problems in computer science can be solved by another level of indirection. If PRECIS can handle this stuff, that's good enough for me.

Sean

>
> Regards, Martin.
>
>> Regards,
>>
>> Sean
>>
>>>
>>> Regards, Martin.
>>>
>>>> Encoding considerations: binary
>>>>
>>>> Security considerations:
>>>> Carries a cryptographic private key. See Section 6 of RFC 5958.
>>>> EncryptedPrivateKeyInfo PKCS #8 data contains exactly one private
>>>> key. Poor password choices, weak algorithms, or improper parameter
>>>> selections (e.g., insufficient salting rounds) will make the
>>>> confidential payloads much easier to compromise.
>>>>
>>>> Interoperability considerations:
>>>> PKCS #8 is a widely recognized format for private key information
>>>> on all modern cryptographic stacks. The encrypted variation in this
>>>> registration, EncryptedPrivateKeyInfo (Section 3, Encrypted Private
>>>> Key Info, of RFC 5958, and Section 6 of PKCS #8), is less widely
>>>> used for exchange than PKCS #12, but it is much simpler to
>>>> implement.
>>>> The contents are exactly one private key (with optional
>>>> attributes), so the possibility for hidden "easter eggs" in the
>>>> payload such as unexpected certificates or miscellaneous secrets is
>>>> drastically reduced.
>>>>
>>>> Published specification:
>>>> PKCS #8 v1.2, November 1993 (republished as RFC 5208, May 2008);
>>>> RFC 5958, August 2010
>>>>
>>>> Applications that use this media type:
>>>> Machines, applications, browsers, Internet kiosks, and so on, that
>>>> support this standard allow a user to import, export, and exercise
>>>> a single private key.
>>>>
>>>> Fragment identifier considerations: N/A
>>>>
>>>> Additional information:
>>>>
>>>> Deprecated alias names for this type: N/A
>>>> Magic number(s): None.
>>>> File extension(s): .p8e
>>>> Macintosh file type code(s): N/A
>>>>
>>>> Person & email address to contact for further information:
>>>> Sean Leonard <dev+ietf&seantek.com>
>>>>
>>>> Intended usage: COMMON
>>>>
>>>> Restrictions on usage: None.
>>>>
>>>> Author:
>>>> RSA, EMC, IETF
>>>>
>>>> Change controller: The IETF
>>>>
>>>> Provisional registration? (standards tree only): No
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> media-types mailing list
>>>> media-types@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/media-types
>>>>
>>