Re: [auth48] AUTH48: RFC-to-be 9277 <draft-ietf-cbor-file-magic-12> for your review

Carsten Bormann <cabo@tzi.org> Thu, 04 August 2022 18:02 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.7\))
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <20220803210827.2D4B455D45@rfcpa.amsl.com>
Date: Thu, 04 Aug 2022 20:02:17 +0200
Cc: Michael Richardson <mcr+ietf@sandelman.ca>, cbor-ads@ietf.org, cbor-chairs@ietf.org, Christian Amsüss <christian@amsuess.com>, auth48archive@rfc-editor.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <A72D6D20-35C9-4D83-95BF-B1FA5DC92821@tzi.org>
References: <20220803210827.2D4B455D45@rfcpa.amsl.com>
To: RFC Errata System <rfc-editor@rfc-editor.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/auth48archive/b5pI2XqfN-gL5xOt4DOuVO9bUrQ>
Subject: Re: [auth48] AUTH48: RFC-to-be 9277 <draft-ietf-cbor-file-magic-12> for your review
Precedence: list

Dear RFC-Editor,

Here are the coordinated responses from the authors to your questions.
The next step will be a full reread, which is best done once the results of these questions are in a new revision.

Grüße, Carsten



> On 2022-08-03, at 23:08, rfc-editor@rfc-editor.org wrote:
> 
> Authors,
> 
> While reviewing this document during AUTH48, please resolve (as necessary) 
> the following questions, which are also in the XML file.
> 
> 1) <!-- [rfced] For clarity, we suggest the following update to the title.  
> Please let us know if this is acceptable.  
> 	 
> Original:
> On storing CBOR encoded items on stable storage
> 
> Current:
> On Storing CBOR-Encoded Items on Stable Storage
> 
> Perhaps:
> Stable Storage for Items Encoded in Concise Binary Object Representation (CBOR)
> -->

Maybe:
On Stable Storage for Items in Concise Binary Object Representation (CBOR)

(Representation implies encoding, so expanding the abbreviation saves the need for that word.  “On” indicates that this is not the entire storage solution.)

> 2) <!-- [rfced] Please insert any keywords (beyond those that appear in
> the title) for use on https://www.rfc-editor.org/search. -->

magic number
file identification

> 3) <!-- [rfced] For clarity, may we update the text as shown below or does 
> this change the intended meaning? 
> 
> Current:
> This document defines a stored ("file") format for CBOR data items
> that is friendly to common file type recognition systems such as the
> Unix file(1) command.
> 
> Perhaps: 
> This document defines a stored ("file") format for CBOR data items
> that is friendly to systems that recognize common file types such 
> as the Unix file(1) command.
> -->

(I think the “common” was about systems, not about file types.)

Maybe: 
This document defines a stored ("file") format for CBOR data items
that is friendly to common systems that recognize file types, such 
as the Unix file(1) command.

> 4) <!-- [rfced] How is the file(1) command confused?  Does it not
> know the difference between the encoding and the content?
> Does the encoding throw it off?  Maybe clarifying this
> sentence would be helpful for the reader.
> 
> Original:
> A challenge for the file(1) command is often that it can be confused
> by the encoding vs. the content.
> 
> Perhaps:
> A frequent challenge for the file(1) command is that it can be confused
> because it cannot differentiate between the encoding and the content.
> -->

Maybe:
A challenge for the file(1) command is often that it can be confused
by recognizing the overall encoding but not the content being encoded.

> 5) <!-- [rfced] FYI: We updated "apk" to APK. Please let us know if there 
> are any objections.
> 
> Original:
> For instance, an Android "apk" (as
> used to transfer and store an application) may be identified as a ZIP
> file.
> -->
> 

Thank you!

> 
> 6) <!-- [rfced] This sentence is a bit tough to parse.  Please 
> consider whether either of the suggested updates is more clear 
> and is consistent with the intended meaning. 
> 
> Original:
> Additionally, both OpenOffice and MSOffice files are ZIP files
> of XML files, and may also be identified as a ZIP file.
> 
> Perhaps A:
> Additionally, both OpenOffice and MSOffice files are ZIP files
> of XML files, and may be identified as XML or ZIP files.
> 
> Perhaps B:
> Additionally, the application XML files of both OpenOffice 
> and Microsoft Office use the ZIP archive format, and they 
> may also be identified as ZIP files.
> -->

Maybe:
Additionally, both OpenOffice and MSOffice files are ZIP files
of XML files; the identification may stop at identifying them as ZIP files.

> 
> 
> 7) <!-- [rfced] We had a few questions about this sentence.  If 
> our suggested text does not correctly capture your intent, please 
> let us know how we may rephrase.
> 
> a) The use of "derive...to" seems odd.  We generally see "derive...from".
> 
> b) May we clarify what is not in CBOR form?
> 
> Original:
> This includes a simple way to
> derive a magic number to content-formats as defined by [RFC7252],
> even if not in CBOR form.
> 
> Perhaps:
> This includes a simple way to derive a magic number [for/from?] 
> content-formats as defined in [RFC7252] even if the file is
> not in CBOR form.
> -->

Indeed:
This includes a simple way to derive a magic number for 
content-formats as defined in [RFC7252] even if the file is
not in CBOR form.

> 8) <!-- [rfced] For clarity, may we update the sentence as follows? 
> 
> Original:
> A major inspiration for this document is observing the disarray in
> certain ASN.1 based systems where most files are PEM encoded; these
> are then all identified by the extension "pem", confusing public
> keys, private keys, certificate requests, and S/MIME content.
> 
> Perhaps:
> A major inspiration for this document is to disambiguate the 
> disarray observed in certain ASN.1-based systems where most 
> files are PEM encoded; these files are identified by the 
> extension "pem", which confuses public keys, private keys, 
> certificate requests, and S/MIME content.
> -->
> 

Well, we are not solving that problem, we are just observing the analog in a related domain.

Maybe:
A major inspiration for this document is observing the disarray in
certain ASN.1 based systems where most files are PEM encoded; 
these files are all identified by the 
extension "pem", which confuses public keys, private keys, 
certificate requests, and S/MIME content.

Maybe confound or commingle?


> 9) <!-- [rfced] It seems like this text is referring to future 
> registrations, rather than values being registered by this document.  
> Should the registration template refer to this document AND the document 
> that defines the semantics for the requested registration? 
> 
> Original:
> In the template, it is suggested to include
> a reference to this specification (RFC XXXX) alongside the
> Description of semantics.
> // (Note to RFC Editor: Please replace all occurrences of "RFC XXXX"
> // with the RFC number of the present specification and remove this
> // note.)
> -->
> 

We like:
Current:
 In the template, a
 reference to this specification (RFC 9277) alongside the Description
 of semantics is suggested.

> 10) <!-- [rfced] Note that we changed the instance of "US-ASCII" to "ASCII". 
> Please let us know if there are any objections. 
> 
> Original:
> The use of a sequence of four US-ASCII [RFC20] codes which are
> mnemonic to the protocol is encouraged, but not required (there may
> be reasons to encode other information into the tag; see Appendix B
> for an example).
> -->

That is a good simplification.

> 11) <!-- [rfced] "form a representation that is described by" is 
> tough to parse.  Is the CoAP  Content-Format number input to the 
> representation?  The relationship is unclear.  Please consider 
> whether the text can be clarified.
> 
> Original:
> For CBOR data items that form a representation that is described by a
> Constrained Application Protocol (CoAP) Content-Format Number
> (Section 12.3 of [RFC7252] and Registry CoAP Content-Formats of
> [IANA.CORE-PARAMETERS]), a tag number has proactively been allocated
> in Section 4.3 (see Appendix B for details and examples).
> -->

Maybe add a simple “already”:

Maybe:
For CBOR data items that form a representation that is already described by a
Constrained Application Protocol (CoAP) Content-Format Number
(Section 12.3 of [RFC7252] and Registry CoAP Content-Formats of
[IANA.CORE-PARAMETERS]), a tag number has proactively been allocated
in Section 4.3 (see Appendix B for details and examples).

Maybe clean this up into:

In [IANA.CORE-PARAMETERS], the Constrained Application Protocol (CoAP) defines the Registry "CoAP Content-Formats" to assign Content-Format Numbers (Section 12.3 of [RFC7252]) to Content Types in a specific Content Coding.
For CBOR data items that form a representation that is already described by such a Content-Format Number, a tag number has proactively been allocated
in Section 4.3 (see Appendix B for details and examples).

> 12) <!-- [rfced] Note that we have removed the relative attribute 
> of xrefs used to include URLs to IANA registries per discussion with 
> IANA.  I know we discussed this with you recently and allowed the use 
> of the relative attribute in another document.  Per further discussion 
> with IANA, they recommended against use of the registry-specific URLs.  
> The web portion of the style guide was recently updated to make this 
> more clear. -->

It appears there is nothing we can do about this.
We have communicated the value of the direct URLs provided by the relative references.
Clarifying this with IANA for future documents might be a good work item for RSWG.

> 13) <!-- [rfced] Please review the use of "operating systems 
> configurations".  Is this possessive, plural, or plural possessive?
> 
> Original:
> Similarly, depending on operating systems configurations and related
> properties of the execution environment the labeling might influence
> the default application used to process a file in a way that may not
> be predicted by a protective application.
> 
> Perhaps:
> Similarly, depending on the configurations of the operating system 
> and the related properties of the execution environment, the labeling 
> might influence the default application used to process a file in a 
> way that may not be predicted by a protective application.
> -->
> 

This was meant to be unspecific; maybe we can just get rid of the plural:

Maybe:
Similarly, depending on operating system configurations and related
properties of the execution environment the labeling might influence
the default application used to process a file in a way that may not
be predicted by a protective application.

> 14) <!-- [rfced] Will the ".." notation be clear for the reader?  
> 
> Original Section 4.1: 
> However, the following
> 16-bit big-endian value 0xf8.. is not a valid second sequence
> according to [RFC2781].
> 
> Original Section 4.2:
> However, the following
> 16-bit big-endian value 0xf9.. is not a valid second sequence
> according to [RFC2781].
> 
> We typically see this used for a range of numbers.  For example, 
> this is from RFC 8710:
> 
> 
>                    | Serialization  | Value      |
>                    +================+============+
>                    | 0x00..0x17     | 0..23      |
> 
>                    | 0x18 0xnn      | 24..255    |
> 
>                    | 0x19 0xnn 0xnn | 256..65535 |
> -->
> 

Indeed, that may be confusing.

Maybe:
16-bit big-endian value 0xf8_xx is not a valid second sequence
16-bit big-endian value 0xf9_xx is not a valid second sequence
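
For readers who want to check the claim behind the 0xf8_xx / 0xf9_xx wording, here is a minimal sketch in Python. It assumes the surrogate ranges as we recall them from [RFC2781] (first value of a pair in 0xD800..0xDBFF, second value in 0xDC00..0xDFFF), and it assumes that the preceding 16-bit value is a first-of-pair value, which is what makes a "second sequence" expected at all. The function names are made up for illustration.

```python
# Sketch only: surrogate ranges as recalled from RFC 2781 (assumption).
HIGH_SURROGATES = range(0xD800, 0xDC00)  # valid first value of a pair
LOW_SURROGATES = range(0xDC00, 0xE000)   # valid second value of a pair


def is_valid_second(value: int) -> bool:
    """Return True if a 16-bit value may follow a high surrogate in UTF-16."""
    return value in LOW_SURROGATES


def any_valid_second(prefix: int) -> bool:
    """Check all 256 values prefix_xx (prefix is the high byte, e.g. 0xF8)."""
    return any(is_valid_second((prefix << 8) | low) for low in range(256))


if __name__ == "__main__":
    for prefix in (0xF8, 0xF9):
        # Neither 0xf8_xx nor 0xf9_xx can be a valid second value.
        print(hex(prefix), any_valid_second(prefix))
```

The "_xx" in the proposed text corresponds to the loop over all 256 low bytes here: no choice of xx yields a valid second value.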

> 
> 15) <!-- [rfced] Per the message from Carsten below, we have updated 
> instances of 63470101 to 63740101.  Please review carefully to ensure 
> the updates have been made correctly, and let us know if any updates 
> are needed.  We will ask IANA  to update the registry once the updates 
> are confirmed.
> 
> From Carsten: 
> A user of draft-ietf-cbor-file-magic just made us aware that there 
> are two swapped digits in the draft we actually submitted:
> 
> The number 0x63740101, which occurs twice (and 8 times more without 0x), 
> has been copied incorrectly into the formula:
> 
> 	• TN(ct) = 0x63470101 + (ct / 255) * 256 + ct % 255
> 
> Where it occurs as 0x63470101 (74 ➔ 47, also twice).  This number is 
> incorrect and does not result in the computed numbers found throughout 
> the draft.
> 
> Unfortunately, one of the two occurrences is in the IANA Considerations, 
> so the registry currently also has the wrong formula and will need to be 
> corrected.
> 
> We will have time to correct this in AUTH48, but I wanted to make sure 
> the fact that there is this mistake is known as early as possible.
> -->

This now looks good.
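
As a cross-check of the corrected constant, here is a small Python sketch of the formula quoted above, using 0x63740101. It assumes that "/" in the formula means integer division and "%" the remainder; the function name and the `base` parameter are ours, added only so the swapped-digit constant can be compared.

```python
# Corrected base constant from this thread (0x63740101, not 0x63470101).
TN_BASE = 0x63740101


def tag_number(ct: int, base: int = TN_BASE) -> int:
    """TN(ct) = base + (ct / 255) * 256 + ct % 255, with "/" read as
    integer division (an assumption about the formula's notation)."""
    if ct < 0:
        raise ValueError("Content-Format number must be non-negative")
    return base + (ct // 255) * 256 + ct % 255


if __name__ == "__main__":
    for ct in (0, 254, 255):
        print(ct, hex(tag_number(ct)))
    # The swapped-digit constant yields a different, unintended tag number.
    print(hex(tag_number(0, base=0x63470101)))
```

Splitting ct into a quotient and a remainder modulo 255 and adding 0x0101 keeps each of the two low bytes of the tag number at 0x01 or above for as long as the quotient stays below 255.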

> 16) <!--[rfced] May we update the citations/reference pointing to 
> STD 94 to instead point to RFC 8949?  As there are a number of 
> instances referencing specific sections from RFC 8949, this seems 
> like best practice for in case more RFCs are added to STD 94. -->

That seems to be about a limitation of the current RFCXML format.
We do prefer citing STDs if we can.
We note that the current text under [STD94] unambiguously specifies one RFC, 
and the section references therefore work in draft-ietf-cbor-file-magic-12.html — what caused these to be lost?

> 17) <!-- [rfced] For readability, should "on wire" and "on-wire" be 
> "on the wire"? 
> 
> Originals:
> ... and then return the
> actual CBOR item, which could be anything at all, and could include
> CBOR tags that _do_ need to be sent on wire.
> 
> A.1.  Is the on-wire format new?
> 
> If the on-wire format is new, then it could be specified with the
> CBOR Tag Wrapped format if the extra eight bytes are not a problem.
> The stored format is then identical to the on-wire format.
> -->
> 

We have two “on the wire”, three “on-wire” (all in A.1), and that one “on wire”.
The last one is probably best changed into “on the wire”.  
The “on-wire” is an adjective usage on “format”, so the hyphen seems right — “on-the-wire format” is OK but longer than “on-wire format”, which is closer to what would be said in speech, so we would like to stay with “on-wire”.

> 18) <!-- [rfced] Does "compiled/serialized" mean "compiled and serialized"?
> If yes, may we update this text as follows:
> 
> Original:
> Instead, the IPC that is normally sent across
> the wire is compiled/serialized and placed in a file.
> 
> Perhaps:
> Instead, the IPC that is normally sent across
> the wire is compiled, serialized, and placed in a file.
> -->

OK.

> 19) <!-- [rfced] We are having trouble parsing this sentence.  
> Does "for any language that encode to CBOR" mean "for any language 
> that can be encoded as CBOR"?  
> 
> 
> Original: 
> Additionally, this change
> allows the IPC to be described by CDDL, and for any language that
> encode to CBOR can be used.
> -->

Maybe: 
  Additionally, this change
  allows the IPC to be described by CDDL and any implementation language to be used that can encode CBOR.

> 20) <!-- [rfced] Terminology
> 
> Should "tag" be capitalized when it follows CBOR?  Is "tag" lowercased 
> when not referring to a specific CBOR tag?  We see the following: 
> 
> CBOR Tag
> CBOR tags 
> CBOR tag numbers 

That is indeed a bit confused at the moment.

Apart from headings and the registry name, CBOR Tag should be capitalized when it occurs in the construct “CBOR Tag Wrapped”, but not elsewhere.
Specifically, CBOR Tag should be CBOR tag in these three places:
- in Section 2
- in the second paragraph of 2.1
- in Section 4

> In addition, should "Sequence tag" and "Tag Sequence" be consistent? 
> CBOR Sequence tag
> CBOR Tag Sequence

MCR: Typo?

CBOR Sequence tag: The tag that has been allocated for a CBOR Sequence.

OLD:
 As a result, each file and each IPC is prefixed with a CBOR
 Tag Sequence.
NEW:
 As a result, each file and each IPC is prefixed with a CBOR
 Sequence tag.


> -->
> 
> 
> 21) <!-- [rfced] When we converted this document to v3, xml2rfc converted 
> <spanx style="verb"> to <tt> and <spanx style="emph"> to <em>.

(We don’t think you actually did this conversion.)
We believe we used <em> and <tt> correctly in draft-ietf-cbor-file-magic-12.xml

> In the html and pdf outputs, the text enclosed in <tt> is output in
> fixed-width font. In the txt output, there are no changes to the font,
> and the quotation marks have been removed. 
> 
> In the html and pdf outputs, the text enclosed in <em> is output in
> italics. In the txt output, the text enclosed in <em> appears with an
> underscore before and after.
> 
> Please review carefully and let us know if the output is acceptable or if 
> any updates are needed.
> -->

Thank you for this reminder; we will look at this again in our full reread.

> 22) <!-- [rfced] Please review the "Inclusive Language" portion of 
> the online Style Guide 
> <https://www.rfc-editor.org/styleguide/part2/#inclusive_language>
> and let us know if any changes are needed. Note that our script 
> did not flag any terms or phrases.-->

We are not currently aware of instances of language that could be improved, but will look again in the full reread.

> --------------------------------------
> RFC9277 (draft-ietf-cbor-file-magic-12)
> 
> Title            : On storing CBOR encoded items on stable storage
> Author(s)        : M. Richardson, C. Bormann
> WG Chair(s)      : Christian Amsüss, Barry Leiba
> 
> Area Director(s) : Murray Kucherawy, Francesca Palombini
> 
>
