Re: [codec] AD Evaluation of draft-ietf-codec-oggopus-09

"Timothy B. Terriberry" <tterribe@xiph.org> Fri, 11 December 2015 22:16 UTC

Return-Path: <tterribe@xiph.org>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C60541A9068; Fri, 11 Dec 2015 14:16:50 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.613
X-Spam-Level:
X-Spam-Status: No, score=-2.613 tagged_above=-999 required=5 tests=[BAYES_50=0.8, HELO_MISMATCH_ORG=0.611, HOST_MISMATCH_COM=0.311, RCVD_IN_DNSWL_HI=-5, SPF_SOFTFAIL=0.665] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7ZPJryj_zlgU; Fri, 11 Dec 2015 14:16:43 -0800 (PST)
Received: from smtp.mozilla.org (mx1.scl3.mozilla.com [63.245.214.155]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id A76FE1A908A; Fri, 11 Dec 2015 14:16:43 -0800 (PST)
Received: from localhost (localhost6.localdomain [127.0.0.1]) by mx1.mail.scl3.mozilla.com (Postfix) with ESMTP id 34751C1852; Fri, 11 Dec 2015 22:16:43 +0000 (UTC)
X-Virus-Scanned: amavisd-new at mozilla.org
Received: from smtp.mozilla.org ([127.0.0.1]) by localhost (mx1.mail.scl3.mozilla.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id hUXwMJwIPICA; Fri, 11 Dec 2015 22:16:41 +0000 (UTC)
Received: from [10.130.20.84] (unknown [207.190.10.41]) (Authenticated sender: tterriberry@mozilla.com) by mx1.mail.scl3.mozilla.com (Postfix) with ESMTPSA id AF02EC1812; Fri, 11 Dec 2015 22:16:40 +0000 (UTC)
Message-ID: <566B4B47.9010809@xiph.org>
Date: Fri, 11 Dec 2015 14:16:39 -0800
From: "Timothy B. Terriberry" <tterribe@xiph.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:29.0) Gecko/20100101 SeaMonkey/2.26
MIME-Version: 1.0
To: Ben Campbell <ben@nostrum.com>, draft-ietf-codec-oggopus.all@ietf.org, codec@ietf.org
References: <86ACD2D0-02B6-473E-9E35-B9980166D9A0@nostrum.com>
In-Reply-To: <86ACD2D0-02B6-473E-9E35-B9980166D9A0@nostrum.com>
Content-Type: multipart/mixed; boundary="------------090501040108020500070205"
Archived-At: <http://mailarchive.ietf.org/arch/msg/codec/INlUbGEghHFVDYA5tLjnRm3zlZg>
Subject: Re: [codec] AD Evaluation of draft-ietf-codec-oggopus-09
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/codec/>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Dec 2015 22:16:50 -0000

Ben Campbell wrote:
> Here is my AD evaluation of draft-ietf-codec-oggopus-09. Codec
> encapsulation is not my strong suit, so I suspect many of my comments
> can be answered with "this is the way we normally do this thing". But
> I'd still like to discuss these before going to IETF last call.

Thank you for the very thorough review. Comments in-line below (as an 
individual).

> - section 2, last paragraph:
>
> Is this paragraph necessary or useful? (And is it really impossible to
> create a non-compliant implementation that meets all the MUSTs?) I
> suggest removing it.

This was copied from RFC 5334. I don't think anyone has any particular 
attachment to it, so let's just remove it.

> - section 3:
> This section seems to use 2119 keywords to describe what I _think_ are
> existing Ogg requirements. If so, they should use descriptive text,
> unless they are in the form of attributed quotes.

I think I can attribute this mostly to paranoia over RFC 3533 being 
informational, and using descriptive text of its own for what should be 
normative requirements. E.g.,

"Ogg also allows but does not require secondary header packets after the 
bos page for logical bitstreams and these must also precede any data 
packets in any logical bitstream. These subsequent header packets are 
framed into an integral number of pages, which will not contain any data 
packets." (from RFC333)

vs.

"The second packet in the logical Ogg bitstream MUST contain the 
comment header ...  It MAY span multiple pages, beginning on the second 
page of the logical stream. However many pages it spans, the comment 
header packet MUST finish the page on which it completes." (from this draft)

Unless someone objects, though, I'll try to rewrite the latter without 
the 2119 keywords.

> - 3, last paragraph:
> -- "implementations need to be prepared":
>   maybe that should be a MUST or SHOULD?

I think the intent was just to point out that someone might violate the 
preceding SHOULD. It's not clear what actionable thing we'd be saying 
they MUST do (it would depend on the particular implementation).

> -- "last page SHOULD NOT be a continued packet":
> Why not MUST NOT? Can you describe reasons someone might reasonably
> violate this?

The intent was to allow but discourage page-level editing of files 
(i.e., cutting and splicing files by copying complete pages without 
examining the packets on them). Even just saving a live stream that 
drops at some point, you might reasonably expect to just be able to copy 
the data you received to a file. A MUST here would mean you would have 
to parse the pages, identify the continued packets from the lacing 
values, remove the extraneous data from the last page if you found some, 
repack the page buffer, and recompute the page checksum. That's a lot 
more machinery, and people would likely just ignore the requirement in 
practice. See also the discussion at 
<https://www.ietf.org/mail-archive/web/codec/current/msg03145.html>.

> - 4.2, 2nd paragraph:
> -- "samples which SHOULD be skipped": Why not MUST?  (also, s/which/that)

Some specific applications might have a reason for looking at that data 
(for example, when you have multiple files that you know were produced 
by chopping up a single existing file without re-encoding). In practice 
we haven't needed anything stronger to convince people to implement this 
correctly (especially when we point them to 
<https://people.xiph.org/~greg/opus_testvectors/correctness_trimming_nobeeps.opus>).

> -- "SHOULD NOT be played": Is this redundant with SHOULD be skipped?

Yes. Deleted.

> - 4.3: Should the reference to [vorbis-trim] be normative? Can the
> feature be implemented without understanding the referenced doc?

The answer to the second question is "yes", which I think makes the 
answer to the first question "no". The reason this is here is to tell 
people familiar with how Vorbis works what mechanism they should use to 
achieve the same outcome in Opus. But if you're not familiar with 
Vorbis, then you can safely ignore the whole paragraph.

> - 4.6: Should the reference to [seeking] be normative? Can the section
> be implemented without understanding it?

It is possible to implement seeking without knowing anything other than 
what is described in RFC 3533, but the linked documentation does help 
considerably.

I would argue that the section itself is not normative, in that you 
could implement seeking with some other method (for example, linearly 
scanning the whole file on open to build an index, possibly caching such 
indexes for files that are opened frequently: this would not be my 
recommended approach, but people have done it in practice). You could 
also choose to implement only approximate seeking by just guessing a 
file position and decoding from there (also done in practice by some 
players, and a common method for other formats like MP3). Or you could 
choose not to implement seeking at all.

> - 5.1, list entry 5: What is the unit for "Input Sample Rate"?

Hz. It was specified in Figure 1, but you're right, it should be in the 
text, too. Added the sentence, "This is the sample rate of the original 
input (before encoding), in Hz."

> - 5.1, entry 6, sentence starting with "Virtually all players..."
> Is this a normative statement, or a statement of fact? If the former,
> consider dropping “Virtually all”

Dropped "virtually all".

> - 5.1, entry 6, last paragraph:
> -- "which MUST be omitted when the channel mapping family is 0": That
> seems redundant to the previous paragraph.

Yes. Deleted the sentence in the previous paragraph.

> -- "Implementations SHOULD reject ID headers"
> What does it mean, in context, to "reject" headers?

The same thing as rejecting a stream (i.e., don't try to play/decode 
it). Changed to say that instead.

> - 5.1.1.4: Why not MUST?

I think the idea was to allow assigning reserved values without 
explicitly updating this specification. Even though I expect we _would_ 
want to update this specification, given how long it's taken to get this 
draft published, I'm wary of adding more pressure to people not to 
deploy new mappings (because this RFC says they MUST not), even when it 
looks like they'll be relatively stable and interoperable. At least not 
for the sole reason that they haven't gotten an RFC number yet.

> - 5.2, last two paragraph: Why are the two SHOULDs not MUSTs?

For the first SHOULD, this is the same as above: some other 
specification may eventually define the format of the data here and how 
to modify it, and I didn't think we'd want to make people supporting 
that specification in violation of this specification. I'm also leaning 
towards keeping this one a SHOULD since a) the eventual format of this 
data may apply to all codecs which share the vorbiscomment style of 
metadata, and b) we would probably try to do this informally first to 
gain implementation experience, and only bring it to the IETF if there 
is sufficient interest. I didn't want to make updating this document a 
prerequisite for that. See also the discussion in this thread: 
<https://www.ietf.org/mail-archive/web/codec/current/msg03061.html> 
(skip to 
<https://www.ietf.org/mail-archive/web/codec/current/msg03107.html> for 
the resolution of the debate).

For the second SHOULD, it felt like the right strength for a judgment 
call like "excessive". I have no objection to changing it to MUST.

> - 5.2.1, 2nd to last paragraph, last sentence:
> I don’t think muxers assume things—what is the required concrete behavior?

I replaced this with "A muxer SHOULD place the gain it wants other tools 
to use by default into the 'output gain' field, and not the comment tag."

> - 5.2.1, last paragraph: Why is the SHOULD NOT not MUST NOT?

In some contexts, there may be no confusion with multiple normalization 
schemes. The language in this section was debated a good deal on the 
list (see threads 
<https://www.ietf.org/mail-archive/web/codec/current/msg02930.html>, 
<https://www.ietf.org/mail-archive/web/codec/current/msg03012.html>, and 
<https://www.ietf.org/mail-archive/web/codec/current/msg03053.html>). I 
believe everyone is satisfied with the current compromise, and would 
prefer not to change the strength of any normative statements in it to 
avoid re-igniting the debate.

> -6, first paragraph:
> Why are the two SHOULDs not MUSTs? What does it mean to reject a packet?

For the first SHOULD, the fact that most decoders are likely to reject 
such packets is enough to discourage encoders from emitting them, and I 
was wary of imposing a MUST restriction when RFC 6716 chose not to do so.

For the second, this is the same as the previous comment about excessive 
allocations. I have no objection to changing this one, either.

"Reject a packet" is just shorthand for "treat them as if it was a 
malformed Opus packet with an invalid TOC sequence". I've added explicit 
text saying the latter.

> -7: "reference implementation" could use a citation. Should

Added a reference to RFC 6716.

> [linear-predection] be normative? Can this be implemented without
> understanding that? If it does need to be normative, can you find a more
> stable reference?

Linear prediction is something that would be understood by "a person 
having ordinary skill in the art" (to borrow a phrase). We just wanted 
to include a reference to help those who might not be DSP engineers but 
wanted to learn more.

> - 9, first paragraph:
> This seems to defer at least some security considerations to 4732. If
> so, this should be a normative reference. If, on the other hand, 4732
> just provides additional information that is not necessary to understand
> this section, then the wording should be adjusted.

Upgraded the reference to normative.

> - 13:
> Is this needed? Are the IETF BCP 78 and 79 provisions incorrect for this?

This text was specifically requested by the Debian project (and I 
support it personally). The same issue came up for RFC 6716 (for the 
same reasons). See the discussion at 
<https://www.ietf.org/mail-archive/web/codec/current/msg02845.html> and 
<https://www.ietf.org/mail-archive/web/ietf/current/msg73116.html>.

Reviewing those conversations, I do notice that the language of that 
section was changed after last call for RFC 6716, but the corresponding 
language in our source repository was not updated (because it was added 
to the document boilerplate manually by the RFC Editor, and not included 
in a separate section).

> Editorial
> =========
> - Section 1: Consider spelling out multiplexer (muxer) and demultiplexer
> (demuxer) on first mention.

Done.

> - 3: Please expand TOC, SILK, and CELT on first mention.

Doing this directly in the current text was a little bit awkward. I 
reworded it to

"...An implementation of this specification SHOULD treat any Opus packet 
whose duration is different from that of the first Opus packet in an Ogg 
packet as if it were a malformed Opus packet with an invalid Table Of 
Contents (TOC) sequence.

"The TOC sequence at the beginning of each Opus packet indicates the 
coding mode, audio bandwidth, channel count, duration (frame size), and 
number of frames per packet, as described in Section 3.1 of [RFC6716]. 
The coding mode is one of SILK, Hybrid, or Constrained Energy Lapped 
Transform (CELT). The combination of coding mode, audio bandwidth, and 
frame size is referred to as the configuration of an Opus packet."

I did not expand SILK because it is not an acronym.

> - 5, steps 2 and 3:
> These steps in combination seem to say “ decode at the highest available
> supported rate that the hardware can handle.” If so, the wording could
> be simplified.

It is a little bit more complicated. There are two rates here: one used 
for decoding, and one used for playback. The difference between steps 2 
and 3 is that in step 3, the hardware's highest available sample rate is 
not one of the rates supported by Opus decoding. For example, if the 
highest available sample rate the hardware can handle is 32 kHz, then 
this is not a "supported rate" (i.e., not supported by Opus decoding), 
so step 2 does not apply and step 3 does. Step 3 says to decode at the 
next highest "supported rate", which would be 48 kHz, and then resample 
to a rate the hardware can handle (in this case 32 kHz).

I don't think that's equivalent to your summary of the two steps.

> - 7, 2nd paragraph:
> I don’t think that’s literally true, you’ve defined 0, 1, and 255. The
> rest are reserved. (I guess one could argue the guidance to treat
> reserved values as 255 supports this statement, but I think that’s a
> stretch.

Yes, this is really true for the currently defined mapping families. 
Changed to "Each currently specified value of this octet indicates..."

> -5.1.1.2: Do you expect the reader to know what Vorbis channel order is?
> Also, please expand FLAC on first mention.

It's a fairly common term among those who work with audio formats, but I 
added a "(see below)" for those who don't, since the table below it 
lists what that order is.

> - 5.2.1, paragraph starting with " An Ogg Opus stream MUST NOT have more
> than one of each tag, and if present their values MUST be an integer"
>
> each of these two new tags, or all tags? (I assume the former from the
> numeric values) Also, singular/plural mismatch in "their values MUST be
> an integer"

"each of these tags" seems to resolve both issues.

> -10, last paragraph:
>
> Updates, or extends? If the former, the draft needs an “Updates RFC
> 5334” field in the initial heading.

Added the field.

A complete diff of the changes since -09 so far is attached (and, as 
always, can be found in the Opus git repository: 
<https://git.xiph.org/?p=opus.git>). I'll post again when I finish 
reviewing the changes for normative language for generic Ogg 
requirements and the extra copyright grant.