Re: [secdir] Benjamin Kaduk's Discuss on draft-ietf-payload-tsvcis-03: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Fri, 14 February 2020 19:43 UTC

Date: Fri, 14 Feb 2020 11:43:21 -0800
From: Benjamin Kaduk <kaduk@mit.edu>
To: victor.demjanenko@vocal.com, 'Barry Leiba' <barryleiba@computer.org>
Cc: "'Roni Even (A)'" <roni.even@huawei.com>, 'The IESG' <iesg@ietf.org>, 'Catherine Meadows' <catherine.meadows@nrl.navy.mil>, 'IETF SecDir' <secdir@ietf.org>, draft-ietf-payload-tsvcis@ietf.org, 'Ali Begen' <ali.begen@networked.media>, avtcore-chairs@ietf.org, avt@ietf.org, "'Dave Satterlee (Vocal)'" <Dave.Satterlee@vocal.com>, 'IETF discussion list' <ietf@ietf.org>, draft-ietf-payload-tsvcis.all@ietf.org
Message-ID: <20200214194321.GF43385@kduck.mit.edu>
References: <001601d57af9$405efcf0$c11cf6d0$@vocal.com> <6E58094ECC8D8344914996DAD28F1CCD23D79BC0@DGGEMM506-MBX.china.huawei.com> <034a01d58a73$f4d3a1c0$de7ae540$@vocal.com> <037e01d58a92$72287510$56795f30$@vocal.com> <CALaySJLSkZnC_jsQtk+Ybq03RWYJeujdeft+zGsv9uZ5wjcwCg@mail.gmail.com> <20191101001153.GQ88302@kduck.mit.edu> <06e101d59f15$ee937b30$cbba7190$@vocal.com> <20191125064606.GL32847@mit.edu> <CALaySJJNovsSWuCB_R3Dc7ci7did2Zu20haU5o7b6pSpRYP5nw@mail.gmail.com> <00c601d5e339$965ebd90$c31c38b0$@vocal.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <00c601d5e339$965ebd90$c31c38b0$@vocal.com>
User-Agent: Mutt/1.12.1 (2019-06-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/secdir/vQiYnLTgsXF12bdUG7aXmy91qdM>
Subject: Re: [secdir] Benjamin Kaduk's Discuss on draft-ietf-payload-tsvcis-03: (with DISCUSS and COMMENT)

Hi Barry,

As Victor notes, this one was/is waiting on me; he did reply (offlist) on
15 January but I seem to have missed it amid a deluge of other mail that
arrived at that time.  Thanks for the reminder, and thanks Victor for
re-sending the comments.
(inline)

On Fri, Feb 14, 2020 at 08:20:54AM -0500, victor.demjanenko@vocal.com wrote:
> Hi Barry,
> 
> Thanks for recalling this was still outstanding.  I had emailed Ben just after the holidays and did not realize we had no response.  The below is what we suggested to Ben to address concerns he raised.
> 
> --------------------
> Hi Ben,
> 
> Hope your holidays were good.  Ours were both good and busy.  Deliveries for two NASA projects and the holidays kept us from responding sooner.  But we do want to get this draft completed.
> 
> With your permission, I’d like to address your comments directly, resolve what changes we should make and then publish a new version with a summary of our out-of-band discussions.  We don’t have a lot of experience with drafting such documents and would like to know exactly what is needed to make this draft acceptable.
> 
> I believe there are two comments/issues to address:
> 
> 1)	CODA, CODB
> 
> Your comment ends by stating:  “(Or, of course, the use of CODB as an alternating 1/0 bit as the framing usage could be documented instead.)”  We can do this as follows:
> 
> (original)
> It should be noted that CODB for MELPe 600 bps mode MAY deviate from
>    the value in Table 1 when bit 55 is used as an end-to-end framing
>    bit. Frame decoding would remain distinct as CODA being zero on its
>    own would indicate a 7-byte frame for either 2400 or 600 bps rate and
>    the use of 600 bps speech coding could be deduced from the RTP
>    timestamp (and anticipated by the SDP negotiations).
> 
> (adding “alternating 1/0”)
> It should be noted that CODB for MELPe 600 bps mode MAY deviate from
>    the value in Table 1 when bit 55 is used as an alternating 1/0 end-to-end framing
>    bit. Frame decoding would remain distinct as CODA being zero on its
>    own would indicate a 7-byte frame for either 2400 or 600 bps rate and
>    the use of 600 bps speech coding could be deduced from the RTP
>    timestamp (and anticipated by the SDP negotiations).
> 
> I think this change would be sufficient to address your concern about what to expect for CODB.

This looks like the minimal sufficient change, yes.  (I use "minimal"
because I would say more if I was writing it, but I don't think I can
insist that you write it the way I would -- it's your document after all!)
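
A minimal sketch of how a receiver might apply the detection rule quoted
further down this thread, assuming the alternating 1/0 framing bit and
looking only at the frames of a single RTP packet.  The function name, the
tuple representation of the rate bits, and the return strings are
hypothetical and not taken from the draft:

```python
def classify_melpe_rate(rate_bits):
    """Guess the MELPe rate from the 7-byte frames of ONE RTP packet.

    rate_bits is a list of (CODA, CODB) tuples, one per frame, in packet
    order.  Assumption: every frame has CODA == 0 (2400 or 600 bps) and,
    in 600 bps mode, CODB may be an alternating 1/0 framing bit.
    """
    if not rate_bits:
        raise ValueError("need at least one frame")
    if any(coda != 0 for coda, _ in rate_bits):
        raise ValueError("CODA != 0 is not a 7-byte 2400/600 bps frame")
    # Any (CODA, CODB) == (0, 1) means the 600 bps mode is in use.
    if any(codb == 1 for _, codb in rate_bits):
        return "600"
    # Two successive (0, 0) frames in the same packet cannot come from an
    # alternating framing bit, so the stream is 2400 bps.
    if len(rate_bits) >= 2:
        return "2400"
    # A single (0, 0) frame is ambiguous without the RTP timestamp or SDP.
    return "undetermined"
```

In the one-frame-per-packet case the result stays "undetermined", which is
where the RTP timestamp advance (or the SDP negotiation) would have to be
consulted.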

> 2)	Packing and unpacking
> 
> You are correct that I am trying to vaguely describe a middle layer shim that is neither RTP nor speech coder.  So it definitely does need to be clear.  The vagueness comes from the speech coder description being a FOUO document.  It's now unclassified so I can potentially say more (and I did make some enhancements of the parameter description already).
> 
> So I am trying to understand exactly what you think is vague in our current description:
> 
> TSVCIS augmented speech data is derived from the signal processing
>    and data already performed by the MELPe speech coder.  For the
>    purposes of this specification, only the general parameter nature of
>    TSVCIS will be characterized.  Depending on the bandwidth available
>    (and FEC requirements), a varying number of TSVCIS-specific speech
>    coder parameters need to be transported.  These are first byte-packed
>    and then conveyed from encoder to decoder.
> 
>    Byte packing of TSVCIS speech data into packed parameters is
>    processed as per the following example:
> 
>       Three-bit field: bits A, B, and C (A is MSB, C is LSB)
>       Five-bit field: bits D, E, F, G, and H (D is MSB, H is LSB)
> 
>            MSB                                              LSB
>             0      1      2      3      4      5      6      7
>         +------+------+------+------+------+------+------+------+
>         |   H  |   G  |   F  |   E  |   D  |   C  |   B  |   A  | 
>         +------+------+------+------+------+------+------+------+
> 
>    This packing method places the three-bit field "first" in the lowest
>    bits followed by the next five-bit field.  Parameters may be split
>    between octets with the most significant bits in the earlier octet.
>    Any unfilled bits in the last octet MUST be filled with zero.

[not actually relevant to the Discuss part, but if there is always exactly
one 3-bit parameter and one 5-bit parameter, then this text allowing
splitting across octets will never be used and is potentially confusing to
mention.]
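
For illustration, a minimal sketch of the single-octet example in the
quoted figure.  It assumes the figure is read literally, with column 0 (H)
as the octet's most significant bit and column 7 (A) as its least
significant bit; the function names are hypothetical, and the real bit
assignment is whatever [TSVCIS] defines:

```python
def pack_first_octet(three_bits, five_bits):
    """Pack bits [A, B, C] and [D, E, F, G, H] into one octet.

    Assumption: literal reading of the figure, so A lands in bit 0 (LSB)
    and H lands in bit 7 (MSB) of the octet.
    """
    if len(three_bits) != 3 or len(five_bits) != 5:
        raise ValueError("expected a 3-bit field and a 5-bit field")
    bits = list(three_bits) + list(five_bits)  # A, B, C, D, E, F, G, H
    if any(bit not in (0, 1) for bit in bits):
        raise ValueError("bits must be 0 or 1")
    octet = 0
    for position, bit in enumerate(bits):
        octet |= bit << position  # A -> bit 0, ..., H -> bit 7
    return octet


def unpack_first_octet(octet):
    """Reverse pack_first_octet: return ([A, B, C], [D, E, F, G, H])."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet out of range")
    bits = [(octet >> position) & 1 for position in range(8)]
    return bits[:3], bits[3:]
```

The round trip only works because both sides agree on which field comes
first and how wide each one is; the octet itself carries no such
information.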

>    In order to accommodate a varying amount of TSVCIS augmented speech
>    data, it is only necessary to specify the number of octets containing
>    the packed TSVCIS parameters.  The encoding to do so is presented in

I think the "only necessary to specify the number of octets" is the key
stumbling point, for me -- I need to know the number of octets as well as
the order of the parameters within the list, which is more information than
just the number of octets.

>    Section 3.2.  TSVCIS specifically uses the NRL VDR in two
>    configurations using 15 and 35 packed octet parameters [TSVCIS].  

I think I failed to internalize the "two configurations using 15 and 35
packed octet parameters" the first time I read the document, as this does
help give the reader a clue that [TSVCIS] gives a good picture of what
parameters go where.  So it seems like we could easily extend that to say
"using a fixed set of 15 and 35 packed octet parameters in a fixed order
[TSVCIS]", and that would resolve my concerns.
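
As a worked example of the octet counts involved, here is a minimal sketch
of the modified count arithmetic from the Section 3.2 text quoted later in
this thread (MTC = TC - 15, with the all-ones value 63 reserved for the
alternate packing format).  The constant and function names are
hypothetical:

```python
MTC_OFFSET = 15    # minimum number of packed TSVCIS parameter octets
MTC_RESERVED = 63  # all-ones MTC signals the alternate packing format


def tc_to_mtc(tc):
    """Return the six-bit modified count for a full octet count TC."""
    mtc = tc - MTC_OFFSET
    if not 0 <= mtc < MTC_RESERVED:
        raise ValueError("TC must be 15..77 for the preferred placement")
    return mtc


def mtc_to_tc(mtc):
    """Recover the full octet count TC from a received modified count."""
    if not 0 <= mtc < MTC_RESERVED:
        raise ValueError("MTC must be 0..62 for the preferred placement")
    return mtc + MTC_OFFSET
```

Under these assumptions the two configurations of 15 and 35 packed octets
map to MTC values of 0 and 20, and the largest encodable TC is 77.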

> The speech coder description of the parameters is the following:
> 
>  
> 
> So the three bit pitch is first (bits 56 to 58), followed by a five bit amplitude (bits 59 to 63) and then an array of spectral components, each 8-bit wide (starting at bit 64).

[And maybe TSVCIS specifies that the spectral components are derived from
some fundamental harmonic decomposition that naturally quantizes to a
number-of-parameters/accuracy tradeoff with a natural order.  If so, we
could also rely on that instead of my proposed change above; let me know if
you want to explore that path further.]

> Based on this information, I’m not sure what we should add to our draft to make the description of packing/unpacking clearer.  Can you make any suggestions or does this table help you with what you did not know?  (I don’t think I should put this table into the draft RFC however.)

Hopefully the above helps to clarify.

Thanks, and sorry for the delay.

-Ben

> Thanks for your attention and comments.
> 
> Victor & Dave
> 
> 
> 
> -----Original Message-----
> From: Barry Leiba <barryleiba@computer.org> 
> Sent: Friday, February 14, 2020 7:38 AM
> To: Benjamin Kaduk <kaduk@mit.edu>
> Cc: victor.demjanenko@vocal.com; Roni Even (A) <roni.even@huawei.com>; The IESG <iesg@ietf.org>; Catherine Meadows <catherine.meadows@nrl.navy.mil>; IETF SecDir <secdir@ietf.org>; draft-ietf-payload-tsvcis@ietf.org; Ali Begen <ali.begen@networked.media>; avtcore-chairs@ietf.org; avt@ietf.org; Dave Satterlee (Vocal) <Dave.Satterlee@vocal.com>; IETF discussion list <ietf@ietf.org>; draft-ietf-payload-tsvcis.all@ietf.org
> Subject: Re: Benjamin Kaduk's Discuss on draft-ietf-payload-tsvcis-03: (with DISCUSS and COMMENT)
> 
> This is still outstanding, since November.  Victor, where are we on this one?
> 
> Barry
> 
> On Mon, Nov 25, 2019 at 1:46 AM Benjamin Kaduk <kaduk@mit.edu> wrote:
> >
> > Hi Victor,
> >
> > On Tue, Nov 19, 2019 at 03:14:21PM -0500, victor.demjanenko@vocal.com wrote:
> > > Hi Ben,
> > >
> > > Sorry I overlooked sending you a response.  I would like to address 
> > > the two concerns you have by explaining what the speech coders are doing.
> >
> > Thanks for the extra clarifications.  To supply one of my own: I'm not 
> > concerned that the protocol doesn't work as implemented, but just want 
> > to make sure that the document includes enough information to admit 
> > new implementations without guesswork.  That is to say, "either tell 
> > me how to do it or tell me where to look that tells me how to do it".
> >
> > > WRT 600 bps MELP, there is one TSVCIS mode that uses one bit
> > > beyond the 54-bit frame for MELP 600 as a frame sync which alternates
> > > between frames.  With two or more MELP 600bps frames in one RTP packet,
> > > if any frame indicates 600 bps by CODA being 0 and CODB being 1, then we
> > > know the stream is 600bps.  If there is a single frame in an RTP packet,
> > > you can still deduce this by looking at every other RTP packet (every
> > > other MELP 600bps frame) and by the timestamp advance.  Most likely the
> > > two ends would negotiate 600 bps in SDP anyway so there really should
> > > not be a problem.  I know it's not pretty but it's workable.  I hope
> > > this explanation helps you with the concerns for this issue.
> >
> > In this case, the use as an "end-to-end framing bit" (i.e., the 
> > alternating behavior you describe above) is not explicitly stated; one 
> > might imagine a scheme where the framing usage is to have the bit 
> > cycle through 1, 1, 0, and 0, or some other scheme.  I'd suggest to 
> > note in the document that if any instance of (CODA, CODB) == (0, 1) is 
> > observed, then the 600bps mode is in use.  It might also be helpful to 
> > include the observation that two successive MELPe payloads with CODA 
> > == CODB == 0 indicates the 2400bps mode (and that seeing them in a 
> > single RTP packet is decisive, whereas additional information about 
> > packet non-loss would be needed in the one-MELPe-frame-per-RTP-packet 
> > case), but that would be a fair bit of additional text and might be 
> > diminishing returns.  (Or, of course, the use of CODB as an 
> > alternating 1/0 bit as the framing usage could be documented
> > instead.)
> >
> > > As for the TSVCIS parameter packing/unpacking, this is really 
> > > simple.  There is exactly one three-bit parameter, exactly one
> > > five-bit parameter and a variable number of eight-bit parameters.  In our
> > > view, the speech coder itself (or a wrapper for it) is responsible 
> > > for preparing the block of octets.  RTP then just transports it.  On 
> > > receive, the complementary wrapper reverses the packing operation.  
> > > I hope this clarifies and explains the simplicity.
> >
> > That's exactly what I expected to happen; however, it's not what I 
> > believe the current text of the document is describing.  Specifically, 
> > I think that the current text implies that the "preparing the block of 
> > octets" and "complementary wrapper reverses the packing operation" are 
> > supposed to be part of the RTP payload format that this document 
> > describes, but this document does not have enough information to 
> > actually perform those operations reversibly.  If the packing is to be 
> > done in the speech coder, then this document doesn't need to talk 
> > about the packing at all (e.g., at the end of Section 2); if we need 
> > to keep the packing/wrapper in this document then we need to indicate 
> > that there's a defined priority order for the (8-bit) TSVCIS
> > parameters in the TSVCIS references, to allow the packing/unpacking to be deterministic.
> >
> > Thanks,
> >
> > Ben
> >
> > >
> > > -----Original Message-----
> > > From: Benjamin Kaduk <kaduk@mit.edu>
> > > Sent: Thursday, October 31, 2019 8:12 PM
> > > To: Barry Leiba <barryleiba@computer.org>
> > > Cc: victor.demjanenko@vocal.com; Roni Even (A) 
> > > <roni.even@huawei.com>; The IESG <iesg@ietf.org>; Catherine Meadows 
> > > <catherine.meadows@nrl.navy.mil>; IETF SecDir <secdir@ietf.org>; 
> > > draft-ietf-payload-tsvcis@ietf.org; Ali Begen 
> > > <ali.begen@networked.media>; avtcore-chairs@ietf.org; avt@ietf.org; 
> > > Dave Satterlee (Vocal) <Dave.Satterlee@vocal.com>; IETF discussion 
> > > list <ietf@ietf.org>; draft-ietf-payload-tsvcis.all@ietf.org
> > > Subject: Re: Benjamin Kaduk's Discuss on 
> > > draft-ietf-payload-tsvcis-03: (with DISCUSS and COMMENT)
> > >
> > > I don't think so, unfortunately.
> > >
> > > I do see the clarification about CODB's potential for deviation from 
> > > Table 1, that only the 600 bps MELPe is allowed to deviate, and that 
> > > CODA gets us to "it's one of 2400 or 600 bps" and the RTP timestamp 
> > > disambiguates that
> > > 600 bps is in use.  But, it seems that this means that the recipient 
> > > in general should not rely on CODB to differentiate 600 from 2400 
> > > bps, and instead is more robustly implemented by *always* using the 
> > > RTP timestamp to detect 600 bps, since that will always work and 
> > > CODB will sometimes not work under conditions not fully specified 
> > > here.  So, if we are unwilling or unable to clarify what those 
> > > conditions are (e.g., whether at a minimum mutual agreement is 
> > > required), then I think we need to describe this procedure of 
> > > consulting the RTP timestamp as the default behavior and avoid giving the impression that CODB should be used to do so.
> > >
> > > Additionally, I don't see anything to address my concern about 
> > > TSVCIS parameter decoding.  To be clear, the procedure I see this 
> > > document describing is that:
> > > - TSVCIS gives parameters (and their lengths in bits) to the codec
> > >   described in this document
> > > - this document specifies how to densely encode those parameters into a
> > >   bytestream
> > > - RTP transmits that encoded bytestream to the peer
> > > - the codec specified by this document is responsible for turning that
> > >   encoded bytestream back into a list of TSVCIS parameters (and their
> > >   lengths in bits)
> > >
> > > I don't see how that last step is attainable with only the 
> > > information provided by this document.  I *assume* that one of the 
> > > TSVCIS specifications has a canonical (ordered) listing of 
> > > parameters, and that the list of parameters given to this codec in
> > > the first step will always be an initial prefix of that list, but 
> > > that's just me guessing at how to make sense of the stated procedure 
> > > given insufficient information.  I don't think it's appropriate to 
> > > make the reader of an RFC guess at what to do; we need to either say 
> > > how to do it or give a pointer to an external reference that does.
> > >
> > > -Ben
> > >
> > > On Tue, Oct 29, 2019 at 02:26:09PM -0400, Barry Leiba wrote:
> > > > Ben, does the -04 version address everything?
> > > >
> > > > Barry
> > > >
> > > > On Thu, Oct 24, 2019 at 1:42 PM <victor.demjanenko@vocal.com> wrote:
> > > > >
> > > > > I forgot to address security comments in one email.  The changes are:
> > > > >
> > > > > Section 8, second paragraph - Suggested edit by reviewer
> > > > >
> > > > > (was)
> > > > >    This RTP payload format and the TSVCIS decoder do not exhibit any
> > > > >    significant non-uniformity in the receiver-side computational
> > > > >    complexity for packet processing and thus are unlikely to pose a
> > > > >    denial-of-service threat due to the receipt of pathological data.
> > > > >    Additionally, the RTP payload format does not contain any active
> > > > >    content.
> > > > >
> > > > > (now)
> > > > >    This RTP payload format and the TSVCIS decoder, to the best of our
> > > > >    knowledge, do not exhibit any significant non-uniformity in the
> > > > >    receiver-side computational complexity for packet processing and thus
> > > > >    are unlikely to pose a denial-of-service threat due to the receipt of
> > > > >    pathological data. Additionally, the RTP payload format does not
> > > > >    contain any active content.
> > > > >
> > > > >
> > > > > Section 8, third paragraph - Suggested edit by reviewer
> > > > >
> > > > > (was)
> > > > >    Please see the security considerations discussed in [RFC6562]
> > > > >    regarding VAD and its effect on bitrates.
> > > > >
> > > > > (now)
> > > > >    Please see the security considerations discussed in [RFC6562]
> > > > >    regarding Voice Activity Detect (VAD) and its effect on bitrates.
> > > > >
> > > > > Victor
> > > > >
> > > > > -----Original Message-----
> > > > > From: victor.demjanenko@vocal.com <victor.demjanenko@vocal.com>
> > > > > Sent: Thursday, October 24, 2019 10:05 AM
> > > > > To: 'Roni Even (A)' <roni.even@huawei.com>; 'Benjamin Kaduk'
> > > > > <kaduk@mit.edu>; 'The IESG' <iesg@ietf.org>
> > > > > Cc: draft-ietf-payload-tsvcis@ietf.org; 'Ali Begen'
> > > > > <ali.begen@networked.media>; avtcore-chairs@ietf.org; 
> > > > > avt@ietf.org; 'Dave Satterlee (Vocal)' 
> > > > > <Dave.Satterlee@vocal.com>
> > > > > Subject: RE: Benjamin Kaduk's Discuss on
> > > > > draft-ietf-payload-tsvcis-03: (with DISCUSS and COMMENT)
> > > > >
> > > > > Hi Everyone,
> > > > >
> > > > > > First we want to thank everyone for their review and comments for
> > > > > > this draft RFC.  We believe we reviewed all the comments and
> > > > > > suggestions and incorporated them adequately in the next draft
> > > > > > (04).  We'd like to send out this list of exact changes in case
> > > > > > anyone has additional comments or thinks the clarifications are
> > > > > > inadequate.  We would be most happy to address concerns before
> > > > > > publishing draft 04 tomorrow.
> > > > >
> > > > > > With so many emails from a half dozen or more reviewers, we
> > > > > > apologize that we cannot address each sender individually.  We
> > > > > > hope this detail is sufficient for everyone.
> > > > >
> > > > > Again, many thanks to all.
> > > > >
> > > > > Victor & Dave
> > > > >
> > > > > ----------------------------------------------------------------
> > > > > ----
> > > > > --------------------------
> > > > >
> > > > > Section 1.1 - Suggested reference to RFC 8088 added.
> > > > >
> > > > > (was)
> > > > >    Best current practices for writing an RTP payload format
> > > > >    specification were followed [RFC2736].
> > > > >
> > > > > (now)
> > > > >    Best current practices for writing an RTP payload format
> > > > >    specification were followed [RFC2736] [RFC8088].
> > > > >
> > > > >
> > > > > Section 2, paragraphs 3 and 4 - Suggested edits by reviewers
> > > > >
> > > > > (was)
> > > > >    In addition to the augmented speech data, the TSVCIS specification
> > > > >    identifies which speech coder and framing bits are to be encrypted,
> > > > >    and how they are protected by forward error correction (FEC)
> > > > >    techniques (using block codes).  At the RTP transport layer, only the
> > > > >    speech coder related bits need to be considered and are conveyed in
> > > > >    unencrypted form.  In most IP-based network deployments, standard
> > > > >    link encryption methods (SRTP, VPNs, FIPS 140 link encryptors or Type
> > > > >    1 Ethernet encryptors) would be used to secure the RTP speech
> > > > >    contents.  Further, it is desirable to support the highest voice
> > > > >    quality between endpoints which is only possible without the overhead
> > > > >    of FEC.
> > > > >
> > > > >    TSVCIS augmented speech data is derived from the signal processing
> > > > >    and data already performed by the MELPe speech coder.  For the
> > > > >    purposes of this specification, only the general parameter nature of
> > > > >    TSVCIS will be characterized.  Depending on the bandwidth available
> > > > >    (and FEC requirements), a varying number of TSVCIS specific speech
> > > > >    coder parameters need to be transported.  These are first byte-packed
> > > > >    and then conveyed from encoder to decoder.
> > > > >
> > > > > (now)
> > > > >    In addition to the augmented speech data, the TSVCIS specification
> > > > >    identifies which speech coder and framing bits are to be encrypted,
> > > > >    and how they are protected by forward error correction (FEC)
> > > > >    techniques (using block codes).  At the RTP transport layer, only the
> > > > >    speech-coder-related bits need to be considered and are conveyed in
> > > > >    unencrypted form.  In most IP-based network deployments, standard
> > > > >    link encryption methods (SRTP, VPNs, FIPS 140 link encryptors or Type
> > > > >    1 Ethernet encryptors) would be used to secure the RTP speech
> > > > >    contents.
> > > > >
> > > > >    TSVCIS augmented speech data is derived from the signal processing
> > > > >    and data already performed by the MELPe speech coder.  For the
> > > > >    purposes of this specification, only the general parameter nature of
> > > > >    TSVCIS will be characterized.  Depending on the bandwidth available
> > > > >    (and FEC requirements), a varying number of TSVCIS-specific speech
> > > > >    coder parameters need to be transported.  These are first byte-packed
> > > > >    and then conveyed from encoder to decoder.
> > > > >
> > > > >
> > > > > Section 3, last sentence paragraph 3 - Suggested edit by 
> > > > > reviewer
> > > > >
> > > > > (was)
> > > > >    When more than one codec data frame is
> > > > >    present in a single RTP packet, the timestamp is, as always, that of
> > > > >    the oldest data frame represented in the RTP packet.
> > > > >
> > > > > (now)
> > > > >    When more than one codec data frame is
> > > > >    present in a single RTP packet, the timestamp specified is that of
> > > > >    the oldest data frame represented in the RTP packet.
> > > > >
> > > > >
> > > > > Section 3.1, last paragraph - Clarified permission for MELP 600 
> > > > > end-to-end framing bit
> > > > >
> > > > > (was)
> > > > >    It should be noted that CODB for both the 2400 and 600 bps modes MAY
> > > > >    deviate from the values in Table 1 when bit 55 is used as an end-to-
> > > > >    end framing bit.  Frame decoding would remain distinct as CODA being
> > > > >    zero on its own would indicate a 7-byte frame for either rate and the
> > > > >    use of 600 bps speech coding could be deduced from the RTP timestamp
> > > > >    (and anticipated by the SDP negotiations).
> > > > >
> > > > > (now)
> > > > >    It should be noted that CODB for MELPe 600 bps mode MAY deviate from
> > > > >    the value in Table 1 when bit 55 is used as an end-to-end framing
> > > > >    bit. Frame decoding would remain distinct as CODA being zero on its
> > > > >    own would indicate a 7-byte frame for either 2400 or 600 bps rate and
> > > > >    the use of 600 bps speech coding could be deduced from the RTP
> > > > >    timestamp (and anticipated by the SDP negotiations).
> > > > >
> > > > >
> > > > > Section 3.2, first paragraph - Clarifications requested by 
> > > > > reviewers
> > > > >
> > > > > (was)
> > > > >    The TSVCIS augmented speech data as packed parameters MUST be placed
> > > > >    immediately after a corresponding MELPe 2400 bps payload in the same
> > > > >    RTP packet.  The packed parameters are counted in octets (TC).  In
> > > > >    the preferred placement, shown in Figure 6, a single trailing octet
> > > > >    SHALL be appended to include a two-bit rate code, CODA and CODB,
> > > > >    (both bits set to one) and a six-bit modified count (MTC).  The
> > > > >    special modified count value of all ones (representing a MTC value of
> > > > >    63) SHALL NOT be used for this format as it is used as the indicator
> > > > >    for the alternate packing format shown next.  In a standard
> > > > >    implementation, the TSVCIS speech coder uses a minimum of 15 octets
> > > > >    for parameters in octet packed form.  The modified count (MTC) MUST
> > > > >    be reduced by 15 from the full octet count (TC).  Computed MTC = TC-
> > > > >    15.  This accommodates a maximum of 77 parameter octets (maximum
> > > > >    value of MTC is 62, 77 is the sum of 62+15).
> > > > >
> > > > > (now)
> > > > >    The TSVCIS augmented speech data as packed parameters MUST be placed
> > > > >    immediately after a corresponding MELPe 2400 bps payload in the same
> > > > >    RTP packet.  The packed parameters are counted in octets (TC).  The
> > > > > >    preferred placement SHOULD be used for TSVCIS payloads with TC less
> > > > > >    than or equal to 77 octets, and is shown in Figure 6.  In the preferred
> > > > >    placement, a single trailing octet SHALL be appended to include a
> > > > >    two-bit rate code, CODA and CODB, (both bits set to one) and a six-
> > > > >    bit modified count (MTC).  The special modified count value of all
> > > > >    ones (representing a MTC value of 63) SHALL NOT be used for this
> > > > >    format as it is used as the indicator for the alternate packing
> > > > >    format shown next.  In a standard implementation, the TSVCIS speech
> > > > >    coder uses a minimum of 15 octets for parameters in octet packed
> > > > >    form.  The modified count (MTC) MUST be reduced by 15 from the full
> > > > >    octet count (TC).  Computed MTC = TC-15.  This accommodates a maximum
> > > > >    of 77 parameter octets (maximum value of MTC is 62, 77 is the sum of
> > > > >    62+15).
> > > > >
> > > > >
> > > > > Section 3.3, first paragraph - Suggested edit by reviewer
> > > > >
> > > > > (was)
> > > > >    A TSVCIS RTP packet consists of zero or more TSVCIS coder frames
> > > > >    (each consisting of MELPe and TSVCIS coder data) followed by zero or
> > > > >    one MELPe comfort noise frame.  The presence of a comfort noise frame
> > > > >    can be determined by its rate code bits in its last octet.
> > > > >
> > > > > (now)
> > > > >    A TSVCIS RTP packet payload consists of zero or more consecutive
> > > > >    TSVCIS coder frames (each consisting of MELPe 2400 and TSVCIS coder
> > > > >    data), with the oldest frame first, followed by zero or one MELPe
> > > > >    comfort noise frame.  The presence of a comfort noise frame can be
> > > > >    determined by its rate code bits in its last octet.
> > > > >
> > > > >
> > > > > Section 3.3, fourth paragraph - Clarification requested by 
> > > > > reviewers
> > > > >
> > > > > (was)
> > > > >    TSVCIS coder frames in a single RTP packet MAY be of different coder
> > > > >    bitrates.  With the exception for the variable length TSVCIS
> > > > >    parameter frames, the coder rate bits in the trailing byte identify
> > > > >    the contents and length as per Table 1.
> > > > >
> > > > > (now)
> > > > >    TSVCIS coder frames in a single RTP packet MAY have varying TSVCIS
> > > > >    parameter octet counts.  Its packed parameter octet count (length) is
> > > > >    indicated in the trailing byte(s).  All MELPe frames in a single RTP
> > > > >    packet MUST be of the same coder bitrate.  For all MELPe coder
> > > > >    frames, the coder rate bits in the trailing byte identify the
> > > > >    contents and length as per Table 1.
> > > > >
> > > > >
> > > > > Section 4.1 - Editor note removed
> > > > >
> > > > >
> > > > > Section 4.1 - Change controller is now
> > > > >
> > > > > (now)
> > > > >    Change controller: IETF, contact <avt@ietf.org>
> > > > >
> > > > >
> > > > > Section 5, first paragraph - Suggested edits by reviewers
> > > > >
> > > > > (was)
> > > > >    A primary application of TSVCIS is for radio communications of voice
> > > > >    conversations, and discontinuous transmissions are normal.  When
> > > > >    TSVCIS is used in an IP network, TSVCIS RTP packet transmissions may
> > > > >    cease and resume frequently.  RTP synchronization source (SSRC)
> > > > >    sequence number gaps indicate lost packets to be filled by PLC, while
> > > > >    abrupt loss of RTP packets indicates intended discontinuous
> > > > >    transmissions.
> > > > >
> > > > > (now)
> > > > >    A primary application of TSVCIS is for radio communications of voice
> > > > >    conversations, and discontinuous transmissions are normal.  When
> > > > >    TSVCIS is used in an IP network, TSVCIS RTP packet transmissions may
> > > > >    cease and resume frequently.  RTP synchronization source (SSRC)
> > > > >    sequence number gaps indicate lost packets to be filled by Packet
> > > > >    Loss Concealment (PLC), while abrupt loss of RTP packets indicates
> > > > >    intended discontinuous transmissions.  Resumption of voice
> > > > >    transmission SHOULD be indicated by the RTP marker bit (M) set to 1.
> > > > >
> > > > >
> > > > > Section 10 - Added reference
> > > > >
> > > > > (added)
> > > > >    [RFC8088]  Westerlund, M., "How to Write an RTP Payload Format",
> > > > >               RFC 8088, DOI 10.17487/RFC8088, May 2017,
> > > > >               <http://www.rfc-editor.org/info/rfc8088>.
> > > > >
> > > > > ----------------------------------------------------------------
> > > > > ----
> > > > > -----------------------------
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Roni Even (A) <roni.even@huawei.com>
> > > > > Sent: Sunday, October 6, 2019 2:09 AM
> > > > > To: victor.demjanenko@vocal.com; 'Benjamin Kaduk' 
> > > > > <kaduk@mit.edu>; 'The IESG' <iesg@ietf.org>
> > > > > Cc: draft-ietf-payload-tsvcis@ietf.org; 'Ali Begen'
> > > > > <ali.begen@networked.media>; avtcore-chairs@ietf.org; 
> > > > > avt@ietf.org; 'Dave Satterlee (Vocal)' 
> > > > > <Dave.Satterlee@vocal.com>
> > > > > Subject: RE: Benjamin Kaduk's Discuss on
> > > > > draft-ietf-payload-tsvcis-03: (with DISCUSS and COMMENT)
> > > > >
> > > > > Hi,
> > > > > About the reference to TSVCIS.
> > > > > > The RTP payload is about how to encapsulate the payload in an
> > > > > > RTP packet. The objective is to define how an RTP stack can
> > > > > > insert the TSVCIS frames into, and extract them from, the RTP
> > > > > > packet. Typically it is not required to understand the payload
> > > > > > structure in order to be able to perform the encapsulation.
> > > > > > This is why the reference to the payload is Informational and
> > > > > > we did not require it to be publicly available.  If there is a
> > > > > > need to understand the payload itself for the encapsulation,
> > > > > > then we need more information in the RTP payload specification
> > > > > > and a publicly available normative reference. I think this is
> > > > > > not the case here.
> > > > >
> > > > > Roni Even
> > > > >
> > > > > AVTCore co-chair (ex Payload)
> > > > >
> > > > > -----Original Message-----
> > > > > From: victor.demjanenko@vocal.com 
> > > > > [mailto:victor.demjanenko@vocal.com]
> > > > > Sent: Saturday, October 05, 2019 12:18 AM
> > > > > To: 'Benjamin Kaduk'; 'The IESG'
> > > > > Cc: draft-ietf-payload-tsvcis@ietf.org; 'Ali Begen';
> > > avtcore-chairs@ietf.org; avt@ietf.org; 'Victor Demjanenko, Ph.D.'; 
> > > 'Dave Satterlee (Vocal)'
> > > > > Subject: RE: Benjamin Kaduk's Discuss on
> > > > > draft-ietf-payload-tsvcis-03: (with DISCUSS and COMMENT)
> > > > >
> > > > > Everyone,
> > > > >
> > > > > > Thanks for the comments.  I think I misunderstood the ambiguity
> > > > > > with respect to changing rates within an RTP packet.  That was
> > > > > > not the plan.  An RTP packet must have MELP speech frames of the
> > > > > > same rate.  What is possible is that the amount of augmented
> > > > > > TSVCIS speech data may vary from one speech frame to the next.
> > > > > > This allows for a dynamic VDR as suggested by the NRL paper.  So
> > > > > > an RTP packet may have varying TSVCIS data but must always have
> > > > > > MELPe 2400 data.
> > > > >
> > > > > Again backwards parsing is necessary but the timestamp uniformly
> > > increments 22.5msec per combined MELP/TSVCIS speech frame.
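
The uniform timestamp increment described above can be sketched as
follows.  This is only an illustration: the 8000 Hz RTP clock rate
(which makes 22.5 ms equal to 180 ticks) and the function name are
assumptions, not something stated in this thread.

```python
RTP_CLOCK_HZ = 8000   # assumed RTP clock rate, not given in the thread
FRAME_MS = 22.5       # duration of one combined MELP/TSVCIS speech frame


def frame_timestamps(first_timestamp, frame_count):
    """Return the RTP timestamp of each frame in a packet.

    Every frame advances the timestamp by the same number of clock
    ticks, regardless of how much TSVCIS data the frame carries.
    """
    ticks_per_frame = int(FRAME_MS * RTP_CLOCK_HZ / 1000)
    # RTP timestamps are 32-bit and wrap around.
    return [(first_timestamp + i * ticks_per_frame) % 2**32
            for i in range(frame_count)]
```

Under these assumptions, three frames starting at timestamp 1000 would
be stamped 1000, 1180 and 1360.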
> > > > >
> > > > > > The NRL paper is a good public reference on the VDR aspects.
> > > > > > The actual TSVCIS spec we had was FOUO so we could not replicate
> > > > > > its detail.  (I believe a later spec is public or at least
> > > > > > partially public.  I am trying to get this.)  The opaque data is
> > > > > > pretty obvious with the TSVCIS spec in hand.
> > > > >
> > > > > > We will address the issues/concerns raised next week.  Other
> > > > > > business had priority.
> > > > >
> > > > > Thank you and enjoy the weekend.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Victor & Dave
> > > > >
> > > > > -----Original Message-----
> > > > > From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
> > > > > Sent: Wednesday, October 2, 2019 10:40 PM
> > > > > To: The IESG <iesg@ietf.org>
> > > > > Cc: draft-ietf-payload-tsvcis@ietf.org; Ali Begen 
> > > > > <ali.begen@networked.media>; avtcore-chairs@ietf.org; 
> > > > > ali.begen@networked.media; avt@ietf.org
> > > > > Subject: Benjamin Kaduk's Discuss on draft-ietf-payload-tsvcis-03:
> > > > > (with DISCUSS and COMMENT)
> > > > >
> > > > > Benjamin Kaduk has entered the following ballot position for
> > > > > draft-ietf-payload-tsvcis-03: Discuss
> > > > >
> > > > > When responding, please keep the subject line intact and reply 
> > > > > to all email addresses included in the To and CC lines. (Feel 
> > > > > free to cut this introductory paragraph, however.)
> > > > >
> > > > >
> > > > > Please refer to
> > > > > https://www.ietf.org/iesg/statement/discuss-criteria.html
> > > > > for more information about IESG DISCUSS and COMMENT positions.
> > > > >
> > > > >
> > > > > The document, along with other ballot positions, can be found here:
> > > > > https://datatracker.ietf.org/doc/draft-ietf-payload-tsvcis/
> > > > >
> > > > >
> > > > >
> > > > > ----------------------------------------------------------------
> > > > > ----
> > > > > --
> > > > > DISCUSS:
> > > > > ----------------------------------------------------------------
> > > > > ----
> > > > > --
> > > > >
> > > > > > I support Magnus' point about the time-ordering of adjacent
> > > > > > frames in a packet.
> > > > >
> > > > > > Additionally, I am not sure that there's quite enough here to be
> > > > > > interoperably implementable.  Specifically, we seem to be lacking
> > > > > > a description of how an encoder or decoder knows which TSVCIS
> > > > > > parameters, and in what order, to byte-pack or unpack,
> > > > > > respectively.  One might surmise that there is a canonical
> > > > > > listing in [TSVCIS], but this document does not say that, and
> > > > > > furthermore [TSVCIS] is only listed as an informative reference.
> > > > > > (I couldn't get my hands on my copy, at least on short notice.)
> > > > > > If we limited ourselves to treating the TSVCIS parameters as an
> > > > > > entirely opaque blob (codec, convey these N octets to the peer
> > > > > > with the appropriate one- or two-byte trailer for payload type
> > > > > > identification and framing), that would be interoperably
> > > > > > implementable, since the black-box bits are up to some other
> > > > > > codec to interpret.
> > > > >
> > > > > > In a similar vein, we mention but do not completely specify the
> > > > > > potential for using CODB as an end-to-end framing bit, in
> > > > > > Section 3.1 (see Comment), which is not interoperably
> > > > > > implementable without further details.
> > > > >
> > > > >
> > > > > ----------------------------------------------------------------
> > > > > ----
> > > > > --
> > > > > COMMENT:
> > > > > ----------------------------------------------------------------
> > > > > ----
> > > > > --
> > > > >
> > > > > Where is [TSVCIS] available?
> > > > >
> > > > > > Is [NRLVDR] the same as
> > > > > > https://apps.dtic.mil/dtic/tr/fulltext/u2/a588068.pdf ?  A URL
> > > > > > in the references would be helpful.
> > > > >
> > > > > > Is additional TSVCIS data only present after 2400bps MELPe and
> > > > > > the first thing to get dropped under bandwidth pressure?  The
> > > > > > abstract and introduction imply this by calling out MELPe 2400
> > > > > > bps speech parameters explicitly, but Section 3 says that TSVCIS
> > > > > > augments standard 600, 1200, and 2400 bps MELP frames.
> > > > >
> > > > > > It's helpful that Section 3.3 gives some general guidance for
> > > > > > decoding this payload type ("[t]he way to determine the number
> > > > > > of TSVCIS/MELPe frames is to identify each frame type and
> > > > > > length"), but I think some generic considerations would be very
> > > > > > helpful to the reader much earlier, along the lines of "MELPe
> > > > > > and TSVCIS data payloads are decoded from the end, using the
> > > > > > CODA and CODB (and, if necessary, CODC and others) bits to
> > > > > > determine the type of payload.  For MELPe payloads the type also
> > > > > > indicates the payload length, whereas for TSVCIS data an
> > > > > > additional length field is present, in one of two possible
> > > > > > formats.  A TSVCIS coder frame consists of a MELPe data payload
> > > > > > followed by zero or one TSVCIS data payload; after the TSVCIS
> > > > > > payload's presence/length is determined, then the preceding
> > > > > > MELPe payload can be determined and decoded.  Per Section 3.3,
> > > > > > multiple TSVCIS frames can be present in a single RTP packet."
> > > > > > This (or something like it) would also serve to clarify the role
> > > > > > of the COD* bits, which is otherwise only implicitly introduced.
> > > > >
> > > > > Section 1.1
> > > > >
> > > > > > RFC 2736 is BCP 36 (but it's updated by RFC 8088 which is for
> > > > > > some reason an Informational document and not part of BCP 36?!).
> > > > >
> > > > > Section 2
> > > > >
> > > > >    In addition to the augmented speech data, the TSVCIS specification
> > > > >    identifies which speech coder and framing bits are to be encrypted,
> > > > >    and how they are protected by forward error correction (FEC)
> > > > >    techniques (using block codes).  At the RTP transport layer, only the
> > > > >    speech coder related bits need to be considered and are conveyed in
> > > > > >    unencrypted form.  In most IP-based network deployments, standard
> > > > >
> > > > > > Am I reading this correctly that this text is just summarizing
> > > > > > what's in the TSVCIS spec in terms of what needs to be in
> > > > > > unencrypted form, so the "only the speech coder related
> > > > > > bits[...]" is not new information from this document?  I'm not
> > > > > > sure I agree with the conclusion, regardless -- won't the (MELPe)
> > > > > > speech coder bits be enough to convey the semantic content of
> > > > > > the audio stream, something that one might desire to keep
> > > > > > confidential?
> > > > >
> > > > >    link encryption methods (SRTP, VPNs, FIPS 140 link encryptors or Type
> > > > >    1 Ethernet encryptors) would be used to secure the RTP speech
> > > > >    contents.  Further, it is desirable to support the highest voice
> > > > >    quality between endpoints which is only possible without the overhead
> > > > >    of FEC.
> > > > >
> > > > > I think I'm missing a step in how this conclusion was reached.
> > > > >
> > > > >    TSVCIS will be characterized.  Depending on the bandwidth available
> > > > >    (and FEC requirements), a varying number of TSVCIS specific speech
> > > > >    coder parameters need to be transported.  These are first byte-packed
> > > > >    and then conveyed from encoder to decoder.
> > > > >
> > > > > > Per the Discuss point, how do I know which parameters need to be
> > > > > > transported, and in what order?
> > > > >
> > > > >    Byte packing of TSVCIS speech data into packed parameters is
> > > > >    processed as per the following example:
> > > > >
> > > > >       Three-bit field: bits A, B, and C (A is MSB, C is LSB)
> > > > > >       Five-bit field: bits D, E, F, G, and H (D is MSB, H is LSB)
> > > > > >
> > > > > >            MSB                                              LSB
> > > > > >             0      1      2      3      4      5      6      7
> > > > > >         +------+------+------+------+------+------+------+------+
> > > > > >         |   H  |   G  |   F  |   E  |   D  |   C  |   B  |   A  |
> > > > > >         +------+------+------+------+------+------+------+------+
> > > > >
> > > > >    This packing method places the three-bit field "first" in the lowest
> > > > >    bits followed by the next five-bit field.  Parameters may be split
> > > > >    between octets with the most significant bits in the earlier octet.
> > > > >    Any unfilled bits in the last octet MUST be filled with zero.
> > > > >
> > > > > > I agree with Adam that this is very unclear.  A is the MSB of
> > > > > > the three-bit field but the LSB of the octet overall?
> > > > > > We probably need an example of splitting a parameter across
> > > > > > octets as well, to get the bit ordering right.
> > > > >
> > > > > Section 3.1
> > > > >
> > > > > >    It should be noted that CODB for both the 2400 and 600 bps modes MAY
> > > > > >    deviate from the values in Table 1 when bit 55 is used as an end-to-
> > > > > >    end framing bit.  Frame decoding would remain distinct as CODA being
> > > > >
> > > > > > Where is the use of CODB as an end-to-end framing bit defined?
> > > > > > If we're going to provide neither a complete description of how
> > > > > > to do it nor a reference to a better description, we probably
> > > > > > shouldn't mention it at all.
> > > > >
> > > > > Section 3.2
> > > > >
> > > > >    RTP packet.  The packed parameters are counted in octets (TC).  In
> > > > >    the preferred placement, shown in Figure 6, a single trailing octet
> > > > > >    SHALL be appended to include a two-bit rate code, CODA and CODB,
> > > > >
> > > > > > I'd consider saying something about this being the preferred
> > > > > > format ("placement") due to its shorter length than the
> > > > > > alternative, and say that it "SHOULD be used for TSVCIS payloads
> > > > > > with TC less than or equal to 77 octets".
> > > > >
> > > > > Section 3.3
> > > > >
> > > > > > When a longer packetization interval is used, is that indicated
> > > > > > by signaling or RTP timestamps or otherwise?
> > > > >
> > > > >    TSVCIS coder frames in a single RTP packet MAY be of different coder
> > > > >    bitrates.  With the exception for the variable length TSVCIS
> > > > >    parameter frames, the coder rate bits in the trailing byte identify
> > > > >    the contents and length as per Table 1.
> > > > >
> > > > > Maybe also note that the penultimate octet gives the length there?
> > > > >
> > > > >    Information describing the number of frames contained in an RTP
> > > > >    packet is not transmitted as part of the RTP payload.  The way to
> > > > >    determine the number of TSVCIS/MELPe frames is to identify each frame
> > > > >    type and length thereby counting the total number of octets within
> > > > >    the RTP packet.
> > > > >
> > > > > > terminology nit: if a frame is the combination of MELPe and
> > > > > > TSVCIS payload data units then there are two layers of decoding
> > > > > > to get a length for the frame, since we have to get the TSVCIS
> > > > > > length and then the MELPe length.
> > > > >
> > > > > Section 4.2
> > > > >
> > > > > >    Parameter "ptime" cannot be used for the purpose of specifying the
> > > > >
> > > > > nit: missing article ("The parameter")
> > > > >
> > > > >    will be impossible to distinguish which mode is about to be used
> > > > >    (e.g., when ptime=68, it would be impossible to distinguish if the
> > > > >    packet is carrying one frame of 67.5 ms or three frames of 22.5 ms).
> > > > >
> > > > > So how is the operating mode determined, then?
> > > > > (I think this is the same question I asked above)
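
The ambiguity in the quoted ptime example can be checked with a little
arithmetic.  The sketch below assumes that ptime is expressed in whole
milliseconds and rounded up, which is consistent with the quoted
"ptime=68" but is not spelled out in the text; the function name is
hypothetical.

```python
import math


def ptime_for(frame_ms, frames_per_packet):
    """Packet duration in whole milliseconds, rounded up (assumed rule)."""
    return math.ceil(frame_ms * frames_per_packet)


# One 67.5 ms frame and three 22.5 ms frames cover the same 67.5 ms of
# speech, so both packetizations yield the same ptime value.
one_long_frame = ptime_for(67.5, 1)
three_short_frames = ptime_for(22.5, 3)
```

Both calls give 68, so a receiver that sees only ptime cannot tell the
two modes apart.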
> > > > >
> > > > > Section 4.4
> > > > >
> > > > >    For example, if offerer bitrates are "2400,600" and answer bitrates
> > > > >    are "600,2400", the initial bitrate is 600.  If other bitrates are
> > > > >    provided by the answerer, any common bitrate between the offer and
> > > > >    answer MAY be used at any time in the future.  Activation of these
> > > > >    other common bitrates is beyond the scope of this document.
> > > > >
> > > > > > It seems important to specify whether this requires a new O/A
> > > > > > exchange or can be done "spontaneously" by just encoding
> > > > > > different frame types.  (It seems like the latter is possible,
> > > > > > on first glance, and this is implied by Section 3.3's discussion
> > > > > > of mixing them in a single packet.)
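
One reading of the quoted Section 4.4 example is that the initial
bitrate is the first bitrate in the answer that also appears in the
offer.  The sketch below implements that reading only; the selection
rule and the function name are assumptions drawn from the single
example in the draft text, and the sketch says nothing about how other
common bitrates are later activated.

```python
def initial_bitrate(offer, answer):
    """Pick the first answered bitrate that the offerer also listed.

    `offer` and `answer` are comma-separated bitrate lists such as
    "2400,600".  Returns the bitrate as an int, or None when the two
    lists have nothing in common.
    """
    offered = {item.strip() for item in offer.split(",")}
    for item in answer.split(","):
        item = item.strip()
        if item in offered:
            return int(item)
    return None
```

With the values from the draft, an offer of "2400,600" and an answer of
"600,2400" give 600 under this reading.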
> > > > >
> > > > > Section 5
> > > > >
> > > > > Please expand PLC at first use (not second).
> > > > >
> > > > > Section 6
> > > > >
> > > > > > I don't understand the PLC usage.  Is the idea that a receiver,
> > > > > > on seeing an SSRC gap, constructs fictitious PLC frames to "fill
> > > > > > the gap" and passes the resulting stream to the decoder?
> > > > >
> > > > > Section 8
> > > > >
> > > > >    and important considerations in [RFC7201].  Applications SHOULD use
> > > > >    one or more appropriate strong security mechanisms.  The rest of this
> > > > >    section discusses the security-impacting properties of the payload
> > > > >    format itself.
> > > > >
> > > > > > I thought we described TSVCIS itself (much earlier in the
> > > > > > document) as requiring encryption for some data; wouldn't that
> > > > > > translate to a "MUST" here and not a "SHOULD"?
> > > > >
> > > > >
> > > > >
> > >
>