[AVT] Audio/Video Transport Working Group Minutes

Reported by Colin Perkins and Stephen Casner

  The Audio/Video Transport working group met twice at the 54th IETF
  meeting in Yokohama. In the first session, the group discussed RTP
  payload formats for MIDI, Interleaved audio, AC-3 audio, JPEG 2000
  video, JVT video and MPEG-4. In the second session, the discussion
  focused on RTP payload formats for iLBC speech, SMPTE 292M video
  and uncompressed video, extended RTCP reports, RTP retransmission,
  multiplexing based on the SSRC, and RTCP extensions to support single
  source multicast sessions.

Introduction and Document Status

  The meeting opened with a status update from Steve Casner. The group has
  had a single RFC published since the last meeting: the RTP payload format
  for AMR audio (RFC 3267). There are also several drafts in the RFC editor
  queue awaiting publication: the MIME registration for the payload formats
  in the RTP profile, the SDP bandwidth modifiers for RTCP bandwidth, and
  the RTP payload format for comfort noise. In addition, the revised RTP
  specification and audio/video profile have been "tentatively approved"
  for publication since the last meeting.

  Two issues have been raised with the drafts approved for publication:
  a conflict between the G.726 payload format defined in the new profile
  and that in ITU I.366.2 Annex E, and a desire for a maximum transmission
  interval for comfort noise packets.

  The payload format for G.726 audio defined in the audio/video profile is
  little-endian; however, ITU I.366.2 Annex E specifies a big-endian format
  for the same codec. It has been requested that the audio/video profile be
  changed to match the ITU format. Steve Casner asked the group whether it
  is reasonable to make this change, given that the definition in the
  profile has been present since 1997 and there are existing
  implementations. He noted
  that it is clearly unfortunate that there is an incompatibility with the
  ITU format, and that there are several possible ways to move forward.
  These include accepting the incompatibility, changing the definition of
  the payload format in the audio/video profile, or accepting both formats
  with the ITU format being registered under a separate MIME type.
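
  The difference between the two packings can be illustrated with a
  minimal sketch for 4-bit (32 kbit/s) G.726 codewords. The function names
  are hypothetical, and the bit orders reflect our understanding of the
  two formats: the profile places the first codeword in the least
  significant bits of each octet, whereas the ITU format places it in the
  most significant bits.

```python
def pack_g726_32_profile(codes):
    """Pack 4-bit codewords with the first code in the low nibble
    (the little-endian order used by the audio/video profile)."""
    if len(codes) % 2:
        raise ValueError("need an even number of 4-bit codewords")
    return bytes((codes[i] & 0x0F) | ((codes[i + 1] & 0x0F) << 4)
                 for i in range(0, len(codes), 2))


def pack_g726_32_itu(codes):
    """Pack 4-bit codewords with the first code in the high nibble
    (the big-endian order of the ITU format)."""
    if len(codes) % 2:
        raise ValueError("need an even number of 4-bit codewords")
    return bytes(((codes[i] & 0x0F) << 4) | (codes[i + 1] & 0x0F)
                 for i in range(0, len(codes), 2))


codes = [0x1, 0x2, 0x3, 0x4]
print(pack_g726_32_profile(codes).hex())  # 2143
print(pack_g726_32_itu(codes).hex())      # 1234
```

  The same codewords yield different octets, so a receiver that assumes
  the other order decodes every sample pair swapped.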

  Dave Oran suggested a fourth option, which is to "take the IETF format
  and declare that to be the second MIME type" with the ITU format taking
  the place of the current definition, noting that "we get to decide whose
  ox gets gored". Steve Casner thought this was not reasonable, since such
  a change will break compatibility with implementations which use the
  existing MIME type, and noted that if we accept both formats, we need to
  assign a new MIME type to the ITU format. Dave Oran noted that whatever
  happens there will be some breakage, since several implementations use
  the ITU packing but refer to it with the IETF MIME type. There was no
  resolution; comments are solicited.

  The second issue is the possibility of defining a maximum inter-packet
  transmission interval for comfort noise packets, to act as a liveness
  indicator. This raises two new questions: what should the interval be,
  and should it be specified in the payload format rather than some
  application-specific document? Steve Casner expressed the opinion that
  such a restriction was application-specific, and should not be in the
  payload format. He also noted that RTCP was a good indicator of system
  liveness. Input was solicited from the group, but there were no comments.

  Several drafts have been submitted to the IESG, but are not yet accepted
  for publication. These include enhanced CRTP and TCRTP, the secure RTP
  profile, and the payload format for EVRC/SMV speech. Several drafts are
  in working group last call: RTCP feedback, the MPEG-4 payload format, the
  payload format for distributed speech recognition, and the ULP and UXP
  FEC mechanisms.

  Regarding the ULP draft, Steve Casner noted that the behaviour of the X
  and P bits in ULP packets did not follow the usual rules (these bits are
  the XOR of the bits in the protected packets). It turns out that this
  behaviour was inherited from RFC 2733, which ULP extends, and implies
  that RTP header validation must be special-cased for FEC packets. It was
  noted that this is not a good design, and that neither the chairs nor the
  author of RFC 2733 could remember a reason why these fields were
  redesigned (except, perhaps, to save a couple of bits). Accordingly,
  Steve Casner proposed to redesign the ULP format to address this issue,
  as a replacement for RFC 2733. It is believed that there are
  implementations of RFC 2733 which might be affected by such a change, and
  input from implementors is solicited.
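
  To illustrate why this complicates header validation, the following
  sketch computes the P and X bits of an FEC packet as the XOR of the
  corresponding bits in the protected packets, as described above. The
  function name is hypothetical, and only the first octet of each
  protected RTP header is considered; this is not the complete RFC 2733
  procedure.

```python
from functools import reduce

P_MASK = 0x20  # padding bit in the first octet of the RTP header
X_MASK = 0x10  # extension bit in the first octet of the RTP header


def fec_p_x_bits(first_octets):
    """Return (P, X) for the FEC packet: the XOR of those bits across
    the first octets of the protected RTP packets."""
    combined = reduce(lambda a, b: a ^ b, first_octets, 0)
    return bool(combined & P_MASK), bool(combined & X_MASK)


# Three protected packets (version 2); only the second has padding.
protected = [0x80, 0xA0, 0x80]
p, x = fec_p_x_bits(protected)
print(p, x)  # True False
```

  In this example the FEC packet has P set although it carries no padding
  of its own, so a receiver applying the usual padding check to it would
  misjudge the packet unless FEC packets are special-cased.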

DTMF

  There is a new draft on the payload format for DTMF tones to update RFC
  2833 (draft-ietf-avt-rfc2833bis-00.txt). This adds several tones and
  events that were missed, and clarifies a small number of points. This
  revision is intended as a short-term effort with the goal of producing a
  draft standard RFC with minimal changes. Interoperability tests will be
  required to complete this work, and a volunteer was solicited to
  coordinate interoperability testing.

MIDI

  Colin Perkins, sitting in for John Lazzaro, gave an update on the MIDI
  Wire Protocol Packetization (draft-ietf-avt-mwpp-midi-rtp-04.txt). This
  draft has been discussed on the list, and is believed to be essentially
  complete in terms of the payload format and recovery journal structures.
  The next steps are to reality test the SDP parameters, updating them if
  necessary, and to write drafts describing how the recovery journal can be
  used for particular scenarios (intended as BCPs to accompany the main
  specification). Steve Casner asked whether the aim of publishing these
  drafts as BCPs was to provide a means of publishing that information
  whilst satisfying concerns about including it in the main spec. The
  answer was yes: it is not appropriate to put a single algorithm in the
  payload format, since several algorithms may be suitable, depending on
  how the format is used and on the desired degree of resilience.

  Colin noted that work is proceeding to check the format for correctness
  and update the implementation. In addition, companion drafts are planned
  to describe complete systems using MIDI with RTSP and SIP, fleshing out
  the complete scenarios.

  The definition of an IANA registry for render parameters was highlighted
  as an issue: who controls the definition of new values? What sort of
  specification should be required for new values? Colin Perkins suggested
  that requiring an RFC for each parameter is probably overkill, since new
  parameters are expected to be common, but that a stable and public
  specification is appropriate. Colin also pointed to the summary of open
  issues that was sent to the mailing list, and solicited input.

Interleaved Audio

  Colin Perkins, sitting in for Orion Hodson, introduced an RTP payload
  format for interleaved audio (draft-ietf-avt-rtp-interleave-00.txt).
  There are several existing payload formats that support interleaving; the
  intention of this draft is to produce a general purpose solution rather
  than having separate, subtly different, interleavers for each new codec.
  This new draft is relatively low overhead, works with audio codecs with
  fixed or self-describing frame sizes, supports codec changes mid-stream
  and codecs that employ silence suppression, and is reasonably easy to
  implement. The proposed payload format uses a two octet header indicating
  the interleaver cycle and index, plus the original payload type, in much
  the style of the RFC 2198 redundant audio format. The RTP timestamp is
  specified to use the timestamp of the frames pre-interleaver, to keep the
  sequence and allow header compression.
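
  The general idea of interleaving can be shown with a minimal block
  interleaver. This is a sketch of the technique only: the function names
  and the `depth` parameter are hypothetical, and it does not reproduce
  the draft's two octet header or its cycle and index semantics.

```python
def interleave(frames, depth):
    """Reorder frames so that consecutively transmitted frames are
    `depth` frame times apart in the original sequence."""
    return [frames[i] for start in range(depth)
            for i in range(start, len(frames), depth)]


def deinterleave(sent, depth):
    """Invert interleave() for a complete block of frames."""
    n = len(sent)
    order = [i for start in range(depth) for i in range(start, n, depth)]
    out = [None] * n
    for position, original_index in enumerate(order):
        out[original_index] = sent[position]
    return out


frames = list(range(8))
sent = interleave(frames, 4)
print(sent)  # [0, 4, 1, 5, 2, 6, 3, 7]
```

  Losing two consecutive packets, for example the first two, removes
  frames 0 and 4 rather than two adjacent frames, which is easier to
  conceal at the cost of extra buffering delay.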

  Stephan Wenger noted that this format is useful for more than audio,
  since it works for anything with a fixed frame rate and size, and it
  could be useful for other situations.

  Steve Casner noted that he doesn't particularly like the idea, because of
  the overhead of carrying the additional payload type in each packet, plus
  the notion that we're adding an additional layer of indirection to hide
  the original payload type. It may still be reasonable to define an
  interleaving payload format, but the efficiency gains of not including
  the payload type in-band may make it worth defining codec-specific
  formats also. Colin Perkins noted that it may be possible to signal the
  inner-payload type out of band, as a way of reducing the overhead.

  Magnus Westerlund noted that there are potential issues with comfort
  noise, as discussed on the mailing list. Colin Perkins noted that
  operation with silence suppression may also not be well specified.

JPEG 2000 video

  Eric Edwards presented the RTP payload format for JPEG 2000 video streams
  (draft-ietf-avt-rtp-jpeg2000-01.txt). There have been a number of changes
  in the draft since the last meeting, as a result of the feedback received
  from the IETF and the JPEG committee.

  The number of RTP packet types has been reduced, with the opaque type
  field in the payload header being changed to a set of explicit flags.
  This was queried by Steve Casner, who noted that the change does not
  reduce the number of modes of operation, it just represents them in a
  different way: his concern was more about the number of modes and the
  complexity they introduced. Eric noted that the types map directly onto
  the codec, and hence the authors believe the set of flags is appropriate.

  Support for tiling small sized parts has been specified, to improve
  efficiency with certain classes of operation.

  A number of optional fields have been added to support the addition of
  JPIP at some time in the future. Steve Casner and Colin Perkins expressed
  concern over this, since it is not clear if JPIP is suitable for use with
  RTP. It may be more appropriate to extend the protocol at a later date,
  rather than to add fields now in the hope that they are suitable. Steve
  Casner noted that having an undefined field in the standard is a problem:
  a new RFC, registering this option with IANA, may be better. This payload
  could define the extensibility mechanism in the IANA considerations section,
  but leave actual extensions for future specification.

  At the previous meeting, it was suggested that the authors investigate
  the H.263 picture header redundancy technique (RFC 2429) as a possible
  means of improving the resilience of this format. Eric reported that,
  because of the possible size of the Main Header of JPEG 2000, the authors
  believe this is not appropriate. Steve Casner noted that the real issue may
  be that having a large amount of state which needs to be maintained makes
  the codec more fragile (since, unlike H.263, we can't repeat it often).

  Eric noted that there is an optional marker segment that can be used to
  help resilience, so this fragility is not necessarily a problem. Stephan
  Wenger noted that the RFC 2429 repetition feature allows repetition of
  parts of the header, if that is useful. Eric suggested giving examples of
  resilience using the optional header. Philippe Gentric asked if SDP might
  be an appropriate means of conveying this information, but Steve Casner
  noted that this is only appropriate if the header is static for the
  entire session.

  At the previous meeting, the redundant audio scheme from RFC 2198 was
  also proposed as a resilience mechanism, but the authors did not find
  that appropriate either.

  It was also suggested that careful ordering of packets might result in a
  more robust transport, since errors could be concealed by careful choice
  of update order. This is possible, and shouldn't require changes to the
  payload format, but it does require additional buffering at the
  receiver.

  Eric noted that a patent application has been filed in Japan that covers
  this format. If the patent is granted, it will be licensed under
  reasonable and non-discriminatory terms. Steve Casner noted that the IPR
  statement needs to go on the IETF website, rather than in the drafts, and
  there is specific wording that should be included in the draft.

  There are a number of open issues: should support for in-band priority
  mapping tables be included in the specification? Steve Casner asked who
  would look at it, and whether the goal is to have the network do
  something different.
  He noted that there is no point putting information in the packets unless
  it's going to be useful. "If you're not sure if you'll need it, don't put
  it in".

  Eric noted that the authors have an implementation of the format, being
  used for testing. They will produce one more version of the draft before
  the next meeting, and they expect that to be ready for last call.

AC-3 audio

  Jason Flaks presented the RTP payload format for the AC-3 audio codec
  (draft-flaks-avt-rtp-ac3-02.txt). There have been significant changes
  since the last version, primarily to improve the fragmentation and error
  resilience (this is important, since most AC-3 frames exceed the Ethernet
  MTU).

  Fragmentation has been improved by noting that the first 5/8ths of an
  AC-3 frame are independently decodable. This provides a natural
  fragmentation point, which is resilient to packet loss, and is now
  supported by the payload format. This new fragmentation scheme also gives
  the opportunity for redundant transmission of fragments, by sending a
  channel-reduced version of the data in the following packet.  Colin
  Perkins suggested that delaying the redundant data by more than one
  packet might improve performance in the presence of burst losses, and so
  might be appropriate to consider.
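
  A minimal sketch of splitting a frame at the 5/8 point follows. The
  function name, the rounding of the split position and the 1536 octet
  frame size are assumptions made for illustration; the draft defines the
  actual fragmentation rules.

```python
def split_ac3_frame(frame):
    """Split a frame at the 5/8 point; returns (first, remainder)."""
    cut = (len(frame) * 5) // 8
    return frame[:cut], frame[cut:]


frame = bytes(1536)  # placeholder frame contents
first, rest = split_ac3_frame(frame)
print(len(first), len(rest))  # 960 576
```

  If only the first fragment arrives, the receiver still holds the
  independently decodable part of the frame.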

  Jason noted that the number of data units field was added in case it was
  useful, but that it is unlikely to be used. Steve noted that there
  are networks with large MTUs, but the question is more whether the packet
  rate and header overheads are a problem. Aggregation is useful when you
  can tolerate the latency and wish to reduce the packet rate; if that's
  not the case, there is no need to aggregate multiple frames per packet.

  After outlining the changes, Jason asked that the draft move to the
  standards track at some stage. Steve Casner noted that the draft was
  already accepted as a working group task (the name didn't change this
  time, due to the deadline). Advancing the draft is simply a matter of
  completing the work, at which time it can advance to RFC status.

JVT video

  Stephan Wenger updated the group on the RTP payload format for JVT video
  (draft-wenger-avt-rtp-jvt-01.txt). There have been a number of changes
  since the last meeting, including making the RTP timestamp match the
  presentation timestamp, using a fixed 90kHz clock, and using two types of
  aggregation packets (STAP and MTAP). The JVT spec itself has also
  changed: there are many changes in the video coding layer, a new
  "disposable" flag for packets, and a newly introduced picture layer (it
  was noted that the picture layer is controversial, and may be removed in
  future). The draft has been updated to track these. There are several
  open issues: the efficiency of MTAPs; whether a 16 bit timestamp offset
  in the MTAP is sufficient; and whether it is appropriate for the RTP
  marker bit to represent end of slice. (There is also the issue of
  possible alignment with the MPEG-4 payload format.)

  Stephan noted the issue of IPR on the parameter set concept, raised by
  Reha Civanlar on the mailing list.

  Regarding the 16 bit timestamp offset, Philippe Gentric noted that it is
  "both not enough and too much" and should be configurable. Colin Perkins
  commented that a variable length encoding of the timestamp might be used
  (much as in CRTP). Philippe also asked if the MTAP timestamp offset has
  to match the rate of the RTP clock (this is the reason for the 2/3rd of a
  second offset limit)?  Stephan objected to this idea, because it causes a
  loss of precision in gateways and adds considerable complexity. Steve
  Casner also noted the issue of precision in the low bits of the timestamp
  as being important. Philippe noted that MTAPs can be used for interleaving,
  and wondered if the offset size limitation was problematic for this use?
  Stephan believes not, but this may depend on the application (e.g. Philippe
  noted that streaming applications may have very long interleaving periods).
  Magnus Westerlund voiced support for variable length encoding of the
  timestamp, to solve this problem.
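
  The offset limit follows directly from the field width and the clock
  rate. A short worked calculation, assuming the offset is an unsigned 16
  bit count of 90kHz clock ticks (the constant names are illustrative):

```python
CLOCK_RATE_HZ = 90_000  # fixed RTP clock rate used by the draft
OFFSET_BITS = 16        # width of the MTAP timestamp offset

max_offset_ticks = 2 ** OFFSET_BITS - 1
max_offset_seconds = max_offset_ticks / CLOCK_RATE_HZ
print(round(max_offset_seconds, 3))  # 0.728
```

  This gives a maximum offset of about 0.73 seconds, in line with the
  limit of roughly two thirds of a second mentioned above, and far short
  of the interleaving periods that streaming applications may want.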

  Regarding the marker bit, Stephan noted that there is no need for an end
  of picture signal in JVT. Accordingly, it would be helpful to signal end
  of slice or end of NALU (if fragmented) instead. Is this an acceptable
  use of the marker bit? Steve Casner agreed that signalling the end of an
  application data unit, even if that is not end-of-picture, is appropriate.

  The final issue was whether to allow media unaware fragmentation,
  signalled by the marker bit, in the payload format. It is clearly better
  to fragment on application meaningful boundaries, but there was no real
  objection to adding media unaware fragmentation, so long as it can be
  done in a clean way.

  Steve Casner called a "hum" on making this an AVT work item, after asking
  on the status of the draft within JVT. The room expressed support for
  taking this as a work item.

MPEG-4 payload format and related MIME types

  Philippe Gentric described progress in the RTP payload format for MPEG-4
  (draft-ietf-avt-mpeg4-simple-04.txt) and a related draft containing MIME
  registrations (draft-lim-mpeg4-mime-00.txt). The payload format is in
  working group last call and has also been reviewed by MPEG. Since the
  previous meeting, the draft has been extended to transport MPEG-4 System
  streams (still no SL) and has two new optional fields in the AU header
  section to transport a random access point flag and a stream-state
  counter. The remaining open issue is the naming of the "profile" MIME
  parameter, which is misleading. There is ongoing discussion to change
  this to either "MaxInterleaveDelay", "maxInterleave" or "maxptime" for
  clarity and compatibility with other formats (however, the ISMA uses the
  existing name, so there may be compatibility issues with existing products
  if a change is made).

  Philippe also described draft-lim-mpeg4-mime-00.txt, which is an
  evolution of draft-singer-mpeg4-ip-04.txt. The informative parts of the
  Singer draft will be published as part of MPEG-4, with the MIME types
  being extracted into this new draft for publication by the IETF. Comments
  are solicited.

MPEG-4 FlexMux

  Catherine Roux presented the RTP payload for MPEG-4 FlexMultiplexed
  streams (draft-curet-avt-rtp-mpeg4-flexmux-03.txt). A number of open
  issues exist, regarding the relation between the clock references and RTP
  timestamp, the ability to synchronise FlexMux streams with non-FlexMux
  content using RTP, the ability to robustly signal FlexMux configuration,
  the SDP parameters and the applicability of the format.

  The draft now considers the RTP timestamp to be the send time of the
  packet. However, it is still not clear how to synchronise MPEG-4 FlexMux
  content with non-MPEG content transported in RTP, due to the lack of an
  appropriate reference clock. Steve Casner recognised that there may not
  be a clean solution to this problem, and that the applicability statement
  for this payload format may have to document that synchronisation with
  normal RTP content is not possible.

  To improve robustness of FlexMux configuration, the proposal is to send
  repeated copies of the signalling, in advance of the change, to provide
  probabilistic reliability. This seems reasonable, provided the limited
  guarantee is noted.
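
  The limited guarantee can be quantified with a small sketch. It assumes
  that each copy is lost independently with the same probability, which is
  optimistic when losses occur in bursts; the function name is
  hypothetical.

```python
def delivery_probability(loss_rate, copies):
    """Probability that at least one of `copies` repeated messages
    arrives, assuming each is lost independently with `loss_rate`."""
    if not 0.0 <= loss_rate <= 1.0 or copies < 1:
        raise ValueError("loss_rate must be in [0, 1] and copies >= 1")
    return 1.0 - loss_rate ** copies


for n in (1, 2, 3):
    print(n, delivery_probability(0.1, n))
```

  Each additional copy reduces the residual failure probability by a
  factor of the loss rate, but it never reaches zero.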

  There is also the issue of error sensitive streams, such as systems
  streams, which can be transported in FlexMux. One solution is to carousel
  the data, but TCP may also be used. There are significant synchronisation
  issues with the use of TCP as part of a presentation, which are not yet
  addressed.

  Use of a=fmtp to signal FlexMux parameters was briefly explained.
  Nothing has changed since the previous meeting, except that the type
  will be registered "audio", "video" or "application" to match the MPEG-4
  payload format.

  There are still significant open issues with this format, which have
  to be addressed before it can advance.

Demonstrations

  The first session concluded with demonstrations of the JPEG-2000 and AC-3
  payload formats.

[At this point, please adjust your locale dial from en_GB to en_US.]

Intra-Frame Request Signaling

  The second AVT session began with a discussion deferred from the
  MMUSIC working group session earlier in the day.  In a multi-party
  conferencing system with switched video, a receiver that begins
  receiving a new source needs to signal to the sender that a full
  intra-coded frame is required to begin decoding.  The question is
  whether this signal should be passed in SDP using the offer/answer
  method, or in RTCP.

  We reached a common understanding on two sub-issues:  1) no matter how
  the signaling is done, the spec cannot say the sender MUST send an
  intra-frame because this is dependent upon congestion conditions, but
  the sender MUST be prepared to receive the signal and respond with the
  intra-frame if it is able; and 2) the request for a full intra-frame
  is distinct from the loss-of-reference-picture indication that is
  already specified in the RTCP Feedback Profile because the sender's
  response may be different, and therefore two different signals are
  required (although both may use the same signaling channel).

  Roni Even stated a preference to use SDP for the new indication so it
  can go together with the "freeze" command that would be sent that way.
  But either way, where would the full process be described?  His
  contribution to MMUSIC, using SDP, gave such a description.

  Joerg Ott believes the signal belongs in RTCP, and suggested that all
  we need is a 3-page Internet-Draft to specify the semantics of an
  additional RTCP request to be used under the RTCP Feedback Profile.

  Jonathan Rosenberg understood the consensus from MMUSIC to be that
  this signal was not appropriate as an SDP parameter because it is not
  a property of the media stream.  One of the fundamental properties of
  the offer/answer model is that the attributes of the session have no
  dependence on history.  To use offer/answer you would have to "turn
  on" the intra transmission and then "turn off" with another REINVITE.

  Dave Oran identified a conflict between this inherently unidirectional
  signal and the bidirectional protocols in which SDP is usually
  embedded.  If the protocol is running stop-and-wait, the timing of the
  requests and responses can become completely out-of-sync.

  Roni concluded that we need to go back to MMUSIC to discuss it again
  because we still don't agree if this operation is a changing of the
  stream or not.  Another participant commented that we have a conflict
  between what the IETF wants to do and what the implementers want to
  do.  The implementers will go their own way.

  Steve Casner noted that the use of RTCP for this signal was proposed
  at the last IETF, but we stumbled then because of disagreement on the
  "MUST" issue.  Otherwise, we might have had a solution then.  He ended
  the discussion and summarized the output from AVT to MMUSIC as follows:
  having both of these signals carried in RTCP is a fine idea that fits
  in the RTCP feedback scheme if MMUSIC concludes that SDP is not the
  appropriate method.

iLBC Speech

  Alan Duric presented an update of two drafts on the iLBC speech codec
  and its associated payload format in draft-andersen-ilbc-01.txt and
  draft-duric-rtp-ilbc-01.txt, respectively.  The changes in the codec
  since the -00 version were to rearrange the bit packing for Unequal
  Level Protection and reduce the total number of bits from 419 to 416
  so the result is 8+12+32=52 bytes for the three decreasing priority
  levels.  There have also been some revisions to the code and
  descriptions in the draft based on feedback from implementers.  No
  technical changes in the payload format were mentioned.

  Alan gave a brief description of the coding steps as requested by some
  participants at the previous AVT meeting -- see the presentation.

  Planned future work is to develop a 20 ms frame option (vs. 30ms) and
  to add voice activity detection and comfort noise generation.  They
  will also be optimizing some parts of the algorithm to reduce
  complexity.  Alan had expected to have some testing results from one
  university to present, but will send these to the mailing list later.
  The summary is that it is working quite nicely due to the simple
  payload structure.  The executable for a demo SIP client with the iLBC
  codec is available by request from alan.duric@globalipsound.com.

Uncompressed Video

  Ladan Gharai discussed two payload formats for uncompressed video.
  The first is draft-ietf-avt-smpte292-video-06.txt which has been in
  process for some time; it is for constant-rate video, essentially
  circuit emulation with all bits from a SMPTE 292M stream being
  transported.  It is designed to interoperate with existing broadcast
  equipment.  The second, draft-gharai-avt-uncomp-video-00.txt, is a new
  payload format for a more native (to RTP) packetization that is
  flexible over a wide range of uncompressed video formats and sends
  only the active video (no blanking).  The choice between the two
  formats depends upon the application.

  The main change in the smpte292 draft was the definition of a new term
  "pgroup" which is the smallest number of pixels that keeps together
  related Y, Cb and Cr values and fills an integral number of octets.
  The purpose of defining the pgroup is to specify where fragmentation
  should occur (between pgroups).  The payload header has not changed
  since the last draft, and no further major technical changes are
  expected.  The authors plan to submit another draft revision by August
  15 with additional technical rationale, and then would like to go to
  working group last call.
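
  The pgroup definition can be illustrated with a short calculation.  This
  is a sketch based only on the definition above; the function name and
  its parameters are hypothetical, and the example assumes 4:2:2 sampling,
  in which two pixels share one Cb and one Cr sample.

```python
from math import gcd


def pgroup(samples_per_unit, pixels_per_unit, bit_depth):
    """Return (octets, pixels) for the smallest group of whole sampling
    units whose total size is an integral number of octets."""
    unit_bits = samples_per_unit * bit_depth
    units = 8 // gcd(unit_bits, 8)  # units needed to reach an octet boundary
    return units * unit_bits // 8, units * pixels_per_unit


# 4:2:2 sampling: one unit is two luma samples plus one Cb and one Cr,
# i.e. 4 samples covering 2 pixels.
print(pgroup(4, 2, 10))  # (5, 2)
print(pgroup(4, 2, 8))   # (4, 2)
```

  For 10-bit 4:2:2 video a pgroup is therefore 5 octets covering 2
  pixels, and a packet boundary should fall on a multiple of 5 octets.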

  The new uncompressed video draft should cover almost any uncompressed
  video format including BT.601, SMPTE 296M and 274M, and future digital
  cinema formats with 4K x 4K frame size.  There is already a Proposed
  Standard payload format for uncompressed video in RFC 2431; however, it
  is limited to 4096 scan lines per frame and 2048 pixels per line, and
  is constrained to 4:2:2 color subsampling of YUV data.  This new draft
  supports up to 64K scan lines and pixels per line and supports RGB as
  well as YUV data in various color subsamplings.  The new draft also
  provides flexible support for multiple scan lines per packet rather
  than just one (or a fragment), which may be important for lower data
  rates or jumbo packets.  For each line, there is a 64-bit payload
  header section to carry the scan line number, scan offset for
  fragmentation, and length.  In contrast, RFC 2431 uses only a 32-bit
  payload header, although for high-rate video this is not an issue.
  RFC 2431 indicates the sample size and data type in-band.  That
  information is moved to out-of-band signaling in the new draft, but
  the details of the SDP parameters remain to be specified.

  Steve Casner is not aware of any implementations of RFC 2431 and asked
  if there are likely to be implementations of this new format.
  Acceptance of it as a work item is dependent upon whether or not it is
  likely to be used.  Ladan responded that her group has an
  implementation, and other people are working on it as well.

  Philippe Gentric confirmed that the new proposal is more useful,
  especially for the 4:2:0 YUV native format of JPEG and MPEG, and
  perhaps even for lower resolution (CIF or QCIF) images.  He suggested
  adding the capability to specify the pixel aspect ratio.

  Ladan replied that it is not clear how 4:2:0 video should be
  packetized, since the chrominance info is related to two scan lines of
  luminance info.  She would like feedback on that on the mailing list.

  Those present gave a positive hum for taking on the new draft as an
  AVT work item.

RTCP Reporting Extensions

  During IETF week itself, Timur Friedman and Alan Clark collaborated to
  produce a combination of draft-clark-avt-rtcpvoip-01.txt with
  draft-friedman-avt-rtcp-report-extns-02.txt; the result will be sent
  to the mailing list shortly.  The new draft integrates the VoIP metrics
  of the Clark draft with the additional RTCP report block types
  specified by the Friedman draft.  Those block types allow reporting of
  packet duplication and loss patterns using run-length encodings, add
  timestamps for multicast inference of network characteristics (loss
  rates and delays along logical links within an RTP session), add a
  statistics summary with more detailed info than in the RTCP SR and RR
  packets, and define a mechanism to allow receivers to measure RTT in
  the same way that senders can.
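
  The idea of reporting loss patterns with run-length encoding can be
  sketched as follows.  This is a generic illustration with a hypothetical
  function name; it does not reproduce the report block format defined in
  the draft.

```python
from itertools import groupby


def run_lengths(received):
    """Encode a sequence of booleans (True = packet received) as a list
    of (received, run_length) pairs."""
    return [(flag, sum(1 for _ in run)) for flag, run in groupby(received)]


pattern = [True] * 5 + [False] * 2 + [True] * 3
print(run_lengths(pattern))  # [(True, 5), (False, 2), (True, 3)]
```

  Long runs of received or lost packets collapse into single entries, so
  the exact loss pattern can be reported compactly.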

  Changes in the VoIP metrics relative to the -01 revision of the Clark
  draft include adding a Gmin parameter to allow the burst density
  threshold to be adjusted, and changing packet loss rate to be a binary
  fraction, as in existing RTCP reports.  The estimated MOS quality
  score has been broken into two, a "listening" quality that does not
  consider the effects of delay, and a "conversational" quality that
  does.
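
  For reference, the binary fraction used in existing RTCP reports is an
  8-bit fixed-point value with the binary point at the left edge of the
  field.  A minimal sketch, assuming the VoIP metrics use the same
  encoding; the function name and the clamp to 255 are our own choices:

```python
def fraction_lost(lost, expected):
    """8-bit fixed-point loss fraction with the binary point at the left
    edge of the field, i.e. the integer part of (lost / expected) * 256."""
    if expected <= 0 or lost <= 0:
        return 0
    return min((lost << 8) // expected, 255)


print(fraction_lost(1, 4))     # 64
print(fraction_lost(10, 100))  # 25
```

  A loss rate of 10% is thus reported as 25, meaning 25/256.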

  One motivation for adding the VoIP metrics is to allow VoIP service
  providers to get feedback on the quality experienced by the end user
  inside an enterprise behind a firewall.  Comparing the VoIP metrics
  with SLA monitoring on the service provider's side of the firewall
  allows the service provider to determine whether problems are in the
  WAN or the enterprise network.

  Steve Casner expressed concern that the "implementation specific"
  fields of the VoIP metrics block are totally unspecified.  The draft
  either needs to say how these fields will be specified, or define them
  to be always zero until some future specification revises this one.
  It is not reasonable to say the bits are open for arbitrary use.

  Al Morton said that the burst parameters may not be aligned with those
  of the E-Model produced by ITU Study Group 12 in May.  However, Henry
  Sinnreich believes the E-Model is inappropriate for the Internet.
  Alan Clark responded that he is familiar with the E-Model, but has
  deliberately kept these metrics independent of what model is used
  because some people will want to use the E-Model and some will use
  other models.

  Magnus Westerlund asked what status would be assigned to this document
  (Proposed Standard or Experimental).  Timur responded that the group
  decided in Minneapolis to go for Experimental.  Magnus suggested we
  rethink that, and go for Proposed.  He would really like to have the
  RTT measurements, for example to use with retransmission.

  Steve Casner explained that the reason we decided on Experimental was
  that it was unclear how much these measures added above what is
  already in RTCP.  The authors have been doing measurements to quantify
  that, and there is more evidence now that implementers are ready to
  use at least some of these functions in practice.  That would be more
  effective at Proposed rather than Experimental.

RTP Retransmission

  Perhaps the most significant progress at this meeting resulted from
  side discussions among the authors of the two alternative proposals
  contained in drafts draft-ietf-avt-rtp-retransmission-01.txt and
  draft-ietf-avt-selret-05.txt for RTP retransmission based on RTCP
  feedback.  Jose Rey provided a consensus report from these
  discussions.

  Merging of the two approaches was enabled by the recognition and
  acceptance of a requirement for the solution to be able to indicate
  explicitly which RTP packets were lost.  The technique from the
  "selret" draft for multiplexing the initial transmissions and
  retransmissions in one stream by sharing the sequence number space
  does not allow this.  Therefore, that proposal will be abandoned in
  favor of carrying the initial transmissions and retransmissions as
  separate streams.  To reduce the number of UDP ports required, the
  streams will be multiplexed on the SSRC id, so long as no problems
  with that approach are found.
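
  As an illustration of multiplexing on the SSRC id, the following
  Python sketch separates initial transmissions and retransmissions
  that arrive on a single UDP port by inspecting the SSRC field of the
  fixed RTP header.  The SSRC values, the helper names, and the
  header-only test packets are assumptions made for this example; they
  are not taken from either draft.

```python
import struct

# Fixed RTP header: V/P/X/CC, M/PT, sequence number, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header of a packet."""
    if len(packet) < RTP_HEADER.size:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    return {"payload_type": b1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}


def demultiplex(packets, original_ssrc, retransmission_ssrc):
    """Split packets received on one port into two streams by SSRC."""
    streams = {"original": [], "retransmission": []}
    for packet in packets:
        header = parse_rtp_header(packet)
        if header["ssrc"] == original_ssrc:
            streams["original"].append(header["seq"])
        elif header["ssrc"] == retransmission_ssrc:
            streams["retransmission"].append(header["seq"])
        # Packets from any other SSRC are ignored in this sketch.
    return streams


def missing_sequence_numbers(seqs):
    """List gaps in one stream's sequence numbers (wraparound ignored)."""
    seen = set(seqs)
    return [s for s in range(min(seqs), max(seqs) + 1) if s not in seen]


def make_packet(ssrc, seq, payload_type=96, timestamp=0):
    """Build a header-only RTP packet for the demonstration."""
    return RTP_HEADER.pack(2 << 6, payload_type, seq, timestamp, ssrc)


ORIGINAL_SSRC = 0x11111111        # hypothetical SSRC values
RETRANSMISSION_SSRC = 0x22222222

packets = [
    make_packet(ORIGINAL_SSRC, 100),
    make_packet(ORIGINAL_SSRC, 102),       # packet 101 was lost
    make_packet(RETRANSMISSION_SSRC, 7),   # separate sequence number space
    make_packet(ORIGINAL_SSRC, 103),
]
streams = demultiplex(packets, ORIGINAL_SSRC, RETRANSMISSION_SSRC)
lost = missing_sequence_numbers(streams["original"])
```

  Because each stream keeps its own sequence number space, a gap in the
  initial transmissions identifies exactly which packets were lost,
  which is the requirement that motivated the merge.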

  Steve Casner observed that it may not be necessary to restrict the
  solution to just SSRC multiplexing or just port multiplexing.  For
  example, the FEC payload format in RFC 2733 allows either.  For some
  applications it may be more important to restrict the number of ports
  used (favoring SSRC multiplexing), while for others it may be more
  important to allow selectability in receiving both streams or just the
  initial transmissions (favoring port multiplexing).

  Jose expressed concern that applications would not know whether peers
  had implemented one or both methods.  Rahul Agarwal was also concerned
  that having two solutions would require either the server or the
  client to implement both for maximum interoperability.  Steve
  responded that it could be part of the session signaling or might be a
  fixed characteristic for a particular application.  He explained that
  the selection of which method is used might be fixed in a given RTP
  profile or by an implementation agreement for a particular
  application, so there would be no interoperability issue within that
  application.  But the payload format could remain flexible to fit
  different requirements for different applications.  Magnus Westerlund
  considered the implementation difference to be small, so you could
  cheaply implement both.  Colin Perkins said we need to evaluate the
  value of having both methods available.  If only one approach is
  needed, that would be preferable, but there is no objection to having
  both if they are needed.

  A separate issue is the SEL payload format in the "selret" draft for
  communicating packet priority.  This allows losses of only the
  "important" packets to be reported, which reduces the feedback
  bandwidth.  However, some people contend that the difference is not
  significant.  To facilitate a decision about including the SEL
  format, its performance will be evaluated quantitatively against
  reporting of all losses.

  A work plan was outlined, calling for the mentioned evaluations to be
  completed no later than September so a merged draft could be posted in
  October.  The goal is a WG last call in December.

SSRC Multiplexing

  The question of whether it is acceptable to use SSRC multiplexing in
  RTP retransmission is a specific case of a more general question:
  should the identifier for an "RTP session" be redefined to include the
  SSRC identifier in addition to the destination transport address?  In
  other words, why disallow multiplexing of RTP sessions based on SSRC
  identifier?  This was a bonus topic for the previous AVT meeting in
  Minneapolis that was not presented due to lack of time, so Steve
  Casner revived the presentation in this meeting.

  The primary reason why we can't change the definition of an RTP
  session is that there are scenarios where multiple sources are
  intended to be combined in one session, such as multiple audio streams
  being summed in a multiparty conference or multiple video cameras on
  one workstation being transmitted in the same session.  That's why the
  SSRC identifier was added to the RTP header in the first place.  It
  allows incoming streams to be distinguished independently of the
  source transport address since the stream might flow through an RTP
  translator such that the original source transport address is lost.
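
  The relationship described above can be sketched in Python as a
  table of sessions keyed by destination transport address, each
  holding its sources keyed by SSRC identifier.  The addresses, SSRC
  values, and function name are hypothetical, and the sketch
  simplifies a session's RTP/RTCP port pair to a single port.

```python
from collections import defaultdict

# Session key: destination transport address (network address, port).
# Value: the sources in that session, keyed by SSRC identifier.
sessions = defaultdict(dict)


def on_packet(dest_addr, source_addr, ssrc, payload):
    """File a packet under its session and SSRC.

    The source transport address is deliberately not part of the key:
    a translator may rewrite it, while the SSRC is preserved.
    """
    sources = sessions[dest_addr]
    sources.setdefault(ssrc, []).append(payload)


conference = ("224.2.0.1", 5004)    # hypothetical session address

on_packet(conference, ("192.0.2.10", 40000), 0xAAAA0001, "alice-1")
on_packet(conference, ("192.0.2.20", 40002), 0xBBBB0002, "bob-1")
# The first source now arrives through a translator, so the apparent
# source transport address differs, but the SSRC does not.
on_packet(conference, ("198.51.100.5", 6000), 0xAAAA0001, "alice-2")
```

  Both sources belong to one session, and the stream that passed
  through the translator is still attributed to the same source.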

  In addition, Section 5.2 of the RTP specification lists several
  reasons why neither the SSRC id nor the payload type field should be
  used for multiplexing RTP sessions, in particular sessions of
  different media.  However, for some applications, the implementers
  feel that these considerations do not apply.  Those implementers are
  more concerned about the requirement to use a large number of UDP
  ports to multiplex the RTP sessions because the performance of some
  operating systems degrades severely in that situation (due to an
  inefficient search to match ports to sockets on incoming packets).
  Rather than changing the definition of an RTP session, perhaps the
  energy should be spent getting operating system inefficiencies fixed
  instead?

  One problem with using the SSRC for multiplexing when streams
  originate on multiple hosts is that the assignment of SSRC identifiers
  must be coordinated among the sources.  Roni Even pointed out that
  multiplexing on the SSRC id introduces another level of demultiplexing
  which precludes the receiver from dispatching the sources to different
  processes, and that one stream can impact the latency of another
  (independent) stream.

  Rahul Agarwal agreed that it would be nice if we could fix the OS, but
  it is a long and difficult process to convince the vendors to do so.
  There is also a problem that some operating systems limit the number
  of ports per process, thus requiring multiple processes for a
  high-scale server.

  Steve said what this question boils down to is whether we need to add
  extra words in Section 5.2 to relax the guideline against SSRC id
  multiplexing or to say under what conditions it is acceptable to
  violate the guideline.  His personal preference is not to make any
  changes, but instead to allow other documents, such as the RTP
  retransmission specification, to explain why a choice of SSRC
  multiplexing was used and why it is not a problem.  He asked if anyone
  feels strongly that the text should be changed.

  Rahul replied that the considerations in the existing text are all
  related to multiplexing multiple different media streams, so those
  clauses don't apply to the case of RTP retransmission.  He would like
  to see a clause added to say that for a single medium SSRC
  multiplexing is OK.

  Steve noted that some text has been added in the revised profile to
  explain that the prohibition against multiplexing on SSRC id or
  payload type applies in particular to combining different media.
  Multiple sources of one medium in one session are allowed and
  expected when they are to be combined and processed together.
  Switching payload types on the fly to change encodings is also
  perfectly normal; that's the reason the payload type field is in the
  RTP header rather than being signaled out of band.

RTCP Extensions for Single-Source Multicast Sessions

  Julian Chesterfield presented an update on the draft specifying
  unicast feedback for group sessions, draft-ietf-avt-rtcpssm-01.txt, to
  facilitate use of RTP with single-source multicast.  In previous
  discussions of this draft, we've established the need to address the
  security issues it introduces.  A good analysis of the security
  threats and an evaluation of the existing solutions has been written
  as a separate document to work toward the finished solution (see
  http://irg.attlabs.net/rtcp_ssm/rtcp_security.pdf).

  The current focus is to identify a level of security that should be
  mandated by the draft.  The goal is to provide the same level of
  guarantee as the current RTCP for any-source multicast.  That is,
  although additional security services or protections might be desired,
  it is not a requirement for the rtcpssm solution to provide stronger
  protection than does the current multicast RTCP.  On the other hand,
  replay defense is an example of an additional service that may be an
  inherent side benefit of any security mechanism that meets the basic
  requirements.

  Steve Casner agreed that a higher level of protection is not a
  requirement for the basic level of operation with SSM.  However,
  additional services such as SRTP can be added to RTP in any-source
  multicast operation, and it is a requirement that these additional
  services be usable with SSM as well.  The security issues we want to
  address with rtcpssm are the new risks such as denial of service
  attacks that are introduced by unicast RTCP, not confidentiality and
  admission control.

  Julian went on to say that the fundamental defense is authentication
  of the feedback address (the destination for the unicast RTCP) and
  authentication of the RTCP information from the multicast source which
  controls the bandwidth calculations.  Given that authentication of the
  RTCP packets from the multicast source is required, then one solution
  for authenticating the feedback address is to send it in-band with the
  multicast RTCP.  This also allows changing the feedback address during
  the session if needed.  Another option is to use out-of-band
  signaling, e.g., in SDP, with an authenticated transport mechanism.
  Julian is seeking feedback from the group on this choice.  He also
  asked to what extent the specification should recommend specific
  approaches for security functions versus just establishing
  requirements.
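
  To make the in-band option concrete, the following Python sketch
  authenticates an announced feedback address with a keyed hash.  The
  message layout, the function names, and the use of HMAC-SHA-256 with
  a key shared between source and receivers are all assumptions made
  for illustration; the draft does not specify a mechanism, and a key
  shared by the whole group would not by itself prevent one receiver
  from forging announcements.

```python
import hashlib
import hmac

TAG_LENGTH = hashlib.sha256().digest_size   # 32 bytes


def sign_announcement(key: bytes, address: str, port: int) -> bytes:
    """Return a hypothetical announcement: 'address:port' plus a tag."""
    message = f"{address}:{port}".encode()
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag


def verify_announcement(key: bytes, announcement: bytes):
    """Return (address, port) if the tag is valid, otherwise None."""
    message, tag = announcement[:-TAG_LENGTH], announcement[-TAG_LENGTH:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None     # reject: do not send RTCP to this address
    address, _, port = message.decode().rpartition(":")
    return address, int(port)


key = b"hypothetical session key"
announcement = sign_announcement(key, "192.0.2.1", 5005)
accepted = verify_announcement(key, announcement)

# An attacker who rewrites the address cannot produce a matching tag.
forged = announcement.replace(b"192.0.2.1", b"192.0.2.9")
rejected = verify_announcement(key, forged)
```

  A receiver in this sketch would direct its unicast RTCP only to an
  address that verifies, and a new signed announcement would allow the
  feedback address to change during the session.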

  Colin Perkins replied that we should require approaches that make it
  as secure as "normal RTCP", and then we can recommend additions that
  make it more secure.  For example, it is appropriate to say the
  feedback identifier MUST be authenticated, but it is not clear whether
  there is one single authentication solution that is always appropriate
  and therefore must be implemented, or whether there are several
  solutions and you should implement one of them or something else with
  equivalent security.  We may need to use different alternatives for
  signaling done in different ways, so we could not mandate just one.

  Another participant asked: if the purpose of "MUST" in a
  specification is to achieve interoperability, how can the choice of
  approach be left open?  If two implementations make different
  choices, they can't interoperate.

  Colin replied that it would be good to include in the rtcpssm
  specification how the security would be done for a couple of common
  signaling protocols (e.g., RTSP and SIP), as well as how it would be
  done for the in-band RTCP (since there are two parts to the problem).
  Then, if you are using a different signaling protocol you MUST achieve
  the security requirements, but how you do it is to be specified
  separately.

  Philippe Gentric suggested adding a criterion that the solution should
  be suitable for operation through a NAT in which the apparent address
  for feedback might be changed.

Wrap-Up

  This AVT session was unusual in that we reached the end of the agenda
  before the end of the session (which has not happened for years).
  Steve Casner mentioned a couple of topics regarding the revision of
  the RTP specification that had not been put on the agenda.  The code
  currently in the appendix indicates a packet loss value of 1 when no
  packets have yet been received.  Steve had planned to work out a
  solution and present it here, but didn't get that done.  We can
  include such a (small) change as a comment to the RFC Editor even
  though the draft has been reviewed by the IESG.  He asked if anyone
  had fixed that bug in their implementations, but nobody said yes.
  Steve is also considering adding a definition of the term "sampling
  instant" in the revised RTP specification to explain what it means in
  scenarios other than live sampling of the media.  Contributions for
  either of these additions would be welcome.
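
  The packet loss behavior can be reproduced with the arithmetic used
  for the cumulative loss count, where the number of packets expected
  is the extended highest sequence number minus the base sequence
  number plus one.  The Python sketch below assumes a source record
  whose base and highest sequence numbers are both set to the first
  sequence number and whose received count is zero; that starting
  state and the names used are assumptions for illustration, not a
  transcription of the appendix code, and no fix is proposed here.

```python
from dataclasses import dataclass


@dataclass
class Source:
    base_seq: int       # first sequence number for this source
    max_seq: int        # highest sequence number seen so far
    cycles: int = 0     # wrap count, already shifted left by 16 bits
    received: int = 0   # packets actually received


def cumulative_lost(s: Source) -> int:
    """Packets expected minus packets received."""
    extended_max = s.cycles + s.max_seq
    expected = extended_max - s.base_seq + 1
    return expected - s.received


# Assumed state before any packet has been counted.
fresh = Source(base_seq=1000, max_seq=1000)
loss_before_any_packet = cumulative_lost(fresh)

# After packets 1000, 1001 and 1003 arrive (1002 lost).
active = Source(base_seq=1000, max_seq=1003, received=3)
loss_after_three_packets = cumulative_lost(active)
```

  In the assumed starting state the "+ 1" term makes one packet
  expected while none has been received, so a loss of 1 is reported
  even though nothing has been lost.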

Action Items

  We took two "hums" during the meeting which need confirmation on the
  mailing list.  They were to accept the payload formats for
  uncompressed video (draft-gharai-avt-uncomp-video-00.txt) and for JVT
  video (draft-wenger-avt-rtp-jvt-01.txt) as working group tasks.  We
  solicit confirmations or objections on these actions -- we want to
  hear both "yeas" and "nays".
