[dispatch] Questions about draft-hellstrom-text-conference-04

"Worley, Dale R (Dale)" <dworley@avaya.com> Fri, 01 July 2011 18:15 UTC

From: "Worley, Dale R (Dale)" <dworley@avaya.com>
To: "dispatch@ietf.org" <dispatch@ietf.org>, "gunnar.hellstrom@omnitor.se" <gunnar.hellstrom@omnitor.se>, "arnoud@realtimetext.org" <arnoud@realtimetext.org>
Date: Fri, 01 Jul 2011 14:12:55 -0400
Thread-Topic: Questions about draft-hellstrom-text-conference-04
Thread-Index: AQHMOBp/1nMJQofaCUmUdI2XS3Ucwg==
Message-ID: <CD5674C3CD99574EBA7432465FC13C1B222B1F571F@DC-US1MBEX4.global.avaya.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: [dispatch] Questions about draft-hellstrom-text-conference-04
Precedence: list

Following are some questions I have about
draft-hellstrom-text-conference-04, "Text media handling in RTP based
real-time conferences".  I am not on the appropriate working group
mailing list, so some of these questions may already have been
discussed and solved.  Also, I don't know the proper forum for this
draft, so I am sending this to Dispatch (where the original e-mail was
sent) and to the authors of the draft.

The draft is a useful specification of how to do text mixing.

1. Negotiation

The draft is mostly a set of standards or best practices for mixing
text media.  There seem to be three modes for generating the mixing
output, identified via the "rtp-mixer", "t140-mixer", and "text-mixer"
values of the "rtt-mixer" media tag.

BTW, the values "rtp-mix" and "rtp-mixer" are both used in the draft.
I suspect the first is a mistake.

It appears that the rtp-mixer system is described in sections 3, 4,
and 5; the t140-mixer system is described in section 6; and the
text-mixer system is described (very minimally) in section 7.

As the negotiation of these capabilities is very important for
successful interoperation, I suggest that at the start of the draft
there be an outline of the three different methods and an explicit
description of how the text mixing mode is negotiated between the UA
and the mixer (including the various compatibility considerations).

2. Use of CSRC

In concept, the rtp-mixer technique is a method of multiplexing
several T.140 streams, with the intention that the receiving UA will
de-multiplex them, and then display the set of streams in the way the
user desires.  Each T.140 stream is identified by the (one) CSRC value
in the RTP packets that carry its content.

The draft states (section 2.1):

   The source of the primary text in each RTP packet is
   identified by the CSRC parameter, containing the SSRC of the initial
   source of text.

I may be misreading this sentence, but it appears to specify that the
CSRC of a T.140 stream that is sent by the mixer should be the SSRC of
the RTP stream that brought the content to the mixer.

However, I think that a better specification would be:

   The source of the primary text in each RTP packet is identified by
   the CSRC parameter, containing the CSRC of the source of text, or
   if that is not available, the SSRC of the source of text.

With the latter specification, if a mixer receives a multiplexed T.140
RTP stream, it will carry the associated CSRCs from the input stream
into the output streams.

Consider the following configuration (which is fairly common in audio
mixing):

  User 1                               User 3
        \                             /
         \                           /
         Mixer 1----------------Mixer 2
         /                           \
        /                             \
  User 2                               User 4

The RTP from Mixer 1 to Mixer 2 would carry the SSRCs of User 1 and
User 2 as CSRC values, and Mixer 2 would copy those values into the
RTP to User 3 and User 4.  That would allow User 4 to see the text
from all of the other users identified properly.
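
As a rough illustration of the proposed rule (a sketch only: the
RtpPacket class, the forward() helper, and the SSRC values are
hypothetical and are not taken from the draft), a mixer that prefers
an incoming CSRC over the incoming SSRC preserves the original source
across two cascaded mixers:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RtpPacket:
    """Hypothetical, heavily simplified RTP packet (illustration only)."""
    ssrc: int                                       # sender of this RTP stream
    csrc: List[int] = field(default_factory=list)   # contributing source list
    text: str = ""


def forward(packet: RtpPacket, mixer_ssrc: int) -> RtpPacket:
    """Proposed rule: reuse the incoming CSRC if there is one,
    otherwise fall back to the incoming SSRC."""
    source = packet.csrc[0] if packet.csrc else packet.ssrc
    return RtpPacket(ssrc=mixer_ssrc, csrc=[source], text=packet.text)


# Made-up identifiers for User 1, Mixer 1, and Mixer 2.
USER1, MIXER1, MIXER2 = 0x1111, 0xAAAA, 0xBBBB

from_user1 = RtpPacket(ssrc=USER1, text="hello")
hop1 = forward(from_user1, MIXER1)   # Mixer 1 -> Mixer 2
hop2 = forward(hop1, MIXER2)         # Mixer 2 -> User 3 / User 4
print(hex(hop2.csrc[0]))
```

Under the rule as currently worded in the draft, Mixer 2 would instead
label the text with Mixer 1's SSRC, and User 4 could not tell User 1
from User 2.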

3. Redundancy

BTW, although the text describes the use of RFC 2198 redundancy, that
RFC is not mentioned in the references.

The use of RFC 2198 redundancy in text conferencing seems to require
excessive numbers of packets when more than one user is typing
simultaneously.

The governing text seems to be:

   The mixer MUST NOT transmit redundant levels of text from one source
   together with primary text from another source.  Thus, when there is
   text available for primary or redundant transmission from more than
   one source, the mixer MUST buffer text from other sources until all
   the redundant transmissions of a packet from one selected source has
   been transmitted.

Thus, if user A transmits text fragments A1, A2, etc. and user B
transmits text fragments B1, B2, etc., and these transmissions are
interleaved in time, the mixer is required to transmit the following
RTP packets (for two-fold redundancy):

CSRC A: empty/A1
CSRC A: A1/empty
CSRC B: empty/B1
CSRC B: B1/empty
CSRC A: empty/A2
CSRC A: A2/empty
CSRC B: empty/B2
CSRC B: B2/empty
CSRC A: empty/A3
CSRC A: A3/empty
CSRC B: empty/B3
CSRC B: B3/empty

(Within each RTP packet, I list the contained t140 sub-payloads in
forward time order, as they would be placed in the packet.  Thus, the
most recent payload is last.)

This requires doubling the number of RTP packets sent, whereas in an
ordinary redundant-t140 environment, the number of RTP packets sent
would be about the same as without redundancy (though the total
payload size would still double).  This scheme also sends each
redundant packet closely in time to its primary packet, which
increases the chance that both packets will be lost (since packet
losses are correlated for packets that are close in time).
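
The packet count can be checked with a small sketch.  It assumes the
simplified model used in the listing above (two-fold redundancy,
strictly alternating sources, so each fragment has to be flushed
before the mixer can switch source); draft_rule_schedule() is a
hypothetical helper, not something defined in the draft:

```python
def draft_rule_schedule(fragments):
    """fragments: list of (source, text) pairs in arrival order.

    Simplified model: because the next fragment always comes from a
    different source, every fragment needs one packet carrying it as
    primary and a second packet carrying it as the redundant copy.
    """
    packets = []
    for source, text in fragments:
        packets.append((source, ("empty", text)))   # primary transmission
        packets.append((source, (text, "empty")))   # redundant copy, flushed
    return packets


arrivals = [("A", "A1"), ("B", "B1"), ("A", "A2"),
            ("B", "B2"), ("A", "A3"), ("B", "B3")]
schedule = draft_rule_schedule(arrivals)
for source, (older, newer) in schedule:
    print(f"CSRC {source}: {older}/{newer}")
print(len(schedule), "packets for", len(arrivals), "fragments")
```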

Note that this situation, multiple users typing simultaneously, is
expected to be common in RTT mixing.  (It is already common in the
Unix 'talk' tool.)

It seems to me that the problem is that we assume that the RFC 2198
redundancy decoding will be done *before* the t140-stream
demultiplexing, which causes problems because RFC 2198 redundancy does
not carry CSRC values separately for each sub-payload.

A better alternative would be to do the redundancy decoding *after*
the t140-stream demultiplexing, which allows each substream of packets
(which is from a single source, with a single CSRC value) to be
*separately* redundant.  The parallel example would be:

CSRC A: empty/empty/A1
CSRC B: empty/empty/B1
CSRC A: empty/A1/A2
CSRC B: empty/B1/B2
CSRC A: [A1]/A2/A3
CSRC B: [B1]/B2/B3
CSRC A: [A2]/A3/empty
CSRC B: [B2]/B3/empty

(The use of three-fold redundancy here and the "[...]" sub-payloads
are described below.)

In this scheme, text arriving from one user would not force "idle"
packets (with empty final sub-payload) to be sent for another user.
Even in this short example, the packet count is reduced by 33%.
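
A sketch of the per-source scheme, under the same simplified model as
the listing above (per_source_schedule() is a hypothetical helper; the
final "flush" packet per source is an assumption made to reproduce the
listing):

```python
from collections import defaultdict


def per_source_schedule(fragments):
    """fragments: list of (source, text) pairs in arrival order.

    Each packet carries three sub-payload slots for ONE source:
    a timestamp-only marker for the oldest, then the previous
    fragment, then the newest fragment.
    """
    history = defaultdict(lambda: ["empty", "empty"])  # last two per source
    packets = []

    def emit(source, newest):
        oldest, previous = history[source]
        marker = f"[{oldest}]" if oldest != "empty" else "empty"
        packets.append((source, (marker, previous, newest)))
        history[source] = [previous, newest]

    for source, text in fragments:
        emit(source, text)
    for source in list(history):       # one trailing packet per source
        emit(source, "empty")
    return packets


arrivals = [("A", "A1"), ("B", "B1"), ("A", "A2"),
            ("B", "B2"), ("A", "A3"), ("B", "B3")]
schedule = per_source_schedule(arrivals)
for source, slots in schedule:
    print(f"CSRC {source}: " + "/".join(slots))
print(len(schedule), "packets")
```

For the six fragments shown, this yields 8 packets where the draft's
rule needs 12.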

Each sub-stream can be decoded for redundancy as if it were a separate
RFC 2198 RTP stream, with one large exception:  The sub-stream does
not have consecutive RTP sequence numbers, and so missing sequence
numbers can not be used to determine if there are missing payloads
that cannot be reconstructed.  (The display of t140 text requires
identifying gaps in the received text stream to the user.)

As an alternative to using sequence numbers, we can use the RTP
timestamps, as each sub-payload does carry its own timestamp (as an
offset from the RTP packet's timestamp).  Thus, we can identify and
eliminate duplicate sub-payloads from different packets by comparing
timestamps.

However, the only way to detect that no sub-payload is missing is if
the timestamps from one packet overlap the timestamps from a previous
packet.  That is, if one packet contains T1, T2, T3, and another
packet contains T3, T4, T5, then we know that the sequence of
sub-payloads with timestamps T1, T2, T3, T4, T5 has no gaps.  But if
one packet contains T1, T2, T3, and another contains T4, T5, T6, the
receiver has no way of knowing if there is a T3B between T3 and T4.
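
The overlap test can be sketched as follows.  This is only an
illustration: merge_sub_payloads() is a hypothetical helper, timestamps
are plain integers, and packets for one source are assumed to arrive in
order.

```python
def merge_sub_payloads(packets):
    """packets: list of timestamp lists (oldest first) for ONE source,
    in arrival order.

    Returns (timestamps, gap_possible).  Duplicates are dropped by
    timestamp.  A gap is possible whenever a packet's oldest timestamp
    has not been seen before, because nothing then ties that packet to
    the text already received.
    """
    received = []
    gap_possible = False
    for stamps in packets:
        if received and stamps[0] not in received:
            gap_possible = True            # no overlap with earlier packets
        for stamp in stamps:
            if stamp not in received:      # skip duplicate sub-payloads
                received.append(stamp)
    return received, gap_possible


print(merge_sub_payloads([[1, 2, 3], [3, 4, 5]]))   # overlapping packets
print(merge_sub_payloads([[1, 2, 3], [4, 5, 6]]))   # no overlap
```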

Thus, for this scheme to have the same reliability as RFC 2198 applied
to a single t140 stream, this scheme needs *one more* sub-payload per
packet.  Thus, the example above uses three-fold redundancy, whereas
the first example used only two-fold redundancy.

But since the oldest sub-payload is needed only for its timestamp
(since it is expected to overlap with a previous packet, the
sub-payload contents are already known), we can omit the sub-payload
bytes themselves.  This is the meaning of "[A1]", etc. in the example:
The sub-payload carries A1's timestamp, but the payload bytes are
omitted (and the block length is 0).

The net extra overhead is 4 bytes per RTT packet, which is no greater
than if RFC 2198 redundancy were reorganized so that each sub-payload
had its own CSRC field.

It could be objected that this approach violates the concept that RFC
2198 redundancy is processed independently of the sub-payload carried
within it, but the draft is already violating that concept by placing
restrictions on how redundancy is applied.  And even within RFC 2198,
which envisions redundant audio sub-payloads to be lower-bit-rate
versions of the primary audio sub-payloads, encoding and decoding the
redundancy would not be done independently of processing the
sub-payloads.

4. Graphic rendition

The mechanism used by T.140/RFC 4103 to control graphic rendition is
*stateful*, in that there is always a current rendition mode, which is
changed by sending/receiving an SGR escape sequence.  Given the
possibility that characters in the T.140 stream may occasionally be
lost, the graphic rendition mode at the receiver can become
desynchronized from the graphic rendition mode at the sender.

For example, if some text is emphasized by first sending CSI 1;5;31 m
(bold, blink, red), but the CSI 0 m (all attributes off) that follows
is lost, the remainder of the conversation may be painful to behold.

In order to allow for this, it would be useful to recommend that the
sender periodically insert an SGR for "all attributes off", followed
by an SGR to reestablish the graphic rendition mode that it thinks is
in effect.  (The second SGR can be omitted in the common case that the
current mode is "all attributes off".)

It seems that the natural place to insert these SGRs is at "the
beginnings of lines", that is, before the first character after a new
line indicator (x2028 or x0D/x0A).
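
A minimal sketch of such a sender-side refresh follows.  Several things
here are assumptions for illustration: the helper name, the use of the
7-bit "ESC [" form of CSI (an implementation might use the single
character 0x9B instead), and the simplification that the current mode
does not change within the text being processed.

```python
CSI = "\x1b["                 # assumption: 7-bit ESC [ form of CSI
RESET = CSI + "0m"            # SGR "all attributes off"
LINE_SEPARATOR = "\u2028"


def resync_at_line_starts(text: str, current_sgr: str = "") -> str:
    """Insert "all attributes off", plus the sender's current SGR
    parameters (if any), right after each new-line indicator."""
    refresh = RESET + (CSI + current_sgr + "m" if current_sgr else "")
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\r\n", i):
            out.append("\r\n" + refresh)
            i += 2
        elif text[i] == LINE_SEPARATOR:
            out.append(LINE_SEPARATOR + refresh)
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Current mode is "all attributes off": only the reset is inserted.
print(repr(resync_at_line_starts("a\u2028b")))
# Current mode is bold, blink, red: the reset is followed by the mode.
print(repr(resync_at_line_starts("a\r\nb", "1;5;31")))
```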

Dale
