Re: [dispatch] Questions about draft-hellstrom-text-conference-04

Gunnar Hellström <gunnar.hellstrom@omnitor.se> Sun, 03 July 2011 18:32 UTC

Message-ID: <4E10B5A8.6050309@omnitor.se>
Date: Sun, 03 Jul 2011 20:32:08 +0200
From: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
To: "Worley, Dale R (Dale)" <dworley@avaya.com>, "dispatch@ietf.org" <dispatch@ietf.org>
References: <CD5674C3CD99574EBA7432465FC13C1B222B1F571F@DC-US1MBEX4.global.avaya.com>
In-Reply-To: <CD5674C3CD99574EBA7432465FC13C1B222B1F571F@DC-US1MBEX4.global.avaya.com>
Cc: "arnoud@realtimetext.org" <arnoud@realtimetext.org>

Dale,
You are raising good questions about the multi-party real-time text draft.
Since it has not yet been placed in any specific working group, it
seems appropriate to have the discussion in Dispatch.
I think it relates either to some of the avt groups or to xcon. Users of
it would possibly include ecrit and clue.

I want to start with a brief acknowledgement of your observations, and
expand into proper answers later.

1. Negotiation.
You are right that three possible mixer methods are described.
One occurrence of "rtp-mixer" was indeed mistakenly shortened to "rtp-mix".

I agree that it is good to have an initial section on this and the 
capability negotiation.

The specification you find very thin is the one for conference-unaware
clients.
It is documented at http://www.realtimetext.org/index.php?pagina=62

Please help us judge this description of what to do with multi-party
real-time text calls when the client does not understand the
multi-party features. Should it be included in the IETF draft, or kept
separate?


2. CSRC usage.
Yes, it is a good idea to let the focus separate contents per 
contributor, even if the sessions come mixed to the focus.
Doing it through the extra rules for CSRC usage that you propose seems good.
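
As a rough illustration of the CSRC rule proposed in the quoted message
below, here is a minimal Python sketch. The function name `outgoing_csrc`
and the numeric identifiers are hypothetical, and the sketch assumes that
each multiplexed text packet carries at most one CSRC.

```python
from typing import Sequence


def outgoing_csrc(ssrc: int, csrc_list: Sequence[int]) -> int:
    """Pick the source identifier a mixer copies into the CSRC of its output.

    Proposed rule: use the CSRC of the incoming packet if one is present,
    otherwise fall back to the SSRC of the incoming stream.
    """
    # Assumption: at most one CSRC per text packet, so the first entry
    # identifies the original contributor.
    return csrc_list[0] if csrc_list else ssrc


# Hypothetical identifiers for the two-mixer topology in the quoted example.
USER1, USER2, MIXER1 = 0x1111, 0x2222, 0xAAAA

# Mixer 1 receives directly from User 1: no CSRC, so the SSRC is used.
at_mixer1 = outgoing_csrc(ssrc=USER1, csrc_list=[])
# Mixer 2 receives from Mixer 1: the CSRC is carried through unchanged.
at_mixer2 = outgoing_csrc(ssrc=MIXER1, csrc_list=[at_mixer1])
```

With this rule the contributor identity survives a chain of mixers, which
is the property the two-mixer example relies on.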

3. Redundancy.
Yes, I have realized that the restriction against merging contents from
different sources as primary and redundant text in the same packet, as
described, easily creates long waits before a specific text item is
transmitted. That is not favourable.
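
To make the cost concrete, the following Python sketch reproduces the two
packet sequences from the quoted message below: one under the draft's
restriction, and one with redundancy applied separately per source. The
function names and the "empty" and "[...]" notation are illustrative only.
The sketch models packet ordering, not real RTP or RFC 2198 encoding.

```python
def restricted_schedule(fragments):
    """Draft rule: a packet never mixes sources, so every fragment
    costs one primary packet plus one redundant packet."""
    packets = []
    for source, text in fragments:
        packets.append((source, ["empty", text]))   # primary transmission
        packets.append((source, [text, "empty"]))   # redundant transmission
    return packets


def _slots(sent, trailing):
    """Three sub-payload slots, oldest first; the oldest is timestamp-only."""
    padded = ["empty", "empty"] + sent + (["empty"] if trailing else [])
    oldest, redundant, primary = padded[-3:]
    if oldest != "empty":
        oldest = "[" + oldest + "]"   # timestamp kept, payload bytes omitted
    return [oldest, redundant, primary]


def per_source_schedule(fragments):
    """Proposed alternative: redundancy is tracked separately per source."""
    history = {}   # source -> fragments sent so far
    packets = []
    for source, text in fragments:
        sent = history.setdefault(source, [])
        sent.append(text)
        packets.append((source, _slots(sent, trailing=False)))
    # One closing packet per source carries its last fragment redundantly.
    for source, sent in history.items():
        packets.append((source, _slots(sent, trailing=True)))
    return packets


interleaved = [("A", "A1"), ("B", "B1"), ("A", "A2"),
               ("B", "B2"), ("A", "A3"), ("B", "B3")]
restricted = restricted_schedule(interleaved)
per_source = per_source_schedule(interleaved)
reduction = 1 - len(per_source) / len(restricted)
```

For the six interleaved fragments this gives 12 packets under the
restriction and 8 with per-source redundancy, matching the reduction of
about one third stated in the quoted message.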

We should find a way to circumvent that limitation. Your proposal to
analyze the time stamps is one interesting initial approach.
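
A minimal Python sketch of that time stamp analysis, under stated
assumptions: each packet from one source lists consecutive sub-payloads in
ascending time stamp order, time stamps are plain integers, and the name
`merge_packet` is hypothetical. It drops duplicates by time stamp and
reports a possible gap when a packet does not overlap what was already
received.

```python
from typing import List, Optional, Tuple

# (timestamp, text); text may be "" for a timestamp-only sub-payload
SubPayload = Tuple[int, str]


def merge_packet(last_ts: Optional[int], packet: List[SubPayload]):
    """Return (fresh sub-payloads, gap_possible, updated last timestamp)."""
    timestamps = [ts for ts, _ in packet]
    # Continuity is proven only when the packet reaches back to the newest
    # timestamp already received from this source.
    gap_possible = (
        last_ts is not None
        and bool(timestamps)
        and all(ts > last_ts for ts in timestamps)
    )
    # Sub-payloads at or before last_ts are duplicates and are dropped.
    fresh = [(ts, text) for ts, text in packet
             if last_ts is None or ts > last_ts]
    known = timestamps + ([last_ts] if last_ts is not None else [])
    return fresh, gap_possible, max(known, default=None)


# First packet from a source: T1, T2, T3.
_, _, last = merge_packet(None, [(1, "T1"), (2, "T2"), (3, "T3")])
# Overlapping packet (T3 as timestamp-only entry, then T4, T5): no gap.
overlapping = merge_packet(last, [(3, ""), (4, "T4"), (5, "T5")])
# Non-overlapping packet (T4, T5, T6): something may be missing after T3.
disjoint = merge_packet(last, [(4, "T4"), (5, "T5"), (6, "T6")])
```

The overlapping case needs only the time stamp of the oldest sub-payload,
which is why its payload bytes can be left out.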

Having a more specialized redundancy format providing more info per text
item could also be considered.
RFC 4351 transmits real-time text and has a special added sequence
number field in the redundancy information. A similar solution that
also specifies the source could be created for the central rtp-based
method, if so decided.


4. Graphic rendition.
You are right that the graphic rendition is stateful information, and 
loss can mess up the presentation.

A requirement to repeat the rendition commands after each Line
Separator is a good idea.
Also see the spec at realtimetext.org for more hints.
Something like this should be brought into the spec.
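
As a sketch of what such a requirement could look like on the sending
side, the Python fragment below re-issues the rendition after every line
separator. It assumes that CSI is coded as the single control character
U+009B, that the sender tracks its current SGR parameters in a string, and
that the rendition does not change inside the text being processed. The
name `refresh_rendition` is hypothetical.

```python
import re

CSI = "\u009b"        # assumption: CSI coded as the single C1 control U+009B
RESET = CSI + "0m"    # SGR "all attributes off"

# New line indicators: LINE SEPARATOR (U+2028) or CR LF.
_NEW_LINE = re.compile("\u2028|\r\n")


def refresh_rendition(text: str, current_params: str = "") -> str:
    """Insert a reset, plus the current SGR if any, after each new line.

    current_params is the SGR parameter string the sender believes is in
    effect, e.g. "1;5;31"; an empty string means all attributes are off,
    in which case the second SGR is omitted.
    """
    refresh = RESET
    if current_params:
        refresh += CSI + current_params + "m"
    return _NEW_LINE.sub(lambda m: m.group(0) + refresh, text)


plain = refresh_rendition("first\u2028second")
styled = refresh_rendition("first\r\nsecond", "1;5;31")
```

A receiver that lost an earlier SGR would then recover the intended
rendition at the start of the next line at the latest.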


One question is whether three alternative methods are too many. Maybe
the rtp-mixer method should be deleted and only the t140-mixer used for
multi-party-aware clients?


  Thanks,

Gunnar


___________________________________________________
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46708204288


Worley, Dale R (Dale) wrote 2011-07-01 20:12:
> Following are some questions I have about
> draft-hellstrom-text-conference-04, "Text media handling in RTP based
> real-time conferences".  I am not on the appropriate working group
> mailing list, so some of these questions may already have been
> discussed and solved.  Also, I don't know the proper forum for this
> draft, so I am sending this to Dispatch (where the original e-mail was
> sent) and to the authors of the draft.
>
> The draft is a useful specification of how to do text mixing.
>
> 1. Negotiation
>
> The draft is mostly a set of standards or best practices for mixing
> text media.  There seem to be three modes for generating the mixing
> output, identified via the "rtp-mixer", "t140-mixer", and "text-mixer"
> values of the "rtt-mixer" media tag.
>
> BTW, the values "rtp-mix" and "rtp-mixer" are both used in the draft.
> I suspect the first is a mistake.
>
> It appears that the rtp-mixer system is described in sections 3, 4,
> and 5; the t140-mixer system is described in section 6; and the
> text-mixer system is described (very minimally) in section 7.
>
> As the negotiation of these capabilities is very important for
> successful interoperation, I suggest that at the start of the draft
> there be an outline of the three different methods and an explicit
> description of how the text mixing mode is negotiated between the UA
> and the mixer (including the various compatibility considerations).
>
> 2. Use of CSRC
>
> In concept, the rtp-mixer technique is a method of multiplexing
> several T.140 streams, with the intention that the receiving UA will
> de-multiplex them, and then display the set of streams in the way the
> user desires.  Each T.140 stream is identified by the (one) CSRC value
> in the RTP packets that carry its content.
>
> The draft states (section 2.1):
>
>     The source of the primary text in each RTP packet is
>     identified by the CSRC parameter, containing the SSRC of the initial
>     source of text.
>
> I may be misreading this sentence, but it appears to specify that the
> CSRC of a T.140 stream that is sent by the mixer should be the SSRC of
> the RTP stream that brought the content to the mixer.
>
> However, I think that a better specification would be:
>
>     The source of the primary text in each RTP packet is identified by
>     the CSRC parameter, containing the CSRC of the source of text, or
>     if that is not available, the SSRC of the source of text.
>
> With the latter specification, if a mixer receives a multiplexed T.140
> RTP stream, it will carry the associated CSRCs from the input stream
> into the output streams.
>
> Consider the following configuration (which is fairly common in audio
> mixing):
>
>    User 1                               User 3
>          \                             /
>           \                           /
>           Mixer 1----------------Mixer 2
>           /                           \
>          /                             \
>    User 2                               User 4
>
> The RTP from Mixer 1 to Mixer 2 would carry the SSRCs of User 1 and
> User 2 as CSRC values, and Mixer 2 would copy those values into the
> RTP to User 3 and User 4.  That would allow User 4 to see the text
> from all of the other users identified properly.
>
> 3. Redundancy
>
> BTW, although the text describes the use of RFC 2198 redundancy, that
> RFC is not mentioned in the references.
>
> The use of RFC 2198 redundancy in text conferencing seems to require
> excessive numbers of packets when more than one user is typing
> simultaneously.
>
> The governing text seems to be:
>
>     The mixer MUST NOT transmit redundant levels of text from one source
>     together with primary text from another source.  Thus, when there is
>     text available for primary or redundant transmission from more than
>     one source, the mixer MUST buffer text from other sources until all
>     the redundant transmissions of a packet from one selected source has
>     been transmitted.
>
> Thus, if user A transmits text fragments A1, A2, etc. and user B
> transmits text fragments B1, B2, etc., and these transmissions are
> interleaved in time, the mixer is required to transmit the following
> RTP packets (for two-fold redundancy):
>
> CSRC A: empty/A1
> CSRC A: A1/empty
> CSRC B: empty/B1
> CSRC B: B1/empty
> CSRC A: empty/A2
> CSRC A: A2/empty
> CSRC B: empty/B2
> CSRC B: B2/empty
> CSRC A: empty/A3
> CSRC A: A3/empty
> CSRC B: empty/B3
> CSRC B: B3/empty
>
> (Within each RTP packet, I list the contained t140 sub-payloads in
> forward time order, as they would be placed in the packet.  Thus, the
> most recent payload is last.)
>
> This requires doubling the number of RTP packets sent, whereas in an
> ordinary redundant-t140 environment, the number of RTP packets sent
> would be about the same as without redundancy (though the total
> payload size would still double).  This scheme also sends each
> redundant packet closely in time to its primary packet, which
> increases the chance that both packets will be lost (since packet
> losses are correlated for packets that are close in time).
>
> Note that this situation, multiple users typing simultaneously, is
> expected to be common in RTT mixing.  (It is already common in the
> Unix 'talk' tool.)
>
> It seems to me that the problem is that we assume that the RFC 2198
> redundancy decoding will be done *before* the t140-stream
> demultiplexing, which causes problems because RFC 2198 redundancy does
> not carry CSRC values separately for each sub-payload.
>
> A better alternative would be to do the redundancy decoding *after*
> the t140-stream demultiplexing, which allows each substream of packets
> (which is from a single source, with a single CSRC value) to be
> *separately* redundant.  The parallel example would be:
>
> CSRC A: empty/empty/A1
> CSRC B: empty/empty/B1
> CSRC A: empty/A1/A2
> CSRC B: empty/B1/B2
> CSRC A: [A1]/A2/A3
> CSRC B: [B1]/B2/B3
> CSRC A: [A2]/A3/empty
> CSRC B: [B2]/B3/empty
>
> (The use of three-fold redundancy here and the "[...]" sub-payloads
> are described below.)
>
> In this scheme, text arriving from one user would not force "idle"
> packets (with empty final sub-payload) to be sent for another user.
> Even in this short example, the packet count is reduced by 33%.
>
> Each sub-stream can be decoded for redundancy as if it was a separate
> RFC 2198 RTP stream, with one large exception:  The sub-stream does
> not have consecutive RTP sequence numbers, and so missing sequence
> numbers can not be used to determine if there are missing payloads
> that cannot be reconstructed.  (The display of t140 text requires
> identifying gaps in the received text stream to the user.)
>
> As an alternative to using sequence numbers, we can use the RTP
> timestamps, as each sub-payload does carry its own timestamp (as an
> offset from the RTP packet's timestamp).  Thus, we can identify and
> eliminate duplicate sub-payloads from different packets by comparing
> timestamps.
>
> However, the only way to detect that no sub-payload is missing is if
> the timestamps from one packet overlap the timestamps from a previous
> packet.  That is, if one packet contains T1, T2, T3, and another
> packet contains T3, T4, T5, then we know that the sequence of
> sub-payloads with timestamps T1, T2, T3, T4, T5 has no gaps.  But if
> one packet contains T1, T2, T3, and another contains T4, T5, T6, the
> receiver has no way of knowing if there is a T3B between T3 and T4.
>
> Thus, for this scheme to have the same reliability as RFC 2198 applied
> to a single t140 stream, this scheme needs *one more* sub-payload per
> packet.  Thus, the example above uses three-fold redundancy, whereas
> the first example used only two-fold redundancy.
>
> But since the oldest sub-payload is needed only for its timestamp
> (since it is expected to overlap with a previous packet, the
> sub-payload contents are already known), we can omit the sub-payload
> bytes themselves.  This is the meaning of "[A1]", etc. in the example:
> The sub-payload carries A1's timestamp, but the payload bytes are
> omitted (and the block length is 0).
>
> The net extra overhead is 4 bytes per RTT packet, which is no greater
> than if RFC 2198 redundancy was reorganized so that each sub-payload
> had its own CSRC field.
>
> It could be objected that this approach violates the concept that RFC
> 2198 redundancy is processed independently of the sub-payload carried
> within it, but the draft is already violating that concept by placing
> restrictions on how redundancy is applied.  And even within RFC 2198,
> which envisions redundant audio sub-payloads to be lower-bit-rate
> versions of the primary audio sub-payloads, encoding and decoding the
> redundancy would not be done independently of processing the
> sub-payloads.
>
> 4. Graphic rendition
>
> The mechanism used by T.140/RFC 4103 to control graphic rendition is
> *stateful*, in that there is always a current rendition mode, which is
> changed by sending/receiving an SGR escape sequence.  Given the
> possibility that characters in the T.140 stream may occasionally be
> lost, the graphic rendition mode at the receiver can become
> desynchronized from the graphic rendition mode at the sender.
>
> For example, if some text is emphasized by first sending CSI 1;5;31 m
> (bold, blink, red), but the CSI 0 m (all attributes off) that follows
> is lost, the remainder of the conversation may be painful to behold.
>
> In order to allow for this, it would be useful to recommend that the
> sender periodically insert an SGR for "all attributes off", followed
> by an SGR to reestablish the graphic rendition mode that it thinks is
> in effect.  (The second SGR can be omitted in the common case that the
> current mode is "all attributes off".)
>
> It seems that the natural place to insert these SGRs is at "the
> beginnings of lines", that is, before the first character after a new
> line indicator (x2028 or x0D/x0A).
>
> Dale
