Re: [AVTCORE] Source switching performance in draft-hellstrom-avtcore-multi-party-rtt-source-01.txt

Hi Gunnar

Many thanks for taking the time to go through this so thoroughly.

I think there are two main aspects to this work:

  1.  Compatibility with existing implementations
  2.  Choosing an efficient mechanism for the future

For the first of these, it seems to me that the only solution is for the mixer to do inline participant labeling and buffering to produce a presentable single text stream. Current implementations of RFC 4103 will simply not understand switching between participants, nor will they give any visual indication of which text belongs to which participant, so the mixer needs to do that.

For the second, we have established two possibilities: allow the different redundant blocks in a packet to belong to different participants, or keep each packet associated with just one participant and use timestamps to work out which redundant text to use. I can also imagine putting all text generations of each source in the CSRC list in each packet, which adds 12 bytes of header space per CSRC (assuming 2 redundant generations); I'll write a separate mail about that. After that, I think we have a fairly complete set of solutions to choose from.

Some other comments inline.

Best regards

James

James Hamlin
Contractor
Purple, a Division of ZP Better Together, LLC
purplevrs.com

The information contained in this e-mail message is intended only for the personal and confidential use of the recipient(s) named above. If you have received this communication in error, please notify us immediately by e-mail, and delete the original message.

________________________________
From: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
Sent: 14 March 2020 11:17
To: James Hamlin; avtcore@ietf.org
Subject: Re: Source switching performance in draft-hellstrom-avtcore-multi-party-rtt-source-01.txt

Hi James,

Thanks for an interesting proposal.

Let us extend the information about the packet contents of your example a bit:

seq         01  02  03  04  05  06  07  08  09  10  11  12  13  14  15

CSRC=source  1   2   3   1   2   3   1   2   3   1   2   3   1   2   3

Timestamp   91  92  93  94  95  96  97  98  99 100 101 102 103 104 105

R2 t offset  6   6   6   6   6   6   6   6   6   6   6   6   6   6   6

R1 t offset  3   3   3   3   3   3   3   3   3   3   3   3   3   3   3

R2                                   A   M   X   B   N   Y   C   O   Z

R1                       A   M   X   B   N   Y   C   O   Z

P            A   M   X   B   N   Y   C   O   Z

Lost                             X   X

(The timestamps and timestamp offsets ("t offset") are shown in units of 100 ms; in reality they will be in milliseconds.)

The SSRC of the packet is always the mixer's SSRC.

The source is indicated in the CSRC list, which in this method has only one member: the SSRC of the source represented in the packet.

+jeh: Agreed: My mistake.

The timestamp is created by the mixer when sending, and the timestamp offsets make it possible to calculate the timestamps the redundant texts had when they were transmitted as originals.

The receiver must store essential data from a number of packets: the sequence number, the source (= CSRC) and the timestamp.

The receiver must also store, for each source, the timestamps for which text has been received (either with real contents or empty).

So, let us see what happens if both packets 06 and 07 are lost.

From packets 01 to 05, we have received text and put it in the display areas: for source 1 "AB", for source 2 "MN", and for source 3 "X".

08 is received and the gap (06 and 07) is detected. With two redundant elements for each of the lost packets, there are 4 text elements to account for; the gap is remembered, so we need to do the recovery analysis. The source (2) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 98-6 = 92. Checking back in the list of received packets we find that we got a packet with timestamp 92 (and indeed, it contained text from source 2), so there is no need to recover R2. Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 98-3 = 95. Checking back in the list of received packets we find that we got a packet with timestamp 95 (and indeed, it contained text from source 2), so there is no need to recover R1. The Primary text in packet 08 ("O") is retrieved and put in the display area for source 2. It is noted that we have got text for timestamp 98 for source 2. The gap is still 4.

09 is received.  The gap (4 elements ) is remembered so we need to do the recovery analysis.  The source (3) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 99-6 = 93. Checking back in the list of received packets we find that we got a packet with timestamp 93 (and indeed, it contained text from source 3), so there is no need to recover R2. Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 99-3 = 96. Checking back in the list of received packets we find that we never got a packet with timestamp 96 , so we recover R1 ("Y") and insert it in the display area of source 3. The Primary text in packet 09 ("Z") is retrieved and put in the display area for source 3. It is noted that we got text from timestamps 96 and 99 for source 3 (the gap can now be reduced to 3)

10 is received.  The gap (3 ) is remembered so we need to do the recovery analysis.  The source (1) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 100-6 = 94. Checking back in the list of received packets we find that we got a packet with timestamp 94 (and indeed, it contained text from source 1), so there is no need to recover R2. Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 100-3 = 97. Checking back in the list of received packets we find that we never got a packet with timestamp 97 , so we recover R1 ("C") and insert it in the display area of source 1. The Primary text in packet 10 is empty so there is nothing to put in the display area for source 1. It is noted that we got text from timestamps 97 and 100 for source 1 (the gap can now be reduced to 2)

11 is received.  The gap (2 ) is remembered so we need to do the recovery analysis.  The source (2) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 101-6 = 95. Checking back in the list of received packets we find that we got a packet with timestamp 95 (and indeed, it contained text from source 2), so there is no need to recover R2. Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 101-3 = 98. Checking back in the list of received packets we find that we got a packet with timestamp 98 , so we do not need to recover R1. The Primary text in packet 11 is empty so there is nothing to put in the display area for source 2. It is noted that we got text from timestamp 101 for source 2. (we did not recover anything, so the gap is still 2)

12 is received.  The gap (2 ) is remembered so we need to do the recovery analysis.  The source (3) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 102-6 = 96. Checking back in the list of received packets we find that we never got a packet with timestamp 96, but from packet 09 we recovered R1 from timestamp 96. So we shall not recover anything from R2 here. Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 102-3 = 99. Checking back in the list of received packets we find that we got a packet with timestamp 99, so we do not need to recover R1. The Primary text in packet 12 is empty so there is nothing to put in the display area for source 3. It is noted that we got text for timestamp 102 for source 3 (the gap can now be reduced to 1)

13 is received.  The gap (1 ) is remembered so we need to do the recovery analysis.  The source (1) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 103-6 = 97. Checking back in the list of received packets we find that we already recovered text for timestamp 97, so nothing is recovered and nothing inserted from R2 in the display area of source 1 . Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 103-3 = 100. Checking back in the list of received packets we find that we got a packet with timestamp 100, so we do not need to recover R1. The Primary text in packet 13 is empty so there is nothing to put in the display area for source 1. It is noted that we got text for timestamp 103 for source 1 (the gap can now be reduced to 0)

14 is received. The gap is now 0, so we do not need to do any recovery analysis. The source (2) and other essential data is noted. The Primary text in packet 14 is empty so there is nothing to put in the display area for source 2. It is noted that we got text for timestamp 104 for source 2.

After this, the display areas contain: for source 1 "ABC", for source 2 "MNO", and for source 3 "XYZ", so everything is recovered.

I am sorry, the narrative above may be hard to follow. It could probably be converted to some table format if we need to do it again for other cases.
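
As a cross-check of the walk-through, here is a minimal Python sketch of the receiver logic under the same assumptions as the table (round-robin over three sources, timestamps in 100 ms units, offsets 3 and 6). The function names and the packet dictionary layout are purely illustrative and come from neither the draft nor RFC 4103; the gap counter from the narrative is left out and every block is simply checked against the stored timestamps.

```python
from collections import defaultdict

TEXT = {1: "ABC", 2: "MNO", 3: "XYZ"}   # text typed by each source
R1_OFFSET, R2_OFFSET = 3, 6              # timestamp offsets, in 100 ms units


def build_packets(n=15, first_ts=91):
    """Mixer side: round-robin over sources 1..3, one packet per tick."""
    primary = {}                          # timestamp -> primary text sent
    packets = []
    for i in range(n):
        seq, ts, src = i + 1, first_ts + i, i % 3 + 1
        turn = i // 3                     # completed turns for this source
        primary[ts] = TEXT[src][turn] if turn < len(TEXT[src]) else ""
        packets.append({
            "seq": seq, "csrc": src, "ts": ts,
            # redundancy repeats what the same source sent 6 and 3 ticks ago
            "r2": primary.get(ts - R2_OFFSET, ""),
            "r1": primary.get(ts - R1_OFFSET, ""),
            "p": primary[ts],
        })
    return packets


def receive(packets, lost):
    """Receiver side: use a block only if its original timestamp has not
    already been handled (received or recovered) for that source."""
    seen = defaultdict(set)               # source -> timestamps handled
    display = defaultdict(str)            # source -> presented text
    for pkt in packets:
        if pkt["seq"] in lost:
            continue
        src = pkt["csrc"]
        blocks = [(pkt["ts"] - R2_OFFSET, pkt["r2"]),
                  (pkt["ts"] - R1_OFFSET, pkt["r1"]),
                  (pkt["ts"], pkt["p"])]
        for ts, text in blocks:           # oldest first keeps text in order
            if ts not in seen[src]:
                seen[src].add(ts)
                display[src] += text
    return dict(display)


result = receive(build_packets(), lost={6, 7})
print(result)   # {1: 'ABC', 2: 'MNO', 3: 'XYZ'}
```

With packets 06 and 07 dropped, "Y" is taken from R1 of packet 09 and "C" from R1 of packet 10, as in the narrative.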

So, yes, this method also works.

I see a couple of differences in characteristics between this "timestamp method" and the "CSRClist method":

1. With the timestamp method, the time from loss to recovery can be as long as 200 ms times the number of simultaneously sending sources. Thus with 5 sources: 1 second. With the CSRClist method, it is a steady 200 milliseconds (assuming a transmission interval of 100 ms and round-robin mixer switching).

2. The recovery capacity, counted in consecutive lost packets, is 2 times the number of simultaneous sources with the timestamp method, i.e. 10 packets for 5 sources. With the CSRClist method it is 2 packets (again assuming round-robin mixer switching).

3. The complexity of the procedure is higher but still manageable for the timestamp method.

jeh: Yes. I had thought this would be simpler, but the timestamp logic gets complicated.

4. The number of packets for which essential information must be stored is higher for the timestamp method (I think it is 4 times the number of active sources). For the CSRClist method, it is 4 packets, with less information per packet.
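
The figures in points 1 and 2 can be reproduced with a small calculation. This is only a sketch of the arithmetic stated above, assuming a 100 ms transmission interval, two redundant generations and round-robin switching; the function names are made up for the illustration.

```python
INTERVAL_MS = 100      # transmission interval assumed in the text
GENERATIONS = 2        # redundant generations per packet (R1 and R2)


def timestamp_method(sources):
    """Worst case when each packet carries one source, round robin:
    a source's redundancy only reappears every `sources` packets."""
    recovery_ms = GENERATIONS * INTERVAL_MS * sources
    max_lost_in_sequence = GENERATIONS * sources
    return recovery_ms, max_lost_in_sequence


def csrc_list_method(sources):
    """Redundancy always repeats the two previous packets, so the
    figures do not depend on the number of sources."""
    return GENERATIONS * INTERVAL_MS, GENERATIONS


for n in (2, 3, 5):
    print(n, timestamp_method(n), csrc_list_method(n))
# 5 sources: (1000, 10) for the timestamp method, (200, 2) for the CSRClist method
```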

You say: "The advantage of this approach is that the format of the packet doesn't change. The current arrangement where all the text in a packet is for one participant is preserved."

I do not see the CSRClist method as a change in packet format.

jeh: Agreed: I should have said that differently: the change is that the text over the redundant and primary block is no longer continuous; it's for different participants.

In both methods the mixer needs to include a CSRC list. The difference is that in the CSRClist method, the list has more members. It is still within the format description of RTP. The source of the redundant parts will vary within a packet, but the composition of the packet contents for transmission from the mixer is the same as for a single sender: put what was sent next to last in the packet as R2, put what was sent last as R1, and put the new text chunk as P. The only addition is the rule that the CSRC list is populated with the sources in a strict order.
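
A minimal Python sketch of that composition rule and the matching receiver follows. It assumes the CSRC list is ordered as (source of P, source of R1, source of R2); that ordering, the function names and the dictionary layout are assumptions made for the illustration, not taken from the draft.

```python
from collections import deque


def mix(chunks):
    """chunks: (source, text) pairs in the order the mixer sends them.
    Yields packets whose CSRC list lines up with P, R1, R2."""
    history = deque(maxlen=2)                 # last two primaries, newest first
    for seq, (source, text) in enumerate(chunks, start=1):
        blocks = [(source, text)] + list(history)      # P, R1, R2
        yield {
            "seq": seq,
            "csrc": [src for src, _ in blocks],        # same order as blocks
            "p": text,
            "r1": history[0][1] if len(history) > 0 else "",
            "r2": history[1][1] if len(history) > 1 else "",
        }
        history.appendleft((source, text))


def receive(packets, lost):
    """On a sequence number gap, take R2 and/or R1 from the next packet
    and credit each block to the source at the same CSRC position."""
    display, last_seq = {}, 0
    for pkt in packets:
        if pkt["seq"] in lost:
            continue
        missing = pkt["seq"] - last_seq - 1   # packets lost just before this one
        texts = [pkt["p"], pkt["r1"], pkt["r2"]]
        for gen in range(min(missing, 2), -1, -1):     # oldest needed first
            if gen < len(pkt["csrc"]):
                src = pkt["csrc"][gen]
                display[src] = display.get(src, "") + texts[gen]
        last_seq = pkt["seq"]
    return display


chunks = list(zip([1, 2, 3] * 3, "AMXBNYCOZ"))   # round robin: A M X B N Y C O Z
result = receive(mix(chunks), lost={6, 7})
print(result)   # {1: 'ABC', 2: 'MNO', 3: 'XYZ'}
```

In this sketch both lost texts are recovered from packet 08 alone, which is where the steady 200 ms recovery time of the CSRClist method comes from.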

Summary: both methods seem possible. It will be interesting to get more comments.

Thanks,

Gunnar

On 2020-03-14 at 01:45, James Hamlin wrote:
Hi Gunnar

I've also been thinking through the possibility of a sender switching source without clearing all redundant generations first. Clearly, a sender that did this today would cause problems for existing receivers. But checking timestamps at the receiver should fix this.

Consider three senders 1, 2 and 3 which send text "ABC", "MNO" and "XYZ". The block below shows this text being sent taking round-robin turns with the participants.

seq    01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
part    1   2   3   1   2   3   1   2   3   1   2   3   1   2   3
Lost                        -
R2                              A   M   X   B   N   Y   C   O   Z
R1                  A   M   X   B   N   Y   C   O   Z
P       A   M   X   B   N   Y   C   O   Z

If sequence 06 is lost and the receiver sees sequence 07, then it may assume that the lost packet was for participant 1 and use the redundant character "B". This would lead to character "B" being duplicated in the output for participant 1. But it is possible for a receiver to get the correct result: the timestamp of the redundant text in packet 07 will not be higher than the most recent timestamp previously received for that participant, and so the text shouldn't be used. The following packet 08 has no applicable redundant text, but the timestamp of the R1 in packet 09 will be greater than the most recent one received for that participant, and so that text is usable.
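
The check can be sketched in a few lines of Python. The sketch assumes that "most recent timestamp" is tracked per participant, and, since the block above shows no timestamps, it uses the sequence number of the original packet as a stand-in timestamp; both are assumptions for the illustration.

```python
# Packets 01-05 arrived, packet 06 (participant 3, "Y") was lost.
# participant -> latest original timestamp already presented
latest = {1: 4, 2: 5, 3: 3}


def usable(participant, original_ts):
    """Redundant text is new only if it is later than anything
    already presented for that participant."""
    return original_ts > latest[participant]


print(usable(1, 4))   # packet 07, R1 "B" first sent in 04 -> False
print(usable(2, 5))   # packet 08, R1 "N" first sent in 05 -> False
print(usable(3, 6))   # packet 09, R1 "Y" first sent in 06 -> True
```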

The advantage of this approach is that the format of the packet doesn't change. The current arrangement where all the text in a packet is for one participant is preserved. Tracking the timestamp adds some implementation effort but I think it's minimal. It does also mean the mixer needs to synchronize timestamps across the media sources but it is in a position to do so.

Best regards

James


________________________________
From: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
Sent: 13 March 2020 16:41
To: avtcore@ietf.org; James Hamlin
Subject: Source switching performance in draft-hellstrom-avtcore-multi-party-rtt-source-01.txt

Hi,

I want to follow up on the good discussion on source switching performance a couple of days ago, under the subject "[AVTCORE] Improved RTP-mixer performance for RFC 2198 and RFC 4103 redundancy coding".

Two parts in the performance increase solution.
Two actions are proposed in draft-hellstrom-avtcore-multi-party-rtt-source:
a) Reduce the packet transmission interval from 300 to 100 ms.
b) Use a strict relation between the members of the CSRC list and the parts of the payload that are original text, first-generation redundancy and second-generation redundancy, so that the mixer can switch source for every new packet and the receiver can determine the sources of text recovered from redundancy.

I think it is worthwhile to move forward with the complete improvement, a) and b), proposed in the draft. It will give less complexity, lower delays and a lower risk of stalling when many participants enter new text simultaneously.

Here is my reasoning:

I have the following view of the achievable performance improvements for different cases:

1. With the original source switching with RFC 4103 and an RTP-mixer using 300 ms transmission interval and not allowing a mix of sources in one packet, there can be one source switching per second by the mixer with an introduced delay of up to one second.

2. By just reducing the transmission interval from 300 to 100 ms, it will be possible to have three source switches per second with an introduced delay of up to one second. (with just two parties sending text simultaneously, the delay will be maximum 300 ms. )

3. And by applying the proposal from the multi-party-rtt-source draft, with the CSRC list as a source list for the redundancy and also using a 100 ms transmission interval, there can be switching between five sources per second with an introduced delay of max 500 ms. With just two parties typing simultaneously, the delay will be a maximum of 100 ms.

The delays are extreme values from when all sources start to type simultaneously.  It was agreed that at least the improvement from the reduced transmission interval is needed.

Cases 1 and 2 are a bit complex for the mixer to implement. From the moment it has text queued for transmission from a source B other than the currently transmitted source A, the mixer needs to stop adding new text from A to the packets, but must still send two more packets at the agreed transmission interval: one that moves the latest transmitted original text to first-generation redundancy, and one more with that text as second-generation redundancy. Only when that is done is the mixer allowed to start taking text from B's transmission queue. This is the background of the 1 s vs 300 ms delays in cases 1 and 2.
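
The arithmetic behind those two figures can be sketched as follows, assuming two redundant generations and that B's text is queued just after A's last original text was sent (the worst case). The function name is made up for the illustration.

```python
REDUNDANT_GENERATIONS = 2


def switch_wait_ms(interval_ms, generations=REDUNDANT_GENERATIONS):
    """Time from the packet carrying A's last original text until the
    first packet that may carry text from B."""
    flush_packets = generations     # packets that age A's text to R1, then R2
    return (flush_packets + 1) * interval_ms


print(switch_wait_ms(300))   # 900, roughly the 1 s of case 1
print(switch_wait_ms(100))   # 300, the 300 ms of case 2
```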

In case 3, there is much less complexity. When there is something from B in the queue for transmission, the mixer can decide to insert it in the next packet and add the redundancy from earlier transmissions from A, because both sources are included in the CSRC list in the same packet.

Therefore I want to move on with the complete solution in case 3.

-----------------------------------

Influence on the multi-party capability negotiation:
There is an installed base of RTT implementations without multi-party awareness. The receiver needs to take an active part in planning the multi-party RTT presentation. Therefore a capability negotiation is needed. A simple SDP attribute, a=rtt-mix without a value, is proposed in the draft.

It is important to let this attribute mean capability for the complete solution, case 3. If there is a temptation to have different levels of implementation, some implementing only the shorter transmission interval (case 2) and some implementing the complete solution (case 3), then there will be a need for two different attributes, or one attribute with a list of parameter values for the two cases. That would complicate the evaluation of the negotiation. Therefore I would prefer that the attribute means capability to use the complete mixing solution (case 3).
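
To show how simple the evaluation stays with a single valueless attribute, here is a minimal Python sketch that checks an SDP body for a=rtt-mix in a text media section. The SDP is an illustrative example in the style of RFC 4103 (addresses, ports and payload type numbers are placeholders), and the function name is made up.

```python
def supports_rtt_mix(sdp: str) -> bool:
    """Return True if an m=text section carries the a=rtt-mix attribute."""
    in_text_media = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # a new media section starts; remember whether it is text
            in_text_media = line.startswith("m=text ")
        elif in_text_media and line == "a=rtt-mix":
            return True
    return False


offer = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0
m=text 11000 RTP/AVP 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=fmtp:100 98/98/98
a=rtt-mix
"""
print(supports_rtt_mix(offer))   # True
```

With two capability levels, the same check would have to parse and compare parameter values instead of testing for the presence of one line.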

Regards

Gunnar

On 2020-02-29 at 20:13, internet-drafts@ietf.org wrote:

A New Internet-Draft is available from the on-line Internet-Drafts directories.

        Title           : Indicating source of multi-party Real-time text
        Author          : Gunnar Hellstrom
        Filename        : draft-hellstrom-avtcore-multi-party-rtt-source-01.txt
        Pages           : 13
        Date            : 2020-02-29

Abstract:
   Real-time text mixers need to identify the source of each transmitted
   text chunk so that it can be presented in suitable grouping with
   other text from the same source.  An enhancement for RFC 4103 real-
   time text is provided, suitable for a centralized conference model
   that enables source identification, for use by text mixers and
   conference-enabled participants.  The mechanism builds on use of the
   CSRC list in the RTP packet.  A capability exchange is specified so
   that it can be verified that a participant can handle the multi-party
   coded real-time text stream.  The capability is indicated by an sdp
   media attribute "rtt-mix".

The IETF datatracker status page for this draft is:
https://datatracker.ietf.org/doc/draft-hellstrom-avtcore-multi-party-rtt-source/

There are also htmlized versions available at:
https://tools.ietf.org/html/draft-hellstrom-avtcore-multi-party-rtt-source-01
https://datatracker.ietf.org/doc/html/draft-hellstrom-avtcore-multi-party-rtt-source-01

A diff from the previous version is available at:
https://www.ietf.org/rfcdiff?url2=draft-hellstrom-avtcore-multi-party-rtt-source-01

Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/


--

+ + + + + + + + + + + + + +

Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288
