Re: [AVTCORE] Source switching performance in draft-hellstrom-avtcore-multi-party-rtt-source-01.txt
James Hamlin <> Mon, 16 March 2020 12:11 UTC
From: James Hamlin <>
To: Gunnar Hellström <>, "" <>
Date: Mon, 16 Mar 2020 12:11:15 +0000
Hi Gunnar

Many thanks for taking the time to go through this so thoroughly.

I think we have 2 main aspects to this work:

1. Compatibility with existing implementations
2. Choosing an efficient mechanism for the future

For the first of these, it seems to me that the only solution is for a mixer to be able to do inline participant labeling and buffering to produce a presentable single text stream. Current implementations of RFC4103 will simply not understand switching between participants, nor will they make any visual indication of which text belongs to which participant, so the mixer needs to do that.

For the second we have established that it's possible to:

- allow the different redundant blocks in a packet to be for different participants; or
- use timestamps to resolve the correct redundant text to use and to have each packet associated with just one participant.

I can also imagine putting all text generations of each source in the CSRC list in each packet, which adds 12 bytes of header space per CSRC (assuming 2 redundant generations); I'll write a separate mail about that. After that, I think we have a fairly complete set of solutions to choose from.

Some other comments inline.

Best regards
James

James Hamlin
Contractor
Purple, a Division of ZP Better Together, LLC

The information contained in this e-mail message is intended only for the personal and confidential use of the recipient(s) named above. If you have received this communication in error, please notify us immediately by e-mail, and delete the original message.

________________________________
From: Gunnar Hellström
Sent: 14 March 2020 11:17
To: James Hamlin
Subject: Re: Source switching performance in draft-hellstrom-avtcore-multi-party-rtt-source-01.txt

Hi James,

Thanks for an interesting proposal.
Let us extend the information about the packet contents of your example a bit:

seq           01  02  03  04  05  06  07  08  09  10  11  12  13  14  15
CSRC=source    1   2   3   1   2   3   1   2   3   1   2   3   1   2   3
Timestamp     91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
R2 t offset    6   6   6   6   6   6   6   6   6   6   6   6   6   6   6
R1 t offset    3   3   3   3   3   3   3   3   3   3   3   3   3   3   3
R2                                     A   M   X   B   N   Y   C   O   Z
R1                         A   M   X   B   N   Y   C   O   Z
P              A   M   X   B   N   Y   C   O   Z
Lost                               X   X

(The timestamps and timestamp offsets ("t offset") are shown in units of 100 ms; in reality they will be in milliseconds.)

The SSRC of the packet is always the mixer's SSRC. The source is indicated in the CSRC-list, which in this method has only one member: the SSRC of the source represented in the packet.

jeh: Agreed: My mistake.

The timestamp is created by the mixer when sending, and the timestamp offsets make it possible to calculate the timestamps the redundant texts had when they were transmitted as originals.

The receiver must store essential data from a number of packets. This data is the sequence number, the source (=CSRC) and the Timestamp. The receiver must also store, for each source, the timestamps for which text has been received (either with real contents or empty).

So, let us see what happens if both packet 06 and 07 are lost.

In packets 1 to 5, we have received and put in the display areas: for source 1: "AB", for source 2: "MN", for source 3: "X".

08 is received and the gap (07 and 06) is detected. With two redundant elements in each of the lost packets, up to 4 text elements may need retrieval. This is remembered, so we need to do the recovery analysis. The source (2) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 98-6 = 92. Checking back in the list of received packets we find that we got a packet with timestamp 92 (and indeed, it contained text from source 2), so there is no need to recover R2. Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 98-3 = 95.
Checking back in the list of received packets we find that we got a packet with timestamp 95 (and indeed, it contained text from source 2), so there is no need to recover R1. The Primary text in packet 08 ("O") is retrieved and put in the display area for source 2. It is noted that we have got text for timestamp 98 for source 2. The gap is still 4.

09 is received. The gap (4 elements) is remembered so we need to do the recovery analysis. The source (3) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 99-6 = 93. Checking back in the list of received packets we find that we got a packet with timestamp 93 (and indeed, it contained text from source 3), so there is no need to recover R2. Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 99-3 = 96. Checking back in the list of received packets we find that we never got a packet with timestamp 96, so we recover R1 ("Y") and insert it in the display area of source 3. The Primary text in packet 09 ("Z") is retrieved and put in the display area for source 3. It is noted that we got text from timestamps 96 and 99 for source 3 (the gap can now be reduced to 3).

10 is received. The gap (3) is remembered so we need to do the recovery analysis. The source (1) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 100-6 = 94. Checking back in the list of received packets we find that we got a packet with timestamp 94 (and indeed, it contained text from source 1), so there is no need to recover R2. Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 100-3 = 97. Checking back in the list of received packets we find that we never got a packet with timestamp 97, so we recover R1 ("C") and insert it in the display area of source 1. The Primary text in packet 10 is empty so there is nothing to put in the display area for source 1.
It is noted that we got text from timestamps 97 and 100 for source 1 (the gap can now be reduced to 2).

11 is received. The gap (2) is remembered so we need to do the recovery analysis. The source (2) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 101-6 = 95. Checking back in the list of received packets we find that we got a packet with timestamp 95 (and indeed, it contained text from source 2), so there is no need to recover R2. Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 101-3 = 98. Checking back in the list of received packets we find that we got a packet with timestamp 98, so we do not need to recover R1. The Primary text in packet 11 is empty so there is nothing to put in the display area for source 2. It is noted that we got text from timestamp 101 for source 2 (we did not recover anything, so the gap is still 2).

12 is received. The gap (2) is remembered so we need to do the recovery analysis. The source (3) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 102-6 = 96. Checking back in the list of received packets we find that we never got a packet with timestamp 96, but from packet 09 we recovered R1 from timestamp 96. So we shall not recover anything from R2 here. Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 102-3 = 99. Checking back in the list of received packets we find that we got a packet with timestamp 99, so we do not need to recover R1. The Primary text in packet 12 is empty so there is nothing to put in the display area for source 3. It is noted that we got text for timestamp 102 for source 3 (the gap can now be reduced to 1).

13 is received. The gap (1) is remembered so we need to do the recovery analysis. The source (1) and other essential data is noted. The original timestamp of R2 is calculated as Timestamp-R2 t offset = 103-6 = 97.
Checking back in the list of received packets we find that we already recovered text for timestamp 97, so nothing is recovered and nothing inserted from R2 in the display area of source 1. Then the original timestamp of R1 is calculated as Timestamp-R1 t offset = 103-3 = 100. Checking back in the list of received packets we find that we got a packet with timestamp 100, so we do not need to recover R1. The Primary text in packet 13 is empty so there is nothing to put in the display area for source 1. It is noted that we got text for timestamp 103 for source 1 (the gap can now be reduced to 0).

14 is received. The gap is now 0 so we do not need to do any recovery analysis. The source (2) and other essential data is noted. The Primary text in packet 14 is empty so there is nothing to put in the display area for source 2. It is noted that we got text for timestamp 104 for source 2.

After this, we have in the display areas: for 1: "ABC", for 2: "MNO", for 3: "XYZ", so everything is recovered.

I am sorry, the narrative above may be hard to follow. It could probably be converted to some table format if we need to do it again for other cases.

So, yes, this method also works.

I see a couple of differences in characteristics between this "timestamp method" and the "CSRClist method":

1. With the timestamp method, the time from loss to recovery can be up to 200 ms times the number of simultaneously sending sources. Thus with 5 sources: 1 second. With the CSRClist method, it is a steady 200 milliseconds. (Assuming a transmission interval of 100 ms and round robin mixer switching.)

2. With the timestamp method, the recovery capacity in consecutive lost packets is 2 * the number of simultaneous sources = 10 packets for 5 sources. With the CSRClist method it is 2 packets. (Again assuming round robin mixer switching.)

3. The complexity of the procedure is higher but still manageable for the timestamp method.

   jeh: Yes. I had thought this would be simpler, but the timestamp logic gets complicated.

4. The number of packets to store essential information about is higher for the timestamp method (I think it is 4 * the number of active sources). For the CSRClist method, it is 4 packets and less information per packet.

You say: "The advantage of this approach is that the format of the packet doesn't change. The current arrangement where all the text in a packet is for one participant is preserved."

I do not see the CSRClist method as a change in packet format.

jeh: Agreed: I should have said that differently: the change is that the text over the redundant and primary block is no longer continuous; it's for different participants.

In both methods the mixer needs to include a CSRC-list. The difference is that in the CSRClist method, the list has more members. It is still within the format description of RTP. The source of the redundant parts will vary in a packet, but the composition of the contents of the packet for transmission from the mixer is as usual for a single sender: put what was sent next to last in the packet as R2, put what was sent last as R1, and put the new text chunk as P. The only addition is the rule that the CSRC list is populated with the sources in the strict order.

Summary: both methods seem possible. It will be interesting to get more comments.

Thanks,
Gunnar

On 2020-03-14 at 01:45, James Hamlin wrote:

Hi Gunnar

I've also been thinking through the possibility of a sender switching source without clearing all redundant generations first. Clearly, a sender that did this today would cause problems for existing receivers. But checking timestamps at the receiver should fix this.

Consider three senders 1, 2 and 3 which send text "ABC", "MNO" and "XYZ". The block below shows this text being sent taking round-robin turns with the participants.
seq   01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
part   1  2  3  1  2  3  1  2  3  1  2  3  1  2  3
--------------------------------------------------
R2                       A  M  X  B  N  Y  C  O  Z
R1              A  M  X  B  N  Y  C  O  Z
P      A  M  X  B  N  Y  C  O  Z

If sequence 06 is lost and the receiver sees sequence 07 then it may assume that the lost packet was for participant 1 and use the redundant character "B". This would lead to character "B" being duplicated in the output for participant 1.

But it is possible for a receiver to get the correct result: the timestamp for the redundant text in packet 07 will not be higher than the most recent timestamp previously received, and so that text shouldn't be used. The following packet 08 has no applicable redundant text, but the timestamp of the R1 in packet 09 will be greater than the most recent received and so is usable.

The advantage of this approach is that the format of the packet doesn't change. The current arrangement where all the text in a packet is for one participant is preserved.

Tracking the timestamp adds some implementation effort but I think it's minimal. It does also mean the mixer needs to synchronize timestamps across the media sources, but it is in a position to do so.

Best regards
James

________________________________
From: Gunnar Hellström
Sent: 13 March 2020 16:41
To: James Hamlin
Subject: Source switching performance in draft-hellstrom-avtcore-multi-party-rtt-source-01.txt

Hi,

I want to follow up on the good discussion on source switching performance a couple of days ago, under the subject "[AVTCORE] Improved RTP-mixer performance for RFC 2198 and RFC 4103 redundancy coding".

Two parts in the performance increase solution.
Two actions are proposed in draft-hellstrom-avtcore-multi-party-rtt-source:

a) Reduce the packet transmission interval from 300 to 100 ms.

b) Use a strict relation between the members of the CSRC list and the parts of the payload that are original text, first generation redundancy and second generation redundancy, so that the mixer can switch source for every new packet and the sources of text recovered from redundancy can be assessed by the receiver.

I think it is worthwhile to move forward with the complete improvement a) and b) proposed in the draft. It will cause less complexity, lower delays and lower risk for stalling in case of many participants entering new text simultaneously.

Here is my reasoning. I have the following view of the achievable performance improvements for different cases:

1. With the original source switching with RFC 4103 and an RTP-mixer using 300 ms transmission interval and not allowing a mix of sources in one packet, there can be one source switch per second by the mixer, with an introduced delay of up to one second.

2. By just reducing the transmission interval from 300 to 100 ms, it will be possible to have three source switches per second with an introduced delay of up to one second. (With just two parties sending text simultaneously, the delay will be a maximum of 300 ms.)

3. And by applying the proposal from the multi-party-rtt-source draft with the CSRC-list as a source list for the redundancy, and also using 100 ms transmission interval, there can be switching between five sources per second with an introduced delay of max 500 ms. With just two parties typing simultaneously, the delay will be a maximum of 100 ms.

The delays are extreme values from when all sources start to type simultaneously.

It was agreed that at least the improvement from the reduced transmission interval is needed.

Case 1 and 2 are a bit complex for the mixer to implement.
From the moment it has text queued for transmission from a source B other than the currently transmitted source A, the mixer needs to stop adding new text from A to the packets, but still send two more packets with the agreed transmission interval, progressing the latest transmitted original text to first generation redundancy and then again one more packet with the text as second generation redundancy. Not until that is done is the mixer allowed to start taking text from B's transmission queue to transmit. This is the background of the 1 s vs 300 ms delays in case 1 and 2.

In case 3, there is much less complexity. When there is something from B in queue for transmission, the mixer can decide to insert that in the next packet and add the redundancy from earlier transmissions from A, because their sources are included in the CSRC list in the same packet.

Therefore I want to move on with the complete solution in case 3.

-----------------------------------

Influence on the multi-party capability negotiation:

There is an installed base of RTT implementations without multi-party awareness. The receiver needs to take active part in planning the multi-party RTT presentation. Therefore a capability negotiation is needed. A simple sdp attribute a=rtt-mix without value is proposed in the draft.

It is important to let this attribute mean capability of the complete solution (case 3). If there is a temptation to have different levels of implementation, some only implementing the shorter transmission interval (2) and some implementing the complete solution (3), then there will be a need for two different attributes, or one attribute with a list of parameter values for the two cases. That would complicate the evaluation of the negotiation. Therefore I would prefer that the attribute means capability to use the complete mixing solution (3).

Regards
Gunnar

On 2020-02-29 at 20:13, the I-D announcement read:

A New Internet-Draft is available from the on-line Internet-Drafts directories.
Title    : Indicating source of multi-party Real-time text
Author   : Gunnar Hellstrom
Filename : draft-hellstrom-avtcore-multi-party-rtt-source-01.txt
Pages    : 13
Date     : 2020-02-29

Abstract:
Real-time text mixers need to identify the source of each transmitted text chunk so that it can be presented in suitable grouping with other text from the same source. An enhancement for RFC 4103 real-time text is provided, suitable for a centralized conference model that enables source identification, for use by text mixers and conference-enabled participants. The mechanism builds on use of the CSRC list in the RTP packet. A capability exchange is specified so that it can be verified that a participant can handle the multi-party coded real-time text stream. The capability is indicated by an sdp media attribute "rtt-mix".

--
Gunnar Hellström
Omnitor
+46 708 204 288
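
Editorial sketch of the "timestamp method" discussed in the 14 March reply above. This is a minimal Python illustration under stated assumptions, not code from the draft or from either author: the names Packet, build_packets and receive are hypothetical, timestamps are in units of 100 ms as in the example table, the mixer switches round robin with one packet per interval, and each packet carries a single-entry CSRC list. As a simplification, the receiver checks every redundant block against the per-source set of already handled timestamps instead of tracking a gap counter, and timestamp wrap-around and history pruning are ignored.

```python
from dataclasses import dataclass, field

SOURCES = [1, 2, 3]                        # round-robin order used by the mixer
TEXT = {1: "ABC", 2: "MNO", 3: "XYZ"}      # text typed by each source


@dataclass
class Packet:
    seq: int
    csrc: int        # the one source whose text this packet carries
    timestamp: int   # mixer timestamp, in units of 100 ms
    primary: str
    redundant: list = field(default_factory=list)  # [(t_offset, text)], R2 first


def build_packets(count=15):
    """Mixer side: one packet per interval, sources taken round robin."""
    n = len(SOURCES)
    pending = {s: list(t) for s, t in TEXT.items()}
    sent = {}        # seq -> primary text, used to fill the redundancy
    packets = []
    for seq in range(1, count + 1):
        csrc = SOURCES[(seq - 1) % n]
        primary = pending[csrc].pop(0) if pending[csrc] else ""
        sent[seq] = primary
        redundant = []
        for generation in (2, 1):          # R2 first, then R1
            offset = generation * n        # same source sent this many packets ago
            if seq - offset in sent:
                redundant.append((offset, sent[seq - offset]))
        packets.append(Packet(seq, csrc, 90 + seq, primary, redundant))
    return packets


def receive(packets, lost=()):
    """Receiver side: recover redundant text only for timestamps not yet handled."""
    display = {}     # source -> text shown so far
    seen = {}        # source -> timestamps already received or recovered
    for p in packets:
        if p.seq in lost:
            continue
        seen_ts = seen.setdefault(p.csrc, set())
        text = display.get(p.csrc, "")
        for offset, chunk in p.redundant:
            original_ts = p.timestamp - offset   # timestamp the chunk had as primary
            if original_ts not in seen_ts:       # never handled: recover it
                text += chunk
                seen_ts.add(original_ts)
        text += p.primary
        seen_ts.add(p.timestamp)
        display[p.csrc] = text
    return display


if __name__ == "__main__":
    print(receive(build_packets(), lost={6, 7}))
```

With packets 06 and 07 dropped, the sketch ends with "ABC", "MNO" and "XYZ" in the three display areas, matching the walk-through. Dropping six consecutive packets (06 to 11, that is 2 times the number of sources) still recovers everything in this model, in line with the recovery capacity stated in difference 2.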