Re: [AVTCORE] Benjamin Kaduk's Discuss on draft-ietf-avtcore-multi-party-rtt-mix-18: Answer 2 on 3.19 security

Gunnar Hellström <> Mon, 24 May 2021 14:33 UTC

To: Benjamin Kaduk <>, The IESG <>
List-Id: Audio/Video Transport Core Maintenance <>


I hope I resolved the issues with 3.9 in the previous answer.

I therefore continue here with Section 3.19, which you partly included in the DISCUSS.

You say:

"I also left a note in the comment that there's a remark about "lower
security level" in Section 3.19 that's not really accurate; we should
resolve that in some manner before the document proceeds."

Section 3.19

    Security SHOULD be applied when possible regarding the capabilities
    of the participating devices by use of SIP over TLS by default

"Security" is not some all-encompassing attribute that can be
generically applied; there are specific security properties that may or
may not be achieved by any given mechanism, and it's generally worth
being precise about what properties are (or are not) achieved.  So here
we might say "security mechanisms to provide confidentiality and
integrity protection and peer authentication SHOULD be applied".

[GH] Accepted and included.

[...] cannot in general achieve source authenticity with just SRTP when a
mixer is involved, though RFC 8723 does specify a double-encryption
mechanism that applies in some cases when there is a central media [...]

    applications where legacy endpoints without security may exist, a
    negotiation SHOULD be performed to decide if encryption on the media
    level will be applied.  [...]

How would endpoints know if legacy endpoints might exist?

[GH] It may be a decision by the service provider. It is of course not
desirable to mix secure and unsecured endpoints, but that is what is
specified in the next-generation emergency service RFCs, RFC 6443 and
RFC 6881. These RFCs explain the reasoning behind that choice. I suggest
replacing "may exist" with "are allowed".

How would this negotiation be performed?

[GH] The draft only supplies a suitable example in the next sentence, by
recommending OSRTP (RFC 8643) when nothing else is specified for the
application. RFC 6443 and RFC 6881 also specify mechanisms to be used
for negotiating security. I suggest that this is sufficient.
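
As a rough illustration of the OSRTP outcome mentioned above (a sketch under my reading of RFC 8643, not text from the draft): the offer keeps the RTP/AVP profile but carries keying attributes, and an answer that also carries keying attributes selects SRTP, while an answer without them falls back to plain RTP. The function name and the SDP samples are hypothetical, and only SDES and DTLS-SRTP attributes are checked.

```python
# Sketch only: a simplified check of an OSRTP (RFC 8643) answer.
# The function name and the SDP samples below are hypothetical.

# Keying attributes considered here: SDES (a=crypto) and DTLS-SRTP (a=fingerprint).
KEYING_ATTRIBUTES = ("a=crypto:", "a=fingerprint:")


def answer_selects_srtp(answer_media_section: str) -> bool:
    """Return True if the answered media section carries keying attributes."""
    lines = (line.strip() for line in answer_media_section.splitlines())
    return any(line.startswith(KEYING_ATTRIBUTES) for line in lines)


# An endpoint that supports encryption keeps the keying attribute in its answer.
secure_answer = """\
m=text 11000 RTP/AVP 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=fingerprint:sha-256 <hash elided>
"""

# A legacy endpoint ignores the unknown attributes and answers without them.
legacy_answer = """\
m=text 11000 RTP/AVP 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
"""

if __name__ == "__main__":
    print("secure endpoint uses SRTP:", answer_selects_srtp(secure_answer))
    print("legacy endpoint uses SRTP:", answer_selects_srtp(legacy_answer))
```

With this kind of check, a service that allows legacy endpoints can decide per call leg whether media-level encryption was actually obtained.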

    [...] security levels.  The mixing for conference-unaware endpoints has
    lower security level than the mixing method for conference-aware
    endpoints, because there may be an opportunity for a malicious mixer
    or a middleman to masquerade the source labels accompanying the text
    streams in text format.  This is especially true if support of un-
    encrypted SIP and media is supported because of lack of such support
    in the target endpoints.  However, the mixing for conference-aware
    endpoints as specified here also requires that the mixer can be
    trusted.  [...]

As the last sentence indicates, the provided reasoning in the first
sentence is not really accurate, since the mixer could just as easily
adjust the CSRC value in the header as change the label in the in-band
text stream.  This does not inherently invalidate the claim that there
are different security levels, though, as the correct behavior of the
mixer seems easier to independently validate in the conference-aware
endpoint case (with the well-formed RTP payloads providing information
that can be validated out-of-band with other participants).  But I don't
think this description should be left in the document as-is; it doesn't
seem accurate.

In the case of unencrypted media, it does seem technically true that it
is easier for a non-mixer middleman to masquerade the source labels,
since in that case it only adjusts the payload directly without needing
to keep state on the RTP sender information and produce well-formed RTP
headers after its adjustment.  But this is only a modest level of
additional difficulty and does not reflect any kind of effective
security control, so it may not be worth mentioning at all.
[GH] I suggest deleting the paragraph, except for the last sentence,
which refers to "Further security considerations..."

    End-to-end encryption would require further work and could be based
    on WebRTC as specified in Section 1.2.

Is RFC 8723 not applicable to these scenarios at all? I do not think it
is WebRTC-specific.

[GH] Yes, it looks good, but it requires quite a different approach to
detecting and indicating text loss, because RFC 2198, which is used here,
has a header in the RTP payload containing the payload type and timestamp
offsets. RFC 8723 has a dedicated section on how to handle cases where
RFC 2198 is used, and it requires loss detection and recovery to be
handled between the source and the destination rather than by the mixer.
I suggest adding a reference to RFC 8723 in the sentence about further
work, so that it reads: "End-to-end encryption would require further work
and could be based on WebRTC as specified in Section 1.2 or on double
encryption as specified in [RFC8723]."
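
To make the RFC 2198 point concrete, here is a minimal sketch of the redundancy header that sits at the start of the RTP payload, as I understand RFC 2198: each redundant block has a 4-byte header (F bit, 7-bit payload type, 14-bit timestamp offset, 10-bit block length), and the final, primary block has a 1-byte header (F bit cleared, 7-bit payload type). The function name and the sample values are hypothetical.

```python
# Sketch only: parsing the RFC 2198 redundancy headers in an RTP payload.
# The function name and the sample values are hypothetical.


def parse_red_headers(payload: bytes):
    """Return (blocks, data_offset) for an RFC 2198 payload."""
    blocks = []
    i = 0
    while True:
        if i >= len(payload):
            raise ValueError("truncated RED header")
        first = payload[i]
        payload_type = first & 0x7F
        if first & 0x80 == 0:
            # F bit cleared: 1-byte header of the primary (last) block.
            blocks.append({"pt": payload_type, "ts_offset": None, "length": None})
            i += 1
            break
        if i + 4 > len(payload):
            raise ValueError("truncated RED header")
        word = int.from_bytes(payload[i:i + 4], "big")
        blocks.append({
            "pt": payload_type,
            "ts_offset": (word >> 10) & 0x3FFF,  # 14-bit timestamp offset
            "length": word & 0x3FF,              # 10-bit block length
        })
        i += 4
    return blocks, i


# One redundant block (PT 98, offset 300, 1 byte) followed by the primary block.
red_header = ((0x80 | 98) << 24 | 300 << 10 | 1).to_bytes(4, "big")
sample_payload = red_header + bytes([98]) + b"ab"

if __name__ == "__main__":
    print(parse_red_headers(sample_payload))
```

Because the payload types and timestamp offsets travel inside the payload, a mixer that cannot read the end-to-end protected payload cannot use them, which is why loss detection and recovery would move to the source and destination.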


I will continue with the COMMENT points in a follow-up message.


On 2021-05-19 at 04:53, Benjamin Kaduk via Datatracker wrote:
> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-avtcore-multi-party-rtt-mix-18: Discuss
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> Please refer to
> for more information about DISCUSS and COMMENT positions.
> The document, along with other ballot positions, can be found here:
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> I'm not sure I understand how the examples are consistent with the main
> specification, so let's please discuss it to either un-confuse me or fix
> the document.
> Section 3.9 seems to say that the oldest (source or redundant) text at
> the mixer takes priority when there is text from more than one source
> waiting to be sent, but the examples in Section 3.21 seem to show (e.g.)
> text received from A at time 20400 that is to be sent as redundancy,
> being sent after text from B received at time 20500 (sent as primary).
> Is the intent that if there is any primary text, the oldest primary text
> is sent first, and only if there is no outstanding primary text do we
> consider the redundant text?
> In a related vein, Section 3.10 says that a packet is sent when (among
> other things) "330 ms has passed since already transmitted text was
> queued for transmission as redundant text".  But that doesn't say
> anything about the timer being reset by subsequent transmission or
> queuing of redundant text, so I'm not sure how in the Section 3.21
> example, we say that transmitting B1 and B2 as redundancy was planned as
> 330 ms after packet 105 -- the original B2 was sent in packet 104, so
> shouldn't the 330ms start from packet 104's transmission?  (The stated
> time for this seems to match 330ms after 104, so maybe the "105" is just
> a typo?)
> I also left a note in the comment that there's a remark about "lower
> security level" in Section 3.19 that's not really accurate; we should
> resolve that in some manner before the document proceeds.
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
Gunnar Hellström