Re: [AVTCORE] WG last call for draft-ietf-avtcore-srtp-ekt-01

Magnus Westerlund <> Fri, 15 November 2013 16:24 UTC



This is my personal review of EKT in its WG last call.

I have to say that this is clearly not ready yet. There are a large
number of issues that need resolution.

First, the ones that don't have clear section references.

G1: The document's sections that specify the connection with signalling
and the key-exchange mechanisms imply that one can start an SRTP session
without EKT and later upgrade it to use EKT. I however fail to see how
this can work reliably, especially given the suggestion that you can
wait with this until you actually have a new master key to use.

A. Let's start with the signalled upgrade. When upgrading to EKT from
plain SRTP, you start sending SRTP and SRTCP packets that always have an
EKT trailer, which is either a full header or the 1-byte short header.
Thus when turning on EKT, this header will be added to the end of the
packet by the sender. When the receiver gets this packet it will fail
SRTP authentication, as the EKT header has now displaced the
authentication tag. Thus there are failure modes for this upgrade that
aren't discussed. To my understanding, given that one has received the
signalling that EKT will be used, one can run in regular mode until an
authentication failure, then try processing the failed packets as EKT.
If that works, one goes to EKT mode for that SSRC. The risk is twice the
amount of authentication work for SSRCs not yet switched, but only for a
limited time.

B. If this EKT switch can occur at any time, then each time there is an
authentication failure one has to check whether this is EKT. Thus
bit-errors and packet mangling all result in twice the amount of
authentication work before dropping the packet.
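
To make the cost in A and B concrete, here is a minimal sketch of the
fallback receiver described above. It is a toy model, not SRTP: the tag
is assumed to be a truncated HMAC-SHA1 over the packet body, the EKT
trailer is assumed to be the 1-byte short form, and all names
(`protect`, `receive`, `TAG_LEN`, `SHORT_EKT`) are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 10          # assumed 80-bit authentication tag
SHORT_EKT = b"\x00"   # assumed stand-in for the 1-byte short EKT trailer


def tag(key: bytes, body: bytes) -> bytes:
    """Toy authentication tag: truncated HMAC-SHA1 over the body."""
    return hmac.new(key, body, hashlib.sha1).digest()[:TAG_LEN]


def protect(key: bytes, body: bytes, ekt: bool) -> bytes:
    """Append the tag and, when EKT is on, the short EKT trailer."""
    packet = body + tag(key, body)
    return packet + SHORT_EKT if ekt else packet


def receive(key: bytes, packet: bytes):
    """Try plain processing first, then retry as EKT on failure.

    Returns (mode, auth_checks), where mode is "plain", "ekt" or None.
    """
    checks = 1
    body, received = packet[:-TAG_LEN], packet[-TAG_LEN:]
    if hmac.compare_digest(tag(key, body), received):
        return "plain", checks
    # Authentication failed: strip one trailer byte and try again.
    checks += 1
    stripped = packet[:-len(SHORT_EKT)]
    body, received = stripped[:-TAG_LEN], stripped[-TAG_LEN:]
    if hmac.compare_digest(tag(key, body), received):
        return "ekt", checks
    return None, checks


key = b"k" * 20
plain = protect(key, b"media payload", ekt=False)
with_ekt = protect(key, b"media payload", ekt=True)
corrupted = bytes([plain[0] ^ 0x01]) + plain[1:]

print(receive(key, plain))      # accepted as plain after one check
print(receive(key, with_ekt))   # accepted as EKT after two checks
print(receive(key, corrupted))  # dropped, but only after two checks
```

In this model every packet that fails authentication for any reason
costs a second check, which is the doubling referred to in B.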

Authors, please clarify what the intention is here. Is this to be
supported? If it is, can we please have a section defining the procedure
for how to handle it, so that its cost can be appropriately judged?

G2. SRTCP compound packets and the SSRC field in the EKT. The
processing in Section 2.2.2, bullet 4 says that the EKT SSRC field must
match the SRTP header's SSRC. This fails to discuss SRTCP, and I think
you might have a major headache here. An SRTCP compound packet may
actually contain SR/RR packets from multiple SSRCs from an end-point.
See RFC 3550 Section 6.1, which says:

It is RECOMMENDED that translators and mixers combine individual
      RTCP packets from the multiple sources they are forwarding into
      one compound packet whenever feasible in order to amortize the
      packet overhead (see Section 7).

This is further discussed in draft-ietf-avtcore-rtp-multi-stream.

Thus I think you have an issue of how to get all these SSRCs' EKT
payloads sent in SRTCP. If one takes the easy way out and says that the
SSRC in EKT must match the first RTCP packet of the compound, and then
round-robins among them, you might have significant delay until all the
SSRCs from an end-point have sent their EKT payload even once.

This may, though I am uncertain, be handled with the round robin
described above, relying instead on SRTP for distribution of the
relevant EKT. But the issue clearly needs to be discussed.
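
A rough sketch of the round-robin delay, assuming exactly one EKT field
per SRTCP compound packet and a fixed interval between compound packets.
The function names, the SSRC count and the 5 second interval are
illustrative assumptions, not values from the draft.

```python
def round_robin_schedule(ssrcs, rtcp_interval_s):
    """Time at which each SSRC first gets its EKT field sent.

    Assumes the first compound packet leaves at t = 0 and that each
    compound packet carries the EKT field of exactly one SSRC.
    """
    return {ssrc: i * rtcp_interval_s for i, ssrc in enumerate(ssrcs)}


def repeat_period(ssrcs, rtcp_interval_s):
    """Time between two EKT fields for the same SSRC."""
    return len(ssrcs) * rtcp_interval_s


# Hypothetical end-point with 8 SSRCs and a 5 s compound packet interval.
ssrcs = list(range(0x1000, 0x1008))
schedule = round_robin_schedule(ssrcs, 5.0)

print(max(schedule.values()))      # delay until the last SSRC is covered
print(repeat_period(ssrcs, 5.0))   # refresh period per SSRC
```

With these assumed numbers the last SSRC waits 35 seconds for its first
EKT field over SRTCP, and each SSRC is refreshed only every 40 seconds.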

G3. The common master salt. I realized when reading the security
description section that the master salt to use is actually defined by
the cipher suite. When mixing GCM or CCM cipher suites with any of the
regular AES ones, you actually have a Master Salt mismatch between
them. GCM and CCM use 96-bit Master Salts, while the others use 112
bits. Thus an assumption EKT is built on can easily be made untrue by
including cipher suites from these two sets in the same invite.
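
The mismatch can be checked mechanically. The sketch below uses the salt
lengths stated above (96 bits for the GCM suites, 112 bits for the
others); the suite names in the table are illustrative examples and
`common_salt_possible` is a hypothetical helper, not something defined
by the draft.

```python
# Master salt length in bits per crypto suite (illustrative subset).
MASTER_SALT_BITS = {
    "AES_CM_128_HMAC_SHA1_80": 112,
    "AES_CM_128_HMAC_SHA1_32": 112,
    "F8_128_HMAC_SHA1_80": 112,
    "AEAD_AES_128_GCM": 96,
    "AEAD_AES_256_GCM": 96,
}


def salt_lengths(offered_suites):
    """Set of distinct master salt lengths among the offered suites."""
    return {MASTER_SALT_BITS[suite] for suite in offered_suites}


def common_salt_possible(offered_suites):
    """True when every offered suite uses the same master salt length."""
    return len(salt_lengths(offered_suites)) == 1


same_family = ["AES_CM_128_HMAC_SHA1_80", "F8_128_HMAC_SHA1_80"]
mixed = ["AES_CM_128_HMAC_SHA1_80", "AEAD_AES_128_GCM"]

print(common_salt_possible(same_family))  # a single salt length
print(sorted(salt_lengths(mixed)))        # two different salt lengths
```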

G4. In general I am missing a clear explanation of how the "default"
master key produced by the key-management is used when combined with
EKT. This applies to all the key-management mechanisms; I don't get
whether these keys are used, or need to be included at all.

G5. This document totally fails to discuss the issue of MTU and the fact
that having a varying sized tag could cause issues. I would recommend
that one always assumes a full EKT header will be present and deduct
appropriately from the MTU.
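
As a sketch of that deduction: the Full EKT field sizes below are the
ones quoted from Section 2.3.1 later in this review, while the MTU, the
IPv4/UDP overhead, the RTP header size and the 10-byte authentication
tag are assumed example values. `max_media_payload` is a hypothetical
helper.

```python
# Full EKT field sizes in bytes, as quoted from Section 2.3.1.
FULL_EKT_BYTES = {"AESKW_128": 42, "AESKW_192": 50, "AESKW_256": 58}


def max_media_payload(mtu, ekt_cipher, ip_udp=28, rtp_header=12, auth_tag=10):
    """Payload budget when a full EKT field is always assumed present.

    ip_udp, rtp_header and auth_tag are assumed example overheads
    (IPv4 + UDP, RTP header without extensions, 80-bit tag).
    """
    return mtu - ip_udp - rtp_header - auth_tag - FULL_EKT_BYTES[ekt_cipher]


print(max_media_payload(1500, "AESKW_128"))  # 1408
print(max_media_payload(1500, "AESKW_256"))  # 1392
```

Budgeting for the full field on every packet avoids a payload that fits
with the short field but exceeds the MTU when the full field is used.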

G6. The document is very sloppy in referencing SRTP, SRTCP or both types
of packets correctly. Please review all references to packets to ensure
that this is correct.

Now comments more associated with particular sections:

1. Section 1:
   For example, if a participant joins a session that is already in
   progress, the SRTP rollover counter (ROC) of each SRTP source in the
   session needs to be provided to that participant.

I think this might not be the best start, at least as it fails to discuss
RFC 4771.

2. Section 1:
   securely distributes the SRTP master key and other information for
   each SRTP source, using SRTCP or SRTP to transport that information.

I think you should be more explicit when you refer to what is identified
by an SSRC, like this:

SRTP Source (SSRC), using ..

3. Section 1:
 Section 3, Section 4, and Section 5
   define the use of EKT with SDP Security Descriptions, DTLS-SRTP, and
   MIKEY, respectively.  Section 7 provides a design rationale.
   Section 6 explains how EKT can interwork with keying in call

First, you have an order mismatch. Second, I can't understand the
second sentence.

4. Section 2.1:
What is the definition of the SSRC field part of the EKT_PLAINTEXT?

5. Section 2.1:
Definition of EKT_Ciphertext:
I am missing what the requirements are on the production of this from
the plain text. What is given is that confidentiality needs to be
equally strong on this as for SRTP. What about integrity and

6. Section 2.1:
Together, these data elements associated with an instance of EKT
      are called an EKT parameter set.

What is an "instance of EKT"?

7. Section 2.2:
At any given time, each SRTP/SRTCP source has associated with it a
   single EKT parameter set.

Make clear that SSRC is the identifying factor:

At any given time, each SRTP/SRTCP source (SSRC) has associated with it a
   single EKT parameter set.

8. Section 2.2:

   may be other EKT parameter sets that are used by other SRTP/SRTCP
   sources in the same session.

Does the above also apply within an end-point? So when having multiple
SSRCs one can have multiple different EKT parameter sets in use?

9. Section 2.2:
All of these EKT parameter sets SHOULD
   be stored by all of the participants in an SRTP session, for use in
   processing inbound SRTCP traffic.

inbound SRTP/SRTCP traffic?

10. Section 2.2.1:
   First, the sender decides whether to use the Full or Short format.
   When sending EKT with SRTP, the Full format SHOULD be used on the
   initial SRTP packet in a session and after each rekeying event.  When
   sending EKT with SRTCP, the Full format MUST be used.  Not all SRTP
   or SRTCP packets need to include the EKT key, but it SHOULD be
   included with some regularity, e.g., every second or every ten
   seconds, though it need not be sent on a regular schedule.

Why isn't this mostly a pointer to Section 2.6? I think you should
gather the discussion of when and why to send the full format in one
section. The reason, I think, is that Section 2.6 does need to be
expanded to discuss both when and how to make delivery of the keys
reliable.

I think one can start with analyzing the cases when one knows the
receiver(s) will not have the keys, which I think comes down to two
major cases:

1. The SSRC one transmits is a new joiner of the session
2. There is a new receiving endpoint in the session

A sender can easily determine 1), it can in some cases determine 2) and
act on it.

The next is how to assure reliability, and that does depend on the
receiver population and session properties. I would also note that there
are cases where one can determine that the EKT has been received, for
example RTCP reports on the extended sequence number, or explicit ACK
messages for a sequence number that one has sent with the new key, thus
determining that the reporting SSRC has been capable of setting the
right SRTP key state.

Next is the classical approach: put the EKT full header on key-frames,
periodically, or on all packets until it is determined that reception
has happened.

I think 2.6 can be structured to say this better, with concrete and good
recommendations for how to act. Probably the new receiver is the hardest
case, as the reasonable action will depend on group size.

In the end the text in Section 2.2 can be significantly shortened to
basically say: The use of Full or Short headers SHOULD follow the
recommendation in Section 2.6.

11. Section 2.2.1
   The EKT_Ciphertext field is set to the ciphertext created by
   encrypting the EKT_Plaintext with the EKT cipher, using the KEK as
   the encryption key.

The above uses KEK without defining it.

12. Section 2.2.1:

I am missing a clear statement that each SSRC an endpoint has must
individually ensure that its Master Key, ROC and SPI are distributed in
the RTP session, even if shared with other SSRCs from the same endpoint.

13. Section 2.2.2, Bullet 2:

If multiple parameter sets have been defined for the
       SRTP session, then the one that is associated with the value of
       the SPI field in the packet is used.

What if only one SPI is defined, what is used then? Shouldn't the SPI be
matched independently?

14. Section 2.2.2, Bullet 3:
The EKT_Ciphertext is decrypted using the EKT_Key and EKT_Cipher
       in the matching parameter set,

Here the key is called EKT_Key, previously KEK?

15. Section 2.2.2, bullet 5:
If the ROC from the EKT_Plaintext is less than the ROC in the
       SRTP context, then packet processing halts.

Probably should be explicit that the SRTP context related to the SSRC
from step 4 is to be used.

16. Section 2.2.2, bullet 5:
Otherwise, the ROC
       in the SRTP context is set to the value of the ROC from the
       EKT_Plaintext, and the SRTP Master Key from the EKT_Plaintext is
       accepted as the SRTP master key corresponding to the SRTP source
       that sent the packet.

If there is no SRTP crypto context corresponding to
       the SSRC in the packet, then a new crypto context is created.  If
       the crypto context is not new, then the rollover counter in the
       context MUST NOT be set to a value lower than its current value.

Appears to be in the wrong order in this bullet.
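
One possible reading of bullet 5 with the steps reordered, so that the
context lookup (or creation) comes before the ROC comparison. This is a
sketch under my own assumptions: `SrtpContext` and `accept_ekt` are
hypothetical names, contexts are keyed by SSRC, and the ISN handling
raised in the next comment is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class SrtpContext:
    roc: int
    master_key: bytes


def accept_ekt(contexts, ssrc, roc, master_key):
    """Apply the ROC and master key from a decrypted EKT plaintext.

    Returns True when the values are accepted, False when packet
    processing halts because the received ROC is lower than the
    ROC already held for this SSRC.
    """
    ctx = contexts.get(ssrc)
    if ctx is None:
        # No crypto context for this SSRC yet: create a new one.
        contexts[ssrc] = SrtpContext(roc=roc, master_key=master_key)
        return True
    if roc < ctx.roc:
        # The rollover counter must never move backwards.
        return False
    ctx.roc = roc
    ctx.master_key = master_key
    return True


contexts = {}
print(accept_ekt(contexts, 0x1234, 3, b"key-a"))  # new context created
print(accept_ekt(contexts, 0x1234, 2, b"key-b"))  # lower ROC, halts
print(accept_ekt(contexts, 0x1234, 4, b"key-c"))  # higher ROC, accepted
```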

17. Section 2.2.2, bullet 5:
Otherwise, the ROC
       in the SRTP context is set to the value of the ROC from the
       EKT_Plaintext, and the SRTP Master Key from the EKT_Plaintext is
       accepted as the SRTP master key corresponding to the SRTP source
       that sent the packet.

I think this is missing discussion of ISN. As the new key is not valid
for use until ISN to my understanding.

18. Section 2.2.2: Usage limiations of ISN and ROC

Due to the rules for the ROC, where any lower ROC gets updated to the ROC
included in the EKT, one can't project the key-change further into the
future than the current ROC. So if one is at RTP Seq 65501 and would
like to put a new key into use in 100 packets, then that isn't possible
as the ISN can only point within the current ROC, i.e. no further than
65535. I don't think this is a serious limitation, but it should be
discussed.
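
The arithmetic behind the example, as a small sketch. It assumes the ISN
is a 16-bit RTP sequence number interpreted within the ROC carried in
the same EKT field; `isn_for_key_change` is a hypothetical helper.

```python
SEQ_MAX = 0xFFFF  # highest 16-bit RTP sequence number (65535)


def isn_for_key_change(current_seq, packets_ahead):
    """ISN for a key change `packets_ahead` packets into the future.

    Returns None when the target sequence number falls outside the
    current ROC and therefore cannot be expressed.
    """
    target = current_seq + packets_ahead
    if target > SEQ_MAX:
        return None
    return target


print(isn_for_key_change(65501, 100))  # None: 65601 is beyond 65535
print(isn_for_key_change(65501, 34))   # 65535: the furthest reachable ISN
```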

19. Section 2.2.2: Use of ISN and the playback issues. If one wasn't
allowed to pre-place a Master Key, ROC, SPI set, then one could accept
an EKT full header provisionally, then try to use it on the current
SRT(C)P packet and see if it authenticates correctly. If it does, then
one moves the EKT from provisional to verified status. However, with the
ISN usage, and the possibility to project EKT parameters into the
future, one can't do the immediate verification, but maybe one should
still have a similar provisional status for the parameter set. That way
at least one doesn't use a replayed EKT to overwrite a valid entry.

20. Section 2.2.2, Bullet 6:
This contains procedures that overlap with bullet 5. Can you separate or
merge them to make them correct and sequential in processing?

21. Section 2.2.2,
          Implementation note: the value of the EKT Ciphertext field is
          identical in successive packets protected by the same EKT
          parameter set and the same SRTP master key and ROC.

I think the end of the sentence needs to include ISN.

I would note that the format of the EKT does require one to generate a
new EKT ciphertext at minimum each time the ROC changes, i.e. every
65536 packets. I don't know if this is worth pointing out.
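
For a sense of scale, a one-line calculation of how often that is. The
packet rate of 50 packets per second is an assumed example (20 ms audio
framing), not a figure from the draft.

```python
PACKETS_PER_ROC = 65536  # 16-bit RTP sequence number space


def seconds_between_roc_changes(packets_per_second):
    """How long one ROC value lasts at a constant packet rate."""
    return PACKETS_PER_ROC / packets_per_second


# Assumed example: 50 packets per second.
print(seconds_between_roc_changes(50))  # about 1310.72 s, roughly 22 minutes
```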

22. Section 2.3:
This cipher
   MUST be implemented, but another cipher that conforms to this
   interface MAY be used, in which case its use MUST be coordinated by
   external means (e.g., call signaling).

I think the above formulation, without using the name of the cipher, is
unclear. In fact I am uncertain which EKT cipher I am required to
support, as Section 2.3.1 doesn't identify a single one and instead
references three labels.

I also think the "call signaling" is the wrong reference. It needs to be
coordinated in key-management.

23. Section 2.3.1
   The default EKT Cipher is the Advanced Encryption Standard (AES)
   [FIPS197] Key Wrap with Padding [RFC5649] algorithm, which can be
   used with plaintexts larger than 16 bytes in length, and is thus
   suitable for keys of any size.  It requires a plaintext length M that
   is at least eight bytes, and it returns a ciphertext with a length of
   N = M + 8 bytes.

I find it very confusing to have one sentence saying that the key wrap
requires plaintext larger than 16 bytes, and the next saying that 8
bytes are required.

24. Section 2.3.1:

   When AES-128 is used in SRTP and/or SRTCP, AESKW_128 SHOULD be used
   in EKT.  In this case, the EKT Plaintext is 26 bytes long, the EKT
   Ciphertext is 40 bytes long, and the Full EKT field is 42 bytes long.

   When AES-192 is used in SRTP and/or SRTCP, AESKW_192 SHOULD be used
   in EKT.  In this case, the EKT Plaintext is 34 bytes long, the EKT
   Ciphertext is 48 bytes long, and the Full EKT field is 50 bytes long.

   When AES-256 is used in SRTP and/or SRTCP, AESKW_256 SHOULD be used
   in EKT.  In this case, the EKT Plaintext is 42 bytes long, the EKT
   Ciphertext is 56 bytes long, and the Full EKT field is 58 bytes long.

I think you should build a table to make clear which key-wrap should be
used for all the already defined SRTP crypto transforms.

I also think it is unclear to use SHOULD in the above. I think they are
MUST unless overridden by a key-management function.
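
The three sets of sizes quoted above can be reproduced from the key
length, which may help when building such a table. The sketch assumes
the EKT plaintext is the master key followed by a 4-byte SSRC, a 4-byte
ROC and a 2-byte ISN, and that the Full EKT field adds 2 bytes for the
15-bit SPI plus a flag bit. That layout is my assumption, chosen because
it matches the quoted numbers. The ciphertext length follows RFC 5649,
which pads the plaintext to a multiple of 8 bytes and adds 8 bytes.

```python
import math

FIXED_FIELDS = 4 + 4 + 2  # assumed SSRC + ROC + ISN, in bytes
SPI_BYTES = 2             # assumed 15-bit SPI plus one flag bit


def ekt_sizes(master_key_bytes):
    """Return (plaintext, ciphertext, full EKT field) sizes in bytes."""
    plaintext = master_key_bytes + FIXED_FIELDS
    # RFC 5649: pad to a multiple of 8 bytes, then add an 8-byte block.
    ciphertext = 8 * math.ceil(plaintext / 8) + 8
    return plaintext, ciphertext, ciphertext + SPI_BYTES


for name, key_len in (("AESKW_128", 16), ("AESKW_192", 24), ("AESKW_256", 32)):
    print(name, ekt_sizes(key_len))
# AESKW_128 (26, 40, 42)
# AESKW_192 (34, 48, 50)
# AESKW_256 (42, 56, 58)
```

Note that under these assumptions the ciphertext is 14 bytes longer than
the plaintext in all three cases, so N = M + 8 only holds when M is
already a multiple of 8.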

25.  Section 2.3.2:

With reference to the above, I believe it is reasonable that you propose
default EKT ciphers for all existing SRTP transforms. There are not
thousands of them, just a few different algorithms and key lengths in use.

26. Section 2.4:
A participant in a session MAY opt to use a particular EKT key to
   protect outbound packets after it accepts that EKT key for protecting
   inbound traffic.

Is the reference to the EKT key actually correct, or should it refer to
the EKT parameter set?

27. Section 2.4:
An SRTP/SRTCP source SHOULD change its SRTP master key after its EKT
   key has been changed.

What are the reasonable exceptions to doing this, why isn't the SHOULD a
MAY or a MUST?

28. Section 2.4:
Rather than automatically discarding such SRTP packets, the receiver
   MAY want to provisionally place them in a jitter buffer and delay
   discarding them until playout time.

I think referring to a jitter buffer here is maybe too specific. I would
abstract this to say: buffer the packets and later discard them when
they become unusable.

29. Section 2.4:
As RTCP packets don't have sequence numbers: if one includes a new
Master Key with an ISN in the future in an EKT field attached to an
RTCP packet, does this key apply immediately for RTCP, or only once
the SRTP packets using it have been sent?

30.  Section 3:
   The SDP Security Descriptions (SDESC) [RFC4568] specification defines
   a generic framework for negotiating security parameters for media
   streams negotiated via the Session Description Protocol by use of a
   new SDP "crypto" attribute and the Offer/Answer procedures defined in

Is "new" really appropriate to use here?

31. Section 3.1:
Suggest using ":" after hanging text in bullet list

32. Section 3.5.1:

However if it operates as
      an RTP translator, synchronized negotiation of the EKT parameter
      sets on *all* the involved SIP dialogs will be needed.

I assume the issue you are referring to is a transport translator
(relay), rather than a media translator, or?

33. Section 3.5.3:
In this case, there will be multiple EKT parameter sets;
   one for each SRTP session.

I agree this occurs between one end-point and its different peers. What
is unclear in this topology is whether these truly are different RTP
sessions, or just one RTP session. It depends on the implementation.

Maybe further clarify that the concern really is between different
pairs of end-points.

34. Section 3.7:
Finally, subsequent offer/answer exchanges MUST NOT remap a given SPI
   value to a different EKT parameter set until 2^32 other mappings have
   been used within the SRTP session.

Considering that the SPI is 15 bits, this appears wrong. See also later
sentence in paragraph.

35. Section 3.9:
       a=crypto:2 F8_128_HMAC_SHA1_80

The EKT key appears very short (24 bits).

36. Section 4.1:
                   enum {
                   } ektcipher;

I don't understand why AEA_128 is listed here, can you please clarify?

37. Section 4.1:

This appears to point to an open issue not dealt with:

      Editor's note: do we need reliability for the ekt_key messages?

38. Section 4.1:

I don't understand how the SPI are bound to DTLS-SRTP parameter
agreements, and if you actually can use DTLS-SRTP to establish multiple

39. Section 4.1.1:

The dtls-srtp-host SDP attribute appears a bit strange. First of all, an
includer in an O/A exchange gets no confirmation that the peer received
or understands it. The second you could ensure by being explicit about
mandating support for it. However, I don't know how you plan to ensure
that it isn't removed by a middlebox that doesn't understand it.
Shouldn't this be a slightly more elaborate parameter that allows
negotiation?

40. Section 4.1.1: ABNF:

I am missing references to all the used definitions, space, nettype,
addrtype, connection-address. I personally know they are from RFC 4566,
but you are not explicit about this in the sentence leading up to the def.

41. Section 4.1:

I fail to find the SDP attribute definition for dtls-srtp-ekt.

42. Section 4.2.2:
The presence of the dtls-srtp-host attribute
   indicates an alternate host to send the DTLS-SRTP handshake (instead
   of the host on the c/m lines).

Or what ICE indicates?

43. Section 5:
I am missing the RSA-R mode (RFC 4738) in this discussion. I don't know
if it seriously affects anything, but that mode does allow the Responder
to provide the TGK or TEK.

44. Section 5:

I am missing a discussion on the multicast and broadcast usages of MIKEY.

45. Section 5.2:
I think most of what is defined here is actually not O/A specific, and
applies independent of how MIKEY messages are transported. Could this be
factored out, so that it gets more general applicability? After all, the
usages I hear of for MIKEY are not in an O/A context.

46. Section 9:

This section needs quite a lot more detail. The SDP attributes need to
fill in the template. The registries need considerations for what a
registration requires. Probably the other registrations have several
formal requirements not fulfilled by this text.

47. Section 11:

I think the following references are in the wrong category:
RFC 3261, RFC 4771, RFC 6347?, RFC 3830, RFC 5649.


Magnus Westerlund

Multimedia Technologies, Ericsson Research EAB/TVM
Ericsson AB                | Phone  +46 10 7148287
Färögatan 6                | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden| mailto: