Re: [secdir] Review of draft-ietf-clue-framework-24

"Roni Even" <> Wed, 02 December 2015 23:19 UTC

From: "Roni Even" <>
To: "'Phillip Hallam-Baker'" <>, <>, <>, <>
Date: Thu, 3 Dec 2015 01:19:11 +0200
Message-ID: <060001d12d57$dcd53520$967f9f60$>
Subject: Re: [secdir] Review of draft-ietf-clue-framework-24


The framework defines what a CLUE endpoint is (see Section 3): it has a single signaling connection (a SIP UA) that establishes the connection to the remote endpoint, and it sources and sinks all of the media channels (RTP audio/video) and data channels.

This is no different from any other telecommunication device: end-to-end means from one endpoint to the other. Looked at differently, the same is true of your cellphone, where someone may have inserted an audio listening and transmission device inside the phone; that is not in the scope of this work. Your home security example may be a valid use case, but only if the connection to the remote viewers is made using a single SIP call, the CLUE protocol (advertisement and configure) runs over a secure data channel opened after a secure SIP connection has been established, and all media is sent via this single connection point. How the cameras and microphones connect to this central connection point is not in the scope of the work; even the data format is not specified. The connection may be analog, with the central point doing the actual encoding, or any other type of connection. We do not decompose the endpoint in CLUE.


The CLUE system uses SIP to start the call like any other SIP UA. After the call is set up, it opens a data channel to carry the CLUE protocol, as discussed in draft-ietf-clue-datachannel-11. The security section of the framework refers to that document for the security of the data channel and the data it carries.
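The ordering described above (secure SIP call first, then the CLUE data channel, then the advertisement/configure exchange, then media through the single connection point) can be sketched as a small state machine. This is a minimal illustration under assumptions: the phase names, the `ClueEndpoint` class and the boolean `secure` flag are hypothetical and do not come from the CLUE drafts; `secure` simply stands in for whatever SIP and data-channel security the real documents require.

```python
from enum import Enum, auto


class Phase(Enum):
    """Hypothetical phases of a CLUE call, in the order described in this thread."""
    IDLE = auto()
    SIP_ESTABLISHED = auto()     # secure SIP session is up
    DATA_CHANNEL_OPEN = auto()   # CLUE data channel opened after SIP setup
    CLUE_CONFIGURED = auto()     # advertisement/configure exchange completed
    MEDIA_FLOWING = auto()       # media sent via the single connection point


# Each phase may only be entered from the one immediately before it.
_NEXT = {
    Phase.IDLE: Phase.SIP_ESTABLISHED,
    Phase.SIP_ESTABLISHED: Phase.DATA_CHANNEL_OPEN,
    Phase.DATA_CHANNEL_OPEN: Phase.CLUE_CONFIGURED,
    Phase.CLUE_CONFIGURED: Phase.MEDIA_FLOWING,
}


class ClueEndpoint:
    """Hypothetical single connection point owning signaling, data channel and media."""

    def __init__(self) -> None:
        self.phase = Phase.IDLE

    def advance(self, target: Phase, secure: bool) -> None:
        # Refuse to proceed over an insecure transport.
        if not secure:
            raise PermissionError(f"{target.name} requires a secure transport")
        # Refuse to skip or repeat steps.
        if _NEXT.get(self.phase) is not target:
            raise RuntimeError(f"cannot go from {self.phase.name} to {target.name}")
        self.phase = target
```

Calling `advance` for each phase in order, with `secure=True`, walks the endpoint to `MEDIA_FLOWING`; trying to open the data channel before SIP is established, or advancing with `secure=False`, raises an error instead.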



Roni Even


From: [] On Behalf Of Phillip Hallam-Baker
Sent: Wednesday, December 02, 2015 6:35 PM
Subject: Review of draft-ietf-clue-framework-24


I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG.  Document editors and WG chairs should treat these comments just
like any other last call comments.



This standards track document describes what is essentially an enhanced data model for negotiating telepresence configurations in cases where a given party may have multiple capture devices offering multiple streams. Choice of streams may be constrained by device capabilities. A camera may offer a closeup of the speaker or a wide view of the panel but not be capable of providing both.



Security considerations.



One contextual issue I am having is understanding how this document relates to the others it references. For example, there is a normative reference to draft-ietf-clue-protocol-06. Is that to be considered by the IESG at this point? If so, it does not have a Security Considerations section.


If the point is to publish the framework doc as an RFC so as to set the context for further discussions of the protocol, this is OK. But otherwise there is a normative reference to a document that doesn't have a security considerations section and desperately needs one.


This is a big problem, as the Security Considerations section in the framework points forward to 'authorization mechanisms' that are presumably to be described in the protocol document.



Given this situation, these comments may be taken as input to the framework document or to the documents to be written using the framework as the architecture.



As a general matter, it would be easier to analyze security if terms such as 'confidentiality' and 'integrity' were used. This is particularly the case when the specification in question deals with audio and video. For example, the phrase "an endpoint attempting to listen to sessions in which it is not authorized to participate" is almost certainly intended to cover video as well, which is seen and not heard.


Looking at the problem in this way gives us the following considerations:



   Disclosure of media streams to an unauthorized endpoint.

   Disclosure of metadata to capture devices.

   Failure to terminate access to media streams at completion of a session.



   Modification of media stream data

   Introduction of spurious media streams.



   Denial of service against capture devices

   Denial of service against output devices


I think this approach would be helpful when it comes to writing the protocol authorization sections.
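The grouping above can be captured as a small lookup table, which is one way an authorization section could be checked for coverage of each security property. This sketch assumes the three blank-line-separated groups in the list correspond to confidentiality, integrity and availability; the `Property` enum, the `THREATS` table and the `threats_for` helper are hypothetical names, not part of any CLUE draft.

```python
from enum import Enum


class Property(Enum):
    CONFIDENTIALITY = "confidentiality"
    INTEGRITY = "integrity"
    AVAILABILITY = "availability"


# The considerations listed in the review, tagged with the property each one violates
# (assumed mapping, read from the blank-line grouping of the list).
THREATS = {
    "Disclosure of media streams to an unauthorized endpoint": Property.CONFIDENTIALITY,
    "Disclosure of metadata to capture devices": Property.CONFIDENTIALITY,
    "Failure to terminate access to media streams at completion of a session": Property.CONFIDENTIALITY,
    "Modification of media stream data": Property.INTEGRITY,
    "Introduction of spurious media streams": Property.INTEGRITY,
    "Denial of service against capture devices": Property.AVAILABILITY,
    "Denial of service against output devices": Property.AVAILABILITY,
}


def threats_for(prop: Property) -> list[str]:
    """Return the threats an authorization section would need to address for one property."""
    return [name for name, p in THREATS.items() if p is prop]
```

A protocol document could then state, for each entry returned by `threats_for`, which mechanism addresses it, so that a missing entry is easy to spot.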



As a general rule, the term 'endpoint' is now meaningless and should not be used. Yes, end-to-end security is a good thing. But show me which are the 'endpoints' here.


End to end is Alice's brain to Bob's brain. 


Between them we have mouth/face -> cameras/mics -> capture host(s) -> inter-network -> output host(s) -> displays/speakers -> eyes/ears.


An attacker may target any of those modules and any of the interfaces between them. Using the term 'endpoint' is ambiguous.
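To make the point concrete, the chain can be enumerated mechanically: every module and every interface between adjacent modules is a potential target, and a single 'endpoint' label hides most of them. This is only a counting sketch; `CHAIN` transcribes the chain from the review (between Alice's brain and Bob's brain), and `attack_surface` is a hypothetical helper.

```python
# Modules between Alice's brain and Bob's brain, as listed in the review.
CHAIN = [
    "mouth/face",
    "cameras/mics",
    "capture host(s)",
    "inter-network",
    "output host(s)",
    "displays/speakers",
    "eyes/ears",
]


def attack_surface(chain: list[str]) -> list[str]:
    """List every module plus every interface between adjacent modules."""
    modules = list(chain)
    interfaces = [f"{a} -> {b}" for a, b in zip(chain, chain[1:])]
    return modules + interfaces
```

For the seven modules in the review this gives 13 targets: 7 modules and 6 interfaces.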



The metadata disclosure problem can be quite insidious. Let us say we are using CLUE to collect media streams from a home security grid. I have 11 cameras on the perimeter pointing in, another 7 on the residence pointing out, and one on my desk. The one on my desk can be considered trustworthy; if someone has compromised that, I am screwed. But that isn't the case for the perimeter net, which is cobbled together from Raspberry Pis and cheapo cameras. That net is placed in a location I know is vulnerable.


Let's say we have an intrusion. The first thing I do is fire up a conference call with my security contractor. I don't want someone to be able to compromise one of my perimeter cameras in a way that tips them off to the fact that the intrusion has been detected.
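One way to read this concern is as data minimization: an untrusted capture device should receive only what it needs to keep producing its stream, and nothing that reveals who is watching or why. The sketch below assumes a hypothetical `CaptureDevice` type with a single `trusted` flag, a made-up `SESSION` dictionary and a made-up `MINIMAL_FIELDS` set; none of these come from the CLUE drafts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureDevice:
    name: str
    trusted: bool  # the desk camera would be trusted; perimeter cameras would not


# Hypothetical session metadata held by the central point during the call.
SESSION = {
    "encoding": "H.264 720p",
    "remote_party": "security contractor",
    "purpose": "intrusion response",
}

# Hypothetical: the only field an untrusted device needs to keep producing media.
MINIMAL_FIELDS = {"encoding"}


def metadata_for(device: CaptureDevice, session: dict[str, str]) -> dict[str, str]:
    """Return the slice of session metadata a given capture device may see."""
    if device.trusted:
        return dict(session)
    return {k: v for k, v in session.items() if k in MINIMAL_FIELDS}
```

The sketch does not address side channels: a change in requested resolution or frame rate could by itself tip off a compromised camera.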



Introduction of spurious streams might be one of the best ways to attack a conferencing system. Say I can see the main speaker but the audio is a little fuzzy; the attacker introduces an additional stream with filtering that makes it more attractive to whatever AI is managing the conference. Now the attacker can literally put words in people's mouths. Could be fun for politicians giving town halls.
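A minimal defense against spurious streams is to let only streams that were both selected during the configure exchange and authenticated as coming from a known sender reach the stream-selection logic. This is a sketch under assumptions: `IncomingStream`, `accept_stream`, the capture identifiers and the boolean `authenticated` flag (standing in for whatever media integrity protection, such as SRTP message authentication, a real deployment uses) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncomingStream:
    capture_id: str      # which advertised capture this stream claims to carry
    authenticated: bool  # whether the media was integrity-protected by a known sender


def accept_stream(stream: IncomingStream, configured: set[str]) -> bool:
    """Accept a stream only if it is authenticated and was selected during configuration.

    Anything else is treated as spurious and never reaches the logic that
    chooses which stream to present.
    """
    return stream.authenticated and stream.capture_id in configured
```

This check does not help against a legitimate but compromised sender, which can still inject filtered media under a configured capture identifier.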