[rtcweb] More on authorization and endpoint authentication

Eric Rescorla <ekr@rtfm.com> Thu, 04 August 2011 16:46 UTC


I wanted to follow up on something I said at the microphone last week
with regard to Cullen's presentation
(http://www.ietf.org/proceedings/81/slides/rtcweb-13.pdf) and in
particular slide 9, which argues that using a first-class signaling
protocol for the actual signaling has better security properties.  I
got on Cullen pretty hard about this, but in retrospect I think I may
have been a bit hasty.

The basic imperative for securing the camera and microphone in RTCWeb
is that you be able to determine who the media will go to (e.g., that
you are talking to Ford). What I had in my head in my slides was that
we would do this by restricting access to the JavaScript APIs: you
would grant www.ford.com the right to use the WebRTC APIs. That is,
the policy would be:

  P1: Allow any script coming from www.ford.com to call anywhere.

The way (or at least a way) you use this to get a call is that
DoubleClick brings up an IFRAME on Ford's site that loads up JS
calling the right location at Ford. And since the browser has
direct access to the origin, it can evaluate this policy
directly. If you're not willing to have a hosted IFRAME like
this, then the permission reduces to letting DoubleClick call anyone.
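
A minimal sketch of how a browser might evaluate P1, assuming the
grant is stored as a (scheme, host, port) origin. The https scheme,
the ALLOWED_ORIGINS table, and the function names are assumptions
made for illustration; the message itself only names www.ford.com.

```python
from urllib.parse import urlsplit

# Hypothetical policy table: origins the user has granted API access to.
# The message only names "www.ford.com"; the https scheme is an assumption.
ALLOWED_ORIGINS = {("https", "www.ford.com", 443)}

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return the (scheme, host, port) origin of a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def p1_allows(script_url):
    """P1: a script may place calls iff its origin has been granted access.

    Note that the call destination is never inspected.
    """
    return origin_of(script_url) in ALLOWED_ORIGINS
```

The check looks only at where the script came from, so under P1 the
browser learns nothing about where the media actually goes.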


The model in slide 9, however, is different: rather than restricting
access to the APIs to a given site, it allows anyone to invoke the
APIs, but *calls* are restricted to a given site. So, in this case,
the access control policy would be:

  P2: Allow anyone to place a call provided that it goes to
  *@ford.com.

The browser evaluates this (in the simplest case) by connecting to
ford.com's SIP server and verifying that it is indeed ford.com.
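
A minimal sketch of the P2 check, assuming the browser has already
connected to the signaling server and authenticated it (for example
by validating a TLS certificate), so that verified_server_identity is
a domain name the browser trusts. The function names and the
simplified address parsing are assumptions made for illustration, not
a full SIP URI parser.

```python
def callee_domain(address):
    """Extract the domain from an address such as 'sip:sales@ford.com'.

    Simplified parsing: strips an optional sip:/sips: scheme, URI
    parameters, and a port. IPv6 literals are not handled.
    """
    addr = address
    for prefix in ("sips:", "sip:"):
        if addr.lower().startswith(prefix):
            addr = addr[len(prefix):]
            break
    user, sep, host = addr.rpartition("@")
    if not sep or not user:
        return None
    host = host.split(";", 1)[0].split(":", 1)[0].lower()
    return host or None


def p2_allows(callee, verified_server_identity, allowed_domain="ford.com"):
    """P2: anyone may place the call, provided it goes to *@allowed_domain
    and the server the browser connected to proved it is that domain."""
    return (
        callee_domain(callee) == allowed_domain
        and verified_server_identity.lower() == allowed_domain
    )
```

Here the identity of the calling script plays no part in the
decision; what matters is the destination and the server's verified
identity.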



My (rather loud) argument at the microphone was that P1 and P2 have
similar security properties and I was trying to push back on Cullen's
claim that P2 was better. In either case, ford.com gets to capture
your audio and video and send it wherever it wants. And obviously, if
it just wants to turn on your camera and send it to your ex-wife it
can. However, thinking about it more, I think that claim is
too strong; there are two differences, one technical and one social.

The technical difference is simply that under P1 Ford has to stand
up a bunch of new technical infrastructure to serve/service the WebRTC
code that establishes its origin and initiates the call [0]. By contrast,
the solution Cullen proposes (P2) allows reuse of all the existing calling
technical infrastructure (whether SIP or Jingle) that Ford already
has.

The social difference is about the assertion that Ford is making.
Under P1, Ford is saying "I'll do something with this call,
and you'll like it." Under P2, they are saying "you are making
a call to someone at Ford." So, while from a technical security
guy's perspective they are equally able to screw you over in either
case by lying, if they *aren't* lying then the policy your browser
is enforcing on your behalf is a lot clearer: "only call people
who have Ford addresses."


At a higher level, I think we've been failing to draw the distinction
between two cases which are socially (and arguably technically) very
different:

1. The calling service is permitted to place a call but all the
   information about who you are actually calling is accessible
   only to the JS provided by the calling service.

2. The calling service is permitted to place a call but the
   browser has independent information about who you are
   actually talking to. [and could at least in principle render
   it to the user.]
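
One way to picture the distinction is by what the browser itself
knows about a call in each case. The CallContext type, its field
names, and the display strings below are hypothetical and used only
for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallContext:
    # Origin of the calling service whose JS placed the call.
    service_origin: str
    # Peer identity the browser verified independently of the JS.
    # None corresponds to case 1, a value to case 2.
    verified_peer: Optional[str] = None


def describe(call):
    """Return the text a browser could, in principle, show the user."""
    if call.verified_peer is None:
        return f"Call placed by {call.service_origin} (peer not verified by browser)"
    return f"Call placed by {call.service_origin} to {call.verified_peer} (verified by browser)"
```

In case 1 the only thing the browser can vouch for is the calling
service; in case 2 it has something independent to render.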

Matthew's UI document and Cullen's presentation both point towards
the notion that #2 is valuable as well, with Cullen's presentation
being more oriented towards signaling and Matthew's more oriented
towards the media-level security, but in both cases suggesting
that we may not just want to authorize the calling service and
let it call wherever with no other independent security mechanisms.

I'm not sure whether this concept covers all the relevant cases, since
at least conceptually it depends on the notion that everyone has an
actual name that is somehow qualified by a universally meaningful name
and there are probably cases where I just want to phone home to the
Web site. However, it seems like it's at least plausibly a useful
piece of functionality in addition to what I had been thinking of as
an origin-based API restriction.

-Ekr


[0] It can of course outsource that with <script src="..."> or
Akamai, but then it loses control of the interaction.
