[MEDIACTRL] AD review: draft-ietf-mediactrl-mrb-12

Robert Sparks <rjsparks@nostrum.com> Thu, 08 March 2012 22:19 UTC

Message-ID: <4F593060.5020002@nostrum.com>
Date: Thu, 08 Mar 2012 16:19:12 -0600
From: Robert Sparks <rjsparks@nostrum.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2
MIME-Version: 1.0
To: "mediactrl@ietf.org" <mediactrl@ietf.org>, draft-ietf-mediactrl-mrb@tools.ietf.org, mediactrl-chairs@tools.ietf.org
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Received-SPF: pass (nostrum.com: 71.170.125.181 is authenticated by a trusted mechanism)
Subject: [MEDIACTRL] AD review: draft-ietf-mediactrl-mrb-12
Precedence: list

Summary: There are several concerns to address with a revised ID before 
progressing this to IETF LC.

This was a very difficult document to read, and the existing flow made 
some spec bugs hard to find.  I encourage the working group to consider 
another editorial pass focusing on informing the implementer while 
working on the technical questions below.

First a question:

Why is the use of <seq> different in 5.2.3 (the consumer interface) than 
it is in the publish interface? It has the server pick the initial value 
of a sequence space that the client has to use for subsequent requests. 
Why is that important?

Technical/Process issues:

The document currently normatively references RFC5705 ("Keying Material 
Exporters for Transport Layer Security (TLS)"). It's obvious from the 
place where this reference is made that it was meant to reference RFC5707 
("Media Server Language (MSML)"). This is a _really_ big bug to have 
gotten through the various reviews that should have caught it so far. 
The working group should discuss the intended reference. If the document 
is changed to reference RFC5707 (at least as 5705 is currently 
referenced), it will be a downref (5707 was published on the independent 
stream).

The registration in 13.1 does not match the template required by RFC6230 
- please correct it.

The media-type review pointed out that the registrations in this 
document are using an outdated template. Please update them to use the 
templates in RFC4288.

Sections 13.6 and 13.7 use an incorrect URI for registering the schema - 
it should be
urn:ietf:params:xml:schema:mrb-publish and 
urn:ietf:params:xml:schema:mrb-consumer, not :ns:.
(:ns: is right as the identifier for the namespace, but not the schema)

There are several places (5.1.4.16 is the first) where the document is 
using xml:lang to try to specify what language is supported for 
automatic speech recognition or text to speech. That's not an 
appropriate use for that descriptor - xml:lang identifies the language 
in which the content of the XML element it appears on is written - see 
<http://www.w3.org/TR/2008/REC-xml-20081126/#sec-lang-tag>. You probably 
want something like <asr-support><language>en</language></asr-support> 
instead.
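
To illustrate the distinction, here is a minimal Python sketch. It is 
only an assumption about how the suggested shape could look: the 
<asr-support> and <language> element names come from the suggestion 
above, not from the draft's schema, and the helper names are 
hypothetical. Supported languages are carried as element content, while 
xml:lang (if present at all) only describes the language of the 
element's own text.

```python
import xml.etree.ElementTree as ET

# Namespace that the reserved "xml" prefix is bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_asr_support(language_tags):
    """Build <asr-support> with one <language> child per supported tag.

    Element names are illustrative (taken from the review suggestion),
    not from the draft's schema.
    """
    root = ET.Element("asr-support")
    for tag in language_tags:
        ET.SubElement(root, "language").text = tag
    return root


def supported_languages(asr_support):
    """Read the supported languages from element content, not xml:lang."""
    return [child.text for child in asr_support.findall("language")]


elem = build_asr_support(["en", "fr"])
# xml:lang can still be set, but it only says what language the text of
# this element is written in; it is not a capability declaration.
elem.set("{%s}lang" % XML_NS, "en")
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)
print(supported_languages(elem))
```

A receiver written this way never consults xml:lang to decide which 
languages the media server supports.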

The phrase "compliant with the specification in the related IANA 
registry (e.g., "msc-ivr/1.0"), for which the <...> applies" occurs 
throughout the document. It is not precise. Which related IANA registry? 
I think you are trying to say "as it appears in the 'Control Packages' 
subregistry of the 'IANA Media Control Channel Framework Parameters' 
registry". Since this occurs so often, it might be worth figuring out a 
way to present this using a reference.

Section 5.1.3.1 - Please clarify the scope of uniqueness of id.
Is it the intent that these only be unique within the scope of a given 
MS, MRB pair? Or is there a need for it to be unique across all the 
subscriptions concurrent in a given deployment?

Section 5.1.3.1's description of <expires> confuses what the MRB is 
asking for vs what the MS gave it.
This may be a structural problem - this section is about requests, and 
its description of <subscription> is written from a request
point of view, but <subscription> is reused by responses in 5.1.5.  This 
section should be clear that this is just the duration the MRB is asking 
for, and that the MS may provide a different value, and if it does, 
that's what to use to determine when to "subscribe again". It's also 
only implicit right now that if a success response doesn't say anything 
about the duration of the subscription, the duration is what was 
requested (or at least that's how I'm reading the document). Please make 
that clearer. (Editorial question: Why is the discussion of this request 
and response separated by the (really long) discussion of the notification?)
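
As a sketch of the behavior I am reading into the document (an 
assumption on my part, not text from the draft), the rule could be 
expressed in Python as below. The function and parameter names are 
hypothetical.

```python
from typing import Optional


def effective_subscription_duration(requested: int,
                                    granted: Optional[int]) -> int:
    """Return the duration the MRB should use to decide when to
    subscribe again.

    requested: the <expires> value the MRB asked for.
    granted:   the <expires> value in the MS success response, or None
               if the response said nothing about the duration.
    """
    if granted is not None:
        # The MS provided its own value; that value wins.
        return granted
    # A silent success response implies the requested duration.
    return requested
```

For example, an MRB that asked for 600 and was told 300 would use 300, 
and one that asked for 600 and got no value back would use 600.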

Section 5.1.4.1 - The scope for uniqueness for <media-server-id> is 
still vague - what does "system wide" mean? Are you trying to say 
"chosen such that it is extremely unlikely that two different media 
servers would present the same id to a given MRB"? Can the document 
provide guidance on constructing this id?

Section 5.1.4.5 and .6 - What is an "inactive RTP session"? (Can you 
provide a definition or a reference to a definition?).

Section 5.1.4.13's <audio-mixing-modes> speaks of a "specific available 
algorithm". What defines the values that can appear here?

Section 5.1.4.13's <video-mixing-modes> says the <video-mixing-mode> 
element can contain a non-XCON layout name as long as it is "properly 
prefixed". Please be more precise with what "properly prefixed" means.

In section 5.1.4.14, what does it mean to "support" these kinds of tone? 
Can you provide pointers to help explain how this will be used?

In section 5.1.4.15, what defines the values that can occur in 
<stream-mode>? Are these restricted to standardized protocols? Are the 
names case sensitive? Is (for example) HTTPS meaningful here? What is a 
receiver supposed to do with values it doesn't understand? Should the 
values appear in a registry? Is there advice the document should give MS 
or client implementers about the security implications of choosing how 
to populate this list?

In section 5.1.4.17, it's not clear what the values for the support 
attribute of <vxml-mode> are allowed to be. Are they the literal strings 
"RFC5552", "RFC4240", and "IVR-Package"? If so, why is IVR-Package 
different (and not "RFC6231")? Are these values case-sensitive? Should 
the possible values be in a registry?

Are the values "RFC4733" and "Media" occurring in the various places 
dtmf-support is discussed case-sensitive?

The protocol allows describing the civic location of a media server, and 
qualifying a request with the civic location of a media server.
These should say something about the level of specificity of the 
provided location (do you just say what region something is in, or do you
specify its location down to a room number?) The security considerations 
section is silent on the ramifications, particularly those of being
able to influence where potentially large amounts of traffic are about 
to be sourced from. It would help to understand the tradeoff of 
providing this information by providing some discussion of why it needs 
to appear here.

There are several sections that talk about declaring support for SRTP 
(such as 5.1.4.21), but they don't appear to account for indicating what 
keying mechanism is used.

Section 5.1.5 and 5.2.6.1 declare that implementations must return a 420 
error if they receive XML with attributes/elements they don't 
understand. How is the sender supposed to figure out what 
attribute/element caused the 420 to happen?

Right now, I don't see what's supposed to prevent one MRB from affecting 
the subscriptions from another MRB.  It looks like a malicious MRB A 
could remove MRB B's subscriptions. A well meaning MRB A could 
accidentally disrupt MRB B. The short description of relying on 
authentication from the protocol you are embedding this one in doesn't 
seem to address that case. The security considerations section should at 
least discuss the threat. Similarly, what keeps one MRB from removing 
another MRB's session at an MS?
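
To make the threat concrete, here is a small Python sketch of one 
conceivable way an MS could scope subscriptions to the peer that 
created them. Nothing here is from the draft: the class, its methods, 
and the idea that the MS learns an authenticated "owner" identity from 
the underlying protocol are all assumptions for illustration.

```python
class SubscriptionTable:
    """Hypothetical MS-side table keyed by (owner, subscription id)."""

    def __init__(self):
        # (owner, subscription_id) -> requested expiry value
        self._subs = {}

    def add(self, owner, subscription_id, expires):
        """Record a subscription created by an authenticated owner."""
        self._subs[(owner, subscription_id)] = expires

    def remove(self, requester, subscription_id):
        """Remove a subscription only if the requester created it.

        Returns True if something was removed, False otherwise.
        """
        key = (requester, subscription_id)
        if key in self._subs:
            del self._subs[key]
            return True
        return False


table = SubscriptionTable()
table.add("mrb-b", "sub1", 600)
removed_by_a = table.remove("mrb-a", "sub1")  # MRB A targets B's id
removed_by_b = table.remove("mrb-b", "sub1")  # MRB B removes its own
```

If the table were keyed by the subscription id alone, the first remove 
call would succeed, which is the case I am asking the document to 
discuss.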

The document uses the term "call leg" in several places, and it's not 
clear exactly what that means in each context. RFC3261 stopped using 
that term for a reason. In some of the places this document uses the 
phrase it means "dialog". In others it means "media session". Please 
clarify each instance. Would it be possible to remove the term "call 
leg" from the document altogether?

In section 5.2.2.1, what prevents the MRB's INVITE from parallel forking 
on the way to the MSes? What is it supposed to do if it gets back 
multiple 200 OKs to that INVITE? Similarly, what is it supposed to do 
with failures? That question has two aspects. First, what can it do 
"downstream"? Can it effectively serial-fork if the first MS it tried 
didn't work out? Should it follow redirect requests? Then, what is it 
supposed to do upstream? If, say, there was only one MS in the universe, 
and it returned a 603 to the INVITE, what SIP response is the MRB 
supposed to return to the INVITE it received?

Section 5.2.2.1's first bullet in the third list of bullets (starting 
"Include a payload in the SIP 2xx class response") is confused with 
respect to its actors. It seems to be trying to put a requirement on the 
MRB about constructing the application/sdp part of the multipart body, 
but the MRB can only copy something that the MS constructed.

The 4th bullet of section 5.2.2.1's third list of bullets inspires the 
question of whether MRBs in IAMM support offerless invites. Please make 
it clear in the document what an MRB should do with an INVITE that 
contains no SDP.

Does the last sentence of 5.2.3 conflict with the last sentence of 5.2.2.1?


Editorial:

Please consider another editorial pass through the document. There are 
several sections that are much harder to read than they could be. Some 
careful repositioning when discussions start to wander from the main 
point of a section would help significantly.

Throughout, please change MIME type to media type.

Why does the document say "SIP 2xx class response" instead of "SIP 200 
OK"? What other response are you anticipating?
It's not incorrect, but it's odd that you leave this flexibility for SIP 
and not for HTTP.

There are several chunks of text that will not age well. The first 
paragraph of the Introduction is a prime example.
Many of these could simply be deleted. The others should be rewritten to 
make sense to an implementer 10 years from now.

There are many sections that introduce confusion through passive voice 
and extraneous text. Please consider removing
as much passive voice as possible and deleting sentences like the first 
in section 4. There is also some editorializing
that is not needed (such as the first sentence of 5.2.2). Please review 
each of those and confirm they add something
necessary to the document.

I found page 14 (sections 5 and 5.1) particularly hard to understand. I 
think the text there can be made much shorter.

The idea that MS selection is performed with the best information 
available is distributed (and buried) in several sections.  Consider 
moving the point to a standalone discussion, and state it as simply as 
possible.

In section 5.1.4.11, the semantics of max-prepared-duration are not 
clearly represented. It would help to say "will be kept in the prepared 
state before timing out" instead of "can be prepared" in the first 
paragraph. Adding a pointer to 4.4.2.2.6 of RFC6231 would also help.

In section 5.1.4.10, consider noting that the <file-formats> element 
describes media types, and might better have been named "media-format" 
but the name "file-format" is being used due to existing implementations.

Section 5.1.4.19: "not meant to provide any explicit information" is not 
what you mean. This field is definitely providing explicit information. 
What are you actually trying to clarify?

Section 5.2 claims the Consumer interface is defined by the 
'application/mrb-consumer+xml' MIME type. This document defines the 
interface; the interface uses that media type. Please clarify the 
language in this section.

Section 5.2.2.1 first sentence - this is a strange use of MUST. Why is 
it there?

Should 5.2.3's first sentence say "media resource server" where it 
currently says "media resource"?

At the end of 5.2.1.2, lease hasn't been defined (and was only pointed 
to prior to this in the document once, long before here). Consider 
discussing leases earlier, or at least providing another forward reference.

The discussion of <expires> in section 5.2.6.1.1.1 (do we really need 6 
levels of section heading?) seems confused about actors - it recommends 
consumer clients refresh a lease, but provides an example of a server 
refreshing a transaction.

Section 8, 5th bullet uses "in advance" once more than it needed to.

In section 12, the document says a couple of times that security 
considerations from other documents MUST be used. What does that mean? 
Please adjust the text to clarify. It would be appropriate to say that 
security considerations from other documents are applicable. It would be 
appropriate to point to specific mechanics those security considerations 
sections call out and say those MUST be used. While adjusting, please 
avoid the temptation to be vague and say something like "the appropriate 
mitigation techniques from <blah> MUST be used". "Appropriate" in that 
context is not precise enough.

Consider noting that Ben's RAI review was on an early version of this 
document (it was on -03) (unless, of course, his review gets updated to 
reflect this version.)
