Re: [clue] AD Review: draft-ietf-clue-signaling-11

Adam Roach <adam@nostrum.com> Fri, 25 August 2017 00:29 UTC

From: Adam Roach <adam@nostrum.com>
To: "Rob Hansen (rohanse2)" <rohanse2@cisco.com>, "clue@ietf.org" <clue@ietf.org>
References: <0b69d2f1-11e1-8fd1-d4a1-2faacc0a8528@nostrum.com> <d4cfe8e14c7c40f0963f5d3e65fd17f9@XCH-RCD-016.cisco.com>
Message-ID: <c4e95707-1fc6-0806-d878-da57397b1dde@nostrum.com>
Date: Thu, 24 Aug 2017 19:29:33 -0500
Archived-At: <https://mailarchive.ietf.org/arch/msg/clue/WST1ZlqdVQtg13Opxk8evkeiy_Q>

Thanks! Responses inline.

On 8/20/17 21:40, Rob Hansen (rohanse2) wrote:
> Section 4.5.4.3: "Note that this is distinct from cases where the CLUE protocol negotiation fails, or an error occurs in the CLUE protocol; see [I-D.ietf-clue-protocol] for details of media and state preservation in this circumstance." -- I carefully scrubbed the CLUE protocol document to try to determine what this is referring to. Please change it to "see [I-D.ietf-clue-protocol] section X.Y.Z", but replacing "X.Y.Z" with the section that provides the details you allude to.
>
> [Rob] I believe when I wrote this the plan was that call preservation actions in the event of a protocol error/failure would be addressed as part of the protocol document, but that this section had not yet been written, and that remains the case. Simon, is this something you have planned, or can you point me at the relevant section?

Simon is on vacation for (I think) at least another week or so; but I 
agree that this may need some coordination. See also my earlier response 
to Roni.

> BLOCKER: Compare the normative statements in paragraph 2 of Section 5.3:
>
>      Generally, implementations that receive messages for which they have
>      incomplete information SHOULD wait until they have the corresponding
>      information they lack before sending messages to make changes related
>      to that information.  For example, an answerer that receives a new
>      SDP offer with three new "a=sendonly" CLUE "m=" lines for which it
>      has received no CLUE Advertisement providing the corresponding
>      capture information SHOULD include corresponding "a=inactive" lines
>      in its answer, and SHOULD make a new SDP offer with "a=recvonly" when
>      and if a new Advertisement arrives with Captures relevant to those
>      Encodings.
>
> With the normative statements in section 4.5.2.2:
>
>      If the initial offer contained "a=recvonly" CLUE-controlled media
>      lines the recipient SHOULD include corresponding "a=sendonly" CLUE-
>      controlled media lines for accepted Encodings
>      ...
>      If the initial offer contained "a=sendonly" CLUE-controlled media
>      lines the recipient MAY include corresponding "a=recvonly" CLUE-
>      controlled media lines
>
> 5.3 says "SHOULD set a=inactive" in the exact same circumstances 4.5.2.2 says "SHOULD set a=sendonly". Please pick one expected behavior and make sure both sections agree. Ideally, you would refactor this so that the normative statement is made in only one location.
>
> [Rob] I don't think these sections are in conflict - the quoted paragraph from section 5.3 is referring to cases where the SDP offer includes "a=sendonly" lines, whereas the text in 4.5.2.2 saying "SHOULD set a=sendonly" is talking about the SDP *answer* including "a=sendonly" lines in response to the offerer's "a=recvonly" lines. It's the paragraph below that corresponds to the quoted 5.3 paragraph: it says that the SDP *answer* MAY include "a=recvonly" in its response or MAY wait, and then references section 5.3, which is where the quoted paragraph, with its recommendation that implementations should wait and send a subsequent SDP offer, is included. We ended up with this approach because, even though in most cases implementations should wait until they receive the information about the encodings and their contents via the CLUE channel, there are some valid use-cases where implementations will know this up-front and hence can avoid the need for multiple SDP exchanges.

Ah, okay. I see what you're getting at here. I think the problem, then, 
is that the language in 5.3 isn't really normative per se (or, rather, 
it shouldn't be normative), as much as it is illustrative. (This is 
reinforced by the phrasing "For example...") I would propose:

     For example, an answerer that receives a new
     SDP offer with three new "a=sendonly" CLUE "m=" lines for which it
     has received no CLUE Advertisement providing the corresponding
     capture information would typically include corresponding "a=inactive"
     lines in its answer, and make a new SDP offer with "a=recvonly" only
     when and if a new Advertisement arrives with Captures relevant to
     those Encodings.
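
As a rough illustration of the behavior that example describes (a sketch only, not text from the draft; the function and parameter names are hypothetical), the answerer's choice for each offered "a=sendonly" line could be modelled as:

```python
def answer_direction(offer_direction: str, has_capture_info: bool) -> str:
    """Choose the direction attribute for one CLUE-controlled "m=" line in an answer.

    Only the "a=sendonly" offer case from the example is modelled here.
    """
    if offer_direction != "sendonly":
        raise ValueError("this sketch only covers 'a=sendonly' offer lines")
    # Without a matching Advertisement the answerer cannot know what the
    # Encoding would carry, so it typically parks the line as inactive.
    # If it already has that information it may answer recvonly right away.
    return "recvonly" if has_capture_info else "inactive"


def should_send_new_offer(current_direction: str, has_capture_info: bool) -> bool:
    """True once a relevant Advertisement has arrived for a line parked as inactive."""
    return current_direction == "inactive" and has_capture_info


# Three new sendonly lines, no Advertisement received yet.
directions = [answer_direction("sendonly", False) for _ in range(3)]
```

The second helper captures the "when and if" part: a follow-up offer with "a=recvonly" only makes sense after the capture information shows up.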


> General, but surfaced in section 8: The procedures described in this document virtually guarantee that every CLUE call that is established will result in glare (response code 491) behavior. This might cause the operations folks some heartburn, as it means that their error counts will spike once CLUE is deployed. Further, without fairly advanced analysis of the callflow, this will make it impossible to distinguish "expected" CLUE-induced 491s from the oddball actual glare conditions usually signaled by 491. Has any consideration been given to avoiding this situation (e.g., by having the called party wait on the order of one second before attempting to negotiate its encodings)?
>
> [Rob] I definitely agree that glare is much more likely at the start of a CLUE call. There was quite a bit of discussion in the group on the pros and cons of introducing an asymmetry into the call messaging to avoid (or reduce the frequency of) glare, and how best to do so, but the final conclusion was not to do so and to rely on SIP's mechanisms to resolve it.

Sure. What I'd like to have positive confirmation on is: did the working 
group specifically consider the operational aspects of this decision? I 
agree that it works from a protocol perspective. I'm just worried that 
it will give operators unnecessary difficulty.
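
For context, the SIP mechanism being relied on is the 491 retry timer from RFC 3261 section 14.1. A minimal sketch of that timer follows (the function name and the `rng` parameter are hypothetical and exist only for illustration):

```python
import random


def glare_retry_delay(owns_call_id: bool, rng=random) -> float:
    """Seconds to wait after receiving a 491 before retrying the re-INVITE.

    RFC 3261 section 14.1 picks the delay in units of 10 ms:
    2.1 to 4 seconds if this UA generated the Call-ID, 0 to 2 seconds otherwise.
    """
    if owns_call_id:
        ticks = rng.randint(210, 400)  # inclusive bounds, 10 ms units
    else:
        ticks = rng.randint(0, 200)
    return ticks / 100.0
```

Because the two ranges do not overlap, the side that did not generate the Call-ID always retries first, which is why the glare resolves. It does not change the fact that a 491 is sent and counted.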

> Section 10: It is rather unusual to include authors in the acknowledgements section. For each of Rob Hansen, Paul Kyzivat, and Christian Groves, I suggest removing the individual's name from either the Acknowledgements section or from the authors list.
>
> [Rob] The authors list hasn't really been updated since the initial stages. Looking at other docs like the framework one I can see they've been revised a fair bit. For now I've left the authors as-is and removed the duplicate names from the acknowledgements, but will reach out to Paul and Roni for guidance here.

Thanks. Either resolution makes sense to me, and I suspect that the 
current author list is correct.

> Section 8: "In this case Bob is the Channel Initiator..." this isn't clear (and, in fact, it's counterintuitive to me) -- perhaps there should be some text indicating *why* Bob is the Channel Initiator.
>
> [Rob] I've made explicit that, when the SCTP over DTLS channel is negotiated, Bob ends up the client and hence the Channel Initiator. However, when I went to double-check that that was how the initiator role was assigned, I couldn't actually find anything in the protocol or datachannel document that defines who ends up with the Initiator role. That definitely seems like something that we need to fix... (unless I'm just failing to find it). Simon, is this something you're planning to address?

The reason this seems counterintuitive to me is that it is backwards 
from how RTCWEB (JSEP) works in the general case. To be clear, for 
datachannels, the TLS client is selected by the "a=setup" attribute; and 
JSEP implementations are required (MUST) to put "a=setup:actpass" in 
their offers, and expected (SHOULD) to put "a=setup:active" in their 
answers. The rationale here is: the way ICE ends up working, the 
answerer will have the first opportunity to send a packet, so this 
reduces overall setup time by ~1/2-RTT.
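
To make the role selection concrete, here is a small sketch (the function name is hypothetical, and "holdconn" is deliberately left out) of how the offer/answer "a=setup" values from RFC 4145 determine which side takes the active role and therefore acts as the DTLS client:

```python
# Valid (offer, answer) "a=setup" pairs and the side that ends up active.
_ACTIVE_SIDE = {
    ("actpass", "active"): "answerer",
    ("actpass", "passive"): "offerer",
    ("active", "passive"): "offerer",
    ("passive", "active"): "answerer",
}


def dtls_client(offer_setup: str, answer_setup: str) -> str:
    """Return "offerer" or "answerer": the side that initiates the DTLS handshake."""
    try:
        return _ACTIVE_SIDE[(offer_setup, answer_setup)]
    except KeyError:
        raise ValueError(
            f"unsupported a=setup combination: {offer_setup!r}/{answer_setup!r}"
        ) from None


# The JSEP pattern described above: actpass in the offer, active in the answer.
jsep_client = dtls_client("actpass", "active")
```

With the JSEP pattern the answerer is the client; an answer of "a=setup:passive" to the same offer would make the offerer the client instead.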

Of course, CLUE is free to do this however it wants [1]; but doing it 
opposite from RTCWEB is likely to confuse people beyond just me. I think 
you'd also need a reasonably good rationale, as a naïve analysis of CLUE 
is that doing it the way you currently have in your examples is 
generally going to impose an additional 1/2-RTT delay on datachannel 
establishment. But I freely admit that I haven't spent a lot of time 
thinking about the low-level details, and could be overlooking something.

/a

____
[1] Subject to the constraints in 
<https://tools.ietf.org/html/draft-ietf-mmusic-sctp-sdp-11>, sections 10 
- 11