Re: [MMUSIC] Fwd: New Version Notification for draft-uberti-mmusic-nombis-00.txt

Robin Raymond <robin@hookflash.com> Thu, 26 March 2015 18:01 UTC

Date: Thu, 26 Mar 2015 14:01:31 -0400
From: Robin Raymond <robin@hookflash.com>
To: Justin Uberti <juberti@google.com>
Message-ID: <etPan.5514497b.6b8b4567.df61@Robin-iMac.home>
In-Reply-To: <CAOJ7v-3T3yoMigTehDWVXAv1q0b1Y0JE=+7_esLjGAcxGi=M5Q@mail.gmail.com>
References: <etPan.550a0f0d.520eedd1.32e@Robin-iMac.home> <CAOJ7v-3T3yoMigTehDWVXAv1q0b1Y0JE=+7_esLjGAcxGi=M5Q@mail.gmail.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/mmusic/BtYW2eFMZDoHkS2M6bPybJ3KU8I>
Cc: "mmusic@ietf.org" <mmusic@ietf.org>
Subject: Re: [MMUSIC] Fwd: New Version Notification for draft-uberti-mmusic-nombis-00.txt

Couple of points of response:

1) I misunderstood your use of LIFETIME, but I understand now. I agree with you that the controlling side should be able to send binding requests to keep any candidate pairs "alive" and unpruned. LIFETIME may be an option to help control that, or simply ceasing to send binding requests might be enough. I do think it would be good for the controlled side to be able to send binding requests to keep candidate pairs alive too, since the controlled side may have a different agenda than the controlling side. Without that, the controlled side would always require OOB methods to signal its intentions to the controlling side, and then the controlling side would have to perform binding requests on behalf of the controlled side in order to keep some candidate pairs alive. In other words, having that option for the controlled side might make some signalling scenarios easier, since keep-alives would not always have to come from the controlling side.
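
To make the implicit-timeout idea in 1) concrete, here is a minimal sketch in Python. It is not taken from the draft: the PairTable class, its method names, and the 30-second timeout are all hypothetical. It only illustrates that a binding request from either side refreshes a pair, and that a pair which stops receiving them is eventually pruned.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CandidatePair:
    name: str
    last_check: float  # time (seconds) of the last binding request seen


class PairTable:
    """Hypothetical pair table with an implicit prune timeout."""

    def __init__(self, prune_timeout: float) -> None:
        self.prune_timeout = prune_timeout
        self.pairs: Dict[str, CandidatePair] = {}

    def add(self, name: str, now: float) -> None:
        self.pairs[name] = CandidatePair(name, now)

    def on_binding_request(self, name: str, now: float) -> None:
        # Assumption: a check sent by the controlling OR the controlled
        # side refreshes the pair.
        if name in self.pairs:
            self.pairs[name].last_check = now

    def prune(self, now: float) -> List[str]:
        # Drop every pair that has not been refreshed within the timeout.
        expired = [n for n, p in self.pairs.items()
                   if now - p.last_check > self.prune_timeout]
        for n in expired:
            del self.pairs[n]
        return expired


table = PairTable(prune_timeout=30.0)
table.add("wlan-host", now=0.0)
table.add("wwan-relay", now=0.0)
table.on_binding_request("wwan-relay", now=25.0)  # controlled side keeps it alive
pruned = table.prune(now=40.0)
```

With these made-up numbers only the pair that stopped receiving binding requests is pruned; the one the controlled side refreshed survives.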

2) I do see value in NOT pruning some candidate pairs that are not actively being sent binding requests, aka "backup candidate pairings". You're absolutely correct those pairings may not succeed later when tested due to firewall pinhole closures even if both sides electively and simultaneously retest the "backup candidate pairing" at a later time. But with the advent of mobile IPv6 and TURN servers in the mix we do have some semi-stable IPs / ports available (especially if TURN MOBILITY-TICKET ends up being used); there is a real chance of success under many circumstances.

Specifically I'd like to be able to use back-up candidate pairs in this scenario:
  a) mobile client sets up a connection with another desktop webrtc client;
  b) mobile client nominates wlan but keeps its local wwan candidate alive (e.g. IPv6 as a backup) [tells remote side which local candidates remain alive (or not)]
  c) desktop client keeps TURN relay alive (as a backup as well); again no binding requests are sent to keep the relay candidate pairing alive;
  d) mobile client does NOT power wwan up while wlan is working (to not consume battery life); again there are no binding requests being sent over wwan;
  e) wlan works fine, but then wlan is temporarily disrupted (not long enough to cause a full failure [and thus no ICE restart just yet])
  f) both mobile client and desktop client detect minor disruption and immediately send binding requests to backup candidate pairings while assuming/hoping the wlan will come back; wwan is established as a "just in case" backup (and maybe temporarily switched to during the time of disruption);
  g) back-up candidate pairs are likely to succeed and thus the disruption is minimized should the previous wlan disruption become permanent (should wlan come back then wlan can be used again);
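
Steps e) through g) can be sketched roughly as follows, from the point of view of one endpoint. This is an illustration only: FailoverSketch, the pair names, and the check callback (standing in for "send a binding request and report whether it succeeded") are hypothetical, and the sketch assumes the remote side probes the same backup pairs at about the same time so that pinholes can open.

```python
from typing import Callable, List


class FailoverSketch:
    """Hypothetical model of backup candidate pairs on one endpoint."""

    def __init__(self, primary: str, backups: List[str],
                 check: Callable[[str], bool]) -> None:
        self.primary = primary    # nominated pair (wlan)
        self.backups = backups    # un-pruned pairs with no checks being sent
        self.check = check        # hypothetical: binding request, True on success
        self.active = primary

    def on_disruption(self) -> str:
        # Step f): on a minor disruption, probe the backups immediately
        # and switch to the first one that answers.
        for pair in self.backups:
            if self.check(pair):
                self.active = pair
                break
        return self.active

    def on_primary_recovered(self) -> str:
        # Step g): if wlan comes back, it can be used again.
        self.active = self.primary
        return self.active


# Pretend only the wwan-to-relay pair answers while wlan is disrupted.
agent = FailoverSketch(
    primary="wlan<->host",
    backups=["wwan-ipv6<->host", "wwan-ipv6<->turn-relay"],
    check=lambda pair: pair == "wwan-ipv6<->turn-relay",
)
during = agent.on_disruption()
after = agent.on_primary_recovered()
```

The point of the sketch is that no binding requests are sent on the backups until on_disruption() runs, so the wwan radio can stay powered down while wlan is healthy.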

Obviously keeping candidate pairs "hot" with binding requests is better where possible, but with mobile we really need to keep those radio transmitters powered down if they are not absolutely needed. That's why having candidate pairs as "backups" works well for mobile in those cases, as opposed to requiring "hot" and active candidate pairs via binding requests. I understand it won't work in all scenarios, but it should work well enough to be sufficient as a backup during a temporary disruption without performing a full ICE restart.

Plus, we have to consider that the OOB channel being used for exchanging new candidates might also become disrupted in the event of a wlan failure. If the wlan was maintaining the OOB TCP channel, it would be dead too. Thus relying on an ICE restart to recover from disruptions might not be as successful as we might hope (i.e. full OOB signalling paths may need to be re-initialized from scratch before any new candidates can be exchanged). [NOTE: I'm aware multi-path TCP could help keep OOB channels alive in the future when devices and servers support multipath TCP more readily, but even that won't likely help; keeping a wwan TCP path connection alive would likely require wwan power-up and that's not desirable either.]

3) Agreed. I think the wlan to wwan and then back to wlan transition is challenging / problematic but solvable if the wlan candidates are re-trickled when we detect the wlan is back up after having been pruned.
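
A rough sketch of what I mean by re-trickling in 3), in Python. ReachabilityMonitor and the trickle callback are hypothetical names, and how reachability is actually detected is left out on purpose. The only idea shown is that candidates are signalled again on the transition from unreachable to reachable, even though the wlan IP itself never changed.

```python
from typing import Callable, Iterable, List


class ReachabilityMonitor:
    """Hypothetical helper that re-trickles candidates when an interface returns."""

    def __init__(self, candidates: Iterable[str],
                 trickle: Callable[[str], None]) -> None:
        self.candidates: List[str] = list(candidates)
        self.trickle = trickle    # hypothetical: signals one candidate to the remote side
        self.reachable = True     # assume the interface starts out reachable

    def update(self, reachable_now: bool) -> bool:
        # Only the unreachable -> reachable transition triggers a re-trickle.
        came_back = reachable_now and not self.reachable
        self.reachable = reachable_now
        if came_back:
            for candidate in self.candidates:
                self.trickle(candidate)
        return came_back


sent: List[str] = []
monitor = ReachabilityMonitor(["wlan-host", "wlan-srflx"], sent.append)
monitor.update(True)    # still up: nothing re-trickled
monitor.update(False)   # wlan disrupted: remote side may prune these
monitor.update(True)    # wlan back: candidates are trickled again
```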

Again, all in all, excellent proposal and strong +1 from me.

-Robin


On March 19, 2015 at 1:13:49 AM, Justin Uberti (juberti@google.com) wrote:



On Wed, Mar 18, 2015 at 4:49 PM, Robin Raymond <robin@hookflash.com> wrote:


[BA] As noted in Section 3.3: 

   the controlling endpoint MUST have some way to indicate to the
   controlled side that specific candidates are to be kept alive.

To achieve this goal, is it necessary to send Binding Requests for each candidate pair to be kept alive? This could consume quite a bit of energy, particularly if there are multiple candidates on both sides.  I wonder if there are ways to achieve the goal more economically.  

For example, on the controlled side, if the relay candidate receives a check but the corresponding host candidate doesn't, it would make no sense to prune the host candidate.  Similarly, if a relay candidate on the controlled side is kept alive due to receipt of a periodic check originating from a WLAN interface, does the controlling side also have to periodically wake the WWAN interface so as to check that same relay candidate?  I realize that the WWAN interface does need to be periodically awoken to satisfy the requirements on RFC 5245 Section 4.1.1.4.  

The explicit timeout approach could reduce the burden still further by reducing the interval at which checks need to be done.  
[Justin wrote]:
This is a good question, but I think any NAT pinholes for established pairs need to be kept alive, similar to what is described in 4.1.1.4 for candidates. Given that keeping your srflx candidates alive will require waking the wwan interface somewhat frequently anyway, pinging existing pairs can probably be done with little additional energy expenditure if properly coalesced. Or, if this is a significant issue, the app may not choose to keep wwan active if wifi is solid.


As a general note, this draft is very much in the right direction.

Thanks - glad it looks useful. 

As for having to keep the wwan ‘hot’ for the sake of pinholes/TURN - not necessarily. Consider the case where the controlling side is on mobile but the controlled side is desktop with TURN. The mobile side might keep its wwan IP alive but not used (e.g. IPv6 interface). The controlled side might keep its TURN candidate alive as a backup should the primary path fail. Neither side needs to be actively pinging the other, but both keep their respective interfaces alive in the event of primary path failure. You are correct that the pinhole won’t remain open since it’s not an actively tested path, but both are good backup candidates in the event of primary path failure (and if both check simultaneously then the pinholes will open). The moment both sides simultaneously detect failure they could both start doing connectivity checks to the other, which will re-open the pinholes and allow traffic on the backup path (but this can only happen if candidates are NOT pruned away).

This doesn't work in all cases, e.g. NATs with address-dependent mapping. For those, the binding must be kept warm or else it will become invalid and repinging will generate a new binding.
 

I think the couple of “TODOs” you have in the draft will resolve many of the remaining issues I see. Specifically, 5.2, “should the controlled side have any say …]”. My answer is “yes”: either side should be able to send a connectivity check to keep an active candidate pair alive, but I agree the controlling side should always pick the nominated/active path.

I wasn't proposing the controlled side be able to keep pairs warm; it probably makes no sense to do so since the controlling side decides what is actually used. The question here is whether the controlled side can specifically drop 'backup' candidates that have some downside, e.g. cost. It seems potentially useful so I agree we should provide for this.


The other TODO: “decide if this implicit timeout approach is correct, or if we should have some sort of approach similar to TURN LIFETIME indicating when a pair should be GCed, with LIFETIME==0 indicating immediate GC.” I think a LIFETIME option for a candidate (and, indirectly, a combined lifetime for the pair) would be a good idea. This would allow an ICE transport to know which remote candidates are considered gone vs still around / available. Thus candidate pairs can be kept as “backup” should the primary path(s) look like they are failing, and they can be checked again for connectivity as long as both the local and remote candidates of the pair are still considered alive. When a primary path looks like it might be failing, both parties could simultaneously recheck their backup candidates (which would open firewall pinholes to each other at that time), and the backup candidates could therefore work and a temporary switch could happen to the backup candidates until the primary path either comes back [or fails completely]. This will work well with TURN and/or IPv6 on mobile. This would require ICE to not prune non-failed pairs so long as the combined candidate pair LIFETIME has not expired.

The point of LIFETIME would be to declaratively tell the controlled side to stop using a pair. Otherwise, we have to use some sort of timeout. I think either could work, it's just a question of whether LIFETIME provides more flexibility.
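
One way the combined pair lifetime discussed above could be computed is sketched below. This is an assumption for illustration, not something the draft specifies: the function names are hypothetical, and the rule that a pair lives only as long as the shorter-lived of its two candidates is just one plausible reading of "combined candidate pair LIFETIME".

```python
def pair_lifetime(local_lifetime: float, remote_lifetime: float) -> float:
    # Assumption: a pair is usable only while BOTH of its candidates are
    # alive, so the combined lifetime is the smaller of the two.
    return min(local_lifetime, remote_lifetime)


def should_gc(local_lifetime: float, remote_lifetime: float,
              elapsed: float) -> bool:
    lifetime = pair_lifetime(local_lifetime, remote_lifetime)
    # LIFETIME == 0 means "GC immediately"; otherwise GC once it has elapsed.
    return lifetime == 0 or elapsed >= lifetime


# Remote side declares its candidate gone: the pair is collected right away.
gone_now = should_gc(local_lifetime=600, remote_lifetime=0, elapsed=0.0)
# Both candidates still alive: the pair is kept as a backup, no checks needed.
still_backup = should_gc(local_lifetime=600, remote_lifetime=300, elapsed=100.0)
```

Under this reading a backup pair needs no binding requests to avoid pruning; it simply stays in the table until the shorter of the two lifetimes runs out or one side sets it to 0.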

There is another scenario that does cause me concern though. I’m concerned that ICE will cause fallback to wwan (good) but will not recognize that the wlan is available again (bad). A wlan can temporarily be disrupted, causing a fallback to wwan. Because candidates are only trickled when they become available and the IP for the wlan never goes away when connectivity fails, the wlan candidates are never trickled again (and a new STUN test might reveal the same firewall port, so that won’t be trickled again either). If the wlan interface is just intermittently failing, then the remote party may have pruned the wlan candidates away and will never retest them, and thus the connection remains on wwan permanently. Since the connection is working there’s no reason to restart ICE either. So I think there is a need here for a wlan device to have “reachability” checks it conducts on its own, and then to re-trickle the candidates when it detects that “reachability” for the wlan has changed to available.

As I understand it, the failure scenario here is for a wlan pair to be selected, then wlan fails, wwan is selected, and wlan is pruned due to multiple failures. I think this is a subset of the problem of being on wwan, and then entering an environment where wlan is available - in this case, you'd want to collect new wlan candidates since the wlan interface is preferred.

So I’m +1 on this draft, +1 for controlled keep-alives, and definitely +1 for candidate “LIFETIME”.

-Robin