Re: Reducing the battery impact of ND

Andrew 👽 Yourtchenko <> Sat, 11 January 2014 19:58 UTC

To: Lorenzo Colitti <>
Cc: 6man Chairs <>, 6man WG <>
List-Id: "IPv6 Maintenance Working Group \(6man\)" <>


On 1/11/14, Lorenzo Colitti <> wrote:

> On Sat, Jan 11, 2014 at 8:57 AM, Andrew 👽 Yourtchenko
> <> wrote:
>> 2) The bigger problem - the battery life:
> This can be a real problem, yes. But as you say, I think we can mitigate it
> well without changing ND at all.

Assuming some cooperation between L2 and L3, yes :-)

>> Spontaneous all-hosts multicasts interfere with these mechanisms, thus
>> the battery drain on the devices is much higher
> One thing that I think we haven't discussed here is that in principle,
> there are *very few* all-hosts multicasts - basically, only RA. Even RS is

Thanks to Tim for mentioning DNS-SD. It is another big offender.
Funnily enough, both RAs and DNS-SD primarily act up in the
larger-scale multi-AP environments, due to roaming: according to my
experiments, at least an iPhone roaming from one AP to another acts as
if it has reattached to another link (and for good reason - there can
be plenty of setups where this is the only robust behavior).

In the home WLAN, the impact of the problem is much smaller. Still not
zero, but smaller - I am enjoying not having to maintain the DNS for
the gadgets at home and just run mDNS on all of them - and the battery
life is okay-ish.

> all-routers multicast, not all hosts. So really the only thing we're
> talking about is solicited-node multicasts for NS (and occasional
> unsolicited NAs, which should only really happen when nodes change their
> MAC addresses). Note that I'm not talking about malicious nodes here because
> any malicious node can decide to spam the link with multicast messages
> regardless of what the ND protocol looks like.

While a node can jam the airtime by sending packets, all packets
first have to travel to the AP, which then retransmits them to all
the clients. So, in environments where you cannot trust the wireless
clients, you can still keep a handle on the clients (with the L2
tweaks again, of course).

> I think that using multicast instead of broadcast and defining
> solicited-node multicast addresses based on the IID, were very wise
> choices. This is because there are so many different groups (2^24), that
> even in very large networks, each solicited-node multicast group will
> likely only have one member.

This is indeed quite an elegant approach.

But, the funny part - an intermediate L2.5 can track who has which
address based on DAD/traffic, or multicast group membership, or
both. Realistically you have to do both, and this is how IPv4
already has a similarly (and maybe more) efficient implementation of
ARP for WiFi, for those implementations that care to optimize that.

Nonetheless, the 2^24 groups do allow additional flexibility.
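
To make the 2^24 figure concrete, here is a minimal Python sketch of
how a solicited-node group is derived from the low 24 bits of an
address, and how that group maps to an Ethernet multicast MAC (33:33
followed by the low 32 bits of the group). The function names are
made up for illustration; only the standard ipaddress module is used.

```python
import ipaddress

def solicited_node_group(addr: str) -> ipaddress.IPv6Address:
    """Return ff02::1:ffXX:XXXX built from the low 24 bits of addr."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Map an IPv6 multicast group to its Ethernet MAC (33:33 + low 32 bits)."""
    tail = (int(group) & 0xFFFFFFFF).to_bytes(4, "big")
    return "33:33:" + ":".join(f"{b:02x}" for b in tail)

group = solicited_node_group("2001:db8::1:2345:6789")
print(group, multicast_mac(group))
```

Two addresses land in the same group only if their last three bytes
match, which is why each group usually has a single member.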

> So - at least in principle - there are two
> layers at which the battery impact of NS packets can be reduced to zero or
> close to zero:
>    1. If the AP keeps state of what MAC addresses have joined which
>    multicast groups, it can selectively turn multicast NS into a set of
>    unicast NS (L2 unicast; not L3 unicast). This will be a pretty
>    significant bandwidth optimization both because a) as you say,
>    unicast is faster than multicast, and b) because one multicast NS
>    will almost certainly turn into only one unicast NS. Win/win.

Yes. MLD snooping++.
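
A rough Python sketch of the state an AP would need for this,
assuming it learns memberships by snooping MLD reports. The class and
method names are hypothetical and no real AP API is implied; the
point is only that one multicast NS fans out to the (usually single)
member of the group as L2 unicast.

```python
from collections import defaultdict

class SnoopingTable:
    """Hypothetical AP-side table: multicast group -> member station MACs."""

    def __init__(self):
        self._members = defaultdict(set)

    def join(self, group: str, station_mac: str) -> None:
        # Called when an MLD report for `group` is seen from the station.
        self._members[group].add(station_mac)

    def leave(self, group: str, station_mac: str) -> None:
        # Called on an MLD done message or when the station disassociates.
        self._members[group].discard(station_mac)
        if not self._members[group]:
            del self._members[group]

    def l2_destinations(self, group: str) -> list:
        """Stations that should get a frame for `group` as L2 unicast."""
        return sorted(self._members.get(group, ()))

table = SnoopingTable()
table.join("ff02::1:ff45:6789", "02:00:00:00:00:01")
print(table.l2_destinations("ff02::1:ff45:6789"))
```

A group with no members yields an empty list, so the NS never goes on
the air at all.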

>    2. Even if the AP does not do #1, if the wifi chipset in the device
>    keeps state of what multicast groups the device has joined, then the
>    wifi chipset can simply drop packets that aren't interesting to the
>    device's main CPU. From experience we know that this can also lead to
>    massive battery savings - the wifi chipset is basically on a lot of
>    the time anyway, because it has to listen for beacons and incoming
>    packets, and this sort of filtering can be pretty efficient.

Do you have a pointer to read up more on these results?

AFAIK, the "radio is always on" assumption does not always hold, because:

1) each WiFi packet has a field ("duration/id") which determines how
long the sending station will take for the transmission, so the radio
can be in a tiny "nap" mode for this period (since it *knows* the
carrier is busy anyway).

2) a node going to sleep can indicate so to the AP, and can then sleep
until the AP notices an incoming frame destined for it, buffers it, and
sends a beacon with an indication that there is traffic for this station.
Till then the radio can stay in a relatively low power mode. (I'd need
to refresh what happens for the multicast packets in this case). Once
the radio sees that it has pending packets, it wakes up and sends the
poll packet to the AP, which then sends the buffered packets to the
station, which are then acked by the station.

This "pending traffic" stuff is a bitmap - so I'd imagine not a whole
lot of processing is possible besides that.
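
As a simplified illustration of that bitmap: one bit per association
ID (AID), where bit N lives in octet N // 8 at bit position N % 8.
This Python sketch assumes a full 251-octet bitmap and ignores the
offset/partial-bitmap encoding that real beacons use; group-addressed
traffic is signalled separately and is left out. Function names are
made up for the example.

```python
MAX_AID = 2007  # highest association ID an AP can hand out

def new_bitmap() -> bytearray:
    # 2008 bits (AID 0..2007) fit in 251 octets.
    return bytearray(MAX_AID // 8 + 1)

def set_pending(bitmap: bytearray, aid: int) -> None:
    """AP side: mark that frames are buffered for the station with this AID."""
    if not 1 <= aid <= MAX_AID:
        raise ValueError("AID out of range")
    bitmap[aid // 8] |= 1 << (aid % 8)

def has_pending(bitmap: bytearray, aid: int) -> bool:
    """Station side: the only question a dozing radio can ask of a beacon."""
    return bool(bitmap[aid // 8] & (1 << (aid % 8)))

bm = new_bitmap()
set_pending(bm, 5)
print(has_pending(bm, 5), has_pending(bm, 6))
```

The bit says only "something is buffered for you", not what it is, so
the station cannot decide to stay asleep based on the packet contents.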

So, in my understanding, lone packets are bad, whatever they are.

> That leaves RAs. Multicast RAs are a bane for battery life, because every
> time a device joins the network, it sends an RS. If the router responds
> with a multicast RA, then all devices on the link get a packet which they
> didn't need. On large wifi networks, I've seen this cause RAs once every
> 3-4 seconds, which is really painful in terms of battery life. Fortunately,
> there's an easy solution here: respond to router solicitations with unicast
> RAs sent to the sender. This is pretty trivial, and it doesn't require any
> state anywhere. There is a corner case where, if 10000 devices come online
> at the same time because the AP just booted up, you can get a thundering
> herd problem and have to send 10000 unicast packets, but this can be
> optimized too: if the router sees that it's sending more than 100 solicited
> unicast RAs per second, it can simply send a multicast RA instead.

One AP can get up to 2007 clients, so it will be a multitude of APs
that will have to boot up and come online.

Each client, before it comes to sending the RS, has to exchange at
least 4 packets with the AP (authentication request/response,
association request/response). This of course has to happen over the
air - which will naturally rate-limit the process.

In short: you *really* don't want APs with 10000 connected clients to
lose them all at once - with or without IPv6 :-) - so for realistic
scenarios, unicast solicited RAs might work fine. (I don't have
practical experience of rebooting a 10000-node network in such a
situation, so I can't tell where the bottleneck is going to be :-)
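
The fallback rule quoted above (unicast by default, multicast once
more than 100 solicited RAs per second are going out) could look
roughly like this in Python. It is only a sketch: the class name, the
sliding one-second window and the injected timestamps are assumptions
for illustration, not a description of any router implementation.

```python
from collections import deque

class RaResponder:
    """Decide how to answer each RS: unicast RA, or fall back to multicast."""

    def __init__(self, max_unicast_per_sec: int = 100):
        self.max_unicast_per_sec = max_unicast_per_sec
        self._sent = deque()  # timestamps of unicast RAs in the last second

    def on_rs(self, now: float) -> str:
        # Forget unicast RAs that are at least one second old.
        while self._sent and now - self._sent[0] >= 1.0:
            self._sent.popleft()
        if len(self._sent) >= self.max_unicast_per_sec:
            return "multicast"  # thundering herd: one RA covers everyone
        self._sent.append(now)
        return "unicast"

responder = RaResponder(max_unicast_per_sec=3)
print([responder.on_rs(i * 0.01) for i in range(5)])
```

No per-host state is kept; the only memory is the short list of
recent send times.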

>> - especially in a typical network where the mobile devices move, and
>> might trigger an RS/RA on each L2 roam between BSSIDs.
> The client devices I'm familiar with don't send RSes when L2 roaming
> between BSSIDs on the same SSID. Is this a bug? Should they?

Thinking more - I might be confusing this with the DNS-SD packets
being sent. I tested in the office, capturing the DNS-SD traffic (also
all-hosts) while walking around. I do not remember if I captured the
RA. I'll retest on Monday.

Intuitively, in the well-behaved enterprise environment we probably
don't want to be gratuitous about multicast - the infrastructure knows
all about the node. But I can easily see where a mobile device
manufacturer, confronted with a multitude of weird home WLAN setups
with 2-3 uncoordinated APs, could be compelled to flash the packets
"just to be sure".

>> I hear someone saying: "Yes, but this is all about wireless, DC is all
>> different story - it's all switches, they don't hold the state!" - yes
>> and no.
> I'd argue that it's not "yes and no", it's just "no". The state required by
> the approaches I suggest above is the same amount of state as multicast
> snooping, and it's pretty much the same amount of state required by the
> efficient ND approach - but it doesn't require complicating the ND protocol

Lots of enterprises have to migrate VMs around the place - which, in
the absence of any hooks, looks to the host as if it magically wakes
up in a completely different place in the network, and looks to the
network as if the host has disappeared in one place and reappeared in
another. And presumably there is an L2-like link between these two
places, with X layers of CAM tables, while at the same time some parts
of this stuff run atop overlays across the L3 network. All this time,
you want to ensure that the MAC address that reappeared is a
legitimate MAC address, and not someone spoofing your server.

Also, on WiFi the roaming decisions are host-driven and reversible
(if the reauthentication to the new AP fails, the host can stay at the
old AP); in the VM migration case they are external to the host and
the host has no control over them.

So, it is not the same situation.

>> Maybe worth to split the problem area into smaller subsets and see
>> whether "cheaper" solutions are possible. Let's first take the
>> "multicast RAs" problem, and assume we do the below steps:
>> a) Allow the environments that wish to do so, to send solicited RAs
>> unicast. (standard already permits to do so).
> Yes please!
>> b) Have the hosts restart the Resilient RS [5] process at 1/2 or so
>> of router lifetime expiry: the router now can have the heuristic to
>> know which hosts are supporting the "unicast RA update" mechanism
>> *and* did not receive the periodic RA which would have been sent
>> usually at 1/3 the lifetime expiry (three RAs per lifetime rule of
>> thumb).
> Yes please!
>> c) bump up the allowable MaxRtrAdvInterval and AdvDefaultLifetime to
>> their maximum theoretically possible values (spec says the hosts
>> should already be able to handle those, I did not test though) - this
>> would allow to further reduce the periodic RA frequency, from 2.5
>> hours today to ~18.2 hours, or, if we feel adventurous, put all-ones
>> to be a "solicited-only RAs by default" - thus, unless another router
>> sends an RA with different lifetime, refreshing the router info
>> becomes host-driven, with an option for a router to override at any
>> time.
> This sort of depends what device you have. For example - on a mobile phone,
> receiving one packet every 2.5 hours is *so far down* in the noise that it
> just doesn't matter. Your phone is doing a massive amount of stuff already
> - it's syncing your email, receiving wifi beacons, checking calendar
> alarms, etc. etc. All of that uses way more CPU than receiving one
> multicast RA.
> There may be other devices that have lower power requirements, though I'm
> not sure - perhaps these devices simply can't use 802.11-style wifi because
> it's too expensive, and so already have to use something like 6LowPAN. I
> don't know if there's a use case here.

If you go unicast-only-RA (the all-ones case), you can achieve all the
power-saving benefits that you specified, with zero changes to
the pipes between the router and the hosts. Some networks may have
control over the hosts, limited control over the routers, and less
control over the wifi in between.

Also, for a major part of WiFi networks (university campus,
enterprise, large-scale event, etc) 18 hours is "infinity" for
practical purposes: a huge percentage of the clients will not be on
the network for that long.

This would provide a valuable pain relief mechanism in case there is
some multicast-related issue which prevents the multicast RAs reaching
the client.

The other reason for this, besides the above, is that since the host
is required to handle the whole range of lifetimes anyway, we might
as well make the behavior 100% predictable, so that hosts do not blow
up if someone decides to send such a packet.
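
For reference, the arithmetic behind the "2.5 hours" and "~18.2
hours" figures, plus the 1/3 and 1/2 rules of thumb from steps b) and
c), as a small Python sketch. The constants are the 9000-second cap on
AdvDefaultLifetime in RFC 4861 and the largest value of the 16-bit
Router Lifetime field; the function names are made up for the example.

```python
RFC4861_MAX_ADV_DEFAULT_LIFETIME = 9000  # seconds, today's cap
ROUTER_LIFETIME_FIELD_MAX = 0xFFFF       # seconds, 16-bit field, all-ones

def hours(seconds: float) -> float:
    return seconds / 3600

def schedule(lifetime_s: int) -> dict:
    """Rule-of-thumb timers for a given router lifetime."""
    return {
        # Three periodic RAs per lifetime.
        "periodic_ra_every_s": lifetime_s / 3,
        # Host restarts its RS process at about half the lifetime.
        "host_rs_restart_at_s": lifetime_s / 2,
    }

print(hours(RFC4861_MAX_ADV_DEFAULT_LIFETIME))        # 2.5
print(round(hours(ROUTER_LIFETIME_FIELD_MAX), 1))     # 18.2
print(schedule(RFC4861_MAX_ADV_DEFAULT_LIFETIME))
```

With the 9000-second lifetime, a host that misses the periodic RA at
3000 seconds would start soliciting at 4500 seconds, well before the
router entry expires.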

>> These steps take care of the "multicast RA" problem. But again, also,
>> each of them by itself brings incremental usefulness, and is doable on
>> a single device => no "chicken and egg", they gradually shrink the
>> issue - and are compatible with the existing tweaks that solve the
>> same problem (in a more layer-violating way).
> Care to work together on a document that provides operational guidance to
> optimize battery impact of ND on wifi networks? Since they involve zero
> protocol changes, v6ops would be a good candidate group for it. And it
> might help persuade vendors that are not your employer to implement stuff
> like this :-)

Definitely interested! Let's sync unicast.

>> Now let's think of updating the various caches while roaming:
> How much more do you think this sort of thing will help, assuming we do the
> "100% feasible using the existing protocol" stuff above?

I don't know. Depends on how many L2 belts and suspenders we are
aiming to make unnecessary.

I suppose we could see how much remains after axing the issue with the
existing protocol.

>> NB: the above does not take care of the "defending the host's address
>> on behalf of the sleeping node" problem.
> An NS is 72 bytes, and if we do multicast snooping, then a sleeping node is
> only ever going to be woken up if a node with the same last 3 address bytes
> shows up. That should be extremely rare.

As I wrote above, my understanding was that by the time you are awake
enough to poke into the MAC address, you are way too much awake
already :-)

But as I said, I'd be interested to see more info to educate myself
better on this!