Re: [v6ops] Interesting problems with using IPv6

Mark ZZZ Smith <markzzzsmith@yahoo.com.au> Mon, 15 September 2014 01:03 UTC

Hi Brian,


----- Original Message -----
> From: Brian E Carpenter <brian.e.carpenter@gmail.com>
> To: Mark ZZZ Smith <markzzzsmith@yahoo.com.au>
> Cc: Dale W. Carder <dwcarder@wisc.edu>; IPv6 Operations <v6ops@ietf.org>; "l.wood@surrey.ac.uk" <l.wood@surrey.ac.uk>
> Sent: Tuesday, 9 September 2014, 12:14
> Subject: Re: [v6ops] Interesting problems with using IPv6
> 
> Mark,
> 
> My point is that it's worth *understanding* such problems and then
> perhaps writing up operational or implementation recommendations.


Sure.

The trouble with that blog post is that it doesn't provide conclusive evidence that the Router Alert option in MLD messages is the only cause of their networking problems.

The blog post itself doesn't really provide much technical evidence of anything. The evidence it does provide, however, is that they haven't been troubleshooting well, nor maintaining their network very well:

- they took a 'suck-it-and-see' approach to trying to fix the first issue, and in particular, changed multiple things at once. If the problem then disappears, which change resolved it? Would they now perform disruptive firmware upgrades even though that change may not have been the one that fixed the problem?

- they suspect 'bridge loops' are forming and overloading the control plane. Yet they don't say they have spent any time confirming their speculation, or any effort identifying the cause of these 'bridge loops'. They also speculate about what the consequences of these 'bridge loops' are (STP messages being dropped because of control plane overload). They're actively working on actions to remedy this, without any evidence their speculation is true. They're jumping to conclusions.

With so little apparent evidence collection, troubleshooting and diagnosis, how can the blog author be sure IPv6 MLD with the Router Alert option is the specific cause of their problems? Consequently, how can this group accept that as conclusive and then spend time speculating on changes to MLD that may be completely ineffective on that network, because MLD may not be the cause at all? There isn't even enough detail in that blog post to reproduce the scenario independently: what hosts/host OSes are they using, how many are there, are they running other multicast applications, etc.

I think it is possible that the Router Alert option in MLD messages is one of the causes of, or contributors to, their problems. However, there are many other possible causes that don't seem to have been investigated.


One reason I'm sceptical about MLD being the cause is that versions of Windows since Vista (released January 2007) have been IPv6 enabled by default. They issue solicited-node MLDv2 reports for their link-local addresses and, if global or ULA prefixes are available, generate privacy addresses and send corresponding solicited-node MLDv2 reports for those too. So since at least 2007, many MLD messages have been sent onto many different switched networks. If the Router Alert option in MLDv2 reports were such a problem for switched networks, we should have heard about it by now and seen other networks suffer from it. This problem seems to be unique to this network, and I think that is a sign that something else is going on, in addition to the other signs like the 'bridge loops' and the unexplained software and perhaps hardware faults.
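
For anyone who wants to see which groups those solicited-node reports are for, here is a minimal Python sketch of the standard derivation: the low 24 bits of the unicast address are appended to ff02::1:ff00:0/104, and the group then maps to an Ethernet address of 33:33 plus the low 32 bits of the group. The function names and example addresses are made up for illustration; nothing here comes from the blog post or the captures.

```python
import ipaddress

# Solicited-node prefix ff02::1:ff00:0/104 as an integer.
SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_multicast(addr: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast group for a unicast address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)


def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Map an IPv6 multicast group to its Ethernet destination MAC."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)


if __name__ == "__main__":
    # Hypothetical link-local address, for illustration only.
    group = solicited_node_multicast("fe80::211:22ff:fe33:4455")
    print(group, multicast_mac(group))
```

A host with a link-local address plus a few privacy addresses will typically join one such group per distinct low-24-bit value, which is why a single host can generate several reports at boot.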

I've proposed using these solicited-node MLD reports to further help mitigate the ND cache DoS (draft-smith-v6ops-mitigate-rtr-dos-mld-slctd-node). People were concerned that hosts weren't sending them, so I've collected some packet captures of an individual host booting on a (virtual) network with a single IPv6 router (OpenWRT 14.07 rc1) announcing a single ULA prefix. If people want to see what is typical, here are the Windows captures (the Windows XP one was taken after IPv6 was enabled). Note the captures were taken at the router's interface rather than the host's, so they show what the router receives from the hosts or sends towards them, rather than what the hosts themselves sent or received.

http://www.users.on.net/~markachy/windows-xp-sp3-boot-rtr-cap.pcap

http://www.users.on.net/~markachy/windows-vista-boot-rtr-cap.pcap

http://www.users.on.net/~markachy/windows-7-boot-rtr-cap.pcap

http://www.users.on.net/~markachy/windows-8.1-boot-rtr-cap.pcap
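
If you want to check for the Router Alert option yourself when looking at MLD packets like these, the following is a rough Python sketch that walks the options in a Hop-by-Hop Options header. It assumes you have already extracted the raw bytes of that extension header from a packet (the pcap parsing itself is not shown), and the function name and sample bytes are hypothetical. The option type 0x05 with a 2-byte value is the Router Alert option, and a value of 0 indicates an MLD message.

```python
ROUTER_ALERT = 0x05  # Router Alert option type
PAD1 = 0x00          # single-byte padding option, has no length field


def has_router_alert(hbh: bytes) -> bool:
    """Return True if a Hop-by-Hop Options header carries Router Alert.

    `hbh` starts at the Next Header byte of the extension header.
    """
    if len(hbh) < 2:
        return False
    total_len = (hbh[1] + 1) * 8   # Hdr Ext Len is in 8-byte units, minus one
    opts = hbh[2:total_len]
    i = 0
    while i < len(opts):
        opt_type = opts[i]
        if opt_type == PAD1:
            i += 1
            continue
        if i + 1 >= len(opts):
            break                  # truncated option, stop parsing
        opt_len = opts[i + 1]
        if opt_type == ROUTER_ALERT and opt_len == 2:
            return True
        i += 2 + opt_len           # skip type, length and option data
    return False


if __name__ == "__main__":
    # Next Header 58 (ICMPv6), Hdr Ext Len 0, Router Alert (MLD), PadN.
    sample = bytes([58, 0, 0x05, 0x02, 0x00, 0x00, 0x01, 0x00])
    print(has_router_alert(sample))
```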

Regards,
Mark.


> Exactly what the v6ops charter says we should do, in fact.
> 
> "1. Solicit input from network operators and users to identify
> operational issues with the IPv4/IPv6 Internet, and
> determine solutions or workarounds to those issues. These issues
> will be documented in Informational or BCP RFCs, or in
> Internet-Drafts."
> 
> Or recommend protocol changes if they might help, which is why I
> want to understand the value of Router Alert in MLD messages for
> Solicited-Node multicast addresses. Or should we revive
> draft-pashby-magma-simplify-mld-snooping-01?
> 
>     Brian
> 
> 
> 
> On 09/09/2014 13:55, Mark ZZZ Smith wrote:
>> 
>> 
>> 
>>  ----- Original Message -----
>>>  From: Brian E Carpenter <brian.e.carpenter@gmail.com>
>>>  To: Dale W. Carder <dwcarder@wisc.edu>
>>>  Cc: IPv6 Operations <v6ops@ietf.org>; l.wood@surrey.ac.uk
>>>  Sent: Tuesday, 9 September 2014, 7:59
>>>  Subject: Re: [v6ops] Interesting problems with using IPv6
>>> 
>>>  I switched to the relevant list.
>>> 
>>>  On 09/09/2014 06:33, Dale W. Carder wrote:
>>>>   Thus spake Brian E Carpenter (brian.e.carpenter@gmail.com) on Mon, Sep 08, 2014 at 07:50:26AM +1200:
>>>>>   If they really are interesting problems, it might be more to the
>>>>>   point to analyse them over on v6ops. Given the number of large
>>>>>   IPv6 deployments that don't have such problems, it seems like
>>>>>   this particular deployment hit an unfortunate combination of
>>>>>   implementation issues.
>> 
>> 
>>  If this email is referring to this blog post:
>> 
>>  http://blog.bimajority.org/2014/09/05/the-network-nightmare-that-ate-my-week/
>> 
>>  then I think this particular network was already on the verge of catastrophic failure, and some of IPv6's ND differences were just the trigger to push it over that threshold.
>> 
>> 
>>  According to the blog post, they might have:
>> 
>>  - hardware errors
>>  - software errors
>>  - 'bridge loops' causing the control plane to overload
>> 
>>  In some of these cases they've taken actions to try to resolve them, however the actions taken seem to be based on speculating what is happening rather than finding evidence to support the hypothesised cause before taking action.
>> 
>>  For example, there isn't any evidence that they've actually determined that 'bridge loops' are occurring, found out what is causing them, and then taking measures to prevent them or at least reduce the chances of them occurring. If a 'bridge loop' can cause the control plane to overload, then adding anything to this network like IPv6 is likely to only make the problem worse.
>> 
>>  I'd like to see them get to the bottom of and then fix these 'bridge loop' and other problems first before any value is placed on addressing their criticisms of IPv6. Other people have deployed IPv6 without these problems, so what is different between this network and everybody else's that doesn't have these sorts of problems?
>> 
>> 
>> 
>>>>>   That is worth understanding (for example,
>>>>>   how large is the layer 2 network that leads to the MLD listener
>>>>>   report overload?).
>>>>   Implementing MLD snooping for Solicited-Node multicast addresses 
>>>>   is probably a bad idea.
>>>> 
>> 
>>>>   See: draft-pashby-magma-simplify-mld-snooping-01
>>>  OK, but I would also like to understand why we require
>>>  MLD messages for a Solicited-Node multicast address to
>>>  set Router Alert.
>>> 
>>>      Brian
>>> 
>>>  _______________________________________________
>>>  v6ops mailing list
>>>  v6ops@ietf.org
>>>  https://www.ietf.org/mailman/listinfo/v6ops
>>> 
>> 
>