Re: [babel] WG Last Call for draft-ietf-babel-source-specific (2018-03-26 to 2018-04-09)

Denis Ovsienko <denis@ovsienko.info> Sun, 22 July 2018 12:29 UTC

Date: Sun, 22 Jul 2018 13:28:51 +0100
From: Denis Ovsienko <denis@ovsienko.info>
To: Babel at IETF <babel@ietf.org>, babel-chairs <babel-chairs@ietf.org>
Message-ID: <164c1f6c305.de4e0cfa319563.445636890330531708@ovsienko.info>
In-Reply-To: <87d0x6u1yd.wl-jch@irif.fr>
References: <CAF4+nEHUmjUcY7PS0eVDuPr8YHaJG4t+CyoxzMR15821X+-Vsg@mail.gmail.com> <CAF4+nEFa+ZFfYScDxbsCbe3bX=p6w+YKpq0eXa+tjtYZDzvwyA@mail.gmail.com> <163a3eefcb3.105e54392539813.8869059599002671510@ovsienko.info> <0B1F8607-E0D3-4725-A9F2-2ACF41207D57@irif.fr> <163cabe6d49.1115744068931.3357457871401802835@ovsienko.info> <87d0x6u1yd.wl-jch@irif.fr>
Archived-At: <https://mailarchive.ietf.org/arch/msg/babel/K14Acm4dyX6hvCSSl8IA8TN_4a4>
Subject: Re: [babel] WG Last Call for draft-ietf-babel-source-specific (2018-03-26 to 2018-04-09)

Here is another comment that I consider important for this WGLC and for the Babel WG's work as a whole.

 ---- On Mon, 04 Jun 2018 14:16:10 +0100 Juliusz Chroboczek <jch@irif.fr> wrote ---- 
 > > Please correct me if the following interpretation is wrong. There is 
 > > a deployed fleet of RFC 6126 devices and it is not going to disappear 
 > > immediately. There is going to be a deployed fleet of SS-extended 
 > > devices. When the two kinds randomly meet in the wild and 
 > > source-specific routes start to propagate, the network can randomly 
 > > break or degrade. 
 >  
 > > Acknowledging the problem in a document is a good start. But ending up 
 > > with a design that fails to fail safe looks wrong. 
 >  
 > Babel development tries, to the extent possible, to meet the needs of our 
 > users.  When we decided to go with the mandatory bits (and therefore break 
 > compatibility with 6126), I spoke with a number of users of Babel, some of 
 > them in the live, some of them by e-mail.  I then explained the transition 
 > plan on the babel-users mailing list, no less than three times. 

About the mailing list and the users.

Anyone familiar with mailing lists knows that a lack of negative feedback on a list does not mean that everything is fine.

In particular, I have been on babel-users for 6.5 years, and I have learned that a typical subscriber is a network protocol hacker, a community network operator, or both. For the pre-IETF Babel that was perfectly OK, and having most of the enthusiasts on a single mailing list did give the impression that things would not go terribly wrong: if the network broke, many subscribers could troubleshoot it and sometimes even fix it.

But, first, not all Babel users are on the list. Some people are happy to enable a protocol and leave it running without joining any mailing lists. Consider OSPF: its typical users do not join dedicated mailing lists or hack on OSPF internals; they just follow the router configuration guide and move on to the next task. One would like to coordinate the migration with 100% of the user base, but this is impossible.

Second, IETF Babel is intended, through the implementers, precisely for end users, not network protocol hackers. When a random end user finds an access point with an "enable Babel routing protocol" checkbox, they will reasonably expect that either the feature works or it fails safe. Or they will just flip every available checkbox back and forth to see what happens. What actually happens will depend on the year in which the access point was flashed, the implementation it was flashed with, and which other implementations happen to be in the network at the time.

Ironically, if IETF Babel becomes a success, it will motivate end users to enable as many Babel routers as they can find, and the probability of a source-specific 6126bis router running into a plain RFC 6126 one will increase, which will in turn discourage end users from using Babel. In other words, this is a negative feedback loop: keeping 6126bis at version 2 works against the success of this routing protocol. This protocol design lesson has already been learned elsewhere, and more than once.

 > There were no objections.  Our users are worried about a flag day, since 
 > they are unable to update all of their routers in a timely manner. 
 > However, they are not worried about an orderly transition: 
 >  
 >   - the next version of babeld will not break compatibility without an 
 >     explicit option; 
 >   - future versions of babeld will have a per-interface flag called 
 >     "rfc6126-compatible" that will cause babeld to remain compatible with 
 >     deployed implementations. 
 >  
 > Of course, I have not spoken with every single user of babeld.  However, 
 > I am confident that the above transition plan is the best we can do 
 > without splitting the community into "version 2" and "version 3", which at 
 > this stage would be tantamount to suicide. 

About the protocol version and transition.

Migrating from version 2 to version 3 would, as expected, be complicated. A conforming version 2 implementation will ignore version 3 packets; consequently, version 3 of the protocol would have the fail-safe property. The cost and complexity of this migration are known beforehand, and the migration can be planned. Indeed, the version 2 user base would be the most affected, but that is a one-time pain. This looks like a classic known-good solution.
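To make the fail-safe argument concrete, here is a minimal sketch in Python. It assumes the RFC 6126 packet header layout (an 8-bit Magic equal to 42, an 8-bit Version, and a 16-bit Body length), under which packets with an unknown Magic or Version are silently ignored. The function name is hypothetical, and a real receiver obviously does much more than this header check.

```python
import struct

BABEL_MAGIC = 42  # Magic byte at the start of every Babel packet


def accept_packet(packet: bytes, supported_version: int = 2) -> bool:
    """Hypothetical helper: would a receiver speaking `supported_version`
    process this packet at all? Only the 4-byte header is checked."""
    if len(packet) < 4:
        return False  # too short to carry a header
    magic, version, body_length = struct.unpack("!BBH", packet[:4])
    if magic != BABEL_MAGIC or version != supported_version:
        # Unknown Magic or Version: drop the whole packet silently,
        # so nothing inside it can be misinterpreted.
        return False
    return len(packet) >= 4 + body_length  # body must be fully present
```

A version 2 receiver given a version 3 packet (for example `bytes([42, 3, 0, 0])`) returns `False` and learns no routes from it. That is the fail-safe property: an old node sees nothing rather than something wrong.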

Moreover, the current revision of 6126bis has inherited the following text from RFC 7557: "The version number in the Babel header should only be increased if the new version is not backwards compatible with the original protocol."

Section 6 of draft-ietf-babel-source-specific states that this is exactly the case: the extension is not backwards compatible with RFC 6126.

Performing the migration within version 2, if done carefully, will indeed make it easier for the existing version 2 user base (provided its members are in contact and willing to coordinate). But the resulting two dialects of Babel version 2 will not be able to fail safe, and the potential for this to backfire will remain for years. This looks like gambling at somebody else's risk.
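The two-dialect problem can be illustrated with a toy model in Python. Several assumptions apply here. First, following RFC 7557, an RFC 6126 node skips data that trails a TLV's known fields. Second, 6126bis treats sub-TLV types 128 and above as mandatory, so an Update carrying an unknown mandatory sub-TLV is ignored entirely. Third, the source prefix travels in such a mandatory sub-TLV, assumed here to be type 128. `Update`, `rfc6126_receiver` and `rfc6126bis_receiver` are hypothetical names, and parsing is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

MANDATORY_MIN = 128  # assumption: 6126bis sub-TLV types >= 128 are mandatory
SOURCE_PREFIX = 128  # assumption: type number of the source prefix sub-TLV

# An installed route: (destination, source); source None = destination-only.
Route = Tuple[str, Optional[str]]


@dataclass
class Update:
    """Hypothetical, already-parsed Update TLV."""
    destination: str
    source: Optional[str] = None  # sent as a mandatory sub-TLV when set

    def sub_tlv_types(self) -> List[int]:
        return [SOURCE_PREFIX] if self.source is not None else []


def rfc6126_receiver(update: Update) -> Optional[Route]:
    # No notion of mandatory sub-TLVs: trailing data is skipped, so the
    # source restriction is silently lost and a plain route is installed.
    return (update.destination, None)


def rfc6126bis_receiver(update: Update, known_sub_tlvs: Set[int]) -> Optional[Route]:
    for t in update.sub_tlv_types():
        if t >= MANDATORY_MIN and t not in known_sub_tlvs:
            return None  # unknown mandatory sub-TLV: ignore the whole Update
    # Any source prefix that reaches this point was understood, so keep it.
    return (update.destination, update.source)
```

For `Update("::/0", "2001:db8:1::/48")`, a 6126bis node without the extension ignores the Update, and a 6126bis node with it installs the source-specific default route. An RFC 6126 node, however, installs an unrestricted default route, which is the failure that the mandatory bit cannot prevent because the old node does not know about it.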

So there is a problem, and there are two ways to deal with it. Simplified to make the difference easier to see:

* Make it easy in the short term for network protocol hackers and difficult in the long term for the general public.
* Make it difficult in the short term for network protocol hackers and easy in the long term for the general public.

Part of the problem is that 6126bis does not acknowledge the problem at all. At the very least, 6126bis must explain this protocol design choice, whatever it is. Beyond that, it would be reasonable to expect a Standards Track document to make a design choice that will hold up for years to come.

Until this issue with 6126bis is properly addressed, it is impossible to conclude whether draft-ietf-babel-source-specific (and, for that matter, 6126bis itself, which has been in WGLC since October 2017) is technically sound enough for publication.

Thank you for reading.

-- 
    Denis Ovsienko