Re: [nmrg] draft-irtf-nmrg-autonomic-network-definitions-01 feedback

Brian E Carpenter <brian.e.carpenter@gmail.com> Tue, 22 July 2014 11:29 UTC

Excuse front posting, but I would like to note that operators need
(and will I think insist on) an emergency override for a situation
where autonomic mechanisms go wrong and make mistakes. So in that
sense CLI will *always* have the ultimate power, and for any autonomic
node the most powerful CLI command might be 'disable autonomic'.

But even if CLI overrides everything else, it should certainly
be treated as the last resort.

Regards
   Brian
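
The precedence described above (the CLI always wins, and 'disable autonomic' is the strongest command) can be sketched as a small resolution function. This is only an illustration under stated assumptions: the function name, the parameters, and the idea of a per-setting fallback default are hypothetical and do not come from the draft or from any real device CLI.

```python
from typing import Optional, Tuple


def effective_value(
    intent_value: Optional[str],
    cli_value: Optional[str] = None,
    autonomic_enabled: bool = True,
    default: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Return (value, source) for one setting on one node.

    Assumed precedence, highest first:
      1. an explicit CLI override (the operator's last resort),
      2. the value derived from intent, if autonomic mode is enabled,
      3. a static default, used after a hypothetical 'disable autonomic'.
    """
    if cli_value is not None:
        return cli_value, "cli"
    if autonomic_enabled and intent_value is not None:
        return intent_value, "intent"
    return default, "default"


# Intent applies while nobody has overridden it.
print(effective_value("encrypted"))
# A CLI override takes priority over intent.
print(effective_value("encrypted", cli_value="plaintext"))
# With autonomic mode disabled, intent is ignored entirely.
print(effective_value("encrypted", autonomic_enabled=False, default="plaintext"))
```

Returning the source alongside the value is a deliberate choice in this sketch: it lets the node report which settings are currently held by manual override rather than by intent.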

On 22/07/2014 15:45, Laurent Ciavaglia wrote:
> Arran,
> 
> Please see in-line, marked [LC].
> 
> Kind regards, Laurent.
> 
> On 22/07/2014 02:02, Arran Cudbard-Bell wrote:
>> Hi Laurent,
>>
>>> I generally agree with your explanation (difference and co-existence
>>> between intent and configuration). This makes sense to me.
>>>
>>> However, I have a question/comment on: "a more specific guidance
>>> takes priority over a less specific one", why / how have we ended up
>>> that CLI shall/should take priority over an intent?
>>> I would say that in the general case, again this makes sense and is
>>> perfectly applicable. However, taking the analogy of aviation, there
>>> are rules/principles(/laws), when a plane is operating in auto-pilot,
>>> that forbid a human pilot to take an action/command that will "harm"
>>> the plane/flight (i.e. the human could make mistakes, not be sane...).
>> Interesting. Say an autonomic function was measuring two links under
>> LACP or ECMP, and could determine that manually
>> disabling one of the links would cause utilisation of the other link
>> to exceed or come very close to 100%, are you
>> saying the administrator should be prevented from disabling that link?
> 
> [LC]: yes, or at least the administrator should be warned that such a
> change would impact/has impacted the performance negatively.
> 
>>
>>> So why should a CLI override an intent? Or, stated differently, we
>>> should insert some principles/laws to drive the priority among the
>>> different configuration/intent/other interactions. (see also
>>> reference text below for more detail)
>> I think the difference may be that SNMP/CLI etc. are intended as
>> temporary overrides for maintenance purposes, and intent represents
>> the default state of the autonomic function, bent to the operator's
>> idea of what the network should look like? Also seeking clarification
>> from Michael.
> 
> [LC]: we have tried to develop this aspect with Michael f2f after I
> raised the point. The "temporary overrides" / "default state" you
> mention are key. If the network is performing as planned/expected, then
> there should be no reason to override the intent except for
> temporary/minor fixes/tuning.
> [LC]: the manual mode shall(/should?) always be possible and take
> priority over the automatic/autonomic one; however, the system may raise
> warnings if the manual changes negatively affect (or affect too large a
> proportion of) the network behaviors (wrt. the intent).
> [LC]: one example (from Michael): imagine you secure an entire area with
> encryption (intent), except one link to a traffic analyzer you wish to
> keep w/o encryption. This manual override is ok. But if you end up
> leaving 20-30% of the links unsecured through manual changes, then the
> system should be able to warn the administrator that the state of the
> network is no longer "close" to/in accordance with the (initial) intent,
> even if there might have been good reasons to make these manual changes.
> 
>>
>>> The autopilot is designed to fly the aircraft within the normal
>>> flight envelope
>>>
>>> The management of the network is autonomic unless there are severe
>>> problems
>>>
>>> The autopilot automatically disengages if the aircraft flies
>>> significantly outside the normal flight envelope limits
>>>
>>> The automatic management of the network is automatically disengaged
>>> if the key performance indicators significantly differ from normal
>>> (good) values
>> The cases where this would be useful are likely to be small.
>>
>> From a security standpoint, being able to force the network to apply
>> different controls by triggering overload
>> or manipulating performance metrics is undesirable.
> 
> [LC]: true. however, I think this is equally valid for an autonomic
> network entity or a human operator... right?
> [LC]: the key point resides in how to make/ensure the system is/remains
> robust, stable, predictable when confronted with adverse/tough conditions
> (could be attacks, overloads, loss of/no access to information/sensing...).
> 
>> Disabling ECMP-like functions, loop prevention, route distribution,
>> link aggregation, multicast propagation/group
>> control etc. would also likely have a negative impact on network
>> performance.
>>
>> There's more use in determining why the gauge is now showing an
>> abnormal value than disabling autonomic control and
>> hoping the network behaves in a sane way.
> 
> [LC]: my understanding is the disabling of the automatic mode is
> triggered under the assumption that, in the particular situation faced
> by the system, a more capable entity (e.g. the human
> pilot/administrator) would take over and avoid the crash.
> 
>>
>> -Arran
>>
> 
> 
>
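
The encryption example in the quoted discussion (one unencrypted link to a traffic analyzer is fine, but 20-30% of links overridden deserves a warning) can be sketched as a simple comparison of intended versus actual per-link state. This is a minimal sketch, not anything specified by the draft: the function name, the dictionary representation of link state, and the 0.2 warning threshold are all assumptions chosen for illustration.

```python
from typing import Dict, Tuple


def override_report(
    intended: Dict[str, str],
    actual: Dict[str, str],
    warn_fraction: float = 0.2,  # assumed threshold, taken from the "20-30%" remark
) -> Tuple[float, bool]:
    """Return (fraction of links deviating from intent, whether to warn).

    `intended` maps link name -> state required by the intent.
    `actual` maps link name -> state after manual (CLI) changes; a link
    missing from `actual` is assumed to still follow the intent.
    """
    if not intended:
        return 0.0, False
    deviating = [
        link for link, want in intended.items()
        if actual.get(link, want) != want
    ]
    fraction = len(deviating) / len(intended)
    return fraction, fraction >= warn_fraction


# Ten links, all meant to be encrypted by the intent.
intent = {f"link{i}": "encrypted" for i in range(10)}

# One manual exception (the traffic analyzer link): no warning expected.
print(override_report(intent, {"link0": "plaintext"}))

# Three manual exceptions: the network has drifted from the intent.
print(override_report(intent, {f"link{i}": "plaintext" for i in range(3)}))
```

Note that the sketch only warns and never blocks the change, which matches the position in the thread that manual mode keeps priority while the system reports drift from the intent.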