Re: [core] Your cool presentation on friday core meeting

Alexander Pelov <alexander.pelov@telecom-bretagne.eu> Thu, 19 November 2015 08:22 UTC


Dear Peter,

Once again, sorry for the brevity.

Important remarks to keep in mind:
1) there is a seamless way in which you can include hashes within a Structured ID (hashes are a subset of structured IDs).
2) hashes are incompressible. Structured IDs are well suited to compression.
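
To make remark 2) concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the structured IDs use a hypothetical layout (one base number per module, consecutive numbers for its items) that is not taken from any draft, the hash is SHA-256 truncated to 30 bits as a stand-in for the real hash function, and the names and helper functions are made up for the example. It compares how many bytes a delta-encoded list of identifiers needs in each case.

```python
import hashlib

def varint_len(value: int) -> int:
    """Bytes needed for a base-128 varint (7 payload bits per byte)."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length

def delta_encoded_size(ids) -> int:
    """Total bytes when sorted IDs are sent as a first value plus deltas."""
    ordered = sorted(ids)
    deltas = [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(varint_len(d) for d in deltas)

def pseudo_hash(name: str, bits: int = 30) -> int:
    """Stand-in hash (assumption): top `bits` bits of SHA-256."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)

# Hypothetical module with 50 items.
names = [f"example-module:leaf-{i}" for i in range(50)]
structured = [1_000_000 + i for i in range(50)]  # assumed base + offset layout
hashed = [pseudo_hash(n) for n in names]

print("structured:", delta_encoded_size(structured), "bytes")
print("hashed:    ", delta_encoded_size(hashed), "bytes")
```

Consecutive structured IDs produce deltas of 1, so each item costs a single byte after the first value, while hash values are spread over the whole 30-bit space and their deltas stay large.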

As we are working on converging on the CoOL and CoMI approaches, it seems to me that 1) and 2) allow for a graceful cooperation in this area.

See inline.

> On 19 Nov 2015, at 08:43, peter van der Stok <stokcons@xs4all.nl> wrote:
> 
> Hi Alexander
> 
> see between <pvds> and </pvds>
> 
> Alexander Pelov wrote on 2015-11-18 15:23:
>> Hi Peter,
>> I'm quite busy these days. I would like to send you a detailed
>> response, but unfortunately this will be impossible today.
>> Please, see in-line for some initial remarks.
>> On 18/11/2015 09:09, peter van der Stok wrote:
>>> Hi Alexander,
>>> I would like to react to your presentation of CoMI during the Friday CoRE meeting. That is necessary for me to understand the underlying factual discussion.
>>> Your statement “A hash clash 5 years down the road can break your network” needs some clarification. I interpret that, out of the blue, a hash clash can occur and lead to problems in the network. This is simply not true. A new hash clash may occur when modules are changed and recompiled in a server. At that moment and before the server is made operational hash clashes are detected and remedial actions can be taken.
>> One of the specific cases I cited is when you have your network
>> running for months, maybe several years, before updating a YANG
>> module, which leads to a hash clash. This leads to unexpected protocol
>> exchanges (e.g. clash-file loading), unexpected memory allocations
>> (each client must learn that some of its servers have these new
>> clashes, while others have some other clashes, and so forth), bugs
>> in code paths that have not been tested for a long time, ...
> 
> <pvds>
> I do not share your concern. Hashes are generated before loading of the module, hash clashes are easily verified, and rehashes are also done before loading of the module.
> The rehash protocol concerns one client and one server and can be tested in isolation.
> Hash clashes are easily simulated.
> </pvds>

As indicated, the problem is not that your resource-unconstrained, human-operated compiler will not work flawlessly. It certainly will! 

The problem is that you have hundreds of devices, coming from different manufacturers, working in a given manner, until one day you update some of them. That update provokes all these previously unseen exchanges (rehashing) and CAN break your network 5 years down the road, which is the statement from my presentation.

The biggest problem is actually thing-to-thing communication. With hashes you have no rendezvous points within the hashed space.


>>> Your next statement “hashes are underspecified” also needs some clarification. I understood from you that the remark was motivated by the absence of a foolproof process defining the solution of hash clashes. Such a process is not necessary to assure interoperability. Once solved, the server returns the unique rehashed values and there is no need to specify how the rehash values are reached. Nevertheless, the draft suggests that a tilde is prefixed to the YANG name, after which it is rehashed. There is a very small probability that the new hash clashes with another one. Actually, you said you were afraid that the hashing algorithm may continuously generate the same set of clashing values independent of the prefix. I have no idea if this is true, and do not propose to check it for murmur.
>> I've thought about how to handle this specific question for a long time,
>> and unfortunately you cannot guarantee that re-hashing will not
>> generate a new hash clash, which could, in turn, generate new clashes.
>> Each clash leads to two new names that need to be rehashed.
>>> I agree that for efficiency reasons the same rehash process should be followed in all servers such that all servers with the same set of module versions arrive at the same rehash values. An approach different from rehashing is to assign the lowest unassigned natural number to the clashing names in lexicographical order. That will generally mean that two colliding names get the values 1 and 2 assigned, and very rarely the values 3 and 4 may be used.
>> I think that Structured IDs provide a way of handling this in a
>> reasonable manner. We've already discussed how in a structured ID you
>> can have hashes.
> <pvds>
> I showed that the problem above (continuous generation of hashes) can be easily avoided.

The problem is not the generation of hash-free clash-files. I am more than persuaded that you can do this.

The problem is that the collision probability increases.
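
For readers who want to experiment with the rehash cascade discussed above, here is a minimal sketch in Python. It is an illustration under assumptions, not the procedure of the CoMI draft: SHA-256 truncated to `bits` bits stands in for murmur, every name involved in a clash is rehashed with a "~" prefix, and the function names, the example names and the `max_rounds` guard are hypothetical.

```python
import hashlib
from collections import defaultdict

def h(key: str, bits: int) -> int:
    """Stand-in hash (assumption, NOT murmur): top `bits` bits of SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)

def assign_hashes(names, bits=30, max_rounds=32):
    """Return ({name: hash}, rounds). Clashing names are rehashed with a '~' prefix."""
    keys = {name: name for name in names}  # name -> string currently being hashed
    for rounds in range(max_rounds + 1):
        buckets = defaultdict(list)
        for name, key in keys.items():
            buckets[h(key, bits)].append(name)
        # Every name that shares a hash value with another name must be rehashed.
        clashing = [n for group in buckets.values() if len(group) > 1 for n in group]
        if not clashing:
            return {name: h(key, bits) for name, key in keys.items()}, rounds
        for name in clashing:
            keys[name] = "~" + keys[name]
    raise RuntimeError("no clash-free assignment found within max_rounds")

# A deliberately small hash space (12 bits) makes clashes likely enough to observe.
example_names = [f"example-module:leaf-{i}" for i in range(60)]
table, rounds = assign_hashes(example_names, bits=12)
print("rehash rounds needed:", rounds)
```

Each extra round means at least two names were rehashed into an already populated space, which is where a second-order clash can come from. The sketch terminates only because of the round limit; nothing in the hashing itself guarantees it.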

> </pvds>
>>> Lastly, I should like to make a remark about probabilities. The whole world around us is based on probabilities. For example, there is a finite probability that a fatal fault will occur in an airplane during one hour of flight. Or that during digital transmission bits are toggled without detection by the checksum. These probabilities are calculated and should be smaller than a given probability value. This is a well-established engineering practice. Therefore, the clash probabilities are calculated in appendix E of the CoMI draft. It shows that for the targeted hash size and number of names, the probability that more than one clash occurs is 10^-3 smaller than the probability of one clash. These are quite small values.
>> What constitutes a "small probability" is a relative question. 10^-3
>> is typically considered a quite elevated (not to say unacceptably
>> high) collision probability. Given that you can have hundreds of
>> entries, this probability is even worse. Given that each collision
>> requires re-hashing, which could provoke other collisions, we end up
>> with a problem I consider to be quite dangerous.
> <pvds>
> Appendix E cites different numbers than you do above.
> For a server with 1000 hashed YANG names and a hash of 30 bits, the probability of a single clash in a given server is 5*10^-4.
> That means that in an installation with 2000 servers containing disjoint sets of 1000 YANG names (total of 2 million YANG names), there is on average one server with one clash.
> There being more than one clash has a probability that is 10^-3 lower.
> In my opinion these are acceptably low probabilities, because they promise quite small rehash information in the clients (In this case: no rehash or one server with one clash)
> BTW Andy has hashed all YANG names in all modules known to him and no clash occurred.

As I am quite lazy, I like using tools: http://davidjohnstone.net/pages/hash-collision-probability
With a 30-bit hash and 20 servers holding 1000 YANG names each, the probability of a collision is 17%. I was actually using your numbers, and I do not consider this a major problem by itself. The implication is that you WILL have hash clashes.
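
The figures in this exchange can be reproduced without the web tool. The following is a minimal sketch in Python, assuming uniformly distributed hash values and the usual birthday approximation; the function name is mine and is not taken from the CoMI draft.

```python
import math

def clash_probability(names: int, bits: int) -> float:
    """P(at least one clash) among `names` uniform hash values of `bits` bits."""
    space = 2 ** bits
    # Birthday approximation: 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-names * (names - 1) / (2 * space))

one_server = clash_probability(1000, 30)           # 1000 names on a single server
twenty_servers = clash_probability(20 * 1000, 30)  # 20 servers, names pooled

print(f"one server:     {one_server:.2e}")
print(f"twenty servers: {twenty_servers:.1%}")
```

Under these assumptions a single server with 1000 names comes out at roughly 4.7*10^-4, close to the 5*10^-4 cited from Appendix E, and 20,000 pooled names come out at about 17%, in line with the tool.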

> </pvds>
> 
>>> By the way, by relying on identifier assignment, there is also a finite probability that the same identifier is allocated to names on different modules (a hidden clash), due to power failures, undetected transmission errors, or simply copying mistakes.
>>> I recommend that in the security section of the identifier assignment draft, it is discussed how modules are detected with an identifier that has been assigned without going through the registration process.
>> That's a non-issue.
> <pvds>
> I regret that you take this issue so lightly.
> When there is no SDO that makes YANG name number allocation part of the certification process, YANG module writers are not very motivated to pass through a registry and assign the registered numbers.
> Consequently, servers may use identical numbers for different YANG objects.
> This means that the extension of your installation with a server runs the risk of duplicate identifiers.
> You need a procedure to test this, or at least write text that this is a reasonable hazard.

Too long to develop here, but the crux of the matter is that the way YANG modules are used in constrained devices should influence the system design. The main points are:
1) we need 100% RESTCONF / YANG compatibility
2) we don’t need 100% compression of all YANG modules (most actually do not concern constrained networks)
3) people / SDOs / system integrators that want to run a given YANG module in a constrained network are quite interested in having their identifiers compressed

The reason I claim this is a non-issue is because there are many ways in which we can guarantee that identifiers are not assigned to different SDOs / YANG modules / etc. etc. 

> </pvds>
>>> Hope this will stimulate further discussion.
>>> Peter
>> Thanks for the useful remarks. I am glad when we have a constructive
>> discussion. Sorry for the brevity.
>> Best,
>> Alexander


To conclude, I think that we’ve initiated a very important process of splitting the CoMI/CoOL proposals and converging on a common ground (which is actually quite straightforward). Some details must be clarified, of course, but in general I don’t see any major issues.

The only real issues/questions will actually be solved by simply considering:
1) the constraints of constrained networks and devices (the REASON why there is CoMI/CoOL)
2) the way people use YANG/RESTCONF

Best,
Alexander


