Re: [core] YANG notification within CoMI

Michel Veillette <Michel.Veillette@trilliant.com> Tue, 12 June 2018 14:55 UTC

From: Michel Veillette <Michel.Veillette@trilliant.com>
To: Carsten Bormann <cabo@tzi.org>
CC: Peter van der Stok <stokcons@bbhmail.nl>, Andy Bierman <andy@yumaworks.com>, Alexander Pelov <a@ackl.io>, "Eric Voit (evoit)" <evoit@cisco.com>, Henk Birkholz <henk.birkholz@sit.fraunhofer.de>, "yot@ietf.org" <yot@ietf.org>, Core <core@ietf.org>
Date: Tue, 12 Jun 2018 14:54:57 +0000
Message-ID: <DM5PR06MB2777F7E160141E68194E0B7D9A7F0@DM5PR06MB2777.namprd06.prod.outlook.com>
References: <DM5PR06MB2777CAB016D2789C4F1DD67F9A7B0@DM5PR06MB2777.namprd06.prod.outlook.com> <836A7F59-C26B-4A6C-B9DE-331E9D1CB123@tzi.org>
In-Reply-To: <836A7F59-C26B-4A6C-B9DE-331E9D1CB123@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/G7YgyzbX9ka6Bcf-6MDP7nevJF4>
Subject: Re: [core] YANG notification within CoMI

Hi Carsten,

Thanks for sharing your points of view on this subject.

I'll first validate whether an anycast address can effectively resolve my first issue.
I need to verify whether I can configure the infrastructure to use an anycast source address for my CoAP traffic.

The CoAP library already supports sharing of DTLS state.
This feature at least enables a solution based on a POST from the CoMI server.
I assume that the same technique is possible for the observe state, but this remains to be validated.

I hope to get these answers prior to the next IETF in Montreal.

Regards,
Michel
 
-----Original Message-----
From: Carsten Bormann [mailto:cabo@tzi.org] 
Sent: Tuesday, June 12, 2018 5:48 AM
To: Michel Veillette <Michel.Veillette@trilliant.com>
Cc: Peter van der Stok <stokcons@bbhmail.nl>; Andy Bierman <andy@yumaworks.com>; Alexander Pelov <a@ackl.io>; Eric Voit (evoit) <evoit@cisco.com>; Henk Birkholz <henk.birkholz@sit.fraunhofer.de>; yot@ietf.org; Core <core@ietf.org>
Subject: Re: [core] YANG notification within CoMI

Hi Michel,

sorry for being slow to answer; I finally had a nice discussion with Henk today about your concerns that I want to summarize.

Let me propose some terminology first:  “servers” are a CoAP term, so when we talk about “heavy boxes with noisy fans installed into a rack”, I’ll use “machines” (as opposed to “devices” that are the light bulbs being managed).  Not great terminology, but better than using the term “server” for both.

You write about the observe-based notifications proposed for COMI:

> 1) This approach is incompatible with load balancers: notifications are returned directly to the specific server within the cluster which has initiated the observe request.

Whether that is true depends a lot on what kind of load balancer you are thinking about.
For UDP CoAP, an anycast mechanism is the obvious choice for a load balancer.

Clearly, for any load balancing mechanism to be useful, the machines sharing the load need to share state (symbolized by “database” in your diagram).
If they are clients, they need to share communication state as well as application state (e.g., addresses of servers they talk to, the tokens they use for outgoing requests).  They don’t need to do this for every single transaction (they don’t if there is no problem in an occasional loss), but they do have to share observe state.
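As a rough illustration of the shared observe state described above, here is a minimal Python sketch. Every name in it (ObserveRecord, SharedObserveStore, Machine, the key layout) is hypothetical, and a plain dict stands in for whatever shared database the machines would really use; none of it is taken from an existing CoAP library.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical key: (device address, device port, token). A CoAP client
# matches a notification to its observe request by remote endpoint and token.
ObserveKey = Tuple[str, int, bytes]


@dataclass(frozen=True)
class ObserveRecord:
    """Communication state needed to accept a notification for one observation."""
    device_addr: str
    device_port: int
    token: bytes
    resource_path: str


class SharedObserveStore:
    """In-memory stand-in for a store shared by all machines in the pool."""

    def __init__(self) -> None:
        self._records: Dict[ObserveKey, ObserveRecord] = {}

    def register(self, record: ObserveRecord) -> None:
        key = (record.device_addr, record.device_port, record.token)
        self._records[key] = record

    def match(self, addr: str, port: int, token: bytes) -> Optional[ObserveRecord]:
        return self._records.get((addr, port, token))


class Machine:
    """One client machine behind the (assumed) anycast address."""

    def __init__(self, name: str, store: SharedObserveStore) -> None:
        self.name = name
        self.store = store

    def start_observe(self, addr: str, port: int, token: bytes, path: str) -> None:
        # Record the observation where every other machine can find it.
        self.store.register(ObserveRecord(addr, port, token, path))

    def on_notification(self, addr: str, port: int, token: bytes) -> Optional[str]:
        # Any machine can resolve the token, not only the one that sent the request.
        record = self.store.match(addr, port, token)
        return record.resource_path if record else None
```

With this layout, a notification that anycast routing delivers to a different machine than the one that registered the observation can still be matched, because the lookup goes through the shared store rather than per-machine memory.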

When people talk about load balancers and resilience, they sometimes mean that the peer with initiative (here: the CoAP server sending another notification) needs to perform a full rendezvous with the other side (e.g., in the Big Web, a fresh DNS lookup that might lead to a completely different machine, potentially a new TLS session).  See (3) below for how we see this, for now let’s just say that we are trying to achieve generally rendezvous-free notifications.

> 2) Typical CoAP implementations (e.g. Californium) don't support the persistence of the observe context. These contexts can't be recovered after a server restart and can't be shared between servers.

That is indeed a problem.  But the real problem is that they are not ready for sharing their client state at all.  Once they are, adding persistence to that sharing becomes a rather straightforward exercise.
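To make the "persistence on top of sharing" point concrete, here is a small Python sketch that serializes observe contexts to JSON and restores them. The ObserveContext fields and the function names are assumptions chosen for illustration; a real implementation would persist whatever its CoAP stack actually keeps per observation.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class ObserveContext:
    """Hypothetical per-observation state worth surviving a restart."""
    device_addr: str
    device_port: int
    token_hex: str  # token stored as hex text, since JSON cannot carry raw bytes
    resource_path: str


def dump_contexts(contexts: List[ObserveContext]) -> str:
    """Serialize the contexts to a JSON string suitable for a file or database."""
    return json.dumps([asdict(c) for c in contexts], sort_keys=True)


def load_contexts(text: str) -> List[ObserveContext]:
    """Rebuild the contexts after a restart from the stored JSON string."""
    return [ObserveContext(**item) for item in json.loads(text)]
```

Once the state is already in a serializable form for sharing, writing the same representation to durable storage is a small additional step.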

So what I’m reading out of your message is that, to employ load balancing for COMI notifications, CoAP implementations need to grow support for sharing communication state (preferably with a persistent mode).  That is an important message to implementers, and thank you for highlighting this.

> 3) Registrations to event streams are not resilient; they can terminate unexpectedly upon transmission error or reception of a Reset message.

Now how does the resilience come in?

In the Big Web situation mentioned above, it comes from redoing the rendezvous each time a notification is needed (potentially with some caching, both of DNS state [leading to defined periods of blackholing] and of any connections still open [which will hopefully time out if there is a problem]).

In a rendezvous-free world, we have to do this explicitly.  For a CoAP server that cares about delivering the notifications, it will send (at least some of) the notifications as confirmable messages [it actually has to, once every 24 hours, but can do that more often if resilience calls for it].  So it will notice when the recipient of the notifications is no longer there.  [It will also notice if the recipient is confused enough to send a Reset, but persistence of communication state is supposed to make this a non-event.]  RFC 7641 tells us that the CoAP server is to cease delivering notifications when the client machine goes away.  It doesn’t tell us what else the implementation might want to do in that event.  In a COMI call home scenario, I would expect the device to notice that its relationship to home broke and redo the rendezvous (call home) — once, when needed, not for every transaction (mitigated only by caching).  
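The device-side behaviour described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the Observer fields, the Delivery enum, and the call_home_needed flag are hypothetical names, and the actual call-home procedure is left out. The 24-hour bound is the one from RFC 7641; a smaller con_interval can be passed if resilience calls for it.

```python
from dataclasses import dataclass
from enum import Enum

# RFC 7641: at least one notification per 24 hours must be confirmable.
CON_INTERVAL_SECONDS = 24 * 60 * 60


class Delivery(Enum):
    ACKED = "acked"      # confirmable notification was acknowledged
    TIMEOUT = "timeout"  # confirmable notification was never acknowledged
    RESET = "reset"      # peer answered with a Reset message


@dataclass
class Observer:
    last_con_time: float          # when the last confirmable notification was sent
    active: bool = True           # still on the list of observers
    call_home_needed: bool = False


def choose_message_type(observer: Observer, now: float,
                        con_interval: float = CON_INTERVAL_SECONDS) -> str:
    """Return "CON" when a confirmable notification is due, otherwise "NON"."""
    if now - observer.last_con_time >= con_interval:
        return "CON"
    return "NON"


def handle_delivery_result(observer: Observer, result: Delivery) -> None:
    """Update the observer after a notification attempt."""
    if result is Delivery.ACKED:
        return
    # Timeout or Reset: stop notifying this observer, and flag a single
    # call-home so the rendezvous is redone once rather than per transaction.
    observer.active = False
    observer.call_home_needed = True
```

The point of the flag is that the rendezvous is repeated only when the relationship is known to be broken, instead of before every notification.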

So, in effect, we can have all the advantages of the “do the rendezvous always, with caching” world with much less black-holing and unnecessary message overhead.

For the above to be actionable, we do have to have implementations that:
- on the big machines, can share enough communication state so they can take part in anycast-based load balancing,
- on the small devices, can react to loss of an observation interest by redoing a call-home transaction.

The second one clearly is an implementation quality requirement.  Let’s work on that with the implementers.
The first could be thought to call for a protocol for coordinating the machines.  The IETF has not been very successful in establishing “reliable server pooling”.  I would actually expect implementations that want to provide that coordination to come with their own high-performance mechanisms, involving the usual state sharing databases such as Redis — they already have to do this with the management (application) state shared between the machines.

Grüße, Carsten