Re: [Netconf] Is there a problem with confirmed commits?

Robert Wilton <rwilton@cisco.com> Mon, 14 January 2019 15:49 UTC

To: "netconf@ietf.org" <netconf@ietf.org>
References: <em106ef27b-c989-4e0b-b819-413fef852d53@morpheus> <20190114135056.t6sow7dbcyow6qcn@anna.jacobs.jacobs-university.de> <em5dfb175c-7835-43eb-a767-38e270601427@morpheus> <20190114154026.tbevjbcdn3oh34uz@anna.jacobs.jacobs-university.de>
From: Robert Wilton <rwilton@cisco.com>
Message-ID: <2492d27d-d64f-58bd-6006-2b10128f2813@cisco.com>
Date: Mon, 14 Jan 2019 15:48:58 +0000
In-Reply-To: <20190114154026.tbevjbcdn3oh34uz@anna.jacobs.jacobs-university.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netconf/cVXpOuG6CkKC18-A9-9L9N3GtNs>
Subject: Re: [Netconf] Is there a problem with confirmed commits?

Hi Juergen,

On 14/01/2019 15:40, Juergen Schoenwaelder wrote:
> It seems the <candidate> datastore should not be allowed to be used as
> long as a persistent confirmed commit is still ongoing. I leave it to
> Martin to check whether this is said somewhere or an omission.
>
> In general, an application can't assume that <candidate> contains
> anything sensible. Hence, the proper way is to lock <candidate> and
> then to make sure it contains something sensible, i.e., issuing a
> discard_changes.

But the text that you quote below states that a client cannot acquire a
lock on <candidate> if it contains any uncommitted changes.  Doesn't
this imply that a discard_changes after acquiring the lock should be
unnecessary?
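
To make the question concrete, here is a minimal sketch of the lock rule
quoted below. It is a toy in-memory model, not any real NETCONF library:
ToyServer, LockDenied and the "mtu" leaf are hypothetical names, and
treating "modified" as "<candidate> differs from <running>" is an
assumption made only for illustration.

```python
class LockDenied(Exception):
    """Raised when the toy server refuses a lock request."""


class ToyServer:
    """Hypothetical in-memory model of <running> and <candidate>."""

    def __init__(self, running):
        self.running = dict(running)
        self.candidate = dict(running)
        self.candidate_lock_owner = None

    def edit_candidate(self, key, value):
        self.candidate[key] = value

    def lock_candidate(self, session):
        if self.candidate_lock_owner is not None:
            raise LockDenied("candidate is already locked")
        # Quoted rule: no lock while <candidate> holds changes that have
        # been neither committed nor rolled back (modelled as a diff).
        if self.candidate != self.running:
            raise LockDenied("candidate has uncommitted changes")
        self.candidate_lock_owner = session

    def discard_changes(self):
        self.candidate = dict(self.running)


server = ToyServer({"mtu": 1500})
server.edit_candidate("mtu", 9000)       # someone leaves edits behind

try:
    server.lock_candidate("session-2")
    lock_refused = False
except LockDenied:
    lock_refused = True                  # dirty candidate: lock refused

server.discard_changes()                 # clean up first ...
server.lock_candidate("session-2")       # ... then the lock is granted

before = dict(server.candidate)
server.discard_changes()                 # discard after the lock
unchanged = (before == server.candidate)
```

In this model a discard_changes issued after a successful lock never
changes anything, because the lock is only granted when <candidate>
already matches <running>.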

Thanks,
Rob


>   And I think implementations should not allow an
> application to obtain a lock on <candidate> while a commit is active.
> The text on page 45 already says:
>
>        A lock MUST NOT be granted if any of the following conditions is
>        true:
>
>        [...]
>
>        *  The target configuration is <candidate>, it has already been
>           modified, and these changes have not been committed or rolled
>           back.
>
> I think this covers the case of an ongoing but not completed
> persistent confirmed commit, no?
>
> /js
>
> On Mon, Jan 14, 2019 at 03:14:02PM +0000, Jonathan Hansford wrote:
>> If a persistent confirmed commit has not timed out, the running
>> configuration datastore will be the same as the candidate and
>> <discard-changes> won't change its contents. Any edit of candidate will be
>> based on the configuration resulting from the persistent confirmed commit.
>>
>> If the persistent confirmed commit has timed out, the running configuration
>> datastore will have reverted and <discard-changes> will change candidate.
>> Any edit of candidate in this case will be based on the configuration prior
>> to the start of the persistent confirmed commit.
>>
>> ------ Original Message ------
>> From: "Juergen Schoenwaelder" <j.schoenwaelder@jacobs-university.de>
>> To: "Jonathan Hansford" <jonathan@hansfords.net>
>> Cc: "netconf@ietf.org" <netconf@ietf.org>
>> Sent: 14/01/2019 13:50:56
>> Subject: Re: [Netconf] Is there a problem with confirmed commits?
>>
>>> Hi,
>>>
>>> I have not yet understood where you see a problem. In general,
>>> <candidate/> contains arbitrary stuff and hence it is the client's
>>> responsibility to clear any arbitrary stuff found in <candidate/>
>>> after obtaining a lock. It does not really matter whether there has
>>> been a failed confirmed commit before or something else. I think the
>>> general safe pattern is:
>>>
>>> lock(candidate)
>>> discard_changes()
>>> push_whatever_needed()
>>> commit()
>>> unlock(candidate)
>>>
>>> If you do a confirmed commit and the session disappears, then the lock
>>> will disappear as well. But I do not think this creates a race
>>> condition, or I am just not yet seeing it. Perhaps it helps to write
>>> down the sequence of actions that leads to a race.
>>>
>>> /js
>>>
>>> On Mon, Jan 14, 2019 at 12:50:38PM +0000, Jonathan Hansford wrote:
>>>>   Hi,
>>>>
>>>>   No one seems to be responding to my email and proposed erratum around
>>>>   the subject of confirmed commits (apart from Martin), but I would really
>>>>   like to know if I am missing something here. As far as I can tell,
>>>>   session termination during a confirmed commit leads to unpredictable
>>>>   behaviour and I would like to know whether anyone is using confirmed
>>>>   commits and how (if at all) they address the issues outlined below. My
>>>>   assumptions are that locks are used and :writable-running is not
>>>>   supported.
>>>>
>>>>   If the <candidate> and <running> configuration datastores are locked to
>>>>   prevent concurrent access, and a confirmed commit sequence is
>>>>   interrupted by the session terminating, the locks will automatically be
>>>>   released but the server MUST NOT accept a lock on <running> from any
>>>>   session if another session has an ongoing confirmed <commit>.
>>>>   Consequently, after session termination no client can acquire a <lock>
>>>>   on <running>, not even the one that initiated the confirmed <commit>,
>>>>   until after the confirmed <commit> has timed out. However, if the
>>>>   confirmed <commit> included the <persist> parameter, the original client
>>>>   could still issue a <commit> using the persist-id to complete the
>>>>   sequence prior to the timeout, even without a lock.
>>>>
>>>>   Of course, the problem now is the race for the new lock on <candidate>.
>>>>   If the original client is successful then all is good. But if a new
>>>>   client locks <candidate> before the timeout on the confirmed commit,
>>>>   whether or not they precede <lock> with <discard-changes>, <candidate>
>>>>   will be the same as <running> and the new client will pick up everything
>>>>   from the previous session. However, the client won’t be able to lock
>>>>   <running> until after the timeout, at which point <running> reverts but
>>>>   <candidate> still represents the previous session. If the client tries
>>>>   to lock <candidate> after the timeout, <running> will have reverted and
>>>>   the lock will only be granted after a <discard-changes> which will cause
>>>>   the <candidate> to revert. So, depending on when the lock on <candidate>
>>>>   occurs relative to the confirmed commit timeout, the client could be
>>>>   editing <candidate> in one of two states. Further, before the timeout on
>>>>   the confirmed commit, even if the new client has locked candidate, the
>>>>   original client could still issue a confirming commit (they don’t need a
>>>>   lock on <candidate> to do so) which would persistently commit any edits
>>>>   made by the new client. NOTE: it is not the use of the persist-id that
>>>>   introduces this behaviour; a new client would have the same problem even
>>>>   if a confirmed commit was not intended to persist beyond a session
>>>>   termination.
>>>>
>>>>   If the server also supports the :startup capability then, if the session
>>>>   termination was due to the server rebooting, the behaviour above would
>>>>   be further complicated by <running> now containing the configuration
>>>>   from the <startup> configuration datastore.
>>>>
>>>>   Am I right?
>>>>
>>>>   Jonathan
>>>>
>>>
>>> --
>>> Juergen Schoenwaelder           Jacobs University Bremen gGmbH
>>> Phone: +49 421 200 3587         Campus Ring 1 | 28759 Bremen | Germany
>>> Fax:   +49 421 200 3103         <https://www.jacobs-university.de/>
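
The two outcomes described earlier in the thread (a new client starting
before versus after the confirmed-commit timeout) can be sketched with a
toy model. This is only an illustration under stated assumptions:
ToyServer, baseline_for_new_client and the "mtu" leaf are hypothetical,
locks are omitted, and the timeout is modelled as reverting <running>
while leaving <candidate> untouched, as the thread describes.

```python
class ToyServer:
    """Hypothetical model of a confirmed commit; not a real NETCONF API."""

    def __init__(self, running):
        self.running = dict(running)
        self.candidate = dict(running)
        self.rollback = None  # old <running>, kept while a commit is pending

    def confirmed_commit(self):
        # Copy <candidate> into <running>, remembering the old <running>.
        self.rollback = dict(self.running)
        self.running = dict(self.candidate)

    def timeout(self):
        # No confirming commit arrived: <running> reverts and
        # <candidate> is left as it was.
        if self.rollback is not None:
            self.running = self.rollback
            self.rollback = None

    def discard_changes(self):
        self.candidate = dict(self.running)


def baseline_for_new_client(starts_before_timeout):
    """Return the <candidate> a new client ends up editing."""
    server = ToyServer({"mtu": 1500})
    server.candidate["mtu"] = 9000
    server.confirmed_commit()        # original session then terminates
    if not starts_before_timeout:
        server.timeout()
    server.discard_changes()         # new client resets <candidate>
    return server.candidate


print(baseline_for_new_client(True))    # {'mtu': 9000}
print(baseline_for_new_client(False))   # {'mtu': 1500}
```

The same client procedure yields a different starting configuration
depending only on when it runs relative to the timeout, which is the
ambiguity the original message asks about.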