Re: [Dime] Question regarding RFC 6733 on transport end disconnections

Ivan Skytte Jørgensen <isj-dime@i1.dk> Mon, 22 October 2018 14:01 UTC

Return-Path: <isj-dime@i1.dk>
X-Original-To: dime@ietfa.amsl.com
Delivered-To: dime@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id BA8E7128BCC for <dime@ietfa.amsl.com>; Mon, 22 Oct 2018 07:01:51 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 1.998
X-Spam-Level: *
X-Spam-Status: No, score=1.998 tagged_above=-999 required=5 tests=[KHOP_DYNAMIC=1.999, SPF_PASS=-0.001] autolearn=no autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id hi8DrouslZkl for <dime@ietfa.amsl.com>; Mon, 22 Oct 2018 07:01:50 -0700 (PDT)
Received: from i1.dk (55e9f507.rev.dansknet.dk [85.233.245.7]) by ietfa.amsl.com (Postfix) with ESMTP id 644D2130F7F for <dime@ietf.org>; Mon, 22 Oct 2018 07:01:19 -0700 (PDT)
Received: from i1.dk (localhost [127.0.0.1]) by i1.dk (Postfix) with ESMTP id DD9C01A400CB for <dime@ietf.org>; Mon, 22 Oct 2018 14:01:17 +0000 (UTC)
Received: from isjsys.localnet (unknown [10.0.0.2]) by i1.dk (Postfix) with ESMTPA for <dime@ietf.org>; Mon, 22 Oct 2018 14:01:17 +0000 (UTC)
From: Ivan Skytte =?ISO-8859-1?Q?J=F8rgensen?= <isj-dime@i1.dk>
To: dime@ietf.org
Date: Mon, 22 Oct 2018 16:01:17 +0200
Message-ID: <2055975.PXQd7Ydikz@isjsys>
User-Agent: KMail/4.11.5 (Linux/3.11.10-29-desktop; KDE/4.11.5; x86_64; ; )
In-Reply-To: <AM4PR07MB33135B773985A79741E8D2CCD1F80@AM4PR07MB3313.eurprd07.prod.outlook.com>
References: <AM4PR07MB33135B773985A79741E8D2CCD1F80@AM4PR07MB3313.eurprd07.prod.outlook.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
Archived-At: <https://mailarchive.ietf.org/arch/msg/dime/VnApzq_wsMs7akLqFg7cLKSRsm4>
Subject: Re: [Dime] Question regarding RFC 6733 on transport end disconnections
X-BeenThere: dime@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Diameter Maintanence and Extentions Working Group <dime.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dime>, <mailto:dime-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/dime/>
List-Post: <mailto:dime@ietf.org>
List-Help: <mailto:dime-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dime>, <mailto:dime-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 22 Oct 2018 14:01:52 -0000

On Thursday 18 October 2018 15:56:52 Miguel Rodriguez Caudevilla wrote:
> 
> We have a cluster-wide distributed stack. The diameter instances in the cluster can be administratively locked, which consequently ends the transport connection with a peer.
[...]
> The stack implementers have a different argument. They consider that the administrative lock of a single diameter instance is performed in a way that keeps all services available at cluster level. So it would be quite strange to indicate the disconnect cause REBOOTING to the peer for the loss of a single node, considering we can still provide diameter service through all the other nodes in the cluster. The loss of a node (and the connections handled by it) is a transient fault that is easily handled if the peer reconnects, and they consider that this way the peer reconnects as quickly as possible in such cases. They also consider that the stack can achieve this within the standard if it does not send any DPR, leaving the peer to treat it as a connection loss.
> 
> A given Diameter instance of the peer state machine MUST NOT use more than one transport connection to communicate with a given peer, unless multiple instances exist on the peer, in which case a separate connection per process is allowed.
> 
> And consider that a diameter disconnect cause (e.g. REBOOTING or whatever else) sent by a diameter node to a peer actually reflects the overall state of the sending node. Since we have a cluster-wide distributed implementation, it would not be correct to send the disconnect cause REBOOTING, as that indicates the whole node is going to be rebooted. We just perform some node-level operations while still providing full service to diameter peers.
> 
> We would like to have your point of view regarding this discussion.
> Is this intentionally left open for the stack implementation?
> Did we interpret something wrong?

The RFCs can be confusing/convoluted in this area. But re-read section 1.2:

"Diameter Node
      A Diameter node is a host process that implements the Diameter
      protocol and acts as either a client, an agent, or a server."

and combine with section 4.3.1:

"If multiple Diameter nodes run on
the same host, each Diameter node MUST be assigned a unique
DiameterIdentity."

combine with section 5.6.1:

"Upon receipt of a CER, the
   identity of the connecting peer can be uniquely determined from the
   Origin-Host."


Don't be fooled by the "host" part of "origin-host". The payload is a diameter identity. The AVP was named before anyone thought of these problems.


Then you can deduce that a diameter node is an entity (be it a process, thread, task, fiber, machine, vm, ...) that:
 - has an identity, which is unique and used as the host identity (which is actually a node identity)
 - has a state machine

and therefore:
 - has at most one transport connection per peer.


For simple setups those distinctions aren't that important but when you have a cluster of cooperating nodes with shared state they are.

If you have such a cluster of nodes then you cannot have them all use the same host-identity, because identities must be unique. And if you somehow manage to get two transport connections to a peer using the same host-identity, then the peer must close one of them (as per the election process in section 5.6.4).
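That peer-side rule can be sketched roughly as below. This is just an illustration of the "at most one connection per identity" invariant, not a real stack's API; the names PeerTable and on_cer are invented for the example.

```python
class PeerTable:
    """Illustrative peer table: at most one transport connection
    per DiameterIdentity (Origin-Host of the CER)."""

    def __init__(self):
        self.connections = {}  # DiameterIdentity -> connection id

    def on_cer(self, origin_host, conn_id):
        existing = self.connections.get(origin_host)
        if existing is None:
            # First connection for this identity: accept it.
            self.connections[origin_host] = conn_id
            return "accept"
        # Duplicate identity: the election (RFC 6733 section 5.6.4)
        # decides which connection survives; here we just flag it.
        return "elect"
```

So two cluster nodes presenting distinct identities each get a connection, while a second connection claiming an already-known identity triggers the election.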

What you can do is that each node (vm/process/thread/whatever) has a unique identity:
  * node1.example.com
  * node2.example.com
  * node3.example.com
  * node4.example.com
  ...
and the application messages (e.g. CCR/EAP/UD... but not CER/DPR) use a common identity in origin-host.
  * system.example.com

So if node1 sends e.g. a CCR to a peer, it will have:
  * origin-host: system.example.com
  * route-record: node1.example.com
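A minimal sketch of that identity split, with the AVPs shown as plain dicts (the helper names and the session-id are invented for the example):

```python
def build_cer(node_identity):
    # The CER carries the per-node identity: the transport connection
    # and its peer state machine belong to this one node.
    return {"Origin-Host": node_identity,
            "Origin-Realm": "example.com"}

def build_ccr(system_identity, node_identity, session_id):
    # Application messages carry the cluster-wide identity; the sending
    # node is recorded in Route-Record as if it were a proxy on the path.
    return {"Origin-Host": system_identity,
            "Origin-Realm": "example.com",
            "Session-Id": session_id,
            "Route-Record": [node_identity]}
```

The point is only which identity goes where: node identity at the transport/peer level, system identity at the application level.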

The peer will simply treat node1 as an intermediate proxy. Nothing special. And your problem with node lock / reboots becomes trivial - just send a DPR with whatever-reason and the peer will only treat it as a message from node1 and only concerning node1. 
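From the peer's side that DPR handling becomes trivially local, as this sketch shows (hypothetical function, plain dict for the connection table):

```python
def on_node_dpr(connections, node_identity):
    # A DPR received on node1's connection names node1's identity only,
    # so the peer tears down that one connection; the rest of the
    # cluster's connections are untouched.
    remaining = dict(connections)
    remaining.pop(node_identity, None)
    return remaining
```

Locking or rebooting node1 then looks to the peer like one proxy going away, with service still reachable via node2, node3, etc.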

The only drawback I can think of is that there might be some non-compliant peers that don't support proxies. But back in 2013/2014 when I tested such cluster setups I didn't encounter any peers that were non-compliant in this area.


Regards,
  Ivan