Re: [radext] More bad behavior

Alexander Clouter <alex+ietf@coremem.com> Tue, 05 September 2023 09:57 UTC

Date: Tue, 05 Sep 2023 10:57:29 +0100
From: Alexander Clouter <alex+ietf@coremem.com>
To: radext@ietf.org
Content-Type: text/plain
Archived-At: <https://mailarchive.ietf.org/arch/msg/radext/k_aasUXRe2GITT6BPFcsMRyHxnY>
Subject: Re: [radext] More bad behavior

On Tue, 5 Sep 2023, at 10:06, Yasin KAPLAN wrote:
> There may be a couple of reasons why a RADIUS server cannot process/record
> the received accounting requests as you stated. Replying with an
> Error-Cause would be more useful and this should have minimum impact on
> existing RADIUS client/server implementations.

As I understand it, the aim is to hand off responsibility for handling the *event* (not state) from the client to the server.

There are things the RADIUS server can do that do not involve finishing all the work of processing the event before replying. For example, dumping the event to a local (disk) journal allows it to ACK the request; decoupling the processing from the request in this way creates opportunities to increase accounting throughput.

Of course disk space is not infinite...and disks fail...but these are problems that we deal with already today.
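
To make the journal idea concrete, here is a minimal sketch in Python. It is not taken from any real RADIUS server: the function names, the JSON-lines journal format and the dict representation of an event are all assumptions made purely for illustration.

```python
import json
import os
import tempfile


def journal_then_ack(journal_path: str, event: dict) -> bool:
    """Append the event to a local journal; True means it is safe to ACK.

    Hypothetical helper: a real server would journal the decoded packet.
    """
    try:
        with open(journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(event, sort_keys=True) + "\n")
            journal.flush()
            os.fsync(journal.fileno())  # make the record durable before ACKing
    except OSError:
        # Disk full, disk failed, path missing: do not ACK, the client retries.
        return False
    return True


def drain(journal_path: str) -> list[dict]:
    """Read journalled events back for later, decoupled processing."""
    with open(journal_path, encoding="utf-8") as journal:
        return [json.loads(line) for line in journal if line.strip()]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "acct.journal")
    acked = journal_then_ack(
        path, {"Acct-Session-Id": "abc", "Acct-Status-Type": "Interim-Update"}
    )
    print(acked, drain(path))
```

The point is only the ordering: the ACK depends on the local write succeeding, not on the backend DB being reachable.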

> RADIUS clients have limited
> resources to keep failed RADIUS accounting requests for retrying. Replying
> with an Error-Cause would eliminate unnecessary resource usage on the
> client side.

There is zero additional cost to the client in retaining the state that it already has, and needs to keep, during normal service: byte counters, CoA/User-Name attributes, session times, when to disconnect, timers for when to send the next event, etc.

These demands remain the same when the accounting server goes out to lunch too.

The only cause of increased resource demand should be when Accounting-Stop packets start to accumulate.

> RADIUS clients should disconnect user sessions if a predefined amount of
> RADIUS interim updates cannot be processed to prevent revenue loss if
> RADIUS accounting is being used for billing purposes.

I suspect (and hope) this is the rare case, and it should be strongly discouraged.

Coalescing your Accounting events (e.g. retaining only the latest) may flatten out the peaks of any %ile billing opportunities, but somewhat less so than completely zeroing all your client traffic by forcibly disconnecting everyone due to, say, a DB outage. :)
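
As a rough sketch of what coalescing could look like on the client (hypothetical class and method names, events shown as plain dicts keyed by RADIUS attribute name, not any vendor's implementation):

```python
from collections import deque


class PendingAccounting:
    """Client-side backlog kept while the accounting server is unreachable."""

    def __init__(self) -> None:
        self.latest_interim: dict[str, dict] = {}  # one slot per live session
        self.stops: deque[dict] = deque()          # every Stop must be kept

    def add(self, event: dict) -> None:
        session = event["Acct-Session-Id"]
        status = event["Acct-Status-Type"]
        if status == "Interim-Update":
            # Coalesce: the newest counters supersede the older ones.
            self.latest_interim[session] = event
        elif status == "Stop":
            # Session is over; its final counters replace any pending interim.
            self.latest_interim.pop(session, None)
            self.stops.append(event)
        else:
            raise ValueError(f"unhandled Acct-Status-Type: {status}")

    def size(self) -> int:
        return len(self.latest_interim) + len(self.stops)


if __name__ == "__main__":
    backlog = PendingAccounting()
    for octets in (10, 20, 30):
        backlog.add({"Acct-Session-Id": "s1",
                     "Acct-Status-Type": "Interim-Update",
                     "Acct-Input-Octets": octets})
    print(backlog.size())
```

Interim updates cost one slot per live session however long the outage lasts; only the Stop queue grows, and only as sessions end.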

Of course each and every operator is different, so I do not think this would be the right place to suggest any course of action: the impact of disconnecting all the mobile phones on a network is different from the impact of disconnecting everyone on an Internet cafe wifi hotspot.

Maybe a document could point to the consequences of the choices available here, but I would personally shy away from recommendations of any sort.

From my (limited, naive but optimistic) viewpoint, I see:

 * prioritise revenue - fail closed (i.e. disconnect users)
 * prioritise availability - fail open

With the second approach, I hope the feedback loop that arises (clawing back lost revenue by investing in better backend event capture) would work itself out. I cannot see anything good coming of the first approach, though :)

Cheers