Re: Appropriate use of HTTP status codes for application health checks

Amos Jeffries <squid3@treenet.co.nz> Mon, 27 February 2017 04:44 UTC

To: Willy Tarreau <w@1wt.eu>
References: <CADfyV-Pa0fu2SDwLYzMrUe4D0Tv0wu27pmHpLjCxQXR3ev4mmA@mail.gmail.com> <119d9b4e-8587-0d8b-d292-3be61cd1ea72@treenet.co.nz> <20170223102431.GC30956@1wt.eu>
Cc: ietf-http-wg@w3.org
From: Amos Jeffries <squid3@treenet.co.nz>
Message-ID: <d2b11486-267e-230f-bf3d-821ee9036f56@treenet.co.nz>
Date: Mon, 27 Feb 2017 17:38:49 +1300
In-Reply-To: <20170223102431.GC30956@1wt.eu>
Subject: Re: Appropriate use of HTTP status codes for application health checks
Archived-At: <http://www.w3.org/mid/d2b11486-267e-230f-bf3d-821ee9036f56@treenet.co.nz>

On 23/02/2017 11:24 p.m., Willy Tarreau wrote:
> Hi Amos,
> 
> On Thu, Feb 23, 2017 at 10:53:07PM +1300, Amos Jeffries wrote:
>> IMHO a more efficient way for a polling system is to use 204 as "All
>> okay", and 200 as "some problem(s)". No bandwidth wasted with payload on
>> the common Up status, and ability to deliver details about the outage on
>> the Down status.
> 
> In fact it's common to see health check applications return 5xx for a
> very simple reason, the front equipment performing the check (often a
> load balancer) has to deal with these situations anyway, and most use
> cases just want to return "completely up" or "completely dead". But I
> agree that when you want to support the gray area in between, it's much
> better to support intermediary codes. FWIW haproxy also supports a
> special case of 404 to mean "closing soon, no more requests please" so
> that admins can simply touch/rm a file in a docroot. That's just to say
> that there are many valid use cases and that common sense adapted to what
> components *reliably* support is often the best here.
> 

For an individual health-check you are right. But that is not the
use-case Matt has.

The use-case in question is the response coming from some aggregator
process, which uses health-checks as its input data: one status code
summarizing the situation of N endpoints. No 4xx or 5xx is going to be
adequate for that, simply because of what the 400 and 500 defaults mean
to the general HTTP ecosystem.
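
A minimal sketch of such an aggregator response, following the 204/200
split suggested in the quoted text above. The function name, the input
shape and the JSON body layout are assumptions made for illustration
only; nothing here is taken from an existing tool.

```python
import json
from typing import Dict, Optional, Tuple


def summarize(results: Dict[str, bool]) -> Tuple[int, Optional[bytes]]:
    """Collapse N per-endpoint check results into one HTTP response.

    `results` maps a (hypothetical) endpoint name to True when its
    individual health-check passed.
    """
    down = sorted(name for name, ok in results.items() if not ok)
    if not down:
        # Common case: everything is up, so send no payload at all.
        return 204, None
    # Some problem(s): still a 2xx, with the details in the body.
    body = json.dumps({"checked": len(results), "down": down})
    return 200, body.encode("utf-8")
```

The poller only has to look at the status code in the common case, and
reads the body only when it gets a 200.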

For the individual endpoints being health-tested, sure, a 4xx/5xx is
usually best. Squid uses 503 to respond to *all* queries received during
the final seconds of shutdown so that clients can retry them. Your 404 as
a final status is of most use for active health-check probes to an
individual endpoint.
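
For contrast, a rough sketch of how a single endpoint might pick its own
health-check status along those lines. The function and both flags are
hypothetical names; this is not how Squid or haproxy are implemented,
only the mapping described above.

```python
def endpoint_status(shutting_down: bool, marker_present: bool) -> int:
    """Status code one endpoint returns to an active health-check probe.

    `marker_present` stands for the docroot file an admin can touch/rm
    (an assumption modelled on the 404 case mentioned in the quote).
    """
    if shutting_down:
        # Final seconds of shutdown: ask the client to retry elsewhere.
        return 503
    if not marker_present:
        # Marker removed: "closing soon, no more requests please".
        return 404
    return 200
```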

Amos