Re: Appropriate use of HTTP status codes for application health checks

Amos Jeffries <squid3@treenet.co.nz> Thu, 23 February 2017 09:56 UTC

To: ietf-http-wg@w3.org
References: <CADfyV-Pa0fu2SDwLYzMrUe4D0Tv0wu27pmHpLjCxQXR3ev4mmA@mail.gmail.com>
From: Amos Jeffries <squid3@treenet.co.nz>
Message-ID: <119d9b4e-8587-0d8b-d292-3be61cd1ea72@treenet.co.nz>
Date: Thu, 23 Feb 2017 22:53:07 +1300
Subject: Re: Appropriate use of HTTP status codes for application health checks
Archived-At: <http://www.w3.org/mid/119d9b4e-8587-0d8b-d292-3be61cd1ea72@treenet.co.nz>

On 23/02/2017 1:54 p.m., matt wrote:
> Hello,
>
> My colleagues and I are involved in a debate about the proper usage
> of HTTP return codes for application health pages.
>
> For instance, you have a /health page that returns JSON listing your
> application’s dependencies as either “Up” or “Down”
> 

The action being performed as far as HTTP semantics are concerned is not
a health check - it is simply "fetch".

As such, the status code refers to the "JSON file" resource itself, not
its contents. HTTP does not care what that "JSON file" means to your
application; it's just some opaque bytes to be located and delivered.


> 
> Some suggest that it is acceptable for your /health page to return an
> unassigned 5xx or 503 if the /health page returns successfully, but

HTTP defines that any unrecognized 5xx code is to be treated as
equivalent to the 500 status (RFC 7231, Section 6).
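
A rough sketch of that rule in Python, for illustration only. The
function name and the set of "known" codes are made up here; a real
client would have its own list of codes it understands.

```python
# Hypothetical subset of status codes this client understands.
KNOWN_STATUS_CODES = {200, 204, 301, 302, 304, 400, 404, 500, 502, 503, 504}


def effective_status(code: int, known=KNOWN_STATUS_CODES) -> int:
    """Return the status a client should act on.

    An unrecognized code is treated as the x00 code of its class,
    so an unassigned 5xx code is handled like 500.
    """
    if code in known:
        return code
    return (code // 100) * 100


print(effective_status(599))  # unassigned 5xx, handled as 500
print(effective_status(503))  # recognized, handled as-is
```

So an unassigned 5xx code carries no more meaning to a generic client
than a plain 500 does.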

For HTTP agents outside your application, a 5xx status usually means a
retry with a different server is required, until a 2xx/3xx status is
found or no alternative servers can be identified.

So a 5xx status "working" in terms of your application health check is
actually an exceptional event that depends on a narrow set of circumstances:
 no other servers/IPs are available, and
 no alternative routes across the network exist to reach it.
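
To make the failover behaviour concrete, here is a minimal sketch of
what a generic agent might do with a 5xx response. The function name,
the server list, and the injected fetch() callable are all assumptions
for illustration; actual agents differ in the details.

```python
from typing import Callable, Iterable, Optional, Tuple


def fetch_with_failover(
    servers: Iterable[str],
    fetch: Callable[[str], int],
) -> Tuple[Optional[str], Optional[int]]:
    """Try each server in turn until one answers with a non-5xx status.

    fetch(server) is a hypothetical callable returning the status code.
    Returns (server, status) for the first non-5xx answer, or
    (None, last_status) when every server answered 5xx.
    """
    last_status = None
    for server in servers:
        status = fetch(server)
        if status < 500:
            return server, status
        # 5xx: assume this server is broken and move on to the next one.
        last_status = status
    return None, last_status


# Stand-in for real network requests: canned status per server.
statuses = {"a.example": 503, "b.example": 200}
print(fetch_with_failover(["a.example", "b.example"], statuses.__getitem__))
```

The 503 from the first server never reaches the health checker as a
result; the agent just moves on. Only when the list runs out does the
5xx come back to the caller.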


> the page results indicate the application is not healthy. Spring Boot
> <https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-1.1-Release-Notes#healthindicators>
> has done this. Although I have reservations about 503 since your
> request for the page was handled successfully.
> 

Rightly so.


> Other contend that your /health page should always return a 200
> regardless of whether the page results is indicative of application
> health or not.
> 
> As a layman I can see the argument for both sides, and it seems both
> practices have been used in the past. I perused the RFCs but I don’t
> feel like I found the ‘silver bullet’ answer on this.
> 

The "always 200" rule is a bit strict. All 2xx and 3xx statuses mean a
successful *fetch*, with various grades of meaning to that success.


IMHO a more efficient way for a polling system is to use 204 as "All
okay" and 200 as "some problem(s)". No bandwidth is wasted on a payload
for the common Up status, and details about the outage can still be
delivered with the Down status.
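
A minimal sketch of that 204/200 scheme using Python's standard
http.server module. The check_dependencies() probe, the dependency
names, and the shape of the JSON body are assumptions for illustration,
not anything a specification or framework defines.

```python
import json
from http.server import BaseHTTPRequestHandler


def check_dependencies() -> dict:
    """Hypothetical probe; a real application would query its dependencies."""
    return {"database": "Up", "cache": "Up"}


def health_response(deps: dict) -> tuple:
    """Map dependency states to (status, body).

    204 with no payload when everything is "Up";
    200 with a JSON body naming whatever is not "Up".
    """
    problems = {name: state for name, state in deps.items() if state != "Up"}
    if not problems:
        return 204, b""
    return 200, json.dumps(problems).encode("utf-8")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(check_dependencies())
        self.send_response(status)
        if body:
            # Payload headers are sent only on the 200 "problem" case.
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, pass HealthHandler to http.server.HTTPServer and call
serve_forever(). A poller then only needs to read the body when it
sees 200; any 5xx keeps its normal meaning of "the fetch itself failed".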

Amos