[TOOLS-DEVELOPMENT] Data on slowdowns
Glen <glen@amsl.com> Mon, 27 June 2011 14:37 UTC
Return-Path: <glen@amsl.com>
X-Original-To: tools-development@ietfa.amsl.com
Delivered-To: tools-development@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 1FC2D21F8656 for <tools-development@ietfa.amsl.com>; Mon, 27 Jun 2011 07:37:24 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -106.199
X-Spam-Level:
X-Spam-Status: No, score=-106.199 tagged_above=-999 required=5 tests=[AWL=0.400, BAYES_00=-2.599, RCVD_IN_DNSWL_MED=-4, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jfZB+7NKoBTv for <tools-development@ietfa.amsl.com>; Mon, 27 Jun 2011 07:37:23 -0700 (PDT)
Received: from mail.amsl.com (mail.amsl.com [64.170.98.20]) by ietfa.amsl.com (Postfix) with ESMTP id 6043621F8655 for <tools-development@ietf.org>; Mon, 27 Jun 2011 07:37:23 -0700 (PDT)
Received: by c1a.amsl.com (Postfix, from userid 1000) id 59CA21C3852B; Mon, 27 Jun 2011 07:37:23 -0700 (PDT)
Date: Mon, 27 Jun 2011 07:37:23 -0700
From: Glen <glen@amsl.com>
To: tools-development@ietf.org
Message-ID: <20110627143723.GD21782@amsl.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
User-Agent: Mutt/1.5.17 (2007-11-01)
Subject: [TOOLS-DEVELOPMENT] Data on slowdowns
X-BeenThere: tools-development@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Tools Development list server <tools-development.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-development>, <mailto:tools-development-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/tools-development>
List-Post: <mailto:tools-development@ietf.org>
List-Help: <mailto:tools-development-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-development>, <mailto:tools-development-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Jun 2011 14:37:24 -0000
Tools team - You've already read my report to the I*** in which I explained my findings thus far. My data at this point is as follows: At 0315, the server appeared to be running normally: ---------------------------------------- Mon Jun 27 03:15:01 PDT 2011 w: 03:15:01 up 9:20, 2 users, load average: 1.04, 2.30, 2.12 USER TTY LOGIN@ IDLE JCPU PCPU WHAT root xvc0 21:29 4:37m 0.22s 0.22s bash henrik pts/2 18:09 9:05m 0.04s 0.04s bash -i vmstat: procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 3 1 76 1922680 2046980 10300904 0 0 129 172 144 8 6 1 90 2 free -lmt: total used free shared buffers cached Mem: 16394 14518 1876 0 1999 10059 Low: 16394 14518 1876 High: 0 0 0 -/+ buffers/cache: 2459 13935 Swap: 8189 0 8189 Total: 24584 14518 10066 None of the apache processes appeared to have been impacted yet: 11787 root 20 0 475m 69m 10m S 0 0.4 1:20.26 httpd2-prefork 19352 wwwrun 20 0 569m 97m 6564 S 0 0.6 0:01.50 httpd2-prefork 19847 wwwrun 20 0 555m 89m 6356 S 0 0.5 0:00.56 httpd2-prefork 19869 wwwrun 20 0 565m 93m 6380 S 0 0.6 0:00.76 httpd2-prefork 19877 wwwrun 20 0 476m 63m 3484 S 0 0.4 0:00.14 httpd2-prefork 19986 wwwrun 20 0 476m 62m 3260 S 4 0.4 0:00.08 httpd2-prefork 20005 wwwrun 20 0 476m 62m 3260 S 0 0.4 0:00.10 httpd2-prefork 20012 wwwrun 20 0 555m 89m 6356 S 0 0.5 0:00.52 httpd2-prefork 20028 wwwrun 20 0 476m 61m 2092 S 4 0.4 0:00.08 httpd2-prefork 20029 wwwrun 20 0 556m 90m 6384 R 8 0.5 0:00.68 httpd2-prefork 20030 wwwrun 20 0 476m 62m 3236 S 0 0.4 0:00.08 httpd2-prefork 20031 wwwrun 20 0 568m 94m 6412 S 0 0.6 0:00.72 httpd2-prefork 20033 wwwrun 20 0 476m 62m 3188 S 0 0.4 0:00.02 httpd2-prefork 20059 wwwrun 20 0 568m 93m 5472 S 0 0.6 0:00.74 httpd2-prefork 20081 wwwrun 20 0 558m 87m 5436 R 82 0.5 0:00.68 httpd2-prefork 20097 wwwrun 20 0 476m 61m 2020 S 0 0.4 0:00.02 httpd2-prefork 20106 wwwrun 20 0 476m 62m 3204 S 4 0.4 0:00.06 httpd2-prefork 20107 wwwrun 20 0 479m 65m 3248 S 0 0.4 0:00.36 httpd2-prefork 20119 wwwrun 20 0 476m 61m 2032 S 0 0.4 0:00.06 httpd2-prefork 20120 wwwrun 20 0 476m 61m 2012 S 0 0.4 0:00.02 httpd2-prefork 20141 wwwrun 20 0 556m 86m 5088 D 78 0.5 0:00.44 httpd2-prefork 20252 wwwrun 20 0 475m 59m 852 S 0 0.4 0:00.00 httpd2-prefork By 0320, things were going downhill: ---------------------------------------- Mon Jun 27 03:20:01 PDT 2011 Load average still looked okay (w): 03:20:01 up 9:25, 2 users, load average: 3.16, 3.43, 2.76 USER TTY LOGIN@ IDLE JCPU PCPU WHAT root xvc0 21:29 4:42m 0.22s 0.22s bash henrik pts/2 18:09 9:10m 0.04s 0.04s bash -i But check out the free memory (from free -lmt, down 10GB from 5 minutes ago): total used free shared buffers cached Mem: 16394 15354 1040 0 8 2326 Low: 16394 15354 1040 High: 0 0 0 -/+ buffers/cache: 13020 3374 Swap: 8189 0 8189 Total: 24584 15354 9229 And note that two of the apache processes were now using that 10GB: 21095 wwwrun 20 0 0 0 0 Z 0 0.0 0:04.72 httpd2-pr <defunct> 11787 root 20 0 475m 69m 10m S 0 0.4 1:26.33 httpd2-prefork 20804 wwwrun 20 0 8999m 8.2g 5324 R 71 51.2 3:38.15 httpd2-prefork 20988 wwwrun 20 0 3386m 2.8g 6300 R 58 17.5 1:28.75 httpd2-prefork 20991 wwwrun 20 0 568m 95m 6564 D 0 0.6 0:01.72 httpd2-prefork 20994 wwwrun 20 0 560m 89m 6144 D 0 0.5 0:10.74 httpd2-prefork 21019 wwwrun 20 0 567m 95m 6488 D 0 0.6 0:04.12 httpd2-prefork By 0325, we were thrashing: ---------------------------------------- Mon Jun 27 03:25:10 PDT 2011 03:25:19 up 9:30, 2 users, load average: 63.64, 39.90, 18.77 USER TTY LOGIN@ IDLE JCPU PCPU WHAT root xvc0 21:29 4:47m 0.22s 0.22s bash henrik pts/2 18:09 9:16m 0.04s 0.04s bash -i procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 38 5 1040428 844356 2928 662264 1 4 134 174 101 14 6 2 89 3 total used free shared buffers cached Mem: 16394 15560 834 0 2 648 Low: 16394 15560 834 High: 0 0 0 -/+ buffers/cache: 14909 1485 Swap: 8189 1026 7162 Total: 24584 16587 7997 11787 root 20 0 475m 33m 9364 S 0 0.2 1:28.48 httpd2-prefork 20804 wwwrun 20 0 11.5g 6.5g 5236 R 7 40.3 13:32.55 httpd2-prefork 20988 wwwrun 20 0 6150m 3.8g 6196 R 6 23.8 11:25.67 httpd2-prefork 20991 wwwrun 20 0 571m 58m 6520 S 0 0.4 1:17.19 httpd2-prefork 20994 wwwrun 20 0 577m 69m 6608 D 0 0.4 1:44.54 httpd2-prefork 21019 wwwrun 20 0 577m 69m 6592 S 0 0.4 1:43.82 httpd2-prefork Note that two apache processes were impacted. When I do a grep on POST requests from the datatracker log around this time frame, I see nothing but POST /doc/search/ requests, except for these two identical requests, at just the right time: 134.226.56.7 - - [27/Jun/2011:03:16:57 -0700] "POST /doc/draft-ietf-payload-rfc3016bis/edit/position/ HTTP/1.1" 302 - "https://datatracker.ietf.org/doc/draft-ietf-payload-rfc3016bis/edit/position/" "Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0" 134.226.56.7 - - [27/Jun/2011:03:19:07 -0700] "POST /doc/draft-ietf-payload-rfc3016bis/edit/position/ HTTP/1.1" 302 - "https://datatracker.ietf.org/doc/draft-ietf-payload-rfc3016bis/edit/position/" "Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0" The next slowdown occurred starting between 0510 and 0515. And here again I see similar requests: 134.226.56.7 - - [27/Jun/2011:05:11:38 -0700] "POST /doc/draft-ietf-payload-rfc3016bis/edit/position/ HTTP/1.1" 302 - "https://datatracker.ietf.org/doc/draft-ietf-payload-rfc3016bis/edit/position/" "Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0" 213.145.205.249 - - [27/Jun/2011:05:14:55 -0700] "POST /submit/status/33455/f52ef05298cd7945af3596bafd20aa63/ HTTP/1.1" 200 33982 "https://datatracker.ietf.org/submit/status/33455/f52ef05298cd7945af3596bafd20aa63/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-GB; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" 213.145.205.249 - - [27/Jun/2011:05:15:05 -0700] "POST /submit/status/33455/confirm/aeb3677f2a2be33d87708522a2da1fae935688fa/ HTTP/1.1" 200 33245 "http://datatracker.ietf.org/submit/status/33455/confirm/aeb3677f2a2be33d87708522a2da1fae935688fa/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-GB; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" I am thinking that something might be wrong with either the code, or the database entry, relating to ballot positions. Thoughts would be welcome! Glen