Nagios plugin to count apache threads
Overview
At work I have a misbehaving web server. Sometimes it spawns the maximum number of apache threads (there is a hardcoded maximum of 256, no matter what you configure) and then occupies 100% of the processor. I decided that the normal nagios checks for the http site, ssh, and so on aren't good enough for monitoring this, so I wrote my own simple nagios check and put it in an rpm for easy deployment.
The nagios check
Here is the code for check_apache_threads, although you can check the latest version at my gitlab page.

#!/bin/sh
# File: /usr/lib64/nagios/plugins/check_apache_threads
# Author: bgstack15@gmail.com
# Startdate: 2017-01-09 15:53
# Title: Nagios Check for Apache Threads
# Purpose: For a troublesome dmz wordpress host
# Package: nagios-plugins-apache-threads
# History:
# Usage:
#    In nagios/nconf, use this checkcommand
#    check command line: $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "$USER1$/check_apache_threads -w $ARG1$ -c $ARG2$"
# Reference: general design /usr/lib64/nagios/plugins/check_sensors
#    general design http://www.kernel-panic.it/openbsd/nagios/nagios6.html
#    case -w http://www.linuxquestions.org/questions/programming-9/ash-test-is-string-a-contained-in-string-b-671773/
# Improve:

PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin

PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\/][^\/][^\/]*$,,'`
REVISION="0.0.1"

. $PROGPATH/utils.sh

print_usage() {
   cat <<EOF
Usage: $PROGNAME -w <thresh_warn> -c <thresh_crit>
EOF
}

print_help() {
   print_revision $PROGNAME $REVISION
   echo ""
   print_usage
   echo ""
   echo "This plugin checks for the number of active apache threads."
   echo ""
   support
   exit $STATE_OK
}

# MAIN

# Total httpd threads
tot_apache_threads="$( ps -ef | grep -ciE "httpd$" )"

verbosity=0
thresh_warn=
thresh_crit=

while test -n "${1}"; do
   case "$1" in
      --help|-h)
         print_help
         exit $STATE_OK
         ;;
      --version|-V)
         print_revision $PROGNAME $REVISION
         exit $STATE_OK
         ;;
      -v | --verbose)
         verbosity=$(( verbosity + 1 ))
         shift
         ;;
      -w | --warning | -c | --critical)
         if [[ -z "$2" || "$2" = -* ]]; then
            # Threshold not provided
            echo "$PROGNAME: Option '$1' requires an argument."
            print_usage
            exit $STATE_UNKNOWN
         elif [[ "$2" = +([0-9]) ]]; then
            # Threshold is a number
            thresh="$2"
         # use for a percentage template, from reference 2
         #elif [[ "$2" = +([0-9])% ]]; then
         #   # Threshold is a percentage
         #   thresh=$(( tot_mem * ${2%\%} / 100 ))
         else
            # Threshold is not a number or other valid input
            echo "$PROGNAME: Threshold must be an integer."
            print_usage
            exit $STATE_UNKNOWN
         fi
         case "$1" in
            *-w*) thresh_warn=$thresh;;
            *) thresh_crit=$thresh;;
         esac
         shift 2
         ;;
      -?)
         print_usage
         exit $STATE_OK
         ;;
      *)
         echo "$PROGNAME: Invalid option '$1'"
         print_usage
         exit $STATE_UNKNOWN
         ;;
   esac
done

if test -z "$thresh_warn" || test -z "$thresh_crit"; then
   # One or both values were unspecified
   echo "$PROGNAME: Threshold not set"
   print_usage
   exit $STATE_UNKNOWN
elif test "$thresh_crit" -le "$thresh_warn"; then
   echo "$PROGNAME: Critical value must be greater than warning value."
   print_usage
   exit $STATE_UNKNOWN
fi

if test "$verbosity" -ge 2; then
   # Print debugging information
   /bin/cat <<EOF
Debugging information:
  Warning threshold: $thresh_warn
  Critical threshold: $thresh_crit
  Verbosity level: $verbosity
  Apache threads: ${tot_apache_threads}
EOF
fi

if test "${tot_apache_threads}" -gt "${thresh_crit}"; then
   # too many apache threads
   echo "APACHE CRITICAL - $tot_apache_threads"
   exit $STATE_CRITICAL
elif test "${tot_apache_threads}" -gt "${thresh_warn}"; then
   echo "APACHE WARNING - $tot_apache_threads"
   exit $STATE_WARNING
else
   # fine
   echo "APACHE OK - $tot_apache_threads"
   exit $STATE_OK
fi
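The counting step in the script is the ps -ef | grep -ciE "httpd$" pipeline: it counts the ps lines that end in httpd. Here is a small sketch of that behavior against canned input, so you can see what does and does not get counted. The count_httpd function name and the sample process listing are made up for illustration; only the grep options come from the script.

```shell
# Hypothetical wrapper around the same grep the plugin uses.
# Reads a process listing on stdin and prints how many lines end in "httpd".
count_httpd() {
   grep -ciE "httpd$"
}

# Made-up sample of "ps -ef" output (assumption, not from a real host).
sample_ps='root      1201     1  0 09:00 ?     00:00:01 /usr/sbin/httpd
apache    1310  1201  0 09:00 ?     00:00:00 /usr/sbin/httpd
apache    1311  1201  0 09:00 ?     00:00:00 /usr/sbin/httpd
root      2200  2100  0 09:05 pts/0 00:00:00 grep -ciE httpd$
root      2300     1  0 09:06 ?     00:00:00 /usr/sbin/httpd -DFOREGROUND'

# Prints 3: the grep process and the httpd started with an argument
# do not end in "httpd", so they are not counted.
printf '%s\n' "$sample_ps" | count_httpd
```

Anchoring the pattern with $ keeps the grep process itself out of the count. The flip side is that an httpd process started with trailing arguments would not match either, so it is worth checking what ps -ef actually shows on your host before trusting the number.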
Walking through the code
I included the code above so it gets cached by web crawlers. You should look at the code on gitlab to get the proper indentation and line numbers; the line ranges mentioned here refer to that copy.
I got the general format of this script from a local file, check_sensors, and Reference 1 below. The utils.sh call provides nagios-related definitions, including the exit codes that you see used like $STATE_OK.
The shell script is pretty self-explanatory, really. The variables are initialized and the actual checked value is calculated (ps -ef | grep httpd). About half the script (lines 51-100) parses the parameters. This is a nice, simple approach if your input is predictable and simple (like from nagios) and you don't need full parameter parsing that handles forms like -XvalueofXhere, with no space between the flag and the value. Then come some sanity checks for the thresholds (102-113), debugging information if given enough verbosity (115-125), and finally the actual result is determined in 127-140.
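The parameter parsing leans on [[ ... ]] and the +([0-9]) pattern, which are bash/ksh features and are not guaranteed to work in every /bin/sh. As a rough sketch only, here is how the same -w/-c handling could look using plain POSIX case patterns. The is_uint and parse_thresholds helpers are hypothetical names, not part of the plugin, and the 3 returned on bad input is the standard nagios UNKNOWN exit code that the plugin gets from utils.sh as $STATE_UNKNOWN.

```shell
# Hypothetical helper: succeeds only for a non-empty string of digits.
is_uint() {
   case "$1" in
      ''|*[!0-9]*) return 1 ;;
      *) return 0 ;;
   esac
}

# Hypothetical helper: reads -w/-c pairs into thresh_warn and thresh_crit.
# Returns 3 (nagios UNKNOWN) on any invalid input.
parse_thresholds() {
   thresh_warn=
   thresh_crit=
   while test $# -gt 0; do
      case "$1" in
         -w|--warning|-c|--critical)
            if ! is_uint "${2:-}"; then
               echo "Threshold must be an integer."
               return 3
            fi
            # Same trick as the plugin: anything containing -w is the warning flag.
            case "$1" in
               *-w*) thresh_warn="$2" ;;
               *) thresh_crit="$2" ;;
            esac
            shift 2
            ;;
         *)
            echo "Invalid option '$1'"
            return 3
            ;;
      esac
   done
   if test -z "$thresh_warn" || test -z "$thresh_crit"; then
      echo "Threshold not set"
      return 3
   elif test "$thresh_crit" -le "$thresh_warn"; then
      echo "Critical value must be greater than warning value."
      return 3
   fi
   return 0
}

# Example call: prints "warn=50 crit=150"
parse_thresholds -w 50 -c 150 && echo "warn=$thresh_warn crit=$thresh_crit"
```

The sketch leaves out --help, --version and -v to stay short, and it keeps the plugin's limitation that the flag and its value must be separate words.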
Final thoughts
The hardest part of this plugin is not writing or deploying the shell script. The hardest part is getting nagios to run it. To use this check properly, you need to write a nagios checkcommand like so:
$USER1$/check_by_ssh -H $HOSTADDRESS$ -C "$USER1$/check_apache_threads -w $ARG1$ -c $ARG2$"
The arguments are the numbers for your thresholds. I used the values 50 and 150 for warning and critical. Any questions?
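To make the 50 and 150 thresholds concrete, here is a sketch of the final comparison from the script pulled out into a function you can poke at by hand. The classify_threads name is hypothetical, and the state values are hardcoded here on the assumption that they match the standard nagios plugin exit codes that utils.sh normally supplies.

```shell
# Standard nagios plugin exit codes; the real plugin gets these from utils.sh.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2

# Hypothetical helper: classify_threads COUNT WARN CRIT
# Prints the status line and returns the matching exit code.
classify_threads() {
   count="$1"; warn="$2"; crit="$3"
   if test "$count" -gt "$crit"; then
      echo "APACHE CRITICAL - $count"
      return $STATE_CRITICAL
   elif test "$count" -gt "$warn"; then
      echo "APACHE WARNING - $count"
      return $STATE_WARNING
   else
      echo "APACHE OK - $count"
      return $STATE_OK
   fi
}

# Try the boundary values around the thresholds 50 and 150.
for n in 50 51 150 151; do
   classify_threads "$n" 50 150 || true
done
```

Because the comparisons are strictly greater-than, a count of exactly 50 is still OK and a count of exactly 150 is still only a WARNING. Nagios itself decides the service state from the exit code, not from the text that is printed.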