Knowledge Base

Preserving for the future: Shell scripts, AoC, and more

Nagios plugin to count apache threads

Overview

At work I have a misbehaving web server. Sometimes it spawns the maximum number of apache threads (a hardcoded maximum of 256, no matter what you configure) and then occupies 100% of the processor. I decided that the normal nagios checks for the http site, ssh, and so on aren't good enough for monitoring this, so I wrote my own simple nagios check and put it in an rpm for easy deployment.

The nagios check

Here is the code for check_apache_threads, although you can check the latest version at my gitlab page.

#!/bin/sh
# File: /usr/lib64/nagios/plugins/check_apache_threads
# Author: bgstack15@gmail.com
# Startdate: 2017-01-09 15:53
# Title: Nagios Check for Apache Threads
# Purpose: For a troublesome dmz wordpress host
# Package: nagios-plugins-apache-threads
# History:
# Usage:
#    In nagios/nconf, use this checkcommand check command line:
#    $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "$USER1$/check_apache_threads -w $ARG1$ -c $ARG2$"
# Reference: general design /usr/lib64/nagios/plugins/check_sensors
#    general design http://www.kernel-panic.it/openbsd/nagios/nagios6.html
#    case -w http://www.linuxquestions.org/questions/programming-9/ash-test-is-string-a-contained-in-string-b-671773/
# Improve:
# Note: [[ ]] and +([0-9]) below are bash features, so this relies on /bin/sh being bash.

PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin

PROGNAME=$(basename "$0")
PROGPATH=$(echo "$0" | sed -e 's,[\/][^\/][^\/]*$,,')
REVISION="0.0.1"

# utils.sh provides the STATE_* exit codes, print_revision, and support
. $PROGPATH/utils.sh

print_usage() {
   cat <<EOF
Usage: $PROGNAME -w <thresh_warn> -c <thresh_crit>
EOF
}

print_help() {
   print_revision $PROGNAME $REVISION
   echo ""
   print_usage
   echo ""
   echo "This plugin checks for the number of active apache threads."
   echo ""
   support
   exit $STATE_OK
}

# MAIN

# Total httpd threads
tot_apache_threads="$( ps -ef | grep -ciE "httpd$" )"

verbosity=0
thresh_warn=
thresh_crit=

while test -n "${1}"; do
   case "$1" in
      --help|-h)
         print_help
         exit $STATE_OK
         ;;
      --version|-V)
         print_revision $PROGNAME $REVISION
         exit $STATE_OK
         ;;
      -v | --verbose)
         verbosity=$(( verbosity + 1 ))
         shift
         ;;
      -w | --warning | -c | --critical)
         if [[ -z "$2" || "$2" = -* ]]; then
            # Threshold not provided
            echo "$PROGNAME: Option '$1' requires an argument."
            print_usage
            exit $STATE_UNKNOWN
         elif [[ "$2" = +([0-9]) ]]; then
            # Threshold is a number
            thresh="$2"
         # use for a percentage template, from reference 2
         #elif [[ "$2" = +([0-9])% ]]; then
         #   # Threshold is a percentage
         #   thresh=$(( tot_mem * ${2%\%} / 100 ))
         else
            # Threshold is not a number or other valid input
            echo "$PROGNAME: Threshold must be an integer."
            print_usage
            exit $STATE_UNKNOWN
         fi
         case "$1" in
            *-w*) thresh_warn=$thresh;;
            *) thresh_crit=$thresh;;
         esac
         shift 2
         ;;
      -?)
         print_usage
         exit $STATE_OK
         ;;
      *)
         echo "$PROGNAME: Invalid option '$1'"
         print_usage
         exit $STATE_UNKNOWN
         ;;
   esac
done

if test -z "$thresh_warn" || test -z "$thresh_crit"; then
   # One or both values were unspecified
   echo "$PROGNAME: Threshold not set"
   print_usage
   exit $STATE_UNKNOWN
elif test "$thresh_crit" -le "$thresh_warn"; then
   echo "$PROGNAME: Critical value must be greater than warning value."
   print_usage
   exit $STATE_UNKNOWN
fi

if test "$verbosity" -ge 2; then
   # Print debugging information
   /bin/cat <<EOF
Debugging information:
  Warning threshold: $thresh_warn
  Critical threshold: $thresh_crit
  Verbosity level: $verbosity
  Apache threads: ${tot_apache_threads}
EOF
fi

if test "${tot_apache_threads}" -gt "${thresh_crit}"; then
   # too many apache threads
   echo "APACHE CRITICAL - $tot_apache_threads"
   exit $STATE_CRITICAL
elif test "${tot_apache_threads}" -gt "${thresh_warn}"; then
   echo "APACHE WARNING - $tot_apache_threads"
   exit $STATE_WARNING
else
   # fine
   echo "APACHE OK - $tot_apache_threads"
   exit $STATE_OK
fi

Walking through the code

I included the code above so it gets cached by web crawlers. The copy on gitlab is the canonical one, and the line numbers below refer to it.

I got the general format of this script from a local file, check_sensors, and Reference 1 below. The utils.sh call provides nagios-related definitions, including the exit codes that you see used like $STATE_OK. The shell script is pretty self-explanatory, really. The variables are initialized and the actual checked value is calculated (ps -ef | grep httpd). About half the script (lines 51-100) parses the parameters. It is a nice, simple approach if your input is predictable and simplified (like from nagios) and you don't need the proper parameter parsing that accepts -XvalueofXhere with no space between the flag and the value. Then comes some sanity checking for thresholds (102-113), debugging information if given enough verbosity (115-125), and finally the actual results are determined in 127-140.
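The sanity check and the final result logic can be tried out on their own, without nagios or apache installed. This is only a minimal sketch: classify_threads is a hypothetical helper that does not exist in the plugin, and the STATE_* values are set by hand here to the standard nagios exit codes that utils.sh would normally provide.

```shell
#!/bin/sh
# Standard nagios plugin exit codes. In the real plugin these come from
# utils.sh; they are defined by hand here so the sketch runs anywhere.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# classify_threads is a hypothetical helper, not part of the plugin.
# Usage: classify_threads <count> <thresh_warn> <thresh_crit>
# Prints a status line and returns the matching nagios state.
classify_threads() {
   count=$1
   warn=$2
   crit=$3
   if test "$crit" -le "$warn"; then
      # Same sanity check as the plugin: critical must exceed warning
      echo "Critical value must be greater than warning value."
      return $STATE_UNKNOWN
   elif test "$count" -gt "$crit"; then
      echo "APACHE CRITICAL - $count"
      return $STATE_CRITICAL
   elif test "$count" -gt "$warn"; then
      echo "APACHE WARNING - $count"
      return $STATE_WARNING
   else
      echo "APACHE OK - $count"
      return $STATE_OK
   fi
}

# Example: 75 threads against thresholds of 50 and 150
status=0
classify_threads 75 50 150 || status=$?
echo "exit status: $status"
```

Because the comparisons use -gt, a count exactly equal to a threshold still falls into the lower state, which matches how the plugin itself behaves.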

Final thoughts

The hardest part of using this plugin is not writing or deploying the shell script. The hardest part is getting the script to run from nagios. To use this check properly, you actually need to write a nagios checkcommand like so:

$USER1$/check_by_ssh -H $HOSTADDRESS$ -C "$USER1$/check_apache_threads -w $ARG1$ -c $ARG2$"

The arguments are the numbers for your thresholds. I used the values 50 and 150 for warning and critical. Any questions?
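If you write nagios object files by hand rather than through nconf, the checkcommand could be wired up roughly as follows. Treat this as a sketch under assumptions: the command name, the generic-service template, and the host name wordpress-dmz are placeholders, not values from my setup. Only the command line and the 50 and 150 thresholds come from this post.

```
# Hand-written nagios object definitions (names below are placeholders)
define command {
    command_name    check_apache_threads
    command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "$USER1$/check_apache_threads -w $ARG1$ -c $ARG2$"
}

define service {
    use                     generic-service     ; assumed template name
    host_name               wordpress-dmz       ; hypothetical host
    service_description     Apache threads
    check_command           check_apache_threads!50!150
}
```

The values after each ! become $ARG1$ and $ARG2$. Note that $USER1$ is expanded on the nagios server, so the plugin needs to live at the same path on the monitored host.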

References

Weblinks

  1. General design http://www.kernel-panic.it/openbsd/nagios/nagios6.html
  2. case -w http://www.linuxquestions.org/questions/programming-9/ash-test-is-string-a-contained-in-string-b-671773/
