[More Splunk: Part 1] Monitor specific processes on remote servers

November 22nd, 2012 by William Smith

I was given a simple Splunk project: Monitor MySQL CPU usage and Apache web server processes on multiple servers.

Splunk is an amazing product but it’s also a beast! While it may be just one tool in one administrator’s arsenal of gadgets, it could very well be another administrator’s full-time job. Installing the software is a breeze and getting interesting reports is child’s play. Getting the meaningful reports you want, on the other hand, requires skills in system administration, scripting, statistics and formula building (think Excel).

My first project with the software was to monitor two things on remote servers:

  • MySQL CPU usage
  • Count of Apache web server processes

It sounds simple but involves a few pieces:

  • Writing a script to get the data
  • Configuring servers as Splunk Forwarders
  • Forwarding the data to a central server
  • Creating a search to populate a meaningful chart

Create the script

This is the easy part but it requires some special formatting to get Splunk to recognize the data it returns.

First, Splunk parses almost any log file based on a timestamp, and it can recognize many different timestamp formats. The data following the timestamp constitutes the rest of the row or record. When Splunk reaches a second timestamp, it treats what follows as a new record.

So, my script output needed a timestamp. I followed the RFC 3339 spec (one of many formats), which describes something that looks like this:

2012-11-20 14:10:14-08:00

That’s a simple calendar date followed by a time and its offset from GMT. In this case the -08:00 denotes Pacific Standard Time (PST).
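The offset can also be read from the system clock instead of being typed by hand, which keeps it correct when daylight saving time moves Pacific time to -07:00. The following is a minimal sketch, assuming the local date command supports the %z format (GNU and BSD date both do); rfc3339_now is a hypothetical helper name, not something Splunk provides.

```shell
#!/bin/sh
# Sketch: build an RFC 3339-style timestamp using the machine's current UTC offset.
# rfc3339_now is a hypothetical helper name used only for illustration.
rfc3339_now() {
    # %z prints the offset without a colon (for example -0800);
    # sed inserts a colon before the last two digits to give -08:00.
    date "+%Y-%m-%d %H:%M:%S%z" | sed 's/\([0-9][0-9]\)$/:\1/'
}

rfc3339_now
```

The result has the same shape as the example above, with whatever offset the server is currently using.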

Next, I needed to collect two pieces of data: MySQL CPU usage and the number of active Apache web server processes. I started with a couple of shell script ps commands.

MySQL CPU usage

ps aux | grep mysqld | grep -v grep | awk '{ print $3 }'

Count of Apache web processes

ps ax | grep httpd | grep -v grep | wc -l
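One thing to watch with the grep approach: it matches any line containing the text, so a wrapper such as mysqld_safe would produce a second line of CPU output. A possible alternative, sketched here under the assumption of a Linux-style ps whose comm column holds the bare executable name, is to match the command name exactly and let awk do the summing and counting. The helper names sum_cpu and count_procs are made up for this example.

```shell
#!/bin/sh
# Both helpers read "pcpu comm" rows on stdin (the output of ps -eo pcpu,comm).
# sum_cpu and count_procs are hypothetical names used only for illustration.

# Add up the %CPU column for every process whose command name is exactly $1.
sum_cpu() {
    awk -v name="$1" '$2 == name { sum += $1 } END { printf "%.1f\n", sum }'
}

# Count the processes whose command name is exactly $1.
count_procs() {
    awk -v name="$1" '$2 == name { n++ } END { print n + 0 }'
}

ps -eo pcpu,comm | sum_cpu mysqld
ps -eo pcpu,comm | count_procs httpd
```

Because the helpers only read standard input, they can be checked against hand-written sample rows before being pointed at a live server.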

Splunk can recognize a standard timestamp on its own, but it needs some metadata to describe the values these commands return. That means each piece of information needs a name, or “field”. The name and value together form a key/value pair that Splunk can use when searching the data later.

In other words, the MySQL command above will return a number like “23.2”. Splunk needs a name for it, like “MySQLCPU”. The key/value pair then needs to be in the form:

MySQLCPU=23.2
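If the process is not running, the command returns nothing and the pair would come out as a bare “MySQLCPU=”. One way to guard against that is a small formatting helper that substitutes a fallback value. This is only a sketch: kv is a hypothetical name, and reporting 0.0 for a missing process is an assumption that may not suit every report.

```shell
#!/bin/sh
# kv is a hypothetical helper: print NAME=VALUE, using the third argument
# as a fallback when the value is empty (for example, mysqld is not running).
kv() {
    printf '%s=%s' "$1" "${2:-$3}"
}

kv MySQLCPU "23.2" "0.0"; echo
kv MySQLCPU "" "0.0"; echo
```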

This is the entire script to return the timestamp and two key/value pairs separated by tabs:

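#!/bin/sh

# RFC 3339 date format with a fixed Pacific Standard Time offset
TIMESTAMP=$( date "+%Y-%m-%d %T-08:00" )

# Get CPU usage of the mysqld process
CPUPERCENTAGE=$( ps aux | grep mysqld | grep -v grep | awk '{ print $3 }' )

# Get count of httpd processes; wc -l may pad its output with spaces
APACHECOUNTRAW=$( ps ax | grep httpd | grep -v grep | wc -l )
# Strip any leading spaces or tabs from the count
APACHECOUNT=$( echo $APACHECOUNTRAW | sed -e 's/^[[:space:]]*//' )

# printf reliably expands \t to a tab; echo does so only in some shells
printf '%s\tMySQLCPU=%s\tApacheCount=%s\n' "$TIMESTAMP" "$CPUPERCENTAGE" "$APACHECOUNT"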

It will return a result that looks something like this:

2012-11-20 14:34:45-08:00 MySQLCPU=23.2 ApacheCount=1
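Tabs and spaces look alike in a terminal, so it can be worth confirming that a line really splits into a timestamp plus key/value pairs before handing it to Splunk. The sketch below is a quick sanity check using awk; show_fields is a hypothetical helper name and the sample line is the one shown above.

```shell
#!/bin/sh
# show_fields is a hypothetical helper: read tab-separated records on stdin
# and print each key/value pair that follows the timestamp in field 1.
show_fields() {
    awk -F '\t' '{
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")
            print kv[1] " -> " kv[2]
        }
    }'
}

# Build the sample record with real tab characters, then inspect it.
printf '%s\tMySQLCPU=%s\tApacheCount=%s\n' '2012-11-20 14:34:45-08:00' '23.2' '1' | show_fields
```

For the sample record this prints “MySQLCPU -> 23.2” and “ApacheCount -> 1” on separate lines. If the separators were spaces rather than tabs, nothing would be printed, which makes the mistake easy to spot.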

Save this script with a descriptive name such as counters.sh. Each Splunk Forwarder server will run it to gather information at specified time intervals and send those results to the Splunk Indexer server. For that see:

[More Splunk: Part 2] Configure a simple Splunk Forwarder
