Posts Tagged ‘forwarder’

[More Splunk: Part 3] Report on remote server activity

Wednesday, November 28th, 2012

This post continues [More Splunk: Part 2] Configure a simple Splunk Forwarder.

With data flowing from the Splunk Forwarders into the Splunk Receiver server, the last step toward getting meaningful information is to create a search for specific data and put it into a report.

Splunk searches range from simplistic strings such as “error” to complex phrases that resemble Excel formulas mixed with shell scripting. To extract the data gathered from a remote server will require narrowing down the location of the data from host to source to field and then manipulating the field values to get meaning from them.

Creating a search

After logging in to the Splunk Receiver server, select Search from the App menu.

Choose Search

This presents a page with a seemingly simple search field at the top with three panels below called “Sources”, “Source Types” and “Hosts”. The window is actually a very helpful formula builder for creating complex searches. Locate the Hosts area. This lists both the local computer as well as all Splunk Forwarders.


Clicking any of the host names, in this case “TMI”, begins building the search formula. It automatically inserts a correctly formatted string into the Search field:


At the same time Splunk displays a table of data from that host and begins displaying a dynamic graph based on that data. Without any filtering or refining it’s displaying the count of records from log files it has gathered. Interesting but not very useful.

Host search

Now that the data shown is narrowed down to the server, let’s narrow it down to the data coming from the script running on the server. The script is considered the “source” of the data and the path to the script is the value:

hosts="TMI" source="/Applications/splunkforwarder/etc/apps/talkingmoose/bin/"

This search result narrows Splunk’s results considerably. Note that Splunk is highlighting the host and source information in the textual data. Also, note how the graph consistently shows “1″ across its scope. This indicates it’s reporting one record for each time reported. Again, not very useful.

Source search

What we really want are the values of the results displayed over time. This is handled by the “timechart” function in Splunk. The formula now pipes the data returned from the host and source into a function:

host="TMI" source="/applications/splunkforwarder/etc/apps/talkingmoose/bin/" | timechart avg(MySQLCPU)

Remember that the script was written to denote “fields” called “MySQLCPU” and “ApacheCount”. Using the field name in the timechart function returns the values over time. Using “avg” returns the average of the values (really, just the average of the one value). The final result is a simple table of data, which is all that’s needed to create a report.


Creating a report

Now, we can graph this table of data. From the Create menu select Report… Splunk creates a rough graph, which is useful but not very easy to read.

Initial graph

Using the formatting options above the graph, adjust these items:

  • Chart type: area
  • Chart title: MySQL CPU Usage

Area graph

To save this graph so that it’s easily accessible without having to recreate the search each time, let’s add it to a dashboard. A dashboard is a single Splunk page that can act as an overview for multiple related or unrelated processes or servers.

From the Create drop down menu select Dashboard panel… Name the new panel “MySQL CPU Usage” and click the Next button. If an appropriate dashboard already exists simply choose to add the panel to that existing dashboard. Otherwise, name the new dashboard itself “Servers Dashboard” and click the Next button. Click the Finish button when done.

To view the report panel without having to recreate the search each time, locate the Dashboards & Views menu and select the Servers Dashboard.

Select dashboard

A dashboard can hold any number of report graphs for one or multiple machines. Create a new search and then create a new report based on that search. When done save it to the dashboard. Drag and drop panels on the page to reorder them or put higher priority panels toward the top or left of the page.

[More Splunk: Part 2] Configure a simple Splunk Forwarder

Monday, November 26th, 2012

This post continues [More Splunk: Part 1] Monitor specific processes on remote servers.

So far, I have a simple shell script called that will return two pieces of information I want fed into my Splunk indexer server:

  • MySQL CPU usage
  • Count of Apache web server processes

It’s going to return a result that looks something like:

2012-11-20 14:34:45-08:00 MySQLCPU=23.2 ApacheCount=1

Install the Forwarder

For each server  I need a Splunk agent called a Forwarder installed. The Forwarder’s purpose is to send the data collected on the local server to a remote Splunk server for indexing and reporting. Splunk offers three types of Forwarders but I want the one with the lightest weight and overhead—a Universal Forwarder. For my testing I downloaded the Mac OS X 10.7 installer and installed it onto OS X 10.8 without any noticeable issues.

At this point the Forwarder service hasn’t been started yet. I first want to add my script and a couple of configuration files. The configuration files are necessary because the Universal Forwarder has no web interface to facilitate point and click configuration.

Create and populate the app directory

First, I want to create a folder for my “app”. An app is a directory of scripts and configuration files. By creating my own app directory I can control the behavior of its contents, overriding preset server defaults if I choose.

mkdir /Applications/splunkforwarder/etc/app/talkingmoose/

Inside my app folder I’ll create two more called bin and local:

mkdir /Applications/splunkforwarder/etc/app/talkingmoose/bin
mkdir /Applications/splunkforwarder/etc/app/talkingmoose/local

The bin folder is a Splunk security requirement. Any executable, such as a script, must reside in this folder. This is where I’ll place my script and make it executable using chmod +x.

The local folder will contain two plain text configuration (.conf) files:

  • inputs.conf
  • outputs.conf

Put simply, inputs.conf is the configuration file that controls executing the script and getting its data into the Splunk Forwarder. And outputs.conf is the configuration file that controls sending the data out to the indexing server or “Splunk Receiver”. These files can be very simple or very complex depending on the needs. I like simple.

Contents of inputs.conf

disabled = false
interval = 60.0

This .conf file tells the Splunk Forwarder where to find the script to execute and then executes it every 60 seconds.

Contents of outputs.conf


This .conf file tells the Splunk Forwarder to send its collected script data to a specific IP address on port 9997 where the Splunk Receiver is listening.

Configure the Splunk Receiver to listen

All that’s left to do is configure the Splunk Receiver to listen for data coming in from Splunk Forwarders on port 9997 via its web interface and start the Splunk Forwarder’s service via its command line utility.

Enable receiving

On the Splunk Receiver server, the server accepting all the data for searching later, click the Manager link in the upper right corner and then click Forwarding and receiving. Click on Configure receiving and then click the New button to create a new listening port. Enter 9997 or another port number not commonly used. Click the Save button.

Enable forwarding

On each Splunk Forwarder the necessary files are already in place. The only task left is to start the Forwarder’s service.

sudo /Applications/splunkforwarder/bin/splunk start

If this is the first time running the start command then press the spacebar repeatedly to read the license agreement or press “q” to quit and immediately accept the agreement.

To test that the Forwarder is working run the list command:

sudo /Applications/splunkforwarder/bin/splunk list forward-server

If prompted for credentials use Splunk’s defaults:

Splunk username: admin
Password: changeme

It should return something that looks like this:

Active forwards:
Configured but inactive forwards:

Searching on the Splunk Receiver should also return results from the Forwarders. Search for host="<forwarderHostName>".

Now that remote server data is flowing into the Splunk indexer machine the last step is to search for it and turn it into meaningful reports.

[More Splunk: Part 3] Report on remote server activity

[More Splunk: Part 1] Monitor specific processes on remote servers

Thursday, November 22nd, 2012

I was given a simple Splunk project: Monitor MySQL CPU usage and Apache web server processes on multiple servers.

Splunk is an amazing product but it’s also a beast! While it may be just a tool in one administrator’s arsenal of gadgets it could very well be another administrator’s full time job. Installing the software is a breeze and getting interesting reports is child play. Getting the meaningful reports you want, on the other hand, requires skills in the realms of system administration, scripting, statistics and formula building (think Excel).

My first project with the software was to monitor two things on remote servers:

  • MySQL CPU usage
  • Count of Apache web server processes

It sounds simple but involves a few pieces:

  • Writing a script to get the data
  • Configuring servers as Splunk Forwarders
  • Forwarding the data to a central server
  • Creating a search to populate a meaningful chart

Create the script

This is the easy part but it requires some special formatting to get Splunk to recognize the data it returns.

First, Splunk parses most any log file based on a time stamp and it can recognize many different versions of timestamps. The data following the timestamp constitutes the rest of the row or record. When Splunk gets to a second timestamp it considers that information to be another record.

So, my script output needed a timestamp. I followed the RFC-3339 specs (one of many formats), which describes something that looks like this:

2012-11-20 14:10:14-08:00

That’s a simple calendar date followed by a time denoted by its offset from GMT time. In this case the -08:00 denotes Pacific Standard Time or PST.

Next, I needed to collect two pieces of data: MySQL CPU usage and the number of active Apache web server processes. I started with a couple of shell script ps commands.

MySQL CPU usage

ps aux | grep mysqld | grep -v grep | awk '{ print $3 }'

Count of Apache web processes

ps ax | grep httpd | grep -v grep | wc -l

While Splunk can understand a standard timestamp as a timestamp it needs some metadata to describe the information that these commands are returning. That means each piece of information needs a name or “field”. This creates a key/value pair it can use when searching on the information later.

In other words the MySQL command above will return a number like “23.2″. Splunks needs a name like “MySQLCPU”. The key/value pair then needs to be in the form of:


This is the entire script to return the timestamp and two key/value pairs separated by tabs:


# RFC-3339 date format, Pacific
TIMESTAMP=$( date “+%Y-%m-%d %T-08:00″ )

# Get CPU usage of the mysqld process
CPUPERCENTAGE=$( ps aux | grep mysqld | grep -v grep | awk ‘{ print $3 }’ )

# Get count of httpd processes
APACHECOUNTRAW=$( ps ax | grep httpd | grep -v grep | wc -l )
APACHECOUNT=$( echo $APACHECOUNTRAW | sed -e ‘s/^[ \t]*//’ )


It will return a result that looks something like this:

2012-11-20 14:34:45-08:00 MySQLCPU=23.2 ApacheCount=1

Save this script with a descriptive name such as Each Splunk Forwarder server will run it to gather information at specified time intervals and send those results to the Splunk Indexer server. For that see:

[More Splunk: Part 2] Configure a simple Splunk Forwarder