Posts Tagged ‘splunk’

[More Splunk: Part 4] Narrow search results to create an alert

Wednesday, January 30th, 2013

This post continues [More Splunk: Part 3] Report on remote server activity.

Now that we have Splunk generating reports and turning raw data into useful information, let’s use that information to trigger something to happen automatically such as sending an email alert.

In the prior posts a Splunk Forwarder was gathering information using a shell script and sending the results to the Splunk Receiver. To find those results we used this search string:

hosts="TMI" source="/Applications/splunkforwarder/etc/apps/talkingmoose/bin/counters.sh"

It returned data every 60 seconds that looked something like:

2012-11-20 14:34:45-08:00 MySQLCPU=23.2 ApacheCount=1

Using the timechart function of Splunk we extracted the MySQLCPU field to get its value 23.2 and put that into a graph for easier viewing.
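
For reference, the full timechart search from Part 3 that produced the graph looked like this:

host="TMI" source="/Applications/splunkforwarder/etc/apps/talkingmoose/bin/counters.sh" | timechart avg(MySQLCPU)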

Area graph

Returning to view that graph every few minutes, hours or days gets tedious if nothing really changes. Ideally, Splunk would watch the data itself and tell us when something is out of the ordinary. That’s where alerts come in.

For example, the graph above shows the highest spike in activity to be around 45%, so we can assume a spike to 65% would be unusual. We want to know about that before processor usage gets out of control.

Configuring Splunk for email alerts

Before Splunk can send email alerts it needs basic email server settings for outgoing mail (SMTP). Click the Manager link in the upper right corner and then click System Settings. Click on Email alert settings. Enter public or private outgoing mail server settings for Splunk. If using a public mail server such as Gmail then include a user name and password to authenticate to the server and select the option for either SSL or TLS. Be sure to append port number 465 for SSL or 587 for TLS to the mail server name.
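
For example, settings for a Gmail account (a hypothetical address, and field labels may vary slightly by Splunk version) might look like:

Mail host: smtp.gmail.com:587
Use TLS: enabled
Username: alerts@example.com
Password: ********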

Splunk email server settings

In the same settings area Splunk includes some additional basic settings. Modify them as needed or just accept the defaults.

Splunk additional email server settings

Click the Save button when done.

Refining the search

Next, select Search from the App menu. Let’s refine the search to find only those results that may be out of the ordinary. Our first search found all results for the MySQLCPU field but now we want to limit its results to anything at 65% or higher. The where function is our new friend.

hosts="TMI" source="/Applications/splunkforwarder/etc/apps/talkingmoose/bin/counters.sh" | where MySQLCPU >= 65

This takes the result from the Forwarder and pipes it into an operation that returns only values of the MySQLCPU field greater than or equal to 65. The search results, we hope, are empty. To verify the search is working correctly, temporarily change the value from 65 to something lower such as 30 or 40. The lower values should return multiple results.
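
For example, a temporary test search at a lower threshold would look like:

host="TMI" source="/Applications/splunkforwarder/etc/apps/talkingmoose/bin/counters.sh" | where MySQLCPU >= 30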

As a side note, unrelated to our need here: for an alert covering a range of values, an AND operator connecting two statements limits the results to anything between the two bounds:

hosts="TMI" source="/Applications/splunkforwarder/etc/apps/talkingmoose/bin/counters.sh" | where MySQLCPU >= 55 AND MySQLCPU <=65

Creating an alert

An alert evaluates this search as frequently as Splunk receives new data, and whenever the search returns any results it can do something automatically.

With the search results (or lack of them) in view, select Alert… from the Create drop down menu in the upper right corner. Name the search “MySQL CPU Usage Over 65%” or something that’s recognizable later. One drawback with Splunk is that it won’t allow renaming the search afterward; doing that requires editing more .conf files. Leave the Schedule at its default, Trigger in real-time whenever a result matches. Click the Next button.

Schedule an alert

Enable Send email and enter one or more addresses to receive the alerts. Also, enable Throttling by selecting Suppress for results with the same field value and entering the MySQLCPU field name. Set the suppression time to five minutes, which is pretty aggressive. Remember, the script on the Forwarder server sends new values every minute; without throttling, Splunk would send an alert every minute as well. Throttling lets an administrator keep some sanity. Click the Next button.

Enable alert actions

Finally, select whether to keep the alert private or share it with other users on the Splunk system. This only applies to the Enterprise version of Splunk. Click the Finish button.

Share an alert

Splunk is now watching for new data from the Forwarder, and as that data arrives it evaluates it against the saved search. Any result at all will trigger an email.

Note that alerts don’t need to just trigger emails; they can also run scripts. For example, an advanced Splunk search might look for multiple Java processes on a server running a Java-based application. If it found more than 20 spawned processes, it could trigger a script that sends a killall command to stop them before they consume the server’s resources and then issues a start command to the application.
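
A minimal sketch of such an alert script, assuming a Java-based application with a hypothetical restart command (the application path and start command are placeholders, not part of this setup):

#!/bin/sh

# Stop all spawned Java processes before they consume the server's resources
/usr/bin/killall java

# Restart the Java-based application (placeholder path and command)
/Applications/myapp/bin/myapp start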

Manually delete data from Splunk

Thursday, December 27th, 2012

By default Splunk never deletes the logging data it has gathered. Without some action to remove data, an All time search will continue to process every result from day one. That may be desirable where preserving and searching the data for historical purposes is necessary, but when using Splunk only as a monitoring tool the older data becomes superfluous over time.

Manually deleting information from Splunk is irreversible and doesn’t necessarily free disk space. Splunk users should delete only after verifying that their search results return the information they expect.

Enabling “can_delete”

No user can delete data until granted the can_delete role. Not even the admin account in the free version of Splunk has this capability enabled by default. To enable can_delete:

  1. Click the Manager link and then click the Access Controls link in the Users and authentication section.
  2. Click the Users link. If running the free version of Splunk then the admin account is the only account available. Click the admin link. Otherwise, consider creating a special account just for sensitive procedures, such as deleting data, and assigning the can_delete role only to that user.
  3. For the admin account move the can_delete role from the Available roles to the Selected roles section.
    Enable can_delete
  4. Click the Save button to keep the changes.

Finding data

Before deleting data be sure that a search returns the exact data to be deleted. This is as simple as performing a regular search using the Time range drop down menu to the right of the Search field.

The Time range menu offers numerous choices for limiting results by time, including Week to date, Year to date and Yesterday. In this case, let’s search for data from the Previous month:

Search time range
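
As a side note, the same restriction can be typed directly into the search string using Splunk’s relative time modifiers; a sketch for the previous month:

host="TMI" earliest=-1mon@mon latest=@mon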

Wait for the search to complete, then verify the results returned are the results that need deleting.

Deleting data

Deleting the found data is as simple as performing a search and then piping it into a delete command:

Search and delete
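
In text form, the search piped into the delete command looks something like this (using the host from the earlier posts as an example):

host="TMI" | delete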


This runs the search again, deleting found results on the fly, which is why searching first before deleting is important. Keep in mind that a search-and-delete routine takes at least as long to run as the initial search and consumes processing power on the Splunk server. If deleting numerous records, proceed with one search/delete at a time to avoid overtaxing the server.

[More Splunk: Part 3] Report on remote server activity

Wednesday, November 28th, 2012

This post continues [More Splunk: Part 2] Configure a simple Splunk Forwarder.

With data flowing from the Splunk Forwarders into the Splunk Receiver server, the last step toward getting meaningful information is to create a search for specific data and put it into a report.

Splunk searches range from simplistic strings such as “error” to complex phrases that resemble Excel formulas mixed with shell scripting. Extracting the data gathered from a remote server requires narrowing down its location from host to source to field, and then manipulating the field values to get meaning from them.

Creating a search

After logging in to the Splunk Receiver server, select Search from the App menu.

Choose Search

This presents a page with a seemingly simple search field at the top and three panels below called “Sources”, “Source Types” and “Hosts”. The page is actually a very helpful formula builder for creating complex searches. Locate the Hosts panel, which lists both the local computer and all Splunk Forwarders.

Hosts

Clicking any of the host names, in this case “TMI”, begins building the search formula. It automatically inserts a correctly formatted string into the Search field:

host="TMI"

At the same time Splunk displays a table of data from that host and begins displaying a dynamic graph based on that data. Without any filtering or refining it’s displaying the count of records from log files it has gathered. Interesting but not very useful.

Host search

Now that the data shown is narrowed down to the server, let’s narrow it down to the data coming from the counters.sh script running on the server. The script is considered the “source” of the data and the path to the script is the value:

hosts="TMI" source="/Applications/splunkforwarder/etc/apps/talkingmoose/bin/counters.sh"

This search narrows Splunk’s results considerably. Note that Splunk highlights the host and source information in the textual data. Also, note how the graph consistently shows “1” across its scope, indicating one record for each reported time. Again, not very useful.

Source search

What we really want are the values of the results displayed over time. This is handled by the “timechart” function in Splunk. The formula now pipes the data returned from the host and source into a function:

host="TMI" source="/applications/splunkforwarder/etc/apps/talkingmoose/bin/counters.sh" | timechart avg(MySQLCPU)

Remember that the counters.sh script was written to denote “fields” called “MySQLCPU” and “ApacheCount”. Using the field name in the timechart function returns the values over time. Using “avg” returns the average of the values (really, just the average of the one value). The final result is a simple table of data, which is all that’s needed to create a report.

Timechart

Creating a report

Now, we can graph this table of data. From the Create menu select Report… Splunk creates a rough graph, which is useful but not very easy to read.

Initial graph

Using the formatting options above the graph, adjust these items:

  • Chart type: area
  • Chart title: MySQL CPU Usage

Area graph

To save this graph so that it’s easily accessible without having to recreate the search each time, let’s add it to a dashboard. A dashboard is a single Splunk page that can act as an overview for multiple related or unrelated processes or servers.

From the Create drop down menu select Dashboard panel… Name the new panel “MySQL CPU Usage” and click the Next button. If an appropriate dashboard already exists simply choose to add the panel to that existing dashboard. Otherwise, name the new dashboard itself “Servers Dashboard” and click the Next button. Click the Finish button when done.

To view the report panel without having to recreate the search each time, locate the Dashboards & Views menu and select the Servers Dashboard.

Select dashboard

A dashboard can hold any number of report graphs for one or multiple machines. Create a new search and then create a new report based on that search. When done save it to the dashboard. Drag and drop panels on the page to reorder them or put higher priority panels toward the top or left of the page.
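
For example, a second panel tracking the count of Apache web server processes could start from a search like:

host="TMI" source="/Applications/splunkforwarder/etc/apps/talkingmoose/bin/counters.sh" | timechart avg(ApacheCount)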

[More Splunk: Part 2] Configure a simple Splunk Forwarder

Monday, November 26th, 2012

This post continues [More Splunk: Part 1] Monitor specific processes on remote servers.

So far, I have a simple shell script called counters.sh that will return two pieces of information I want fed into my Splunk indexer server:

  • MySQL CPU usage
  • Count of Apache web server processes

It’s going to return a result that looks something like:

2012-11-20 14:34:45-08:00 MySQLCPU=23.2 ApacheCount=1

Install the Forwarder

For each server, I need a Splunk agent called a Forwarder installed. The Forwarder’s purpose is to send the data collected on the local server to a remote Splunk server for indexing and reporting. Splunk offers three types of Forwarders, but I want the one with the lightest weight and overhead: a Universal Forwarder. For my testing I downloaded the Mac OS X 10.7 installer and installed it onto OS X 10.8 without any noticeable issues.

At this point the Forwarder service hasn’t been started yet. I first want to add my script and a couple of configuration files. The configuration files are necessary because the Universal Forwarder has no web interface to facilitate point and click configuration.

Create and populate the app directory

First, I want to create a folder for my “app”. An app is a directory of scripts and configuration files. By creating my own app directory I can control the behavior of its contents, overriding preset server defaults if I choose.

mkdir /Applications/splunkforwarder/etc/apps/talkingmoose/

Inside my app folder I’ll create two more called bin and local:

mkdir /Applications/splunkforwarder/etc/apps/talkingmoose/bin
mkdir /Applications/splunkforwarder/etc/apps/talkingmoose/local

The bin folder is a Splunk security requirement. Any executable, such as a script, must reside in this folder. This is where I’ll place my counters.sh script and make it executable using chmod +x.
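
For example, assuming counters.sh is in the current working directory:

cp counters.sh /Applications/splunkforwarder/etc/apps/talkingmoose/bin/
chmod +x /Applications/splunkforwarder/etc/apps/talkingmoose/bin/counters.sh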

The local folder will contain two plain text configuration (.conf) files:

  • inputs.conf
  • outputs.conf

Put simply, inputs.conf is the configuration file that controls executing the script and getting its data into the Splunk Forwarder. And outputs.conf is the configuration file that controls sending the data out to the indexing server or “Splunk Receiver”. These files can be very simple or very complex depending on the needs. I like simple.

Contents of inputs.conf

[script:///Applications/splunkforwarder/etc/apps/talkingmoose/bin/counters.sh]
disabled = false
interval = 60.0

This .conf file tells the Splunk Forwarder where to find the script and to execute it every 60 seconds.

Contents of outputs.conf

[tcpout:group1]
server=192.168.5.42:9997

This .conf file tells the Splunk Forwarder to send its collected script data to a specific IP address on port 9997 where the Splunk Receiver is listening.
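
As a side note, the same forwarding target can be configured with the Forwarder’s command line tool instead of hand-editing outputs.conf:

sudo /Applications/splunkforwarder/bin/splunk add forward-server 192.168.5.42:9997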

Configure the Splunk Receiver to listen

All that’s left is to configure the Splunk Receiver to listen for data coming in from Splunk Forwarders on port 9997 via its web interface, and to start each Splunk Forwarder’s service via its command line utility.

Enable receiving

On the Splunk Receiver server, the server accepting all the data for searching later, click the Manager link in the upper right corner and then click Forwarding and receiving. Click on Configure receiving and then click the New button to create a new listening port. Enter 9997 or another port number not commonly used. Click the Save button.

Enable forwarding

On each Splunk Forwarder the necessary files are already in place. The only task left is to start the Forwarder’s service.

sudo /Applications/splunkforwarder/bin/splunk start

If this is the first time running the start command, press the spacebar repeatedly to read through the license agreement, or press “q” to skip ahead and accept the agreement immediately.

To test that the Forwarder is working run the list command:

sudo /Applications/splunkforwarder/bin/splunk list forward-server

If prompted for credentials use Splunk’s defaults:

Splunk username: admin
Password: changeme

It should return something that looks like this:

Active forwards:
192.168.5.42:9997
Configured but inactive forwards:
None

Searching on the Splunk Receiver should also return results from the Forwarders. Search for host="<forwarderHostName>".

Now that remote server data is flowing into the Splunk indexer machine, the last step is to search for it and turn it into meaningful reports.

[More Splunk: Part 3] Report on remote server activity

[More Splunk: Part 1] Monitor specific processes on remote servers

Thursday, November 22nd, 2012

I was given a simple Splunk project: Monitor MySQL CPU usage and Apache web server processes on multiple servers.

Splunk is an amazing product but it’s also a beast! While it may be just a tool in one administrator’s arsenal of gadgets, it could very well be another administrator’s full-time job. Installing the software is a breeze and getting interesting reports is child’s play. Getting the meaningful reports you want, on the other hand, requires skills in the realms of system administration, scripting, statistics and formula building (think Excel).

My first project with the software was to monitor two things on remote servers:

  • MySQL CPU usage
  • Count of Apache web server processes

It sounds simple but involves a few pieces:

  • Writing a script to get the data
  • Configuring servers as Splunk Forwarders
  • Forwarding the data to a central server
  • Creating a search to populate a meaningful chart

Create the script

This is the easy part but it requires some special formatting to get Splunk to recognize the data it returns.

First, Splunk parses most any log file based on a timestamp, and it can recognize many different timestamp formats. The data following the timestamp constitutes the rest of the row or record. When Splunk reaches the next timestamp it considers that the start of another record.

So, my script output needed a timestamp. I followed the RFC-3339 spec (one of many formats), which describes something that looks like this:

2012-11-20 14:10:14-08:00

That’s a simple calendar date followed by a time denoted by its offset from GMT. In this case the -08:00 offset denotes Pacific Standard Time (PST).
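
The date command used in the script below produces this format, with the GMT offset hardcoded for Pacific:

date "+%Y-%m-%d %T-08:00"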

Next, I needed to collect two pieces of data: MySQL CPU usage and the number of active Apache web server processes. I started with a couple of shell script ps commands.

MySQL CPU usage

ps aux | grep mysqld | grep -v grep | awk '{ print $3 }'

Count of Apache web processes

ps ax | grep httpd | grep -v grep | wc -l

While Splunk can understand a standard timestamp as a timestamp, it needs some metadata describing the information these commands return. That means each piece of information needs a name or “field”. This creates a key/value pair Splunk can use when searching the information later.

In other words, the MySQL command above will return a number like “23.2”. Splunk needs a name like “MySQLCPU”. The key/value pair then needs to be in the form of:

MySQLCPU=23.2

This is the entire script to return the timestamp and two key/value pairs separated by tabs:

#!/bin/sh

# RFC-3339 date format, Pacific
TIMESTAMP=$( date "+%Y-%m-%d %T-08:00" )

# Get CPU usage of the mysqld process
CPUPERCENTAGE=$( ps aux | grep mysqld | grep -v grep | awk '{ print $3 }' )

# Get count of httpd processes
APACHECOUNTRAW=$( ps ax | grep httpd | grep -v grep | wc -l )
APACHECOUNT=$( echo $APACHECOUNTRAW | sed -e 's/^[ \t]*//' )

echo "$TIMESTAMP\\tMySQLCPU=$CPUPERCENTAGE\\tApacheCount=$APACHECOUNT"

It will return a result that looks something like this:

2012-11-20 14:34:45-08:00 MySQLCPU=23.2 ApacheCount=1

Save this script with a descriptive name such as counters.sh. Each Splunk Forwarder server will run it to gather information at specified time intervals and send those results to the Splunk Indexer server. For that see:

[More Splunk: Part 2] Configure a simple Splunk Forwarder

Introducing Splunk: Funny name, serious logging

Thursday, November 15th, 2012

So, my boss says:

“Write an article called ‘Getting Started with Splunk.’”

I reply:

“What, you think I know all this stuff? This really would be a getting started article.”

But here it is and WOW is Splunk cool!

My only experience with Splunk up to a couple of days ago was seeing a T-shirt with “Log is my copilot”. I knew it had something to do with gathering log files and making them easier to read and search. In about an hour I had gone to Splunk’s website to research the product, downloaded and installed it, and started viewing logs from my own system. The Splunk folks have made getting their product into their customers’ hands easy and getting started even easier.

What is Splunk?

Simply put, Splunk can gather just about any kind of data that goes into a log (system logs, website metrics, etc.) into one place and make viewing that data easy. It’s accessed via web browser so it’s accessible on any computer or mobile device such as an iPad.

What do I need to run Splunk?

Practically any common operating system today can run Splunk: Mac OS X, Linux, Windows, FreeBSD and more.

How much does Splunk cost?

Don’t worry about that right now. Download and install the free version. It takes minutes to install and is a no-brainer. Let’s get started.

Getting Splunk

IT managers and directors may be interested in watching the introductory and business case videos with the corporate speak (“operational intelligence” anyone?) and company endorsements. Techs will be interested in getting started. Right on their home page is a big green Free Download button. Go there, click it and locate the downloader for your OS of choice. I downloaded the Mac OS X 10.7 installer to test (and installed it on OS X 10.8 without any issues).

Splunk home

This does require a sign-up to create an account. It takes less than a minute to complete. After submitting the information the 100 MB download begins right away.

While waiting for the download…

When the download is on its way the Splunk folks kindly redirect to a page with some short videos to watch while waiting. Watch the first one, called Getting data into Splunk. It’s only a few minutes long, and it covers the first thing to do after getting into Splunk.

Installing and starting Splunk

The download arrives as a double-clickable Apple Installer package. Double-click and install it. Toward the end it opens a simple TextEdit window with instructions for how to start, stop and access the newly installed Splunk site.

Install done

Files are installed in /Applications/splunk and resemble a UNIX file system.

Splunk application folder

Open the Terminal application found in /Applications/Utilities and run the command /Applications/splunk/bin/splunk start. If this is the first time running Splunk it prompts to accept its license agreement. Tap the spacebar to scroll through and read the agreement or type “q” to quit and agree to the license.

EULA

Accepting the agreement continues to start Splunk where it displays some brief setup messages.

Starting Splunk

The setup then provides the local HTTP address for the newly installed Splunk site. Open this in a web browser to get to the login screen. The first login requires that the administrator account password be reset.

Splunk login

Following along with the Getting data into Splunk video, let’s give Splunk some information. Mac OS X keeps its own log files, so let’s point Splunk to those.

Click the Add Data link to begin.

New Splunk home

Since Mac OS X’s log files are local to the machine, click A file or directory of files.

Add files

Click Next to specify local files.

Add local logs

This opens a window that exposes not only Mac OS X’s visible folders but its invisible folders as well. Browse to /var/log/system.log and click the Select button.

Browse logs folder

For now, opt to skip previewing the log file and click Continue.

Path to system.log

Now, let’s opt to monitor not only the system.log file but the entire /var/log folder containing dozens of other log files as well. Note that Splunk can watch rotated and zipped log files too. Click Save to finish adding logs.

Add /var/log folder
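
As a side note, the same monitor can also be added from the command line rather than the web interface; a sketch:

/Applications/splunk/bin/splunk add monitor /var/log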

Let’s start searching!

Success, start searching

The Search window initially displays a list of all logs Splunk is monitoring. To narrow the search, change the time filter drop down menu to Last 60 minutes. This will make the results a little easier to see on a system that’s only been running a short while.

Last 24 hours

Now, search for install*. Without the asterisk as a wildcard character, Splunk will search only for the literal word “install”. Splunk supports not only wildcard searches but booleans, parentheses, quotes, etc. It returns every instance recorded in the logs that matches the search criteria and creates an interactive bar chart along the top of the page indicating the number of occurrences found at particular times.

Search for install

To further refine the search, Option+click most any word in the log entries below and Splunk will automatically add the necessary syntax to exclude that item. In this case the install* search returned install, installer and installd. Option+clicking installd changed the search criteria to install* NOT installd.

Modified search

Now what?

Continue exploring the videos to understand Splunk’s possibilities and take advantage of its Splunk Tutorial, which is available online as well as in PDF format for offline viewing. They do a great job leading users through setup and creating reports.

Still asking about price? Good.

The free version remains free but doesn’t include many of the features that really make it sing, such as monitoring and alerts, multiple user accounts and support beyond the Splunk website. Cost depends primarily on the amount of data you want to suck into Splunk and have it watch. It’s not cheap, but for an enterprise needing to meet certain service level requirements it beats browsing through multiple servers trying to find the right log with the right information.

FYI, putting together this 1,000-word article probably took me 10 times longer than performing the Splunk install itself and beginning to learn it. It’s really well-done and easy to use. Splunk makes getting started simple.