Tools For Network Administrators
Tuesday February 14th 2006, 4:45 pm

It often seems like there’s an overabundance of network tools out there, and it’s nearly impossible to separate the gems from the crap. No one has the time to demo every single product, so you often wind up feeling like you might be missing the best tool simply for lack of time. So, in the hope that this might help someone else out there make some decisions, here’s a list of the tools I find indispensable in maintaining my network:

Nagios

I’ve been through all kinds of monitoring software, running the gamut from free to cheap to mind-bogglingly expensive, and ultimately it was one of the free tools that won out. There are literally hundreds of Windows-based monitoring programs out there in the under-$100 price range. I’ve tried a few and they’re all basically the same. If you need point-and-click configuration for basic monitoring tasks they generally do a nice job, but for anything more advanced they just don’t cut the mustard. The expensive behemoths of the monitoring world, such as Tivoli or NimBUS, are much more full-featured but are ultimately cumbersome to configure and lack flexibility unless you have a large IT staff at your disposal to do some custom coding. In the free arena, there are several options that have potential, such as Zabbix, but none are as mature or have as large an install base as Nagios.

Nagios on its own is just an engine that executes scripts (plugins) to check pretty much anything you can conceive of. The great part is that Nagios doesn’t care what those scripts are or what they do, so long as they return a status of OK, WARNING, CRITICAL, or UNKNOWN. Many of the more expensive software suites allow for greater granularity of alert severity, but those few states are perfect for my needs. Since Nagios doesn’t care about anything other than what the scripts return, it’s ridiculously simple for even a poor coder like me to craft my own checks in the language of my choice. That lets me choose the best tool for the job. Sometimes it’s shell scripts, sometimes Perl, sometimes compiled C. And best of all, thanks to the large Nagios userbase, typically someone’s already written a plugin to do exactly what I’m looking for, thus saving me the trouble. There’s even a fantastic site where users can share the plugins they’ve written.
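
To illustrate how little a plugin needs, here’s a minimal sketch in Python. It is not one of the stock plugins. The disk-usage check, the 80/90 percent thresholds, and the names evaluate and main are all made up for the example. The only part Nagios itself cares about is the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and the one-line status message printed to standard output.

```python
import shutil

# Exit codes Nagios maps to its states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage onto a state (thresholds are illustrative)."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def main(argv):
    """Check one filesystem path; print a status line and return the exit code."""
    if len(argv) != 1:
        print("DISK UNKNOWN - expected exactly one argument: PATH")
        return UNKNOWN
    try:
        usage = shutil.disk_usage(argv[0])
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    if usage.total == 0:
        print(f"DISK UNKNOWN - {argv[0]} reports zero total size")
        return UNKNOWN
    percent = 100.0 * usage.used / usage.total
    state = evaluate(percent)
    print(f"DISK {LABELS[state]} - {percent:.1f}% used on {argv[0]}")
    return state

# To run this as a real plugin, end the file with:
#     import sys; sys.exit(main(sys.argv[1:]))
```

Everything else (what gets measured, how, and in which language) is up to whoever writes the check.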

That large userbase also forms a fantastic support community, with shared plugins, add-ons, tips and tricks, and some very informative mailing lists. (Look for me on nagios-users!) The combined expertise of the Nagios community provides faster and more accurate support, in my opinion, than many of the big-dollar vendors are able to muster for their products.

A few Nagios add-ons I couldn’t live without:

NRPE

The Nagios Remote Plugin Executor is a lightweight client that comes in *NIX and Windows versions. The *NIX version can run as a daemon, or out of inetd/xinetd, depending on your preference. The Windows version runs as a service. It allows you to run scripts locally on the monitored server, which is handy when checking things not easily exportable via SNMP or in an environment where SNMP is forbidden. On the Windows side it’s invaluable for getting access to WMI.

Nagiostat

Nagiostat is a handy little application that takes performance data from Nagios in realtime and inserts the data into RRD files. It also has the ability to generate graphs from the RRD files, but in the name of using the best tool for the job, I use Cacti for that part. For me, Nagiostat is the perfect conduit between the monitoring prowess of Nagios and the graphing elegance of Cacti.
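
For a sense of what that performance data looks like, here’s a rough Python sketch of the parsing step only. It is not Nagiostat’s code, and the function name parse_perfdata is invented for the example. It assumes the usual plugin perfdata layout of space-separated label=value[unit];warn;crit;min;max entries, with labels optionally wrapped in single quotes, and it keeps just the value and unit.

```python
import re

# label (bare or 'single quoted') = numeric value, optional unit.
# Anything after the unit (;warn;crit;min;max) is ignored in this sketch.
_PERF = re.compile(r"(?:'([^']+)'|([^\s=']+))=(-?\d+(?:\.\d+)?)([^\s;]*)")


def parse_perfdata(text):
    """Return {label: (value, unit)} for each entry in a perfdata string."""
    result = {}
    for quoted, bare, value, unit in _PERF.findall(text):
        result[quoted or bare] = (float(value), unit)
    return result
```

Each parsed value is what would then be written into the matching RRD file as a new sample.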

Cacti

It sounds silly, but I can’t live without my graphs. They help with capacity planning and trending, and they’re an easy visual way of seeing when something’s out of whack. There are countless RRD front-ends, but for my money Cacti is hard to beat. RRDTool makes my brain hurt sometimes, and Cacti goes a long way towards easing that pain. It’s as close to point-and-click graphing as you’re going to find and it makes beautiful graphs. My biggest beef with Cacti at this point is the clunky user management interface, but the developers of Cacti are an on-the-ball bunch and I expect that to improve.

Like Nagios, Cacti also has a devoted following and excellent support channels via their web forums.

IP Plan

Anyone who manages a network that spans more than a few hundred IPs knows that they need something to help manage their usage. Spreadsheets quickly become inadequate as you begin to want to store more and more data about individual IPs. Unfortunately, the pickings are slim when it comes to IP management tools. Too many come with extra crap aimed at Windows-centric shops, which is great for those folks I guess, but those of us running heterogeneous networks have no use for the extra gew-gaws that ultimately clutter up the application and impede us from doing our jobs. Then there’s IP Plan. It’s a very simple LAMP-based application that takes literally 5 minutes to install, yet it’s a surprisingly powerful tool. It has a very simple, clean interface and gets right down to business. You tell it about the subnets you operate, it populates some fields automagically via DNS lookups, and you’re off to the races. You can allocate subnets, detail hostname information, generate reverse DNS records in BIND format, and search based on any of the fields available. Fantastic! I use it to store all sorts of contact information, abuse information, anything I might want to know about a particular IP on my network.
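
To show what “reverse DNS records in BIND format” amounts to, here’s a small Python sketch that turns an IP-to-hostname mapping into PTR lines. It is not IP Plan’s code. The function name ptr_records and the sample addresses are made up, and it assumes all addresses are of the same family (all IPv4 here) so that they sort cleanly.

```python
import ipaddress


def ptr_records(hosts):
    """Build BIND-style PTR lines from an {ip_string: hostname} mapping."""
    lines = []
    # Sort numerically by address rather than alphabetically by string.
    for ip, name in sorted(hosts.items(),
                           key=lambda item: ipaddress.ip_address(item[0])):
        pointer = ipaddress.ip_address(ip).reverse_pointer
        # Fully qualified names in a zone file end with a trailing dot.
        lines.append(f"{pointer}. IN PTR {name.rstrip('.')}.")
    return lines


if __name__ == "__main__":
    for line in ptr_records({"192.0.2.10": "mail.example.com",
                             "192.0.2.5": "www.example.com"}):
        print(line)
```

The output is a pair of lines such as 5.2.0.192.in-addr.arpa. IN PTR www.example.com. that can be pasted into a reverse zone file.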

RTG

95th percentile. Everyone bills that way, so why are there so few pieces of software that can accurately calculate it? It’s not a difficult thing: store 5-minute averages every 5 minutes for a month, lop off the top 5% of the samples, and bill based on the next highest sample. This isn’t rocket science. Yet of all the tools capable of tracking bandwidth usage (Cacti included), very few have 95th percentile capabilities, and those that do often don’t do it right. Many introduce more averaging into the monthly sample, giving you garbage results. Others throw in way too many extra “features” that just aren’t necessary. RTG does it perfectly. It runs as a daemon, polls your network devices, and stores the values in a MySQL database. The documentation is sparse, but let’s face it, if you need this tool you can probably figure it out. The 95th percentile calculation can be done via an included Perl or PHP script, or you can easily roll your own. That makes it extremely easy to integrate into your billing system. It took our in-house developer less than a day, start to finish, to use RTG to automatically bill our customers for bandwidth usage. RTG also has the ability to generate graphs based on the values it stores, but it makes some of the ugliest graphs you’ll ever see. Stick with Cacti for the graphing.
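
The calculation described above can be sketched in a few lines of Python. This is an illustration, not the script that ships with RTG. The function name percentile_95 is invented, and rounding the number of discarded samples down is an assumption, since billing conventions differ on that detail.

```python
def percentile_95(samples):
    """Return the billable rate: the highest sample left after
    discarding the top 5% of the 5-minute averages."""
    if not samples:
        raise ValueError("no samples to bill on")
    ordered = sorted(samples, reverse=True)
    # Number of samples to throw away, rounded down (an assumption).
    discard = len(ordered) * 5 // 100
    return ordered[discard]
```

For a 30-day month of 5-minute averages that is 8640 samples, of which the top 432 are discarded. Note that the samples are never averaged again, which is the mistake that produces the garbage results mentioned above.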

LanSpy

LanSpy does the same thing as RTG, but must run on Windows, and uses an Access database as the back-end. It’s more user-friendly (web-based configuration) and draws prettier graphs, but the Access database is a major limitation and it doesn’t store data more than 1 month old other than in a round-robin fashion. I’ve also found it doesn’t scale well beyond a few hundred monitored interfaces and is a beast on system resources. I used this until RTG came into my life. It’s great for small installations and for those who don’t have the technical chops or desire to work with RTG.

IsoQlog

Log analysis is one of the more difficult jobs of a systems administrator. You’ve got gigs and gigs of log data, and oftentimes sifting through that data for the bit you need is akin to finding a needle in a haystack. IsoQlog works with the logs of most any mail server out there and mines out the interesting bits. It will tell you how many messages were sent and received, and how much data was sent and received, both in total and by domain. It’s great for capacity planning and for identifying anomalies that might indicate a compromised server spewing spam.
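
The per-domain tallying is simple once the log lines have been parsed. Here’s a hedged Python sketch of that last step. It is not how IsoQlog is implemented. The function name summarize and the (sender, recipient, size) record shape are hypothetical, and turning a particular mail server’s log into such records is left out entirely.

```python
from collections import defaultdict


def summarize(records):
    """Tally messages and bytes per domain.

    records: iterable of (sender, recipient, size_in_bytes) tuples,
    a made-up shape standing in for already-parsed log entries.
    """
    stats = defaultdict(lambda: {"sent": 0, "received": 0,
                                 "bytes_out": 0, "bytes_in": 0})
    for sender, recipient, size in records:
        sender_domain = sender.rsplit("@", 1)[-1].lower()
        recipient_domain = recipient.rsplit("@", 1)[-1].lower()
        stats[sender_domain]["sent"] += 1
        stats[sender_domain]["bytes_out"] += size
        stats[recipient_domain]["received"] += 1
        stats[recipient_domain]["bytes_in"] += size
    return dict(stats)
```

A domain whose sent count or bytes_out suddenly jumps is exactly the kind of anomaly worth a closer look.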

NetVault

Is there anything more important to an enterprise infrastructure than the backup platform? Even with the best equipment and the most redundant setup, eventually something (or more likely someone) is going to screw up and send you to your backups. You can’t afford to have an OOPS when that happens. We evaluated all the top competitors in the backup space (Legato, Veritas, BrightStor, NetVault) and ultimately selected NetVault. Each has its pluses and minuses, but the things that put NetVault on top were its outstanding *NIX support (all major UNIX and Linux flavors are supported) and its database support. NetVault has plugins for just about every type of database, including some for which support is hard to find, such as PostgreSQL, as well as plugins for Microsoft Exchange and other proprietary database-driven applications. In use, we’ve found NetVault to be rock-solid reliable, as well as blazingly fast. It does have a rather steep learning curve, but once you get the hang of it, it is quick and elegant to configure and work with. But, like any of the options in this arena, be prepared to break out the pocketbook: they’re all electrifyingly expensive.

