Ranting, Technically Speaking

November 6, 2009

Nagios Rules All

Filed under: General — Tags: , — archangel @ 1:44 pm

Nagios is a network monitoring application that does no actual monitoring itself; instead, it specializes in scheduling checks and sending notifications. As a plugin framework, it works very well, and there are a lot of monitoring plugins available. All told, there aren’t many (or any) systems that really compare, whether F/OSS, proprietary, or otherwise.

Since it’s not a complete solution in and of itself, I found it a bit daunting to get into. So I got this book:

Building a Monitoring Infrastructure with Nagios by David Josephsen

It’s not a huge book like, say, HP’s OpenView manual(s), so read it first.

Nagios is super cool. You build definitions for each host on your network and each service on each host. Nagios checks each service and records its status. When a service fails, Nagios sends a notification once it is sure the service is down, and then again periodically until the service comes back up.

Fine, that’s the basic premise. Now the configuration works pretty well because any host can inherit its configuration from any other host definition, including host definition templates. So set your general parameters once, and then override where necessary. It’s the same for services. And you also have host and service groups, which allow you to logically group hosts (or services).
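To make the “set once, override where necessary” idea concrete, here is a minimal Python sketch of how inherited settings could be resolved. It is only an illustration, not how Nagios itself is implemented. The “use” key mirrors the directive Nagios templates use for inheritance; the host names, address and values are made up.

```python
def resolve(name, definitions):
    """Return the effective settings for a definition by following its 'use' chain."""
    definition = definitions[name]
    parent = definition.get("use")
    # Start from the inherited settings (if any), then apply local overrides.
    merged = resolve(parent, definitions) if parent else {}
    merged.update({key: value for key, value in definition.items() if key != "use"})
    return merged


# Hypothetical definitions: one template and one host that overrides a value.
definitions = {
    "generic-host": {"check_command": "check_ping", "max_check_attempts": 3},
    "web01": {"use": "generic-host", "address": "192.0.2.10", "max_check_attempts": 5},
}

print(resolve("web01", definitions))
```

The host keeps the template’s check_command but wins on max_check_attempts, which is the override behaviour described above.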

Nagios doesn’t have any built-in way to check services; it’s all done through plugins. A plugin is simply an external script or program which exits with a status of 0 for OK, 1 for warning, 2 for critical, or 3 for unknown, and optionally prints one line of standard output as status text. Nagios has many standard plugins available, for example the check_ping plugin. This plugin is a little wrapper around ping which is invoked with arguments specifying the warning and critical thresholds for response time and packet loss. So to test a plugin, you can simply invoke it with the arguments that Nagios would be feeding it.
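Since a plugin is just a program with an exit status and a line of output, writing one is not hard. Below is a rough Python sketch of a disk usage check following that contract (exit code 3 is the conventional “unknown” status). The function names, default thresholds and status text are my own choices for illustration; this is not one of the standard plugins.

```python
import shutil

# Conventional plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(percent_used, warn, crit):
    """Map a usage percentage to (exit code, one-line status text)."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.0f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.0f}% used"
    return OK, f"DISK OK - {percent_used:.0f}% used"


def main(path="/", warn=80.0, crit=90.0):
    """Check one filesystem, print the status line and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, text = classify(100.0 * usage.used / usage.total, warn, crit)
    print(text)
    return code

# As a standalone plugin, the script would finish with: raise SystemExit(main())
```

Testing it is the same as testing any plugin: run it by hand and look at the output and the exit status.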

Now if a service goes down, Nagios will check if the host is down. Again, this is a plugin of the same type as for the service. Typically, this means check_ping. So you don’t really need a separate check_ping “service” check; the host check covers it. If your host runs a webserver, you would use check_http, and if that fails, Nagios will run check_ping against the host to see if the host itself is down. If the host is down, well then obviously all services on that host are a write-off, so Nagios will send one notification for the host rather than one for each individual service.
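The decision described above can be sketched in a few lines of Python. This is a simplification for illustration only (real Nagios also retries checks before deciding anything); the function name, message format and the host_is_up callback are all hypothetical.

```python
def notifications_for(host, service_ok, host_is_up):
    """Decide which notifications to send for one host.

    service_ok maps service name -> True/False (last check result).
    host_is_up is a callable performing the host check (for example a ping).
    """
    failed = [name for name, ok in service_ok.items() if not ok]
    if not failed:
        return []
    # Only bother checking the host once a service has failed.
    if not host_is_up(host):
        return [f"HOST {host} is DOWN"]
    return [f"SERVICE {host}/{name} is CRITICAL" for name in failed]
```

With two failed services on a dead host, this yields a single host notification instead of two service notifications.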

And when I say Nagios will send a notification, it doesn’t know how to do that either. Notification commands are defined in the configuration too, but typically the stock ones will suffice. On Fedora, the stock command uses the “mail” program to send an email.

Ah, so who does it notify? Well, each service and each host defines contact groups and also notification hours. Nagios will notify everyone in a contact group, but only during those notification hours. So you can monitor your development systems as well as production ones and only get notifications when appropriate.
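Here is a small Python sketch of the “only during notification hours” rule. The data layout (weekday number mapped to a start and end time) and the contact names are assumptions made for the example, not the Nagios configuration format.

```python
from datetime import datetime, time


def in_period(moment, period):
    """period maps weekday numbers (Monday=0) to (start, end) times."""
    window = period.get(moment.weekday())
    if window is None:
        return False
    start, end = window
    return start <= moment.time() < end


def recipients(moment, contact_groups, period):
    """Everyone in the contact groups, but only inside the notification period."""
    if not in_period(moment, period):
        return []
    return sorted({person for group in contact_groups.values() for person in group})


# Hypothetical setup: development systems only notify during work hours.
workhours = {day: (time(9, 0), time(17, 0)) for day in range(5)}
groups = {"admins": ["alice", "bob"], "dba": ["bob", "carol"]}

print(recipients(datetime(2009, 11, 6, 13, 44), groups, workhours))
```

A production host would simply use a period covering all seven days around the clock.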

Nagios also provides escalations. So if a service (or host) remains down, you can define an escalation path. Maybe level one is help desk, and if they don’t respond, it escalates to supervisors as well, and if they still don’t respond, then it escalates to on-call staff, managers and eventually the head cheese.
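As far as I understand it, escalations are keyed on the notification number: the first few notifications go to the normal contacts, and later ones go to whoever the matching escalation lists. A rough Python sketch of that selection, with made-up group names and ranges (the dictionary keys are my own, not Nagios directive names):

```python
def contacts_for(notification_number, escalations, default_contacts):
    """Pick who gets notification number N.

    Each escalation covers notifications first..last (last == 0 means no upper bound).
    If any escalation matches, its contacts are used instead of the defaults.
    """
    matched = set()
    for escalation in escalations:
        first, last = escalation["first"], escalation["last"]
        if notification_number >= first and (last == 0 or notification_number <= last):
            matched.update(escalation["contacts"])
    return sorted(matched) if matched else sorted(default_contacts)


# Hypothetical escalation path: help desk first, then supervisors, then on-call staff.
escalations = [
    {"first": 3, "last": 5, "contacts": ["helpdesk", "supervisors"]},
    {"first": 6, "last": 0, "contacts": ["helpdesk", "supervisors", "oncall"]},
]

print(contacts_for(4, escalations, ["helpdesk"]))
```

Listing the help desk again in each escalation is what makes it “supervisors as well” rather than “supervisors instead”.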

What else is cool? Oh yeah, parent-child relationships. On each host, you can define parent hosts. So if you have, say, several routers throughout your network, connectivity to hosts depends on connectivity to their routers. If a router goes down, then just as with services on a host, Nagios will know to notify only about the router being down and not about all of its children individually.
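The parent logic boils down to this: a host that fails its check is “down” if at least one parent is still up (or it has no parents), and merely “unreachable” if every parent is down or unreachable too. A Python sketch of that rule, assuming the parent map has no cycles and lists every host as a key; the host names are invented:

```python
def classify_hosts(parents, responding):
    """Return {host: 'UP' | 'DOWN' | 'UNREACHABLE'}.

    parents maps each host to a list of its parent hosts (empty list for none).
    responding is the set of hosts whose own check succeeded.
    """
    status = {}

    def state(host):
        if host not in status:
            if host in responding:
                status[host] = "UP"
            elif parents.get(host) and all(state(p) != "UP" for p in parents[host]):
                # Every path to this host is broken, so we cannot blame the host itself.
                status[host] = "UNREACHABLE"
            else:
                status[host] = "DOWN"
        return status[host]

    for host in parents:
        state(host)
    return status


# Hypothetical network: two servers behind one router.
parents = {"router1": [], "web01": ["router1"], "db01": ["router1"]}
print(classify_hosts(parents, responding=set()))
```

Only hosts that end up “down” need a notification; the unreachable ones are explained by the router.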

There is also an agent for Nagios called NRPE (the Nagios Remote Plugin Executor). It is totally optional, but if you want local system checks (disk, CPU, running processes) and not just network service checks, NRPE lets you do this. You install NRPE on your monitored hosts (it is available for “Linux” and Windows), and it is, I think, like a little baby Nagios invoked by the mothership. You install service check plugins alongside NRPE, and then on the server your service checks are like check_nrpe!check_disk … or something like that, so the server sends the service check to NRPE on the monitored system. I haven’t used this yet, but will definitely be doing so.

The NEB is another cool part of Nagios. The Nagios Event Broker is an interface where you can write programs which hook into Nagios’s regular operations. There are a couple dozen callback functions you can hook into, and this makes the possibilities for Nagios virtually endless.

The part I’ve left for last is the user interface. Well, once again, there is none. You configure it, it fires off notifications, and that’s the core. There is a web interface Nagios provides that you can use if you want. It will show you host and service status, and you can acknowledge alarms through it and schedule downtime for hosts. You can also hook in more functionality; for example, historical graphs can be very useful. If you’re checking disk usage with Nagios, why not keep a record of it? Well, when you get a service check result back, it comes back with one line of text from standard output, right? Well, there are packages that will build graphs from this data so that you can have your service status and historical reports too! Josephsen has a pretty extensive discussion on doing this kind of stuff and some great info on some of the options out there.
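The graphing tools work because plugins can append “performance data” to that one line of output: after a | character come label=value pairs, optionally followed by thresholds separated by semicolons. Here is a simplified Python sketch of pulling the numbers out of such a line. It ignores units and does not handle quoted labels containing spaces, and the sample line is made up.

```python
import re

# Leading number of a performance data value, e.g. the "42" in "42%;80;90".
_NUMBER = re.compile(r"^-?\d+(?:\.\d+)?")


def parse_perfdata(line):
    """Split plugin output into (status text, {label: numeric value})."""
    text, _, perf = line.partition("|")
    metrics = {}
    for item in perf.split():
        label, sep, rest = item.partition("=")
        match = _NUMBER.match(rest)
        if sep and match:
            metrics[label] = float(match.group())
    return text.strip(), metrics


print(parse_perfdata("DISK OK - 42% used | /=42%;80;90 /home=17%;80;90"))
```

Store those values with a timestamp on every check and you have the raw material for a historical graph.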

So, yeah. Get that book! Use Nagios! Monitor everything with it! Let it tell you when your toast is toasted or your beer needs a refill!

- Arch

July 15, 2009

VMware and Unity

Filed under: General, HOWTO, Rant, Troubleshooting — Tags: , , — archangel @ 1:34 pm

I’ve been running Fedora 11 (x64) on my workstation at work and running Windows XP (32-bit) in a VMware virtual machine. It was a VM I’d created with VMware Server, so all I needed to run it was the free VMware Player.

First, installing VMware Player was a bit of a problem. The install from RPM didn’t work; it hosed right away. Then the install from the bundle also failed… much like it does for many users online, it turned out, so there was a community-created patch, which worked just fine.

Then there was running the VM. Initially, it seemed great. I was running Windows XP full-screen on my right screen and had my Fedora desktop / apps on my left screen. But it was pretty wonky about mouse control, so I got to the point where I was firing up the Windows VM only when I needed it, and then not in full-screen mode.

But I discovered that VMware’s Unity mode helps bridge the gap. It pulls you out of console mode and launches any apps from the guest VM in their own windows in your desktop environment. This is especially useful for, say, running MSIE or MS Outlook. It’s still a little weird because the apps *look* like they should be running natively, yet the responsiveness is clearly far behind local apps. The only real gap is that I can’t Shift+Right-Click -> Run As… on tools like Active Directory Users and Computers (which I need). I tried switching back to the console, doing the Run As… and then switching back to Unity, but the escalated app doesn’t show up.

Well, it’s great and closes the gap some, but for now I’ll just keep updating Player and Tools and see if eventually that full-screen mode just gets fixed and works transparently.

- Arch

June 6, 2009

Google Apps

Filed under: General, Uncategorized — Tags: — archangel @ 10:00 am

One of the cool services that Google offers is the hosting of various services for your domain. Basically, you can brand Google with your own domain including mail, calendar, chat, docs, sites and “mobile” (I haven’t used “mobile”, but it includes sync services). The service is called Google Apps.

The “standard edition” is pretty much the standard services and limits you to 50 user accounts. And 50 people is quite a few for a personal domain or even a small business. Once you need more features or more accounts, it’s $50 / year per account. Which, truth be told, is pretty cheap, since even just paying for anti-spam/anti-virus filtering is about $30 / year for pretty basic service from Symantec or whomever.

At any rate, I found it a bit confusing at first, but mostly because I was setting this up in a sub-domain (dl.thenibble.org) on GoDaddy. Once I got in, it was pretty easy. You get this dashboard which shows you which services are activated, and you can just click on whichever ones you want. If DNS changes are required, it will tell you and give you pretty specific instructions. But there are a lot of them. You have to make one change just to activate the domain, add aliases for all your services (unless you want to use google.com/apps/mydomain or whatever), and then for email there are 5 MX records and for chat there are about 10 SRV records.

But now that it’s all set up, it’s pretty fancy. You can create email groups, use docs, publish calendars, etc. I tried poking around a bit, and really all that Google does for stuff like “sites” when you create an alias under your domain is redirect the user to sites.google.com/domain/whatever … So it won’t be a replacement for having a web host. But for email, it will just accept mail at your domain, so it’s a full email service.

And standard edition is free. Did I mention that? Yeah, it’s ad-supported, but otherwise free.

Fun!
- Arch

April 20, 2009

Retiring the old hand-made blog

Filed under: Blogging, General — Tags: , , , , — archangel @ 5:38 am

On the weekend I finally decided to shut down my old blog (it was under http://dl.nibble.bz/~archangel). I’d started it in early 2003 and sort of just built my own code. Needless to say, it was very simple, with basic “categories” features for links, and I built my own archives system, syndication, everything. Well, WordPress also started in 2003, and it now has all these features, and they work better and are richer. And their code is maintained.

Moving to WordPress is certainly simple. It imports blogs from many different formats, including RSS 2.0, which is a widely accepted standard for blog syndication. Since I’d already built a feed for my blog, but in Atom, I simply had to convert my code to generate RSS 2.0 and then have it spit out all my posts (about 200) into a single RSS 2.0 file.

For the switch to RSS 2.0, I just pulled an RSS 2.0 feed from WordPress. I then used the old copy & paste coding to make a syndication script which produced similar output. Then I debugged by running my output through the W3C Feed Validation Service.

Once my RSS 2.0 was validating, I made the script spit out the full content for all my posts rather than the usual 20, went into WP and just imported it and it took them all just fine.
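For anyone wanting to do the same, here is a rough Python sketch of a script that dumps every post into one RSS 2.0 file. It is not the code I actually used; the function name, the post dictionary keys and the example.org URLs are made up for illustration, and whether an importer wants the full post in description or in some other element is something to check against a feed pulled from the target system.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, posts):
    """Return an RSS 2.0 document (as a string) containing every post."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # The three channel elements RSS 2.0 requires.
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
        # RSS 2.0 dates use the RFC 822 style that format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(post["published"])
        # ElementTree escapes the HTML body for us.
        ET.SubElement(item, "description").text = post["body"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical post data; a real export would read all posts from the old blog's storage.
posts = [{
    "title": "Hello",
    "link": "http://example.org/1",
    "published": datetime(2009, 4, 20, 5, 38, tzinfo=timezone.utc),
    "body": "<p>Hi</p>",
}]
print(build_rss("Old blog", "http://example.org/", "Archive of the old blog", posts))
```

Run the result through the feed validator before importing, same as above.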

And now, here we are!

- Arch
