High Availability

About

High availability means keeping a server self-maintaining and self-healing, so that it achieves the greatest possible uptime with the least possible administrator intervention. The goal is to keep the server, and the services it provides, online.

Planning

First, consider the reasons a server or service could become unavailable. Examples include hardware failure, memory exhaustion, incorrect permissions, sustained high CPU usage, failed services, and improper configuration.

Many of these causes can be mitigated by software that does two jobs: monitor the system and automatically repair whatever damage it can, and alert the administrators to issues that require personal attention.
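The monitor, repair, and alert cycle can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular tool implements it. The names supervise, systemd_is_active, and systemd_restart are hypothetical, and the two systemd helpers assume a Linux host where systemctl is available.

```python
import subprocess
from typing import Callable


def supervise(
    is_healthy: Callable[[], bool],
    repair: Callable[[], None],
    alert: Callable[[str], None],
    max_repairs: int = 3,
) -> bool:
    """Run one supervision pass: check, repair if needed, alert if repair fails.

    Returns True if the service is healthy when the pass ends.
    """
    for _ in range(max_repairs):
        if is_healthy():
            return True
        repair()  # automatic repair, no administrator involved
    if is_healthy():
        return True
    # Automatic repair did not help: this now needs personal attention.
    alert(f"service still unhealthy after {max_repairs} repair attempts")
    return False


# Hypothetical helpers for a systemd-managed service (assumption: systemctl exists).
def systemd_is_active(unit: str) -> bool:
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def systemd_restart(unit: str) -> None:
    subprocess.run(["systemctl", "restart", unit], check=False)
```

A scheduler such as cron could call supervise once per minute, passing lambdas that wrap the two systemd helpers for a given unit and an alert function that sends mail or a chat message.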

Prevention

The best way to prevent outages is to estimate the expected server load and configure each service to operate within limits that cannot exceed the available resources, such as memory, CPU, and disk space.
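As a rough worked example of sizing a service to fit the machine, the sketch below computes how many worker processes fit in memory. The function name max_workers and all of the figures are illustrative assumptions. Real per-worker memory use should be measured under realistic load.

```python
def max_workers(total_mb: int, reserved_mb: int, per_worker_mb: int) -> int:
    """Largest worker count whose combined memory fits in what remains
    after reserving memory for the OS and other services."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = total_mb - reserved_mb
    # Floor division rounds down, so the result never overcommits memory.
    return max(usable_mb // per_worker_mb, 0)


# Assumed figures: 4096 MB of RAM, 1024 MB reserved, 60 MB per worker.
print(max_workers(4096, 1024, 60))  # prints 51
```

The result would then be used as the upper limit in the service's own configuration, so that a traffic spike queues requests instead of exhausting memory.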

Accessibility

Maintaining access to a system is the most critical part of availability: without it, an administrator has no way to diagnose or correct a problem.
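One simple way to verify that remote access is still possible is to test, from another machine, whether the remote-access port accepts TCP connections. The sketch below assumes SSH on port 22. The function name port_is_open and the host name in the usage note are hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, port_is_open("server.example.com", 22) returning False would be a reason to alert an administrator. A successful connection only shows that the port is reachable, not that logins work.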

Goals

Keep the system online
Keep all services active
Keep remote access available
Warn system administrators about possible issues
Reduce single points of failure

Methodologies

Monit - monitor services and restart them when they fail
collectd - gather system statistics
High-availability proxies - distribute requests across several servers, for example in round-robin order
Database clusters
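The round-robin idea behind a high-availability proxy can be sketched as follows. This is a simplified illustration, not the algorithm of any specific proxy. The class name RoundRobin, the backend names, and the is_healthy callback are all assumptions.

```python
from typing import Callable, Optional, Sequence


class RoundRobin:
    """Hand out backends in rotation, skipping any that fail the health check."""

    def __init__(self, backends: Sequence[str], is_healthy: Callable[[str], bool]):
        self._backends = list(backends)
        self._is_healthy = is_healthy
        self._next = 0  # index of the backend to try first on the next pick

    def pick(self) -> Optional[str]:
        """Return the next healthy backend, or None if every backend is down."""
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if self._is_healthy(backend):
                return backend
        return None


# Usage with hypothetical backends, where "web2" is currently down.
down = {"web2"}
balancer = RoundRobin(["web1", "web2", "web3"], lambda b: b not in down)
print([balancer.pick() for _ in range(4)])  # prints ['web1', 'web3', 'web1', 'web3']
```

Because unhealthy backends are skipped, the failure of one server no longer takes the service offline, which is how a proxy in front of several servers reduces single points of failure.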