High Availability

About

High availability means keeping a server online with as much uptime and as little administrator intervention as possible. In practice, that means making the system self-maintaining and self-healing wherever it can be.

Planning

First, consider the reasons a server or service could become unavailable. Examples include hardware failure, memory exhaustion, incorrect permissions, high CPU usage, failed services, and improper configuration.

Many of these causes can be mitigated by software that does two jobs: monitor services and automatically repair whatever damage it can, and alert administrators to issues that require personal attention.
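The two jobs described above can be sketched as a single supervision step. This is only an illustration of the idea, not how any particular tool implements it: the function name `supervise` and its parameters (`is_healthy`, `repair`, `alert`, `max_repairs`) are hypothetical, and the caller supplies the actual check, repair, and notification logic.

```python
from typing import Callable


def supervise(name: str,
              is_healthy: Callable[[], bool],
              repair: Callable[[], None],
              alert: Callable[[str], None],
              max_repairs: int = 3) -> bool:
    """Check a service once, try to repair it, and alert on the outcome.

    Returns True if the service is healthy when the function returns.
    All names here are illustrative assumptions, not a real tool's API.
    """
    if is_healthy():
        return True  # Nothing to do, and nothing worth alerting about.

    for attempt in range(1, max_repairs + 1):
        repair()  # For example, restart the service.
        if is_healthy():
            # Repaired automatically, but still tell the administrators.
            alert(f"{name}: recovered after {attempt} repair attempt(s)")
            return True

    # Automatic repair failed; this needs personal attention.
    alert(f"{name}: still down after {max_repairs} repair attempt(s), needs attention")
    return False
```

A scheduler such as cron or a simple loop could call `supervise` at a fixed interval for each service. Capping the number of repair attempts keeps a permanently broken service from being restarted forever without anyone being told.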

Prevention

The best way to prevent outages is to estimate the expected server load and configure services to operate within limits that cannot exceed the available resources. For example, cap a web server's worker count so that the workers' combined memory use fits in RAM.
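As a rough illustration of sizing a service to the available resources, the arithmetic for a memory-bound worker pool might look like the sketch below. The function name `max_workers` and the figures used in the example are assumptions for illustration; real values have to come from measuring the service under load.

```python
def max_workers(total_mb: int, reserved_mb: int, per_worker_mb: int) -> int:
    """Largest worker count whose combined memory fits in the usable RAM.

    total_mb      -- physical memory on the server
    reserved_mb   -- memory set aside for the OS and other services
    per_worker_mb -- measured peak memory of one worker process
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = total_mb - reserved_mb
    # Floor division rounds down, so the result never overcommits memory.
    return max(usable_mb // per_worker_mb, 0)


# Hypothetical server: 4096 MB of RAM, 1024 MB reserved, 50 MB per worker.
print(max_workers(4096, 1024, 50))  # prints 61
```

Setting the service's worker limit at or below this number means a traffic spike queues requests instead of exhausting memory.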

Accessibility

Access to a system is the most critical part of availability: without it, administrators have no way to diagnose or correct a problem.
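One simple way to confirm that an administrative entry point is still reachable is to attempt a TCP connection to it from another machine. The sketch below assumes the remote access service listens on a TCP port (SSH on port 22 is a common case); the function name `port_reachable` and the host name in the example are hypothetical.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens the connection;
        # the with-block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Hypothetical host name; replace with the server to be checked.
    print(port_reachable("server.example.com", 22))
```

A successful connection only shows that something is listening on the port. It does not prove that logins work, so a check like this complements an occasional real login test rather than replacing it.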

Goals
  • Keep a system online
  • Keep all services active
  • Keep remote access available
  • Warn system administrators about possible issues
  • Reduce single points of failure

Methodologies
  • Monit - monitor services and restart them when they fail
  • collectd - gather system statistics
  • High availability proxies - distribute requests across several servers, for example round-robin
  • Database clusters
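
The round-robin idea behind a high availability proxy can be sketched in a few lines. This is an illustration of the selection logic only, not how any real proxy is implemented: the class name `RoundRobin`, the `is_up` health callback, and the server names are all assumptions made for the example.

```python
from itertools import cycle
from typing import Callable, Iterable, Optional


class RoundRobin:
    """Hand out servers in rotation, skipping any that are reported down."""

    def __init__(self, servers: Iterable[str], is_up: Callable[[str], bool]):
        self._servers = list(servers)
        self._ring = cycle(self._servers)  # Endless a, b, c, a, b, c, ...
        self._is_up = is_up

    def next_server(self) -> Optional[str]:
        # Try each server at most once per call so the loop always ends.
        for _ in range(len(self._servers)):
            candidate = next(self._ring)
            if self._is_up(candidate):
                return candidate
        return None  # Every server is down.


# Hypothetical pool in which "b" has failed its health check.
down = {"b"}
pool = RoundRobin(["a", "b", "c"], lambda s: s not in down)
print([pool.next_server() for _ in range(4)])  # prints ['a', 'c', 'a', 'c']
```

Because failed servers are skipped rather than removed, a server that passes its health check again rejoins the rotation without any further action, which is what removes it as a single point of failure.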