Routine Server Maintenance

These are some examples of routine maintenance you can do on a server.

Before maintenance begins, stop all cron daemons and monitoring services on the affected servers. Restart them when maintenance is finished.

Check for emails sent to root user

When a cron job produces output or an error, the local mail service usually delivers it to the root user's local mailbox.

Most of these are minor complaints from the system, but some point to cruft that needs to be cleaned up. An example is logrotate complaining that it can no longer find the logs of a service that was once installed and has since been removed.

Install alpine and set up the mail server to deliver mail locally for its own hostnames.

This is a good place to start to check for anomalies.
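As a rough sketch of that first pass, the root mailbox can be summarized by subject so repeated complaints stand out. This assumes the mailbox is a standard mbox file; the function name and the /var/mail/root path mentioned in the comment are illustrative assumptions, not something these notes specify.

```python
import mailbox
from collections import Counter


def summarize_root_mail(mbox_path):
    """Count the messages in an mbox file, grouped by Subject header.

    mbox_path is an assumption: on many systems root's spool is
    /var/mail/root, but check where your mail server delivers.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        subjects = Counter()
        for message in box:
            # Fall back to a placeholder when a message has no Subject.
            subjects[str(message.get("Subject", "(no subject)"))] += 1
        return subjects
    finally:
        box.close()
```

Calling `summarize_root_mail(path).most_common(10)` gives the ten most frequent subjects, which is usually enough to spot a cron job that complains every night.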

Test outgoing mail server

Verify that outgoing email works through the local sendmail instance. Verify that outgoing email through SendGrid is working as well.
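One way to run the local check is to hand a small test message to the SMTP listener on the server itself. The sketch below assumes the local mail service accepts SMTP on localhost port 25; the function names, subject line, and addresses are placeholders. Checking a relay provider would need that provider's own host and credentials, which are not shown here.

```python
import smtplib
import socket
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a short, clearly labelled test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Include the hostname so the recipient can tell which server sent it.
    msg["Subject"] = f"Outgoing mail test from {socket.gethostname()}"
    msg.set_content("If you can read this, outgoing mail works.")
    return msg


def send_test_message(msg, host="localhost", port=25):
    """Send the message through an SMTP server (assumed: local, port 25)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)
```

Send to a mailbox you can actually read, then confirm the message arrives and check the mail log for the delivery status.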

Look at cron jobs

Make sure they are all running and producing the output you want.

Review contents of /etc/cron.*

When programs are uninstalled, they sometimes leave behind scripts.

Review system logs

Find out how many PHP errors there are and which ones are recent, and check specifically for Warning and Fatal errors.

Check whether the log files are growing large in size and need to be rotated.
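A quick tally by severity makes the Warning and Fatal counts easy to see. This is a minimal sketch that assumes the log uses PHP's default error_log line format, "[timestamp] PHP Level:  message"; lines in any other format, such as stack trace continuation lines, are ignored.

```python
import re
from collections import Counter

# Matches lines such as:
# [12-Mar-2024 10:15:02 UTC] PHP Warning:  Undefined variable $x in ...
PHP_LINE = re.compile(r"^\[(?P<when>[^\]]+)\] PHP (?P<level>[A-Za-z ]+?):\s")


def count_php_errors(lines):
    """Return a Counter mapping severity ("Warning", "Fatal error", ...) to count."""
    counts = Counter()
    for line in lines:
        match = PHP_LINE.match(line)
        if match:
            counts[match.group("level")] += 1
    return counts
```

Pass it an open file object, for example `count_php_errors(open(path, errors="replace"))`, where `path` is wherever your PHP error log lives.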

Standardize file locations

Move files to the locations required by current standards. PHP error logs are one example.

Verify backups

Make sure that all the expected backups are running. Download and unpack a GPG-encrypted tarball to verify that it was created correctly.

Create md5sum hashes for backed up files that are stored in alternate places.
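The hashes can be written in the same "digest, two spaces, file name" layout that the md5sum tool prints, so the manifest can travel with the copies. The sketch below uses hypothetical function names and records only the base name of each file, which assumes the manifest is checked from the directory that holds the files.

```python
import hashlib
from pathlib import Path


def md5sum_line(path, chunk_size=1024 * 1024):
    """Return one manifest line for a file: '<hex digest>  <file name>'."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large backup files are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {Path(path).name}"


def write_manifest(paths, manifest_path):
    """Write one manifest line per backup file."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(md5sum_line(path) + "\n")
```

MD5 is adequate for detecting accidental corruption in transit or storage; it is not a defense against deliberate tampering.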

Check Amazon S3 uploads

Make sure S3 keys work properly. Migrate backups to year-month folders as well.
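The year-month layout can be planned before anything is moved. This sketch only computes old and new object keys; the "backups" prefix and the YYYY-MM folder format are assumptions, and the actual copy and delete would be done with whatever S3 client is already in use.

```python
import posixpath
from datetime import date


def year_month_key(filename, backup_date, prefix="backups"):
    """Build a key such as 'backups/2024-03/db.sql.gz.gpg'."""
    return f"{prefix}/{backup_date:%Y-%m}/{filename}"


def plan_migration(entries, prefix="backups"):
    """Return (old_key, new_key) pairs for objects not yet in a year-month folder.

    entries is an iterable of (key, backup_date) tuples.
    """
    moves = []
    for old_key, backup_date in entries:
        new_key = year_month_key(posixpath.basename(old_key), backup_date, prefix)
        if new_key != old_key:
            moves.append((old_key, new_key))
    return moves
```

Reviewing the planned pairs first makes it easier to catch a wrong date or prefix before any object is touched.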

Examine crontabs

Check crontab entries to make sure they are commented and are clearly described. Verify cron jobs are executing.
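A small check can flag job lines that have no comment directly above them. This is a sketch under a simple convention that these notes do not spell out: a job counts as documented only when the line immediately before it is a comment. Environment assignments such as MAILTO=root are skipped.

```python
import re

# Lines like MAILTO=root or PATH=/usr/bin are settings, not jobs.
ENV_ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*=")


def undocumented_jobs(crontab_text):
    """Return (line_number, line) for each job with no comment right above it."""
    missing = []
    previous_was_comment = False
    for number, raw in enumerate(crontab_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("#"):
            previous_was_comment = True
            continue
        is_job = bool(line) and not ENV_ASSIGNMENT.match(line)
        if is_job and not previous_was_comment:
            missing.append((number, line))
        # Blank lines, settings, and jobs all end the current comment block.
        previous_was_comment = False
    return missing
```

The text can come from the output of `crontab -l` or from a file under /etc/cron.d.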

Install crag.

Check monit status

Make sure all services are being monitored and properly reported.

Webmin

Upgrade to the latest version if necessary. Refresh modules to make sure everything is available in the menu. Set up IP restrictions. Set up the hostname configuration so it goes to the right URL. Set up SSL using local certificates if available.

NTP

Run NTP client to sync local time.

Log rotation

Make sure all the system and application logs are both being created and properly rotated. Make sure that programs are logging to the latest log file, and not an old one.

Remove old logs that are not needed anymore.
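Before deleting anything, it helps to list which files would qualify. The sketch below only reports files whose modification time is older than a cutoff; the 90-day default is an arbitrary assumption, and the removal itself is deliberately left as a separate, reviewed step.

```python
import time
from pathlib import Path


def stale_logs(log_dir, max_age_days=90, now=None):
    """Return files under log_dir not modified in the last max_age_days days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        path
        for path in Path(log_dir).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

A log that is still listed as stale but belongs to a running service is also a hint that the service may be writing to a different file than expected.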

Verify MySQL backups

Make sure that MySQL is doing proper backups. Do test imports on an alternate system.

Check ZFS snapshots

Make sure that they are being properly created, and are still accessible.

Time Machine

Check for old entries and remove them if necessary.

Apple Hardware

Make sure that everything is up-to-date with the latest security releases.

Check users, permissions

Look for any logins, cron jobs, or personal files that former employees have left behind on servers. Remove any old SSH authorized key entries on accounts that we have access to.

Update hosting information

Look through known SSH hosts, and document any that we no longer have access to, or should not have access to.
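To get a reviewable list, the host names can be pulled out of a known_hosts file. This sketch assumes the usual OpenSSH layout, an optional @marker, then comma-separated host names, then the key type and key. Hashed entries (those starting with "|1|") cannot be read back, so they are only counted.

```python
def list_known_hosts(known_hosts_text):
    """Return (sorted host names, number of hashed entries)."""
    hosts = set()
    hashed = 0
    for raw in known_hosts_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip an optional marker such as @cert-authority or @revoked.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if not fields:
            continue
        if fields[0].startswith("|1|"):
            hashed += 1
            continue
        hosts.update(fields[0].split(","))
    return sorted(hosts), hashed
```

The resulting list can be compared against the current hosting inventory to find hosts that should be documented or removed.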

Run external scans

Scan our public servers for issues with networking or security.

Check firewall

Verify that the firewall is configured properly, and is allowing access from all the right IP addresses.
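One way to check the "right IP addresses" part is to compare the addresses that must get through against the networks the firewall allows. The sketch below assumes the allowed networks have already been exported from the firewall as CIDR strings; the function names and the example addresses (from the documentation ranges) are illustrative.

```python
import ipaddress


def is_allowed(address, allowed_networks):
    """Return True if address falls inside any of the allowed networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(net) for net in allowed_networks)


def audit_allowlist(expected_clients, allowed_networks):
    """Return the expected client addresses that the allow list does not cover."""
    return [
        address
        for address in expected_clients
        if not is_allowed(address, allowed_networks)
    ]
```

For example, `audit_allowlist(["192.0.2.1", "203.0.113.9"], ["192.0.2.0/24"])` reports 203.0.113.9 as missing from the allow list. This checks the exported rules only, so a connection test from an outside host is still worthwhile.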

OpenSSL configuration

Upgrade to the latest version. Disable the Heartbeat extension and old ciphers.

OpenSSH configuration

Check for mis-configured SSH services.

Create READMEs

Create README and MOTD files on servers that have known issues or pending changes, so that users are made aware of them when they SSH into the box.