routine_server_maintenance [2014/11/13 08:38]
routine_server_maintenance [2014/11/13 08:38] (current)
 +====== Routine Server Maintenance ======
  
 +These are some examples of routine maintenance you can do on a server.
 +
 +Before maintenance begins, **stop all cron daemons** and **stop all monitoring services** on affected servers. Restart them when maintenance is finished.
 +
 +== Check for emails sent to root user ==
 +
 +When a cron job produces output or an error, the local mail service often sends it to ''root@localhost''.
 +
 +Most of these messages are minor complaints from the system, but some point to cruft that needs to be cleaned up. For example, logrotate may complain that it can no longer find the logs of a service that was once installed.
 +
 +Install alpine and set up the mail server to deliver mail locally for its own hostnames.
 +
 +This is a good place to start checking for anomalies.
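
One way to get a quick overview of what has piled up in root's mailbox is to count messages per subject. The sketch below uses Python's standard ''mailbox'' module and assumes the mail is stored in mbox format; the function name and the example path are illustrative and do not come from this page.

```python
import mailbox
from collections import Counter


def summarize_mbox(path):
    """Count messages per Subject header in an mbox file."""
    counts = Counter()
    # create=False: fail instead of silently creating an empty mailbox
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            counts[message.get("Subject", "(no subject)")] += 1
    finally:
        box.close()
    return counts
```

On many systems root's mbox lives at a path such as ''/var/mail/root'', but that depends on the local mail setup. Calling ''summarize_mbox(path).most_common(10)'' lists the ten most frequent subjects, which makes repeated complaints easy to spot.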
 +
 +== Test outgoing mail server ==
 +
 +Verify that outgoing email works through the local sendmail instance. Verify that outgoing email through SendGrid is working as well.
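
A test message to the local mail server could be sent roughly as follows. This is a minimal sketch using Python's standard ''smtplib''; it assumes the local MTA listens on ''localhost'' port 25, and the function names and addresses are placeholders. SendGrid relay settings are not shown because they depend on the account.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient, hostname):
    """Build a small, clearly labeled test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Outgoing mail test from {hostname}"
    msg.set_content("Routine maintenance: outgoing mail check.")
    return msg


def send_via_local_mta(msg, host="localhost", port=25):
    """Hand the message to the local MTA (assumed to listen on host:port)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)
```

The send only proves that the MTA accepted the message; confirm delivery by checking the recipient's inbox and the mail log.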
 +
 +== Look at cron jobs ==
 +
 +Make sure they are all running and producing the output you want.
 +
 +== Review contents of /etc/cron.* ==
 +
 +When programs are uninstalled, they sometimes leave scripts behind.
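
To review the leftovers, it can help to have a flat list of every file under the ''/etc/cron.*'' directories. The following sketch is only an illustration; the function name is made up here, and deciding which scripts are stale still needs a human.

```python
from pathlib import Path


def list_cron_entries(etc_dir="/etc"):
    """Return (directory, filename) pairs for files under etc_dir/cron.*."""
    entries = []
    for directory in sorted(Path(etc_dir).glob("cron.*")):
        # Skips plain files such as cron.allow or cron.deny
        if not directory.is_dir():
            continue
        for item in sorted(directory.iterdir()):
            if item.is_file():
                entries.append((directory.name, item.name))
    return entries
```

Comparing this list against the installed packages is one way to find scripts whose program is gone.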
 +
 +== Review system logs ==
 +
 +Find out how many PHP errors there are and which ones are recent, and check for Warning and Fatal errors.
 +
 +Check whether the log files are growing large in size and need to be rotated.
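
Counting PHP errors by severity can be sketched as below. The pattern assumes the default PHP error log line format, for example ''[13-Nov-2014 08:38:00 UTC] PHP Warning:  ...''; a custom log format would need a different pattern, and the function name is illustrative.

```python
import re
from collections import Counter

# Severity labels as they appear in the default PHP error log format (assumed)
PHP_LEVEL = re.compile(r"PHP (Fatal error|Parse error|Warning|Notice|Deprecated):")


def count_php_levels(lines):
    """Count log lines per PHP severity level; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = PHP_LEVEL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file object works as well, since the function only iterates over lines.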
 +
 +== Standardize file locations ==
 +
 +Move files to the locations set by current standards. PHP error logs are one example.
 +
 +== Verify backups ==
 +
 +Make sure that all expected backups are running. Download and unpack a GPG tarball to verify that it was created correctly.
 +
 +Create md5sum hashes for backed-up files that are stored in alternate locations.
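
The hashes can also be produced from a script. Here is a small sketch with Python's ''hashlib'' that reads the file in chunks so large tarballs do not have to fit in memory; the function name is illustrative.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result should match what ''md5sum'' prints for the same file. MD5 is adequate for spotting accidental corruption, not for detecting deliberate tampering.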
 +
 +== Check Amazon S3 uploads ==
 +
 +Make sure S3 keys work properly. Migrate backups to year-month folders as well.
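
A year-month layout can be as simple as putting a ''YYYY-MM'' component into the object key. The sketch below only builds the key; the ''backups'' prefix and the function name are assumptions for illustration, and the actual upload is left to whatever S3 client is in use.

```python
from datetime import date


def year_month_key(filename, backup_date, prefix="backups"):
    """Build an object key of the form prefix/YYYY-MM/filename."""
    return f"{prefix}/{backup_date:%Y-%m}/{filename}"
```

For example, a file named ''db.tar.gz.gpg'' backed up on 2014-11-13 would get the key ''backups/2014-11/db.tar.gz.gpg''.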
 +
 +== Examine crontabs ==
 +
 +Check that crontab entries are commented and clearly described. Verify that the cron jobs are executing.
 +
 +Install crag.
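
Finding crontab entries that lack a comment can be partly automated. The following sketch treats a job as documented when the line directly above it is a comment; that convention is an assumption, not a rule from this page, and the function name is illustrative.

```python
import re

# Environment assignments such as MAILTO=root are not jobs
ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*=")


def uncommented_jobs(crontab_text):
    """Return job lines that are not directly preceded by a comment line."""
    missing = []
    previous_was_comment = False
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line:
            previous_was_comment = False
            continue
        if line.startswith("#"):
            previous_was_comment = True
            continue
        if not ASSIGNMENT.match(line) and not previous_was_comment:
            missing.append(line)
        previous_was_comment = False
    return missing
```

The input can be the output of ''crontab -l'' for each user. Whether a comment actually describes the job clearly still has to be judged by reading it.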
 +
 +== Check monit status ==
 +
 +Make sure all services are being monitored and properly reported.
 +
 +== Webmin ==
 +
 +Upgrade to the latest version if necessary. Refresh modules to make sure everything is available in the menu. Set up IP restrictions. Set up the hostname configuration so it goes to the right URL. Set up SSL using local certificates if available.
 +
 +== NTP ==
 +
 +Run the NTP client to sync local time.
 +
 +== Log rotation ==
 +
 +Make sure all the system and application logs are both being created and properly rotated. Make sure that programs are logging to the latest log file, and not an old one.
 +
 +Remove old logs that are not needed anymore.
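
Both problems, logs that grow without rotation and logs nobody writes to anymore, can be flagged from file size and modification time. This is a rough sketch; the thresholds, the function name, and the ''large''/''idle'' labels are illustrative choices.

```python
import os
import time


def flag_logs(directory, max_bytes, max_idle_seconds, now=None):
    """Return (filename, reason) pairs for oversized or long-untouched files."""
    now = time.time() if now is None else now
    flagged = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        info = os.stat(path)
        if info.st_size > max_bytes:
            flagged.append((name, "large"))   # may need rotation
        if now - info.st_mtime > max_idle_seconds:
            flagged.append((name, "idle"))    # possibly stale or abandoned
    return flagged
```

An idle file is only a hint: a rotated archive is expected to be idle, while an idle current log may mean the program is writing somewhere else.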
 +
 +== Verify MySQL backups ==
 +
 +Make sure that MySQL backups are being made properly. Run test imports on an alternate system.
 +
 +== Check ZFS snapshots ==
 +
 +Make sure that they are being properly created, and are still accessible.
 +
 +== Time Machine ==
 +
 +Check for old entries and remove them if necessary.
 +
 +== Apple Hardware ==
 +
 +Make sure that everything is up-to-date with the latest security releases.
 +
 +== Check users, permissions ==
 +
 +Look for logins, cron jobs, and personal files left on servers by former employees. Remove any old SSH authorized key entries from accounts that we have access to.
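
When reviewing ''authorized_keys'' files, the comment at the end of each entry usually identifies the key's owner. The sketch below extracts those comments. It is a simplification: it assumes the key type token starts with ''ssh-'', ''ecdsa-'', or ''sk-'', and it can be confused by option strings that contain such a token inside quotes. The function name is illustrative.

```python
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def key_comments(authorized_keys_text):
    """Return the trailing comment of each key entry, in file order."""
    comments = []
    for raw in authorized_keys_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        for index, token in enumerate(tokens):
            if token.startswith(KEY_TYPE_PREFIXES):
                # Entry layout: [options] keytype base64-key [comment]
                comment = " ".join(tokens[index + 2:])
                comments.append(comment or "(no comment)")
                break
    return comments
```

Entries without a comment deserve extra attention, since nothing in the file says who owns them.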
 +
 +== Update hosting information ==
 +
 +Look through known SSH hosts, and document any that we no longer have access to, or should not have access to.
 +
 +== Run external scans ==
 +
 +Scan our public servers for issues with networking or security.
 +
 +== Check firewall ==
 +
 +Verify that the firewall is configured properly, and is allowing access from all the right IP addresses.
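
When comparing firewall rules against a list of expected sources, Python's ''ipaddress'' module can check whether an address falls inside an allowed network. This sketch does not read any firewall configuration; the function name is illustrative and the addresses in the usage note come from the documentation ranges.

```python
import ipaddress


def is_allowed(address, allowed_networks):
    """Return True if address lies inside any of the given CIDR networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(net) for net in allowed_networks)
```

For example, with ''["203.0.113.0/24"]'' as the allowed list, ''203.0.113.25'' is accepted and ''198.51.100.7'' is not.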
 +
 +== OpenSSL configuration ==
 +
 +Upgrade to the latest version. Disable the Heartbeat extension and old ciphers.
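
The Heartbeat extension matters because of the Heartbleed bug, which affected OpenSSL 1.0.1 through 1.0.1f and was fixed in 1.0.1g. A version check can be sketched as below. It expects only the version token (for example ''1.0.1e''), ignores the affected 1.0.2 beta releases, and cannot see distribution backports, so treat the result as a hint rather than proof. The function name is illustrative.

```python
import re


def is_heartbleed_affected(version):
    """Return True for upstream OpenSSL versions 1.0.1 through 1.0.1f."""
    match = re.match(r"1\.0\.1([a-z]?)(?:-|$)", version.strip())
    if not match:
        return False
    # "" (plain 1.0.1) and letters a..f sort before "g", the fixed release
    return match.group(1) < "g"
```

Distributions often patch the bug without changing the version letter, so confirm against the vendor's security advisories before drawing conclusions.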
 +
 +== OpenSSH configuration ==
 +
 +Check for misconfigured SSH services.
 +
 +== Create READMEs ==
 +
 +Create README and MOTD files for servers that have known issues or pending changes, so that users are aware of them when they SSH into a remote box.