Linux Server Maintenance Checklist

Updates

  • New package updates have been installed within the last month.
  • Keeping your server up to date is one of the most important maintenance tasks. Before applying updates, confirm that you have a recent backup, or a snapshot if the server is a virtual machine, so that you can revert if the updates cause unexpected problems.

    If possible, test updates on a test server before applying them to a production server. This lets you confirm that the updates will not break your server and are compatible with any other packages or software that you are running.

    You can update all packages currently installed on your server by running ‘yum update’. Ideally this should be done at least once a month so that you have the latest security patches, bug fixes, and functionality and performance improvements. You can also automate updates with a cron job that checks for and applies updates on whatever schedule you choose.
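As a minimal sketch of the cron approach, a file such as the hypothetical /etc/cron.d/monthly-yum-update below would apply all updates at 04:00 on the first day of each month. The file name, schedule and MAILTO address are assumptions to adapt. Fully automatic updates skip the testing step recommended above, so on production servers you may prefer to schedule ‘yum check-update’ instead and apply the updates yourself.

```
# /etc/cron.d/monthly-yum-update (hypothetical file name)
# Any output from the job is emailed to this address by cron.
MAILTO=root
# minute hour day-of-month month day-of-week user command
# -y answers "yes" to yum's confirmation prompt, as no one is there to type it.
0 4 1 * * root /usr/bin/yum -y update
```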

  • Other applications have been updated in the last month.
  • Other web applications, such as WordPress, Drupal and Joomla, need to be updated frequently, as these applications act as a gateway to your server: they are usually more exposed than direct server access because they accept public traffic from the Internet. Many web applications also have third-party plugins installed, which can be written by anyone and may contain security vulnerabilities. As such, it is critical to update these applications very frequently.

    These content management systems are not managed by ‘yum’, so they will not be updated by a ‘yum update’ like the other installed packages. Updates are usually provided directly through the application itself; if you’re unsure, contact the application provider.

  • Reboot the server if a kernel update was installed.
  • If you ran a ‘yum update’ as previously discussed, check whether the kernel was listed as an update. Alternatively, you can update just the kernel with ‘yum update kernel’. The Linux kernel is the core of the Linux operating system and is updated regularly with security patches, bug fixes and added functionality. A newly installed kernel is not used until the server boots into it, so you must reboot the server to complete the process. Before you reboot, run ‘uname -r’, which prints the version of the kernel you are currently booted into.

    After the reboot, run ‘uname -r’ again and confirm that it now shows the newer version installed by yum. If the version number has not changed, check which kernel is set to boot in /boot/grub/grub.conf; by default yum updates this file so that the updated kernel boots.
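To make the before-and-after comparison easier, the sketch below compares the running kernel with the most recently installed kernel package. It assumes an RPM-based system such as CentOS, where ‘rpm -q --last kernel’ lists installed kernel packages newest first and their names match ‘uname -r’ once the ‘kernel-’ prefix is removed. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: is the server running the newest installed kernel?
# Assumes an RPM-based system where the kernel package is named "kernel".

# latest_kernel: read `rpm -q --last kernel` output on stdin (newest first)
# and print the newest version, e.g. "kernel-2.6.32-754.el6.x86_64  <date>"
# becomes "2.6.32-754.el6.x86_64", the same form that `uname -r` prints.
latest_kernel() {
    awk 'NR == 1 { sub(/^kernel-/, "", $1); print $1; exit }'
}

# check_kernel_reboot: report whether a reboot is still needed.
check_kernel_reboot() {
    running=$(uname -r)
    newest=$(rpm -q --last kernel | latest_kernel)
    if [ "$running" = "$newest" ]; then
        echo "Running the newest installed kernel ($running)."
    else
        echo "Reboot needed: running $running, newest installed is $newest."
    fi
}
```

Running ‘check_kernel_reboot’ after an update tells you whether a reboot is still needed; after the reboot it should report that the newest kernel is running.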

Security

  • Server access reviewed within the last 6 months.
  • To increase security, you should review who has access to your server. In any organization there may be staff who have left but still have accounts with access; these accounts should be removed or disabled. Some accounts may also have sudo access, which typically gives them root permissions, and may no longer need it. Because root access is so powerful, review it often to reduce the risk of a security breach: check the /etc/sudoers file to see who has root access, and make any changes with the ‘visudo’ command. You can also run the ‘last’ command to see who has recently logged into the server.
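Part of this review can be gathered with a few standard commands. The sketch below is hypothetical: it assumes that regular user accounts have UIDs starting at 500 (RHEL/CentOS 6) or 1000 (later releases), and that sudo rights are granted through the wheel group, which is common but depends on your /etc/sudoers.

```shell
#!/bin/sh
# Sketch of an access review helper. The UID threshold for regular accounts
# differs between releases, so it is passed in as an argument.

# login_users [MIN_UID]: read passwd-format lines on stdin and print the
# accounts at or above MIN_UID whose shell allows an interactive login.
login_users() {
    min_uid="${1:-500}"
    awk -F: -v min="$min_uid" '$3 >= min && $7 !~ /(nologin|false)$/ { print $1 }'
}

# review_access [MIN_UID]: print the information to review in one place.
review_access() {
    echo "Accounts that can log in:"
    getent passwd | login_users "${1:-500}"
    echo "Members of the wheel group (often granted sudo in /etc/sudoers):"
    getent group wheel
    echo "Most recent logins:"
    last -n 20
}
```

Each name printed by ‘review_access 1000’ (or ‘review_access 500’ on older releases) should belong to someone who still needs access.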

  • Firewall rules reviewed in the last 6-12 months.
  • Firewall rules should also be reviewed from time to time to ensure that you are only allowing required inbound and outbound traffic. A server's requirements change over time, and as packages are installed and removed, the ports it listens on may change, potentially exposing vulnerable services, so it is important to keep this traffic correctly restricted.

    On Linux this is typically done with iptables, or with a hardware firewall that sits in front of the server. You can test which ports are open with nmap, and view the current rules on the server by running ‘iptables -L -v’.
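To compare what the server is actually listening on with what the firewall allows, a small helper can extract the listening TCP ports from ‘netstat -tln’ or ‘ss -tln’ output. This is a hypothetical sketch: it assumes the local address is in the fourth column, as in both tools' default output, and it only covers TCP.

```shell
#!/bin/sh
# Sketch: list listening TCP ports next to the current INPUT rules.

# listening_ports: read `netstat -tln` or `ss -tln` output on stdin and
# print each listening port once, in numeric order.
listening_ports() {
    awk '$1 ~ /^tcp/ || $1 == "LISTEN" {
        n = split($4, parts, ":")   # the port follows the last colon
        print parts[n]
    }' | sort -un
}

# review_firewall: show listening ports, then the INPUT chain with counters.
review_firewall() {
    echo "Listening TCP ports:"
    if command -v ss >/dev/null 2>&1; then
        ss -tln | listening_ports
    else
        netstat -tln | listening_ports
    fi
    echo "INPUT chain rules:"
    iptables -L INPUT -n -v --line-numbers
}
```

A listening port with no matching rule, or a rule for a port that nothing listens on any more, is a candidate for cleanup; an nmap scan from another machine then confirms what is actually reachable from outside.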

  • Confirm that users must change their passwords regularly.
  • User passwords should be configured to expire after a set period, commonly between 30 and 90 days, after which the user is forced to change the password. This increases security because a compromised password only remains valid until it is changed, so an attacker cannot keep using that account indefinitely.

    If your accounts are stored in an LDAP directory, such as Active Directory, you can set this policy there. Otherwise, on Linux you can set it on a per-account basis, for example with the ‘chage’ command. However, this does not scale as well as a directory, because you need to apply the changes on each of your servers individually, which takes time.
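On a standalone server, you can audit the current policy by looking at the maximum password age stored for each account in /etc/shadow. The helper below is a hypothetical sketch that assumes the standard shadow layout (field 2 is the password hash, field 5 the maximum age in days) and a 90-day limit; accounts whose password is locked (‘!’ or ‘*’) are skipped.

```shell
#!/bin/sh
# Sketch: find accounts whose password never expires or expires later
# than a chosen limit.

# stale_password_policy [LIMIT]: read shadow-format lines on stdin and print
# users with a usable password whose maximum age is unset or above LIMIT days.
stale_password_policy() {
    limit="${1:-90}"
    awk -F: -v limit="$limit" '
        $2 !~ /^[!*]/ && ($5 == "" || $5 + 0 > limit + 0) { print $1 }'
}

# Example use (reading /etc/shadow requires root):
#   stale_password_policy 90 < /etc/shadow
# Then set a 90-day maximum for an account and confirm it:
#   chage -M 90 alice
#   chage -l alice
```

The default maximum age of 99999 days effectively means "never expires", so accounts still using it will be listed.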

Backups

  • Backups and restores have been tested and confirmed to be working.
  • It is important to back up your servers in case of data loss. It is equally important to test that your backups work and that you can successfully complete a restore. Check that your backups are working on a daily or weekly basis; most backup software can notify you when a backup task fails, and any failure should be investigated.

    It is a good idea to perform a test restore every few months or so to ensure that your backups are working as intended. This may sound time consuming but it is well worth it. There are countless stories of backups appearing to work until all the data is lost; only then do people realise that they are not actually able to restore the data from backup.

    You can back up locally to the same server, which is not recommended because a failure of that server can destroy both the data and its backups, or you can back up to an external location, either on your network or on the Internet. This could be your own server, a cloud storage service such as Amazon S3, or a backup product such as Acronis Backup for Linux Server.
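A test restore does not have to be elaborate. The sketch below, which uses only tar, mktemp and diff, creates a compressed archive of a directory and immediately restores it into a temporary directory to confirm that the contents match. The function name and paths are hypothetical, and a real backup job would also copy the archive to an external location as described above.

```shell
#!/bin/sh
# Sketch: back up a directory with tar and verify it with a test restore.
# Keep ARCHIVE outside SRC_DIR, or the archive would try to include itself.

# backup_and_verify SRC_DIR ARCHIVE: returns 0 only if the restore matches.
backup_and_verify() {
    src="$1"
    archive="$2"
    tar -czf "$archive" -C "$src" . || return 1

    restore_dir=$(mktemp -d) || return 1
    # Restore into the scratch directory, then compare it with the source.
    if tar -xzf "$archive" -C "$restore_dir" && diff -r "$src" "$restore_dir" >/dev/null; then
        status=0
        echo "Backup verified: $archive"
    else
        status=1
        echo "Backup verification FAILED: $archive" >&2
    fi
    rm -rf "$restore_dir"
    return "$status"
}
```

For example, ‘backup_and_verify /var/www /backup/www.tar.gz’ returns a non-zero status if the restored files differ. Note that diff compares file contents only, not ownership or permissions.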

Monitoring

  • Monitoring has been checked and confirmed as working correctly.
  • If your server is used in production you most likely have it monitored for various services. It is important to check and confirm that this monitoring is working as intended and that it is reporting correctly so that you know you will be correctly alerted if there are any issues. It is possible that incorrect firewall rules may disrupt monitoring, or your server may be performing different roles now and so may need to be monitored for additional services.

    If you’re using Anturis for monitoring you will be alerted if there are any problems, such as a failure to connect, so you will be able to fix any such issues quickly and you will not have to regularly check that your servers are still being monitored correctly. If you have a server monitored already it is also very easy to add or remove monitors to fit the current role of the server so that you can monitor the required services.

  • Resource usage has been checked in the last month.
  • Resource usage is typically checked as a monitoring activity. It is, however, good practice to observe long term monitoring data in order to get an idea of any resource increases or trends which may indicate that you need to upgrade a component of your server so that it is capable of working under the increased load.

    This information can be monitored with Anturis: you can view CPU usage and load levels, free disk space, free physical memory and other SNMP variables. This lets you monitor all of your servers from one central location, decide whether any need to be upgraded based on past resource usage and performance, and receive alerts when a set threshold is reached, which can indicate that you need to upgrade or investigate where the increase has come from.
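Alongside central monitoring, a simple local check can catch file systems that are filling up. The sketch below assumes the POSIX output format of ‘df -P’, where column 5 is the use percentage and column 6 the mount point, and uses a hypothetical 90% default threshold.

```shell
#!/bin/sh
# Sketch: flag file systems whose usage is at or above a threshold.

# full_filesystems [LIMIT]: read `df -P` output on stdin and print
# "mount-point use%" for each file system at or above LIMIT percent.
full_filesystems() {
    limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # "91%" -> "91"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example: df -P | full_filesystems 85
# For a quick look at memory and load as well: free -m; uptime
```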

  • Hardware errors have been checked in the last week.
  • Critical hardware problems will likely show up in your monitoring and be obvious, as the server may stop working correctly. You can potentially avoid this scenario by monitoring your system for hardware errors, which may give you advance warning that a piece of hardware is having problems and should be replaced before it fails.

    You can use mcelog, which processes machine check events, mainly memory and CPU errors, on 64-bit x86 Linux systems. It can be installed with ‘yum install mcelog’ and then started with ‘/etc/init.d/mcelogd start’. By default mcelog runs hourly from cron and reports any problems in /var/log/mcelog, so you will want to check this file regularly, every week or so.
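The weekly review of /var/log/mcelog can itself be scripted. The hypothetical helper below treats any content in the log as worth investigating, which assumes the log is normally empty when no machine check errors have occurred.

```shell
#!/bin/sh
# Sketch: weekly check of the mcelog log file.

# check_mcelog [LOGFILE]: print recent entries and return 1 if the log has
# any content; otherwise report that no errors were logged and return 0.
check_mcelog() {
    log="${1:-/var/log/mcelog}"
    if [ -s "$log" ]; then
        echo "Hardware errors logged in $log:"
        tail -n 20 "$log"
        return 1
    fi
    echo "No machine check errors logged in $log."
    return 0
}
```

If ‘check_mcelog’ is called from a weekly cron job, cron emails its output to you, so you only need to act when the warning appears.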

File system maintenance

  • Unused packages have been removed.
  • You can both save disk space and reduce your attack surface by removing old and unused packages from your server; with less code installed, there is less for an attacker to make use of. The command ‘yum list installed’ displays all packages currently installed on your server, and ‘yum remove package-name’ removes a package; just be sure you know what the package is and that you actually want to remove it. Be careful when removing packages with yum: if you remove a package that other packages depend on, those dependent packages will also be removed, which can remove a lot at once. Before removing anything, yum lists the packages it will remove and asks for confirmation, so double-check this list carefully before proceeding.

  • File system check performed in the last 180 days.
  • On many systems, ext file systems are configured by default to be checked with e2fsck after 180 days or 20 mounts, whichever comes first. A check like this should be run occasionally to verify disk integrity and repair any problems.

    You can force a disk check by running ‘touch /forcefsck’ and then rebooting the server (the file is removed on the next boot), or by running ‘shutdown -rF now’, which forces a disk check on the next boot and reboots immediately. Alternatively, you can use -f instead of -F to skip the disk check. This can be useful if, for example, you have just performed a kernel update and want the server back up as soon as possible rather than waiting for a check to complete.

    The check thresholds can be changed with the tune2fs command. The defaults are reasonable, but ‘tune2fs -c 50 /dev/sda1’ sets the maximum mount count to 50, so a file system check happens after the file system has been mounted 50 times. Similarly, ‘tune2fs -i 210 /dev/sda1’ changes the interval so that the disk is only checked after 210 days rather than 180.
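To see how close a file system is to its next forced check, you can read the counters printed by ‘tune2fs -l /dev/sda1’. The hypothetical helper below parses the ‘Mount count’ and ‘Maximum mount count’ lines of that output; a maximum of 0 or -1 means that mount-count based checks are disabled.

```shell
#!/bin/sh
# Sketch: how many mounts remain before the next forced file system check?

# mounts_until_check: read `tune2fs -l DEVICE` output on stdin and print the
# mounts remaining (zero or less means a check is due), or "disabled".
mounts_until_check() {
    awk -F: '
        /^Mount count:/         { count = $2 + 0 }
        /^Maximum mount count:/ { max = $2 + 0 }
        END {
            if (max <= 0) print "disabled"
            else print max - count
        }'
}
```

For example, ‘tune2fs -l /dev/sda1 | mounts_until_check’ prints the number of mounts remaining, while the ‘Last checked’ and ‘Check interval’ lines of the same output show the time-based schedule.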

Other general tasks

  • Logs and statistics are being monitored daily or weekly.
  • If you look through /var/log you will notice that there are many different log files on the server, continually written to with different information. Some of this information is useful, but much of it is not relevant day to day, which leaves a large amount of data to go through.

    Logwatch can monitor your servers’ logs and email the administrator a summary on a daily or weekly basis, with the schedule controlled through cron. Logwatch can also include other useful server information in its summary, such as the disk space used on each partition, so it is a good way to get up-to-date notifications from your servers. You can install the package with ‘yum install logwatch’.

    With Anturis you can set up even more granular checks on log files. For example, if you want to be alerted to a particular type of error, you can set up a log file monitor that notifies you every time that event occurs. This means you don’t have to connect to the server manually and review the log files regularly, so you can detect issues proactively rather than reactively.
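As a sketch of the Logwatch setup described above, and assuming the daily cron job that the logwatch package typically installs in /etc/cron.daily, local settings can be placed in /etc/logwatch/conf/logwatch.conf, where they override the packaged defaults. The email address below is a placeholder.

```
# /etc/logwatch/conf/logwatch.conf: local overrides of the default settings
# Where to email the report (placeholder address).
MailTo = admin@example.com
# Which period of logs to summarise.
Range = yesterday
# How much detail to include: Low, Med or High.
Detail = Low
```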

  • Regular scans are being run on a weekly/monthly basis.
  • To stay secure, it is important to scan your server for malicious content. ClamAV is an open source antivirus engine that detects trojans, viruses and other malware, and works well on Linux. For instance, you can set up a cron job that runs a weekly scan at 3am and emails you a report of the results. Depending on how much content you have, the scan may take a while, so it is recommended that you run this intensive scan at a time of low resource usage, such as at night on the weekend. Check the crontab and the /var/log/cron log file to ensure that the scans are running as intended, and confirm that you are receiving the emailed reports.
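A weekly ClamAV scan like the one described above could be scheduled with a cron file such as the hypothetical /etc/cron.d/clamav-weekly-scan below. It assumes ClamAV's ‘clamscan’ command is installed at /usr/bin/clamscan; the scanned directory, log path and email address are placeholders. Because clamscan prints a scan summary at the end of each run, cron emails that summary to the MAILTO address, and /var/log/cron shows whether the job ran.

```
# /etc/cron.d/clamav-weekly-scan (hypothetical file name)
# Cron emails the job's output (infected-file list and scan summary) here.
MAILTO=admin@example.com
# 03:00 every Sunday: -r scans recursively, -i prints only infected files,
# and -l also saves the scan report to the given log file.
0 3 * * 0 root /usr/bin/clamscan -r -i /home -l /var/log/clamscan-weekly.log
```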