Windows Server Maintenance Checklist

Updates

  • New package updates have been installed within the last month.
  • Keeping your server up to date is one of the most important maintenance tasks. Before applying updates, confirm that you have a recent backup (or a snapshot, if the server is a virtual machine) so that you can revert if the updates cause unexpected problems.

    If you are updating a production server, test the updates on a test server first where possible. This lets you confirm that the updates will not break your server and that they are compatible with any other packages or software you are running. If possible, make use of a WSUS (Windows Server Update Services) server; this allows you to approve and control the updates that go to particular server groups in your environment.

    You can apply updates through Windows Update, which is accessible through Start > Control Panel > System and Security > Windows Update, or by typing Windows Update into the Start search box. From here you can check for and install updates, and select whether they should download and install automatically. Automatic installation is enabled by default and is the recommended option. It may not always be acceptable for production servers to reboot whenever updates arrive, so you may need to disable automatic installation and schedule updates and reboots as required, for example out of hours. If you have numerous servers you can configure Windows updates through group policy instead of locally on each server.

  • Other applications have been updated in the last month.
  • Web applications such as WordPress, Drupal, or Joomla need to be updated frequently, because they act as a gateway to your server: they are open to the public Internet and are therefore easier to reach than direct server access. Many web applications also have third-party plugins installed, which can be written by anyone and may contain security vulnerabilities. It is therefore crucial to update these applications very frequently.

    These content management systems are standalone pieces of software, so they are not managed through Windows updates. They may have their own update functions; otherwise you may need to download and apply updates manually from an online source. If you’re unsure, contact the application provider.

Security

  • Server access reviewed within the last 6 months.
  • To increase security you should review who has access to your server. In any organization there may be staff who have left but still have accounts with access; these should be removed or disabled. Check both local accounts on the server and, if the server is a member of a domain, domain accounts in Active Directory. Look for accounts with more access than they need, such as a former administrator who should no longer hold those permissions. Also check the Remote Desktop Users group, as membership allows a user to connect remotely. Review the members of important groups such as Administrators, Domain Admins, and Remote Desktop Users. Server access can be reviewed through the Security log in Event Viewer on each server, or on the domain controller in a domain environment.
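
    The comparison described above can be sketched in a few lines of Python. This is only an illustration: the group membership and staff list are hypothetical sample data, and the function name accounts_to_review is an assumption made for this example. In practice you would export both lists from your own systems.

```python
def accounts_to_review(group_members, current_staff):
    """Return, per group, the members who are not on the current staff list."""
    staff = {name.lower() for name in current_staff}
    flagged = {}
    for group, members in group_members.items():
        stale = sorted(m for m in members if m.lower() not in staff)
        if stale:
            flagged[group] = stale
    return flagged


# Hypothetical sample data: replace with membership exported from your server
# or domain controller, and with the staff list from your own records.
group_members = {
    "Administrators": ["alice", "bob", "svc_backup"],
    "Remote Desktop Users": ["alice", "carol", "dave"],
}
current_staff = ["Alice", "Carol", "svc_backup"]

for group, names in accounts_to_review(group_members, current_staff).items():
    print(f"{group}: review {', '.join(names)}")
```

    Names are compared case-insensitively here, and service accounts need to appear on the allowed list or they will be flagged as well.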

  • Firewall rules reviewed in the last 6-12 months.
  • Firewall rules should also be reviewed from time to time to ensure that you are only allowing required inbound and outbound traffic. Requirements for a server change over time: as applications are installed and removed, the ports that it is listening on may change, potentially introducing vulnerabilities, so it is important to restrict this traffic correctly.

    Windows operating systems come with Windows Firewall installed and running by default; in its default configuration only inbound traffic is restricted, while all outbound traffic is allowed. You can test which ports respond by connecting from another, external server, for example by using telnet to a specific port. You can also enable auditing so that denied traffic is logged and can be reviewed.
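
    As an alternative to telnet, a TCP port test can be scripted. The following is a minimal Python sketch using the standard socket module; the function name port_is_open is an assumption for this example, and the demonstration runs against a temporary local listener rather than a real server.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Demonstration against a temporary local listener, so no real server is needed.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print("while listening:", port_is_open("127.0.0.1", demo_port))
listener.close()
print("after closing:", port_is_open("127.0.0.1", demo_port))
```

    To test a real server, run the function from a machine outside the firewall and pass that server's address and the ports you expect to be open or closed. A False result does not distinguish between a closed port and one filtered by the firewall.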

  • Confirm that users must change passwords.
  • User passwords should be configured to expire after a set period; common values are anywhere between 30 and 90 days. The password is then only valid for a limited time before the user is forced to change it. This increases security because a compromised password stops working once it is changed, so an attacker cannot keep access through that account indefinitely. It is also worth checking that no users have been set to have their password never expire.

    If your accounts are in Active Directory, this can be set centrally through group policy. Otherwise, you can set it per account on the server itself through Local Users and Groups. However, this does not scale as well as Active Directory, because you need to make the changes on each server individually, which takes time and is harder to manage.
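
    The password check can be illustrated with a short Python sketch. It assumes a 90-day policy and a hypothetical list of account records (name, date the password was last set, and a never-expires flag); none of these field names come from Windows itself, so treat them as placeholders for whatever your own export contains.

```python
from datetime import date, timedelta

MAX_PASSWORD_AGE = timedelta(days=90)  # assumption: a 90-day policy


def password_findings(accounts, today, max_age=MAX_PASSWORD_AGE):
    """Return (name, reason) pairs for accounts that break the policy."""
    findings = []
    for acct in accounts:
        age = today - acct["password_last_set"]
        if acct["never_expires"]:
            findings.append((acct["name"], "password set to never expire"))
        elif age > max_age:
            findings.append((acct["name"], f"password is {age.days} days old"))
    return findings


# Hypothetical account records for illustration only.
accounts = [
    {"name": "alice", "password_last_set": date(2024, 5, 1), "never_expires": False},
    {"name": "bob", "password_last_set": date(2024, 1, 10), "never_expires": False},
    {"name": "svc_backup", "password_last_set": date(2022, 3, 3), "never_expires": True},
]

for name, reason in password_findings(accounts, today=date(2024, 6, 1)):
    print(f"{name}: {reason}")
```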

Backups

  • Backups and restores have been tested and confirmed to be working.
  • It is important to back up your servers in case of data loss. It is equally important to test that your backups work and that you can successfully complete a restore. Check that your backups are running on a daily or weekly basis; most backup software can notify you when a backup task fails, and any failure should be investigated.

    It is a good idea to perform a test restore every few months or so to ensure that your backups are working as intended. This may sound time consuming but it is well worth it. There are countless stories of backups appearing to work until all the data is lost; only then do people realise that they are not actually able to restore the data from backup.

    You can back up locally to the same server, which is not recommended because a failure of that server would take the backups with it, or you can back up to an external location either on your network or out on the Internet. This could be your own server or a cloud storage solution like Amazon S3 or Acronis backup for Windows Server. A simple and useful option could also be to enable shadow copies, which allow you to easily revert files.
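
    One way to check a test restore is to compare checksums of the original files against the restored copies. The sketch below is a minimal Python illustration of that idea, not a feature of any particular backup product; the function names and the example paths in the comment are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_restore(source_dir, restored_dir):
    """Return (relative path, problem) pairs for missing or differing files."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif sha256_of(src) != sha256_of(restored):
            problems.append((rel.as_posix(), "content differs"))
    return problems


# Example call with hypothetical paths:
# problems = compare_restore(r"D:\data", r"E:\restore-test\data")
```

    Files that changed after the backup was taken will be reported as differing, so this comparison works best on data that has not been modified since the backup ran.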

Monitoring

  • Monitoring has been checked and confirmed as working correctly.
  • If your server is used in production you most likely have it monitored for various services. It is important to check and confirm that this monitoring is working as intended and that it is reporting correctly so you know you will be correctly alerted if there are any issues. It is possible that incorrect firewall rules may disrupt monitoring, or that your server may be performing different roles now and so may need to be monitored for additional services.

    If you’re using Anturis for monitoring you will be alerted if there are any problems, such as a failure to connect, so you will be able to fix any such issues quickly and you will not have to regularly check that your servers are still being monitored correctly. If you have a server monitored already it is also very easy to add or remove monitors to fit the current role of the server so that you can monitor the required services.

  • Resource usage has been checked in the last month.
  • Resource usage is typically checked as a monitoring activity. It is, however, good practice to observe long term monitoring data in order to get an idea of any resource increases or trends which may indicate that you need to upgrade a component of your server to make it capable of working under the increased load.

    This information can be monitored with Anturis; you can view CPU usage, free disk space, free physical memory and other variables. This lets you monitor all of your servers from one central location and determine whether any need to be upgraded based on past resource usage and performance levels. You can also receive alerts when a set threshold is reached, which may indicate that you need to upgrade or to investigate where the increase has come from. In Windows you can also use Performance Monitor to track various resources over time and present them as a graph.
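
    To show how long-term data can reveal a trend, the following Python sketch fits a straight line to a series of disk usage samples and estimates how many days remain until the volume is full. The sample figures and the function name days_until_full are hypothetical, and a straight-line fit is only a rough approximation of real growth.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days until used space reaches capacity.

    samples is a list of (day_number, used_gb) pairs. A least-squares line is
    fitted to them; the result is measured from the last sample. Returns None
    if usage is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(used for _, used in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    sxy = sum((day - mean_x) * (used - mean_y) for day, used in samples)
    slope = sxy / sxx  # growth in GB per day
    if slope <= 0:
        return None
    _, last_used = samples[-1]
    return (capacity_gb - last_used) / slope


# Hypothetical monthly samples: (day number, used space in GB).
samples = [(0, 100.0), (30, 110.0), (60, 120.0), (90, 130.0)]
estimate = days_until_full(samples, capacity_gb=200.0)
print(f"about {estimate:.0f} days until the volume is full")
```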

  • Hardware errors have been checked in the last week.
  • Critical hardware problems will likely show up in your monitoring and be obvious, as the server may stop working correctly. You can potentially avoid this scenario by monitoring your system for hardware errors, which may warn you that a piece of hardware is having problems so that it can be replaced before it fails. In Windows, the best place to observe such events is Event Viewer, typically under Windows Logs > System; look for warnings and critical events.

File system maintenance

  • Unused applications have been removed in the last month.
  • You can both save disk space and reduce your attack surface by removing old and unused applications from your server; this hardens it, as there is less code available for an attacker to make use of. Check for unused applications on a monthly basis and remove any that are not required. You can view the currently installed applications through Start > Control Panel > Programs > Programs and Features, or by searching for Programs and Features in Start. It displays a list of everything installed, when it was installed, and the size used. If you find anything suspicious that should not be there, remove it and immediately investigate how it got there.

  • Disk integrity checked in the last month.
  • The hard drive in a server typically has the most moving parts, so it has the potential to have the most problems and should be checked often. With the ‘chkdsk’ command you can scan the hard drive and check for a number of problems. To run it graphically, open Computer, right-click the drive, and select Properties > Tools > Check now; this runs chkdsk and can repair errors. Otherwise you can run the command from the command prompt.

    NTFS introduced an online self-healing feature in Windows Server 2008 resulting in chkdsk not being needed as often. Since Windows Server 2012, corruption can now be scanned for and fixed online with no down time. In previous versions of Windows Server, before 2012, the file system volume would need to be taken offline in order to scan and repair, though you can run a scan only online without the repair option at any time. If you are running a server operating system that is older than 2012 you will need to schedule in the downtime to run the repair if the chkdsk scan detects any faults.
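
    The choice between the chkdsk modes described above can be sketched as a small Python helper that only builds the command line. The function name and its server_year parameter are assumptions for this example. The switches used are the standard /f (fix errors) and /scan (online scan on NTFS, available from Windows Server 2012); check the chkdsk help on your own server before relying on them.

```python
def chkdsk_command(drive, server_year, repair=False):
    """Build a chkdsk argument list for a drive letter such as "C" or "C:".

    server_year is a hypothetical parameter holding the Windows Server
    release year (for example 2008 or 2016).
    """
    volume = f"{drive.rstrip(':').upper()}:"
    if not repair:
        return ["chkdsk", volume]           # read-only check, no repairs
    if server_year >= 2012:
        return ["chkdsk", volume, "/scan"]  # online scan, no downtime
    return ["chkdsk", volume, "/f"]         # repair; needs the volume offline


print(" ".join(chkdsk_command("c", 2008)))
print(" ".join(chkdsk_command("c", 2008, repair=True)))
print(" ".join(chkdsk_command("d:", 2016, repair=True)))

# On a Windows server, from an elevated prompt, the list could be passed to
# subprocess.run(...); that step is left out so the sketch runs anywhere.
```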

Other general tasks

  • Event logs and statistics are being monitored daily or weekly.
  • Windows reports important events, which can be viewed through Event Viewer. These events should be checked at least weekly for warnings and critical issues. By default a server logs its events locally, so if you have a large number of servers you need to log into each one to check its events, which is not very efficient.

    To make this task easier you can set up forwarded events, where you choose one server to collect the events from all other servers. This gives you one central location to store and view events in Event Viewer.
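
    Once events are collected in one place, a weekly review can be sped up by summarising them. The Python sketch below counts warnings, errors, and critical events in a log saved from Event Viewer as CSV. The column names (Level, Source, Event ID) and the sample rows are assumptions for illustration; adjust them to match the header of your own export.

```python
import csv
import io
from collections import Counter


def summarise_events(csv_text, levels=("Critical", "Error", "Warning")):
    """Count events per (level, source, event ID), most frequent first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        if row["Level"] in levels:
            counts[(row["Level"], row["Source"], row["Event ID"])] += 1
    return counts.most_common()


# Hypothetical sample export; a real file would be read from disk instead.
sample = """\
Level,Date and Time,Source,Event ID
Warning,01/06/2024 02:14:07,disk,51
Information,01/06/2024 02:15:00,Service Control Manager,7036
Error,01/06/2024 03:00:12,Service Control Manager,7031
Warning,02/06/2024 02:14:09,disk,51
"""

for (level, source, event_id), count in summarise_events(sample):
    print(f"{count} x {level} from {source} (event ID {event_id})")
```

    Repeated warnings from the same source stand out at the top of the list, which is often a more useful starting point than reading the log line by line.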

  • Regular scans are being run on a weekly/monthly basis.
  • In order to stay secure it is important to scan your server for malicious content. Windows Defender is a first line of defence which comes installed with some versions of Windows. However, ideally, you should be using a more comprehensive antivirus/malware solution and scanning regularly, at least on a weekly basis during a period of time where there will be low resource usage so that the scan will not interfere with normal operations, for instance at 4am on a Saturday. Don’t forget to ensure that your software also automatically updates every day or week to ensure that you are able to detect the latest threats.

  • Check server reliability every month.
  • Windows Server comes with a Reliability Monitor that lets you view overall system stability and details about events that affect server reliability. It provides a stability index over a period of time to give you an idea of how reliably your server is running.

    Reliability Monitor displays the stability index on a scale from 1 to 10 over time. When there are program crashes or other problems, the index drops; the longer the server runs without problems, the higher the index climbs.

    This is an easy way to gain a quick overview regarding critical events that have happened on the server over time, which can allow you to potentially make connections to events that may have started the problem. For instance you may have increased critical events after installing new software.
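
    The review intervals used throughout this checklist can be tracked with a small script. The Python sketch below is one possible approach: it treats a month as 30 days and uses the upper end of the 6-12 month firewall interval, both of which are simplifying assumptions, and the task names and dates are sample data.

```python
from datetime import date

# Intervals in days, based on the checklist headings. Assumptions: a month is
# counted as 30 days and the firewall review uses the 12-month upper bound.
INTERVAL_DAYS = {
    "Install package updates": 30,
    "Update other applications": 30,
    "Review server access": 180,
    "Review firewall rules": 365,
    "Check resource usage": 30,
    "Check hardware errors": 7,
    "Remove unused applications": 30,
    "Check disk integrity": 30,
    "Review event logs": 7,
    "Check server reliability": 30,
}


def overdue_tasks(last_done, today, intervals=INTERVAL_DAYS):
    """Return (task, days overdue) pairs; None means the task has no record."""
    overdue = []
    for task, allowed in intervals.items():
        done = last_done.get(task)
        if done is None:
            overdue.append((task, None))
        elif (today - done).days > allowed:
            overdue.append((task, (today - done).days - allowed))
    return overdue


# Hypothetical records of when each task was last completed.
last_done = {task: date(2024, 5, 28) for task in INTERVAL_DAYS}
last_done["Install package updates"] = date(2024, 4, 1)
last_done["Check hardware errors"] = date(2024, 5, 20)

for task, days in overdue_tasks(last_done, today=date(2024, 6, 1)):
    print(f"{task}: {'never done' if days is None else f'{days} days overdue'}")
```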