anturis.com

How CloudLinux Helps You Handle Noisy Neighbors and Security Problems

- Clifford -

CloudLinux is an extension to CentOS/RHEL that gives a web hosting company granular control over the amount of resources assigned to each customer. CloudLinux solves the noisy neighbor problem by preventing one customer from spinning up too many processes and thereby dominating the machine. It also improves security: one customer cannot see another customer’s processes or file system.

Control resources by customer, boost security and performance

It helps to compare CloudLinux with a hypervisor, because CloudLinux takes the opposite approach. A hypervisor, such as VMware, lets different operating systems run on the same server, so a cloud environment can host more than one customer operating system at a time. A shared web hosting environment does not do that; instead, a web hosting company runs multiple customer domains on the same web server under the same operating system. CloudLinux was created to address a problem inherent in this setup.

With different websites all running on the same server, with the same version of Apache or LiteSpeed, the busiest website will slow down the others as it consumes more resources than the other sites on the server. CloudLinux allocates limits for memory, processes and disk I/O on a per-customer basis, so that one customer cannot slow down the others. The result is a more stable hosting environment for everyone on that server.

CloudLinux provides hypervisor-like isolation that prevents customers from seeing each other’s processes and file systems. In regular CentOS/RHEL without CloudLinux, a user can see other customers’ processes and is kept out of their file systems only by Linux file permissions or php.ini settings. CloudLinux handles this more securely by adding an abstraction layer on top of basic kernel functions. While control panels such as Plesk and cPanel (which work with CloudLinux) aim to prevent these sorts of security problems, the CloudLinux kernel goes a step further: it helps prevent symlink attacks and ptrace exploits, and it can restrict visibility of ProcFS (/proc).

CloudLinux enables the web hosting company or server owner to control the percent of CPU utilization, the maximum number of CPU cores a customer can access, memory usage, disk I/O, and the number of concurrent processes that can be run by that user.

Why is this important? Consider an example: when Apache runs in a single-threaded, process-per-connection mode (the prefork MPM), it uses a separate process for each user connection. Each of these processes consumes a lot of memory, especially if it runs something like a Python CGI script, which starts a new instance of the Python interpreter for every request. A busy site can therefore dominate the machine and slow down everyone. CloudLinux controls how many processes a customer can create, which confines the problem to that customer’s own account: if the account starts to consume a large amount of resources, it is held to its limits. The server as a whole keeps performing well; only that customer’s processes and websites slow down. This is a far better outcome than everyone in the shared environment suffering because of one user. In short, CloudLinux ensures fairness and stability.
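
The effect of a per-account process limit can be shown with a toy model. This is a hypothetical Python illustration, not CloudLinux’s implementation (which enforces limits in the kernel); the class name is made up, and the 20-process default and the 508 status code simply mirror the examples given later in this article.

```python
from collections import defaultdict


class EntryProcessLimiter:
    """Toy model of a per-account limit on concurrent entry processes.

    Each account has its own counter, so one busy account using up its
    slots does not reduce the slots available to any other account.
    """

    def __init__(self, limit: int = 20) -> None:
        self.limit = limit
        self.active = defaultdict(int)  # account -> running entry processes

    def start(self, account: str) -> int:
        """Try to start a request; return an HTTP-style status code."""
        if self.active[account] >= self.limit:
            return 508  # rejected: this account is already at its limit
        self.active[account] += 1
        return 200

    def finish(self, account: str) -> None:
        """Release the slot held by a request that has completed."""
        if self.active[account] > 0:
            self.active[account] -= 1


if __name__ == "__main__":
    limiter = EntryProcessLimiter(limit=20)
    # A busy account fires 25 concurrent requests: 20 run, 5 are rejected.
    busy = [limiter.start("busy-site") for _ in range(25)]
    print(busy.count(200), busy.count(508))  # 20 5
    # A quiet account on the same server is unaffected.
    print(limiter.start("quiet-site"))  # 200
```

The busy account only ever competes with itself, which is the fairness property described above.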

Lightweight Virtual Environment (LVE)

The main CloudLinux feature is an isolated, limited runtime environment for each customer account, called a lightweight virtual environment (LVE), which is essentially a container. LVE is a kernel-level technology with the same origins as container-based virtualization. It limits the CPU, memory, disk I/O, and other resources assigned to each user, based on the CloudLinux configuration, which can be set either for the whole server or individually for each user account. LVE improves overall server performance by limiting each user account, and it shows which users need more resources. This gives the web hosting company a chance to work with those customers to fix their performance problems, and to increase revenue by selling them additional capacity, either as a package with more resources or as their own dedicated server.
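
The choice between server-wide and per-account settings can be pictured as defaults plus overrides. The sketch below is a hypothetical Python model: the field names and values are illustrative assumptions, not CloudLinux configuration keys or its shipped defaults.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class LveLimits:
    # Hypothetical field names mirroring the limit types CloudLinux controls.
    cpu_percent: int      # share of CPU the account may use
    cpu_cores: int        # maximum number of CPU cores
    memory_mb: int        # memory limit
    io_kb_per_s: int      # disk I/O throughput limit
    entry_processes: int  # concurrent entry process limit


# Illustrative server-wide defaults applied to every account.
SERVER_DEFAULTS = LveLimits(cpu_percent=100, cpu_cores=1, memory_mb=512,
                            io_kb_per_s=1024, entry_processes=20)

# Per-account overrides list only the fields that differ from the defaults.
OVERRIDES: Dict[str, Dict[str, int]] = {
    "bigshop": {"memory_mb": 1024, "entry_processes": 40},
}


def effective_limits(user: str) -> LveLimits:
    """Return the server defaults with any per-account overrides applied."""
    return replace(SERVER_DEFAULTS, **OVERRIDES.get(user, {}))


if __name__ == "__main__":
    print(effective_limits("bigshop"))
    print(effective_limits("smallblog") == SERVER_DEFAULTS)  # True
```

Storing only the differences keeps most accounts on the server-wide settings, while a customer who buys more capacity gets a one-line override.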

Controlling account limits also allows a web host to increase server density, fitting more accounts on the same server with less negative impact on the existing ones. Stability and reliability improve as well, because the accounts together cannot consume more resources than the server has, push it into an overloaded state, and cause downtime or an unresponsive server. Here is an example of what can be controlled with CloudLinux (default settings for a container):

For example, below are the resource usage statistics (fetched from LVE manager) for an account with a maximum entry processes limit of 20.

The meaning of the table headers is explained below.

  • From/To: the time period the statistics cover
  • aCPU: average CPU usage (as a percentage, where 100% equals the account’s CPU limit)
  • mCPU: maximum CPU usage (as a percentage, where 100% equals the account’s CPU limit)
  • lCPU: CPU limit (always shown as 100%, meaning 100% of whatever limit is in place)
  • aEP: average number of entry processes
  • mEP: maximum number of entry processes
  • lEP: maxEntryProc limit (maximum number of concurrent entry processes; 20 by default)
  • aMem: average memory used
  • mMem: maximum memory used
  • lMem: memory limit assigned to the account
  • MemF: out-of-memory faults
  • MepF: max entry processes faults (number of times the entry process limit was hit)
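
Reading a statistics row comes down to a few comparisons. As a sketch, assuming each row has already been parsed into a dictionary keyed by the column names above (the parsing step, the function name and the sample values are hypothetical; memory is assumed to be in MB):

```python
from typing import Dict, List


def limits_hit(row: Dict[str, float]) -> List[str]:
    """Return the limits that one statistics row shows as reached."""
    hits = []
    # mEP equal to lEP, or any entry process fault, means the limit was reached.
    if row["mEP"] >= row["lEP"] or row["MepF"] > 0:
        hits.append("entry processes")
    # Any out-of-memory fault, or peak memory at the limit, means memory ran out.
    if row["MemF"] > 0 or row["mMem"] >= row["lMem"]:
        hits.append("memory")
    # mCPU is a percentage of the CPU limit, and lCPU is always 100.
    if row["mCPU"] >= row["lCPU"]:
        hits.append("cpu")
    return hits


if __name__ == "__main__":
    # Hypothetical row for an account with the default limit of 20 entry processes.
    sample = {"mCPU": 35, "lCPU": 100, "mEP": 20, "lEP": 20, "MepF": 3,
              "mMem": 300, "lMem": 512, "MemF": 0}
    print(limits_hit(sample))  # ['entry processes']
```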

Another example of an account that periodically hits its 512M memory limit.

What happens when an account hits LVE limits?

Different things happen to an account depending on which limit is hit. Here we’ll cover what happens when the memory, CPU, entry process or disk I/O limits are hit, and how you can investigate the increased resource usage.

Users can view their resource usage through their control panel, where they can see the information shown in the previous images, as well as a graph of resource usage over time.

  • High memory usage: The most common sign that the memory limit has been hit is the website displaying a “500 Internal Server Error”. This error can appear for many other reasons, such as a poorly configured .htaccess file; however, if it occurs intermittently, the memory limit may be the cause. The Apache error logs show which script may have triggered the memory error, which lets you investigate that script further if the same one tends to trigger the limit.
  • High CPU usage: The most common sign that the CPU limit has been hit is the website displaying a “503 Resources Unavailable” error. The first thing to investigate is the processes currently running under the user account, which you can list from the command line with “ps aux | grep username”; you can also view the most resource-intensive processes with the ‘top’ command. You may find stale processes that can be killed, or at least get an idea of what is causing the problem. For instance, if all the processes are Apache, you could look at the user’s logs to see which pages are being hit hard and optimize the website from there. There are also some CloudLinux-specific commands that can help, such as ‘lveps’, which displays the process lists for LVE containers.
  • High number of concurrent processes: Checking for this is similar to checking for high CPU usage; you’ll see the message “508 Resource Limit Reached” if you try to load a website that is currently hitting its process limit. In the statistics table, mEP (maximum entry processes) will equal lEP (the entry process limit), because the number of running processes cannot exceed the limit. The user’s process list can be viewed with either ‘lveps’ or ‘ps aux | grep username’. This limit helps protect the server against DoS attacks: Apache is not tied up by a flood of connections to one account, so other accounts continue to work normally instead of Apache going down for the whole server.
  • High disk I/O: When the disk I/O limit is reached, the processes are throttled and put to sleep, so they cannot exceed the defined limit. Currently, I/O limits are only available under CloudLinux 6.x (CentOS/RHEL 6), not 5.x, so if you’re running version 5, I/O limits will not apply. A file that is already in the disk cache can still be loaded without counting towards the I/O limit. The best way to investigate high I/O usage is to install ‘iotop’ with yum; iotop lists processes by disk read/write activity, and these can then be investigated further as required.
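
The ‘ps aux | grep username’ check can also be done by numeric UID, which avoids matching the username when it merely appears in another process’s command line (including the grep command itself). Below is a minimal, Linux-only Python sketch that reads /proc directly; the function name is hypothetical, and on a CloudLinux server with restricted ProcFS a non-root user may only see their own processes.

```python
import os
from typing import List


def pids_for_uid(uid: int, proc_root: str = "/proc") -> List[int]:
    """Return the PIDs of processes whose real UID is `uid` (Linux only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    if line.startswith("Uid:"):
                        # Format: "Uid:  real  effective  saved  filesystem"
                        if int(line.split()[1]) == uid:
                            pids.append(int(entry))
                        break
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited meanwhile, or is not visible to us
    return sorted(pids)


if __name__ == "__main__":
    # For another account, look up its UID first: pwd.getpwnam("username").pw_uid
    print(pids_for_uid(os.getuid()))
```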

CageFS

CageFS can be installed on a CloudLinux server and provides each customer with a dedicated virtual file system, so that one customer cannot see another customer’s directories, files, or processes. This is better than using php.ini to secure directories, because php.ini restrictions do not apply to CGI scripts. With CageFS, a user cannot see the Apache configuration files, so they cannot see all the virtual servers set up on the machine.

For example, with CageFS, when a user looks at the /etc/passwd file, they cannot see the other user accounts on the system, only the ones they need for their own environment. This increases security because the /etc/passwd file normally has to be world-readable. Without CageFS, an attacker who gained access to one account could easily get a list of all the other user accounts on the server to attack; with CageFS, this is not possible. And because there is no access to SUID binaries, and only safe binaries are available to the user, most privilege escalation attacks are prevented as well, further reducing the attack surface of the server.
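
The /etc/passwd filtering described above can be modelled as follows. This is a hypothetical sketch: the rule used here (keep system accounts below a UID cutoff plus the caged user’s own entry) and the cutoff of 500 are assumptions for illustration, not CageFS’s actual logic.

```python
from typing import List


def caged_passwd(lines: List[str], own_user: str, first_user_uid: int = 500) -> List[str]:
    """Return the passwd entries a caged user would see in this model.

    System accounts (UID below first_user_uid) and the user's own entry stay
    visible; every other customer account is hidden.
    """
    visible = []
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) != 7 or not fields[2].isdigit():
            continue  # skip blank or malformed lines
        name, uid = fields[0], int(fields[2])
        if uid < first_user_uid or name == own_user:
            visible.append(line.rstrip("\n"))
    return visible


if __name__ == "__main__":
    passwd = [
        "root:x:0:0:root:/root:/bin/bash",
        "alice:x:501:501::/home/alice:/bin/bash",
        "bob:x:502:502::/home/bob:/bin/bash",
    ]
    for entry in caged_passwd(passwd, "alice"):
        print(entry)  # root and alice only; bob is hidden
```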

PHP Selector

PHP Selector can be installed on CloudLinux with yum. It is important because it lets each customer run whichever of the available PHP versions they want, which is great for sites that are not ready to upgrade to the newest release, perhaps because of legacy code. Each customer can also choose their own PHP extensions and tweak php.ini settings independently for their own website. For instance, you can define the versions available on the server and let a user choose PHP 5.2, 5.3, 5.4, 5.5, or 5.6. This saves the hosting company a lot of time: it no longer needs to migrate accounts to newer servers with newer PHP versions, a move that can break customer websites that are not ready for the newer version. Instead, customers can change the version themselves as required.
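
The per-account choice can be pictured with a small model. This is a hypothetical Python sketch, not PHP Selector’s interface: the class, its methods and the default version are assumptions for illustration.

```python
from typing import Dict, Iterable


class PhpSelector:
    """Toy model: each account picks one of the PHP versions the admin enabled."""

    def __init__(self, enabled: Iterable[str], default: str) -> None:
        self.enabled = set(enabled)
        if default not in self.enabled:
            raise ValueError("the default version must be one of the enabled versions")
        self.default = default
        self.choice: Dict[str, str] = {}  # account -> selected version

    def set_version(self, account: str, version: str) -> None:
        """Record an account's choice; only server-enabled versions are allowed."""
        if version not in self.enabled:
            raise ValueError(f"PHP {version} is not enabled on this server")
        self.choice[account] = version

    def version_for(self, account: str) -> str:
        """Accounts that never chose a version fall back to the server default."""
        return self.choice.get(account, self.default)


if __name__ == "__main__":
    selector = PhpSelector(["5.2", "5.3", "5.4", "5.5", "5.6"], default="5.4")
    selector.set_version("legacy-site", "5.2")  # legacy code stays on 5.2
    print(selector.version_for("legacy-site"), selector.version_for("new-site"))  # 5.2 5.4
```

The administrator controls the set of enabled versions; each customer only picks from that set, and one customer’s choice never changes another’s.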

MySQL Governor

MySQL Governor is also available with CloudLinux. The web hosting company installs it with yum and then runs a Python script, so installation is straightforward. MySQL Governor can also kill long-running, slow SQL SELECT queries that may be hanging around and consuming resources unnecessarily.

CloudLinux uses an algorithm to detect MySQL usage abuse, such as too many bytes read from or written to a database, or a large number of MySQL connections, which can cause connectivity problems for everyone on the server if no connections are left free. The algorithm takes many different parameters into account to detect an abusive user, and then restricts that customer’s MySQL usage for a period of time. If the abuse continues, the restriction periods can get longer.
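
Escalating restriction periods can be sketched as a ladder that each repeat offence climbs. This hypothetical Python model is not MySQL Governor’s algorithm: the detection step is omitted, the four periods (1 minute, 15 minutes, 1 hour, 1 day) are illustrative values, and unlike a production system the model never lowers an account’s level again.

```python
from typing import Dict, Sequence

# Illustrative escalation ladder in seconds: 1 minute, 15 minutes, 1 hour, 1 day.
DEFAULT_PERIODS = (60, 15 * 60, 60 * 60, 24 * 60 * 60)


class RestrictionTracker:
    """Toy model of restriction periods that grow while abuse continues."""

    def __init__(self, periods: Sequence[int] = DEFAULT_PERIODS) -> None:
        self.periods = list(periods)
        self.level: Dict[str, int] = {}    # account -> index into periods
        self.until: Dict[str, float] = {}  # account -> time the restriction ends

    def record_abuse(self, account: str, now: float) -> int:
        """Restrict `account` from time `now`; return the restriction length.

        Each repeat offence moves one step up the ladder, capped at the top.
        """
        level = min(self.level.get(account, -1) + 1, len(self.periods) - 1)
        self.level[account] = level
        self.until[account] = now + self.periods[level]
        return self.periods[level]

    def is_restricted(self, account: str, now: float) -> bool:
        return now < self.until.get(account, 0.0)


if __name__ == "__main__":
    tracker = RestrictionTracker()
    print([tracker.record_abuse("heavy-db-user", now=t) for t in (0, 100, 2000, 9000, 99000)])
    # [60, 900, 3600, 86400, 86400]
```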

All restricted users are placed into the same LVE, and you can then control the resources available to the restricted accounts. There is a downside: the algorithm needs quite a bit of fine-tuning to work well for your particular server configuration, but once set up, it is very beneficial. Before MySQL Governor, with CloudLinux alone, a MySQL user could cause high load on a server because SQL queries did not count towards that user’s resource usage; now they are monitored, and this problem is essentially resolved.

Securelinks

This security feature prevents symbolic link attacks, which are quite common. A symbolic link attack lets one account view another account’s files, such as .php files. In this scenario, one user creates a symbolic link pointing to another user’s wp-config.php file, or to any other file, since files typically have 644 permissions, meaning any other user can read them. That file can then be read through a web browser, because Apache has permission to read it, or viewed from the command line. SecureLinks, implemented at the kernel level, prevents this: a file is only served by Apache if it is owned by the same user as the owner of the Apache VirtualHost. So make sure all of a user’s files are owned by that user; files owned by someone else, such as root, will not work correctly.
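
The ownership rule described above can be illustrated in user space. This is a simplified Python sketch of the check, not SecureLinks itself (which works in the kernel): it compares both the owner of the path (the link itself, when the path is a symlink) and the owner of the file it finally resolves to against the VirtualHost owner’s UID. Intermediate links in a chain of symlinks are not checked here, and the function name is hypothetical.

```python
import os
import tempfile


def may_serve(path: str, vhost_owner_uid: int) -> bool:
    """Return True only if the path and the file it resolves to are both
    owned by the VirtualHost owner (a simplified model of the rule)."""
    try:
        link_st = os.lstat(path)   # the path itself; a symlink is not followed
        target_st = os.stat(path)  # the file the path finally resolves to
    except FileNotFoundError:
        return False               # missing file or dangling symlink
    return link_st.st_uid == vhost_owner_uid and target_st.st_uid == vhost_owner_uid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        config = os.path.join(d, "wp-config.php")
        with open(config, "w") as f:
            f.write("<?php // sample\n")
        link = os.path.join(d, "link.txt")
        os.symlink(config, link)
        me = os.getuid()
        print(may_serve(link, me))      # True: link and target are both ours
        print(may_serve(link, me + 1))  # False: a different vhost owner
```

In the attack scenario above, the attacker owns the link but the victim owns wp-config.php, so the second comparison fails and the file is not served.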

Installation and Conversion

CloudLinux does not replace CentOS/RHEL; it repackages CentOS/RHEL with a modified kernel that provides better control in a multi-tenant hosting environment. Installation on CentOS/RHEL 5.x or 6.x is very straightforward: the root user runs a single script, then reboots the machine into the CloudLinux kernel. There is no need to recompile the kernel manually; it’s all taken care of via yum.

The installation typically takes less than 20 minutes: the root user logs in and runs cldeploy, a script downloaded from CloudLinux (for further information, see the official documentation on converting existing servers). The script detects whether cPanel or Plesk is installed and automatically adds the required GUI components for them. Once the script has completed, you just need to reboot to finish the process.

How Anturis monitors CloudLinux

The administrator can use Anturis to set up monitoring of LVE faults and usage throttling. There are several reasons you would want to know which customers’ resources are being restricted by CloudLinux.

  • Being able to centrally monitor all of your servers from one location. Otherwise, you have to view the LVE statistics on a per-server basis, logged into each server, which makes monitoring difficult if you have a large number of servers. Anturis makes this easy by centralizing all of the information in one location.
  • An account that is hitting its resource limits could be infected by malware or otherwise abusing resources, and the resulting high resource usage could threaten the whole system. This is something you would want to be aware of so you can fix it quickly.
  • If an account regularly hits or comes close to its resource limits, and the website’s performance problems identified by CloudLinux cannot be fixed by optimization, you may wish to offer the customer an upgrade to a package with more resources or to their own server. Centralized monitoring makes this information far easier to get, and it can be used to increase overall revenue by upselling products.
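
Spotting accounts that regularly hit their limits across many servers is essentially a counting problem. Here is a minimal Python sketch, assuming the per-period fault counts have already been collected from each server into (server, account, faults) tuples; the tuple format, the function name and the three-period cutoff are hypothetical, and this is not how Anturis works internally.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Sample = Tuple[str, str, int]  # (server, account, faults in one monitoring period)


def upsell_candidates(samples: Iterable[Sample], min_periods: int = 3) -> List[Tuple[str, str]]:
    """Return (server, account) pairs that faulted in at least `min_periods` periods."""
    periods_with_faults: Counter = Counter()
    for server, account, faults in samples:
        if faults > 0:
            periods_with_faults[(server, account)] += 1
    return sorted(key for key, count in periods_with_faults.items() if count >= min_periods)


if __name__ == "__main__":
    data = [
        ("web1", "shop", 4), ("web1", "shop", 2), ("web1", "shop", 7),
        ("web1", "blog", 1), ("web2", "forum", 0), ("web2", "forum", 3),
    ]
    print(upsell_candidates(data))  # [('web1', 'shop')]
```

Counting periods with faults, rather than summing faults, separates accounts that are regularly constrained from those with a single traffic spike.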

To set up CloudLinux monitoring, use an Anturis Custom Monitor together with CloudLinux’s lveinfo utility.

Below are some command line examples. For further information, see the man page for each command, for example ‘man lveinfo’, which describes all of the command options in detail.

To monitor the number of LVE faults, use this command:

lveinfo --display-username --show-all --limit=25 --order-by=any_faults --by-fault=any -r 1 -d 1 --period=5m | tee /dev/stderr | wc -l

Set the threshold to “<1” and the period to 5 minutes.

To monitor the number of LVEs with maximum CPU usage above 90%, use this command:

lveinfo --display-username --show-all --limit=25 --order-by=cpu_max --by-usage=cpu_max -p 90 -d 1 --period=5m | tee /dev/stderr | wc -l

Set the threshold to “<1” and the period to 5 minutes.

To monitor the number of LVEs with maximum I/O usage above 90%, use:

lveinfo --display-username --show-all --limit=25 --order-by=io_max --by-usage=io_max -p 90 -d 1 --period=5m | tee /dev/stderr | wc -l

Set the threshold to “<1” and the period to 5 minutes.
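
In these pipelines, ‘tee /dev/stderr’ copies lveinfo’s output to standard error so it can still be read, while ‘wc -l’ turns it into a single number for the monitor to compare with the threshold. The same logic as a Python sketch, assuming the monitor raises an alert when the count is not below the threshold; the function names are hypothetical, and the command is passed as an argument list rather than through a shell.

```python
import subprocess
import sys
from typing import List


def count_output_lines(cmd: List[str]) -> int:
    """Run `cmd`, copy its output to stderr (like `tee /dev/stderr`) and
    return the number of lines it printed (like `wc -l`)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    sys.stderr.write(result.stdout)
    return result.stdout.count("\n")  # wc -l counts newline characters


def alert_needed(line_count: int, threshold: int = 1) -> bool:
    """Healthy while the count stays below the threshold (the "<1" setting)."""
    return line_count >= threshold


if __name__ == "__main__":
    # Stand-in command that prints two lines; in practice this would be the
    # lveinfo command and its options, written as a list of arguments.
    n = count_output_lines([sys.executable, "-c", "print('row 1'); print('row 2')"])
    print(n, alert_needed(n))  # 2 True
```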

Container environment

In summary, web hosting companies increasingly use CloudLinux to create a virtualized container environment for customer accounts, with each customer given an allocated set of resource limits on the server. This balances machine resources between customers, so that one customer does not cause problems for everyone else on the server. The result is better performance, stability and uptime for the server overall, which keeps customers happy, and it lets you find high-resource users who may be candidates for an upgrade to a package with higher limits. Anturis improves on this further by letting you view all of this information from one location, which solves the scaling problem of managing and monitoring a large number of servers.
