May 14th, 2014 - by Walker Rowe
What is the Nginx web and proxy server and how does it compare to Apache? Should you use one of these servers or both? Here we explore some answers to these questions.
The Apache web servers have been in use since 1995. Apache powers more websites than any other product; Microsoft IIS comes in second.
Because the open-source Apache web server has been available for so many years, and has so many users, lots of modules have been written to expand its functionality - most of these are open-source as well. One popular configuration, for instance, is to use Apache to server up static web pages and the mod_jk module to run Java and JSP code on Tomcat to make an interactive application. Another example is using mod_php to execute php scripts without having to use cgi.
But Apache slows down under heavy load, because of the need to spawn new processes, thus consuming more computer memory. It also creates new threads that must compete with others for access to memory and CPU. Apache will also refuse new connections when traffic reaches the limit of processes configured by the administrator.
Nginx is an open source web server written to address some of the performance and scalability issues associated with Apache. The product is open source and free, but Nginx offers support if you buy its Nginx Plus version.
Nginx says its web server was written to address the C10K problem, which is a reference to a paper written by Daniel Kegel about how to get one web server to handle 10,000 connections, given the limitations of the operating systems. In his paper, he cited another paper by Dean Gaudet who wrote, “Why don't you guys use a select/event based model like Zeus? It's clearly the fastest”.
Nginx is indeed event-based. They call their architecture “event-driven and asynchronous”. Apache relies on processes and threads. So, what’s the difference?
How Apache works and why it has limitations
Apache creates processes and threads to handle additional connections. The administrator can configure the server to control the maximum number of allowable processes. This configuration varies depending on the available memory on the machine. Too many processes exhaust memory and can cause the machine to swap memory to disk, severely degrading performance. Plus, when the limit of processes is reached, Apache refuses additional connections.
Apache can be configured to run in pre-forked or worker multi-process mode (MPM). Either way it creates new processes as additional users connect. The difference between the two is that pre-forked mode creates one thread per process, each of which handles one user request. Worker mode creates new processes too, but each has more than one thread, each of which handles one request per user. So one worker mode process handles more than one connection and one pre-fork mode process handles one connection only.
Worker mode uses less memory than forked-mode, because processes consume more memory than threads, which are nothing more than code running inside a process.
Moreover, worker mode is not thread safe. That means if you use non thread-safe modules like mod_php, to serve up php pages, you need to use pre-forked mode, thus consuming more memory. So, when choosing modules and configuration you have to confront the thread-versus-process optimization problem and constraint issues.
The limiting factor in tuning Apache is memory and the potential to dead-locked threads that are contending for the same CPU and memory. If a thread is stopped, the user waits for the web page to appear, until the process makes it free, so it can send back the page. If a thread is deadlocked, it does not know how to restart, thus remaining stuck.
Nginx works differently than Apache, mainly with regard to how it handles threads.
Nginx does not create new processes for each web request, instead the administrator configures how many worker processes to create for the main Nginx process. (One rule of thumb is to have one worker process for each CPU.) Each of these processes is single-threaded. Each worker can handle thousands of concurrent connections. It does this asynchronously with one thread, rather than using multi-threaded programming.
The Nginx also spins off cache loader and cache manager processes to read data from disk and load it into the cache and expire it from the cache when directed.
Nginx is composed of modules that are included at compile time. That means the user downloads the source code and selects which modules to compile. There are modules for connection to back end application servers, load balancing, proxy server, and others. There is no module for PHP, as Nginx can compile PHP code itself.
Here is a diagram of the Nginx architecture taken from Andrew Alexeev´s deep analysis of Nginx and how it works.
In this diagram we see that Nginx, in this instance, is using the FasCGI process to execute Python, Ruby, or other code and the Memcache memory object caching system. The worker processes load the main ht_core Ngix process to for HTTP requests. We also see that Nginx is tightly integrated with Windows and Linux kernel features to gain a boost in performance; these kernel features have improved over time, allowing Nginx to take advantage.
Nginx is said to be event-driven, asynchronous, and non-blocking. “Event” means a user connection. “Asynchronous” means that it handles user interaction for more than one user connection at a time. “Non-blocking” means it does not stop disk I/O because the CPU is busy; in that case, it works on other events until the I/O is freed up.
Nginx compared with Apache 2.4 MPM
Apache 2.4 includes the MPM event module. It handles some connection types in an asynchronous manner, as does Nginx, but not in exactly the same manner. The goal is to reduce memory requirements as load increases.
As with earlier versions, Apache 2.4 includes the worker and pre-forked modes we mentioned above but has added the mpm_event_module (Apache MPM event module) to solve the problem of threads that are kept alive waiting for that user connection to make additional requests. MPM dedicates a thread to handle sockets that are both in listening and keep-alive state. This addresses some of the memory issues associated with older versions of Apache, by reducing the number of threads and processes created. To use this, the administrator would need to download the Apache source code and include the mpm_event_module, and compile Apache instead of using a binary distribution.
The Apache MPM event model is not exactly the same as Nginx, because it still spins off new processes as new requests come in (subject to the limit set by the administrator). Nginx does not set up more processes per user connection. The improvement that comes with Apache 2.4 is that those processes that are spin offs create fewer threads than normal Apache worker mode would. This is because one thread can handle more than one connection, instead of requiring a new process for each connection.
Using Nginx and Apache both
Apache is known for its power and Nginx for speed. This means Nginx can serve up static content quicker, but Apache includes the modules needed to work with back end application servers and run scripting languages.
Both Apache and Nginx can be used as proxy servers, but using Nginx as a proxy server and Apache as the back end is a common approach to take. Nginx includes advanced load balancing and caching abilities. Apache servers can, of course, be deployed in great numbers. A load balancer is needed in order to exploit this. Apache has a load balancer module too, plus there are hardware based load balancers.
Another configuration is to deploy Nginx with the separate php-fpm application. We say that php-fpm is an application, because it is not a .dll or .so loaded at execution time, as is the case with Apache modules. Php-fpm (the FastCGI Process Manager) can be used with Nginx to deliver php scripting ability to Nginx to render non-static content.
When is Apache preferred over Nginx?
Apache comes with built in support for PHP, Python, Perl, and other languages. For example, the mod_python and mod_php Apache modules process PHP and Perl code inside the Apache process. mod_python is more efficient that using CGI or FastCGI, because it does not have to load the Python interpreter for each request. The same is true for mod_rails and mod_rack, which give Apache the ability to run Ruby on Rails. These processes run faster inside the Apache process.
So if your website is predominately Python or Ruby, Apache might be preferred for your application, as Apache does not have to use CGI. For PHP, it does not matter as Nginx supports PHP internally.
Here we have given some explanation of how Nginx is different from Apache and how you might consider using one or both of them, and which of the two might fit your needs more closely. Plus we have discussed how Apache 2.4 brings some of the Nginx improvements in thread and process management to the Apache web server. So you can choose the best solution for your needs.