The rapid growth in the volume of information has made the applications that handle it large, complex, slow and resource-hungry. A main concern for today’s application developers is therefore to tackle as many of these issues as possible. One popular approach is to make applications scalable.
Scalability is, in essence, the capability of a system, network or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. A scalable system design should therefore provide most of the following:
- The ability to store large amounts of data on demand
- The ability to perform a large number of transactions
- The ability to handle large traffic loads
One of the most important points to keep in mind is that scaling your application requires careful planning. Fixing one problem can introduce other, unforeseen problems into the system. A few examples of this are as follows:
- It takes longer to add new features
- Code can be harder to test
- Finding and fixing bugs is more frustrating
- Getting local and production environments to match is more difficult
It is also wise to keep in mind that solutions documented by others (e.g. Google, Facebook) may not suit your problem, and adopting them can have adverse effects on your system.
Premature optimization is the root of all evil.
— Donald Knuth
When considering scalability, there are two types of scaling to choose from:
- Vertical -- Adding more resources (CPU, memory, storage) to a single system as demand increases. This approach has hard limits, and costs can rise steeply as you approach them.
- Horizontal -- Adding more systems to share the load. This approach needs careful attention and a better architecture.
While vertical scaling is straightforward, this article’s main focus will be on horizontal scaling and the best practices you can use to design and scale your web application. Most of these practices are industry standard and are not confined to any specific programming language.
1. Application Partitioning
The idea is to partition the application into smaller applications based on functionality, and to group them so that each can be tuned and scaled on its own. Two design approaches to follow here are SOA (Service Oriented Architecture) and ROA (Resource Oriented Architecture). In an ideal architecture, independent components communicate with each other, and the failure of any one component does not bring down the whole system; in other words, no component is a single point of failure. Decoupling the system into groups of related functionality makes it possible to scale each group independently, and also to develop each module independently. Developers can therefore become experts in their own module without having to worry about the whole system. Modular development also helps to scale the database tier, since a separate database per module can scale further than one large database for the whole system.
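As a rough illustration of the failure isolation described above, the sketch below composes a page from two components and degrades gracefully when the optional one is down. The service names, the `ServiceUnavailable` exception and the `call_with_fallback` helper are all hypothetical; in a real SOA deployment these would be separate processes reached over the network rather than local functions.

```python
from typing import Any, Callable


class ServiceUnavailable(Exception):
    """Raised by a component that cannot serve the request."""


def catalog_service(product_id: str) -> dict:
    # Hypothetical core component: looks up a product.
    return {"id": product_id, "name": "Sample product"}


def recommendation_service(product_id: str) -> list:
    # Hypothetical optional component; here it simulates an outage.
    raise ServiceUnavailable("recommendations are down")


def call_with_fallback(service: Callable[..., Any], fallback: Any, *args: Any) -> Any:
    """Call a component and return the fallback value if it is unavailable."""
    try:
        return service(*args)
    except ServiceUnavailable:
        return fallback


def product_page(product_id: str) -> dict:
    # The page needs the catalog, but treats recommendations as optional,
    # so an outage there does not take the whole page down.
    product = catalog_service(product_id)
    related = call_with_fallback(recommendation_service, [], product_id)
    return {"product": product, "related": related}
```

Calling `product_page("p1")` still returns the product, with an empty list of related items, even though the recommendation component fails.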
2. Distributed Caching
A web application can store frequently accessed data, such as the results of a database query or of computation-intensive work, in a cache. This saves the application from going to slow databases or file systems; it retrieves the data directly from fast cache memory instead. With distributed caching, cache capacity can grow close to linearly as nodes are added to the cache cluster. Popular distributed cache technologies are Memcached, Redis and Infinispan.
Since caching is all about minimizing the amount of work a system does, it is recommended to put caching in its own tier rather than running it on the application server machines. This also helps to scale the caching tier independently of the application tier.
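The read path described above is often called cache-aside: check the cache first, and fall back to the slow source only on a miss. Here is a minimal sketch, assuming an in-memory `TTLCache` class as a stand-in for a real cache server; both `TTLCache` and `get_or_compute` are illustrative names, not part of any library.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a cache server such as Memcached or Redis."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (time.monotonic() + self.ttl_seconds, value)


def get_or_compute(cache: TTLCache, key: str, compute: Callable[[], Any]) -> Any:
    """Return the cached value, computing and storing it on a miss.

    Simplification: a stored value of None is treated as a miss.
    """
    value = cache.get(key)
    if value is None:
        value = compute()  # the slow database query or computation
        cache.set(key, value)
    return value
```

With this in place, repeated calls for the same key within the time-to-live reach the slow source only once.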
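Growing a cache tier by adding nodes works best when a new node takes over only a share of the keys. One common technique for this (an assumption here, since the article does not name one) is consistent hashing. The `HashRing` class below is a minimal sketch with hypothetical node names, not the algorithm of any particular cache product.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class HashRing:
    """Maps keys to cache nodes using consistent hashing with virtual nodes."""

    def __init__(self, nodes: Iterable[str] = (), replicas: int = 100) -> None:
        self.replicas = replicas
        self._ring: List[Tuple[int, str]] = []  # kept sorted by hash
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each node is placed on the ring several times to even out the load.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring has no nodes")
        # First ring position at or after the key's hash, wrapping around.
        index = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]
```

When a fourth node joins a three-node ring, only the keys that now belong to the new node change owner; every other key stays on the node it was on before, so most cached data remains valid.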
3. CDN
A Content Delivery Network (CDN) is a large, geographically distributed system of servers deployed across the Internet. CDNs serve content to end users with high availability and high performance, and are mostly used to deliver static content such as CSS, images, JavaScript and static HTML pages from a location near the user. The idea is to find the server that can fulfill the request in the least amount of time, taking into account the number of network hops, server availability and current request load. Popular CDNs include Akamai and Amazon CloudFront.
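The selection criteria above can be pictured as a simple ranking. The sketch below is purely illustrative: the `EdgeServer` fields, the availability threshold and the order of the tie-breakers are assumptions, and real CDNs make this decision through DNS or network routing using far richer signals.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EdgeServer:
    name: str
    network_hops: int      # hops between the user and this server
    availability: float    # fraction between 0 and 1
    active_requests: int   # current load


def pick_edge_server(servers: Iterable[EdgeServer],
                     min_availability: float = 0.99) -> EdgeServer:
    """Pick the closest, least loaded server that is available enough."""
    candidates = [s for s in servers if s.availability >= min_availability]
    if not candidates:
        raise ValueError("no edge server meets the availability threshold")
    # Fewest hops first, then fewest active requests, then highest availability.
    return min(
        candidates,
        key=lambda s: (s.network_hops, s.active_requests, -s.availability),
    )
```

A nearby server with poor availability is skipped in favor of a slightly more distant one that is healthy and lightly loaded.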
4. Group Shared Resources
Some application-specific files, such as configuration and user-uploaded content, can be stored in a shared storage cluster that all modules of the application can access. This removes the need to replicate or copy the same content into each module’s storage space, and reduces network bandwidth and CPU usage. Popular options include NAS and SAN for an on-premises storage cluster, or Amazon S3 and OpenStack Swift for a cloud implementation.
5. Asynchronous and Parallel Processing
Web applications must be designed so that processes or jobs do not depend too heavily on each other’s successful completion; otherwise the failure of one process can stall the whole application. Background jobs such as order processing, sending emails and sending notifications are good examples: they can be executed asynchronously in a multi-threaded environment, using queues and parallel worker processes. Popular message queues are RabbitMQ, ZeroMQ and Apache ActiveMQ.
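The queue-and-workers pattern can be sketched with the Python standard library alone. Here an in-process `queue.Queue` stands in for a message broker such as RabbitMQ, and `run_jobs` is a hypothetical helper; the point is that a failing job is recorded and the remaining jobs still complete.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Tuple

_STOP = object()  # sentinel telling a worker to exit


def run_jobs(jobs: Iterable[Any],
             handler: Callable[[Any], Any],
             worker_count: int = 4) -> Tuple[List[Any], List[Tuple[Any, Exception]]]:
    """Process jobs in parallel worker threads and return (results, failures)."""
    job_queue: queue.Queue = queue.Queue()
    results: List[Any] = []
    failures: List[Tuple[Any, Exception]] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            job = job_queue.get()
            try:
                if job is _STOP:
                    return
                outcome = handler(job)
                with lock:
                    results.append(outcome)
            except Exception as exc:
                # A failed job is recorded; it does not stop the other jobs.
                with lock:
                    failures.append((job, exc))
            finally:
                job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:
        job_queue.put(job)
    for _ in threads:
        job_queue.put(_STOP)  # one stop signal per worker
    job_queue.join()
    for thread in threads:
        thread.join()
    return results, failures
```

Results arrive in completion order, not submission order, so callers that care about ordering should carry an identifier inside each job.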
6. Handling Application Sessions and State
When scaling web applications, session and state should not be held in the memory of individual application servers. Either a capable load balancer must handle it (for example by always routing a user to the same server), or session and state details must be stored in a database that every server in the cluster can access. Popular load balancers are Nginx, Pound, Squid and Inlab.de’s Balance.
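To make the shared-store option concrete, the sketch below keeps all session data in one store that every application server reads and writes, so a round-robin balancer can send each request to a different server. `SessionStore`, `AppServer` and `RoundRobinBalancer` are hypothetical in-memory stand-ins for a database, a web server and a load balancer.

```python
import itertools
import uuid
from typing import Dict, Iterable


class SessionStore:
    """In-memory stand-in for a shared database or cache holding sessions."""

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def create(self) -> str:
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {}
        return session_id

    def load(self, session_id: str) -> dict:
        return self._sessions[session_id]


class AppServer:
    """Stateless application server: session data lives only in the store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> dict:
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        return {"served_by": self.name, "cart": list(session["cart"])}


class RoundRobinBalancer:
    """Hands out servers in turn, with no knowledge of sessions."""

    def __init__(self, servers: Iterable[AppServer]) -> None:
        self._cycle = itertools.cycle(list(servers))

    def route(self) -> AppServer:
        return next(self._cycle)
```

Two consecutive requests for the same session can land on different servers and still see the same cart, which is what allows servers to be added or removed freely.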
What gets measured, gets managed
— Peter Drucker