When developers need to add more capacity to a computer system, they usually consider two ways to do so: horizontal scaling or vertical scaling. Which strategy is selected depends on the problem being solved and on which resources in the system are limited. In this post, we’ll go over both of these scaling strategies and discuss the pros and cons of each. If you’re building a software system that needs to grow, you either select a scaling strategy explicitly, or a strategy is selected implicitly. Be intentional about how your system is going to grow.
In a vertical scaling model, the process of adding more capacity means taking existing actors in a system and increasing their individual power. For example, let’s say you’re in charge of overseeing a lumber harvesting operation.
In this example, let’s assume you have 3 trucks that can carry 25 felled trees per load, and it takes 1 hour to move each load down the road to where it needs to be further processed. Given these numbers, we see that the maximum capacity of our system is:
3 trucks * 25 trees/load * 1 load/hour = 75 trees processed per hour
Assuming we’ve chosen a vertical scaling capacity model, how would we respond if we wanted to be able to process 150 felled trees per hour? We’d need to do one of two things: either double the carrying capacity of each truck (50 trees per load), or halve the time it takes each truck to move a load (30 minutes).
3 trucks * 50 trees/load * 1 load/hour = 150 trees processed per hour
OR
3 trucks * 25 trees/load * 2 loads/hour = 150 trees processed per hour
We haven’t increased the number of actors in the system, but we have increased the productivity of each actor to achieve the desired jump in capacity.
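The arithmetic above can be sketched in a few lines of Python. The function name `trees_per_hour` and its parameters are illustrative names of my own, not part of any library; the sketch simply restates the truck example as code.

```python
def trees_per_hour(trucks, trees_per_load, loads_per_hour):
    """Throughput of the lumber operation (an illustrative model only)."""
    return trucks * trees_per_load * loads_per_hour


# Baseline: 3 trucks, 25 trees per load, one load per hour.
baseline = trees_per_hour(trucks=3, trees_per_load=25, loads_per_hour=1)

# Vertical scaling, option 1: each truck carries twice as much.
bigger_trucks = trees_per_hour(trucks=3, trees_per_load=50, loads_per_hour=1)

# Vertical scaling, option 2: each truck makes the trip in half the time.
faster_trucks = trees_per_hour(trucks=3, trees_per_load=25, loads_per_hour=2)

print(baseline, bigger_trucks, faster_trucks)  # 75 150 150
```

Note that the truck count never changes; only the per-truck numbers do.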
In a horizontal scaling model, instead of increasing the capacity of each individual actor in the system, we simply add more actors to the system. In our lumber harvesting example, this means adding more trucks to move the lumber. So when we need to increase our capacity from 75 trees per hour to 150 trees per hour, we simply add 3 more trucks:
6 trucks * 25 trees/load * 1 load/hour = 150 trees processed per hour
The productivity of each actor in the system remains the same, but we’ve added more trucks to the system.
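Here is a minimal sketch of the same idea in Python: hold the per-truck numbers fixed and solve for the number of trucks. The name `trucks_needed` and its default values are assumptions taken from the example, not from any real tool.

```python
import math


def trucks_needed(target_trees_per_hour, trees_per_load=25, loads_per_hour=1):
    """How many identical trucks are needed to hit a target throughput."""
    per_truck = trees_per_load * loads_per_hour
    # Round up: a fraction of a truck cannot haul anything.
    return math.ceil(target_trees_per_hour / per_truck)


print(trucks_needed(75))   # 3
print(trucks_needed(150))  # 6
print(trucks_needed(160))  # 7
```

The last call shows a small wrinkle of horizontal scaling: capacity arrives in whole units, so a target just above 150 trees per hour still costs an entire extra truck.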
Scaling Your Web Database
With a basic understanding of horizontal and vertical scaling, let’s look at scaling a web system. There are numerous components in a website that need their scalability properties considered, but I’d like to focus on the one that usually ends up being the most critical: the database. Why is the database the most critical? Because your users’ data is usually what people care about the most. And because data is often a shared resource, the database becomes the main contact point for nearly every web request.
What Kind of System Is Yours?
The most important question you have to ask when considering the scalability of your database is, “What kind of system am I working with?” Are you working with a read-heavy or a write-heavy system? Examples of a read-heavy website might include an online shopping site, where most people spend the majority of their time browsing (reads) and only a small amount of their time purchasing (writes), or a blog, where most of the time people are consuming posts (reads) and only occasionally is someone commenting or the author posting (writes). On the flip side, good examples of a write-heavy system include a credit card transaction processor, where the main workload is journaling transactions (writes) and only occasionally looking up transactions (reads), or Google Analytics, where the majority of the workload is journaling traffic data (writes) and only occasionally showing graphs of the analytics (reads).
Knowing what kind of system you’re building will help you select the right technologies when your website has to grow.
If your website is primarily a read-heavy system, vertically scaling your datastore with a relational database such as MySQL or PostgreSQL can be a good choice. Couple your RDBMS with a robust caching strategy that uses memcached or a CDN and you’ll have a system that can scale pretty cheaply. In this model, when the database runs out of capacity, putting more pieces of data in the cache helps offset the burden of reads. When there are no more items left to cache, upgrading your database hardware with faster disks or more processors will usually buy you the necessary runway. Moore’s law makes vertical scaling with this method as simple as buying better hardware.
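To make the caching idea concrete, here is a minimal sketch of a read-through cache sitting in front of a database (a pattern often called cache-aside). Everything in it is a stand-in: a plain dict plays the role of memcached, another dict plays the role of the RDBMS, and the `CachedReader` class is a hypothetical name, not an API from any library.

```python
class CachedReader:
    """Check the cache first; only go to the database on a miss."""

    def __init__(self, database):
        self.database = database  # stand-in for MySQL or PostgreSQL
        self.cache = {}           # stand-in for memcached
        self.db_reads = 0         # counts how often the database is hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: no database work
        self.db_reads += 1                # cache miss: read from the database
        value = self.database[key]
        self.cache[key] = value           # remember it for the next reader
        return value

    def put(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)         # drop the stale cached copy


reader = CachedReader({"post:1": "Hello, world"})
for _ in range(1000):
    reader.get("post:1")
print(reader.db_reads)  # 1
```

A thousand reads cost the database a single query. Writes still go straight to the database, which is why this trick pays off mainly when reads dominate.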
If your website is primarily a write-heavy system, you’re probably going to want to think about using a horizontally scalable datastore such as Riak, Cassandra or HBase. Unlike most RDBMSes, these datastores usually grow by adding more nodes. Because your system is going to be mostly writing, caching layers will not help you as much as they do in a read-heavy system. Many write-heavy systems start out using a vertical scaling strategy, but soon run out of runway. Why? Because hard drive speeds and processor counts plateau at a certain point, and the marginal cost of adding one more core or a hard drive that does a few more I/O ops per second climbs steeply. If you instead choose a horizontally scalable strategy for your write-heavy system, you reach an inflection point where the marginal cost of adding one more node to the system becomes far cheaper than the cost of a hard drive that might eke out a few more disk seeks.
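The basic mechanism behind "grow by adding more nodes" is partitioning: each key is owned by one node, so writes spread out across the cluster. Below is a deliberately simplified sketch that uses hash-modulo placement. It is not how Riak, Cassandra or HBase actually place data (they use more sophisticated schemes and also replicate), and the `ShardedStore` class is a hypothetical name used only for illustration.

```python
import hashlib


class ShardedStore:
    """Spread keys across nodes by hashing the key (toy model, no replication)."""

    def __init__(self, node_count):
        # Each dict stands in for one independent storage node.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Use a stable hash so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def write(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def read(self, key):
        return self.nodes[self._node_for(key)].get(key)


store = ShardedStore(node_count=4)
for i in range(10_000):
    store.write(f"txn:{i}", {"amount": i})

# Each node ends up holding only a share of the writes.
print([len(node) for node in store.nodes])
```

One caveat of the modulo approach: changing `node_count` remaps most keys to different nodes, which is one reason production datastores tend to use schemes such as consistent hashing instead.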
Another thing to keep in mind is the often unforeseen costs of each scaling strategy. In a vertical scaling setup, extra costs are placed on the isolated individual components of the system. As we add more capacity to the system, the individual components become more costly to manage. From our lumber harvesting example, if we make our trucks carry twice the number of trees per load, our truck beds are going to have to get either longer, wider, or taller. Perhaps there’s a height restriction for the roadway based on bridge height, or a width restriction based on lane width, or a length restriction based on safe driver maneuverability. There’s a limit to how much vertical scaling you can do with the individual components of the truck. The same concepts apply to vertically scaling servers: more processors require more case room, which in turn requires more rack space per server.
In contrast, a system that scales horizontally places extra costs on the connected shared components of the system. As we add more capacity to the system, the shared costs associated with coordinating the actors increase. In our lumber harvesting example, as we add more trucks to the road, the road is a shared resource that becomes constrained. Can that many trucks even fit on the road at the same time? Do we have enough safe loading zones that all the trucks can be receiving lumber simultaneously? If we look at our horizontally scalable database system, the often overlooked cost on the system becomes the network that connects the servers together. As you add more nodes to the system, this shared resource often becomes increasingly taxed, usually in a non-linear fashion.
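One way to see why the shared cost can grow non-linearly is to count the possible node-to-node links. The sketch below assumes the worst case, where every node may need to talk to every other node (a full mesh); real clusters often avoid this, so treat the numbers as an upper bound rather than a measurement. The function name `full_mesh_links` is my own.

```python
def full_mesh_links(nodes):
    """Number of distinct node pairs when every node can talk to every other."""
    return nodes * (nodes - 1) // 2


for n in (3, 6, 12, 24):
    print(n, full_mesh_links(n))
# 3 3
# 6 15
# 12 66
# 24 276
```

Doubling the node count from 3 to 6 doubles the capacity in our truck model, but it multiplies the number of potential connections by five.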
Fitting It Together
Like most things in computers, good solutions are not usually as simple as what I’ve outlined here. I’ve attempted to simplify the ideas in order to speak to the concepts, rather than any specific tactics. Scaling is a hard problem that needs pragmatic thought at every step of the process. There is no magic scaling tactic or magic software that will build a reliably scalable system for you. Like many other problems of scale, the larger solution is usually made of hundreds of tiny solutions all working together in unison. Getting each of them right takes careful design at every step of development.