Software is not just about creating value through functionality, but about creating value that can scale cheaply and easily. Why is scaling important? Because scaling is how value is multiplied, both in the software world and outside it. A fundamental building block of sustainable business growth is not just innovative product ideas, but the ability to bring those ideas to more people. A well-architected software system will not only serve a small body of users correctly, but will also support a large body of users at only marginally higher cost. Software is easier to scale than most things because bytes are cheap to copy, and code naturally produces solutions that obedient servers can replicate over and over again.
Every software developer should know how to create scalable solutions. Working on projects that grow from small to large scale is not only highly addictive, but also sharpens good engineering discipline. Techniques like measuring, adjusting, optimizing, planning for failure, understanding your domain, and knowing the limits of your system are critical for any robust system, but they’re especially important when you’re building systems ‘at scale’.
I’ve been asked, “What is different about a system that needs to operate at scale versus one that doesn’t?” Imagine two different roads:
The Country Road
The Country Road is small, simple, and effective. It gets you from point A to point B in rural settings. What it lacks in car-carrying capacity, it makes up for in low maintenance costs and simple effectiveness. The Country Road will usually have a single lane, might twist and turn, and might have bad patches of potholes. This doesn’t tend to matter much, because when you’re out in the country you’re not often in a great hurry. You’re thinking more about the scenery than about fighting big traffic jams or getting to your destination on time. The Country Road might be likened to the small web site that only serves a few requests per minute, with one server. Just like a few potholes won’t hurt the throughput of a country road, a 500 millisecond database query won’t noticeably affect the throughput of a small web site. Sure, slow response times might annoy your users, but your web site isn’t going to fall over from it.
The Freeway
In sharp contrast to The Country Road, we have The Freeway. The Freeway is what I would call the solution to a problem at scale. It’s entirely focused on moving the greatest number of cars from point A to point B in the shortest amount of time. Where you found potholes and winding roads on The Country Road, The Freeway is nothing but wide, smooth, and straight roadways (usually, at least). At this scale, even the slightest defect in the road structure can have bad consequences for total throughput. Travel times are tracked with great precision, because knowing the bottlenecks in the system and fixing them is how efficiency is maintained. Unlike software, scaling road capacity and throughput is an expensive ordeal - adding lanes is not something you can do at the push of a button. The Freeway might be likened to the large web site that does many thousands of requests per minute or more. Just like The Freeway has many lanes, a big web site has many servers, and has other important elements such as load balancers and caching servers that all serve to keep traffic flowing smoothly. With a large web site, I/O-bound tasks such as slow database queries can cause major queuing problems, and failures can cause cascading backups in other parts of the system. Response times and requests per minute are tracked religiously, because having a constant pulse on the system is the only way you can ensure everything is running smoothly.
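To put rough numbers on why the same 500 millisecond query is harmless on The Country Road but dangerous on The Freeway, here is a minimal sketch using Little's Law (average work in flight = arrival rate × time per request). The traffic figures (5 and 6,000 requests per minute) and the 20-connection pool are assumptions chosen purely for illustration, not measurements from any real system.

```python
def average_in_flight(requests_per_minute: float, seconds_per_request: float) -> float:
    """Little's Law: average requests in flight = arrival rate * time in system."""
    arrival_rate_per_second = requests_per_minute / 60.0
    return arrival_rate_per_second * seconds_per_request


QUERY_SECONDS = 0.5  # the 500 millisecond database query from the text
POOL_SIZE = 20       # hypothetical database connection pool size

# Hypothetical traffic levels for the two kinds of site.
country_road = average_in_flight(5, QUERY_SECONDS)
freeway = average_in_flight(6000, QUERY_SECONDS)

for name, in_flight in [("Country Road", country_road), ("Freeway", freeway)]:
    status = "queues build up" if in_flight > POOL_SIZE else "plenty of headroom"
    print(f"{name}: {in_flight:.2f} queries in flight on average ({status})")
```

Under these assumed numbers the small site almost never has even one slow query in flight, while the large site needs far more concurrent connections than the pool offers, so requests start to queue.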
Why It’s Important
Even at a small scale, having The Scaling Mindset will serve you well. This doesn’t mean the technical solutions of The Freeway should be blindly applied to The Country Road - building each of these different roads means solving different car-carrying problems. Do you need line paint and precision travel-time sensors on The Country Road? Absolutely not. You’re making the system worse if you add these features when they’re not needed. I’ve seen too many engineers (including myself) get caught up in using complicated tools like distributed databases and queuing systems when the problem they’re solving would be better served with a boring relational data store and synchronous communication. Why do engineers get caught in these bear traps of over-engineering? Because using sexy tools like the big guys do feels like the right thing to do. If Facebook is using a distributed database solution, I should be too, right (especially if I want to be like Facebook)? The reality is, for most of today’s web sites, a boring relational database works just great. You must always ask yourself what tools are necessary to solve the problems at hand, and then ask what tools are necessary to solve problems that might appear tomorrow. Just like a good bridge builder knows the environmental problems that face their bridge designs, a good software engineer knows when picking something like a distributed queuing system is wise, and when it’s folly.
Unlike roadways, however, small-scale software systems sometimes become large-scale systems. Connected internet software has the potential to experience exponential user growth, sometimes quite literally overnight. One day your startup is trying to put itself on the map, and the next day you’re in the media spotlight and your CPUs are melting and your RAM is exploding and your disks are swapping. How does an engineer prepare for something like this?
Building scalable software is about constantly asking, “How would this system grow? How do I measure it? How do I know where the bottlenecks are? Can this system handle a sudden burst in traffic? Will this design collapse under pressure?” It’s about knowing the limitations of your resources, and knowing how to build a solution that meets the constantly growing demands of your users. This is The Scaling Mindset - a mental discipline that needs to be practiced with every line of code that’s written. When you consider that scalability is one of the fundamental features of good software, it’s amazing to me that so many engineers aren’t asking these questions more often. Perhaps the reason is that not enough engineers get a chance to build up their scalability muscles - too many are languishing with small, mind-numbing CRUD web applications that don’t really enhance engineering discipline. The bottom line is that the best engineers are constantly questioning the breaking points and behaviors of the systems they build, and are able to design the right systems to meet these problems head-on.
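As one small illustration of the “How do I measure it?” question, here is a rough sketch that summarizes response times with percentiles rather than an average. The sample latencies are made up for the example, and the nearest-rank percentile is just one common convention; a real monitoring tool may compute these differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up response times in milliseconds; one request hit a slow database query.
latencies_ms = [12, 15, 11, 14, 13, 500, 12, 16, 13, 14]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

The average blurs the typical request and the slow outlier together, while the median and the high percentile keep them apart, which is usually closer to what you need when hunting for bottlenecks.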
More To Know
There are many deep topics to cover around scalable software. In future blog posts, I’m going to discuss topics like vertical vs. horizontal scaling, scaling reads vs. scaling writes, and other more specific topics. In its simplest form, scalable software is about exercising engineering rigor to its fullest potential. When you design your systems to grow from small to big, you will naturally design systems that are robust, and can meet the changing problems of tomorrow.