2.6. Hardware Platforms
For large-scale web applications, software is an important piece of the puzzle, but not the whole of it. Hardware can be as significant as software, in the design as well as the implementation stages. The general architecture of a large application needs to be designed in terms of both the software components and the hardware platform they run on. The hardware platform, at least initially, tends to form a large portion of the overall cost of deploying a web application. The cost of software development, consisting of ongoing developer payroll, is usually bigger in the end, but hardware costs come early and all at once. It's therefore important to design your hardware platform carefully, so that the initial cost is low and the path for expansion is clearly defined.
Donald Knuth said it best, in a quote that we'll be revisiting periodically:
"Premature optimization is the root of all evil."
This applies directly to software development, but it also works well as a rule for hardware platform design and for the software process in general. By starting small and general, we can avoid wasting time on work that will ultimately be thrown away.
Out of this principle come a few good rules of thumb for the initial design of your hardware platform:
2.6.1. Shared Hardware
After an application goes past the point of residing solely on your local machine, the next logical step is to use shared hardware. Shared hardware is usually leased from a large provider, such as an ISP or hosting service, where a box is shared with many other users. This kind of platform is great for prototyping, development, and even small-scale launches. Shared hosting is typically very cheap, and you usually get what you pay for. If your application uses a database, then your performance is at the mercy of the other database users. Custom web server configuration and custom modules are not usually possible. Larger providers offer upgrade paths to move onto dedicated hardware when the time comes, making the transition to a full setup easier.
2.6.2. Dedicated Hardware
The next step up from shared hardware is dedicated hardware. The phrase "dedicated hardware" tends to be a little misleading: although the hardware is dedicated to running your application, you're still renting it from a provider who owns and maintains it. With dedicated hardware, your contact with the machines still goes only as far as logging in remotely over SSH; you don't need to swap out disks, rack machines, and so on. Dedicated hosting covers the full range from completely managed (you receive a user login and the host takes care of everything else) to completely unmanaged (you get a remote console and install an OS yourself).
Depending on the scale you want to grow to, a dedicated hardware platform is sometimes the most cost-effective option. You don't need to have system administrators on your engineering team and you won't spend developer time on configuration tasks. However, the effectiveness of this setup relies heavily on the working relationship between you and the host's network operations center (NOC) and staff. The level of service that hosts provide varies wildly, so it's definitely worth getting references from people you know who are doing similar things.
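To make the cost trade-off concrete, here is a minimal sketch in Python that compares the monthly cost of renting dedicated servers against owning servers that need their own administrator. Every figure and name in it is a hypothetical placeholder chosen for illustration, not a real market price; substitute quotes from your own vendors before drawing any conclusions.

```python
import math

# All figures below are hypothetical placeholders, not real market prices.
RENT_PER_SERVER = 300.0       # dedicated: monthly rental fee per server
PURCHASE_PRICE = 3600.0       # owned: up-front purchase price per server
LIFETIME_MONTHS = 36          # owned: months over which hardware is written off
FACILITY_FEE_PER_SERVER = 50.0  # owned: monthly space, power, and bandwidth
SYSADMIN_MONTHLY = 8000.0     # owned: monthly cost of one system administrator
SERVERS_PER_SYSADMIN = 100    # owned: servers one administrator can look after


def dedicated_monthly(servers):
    """Monthly cost when every server is rented from a dedicated host."""
    return servers * RENT_PER_SERVER


def owned_monthly(servers):
    """Monthly cost when you buy the servers and staff them yourself."""
    hardware = servers * PURCHASE_PRICE / LIFETIME_MONTHS
    facility = servers * FACILITY_FEE_PER_SERVER
    # Administrators are hired whole: one server still needs one person.
    admins = math.ceil(servers / SERVERS_PER_SYSADMIN)
    return hardware + facility + admins * SYSADMIN_MONTHLY


def cheaper_option(servers):
    """Return which model costs less per month for this many servers."""
    if dedicated_monthly(servers) <= owned_monthly(servers):
        return "dedicated"
    return "owned"


if __name__ == "__main__":
    for n in (10, 50, 80, 200):
        print(n, dedicated_monthly(n), owned_monthly(n), cheaper_option(n))
```

Under these assumed numbers, renting wins for small fleets because the administrator's salary dominates, and owning only starts to pay off once that fixed cost is spread over enough machines. The model ignores service quality, which the text above identifies as the deciding factor in practice.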
2.6.3. Co-Located Hardware
A dedicated hosting vendor will not be enough in the long term if you intend to create a really large application. The world's largest web applications require hundreds of thousands of servers, although you're probably not going to reach that scale. Beyond the dedicated server model, you have two options: co-location and running your own data centers. Small companies and startups usually opt to start with co-location. A co-location facility (or "colo") provides space, power, and bandwidth, while you provide the hardware and support.
The services provided by a colo can vary quite a bit. Some will do virtually nothing, while some provide server and service monitoring and will diagnose server issues with you over the phone. All facilities provide network monitoring and basic services such as rebooting a crashed server, although depending on your contract, such services might incur per-incident costs.
Choosing a colo is a big task and should not be taken lightly. While changing colos is certainly possible, it's a big pain that you'll almost certainly want to avoid. If you get stuck in a bad colo further down the line, the effort and cost involved in moving can be enough to dissuade you from ever moving again (a fact that some colos appear to bank on). As with hosting vendors, gather the opinions of other people who host their platforms at the colos you're interested in. In particular, make sure you talk to developers of applications at the same scale as your proposed application. Some colos specialize in small platforms and provide bad support for larger platforms, while some will only provide good service to large customers.
When you get to the point of having a few thousand servers, it's usually beneficial to start running your own data centers (DCs). This is a huge task, which usually involves designing purpose-built facilities, hiring 24-hour NOC and site operations staff, and having multiple redundant power grid connections, a massive uninterruptible power supply (UPS), power filtering and generation equipment, fire suppression, and multiple peering contracts with backbone carriers.
It can be tempting to self-host hardware on a small scale; getting a leased line into your offices and running servers from there seems simple enough. This is usually a bad idea: you tend to end up spending more money and having more problems than you would with other solutions. If you don't have a colo near you, consider hosting in a managed environment, or hiring a systems administrator who lives near a colo. Self-hosting can work well up to a point, but then bandwidth gets too expensive (upstream bandwidth to a private location typically costs much more than downstream) or you suffer an outage. Being down for a few days because someone cut through a phone cable is annoying when it's your home connection, but crippling when it's your whole business.
Helping you to create your own DC is definitely outside the scope of this book, but hopefully one day your application will grow to the scale where it becomes a viable option.