2.7. Hardware Platform Growth

As if the problems involved with designing and implementing a hardware platform for a large application weren't already enough, growing your implemented platform from one scale to another brings a whole new set of issues to the table. The hardware platform for a large-scale application usually looks significantly different from its small-scale siblings. If a small application or prototype supporting 100,000 users is based on a single dedicated server, then assuming a linear scaling model (which is not often the case), the same application with a 10 million-person user base will require 100 machines. Managing a server platform with 100 boxes requires far more planning than managing a single box, and it brings additional requirements of its own.
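The linear estimate above can be written down as a quick back-of-the-envelope calculation. The following is a minimal sketch, not a capacity-planning tool: the function name `servers_needed` and the `efficiency` parameter are assumptions introduced here for illustration, with `efficiency` standing in for the fact that real platforms rarely scale linearly.

```python
import math


def servers_needed(users: int, users_per_server: int, efficiency: float = 1.0) -> int:
    """Estimate how many servers a user base requires.

    efficiency is a hypothetical fudge factor: 1.0 means perfectly linear
    scaling, while lower values mean each added server carries
    proportionally fewer users than the first one did.
    """
    if users < 0 or users_per_server <= 0:
        raise ValueError("users must be >= 0 and users_per_server must be > 0")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    # Round up: a fraction of a server still means buying a whole box.
    return math.ceil(users / (users_per_server * efficiency))


# The figures from the text: one server per 100,000 users, 10 million users.
linear_estimate = servers_needed(10_000_000, 100_000)

# The same user base if each server only achieves half its single-box capacity.
pessimistic_estimate = servers_needed(10_000_000, 100_000, efficiency=0.5)

print(linear_estimate, pessimistic_estimate)
```

With the numbers from the text, the linear estimate comes out at 100 machines, and halving the assumed efficiency doubles it to 200, which is why it pays to measure how your application actually scales before ordering.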

The responsibility for specifying and purchasing hardware, especially on a small team, often falls within the domain of the architectural or engineering lead. In larger operations, these tasks fall into the realm of an operations manager, but having a dedicated coordinator for smaller platforms would be overkill.

When making the jump from a pure engineering role to hardware platform management responsibilities, there are a number of factors to consider that aren't immediately obvious. We'll look at a few of these in turn and try to cover the main pain points involved with organizing an initial build-out.

2.7.1. Availability and Lead Times

When choosing a hardware vendor, in addition to taking into account the specification and cost of the hardware you order, it's important to find out how difficult ordering more of the same will be. If you're planning to rely on a single type of hardware for some task (for instance, a specific RAID controller), then it's important to find out up front how easy it's going to be to order more of them. This includes finding out if your supplier keeps a stock of the component or, if not, how long the lead time from the manufacturer is. You don't want to get caught in a situation where you have to wait three months for parts to be manufactured, delaying your ability to build out.

Depending on the vendors you choose, reliability and discontinued products can also be an issue. For core components that require a very specific piece of hardware, it can be worth contacting the manufacturer to find out what its plans are for the product line. If it's going to be discontinued, it's very useful to know that before you make a desperate order for 50 more. In this regard, it's also worth finding peers using the same vendors and learning about their experiences working with them. Nothing beats real-world experience.

2.7.2. Importing, Shipping, and Staging

If you're hosting outside the United States, then much of your hardware may need to be imported. Importing hardware is time-consuming and costly. Import tax on hardware can add significant overhead to purchases and it needs to be budgeted before ordering. Getting hardware through customs can be a big issue, often adding several days to the shipping process. There's no good rule of thumb for how much time to allow when moving hardware across borders, but a good working knowledge of the customs procedures in your country will save you a lot of time in the long run.

When you're buying from local resellers, bear in mind that they may be importing the hardware they sell to you and adjust your lead times accordingly. It's useful to establish a good relationship with your vendors to find out about their stock levels, suppliers, and ordering schedules. Many vendors can deliver hardware the next day when it's in stock, but delivery can stretch to a month when it isn't.

A good co-location facility will allow you to ship hardware directly to the DC and have it held in a staging area before being deployed. Having hardware delivered to your home or office and then transporting it to its final destination quickly becomes a waste of time and effort. As your hardware platform grows, this process will become more and more inconvenient and at a certain point unmanageable. The space needed to stage a large hardware platform and the power needed to boot the boxes can easily exceed what a small office can provide. It's worth taking the availability of staging space into account when choosing a provider.

2.7.3. Space

If your initial rollout is within a small hosting facility, physical space can become a serious issue. If your platform starts out as an octal rack space, you need to know whether you'll be able to expand to a half rack or full rack when you need to. After that point, can you get 2 racks? Can you get 30? These are questions that are crucial to ask up front before committing yourself to a provider. If you'll be able to expand, will your racks be contiguous? If not, will the facility provide cabling between the noncontiguous racks? What kind of rack mountings are provided (Telco, two-post, four-post) and will your hardware fit? It's important to check both the rack mounting and the rack depth; some racks won't fit especially long servers.

There are few things more difficult, time-consuming, and stressful than moving data centers, especially when you have a growing application and are running out of capacity. Data center moves typically require a lot of extra hardware to keep portions of your application running in both DCs while you move. This is obviously quite expensive, so anything you can do to avoid it is worth looking into.

2.7.4. Power

In conjunction with space, it's important to make sure that you will be provided with enough UPS-backed power for your expected platform growth. Server power draw is usually quoted in amperes, and a rack typically includes 15 amps (if any). A full rack of 40 1U servers can easily use 100 amps (even more if you have a lot of fast-spinning disks), so you'll need to check into the cost of having extra power installed in your racks, the ongoing costs for extra power, and how much spare capacity the DC has.
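To see how quickly the included allowance runs out, the figures above can be turned into a rough power budget. This is only a sketch under stated assumptions: the 2.5 amps per server is simply the text's 100 amps divided by 40 servers, real draw varies with hardware and load, and the function name `rack_power_budget` and its result keys are made up for this illustration.

```python
def rack_power_budget(servers: int, amps_per_server: float, included_amps: float) -> dict:
    """Compare a rack's expected draw against the power included with it.

    All inputs are planning estimates, not measurements. Measure real
    draw under load before committing to a figure with your provider.
    """
    if servers < 0 or amps_per_server <= 0 or included_amps < 0:
        raise ValueError("servers and included_amps must be >= 0, amps_per_server > 0")
    required = servers * amps_per_server
    return {
        # Total draw if every server in the rack is running.
        "required_amps": required,
        # Power you would have to buy on top of the included allowance.
        "extra_amps": max(0.0, required - included_amps),
        # How many servers the included allowance alone can run.
        "servers_within_included": int(included_amps // amps_per_server),
    }


# Figures from the text: 40 1U servers drawing about 100 amps in total,
# in a rack that ships with 15 amps. 2.5 A/server is an assumed average.
budget = rack_power_budget(servers=40, amps_per_server=2.5, included_amps=15)
print(budget)
```

Under these assumptions the included 15 amps covers only 6 of the 40 servers, and filling the rack means paying for roughly 85 additional amps, so the price of extra power can matter as much as the price of the rack itself.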

You can buy all the servers you want but when a data center runs out of power capacity, you won't be able to use them. Include power in both your general budgeting and your capacity planning for ongoing growth.

2.7.5. NOC Facilities

The services offered by different providers vary wildly, with many facilities offering a range of service levels at different price points. At the lowest level, the NOC will monitor its own infrastructure and alert you if something that affects your operations occurs. This can include general network outages and routing problems, in addition to specific problems with your installation, such as attempted denial of service attacks.

Basic non-network-related services, such as rebooting boxes, are nearly always available. A more involved service, such as diagnosing faulty hardware and then swapping dead disks, is not always available. Where such a service is provided, its cost may be included in your general hosting fees, or it may be offered as an extra and charged on a per-incident basis. Some hosts will provide a certain number of free incidents a month, charging when you go over that limit. Depending on how close you or your engineering team is to the facility, this can be less of an issue than other factors. It is still, however, worth bearing in mind when you choose a facility.

2.7.6. Connectivity

Top-tier facilities have multiple peering connections to high-speed backbone Internet links, but the connections that facilities offer vary widely. The default connection into most rack space will be a 100BASE-T Ethernet cable. Find out if your facility can easily bump you up to 1000BASE-T when you need it, or to 1000BASE-SX if you choose to use fiber.

If your outer network hardware supports it, getting a second backup line into your racks can be very useful to deal with failures at the DC-routing level. Extra redundant lines for administration, outside of your production connectivity, can also be desirable. Again, this sort of service varies by provider, so a little research is a good idea.