Step Away From The Spinning Media
Wednesday, March 06, 2013 posted by Dave Wright
Requirement #1 for guaranteed Quality of Service (QoS): An All-SSD Architecture
Anyone deploying a large public or private cloud infrastructure
faces the same issue: how to deal with inconsistent and
unpredictable application performance. As we discussed earlier,
overcoming this problem requires an architecture built from the
ground up to guarantee Quality of Service (QoS) for many
simultaneous applications.
The first requirement for achieving this level of performance is
moving from spinning media to an all-SSD architecture. Only an
all-SSD architecture allows you to deliver consistent latency for
every IO.
At first, this idea might seem like overkill. If you don't
actually need the performance of SSD storage, why can't you
guarantee performance using spinning disk? Or even a hybrid disk
and SSD approach?
Fundamentally, it comes down to simple physics. A spinning disk
can only serve a single IO at a time, and any seek between IOs adds
significant latency. In cloud environments where multiple
applications or virtual machines share disks, the unpredictable
queue of IO to the single head can easily result in an order of
magnitude or more of variance in latency, from 5 ms with no
contention to 50 ms or more on a busy disk.
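The arithmetic behind that range can be sketched with a very simple queue model. This is only an illustration, not a measurement: the function name `disk_latency_ms` is hypothetical, the 5 ms service time is borrowed from the no-contention figure above, and real disks add variable seek and rotational delays on top.

```python
def disk_latency_ms(queued_ahead: int, service_ms: float = 5.0) -> float:
    """Latency for an IO that arrives behind `queued_ahead` other IOs on a
    device that serves exactly one IO at a time (simple FIFO assumption)."""
    if queued_ahead < 0:
        raise ValueError("queued_ahead must be non-negative")
    # The new IO waits for everything ahead of it, then needs its own
    # service time. The 5 ms default is an assumed per-IO cost.
    return (queued_ahead + 1) * service_ms


# Latency grows linearly with the number of IOs already waiting on the head.
for queued_ahead in (0, 1, 4, 9):
    print(f"{queued_ahead:2d} IOs ahead: {disk_latency_ms(queued_ahead):5.1f} ms")
```

Under these assumptions, nine IOs queued ahead of yours is enough to turn a 5 ms request into a 50 ms one, and nothing about your own application has to change for that to happen.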
The solutions are part of the problem
Modern storage systems attempt to overcome this fundamental
physical bottleneck in a number of ways including caching (in DRAM
and flash), tiering, and wide striping.
Caching is the easiest way to reduce contention for a spinning
disk. The hottest data is kept in large DRAM or flash-based caches,
which can offload a significant amount of IO from the disks.
Indeed, this is why large DRAM caches are standard on every modern
disk-based storage system. But while caching can certainly increase
the overall throughput of the spinning disk system, it causes
highly variable latency.
Data in DRAM or flash cache can be served in under 1 ms, while
cache misses served from disk will take 10-100 ms. That's a gap of
up to three orders of magnitude for an individual IO. Clearly the overall
performance for an individual application is going to be strongly
influenced by how cache-friendly it is, how large the cache is, and
how many other applications are sharing it. In a dynamic cloud
environment, that last criterion is changing constantly. All told,
it's impossible to predict, much less guarantee, the performance of
any individual application in a system based on caching.
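To see how sensitive a cached system is to hit ratio, here is a minimal sketch of the expected-latency arithmetic. The function name `mean_latency_ms` and the specific 0.5 ms hit and 20 ms miss values are assumptions chosen from within the ranges quoted above, not figures from any particular array.

```python
def mean_latency_ms(hit_ratio: float,
                    hit_ms: float = 0.5,
                    miss_ms: float = 20.0) -> float:
    """Expected per-IO latency for a cache in front of spinning disk.

    hit_ms and miss_ms are assumed values (under 1 ms for a cache hit,
    10-100 ms for a miss served from disk)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Weighted average of the fast path (hit) and the slow path (miss).
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# The same application sees very different latency as its hit ratio moves.
for hit_ratio in (0.99, 0.95, 0.80, 0.50):
    print(f"hit ratio {hit_ratio:.2f}: mean {mean_latency_ms(hit_ratio):.3f} ms")
```

With these assumed numbers, a drop in hit ratio from 95% to 80% roughly triples mean latency, and that kind of drop is exactly what can happen when a new tenant starts competing for the same cache.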
Tiering is another approach to overcoming the physical limits of
spinning disk, but it suffers from many of the same problems as
caching. Tiered systems move "hot" and "cold" data between
different storage tiers in an attempt to give popular applications
more performance. But as we've discussed before, the result is just
as unpredictable as caching.
Wide striping data for a volume across many spinning disks doesn't
solve the problem either. While this approach can help balance IO
load across the system, many more applications are now sharing each
individual disk. A backlog at any disk can cause a performance
issue, and a single noisy neighbor can ruin the party for
everyone.
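A back-of-the-envelope way to see the wide-striping problem: if each disk is independently backlogged some fraction of the time, the chance that a striped volume touches at least one backlogged disk climbs quickly with stripe width. The function name `p_any_disk_backlogged`, the independence assumption, and the 5% figure are all illustrative assumptions, not measurements.

```python
def p_any_disk_backlogged(num_disks: int, p_busy: float) -> float:
    """Probability that at least one of `num_disks` disks is backlogged,
    assuming each disk is backlogged independently with probability
    `p_busy` (independence is a simplifying assumption)."""
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if not 0.0 <= p_busy <= 1.0:
        raise ValueError("p_busy must be between 0 and 1")
    # Complement of "every disk in the stripe is idle enough".
    return 1.0 - (1.0 - p_busy) ** num_disks


# Wider stripes expose a volume to more potential noisy neighbors.
for num_disks in (1, 5, 20, 50):
    chance = p_any_disk_backlogged(num_disks, p_busy=0.05)
    print(f"{num_disks:2d} disks: {chance:.1%} chance of hitting a backlog")
```

Under these assumptions, a volume striped across 20 disks that are each backlogged 5% of the time touches at least one backlogged disk roughly 64% of the time.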
All-SSD is the only way to go
All-SSD architectures have significant advantages when it comes to
being able to guarantee QoS. The lack of a moving head means
latency is consistent no matter how many applications demand IOs,
regardless of whether the IOs are sequential or random. Compared to
the single-IO bottleneck of disk, SSDs have 8 to 16 channels to
serve IOs in parallel, and each IO is completed quickly. So even at
a high queue depth, the variance in latency for an individual IO is
low. All-SSD architectures often do away with DRAM caching
altogether. Modern host operating systems and databases do
extensive DRAM caching already, and the low latency of flash means
that hitting the SSD is often nearly as fast as serving from a
storage-system DRAM cache anyway. The net result in a well-designed
system is consistent latency for every IO, a strong requirement for
delivering guaranteed performance.
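The effect of channel parallelism on latency at high queue depth can be sketched with a simple model. Everything here is an assumption for illustration: the function name `ssd_latency_ms`, the 0.1 ms per-IO service time, and the idea that queued IOs spread evenly across channels. Real SSDs also have garbage collection and write amplification effects that this ignores.

```python
import math


def ssd_latency_ms(queued_ahead: int,
                   channels: int = 8,
                   service_ms: float = 0.1) -> float:
    """Latency for an IO arriving behind `queued_ahead` IOs on a device
    that can serve `channels` IOs in parallel.

    Assumes IOs spread evenly across channels and that each takes an
    assumed fixed `service_ms` to complete."""
    if queued_ahead < 0 or channels < 1:
        raise ValueError("queued_ahead must be >= 0 and channels >= 1")
    # IOs complete in "rounds" of `channels` at a time; the new IO
    # finishes at the end of the round it lands in.
    rounds = math.ceil((queued_ahead + 1) / channels)
    return rounds * service_ms


# Even at a queue depth of 32, latency stays well under a millisecond.
for queued_ahead in (0, 7, 15, 31):
    print(f"{queued_ahead:2d} IOs ahead: {ssd_latency_ms(queued_ahead):.1f} ms")
```

Setting `channels=1` and `service_ms=5.0` turns the same function into the single-head disk case, where 31 IOs ahead would mean 160 ms instead of 0.4 ms under these assumptions.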
An all-SSD architecture is just the starting point for guaranteed
QoS, however. Even a fast flash storage system can have noisy
neighbors, degraded performance from failures, or unbalanced
performance. Stay tuned to this blog as we discuss the five other
architecture requirements critical to guaranteed QoS, and
join us on our upcoming webinar with WHIR to learn more:
Unlocking the Secret to QoS in the Cloud: The 6 Requirements of Your Storage Architecture
Web Host Industry Review Webinar with SolidFire
Tuesday, April 2, 2:00pm EST
Register now
-Dave Wright, Founder & CEO

