Built To Scale...Easier Said Than Done
Monday, May 21, 2012 posted by Dave Wright
Scalability is often marketed as a feature of a storage system.
But scale is not a checkbox feature, nor is it a single number like
capacity. Scale is a set of constraints that operate across every
metric and feature of a system. Within large cloud environments all
parts of the infrastructure are expected to operate against this
backdrop of scale. In two recent posts we touched briefly on the magnitude of the
challenges presented by scale and why EMC spent
$430 million to acquire scale. However, because scale is a critical
consideration in any cloud infrastructure build-out, we wanted to
discuss more deeply how we solve its challenges.
As it relates to storage, two of the most critical dimensions of
scale in a cloud environment are performance and capacity. Using
traditional storage systems, optimizing for either one of these
resources almost always comes at the expense of the other. The best
visual depiction of this dilemma can be seen in this graphic.
Flash-based designs today are
IOPS rich but lack the capacity, high-availability and/or shared
characteristics required to scale to the broader demands of a
large-scale cloud environment. Meanwhile, hard disk-based systems have
plenty of capacity but lack the IOPS needed to service the
full capacity footprint adequately. Unfortunately, a storage
infrastructure containing lots of underutilized disk is
unsustainable from both a cost and management perspective.
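The tradeoff can be made concrete with a little arithmetic. The sketch below estimates how much of a drive's capacity can be filled before its IOPS run out for a given workload density. The function name and every number in it are illustrative assumptions, not measurements of any real product.

```python
def usable_fraction(drive_capacity_gb, drive_iops, required_iops_per_gb):
    """Fraction of a drive's capacity that can be filled before IOPS run out.

    All inputs are hypothetical; the model ignores caching, RAID overhead
    and anything else a real system would add.
    """
    # Capacity the drive's IOPS budget can actually serve at this density.
    supported_gb = drive_iops / required_iops_per_gb
    # A drive can never be more than 100% full.
    return min(1.0, supported_gb / drive_capacity_gb)


# Illustrative workload: 0.5 IOPS needed for every GB stored.
hdd = usable_fraction(drive_capacity_gb=2000, drive_iops=150, required_iops_per_gb=0.5)
ssd = usable_fraction(drive_capacity_gb=300, drive_iops=20_000, required_iops_per_gb=0.5)

print(f"disk: {hdd:.0%} of capacity usable before IOPS are exhausted")
print(f"flash: {ssd:.0%} of capacity usable (capacity runs out first)")
```

Under these assumed numbers the disk is IOPS-bound long before it is full, while the flash device is capacity-bound with IOPS to spare. That is the imbalance described above.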
Properly architecting for scale in a multi-tenant cloud
environment requires a system design that is able to manage the
mixed workload profile inherent to this environment. Unlike an
on-premise architecture that has a more controlled binding between
application and storage, the economics of cloud are predicated on a
shared infrastructure across many applications. Rather than
optimizing the underlying storage for a single application, a cloud
infrastructure must be able to accommodate the unique
performance and capacity requirements of thousands of applications.
Modern hypervisors provide this level of flexibility for compute
resources today. It is about time storage caught up.
So what are the defining characteristics of a storage system
designed to operate under the constraints of scale? Here are some
of the design objectives we have based our system around:
- Performance and capacity balance- Rather than force a suboptimal tradeoff at the system level (i.e., performance or capacity), we designed an architecture with a balanced blend of both. Armed with our performance virtualization technology, service providers can carve up this system to serve the unique needs of many different applications across a wide mix of performance and capacity requirements. This more granular level of provisioning allocates storage resources far more efficiently than traditional system-centric alternatives, which force a capacity-or-performance decision up front on every application.
- Incremental growth- The recurring nature of the service provider business model necessitates an incremental approach to scale. Each node added to the SolidFire cluster adds equal parts performance and capacity to the global pool. With a balanced and linearly scalable resource pool at its disposal, a cluster can more easily span environments both small and large. Traditional controller-based architectures require a large up-front investment in redundant controllers, and while adding more disk shelves can increase capacity, in many architectures the performance benefit is limited or a complex reconfiguration is required.
- Dynamic change- Capacity and performance allocations within the cluster need to be dynamic and non-disruptive to account for the only two constants in the cloud: growth and change. This requirement applies at both the node and volume level. Node additions to a SolidFire cluster are done non-disruptively, with data rebalanced across the newly added footprint. Performance QoS settings for individual volumes can be adjusted dynamically in real time through the SolidFire REST-based APIs.
- Single management domain- As a storage environment scales, it is critically important that the management burden does not do the same. The clustered nature of the SolidFire architecture ensures a single management domain as the cluster grows. Alternative architectures often require additional points of management for each new storage system. Even worse, scale limitations often prevent vendors from addressing such a broad range of capacity and performance requirements from within the same product family. The complexity resulting from multiple points of management across multiple product families can have crippling effects at scale. Multiple clusters can be set up in different fault domains or availability zones as required, but the key decision about how much scale to place in each domain is made by the customer, not by the storage system.
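To make the objectives above more concrete, here is a minimal toy model, not the SolidFire API. It assumes every node contributes a fixed amount of capacity and IOPS to one global pool, and that each volume draws a size and a performance allocation from that pool independently. The class names, the `min_iops` field and all numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    size_gb: int
    min_iops: int  # hypothetical per-volume QoS allocation


@dataclass
class Cluster:
    node_capacity_gb: int  # capacity each node adds (assumed constant)
    node_iops: int         # performance each node adds (assumed constant)
    nodes: int = 0
    volumes: dict = field(default_factory=dict)

    def add_node(self):
        # Incremental growth: one more node grows both pools linearly.
        self.nodes += 1

    @property
    def total_capacity_gb(self):
        return self.nodes * self.node_capacity_gb

    @property
    def total_iops(self):
        return self.nodes * self.node_iops

    def _fits(self, volumes):
        used_gb = sum(v.size_gb for v in volumes.values())
        used_iops = sum(v.min_iops for v in volumes.values())
        return used_gb <= self.total_capacity_gb and used_iops <= self.total_iops

    def provision(self, name, size_gb, min_iops):
        # Capacity and performance are requested independently per volume.
        candidate = {**self.volumes, name: Volume(size_gb, min_iops)}
        if not self._fits(candidate):
            raise ValueError("pool exhausted: add a node first")
        self.volumes = candidate

    def set_qos(self, name, min_iops):
        # Dynamic change: adjust performance without touching capacity.
        current = self.volumes[name]
        self.provision(name, current.size_gb, min_iops)


cluster = Cluster(node_capacity_gb=1000, node_iops=50_000)
for _ in range(4):
    cluster.add_node()

# A small, fast volume and a large, slow one share the same pool.
cluster.provision("database", size_gb=500, min_iops=100_000)
cluster.provision("archive", size_gb=3000, min_iops=1_000)
cluster.set_qos("database", min_iops=150_000)
```

Everything lives in one `Cluster` object, which mirrors the single management domain: growing the pool is an `add_node()` call rather than a new system to manage. A real cluster would also rebalance data and enforce QoS at runtime, which this sketch leaves out.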
The scale challenges of cloud environments mandated different design choices for us at SolidFire compared to solutions intended for more traditional enterprise use cases. Delivering such a balanced pool of performance and capacity with a single management domain is unique in the storage industry today. Layering our performance virtualization technology into the architecture allows service providers to flexibly host a much broader range of application requirements from start to scale. Consequently, I would urge anyone building a scale-out cloud infrastructure to at least consider the above criteria as a starting point for any discussion around scale.
-Dave Wright, Founder & CEO

