Monday, May 14, 2012
posted by Dave Wright
SolidFire's unique approach to scale-out all-SSD storage for
cloud environments involves different engineering challenges than
those confronted by traditional storage systems. Rather than
focusing on ASICs, buses, and RAID firmware, SolidFire is solving
difficult distributed systems problems dealing with scale, latency,
reliability, and quality of service. The magnitude of this
challenge requires us to continually add new talent with experience
in this area. We've added more than a dozen great people to the
team so far this year. One recent hire I'd like to highlight is our
new Vice President of Engineering, Dan Berg.
In addition to his skills as a leader and manager for our
engineering team, Dan has a long history of building the type of
complex distributed systems that SolidFire delivers. After a
15-year career at Sun Microsystems, which he concluded as VP of
Systems Engineering and Distinguished Engineer, Dan served as CTO of
Skype in Europe. At Skype, Dan helped grow the engineering team and
significantly broaden their product offerings while increasing
platform scale and stability. Following his return to Colorado from
Europe, Dan most recently ran R&D for Avaya in the US.
While a P2P VOIP platform like Skype may seem very different from
a primary storage system, it represents exactly the type of
scale-out, fault-tolerant distributed system at the core of true
cloud architectures like SolidFire. Cloud computing is changing not
just how IT is deployed, but fundamentally how the underlying
infrastructure is built.
I'm pleased to welcome Dan Berg as well as all our other recent
hires to the team. If you're excited about the work SolidFire is
doing to advance the way the world is using the cloud, I'd
encourage you to bookmark our Careers Page and check it regularly!
-Dave Wright, Founder & CEO
Thursday, May 10, 2012
posted by Dave Wright
EMC has made a big play with its announced acquisition of
XtremIO for a reported $430 million. In acquiring the all-flash
scale-out storage system vendor, EMC has made another
aggressive bet in an emerging growth market. When a growth
opportunity justifies making a bet, EMC is best in class at getting
it done. But to assume EMC spent $430 million to simply double down
on its investment in flash is shortsighted.
This deal is not just about flash. This deal is about scale. I
suspect EMC's early entry into the flash market was an invaluable
learning experience for understanding the opportunities and
challenges posed by flash. Somewhere along the way they realized
that building flash into an architecture is one thing, but building
a true scale-out flash system is a whole different challenge. This
is not a challenge to be solved with traditional storage controller
technologies that were designed in the hard disk era.
Scale imposes an entirely different set of constraints on a system
and its underlying media. Delivering consistent performance at
scale, delivering efficiency and data reduction at scale,
automating management at scale...each of these challenges is hard
enough on its own. Solving them with a completely different medium
at the base of the design requires an architectural rethink.
The timing is interesting here. As it pertains to the flash
market, acquisitions at this stage of the game come much earlier
than the storage industry traditionally likes to place its chips.
However, the urgency with which EMC chose to strike is indicative
of the market demand for more than just bolt-on solutions backed by
go-to-market heft.
If this deal were just about flash, EMC had a number of different
options at their disposal, including staying the course with its
evolving portfolio of flash solutions while the market matured.
However, the transformative nature of flash necessitated a
different approach. Realizing these challenges, EMC made a rich bet,
but one that will eventually seem small compared to the
opportunities created by scale-out flash storage.
-Dave Wright, Founder & CEO
Tuesday, May 01, 2012
posted by Jay Prassl
Last week our Founder and CEO Dave Wright attended Tech
Field Day's Solid State Storage Symposium (SSSS) in San Jose.
At the event he joined a number of other companies from across the
flash storage ecosystem for a day full of lively discussions on the
best use cases, implementation types and future directions
for flash technology.
Dave kicked off the day with a presentation on SolidFire's
vision for the future of flash storage that really set the tone for
the event. My one-line takeaway from his presentation was
this: "Sure, flash is fast, but what good is all that performance
without control?" In his talk he expands the argument to include
efficiency and scale. The net of all this is that flash is a means
to an end, but without complementary innovations across quality of
service, efficiency and automation, the end market is never going to
be as big as some industry analysts are predicting.
At SolidFire we believe that our technology and approach to the
market is fundamentally advancing the way the world uses the cloud.
In his SSSS presentation I think you will find that Dave paints a
clear and compelling picture for where flash is headed and what
companies like SolidFire are doing to bring this vision to life.
You can find the full presentation from the event on slideshare along with the video posted on Vimeo.
I would also encourage you to check out the panel sessions from
the event. You will surely find some useful insights across
a number of key trends that are shaping the future of solid
state.
Hats off to Stephen Foskett and the fantastic moderators that he
brought on board for the day. The content and discussion on
these panels are much richer than what you would find at a run of
the mill tradeshow.
- Jay Prassl - VP of Marketing
Monday, April 09, 2012
posted by Dave Cahill
OpenStack matters because choice matters. In order for markets,
and innovation within these markets, to thrive, consumers must have
platform choices. Multiple platform options help to accommodate the
varying requirements, skill-sets and risk profiles of different
customers. In the cloud context, platform options help service
providers right-size cost and quality of service to the unique
needs of a subset of customers. Competition between multiple
platforms forces all the players to be better. (In this context,
Citrix's recent release of CloudStack to the Apache Software
Foundation might turn out to be one of the best things ever to
happen to OpenStack.)
Despite the fragmentation that competition creates early on,
market forces will whittle down the number of platform choices
over time. Technology history has taught us that platform markets
can sustain only a few dominant players. Oftentimes this includes
one proprietary and one open source alternative. The operating system
wars that started 20+ years ago are the most frequently cited
evidence of this dynamic. The fragmented and proprietary Unix
variants eventually lost out to Linux and Windows as the open
source and proprietary standards respectively. Server
virtualization has seen a similar trajectory with VMware and Xen
leading in a race that is still underway. Most recently iOS and
Android have created a competitive and rapidly evolving mobile
operating system market.
Fast forward to today and history is repeating itself in the cloud
"operating system" market. VMware's proprietary stack has become
the clear commercial leader. Meanwhile, there is an emerging group
of open source platforms vying to become the "Linux" of the cloud
data center. Only time will tell how this plays out, but OpenStack
has as good a shot as any to become this de facto standard. With the
stakes so clear the question isn't why invest in OpenStack, but
rather why wouldn't you?
Despite the magnitude of the opportunity, let's not lose sight of
the fact that it is still early days. July of this year marks only
the second anniversary of the OpenStack effort. In just six short
months since the last release, OpenStack has made some big strides.
Of course, challenges still persist, but there are more than 150
companies and 2500+ developers working on the problem.
Coinciding with the Essex code release last week, the OpenStack Conference
& Design Summit will be held April 16-21 in California. At
SolidFire, we have been working hard since the last summit and are
proud of our achievements over this period. We will be very
active participants throughout the week of the conference. If you
are attending, make sure to stop by our booth or come see our
panel, "OpenStack & Block Storage...Where to from here?" on
Thursday at 1 p.m. PST. We will also be hosting a party with
CloudScaling and RightScale on Monday night. Building off the
Mirantis reception earlier in the evening, make sure to come hang
out with three of the most innovative companies in the cloud
ecosystem at 111 Minna Gallery in downtown San Francisco. Details
and registration for the party are posted here.
-Dave Cahill, Director of Strategic
Alliances
Wednesday, March 21, 2012
posted by Dave Wright
At the Cloud Connect Performance Summit back in
February I presented the topic "Increasing Storage Performance in a Multi-Tenant
Cloud". The way the schedule fell out I took the stage after
Adrian Cockcroft from Netflix. Coincidentally, I borrowed a few
quotes from Adrian's prior blogging on the subject to help bring to
life the biggest roadblocks to achieving great storage performance
in a multi-tenant cloud. In my discussion I called out three key
problem areas: the capacity vs IOPS imbalance, handling
multi-tenancy, and performance consistency. My discussion centered
around the limitations of legacy solutions and how flash storage,
if leveraged correctly, can help remedy current cloud performance
woes.
Many thanks to Adrian, who continues to be a great straight man
for the biggest challenges we are tackling here at SolidFire. In a
recent Q&A with ZDNet UK's Jack Clark,
Adrian shared some perspectives that we commonly hear from cloud
service providers and their customers:
- "The thing I've been publicly asking for has been better IO in
the cloud. Obviously I want SSDs in there. We've been asking cloud
vendors to do that for a while."
- "The instances available from AWS have similar CPU, memory and
network capacity to instances available for private datacentre use,
but are currently much more limited for disk I/O."
- "The hard thing to do in the cloud is to do high-performance IO
[input-output], but that is starting to change as third-party
vendors are figuring out ways of connecting high-performance IO
externally, and we've worked around it with our [Cassandra] data
store architecture."
Probably the most interesting answer was in response to a
question around why it took Amazon so long to roll out an SSD-based
offering (referring to DynamoDB).
Cockcroft remarked:
"It's purely scale for them. For
Amazon to do something they have to do it on a scale that's really
mind-boggling. If you think about deploying an infrastructure
service with a new type of hardware - if they got it wrong, they
can't turn it back out and do it again differently. So they have to
over-engineer what they do."
The key point here is that performance (through SSDs) was only
part of the problem Amazon had to address. In fact, the bigger
challenge for them to overcome was scale. Scale is what
differentiates true clouds from small virtualized environments.
Everything has to be designed to scale, which imposes a very
different set of design considerations and constraints on an
architecture. SSD or not, you can't escape this reality. At
SolidFire scale is what we do best. There are many options for
high-performance storage these days, but only SolidFire is designed
for cloud scale. In doing so we are enabling service providers to
focus on offering a differentiated portfolio of high performance
cloud services and advancing the way we all use the cloud.
-Dave Wright, Founder
& CEO
Tuesday, February 28, 2012
posted by Dave Cahill
The current flash-based storage landscape is filled with many
vendors proposing to address different niches of the market with
their respective solutions. With flash as the common ground, some
of the more easily identifiable differentiators are in areas like
host interface, form factor, media support and data protection
schemes. The design choices for these specifications are heavily
influenced by each vendor's target workload and/or customer set. Of
course, there are strengths and weaknesses to every approach. There
are bottlenecks to be minimized or altogether avoided if possible.
If all goes according to plan a vendor's target market will play to
more of its strengths than weaknesses.
At SolidFire we have taken direct aim at solving the challenges
encountered in delivering high performance storage for large-scale
multi-tenant cloud environments. For this customer set the
objective is not about delivering massive amounts of performance to
a single application at any cost. Instead, these providers are
focused on cost effectively delivering consistent performance to
thousands of applications at the same time. This use case has
shaped many of our early design choices at SolidFire. We believe
the most efficient way to achieve the right price/performance
balance at scale is through a shared storage architecture.
In the case of shared storage, regardless of how fast the storage
system can deliver I/O, there will always be the issue of network
latency. Fusion-io has eliminated the network latency issue
altogether with its server resident PCI-based designs. This design
works well for DAS topologies serving massive IOPS to extremely
performance hungry applications. However for the service provider
use case referenced above, the price/performance and availability
story of server-resident flash misses the mark.
So if network latency is unavoidable, what is the best approach?
How do you optimize the storage stack to maximize IOPS and minimize
latency to deliver consistent performance to thousands of
applications? Sparing you a buzzword-infused tongue twister that
distills our approach into as few words as possible (think
"RAID-less All-SSD Scale-Out Storage System"), we have instead
outlined some of the key enabling features of our design in a more
digestible format below:
- An All-SSD system is the only way to confidently deliver
predictable performance across a large number of tenants and
applications in a large-scale cloud infrastructure. A tiered
approach may suffice in a controlled setting with a few
applications. However, the resource intensity and performance
variability encountered in larger QoS-sensitive environments make
tiering an unsustainable option.
- Scale-out can mean lots of different things. For SolidFire this
means no monolithic storage controllers. It also means a fully
distributed design with IO and capacity load evenly balanced across
every node in the cluster. At the media layer, data still has to
traverse the SAS bus, but ten drives per node are working in tandem
to deliver more than enough aggregate performance. Thinking through
alternative design choices here, it is important not to lose sight
of the fact that any latency encountered at this layer of the stack
is an order of magnitude less than what is encountered at the
network layer.
- RAID-less means exactly what you think: no RAID. More than any
controller bottlenecks, RAID is the biggest performance drag in the
storage stack. By rethinking the data protection algorithm you cure
a lot of what ails storage system performance today. At SolidFire
we have done just that, implementing a replication-based redundancy
algorithm where data is distributed throughout the cluster. The
result is significant improvements in write performance and drastic
acceleration of rebuilds from failure without performance
impact.
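To make the replication idea a little more concrete, here is a minimal Python sketch. It is not SolidFire's actual algorithm: it assumes two copies per block and uses rendezvous hashing as a stand-in placement scheme, and the node names, block IDs and function names are all hypothetical. The point it illustrates is that when one node fails, the surviving copies of its blocks are spread across the rest of the cluster, so many nodes can share the rebuild work.

```python
import hashlib


def replica_nodes(block_id, nodes, copies=2):
    """Pick `copies` distinct nodes for a block via rendezvous hashing.

    Assumption: this placement scheme is illustrative only and is not
    taken from any vendor's implementation.
    """
    def score(node):
        return hashlib.sha256(f"{node}:{block_id}".encode()).hexdigest()

    return sorted(nodes, key=score, reverse=True)[:copies]


def rebuild_sources(failed, blocks, nodes):
    """Count, per surviving node, the blocks it must re-replicate."""
    sources = {}
    for block in blocks:
        placement = replica_nodes(block, nodes)
        if failed in placement:
            survivor = next(n for n in placement if n != failed)
            sources[survivor] = sources.get(survivor, 0) + 1
    return sources


if __name__ == "__main__":
    cluster = [f"node{i}" for i in range(5)]       # hypothetical 5-node cluster
    blocks = [f"blk{i}" for i in range(1000)]      # hypothetical block IDs
    for node, count in sorted(rebuild_sources("node0", blocks, cluster).items()):
        print(f"{node} re-replicates {count} blocks")
```

Contrast this with a RAID rebuild, where the work funnels through the drives of a single RAID group and its controller.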
Sure, our storage system does a heck of a lot more than these
three things. You can read all about the software innovations
embedded in our Element OS on our site. But these three
concepts we highlight above are critically important design choices
that we made early on. They are foundational components of our
architecture that make the rest of the story possible. They are
also three fairly tangible concepts to help you differentiate one
vendor from the next in the flash-based storage market. Good luck,
it's noisy out there!
-Dave Cahill, Director of Strategic
Alliances
Thursday, January 26, 2012
posted by Dave Cahill
"There comes a time when a storage company needs to define
itself by what it does for customers and not by the machinery it
uses to do so."
Chris Mellor, "How to tell if your biz will do a Kodak", The Register
The Register's Chris Mellor penned a great article the other day
reflecting on the continuous cycles of innovation and disruption
that have come to characterize the storage media industry. He uses
Kodak to paint the picture of an incumbent getting capsized by a
media transition. He goes on to cite other examples across tape and
optical media where incumbents failed to manage the transition to
the next generation media.
As the storage industry has transitioned through different
media types there have always been opportunistic stopgap
innovations that have bridged the gap from one generation to the
next. Virtual Tape Library (VTL) technology is a great example of
an innovation serving as a transitional bridge between the tape and
disk eras. Once applications were written with the capability to
natively interface with disk, deduplication and compression quickly
drove down solution costs, making disk an effective bulk storage
medium. Once disk was financially viable, the flood gates opened
and tape was relegated to deep archive. Today we are seeing
flash-based caching and tiering technologies form a similar
transitional bridge while the $/GB economics of flash fully
converge with, and eventually eclipse, disk.
So with history as a guide for how this plays out, why will
the disk to flash media transition be any different than the ones
before it? Well, I suspect this cloud thing might have something to
do with it.
In the enterprise IT sector, systems always seem to consume
features over time. At its core, the cloud is a massive
infrastructure system that when used properly is an extension of
existing IT. However, cloud infrastructures will increasingly chip
away at the incumbent IT footprint by rapidly incorporating new
innovations into its architecture. These enabling innovations allow
cloud providers to continually expand their portfolio of cloud
services. Over time the IT use cases applicable to this medium
naturally expand as applications and interfaces catch up,
performance improves and the economic value proposition can no
longer be ignored.
So what does this mean? From our perspective, the cloud adds
a third leg to the innovation sequence we have witnessed in the
past. New component level technologies will continue to enable new
architectures. But where it gets interesting is when these new
architectures drive the performance and economics to enable new
cloud services.
In storage, the media innovations that Mellor refers to, and
their related price/performance value proposition, are a powerful
enabling force behind new storage architectures. Applied to
traditional IT cost centers these architectures are interesting;
applied to profit-driven cloud services they are game changing.
Amazon's recently announced DynamoDB service is an early
instantiation of this extended innovation sequence, where component
level technologies (SSD) enable new architectures that drive new
services. Fortunately for the end-customers, the economics of flash
are only getting better from here. Now it is up to the storage
industry to innovate on top of
this medium, delivering next generation systems that can extend the
reach of cloud hosted services to an even wider range of
application workloads.
-Dave Cahill, Director of Strategic
Alliances
Tuesday, January 24, 2012
posted by Dave Wright
In our first two posts on storage tiering we talked
through the difference between capacity-centric vs.
performance-centric approaches and also exposed
some of the hidden costs of an automated
tiering implementation. Closing out this mini-series I wanted to
touch on a few other deficiencies inherent to an automated tiering
solution.
Within a storage infrastructure it is IOPS, not capacity,
that are the most expensive and limited resource. In a tiered
architecture, SSDs are inserted into the equation to try and
improve the balance between IOPS and capacity. However, while an
SSD tier may reduce performance issues for well-placed data, the
usage of this expensive tier remains inefficient. This inefficiency
stems from a lack of granularity in the data movement of a tiered
system. If a sub-LUN tiering system moves data in chunks of
anywhere from 32MB to 1GB, promoting a small amount of hot data will
likely promote a lot of cold data in the process. This overhead
forces sub-optimal utilization of the premium SSD capacity.
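A quick back-of-the-envelope sketch of that granularity problem, in Python. The 4MB hot region is an assumed figure chosen purely for illustration; the chunk sizes span the 32MB to 1GB range mentioned above. The function name is hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB


def promotion_overhead(hot_bytes, chunk_bytes):
    """Return the fraction of a promoted chunk that is cold data.

    Assumes the hot region fits inside a single chunk and that the
    tiering engine always moves the whole chunk.
    """
    if hot_bytes > chunk_bytes:
        raise ValueError("hot region must fit inside one chunk")
    return 1.0 - hot_bytes / chunk_bytes


if __name__ == "__main__":
    hot_region = 4 * MB  # assumed size of the genuinely hot data
    for chunk in (32 * MB, 256 * MB, 1 * GB):
        cold = promotion_overhead(hot_region, chunk)
        print(f"chunk={chunk // MB:>5} MB  cold share of SSD space={cold:.1%}")
```

Under these assumptions, the larger the chunk, the more of the SSD tier is spent holding data nobody is asking for.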
Another potential problem area with tiering, specifically in
a multi-tenant environment, is dealing with IO density - that is,
how IO is distributed across a range of disk space. Applications
whose IOs are concentrated within close proximity to each other (IO
dense) will gain greater benefit from sub-LUN tiering than those
whose IOs are spread more evenly over the entire logical block
address space (IO sparse). Because tiering mechanisms measure data
usage at the chunk level, an application that concentrates its hits
within a small number of chunks is more likely to be promoted than
an application that spreads the same number of IOs across more
chunks. From an array performance perspective this approach is
reasonable, as you get more performance within the same resource
footprint. However, in a multi-tenant setting with data distributed
across many distinct applications, this leads to serious problems
with fairness and performance consistency across workloads.
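The fairness problem is easy to reproduce in a toy model. The Python sketch below is a simplification under stated assumptions, not any vendor's tiering engine: chunks are 1000 blocks, the SSD tier holds four chunks, promotion is purely by hit count, and the two tenants and all function names are hypothetical.

```python
from collections import Counter

CHUNK_BLOCKS = 1000  # assumed number of blocks per tiering chunk


def chunk_heat(block_addresses):
    """Count IOs per chunk, which is all a chunk-level tiering engine sees."""
    return Counter(addr // CHUNK_BLOCKS for addr in block_addresses)


def promoted_chunks(heat_by_app, ssd_slots):
    """Promote the hottest chunks across all tenants until the SSD tier is full."""
    ranked = sorted(
        ((hits, app, chunk)
         for app, heat in heat_by_app.items()
         for chunk, hits in heat.items()),
        reverse=True,
    )
    return {(app, chunk) for _hits, app, chunk in ranked[:ssd_slots]}


def ssd_hit_share(app, heat_by_app, promoted):
    """Fraction of one tenant's IOs that land on promoted chunks."""
    heat = heat_by_app[app]
    served = sum(hits for chunk, hits in heat.items() if (app, chunk) in promoted)
    return served / sum(heat.values())


# Two hypothetical tenants issuing the same number of IOs (1000 each).
dense_ios = list(range(1000))                          # all inside chunk 0
sparse_ios = [i * CHUNK_BLOCKS for i in range(1000)]   # one IO per chunk

heat_by_app = {"dense": chunk_heat(dense_ios), "sparse": chunk_heat(sparse_ios)}
promoted = promoted_chunks(heat_by_app, ssd_slots=4)

for app in heat_by_app:
    share = ssd_hit_share(app, heat_by_app, promoted)
    print(f"{app}: {share:.1%} of IOs served from SSD")
```

With these made-up inputs the dense tenant has all of its IOs served from SSD, while the sparse tenant, issuing exactly the same number of IOs, gets a fraction of a percent.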
We originally discussed the
performance implications of tiering in July of last year. In a
multi-tenant setting this performance variability exposure is
magnified. Customers are continually exposed to the risk that the
promotion of another customer's hot data will result in the
demotion of their own. The order of magnitude
difference in latencies and IOPS between the different tiers makes
it practically impossible for a service provider to guarantee
performance to an individual application (or tenant) under these
conditions.
In recognition of the deficiencies of a tiered architecture,
SolidFire sought a better way. Our Performance Virtualization
technology decouples the tight binding between the storage
performance and capacity, resulting in a far more precise
allocation of IOPS and capacity on a volume by volume basis
regardless of issues such as IO density. Instead of best guess
efforts as to the size and tiers of media required to meet customer
performance requirements, a service provider can now dial-in IOPS
and capacity individually at the volume-level from cluster-wide
independent pools of capacity and performance. These allocations
can also be dynamically adjusted over time as application
requirements change. All things considered, Performance
Virtualization is a far more efficient way to address IOPS
scarcity, without exposing customers to the inefficiency and
unpredictable performance inherent in an automated tiering
architecture.
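As a rough mental model of what "independent pools" means, consider the Python sketch below. It is a toy and not SolidFire's implementation or API: the `Cluster` class, its fields and the volume names are all hypothetical, and real systems handle far more (bursting, minimums and maximums, rebalancing). It only shows capacity and IOPS being tracked and checked separately, and a volume's IOPS being changed without touching its size.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: cluster-wide capacity and IOPS tracked as separate pools."""
    capacity_gb: int
    iops: int
    volumes: dict = field(default_factory=dict)  # name -> (size_gb, iops)

    def _allocated(self, index):
        return sum(v[index] for v in self.volumes.values())

    def set_volume(self, name, size_gb, iops):
        """Create or adjust a volume; size and IOPS are checked independently."""
        current = self.volumes.get(name, (0, 0))
        others_gb = self._allocated(0) - current[0]
        others_iops = self._allocated(1) - current[1]
        if others_gb + size_gb > self.capacity_gb:
            raise ValueError("capacity pool exhausted")
        if others_iops + iops > self.iops:
            raise ValueError("IOPS pool exhausted")
        self.volumes[name] = (size_gb, iops)


if __name__ == "__main__":
    cluster = Cluster(capacity_gb=10_000, iops=200_000)
    cluster.set_volume("small-hot-db", size_gb=100, iops=50_000)
    cluster.set_volume("big-cold-archive", size_gb=5_000, iops=500)
    # Later, the database needs more performance but no more space.
    cluster.set_volume("small-hot-db", size_gb=100, iops=80_000)
    print(cluster.volumes)
```

Note that neither allocation depends on where the data physically lives, which is the contrast with sizing SSD and disk tiers by best guess.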
-Dave Wright, Founder &
CEO
Wednesday, January 18, 2012
posted by Dave Wright
Amazon launched a new service
today: DynamoDB. It's a scalable NoSQL database service that will
run in the AWS cloud. It is akin to a hosted version of Cassandra
or MongoDB with unlimited scalability. The most notable section of
Werner Vogels' blog post announcing the new service is worth
repeating:
Cloud-based systems have invented solutions to ensure
fairness and present their customers with uniform performance, so
that no burst load from any customer should adversely impact
others. This is a great approach and makes for many happy
customers, but often does not give a single customer the ability to
ask for higher throughput if they need it.
As satisfied as engineers can be with the simplicity
of cloud-based solutions, they would love to specify the request
throughput they need and let the system reconfigure itself to meet
their requirements. Without this ability, engineers often have to
carefully manage caching systems to ensure they can achieve
low-latency and predictable performance as their workloads scale.
This introduces complexity that takes away some of the simplicity
of using cloud-based solutions.
The number of applications that need this type of
performance predictability is increasing: online gaming, social
graphs applications, online advertising, and real-time analytics to
name a few. AWS customers are building increasingly sophisticated
applications that could benefit from a database that can give them
fast, predictable performance that exactly matches their
needs.
Looking under the covers a bit further here there are two
really interesting enabling components of the DynamoDB service that
deserve highlighting:
- All-SSD - The service is deployed using 100% SSDs to provide
consistent high performance at a very large scale. This is notable
in that it is AWS' first use of SSDs in their cloud architecture.
- Guaranteed Throughput - The DynamoDB service includes a concept
called "Provisioned Throughput". This is essentially a guaranteed
QoS model, where a customer can purchase reserved capacity (measured
in queries per second), rather than paying for the actual queries
run. Applied to a storage service, this would be akin to paying
based on guaranteed IOPS. Amazon EBS's current pricing model is
based on actual IO operations, with no guaranteed throughput or
latency.
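The difference between the two billing models can be sketched in a few lines of Python. The function names and every price below are placeholders invented for illustration, not published AWS rates; the sketch only shows what each model charges for.

```python
def pay_per_operation(ops_performed, price_per_million_ops):
    """Bill for the IO actually performed; no performance is reserved."""
    return ops_performed / 1_000_000 * price_per_million_ops


def pay_for_provisioned(reserved_ops_per_sec, hours, price_per_unit_hour):
    """Bill for reserved throughput, whether or not it is fully used."""
    return reserved_ops_per_sec * hours * price_per_unit_hour


if __name__ == "__main__":
    # Placeholder prices, chosen only to make the arithmetic visible.
    usage_bill = pay_per_operation(ops_performed=50_000_000,
                                   price_per_million_ops=0.10)
    reserved_bill = pay_for_provisioned(reserved_ops_per_sec=1_000,
                                        hours=24,
                                        price_per_unit_hour=0.001)
    print(f"usage-based: ${usage_bill:.2f}  reserved: ${reserved_bill:.2f}")
```

In the first model the bill tracks activity but says nothing about how fast the IO will be served; in the second the customer pays for a performance level up front, which is what makes a guarantee possible.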
Amazon DynamoDB is a strong endorsement of several of
SolidFire's key principles. The first is that the cloud needs
Solid-State Drives (SSD) to adequately support the evolving
performance demands of multi-tenant storage. The second is the idea
that as more of these performance-sensitive applications make their
way to the cloud there is a clear requirement for guaranteed QoS
controls that can dynamically support performance requirements at a
much more granular level. Finally, and building off the first two,
is the validation that when armed with the enabling architecture to
confidently and economically deliver performance-based services,
service providers can stand up cloud service offerings based on
committed performance.
Amazon is a great indicator of the pulse and direction of the
industry. The broader implications here for running performance
sensitive applications in a cloud environment are intriguing to
think about. Here at SolidFire, the continued innovations around
the enabling architectures required to make this a reality are what
get us really excited.
-Dave Wright, Founder &
CEO
Tuesday, January 17, 2012
posted by Dave Wright
In the initial
post of our series on tiering we covered the
merits of a proactive performance-driven approach to tiering
relative to the more traditional capacity-centric discussions.
Today we take a closer look at some of the less obvious cost
implications of "automated" tiering. On the surface, the promise of
tiering looks like a clear win - SSD performance with spinning
disk capacity and cost. However, the true economics of this type of
solution are not nearly as compelling as some vendors would lead
you to believe. Considered in the context of the unique
burdens faced by cloud service providers, the proposed value
proposition is even less appealing.
To start with, the "SSD performance" promise part of the
catchy tagline above must be caveated by the fact that this only
proves to be the case if the data is actually residing in the SSD
tier. Easier said than done. The ability to guarantee SSD
performance in a tiered architecture requires a substantial SSD
tier and/or extremely accurate data placement algorithms.
Rightsizing the former skews the proposed economics of a tiered
solution substantially, while the latter has been long on promise
but short on delivery for at least three generations of marketing
executives. Before the industry marketed this functionality as
Automated Tiering it was known as Information Lifecycle Management
(ILM) and a few years before that it was Hierarchical Storage
Management (HSM). Regardless of what you call it, tiering has
always been impaired by the inability to accurately predict and
automate the movement of data between tiers. In the context of
cloud environments the significant scale requirements and extremely
low application-level visibility make solving this challenge even
more difficult.
It's also important to consider the flash media requirements
of a tiered solution. The write patterns in the flash layer of a
tiered architecture require a higher grade flash solution to
withstand the impact of write amplification and churn. Vendors are
forced to use the most expensive SLC flash to ensure adequate media
endurance. The cost impact of even modest amounts of SLC flash
destroys the economic advantage of a tiered architecture relative to
an all-MLC design. In many examples we've seen that the
"combined" $/GB of a storage solution that incorporates SLC-flash,
15k SAS and SATA is actually higher than an all-flash MLC solution
with similar raw capacity. Importantly, this price advantage for
MLC over tiered storage is achieved before factoring in the
favorable impact of compression and deduplication for the all-flash
solution, making the flash design even more
compelling.
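For readers who want to run this comparison on their own quotes, here is a small Python sketch of the arithmetic. The function names are hypothetical and the capacities and prices in the example are placeholders, not figures from any real configuration; plug in your own numbers.

```python
def blended_cost_per_gb(tiers):
    """Capacity-weighted $/GB for a list of (capacity_gb, dollars_per_gb)."""
    total_gb = sum(gb for gb, _ in tiers)
    total_cost = sum(gb * price for gb, price in tiers)
    return total_cost / total_gb


def effective_cost_per_gb(raw_cost_per_gb, data_reduction_ratio):
    """$/GB of stored data after compression and deduplication."""
    return raw_cost_per_gb / data_reduction_ratio


if __name__ == "__main__":
    # Placeholder tier mix: (capacity in GB, $/GB). Not real pricing.
    tiered = [
        (1_000, 20.0),   # SLC flash tier
        (4_000, 2.0),    # 15k SAS tier
        (5_000, 0.5),    # SATA tier
    ]
    print(f"tiered blended $/GB: {blended_cost_per_gb(tiered):.2f}")
    # An all-flash quote can be adjusted for an assumed reduction ratio.
    print(f"all-flash effective $/GB: {effective_cost_per_gb(3.0, 2.0):.2f}")
```

The blended figure is the fair basis for comparing a tiered system against a single-tier one, and the data reduction ratio only applies where the system actually performs in-line compression and deduplication.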
Tiering also hurts capacity utilization and controller
performance. In order to ensure data is in the right place at the
right time it is constantly being promoted and demoted between the
flash and disk tiers. There needs to be a certain capacity buffer
to accommodate this movement. There is also a controller processing
cost to keep up with all this activity. Most legacy systems have
limited CPU and controller memory relative to their overall
capacity, making the overhead of tiered storage processing one more
burden for them to manage. Even complex tiering requires only a
fraction of the processing power and memory needed for in-line data
reduction features like compression and deduplication, which is why
those features are seldom found on legacy primary storage
controllers. A recent article from TechWorld
references a Forrester Research report by Andrew Reichman
(@ReichmanIT) that expands on the data management burden of a
tiered storage topology.
The issues outlined above are just a few examples of the
hidden costs embedded in an "automated" tiering solution. In some
cases these deficiencies may be acceptable in smaller IT
environments. However, in a large scale multi-tenant cloud
infrastructure the capital and management costs of these
shortcomings are magnified. The hyper-competitive nature of the
service provider business model necessitates a more efficient
approach.
-Dave Wright, Founder &
CEO