If I had to choose the Storage Field Day 15 (SFD15) presenter with the most impressive total solution, it would be Datrium. We saw some really cool tech all around this week, but Datrium showed me something I haven't seen from many vendors lately: a deep, genuine focus on the end-user experience and on extracting real value from technology.
Datrium is a hyperconverged offering that fits the “newer” definition of HCI, in which compute nodes and storage nodes scale independently. The HCI term has appropriately loosened of late, with folks applying it based on the user experience rather than on a definition set by a single vendor in the space. In my opinion, Datrium takes this further by reaching for a “whole solution” approach that attempts to cover the entire IT lifecycle (primary workloads, local data protection, and cloud data protection) with the same HCI experience that most solutions offer only for on-premises gear.
From the physical perspective, Datrium’s compute nodes are stateless and have local media that behaves very much like a cache (Datrium calls it “Primary storage,” but this media doesn’t accept writes directly from the compute layer). Datrium performs some very advanced management of this cache layer, including global dedupe and rapid, location-aware data movement across nodes (i.e., when you move a workload), so I’ll compromise and call it a “super-cache.” Its main purpose is to keep required data on local flash media, so yeah, it’s a cache. A Datrium cluster can scale to 128 compute nodes, which is plenty for its market space: a system that size tested out at 12.3M 4K IOPS with just 10 data nodes underneath.
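Datrium didn’t walk through its cache internals, so the following is only a minimal Python sketch of the two behaviors described above, under my own assumptions: cache entries keyed by a content fingerprint (so duplicate blocks occupy one slot), and, after a VM moves, pulling warm blocks from the host it just left before falling back to the data nodes. The class names and the choice of SHA-256 are hypothetical, not Datrium’s implementation.

```python
from __future__ import annotations

import hashlib


def fingerprint(block: bytes) -> str:
    # Content fingerprint used as the cache/dedupe key (SHA-256 is an assumption).
    return hashlib.sha256(block).hexdigest()


class DataNodePool:
    """Stand-in for the durable storage layer, which holds every block."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.reads = 0  # counts reads that had to go all the way to the data nodes

    def put(self, block: bytes) -> str:
        fp = fingerprint(block)
        self.blocks.setdefault(fp, block)  # identical blocks are kept once
        return fp

    def get(self, fp: str) -> bytes:
        self.reads += 1
        return self.blocks[fp]


class ComputeNode:
    """A stateless host whose local flash acts as a fingerprint-keyed read cache."""

    def __init__(self, name: str, pool: DataNodePool) -> None:
        self.name = name
        self.pool = pool
        self.cache: dict[str, bytes] = {}  # fingerprint -> block, so duplicates occupy one slot

    def read(self, fp: str, previous_host: ComputeNode | None = None) -> bytes:
        if fp in self.cache:
            return self.cache[fp]  # local flash hit
        if previous_host is not None and fp in previous_host.cache:
            block = previous_host.cache[fp]  # warm copy from the host the VM just left
        else:
            block = self.pool.get(fp)  # cold miss: fetch from the data nodes
        self.cache[fp] = block
        return block


# A VM reads a block on host-a, then moves to host-b.
pool = DataNodePool()
fp = pool.put(b"A" * 4096)
host_a = ComputeNode("host-a", pool)
host_b = ComputeNode("host-b", pool)
host_a.read(fp)
host_b.read(fp, previous_host=host_a)
print(pool.reads)  # 1: only the cold read reached the data nodes
```

In this model, reading from a peer’s flash instead of the data nodes is what keeps a moved workload’s cache warm; a real system would also have to decide when to evict the stale copy on the old host.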
The storage layer is scale-out, uses erasure coding, and internally follows the log-structured file system approach that came out of UC Berkeley in the early ’90s. That does mean that as it fills past roughly 80%, writes get more expensive, because the system has to clean (garbage-collect) partially used segments to make room for new writes. While some newer storage solutions can boast extremely high capacity utilization rates, this is a constraint we’ve lived with for a long time on most enterprise storage. In other words, I’m not thrilled about it, but I’m used to it.
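The 80%+ rule of thumb follows from the classic cost model in the original Berkeley LFS work (Rosenblum and Ousterhout). If the cleaner reclaims segments whose live-data utilization is u, it must read each whole segment and rewrite the live fraction before it frees room for new data, which gives a write cost of 2 / (1 - u). The sketch below just evaluates that formula. Note that u is the utilization of the segments being cleaned rather than overall capacity (though it tends to rise as the system fills), and Datrium’s actual garbage collection will differ in the details.

```python
def lfs_write_cost(u: float) -> float:
    """Write cost of a log-structured FS whose cleaner processes segments at utilization u.

    Cleaning N segments reads N segments, rewrites N*u of live data, and frees
    space for N*(1 - u) of new data, so the ratio of total I/O to new data is
        (N + N*u + N*(1 - u)) / (N*(1 - u)) = 2 / (1 - u).
    A completely empty segment (u == 0) needn't be read, so its cost is 1.0.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("segment utilization must be in [0, 1)")
    if u == 0.0:
        return 1.0
    return 2.0 / (1.0 - u)


for u in (0.0, 0.5, 0.8, 0.9):
    print(f"segment utilization {u:.0%}: write cost {lfs_write_cost(u):.1f}x")
```

At u = 0.5 the cost is 4x; at u = 0.8 it is 10x, meaning every byte of new data costs ten bytes of disk I/O.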
Some techies I talk to care about the data plane architecture in a hyperconverged solution. Some solutions place a purpose-built VM on the hypervisor that exposes the scale-out storage cluster and performs all data management operations, so the data plane runs through that VM. Datrium (for one) does NOT do that. Instead, there’s a VIB (vSphere Installation Bundle) that installs into the hypervisor itself, so the data plane never passes through a guest VM; that should appease those who don’t like the VM-in-data-plane model. There’s global deduplication, encryption, cloning, and lots of no-penalty snapshots: basically all the features that are table stakes at this point. Datrium performs these functions up on the compute nodes, in the VIBs. There’s also a global search across all nodes for restores and other admin functionality. Today, restores are granular only to the virtual disk or VM level. More on that later.
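Datrium didn’t detail its on-disk format, so here is a generic Python sketch, under my own assumptions, of why dedupe, cheap cloning, and “no-penalty” snapshots tend to travel together: when blocks are addressed by a content fingerprint, a virtual disk is just a map of fingerprints, and a snapshot or clone copies that map rather than the data. All names are hypothetical, and a real system would also need reference counting or garbage collection to reclaim blocks that no snapshot still uses.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


class DedupeStore:
    """Content-addressed block store: each distinct block is stored exactly once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def write(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(fp, data)  # a duplicate write only returns the existing key
        return fp


class VirtualDisk:
    """A virtual disk is only a map from block index to fingerprint."""

    def __init__(self, store: DedupeStore, block_map: dict[int, str] | None = None) -> None:
        self.store = store
        self.block_map: dict[int, str] = dict(block_map or {})

    def write_block(self, index: int, data: bytes) -> None:
        # Redirect-on-write: the index now points at new content; snapshots keep the old pointer.
        self.block_map[index] = self.store.write(data)

    def read_block(self, index: int) -> bytes:
        return self.store.blocks[self.block_map[index]]

    def snapshot(self) -> dict[int, str]:
        # Copies metadata only; no data blocks are read or written.
        return dict(self.block_map)

    def clone(self) -> VirtualDisk:
        # A clone is a new disk that starts from the same block map.
        return VirtualDisk(self.store, self.block_map)


store = DedupeStore()
disk = VirtualDisk(store)
disk.write_block(0, b"x" * BLOCK_SIZE)
disk.write_block(1, b"x" * BLOCK_SIZE)  # same content as block 0, stored once
snap = disk.snapshot()
clone = disk.clone()
disk.write_block(0, b"y" * BLOCK_SIZE)  # changes the live disk only
print(len(store.blocks))  # 2
```

Because the snapshot and clone never copy data, taking one costs the same no matter how large the disk’s contents are; only the block map grows.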
The user, of course, doesn’t really see or care about any of this. There’s a robust GUI with a ton of telemetry on workload and system performance, and the system is super easy to provision and manage day to day.
What really caught my attention was their cloud integration. Currently they are AWS-only, and for good reason: their approach is to couple tightly to the cloud being used, following that cloud’s own best practices to manage the implementation. So the devs at Datrium leverage Lambda and CloudWatch to create, modify, monitor, and self-heal the cloud instance of Datrium (which, of course, runs in EC2 against EBS and S3). It even attaches IAM roles to the EC2 instances for you, so you’re not creating a dedicated user in AWS; that’s best practice, because role credentials are temporary and rotated automatically. It creates all the networking required for the on-prem instances to replicate to and communicate with the VPC, and it creates the VPC endpoint the VPC uses to talk to S3. They REALLY thought it through. Once everything is up, a Lambda function runs periodically to make sure things are where they’re supposed to be, and fixes them if they’re not. They don’t use CloudFormation, and when asked, they had really good answers why. The average mid-size enterprise would NEVER (well, hardly ever) have the expertise to do much more than fire up some instances from AMIs in a marketplace, and they’d still be responsible for all the networking, etc.
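Datrium didn’t show its Lambda code, so here is only a generic Python sketch of the “make sure things are where they’re supposed to be, and fix them if they’re not” pattern: a desired-state reconciliation pass that a scheduled function could run. Everything in it is hypothetical (the CloudState fields, the instance IDs, the role name, and the action strings), and a real handler would call AWS APIs rather than update an in-memory model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CloudState:
    """Hypothetical model of the cloud resources one deployment depends on."""

    instances: dict[str, str] = field(default_factory=dict)  # instance id -> IAM role name
    s3_endpoint: bool = False  # VPC endpoint for S3 exists
    replication_route: bool = False  # networking path from on-prem to the VPC exists


def plan_repairs(desired: CloudState, observed: CloudState) -> list[str]:
    """Diff desired vs. observed state into a list of corrective actions."""
    actions: list[str] = []
    for instance_id, role in desired.instances.items():
        if instance_id not in observed.instances:
            actions.append(f"launch {instance_id} with role {role}")
        elif observed.instances[instance_id] != role:
            actions.append(f"attach role {role} to {instance_id}")
    if desired.s3_endpoint and not observed.s3_endpoint:
        actions.append("create S3 VPC endpoint")
    if desired.replication_route and not observed.replication_route:
        actions.append("restore on-prem replication route")
    return actions


def reconcile(desired: CloudState, observed: CloudState) -> list[str]:
    """One scheduled pass: plan repairs, apply them, and report what was fixed."""
    actions = plan_repairs(desired, observed)
    # A real handler would call cloud APIs here; this sketch only updates the model.
    observed.instances.update(desired.instances)
    observed.s3_endpoint = observed.s3_endpoint or desired.s3_endpoint
    observed.replication_route = observed.replication_route or desired.replication_route
    return actions


desired = CloudState(
    instances={"i-1": "storage-role", "i-2": "storage-role"},
    s3_endpoint=True,
    replication_route=True,
)
observed = CloudState(instances={"i-1": "storage-role"}, s3_endpoint=True)
first = reconcile(desired, observed)   # repairs the drift
second = reconcile(desired, observed)  # nothing left to fix
print(first)
print(second)
```

The key design property is idempotence: the second pass finds nothing to do, so running it on a schedule (on AWS, typically an EventBridge rule, formerly CloudWatch Events, invoking the function) is safe.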
So I believe Datrium has thought through not just the technology but HOW it’s used in practice, and it gives users (and partners) a deliverable HCI best practice up front. This is the promise of HCI: the optimal combination of leading technologies with an ease of use that lets the market below the large enterprise extract maximum value from them.
Datrium does have some work ahead of it. It still needs to add the ability to restore single files from within virtual guest disks, and after that, to extract that data for later single-record management, perhaps archiving those records (and making them searchable) in S3/Glacier etc. Once Datrium provides that, customers won’t need another technology partner for that functionality. Also, the solution doesn’t yet natively handle unstructured data (outside of a VM) on its storage.
Some folks won’t like that they’re AWS-only at the moment; I understand the choice, since they’re aiming for a “whole solution” approach that leaves no administrative functions to the user. Hopefully they get to Azure soon, and perhaps GCP, but the robust AWS functionality Datrium manages may overcome any objections to AWS.
In sum, Datrium has approached the HCI problem from the user-experience side, rather than taking the usual route of creating or integrating some technology, polishing a front end to make it look good, and automating a few important admin features. Someone there got the message that outcomes are what matter, not just the technology, and made sure that message was woven into the fundamental architecture of the product. Kudos.