AWS Backup Service – should you jump in?

I’ve heard before that if you see a title ending in a question mark, the answer is most likely going to be “no”.

Unfortunately, I'm not going to be breaking any traditions here.

Background: AWS announced today that they have added support for EC2 instance backups to their backup service. AWS Backup now has backup options for Amazon EBS volumes, Amazon Relational Database Service (RDS) databases, Amazon DynamoDB tables, Amazon Elastic File System (EFS) file systems, Amazon EC2 instances, and AWS Storage Gateway volumes.

But if you’re expecting this to be like the backup/recovery solutions you’ve run on-prem, you’re in for a rude awakening.  It’s not. This has the scent of a minimally-viable product, which I’m sure they’ll build on and make compelling one day, but that day isn’t today.

It’s VERY important that you read the fine print on how the backups occur on the different services covered, and more importantly how you restore, and at what granularity you’re restoring.

First- from a fundamental architectural perspective- backups are called “recovery points”.  That’s an important distinction. We’ve seen the recovery point term co-habitate with “snapshot”. In fact, EBS “backups” are just that- snapshots.

So for EC2 and EBS-related backups (oops, “Recovery Points”), you’re simply restoring a snapshot into a NEW resource. Want to restore a single file or directory in an EC2 instance or in a filesystem on your EBS volume? Nope. All or nothing. Or, you’ll restore from the RP into a new resource, and copy the needed data back into your live instance. I’m sorry, that’s just not up to today’s expectations in backup and recovery.
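
To make that concrete, here's roughly what an EBS restore looks like through the API- a minimal boto3 sketch, assuming you already know your backup vault name, recovery point ARN, and IAM role (all of those are placeholders below). Note that the only thing you can ask for is a brand-new volume:

import boto3

backup = boto3.client('backup')

vault = 'Default'  # placeholder vault name
rp_arn = 'arn:aws:ec2:us-east-1::snapshot/snap-0123456789abcdef0'  # placeholder recovery point ARN

# Pull the restore metadata AWS stored alongside the recovery point...
meta = backup.get_recovery_point_restore_metadata(
    BackupVaultName=vault,
    RecoveryPointArn=rp_arn)['RestoreMetadata']

# ...and hand it straight back to create a brand-new EBS volume from the snapshot.
job = backup.start_restore_job(
    RecoveryPointArn=rp_arn,
    Metadata=meta,
    IamRoleArn='arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole',  # placeholder role
    ResourceType='EBS')

# The new volume still has to be attached, and the data copied back, by you.
print(job['RestoreJobId'])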

What about EFS? Well, glad you asked. This is NOT a snapshot. There is ZERO CONSISTENCY in the “recovery point” for EFS, because the backup in this case doesn’t snapshot anything- it iterates through the files, and if you change a file DURING a backup, there is a 100% chance that your “recovery point” won’t be a “point” at all- so you could break dependencies in your data. Yet they still call the copy of this data a “recovery point”. Give them props for doing incremental-forever here, but most file backup solutions (when paired with enterprise NAS systems or even just Windows) know how to stun the filesystem and stream their backups from the stunned version, rather than from the volatile “live” filesystem. Also, if you want to do a partial recovery, you cannot do it in place- it goes to a new directory located at the root of your EFS.

The BIGGEST piece missing from the AWS Backup Service is something we’ve learned to take for granted in B/R solutions: a CATALOG. You need to know what you want to restore AND where to find it in order to recover it. With EFS, this can get REALLY dicey. It’s really easy to choose the wrong data- perhaps it’s a good thing they don’t allow you to restore in place yet!

Look, I applaud AWS for paying some attention to data protection here. This does shine a light on the fact that the AWS data storage architecture lends itself to many data silos that require a single pane of glass to manage effectively and compliantly. However, there is a (very short) list of OEM B/R and data management vendors that can do this effectively, not just within AWS but across clouds, and still give you the content-aware granularity you need to execute your complex data retention and compliance strategies and keep you out of trouble.

So many organizations are rushing to the cloud; make sure that you’re paying adequate attention to your data protection and compliance as you go. You’ll find that while the cloud providers are absolutely amazing at providing a platform for application innovation and transformation, data governance, archive, and protection are not necessarily getting the same level of attention from them- it’s up to YOU to protect that data and your business.

Short thoughts on Project Nautilus (VMWare Fusion tech preview)

I’m going to be installing this puppy tonight I think. After reading the VMware blog on it here I do have concerns/questions about some of the things it brings to the table.

First, containers are going to be running in their own “PodVM” or pod, which is going to create all sorts of confusion when they bring Kubernetes to the table (as the article says they are going to do), as in K8s “pods” refer to one or a group of containers that are instantiated together and run as an application on a single host.  So in that case, a pod would be a group of containers that run in their own…pods.  I strongly suggest that the really smart folks at VMware find a different name for this construct, even though all the cool ones may already be taken. (“LiteVM”, maybe? “MiniVM”? or just “space”?)

Second- they’ve done something interesting with networking here. In Docker, if you want your container to talk to the network, you need to explicitly map the container’s ports to the host (hostport:containerport) when you start your container.
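
For example, with the Docker SDK for Python (the image and ports here are purely illustrative), that mapping is spelled out right in the run call- the equivalent of docker run -d -p 8080:80 nginx:

import docker

client = docker.from_env()

# Nothing is reachable from the host unless you map it explicitly:
# container port 80 gets published on localhost port 8080.
container = client.containers.run(
    'nginx',
    detach=True,
    ports={'80/tcp': 8080})

print(container.name)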

With Nautilus, when you start your container, it gets automatically added to a VMnet- so out of the box you’ll get an IP on the NAT’d network so your local machine can get to the container- WITHOUT any explicit exposure/mapping of ports- everything looks like it’s open on that IP address that’s no longer the localhost.  If you add it to a bridged network, the LAN will give the container an IP via DHCP, and any listening ports will be available.   (If I’m wrong here, I’ll correct this ASAP).

Now, one of the things I REALLY like about apps being deployed on K8s is that you’re FORCED to explicitly state what ports the container will be allowed to communicate on. This dramatically reduces the attack surface and forces the developers and engineers to be much more aware of how their apps are using network resources. I’m sure (hoping) there will be other ways of locking down the containers that get IPs from the VMnets, but as it looks like they won’t be locked down by default, I’m fearful that the quick and dirty way will lead to less security.

I’m looking forward to playing with this, in particular seeing how it works with things like PVCs, and other pipeline, testing, and integration toolsets.  I know it’s just the desktop version and it’s VERY new, but I have a hunch that at least some of the lessons learned are going to end up in Pacific.

[Off-Topic] Automatically set your iPad to “Do Not Disturb” when you open your Kindle (or other reader) app

One of the things I dislike about reading on my iPad is that there are so many distractions that can break your concentration, like iMessages, Facebook Messenger, Twitter notifications, etc. Now sure…you can swipe down and just tap the crescent moon and turn on Do Not Disturb.

But I forget to do that. Every time. Why can’t it just turn on DND automatically when it knows I’m reading??

Well…it can. And, it’s REALLY EASY. Here’s a step-by-step showing you how to do this using the Shortcuts app. I’ve written these instructions for beginners, so if you know what you’re doing you’ll fly through this quickly; apologies for the very specific steps.

1) Open up Shortcuts and click on “Automation” on the bottom center of the screen. Click on “Create Personal Automation.”

2) In the “New Automation” screen, choose “Open App” and choose Kindle (or your reader); this automation will trigger when you open the app.

3) Choose “Add Action” so we can tell Shortcuts what we want done.

4) In the search bar, type “Do Not Disturb”, and you’ll see it listed towards the bottom. Click the “Do Not Disturb” in the results.

5) At the bottom of the following screen, turn OFF the “Ask before running” so that you don’t have to acknowledge to Shortcuts that you actually want this done every time you bring up Kindle.

6) Click “Done” and that’s it! Make sure DND is OFF, launch Kindle, and you’ll get a pull-down notification saying your automation has run. Check DND, it should be on!

Note- If you want to disable DND….that’s on you. 😉 Pull down from the top right of the screen and tap the moon.

VMWare Workstation – DUP! packet issue resolved…sort of

I was getting VERY frustrated with some networking issues with my virtual guests in VMW 15.5 (and prior), on Windows 10.  See below:

dup-packets.png

If you look, you’ll see that for every ping request I’m sending to my gateway (or ANY other IP address outside the Windows host), I’m getting FOUR RESPONSES. This also manifests itself in *very* slow downloads for packages or updates I’m installing on the VMs. And, it’s just wrong, so it needed fixing.

Note that the standard Google answer to this issue is to stop and delete the Routing and Remote Access Service.  The first time this happened, this solved the problem! There were a ton of other ‘solutions’ out there but none really understood the problem. Windows was creating some sort of packet amplification. (When I have time I’m going to reinstall pcap and dig into this).

But then….months later….

It came back.  I hadn’t re-enabled Routing and Remote Access. I hadn’t made any networking changes inside the host or on my network. I HAD done some other stuff, such as enabling the Windows Subsystem for Linux and installing Ubuntu for bash scripting purposes. You know…messing around. Some of this could’ve re-written the bindings and order of networks/protocols/services, etc., but if so, it wasn’t reflected anywhere in the basic or advanced network settings. VERY frustrating!

I deleted a TON of stuff I’d installed that I no longer needed (which had to be done anyway, but I was saving that for New Year’s). I re-installed the VMware bridge protocol. I repaired VMware Workstation. I REMOVED and re-installed VMware Workstation.

Here’s what finally RE-solved the problem:

  • I RE-ENABLED RRAS (!)
  • I went into the properties of “Incoming Connections” in Network Adapter Settings and UNCHECKED IPv4, leaving IPv6 checked. (I’m not sure if this matters, try it without this step first).
  • I RE-DISABLED RRAS (!)

And…here’s the result.

non-dup-packets.png

I can only surmise that the act of STOPPING RRAS reconfigures the network stack in a way that stops the packet amplification. And, you can’t stop a service unless it’s already started, right?

Makes complete sense.

NOT.

But, all’s well that ends.

VMWare Workstation 15 REST API – Control power state for multiple machines via Python

Or..How to easily power up/suspend your entire K8s cluster at once in VMWare Workstation 15

In VMWare Workstation 15, VMWare introduced the REST API, which allows all sorts of automation.  I was playing around with it, and wrote a quick Python script to fire up (or suspend) a bunch of machines that I listed in an array up top in the initialization section. In this case, I want to control the state of a 4-node Kubernetes cluster, as it was just annoying me to click on the play/suspend 4 times (I have other associated virtual machines as well, which only added to the annoyance.)

Your REST API exe (vmrest.exe) MUST be running if you’re going to try this. If you haven’t set that up yet, stop here and follow these instructions. You’ll notice that vmrest.exe normally runs as an interactive user-mode application, but I’ve now set up the executable to run as a service on my Windows 10 machine using NSSM; I’ll have a separate blog entry to show how that’s done.

Some notes on the script:

  • Script Variables – ip/host:port (you need the port, as vmrest.exe gives you an ephemeral port number to hit), machine list, and authCode
  • Regarding the authCode.  WITH vmrest.exe running, go to “https://ip_of_vmw:port” to get the REST API explorer page (shown below). Click “authorization” up top, and you’ll get to log in. Use the credentials you used to set up the VMW Rest API via these instructions

Screen Shot 2019-12-24 at 3.05.04 PM.png

Then do a “Try it out!” on any GET method that doesn’t require variables, and your Auth Code will appear in the Curl section in the “Authorization” header. Grab that code; you’ll use it going forward.

curl.png
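
If you’d rather not fish the code out of the explorer page, it’s just standard HTTP Basic auth (that’s why the script sends “Basic ” plus the code), so you can generate it yourself with a couple of lines of Python- substitute the credentials you configured for the REST API:

import base64

username = 'your-vmrest-username'
password = 'your-vmrest-password'

# The Authorization header value is simply base64("username:password")
authCode = base64.b64encode(f'{username}:{password}'.encode()).decode()
print(authCode)  # paste this into the authCode variable in the script below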

Here’s the script, with relatively obvious documentation. Since more than likely the SSL for your vmrest.exe API server will use a self-signed, untrusted certificate, you’re probably going to need to ignore the SSL errors that will occur. That’s what the “InsecureRequestWarning” stuff is all about; we disable the warnings. My understanding is that the disabled state is reset with every request made, so we re-disable it before every REST call.

I’ve posted this code on GitHub HERE.

#!/usr/bin/env python3
import requests
import urllib3
import sys
from urllib3.exceptions import InsecureRequestWarning

'''Variable initialization'''

ip_addr = 'your-ip-or-hostname:Port'  # change ip:port to what the VMW REST API is showing
machine_list = ['k8s-master', 'k8s-worker1', 'k8s-worker2', 'k8s-worker3']
authCode = 'yourAuthCode'

'''Section to handle the script arg'''

acceptable_actions = ['on', 'off', 'shutdown', 'suspend', 'pause', 'unpause']

try:
    sys.argv[1]
except IndexError:
    # No action given on the command line, default to powering on
    action = "on"
else:
    if sys.argv[1] in acceptable_actions:
        action = sys.argv[1]
    else:
        print("ERROR: Action must be: on, off, shutdown, suspend, pause, or unpause")
        exit()

'''Section to get the list of all VMs'''

urllib3.disable_warnings(category=InsecureRequestWarning)

resp = requests.get(url='https://' + ip_addr + '/api/vms',
                    headers={'Accept': 'application/vnd.vmware.vmw.rest-v1+json',
                             'Authorization': 'Basic ' + authCode},
                    verify=False)

if resp.status_code != 200:
    # Something fell down
    print("Status Code " + str(resp.status_code) + ": Something bad happened")
    exit()

result_json = resp.json()

'''Go through the entire list and, if the VM is in machine_list, act!'''

for todo_item in result_json:
    current_id = todo_item['id']
    current_path = todo_item['path']

    for machine in machine_list:
        if current_path.find(machine) > -1:
            print(machine + ': ' + current_id)
            urllib3.disable_warnings(category=InsecureRequestWarning)
            current_url = 'https://' + ip_addr + '/api/vms/' + current_id + '/power'
            resp = requests.put(current_url,
                                data=action,
                                headers={'Content-Type': 'application/vnd.vmware.vmw.rest-v1+json',
                                         'Accept': 'application/vnd.vmware.vmw.rest-v1+json',
                                         'Authorization': 'Basic ' + authCode},
                                verify=False)
            print(resp.text)
            # Better exception handling should be written here, of course.


**12/27/19 NOTE!** – I’ve noticed what I believe to be a bug in VMW 15.5 where, if you control power state via the REST API, you lose the ability to control the VM via the built-in VMWare console in the app. The VMs behave fine (assuming everything else is working), but for some reason the VMW app doesn’t attach the console process correctly. If you want to follow this issue, I’ve submitted it to the community here.

Use Google Cloud Functions (Python) to modify a GKE cluster

I wanted to create a way to easily “turn on” and “turn off” a GKE cluster via an HTTP link that I could bookmark and hit, even from my iPhone. With GKE, if you set your node pool size to zero, you’re not incurring any charge, since Google doesn’t hit you for the master nodes. So I wanted to easily scale the pool size up and down. Sure, I could issue a “gcloud container” command, or set up Ansible to do it (which I will do, since I want to automate more stuff), but I also wanted to get my feet wet with Cloud Functions and the GCP APIs.

In Google Cloud Functions, you simply write your functional code in the main file (main.py) AND include the correct dependencies in the requirements.txt file (for Python). Each dependency is just the same package name you’d use in a “pip install”. The package for managing GKE is “google-cloud-container”.

Now, one of the great things about using Cloud Functions is that authorization for all APIs within your project “just happens”. You don’t need to figure out OAuth2 or use API keys. You just need to write the code. If you’re going to use this Python code outside of Cloud Functions, you’d need to add some code for that and set an environment variable pointing to the secret JSON key file for the appropriate service account for your project.
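
For what it’s worth, here’s a minimal sketch of that setup for running the same code outside of Cloud Functions- the key file path is a placeholder, and it just leans on the standard application default credentials mechanism:

import os
import google.cloud.container

# Point application default credentials at your service account key file
# BEFORE constructing the client. The path below is a placeholder.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/your-service-account-key.json'

client = google.cloud.container.ClusterManagerClient()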

Here’s sample code to change your GKE Cluster node pool size.

import google.cloud.container

def startk8s(request):
    client = google.cloud.container.ClusterManagerClient()
    projectID = '<your-project-id>'
    zone = 'us-east1-d'          # your zone, obviously
    clusterID = '<your-cluster-name>'
    nodePoolID = 'default-pool'  # or your pool name
    client.set_node_pool_size(projectID, zone, clusterID, nodePoolID, 3)
    return "200"

You need to set the name of the function you want triggered (the “Function to execute” field):

execute

Notice the import statement- “google.cloud.container”. Now, you can’t exactly “pip install” into a Cloud Function- it’s not your Python instance! That’s where the requirements.txt file comes in. (There’s a version of that for Node.js- package.json- since you can’t npm install either). Here’s the sample requirements.txt file:

# Function dependencies, for example:
# package>=version
google-cloud-container

Note that the package version seems to be optional.  My code works without it.

You can test the cloud function by clicking on the “testing” sub-menu.
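
And once it’s deployed (assuming you’ve allowed unauthenticated invocations), the HTTP trigger is just a URL you can bookmark or hit from anywhere- the URL below follows the standard Cloud Functions pattern, with your own region, project ID, and function name substituted in:

import requests

# Standard Cloud Functions HTTP trigger URL:
# https://<region>-<project-id>.cloudfunctions.net/<function-name>
url = 'https://us-east1-<your-project-id>.cloudfunctions.net/startk8s'

resp = requests.get(url)
print(resp.status_code, resp.text)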

Live Blog (a little late) – NetApp Insight Keynote Day 2

Sorry I’m late, I was….ermm….detained. ;) Going to live blog this from the green room backstage!

8:44a

Did I just hear “NetApp Kubernetes Services”???? Who the hell is this NetApp??

With all this automation, Anthony Lye’s group is answering the question “This Data Fabric thing sounds all wonderful, but HOW do I utilize it without a Ph.D. in NetApp and the various cloud providers??”

So the room I’m in has some cool people in it at the moment; Henri Richard, Dave Hitz, Joel Reich, Kim Weller, bunch of other really smart folks!

Wait- FREEMIUM MODEL??? Somebody actually talked to the AppDev folks.

OK time for Cloud Insights. James Holden on stage.

8:52a

Cloud Insights is GA! Again, free trial!

Lots of focus on using performance and capacity data to save money

“All that power and nothing to install.” – Anthony Lye

8:56a

Time to talk about Hybrid Cloud

Brad Anderson, SVP and GM, Cloud Infrastructure Group

Hybrid Cloud Infrastructure – If it talks like a cloud, and walks like a cloud…(then it’s not a cloud because they neither walk nor talk.)

Seamless access to all the clouds and pay-as-you-grow.

“Last year it was just a promise, today hundreds of customers are enjoying the benefit of hybrid cloud computing. ”

Consultel Cloud – from Australia! Why are so many of the cool NetApp customers from down under? Dave Hitz says to me that companies in Australia are very forward leaning in regards to technology.

These guys are leveraging Netapp HCI to provide agile cloud services to their base, with great success. They “shatter customer expectations”.

100% Netapp HCI across the globe. Got common tasks done 68% faster. Using VMWare. They looked at other solutions, they already had SolidFire experience so that probably helped.

50% cost savings over former storage platform (but…weren’t they Solidfire before this? Maybe something else too?)

So Netapp has made cloud apps a TON easier – and lets them run wherever you want. This has been the dream that the marketing folks have been talking about for years, made real.

9:30a – Joel Reich and my friend Kim Weller up there to talk about the future of Hybrid Cloud.

In the future most data will be generated at the edge, processed in the cloud.

Data Pipeline – Joel Reich, a self-proclaimed “experienced manager” will use Kim’s checklist

Snapmirror from Netapp HCI to the cloud.

Octavian looking like DOC OC! He has a “mobile data center” on his BACK. Running NetApp Select! MQTT protocol to Netapp Select (for connected devices)

Netapp automating the administration for setting up a FabricPool. You don’t have to be an NCIE to do this. Nice.

FlexCache is back and it’s better! Solves a major problem for distributed read access of datasets.

Netapp Data Availability Services – now this is something a TON of users will find valuable.

9:51 – Here’s what I was waiting for – MAX DATA.

“It makes everything faster”.

Collab with Intel – Optane persistent memory.

Will change the way your datacenter looks.

11X – MongoDB accelerated 11X vs same system without it.

NO application rewrites! In the future they will make your legacy hardware faster.

In the future will work in the cloud.

Looking forward to more specifics here. Wanted to see a demo. But we’ll see it soon enough.

Live Blog: NetApp Insight Keynote Day 1

OK, starting late but this is important stuff. Going to talk about the cool stuff as it happens.

8:41a- George Kurian makes a bold statement that Istio is part of the de facto IT architecture going forward. That’s service mesh, folks- where each containerized microservices app finds and talks to the others in the ecosystem through an Envoy sidecar proxy alongside each container, with automatic service discovery, rather than going through redundant centralized load balancers. That’s a big recognition.

8:47a – Dreamworks- “NetApp is the data authority”.

8:50a – Preview of “How to Train Your Dragon”. Never ceases to impress, and this tech is going to get exponentially better over the next 3-4 years.

8:52 – Got a note from an A-Teamer that there are new selectable offerings on cloud.netapp.com! Go check it out. New node types and software offerings…

8:53 Next speaker – Some future perspective from one of the leading media futurists in the world – Gerd Leonhard.

8:56 Humanity will change more in the next 20 years than in the previous 300 years. (My note- expect resistance, and more resistance as the pace quickens).

8:57 “MS Hololens may be free one day” – We all know that NOTHING is free. The form of payment just changes.

9:00 IoT/AI creates a 62 TRILLION DOLLAR economic shift. “Data is the new oil, AI is the new electricity, IoT is the new nervous system.”

9:02 2020 starts the Cognitive Systems Era. Just because Watson reads all the philosophy books doesn’t make it a philosopher. It won’t know what love feels like. “Knowledge is not the same as understanding!”

9:05 “They could buy France…they would think twice about that…” ZING.

9:06 Megashifts – a new Meta-Intelligence – for better or worse. When we cognify we will change the world. (How?) Disintermediation, Cognification, personalization, augmentation, virtualization

9:09 Tech is morally neutral until we USE it. #HellVen

AI will enable 38% profit gains by 2035. But inequality increases.

#DigitalEthics is a primary discussion now more than ever, discussed in news more than ever. Gartner #1 topic for 2019.

9:13 – China’s Sesame Credit – instant credit, everyone gets a number.

9:14 Tech and data have no ethics – but societies without ethics are doomed. Can’t argue with this- but purist capitalist societies do not incentivise ethics.

Who will be “Mission Control for Humanity”? “Facebook gunning at Democracy”. Facebook wasn’t hacked, and they’re not criminals- it was used properly, but it was used unethically. FB doesn’t lack for money.

Data Mining – Data MYning.

“Technology is exponential, Humans are NOT”.

“Don’t let your kids learn low level coding, or anything routine.”

Einstein: Imagination is more important than knowledge.

9:25 Summary from Gerd Leonhard – The future is driven by Data- defined by Humanity. Embrace technology- but don’t become it. WOW.

9:28 George back on the stage. I really love that NetApp, a company that has defined Data Fabric, and is at the core of so many data driven companies, is talking about the ethical use of that data and of technology in general. We need this industry leadership to extend this message into our policy making processes both at the state and federal level. Otherwise, we cede our policy making to the industry, who will act as (hopefully) benevolent feudal lords.

9:38 – Demo time coming….. :). Anthony Lye is on the stage.

Cloud.netapp.com – NetApp Cloud Central – not something to install, maintain, or configure- just something that exists, 7×24, optimized by workloads, accessible by callable routines.

1) discover endpoints – ActiveIQ. The fabric will tell YOU if there are opportunities to tier and save money, capacity, etc.

Netapp “My Data Fabric”

Create Cloud Volume on all clouds from one page and API. WOW. And BACK UP those cloud volumes with snaps and a new backup service for archival. Full solution.

How does the storage get to compute? Combined Greencloud and stackpoint, a control plane to deploy K8s, Istio, + Trident. WOW WOW.

“CREATE FABRIC APP” – WuuuuuuuuT?

Install trusted config to your PRIVATE cloud. Create a “cloud volume” on your PRIVATE INFRASTRUCTURE…..

CLOUD INSIGHTS- SaaS, no SW to install. Access to On-Prem and all public clouds, realtime performance on both. Pay for what you consume. Small to extremely large businesses.

OK, what I just saw was earth-shattering. There is a LOT of learning to do!!!

9:50 Now a customer who deals with Genome Sequencing.

“WuXi NextCODE” – the internet of DNA

This guy just EXUDES scientist.

Single human genome sequencing in hours. 3 Billion Letters. Understand the millions of differences from one human to another. 40 Exabytes/yr

Expected to be the biggest datasets in the world

Marry this data with clinical and other patient and relative data.

GORdb Platform overview

Not sure if they could’ve gotten this done without Netapp Cloud Volumes functionality. By the way, what other storage company is doing what Netapp is doing with on-prem, in-cloud instances, and in-cloud services? NONE. In fact, none are even CLOSE, and that is astounding- there will be one storage, er, data services, company standing in 10 years.

10:02a

George: Netapp #1 in US public services, Germany, and Japan, and ALL the biggest cloud providers. That’s a bold statement.

That’s a wrap until tomorrow!

#SFD15: Datrium impresses

If I had to choose the SFD15 presenter with the most impressive total solution, it would be Datrium.  We saw some really cool tech this week all around but Datrium showed me something that I have not seen from too many vendors lately, specifically a deep and true focus on the end user experience and value extraction from technology.

Datrium is a hyperconverged offering that fits the “newer” definition of HCI, in that the compute nodes and storage nodes scale separately. There’s been an appropriate loosening of the HCI term of late, with folks applying it based on the user experience rather than a definition specified by a single vendor in the space. Datrium takes this further in my opinion by reaching for a “whole solution” approach that attempts to provide an entire IT lifecycle experience – primary workloads, local data protection, and cloud data protection – on top of the same HCI approach most solutions only offer in the on-premises gear.

From the physical perspective, Datrium’s compute nodes are stateless, and have local media that acts very much like a cache (they call it “Primary storage” but this media doesn’t accept writes directly from the compute layer).  They are able to perform some very advanced management features of this cache layer, including global dedupe and rapid location-aware data movement across nodes (I.e. when you move a workload), so I’ll compromise and call it a “super-cache”. Its main purpose is to keep required data on local flash media, so yeah, it’s a cache. A Datrium cluster can scale to 128 nodes, which is plenty for its market space since a system that size tested out at 12.3M 4k IOPS with 10 data nodes underneath.

The storage layer is scale-out and uses erasure coding, and internally leverages the Log-Structured File System approach that came out of UC Berkeley in the early 90’s. That does mean that as it starts filling up to 80%+, writes will cost more. While some other new storage solutions can boast extremely high capacity utilization rates, this is a thing we’ve had to work with for a long time with most enterprise storage solutions.  In other words, not thrilled about that, but used to it.

Some techies I talk to care about the data plane architecture in a hyperconverged solution. There are solutions that place a purpose-built VM in the hypervisor that exposes the scale-out storage cluster and performs all data management options, and so the data plane runs through that VM. Datrium (for one) does NOT do that. There is a VIB that sits below the hypervisor, so that should appease those who don’t like the VM-in-data-plane model. There is global deduplication, encryption, cloning, lots of no-penalty snapshots, basically all the features that are table stakes at this point.  Datrium performs these functions up on the compute nodes in the VIBs.  There is also a global search across all nodes, for restore and other admin functionality. Today, the restore level is at the virtual disk/VM level. More on that later.

The user, of course, doesn’t really see or care about any of this. There is a robust GUI with a ton of telemetry available about workload and system performance. It’s super-easy from a provisioning and ongoing management perspective.

What really caught my attention was their cloud integration. Currently they are AWS-only, and for good reason. Their approach is to create a tight coupling to the cloud being used, using the cloud-specific best practices to manage that particular implementation. So the devs at Datrium leverage Lambda and CloudWatch to create, modify, monitor, and self-heal the cloud instance of Datrium (which of course runs in EC2 against EBS and S3). It even applies the security roles to the EC2 nodes for you so that you’re not creating a specific user in AWS, which is best practice as this method auto-rotates the tokens required to allow access.  It creates all the networking required for the on-prem instances to replicate/communicate with the VPC. It also creates the service endpoints for the VPC to talk to S3. They REALLY thought it through. Once up, a Lambda function is run periodically to make sure things are where they are supposed to be, and fix them if they’re not. They don’t use CloudFormation, and when asked they had really good answers why. The average mid-size enterprise user would NEVER (well, hardly ever) have the expertise to do much more than fire up some instances from AMI’s in a marketplace, and they’d still be responsible for all the networking, etc.

So I believe that Datrium has thought through not just the technology, but HOW it’s used in practice, and gives users (and partners) a deliverable best practice in HCI up front. This is the promise of HCI; the optimal combination of the leading technologies with the ease of use that allows the sub-large enterprise market to extract maximum value from them.

Datrium does have some work ahead of it; they still need to add the ability to restore single files from within virtual guest disks, and after they can do that they need to extract that data for single-record management later, perhaps archiving those records (and being able to search on them) in S3/Glacier etc.  Once they provide that, they no longer need another technology partner to provide that functionality. Also, the solution doesn’t yet deal with unstructured data (outside of a VM) natively on the storage.

Some folks won’t like that they are AWS only at the moment; I understand this choice as they’re looking to provide the “whole solution” approach and leave no administrative functions to the user. Hopefully they get to Azure soon, and perhaps GCP, but the robust AWS functionality Datrium manages may overcome any AWS objections.

In sum, Datrium has approached the HCI problem from a user experience approach, rather than creating/integrating some technology, polishing a front end to make it look good, and automating important admin features. Someone there got the message that outcomes are what matters, not just the technology, and made sure that message was woven into the fundamental architecture design of the product. Kudos.

Commvault Go 2017

I’m currently at the Commvault Go keynote, and have just heard from the only man who has trekked to both the north AND south poles. Tough task for sure.  The best slide he put up was from the end of his one year journey, when he returned to base only to discover that his ride home was 7/8 submerged.  Talk about needing a backup plan!

After that inspiring speech (which connected to the event by talking about respecting DATA and FACTS in the context of climate change), CEO Bob Hammer took to the stage to discuss the major themes of the event.

First and most immediately impactful to the industry is the release of Commvault’s HyperScale platform, which runs on commodity hardware and signals the beginning of the end of legacy 3-tier enterprise backup architecture. Backed by Red Hat GlusterFS, Commvault has created a back-end storage platform upon which they can layer a tuned version of their media agent/GRIDStore technology (which creates scalable, balanced, and fault-tolerant pools of data movers), all towards the purpose of providing a linearly scalable home for secondary data copies.

Notable is that CV has chosen to give customers a choice to use CV’s own hardware (offered as a subscription of HW and SW!) or run it on their own hardware from a number of verified hardware companies that span all the usual suspects (HPE, Cisco, Dell, etc).

More notable is that Cisco has aggressively gotten behind this product with their ScaleProtect offering, which is the CV HyperScale on their UCS platform, sold 100% through their channel.  I’ve spoken with 3 different Cisco sales reps in different regions and they are all planning on hitting their territories hard with this offering.

Hammer also talked about the pending release of new Analytics offerings, talking about using AI and Deep Learning to glean actionable information out of secondary data sets for the purposes of properly classifying, retaining, and/or deleting data as well as helping to achieve the ever-more-difficult objective of compliance.

More to come from this event- but I certainly look forward to seeing Commvault’s flag flying on the South Pole!