Information-235: Using decay models in a data infrastructure context

Information, like enriched uranium, doesn’t exist naturally. It has to be harnessed or created through the fusion of three elements:

  • Data
  • People
  • Processes

This is important to understand – many confuse data with information. Data by itself is meaningless, even if that data is created by combining other data sources or rolling it up into summarized data sets. The spark of information, or knowledge, occurs when a PERSON applies his/her personal analysis (process) to the data to enable a timely decision or action.

Once that information has been created and used, its importance and relevance to the informed immediately begin to degrade – at a rate that varies with the type of information it is.

This is not a new idea – more than a few academic and analytical papers have been published that discuss models for predicting the rate at which information decays, so I can thankfully leave that math to the mathematicians. However, the context of those papers is data analytics, and how to measure the reliability or relevance of datasets in creating new, actionable information.
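For reference only (this is the simplest illustrative form, not a formula lifted from any of those papers), the canonical decay model is exponential decay with a half-life:

\[
  V(t) = V_0 \, e^{-\lambda t} = V_0 \, 2^{-t / t_{1/2}}, \qquad t_{1/2} = \frac{\ln 2}{\lambda}
\]

where V(t) is the remaining relevance of a piece of information at age t, λ is its decay constant, and t_{1/2} is its half-life.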

I believe that this decay construct can become a most valuable tool for the infrastructure architect as well, if we extend the decay metaphor a bit further.  

Just as enriching a radioactive isotope transforms it into something ELSE, when a radioactive isotope DECAYS it transforms into other, less radioactive isotopes, and sometimes even those isotopes decay further in a multi-step decay chain. We can see that same behavior with information and its related data.

Let’s take the example of a retail transaction. That transaction’s information is most important at the point of sale – hundreds of data points get fused together there, including the credit card info and authorization, product SKUs, pricing, and the employee record; just think of everything you see on a receipt and multiply by x. That information is used that day to figure the day’s sales, that week perhaps to figure a salesperson’s commission, and that month to figure out how to manage inventory.

A subset of that transaction’s information will get used in combination with other transactions to create summarized information for the month, quarter, and year. The monthly and quarterly data will be discarded after a few years as well, leaving only the yearly summaries.
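A minimal sketch of that decay chain in Python might look like the following – every stage name, size, and lifetime below is a made-up, illustrative number, not a measurement from any real retail system:

    from dataclasses import dataclass

    # Each stage of the "decay chain" keeps a shrinking subset of the original
    # transaction and stays actionable for a progressively longer window.
    # All values are illustrative assumptions.
    @dataclass
    class Stage:
        name: str
        approx_size_bytes: int    # footprint of what survives at this stage
        actionable_for_days: int  # roughly how long anyone acts on it

    DECAY_CHAIN = [
        Stage("full POS record (SKUs, card auth, employee, pricing)", 50_000, 1),
        Stage("daily sales rollup", 2_000, 30),
        Stage("monthly / quarterly summary", 500, 730),
        Stage("yearly summary", 100, 3_650),
    ]

    for parent, child in zip(DECAY_CHAIN, DECAY_CHAIN[1:]):
        kept = child.approx_size_bytes / parent.approx_size_bytes
        print(f"{parent.name} -> {child.name}: ~{kept:.0%} of the data carries forward")

The point isn’t the numbers; it’s that each decay step leaves behind something smaller and longer-lived, while the full parent record still has to live somewhere.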

So in time, the information that was so important and actionable on day one becomes nothing but a faint memory, yet the FULL DATA SET on that transaction is likely going to be stored, in its entirety, SOMEWHERE. That somewhere, of course, we’ll call our Data Yucca Mountain.

What does this mean to the infrastructure architect?

If one can understand the data sets that create information, the sets of information that get created from those data sets, and the “depleted” information left behind (think data warehouses and analytics), then one should be able to construct the math not only to design the proper places for data to sit given a specific half-life, but to SIZE those places correctly.
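As a very rough sketch of what that math could look like (everything here – the half-life, ingest rate, tier names, and relevance thresholds – is an assumed number for illustration, not a recommendation):

    import math

    HALF_LIFE_DAYS = 30     # assumed half-life of this data class's relevance
    DAILY_INGEST_GB = 50    # assumed ingest rate for this data class

    # Tiers defined by relevance bands: data stays on a tier while its remaining
    # relevance (modeled as 2 ** (-age / half-life)) is inside the band, then
    # gets demoted to the next, cheaper tier.
    TIERS = [
        ("hot (flash)",   1.00, 0.50),
        ("warm (disk)",   0.50, 0.05),
        ("cold (object)", 0.05, 0.00),
    ]

    def age_at_relevance(r: float) -> float:
        """Invert relevance(age) = 2 ** (-age / HALF_LIFE_DAYS)."""
        return float("inf") if r <= 0.0 else HALF_LIFE_DAYS * math.log2(1.0 / r)

    for name, upper, lower in TIERS:
        start, end = age_at_relevance(upper), age_at_relevance(lower)
        if math.isinf(end):
            print(f"{name}: everything older than ~{start:.0f} days (sized by retention policy)")
        else:
            dwell = end - start  # days each ingested day of data spends on this tier
            print(f"{name}: ages {start:.0f}-{end:.0f} days -> ~{DAILY_INGEST_GB * dwell:,.0f} GB at steady state")

Run per information class (each with its own half-life), a model like this gives a first-order answer to both questions: where a given dataset should sit at a given age, and how big each tier needs to be.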

This model also gives the architect an angle for asking questions of the business users of information (and data), which gives him/her the “big picture” needed to align infrastructure with the true direction and operations of the business. Too often, infrastructure is run in a ‘generic’ way, and storage tiers are built by default rather than by design.

Building this model will take quite a bit of work, but it will go a long way towards ensuring alignment between the IT Infrastructure (or cloud) group and the business, and provide a much clearer ROI picture in the process.   

 
