Your Refrigerator is Full…of Big Data
I do most of the grocery shopping in my house. You’d think that would mean I get to decide things like what we’re having for dinner or which brand of raisin bran we keep in the pantry, but you’d be wrong.
Usually, a trip to the grocery store starts with my wife writing a list of things she wants me to pick up, and me forgetting to take it with me as I leave. On the good days, one of us will notice before it’s too late and my wife will use her phone to zap it over to me. On the bad days I’ll decide to wing it, which means I remember some of the list, buy some stuff we didn’t need yet, and forget the rest. Yes, that means I end up going to the store twice and wasting some money on top of it.
These weekly excursions usually end in the same way – with my wife commenting that I should invent something that makes this problem go away. Then she will remind me that if I’d invented “the Cloud” when she told me to fix her problem syncing her music across all of her devices we would be rich. Usually, I tune that last part out, but this past go-round I gave it a little thought. Am I working to solve part of this problem already?
Simple problems like ‘when do I need more milk?’ are actually far more complex than they might seem at first glance. To get started, you need to know things like when you got the milk, what the expiration date is, and when it was first opened. On top of that, you need to track actual consumption and estimate expected consumption to make sure you get more milk (hopefully delivered) just in time, but always have enough on hand to make whatever dish strikes your fancy. That last part is really hard, because it requires a lot of behavioral information and some complex analytics. Not to mention that it requires you to be able to see how seemingly unrelated events occur over time.
What do I mean by that? Time and events have a huge impact on a little thing like when you need more milk. The obvious ones are your average consumption of milk over time, and how long it’s been since you opened your current carton. But what if you bought potatoes or cocoa powder the last time you were at the store (or had your groceries delivered)? Or more complex still, what if your consumption of the white stuff just goes up when it’s cold out, or near the holidays?
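The obvious inputs, at least, reduce to simple arithmetic. Here is a minimal sketch in Python of what that might look like. Everything in it is an assumption made up for illustration: the `Carton` fields, the seven-day shelf life once opened, and the two-day delivery lead time are hypothetical, not how any real service works.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Carton:
    purchased: date
    expires: date
    opened: Optional[date]  # None if the carton is still sealed
    liters_left: float


def reorder_date(carton, daily_liters, today,
                 opened_shelf_days=7, lead_days=2):
    """Return the date to order more milk so it arrives just in time."""
    # Day the carton runs dry at the household's average rate.
    if daily_liters > 0:
        days_left = int(carton.liters_left / daily_liters)
        runs_out = today + timedelta(days=days_left)
    else:
        runs_out = carton.expires

    # Day the milk goes bad: the printed date, or a fixed number of
    # days after opening (an assumed rule), whichever comes first.
    spoils = carton.expires
    if carton.opened is not None:
        spoils = min(spoils, carton.opened + timedelta(days=opened_shelf_days))

    # Order early enough to cover the assumed delivery lead time,
    # but never earlier than today.
    unusable = min(runs_out, spoils)
    return max(today, unusable - timedelta(days=lead_days))
```

Note what the sketch leaves out: `daily_liters` is a flat average, so it knows nothing about the potatoes, the cocoa, or the weather. That is exactly the part that gets hard.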
Companies have been collecting some of this data for years, but it’s incomplete and difficult to decipher – making it less useful to you than it should be. They know that you use more milk if you also buy potatoes (and some other, less obvious correlations – like if they give you a coupon for cocoa you’re likely to buy more milk), but they don’t know that you usually have your potatoes on Thursday night, and that you’re short on milk the following day. To learn that, companies need your side of the data, as well as tools designed to leverage that data in combination with their existing sources. Oh, and maybe a bit of a technological nudge.
That’s where Hadoop and Big Data analytics tools come in. The breadth and size of the data will require Hadoop, and understanding that data will require a new generation of analytics tools – can you imagine trying to make sense of it all with your current toolset?
So you’ll start, slowly at first, by collecting your data in a single repository, Hadoop. Once you’ve got that going, you’ll point your Big Data Analytics tool of choice at the corpus and start exploring the data – looking for insights that are only obvious because you are looking at your data all together, looking at it both in the aggregate and as a series of events. This will allow you to see how complex behaviors of an individual impact other parts of the system you’re exploring.
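As a toy illustration of looking at data as a series of events, take the potatoes-and-milk example from earlier. The sketch below is plain Python, not Hadoop or any particular analytics tool, and the event names and dates are invented for the example. It simply asks how often one kind of event is followed by another the next day.

```python
from datetime import date, timedelta


def next_day_rate(events, trigger, outcome):
    """Share of `trigger` days that are followed by `outcome` the next day."""
    trigger_days = {d for d, name in events if name == trigger}
    outcome_days = {d for d, name in events if name == outcome}
    if not trigger_days:
        return 0.0
    hits = sum(1 for d in trigger_days
               if d + timedelta(days=1) in outcome_days)
    return hits / len(trigger_days)


# Hypothetical household event log: (day, event name).
events = [
    (date(2024, 1, 4), "potatoes"),     # a Thursday
    (date(2024, 1, 5), "milk_short"),
    (date(2024, 1, 11), "potatoes"),
    (date(2024, 1, 12), "milk_short"),
    (date(2024, 1, 18), "potatoes"),
    (date(2024, 1, 23), "milk_short"),  # unrelated to potato night
]

rate = next_day_rate(events, "potatoes", "milk_short")
```

With this made-up log, two of the three potato nights are followed by a milk shortage. A real system would be doing this across many event types and many households at once, which is where the scale problem comes from.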
Believe it or not, most (if not all) of the technology required to do this already exists; it simply isn’t being put to this purpose. Major retailers are already using radio frequency identification (RFID) tags to track inventory – we just need to push that technology into the home. I imagine that someday soon, your refrigerator and pantry will both have RFID or near-field communication (NFC) readers built in and connected to your home network, pushing data to your preferred subscription services.
You’ll give those services access to it for the sheer convenience of not having to worry about running out of a staple, or maybe for a small discount on your purchases. Those service vendors will use this data to analyze your purchasing and usage together, and correlate that with other sources of data like the weather or the temperature you like to keep your house, collected straight from your learning thermostat. But without Hadoop and other Big Data technologies, none of it would be possible – they’re the glue that makes just-in-time delivery of household items a reality.
Keith McClellan leads up Federal Engineering at Platfora, and has been focused on Big Data and related technologies for most of his career. If you’re interested in his random musings, he tweets @keithmcc and occasionally writes for the Platfora blog.