If you are embarking on an IoT project and you are reading this, you already have a sense that the data generated by your devices is valuable and must therefore be captured, analyzed and leveraged for your (and your customers’) benefit. This is true. You are also likely to see the following concepts conflated: IoT, Big Data, and Analytics. I guess that makes sense to some degree; billions of connected devices are going to generate large amounts of data, so clearly a big data solution is needed, yes? No.
Start with Small Data
What you are likely to need first and foremost is just, well, data. There isn’t really a buzzword for that, so let’s call it the opposite of Big Data. What you need first is Small Data. Think of Small Data as describing what your devices are doing at the moment and over time. If you make smart light bulbs, it is useful to know when they stop working. For instance, you would instrument the code in your devices to detect when the LEDs burn out and send you a failure notification. For each individual light bulb, you would then know whether a product failure was within warranty, and you could contact that customer to let them know a replacement is on the way. This is a small example of why collecting data from connected objects is important.
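As a rough illustration of that warranty check, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `FailureEvent` record, its field names, and the two-year warranty period are hypothetical and would depend on what your devices actually report.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed warranty length; a real product would define its own terms.
WARRANTY_PERIOD = timedelta(days=730)


@dataclass
class FailureEvent:
    """Hypothetical failure notification sent by one light bulb."""
    device_id: str
    purchase_date: date
    failed_on: date


def within_warranty(event: FailureEvent,
                    warranty: timedelta = WARRANTY_PERIOD) -> bool:
    """Return True if the bulb failed before its warranty ran out."""
    return event.failed_on - event.purchase_date <= warranty


event = FailureEvent("bulb-0001", date(2023, 1, 15), date(2024, 6, 1))
if within_warranty(event):
    print(f"{event.device_id}: failed within warranty, ship a replacement")
else:
    print(f"{event.device_id}: failed out of warranty")
```

Note that this is a per-device lookup. It needs no statistics at all, which is what keeps it in Small Data territory.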
When does data get Big?
You may think that if you have sold a million light bulbs, tracking all that data would certainly require a “big” data solution, but that isn’t strictly true. Just collecting data about what your devices are doing, no matter how many devices you are tracking, is still in the realm of Small Data. We think of analytics in steps of complexity like this, based on what the analysis can tell you:
- What happened? (Small Data)
- Why did it happen? (Still Small Data in many cases, but can cross into Big Data)
- What is going to happen? (Big Data)
- What should I do about it? (Big Data)
Knowing that a light bulb failed is important. Knowing that a bunch of light bulbs failed, and that they all came from the same production run, is even more useful. But it is still Small Data, because the analysis is based on attributes that the light bulbs themselves were able to report.
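Grouping failures by production run could look something like the sketch below. The report format and the `production_run` field are hypothetical; the point is only that a simple count over device-reported attributes is enough.

```python
from collections import Counter

# Hypothetical failure reports; each bulb reports its own production run.
failures = [
    {"device_id": "bulb-0001", "production_run": "run-A"},
    {"device_id": "bulb-0107", "production_run": "run-B"},
    {"device_id": "bulb-0112", "production_run": "run-B"},
    {"device_id": "bulb-0140", "production_run": "run-B"},
    {"device_id": "bulb-0311", "production_run": "run-C"},
]


def failures_by_run(failure_reports):
    """Count how many failure reports came from each production run."""
    return Counter(report["production_run"] for report in failure_reports)


counts = failures_by_run(failures)
worst_run, worst_count = counts.most_common(1)[0]
print(f"Most failures: {worst_run} ({worst_count} bulbs)")
```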
You cross into Big Data when finding the root cause of light bulb failures requires statistical analysis of large data sets over time to spot recurring patterns or anomalies. In our light bulb example, looking at all your data longitudinally and identifying statistically significant failure rates during measured periods of uneven voltage (so measured because you managed to cram a voltmeter into your light bulbs, perhaps?) can not only tell you the root cause of your product failures, but can also provide a predictive indicator of failure whenever voltage is outside a tested band. You might choose to improve your product based on this information.
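To make "statistically significant" a little more concrete, here is one possible sketch: a two-proportion z-test comparing failure rates for bulbs that saw out-of-band voltage against bulbs that did not. The choice of test, the function name, and the counts are all assumptions for illustration, not measurements from any real product.

```python
import math


def two_proportion_z(fail_a, n_a, fail_b, n_b):
    """Two-proportion z-test for a difference in failure rates.

    Returns the z statistic and a two-sided p-value computed from the
    standard normal distribution.
    """
    p_a = fail_a / n_a
    p_b = fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: failures among bulbs that experienced out-of-band
# voltage versus bulbs that stayed inside the tested band.
z, p_value = two_proportion_z(fail_a=60, n_a=1000, fail_b=20, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

A small p-value here would suggest the difference in failure rates is unlikely to be chance alone. A real analysis would also need to account for confounders such as bulb age or usage patterns.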
What about other data?
When thinking about your IoT deployment, consider your limitations. You are unlikely to put a voltmeter in your light bulbs. So can you ever determine the real cause of your product failure rate, or are you just going to accept a certain breakage percentage? By incorporating external data, you may be able to arrive at a confidence level about your products that would serve the same purpose. If your analytics solution correlates your light bulb data with available power grid data, it may tell you that there is a statistically significant relationship between uneven electrical events and product failures, enabling you to conclude that the products themselves work fine except when the power to them is dirty or spiky. You could then add a cheap surge protector to the next generation of the product and watch your product failure rate drop to near zero, adding real dollars to the bottom line.
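A very simplified sketch of that kind of correlation follows. It assumes you can obtain external grid data as a set of (region, day) pairs on which a voltage event was reported, and that each device record carries a region and a date. Both data shapes and all field names are hypothetical.

```python
from datetime import date

# Hypothetical external data: (region, day) pairs with a reported grid event.
grid_events = {
    ("north", date(2024, 3, 1)),
    ("south", date(2024, 3, 2)),
}

# Hypothetical device data: one record per bulb per day.
bulb_days = [
    {"device_id": "b1", "region": "north", "day": date(2024, 3, 1), "failed": True},
    {"device_id": "b2", "region": "north", "day": date(2024, 3, 1), "failed": False},
    {"device_id": "b3", "region": "south", "day": date(2024, 3, 1), "failed": False},
    {"device_id": "b3", "region": "south", "day": date(2024, 3, 2), "failed": True},
    {"device_id": "b2", "region": "north", "day": date(2024, 3, 2), "failed": False},
    {"device_id": "b4", "region": "south", "day": date(2024, 3, 3), "failed": False},
]


def failure_rates_by_grid_event(records, events):
    """Failure rate for bulb-days with and without a reported grid event."""
    totals = {True: 0, False: 0}
    failed = {True: 0, False: 0}
    for rec in records:
        exposed = (rec["region"], rec["day"]) in events
        totals[exposed] += 1
        failed[exposed] += int(rec["failed"])
    return {
        exposed: (failed[exposed] / totals[exposed] if totals[exposed] else None)
        for exposed in (True, False)
    }


rates = failure_rates_by_grid_event(bulb_days, grid_events)
print(f"With grid event: {rates[True]:.0%}, without: {rates[False]:.0%}")
```

With only six records this proves nothing, of course. The join is the easy part; the statistical confidence comes from running the same comparison over a very large number of bulb-days.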
When considering the IoT analytics needs of your project, keep in mind that the most useful solution depends on what you are trying to accomplish, but that most sensor-based IoT projects are going to rely more on Small Data than on Big Data. A solution that provides ample insight at that level, and that also offers deep analysis and external data sources off the shelf, may end up being the only solution you ever need.