Big data is all-encompassing, and that seems to be a problem. The term has been stretched in so many ways that in covering so much, it has come to mean — some say — too little. So we’ve been hearing about “XYZ data” variants. Small data is one of them. Sure, some datasets are small in size, but the “small” qualifier isn’t only or even primarily about size. It’s a reaction to big data that, if you buy advocates’ arguments, describes a distinct species of data that you need to attend to.
Nowadays, all data, big or small, is understood via models, algorithms, and context derived from big data. Our small data systems now effortlessly scale big. Witness: Until Excel 2007, Microsoft Excel spreadsheets maxed out at 256 columns and 65,536 rows. Excel 2007 raised the limit to 16,384 columns by 1,048,576 rows: over 17 billion cells. And it’s easy to go bigger, even from within Excel. It’s easy to hook this software survivor of computing’s Bronze Age, the 1980s, into external databases of arbitrary size and to pull data from the unbounded online and social Web.
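For a sense of scale, here is a quick Python check of the old and new worksheet limits quoted above. It simply multiplies columns by rows for each grid; the constant and function names are illustrative only and are not part of Excel or any Excel API.

```python
# Worksheet grid limits quoted in the text, as (columns, rows).
# Names are illustrative; nothing here comes from Excel itself.
OLD_LIMIT = (256, 65_536)
NEW_LIMIT = (16_384, 1_048_576)


def cells(limit):
    """Return the total number of cells in a (columns, rows) grid."""
    cols, rows = limit
    return cols * rows


old_cells = cells(OLD_LIMIT)
new_cells = cells(NEW_LIMIT)

print(f"old grid: {old_cells:,} cells")   # 16,777,216
print(f"new grid: {new_cells:,} cells")   # 17,179,869,184
print(f"growth:   {new_cells // old_cells}x")  # 1024x
```

The factor of 1,024 falls out because both grids are powers of two: 2^8 × 2^16 cells before, 2^14 × 2^20 after.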
So we see that small is a matter of choice rather than a constraint. You don’t need special tools or techniques for small data. Conclusion: The small data category is a myth.
Regardless, do discussions of small data, myth or not, offer value? Is there a different data concept that works better? Or, with an obsessive focus on data, are we looking at the wrong thing? We can learn from advocates. I’ll choose just a few and riff on their work.
Delimiting Small Data
Allen Bonde, now a marketing and innovation VP at OpenText, defines small data as both “a design philosophy” and “the technology, processes, and use cases for turning big data into alerts, apps, and dashboards for business users within corporate environments.” That latter definition reminds me of “data reduction,” a term for the sort of data analysis done a few ages ago. And of course, per Bonde, “small data” describes “the literal size of our data sets as well.”
I’m quoting from Bonde’s December 2013 guest entry in the estimable Paul Greenberg’s ZDNet column, an article titled 10 Reasons 2014 will be the Year of Small Data. (Was it?) Bonde writes, “Small data connects people with timely, meaningful insights (derived from big data and/or ‘local’ sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks.”
So (some) small data is a focused, topical derivation of big data. That is, small data is Mini-Me.
Other small data accumulates from local sources. Presumably, we’re talking about the set of records, profiles, reference information, and content generated by an isolated business process. Each of those small datasets is meaningful in a particular context, for a particular purpose.
So small data is a big data subset or a focused data collection. Whatever its origin, small data isn’t a market category. There are no special small data techniques, tools, or systems. That’s a good thing, because data users need room to grow, by adding to or repurposing their data. Small data collections that have value tend not to stay small.
Encapsulating: Smart Data
Tom Anderson builds on a start-small notion in his 2013 article, Forget Big Data, Think Mid Data. Tom advises that you consider cost when creating a data environment, sizing it to maximize ROI. His mid data concept starts with small data and incrementally adds affordable elements that will pay off. When I interviewed him in May 2013, Tom used another term, smart data, to capture the concept of (in my words) maximum return on data.
Return isn’t something baked into the data itself. Return on data depends on your knowledge and judgment in collecting the right data and in preparing and using it well.
This thought is captured in an essay, “Why Smart Data Is So Much More Important Than Big Data,” by Scott Fasser, director of Digital Innovation for HackerAgency. His argument? “I’ll take quality data over quantity of data any day. Understanding where the data is coming from, how it’s stored, and what it tells you will help tremendously in how you use it to narrow down to the bits that allow smarter business decisions based on the data.”
“Allow” is a key word here. Smarter business decisions aren’t guaranteed, no matter how well-described, accessible, and usable your datasets are. You can make a stupid business decision based on smart data.
Of course, smart data can be big and big data can be smart, contrary to the implication of Scott Fasser’s essay title. I used smart in a similar way in naming my 2010 Smart Content Conference, which focused on varieties of big data that are decidedly not traditional, or small, data. That event was about enhancing the business value of content — text, images, audio, and video — via analytics including application of natural language processing to extract information, and generate rich metadata, from enterprise content and online and social media.
(I’ve decided to focus my ongoing organizing elsewhere, however. The Sentiment Analysis Symposium applies the same technology set to discovering business value in attitudes, opinions, and emotions expressed in diverse unstructured media and structured data. The 8th go-around will take place July 15-16, 2015 in New York.)
But data is just data — whether originating in media (text, images, audio, and video) or as structured tracking, transactional, and operational data — whether facts or feelings. And data, in itself, isn’t enough.
Extending: All Data
I’ll wrap up by quoting an insightful analysis, The Parable of Google Flu: Traps in Big Data Analysis, by academic authors David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani, writing in Science magazine. As it happens, I’ve quoted Harvard University Professor Gary King before, in my 4 Vs For Big Data Analytics: “Big Data isn’t about the data. It’s about analytics.”
King and colleagues write, in their Parable paper, “Big data offer enormous possibilities for understanding human interactions at a societal scale, with rich spatial and temporal dynamics, and for detecting complex interactions and nonlinearities among variables… Instead of focusing on a ‘big data revolution,’ perhaps it is time we were focused on an ‘all data revolution,’ where we recognize that the critical change in the world has been innovative analytics, using data from all traditional and new sources, and providing a deeper, clearer understanding of our world.”
The myth of small data is that it’s interesting beyond very limited circumstances. It isn’t. Could we please not talk about it anymore?
The sense of smart data is that it allows for better business decisions, although positive outcomes are not guaranteed.
The end-game is analysis that exploits all data, both producing and consuming smart data, to support decision-making, measure outcomes, improve processes, and create the critical, meaningful change we seek.