A Look at the Text Analytics Industry

Text analytics is booming; it is an essential component of any comprehensive analytics initiative. Define text analytics as an ensemble of processes and technologies that discover and communicate business insights from online, social, and enterprise text. Applications range from life sciences and military intelligence to market research, customer experience, and financial marketing. The technology is surfaced via advanced (semantic) search and navigation, data-mining and exploratory-analysis workbenches, and software tools designed for everyday business uses.

Based on my discussions with business users and solution providers, it is clear that text-analytics market growth remains strong.

I recently fielded questions about prospects for the text-analytics industry from Max Breitsprecher, who is studying Business Information Systems at the business school ESCP Europe Berlin. Max is studying start-up and acquisition activity, one of my consulting sweet spots along with technology and market strategy. (My industry-related work in the coming months will include a third Text Analytics market study, a follow-on to my 2009 and 2011 studies, and there’s a very substantial text-analytics component to the Sentiment Analysis Symposium, a conference I organize. The symposium and associated workshops, slated next for March 5-6, 2014 in New York, focus on “human data” including attitudes, emotion, and intent captured in text.)

Max gave me permission to post our exchange, so here it is, a Look at the Text Analytics Industry:

Max Breitsprecher, Question 1> What challenges/risks do you see for the industry’s development? Which factors possess the potential to slow down future industry growth?

Seth Grimes> Significant business challenges for organizations that identify as text-analytics providers are the low cost of entry (note the availability of open-source technology) and the diversity of competition from both specialized and general-purpose technology providers. The market is not completely aware of the need for, and desirability of, domain adaptation of solutions to specific business problems, industries, and data sources (e.g., surveys versus microblogs versus online news).

The risk for a provider that cannot differentiate, that cannot articulate the value of software or a service beyond what’s provided by open-source tech and generic solutions, is, of course, inability to compete.

In addition, certain techniques such as sentiment analysis have been faced with questions about low accuracy and predictive usefulness. In this case, there’s the risk of inability to articulate value. These particular questions are easily answered — very briefly, the tech can deliver, but you have to make sure you’re applying it appropriately — but these sorts of questions reinforce the desirability of positioning a solution as solving a business problem rather than merely covering some technical function.

I don’t see adoption slowing, nor do I see a near- or medium-term risk of commoditization, which would slow industry revenue growth, given that much valuable information in sources remains untapped. Growth in demand for implementation and consulting services should be substantial, especially where those services involve more general Big Data analytics implementation.

There is risk, however, for providers whose technologies are costly to maintain and extend, for instance, technologies that rely on lexicons and rulesets.

Max Breitsprecher, Q2> Are technological aspects of main text analytic software solutions an important differentiator in industry (or do main vendors possess a comparable level in technological terms)? Herewith associated, do you expect differences in R&D budgets to play an important role for future vendor success?

Seth Grimes> Solutions on the market rely on a variety of approaches, mixing and matching statistical analysis with language rules and reliance on lexicons and taxonomies with machine learning. They exhibit a wide range of technological sophistication. The real market differentiator, however, is in packaging and application-focus of the underlying technology: installed versus Web service (via API); languages analyzed; information extracted from source materials; adaptation to particular business applications for customer experience, market research, pharmaceutical drug discovery, clinical medicine, intelligence, etc.

R&D counts, but ability to monetize R&D, to deliver a revenue-generating technology, application, solution, or service, is a more important industry factor.

Q3> Are there substantial switching costs for customers associated with the change from one text-analytics vendor to another?

Seth> Switching costs will vary in proportion to the degree to which the supplier’s offering is entwined with key business processes. At one extreme, we have suppliers that provide text analysis via Web services, accessed via an API. In principle, it is relatively simple to swap one API for another, meaning low switching costs. At the other extreme, text-analysis capabilities are embedded in end-user solutions. It’s costlier to switch from one business-solution provider to another, given that in a switch, users would need to learn new approaches, interfaces, and workflow associated with a different end-user solution.
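To make the low-switching-cost end of that spectrum concrete, here is a minimal Python sketch, not any vendor's actual API. It assumes the application talks to text-analysis services only through a small shared interface. The names `SentimentProvider`, `VendorAClient`, `VendorBClient`, and `flag_unhappy` are hypothetical, and the toy scoring rules stand in for real Web-service calls so the sketch runs offline.

```python
from typing import Protocol


class SentimentProvider(Protocol):
    """Hypothetical minimal contract the application depends on."""

    def sentiment(self, text: str) -> float:
        """Return a score in [-1.0, 1.0]."""
        ...


class VendorAClient:
    """Stand-in for one vendor's API wrapper (hypothetical; no network call)."""

    def sentiment(self, text: str) -> float:
        # Toy rule in place of a real Web-service request.
        return 1.0 if "good" in text.lower() else 0.0


class VendorBClient:
    """Stand-in for a second vendor that natively reports a 0-100 scale."""

    def _raw_score(self, text: str) -> int:
        # Toy rule in place of a real Web-service request.
        return 100 if "good" in text.lower() else 50

    def sentiment(self, text: str) -> float:
        # Adapter step: rescale 0..100 to -1..1 so callers see one convention.
        return (self._raw_score(text) - 50) / 50


def flag_unhappy(comments: list[str], provider: SentimentProvider) -> list[str]:
    """Application code: it knows only the shared interface."""
    return [c for c in comments if provider.sentiment(c) <= 0.0]


comments = ["Good service", "Slow delivery"]
flagged_a = flag_unhappy(comments, VendorAClient())
flagged_b = flag_unhappy(comments, VendorBClient())
```

Under these assumptions, changing suppliers means writing one new adapter class; `flag_unhappy` is untouched. An embedded end-user solution offers no such seam, which is where the retraining and workflow costs come in.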

Q4> How do you expect open-source solutions to affect the industry in the future? (e.g. Decrease in software revenues, foster adoption of text-analytics applications in smaller businesses, development of strong open-source community, etc.)

Open source lowers the barriers to entry for new solution providers. Open-source tools provide core services — and may be customized or extended to provide higher-order capabilities — although with some risk for those who use them, in particular that in most cases, you can’t control the project’s development.

From a market point of view: Availability of robust open-source tools allows solution providers to concentrate on higher-value, specialized, and domain-adapted functions and on business needs, rather than on “reinventing the wheel.”

Q5> What is the importance of patents and intellectual property in the field of text analytics? Is it a substantial barrier to entry for new vendors?

I can’t recall any significant patent or intellectual-property challenges associated specifically with text analytics. Perhaps that’s because fundamental approaches and techniques are the product of government and academic research that was not patented. I’m not saying there won’t be licensing deals and lawsuits. I’m saying that there haven’t been any to date (that I can think of), and the collection of public-domain algorithms and usable technologies is large enough that any particular IP hurdle can be worked around.

Q6> What is your market share estimate of Top 10 text analytic vendors combined? Do you expect a stronger consolidation of the industry in the short/medium term?

To estimate contribution to market share, in some cases (take a typical social-media analytics dashboard) you have to say, “Project X is 25% attributable to text analytics and 75% to other technologies.” Sysomos (Marketwired) and NetBase are social-media analytics examples of applications that are text-analytics reliant but whose value goes well beyond the included text analytics, as are Clarabridge, Kana, and Medallia in the customer-experience realm. There are examples in the e-discovery and search spaces. Consider: What’s the value of text analytics embedded in the Google, Yahoo, and Bing search and computational-advertising platforms and in products such as Apple Siri and Wolfram Alpha?

In creating market-size and market-share estimates, in the past, I’ve done this sort of value allocation, and also considered the value of in-house technology developed for own use in government and industry, for instance, at pharmaceutical companies.
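The value-allocation arithmetic can be sketched in a few lines of Python. This is only an illustration of the idea, not the method used in my studies: the product names, revenue figures, and attribution shares below are hypothetical, and the function name `text_analytics_revenue` is made up for the example.

```python
def text_analytics_revenue(products):
    """Sum each product's revenue weighted by its text-analytics share."""
    return sum(revenue * share for revenue, share in products.values())


# Hypothetical figures, in millions of dollars:
# product -> (total revenue, fraction attributable to text analytics)
products = {
    "social-media dashboard": (40.0, 0.25),
    "text-analysis API": (10.0, 1.00),
    "customer-experience suite": (60.0, 0.50),
}

attributed = text_analytics_revenue(products)  # 10 + 10 + 30, in $ millions
```

The hard part in practice is choosing the shares, which is a judgment call for each product; the summation itself is trivial.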

The top text-analytics vendors, measured by text-analytics-derived revenue, are certainly HP Autonomy, IBM, SAP, and SAS. I’d guesstimate that they’re together responsible for something approaching half of text analytics revenue. I’d guess that the next 6 dozen or so, in the $15-40 million annual range, generate another quarter, and the long tail of dozens of other vendors generates the rest.

But the revenue they produce is likely far exceeded by the value of the text analytics in the search-applications (including advertising) platforms. For instance, Content Analyst licenses its technology for use in e-discovery and compliance and other applications, and Lexalytics’ Salience engine is widely licensed for text information extraction, especially for sentiment analysis, delivering market value far beyond the revenues realized by those two companies.

Q7> How much does the sale of text analytics software itself and how much do associated services account in terms of revenues (average)?

I would assume that software/professional-services ratios match those typical in the data-analysis software industry. Without checking, I believe that would typically be an 8 to 1 ratio.

Back in 2011, I estimated 2010 text-analytics provider revenues, for installed software and Web services and associated support, to be approaching $1 billion annually, with sustained annual growth of 25-40%. I haven’t done a systematic market sizing since, but I’d say with some confidence that annual growth has been in/near that range, meaning cumulative revenues approaching $2 billion for 2014.

Q8> What are the main cost blocks of text analytics vendors?

Once again, costs likely track those of software companies in other domains: Marketing; sales, administration, and operations; and R&D. Given the variety of business models out there, there’s no definitive answer. The picture for, say, IBM or SAP will be very different from the picture for a new entrant such as Kanjoya, Luminoso, or TheySay.

Not every company succeeds, by the way. Attensity was early to the market, but due to marginal management, questionable executive hires, and excessive R&D and operational costs — all inference on my part — the company has lost significant ground in the last couple of years. Another, smaller provider, Janya, went out of business last year although the owners transferred the company’s intellectual property into another company.

Q9> In which strategic groups would you differentiate vendors? Which of these groups do you expect to outperform the market in the future and why?

That’s not so hard. Here are 4 non-orthogonal (i.e., overlapping) groupings:

a) Software tools & Web services.

b) Search-oriented applications (e.g., search, e-discovery, online commerce, …).

c) Enterprise/departmental applications (e.g., customer experience, contact center, market research, social/media analysis, …).

d) Data-analysis applications (BI, data mining, investigative analyses).

Who will outperform the rest of the market? Obviously, earlier-stage companies start with low revenue and can therefore outperform mature companies, if you measure revenue performance by revenue growth. If you measure according to profitability, it’s the solution-focused companies that will do best, my (b) and (c). But there’s loads of opportunity across all these groupings.
