Text Analytics 2014

It’s past time for my yearly status/look-ahead report, on text analytics technology and market developments. Once again, I’ll cover the solution side — technical and financial — and also the conference scene and a few ways you can learn more about today’s text-analytics world.

My market analysis may surprise you. I’ll get to that point, and the rest of my report, after a quick invitation:

Where’s that secret decoder ring when I need it?

I plan to release a market study, Text Analytics: User Perspectives on Solutions and Providers, later this spring. It will be a follow-on to the studies I conducted in 2011 and 2009. I’ve held the survey open. If you are a current or prospective text-analytics user, please respond by April 16. The survey will take only 5-10 minutes to complete. I appreciate your help.

Text Analytics as a Market Category

I reported positive technology and market outlooks in each of the last few years, in 2013 and in 2012. This year is a bit different. While technology development is strong, driven by the continued explosion in online and social text volumes, I feel that the advance of text analytics as a market category has stalled. The question is not business value. The question is data focus and analysis scope. Blame big data.

“Text analytics” naturally implies work primarily or exclusively with text. Contrast with big data analytics as a category (and put aside that we’ve been seeing a backlash against the “big data” label, as variously a) vague to the point of being without referent, b) limited to Hadoop, and c) more-talked-about-than-done). The big-data concept captures, in its Variety V, the notion that we should assimilate and integrate data of all relevant sources and types. I’ve been preaching integrated analytics for years. Integrated analysis, whether labeled big data analytics, social intelligence, or something else, is preferable and possible for the majority of business needs. Text analytics, so often, isn’t, and shouldn’t be, enough.

Secondly, text-analytics technology is increasingly delivered embedded in applications and solutions, for customer experience, market research, investigative analysis, social listening, and many, many other business needs. These solutions do not bear the text-analytics label.


Lexalytics CEO Jeff Catlin talks about “highly accessible text analytics that doesn’t look like text analytics – where it’s just a natural part of how you go about your day.” About his own company, Jeff says, “We’re pushing partnerships and technology in 2014 that can help drive this once daunting technology to where it’s functionally invisible, just like search.”

Basis Technology co-founder Steve Cohen has a similar perspective. Steve says, “I’m not sure that text analytics is a separable ‘thing,’ or at least enough of a thing, to stand on its own as a market. We find that we sell to a search market (which is well enough understood), a compliance market (also mature), and have a growing activity in a solutions market that you could call ‘text enabled information discovery’.”

Fiona McNeill, Text Analytics Product Marketing Manager at SAS, could have been speaking about a broad set of solution providers, and not just her own employer, when she told me, “we will continue to extend text-based processing and insights into traditional predictive analysis, forecasting, and optimization.”

According to Clarabridge co-founder and CEO Sid Banerjee, “the market has seen a lot more competition by way of historically non-text analytics vendors adding various forms of text analytics solutions to their product mix.” Sid continues, “workforce management vendors… and even social CRM, and social marketing vendors [have] started adding sentiment mining and text analytics capabilities into their product mix.”

Still, whether within or across market-category boundaries, there are significant text-analytics technology, market, and community developments to report.

Text Technology Developments

Certain text-technology developments track those of other technology domains. I’ll list several:

  • Modeling advances via deep learning and, especially, unsupervised and semi-supervised methods.

AlchemyAPI CEO Elliot Turner makes the case for these technologies as follows, explaining, “deep learning can produce more robust text and vision systems that hold their accuracy when analyzing data far different from what they were trained on. Plus, unsupervised techniques make it practical to keep up with the rapid evolution of everyday language.” According to Elliot, 15% of the queries submitted to Google, over 500 million daily, are ones the search engine has never seen before, a rate unchanged in 15 years. Elliot states, “the ability to understand today’s short, jargon-filled phrases, and keep up with tomorrow’s new words, is predicated on mastering unsupervised, deep learning approaches.”

  • Scale-out via parallelized and distributed technologies.

SAS’s Fiona McNeill says “the ease of analyzing big text data (hundreds of millions or billions of documents) has improved over the past year,” which for SAS means “extensions of high-performance text mining to new distributed architectures, like Hadoop and Cloudera.” Looking ahead, Fiona explained that SAS “will continue to extend technologies and methods for examining big text data – continuing to take advantage of multi-core processing and distributed memory architectures for addressing even the most complex operational challenges and decisions that our customers have.”

Let’s call this direction NoHadoop, as in, Not Only Hadoop. Attensity, Digital Reasoning, HPCC Systems, Pivotal Greenplum, and Teradata Aster, among others, are doing interesting scaling work, building (on) a variety of technologies.

  • The rise of data as-a-service and the ascendance of APIs/Web-services and cloud implementations.

We know about data providers such as Gnip, DataSift, and Xignite. José Carlos González, co-founder of Spanish semantic-analysis developer Daedalus, describes how his company’s Textalytics service “represents a new semantic/NLP API concept in the sense that it goes well beyond the basic horizontal functionality that is being offered in the market: we also offer pre-packaged, optimized functionality for several industries and applications and the possibility for the customer to tailor the system with their dictionaries and models.”

You’ll find comparable capabilities in competing Web services, each with its strengths, such as AlchemyAPI, Bitext, Coginov, ConveyAPI, DatumBox, and Semantria.

  • Knowledge-graph data representations.

Graphs are natural structures for management and query of data linked by complex interrelationships. Digital Reasoning builds one. So does Lexalytics, a concept matrix. Expert System’s Cogito technology relies on a semantic network, and check out the Dandelion work from SpazioDati.

  • In-memory processing, cloud deployment, and streaming data capabilities.

“In memory” is SAP HANA’s middle name (figuratively of course), with text analysis an integral part of the platform, although I will admit that it isn’t yet a selling point for many other vendors. That latter situation is changing. Material that IBM has online, about InfoSphere Streams, provides a very helpful (even if vendor specific) illustration of an implementation of text analysis on data streams. SAS’s Fiona McNeill refers to “linguistic rules augmenting business and predictive scoring in real-time data streams” and “moving more and more of our capabilities to cloud architectures.” These big-company moves are just text-analytics examples of a general IT deployment trend.

Other advances are specific to text. Is there any other technology domain that relies so heavily on classification rules? In the text case, we’re talking rules that discover meaning/sense (that is, context-dependent relationships) by capturing and applying lexical chains (word nets) and syntactic patterns and structures such as taxonomies and ontologies (those knowledge graphs).
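For readers who want a concrete picture of rule-based text classification, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the two-category taxonomy, the "not received" pattern, and the label names. It is a toy under those assumptions, not a description of how any vendor's product works.

```python
import re

# Hypothetical mini-taxonomy: each category lists terms that signal it.
TAXONOMY = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "shipping": {"delivery", "package", "courier", "tracking"},
}

# Hypothetical syntactic pattern: a negation word, optionally followed by
# up to two other words, then a verb meaning the item showed up.
NOT_RECEIVED = re.compile(
    r"\b(?:never|not)\s+(?:\w+\s+){0,2}(?:received|arrived|arrive)\b",
    re.IGNORECASE,
)


def classify(text):
    """Return the sorted list of labels whose rules fire on the text."""
    # Lexical rule: a category fires if any of its terms appears as a token.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    labels = {category for category, terms in TAXONOMY.items() if tokens & terms}
    # Pattern rule: word order matters here, so match against the raw text.
    if NOT_RECEIVED.search(text):
        labels.add("non_delivery")
    return sorted(labels)


if __name__ == "__main__":
    print(classify("My package never arrived and I want a refund"))
```

The term lookup stands in for the lexical side (word lists organized under a taxonomy) and the regular expression stands in for the syntactic side, where the relationship between words, not just their presence, carries the meaning. Real systems layer many such rules and usually combine them with statistical models.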

But that’s all high-concept stuff, and I’m not writing for scientists, so this bit of technology coverage will be enough for now. Now, on to the business side.

Follow the Money: Investments

Investment activity is a forward-looking indicator, suggesting optimism about companies’ growth potential and profitability.

The big 2013 (+ early 2014) funding news, in the text analytics space, was:

  • An $80 million equity investment in Clarabridge, “to further expand its global operations, power continued product innovation, grow its employee base and increase reach through marketing and strategic transactions to capitalize on escalating market demand for CEM solutions.” (Surely, a substantial portion of that funding went to buy out earlier investors.)
  • Expert System’s February 2014 IPO, with $27 million in shares sold on the AIM Italia exchange. The money raised will go in part to fuel expansion in North America. (Added April 12:) The company reported to me a pre-IPO valuation of €27 million.

And the most interesting 2013 acquisition is one that went down in the first week of 2014, interaction-analytics leader Verint’s purchase of customer-experience vendor Kana. What does this transaction have to do with text analytics, you ask? Part of the purchase is the Overtone listening/analysis technology that Kana acquired in 2011. (What the Kana acquisition will mean for Verint’s relationship with Clarabridge, I can’t say.)

Microsoft’s March 2013 acquisition of Netbreeze GmbH, which “combines modern methods from Natural Language Processing (NLP), data mining and semantic text analysis to support 28 different writing systems,” also fits the category, although Swiss Netbreeze wasn’t a prominent text-analytics player. Text analytics was a sideline for speech-analytics vendor Utopy, which Genesys acquired in early 2013 to create a customer-service “actionable (interaction) analytics” offering. One more, from September 2013: “Intel Has Acquired Natural Language Processing Startup Indisys, Price ‘North’ Of $26M, To Build Its AI Muscle,” TechCrunch reported.

Other transactions in the space:

(While Allegiance and Networked Insights use outside text-analysis software, text capabilities are central to their offerings.)

Interestingly, market-research vendor Vision Critical divested itself of the DiscoverText software it had acquired in early 2013, selling it back, later in the year, to inventor Texifter.

In past years, I’ve reported solution providers’ sales results. I’m not going to do that this year. Revenue figures are always hard to get: non-publicly traded companies guard their numbers, and most numbers available are from larger, diversified providers whose business-line revenue is hard to determine. So I’m sorry to disappoint, but I won’t be relaying 2012-2013 revenue growth figures.

Reports and Community

Let’s finish with opportunities to learn more, starting with conferences, albeit unchronologically.

The market for business-focused text-analytics conferences is not thriving.

The Text Analytics Summit is skipping the spring (I chaired the event from its 2005 founding through last year’s Boston summit but am no longer involved) but reportedly will be back in the fall, in San Francisco. The Predictive Analytics World folks haven’t announced a repeat of their own fall, east-coast Text Analytics World event, and it appears their March 2014 event drew only a small audience of under 40 attendees. Finally, the Innovation Enterprise folks appear to have abandoned the field after running Text Analytics Innovation conferences in 2012 and 2013. Semantic Technology & Business may be your best business-conference choice, August 19-21 in San Jose.

My own Sentiment Analysis Symposium, which includes significant text-analysis coverage, has been doing well. The March 5-6, 2014 symposium in New York had our highest turn-out yet with 178 total registered. I have presentations posted and should have videos on-line some time in April. I’m planning an October, 2014 conference in San Francisco.

On the vendor side, Clarabridge Customer Connections (C3) is slated for April 28-30 in Miami, and I enjoyed attending a day of the 2014 SAS Global Forum, which took place March 23-26 near Washington DC. Text analytics and sentiment analysis and their business applications are only a small (but growing, I believe) component of the overall SAS technology and solution suite, but there was enough coverage to keep me busy. The experience would be similar at other global-vendor conferences such as SAP’s and IBM’s. Again considering text-focused vendors, Linguamatics held a spring users’ conference April 7-9 in Cambridge, UK, but really, beyond that and Clarabridge’s conference, that’s all I know about.

Moving to non-commercial, research-focused and academic conferences:

NIST’s Text Analysis Conference is slated for November 17-18, 2014, near Washington DC.

I’ll be in Paris to speak at the International conference on statistical analysis of textual data (JADT), June 3-6. JADT overlaps the 8th instance of the International Conference on Weblogs and Social Media (ICWSM), June 2-4 in Ann Arbor. A few weeks later, LT-Innovate, covering the broader set of language technologies, will take place June 24-25 in Brussels.

The annual meeting of the Association for Computational Linguistics is an academic conference to check out, June 23-25 in Baltimore.

And reports?

Worth review is LT-Innovate’s March 2013 report, Status and Potential of the European Language Technology Markets. So is the Hurwitz Victory Index Report covering text analytics, which includes a number of useful vendor assessments. (Added April 16:) TDWI’s report, How to Gain Insight from Text, written by analyst Fern Halper, is useful for those seeking text-analytics implementation ideas.

Monitor my Twitter account, @SethGrimes, for notice of release of my own Text Analytics 2014: User Perspectives on Solutions and Providers, which I should have out some time in May.

Finally, for a bit more on the same topic as this article, read my December, 2013 article, A Look at the Text Analytics Industry, and my February, 2014 Sentiment Analysis Innovation: Making Sense of the Internet of People.

Who Am I?

As you may have picked up — in case you don’t know me — I make a living in part by understanding the various facets of markets — academic, research, solution provider, and the spectrum of technology users — that include text analytics, sentiment analysis, semantics/synthesis, and other forms of data analysis and visualization. This understanding is the basis of my consulting work, and of the writing I do in my own Breakthrough Analysis blog and for a variety of publications, and of my conference organizing.

Disclosure is in order. I’ve mentioned many companies. I consult for some of them. Some are sponsoring my in-the-works text-analytics market study. Some have sponsored my conferences and will sponsor my planned fall 2014 conference. I have taken money in the last year, for one or more of these activities, from: AlchemyAPI, Clarabridge, Converseon (ConveyAPI), Daedalus, Digital Reasoning, Gnip, IBM, Lexalytics, Luminoso, SAS, Teradata Aster, and Verint. Not included here are companies that have merely bought a ticket to attend one of my conferences.

I welcome the opportunity to learn about new and evolving technologies and applications, so if you’d like to discuss any of the points I’ve covered, as they relate to your own work, please do get in touch. Thanks for reading!
