Text Analytics in 2013

In last year’s Text Analytics in 2012, I invited you to check back with me in a year to see how 2012 panned out. Thirteen months later, I can tell you that 2012 was a good year for text-analytics adopters and solution providers alike.

But I’m not going to cover business users in this article. I’ll save them for a planned 2013 iteration of the market study I conducted in 2009 and 2011, Text/Content Analytics: User Perspectives on Solutions and Providers. Instead, I’ll cover the solution side, technical and financial, as well as the vibrant conference scene and a few ways you can learn more about today’s text-analytics world.

Text Technology Developments

I asked thought leaders at a number of solution providers about 2012’s most significant text-analytics developments and what we should look for in 2013. They came from Bitext, Clarabridge, Luminoso, Pingar, SAP, and SAS, a mix of small and large companies, all but start-up Luminoso with an international presence. Let’s start big…

One SAP 2012 accomplishment was “pushing down Natural Language Processing (NLP) as map-reduced batch jobs into Hadoop clusters for economical parallel processing and native text analysis within an in-memory database platform,” Anthony Waite, SAP senior text analysis product manager, told me. On the NLP front, Anthony specifically mentioned named entity recognition (names of persons, companies, places, etc.) and fact extraction, the identification of relationships among entities. SAP’s 2013 aims are even faster NLP and categorization, and deeper entity extraction across a broader set of languages, for greatly improved real-time performance on big unstructured text data. SAP will “leverage text analysis semantic markup to add a new dimension to text mining capabilities within HANA,” SAP’s in-memory data management and analysis system, Anthony said.

According to Kathy Lange, a leader in SAS’s business-analytics practice, “in 2012, SAS put a great deal of effort into analysis of ‘big data’” via in-memory High Performance Text Mining. SAS also released an active-learning capability (machine learning complemented by human domain expertise) and new native-language support for Farsi, Hindi, and Ukrainian. For 2013, “SAS intends to increase emphasis on the use of text data to improve predictive analysis.” SAS will continue to promote visual analytics and to extend text analytics to business needs in fraud and warranty analysis, healthcare, customer insight, and other applications.

Luminoso is at the other end of the size spectrum, a start-up founded by Catherine Havasi, a research scientist in artificial intelligence and computational linguistics at the MIT Media Lab. Catherine says the big 2012 development was the search for depth. “Brands are starting to move beyond simply wondering whether their social media traffic is positive or negative and into figuring out what their customers are expressing across all kinds of channels and how they can learn from it.” In 2013, look for more languages. Luminoso’s take agrees with SAP’s and SAS’s, and Catherine adds, “we might start to see genuine solutions for analyzing text in multiple languages that the relevant analysts don’t necessarily speak. We all know translation is a poor answer and better ones are on the way.” (Catherine will be speaking on Multi- and Cross-lingual, Concept-based Sentiment Analysis at the May 8 Sentiment Analysis Symposium in New York.) And Luminoso’s big goal for 2013? “We’re expecting to make the analytics process easier, clearer, faster, and bigger, with everything from data integrations and new visualizations to automated insight generation.”

Two other respondents, Alyona Medelyan from Pingar and Antonio Valderrábanos, CEO of Spain-based Bitext, cited more languages as a 2013 direction, making five out of six. I’m glad that the market has started to recognize the limitations of polarity-based sentiment analysis, that is, of imposing positive/negative/neutral pigeonholes, so I welcome Antonio’s 2013 prediction: “Emotion analysis becomes more prevalent; the backlash against sentiment slows down as better quality solutions become more mainstream; and niche applications of text analytics e.g. ‘buying intent detection,’ become more standardized.” These latter comments echo Catherine Havasi’s.

Anthony Waite of SAP wasn’t the only respondent who mentioned “real time.” Sid Banerjee, CEO of customer-experience and text-analytics leader Clarabridge, says that “streaming analytics, big data, and real time capabilities were among the most significant text analytics technology developments” in 2012, and that “text analytics developed into holistic and intelligent enterprise-wide solutions that allow businesses to operationalize and integrate customer feedback insights into business processes.” According to Sid, Clarabridge Collaborate and Clarabridge Engage were significant contributors, allowing for “direct collaboration and communication in real time between internal business stakeholders, as well as directly between companies and their customers,” complemented by analysis capabilities that allow users to “compare products, competitors, brands, regions, stores, timeframes, or any other segmentation of data” in support of more informed business decisions.

Alyona Medelyan, Pingar’s chief research officer, cited performance improvements and new capabilities as ongoing Pingar focus points, along with “packaging this technology in a way that any lay person can use it and has easy-to-understand tools for customizing text analytics methods to suit their needs.” Alyona also said that she found it significant that vendors (including Pingar) are working on “generating custom taxonomies from documents” to improve navigation of document sets, faceted search, metadata extraction, and applications such as sentiment analysis. If you’d like an explanation of how all this works, check out the slides from a presentation by Alyona’s Pingar colleague Anna Divoli, How Taxonomies and Facets Bring End Users Closer to Big Data.

I’ll relay one last point, raised by Bitext CEO Antonio Valderrábanos. Antonio cited as one of 2012’s most significant text-analytics developments the “Emergence of marketplaces for connecting text analytics to business data e.g. QlikMarket launch, expansion of Salesforce Insights ecosystem. In other words, movement towards different providers for text analytics, as opposed to one single partner.” I couldn’t agree more about the existence and significance of this trend.

Market Results

Solution providers’ results were mixed in 2012.

Janya, which focused on government markets, went out of business, as did Evri, a semantics-powered news portal. And of course, HP’s 2011 Autonomy acquisition turned sour, with a welter of 2012 accusations of improper accounting and a should-have-known realization that the semanticized IDOL platform isn’t the be-all and end-all it was imagined to be.

On the plus side, a number of market leaders announced excellent 2012 results. Attensity increased annual revenue by over 30 percent compared to 2011 (and also appointed a new, sales-focused CEO, Kirsten Bay). Clarabridge announced 60 percent sales growth in 2011. Lexalytics CEO Jeff Catlin says 2012 was a bit slow in revenue growth but ended incredibly strong, with about 20% growth overall (nothing to sneeze at), and “2013 looks amazing, guessing about 50% given the numbers we’re seeing already.” And according to SAS Media Relations rep Steve Polilli, his company saw 10% revenue growth in the search and discovery software category, which covers text analytics, sentiment analysis, categorization, and ontology, outpacing SAS’s 5.4% overall revenue growth in 2012. That’s not bad for a company with $2.87 billion in 2012 revenue.

2012 acquisitions were moderate in scale. Survey-research platform vendor Vision Critical bought Texifter’s DiscoverText text-analytics technology, and Eptica, “a global provider of multichannel customer interaction software” (a.k.a. CRM), bought French text-analytics provider Lingway. Contrast these deals with the approach of customer-experience leader Medallia, which has made a very significant investment in enriching its own text-analytics technology.

These developments affirm something Clarabridge CEO Sid Banerjee said. According to Sid, in 2012 “the customer experience and voice of the customer market became truly obsessed with analytics. Traditional vendors, such as CRM, social, or workflow vendors, all realized the need to integrate some form of text analytics into their solutions.”

One more acquisition: Lexmark acquired ISYS Search; they’ve renamed the offering Perceptive Search. Except for a few social-analytics acquisitions, those are all that come to mind.

We have also seen the continual emergence of new solution providers. In the last year or two, I’ve become aware of Converseon (ConveyAPI), Decooda, Etuma, Fido Labs, Gavagai, Kanjoya, Luminoso, MeshLabs, Metavana, Polecat, They Say, Thrive Metrics, and Content Savvy, which inherited Janya’s intellectual property. Of course, I don’t mean to slight the many established text-analytics companies that continue to thrive; I just don’t have the energy to list them all.

Conferences, Reports, and Community

Let’s finish with opportunities to learn more: conferences, reports, and community resources.

I’ll be involved with the Text Analytics Summit once again. I’ve chaired every Boston summit since the series’ founding in 2005; this year’s is slated for June 5-6. Folks with more of a stats bent should check out a rival conference, Text Analytics World, to be held April 17-19 in association with the Predictive Analytics World conference.

If your interest is more application-focused, check out a conference I organize, the Sentiment Analysis Symposium, May 8 in New York, preceded by a May 7 Research & Innovation session and a Practical Sentiment Analysis tutorial, this time taught by text-analytics pioneer Prof. Ronen Feldman. For folks with a research bent, there’s the 7th International Conference on Weblogs and Social Media (ICWSM), July 8-10 in Boston. And if you’re in Europe or can swing a trip there, check out LT-Innovate, covering the broader set of language technologies, planned for June 26-27 in Brussels.

I really like (certain) vendor conferences. I’ll likely once again attend Clarabridge Customer Connections (C3), April 17-19 in San Diego, and the Lexalytics user-group conference on May 9 in New York, the day after the Sentiment Analysis Symposium. I don’t, however, have enough of a business case to justify a trip to the 2013 SAS Global Forum, April 28-May 1 in San Francisco.

And reports? The big news is that Gartner finally seems to be devoting attention to text analytics. It used to be that the only folks covering the field, other than myself, were Sue Feldman and colleagues at IDC, Leslie Owens at Forrester, and Fern Halper at Hurwitz and Associates. Check out Gartner’s Who’s Who in Text Analytics… but be careful using it. It applies a cookie-cutter approach to covering a widely disparate set of products, and a number of its capability appraisals are incorrect. For instance, Gartner’s write-up is simply wrong about coverage of “multiple languages” by AlchemyAPI, Expert System, Lexalytics, and Pingar. Gartner includes Salesforce.com and Verint as text-analytics companies even though they, compelling as they are, do not have their own text-analytics technology, and certain other inclusions smack of pay-for-play.

Judging from samples, I’m guessing that you will find Hurwitz and Associates’ Victory Index for Text Analytics more useful. I didn’t buy a copy, but I did read a couple of freely available reviews, of products from Provalis Research and SAS.

And I’m seeing an increasing volume of interesting material posted to the social Web. Check out Chris Phipps’s blog, The Lousy Linguist. SAS folks post good stuff at their company’s Text Frontier blog, and of course I’ll continue to cover text analytics and sentiment analysis in my Breakthrough Analysis blog. Also check out the Text Analytics group on LinkedIn, which now has well over 10,000 members.

In sum, 2012 was a good year for text analytics, and a couple of months in, 2013 is shaping up nicely as well.

