Read on for my annual year-past/look-ahead report on text analytics technology and market developments, 2015 edition.
A refreshed definition: Text analytics refers to technology and business processes that apply algorithmic approaches to process and extract information from text and generate insights.
Text analytics hasn’t been around quite this long: Summary account of silver for the governor written in Sumerian Cuneiform on a clay tablet. From Shuruppak, Iraq, circa 2500 BCE. British Museum, London.
Text analytics relies on natural-language processing, based on statistical, linguistic, and/or machine-learning methods. Yet no single technical approach prevails, and no single solution provider dominates the market. There are myriad open source options and an array of via-API cloud services. Academic and industry research is thriving, realized in systems that scale up to go big, fast, and deep, and that scale out, going wide — distributed and accommodating source diversity.
Last year, 2014, was a good year for text analytics, and 2015 will be even better, with particular focus on streams, graphs, and cognitive computing and on an already extensive array of applications. As in 2014, these applications will not always be recognized as “text analytics” (or as natural language processing, for that matter). My 2014 observation remains true, “text-analytics technology is increasingly delivered embedded in applications and solutions, for customer experience, market research, investigative analysis, social listening, and many, many other business needs. These solutions do not bear the text-analytics label.”
So in this article:
- Market Perceptions
- Tech Developments
- Investments
- Reports and Community
In 2014, I released a market study, Text Analytics: User Perspectives on Solutions and Providers.
From Alta Plana’s 2014 text analytics market study: Do you currently extract or analyze…
The report is available for free download (registration required) via altaplana.com/TA2014, thanks to its sponsors. The key finding is More (and synonyms): Progressively more organizations are running analyses on more information sources, increasingly moving beyond the major European and Asian languages, seeking to extract more, diverse types of information, applying insights in new ways and for new purposes.
Text analytics user satisfaction continues to lag
Technical and solution innovation is constant and robust, yet user satisfaction continues to lag, particularly related to accuracy and ease of use. One more chart from my study, at right…
Text Technology Developments
In the text technology world, cloud-based as-a-service (API) offerings remain big, and deep learning is, even more than last year, the alluring must-adopt method. Deep learning is alluring because it has proven effective at discerning features at multiple levels in both natural language and other forms of “unstructured” content, images in particular. I touch on these topics in my March 5 article, IBM Watson, AlchemyAPI, and a World of Cognitive Computing, covering IBM’s acquisition (terms undisclosed) of a small but leading-edge cloud/API text and image analysis provider.
I don’t have much to say right now on the cognitive computing topic. Really the term is agglomerative: It represents an assemblage of methods and tools. (As a writer, I live for the opportunity to use words such as “agglomerative” and “assemblage.” Enjoy.) Otherwise, I’ll just observe that beyond IBM, the only significant text-analytics vendor that has embraced the term is Digital Reasoning. Still, Judith Hurwitz and associates have an interesting-looking book just out on the topic, Cognitive Computing and Big Data Analytics, although I haven’t read it. Also, I’ve recruited analyst and consultant Sue Feldman of Synthexis to present a cognitive-computing workshop at the 2015 Sentiment Analysis Symposium in July.
Let’s not swoon over unsupervised machine learning and discount tried-and-true methods — language rules, taxonomies, lexical and semantic networks, word stats, and supervised (trained) and non-hierarchical learning methods (e.g., for topic discovery) — in assessing market movements. I do see market evidence that software that over-relies on language engineering (rules and language resources) can be hard to maintain and adapt to new domains and information sources and languages, and difficult to keep current with rapidly emerging slang, topics, and trends. The response is two-fold:
- The situation remains that a strong majority of needs are met without reliance on as-yet-exotic methods.
- Hybrid approaches (ensemble methods) rule, and I mean hybrids that include humans in the initial and ongoing training process, via supervised and active learning for generation and extension of linguistic assets as well as (other) classification models.
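To make the hybrid idea concrete, here is a minimal sketch in Python, not any vendor's implementation. It assumes a tiny hand-written rule lexicon (`RULES`) that takes precedence and a toy naive Bayes model that handles whatever the rules miss. All names, phrases, and labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-written rules (phrase -> label); illustrative only.
RULES = {"refund": "complaint", "thank you": "praise"}


def rule_label(text):
    """Return the label of the first rule phrase found in the text, else None."""
    lowered = text.lower()
    for phrase, label in RULES.items():
        if phrase in lowered:
            return label
    return None


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over whitespace tokens."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of examples
        self.vocab = set()

    def train(self, text, label):
        # A human reviewer can call this again later with corrected labels.
        tokens = text.lower().split()
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def hybrid_label(text, model):
    """Rules win when they fire; otherwise fall back to the trained model."""
    return rule_label(text) or model.predict(text)


model = TinyNaiveBayes()
model.train("the app keeps crashing and it is slow", "complaint")
model.train("love the new design great work", "praise")

print(hybrid_label("I want a refund", model))  # decided by a rule
print(hybrid_label("great design", model))     # decided by the model
```

The human-in-the-loop part lives in `train`: each reviewed or corrected example extends the model, and a phrase that keeps being misclassified can be promoted into `RULES`.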
I wrote up above that 2015 would feature a particular focus on streams and graphs. The graphs part, I’ve been saying for a while. I believe I’ve been right for a while too, including when I not-so-famously wrote “2010 is the Year of the Graph.” Fact is, graph data structures naturally model the syntax and semantics of language and, in the form of taxonomies, facilitate classification (see my eContext-sponsored paper, Text Classification Advantage Via Taxonomy). They provide for conveniently queryable knowledge management, whether delivered via products such as Ontotext’s GraphDB or platform-captured, for instance in the Facebook Open Graph.
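As a rough illustration of how a taxonomy, treated as a graph, can support classification, consider the sketch below. The categories, the term lexicon, and the function names are all made up for the example; this is not how eContext, GraphDB, or any other product mentioned here works. Each matched term credits its leaf category and every ancestor up to the root.

```python
from collections import Counter

# Hypothetical taxonomy stored as child -> parent edges.
PARENT = {
    "smartphones": "electronics",
    "laptops": "electronics",
    "electronics": "products",
    "sneakers": "footwear",
    "footwear": "products",
}

# Hypothetical lexicon mapping terms to leaf categories.
TERMS = {
    "iphone": "smartphones",
    "android": "smartphones",
    "macbook": "laptops",
    "nike": "sneakers",
}


def path_to_root(category):
    """Walk the parent edges from a category up to the taxonomy root."""
    path = [category]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def classify(text):
    """Count term matches per category, crediting every ancestor as well."""
    scores = Counter()
    for token in text.lower().split():
        leaf = TERMS.get(token)
        if leaf is not None:
            for category in path_to_root(leaf):
                scores[category] += 1
    return scores


scores = classify("iPhone or MacBook ?")
print(scores.most_common())
```

Because scores roll up the hierarchy, a text that mentions two different devices can still be classified confidently at the broader "electronics" level even when no single leaf dominates.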
I did poll a few industry contacts, asking their thoughts on the state of the market and prospects for the year ahead. Ontotext CTO Marin Dimitrov was one of them. His take agrees with mine, regarding “a more prominent role for knowledge graphs.” His own company will “continue delivering solutions based on our traditional approach of merging structured and unstructured data analytics, using graph databases, and utilizing open knowledge graphs for text analytics.”
Marin also called out “stronger support for multi-lingual analytics, with support for 3+ languages being the de-facto standard across the industry.” Marin’s company is based in Bulgaria, and he observed, “In the European Union in particular, the European Commission (EC) has been strongly pushing a multi-lingual digital market agenda for several years already, and support for multiple languages (especially ‘under-represented’ European languages) is nowadays a mandatory requirement for any kind of EC research funding in the area of content analytics.”
José Carlos González, CEO of Madrid-based text analytics provider Daedalus, commented on the “‘breadth vs depth’ dilemma. The challenge of developing, marketing and selling vertical solutions for specific industries has led some companies to focus on niche markets quite successfully.” Regarding one functional (rather than industry-vertical) piece of the market, González believes “Voice of the Customer analytics, and in general all of the movement around customer experience, will continue being the most important driver for the text analytics market.”
One of Marin Dimitrov’s predictions was more emerging text analytics as-a-service providers, with a clearer differentiation between the different offers. Along these lines, Shahbaz Anwar, CEO of analytics provider PolyVista, sees the linking of software and professional services as a differentiator. Anwar says, “We’re seeing demand for text analytics solutions (bundling business expertise with technology) delivered as a service, so that’s where PolyVista has been focusing its energy.”
Streams are kind of exciting. Analysis of “data-in-flight” has been around for years for structured data, formerly known primarily as part of complex event processing (CEP) and applied in fields such as telecom and financial markets. Check out Julian Hyde’s 2010 Data In Flight. For streaming (and non-streaming) text, I would call out Apache Storm and Spark. For Storm, I’ll point you to a technical-implementation study posted by Hortonworks, Natural Language Processing and Sentiment Analysis for Retailers using HDP and ITC Infotech Radar, as an example. For Spark, Pivotal published a similar and even-more-detailed study, 3 Key Capabilities Necessary for Text Analytics & Natural Language Processing in the Era of Big Data. Note all the software inter-operation going on. Long gone are the days of monolithic codebases.
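For a feel of what analyzing text “in flight” means, here is a small sketch in plain Python. It deliberately avoids the Storm and Spark APIs, and the function name and sample messages are hypothetical. It keeps running term counts over only the most recent messages, updating as each message arrives rather than re-scanning a stored corpus.

```python
from collections import Counter, deque


def windowed_term_counts(messages, window_size=3):
    """Yield term counts over the latest `window_size` messages as each arrives."""
    window = deque()    # token lists currently inside the window
    counts = Counter()  # running counts for the window
    for message in messages:
        tokens = message.lower().split()
        window.append(tokens)
        counts.update(tokens)
        if len(window) > window_size:
            expired = window.popleft()
            counts.subtract(expired)
            counts += Counter()  # in-place add drops zero and negative entries
        yield dict(counts)


stream = ["late delivery", "late refund", "great service", "great price"]
results = list(windowed_term_counts(stream, window_size=2))
for snapshot in results:
    print(snapshot)
```

A production stream processor adds distribution, fault tolerance, and time-based windows, but the core pattern of incremental update plus expiry is the same.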
But in the end, money talks, so now on to part 3 —
Follow the Money: Investments
Investment activity is a forward-looking indicator, suggesting optimism about a company’s growth potential and likely profitability and more particularly, the viability of the company’s technology and business model and the talent of its staff.
I’ll run through 2014 funding news and mergers & acquisitions activity, although first I’ll again note one 2015 acquisition in the space, IBM’s purchase of AlchemyAPI, and NetBase’s $24 million Series E round.
In 2014, chronologically:
- “Verint’s $514m purchase of KANA opens lines beyond the call center.” I’ve linked to a 451 story; here’s Verint’s press release. Verint is a global player in customer interaction analytics and other fields; target KANA had itself bought voice of the customer (VOC)/text analytics provider Overtone back in 2011. Big question: Has Verint (by now) replaced its OEM-licensed Clarabridge text analytics engine with the Overtone tech? (January 6, 2014)
- “HootSuite Acquires uberVU To Bolster Analytics Offering,” acquiring a social intelligence company whose tech stack includes text analytics. (January 22, 2014)
- “Confirmit Acquires Social Intelligence and Text Analytics Innovator Integrasco.” A couple of Norwegian firms get hitched: A voice of the customer (VOC)/enterprise feedback vendor and text analytics/social intelligence vendor. (January 22, 2014)
- Sentisis, which focuses on Spanish-language analysis, collected €200,000 in seed funding, per Crunchbase. Now, a year later, Sentisis has done a $1.3 million Series A round. (February 14, 2014 and March 18, 2015)
- A French social listening company: Synthesio Secures $20 Million Investment from Idinvest Partners. Here’s an interview I did with Synthesio’s text analytics lead, Pedro Cardoso, last November: “From Social Sources to Customer Value: Synthesio’s Approach.” (March 14, 2014)
- Gavagai, a Swedish start-up, pulled in 21 million kronor, which translates to about US$3.2 million. (The name “gavagai” is a sort-of inside joke and was used once before by an NLP company!) (March 24, 2014)
- Text analytics via Hadoop played a part in FICO’s April acquisition of Karmasphere. (April 16, 2014)
- “Pegasystems buys Bangalore analytics startup MeshLabs,” reported the Times of India. (May 7, 2014)
- “Attensity Closes $90 Million in Financing.” Attensity was one of the first commercial text-analytics providers, going beyond entities to “exhaustive extraction” of relations and other information. I put out a two-part appraisal last summer, “Attensity Doubles Down: Finances and Management” and “Attensity, NLP, and ‘Data Contextualization’.” (May 14, 2014)
- “Inbenta, company from Barcelona specialized in Intelligent Customer Support software with Artificial Intelligence and Natural Language Processing, raises $2 million from Telefónica.” (May 14, 2014)
- Brandwatch pulled in $22 million in new funding. There’s a press release from the social analytics vendor, which has basic text analytics capabilities — former CTO Taras Zagibalov presented at my 2011 sentiment symposium (slides, video) — although it appears they’re a bit stale. (May 22, 2014)
- “Lexalytics Acquires Semantria To Bring Sentiment Analysis To The Masses,” reported TechCrunch. VentureBeat’s version was “Lexalytics buys Semantria, because you gotta be able to analyze text in the cloud.” VentureBeat reported, “the deal went down for less than $10 million,” but I’m guessing the reporter was unaware that Lexalytics already owned a hefty chunk of spin-off Semantria, 25% I believe. (July 14, 2014)
- “NetBase Completes $15.2 Million Round of Expansion Funding.” NetBase has strong multi-lingual text analytics and appears to be on a not-as-fast-as-they-would-hope path to an IPO: The company just took in another $24 million, in Series E funding, on March 13, 2015. Taking on more fuel before IPO take-off, I assume. (July 15, 2014)
- “Synapsify bags $850K.” Congratulations (again) to Stephen Candelmo! (July 24, 2014)
- “Innodata Announces Acquisition of MediaMiser Ltd.,” which you can learn about from the target’s perspective as well, “Well, this happened: We’ve been acquired by Innodata!” (July 28, 2014)
- “Digital Reasoning Raises $24 Million in Series C Round Led by Goldman Sachs & Credit Suisse Next Investors.” Cognitive computing! (October 9, 2014)
- “Maritz Research buys Allegiance, forms MaritzCX.” This is an interesting take-over, by a research firm (Maritz is/was a customer of Clarabridge’s, and maybe of other text-analytics providers) of a customer-experience firm that in turn licensed Attensity’s and Clarabridge’s technology, although Clarabridge’s seemingly on a capability-limited basis. (November 5, 2014)
- “Brand a Trend, a Cloud – based Text Analytics Company based out of Heidelberg, Germany, announced a $4.5 million round of funding that it will use to push into the U.S. and the booming digital market” (that’s the SUMMICS product), following on a $600 thousand 2013 founding investment and a February 2014 €800 thousand seed investment. (November 11, 2014)
- Natural language generation: “Narrative Science pulls in $10M to analyze corporate data and turn it into text-based reports.” Rival Arria did an IPO, as NLG.L, in December 2013. (November 28, 2014)
Reports and Community
Let’s finish with opportunities to learn more, starting with conferences because there is still no substitute for in-person learning and networking (not that I dislike MOOCs, videos, and tutorials). Here’s a selection:
- Text Analytics World, March 31-April 1 in San Francisco, co-located with Predictive Analytics World.
- Text by the Bay, “a new NLP conference bringing together researchers and practitioners, using computational linguistics and text mining to build new companies through understanding and meaning.” Dates are April 24-25, in San Francisco.
- The Text Analytics Summit (a conference I chaired from its 2005 founding through 2013’s summit) will take place June 15-16 in New York, the same dates as…
- The North American instance of IIeX, Greenbook’s Insight Innovation Exchange, slated for June 15-17 in Atlanta. I’m organizing a text analytics segment; send me a note if you’d like to present.
- My own Sentiment Analysis Symposium, which includes significant text-analysis coverage, is scheduled for July 15-16 in New York, this year featuring a Workshops track in parallel with the usual Presentations track. In case you’re interested: I have videos and presentations from six of the seven other symposiums to date, from 2010 to 2014, posted for free viewing. New this year: A half-day workshop segment devoted to sentiment analysis for financial markets.
The 2014 LT-Accelerate conference in Brussels.
If you’re in Europe or fancy a trip there, attend:
On the vendor side,
Moving to non-commercial, research-focused and academic conferences… I don’t know whether the annual Text as Data conference will repeat in 2015, but I have heard from the organizers that NIST’s annual Text Analysis Conference will be scheduled for two days the week of November 16, 2015.
The 9th instance of the International Conference on Weblogs and Social Media (ICWSM) takes place May 26-29 in Oxford, UK. And the annual meeting of the Association for Computational Linguistics, an academic conference, moves to Beijing this year, July 26-31.
I’ve already cited my own Text Analytics: User Perspectives on Solutions and Providers.
Butler Analytics’ Text Analytics: A Business Guide, issued in February 2014, provides a good, high-level business overview.
And I’m exploring a report/e-book project, topic (working title) “Natural Language Ecosystems: A Survey of Insight Solutions.”
If you know of other market activity, conferences, or resources I should include here, please let me know and I’ll consider those items for an update. In any case…
Thanks for reading!
I have mentioned many companies in this article. I consult to some of them. Some sponsored my 2014 text-analytics market study or an article or a paper. (This article is not sponsored.) Some have sponsored my conferences and will sponsor my July 2015 symposium and/or November 2015 conference. I have taken money in the last year, for one or more of these activities, from: AlchemyAPI, Clarabridge, Daedalus, Digital Reasoning, eContext, Gnip, IBM, Lexalytics, Luminoso, SAS, and Teradata. Not included here are companies that have merely bought a ticket to attend one of my conferences.
If your own company is a text analytics (or sentiment analysis, semantics/synthesis, or other data analysis and visualization) provider, or a solution provider that would like to add text analytics to your tech stack, or a current or potential user, I’d welcome helping you with competitive product and market strategy on a consulting basis. Or simply follow me on Twitter at @SethGrimes or read my Breakthrough Analysis blog for technology and market updates and opinions.
Finally, I welcome the opportunity to learn about new and evolving technologies and applications, so if you’d like to discuss any of the points I’ve covered, as they relate to your own work, please do get in touch.