Category: text analytics

Announcing: LT-Accelerate — business value in text, speech & social data — Brussels, 23-24 November

LT-Accelerate is a unique event, the only European conference that focuses on business value in text, speech, and social data, taking place this year November 23-24 in Brussels.

LT-Accelerate participants represent brands and agencies, research, consultancies, and solution providers. The conference is designed for learning and sharing, networking, and deal-making.

Please join us! Visit the LT-Accelerate website for information and to benefit from the Super Early registration discount, available through September 15.

LT-Accelerate speakers will present on data analysis technologies and their application for leading-edge customer service and support, market research, social engagement, media and publishing, and policy. Speaker organizations include:

– Research & insights firms Ipsos, TNS, DigitalMR, and Confirmit

– Media & publishing companies Belga News Agency, Wolters Kluwer, and Acceso

– Technology leaders Cisco Systems, Xerox, and Yahoo Research

– Global services firm Deloitte

– Innovative solution providers econob GmbH, Eptica, Gavagai, Ontotext, and Semalytix

LT-Accelerate is an international conference produced by LT-Innovate, the forum for Europe’s language technology industry, and my U.S. consultancy, Alta Plana Corporation. Participating speakers hail from Austria, Belgium, Bulgaria, France, Germany, Ireland, Portugal, Spain, Sweden, the UK, and the United States. Speakers will present in English.

Program information and registration are available online. Please join us 23-24 November in Brussels!

P.S. We have program space for a few additional brand/agency speakers, and we welcome solution provider exhibitors/sponsors. You or your organization? Contact us!

Where Are The Text Analytics Unicorns?

Customer-strategy maven Paul Greenberg made a thought-provoking remark to me back in 2013. Paul was puzzled —

Why haven’t there been any billion-dollar text analytics startups?

Text analytics is a term for software and business processes that apply natural language processing (NLP) to extract business insights from social, online, and enterprise text sources. The context: Paul and I were in a taxi to the airport following the 2013 Clarabridge Customer Connections conference.

Customer experience leaders outpace laggards in key performance categories, according to a 2014 Harvard Business Review study.

Clarabridge is a text-analytics provider that specializes in customer experience management (CEM). CEM is an extremely beneficial approach to measuring and optimizing business-customer interactions, if you accept research such as Harvard Business Review's 2014 study, Lessons from the Leading Edge of Customer Experience Management. Witness the outperform stats reported in tables such as the one to the right.

Authorities including "CX Transformist" Bruce Temkin will tell you that CEM is a must-do and that text analytics is essential to CEM (or should that be CXM?) done right. So will Clarabridge and rivals that include Attensity, InMoment, MaritzCX, Medallia, NetBase, newBrandAnalytics, NICE, SAS, Synthesio, and Verint. Each has text analytics capabilities, whether the company's own or licensed from a third-party provider. Their text analytics extracts brand, product/service, and feature mentions and attributes, as well as customer sentiment, from social postings, survey responses, online reviews, and other "voice of the customer" sources.

(A plug: For the latest on sentiment technologies and solutions, join me at my Sentiment Analysis Symposium conference, taking place July 15-16 in New York.)

So why haven’t we seen any software companies — text analytics providers, or companies whose solutions or services are text-analytics reliant — started since 2003 and valued at $1 billion or more?

… continued in VentureBeat.

Six Intelligence/Data Trends, as Seen by the Former U.S. Top Spy

Gen. Michael Hayden, former CIA and NSA director, keynoted this year's Basis Technology Human Language Technology Conference. Basis develops natural language processing software that is applied for search, for text analytics across a broad set of industries, and for investigations. That a text technology provider would recruit an intelligence leader as speaker is no mystery: Automated text understanding, and insight synthesis across diverse sources, is an essential capability in a big data world. And Hayden's interest? He now works as a principal at the Chertoff Group, an advisory consultancy that, like all firms of the type (including mine, in data analysis technologies), focuses on understanding and interpreting trends, on shaping reactions, and on maintaining visibility by communicating its worldview.

Gen. Michael Hayden keynotes Basis Technology’s Human Language Technology Conference

Data, insights, and applications were key points in Hayden’s talk. (I’m live-blogging from there now.)

I'll provide a quick synopsis of six key trend points with a bit of interpretation. The points are Hayden's — applying to intelligence — and the interpretation is generally mine, offered given the broad applicability that I see to a spectrum of information-driven industries. Quotations are as accurate as possible, but they're not guaranteed verbatim.

Emergent points, per Michael Hayden:

1) The paradox of volume versus scarcity. Data is plentiful. Information, insights, are not.

2) State versus non-state players. A truism here: In the old order, adversaries (and assets?) were (primarily) larger, coherent entities. Today, we live and operate, I’d say, in a new world disorder.

3) Classified versus unclassified. Hayden’s point: Intelligence is no longer (primarily) about secrets, about clandestine arts. Open source (information, not software) is ascendant. Hayden channels an intelligence analyst who might ask, “How do I create wisdom with information that need not be stolen?”

4) Strategic versus specific. "Our energy is now focused on targeting — targeted data collection and direct action." Techniques and technologies now focus on disambiguation, that is, on creating clarity.

5) Humans versus machines. Hayden does not foresee a day (soon?) when a “carbon-based machine” will not be calling the shots, informed by the work of machines.

6) The division of labor between public and private, between “blue and green.” “There’s a lot of true intelligence work going on in the private sector,” Hayden said. And difficulties are “dwarfed by the advantage that the American computing industry gives us.”

Of course, there's more, or there would be were Hayden free to talk about certain other trend points he alluded to. Interpreting: Further, the dynamics of the intelligence world cannot be satisfyingly reduced to bullet trend points, whether the quantity is a half dozen or some other number. The same is true for any information-driven industry. Yet data reduction is essential, whether you're dealing with big data or with decision making from a set of overlapping and potentially conflicting signals. All forms of authoritative guidance are welcome.

The Myth of Small Data, the Sense of Smart Data, Analytics for All Data

Big data is all-encompassing, and that seems to be a problem. The term has been stretched in so many ways that in covering so much, it has come to mean — some say — too little. So we’ve been hearing about “XYZ data” variants. Small data is one of them. Sure, some datasets are small in size, but the “small” qualifier isn’t only or even primarily about size. It’s a reaction to big data that, if you buy advocates’ arguments, describes a distinct species of data that you need to attend to.

I disagree.

Nowadays, all data — big or small — is understood via models, algorithms, and context derived from big data. Our small data systems now effortlessly scale big. Witness: Until five years ago, Microsoft Excel spreadsheets maxed out at 256 columns and 65,536 rows. In 2010, the limit jumped to 16,384 columns by 1,048,576 rows: over 17 billion cells. And it's easy to go bigger, even from within Excel. It's easy to hook this software survivor of computing's Bronze Age, the 1980s, into external databases of arbitrary size and to pull data from the unbounded online and social Web.
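To make the scaling point concrete, here is a minimal sketch in Python (my illustration, not from the original post) of the kind of hookup described above: pulling a table from an external database into memory, unconstrained by the old spreadsheet grid. The database file, table name, and query are assumptions.

```python
# A minimal sketch, assuming a local SQLite file exists; any external source
# reachable by a connector works the same way.
import sqlite3
import pandas as pd

conn = sqlite3.connect("example.db")  # hypothetical database file

# The table name and query are illustrative assumptions.
df = pd.read_sql_query("SELECT * FROM customer_feedback", conn)
conn.close()

# Row and column counts are limited by memory, not by the old 65,536 x 256 grid.
print(df.shape)
```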

So we see —

Small is a matter of choice, rather than a constraint. You don’t need special tools or techniques for small data. Conclusion: The small data category is a myth.

Regardless, do discussions of small data, myth or not, offer value? Is there a different data concept that works better? Or with an obsessive data focus, are we looking at the wrong thing? We can learn from advocates. I’ll choose just a few, and riff on their work.

Delimiting Small Data

Allen Bonde, now a marketing and innovation VP at OpenText, defines small data as both “a design philosophy” and “the technology, processes, and use cases for turning big data into alerts, apps, and dashboards for business users within corporate environments.” That latter definition reminds me of “data reduction,” a term for the sort of data analysis done a few ages ago. And of course, per Bonde, “small data” describes “the literal size of our data sets as well.”

I'm quoting from Bonde's December 2013 guest entry in the estimable Paul Greenberg's ZDNet column, an article titled 10 Reasons 2014 will be the Year of Small Data. (Was it?) Bonde writes, "Small data connects people with timely, meaningful insights (derived from big data and/or 'local' sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks."

Small data: Mini-Me

So (some) small data is a focused, topical derivation of big data. That is, small data is Mini-Me.

Other small data accumulates from local sources. Presumably, we’re talking the set of records, profiles, reference information, and content generated by an isolated business process. Each of those small datasets is meaningful in a particular context, for a particular purpose.

So small data is a big data subset or a focused data collection. Whatever its origin, small data isn't a market category. There are no special small-data techniques, tools, or systems. That's a good thing, because data users need room to grow, by adding to or repurposing their data. Small data collections that have value tend not to stay small.

Encapsulating: Smart Data

Tom Anderson builds on a start-small notion in his 2013 Forget Big Data, Think Mid Data. Tom offers the guidance that you should consider cost in creating a data environment sized to maximize ROI. Tom's mid data concept starts with small data and incrementally adds affordable elements that will pay off. Tom used another term when I interviewed him in May 2013, smart data, to capture the concept of, in my words, maximum return on data.

Return isn’t something baked into the data itself. Return on data depends on your knowledge and judgment in collecting the right data and in preparing and using it well.

This thought is captured in an essay, “Why Smart Data Is So Much More Important Than Big Data,” by Scott Fasser, director of Digital Innovation for HackerAgency. His argument? “I’ll take quality data over quantity of data any day. Understanding where the data is coming from, how it’s stored, and what it tells you will help tremendously in how you use it to narrow down to the bits that allow smarter business decisions based on the data.”

"Allow" is a key word here. Smarter business decisions aren't guaranteed, no matter how well-described, accessible, and usable your datasets are. You can make a stupid business decision based on smart data.

Of course, smart data can be big and big data can be smart, contrary to the implication of Scott Fasser’s essay title. I used smart in a similar way in naming my 2010 Smart Content Conference, which focused on varieties of big data that are decidedly not traditional, or small, data. That event was about enhancing the business value of content — text, images, audio, and video — via analytics including application of natural language processing to extract information, and generate rich metadata, from enterprise content and online and social media.

(I decided to focus my on-going organizing elsewhere, however. The Sentiment Analysis Symposium looks at applications of the same technology set, but targeting discovery of business value in attitudes, opinion, and emotion in diverse unstructured media and structured data. The 8th go-around will take place July 15-16, 2015 in New York.)

But data is just data — whether originating in media (text, images, audio, and video) or as structured tracking, transactional, and operational data — whether facts or feelings. And data, in itself, isn’t enough.

Extending: All Data

I'll wrap up by quoting an insightful analysis, The Parable of Google Flu: Traps in Big Data Analysis, by academic authors David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani, writing in Science magazine. It so happens I've quoted Harvard University professor Gary King before, in my 4 Vs For Big Data Analytics: "Big Data isn't about the data. It's about analytics."

King and colleagues write, in their Parable paper, “Big data offer enormous possibilities for understanding human interactions at a societal scale, with rich spatial and temporal dynamics, and for detecting complex interactions and nonlinearities among variables… Instead of focusing on a ‘big data revolution,’ perhaps it is time we were focused on an ‘all data revolution,’ where we recognize that the critical change in the world has been innovative analytics, using data from all traditional and new sources, and providing a deeper, clearer understanding of our world.”

The myth of small data is that it’s interesting beyond very limited circumstances. It isn’t. Could we please not talk about it any more?

The sense of smart data is that it allows for better business decisions, although positive outcomes are not guaranteed.

The end-game is analysis that exploits all data — both producing and consuming smart data — to support decision-making and to measure outcomes and help you improve processes and create the critical, meaningful change we seek.

Digital Reasoning Goes Cognitive: CEO Tim Estes on Text, Knowledge, and Technology

Cognitive is a next computing paradigm, responding to demand for always-on, hyper-aware data technologies that scale from device form to the enterprise.

Cognitive computing is an approach rather than a specific capability. Cognitive mimics human perception, synthesis, and reasoning capabilities by applying human-like machine-learning methods to discern, assess, and exploit patterns in everyday data. It’s a natural for automating text, speech, and image processing and dynamic human-machine interactions.

IBM is big on cognitive. The company’s recent AlchemyAPI acquisition is only the latest of many moves in the space. This particular acquisition adds market-proven text and image processing, backed by deep learning, a form of machine learning that resolves features at varying scales, to the IBM Watson technology stack. But IBM is by no means the only company applying machine learning to natural language understanding, and it’s not the only company operating under the cognitive computing banner.

Digital Reasoning Founder and CEO Tim Estes

Digital Reasoning is an innovator in natural language processing and, more broadly, in cognitive computing. The company’s tag line:

We build software that understands human communication — in many languages, across many domains, and at enormous scale. We help people see the world more clearly so they can make a positive difference for humanity.

Tim Estes founded Digital Reasoning in 2000, focusing first on military/intelligence applications and, in recent years, on financial markets and clinical medicine. Insight in these domains requires synthesis of facts from disparate sources. Context is key.

The company sees its capabilities mix as providing a distinctive interpretive edge in a complex world, as will become clear as you read Tim’s responses in an interview I conducted in March, to provide material for my recent Text Analytics 2015 state-of-the-industry article. Digital Reasoning has, in the past, identified as a text analytics company. Maybe not so much any more.

Call the interview —

Digital Reasoning Goes Cognitive: CEO Tim Estes on Text, Knowledge, and Technology

Seth Grimes: Let’s start with a field that Digital Reasoning has long supported, text analytics. What was new and interesting in 2014?

Tim Estes: Text analytics is dead, long live the knowledge graph.

Seth: Interesting statement, both parts. How’s text analytics dead?

Tim: I say this partially in jest: Text analytics has never been needed more. The fact is, the process of turning text into structured data is now commoditized and fragmented. As a component business, it’s no longer interesting, with the exits of Attensity and Inxight, and the lack of pure plays.

I don’t think the folks at Attensity are aware they’ve exited, but in any case, what’s the unmet need and how is it being met, via knowledge graphs and other technologies?

What is replacing text analytics is a platform need, the peer of the relational database, to go from human signals and language into a knowledge graph. The question leading enterprises are asking, especially financial institutions, is how do we go from the unstructured data on our big data infrastructure to a knowledge representation that can supply the apps I need? That's true for enterprises, whether [they've implemented] an on-premise model (running on the Hadoop stacks required by large banks or companies, with internal notes and knowledge) or a cloud model with an API.

You're starting to get a mature set of services, where you can put data in the cloud, and get back certain other metadata. But they're all incomplete solutions because they try to annotate data, creating data on more data — and a human can't use that. A human needs prioritized knowledge and information — information that's linked by context across everything that occurs. So unless that data can be turned into a system of knowledge, the data is of limited utility, and all the hard work is left back on the client.

Building a system of knowledge isn’t easy!

The government tried this approach, spending billions of dollars across various projects doing it that way, and got very little to show for it. We feel we’re the Darwinian outcome of billions of dollars of government IT projects.

Now companies are choosing between having their own knowledge graphs and trusting a third-party knowledge graph provider, like Facebook or Google. Apple has no knowledge graph, so it doesn't offer a real solution (you can't process your data with it) and is behind the market leaders. Amazon has the biggest platform, but it also has no knowledge graph and no ability to process your data as a service, so it too has a huge hole. Microsoft has the tech and is moving ahead quickly, but the leader is Google, with Facebook a fast follower.

That’s them. What about us, the folks who are going to read this interview?

On the enterprise side, with on-premise systems, there are very few good options to go from text to a knowledge graph. Not just tagging and flagging. And NLP (natural language processing) is not enough. NLP is a prerequisite.

You have to get to the hard problem of connecting data, lifting out what's important. You want to get data today and ask questions tomorrow, and get the answers fast. You want to move beyond getting information about the patterns your NLP detected today in whatever passed through it. That approach involves static lessons learned, baked into code and models. The alternative provides a growing, vibrant base of knowledge that can be leveraged as human creativity desires.

So an evolution from static to dynamic, from baseline NLP to…

I think we’ll look back at 2014, and say, “That was an amazing year because 2014 was when text analytics became commoditized at a certain level, and you had to do much more to become valuable to the enterprise. We saw a distinct move from text analytics to cognitive computing.” It’s like selling tires versus selling cars.

Part-way solutions to something more complete?

It’s not that people don’t expect to pay for text analytics. It’s just that there are plenty of open source options that provide mediocre answers for cheap. But the mediocre solutions won’t do the hard stuff like find deal language in emails, much less find deal language among millions of emails among tens of billions of relationships, that can be queried in real time on demand and ranked by relevance and then supplied in a push fashion to an interface. The latter is a solution that provides a knowledge graph while the former is a tool. And there’s no longer much business in supplying tools. We’ve seen competitors, who don’t have this solution capability, look to fill gaps by using open source tools, and that shows us that text analytics is seen as a commodity. As an analogy, the transistor is commoditized but the integrated circuit is not. Cognitive computing is analogous to the integrated circuit.

What should we expect from industry in 2015?

Data accessibility. Value via applications. Getting smart via analytics.

The enterprise data hub is interactive, and is more than a place to store data. What we’ve seen in the next wave of IT, especially for the enterprise, is how important it is to make data easily accessible for analytic processing.

But data access alone doesn’t begin to deliver value. What’s going on now, going back to mid-2013, is that companies haven’t been realizing the value in their big data. Over the next year, you’re going to see the emergence of really interesting applications that get at value. Given that a lot of that data is human language, unstructured data, there’s going to be various applications that use it.

You’re going to have siloed applications. Go after a use case and build analytic processing for it or start dashboarding human language to track popularity, positive or negative sentiment — things that are relatively easy to track. You’re going to have more of these applications designed to help organizations because they need software that can understand X about human language so they can tell Y to the end user. What businesses need are applications built backwards from the users’ needs.

But something’s missing. Picture a sandwich. Infrastructure and the software that computes and stores information are the bottom slice and workflow tools and process management are the top slice. What’s missing is the meat — the brains. Right now, there’s a problem for global enterprises: You have different analytics inside every tool. You end up with lots of different data warehouses that can’t talk to each other, silo upon silo upon silo — and none of them can learn from another. If you have a middle layer, one that is essentially unified, you have use cases that can get smarter because they can learn from the shared data.

You mentioned unstructured data in passing…

We will see more ready utilization of unstructured data inside applications. But there will be very few good options for a platform that can turn text into knowledge this year. They will be inhibited by two factors: 1) The rules or the models are static and are hard to change. 2) The ontology of the data and how much energy it takes to fit your data into the ontology. Static processing and mapping to ontologies.

Those problems are both alleviated by cognitive computing. Our variety builds the model from the data — there’s no ontology. That said, if you have one, you can apply it to our approach and technology as structured data.

So that’s one element of what you’re doing at Digital Reasoning, modeling direct from data, ontology optional. What else?

With us, we’re able to expose more varieties of global relationships from the data. We aim for it to be simple to teach the computer something new. Any user — with a press of a button and a few examples — can teach the system to start detecting new patterns. That should be pretty disruptive. And we expect to move the needle in ways that people might not expect to bring high quality out of language processing — near human-level processing of text into people, places, things and relationships. We expect our cloud offering to become much more mature.

Any other remarks, concerning text analytics?

Microsoft and Google are duking it out. It's an interesting race, with Microsoft making significant investments that are paying off. Their business model is to create productivity enhancements that make you want to keep paying them for their software. They have the largest investment in technology, so it will be interesting to see what they come up with. Google is, of course, more consumer oriented. Their business model is about getting people to change their minds. Fundamentally different business models, with one leaning towards exploitation and the other leading to more productivity, and analytics is the new productivity.

And think prioritizing and algorithms that work for us —

You might read 100 emails a day but you can’t really think about 100 emails in a day — and that puts enormous stress on our ability to prioritize anything. The counterbalance to being overwhelmed by all this technology — emails, texts, Facebook, Twitter, LinkedIn, apps, etc. — available everywhere (on your phone, at work, at home, in your car or on a plane) — is to have technology help us prioritize because there is no more time. Analytics can help you address those emails. We’re being pushed around by algorithms to connect people on Facebook but we’re not able to savor or develop friendships. There’s a lack of control and quality because we’re overwhelmed and don’t have enough time to concentrate.

That’s the problem statement. Now, it’s about time that algorithms work for us, push the data around for us. There’s a big change in front of us.

I agree! While change is a constant, the combination of opportunity, talent, technology, and need are moving us faster than ever.

Thanks, Tim, for the interview.

Disclosure: Digital Reasoning was one of eight sponsors of my study and report, Text Analytics 2014: User Perspectives on Solutions and Providers. While Digital Reasoning’s John Liu will be speaking, on Financial Markets and Trading Strategies, at the July 2015 Sentiment Analysis Symposium in New York, that is not a paid opportunity.

For more on cognitive computing: Judith Hurwitz, Marcia Kaufman, and Adrian Bowles have a book just out, Cognitive Computing and Big Data Analytics, and I have arranged for consultant Sue Feldman of Synthexis to present a Cognitive Computing workshop at the July Sentiment Analysis Symposium.

Roads to Text Analytics Commercialization: Q&A with José Carlos González, Daedalus

Commercial text analytics worldwide is dominated by US, UK, and Canadian companies, despite the presence of many exceptional academic and research centers in Europe and Asia. Correlate market successes not only with English-language capabilities, but also with minimal government interference in business development. I’m referring to two sorts of interference. The tech sector in eurozone countries is often over-reliant on governmental research funding, and it is hampered by business and employment rules that discourage investment and growth. Where these  inhibitors are less a factor — for text analytics, notably in Singapore, Scandinavia, and Israel — commercialized text analytics thrives.

José Carlos González, professor at the Universidad Politécnica de Madrid and DAEDALUS founder

Eurozone entrepreneurs — such as Spain’s DAEDALUS — aim similarly to grow via a commercial-markets focus and by bridging quickly, even while still small, to the Anglophone market, particularly to the US. (The euro’s current weakness supports this latter choice.)

This point emerges from a quick Q&A I recently did with José Carlos González. José founded DAEDALUS in 1998 as a spin-out of work at the Universidad Politécnica de Madrid, where he is a professor, and other academic research. I interviewed him for last year's Text Analytics 2014 story. This year's Q&A, below, was in support of my Text Analytics 2015 report on technology and market developments.

My interview with José Carlos González —

Q1) What was new and interesting, for your company and industry as a whole, and for the market, in 2014?

Through the course of 2014, we have seen a burst of interest in text analytics solutions from very different industries. Niche opportunities have appeared everywhere, giving birth to a cohort of new players (startups) with integration abilities working on top of ad-hoc or general-purpose (open or inexpensive) text analytics tools.

Consolidated players, which have been delivering text analytics solutions for years (lately in the form of APIs), face the "breadth vs depth" dilemma. The challenge of developing, marketing and selling vertical solutions for specific industries has led some companies to focus on niche markets quite successfully.

Q2) And technology angles? What approaches have advanced and what has been over-hyped? 

The capability of companies to adapt general purpose semantic models to a particular industry or company in a fast and inexpensive way has been essential in 2014, to speed up the adoption of text analytics solutions.

Regarding deep learning or general purpose artificial intelligence approaches, they show slow progress beyond the research arena.

Q3) What should we expect from your company and from the industry in 2015?

Voice of the Customer (VoC) analytics — and in general, all the movement around customer experience — will continue being the most important driver for the text analytics market.

The challenge for the years to come will consist in providing high-value, actionable insights to our clients. These insights should be integrated with CRM systems to be treated along with structured information, in order to fully exploit the value of the data about clients in the hands of companies. Privacy concerns and the difficulty of linking social identities with real persons or companies will still be a barrier to more exploitable results.

Q4) Any other remarks, concerning text analytics?

Regarding the European scene, the situation in 2015 is worse than ever. The Digital Single Market, one of the 10 priorities of the new European Commission, seems a kind of chimera — wished for but elusive — for companies providing digital products or services.

The new Value Added Tax (VAT) regulation, in force since January 2015, compels companies to charge VAT in the country of the buyer instead of the seller, to obtain various evidence of customer nationality, and to store a large amount of data for years. These regulations, intended to prevent internet giants from avoiding paying VAT, are in fact going to make complying with VAT so difficult that the only way to sell e-products will be to sell via large platforms. Thus, small European digital companies are suffering an additional burden and higher business expenses, while the monopoly of US online platforms is reinforced. The road to hell is paved with good intentions!

I thank José for this interview and will close with a disclosure and an invitation: DAEDALUS, in the guise of the company's MeaningCloud on-demand Web service (API), is a sponsor of the 2015 Sentiment Analysis Symposium, which I organize, taking place July 15-16, 2015 in New York. If you're concerned with the business value of opinion, emotion, and intent, in social and enterprise text, join us at the symposium!

And finally, an extra: Video of DAEDALUS’s Antonio Matarranz, presenting on Voice of the Customer in the Financial Services Industry at the 2014 Sentiment Analysis Symposium.

Text Analytics 2015

Read on for my annual year-past/look-ahead report, on text analytics technology and market developments, 2015 edition.

A refreshed definition: Text analytics refers to technology and business processes that apply algorithmic approaches to process and extract information from text and generate insights.

Text analytics hasn’t been around quite this long: Summary account of silver for the governor written in Sumerian Cuneiform on a clay tablet. From Shuruppak, Iraq, circa 2500 BCE. British Museum, London.

Text analytics relies on natural-language processing, based on statistical, linguistic, and/or machine-learning methods. Yet no single technical approach prevails, and no single solution provider dominates the market. There are myriad open source options and an array of via-API cloud services. Academic and industry research is thriving, realized in systems that scale up to go big, fast, and deep, and that scale out, going wide — distributed and accommodating source diversity.
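As a taste of the open source end of that spectrum, here is a minimal sketch using spaCy, one freely available NLP toolkit among many (my choice for illustration; the article names no specific library). It shows two of the building blocks text analytics rests on: named-entity extraction and part-of-speech tagging.

```python
# A minimal sketch, assuming spaCy and its small English model are installed:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Clarabridge and Medallia analyze customer feedback collected in Brussels.")

# Named entities: one flavor of information extraction
for ent in doc.ents:
    print(ent.text, ent.label_)

# Part-of-speech tags and syntactic heads: the linguistic layer
for token in doc:
    print(token.text, token.pos_, token.head.text)
```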

Last year, 2014, was a good year for text analytics, and 2015 will be even better, with particular focus on streams, graphs, and cognitive computing and on an already extensive array of applications. As in 2014, these applications will not always be recognized as “text analytics” (or as natural language processing, for that matter). My 2014 observation remains true, “text-analytics technology is increasingly delivered embedded in applications and solutions, for customer experience, market research, investigative analysis, social listening, and many, many other business needs. These solutions do not bear the text-analytics label.”

So in this article:

  • Market Perceptions
  • Tech Developments
  • Investments
  • Community

Market Perceptions

I released a market study, Text Analytics: User Perspectives on Solutions and Providers.

From Alta Plana’s 2014 text analytics market study: Do you currently extract or analyze…

The report is available for free download (registration required), thanks to its sponsors. The key finding is More (and synonyms): Progressively more organizations are running analyses on more information sources, increasingly moving beyond the major European and Asian languages, seeking to extract more, diverse types of information, applying insights in new ways and for new purposes.

Text analytics user satisfaction continues to lag

Technical and solution innovation is constant and robust, yet user satisfaction continues to lag, particularly related to accuracy and ease of use. One more chart from my study, at right…

Text Technology Developments

In the text technology world, cloud-based as-a-service (API) offerings remain big, and deep learning is, even more than last year, the alluring must-adopt method. Deep learning is alluring because it has proven effective at discerning features at multiple levels in both natural language and other forms of "unstructured" content, images in particular. I touch on these topics in my March 5 IBM Watson, AlchemyAPI, and a World of Cognitive Computing, covering IBM's acquisition (terms undisclosed) of a small but leading-edge cloud/API text and image analysis provider.

I don’t have much to say right now on the cognitive computing topic. Really the term is agglomerative: It represents an assemblage of methods and tools. (As a writer, I live for the opportunity to use words such as “agglomerative” and “assemblage.” Enjoy.) Otherwise, I’ll just observe that beyond IBM, the only significant text-analytics vendor that has embraced the term is Digital Reasoning. Still — Judith Hurwitz and associates have an interesting looking book just out on the topic, Cognitive Computing and Big Data Analytics, although I haven’t read it. Also I’ve recruited analyst and consultant Sue Feldman of Synthexis to present a cognitive-computing workshop at the 2015 Sentiment Analysis Symposium in July.

Let’s not swoon over unsupervised machine learning and discount tried-and-true methods — language rules, taxonomies, lexical and semantic networks, word stats, and supervised (trained) and non-hierarchical learning methods (e.g., for topic discovery) — in assessing market movements. I do see market evidence that software that over-relies on language engineering (rules and language resources) can be hard to maintain and adapt to new domains and information sources and languages, and difficult to keep current with rapidly emerging slang, topics, and trends. The response is two-fold:

  • The situation remains that a strong majority of needs are met without reliance on as-yet-exotic methods.
  • Hybrid approaches — ensemble methods — rule, and I mean hybrids that include humans in the initial and on-going training process, via supervised and active learning for generation and extension of linguistic assets as well as (other) classification models.

I wrote up above that 2015 would feature a particular focus on streams and graphs. The graphs part, I’ve been saying for a while. I believe I’ve been right for a while too, including when I not-so-famously wrote “2010 is the Year of the Graph.” Fact is, graph data structures naturally model the syntax and semantics of language and, in the form of taxonomies, facilitate classification (see my eContext-sponsored paper, Text Classification Advantage Via Taxonomy). They provide for conveniently-queryable knowledge management, whether delivered via products such as Ontotext’s GraphDB or platform-captured, for instance in the Facebook Open Graph.
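To make the taxonomy-as-graph idea concrete, here is a toy sketch (my illustration, not the eContext paper's method): a small product taxonomy stored as a directed graph, with naive keyword matching used to classify a snippet and walk matches up to broader categories. The taxonomy, lexicon, and text are invented.

```python
# A minimal sketch using networkx; everything here is illustrative.
import networkx as nx

taxonomy = nx.DiGraph()
taxonomy.add_edges_from([
    ("Products", "Phones"),
    ("Products", "Laptops"),
    ("Phones", "Battery"),
    ("Phones", "Camera"),
])

# Hypothetical keyword lexicon attached to categories
keywords = {
    "Battery": {"battery", "charge"},
    "Camera": {"camera", "photo"},
    "Laptops": {"laptop", "keyboard"},
}

text = "The camera is great but the battery barely lasts a day."
tokens = set(text.lower().replace(".", "").replace(",", "").split())

# Match tokens to categories, then follow the graph up for broader context
for category, words in keywords.items():
    if tokens & words:
        path = nx.shortest_path(taxonomy, "Products", category)
        print(" / ".join(path))
```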

I did poll a few industry contacts, asking their thoughts on the state of the market and prospects for the year ahead. Ontotext CTO Marin Dimitrov was one of them. His take agrees with mine, regarding “a more prominent role for knowledge graphs.” His own company will “continue delivering solutions based on our traditional approach of merging structured and unstructured data analytics, using graph databases, and utilizing open knowledge graphs for text analytics.”

Marin also called out “stronger support for multi-lingual analytics, with support for 3+ languages being the de-facto standard across the industry.” Marin’s company is based in Bulgaria, and he observed, “In the European Union in particular, the European Commission (EC) has been strongly pushing a multi-lingual digital market agenda for several years already, and support for multiple languages (especially ‘under-represented’ European languages) is nowadays a mandatory requirement for any kind of EC research funding in the area of content analytics.”

José Carlos González, CEO of Madrid-based text analytics provider Daedalus, commented on the "'breadth vs depth' dilemma. The challenge of developing, marketing and selling vertical solutions for specific industries has led some companies to focus on niche markets quite successfully." Regarding one functional (rather than industry-vertical) piece of the market, González believes "Voice of the Customer analytics — and in general all of the movement around customer experience — will continue being the most important driver for the text analytics market."

One of Marin Dimitrov's predictions was more emerging text analytics as-a-service providers, with a clearer differentiation between the different offers. Along these lines, Shahbaz Anwar, CEO of analytics provider PolyVista, sees the linking of software and professional services as a differentiator. Anwar says, "We're seeing demand for text analytics solutions — bundling business expertise with technology — delivered as a service, so that's where PolyVista has been focusing its energy."

Further —

Streams are kind of exciting. Analysis of "data-in-flight" has been around for years for structured data, formerly known primarily as part of complex event processing (CEP) and applied in fields such as telecom and financial markets. Check out Julian Hyde's 2010 Data In Flight. For streaming (and non-streaming) text, I would call out Apache Storm and Spark. For Storm, I'll point you to a technical-implementation study posted by Hortonworks, Natural Language Processing and Sentiment Analysis for Retailers using HDP and ITC Infotech Radar, as an example. For Spark, Pivotal published a similar and even-more-detailed study, 3 Key Capabilities Necessary for Text Analytics & Natural Language Processing in the Era of Big Data. Note all the software inter-operation going on. Long gone are the days of monolithic codebases.
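For a flavor of what streaming text analysis looks like in code, here is a minimal sketch using Spark's DStream API, not taken from the Hortonworks or Pivotal write-ups. It scores lines of text arriving on a socket against a toy sentiment lexicon; the host, port, and lexicon are assumptions.

```python
# A minimal sketch of streaming text scoring with PySpark's DStream API.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "hate", "broken"}

def score(line):
    # Toy lexicon-based sentiment: positive hits minus negative hits
    tokens = line.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

sc = SparkContext(appName="StreamingTextSentiment")
ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)  # hypothetical text feed
scores = lines.map(lambda line: (line, score(line)))
scores.pprint()

ssc.start()
ssc.awaitTermination()
```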

But in the end, money talks, so now on to part 3 —

Follow the Money: Investments

Investment activity is a forward-looking indicator, suggesting optimism about a company's growth potential and likely profitability and, more particularly, the viability of the company's technology and business model and the talent of its staff.

I’ll run through 2014 funding news and mergers & acquisitions activity, although first I’ll again note one 2015 acquisition in the space, IBM’s purchase of AlchemyAPI, and NetBase’s $24 million Series E round.

In 2014, chronologically:

  1. "Verint's $514m purchase of KANA opens lines beyond the call center." I've linked to a 451 story; here's Verint's press release. Verint is a global player in customer interaction analytics and other fields; target KANA had itself bought voice of the customer (VOC)/text analytics provider Overtone back in 2011. Big question: Has Verint (by now) replaced its OEM-licensed Clarabridge text analytics engine with the Overtone tech? (January 6, 2014)
  2. "HootSuite Acquires uberVU To Bolster Analytics Offering," acquiring a social intelligence company whose tech stack includes text analytics. (January 22, 2014)
  3. "Confirmit Acquires Social Intelligence and Text Analytics Innovator Integrasco." A couple of Norwegian firms get hitched: a voice of the customer (VOC)/enterprise feedback vendor and a text analytics/social intelligence vendor. (January 22, 2014)
  4. Sentisis, which focuses on Spanish-language analysis, collected €200,000 in seed funding, per Crunchbase. Now, a year later, Sentisis has done a $1.3 million Series A round. (February 14, 2014 and March 18, 2015)
  5. A French social listening company: Synthesio Secures $20 Million Investment from Idinvest Partners. Here’s an interview I did with Synthesio’s text analytics lead, Pedro Cardoso, last November: “From Social Sources to Customer Value: Synthesio’s Approach.” (March 14, 2014)
  6. Gavagai, a Swedish start-up, pulled in 21 million kronor, which translates to about US$3.2 million. (The name "gavagai" is a sort-of inside joke and was used once before by an NLP company!) (March 24, 2014)
  7. Text analytics via Hadoop played a part in FICO‘s April acquisition of Karmasphere. (April 16, 2014)
  8. "Pegasystems buys Bangalore analytics startup MeshLabs," reported the Times of India. (May 7, 2014)
  9. "Attensity Closes $90 Million in Financing." Attensity was one of the first commercial text-analytics providers, going beyond entities to "exhaustive extraction" of relations and other information. I put out a two-part appraisal last summer, "Attensity Doubles Down: Finances and Management" and "Attensity, NLP, and 'Data Contextualization'." (May 14, 2014)
  10. "Inbenta, a company from Barcelona specialized in Intelligent Customer Support software with Artificial Intelligence and Natural Language Processing, raises $2 million from Telefónica." (May 14, 2014)
  11. Brandwatch pulled in $22 million in new funding. There’s a press release from the social analytics vendor, which has basic text analytics capabilities — former CTO Taras Zagibalov presented at my 2011 sentiment symposium (slides, video) — although it appears they’re a bit stale. (May 22, 2014)
  12. "Lexalytics Acquires Semantria To Bring Sentiment Analysis To The Masses," reported TechCrunch. VentureBeat's version was "Lexalytics buys Semantria, because you gotta be able to analyze text in the cloud." VentureBeat reported, "the deal went down for less than $10 million," but I'm guessing the reporter was unaware that Lexalytics already owned a hefty chunk of spin-off Semantria, 25% I believe. (July 14, 2014)
  13. "NetBase Completes $15.2 Million Round of Expansion Funding." NetBase has strong multi-lingual text analytics and appears to be on a not-as-fast-as-they-would-hope path to an IPO: The company just took in another $24 million, in Series E funding, on March 13, 2015. Taking on more fuel before IPO take-off, I assume. (July 15, 2014)
  14. "Synapsify bags $850K." Congratulations (again) to Stephen Candelmo! (July 24, 2014)
  15. "Innodata Announces Acquisition of MediaMiser Ltd.," which you can learn about from the target's perspective as well: "Well, this happened: We've been acquired by Innodata!" (July 28, 2014)
  16. SmartFocus acquired Content Savvy, an NLP provider, for incorporation into “the UK’s first omni-channel digital marketing system.” (September 15, 2014)
  17. "Digital Reasoning Raises $24 Million in Series C Round Led by Goldman Sachs & Credit Suisse Next Investors." Cognitive computing! (October 9, 2014)
  18. "Maritz Research buys Allegiance, forms MaritzCX." This is an interesting take-over, by a research firm — Maritz is/was a customer of Clarabridge's, and maybe of other text-analytics providers — of a customer-experience firm that in turn licensed Attensity's and Clarabridge's technology, although Clarabridge's seemingly on a capability-limited basis. (November 5, 2014)
  19. "Brand a Trend, a cloud-based text analytics company based out of Heidelberg, Germany, announced a $4.5 million round of funding that it will use to push into the U.S. and the booming digital market" — that's the SUMMICS product — following on a $600 thousand 2013 founding investment and a February 2014 €800 thousand seed investment. (November 11, 2014)
  20. Natural language generation: “Narrative Science pulls in $10M to analyze corporate data and turn it into text-based reports.” Rival Arria did an IPO, as NLG.L, in December 2013. (November 28, 2014)

Reports and Community

Let's finish with opportunities to learn more, starting with conferences, because there is still no substitute for in-person learning and networking (not that I dislike MOOCs, videos, and tutorials). Here's a selection:

  • Text Analytics World, March 31-April 1 in San Francisco, co-located with Predictive Analytics World.
  • Text by the Bay, “a new NLP conference bringing together researchers and practitioners, using computational linguistics and text mining to build new companies through understanding and meaning.” Dates are April 24-25, in San Francisco.
  • The Text Analytics Summit (a conference I chaired from its 2005 founding through 2013’s summit) will take place June 15-16 in New York, the same dates as…
  • The North American instance of IIeX, Greenbook’s Insight Innovation Exchange, slated for June 15-17 in Atlanta. I’m organizing a text analytics segment; send me a note if you’d like to present.
  • My own Sentiment Analysis Symposium, which includes significant text-analysis coverage, is scheduled for July 15-16 in New York, this year featuring a Workshops track in parallel with the usual Presentations track. In case you’re interested: I have videos and presentations from six of the seven other symposiums to date, from 2010 to 2014, posted for free viewing. New this year: A half-day workshop segment devoted to sentiment analysis for financial markets.

The 2014 LT-Accelerate conference in Brussels.

If you’re in Europe or fancy a trip there, attend:

On the vendor side,

Moving to non-commercial, research-focused and academic conferences… I don’t know whether the annual Text as Data conference will repeat in 2015, but I have heard from the organizers that NIST’s annual Text Analysis Conference will be scheduled for two days the week of November 16, 2015.

The 9th instance of the International Conference on Weblogs and Social Media (ICWSM) takes place May 26-29 in Oxford, UK. And the annual meeting of the Association for Computational Linguistics, an academic conference, moves to Beijing this year, July 26-31.


I’ve already cited my own Text Analytics: User Perspectives on Solutions and Providers.

Butler Analytics’ Text Analytics: A Business Guide, issued in February 2014, provides a good, high-level business overview.

And I’m exploring a report/e-book project, topic (working title) “Natural Language Ecosystems: A Survey of Insight Solutions.”

If you know of other market activity, conferences, or resources I should include here, please let me know and I'll consider those items for an update. In any case…

Thanks for reading!

Disclosures +

I have mentioned many companies in this article. I consult to some of them. Some sponsored my 2014 text-analytics market study or an article or a paper. (This article is not sponsored.) Some have sponsored my conferences and will sponsor my July 2015 symposium and/or November 2015 conference. I have taken money in the last year, for one or more of these activities, from: AlchemyAPI, Clarabridge, Daedalus, Digital Reasoning, eContext, Gnip, IBM, Lexalytics, Luminoso, SAS, and Teradata. Not included here are companies that have merely bought a ticket to attend one of my conferences.

If your own company is a text analytics (or sentiment analysis, semantics/synthesis, or other data analysis and visualization) provider, a solution provider that would like to add text analytics to its tech stack, or a current or potential user, I'd welcome helping you with competitive product and market strategy on a consulting basis. Or simply follow me on Twitter at @SethGrimes or read my Breakthrough Analysis blog for technology and market updates and opinions.

Finally, I welcome the opportunity to learn about new and evolving technologies and applications, so if you’d like to discuss any of the points I’ve covered, as they relate to your own work, please do get in touch.

IBM Watson, AlchemyAPI, and a World of Cognitive Computing

In the news: IBM has bought text- and image-analysis innovator AlchemyAPI, for inclusion in the Watson cognitive computing platform.

AlchemyAPI sells text analysis and computer vision capabilities that can be integrated into applications, services, and data systems via a SaaS API. But I don't believe you'll find the words "cognitive computing" on AlchemyAPI's Web site. So where's the fit? What gap was IBM seeking to fill?

For an IBM description of Watson cognitive computing, in business-accessible terms, see the video embedded in this article.

IBM explains Watson cognitive computing

My definition: Cognitive computing both mimics human capabilities — perception, synthesis, and reasoning — and applies human-like methods such as supervised learning, trained from established examples, to discern, assess, and exploit patterns in everyday data. Successful cognitive computing is also superhuman, with an ability to apply statistical methods to discover interesting features in big, fast, and diverse data. Cognitive platforms are scalable and extensible, able to assimilate new data and methods without restructuring.
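To ground the supervised-learning piece of that definition, here is a minimal sketch of a classifier trained from established examples, using scikit-learn. The toy labels and training sentences are assumptions; nothing here describes Watson's or AlchemyAPI's internals.

```python
# A minimal sketch of supervised learning from labeled examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "the support team resolved my issue quickly",
    "wonderful service, very helpful staff",
    "my order arrived damaged and late",
    "terrible experience, nobody answered my calls",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words features feeding a linear classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

print(model.predict(["the staff was helpful and the issue was resolved"]))
```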

AlchemyAPI fits this definition. The automated text understanding capabilities offered by AlchemyAPI and competitors — they include TheySay, Semantria, Ontotext, Pingar, MeaningCloud, Luminoso, Expert System, Datumbox, ConveyAPI, Bitext, Aylien, and others, each with its own strengths — add value to any social or enterprise solution that deals with large volumes of text. I haven't even listed text analysis companies that don't offer an on-demand, as-a-service option!

AlchemyAPI uses a hybrid technical approach that combines statistical, machine learning, and taxonomy-based methods, adapted for diverse information sources and business needs. But what sets AlchemyAPI apart is the company’s foray into deep learning, the application of a hierarchy of neural networks to identifying both broad-stroke and detailed language and image features.
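For readers new to the deep learning idea, the sketch below shows a tiny neural text classifier with stacked learned layers, built with Keras. It is a generic illustration on invented data, not AlchemyAPI's architecture.

```python
# A minimal sketch: learned word embeddings feeding stacked dense layers.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense

texts = ["love the new camera", "screen cracked on day one",
         "battery life is excellent", "awful build quality"]
labels = np.array([1, 0, 1, 0])  # 1 = positive, 0 = negative (invented)

tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(texts)
x = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=10)

model = Sequential([
    Embedding(input_dim=1000, output_dim=16),  # learned word features
    GlobalAveragePooling1D(),                  # pool to a document vector
    Dense(16, activation="relu"),              # higher-level features
    Dense(1, activation="sigmoid"),            # positive/negative score
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, labels, epochs=10, verbose=0)

test = pad_sequences(tokenizer.texts_to_sequences(["great camera"]), maxlen=10)
print(model.predict(test))
```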

So AlchemyAPI isn't unique in the natural-language processing (NLP) domain, but the company does have lasting power. The success is measurable. AlchemyAPI, founded in 2005, was relatively early to market with an on-demand text analysis service and has won an extensive developer following, although I'll bet you $1 that the widely circulated 40,000 developer figure counts API-key registrations, not active users. The company is continually rolling out new features, which range from language detection and basic entity extraction to some of the most fine-grained sentiment analysis capabilities on the market. By contrast, development of the most notable early market entrant, OpenCalais from Thomson Reuters, stalled long ago.

Agility surely plays a role in AlchemyAPI's success, management foresight that led the company to jump into computer vision. CEO Elliot Turner described the opportunity in an April 2014 interview:

“Going beyond text, other areas for big progress are in the mining of audio, speech, images and video. These are interesting because of their incredible growth. For example, we will soon see over 1 billion photos/day taken and shared from the world’s camera phones. Companies with roots in unsupervised deep-learning techniques should be able to leverage their approaches to dramatically improve our ability to correctly identify the content contained in image data.”

Yet there's competition in image analysis as well. Given my work in sentiment analysis, most of the companies I follow apply the technology for emotion analytics — they include Affectiva, Emotient, Eyeris, and RealEyes — but consider that Google couldn't build a self-driving car without technology that "sees." The potential impact of computer vision and automated image analysis seems limitless, with plenty of opportunity to go around.

Why did IBM, a behemoth with immense research capabilities, need to go outside by acquiring AlchemyAPI? I'd speculate that IBM's challenge is one shared by many super-large companies: an inability to effectively commercialize in-house innovation. Regardless, the prospect of bringing onto the Bluemix cloud platform all those NLP-interested developers, whether 40,000 or some lesser active number, was surely attractive. The AlchemyAPI technology will surely plug right in: Modern platforms accommodate novelty. As I wrote above, they're able to assimilate new data and methods without restructuring.

And Watson? It's built on the IBM-created Apache UIMA (Unstructured Information Management Architecture) framework, designed for functional extensibility. AlchemyAPI already fits in, via a set of "wrappers" that I expect will be updated and upgraded soon. But truth is, it seems to me that given Watson's broad and proven capabilities, these added capabilities provide only a relatively small technical boost, in two directions. First, AlchemyAPI will provide market-proven unsupervised learning technology to the Watson stack, technology that can be applied to diverse language-understanding problems. Second, as I wrote, AlchemyAPI offers some of the most fine-grained sentiment analysis capabilities on the market, providing certain information-extraction capabilities not currently closely linked to Watson. What IBM will do with AlchemyAPI's image-understanding capabilities, I can't say.

Beyond these technical points, I’m guessing that the bottom-line attractions were talent and opportunity. IBM’s acquisition press release quotes AlchemyAPI CEO Elliot Turner: “We founded AlchemyAPI with the mission of democratizing deep learning artificial intelligence for real-time analysis of unstructured data and giving the world’s developers access to these capabilities to innovate. As part of IBM’s Watson unit, we have an infinite opportunity to further that goal.” It’s hard to beat infinite opportunity or, for a company like IBM, a chance to build on a combination of agility, talent, enthusiasm, market-sense, and foresight that is hard to find in house or in the commercial marketplace.

Disclosure: I have mentioned numerous companies in this article. AlchemyAPI, IBM, Converseon (ConveyAPI), Daedalus (MeaningCloud), Lexalytics (Semantria), Luminoso, Ontotext, and TheySay have paid to sponsor my Sentiment Analysis Symposium conference and/or my Text Analytics 2014 market study and/or the Brussels LT-Accelerate conference, which I co-own.

An extra: Video of a talk, Deep Learning for Natural Language Processing, with Stephen Pulman of the University of Oxford and text-analysis solution provider TheySay, offered at the 2014 Sentiment Analysis Symposium. Deep learning techniques are central to AlchemyAPI’s text and image analysis capabilities as well.

The Analytics of Digital Transformation, per Tata Consultancy Services

Next month's LT-Accelerate conference will be the third occasion I've invited Lipika Dey to speak at a conference I've organized. She's that interesting a speaker. One talk was on Goal-driven Sentiment Analysis, a second on Fusing Sentiment and BI to Obtain Customer/Retail Insight. (You'll find video of the latter talk embedded at the end of this article.) At LT-Accelerate in Brussels, she'll be speaking on a topic that's actually of quite broad concern, E-mail Analytics for Customer Support Centres.

As part of the conference lead-up, I interviewed Lipika regarding consumer and market analytics, and — given her research and consulting background — techniques that best extract practical, usable insights from text and social data. What follows are a brief bio and then the full text of our exchange.

Dr. Lipika Dey, senior consultant and principal scientist at Tata Consultancy Services

Dr. Lipika Dey is a senior consultant and principal scientist at Tata Consultancy Services (TCS), India with over 20 years of experience in academic and industrial R&D. Her research interests are in content analytics from social media and news, social network analytics, predictive modeling, sentiment analysis and opinion mining, and semantic search of enterprise content. She is keenly interested in developing analytical frameworks for integrated analysis of unstructured and structured data.

Lipika was formerly a faculty member in the Department of Mathematics at the Indian Institute of Technology, Delhi, from 1995 to 2006. She has published in international journals and refereed conference proceedings. Lipika has a Ph.D. in Computer Science and Engineering, an M.Tech in Computer Science and Data Processing, and a five-year integrated M.Sc in Mathematics from IIT Kharagpur.

Our interview with Lipika Dey —

Q1: The topic of this Q&A is consumer and market insight. What's your personal background and your current work role, as they relate to these domains?

Lipika Dey: I head the research sub-area of Web Intelligence and Text Mining at Innovation Labs, Delhi, of Tata Consultancy Services. Throughout my academic and research career, I have worked in the areas of data mining, text mining, and information retrieval. My current interests are focused on seamless integration of business intelligence and multi-structured predictive analytics that can reliably and gainfully use information from a multitude of sources for business insights and strategic planning.

Q2: What roles do you see for text and social analyses, as part of comprehensive insight analytics, in understanding and aggregating market voices?

Lipika Dey: The role of text in insight analytics can hardly be over-emphasized.

Digital transformation has shifted control of the consumer world from providers to consumers. Consumers — both actual and potential — are demanding, buying, reviewing, criticising, influencing others, and thereby controlling the market. The decreasing cost of smart gadgets is ensuring that all this is not just for the elite and tech-savvy. Ease of communicating in local languages on these gadgets is also a contributing factor to the increased user base and increased content generation.

News channels and other traditional information sources have also adopted social media for information dissemination, thereby paving the way for study of people’s reactions to policies and regulations.

With so much expressed and exchanged all over the world, it is hard to ignore content and interaction data to gather insights.

Q3: Are there particular tools or methods you favor? How do you ensure business-outcome alignment?

Lipika Dey: My personal favourites for text analytics are statistical methods and imprecise reasoning techniques, used in conjunction with domain and business ontologies for interpretation and insight generation. Statistical methods are language agnostic and ideal for handling noisy text. Text is inherently not amenable to crisp reasoning frameworks; hence imprecise representation and reasoning methodologies based on fuzzy sets or rough sets are ideal for reasoning with text inputs.

The most crucial aspect of text-analytics-based applications is interpretation of results and insight generation. I strongly believe in interactive analytics platforms that help a human analyst comprehend and validate the results. The ability to create and modify a business ontology with ease, and to view the content or results from different perspectives, is also crucial for successful adoption of a text-analytics-based application. Business intelligence is far too entrenched in dashboard-driven analytics at the moment. It is difficult to switch the mind-set of a whole group at once. Thus text analytics at this moment is simply used as a way to structure content to generate numbers for some pre-defined parameters. A large volume of potentially useful information is therefore ignored. One practical way to enrich business intelligence with information gathered from text is to choose "analytics as a service" rather than look for a tool.

As a researcher I find this the most exciting phase in the history of text analytics. I see a lot of potential in the as-yet unused aspects of text for insight generation. We are at a point where surface-level analytics has seen a fair degree of success. The challenge now is to dive below the surface and understand intentions, attitudes, influences, etc. from stand-alone or communication text. Dealing with ever-evolving language patterns, which are in turn influenced by the gadgets through which content is generated, just adds to the complexity.
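To make the fuzzy-set idea from Lipika's answer concrete, here is a minimal sketch of my own — not TCS code — in which upstream text analysis assigns a comment graded memberships in a few concepts and a fuzzy rule combines them with the standard min/max operators. The concepts, membership values, and rule are invented for illustration.

```python
# Toy sketch of fuzzy reasoning over text-derived evidence. Membership values
# (0..1) and the rule are invented for illustration only.

# Degrees of membership assigned to one customer comment by upstream text analysis.
evidence = {
    "mentions_delay": 0.8,   # "took ages to arrive"
    "negative_tone": 0.6,    # hedged complaint, not an outright rant
    "mentions_refund": 0.2,
}

def fuzzy_and(*values):   # standard min t-norm
    return min(values)

def fuzzy_or(*values):    # standard max t-conorm
    return max(values)

# Rule: "churn risk" IF (mentions_delay AND negative_tone) OR mentions_refund
churn_risk = fuzzy_or(
    fuzzy_and(evidence["mentions_delay"], evidence["negative_tone"]),
    evidence["mentions_refund"],
)
print(f"churn-risk membership: {churn_risk:.2f}")  # 0.60
```

In a real deployment the memberships would come from statistical models and the rule terms would be tied to a business ontology, along the lines she describes.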

Q4: A number of industry analysts and solution providers talk about omni-channel analytics and unified customer experience. Do you have any thoughts to share on working across the variety of interaction channels?

Lipika Dey: Yes, we see many business organizations actively moving towards unified customer experience. Omni-channel analytics is catching up. But truly speaking, I think at this point in time it is an aspirational capability. A lot of information is being pushed. Some of it is contextual. But I am not sure whether the industry is yet in a position to measure its effectiveness or, for that matter, use it to its full potential.

It is true that a multitude of sources helps in generating a more comprehensive view of a consumer, both as an individual and as a social being. Interestingly, as data grows bigger and bigger, technology is enabling organizations to focus on smaller and smaller groups, almost to the point of catering to individuals.

As a researcher I see exciting possibilities to work in new directions. My personal view is that the success of omni-channel analytics will depend on the capability of data scientists to amalgamate domain knowledge and business knowledge with loads and loads of information gathered about socio-cultural, demographic, psychological and behavioural factors of target customers. Traditional mining tools and technologies will play a big role, but I envisage an even greater role for reasoning platforms which will help analysts play around with information in a predictive environment, pick and choose conditional variables, perform what-if analysis, juggle around with possible alternatives and come up with actionable insights. The possibilities are endless.

Q5: To what extent does your work involve sentiment and subjective information?

Lipika Dey: My work is to guide unstructured text analytics research for insight generation. Sentiments are a part of the insights generated.

The focus of our research is to develop methods for analysing different types of text, mostly consumer-generated, not only to understand customer delights and pain-points but also to discover the underlying process lacunae and bottlenecks that are responsible for the pain-points. These are crucial insights for an enterprise. Most often the root-cause analysis involves overlaying the text analytics results with other types of information — business rules, the enterprise resource directory, the information exchange network, etc. — to generate actionable insights. Finally, it also includes strategizing to involve business teams in evaluating insights and converting them into business actions, with appropriate computation of ROI.
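By way of illustration of that overlay step — my own toy sketch, not TCS's implementation — one might join text-mined pain-point volumes with an enterprise process/owner directory so that each complaint theme points at a likely process bottleneck. All names and numbers below are invented.

```python
# Toy "overlay": join text-mined pain-point volumes with a process-owner
# directory to route each complaint theme to a likely bottleneck. Invented data.
pain_points = {"delivery delay": 240, "wrong item shipped": 90, "refund not received": 60}

process_map = {
    "delivery delay":      {"process": "last-mile logistics", "owner": "Logistics Ops"},
    "wrong item shipped":  {"process": "order picking",       "owner": "Warehouse"},
    "refund not received": {"process": "payment reversal",    "owner": "Finance"},
}

for theme, volume in sorted(pain_points.items(), key=lambda kv: -kv[1]):
    info = process_map.get(theme, {"process": "unknown", "owner": "triage"})
    print(f"{theme:22s} volume={volume:4d}  process={info['process']:18s} owner={info['owner']}")
```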

Q6: How do you recommend dealing with high-volume, high-velocity, diverse data — to ensure that analyses draw on the most complete and relevant data available and deliver the most accurate results possible?

Lipika Dey: Tata Consultancy Services has conducted several surveys across industry over the last two years to understand organizational big data requirements. The findings are published in several reports available online. (See the Tata Consultancy Services Web site, under the Digital Enterprise theme.) One of the key findings from these surveys was that many business leaders saw the impending digital transformation as siloed components affecting only certain parts of the organization. We believe that this is a critical error.

The digital revolution that is responsible for high volumes of diverse data arriving at high velocity does not impact only a few parts of the business — it affects almost every aspect. Thus our primary recommendation is to take a holistic view of the enterprise, one that encompasses both technology and culture. Our focus is to help organizations achieve total digital transformation through an integrated approach that spans sales, customer service, marketing, and human resources, affecting the entire universe of business operations. The message is this: business processes need to be rethought. The task at hand is to predict and prioritize the most likely and most extreme areas of impact.

Q7: So what are the elements of that rethinking and that prioritization?

Lipika Dey: We urge our clients to consider the four major technology shifters under one umbrella. Big data initiatives should operate in tandem with social-media strategy, mobility plans, and cloud computing initiatives. I’ve talked about big data. The others —

Social media has tremendous potential for changing both business-to-business and business-to-consumer engagement. It is also a powerful way to build "crowdsourcing" solutions among partners in an ecosystem. Moving beyond traditional sales and services, social media also has a tremendous role in supply-chain and human-resource management.

Mobile apps are transforming ways of operating that businesses have followed for ages. They are also set to change the way employees use organizational resources. Thus there is pressure to rethink business rules and processes.

There will also soon be a need for complete infrastructure revision to ward off the strains imposed by growing data needs. While cloud computing initiatives are on the rise, we still see them signed up for by departments rather than by enterprises. The fact that cloud offerings are typically paid for by subscription makes them more economical when adopted at enterprise scale.

Having said that, we also believe there is no one-size-fits-all strategy. Enterprises may need to redesign their workplaces so that business works closely with IT to redesign products and services, mechanisms for communicating with customers, partners, vendors, and employees, and business models and processes.

Q8: Could you say more about data and analytical challenges?

Lipika Dey: The greatest challenges in unstructured-data analytics for an enterprise are measuring accuracy, especially in the absence of ground truth, and measuring the effectiveness of actions taken. To check the effectiveness of actionable insights, one possibility is the A/B testing approach. It is a great way to understand the target audience and evaluate different options. We also feel it is always better to start with internal data — something that is assumed to be intuitively understood. If results match known results, well and good — your faith in the chosen methods increases. If they don't match, explore, validate, and then, if not satisfied, try out other alternatives.
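For concreteness, here is a minimal sketch of the kind of A/B check Lipika mentions: a two-proportion z-test comparing an outcome rate for a control group against a group handled according to a text-derived insight. The counts are invented, and a real evaluation would also attend to test design and multiple comparisons.

```python
# Minimal A/B check: two-proportion z-test on an outcome rate (e.g., first-contact
# resolution) for a control group (A) vs. a group handled per a text-derived
# insight (B). Counts are invented for illustration.
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = two_proportion_z(success_a=420, n_a=1000, success_b=465, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")  # treat the change as effective only if p is small
```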

Q9: Could you provide an example (or two) that illustrates really well what your organization and clients have been able to accomplish via analytics and that demonstrates strong ROI?

Lipika Dey: I will describe two case studies. In the first, one of our clients wanted to analyze calls received over a particular period at their toll-free call centre. These calls were of unusually high duration. The aim was to reduce the operational cost of running the call centre without compromising customer satisfaction. The calls were transcribed into text. Analysis of the transcripts revealed several findings that could be immediately transformed into actionable insights. The analyses carried out and the insights revealed fall broadly into the following buckets:

(a) Content-based analysis identified that these calls contained queries pertaining to existing customer accounts, queries about new products or services, requests for status updates on transactions, and, finally, requests for documents (see the sketch below).

(b) Structural analysis revealed that each call requested multiple services, often for different clients, which led to several context switches while the agent searched for information and hence to long durations. It also revealed that calls often landed at the wrong point and had to be redirected several times before they could be answered.

Based on the above findings, a restructuring of the underlying processes and call-centre operations was suggested, with an estimated ROI based on the projected reduction, derived from available statistics, in the number of calls requesting status updates, document dispatch, and the like.
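A quick aside to make item (a) concrete: here is a toy sketch of keyword-rule bucketing over call transcripts. The categories echo the case study; the keyword phrases and sample calls are invented for illustration, and real content analysis would rely on richer linguistic models.

```python
# Toy content-based bucketing: assign each transcribed call to one or more
# query categories via keyword rules. Categories echo the case study;
# keyword lists and sample calls are invented.
from collections import Counter

BUCKETS = {
    "existing_account": ["my account", "balance", "account number"],
    "new_product_or_service": ["new plan", "interested in", "sign up"],
    "transaction_status": ["status", "still pending", "not processed"],
    "document_request": ["send the form", "statement copy", "document"],
}

def bucket_call(transcript):
    text = transcript.lower()
    hits = [name for name, phrases in BUCKETS.items()
            if any(phrase in text for phrase in phrases)]
    return hits or ["uncategorized"]

calls = [
    "Could you check my account balance and also send the form for a new plan?",
    "My transfer is still pending, what is the status?",
]
counts = Counter(bucket for call in calls for bucket in bucket_call(call))
print(counts)
```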

In the second case study, analysis of customer communications for the call centre of an international financial institution, done periodically over an extended period, revealed several interesting insights about how customer satisfaction could be raised from current levels. The bank wished to obtain aggregated customer sentiment around a fixed set of attributes related to its products, staff, operating environment, etc. We provided that, and the analysis also revealed several dissatisfaction root causes that were not captured by the fixed set of parameters. Several of these issues were not even within the bank's control, since they involved externally obtained services. We correlated sentiment trends for different attributes with changes in the customer satisfaction index to verify the correctness of actions taken.

In this case, strict monetary returns were not computed. Unlike in retail, computing ROI for financial organizations requires long-term vision, strategizing, investment, and monitoring of text analytics activities.
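The sentiment-versus-satisfaction correlation in the second case study can be as simple as a Pearson coefficient over monthly aggregates. A minimal sketch, with invented numbers:

```python
# Correlate a per-attribute monthly sentiment trend with the customer
# satisfaction index. The monthly numbers are invented for illustration.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Monthly average sentiment for the "staff" attribute vs. monthly CSAT index.
staff_sentiment = [-0.12, -0.05, 0.02, 0.10, 0.08, 0.15]
csat_index      = [71.0, 72.5, 72.0, 74.5, 74.0, 76.0]
print(f"r = {pearson(staff_sentiment, csat_index):.2f}")
```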

Q10: I’m glad you’ll be speaking at LT-Accelerate. Your talk is titled “E-mail Analytics for Customer Support Centres — Gathering Insights about Support Activities, Bottlenecks and Remedies.” That’s a pretty descriptive title, but is there anything you’d like to add by way of a preview?

Lipika Dey: A support centre is the face of an organization to its customers, and e-mail remains the life-line of support centres for many organizations. Hence organizations spend a lot of money on running these centres efficiently and effectively. But unlike log-based complaint-resolution systems, when all communication within the organization and with customers occurs through e-mail, analytics becomes difficult. That's because a lot of relevant information — about the types of problems logged, resolution times, compliance factors, the resolution process, and so on — remains embedded within the messages, and not in a straightforward way.

In this presentation we shall highlight some key analytical features that can generate interesting performance indicators for a support centre. These indicators can in turn be used to measure compliance and to characterize group-wise problem-resolution processes, inherent process complexities, and activity patterns leading to bottlenecks — thereby allowing support centres to reorganize their mechanisms. The approach also supports a predictive model that incorporates early warnings and outage prevention.
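One such indicator — per-thread resolution time — can be estimated directly from message timestamps. A hedged sketch with invented thread data (real e-mail analytics, as Lipika describes, goes much further, into compliance and process characterization):

```python
# Toy e-mail analytics indicator: resolution time per support thread, taken as
# the gap between the first and last message in the thread. Invented data.
from datetime import datetime

threads = {
    "TICKET-101": ["2014-11-03 09:15", "2014-11-03 11:40", "2014-11-04 16:05"],
    "TICKET-102": ["2014-11-05 08:00", "2014-11-05 08:55"],
}

def resolution_hours(timestamps):
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps)
    return (times[-1] - times[0]).total_seconds() / 3600

for ticket, stamps in threads.items():
    print(f"{ticket}: resolved in {resolution_hours(stamps):.1f} hours")
```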

Thanks, Lipika, for sharing insights in this interview, and thanks in advance for your December presentation.

The Voice of the Customer × 650 Million/Year at Sony Mobile

We understand that customer feedback can make or break a consumer-facing business. That feedback — whether unsolicited opinions posted on social media, comments gathered during support interactions, or responses collected via surveys — captures valuable information about product and service quality issues. Automated analysis is essential. Given data volume and velocity, and the diversity of feedback sources and languages that a global enterprise must deal with, there is no other way to effectively produce insights.

Olle Hagelin, Sony Mobile

Consumer and market analytics — and supporting social, text, speech, and sentiment analysis techniques — are subject matter for the LT-Accelerate conference, taking place December 4-5, 2014 in Brussels. We’re very happy that we were able to recruit Olle Hagelin from Sony Mobile as a speaker.

Olle started in the mobile phone business in 1993 as a production engineer. He has held many roles as a project and quality manager, including responsibility for the Ericsson Mobile development process and for quality at a company level. Olle is currently quality manager in the Quality & Customer Service organization at Sony Mobile Corporation, where he is responsible for handling feedback from the field.

Our interview with Olle Hagelin —

Q1: The topic of this Q&A is consumer and market insight. What’s your personal background and your current work role, as they relate to these domains?

Olle Hagelin: My responsibility is to look into all customer interactions to determine Sony Mobile’s biggest issues from the customer’s point of view. We handle around 650 million interactions per year.

Q2: What roles do you see for text and social analyses, as part of comprehensive insight analytics, in understanding and aggregating market voices?

Olle Hagelin: I think text and social analyses can replace most of what is done today.

Everyone's customers will sooner or later express what they want on the Net. And their opinions won't be colored by your questions. You just put your ear to the ground and listen. You probably want to ask questions too, but that will be to get details, to fine-tune — not to understand the picture, only to understand which particular shade of green, out of 3,500 shades, the customer is seeing.

Q3: Are there particular tools or methods you favor? How do you ensure business-outcome alignment?

Olle Hagelin: You always prefer the tool you know how to use. For our purposes, what we get from Confirmit and the Genius tool is perfect. But again, the point is to find issues — to mine text to find issues and understand the sentiment around them. If you are a marketing person, other tools may serve better.

Business-outcome alignment is a big statement, and I don't try to achieve that. If it comes, nice, but my aim is only to understand customer issues and to ensure that they are fixed as soon as possible. And I suppose the end result is business-outcome alignment?

Q4: A number of industry analysts and solution providers talk about omni-channel analytics and unified customer experience. Do you have any thoughts to share on working across the variety of interaction channels?

Olle Hagelin: Yes. Do it. I do. Sorry. Politically correct: Sony Mobile does and has since 2010. All repairs, all contact center interactions, and as much social as possible. As said above, we handle around 650 million interactions per year.

Q5: To what extent does your work involve sentiment and subjective information?

Olle Hagelin: A lot, although it could be more — especially to determine which issues hurt the customer most. Identifying the biggest, most costly issues and so on is easy, but adding pain-point discovery would be good.

Sentiment/subjective analyses are used frequently to look into specific areas, but not yet as part of the standard daily deliverable. Hopefully everyday use will be in place in a year or two.

Q6: How do you recommend dealing with high-volume, high-velocity, diverse data — to ensure that analyses draw on the most complete and relevant data available and deliver the most accurate results possible?

Olle Hagelin: This can be discussed for days. But in short: look at what you have and start from that. Build up piece by piece. Don't attempt a big do-it-all system, because it will never work and will always be outdated. If you know only one part well — say handling either structured or unstructured data — don't try to take a big bite of the other part, the part you don't know well, by yourself. Instead, buy help and learn slowly.

Sony Mobile splits the data into structured and unstructured parts. We work with them separately to identify issues first and then compare. We know structured data well, and we got very good support and help with the unstructured part. After four years we can do a lot ourselves, but without support from Confirmit on the hard, unchewed mass of unstructured data — Confirmit handles text in the language it is written in (no translations) — we wouldn't be able to manage.

The end result is to make it quick and easy to get to the point.

After working with this data for many years, we now have a good understanding of which issues will be seen in all systems and which will not.
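In miniature, the "identify separately, then compare" step Olle describes might look like lining up issue counts mined from text (social and CRM verbatims) against counts from structured repair records. The issue names and counts below are invented for illustration.

```python
# Toy comparison of issue counts from two independently analyzed sources:
# text mining of verbatims vs. structured repair logs. Invented data.
text_mined  = {"battery drain": 310, "camera blur": 120, "speaker mode on answer": 45}
repair_logs = {"battery drain": 95,  "camera blur": 140}

for issue in sorted(set(text_mined) | set(repair_logs)):
    t, r = text_mined.get(issue, 0), repair_logs.get(issue, 0)
    flag = "social-only signal" if r == 0 else ""
    print(f"{issue:26s} text: {t:4d}  repairs: {r:4d}  {flag}")
```

A gap between the two columns — an issue loud in social media but rare in repair logs, or vice versa — is exactly the kind of signal worth a quick check.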

Q7: Could you provide an example (or two) that illustrates really well what your organization and clients have been able to accomplish via analytics and that demonstrates strong ROI?

Olle Hagelin: Two cases that we fixed quickly recently —

The first was an issue when answering a call: the call always went to speaker mode. We identified the problem, and it was fixed by Google within two weeks — it was an issue in Chrome.

The other was several years ago: a discussion about a small and, in principle, invisible crack in the front of a phone stopped sales in Germany. After we issued a statement that the problem was covered by warranty and would be fixed under warranty, sales started again. It turned out almost no one wanted a fix! As I said, you had to look for the crack to see it.

I have many more examples, but I think for daily work, the possibility of quick-checking social to see whether an issue has spread or not has been the most valuable contributor. And that ability keeps head count down.

Q8: You’ll be presenting at LT-Accelerate. What will you be covering?

Olle Hagelin: I'll show how Sony Mobile uses social media and text mining of CRM data to quickly identify issues, and how we gauge how big they are with complementary structured data.

Added to this, the verbatims from customers can be used as feedback to engineers so that they can reproduce issues in order to fix them.

My thanks to Olle. Please consider joining us at LT-Accelerate in Brussels to hear more!