Think Mid Data, and Triangulate: Tom H.C. Anderson on Next Generation Research Methods

I’m always happy to have occasion to talk about the meeting ground of business and analytical technologies with Tom Anderson, a pioneer in defining and promoting Next Generation Market Research (NGMR). NGMR is characterized by inclusion of the full spectrum of research sources, including high-velocity social sources and psychological and behavioral profiles, alongside conventional surveys and focus groups. NGMR derives insights from the ensemble of sources via the application of advanced analytical methods including data mining and text analytics.

Tom’s company, Anderson Analytics, is the leader in NGMR implementations and develops OdinText, a text-analytics software package designed for market researchers. I reached out to him in the course of researching topics for my next Sentiment Analysis Symposium, a business-focused conference that explores technologies and solutions that harvest attitudes, opinions, and emotions from online, social, and enterprise sources. Sentiment analysis, especially as implemented via text analytics, forms part of the NGMR toolkit. Tom plans to attend the May 8, 2013 symposium in New York, but even if you can’t make it, you can still benefit from Tom’s insights on research concerns and methods, as relayed in a Q&A I recently conducted with him, as follows.

Seth Grimes> Tom, what’s your assessment of Big Data as a research concern? Are conventional descriptions of Big Data useful, or is the emphasis skewed or misplaced? Are there types or sources of data we should ignore and others that are essential?

Tom H.C. Anderson> I recently blogged about how “Mid Data” is actually a more realistic and useful term that takes into account ROI. There are IT vendors out there currently selling text analytics solutions with the main value proposition being that they will somehow make your data more valuable by simply combining it with other data. Unfortunately, in reality it doesn’t work that way. You need to stop thinking Big Data and start thinking Smart Data.

If you’ve never gotten much value out of a particular data source, then adding it to other sources is not likely to make it any more useful. Companies need to think first about which of their individual sources of unstructured data is most important and how they can get more value out of that. Then and only then should you consider whether or not marrying it with any other data makes sense. In most cases (and I’m talking about analytics/insights now) it doesn’t.

Seth> What’s your view as a practitioner of the research value of new methods in neuroscience, of facial measurement for emotion detection from images, of speech analytics? If your company applies them, how do you use them?

THCA> I find some of them more interesting than others, but certainly keep my eyes on things to see where there is anything we can learn. I’m more interested in speech analytics as a way to economically transcribe audio for use in text analytics. I’m hopeful but haven’t seen anything being broadly adopted in that area yet.

Neuroscience contains so much, a lot of it unproven and more than a bit hokey, in my opinion. That said, I do certainly believe that measuring emotion in text is worthwhile and it’s something we have been working with for several years now. It’s not the most important aspect of text analytics, but a part that has value nonetheless.

Seth> We’ve been talking about triangulation for a number of years, the combination of data from multiple sources, for instance, from attitudinal analysis (surveys and content), behavior tracking, and psychological and demographic profiling. Are we there yet?

THCA> I first advocated the idea of triangulation in individual text analytics projects at a software conference in 2005. But this was more in terms of using it within a single text analytics data source, and triangulating using more than one text analytics approach, and usually also with a human POV. I think our software, as well as our understanding of how best to leverage text analytics, has gotten better, so this triangulation approach is now less important than it was then.

But the triangulation you are talking about here, basically the other way around, adding multiple data sources together as I mentioned earlier, well I have not found that to be very effective. Usually it’s a bit of a boondoggle and net negative ROI.

Once in a while we’ve seen cases where marrying two related sources of data makes a lot of sense. Usually each has individual value and we have explored and understand each set well first, and then we look at them combined.

Where I’ve seen it make far less sense is in adding three or more data sources with weak connections. Social Media data, which is just Tweets and blog posts (mainly spam) if we’re honest, is one such source.

There’s lots of talk about combining social media data with other sources like consumer survey data and all sorts of other internal metrics. But in actuality the data is typically not worth much more when combined with other data than it is when separate.

Seth> What are the most important research insights that can be derived from attitudinal data — from opinions, emotions, and other forms of sentiment?

THCA> Basically any feedback that you typically get from human communication can benefit from text analytics. The “What” and “Why” questions are certainly important. What customers want, what your competitors are doing, how to improve satisfaction and increase purchasing, these are just a few of the things we regularly answer with OdinText.

Seth> Tom, thanks for participating in this Q&A. Last question, given the focus of my own work: Where should we head next with sentiment analysis? (Please note that this question doesn’t assume an analysis method, which could involve expert analysis, crowd-sourcing, or automated NLP.)

THCA> I’ve never viewed sentiment analysis as separate from text analytics; it’s just one part of it. I prefer to put the client’s business objectives and data ahead of the technique. So where we take sentiment will depend on what is needed.

We’re focusing more on how we can best help clients with the current accuracy of sentiment, as I believe it’s really already good enough to do the job, and contrary to what some say, I haven’t seen anyone improve it in any significant way without a lot of data-specific customization.

Augie Ray on Informed Customer Experience

Positive customer experience is a key contributor to satisfaction, loyalty, advocacy, and, ultimately, profitability, which makes customer experience management (CEM) — the application of practical and analytical methods to measure and improve CX through better design and delivery of products and services — a key corporate competence. How do you measure customer experience? You’ll want to study customer interactions and also customer sentiment, whether determined via direct approaches such as surveys or via indirect methods that include monitoring online and social media.

To collect and crunch customer-experience data at scale, you need technology. This point is where sentiment analysis and related solutions come into play. To learn about best practices, there’s no substitute for consulting expert practitioners and analysts. A couple weeks ago, I approached Sid Banerjee, co-founder and CEO of “intelligent customer experience” provider Clarabridge, to discover his take on the CX topic. Next up, in this article, a conversation with social-business visionary Augie Ray. (Check out Augie’s blog, Social Experiences that Build Brands.)

Augie is slated to keynote the May 8, 2013 Sentiment Analysis Symposium in New York. His talk is titled “Customer Affinity Meets Brand Vectors: Sentiment That Matters.” The aim is to explore the relationship between sentiment and engagement and how companies must build brand vectors (which I’ll interpret as indicating strength and direction in “the set of expectations, memories, stories and relationships that, taken together, account for a consumer’s decision to choose one product or service over another,” per the other Seth’s 2009 definition) into their social strategies.

(The symposium covers technologies and solutions that harvest attitudes, opinions, and emotions from online, social, and enterprise sources. Application areas include market research, social/media analysis, customer service and customer experience, capital markets, and a spectrum of other uses.)

On to our Q&A with Augie Ray

Seth Grimes> Augie, in a sentence or two: What constitutes great customer experience?

Augie Ray> A great customer experience is one that furnishes the customer with what he or she wants or expects (or more), when he or she wants it, with no friction, regardless of medium. But responsive customer experiences are becoming the table stakes. Increasingly, great brands will not just respond to wants and expectations but proactively respond to needs, perhaps even ones the customers have not yet recognized. Moreover, brands that have more success will fashion customer experiences that do not merely satisfy customers’ desire for immediacy and ease but also convey something deeper about the brand’s customer relationship, mission or culture.

I think of it this way:  You can order a PC or an iPad online. Both will have essentially the same ecommerce screens and the item will arrive in roughly the same timeframe. But from the moment you unbox the device and launch it, the experience is much different. While PCs are closing the gap, the Apple customer experience conveys something more than just “Here’s the hardware you ordered” and instead says something meaningful to customers about beauty, usability and simplicity.

Seth> A working assumption is that the right data and appropriate analyses are keys to design and delivery of great customer experience. What are “the right data” and “appropriate analyses” for you in your work?

Augie> I’m currently focused on how to create great, brand-building experiences for consumers within social media. For me, it is essential to understand how consumers are using social media today, will use them in the future and how the social media experience will impact brand and financial performance. For example, today many brands are offering little to no customer service in social media, but recent studies show that consumers are increasingly turning to social media to get questions and complaints addressed, and how brands handle these inquiries has an impact on consumer loyalty.  It is vital not just to look at what people say today about the social customer experience but to recognize the trends and use analysis and insight to extrapolate into the future.

This idea of extrapolation is key — the data is one thing, but insight and vision based on the data are another. For example, I look back to year 2000, when many retailers were struggling to consider how ecommerce may or may not impact the bottom line. The data at the time demonstrated that few people had an interest in entering their credit card into a web site and that less than 1% of US retail was happening via ecommerce. The conclusion might have been (and was for some retailers) that ecommerce was not an essential part of the consumer experience. Fast forward little more than a decade, and Borders was put out of business while Amazon is approaching Target’s level of retail sales; Borders relied too much on the current data while Amazon continued to recognize the trends and how customer behaviors and expectations would change.

Seth> What are the most important customer insights that can be derived from attitudinal data, from opinions, emotions, and other forms of sentiment?

Augie> In social media today, too many brands are moving cautiously for lack of ROI data. The problem is that social media is not first and foremost a direct marketing channel–it is a relationship one. Plus, in a world of such complex multi-channel interactions, it is difficult to isolate just one channel, strategy or tactic from the others.  (Coke recently reinforced this when its marketing executive took to the Coke blog to defend a report that social media chatter contributed almost nothing to sales lift.) Perhaps more importantly, social customer experiences can deliver vital attitudinal changes even if direct sales/ROI are difficult to measure.

This is why it becomes important to evaluate brand lift in social media and not merely conversions and sales. One of my favorite examples comes from several years ago when I was at Forrester. I researched a P&G social media program called “Let Her Jump,” a movement sponsored by antiperspirant brand Secret to allow female ski jumpers to compete in the 2014 Winter Olympics. The brand did not just count likes and retweets–it used surveys to uncover that the belief that Secret deodorant works better than other deodorants increased 8 points and purchase intent jumped 11% among those who interacted with the social media program. That’s the kind of attitudinal data that matters!

Seth> Text analytics and sentiment analysis enable intelligent customer interactions via issue alerting, root-cause analysis, pattern and trend analysis, and by bringing new sources into the research mix. Have they delivered on their promise?

Augie> There is healthy skepticism among senior leaders that today’s social analytics tools can uncover deep, insightful knowledge. For example, a brand that launches a campaign with happy tweets about pets may see a substantial improvement in the sentiment around the brand, but is this really effecting a positive change in people’s attitude about the brand or does it merely reflect that people like tweets about pets? Today, I see a great deal of social media strategy driven by a desire for mere engagement rather than by a desire to drive meaningful changes in brand perception. I am not sure that today’s tools are capable enough to tell the difference.

Seth> Augie, thanks for participating in this Q&A session and, in advance, in the up-coming Sentiment Analysis Symposium. To wrap up for now: What’s the next frontier for customer experience — what techniques will be center-stage a year or two from now — or will attention have moved on from CX to something new?

Augie> In the next few years, I expect that corporate use of social media will increasingly shift from marketing to customer service and business. I’ve already mentioned the trends around social customer service, but I also see changes in how social technologies and behaviors will impact products, services and business models. Look at the growth of car sharing (RelayRides), place sharing (Airbnb) and P2P lending (Prosper and Lending Club), and you begin to see that social will be a medium for far more than just marketing and reputation management. Plus, the smart use of social media data to drive one-to-one, real-time relationship building holds great promise. The increase in digital, mobile and social behaviors and tech will drive significant changes in customer expectations around brand experiences.

Reminder: Sentiment Analysis Symposium, May 7-8 in New York

A reminder –

The Sentiment Analysis Symposium takes place next week, May 7-8 2013, in New York. This is a business-focused conference with strong academic/research, practitioner, and industry participation. The focus is technologies that mine emotions, opinions, and attitudes from social, online, and enterprise sources, for analysis in conjunction with other social, transactional, personal, and business-operations data.

The Wednesday, May 8 symposium will be preceded by two optional, half-day sessions on Tuesday, May 7: A Research & Innovation session and a Practical Sentiment Analysis tutorial, the latter taught by Prof. Ronen Feldman of the Hebrew University in Jerusalem.

We offer a 50% discount to academic & government attendees (registration code GOVACAD) and have a special rate for full-time students, $100 for the symposium and $50 for each of the optional sessions (contact sas (at) altaplana.com).

On the agenda for the May 8 main event –

  • Visionaries Augie Ray (financial services), Gary Kazantsev (Bloomberg), and VS Subrahmanian (Univ of Maryland).
  • Tech/research stars including Carol Haney (Toluna), Stuart Shulman (Vision Critical, ex-UMass), Catherine Havasi (MIT Media Lab and Luminoso), and Philip Resnik (Univ of Maryland).
  • Research-business leaders including Stephen Rappaport (Advertising Research Foundation), David Rabjohns (MotiveQuest), Julie Wittes Schlack (Communispace), Joseph Hughes (Accenture), and Hal Bloom (Sage Software).
  • Business-user presenters including Han Lai (PayPal), Amelia Burke-Garcia (Westat), and Lindsey Sanford (Bernard Hodes Group).

We have some really cool folks attending too, from CapitalOne, Consumer Reports, Dell, Evolve24, the Federal Reserve, JP Morgan, Thomson Reuters, and other organizations. We’re thrilled to have as sponsors Accenture, Bloomberg, Lexalytics, Gnip, Converseon, Dow Jones, and NetBase. And we even have non-sponsoring technology companies attending, including Oracle, SAS, Verint, Angoss, Basis Technology, and Sysomos.

Aside from the technology and implementation elements, application focal areas include public relations, marketing, market research, customer service/support, financial markets, media, and healthcare.

Please do join us. Visit sentimentsymposium.com to register today.

Clarabridge 6.0 and the Intelligent Customer Experience Journey

Clarabridge launched in early 2006 as a text mining platform, providing “smart solutions to complex business problems… backed by reliable analytics.” The company’s promise was to “unlock the full power of your information assets by transforming unstructured content into rich, structured data.”

Within a few years, however, Clarabridge (wisely) shifted its focus to high-value business applications centering on the then-new field of Customer Experience Management. The transformation was timely; interaction-focused Customer Relationship Management systems were not providing the insights that businesses needed (and need) in order to boost customer satisfaction and loyalty. And now, seven years after Clarabridge’s start, while smart solutions and reliable analytics are still too-infrequently encountered, Clarabridge continues to impress, including via its recent Intelligent Customer Experience rebranding.

Clarabridge CEO Sid Banerjee

It’s not my purpose today to describe Clarabridge’s solution set. You can learn for yourself about the recent Clarabridge 6.0 release — it consists of three components: Analyze, Collaborate, and Engage — via the company’s product page or the Clarabridge 6.0 press announcement. Instead, I thought I’d explore the meaning of Intelligent Customer Experience. What better way to do that than via a Q&A with Clarabridge co-founder and CEO Sid Banerjee? So here goes –

Seth Grimes> Let’s start with a definition. What’s Intelligent Customer Experience?

Sid Banerjee> ICE is the merging of INTELLIGENCE with CUSTOMER EXPERIENCE. Customer experience management solutions have traditionally been associated with transactional platforms — such as workforce management systems that provide feedback with agent interactions, or survey platforms that provide feedback on customer/company transactions, or social media monitoring platforms that provide a means for organizing and responding to social conversations.

Seth> So businesses have all this disparate customer-experience raw material. How do they generate intelligence from it, to make progress toward Intelligent Customer Experience?

Sid> ICE is really about integrating all the content, from all the sources, and applying INTELLIGENCE to the multichannel content. Apply text and sentiment analytics, a mix of descriptive, statistical, machine learning, and ontological approaches to organize, quantify, track, and distill insights from many customers, from many sources, into actionable insights and actions. And ICE is about applying these interdisciplinary algorithms and approaches in a way that produces business-friendly output usable by non-technical, customer-centric organizations.

Seth> Functional examples? What’s the business advantage in ICE?

Sid> Examples of ICE include:

  1. Intelligence that can differentiate between incidental shifts in customer feedback and material issues that are likely to drive down satisfaction, loyalty, and profitability.
  2. Intelligence that can catch quickly moving problems and issues when they are small, so they can be addressed before they become big and costly.
  3. Intelligence that can look at all the data, from all customer listening posts across sales, marketing, support, and social channels, and determine who within an organization needs to be notified, proactively and across the enterprise, so that operational decisions can be made to fix an issue or change a program.
  4. Intelligence that can scalably and accurately triage a wide range of customer feedback sources to differentiate between useful and non-useful content, analytically useful content to be monitored, and tactically immediate feedback that requires direct response to a customer.

Seth> And what sort of business problems does ICE respond to?

Sid> ICE lets organizations address important questions such as:

  • What are the material changes in my customers’ experiences adversely affecting efficiency, loyalty, or profitability?
  • Which customers require immediate response, regardless of where they are and how they are expressing themselves?
  • How can I ensure that strategic and operational insights from my customers are delivered to the right people, across an organization, in an actionable format, to allow a timely response and resolution?

Seth> Sid, thanks for participating in this Q&A. So in sum, Intelligent Customer Experience is…

Sid> For a shorter answer, how about this: ICE is the intelligent merging of content from all platforms containing customer interactions and insights, the application of advanced text/sentiment, statistical, machine learning, and alerting algorithms against the multichannel content, and the delivery of business critical insights in the right form factor, to the right people to ensure insight and issue resolution.

Google Really Sucks At Speech Transcription (“uterus publication i saw a t-shirt uh…”)

I was just now checking out a video recording that Lenny Murphy, founding partner at Gen2 Advisors and the GreenBook blog’s head honcho (that is, I don’t know what his title there is), made of a chat he and I had about social media, market research, and sentiment analysis technologies. (GreenBook is a media sponsor of my up-coming Sentiment Analysis Symposium, May 7-8, 2013 in New York.) I was checking out the recording on YouTube, of course bought by Google back in 2006, and…

I tried out the Transcript function. What a laugh. See for yourself:

YouTube video, “Interview with Social Media/Text Analytics Guru Seth Grimes”

And the YouTube/Google-generated transcript:

0:01 good morning everybody this is larry murphy
0:04 with agreement blog in joining me today insisting that the thing that surprised
0:08 gruesome than l suspects analytics and uh… we’re going to talk a little bit
0:13 belt
0:13 several suspects politics as well as
0:16 assess upcoming upcoming event bsn analysis of presumed which i think will
0:21 be intense interest everybody here in iraq works saw setup self
0:25 mata hari
0:27 and i’m glad that i do other famously berkeley had side if you do have this
0:31 conversation with me
0:32 captured in overhead srinagar from our our or fairness issue wouldn’t be able
0:37 to uh… only experienced sat
0:39 such not support the defense in your own little world however faded as better
0:44 uterus publication i saw a t-shirt uh…
0:48 organize rule
0:50 you know allegedly myo mine and i thought i really needed that feature for
0:54 myself but uh… and classic beyond that it’s not that i think you got the big
0:59 get that
1:00 but apartment

That’s one minute’s worth. There’s more, another 20+ minutes of witty repartee between Lenny and me, egregiously mistranscribed. You can produce the transcript for yourself by going to the non-embedded video and clicking the Transcript button.

Don’t get me wrong: I hugely appreciate all the amazing things Google does for us for free. (Yeah, yeah, I know: “If you don’t pay for a service, you’re the product, not the customer.”) You can even hear me praise Google in my chat with Lenny. But I have to say, it’s a relief that there are things Google isn’t great at. I just hope that whoever implemented YouTube speech transcription isn’t working on Google’s self-driving car project.

Big Data Analytics: Facts and Feelings

Old-school BI and analytics are about crunching numbers, and only about numbers. The old school produces indicator values, abstracted away from context, nonetheless applied to justify context-sensitive decisions. Models tend to be opaque, with little explanatory power. Root causes are teased out only via a laborious slice-and-dice search for patterns that correlate, but do not definitively link, measurements and business outcomes. And for all our talk about “closed loop” decision-making, the old school over-relies on dashboards and visualizations that illustrate issues, without explanation or any action recommendations.

While old-school approaches still dominate, contrast their limitations with the possibilities afforded by modern Big Data analytics. With Big Data, we embrace the variety of database-captured facts, streamed machine/sensor/device data, and media-sourced feelings. Big Data velocity is about data still in context, still situationally relevant, analyzed in-flight to enable us to react appropriately to conditions as they unfold. And volume is especially evident when you consider the amount of media (text, speech, images, and video) being produced and consumed online and on-social.

As we unify (even if still too-infrequently) the elements of this complex mix of numbers and media-derived information, we gain a more complete, timely, and sensitive view of our customers, prospects, and markets. We create descriptive and predictive models that encompass behaviors mined from clickstreams and geographic tracking linked to actions. Models (can) extend to psychological and demographic profiles. They weigh transactions and interactions.

In this rich data mix, the ability to discern and exploit feelings – mood, attitudes, opinions, and emotion – has become more critical than ever. It is through automated sentiment analysis that this ability scales. Until recent years, you had to have trained analysts read each comment, review, e-mail message, and article you sought to understand for business purposes. Nowadays, trained software will do the job, handling the variety, velocity, and volume of Big Data sources with the analytical uniformity that only automation can deliver.

Automation here covers both analytical algorithms and human analyses, the latter via crowd-sourcing where the machine parcels out tasks to raters and performs verification tasks to ensure reliability. But most potential users – in market research and marketing, financial services and capital markets, customer service and customer experience, clinical medicine, and the spectrum of research disciplines – do associate the term “sentiment analysis” with machine analyses.
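To make the crowd-sourcing verification step concrete, here is a minimal sketch in Python of one common way a machine might reconcile several raters' labels for a single item: majority vote with an agreement threshold. The function name, the labels, and the 0.6 threshold are all assumptions made for illustration; they are not drawn from any particular crowd-sourcing platform.

```python
from collections import Counter

def aggregate_ratings(ratings, min_agreement=0.6):
    """Combine several raters' labels for one item (hypothetical helper).

    Returns (label, agreement). The label is None when no rating was
    supplied or when the share of raters backing the top label falls
    below min_agreement, which flags the item for another look.
    """
    if not ratings:
        return None, 0.0
    counts = Counter(ratings)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(ratings)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement

# Example: two of three raters agree, which clears the assumed threshold.
print(aggregate_ratings(["positive", "positive", "negative"]))
```

Items that come back with no label would typically be sent out to additional raters or to an expert, which is the reliability check described above.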

Some methods apply machine learning, typically but not always supervised (that is, starting from a training set), sometimes utilizing active learning to incorporate human feedback that can boost model accuracy. Other methods use linguistic artifacts and techniques – lexicons, linguistic rules, taxonomies, word nets, and deep parsing – sometimes generated by analysts, sometimes enhanced via machine learning.
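As a rough illustration of the lexicon-and-rules family of methods, the Python sketch below scores a text by summing word polarities from a tiny hand-built lexicon and applies a single linguistic rule: a negator flips the polarity of the word immediately following it. The lexicon entries, the negator list, and the function name are invented for the example. Production lexicons are far larger and the rules far richer.

```python
import re

# Toy polarity lexicon and negator list, assumed purely for illustration.
LEXICON = {"great": 1, "love": 1, "good": 1, "helpful": 1,
           "bad": -1, "terrible": -1, "slow": -1, "hate": -1}
NEGATORS = {"not", "never", "no"}

def score_sentiment(text):
    """Sum lexicon polarities; a negator flips only the very next word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(tok, 0)
        if polarity:
            score += -polarity if negate else polarity
        negate = False  # the flip applies to one following token only
    return score

def label_sentiment(text):
    """Map the numeric score to a coarse label."""
    score = score_sentiment(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(label_sentiment("The service was not good"))
```

The one-word negation window is a deliberate simplification: a phrase like "not very good" slips past it, which is the kind of gap that deeper parsing or a trained model is meant to close.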

The aim is to transform text and other media into data sources, to extract sentiment along with the entities (names of people, places, companies, etc.) and topics, themes, and concepts that extracted mood, opinions, and emotion apply to. These forms of sentiment explain the why behind the what we derive from raw numbers. They illuminate the root causes behind transactional and behavioral patterns. The goal is to add feelings to facts as fuel for a new generation of Big Data Analytics.

Beyond Crap: Idiocy from Stephen Arnold

The following is a comment I posted to Stephen Arnold’s Beyond Search blog, in response to a posting by his employee Cynthia Murrell titled Big Claims of Analytics Progress. (A footer reads “Written by Stephen E. Arnold.”) Do read Murrell’s manure, if only to understand how not to blog. Thankfully the article is short. Further, it’s so laughable as to sacrifice any credibility. Still, I had a hard time leaving it unanswered. Here’s what I posted:

Your debunking is ill-founded, even idiotic, consisting of precisely and only two nonsensical points:

1) “But. . . isn’t that what they all say?” Your logic is that because (Ventana Research is reporting that) Clarabridge is claiming to do what others claim to do, then Clarabridge must not be doing it.

2) Because Clarabridge, according to your analysis of Ventana’s report, hasn’t delivered “the next real breakthrough,” what they have delivered is unworthy.

In support, you offer nothing but unsubstantiated assertions that offer no support for your attempt to impugn a respected vendor and a respected industry analyst firm, and further you suggest some form of overly-cosy relationship — “these outfits and their cheerleaders” — that does not exist so far as I know.

Stephen Arnold, would you please state whether you support what your employee Cynthia Murrell has posted? If you do, please fill in the huge gaps in order to bring some semblance of substance to this piece. If you do not, please explain why you allow crap like this to be published under your name.

This isn’t the first instance of idiocy from Arnold (not that I read his blog regularly: I don’t). I covered a similar scurrilous attack on a vendor in early 2012, in my Stephen Arnold Blows a Gasket. Unfortunately, I doubt Stephen Arnold is done spewing crap.

The Rise and Stall of Social Media Listening

“Listen First!” is sound advice, a social-media (and enterprise feedback) analogue of “look before you leap.” It’s advice, however, that doesn’t address what comes next. Say you’ve put a listening program in place. How do you advance your use of social/customer insights distilled from Voice of the Customer and other sources? Unfortunately, “companies really don’t know what to do,” according to Stephen Rappaport, who published a book with the “Listen First!” title in 2011, when, Rappaport says, listening was at its peak. Since then?

We have experienced the rise and stall of social-media listening.

From Monitoring to Listening

Listening builds on social-media monitoring and on traditional methods such as surveys. By contrast with surveys, which we carefully design and target, social media conversations and participation are unbounded. So monitoring starts with mentions. To be useful, you must disambiguate and discern the interesting (to you) elements in social chatter. (You get a taste for the disambiguation challenge in the title of a talk, “Smoking… Cigarettes, Weed, Hot Girls & BBQ,” that Stuart Shulman from Vision Critical is slated to deliver at the May 8 Sentiment Analysis Symposium. Whether beef brisket is best smoked with hickory or oak isn’t germane if you’re studying lung cancer.)
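The disambiguation challenge can be sketched with a simple context-word filter in Python. The sense names, the context-term lists, and the function name below are assumptions chosen to mirror the smoking example. A real listening tool would more likely rely on trained classifiers or richer linguistic rules than on a hand-picked word list.

```python
import re

# Hypothetical context terms for two senses of "smoking".
CONTEXT_TERMS = {
    "tobacco": {"cigarette", "cigarettes", "nicotine", "quit", "pack"},
    "barbecue": {"brisket", "bbq", "hickory", "oak", "ribs"},
}

def classify_mention(text, keyword="smoking"):
    """Guess which sense of the keyword a post uses.

    Returns None if the keyword is absent, "unknown" if no context
    term appears, and otherwise the sense with the most context hits.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    if keyword not in tokens:
        return None
    hits = {sense: len(tokens & terms) for sense, terms in CONTEXT_TERMS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "unknown"

print(classify_mention("Smoking brisket over hickory all day"))
```

A lung-cancer study would keep only the mentions classified as tobacco and would route the unknown ones to a human reviewer or a stronger model.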

Monitoring has delivered clear benefits in areas such as customer service (a.k.a. engagement), reputation management, and crisis early warning. Add in analysis that aggregates disparate voices, discovers patterns, and maps trends, and you have the building blocks of a listening solution, typically delivered via a dashboard interface. (Even better: apply text analytics to uncover root causes of identified issues.)

Listening is a research technique, but programs to date, based on the seemingly obvious notion that marketing, product management, and customer-support programs should respond to actual customer and market voices, have delivered limited benefit. We monitor and survey, then we analyze and report. Typical activity, influence, and engagement measures have proven inadequate predictors of business-relevant outcomes; so much of the “social intelligence” available is a poor guide to effective action. Those ubiquitous dashboards don’t help. They describe but don’t guide. We are left with a decision gap.

Listening Next Steps

Listening is a given; support for sensible action the goal. I gleaned five intertwined, research-oriented steps, intended to help you advance your listening efforts, from a series of conversations over the last month:

  1. Get the right data for a complete picture.
  2. Learn the challenges and not just the software.
  3. Understand customer dimensions.
  4. Rethink your analyses.
  5. Create a framework for analysis and action.

I will elaborate.

A bit of (misguided) management wisdom says that “you can’t manage what you don’t measure,” which has an unfortunate corollary. “The assumption is that what’s measured is meaningful,” says Stephen Rappaport, who is knowledge solutions director at the Advertising Research Foundation (ARF). “That’s not always the case. So many measures are just useless. They relate more to the business model [baked into software] than to reality.” Rappaport elaborates, “People are trained in using software. They’re not really trained in listening.”

Attensity CEO Kirsten Bay echoed this concern when she told me that part of her company’s role is to leverage its broad experience to “teach customers how to make decisions.” Bay says that one of her company’s goals is to “create the intersection” of data and action, which Attensity accomplishes via an analytics platform with rich workflow management capabilities.

So you have to select the right measures and design analyses that link data to desired outcomes.

“Measurement must be very specific, by client,” according to Nan Dawkins, founder and CEO of SocialSnap, a social-media analytics start-up. Deep domain knowledge helps.

David Rabjohns, CEO of MotiveQuest, which specializes in strategic social market research, says his company struggled to understand which metrics matter. MotiveQuest identified that “advocacy correlates with sales and share.” That is, it’s not enough to identify someone as an influencer. The message matters. I recently visited MotiveQuest, and Rabjohns and his colleague Kirsten Recknagel ran a number of case studies by me. One telecomm example was quite interesting, where the key was to understand “how 12 [distinct] categories of people talk about great customer service.”

So add people-understanding to the mix. Rabjohns and Recknagel explained that MotiveQuest’s approach seeks to distinguish rational, emotional, and social responses, that “each matters in a different way for different [product and consumer] categories.”

I heard a similar message from Becky Wang, head of analytical strategy at agency Droga5. Wang says, regarding ability to predict, “I can do all the social listening I want, but unless I have a psychographic profile or demographic information that goes beyond gender and age, I’m really limited.” But even with the right data, Wang asks, “How do I actually tie social and digital metrics to a purchase? How do I know that I’ve moved the needle?”

Again citing Stephen Rappaport: “Emotion is important.” Rappaport described to me a study conducted by social media agency Converseon for the ARF, on the role of digital and mobile and the emotional journey that people go through when they’re shopping. According to Rappaport, “rises and falls in emotion are opportunities for brands to intervene.” In particular, at the final point pre-purchase, “in certain [brand] categories emotions are very polar. In other cases, there’s more of a range.” Individual brands should leverage their “emotional profiles,” Rappaport concludes.

(Disclosure: Converseon is a Sentiment Analysis Symposium sponsor, and I recruited Stephen Rappaport to moderate a symposium panel that will include MotiveQuest CEO David Rabjohns.)

The Listening Journey

Sentiment analysis helps you quantify the emotional journey, via analytical approaches that include automated natural language processing (NLP), crowd-sourcing, and expert evaluation. The aim is to discern and aggregate attitudes, mood, and emotion in the array of available information sources, not in isolation but instead linked to other, appropriate measures, to psychographic profiles, to behaviors and social contexts and, ultimately, to outcomes.

This ensemble points to a new approach to listening. Treat listening as a process, carried out within a business-domain-appropriate framework of action-aligned measures, data linkages, and analyses, designed to guide you from data to outcome. Customer-brand interactions involve a journey, and your listening program must as well.

All About Natural Language Processing

Natural Language Processing is the machine handling of written and spoken human communications. Methods draw on linguistics and statistics, coupled with machine learning, to model language in the service of automation.

What Good Is NLP for Business?

There are myriad applications. Every business process (or personal need) that involves speech or text — with volume, velocity, or complexity sufficient to push you to seek automated assistance — can benefit from Natural Language Processing. Let’s review, systematically, what NLP can do for you. Here are 22 facets, with examples that illustrate both implementations and R&D initiatives. Let’s start with computing’s second-oldest application, search, and then explore NLP uses from everyday to analytical to unusual.

Information Extraction and Search

If all the world’s information were neatly binned in database fields, we wouldn’t need search. Information retrieval would be nothing more than queries. But instead, notionally, 80 percent of business-relevant information originates in unstructured form, primarily text. The vast majority of that text is “natural language” (as opposed to formal language, found for instance in a computer program or algebraic equation). Google and Bing and other search systems use NLP to extract terms from text (#1) to populate their indexes and to parse search queries (#2). Those terms may include “named entities” such as people, companies, brands, ticker symbols, and places. Other features of interest may include dates, addresses, URLs, and the like; NLP will automate extraction of pattern-identified information (#3) and extraction of attributes associated with terms (#4) whether factual or subjective: expensive watch, black car, 4.6 kg fish.
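To make pattern-identified extraction (#3) concrete, here is a minimal Python sketch. The three patterns and the names PATTERNS and extract_patterns are my own illustrative assumptions, not any search engine's actual rules; production systems handle far more formats.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration).
PATTERNS = {
    "url": re.compile(r"https?://[^\s,;)]+"),
    "date": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December) \d{1,2}, \d{4}\b"
    ),
    "weight": re.compile(r"\b\d+(?:\.\d+)? kg\b"),
}


def extract_patterns(text):
    """Return a dict mapping each pattern name to the list of matches found."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}


sample = (
    "The symposium is on May 8, 2013 in New York; see "
    "http://www.whitehouse.gov/administration/president-obama for the "
    "text, and note the 4.6 kg fish."
)
found = extract_patterns(sample)
```

Each entry in found is a list of the matching substrings, ready to load into an index or a database field.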

The more advanced engines apply NLP to identify relationships (#5) (“this is a that”) in order to build their knowledge graphs. NLP feeds the computational knowledge engines behind Apple Siri, Wolfram Alpha, and Google Now as well as resources for your own lexical analyses such as Lexalytics’ Concept Matrix, built via NLP application to the Wikipedia dataset to identify “concept topics” and “facets” as well as associated sentiment. According to Lexalytics CEO Jeff Catlin, “these features allow users to easily build classifiers for very broad topics as well as roll-up opinions into buckets of similarity.” Pingar’s Taxonomy Generator is another take on the same idea: Use NLP methods to build a knowledge structure for later application to search, classification, and other business information-management needs.

Concepts, Topics, Sentiment, and Similarity, Plus Notes on Methods

“Buckets of similarity”: Those would be categories determined by an analyst or via statistical clustering. Classification is the act of placing cases into categories based on attributes or into clusters based on best fit. Classification (#6) is part of the NLP task, whether it involves grouping terms or documents. One variety of term grouping involves creating conceptual classes, for instance “vehicle manufacturers” from Fiat, Ford, General Motors, Nissan, Toyota, etc. Another variety involves coreference, that is, multiple ways of referring to a given thing. To illustrate, “Barack H. Obama is the 44th President of the United States. His story is the American story… President Obama was born in Hawaii” refers to a given person in four ways (“Barack H. Obama,” “the 44th President of the United States,” “His,” and “President Obama”), one of them via a pronoun (“his”) that refers to that person only in context.

Want to see real-world entity extraction and coreference? Try Language Computer Corporation’s Cicero system demo. Process the Web page where I found the above lines, http://www.whitehouse.gov/administration/president-obama. Click on one of the “he” or “his” occurrences in the marked-up text and you’ll see that these pronouns have been correctly resolved to “President Obama.”

I suppose I’ll grant numbers to concept extraction (#7) and to topic extraction (#8), that is, to information extraction (per the previous subsection) that involves abstraction. Sentiment is also abstract, although sentiment analysis (#9) can be characterized (in a very simplistic way) as simply another classification problem, whether involving the usual positive/negative/neutral categories, more nuanced emotion categories (e.g., angry, happy, sad), or intent signals (e.g., to buy, sell, renew, cancel). Visit the Web site of text-analytics mavens Daedalus for an online sentiment classification demo. The Nerily online demo will extract a variety of other text features.
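To show what “simply another classification problem” can look like in its most simplistic form, here is a toy Python sketch that counts words from two tiny lexicons. The word lists and the function name classify_sentiment are hypothetical, chosen only for illustration; the commercial tools mentioned here do far more (negation, context, emotion categories, intent).

```python
import re

# Hypothetical mini-lexicons; real sentiment lexicons are far larger.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "angry"}


def classify_sentiment(text):
    """Label text positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


labels = [
    classify_sentiment("I love this great phone"),
    classify_sentiment("Terrible service, I hate it"),
    classify_sentiment("The phone arrived on Tuesday"),
]
```

Swapping the two word sets for emotion or intent vocabularies turns the same counting scheme into a crude emotion or intent classifier, which is why the classification framing is convenient, if simplistic.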

Sentiment analysis and opinion mining are central topics for me. I’ve written a lot about them, and I organize a twice-yearly conference, the Sentiment Analysis Symposium, next up May 8, 2013 in New York, preceded on May 7 by an optional, half-day Research & Innovation session and an optional, half-day Practical Sentiment Analysis tutorial. A disclosure: Lexalytics, cited above and later in this article, is a sponsor.

All of this information extraction is what makes NLP a key asset for text analytics, which models and structures the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation. (That’s a definition I wrote back in 2007, in a TechWeb article, that made its way to Wikipedia.)

I’ll digress to explain that you can automate human handling of many Natural Language Processing tasks via crowdsourcing, using CrowdFlower for “human-powered sentiment analysis” and other systems built on platforms such as Amazon Mechanical Turk. You can also extract sentiment and other information by analyzing non-textual sources that range from transaction records to images and speech.

We’ll get back to speech a bit later. For now, I’ll cite one last function related to classification and similarity, and then let’s change tacks. That last for-now function is plagiarism detection (#10), essentially passage-similarity evaluation across retrieved text, as explained on the PAN-13 conference site, with a bit of data and source code to get the Python programmers among you started. (PAN = Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection. I guess PAAINDD is kind of awkward as an acronym.)
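For those Python programmers, here is a minimal sketch of passage-similarity evaluation. It uses word-trigram “shingles” compared via Jaccard overlap, a common technique that I am assuming for illustration; it is not the PAN-13 code, and the function names are my own.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(a, b, n=3):
    """Shared shingles divided by all shingles: 0.0 is disjoint, 1.0 identical."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog"
suspect = "the quick brown fox leaps over the lazy dog"
similarity = jaccard_similarity(original, suspect)
```

One changed word out of nine leaves 4 of 10 distinct trigrams shared, so similarity comes out at 0.4. A detector would flag passage pairs scoring above some threshold chosen for the collection at hand.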

Spelling, Grammar, and Style

Want to right gud? Lucky for you: NLP is built into your favorite word-processing software. Spell check (#11) is NLP at its most basic. Spell check will flag a word that’s not in the dictionary and maybe suggest corrections. If you have ever written a document with Microsoft Word (or OpenOffice, Google Docs or any of countless other authoring environments), you’ve seen a spelling checker. But spell check won’t identify the two errors in “I went their at tree o’clock.” Try that sentence in JSpell or at SpellCheck.net as proof. For syntactic errors, you need a grammar checker. And how does a machine check grammar?
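Before getting to grammar, the dictionary-lookup idea behind spell check can be sketched in a few lines of Python. The tiny DICTIONARY set and the spell_check function are hypothetical stand-ins for a real word list and checker; suggestions come from the standard library's difflib.get_close_matches.

```python
import difflib
import re

# Hypothetical toy dictionary; a real checker loads a full word list.
DICTIONARY = {
    "i", "went", "there", "their", "at", "three", "tree", "o'clock",
    "to", "write", "good",
}


def spell_check(text, dictionary=DICTIONARY):
    """Map each word missing from the dictionary to a list of suggestions."""
    flagged = {}
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in dictionary:
            flagged[word] = difflib.get_close_matches(word, sorted(dictionary), n=3)
    return flagged


real_word_errors = spell_check("I went their at tree o'clock.")
misspelling = spell_check("I wentt there")
```

Because “their” and “tree” are valid dictionary words, real_word_errors comes back empty, which is exactly why the sentence slips past a spelling checker, while the misspelling “wentt” is flagged with “went” among the suggestions.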

A linguistic approach to grammar checking might involve resolving parts of speech, via sentence diagramming (#12), illustrated here, via part-of-speech tagging (#13), as seen in a Univ. of Illinois demo system, or via study of syntactic relations (#14), à la this Connexor demo. (I’m a bit behind myself, actually. Syntactic parsing is one method of discerning relationships among entities, my #5, above.)

What do some of the tools out there think of my writing? I pasted the three-sentence paragraph above into one. It found “3 critical writing issues” — two alleged spelling errors and an accusation of wordiness — and said my writing is “weak, needs revision.” (Free access doesn’t provide detail so I’ll withhold the tool’s name.) Try some others: LanguageTool open source proofreading software (which I didn’t find particularly useful, but you might) and Stilus from my friends at Daedalus.

Two more varieties of stylistic analysis to mention: Lymbix analyzes e-mail sentiment via the ToneCheck tool, and automated social-comment moderation is another interesting application, although I’ve been unable to identify an independent provider comparable to Adaptive Semantics, which the Huffington Post bought back in 2010.

Summarization and Translation

Text summarization (#15) is the first of several NLP functions I’ll cite that involve both natural-language understanding (covered in #1 through #14) and natural-language generation. A summarizer has to understand the source text sufficiently to generate a shortened version that is faithful to the content and purpose of the original. Abstracting is a related function.

Visionary researcher Hans Peter Luhn described an approach to automatic text abstracting in his April 1958 IBM Journal paper, The Automatic Creation of Literature Abstracts: “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the ‘auto-abstract’.”
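Here is a minimal Python sketch in the spirit of that description: score words by frequency, score sentences by the average frequency of their words, and extract the top scorers in original order. This is a simplification I am assuming for illustration, not Luhn's actual procedure (which scores clusters of significant words within each sentence); the stopword list and the name summarize are hypothetical.

```python
import re
from collections import Counter

# Hypothetical short stopword list; these words carry no significance score.
STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "it", "that",
    "for", "on", "as", "with", "are", "was", "be",
}


def content_words(sentence):
    """Lowercased words of a sentence, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


document = (
    "Text analytics extracts information from text. "
    "The weather was pleasant yesterday. "
    "Analysts use text analytics tools."
)
auto_abstract = summarize(document, max_sentences=2)
```

With this input, the off-topic weather sentence scores lowest and is left out of the two-sentence auto-abstract.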

Developer Andreas Gohr, at his SplitBrains.org site, provides a Web interface to Nadav Rotem‘s open source Open Text Summarizer code. Try it!

Machine translation (#16) is a wonderful NLP application. It doesn’t require explanation; I’ll just point you to Google Translate, where you can try it yourself. Note the automatic language identification (#17) feature.

Translation involves more than just rendering words from one language to another. Each language has its own syntax and idioms. A translator, whether a human or a machine, needs to make sense of the text provided and to make sense in the destination language. That is, like summarization, machine translation involves natural-language generation. So does the next example.

Question Answering

IBM Watson is the most prominent example of question answering (#17) at work: Information retrieval that produces usable guidance (situationally-relevant facts, in a form that reflects question context) to respond to a query. When Watson played Jeopardy, it formulated responses as questions, very different from how it will respond to medical-diagnostic challenges. I’ll point you to an academic illustration, START from Boris Katz and associates at MIT, and also refer you to the EasyAsk and Inbenta Web sites for explanations of how Q-A can work in general business contexts.

Speech Recognition

Let’s recognize that speech is natural language too, and cite speech recognition (#18) and speech generation or synthesis (#19) as two more NLP functions.

Speech is more than just spoken text. It conveys genre, sentiment, mood, and emotion, detectable from word and sentence inflection (an interrogatory sentence — a question — is inflected up at the end) and from changes in speech volume and rapidity and other indicators. Check out a recent IEEE Spectrum podcast, Teaching Computers to Hear Emotions, an interview with University of Rochester Professor Wendi Heinzelman, and you’ll hear what I mean.

You don’t have to render the spoken word as text in order to make analytical use of it, although speech transcription (#20) certainly counts as an NLP function. Plenty of academic and industrial work has been done on phonological analysis, which examines sounds and sound patterns, and there are industrial systems that perform voice search (#21) on phonemes and patterns. If you’d like to see phonetic transcription in action, check out Daedalus’s online demo.

On the flip side, text-to-speech (#22) — having the machine read to you with properly accented pronunciation, inflection, pacing, etc. — is another bit of NLP. Ivona, recently acquired by Amazon — the software is already used on the Kindle Fire — has a cool online demo that will read for you in a wide variety of languages and accents.

Building Blocks

Finally, a note on tools, on the bits and pieces of code you can apply to cobble together your own solution, and on learning more. A disclaimer, however: I didn’t intend, in this article, to systematically catalog available software and services, open source or other.

Cognitive linguist Christopher Phipps observes, in his Lousy Linguist blog, “luckily, the NLP field has matured into an open access friendly crowd, so there are lots of resources freely available.” Phipps focuses on text understanding, and for that, there’s no better catalog than Stanford University NLP’s page, “Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources,” although as Phipps cautions, it’s not for newbies. I list a number of open source tools in a blog article from last year, What are the most powerful open-source sentiment-analysis tools? Two I didn’t include there, because they’re not optimized for sentiment (that article’s topic), are Apache OpenNLP and the Mallet machine-learning toolkit.

You also have at your disposal a plethora of service offerings that implement NLP, invokable via online APIs, most with free access for either trials or limited use. Off the top of my head, there are: AlchemyAPI, Apicultur, Bitext, Clarabridge, ConveyAPI, OpenAmplify, Pingar, Saplo, Semantria (backed by the Lexalytics Salience engine), and Viralheat. Mashape lists many more. Capabilities, quality, and cost vary widely. Some do only entity or sentiment tagging while others do more-elemental text analysis. The Apicultur service and Jacob Perkins’ Web API for Python NLTK, at text-processing.com, are examples of the latter. I’ll withhold detail and judgments but maybe write them out in an article at some point, and I’m also not going to write now about install-yourself software options, which aren’t as easy to simply try as a Web API.

As for learning more, other than via do-it-yourself, what better way than through an online course? Coursera has one going, taught by Michael Collins of Columbia University, and the videos and lecture materials from Christopher Manning’s popular Stanford University course are available online. A third option is the Statistics.com course taught by Dr. Nitin Indurkhya, planned for a July 19 start.

Now That You Know

In business contexts, you’re most often going to apply NLP in conjunction with collection, integration, and analysis of disparate forms of online, social, and enterprise data. All that text- and speech-extractable goodness I’ve been discussing: In today’s world of heterogeneous big data, it doesn’t stand on its own. This statement is true for business analytics (you get lift by applying and integrating an appropriate variety of methods and data), and it’s also true for activities that seemingly don’t involve non-textual or non-speech sources, for activities such as Web search. Even in those latter cases, smart, sense-making engines take into account your profile, location, past online/on-social activities, and social connections, in conjunction with NLP, to provide the best situationally-relevant results.

Natural-language processing can do many things for you. It’s an essential tool for leading-edge analytics. Understanding is just the start.

Seth Grimes is an analytics strategy consultant with Alta Plana and organizes the Sentiment Analysis Symposium, May 8, 2013 in New York. Follow him on Twitter at @sethgrimes.

Text Analytics in 2013

In last year’s Text Analytics in 2012, I invited you to check with me again in a year to see how 2012 panned out. Thirteen months later, I can tell you now that 2012 was a good year for text analytics adopters and solution providers alike.

But I’m not going to cover business users in this article — I’ll save them for a planned 2013 iteration of the market study, Text/Content Analytics: User Perspectives on Solutions and Providers, that I conducted in 2011 and 2009. Instead, I’ll cover the solution side — technical and financial — and also the vibrant conference scene and a few ways you can learn more about today’s text-analytics world.

Text Technology Developments

I asked a number of solution-provider thought leaders, from Bitext, Clarabridge, Luminoso, Pingar, SAP, and SAS (small and large companies, all but start-up Luminoso with an international presence), about significant 2012 text-analytics developments and what we should look for in 2013. Let’s start big…

One SAP 2012 accomplishment was “pushing down Natural Language Processing (NLP) as map-reduced batch jobs into Hadoop clusters for economical parallel processing and native text analysis within an in-memory database platform,” Anthony Waite, SAP senior text analysis product manager, told me. On the NLP front, Anthony mentioned specifically named entity recognition (names of persons, companies, places, etc.) and fact extraction, the identification of relationships among entities. A 2013 aim is even faster NLP and categorization and a deepening of entity extraction across a broader set of languages, to greatly improve performance for big unstructured text data in real time. SAP will “leverage text analysis semantic markup to add a new dimension to text mining capabilities within HANA,” SAP’s in-memory data management and analysis system, Anthony said.

According to Kathy Lange, a leader in SAS’s business-analytics practice, “in 2012, SAS put a great deal of effort into analysis of ‘big data’” via in-memory High Performance Text Mining, and also released an active-learning capability (machine learning complemented by human domain expertise) and new native-language support, for Farsi, Hindi, and Ukrainian. And for 2013, “SAS intends to increase emphasis on the use of text data to improve predictive analysis.” SAS will continue to promote visual analytics and extension of text analytics for business needs in fraud and warranty analysis, healthcare, customer insight, and other applications.

Luminoso is at the other end of the size spectrum, a start-up founded by Catherine Havasi, a research scientist in artificial intelligence and computational linguistics at the MIT Media Lab. Catherine says the big 2012 development was the search for depth. “Brands are starting to move beyond simply wondering whether their social media traffic is positive or negative and into figuring out what their customers are expressing across all kinds of channels and how they can learn from it.” In 2013, look for more languages. Luminoso’s take agrees with SAP’s and SAS’s, and Catherine adds, “we might start to see genuine solutions for analyzing text in multiple languages that the relevant analysts don’t necessarily speak. We all know translation is a poor answer and better ones are on the way.” (Catherine will be speaking on Multi and Cross-lingual, Concept-based Sentiment Analysis at the May 8 Sentiment Analysis Symposium in New York.) And Luminoso’s big goal for 2013? “We’re expecting to make the analytics process easier, clearer, faster, and bigger, with everything from data integrations and new visualizations to automated insight generation.”

Two other respondents, Alyona Medelyan from Pingar and Antonio Valderrábanos, CEO of Spain-based Bitext, cited more languages as a 2013 direction, making five out of six. I’m glad that the market has started to recognize the limitations of polarity-based sentiment analysis, that is, imposing positive/negative/neutral pigeonholes, so I’m pleased that Antonio offers as a 2013 prediction, “Emotion analysis becomes more prevalent; the backlash against sentiment slows down as better quality solutions become more mainstream; and niche applications of text analytics e.g. ‘buying intent detection,’ become more standardized.” These latter comments echo Catherine Havasi’s.

Anthony Waite of SAP wasn’t my only respondent who mentioned “real time.” Sid Banerjee, CEO of customer experience + text analytics leader Clarabridge, says that “streaming analytics, big data, and real time capabilities were among the most significant text analytics technology developments” in 2012, that “text analytics developed into holistic and intelligent enterprise-wide solutions that allow businesses to operationalize and integrate customer feedback insights into business processes.” Clarabridge Collaborate and Clarabridge Engage were significant contributors according to Sid, allowing for “direct collaboration and communication in real time between internal business stakeholders, as well as directly between companies and their customers,” complemented by analysis capabilities that allow users to “compare products, competitors, brands, regions, stores, timeframes, or any other segmentation of data” in support of more informed business decisions.

Alyona Medelyan, Pingar’s chief research officer, cited performance improvements and new capabilities as on-going Pingar focus points along with “packaging this technology in a way that any lay person can use it and has easy-to-understand tools for customizing text analytics methods to suit their needs.” Alyona also said that she found it significant that vendors (including Pingar) are working on “generating custom taxonomies from documents,” to improve navigation of document sets, faceted search, metadata extraction, and applications such as sentiment analysis. If you’d like an explanation of how all this would work, you might check out the slides from a presentation by Alyona’s Pingar colleague Anna Divoli, How Taxonomies and Facets Bring End Users Closer to Big Data.

I’ll relay one last point, raised by Bitext CEO Antonio Valderrábanos. Antonio cited as one of 2012’s most significant text-analytics developments, “Emergence of marketplaces for connecting text analytics to business data e.g. QlikMarket launch, expansion of Salesforce Insights ecosystem. In other words, movement towards different providers for text analytics, as opposed to one single partner.” I couldn’t agree more, about the existence and significance of this trend.

Market Results

Solution-provider results were mixed in 2012.

Janya, which focused on government markets, went out of business, as did Evri, a semantics-powered news portal. And of course HP’s 2011 Autonomy acquisition turned sour, with a welter of 2012 accusations of improper accounting and a should-have-known realization that the semanticized IDOL platform isn’t the be-all-and-end-all it was imagined to be.

On the plus side, a number of market leaders announced excellent 2012 results. Attensity increased annual revenue by over 30 percent compared to 2011 (and also appointed a new, sales-focused CEO, Kirsten Bay). Clarabridge announced 60 percent sales growth in 2011. Lexalytics CEO Jeff Catlin says 2012 was a bit slow in revenue growth but ended incredibly strong, about 20% overall (nothing to sneeze at), and “2013 looks amazing, guessing about 50% given the numbers we’re seeing already.” And according to SAS Media Relations rep Steve Polilli, his company saw 10% revenue growth in the search and discovery software category, which covers text analytics, sentiment analysis, categorization, and ontology, outpacing SAS’s 5.4% 2012 overall revenue growth. Not bad growth for a company with $2.87 billion in revenue (in 2012).

2012 acquisitions were moderate in scale. Survey-research platform vendor Vision Critical bought Texifter’s DiscoverText text-analytics technology, and Eptica, “a global provider of multichannel customer interaction software” (a.k.a. CRM), bought French text-analytics provider Lingway. Contrast with customer-experience leader Medallia, which has made a very significant investment in enriching its own text-analytics technology.

These developments affirm something Clarabridge CEO Sid Banerjee said. According to Sid, in 2012 “the customer experience and voice of the customer market became truly obsessed with analytics. Traditional vendors, such as CRM, social, or workflow vendors, all realized the need to integrate some form of text analytics into their solutions.”

One more acquisition: Lexmark acquired ISYS Search; they’ve renamed the offering Perceptive Search. Except for a few social-analytics acquisitions, those are all that come to mind.

We have also seen continual emergence of new solution providers. In the last year or two, I’ve become aware of Converseon (ConveyAPI), Decooda, Etuma, Fido Labs, Gavagai, Kanjoya, Luminoso, MeshLabs, Metavana, Polecat, They Say, Thrive Metrics, and Content Savvy, which inherited Janya’s intellectual property. Of course, I don’t mean to slight the many established text-analytics companies that continue to thrive; I just don’t have the energy to list them all.

Conferences, Reports, and Community

Let’s finish with opportunities to learn more, with conferences and reports.

I’ll be involved with the Text Analytics Summit once again. I’ve chaired every Boston summit since the series’ founding in 2005; this year’s is slated for June 5-6. Folks with more of a stats bent should check out a rival conference, Text Analytics World, to be held April 17-19 in association with the Predictive Analytics World conference.

If your interest is more application focused, check out a conference I organize, the Sentiment Analysis Symposium, May 8 in New York, preceded by a May 7 Research & Innovation session and a Practical Sentiment Analysis tutorial, this time taught by text-analytics pioneer Prof. Ronen Feldman. For folks with a research bent, there’s the 7th go-around of the International Conference on Weblogs and Social Media (ICWSM), July 8-10 in Boston, and if you’re in or can swing a trip to Europe, check out LT-Innovate, covering the broader set of language technologies, planned for June 26-27 in Brussels.

I really like (certain) vendor conferences. I’ll likely once again attend Clarabridge Customer Connections (C3), April 17-19 in San Diego, and the Lexalytics user-group conference on May 9 in New York, the day after the Sentiment Analysis Symposium, although I don’t have enough of a business case to justify a trip to the 2013 SAS Global Forum, April 28-May 1 in San Francisco.

And reports? The big news is that Gartner seems finally to be devoting attention to text analytics. It used to be that the only folks covering the field, other than myself, were Sue Feldman and colleagues at IDC, Leslie Owens at Forrester, and Fern Halper at Hurwitz and Associates. Check out Gartner’s Who’s Who in Text Analytics… although be careful using it. It applies a cookie-cutter approach to covering a widely disparate set of products, and a number of the capability appraisals are incorrect. For instance, Gartner’s write-up is simply wrong about coverage of “multiple languages” by AlchemyAPI, Expert System, Lexalytics, and Pingar. Gartner includes Salesforce.com as a text analytics company even though Salesforce and Verint, compelling as they are, do not have their own text analytics technology, and certain other inclusions smack of pay-for-play.

Judging from samples, I’m guessing that you will find Hurwitz and Associates’ Victory Index for Text Analytics more useful. I didn’t buy a copy, but I did read a couple of reviews that are freely available, of products from Provalis Research and SAS.

And I’m seeing an increasing volume of interesting stuff posted to the social Web. Check out Chris Phipps’s blog, The Lousy Linguist. SAS folks post good stuff at their company’s Text Frontier blog, and of course I’ll continue to cover text analytics and sentiment analysis in my Breakthrough Analysis blog. Also check out the Text Analytics group on LinkedIn, now well over 10,000 in membership.

In sum, 2012 was a good year for text analytics, and a couple of months in, 2013 is shaping up nicely as well.