I included a set of tool/solution selection criteria in a recent conference presentation, Text and Sentiment Analysis for Research and Insights. An attendee asked whether I was publishing my recommendations. Definitely. That’s this article: a list of 12 criteria for choosing a text/social analytics provider, with an added explanation of each point.
The conference, Analytics with Purpose, was organized by the American Marketing Association, but the criteria apply across a wide set of text and social media analytics applications, not just in marketing, market research, and consumer insights, but also in customer experience, life sciences, media and publishing, finance, and so on. You have dozens of options, open source and commercial, ranging from Web service APIs and code libraries to business solutions that integrate analysis into a business workflow. Make a careful, informed choice.
Some preliminary advice: Work back from your business goals. Determine what sorts of indicators, insights, and guidance you’ll need. No business is going to need 98.7% sentiment analysis accuracy in 48 languages across a dozen different business domains. Be reasonable; stay away from over-detailed requirements checklists that rate options based on capabilities you’ll never use. Create search criteria that separate the essentials from the nice-to-haves and leave off the don’t-needs. Then design an evaluation that suits your situation – include proof-of-concept prototyping, if possible – to confirm whether each short-list option can transform data relevant to your business into the outputs you need, with the performance characteristics and at a cost you expect.
That advice out of the way, here are –
12 Criteria for Choosing a Text/Social Analytics Provider
- Industry and/or business function adaptation.
We seek solutions built around the “frame semantics” notion that words may have different senses in different domains and from different points of view. “Thin” is a favorite example: Thin is good for a mobile phone, while in a hotel, thin walls mean a noisy room and thin, describing sheets, is associated with worn rather than warm. “Responsive” means very different things in e-discovery and in customer service.
- Customizability, whether by you or only by the provider, to ensure that your analyses are true to life.
Domain adaptation may not be enough if your company, customers, and prospects use distinctive language around products and features. Coke Life and Pepsi True are products. If a tool you’re evaluating hasn’t been adapted for soft drinks, it may miss or misclassify social mentions of “life” and “true,” which are, after all, common words. Can you build or modify the rules, taxonomies, training sets, and other artifacts that drive marketing analyses in that category, to capture the way people talk about these brands? If not, you’ll be missing insights.
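To make the brand-term problem concrete, here is a minimal sketch of a custom matching rule that catches “Coke Life” as a product mention while leaving the bare word “life” alone. The lexicon and function names are illustrative only, not drawn from any particular product:

```python
import re

# Hypothetical mini-lexicon: brand phrases whose final words ("Life",
# "True") are common English words, easy to misclassify in isolation.
BRAND_PATTERNS = {
    "Coke Life": re.compile(r"\bcoke\s+life\b", re.IGNORECASE),
    "Pepsi True": re.compile(r"\bpepsi\s+true\b", re.IGNORECASE),
}

def tag_brands(text):
    """Return the brand names mentioned in a social post."""
    return [brand for brand, pattern in BRAND_PATTERNS.items()
            if pattern.search(text)]

print(tag_brands("Just tried coke life and loved it"))  # ['Coke Life']
print(tag_brands("life is good"))                       # []
```

Real tools express such customizations as taxonomy entries, rules, or training examples rather than raw regular expressions, but the question to ask is the same: can you add entries like these yourself?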
- Data source suitability, e.g., Twitter vs reviews vs chat vs reports.
The best algorithms for extracting information from 140-character tweets, from long-form Yelp or TripAdvisor reviews, from FlyerTalk discussion threads, and from e-mail or chat exchanges will differ. Does the tool you’re considering handle the data sources that matter to you well?
- Languages supported.
Some tools handle only a single language, typically English. Others claim to handle dozens, but you may find that some languages are handled much more carefully than others. Your provider may translate material from less-common languages into English for processing, in which case idiom and culture-related nuance may be lost. And even when non-English material is handled natively, it may be handled with less refinement. Ensure that the languages you need are supported, adequately for your needs.
- Analysis functions provided.
I’m thinking here both about information extraction – for instance, lots of software will resolve entities or topics but not necessarily both, and will score sentiment, but not necessarily at the topic, entity, or attribute level – and about analytical functions such as clustering, regression (for trending), and link or path analysis.
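The document-level versus attribute-level distinction is worth seeing in miniature. This toy scorer (the lexicons and the two-word window are my own simplifications, not how production systems work) shows how a mixed review can score neutral overall while the attributes diverge:

```python
# Illustrative sentiment lexicons -- far smaller than any real one.
POSITIVE = {"great", "friendly", "spacious"}
NEGATIVE = {"thin", "noisy", "slow"}

def doc_sentiment(text):
    """Single score for the whole document: positives minus negatives."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def attribute_sentiment(text, attributes):
    """Score each attribute separately, crediting sentiment words that
    fall within one word of an attribute mention."""
    words = text.lower().split()
    scores = {a: 0 for a in attributes}
    for i, w in enumerate(words):
        window = words[max(0, i - 1):i + 2]
        for a in attributes:
            if a in window:
                scores[a] += (w in POSITIVE) - (w in NEGATIVE)
    return scores

review = "great staff but thin walls"
print(doc_sentiment(review))                            # 0 (looks neutral)
print(attribute_sentiment(review, ["staff", "walls"]))  # staff +1, walls -1
```

A tool that reports only the document-level score would call this hotel review neutral and hide both the praise and the complaint.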
- Interfaces, outputs & usability.
A nice graphical interface is… nice, but does the candidate tool’s GUI match your work practices? If you need to automate a frequently repeated process, are there scripting possibilities? Or if your developers will be plugging an external Web service into your own software via an application programming interface (API), is there a software development kit (SDK) that fits your coding tools (e.g., Python, Java, C++, Ruby) or are you forced into a generic RESTful interface?
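When there is no SDK for your language, the fallback is a thin wrapper of your own over the provider’s REST interface. A hedged sketch, using only the Python standard library; the endpoint URL, request schema, and bearer-token auth are hypothetical stand-ins, not any real service’s API:

```python
import json
from urllib import request

# Hypothetical endpoint -- substitute your provider's documented URL.
API_URL = "https://api.example.com/v1/sentiment"

def build_payload(texts, language="en"):
    """Build the JSON body for a batch sentiment request."""
    return json.dumps({
        "language": language,
        "documents": [{"id": i, "text": t} for i, t in enumerate(texts)],
    })

def score(texts, api_key):
    """POST the batch to the (hypothetical) service, return parsed JSON."""
    req = request.Request(
        API_URL,
        data=build_payload(texts).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

An SDK would hand you typed results and retry logic for free; with a generic RESTful interface, that plumbing is yours to write and maintain, which is the trade-off this criterion asks you to weigh.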
- Accuracy: precision, recall, relevance & results granularity.
Accuracy is a chimera. (See my July 2012 article, Never Trust Sentiment Accuracy Claims.) There are no measurement standards, but you need only good-enough accuracy anyway, not some unobtainable absolute. And because we’re dealing with human-language data, if you can read the inputs (or a colleague can, if they’re in a language you don’t read), you can assess a candidate tool’s accuracy for yourself. Do it.
- Performance: speed, throughput & reliability.
No explanation needed here, I think.
- Provider track record, market position & financial condition.
Best if you can find a provider with experience solving your types of problems, working with the sorts of data that matter to you. That’s obvious. I’ll bring up two more specific points: 1) While there is no dominant text or sentiment analysis provider, there are well-established players, and some of them are struggling. One factor is that their platforms are not aging well – rule-based NLP is particularly expensive to maintain – and another is that expansion plans, fueled by venture funding, have proven over-ambitious. One leading customer experience text analytics provider with both issues has burned its partner network and lacks capacity to implement newly sold projects. That company’s former chief rival went through the same experience just a few years back. 2) There’s a constant stream of emerging tech providers, particularly given machine learning’s power and promise, and at the other end of the range, several tech giants – IBM, SAS, HP – in the space. Expect consolidation.
- Provider’s alliances and tool & data integrations.
This criterion is a bit tricky, and in most cases, I’d prioritize it lowest of the twelve listed. The reasoning here is that many of your projects will apply data from more than one, if not several, sources. We talk about omni-channel marketing and about the customer journey, which involves multiple touchpoints, each of which may generate data. But a given organization may use different research and software providers for surveys, social listening, device-recorded data, customer relationship management (CRM), sales, and loyalty programs, for instance. Many solution providers recognize that their clients have multiple vendors and form alliances with their sometime rivals, in a spirit of co-opetition, to better serve shared clients. And even if there’s no alliance, they can take steps to facilitate integrations, whether via specialized connectors or data import/export using common interchange formats.
- Cost: price, licensing terms & TCO.
Text and sentiment analysis provider pricing models vary widely. Shop around. But also, going back to criterion #2, consider total cost of ownership. You may find that to get the results accuracy you require, you will need to customize or extend the baseline lexicons, taxonomies, rule sets, and search expressions provided by a candidate supplier. That means professional services costs, or training and staffing expenses if you do the work in-house.
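A back-of-the-envelope comparison shows why sticker price can mislead. All figures below are hypothetical, invented purely to illustrate the arithmetic:

```python
def three_year_tco(annual_license, setup_services, annual_tuning):
    """One-time setup plus three years of license and tuning costs."""
    return setup_services + 3 * (annual_license + annual_tuning)

# Low license fee, but heavy in-house customization every year.
cheap_tool = three_year_tco(annual_license=10_000,
                            setup_services=40_000,
                            annual_tuning=25_000)

# Higher license fee, but domain-adapted largely out of the box.
pricier_tool = three_year_tco(annual_license=30_000,
                              setup_services=5_000,
                              annual_tuning=5_000)

print(cheap_tool, pricier_tool)  # 145000 110000
```

Under these (invented) numbers, the tool with triple the license fee is a third cheaper to own over three years.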
- Proof of concept.
Try before you buy, using your data, in a proof-of-concept prototype that produces output samples sufficient to demonstrate whether a candidate tool or solution can deliver the insights you need. Again, we’re dealing with human-language data. It should be clear whether accuracy and performance meet your business-goal-driven needs.
These criteria are offered as a guide. Each situation is unique; each organization has its own priorities and must-haves. So there are no cookie-cutter evaluations, in the selection of a text or social analytics provider, or in other vendor and tool selection processes. Just remember the first principle: Work back from your business goals. Keep outcomes in mind and design an evaluation that suits your situation, and you will choose well.