Already, as of 2010, a quarter of Americans (24%) had posted product reviews or comments online, and 78% of Internet users had gone online for product research. But those are ancient stats. Numbers are higher now. More recently, BrightLocal found in 2016 that 91% of consumers regularly or occasionally read online reviews, with 47% taking sentiment of local-business reviews — the tonality of a review’s text — into account in purchasing decisions. Breaking out the figures, 74% of consumers say that positive reviews make them trust a local business more and 60% say that negative reviews make them not want to use a business, according to BrightLocal.
So reviews are important, and the feelings expressed are key. To understand review content including sentiment, at Web and social scale and velocity, you need automated natural language processing (NLP) and other forms of AI.
Commercial review-management platforms — from Bazaarvoice, PowerReviews, Yotpo, and others — help brands and online commerce sites collect reviews and redeploy them to boost sales. That’s an important function: Bazaarvoice reports, “Our clients see 65% average lift in revenue per visit and 52% lift in conversion on product pages with ratings and reviews.” Not all platforms bake NLP into their products and services, however. It’s the companies that do that interest me, the ones that look at what’s actually said in reviews, beyond the star ratings. Let’s look at four, and then at do-it-yourself approaches to customer-review analysis.
Four AI startups that help brands exploit customer reviews
Customer reviews contain several forms of salient information. First there’s the star rating, but ratings, even when broken out into categories (on Airbnb, for example: accuracy, communication, cleanliness, location, check-in, and value), have zero explanatory power. So we have review text: free-form, voice-of-the-customer reactions. This text tells a story, and stories sell, so we need to know the aspects of a product or service discussed, the wording used to describe them, and the sentiment expressed.
Review text also reveals a lot about the reviewer, as Stanford University Prof. Dan Jurafsky explains in an exploration of review language, Natural Language Processing on Everyday Language. (Jurafsky’s data science study of how restaurants and reviewers talk about food, including the connection between menu wording and item price, is really illuminating. Also, NLP and AI can help in review moderation by identifying abusive language and can detect fraudulent reviews, but those are topics for another article.) Finally, reviewer identity is key: demographic characteristics such as age, gender, and geographic location, along with reviewer reputation or ratings and the reviewer’s social profile. Other factors, such as review recency, also come into play.
We’re describing a complex data scenario. A Bazaarvoice blog post will take you through some of the data science challenges but it doesn’t cover solutions. The startups I will profile deploy analytics — NLP, machine learning, and other forms of AI — to respond to the challenges.
Revuze focuses on products and product attributes in addition to brand health, with a couple of differentiators. One is special attention to discovering the different ways people talk about a given topic, and a second is the ability to identify sentiment in phrases that lack obvious clues, that don’t use words like “good,” “happy,” and “terrible.”
Revuze analyses aren’t limited to reviews; the company’s tech also applies NLP for topic, keyword, and sentiment extraction to survey responses, call-center text, and social media. Source text is analyzed against category taxonomies generated via semi-supervised machine learning. The company graduated from the Nielsen Innovate incubator in Caesarea, Israel, in the fall of 2015 and has turned its attention to Product Experience Management (PEM), enabling clients to “measure customer perception of the holistic product and service experience.”
Technically similar to Revuze but with a very different orientation, Aspectiva provides aspect-based review aggregation and product search, focused on reviewers’ perceptions of product attributes and capabilities. Aspect extraction, implemented via unsupervised machine learning, is coupled with behind-the-scenes analytics; in the company’s words, the technology reveals “the true uses of any product and generates recommendations based on the full user experience.”
A results graphic (product aspects and sentiment ratings, as shown in the image at right) may be embedded in an online commerce site. The goal, per the Bazaarvoice figures cited above, is to boost conversion rates and sales revenue. Aspectiva also provides an API, allowing customers to build their own front-ends to Aspectiva analytics, and a search function that is cognizant of product attributes.
Aspectiva deploys an NLP–machine learning combination that “scans texts written by consumers and learns what people are saying when they are happy or unhappy with products they write about, beyond the obvious ‘sentiment words’ themselves… determining sentiment also with factual sentences.” This capability isn’t unique to Aspectiva — Revuze claims something similar — but it makes for more robust sentiment detection than is found in competing products.
SmartMunk story.ly analysis centers on satisfaction drivers rather than on product and service attributes. The principle is that customer satisfaction drives business outcomes so it’s important to focus on elements that make customers happy or that disappoint.
The company offers a hybrid methodology that quantifies qualitative insights discovered in consumer-generated content, targeting brand product development and marketing functions. “story.ly directly gathers your reviews including filter variables from seller platforms. Within seconds you see the story in your smart online report.”
I especially like the skyline ontology graphic seen in the lower-right corner of the Customer Satisfaction Dashboard at right. SmartMunk founder Andera Gadeib, who also runs market-research agency Dialego, points out the functional and emotional attributes captured in the ontology, a categorical representation of satisfaction drivers. As for the theme cluster: It is generated via TF-IDF term ranking, based on the relative frequency of term occurrence within the set of inputs. According to Andera Gadeib, although “clients like to look at coded data” — at consumer-generated content classified according to product attributes (rather than satisfaction drivers) — the company “has had good success going with a less category-driven view.”
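For readers curious what TF-IDF term ranking looks like in practice, here is a minimal plain-Python sketch. It is not SmartMunk’s implementation: the sample reviews, the simple regex tokenizer, and the function names are all assumptions made for illustration, and the weighting shown (term frequency within a review times the log of inverse document frequency) is just one common TF-IDF formulation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def tfidf_scores(docs):
    """Return one {term: score} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

def top_terms(doc_scores, n=2):
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]

# Made-up reviews, purely for illustration.
reviews = [
    "The battery life is great and the screen is bright",
    "The battery died after a week and support was slow",
    "The screen is bright but the speakers are weak",
]
scores = tfidf_scores(reviews)
first_review_terms = top_terms(scores[0])
```

A word such as “the,” which occurs in every review, scores zero, while words distinctive to a single review rise to the top. That is the property that makes the ranking useful for surfacing themes.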
SentiGeek is a pre-launch customer feedback/review-sentiment company. (I informally advise founder Mara Tsoumari, who presents features and possibilities in an online video.) Candidate markets — indicating the breadth of review-sentiment interest — include online retail, market research, marketing, financial institutions, and public administrations. The product is designed to support reporting, analysis, and monitoring options. It extracts opinion words and phrases and fine-grained sentiment and provides analyses by review and by opinion holder with the ability to generate customer profiles.
It’s the reviewer-focused analyses that differentiate SentiGeek, but rather than provide an interface screenshot for illustration, I thought it would be more interesting to look at SentiGeek’s technical architecture, shown in the image at right. Note the use of spaCy, an innovative, open-source NLP package that features parsing and named-entity extraction capabilities. Note also the use of an RDF-structured knowledgebase designed for domain-specific aspect–sentiment evaluations. (RDF is the Resource Description Framework, which originated as a Semantic Web standard.)
Do-it-yourself is an option if you have strong data-wrangling and analytical skills. DIY option #1 is to analyze review text within a data analysis workbench. Examples: Aylien describes Building a Text Analysis process for customer reviews in RapidMiner and MeaningCloud covers Text Classification in Excel: build your own model. If you have data science skills, Python is a great choice, perhaps using gensim, NLTK, Stanford CoreNLP (via a Python wrapper), or TensorFlow. (Links are resources and examples.) Use spaCy NLP (SentiGeek’s choice) for Python-native information extraction; spaCy plays nice with TensorFlow, Keras, Scikit-Learn, Gensim, and the rest of the Python AI ecosystem.
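To give a flavor of the Python route, here is a deliberately tiny sketch of aspect-level sentiment scoring that uses only the standard library. Everything in it is an assumption for illustration: the aspect list, the positive and negative word lists, the sample reviews, and the crude rule of splitting text into clauses at punctuation and at “but” or “and.” A real project would lean on one of the libraries named above rather than on hand-built word lists.

```python
import re
from collections import defaultdict

# Hypothetical toy vocabularies; real lexicons are far larger.
ASPECTS = {"battery", "screen", "price"}
POSITIVE = {"great", "bright", "fair"}
NEGATIVE = {"weak", "poor", "slow"}

def aspect_sentiment(reviews):
    """Sum a naive sentiment score per aspect across all reviews."""
    totals = defaultdict(int)
    for review in reviews:
        # Crude clause splitting: punctuation plus the words "but"/"and".
        clauses = re.split(r"[.,;!?]|\b(?:but|and)\b", review.lower())
        for clause in clauses:
            words = set(re.findall(r"[a-z']+", clause))
            score = len(words & POSITIVE) - len(words & NEGATIVE)
            # Credit the clause's score to every aspect it mentions.
            for aspect in words & ASPECTS:
                totals[aspect] += score
    return dict(totals)

# Made-up reviews, purely for illustration.
reviews = [
    "Great battery, but the screen is weak.",
    "The screen is bright and the price is fair.",
]
result = aspect_sentiment(reviews)
```

Here the screen earns one negative and one positive mention and nets out to zero, which is exactly the kind of detail a single star rating hides. The sketch ignores negation, sarcasm, and sentiment expressed without obvious sentiment words.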
Build-it-yourself, applying a commercial NLP service, is another option. Hearing John Kelley describe his work at TripAdvisor (using tech from Lexalytics) is what first opened my eyes to review-analytics possibilities. Video of Kelley’s presentation, What Travelers Say… Using Sentiment to Improve User Engagement, is dated but still quite interesting.
A look ahead
Expect consumers to continue sharing product and service perceptions, focusing on both features and experiences. “Word of mouth” matters: the influence of reviews and social postings on purchase decisions will likely only grow. We will see a continued strong market for analytics (NLP, machine learning, and data science) as the best response to the volume–impact combination: text analytics, and also image and video analytics that can detect entities, emotions, and context in diverse media. If you’re a brand, build or adopt an analytics solution, whether from an established vendor or a startup, or fall behind.
For more on review Voice of the Customer analyses and related topics, check out the 2017 Sentiment Analysis Symposium, taking place June 27-28 in New York, tagline Emotion–Influence–Activation. We have a Call for Speakers open through January 31.