Eleven Things Research Pros Should Know about Sentiment Analysis

Sentiment analysis has been portrayed, variously, as a linchpin of the New MR and as snake oil that produces pseudo-insights that are little better than divination. Who’s right?

Me, I’m with the first camp. Automated sentiment analysis, powered by text analytics, infuses explanatory power into stale Likert-reliant methodologies and allows researchers to stay on top of fast-emerging trends, and to tap the unprompted voice of the customer, via social listening.

I suspect that most in the second, nay-sayer camp have distorted ideas of sentiment analysis capabilities and limitations. These ideas have perhaps been engendered by extravagant and unsupported claims made by less-than-capable solution providers. Whatever their source, I’ve taken it on myself to debunk them, to give a truer sense of the technology.

We aim to encourage appropriate use of sentiment technologies and to discourage their misuse. Call the effort market education. I do a lot of it, via conferences such as my up-coming Sentiment Analysis Symposium, taking place July 15-16 in New York, and via articles such as this one, which offers…

Eleven things research pros should know about sentiment analysis:

  1. Sentiment analysis via term look-up in a lexicon is an easy but crude method. Meaning varies according to word sense, context, and what’s being discussed. Look for tools that apply linguistic and statistical methods to the analysis task.
  2. Document-level sentiment analysis is largely passé. Aim for sentiment resolution at the entity, concept, or topic level. (Examples: An Apple iPhone 6 is an entity; the iPhone line is a conceptual category; smart phones are a topic.)
  3. The common-language definition of ‘sentiment’ includes attitude, opinion, feelings, and emotion. Capable sentiment analysis will allow you to go beyond positive/negative scoring to allow rating according to emotion — happy, surprised, afraid, disgusted, angry, and sad — and mood and not just valence.
  4. Expanding on that broad view: Sentiment analysis is part of the world of affective computing, “computing that relates to, arises from, or deliberately influences emotion or other affective phenomena,” quoting the MIT Media Lab’s Affective Computing group. Contrast with cognitive and sensory computing: All linked, but with distinctions in the technologies and methods applied.
  5. Not all sentiment is created equal. You should strive to understand both valence and intensity, and also significance, how sentiment translates into actions.
  6. Whether you apply language engineering, statistical methods, or machine learning to the task, properly trained domain-adapted models will outperform generic classification.
  7. You need to beware of accuracy claims. There’s no standard measuring stick, and some solution providers even cook the measurement process. The accepted approach is to measure accuracy against a “gold standard” of human annotated/classified material. That means setting humans and machines on the same tasks and seeing the degree of agreement. But if you have your software take a shot at the task, and then have a human decide whether it was right or not, that’s not legit. And no standard measuring stick: Some software does only doc-level analysis, while other software works at the sentence or phrase level and still other software resolves sentiment at the entity or concept level. Maybe 70% at the entity level is better than 97% at the doc level?
  8. Text is the most common sentiment data source, but it’s not the only one. Apply facial coding to video, and speech analysis to audio streams, in order to detect emotional reaction: These are advanced methods for assessment of affective states. The next frontier: Neuroscience and wearables and other means of studying physiological states.
  9. Language is among the most vibrant and fast-evolving tools humans use. Personal and social computing have given us unprecedented expressive power and ability to amplify our voices, via old-new methods such as emoji. More than just nuanced amplifiers (😀 vs. 😈), emoji have taken on syntax and semantics of their own, and of course social media is awash in imagery. Sentiment analysis is keeping pace with the emergence of new forms of expression. (A plug: I’m particularly excited about a pair of Sentiment Analysis Symposium presentations in this area. Francesco D’Orazio, of UK agency FACE and Pulsar Social, will speak on Analyzing Images in Social Media, and Instagram engineer Thomas Dimson will be speaking on the semantics of emoji, on “Emojineering @ Instagram.” Other symposium presenters cover other topics I’ve mentioned in this article.)
  10. You can gain analytical lift, and predictive power, by linking sentiment and behavior models, and by segmenting according to demographic and cultural categories. There’s lots of data out there. Use it. Here’s why —
  11. Advanced concepts such as motivation, influence, advocacy, and activation are built on a foundation of sentiment and behavioral modeling and network analysis. If the goal of research, in the insights industry, is consumer and market understanding, the goal of understanding is to create the conditions for action. You need to work toward these concepts.
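
To make point 7 concrete, here is a minimal sketch of scoring software against a human-annotated gold standard. The labels and data are made up for illustration, and the function names are my own, not any vendor's API. The sketch reports raw agreement plus Cohen's kappa, one common chance-corrected agreement statistic; other measures (precision, recall, F1) are equally defensible.

```python
from collections import Counter


def agreement(gold, predicted):
    """Fraction of items where the machine label matches the human gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    if not gold:
        raise ValueError("need at least one labeled item")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


def cohens_kappa(gold, predicted):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(gold)
    observed = agreement(gold, predicted)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Chance agreement: probability both sides pick the same label at random,
    # given each side's own label distribution.
    expected = sum(gold_counts[label] * pred_counts[label]
                   for label in gold_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both sides used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: humans label first, independently of the software.
gold = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "neu"]
machine = ["pos", "neg", "pos", "neu", "pos", "neg", "neg", "neu"]

print(f"raw agreement: {agreement(gold, machine):.2f}")
print(f"Cohen's kappa: {cohens_kappa(gold, machine):.2f}")
```

The essential design choice is that the gold labels are produced without sight of the machine output. Run the same comparison separately at the document, sentence, and entity levels and you will get different numbers, which is why a single headline accuracy figure tells you little.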

Alright, there’s my take. Consider it when you design your next survey — don’t shy away from free-response verbatims — and as you wonder how to bring social-media mining into your studies. Think about the variety of affective-computing methods available to you and which might help you, in conjunction with behavior analyses and more advanced segmentation, generate insights that your clients can act on. Market researchers and insight pros, relook sentiment analysis in order to add New to your MR.

10 Reasons You Should Attend the NY Sentiment Symposium

If you’re reading this, chances are you should attend the July 15-16 Sentiment Analysis Symposium in New York.


Because sentiment (covering emotion, intent, and the spectrum of social signals) has never been more important for business (and healthcare, finance, government, and academia). You need to keep up with the technology and applications, and the symposium is the single best place to learn, share, network, and make deals.

The symposium is a labor of love for me, and I’m going to go overboard in offering 9 more reasons to attend. I will attest to you that —

1) This is seriously the best program of the eight I’ve organized so far. I don’t mean to slight anyone, but I’ll single out a few of the speakers as really cool: Fran D’Orazio on visual social; Thomas Dimson on emoji semantics (“Emojineering @ Instagram”) 👏; Vika Abrecht from Bloomberg’s machine learning group; Rohini Srihari on inferring demographic data; Michael Czerny on Word2Vec for Sentiment Analysis; Scott Amyx on wearables. That’s in addition to coverage of mainstream customer experience, market insights, social intelligence, healthcare, financial, and other use cases.

2) Your peers, competitors, and potential business partners will be there. In text/social analytics: ABBYY, Basis Technology, Bottlenose, CrowdFlower, Digital Reasoning, InMoment, Kanjoya, Lexalytics, NICE Systems, Oracle, Percolate, Pulsar Social, Rant & Rave, SAS, Sentimetrix, Socialgist, Teradata, TheySay. (I’m not even listing financial sector firms.)

3) You may learn about new (to you) markets: capital markets, emotion analytics, healthcare, digital marketing, media measurement.

4) You’ll meet visionaries past, present & future: Dave Schubmehl of IDC; ex-IDC Sue Feldman & Hadley Reynolds; Joel Rubinson, former Chief Research Officer at the Advertising Research Foundation; also ex-ARF Steve Rappaport; Anjali Lai from Forrester Research.

5) You’ll network with attendees from Etsy, Hallmark, Lenovo, Millward Brown, Sony Music, Verizon, and Wipro; from Bank of America, Capital Fund Management, Wells Fargo, the Federal Reserve Board, and the Singapore Defense Ministry (!). (I’m cherry-picking of course.)

6) You’ll meet research authorities, among them: Prof. Bing Liu (sentiment analysis workshop); Prof. Robert Dale (natural language generation workshop); Dr. David Forbes (market science); Dr. Daniel McDuff (facial coding); and a passel of data science types.

7) We have great sponsors. MeaningCloud (Daedalus), Lexalytics, TheySay, DandelionAPI (SpazioDati), and Revealed Context (Converseon) provide text/social analysis services. Socialgist is a data leader, Social Market Analytics makes social sensible for financial markets, and Emotient has pioneered emotion analytics via facial coding.

8) You won’t find better concentrated coverage of Sentiment Analysis for Financial Markets. That’s our Thursday, July 16 Workshop track session, moderated by Battle of the Quants organizer Bartt Kellermann.

Remember: You’re free to mix-and-match Presentation and Workshop track sessions.

9) The New York Academy of Sciences is a great venue.

Check out the program and do join us, for either day or both.

Faces, Emotions, and Insights: Q&A with Affectiva’s Daniel McDuff

Emotion influences our actions and colors our interactions, which, to be blunt, means that emotion has business value. Understand emotions and model their associations with actions and you can gain insights that, if you do it right, enable “activation.”

Humans communicate emotion in many ways, notably via speech and written words, and non-verbally through our facial expressions. Our facial expressions are complex primitives that are fundamental to our knowing and understanding one another. They reveal feelings, that is, “affective states,” hence the company name Affectiva. Affectiva has commercialized facial coding and emotion analytics work done at the MIT Media Lab. The claim is that “deep insight into consumers’ unfiltered and unbiased emotional reactions to digital content is the ideal way to judge your content’s likability, its effectiveness, and its virality potential. Adding the emotion layer to digital experiences enriches these interactions and communications.”

Affectiva Principal Scientist Daniel McDuff

I recruited Affectiva to speak at the up-coming Sentiment Analysis Symposium, taking place July 15-16, 2015 in New York. Principal Scientist Daniel McDuff, an alumnus of the MIT Media Lab, will represent the company. He will speak on “Understanding Emotion Responses Across Cultures,” of course about applying facial coding methods to the task.

Seth Grimes> Affectiva measures emotional reaction via facial coding. Would you please take a shot at describing the methods in just a few sentences?

Daniel McDuff> We use videos (typically from webcams) of people, track their face and analyze the pixel data to extract muscle movements. This is an automated way of coding Paul Ekman and Wallace Friesen’s facial taxonomy. We then infer emotion expression information based on the dynamic facial muscle movement information.

Seth> That’s the What. How about the How? What are the technical ingredients? A camera, obviously, but then what?

Daniel> For image capture a normal webcam or smartphone camera is sufficient.  Analysis can be performed in two ways, 1) via the cloud in which case images are streamed to a server and analyzed or 2) on the device.  The algorithms can be optimized to work in real-time and with very small memory footprint, even on a mobile device.

You earned your PhD as part of the Affective Computing group at MIT Media Lab, where Affectiva originated. (Not coincidentally, we had Affectiva co-founder Roz Picard keynote last year’s symposium.) What did your dissertation cover?

My dissertation focused on large-scale “crowdsourcing” of emotion data and the applications of this in media measurement. In the past, behavioral emotion research focused on data sets with only a relatively small number (~100) of people. By using the Internet we are now able to capture data from 100,000s of people around the world very quickly.

Why are you capturing this data? For model building or validation? For actual purpose-focused analyses?

This data is a gold mine of emotional information. Emotion research has relied on studying the behavior of small groups of people until now. This has limited the types of insights that can be drawn from the data. Now we are able to analyze cross-cultural data from millions of individuals and find significant effects even within noisy observations.

If/when you capture data from 100,000s of people around the world, what more do you know, or need to know, about these people to make full, effective use of the data?

It is extremely helpful to have demographic information to accompany facial videos. We now know that there are significant differences between genders, age groups and cultures when it comes to facial behavior. We may find that other factors also play a role. Affluence, personality traits and education would all be interesting to study.

You’ll be speaking at SAS15 on emotional response across cultures. How close or far apart are emotions and the way they’re expressed in different cultures? Are there universal emotions and ways of expressing them?

There are fascinating differences between cultures in terms of how facial expressions are exhibited. Indeed there is a level of cross-cultural consistency in terms of how some states are expressed (e.g. disgust, surprise). However, on top of this there are complex culturally dependent “display rules” which augment these expressions in different ways. Some of these relationships fit with intuition, others are more surprising.

A variety of affect-measurement technologies have emerged at MIT and other research centers that include text and speech analysis. Are cultural analyses consistent across the various approaches?

Emotion research is a HUGE field and to a certain extent the “face” community has been separate from the “voice” and “text” communities in the past. However, we are now seeing much more focus on “multi-modal” research which considers many channels of information and models the relationships between them. This is extremely exciting as we are well aware that different channels contain different types of emotional information.

What are some scenarios where facial coding performs best? Are there problems or situations where facial coding just doesn’t work?

Facial coding is most effective when you have video of a subject and they are not moving around/looking away from the camera a lot. It is also very beneficial to have context (i.e. what is the subject looking at, what environment are they in, are they likely to be talking to other people, etc.). Interpreting facial coding data can be challenging if you don’t know that context. This is the case for almost all behavioral signals.

What business problems are people applying facial coding to?

All sorts of things. Examples include: media measurement (copy-testing ads, testing pilot TV shows, measuring cinema audience reactions), robotics, video conferencing, gaming, tracking car driver emotional states.

Could you discuss a scenario, say tracking car driver emotional states? Who might use this information and for what purpose? Say a system detected that a driver is angry. What then?

Frustration is a very common emotional state when driving. However, today’s cars cannot adapt to the driver’s state. There is the potential to greatly improve the driving experience by designing interfaces that can sensitively respond when the driver’s state changes.

In an open situation like that one, with many stimuli, how would the system determine the source and object of the anger?

Once again, context is king. We need other sensors to capture environmental information in order to ascertain what is happening. Emotion data alone is not the answer. An integrated multi-modal approach is vital.

Can facial-coding results be improved via multimodal analysis or cross-modal validation? Have Affectiva and companies like it started moving toward multimodal analysis, or toward marrying data on sensed emotions with behavioral models, psychological or personality profiles, and the myriad other forms of data that are out there?

Yes, as mentioned above different channels are really important. Affectiva has mostly looked at the face and married this data with contextual information. However, I personally have done a lot of work with physiological data as well. I will also present some of those approaches at the workshop.

You’re principal scientist at Affectiva. What are you currently working on by way of new or refined algorithms or technologies? What will the state of the art be like in 5 years, on the measurement front and regarding the uses emotion analytics will be put to?

As there are so many applications that could benefit from being “emotion aware” I would expect almost all mobile and laptop/desktop computer operating systems to have some level of emotion sensing in 5 years. This will facilitate more large-scale research on emotions.

And finally, do you have any getting-started advice for someone who’d like to get into emotion analytics?

Don’t underestimate the importance of context. When analyzing emotion data it is essential to understand what is happening since emotions and reactions are complex and vary between people.

Meet Dan at the July 15-16 Sentiment Analysis Symposium in New York. He’ll be speaking Thursday afternoon, July 16, in a segment that includes other not-only-text (NoText?) technologies: speech analytics, wearables, and virtual assistants, all proven but with a huge market still in store. These talks follow another that’s really bleeding edge, a study of the semantics of emoji, “Emojineering @ Instagram,” presented by Instagram engineer Thomas Dimson. If you do attend the symposium, you can join us for either of the two days or both, and mix-and-match attendance at presentations and at longer-form technical workshops.


LinkedIn and My Multiple Personalities

I lead a double life… in which I’m far from alone. Most of us have multiple identities. At a minimum, we distinguish and maintain boundaries between our work and family/community lives. Online, that means keeping professional social networking separate from friends & family nets. Me, I use LinkedIn exclusively for work and Facebook for family, friends, and community. I have a couple of separate Twitter accounts. I recognize that most of what interests my work network is going to be a total bore for my brother-in-law. Only on topic-focused platforms such as Yelp does the personal/professional dichotomy not matter.

But my situation is more complicated: I have two professional identities. I have two distinct paid jobs with two non-intersecting networks. I spend most of my time covering text analytics, sentiment analysis, and data visualization as an IT industry analyst and consultant. (Check out my up-coming Sentiment Analysis Symposium conference in New York.) And I’m an elected government official, serving on the Takoma Park, Maryland city council. Believe me, the jobs don’t mix.

Professional networking means LinkedIn, yet I’m in the awkward position of turning down legitimate LinkedIn connection requests that don’t fit my LinkedIn focus. I reserve LinkedIn for my 40-hour per week (hah!) job, so I ax invitations from political and community contacts. Sorry!

Wouldn’t it be great if LinkedIn created the concept of personas, of different faces shown to different cohorts, reflecting our collective multiple personalities?

No, I’m not going to create distinct, separate LinkedIn accounts, one for each role. LinkedIn doesn’t allow the practice. (“To use the Services, you agree that… you will only have one LinkedIn account, which must be in your real name,” per the User Agreement.) Put aside that curating a LinkedIn profile is hard work. Applications (including apps and Web browsers) don’t support login to more than one account at a time, so you’d have to use a different application for each of your accounts. Given the widespread use of OAuth for networked service authentication, you’d face major inconvenience.

What would LinkedIn profile personas look like? Facebook has something similar figured out, via the ability to create a page (which will have its own, distinct URL/address) and the ability to designate “Who should see this?” for the content you post. This stuff isn’t the same as the ability to maintain multiple personas within a single account, but it works for Facebook. Google+, of course, similarly allows selective sharing with designated circles and communities.

So LinkedIn, what I want is this:

  • A distinct tag-line, background photo, and Background-section summary for each persona.
  • Ability to select the elements that are shown in the Experience, Skills, Organizations, Honors & Awards, and other sections, and to control their order.
  • Ability to associate Recommendations, Groups (group memberships), etc. with a persona.
  • Ability to separate Connections by persona, and to determine which set(s) of connections see a given status update, photo, or post.

Doable? I’d think so.

What’s in it for LinkedIn?

Satisfaction. Loyalty. Expanded use, because if we could create personas, we’d connect with a whole lot more people and post many more LinkedIn updates.

LinkedIn, do you recognize that one-size-fits-each doesn’t cut it in today’s complex social world? Personas. I have more than one. LinkedIn can match my multiple-personality needs by becoming a multi-faceted social platform.

What do you say?

Consumer Insights Lead to Activation: Q&A with MotiveQuest’s Brook Miller

Brands search constantly for consumer insight, seeking to understand customers, prospects, and market directions and to discern what works in creating desire, response, satisfaction, and loyalty. These latter concepts seem straightforward, yet they’re not so easy to compute. Measures that are typically applied, for instance the Net Promoter Score, paint an over-simplified picture based on attitudes rather than actions; they lack predictive ability. The prevailing method of studying actions, in the online world at least — digital analytics — falls far short due to lack of explanatory power. And these methods provide what are in essence point-in-time measurements. They record only a small portion of an often-extensive set of customer interactions that occur across multiple channels over time, the customer journey.

Brands get better answers, according to insights agency MotiveQuest, via study of motivation and advocacy. We inhabit a big data world; we’re entering an Age of Algorithms. Insights voodoo doesn’t cut it. Instead, marketing science dictates application of a rigorous analytical framework, and clients demand that findings be presented in useful form, translated into useful, usable strategy. Technology including text and sentiment analysis is a key element, but in the words of MotiveQuest CTO Brook Miller, “We’ve done interesting work to understand the emotional states along the customer journey, but it always has to come back to making it actionable for our clients.”

Brook Miller, CTO at MotiveQuest

I interviewed Brook in the run-up to the next Sentiment Analysis Symposium conference, taking place July 15-16 in New York. Brook will be speaking; his talk is titled “Segmenting Advocates to Develop Marketing Strategies and Communications.” As a preview, here’s an —

Interview with MotiveQuest CTO Brook Miller, on brands, listening, insights, and value

Seth Grimes> In just a few sentences, what does MotiveQuest do and how do you do it?

Brook Miller> MotiveQuest delivers consumer insights to our clients to help them improve their communications and marketing strategy, as well as uncover new consumer segments and product opportunities for growth. Our strategy team uses our proprietary software tools to listen to billions of organic consumer conversations happening across online communities and social networks, and then turn that data into insights, opportunities and recommendations.

Seth> I see three judgments implicit in the Web-site statement, “At MotiveQuest, we leverage custom curated consumer data from online communities to help our clients see the world through their customers’ eyes, by listening, not asking.” I read into that statement that MotiveQuest is dissing surveys (that is, asking), open-social listening (given that you favor communities), and uncurated data. So where and how, exactly, do surveys and social listening fall short?

Brook> Is it ok to use the word “dissing”? This is fantastic! Expect to hear that during my presentation at the Sentiment Analysis Symposium.

Our approach delivers the qualitative nature and deep understanding of focus groups at tremendous scale and we can do it faster and more efficiently. Surveys have their place; for example, consumers rarely talk about advertising unprompted, so if you need to test specific copy or creative, some sort of asking-based research will be involved.

Our listening research can also be very complementary to traditional research. Many of our clients find that traditional research methods are much more powerful after they’ve engaged with us to identify better questions and even new consumer segments to evaluate. Then they are able to direct additional quant and qual into sizing and clarifying opportunities. In some cases, we’ve even partnered with technology-enabled, “asking”-based research companies to provide our clients with a holistic view of their consumers by combining asking and listening research at scale.

Over the last 10 years there’s been a tremendous expansion in the number and type of social channels and we absolutely use the broad social networks to inform our analysis, but the communities with consumers talking back and forth with each other (typically outside of the brand / company’s influence) give us the best fodder for deep understanding. A lot of our analysis starts with the perspective of consumers / category rather than looking at the brand.

What sort of signals do you look for in the data, that is, what do you measure and how do you transform what you measure into insight?

Typically we’re casting a wide net to surface the key topics and drivers for consumers in a given category and then we’ll want to see how those ebb and flow over time. We’re looking for the dynamic trends and interesting changes that our clients can act upon. We really try to not get too bogged down in all the “interesting” data but to focus on data our clients can use to make decisions and move their businesses forward.

One more inference from that Web-site snippet: Does “[we] help our clients” imply that do-it-yourself doesn’t deliver for brands, even for the majors among MotiveQuest’s customers? Or is the crux of the matter not capability but rather the degree of access clients are allowed to core assets such as the curated customer data and analytical framework?

Have you ever seen someone that worked in I-Banking at Goldman Sachs build a financial model? I’m pretty handy with Excel, but at Kellogg [School of Management] I’d just be opening the file and they’d already have 15 tabs with a 3 year forecast completed. Our strategists spend more than half their week deep into our software utilizing our existing or building new frameworks to understand consumers.

Our best clients are looking to push their businesses forward and while the insights we deliver are a part of that, they also have to execute, manage, plan, etc. We deliver the insights with recommendations for our clients to act upon, which we think drives a lot more value than just a toolset.

Your SAS15 talk is about segmenting advocates. How do you define an advocate, what sorts of segmentation deliver value to clients, and how may that value be measured?

Advocacy has been a linchpin of our ability to provide insight for the last 8 years. We worked in conjunction with professors from Northwestern to build a model tying the people who promote brands and products to others to sales and share. I think it’s an accepted fact that the most effective promotional channel is word of mouth from people like you, and with our tools we’re able to listen in on the online set of these conversations that have always taken place.

Once we understand advocates, we can break them apart by interests. Is this person a Gamer or Mom, or both? For each group which driver is more important: customization or price?

I think the segmentation depends a lot on whether our client is trying to find white space for a brand extension or a hook to spur their social campaign launching next week.

A recent MotiveQuest blog post stated, “Brands that stand tall for something have many advantages, the most important of which is a strong emotional connection with their audiences.” The focus on emotional connection is really interesting. What technology and methods do you apply to discern, measure, and exploit emotional connection?

We’ve built frameworks and linguistic sets of the ways in which people express emotion as a pretty standard part of our toolkit. We’ve done interesting work to understand the emotional states along the customer journey, but it always has to come back to making it actionable for our clients. Knowing that people are “frustrated” in customer service is not so helpful. Knowing consumers are 10x more frustrated with wireless carrier A vs. wireless carrier B’s customer service can start to spur some action. Being able to then unpack that frustration into topics can create the need for change as well as a recommendation for what that change should be.

Seth> What role does data visualization play for you and your clients?

A MotiveQuest visualization: Emotions detection for brand-category understanding

I will probably sound like a Luddite, but line charts, bar charts, and x-y plots with straightforward axes make up the majority of what we do. We employ stream graphs, clustering, heat maps and force-directed diagrams as part of our toolset but try not to include those just as eye candy in our work for clients. We see a lot of “interesting” visualization ideas but are often left scratching our heads by the ambiguity the visual creates and we ask, “why didn’t they just use a bar chart?”

Where are you heading next? What would you be measuring if you could, that you aren’t already measuring? Are you working to bring new or improved algorithms to bear on the customer-understanding challenges?

The visual web is fascinating, and we utilize a lot of the imagery that consumers create to bring our ideas to life, but going beyond “does the visual have a logo in it?” or counting how many times a particular visual meme is shared in an automated fashion, to be honest I don’t know exactly what that will look like yet. We’re certainly not ready to extract emotional states from imagery… (Google might be, if you haven’t used their photos app, you have to try it.)

I think we’re still on the precipice of what value can be delivered through listening insights. Rather than innovation in methodology, I think I’m most excited by innovation in the marketing organization and process, such as what happens when we’re able to deploy a lean startup approach to the marketing org.

If we can build a virtuous cycle where consumer insights lead to activation ideas that get piloted and then scaled across marketing channels, I think we can usher in the new era of agile consumer research, leading to more effective insights, and marketing tactics.

Again, hear directly from Brook at the July 15-16 Sentiment Analysis Symposium in New York. Attend either of the two days or both, and mix-and-match attendance at presentations — our speakers represent Instagram and Affectiva, Verizon and Lenovo, Face Group and Cision, IDC and Forrester Research, and many others — and at longer-form technical workshops. And stay tuned, by following @SentimentSymp on Twitter, for additional interviews in this series.

An extra: MotiveQuest CEO David Rabjohns’ 2014 Sentiment Analysis Symposium presentation, Mapping Human Motivations to Move Product…

Basic Advice for Your Language Tech Start-up

I talk frequently to companies in, or entering, the language technology market. That’s text and social analytics, sentiment analysis, and all things applied NLP, from good-old entity extraction to natural language generation (NLG) to emoji semantics. Companies that contact me want guidance on feature sets, technical capabilities, competitive positioning, and potential sales targets, and they want to show off their wares in order to win attention. Early-stage companies covet coverage, and most welcome funding, partner, talent, and (what’s golden:) prospective-customer referrals.

The ones I reach out to: Well, I make it my business to spot players and trends early, to help advisees place the winning bets. Sometimes I write about startups and innovation and I regularly bring them in to events I organize including — do check it out — the Sentiment Analysis Symposium conference, taking place July 15-16 in New York. (The emoji reference above is to what should be a fascinating SAS15 talk, Emojineering @ Instagram, presented by engineer Thomas Dimson; and Prof. Robert Dale will offer an NLG workshop.)

Geneea, a Czech start-up founded last year, aims to build an “intelligent multilingual text analytics and interpretation platform.” Sounds ambitious, doesn’t it? Actually, technically, it’s almost the opposite. Open source software — Geneea’s chosen options include OpenNLP and Mallet — eliminates technology barriers to entry, including in text analytics. You do have to choose the most appropriate options and use them effectively, but I see the greater challenge in finding a market and a path to it. The path to market is facilitated by connections, but you do have to prove your technical capabilities by delivering data interpretation that suits business tasks. Not so easy.

I had a productive conversation last month with several Geneea team members. I’ll distill out and share some key points, from that conversation and others, acknowledging that I may learn as much from the startups I talk to as they learn from me. Around the same time, I spoke with folks including industry veteran Alyona Medelyan (check out Maui automatic extraction of keywords, concepts & terminology) and David Johnson and colleagues at Decooda (“cognitive text mining and big data analytics” targeting the insights industry).

An early-stage company needs to recognize that, per Tom Nowak of Geneea (quoting with his permission), “any piece of wisdom — experience & expertise — is most welcome and very important for startup strategy.” So point #1 is:

  1. Solicit targeted advice, early.

Obvious, yes, but in my experience, some start-ups stay heads-down developing technology that ends up over-fitting any paying application. Also:

  1. Look for comparators, companies to learn from that have succeeded (or failed) in what you hope to accomplish, whether similar in business model, function, technology, or target market.
  2. Exploit open source. It’s free, proven, and comes with community support. What successful text analytics companies have built around open source? Attivio and Crimson Hexagon for two.
  3. Open source isn’t your only tech-acceleration option. Check out, as an example, Basis Technology’s Text Analytics Startup Program. Luminoso is a participant.

Tom and his Geneea colleagues have been working since last summer on their text analysis platform, which they’ll deploy online, available via a Web service, RESTful API (application programming interface). Others I cited above — Maui, Decooda, Luminoso — are also deploying via an API, which fits another bit of guidance:

  1. Design to industry standards, at least to start, to allow your product to be easily plugged in to others’ platforms and workflows.

Lock-in is for later, once you’re established. A bit of related wisdom:

  1. Market education is expensive. Time spent in explaining your idiosyncratic methods or terminology is time that communicates costs rather than business benefit.

(Decooda, how’s the “cognitive” label working out for you? Sure, IBM uses it, but I’m not convinced anyone understands it.)

Especially if you design to standards, you need to differentiate.

  1. Identify, build out, and communicate things you do — not just better, faster, or cheaper than others, but that others don’t. Competing on better (including more accurately), faster, or cheaper is competing. You want to avoid competition, if you can swing it.
  2. In the language-technology world, ability to handle under-resourced languages or excel in under-supported business domains is a good differentiator.
  3. Another differentiator: ability to discern and extract information that others don’t.

Language coverage is a differentiator for Geneea, which is located in Prague and supports Czech in addition to English. Czech is a jump-off point to other central and eastern European languages, many of which are under-resourced. But I believe…

  1. You need more than one competence, more than one selling point. Seek to create technical differentiation if you can, but also design to meet someone’s business needs.
  2. As you go broader, seek synergies. An example? Tourism-transport-hospitality-weather. Tourism-electronics? Nah.
  3. Develop use cases and demonstration prototypes (which don’t have to be fully functional or bullet proof) that will help a prospect understand what you can do.
  4. But focus. Don’t go too wide. You’ll waste time and, as a small player, you’ll lack credibility.

Cooperation is another principle.

  1. Seek to partner with established organizations including agencies and consultancies. They have assets you don’t: brand visibility, technology, domain knowledge, and business relationships. They provide a channel.
  2. Partners (and investors) have a stake in your success. Keep that in mind.
  3. But be wary of partnerships where you’re just one player among many. Some companies cultivate ecosystems that play tech partners off, one against the other, with no revenue assurance. (Salesforce, I’m looking at you.)

If you’ve gotten this far without skipping to the bottom, you realize that the majority of my points apply broadly. They’re not specific to language tech companies. Do they reflect your experience? I’d welcome knowing, via comments or direct contact.

And if you’re in the text or social analytics world, commercializing technology or developing solutions in NLP, sentiment analysis, or related areas, I’d love to hear your story. Get in touch!

Loyalty: Earned, Owned, and Paid

Paid/owned/earned media per Forrester Research, 2009

Digital marketers talk of earned, owned, and paid media — when others tell your story via their preferred channels (earned), when you maintain the platform or channel (owned), and when you exploit others’ channels to get the word out (paid). Some DMers split out shared as a fourth species, effectively the amplification of earned/owned/paid messages. There’s lots of marketing science behind this analysis, from researchers such as Forrester (image at right) and marketers, advertising mavens, and platforms. Search for yourself; my interest at this moment is elsewhere. It’s on application of the e/o/p(/s) concept to an important aspect of consumer behavior, to loyalty.

I’m writing this article while in transit, from O’Hare Airport. In thinking about my travel experience, I’ve realized that the e/o/p(/s) categories do indeed work well in describing customer loyalty!

My analysis —

Owned loyalty — lock-in — stems from lack of (practical) choice. I had a one-stop outbound flight because a non-stop would’ve meant using an airline other than American (or partner), at an exorbitant ticket cost. Flying AA’s fine with me — I’ve done it often — and I’m sure that the Phoenix Airport has many hidden charms that weren’t visible during my stop-over, but I sure would have preferred a non-stop flight. My loyalty to AA was owned.

I earned AAdvantage loyalty-program miles for American flights, but as with others who fly only a few times a year, miles don’t trump schedule or ticket price when I choose flights. But some of my friends fly A LOT and have earned status, status that promises upgrades. Loyalty-program miles and points — and repeat-buyer discounts, credit card rebates, and other incentives and concessions offered in the face of buyer choice — constitute paid loyalty.

Just as earned media is best — the product of doing, saying, or offering something noteworthy — earned loyalty is the type we crave. You win earned loyalty by delivering exemplary customer experience — in the form of products, services, and interactions — to create customer satisfaction that freezes out the competition. Earned loyalty may even allow for a price premium, because the customer perceives your offering as that good.

And finally category 4, shared loyalty, a concept that has another name, advocacy. It’s one thing to say that you “would recommend” a product or service — Net Promoter polls recommendation likelihood — and another to actually do it. The point is that just as satisfaction doesn’t guarantee loyalty — a happy customer may still choose a competitor’s product or service based on price or convenience — a “promoter” rating doesn’t mean that there will be an actual recommendation. A promoter may not be an advocate. A promoter may not even be loyal. But a loyalist — a repeat buyer — who shares, that’s golden!

Reader, does my extension of earned/owned/paid/shared, from media to loyalty, make sense to you? Your comment is welcome!

Where Are The Text Analytics Unicorns?

Customer-strategy maven Paul Greenberg made a thought-provoking remark to me back in 2013. Paul was puzzled —

Why haven’t there been any billion-dollar text analytics startups?

Text analytics is a term for software and business processes that apply natural language processing (NLP) to extract business insights from social, online, and enterprise text sources. The context: Paul and I were in a taxi to the airport following the 2013 Clarabridge Customer Connections conference.

Customer experience leaders outpace laggards in key performance categories, according to a 2014 Harvard Business Review study.

Clarabridge is a text-analytics provider that specializes in customer experience management (CEM). CEM is an extremely beneficial approach to measuring and optimizing business-customer interactions, if you accept research such as Harvard Business Review’s 2014 study, Lessons from the Leading Edge of Customer Experience Management. Witness the outperform stats reported in tables such as the one to the right. Authorities including “CX Transformist” Bruce Temkin will tell you that CEM is a must-do and that text analytics is essential to CEM (or should that be CXM?) done right. So will Clarabridge and rivals that include Attensity, InMoment, MaritzCX, Medallia, NetBase, newBrandAnalytics, NICE, SAS, Synthesio, and Verint. Each has text analytics capabilities, whether the company’s own or licensed from a third-party provider. Their text analytics extracts brand, product/service, and feature mentions and attributes, as well as customer sentiment, from social postings, survey responses, online reviews, and other “voice of the customer” sources. (A plug: For the latest on sentiment technologies and solutions, join me at my Sentiment Analysis Symposium conference, taking place July 15-16 in New York.)

So why haven’t we seen any software companies — text analytics providers, or companies whose solutions or services are text-analytics reliant — started since 2003 and valued at $1 billion or more?

… continued in VentureBeat.

Six Intelligence/Data Trends, as Seen by the U.S. Former Top Spy

Gen. Michael Hayden, former CIA and NSA director, keynoted this year’s Basis Technology Human Language Technology Conference. Basis develops natural language processing software that is applied for search, text analytics for a broad set of industries, and investigations. That a text technology provider would recruit an intelligence leader as a speaker is no mystery: Automated text understanding, and insight synthesis across diverse sources, is an essential capability in a big data world. And Hayden’s interest? He now works as a principal at the Chertoff Group, an advisory consultancy that, like all firms of the type (including mine, in data analysis technologies), focuses on understanding and interpreting trends, on shaping reactions, and on maintaining visibility by communicating its worldview.

Gen. Michael Hayden keynotes Basis Technology’s Human Language Technology Conference

Data, insights, and applications were key points in Hayden’s talk. (I’m live-blogging from there now.)

I’ll provide a quick synopsis of six key trend points with a bit of interpretation. The points are Hayden’s — applying to intelligence — and the interpretation is generally mine, offered given broad applicability that I see to a spectrum of information-driven industries. Quotations are as accurate as possible but they’re not guaranteed verbatim.

Emergent points, per Michael Hayden:

1) The paradox of volume versus scarcity. Data is plentiful. Information, insights, are not.

2) State versus non-state players. A truism here: In the old order, adversaries (and assets?) were (primarily) larger, coherent entities. Today, we live and operate, I’d say, in a new world disorder.

3) Classified versus unclassified. Hayden’s point: Intelligence is no longer (primarily) about secrets, about clandestine arts. Open source (information, not software) is ascendant. Hayden channels an intelligence analyst who might ask, “How do I create wisdom with information that need not be stolen?”

4) Strategic versus specific. “Our energy is now focused on targeting — targeted data collection and direct action.” Techniques and technologies now focus on disambiguation, that is, on creating clarity.

5) Humans versus machines. Hayden does not foresee a day (soon?) when a “carbon-based machine” will not be calling the shots, informed by the work of machines.

6) The division of labor between public and private, between “blue and green.” “There’s a lot of true intelligence work going on in the private sector,” Hayden said. And difficulties are “dwarfed by the advantage that the American computing industry gives us.”

Of course, there’s more, or there would be were Hayden free to talk about certain other trend points he alluded to. Interpreting further: The dynamics of the intelligence world cannot be satisfyingly reduced to bullet trend points, whether the quantity is a half dozen or some other number. The same is true for any information-driven industry. Yet data reduction is essential, whether you’re dealing with big data or with decision making from a set of overlapping and potentially conflicting signals. All forms of authoritative guidance are welcome.

The Myth of Small Data, the Sense of Smart Data, Analytics for All Data

Big data is all-encompassing, and that seems to be a problem. The term has been stretched in so many ways that in covering so much, it has come to mean — some say — too little. So we’ve been hearing about “XYZ data” variants. Small data is one of them. Sure, some datasets are small in size, but the “small” qualifier isn’t only or even primarily about size. It’s a reaction to big data that, if you buy advocates’ arguments, describes a distinct species of data that you need to attend to.

I disagree.

Nowadays, all data — big or small — is understood via models, algorithms, and context derived from big data. Our small data systems now effortlessly scale big. Witness: Until five years ago, Microsoft Excel spreadsheets maxed out at 256 columns and 65,536 rows. In 2010, the limit jumped to 16,384 columns by 1,048,576 rows: over 17 billion cells. And it’s easy to go bigger, even from within Excel. It’s easy to hook this software survivor of computing’s Bronze Age, the 1980s, into external databases of arbitrary size and to pull data from the unbounded online and social Web.
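
For anyone who wants to check the arithmetic, here is a minimal Python sketch that multiplies out the two grid sizes cited in the paragraph above. The constant and variable names are mine, chosen for illustration; the figures are the ones quoted in the text.

```python
# Worksheet grid limits as cited in the paragraph above.
# (Names are illustrative; only the four numbers come from the text.)
OLD_COLUMNS, OLD_ROWS = 256, 65_536
NEW_COLUMNS, NEW_ROWS = 16_384, 1_048_576

old_cells = OLD_COLUMNS * OLD_ROWS   # cells per sheet under the old limit
new_cells = NEW_COLUMNS * NEW_ROWS   # cells per sheet under the new limit
growth = new_cells // old_cells      # how many old sheets fit in one new sheet

print(f"Old grid: {old_cells:,} cells")
print(f"New grid: {new_cells:,} cells")
print(f"Growth:   {growth:,}x")
```

The new grid works out to 17,179,869,184 cells, which squares with "over 17 billion," and to 1,024 times the capacity of the old one.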

So we see —

Small is a matter of choice, rather than a constraint. You don’t need special tools or techniques for small data. Conclusion: The small data category is a myth.

Regardless, do discussions of small data, myth or not, offer value? Is there a different data concept that works better? Or with an obsessive data focus, are we looking at the wrong thing? We can learn from advocates. I’ll choose just a few, and riff on their work.

Delimiting Small Data

Allen Bonde, now a marketing and innovation VP at OpenText, defines small data as both “a design philosophy” and “the technology, processes, and use cases for turning big data into alerts, apps, and dashboards for business users within corporate environments.” That latter definition reminds me of “data reduction,” a term for the sort of data analysis done a few ages ago. And of course, per Bonde, “small data” describes “the literal size of our data sets as well.”

I’m quoting from Bonde’s December 2013 guest entry in the estimable Paul Greenberg’s ZDnet column, an article titled 10 Reasons 2014 will be the Year of Small Data. (Was it?) Bonde writes, “Small data connects people with timely, meaningful insights (derived from big data and/or ‘local’ sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks.”

Small data: Mini-Me

So (some) small data is a focused, topical derivation of big data. That is, small data is Mini-Me.

Other small data accumulates from local sources. Presumably, we’re talking the set of records, profiles, reference information, and content generated by an isolated business process. Each of those small datasets is meaningful in a particular context, for a particular purpose.

So small data is a big data subset or a focused data collection. Whatever its origin, small data isn’t a market category. There are no special small-data techniques, nor small-data tools or systems. That’s a good thing, because data users need room to grow, by adding to or repurposing their data. Small data collections that have value tend not to stay small.

Encapsulating: Smart Data

Tom Anderson builds on a start-small notion in his 2013 Forget Big Data, Think Mid Data. Tom offers the guidance that you should consider cost in creating a data environment sized to maximize ROI. Tom’s mid data concept starts with small data and incrementally adds affordable elements that will pay off. Tom used another term when I interviewed him in May 2013, smart data, to capture the concept of (my words:) maximum return on data.

Return isn’t something baked into the data itself. Return on data depends on your knowledge and judgment in collecting the right data and in preparing and using it well.

This thought is captured in an essay, “Why Smart Data Is So Much More Important Than Big Data,” by Scott Fasser, director of Digital Innovation for HackerAgency. His argument? “I’ll take quality data over quantity of data any day. Understanding where the data is coming from, how it’s stored, and what it tells you will help tremendously in how you use it to narrow down to the bits that allow smarter business decisions based on the data.”

“Allow” is a key word here. Smarter business decisions aren’t guaranteed, no matter how well-described, accessible, and usable your datasets are. You can make a stupid business decision based on smart data.

Of course, smart data can be big and big data can be smart, contrary to the implication of Scott Fasser’s essay title. I used smart in a similar way in naming my 2010 Smart Content Conference, which focused on varieties of big data that are decidedly not traditional, or small, data. That event was about enhancing the business value of content — text, images, audio, and video — via analytics including application of natural language processing to extract information, and generate rich metadata, from enterprise content and online and social media.

(I decided to focus my on-going organizing elsewhere, however. The Sentiment Analysis Symposium looks at applications of the same technology set, but targeting discovery of business value in attitudes, opinion, and emotion in diverse unstructured media and structured data. The 8th go-around will take place July 15-16, 2015 in New York.)

But data is just data — whether originating in media (text, images, audio, and video) or as structured tracking, transactional, and operational data — whether facts or feelings. And data, in itself, isn’t enough.

Extending: All Data

I’ll wrap up by quoting an insightful analysis, The Parable of Google Flu: Traps in Big Data Analysis, by academic authors David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani, writing in Science magazine. So happens I’ve quoted Harvard Univ Professor Gary King before, in my 4 Vs For Big Data Analytics: “Big Data isn’t about the data. It’s about analytics.”

King and colleagues write, in their Parable paper, “Big data offer enormous possibilities for understanding human interactions at a societal scale, with rich spatial and temporal dynamics, and for detecting complex interactions and nonlinearities among variables… Instead of focusing on a ‘big data revolution,’ perhaps it is time we were focused on an ‘all data revolution,’ where we recognize that the critical change in the world has been innovative analytics, using data from all traditional and new sources, and providing a deeper, clearer understanding of our world.”

The myth of small data is that it’s interesting beyond very limited circumstances. It isn’t. Could we please not talk about it any more?

The sense of smart data is that it allows for better business decisions, although positive outcomes are not guaranteed.

The end-game is analysis that exploits all data — both producing and consuming smart data — to support decision-making and to measure outcomes and help you improve processes and create the critical, meaningful change we seek.