Emotion influences our actions and colors our interactions, which, to be blunt, means that emotion has business value. Understand emotions and model their associations with actions and you can gain insights that, if you do it right, enable “activation.”
Humans communicate emotion in many ways, notably via speech and written words, and non-verbally through our facial expressions. Our facial expressions are complex primitives that are fundamental to our knowing and understanding one another. They reveal feelings, that is, “affective states,” hence the company name Affectiva. Affectiva has commercialized facial coding and emotion analytics work done at the MIT Media Lab. The claim is that “deep insight into consumers’ unfiltered and unbiased emotional reactions to digital content is the ideal way to judge your content’s likability, its effectiveness, and its virality potential. Adding the emotion layer to digital experiences enriches these interactions and communications.”
I recruited Affectiva to speak at the upcoming Sentiment Analysis Symposium, taking place July 15-16, 2015 in New York. Principal Scientist Daniel McDuff, an alumnus of the MIT Media Lab, will represent the company. He will speak on “Understanding Emotion Responses Across Cultures,” of course about applying facial coding methods to the task.
Seth Grimes> Affectiva measures emotional reaction via facial coding. Would you please take a shot at describing the methods in just a few sentences?
Daniel McDuff> We use videos (typically from webcams) of people, track their face and analyze the pixel data to extract muscle movements. This is an automated way of coding Paul Ekman and Wallace Friesen’s facial taxonomy, the Facial Action Coding System (FACS). We then infer emotion expression information based on the dynamic facial muscle movement information.
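The idea of mapping detected muscle movements (FACS “action units,” or AUs) to prototypical expressions can be sketched in a few lines. This is a deliberately simplified illustration using textbook AU-to-emotion associations; the prototype sets and the `infer_emotions` function are hypothetical and not Affectiva’s production model, which works on continuous, dynamic AU signals rather than binary presence.

```python
# Illustrative sketch: inferring prototypical expressions from detected
# FACS action units (AUs). The AU-to-emotion pairings below are simplified
# textbook associations, not a production emotion classifier.

# Simplified prototype sets: emotion -> AUs that typically co-occur with it.
PROTOTYPES = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "surprise":  {1, 2, 5, 26},  # brow raisers + upper lid raiser + jaw drop
    "disgust":   {9, 15},        # nose wrinkler + lip corner depressor
}

def infer_emotions(detected_aus):
    """Return emotions whose full prototype AU set appears in the frame."""
    detected = set(detected_aus)
    return [emotion for emotion, aus in PROTOTYPES.items()
            if aus <= detected]

# Example: AUs extracted from one video frame.
print(infer_emotions({6, 12, 25}))  # -> ['happiness']
```

A real system would also weight AU intensities and track how they evolve over time, which is what “dynamic facial muscle movement information” refers to above.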
Seth> That’s the What. How about the How? What are the technical ingredients? A camera, obviously, but then what?
Daniel> For image capture a normal webcam or smartphone camera is sufficient. Analysis can be performed in two ways: 1) in the cloud, in which case images are streamed to a server and analyzed, or 2) on the device. The algorithms can be optimized to run in real time with a very small memory footprint, even on a mobile device.
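The two deployment paths Daniel describes can be sketched as a simple dispatch: run a lightweight local model, or stream the frame to a server. All names here (`analyze_frame`, the placeholder analyzers) are hypothetical illustrations of the architecture, not a real Affectiva API.

```python
# Sketch of the two analysis paths: on-device (low latency, small
# footprint) vs. cloud (frames streamed to a server). The analyzers are
# stubs standing in for real models and network calls.

def analyze_on_device(frame):
    # Placeholder for an optimized, small-footprint local model.
    return {"mode": "device", "expressions": {}}

def analyze_in_cloud(frame, upload):
    # Placeholder: upload() would stream the frame to a server and
    # return the server's analysis.
    return {"mode": "cloud", "expressions": upload(frame)}

def analyze_frame(frame, prefer_local=True, upload=None):
    """Pick the analysis path: on-device by default (latency, privacy),
    cloud when an uploader is available and preferred."""
    if prefer_local or upload is None:
        return analyze_on_device(frame)
    return analyze_in_cloud(frame, upload)
```

The design choice mirrors the trade-off in the answer: on-device analysis avoids streaming video off the phone, while the cloud path centralizes heavier models and data collection.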
You earned your PhD as part of the Affective Computing group at MIT Media Lab, where Affectiva originated. (Not coincidentally, we had Affectiva co-founder Roz Picard keynote last year’s symposium.) What did your dissertation cover?
My dissertation focused on large-scale “crowdsourcing” of emotion data and the applications of this in media measurement. In the past, behavioral emotion research focused on data sets with relatively small numbers of people (~100). By using the Internet we are now able to capture data from 100,000s of people around the world very quickly.
Why are you capturing this data? For model building or validation? For actual purpose-focused analyses?
This data is a gold mine of emotional information. Emotion research has relied on studying the behavior of small groups of people until now. This has limited the types of insights that can be drawn from the data. Now we are able to analyze cross-cultural data from millions of individuals and find significant effects even within noisy observations.
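The kind of large-scale analysis described here, finding stable group-level effects in noisy individual readings, amounts to aggregating an expression metric by group and comparing the summaries. A toy sketch with invented numbers (the groups and smile-intensity values are hypothetical, not study data):

```python
# Hedged sketch: aggregate a per-viewer facial metric (here, invented
# smile-intensity scores) by group and summarize each group. With many
# viewers, group means stabilize even though individual readings are noisy.
from statistics import mean, stdev

def group_summary(samples_by_group):
    """Mean and spread of an expression metric for each group."""
    return {g: (round(mean(xs), 3), round(stdev(xs), 3))
            for g, xs in samples_by_group.items()}

smiles = {
    "group_A": [0.42, 0.55, 0.48, 0.61, 0.50],
    "group_B": [0.30, 0.35, 0.28, 0.40, 0.33],
}
summary = group_summary(smiles)
```

A real cross-cultural study would of course add significance testing and control for context and demographics, as Daniel notes below.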
If/when you capture data from 100,000s of people around the world, what more do you know, or need to know, about these people to make full, effective use of the data?
It is extremely helpful to have demographic information to accompany facial videos. We now know that there are significant differences between genders, age groups and cultures when it comes to facial behavior. We may find that other factors also play a role. Affluence, personality traits and education would all be interesting to study.
You’ll be speaking at SAS15 on emotional response across cultures. How close or far apart are emotions and the way they’re expressed in different cultures? Are there universal emotions and ways of expressing them?
There are fascinating differences between cultures in terms of how facial expressions are exhibited. Indeed there is a level of cross-cultural consistency in terms of how some states are expressed (e.g. disgust, surprise). However, on top of this there are complex culturally dependent “display rules” which augment these expressions in different ways. Some of these relationships fit with intuition, others are more surprising.
A variety of affect-measurement technologies have emerged at MIT and other research centers that include text and speech analysis. Are cultural analyses consistent across the various approaches?
Emotion research is a HUGE field and to a certain extent the “face” community has been separate from the “voice” and “text” communities in the past. However, we are now seeing much more focus on “multi-modal” research which considers many channels of information and models the relationships between them. This is extremely exciting as we are well aware that different channels contain different types of emotional information.
What are some scenarios where facial coding performs best? Are there problems or situations where facial coding just doesn’t work?
Facial coding is most effective when you have video of a subject and they are not moving around/looking away from the camera a lot. It is also very beneficial to have context (i.e. what is the subject looking at, what environment are they in, are they likely to be talking to other people, etc.). Interpreting facial coding data can be challenging if you don’t know that context. This is the case for almost all behavioral signals.
What business problems are people applying facial coding to?
All sorts of things. Examples include: media measurement (copy-testing ads, testing pilot TV shows, measuring cinema audience reactions), robotics, video conferencing, gaming, tracking car driver emotional states.
Could you discuss a scenario, say tracking car driver emotional states? Who might use this information and for what purpose? Say a system detected that a driver is angry. What then?
Frustration is a very common emotional state when driving. However, today’s cars cannot adapt to the driver’s state. There is the potential to greatly improve the driving experience by designing interfaces that can sensitively respond when the driver’s state changes.
In an open situation like that one, with many stimuli, how would the system determine the source and object of the anger?
Once again, context is king. We need other sensors to capture environmental information in order to ascertain what is happening. Emotion alone is not the answer. An integrated multi-modal approach is vital.
Can facial-coding results be improved via multimodal analysis or cross-modal validation? Have Affectiva and companies like it started moving toward multimodal analysis, or toward marrying data on sensed emotions with behavioral models, psychological or personality profiles, and the myriad other forms of data that are out there?
Yes, as mentioned above different channels are really important. Affectiva has mostly looked at the face and married this data with contextual information. However, I personally have done a lot of work with physiological data as well. I will also present some of those approaches at the workshop.
You’re principal scientist at Affectiva. What are you currently working on by way of new or refined algorithms or technologies? What will the state of the art be like in 5 years, on the measurement front and regarding the uses emotion analytics will be put to?
As there are so many applications that could benefit from being “emotion aware” I would expect almost all mobile and laptop/desktop computer operating systems to have some level of emotion sensing in 5 years. This will facilitate more large-scale research on emotions.
And finally, do you have any getting-started advice for someone who’d like to get into emotion analytics?
Don’t underestimate the importance of context. When analyzing emotion data it is essential to understand what is happening, since emotions and reactions are complex and vary between people.
Meet Dan at the July 15-16 Sentiment Analysis Symposium in New York. He’ll be speaking Thursday afternoon, July 16, in a segment that includes other not-only-text (NoText?) technologies — speech analytics, wearables, virtual assistants — proven but with huge market still in store. These talks follow another that’s really bleeding edge, a study of the semantics of emoji, “Emojineering @ Instagram,” presented by Instagram engineer Thomas Dimson. If you do attend the symposium, you can join us for either of the two days or both, and mix-and-match attendance at presentations and at longer-form technical workshops.