Text analytics is an enabling technology for deep social media understanding. We apply natural language processing (NLP) and data analysis and visualization techniques in an effort to make sense of the diversity of social postings. The social intelligence that results advances customer engagement and informs efforts to meet marketing, customer experience, product management, and reputation management needs.
I interviewed Pedro Cardoso of social intelligence leader Synthesio as part of preparation for December’s LT-Accelerate conference. Pedro will be speaking on language morphology (forms) in sentiment analysis. That’s a fairly technical topic, reflecting Pedro’s role as text analytics director at Synthesio, but one that will help business attendees understand the ins and outs of attitudes, opinions, and emotions in social and other text sources.
Pedro’s background: He earned an engineering degree in electronics and control systems and a master’s in speech processing. His career path started in Portugal, as a research engineer, followed by 4 years in Japan and 5 years in France. For the majority of this time, he worked on speech processing, mostly relying on machine learning for acoustic and language modeling. For the last 2 years, Pedro has been working on natural language processing at Synthesio in Paris.
Q1: The topic of this Q&A is social media analytics. What’s your personal SMA background and your current work role?
Pedro Cardoso> My background is in machine learning applied to language technology. I started in development of speech recognition systems, specifically language and acoustic statistical models. The focus was not on social media analysis (SMA), even if over the years I did some call-center development, including tests on sentiment analysis in voice. Over the last two and a half years, ever since I joined Synthesio, I have been working full-time on SMA.
Currently I am responsible for NLP and text analytics development at Synthesio. Our objective is to create algorithms that help process and analyse social data collected by Synthesio, so that it can be easily understood and exploited by our customers. This work includes data visualisation, document topic classification, and sentiment analysis.
Q2: What are key technical and business goals of the analyses you’re involved in?
Pedro Cardoso> Business drives technology, and customers’ needs drive business.
As mentioned above, our objective in the text analytics group is to find ways to structure and present information from social media sources in a simple way that customers can understand and get value from. Our focus is on text. We classify and summarize it with the goal of obtaining meaningful key performance indicators (KPIs) from large quantities of data, which would be impossible without technology.
We also develop methods for detecting key influencers and deriving demographic information. This allows our customers to focus their searches on particular groups of social media users.
Q3: And what particular analytics approaches or technologies do you favor, whether for text, network, geospatial, behavioral, or other analyses?
Pedro Cardoso> If we focus on my work, I favor text and also the study of network connections between online users. But if the question is what I believe to be the best technologies for SMA, that would have to be text also. Text is the medium; it is what customers use for communication. Network, geospatial, and other analytics are important, but mainly to focus our listening on a specific group. In the end, it is text, what SM users say, that counts.
Recently there has been interest in image analysis. People share more and more pictures. Sharing the picture of a brand logo or a product carries a strong brand loyalty message. Still, we need better image processing techniques, and we need to learn how best to use information from images, in particular how it combines with text in the case of comments.
Social media allows us to focus on particular customers and groups, and it allows us to have more personalized communications. In these cases, technologies such as demographic analysis and group detection gain favor, but discussing them further would take us off-topic.
Q4: To what extent do you get into sentiment and subjective information?
Pedro Cardoso> Automatic sentiment analysis is a large part of what I do as text analytics director. Our team is responsible for the development of automatic sentiment analysis at Synthesio, and has developed in-house the support for the 15 languages currently offered as part of the product.
Subjectivity is a very complicated subject, and one that I believe no one has managed to solve. To understand subjectivity, you need first to understand well the user and the context in which a message was written. After all, the real meaning is in the person’s mind. We are still not there, and it might take a long while to get there.
Q5: How do you recommend dealing with high-volume, high-velocity, diverse social postings — to ensure that analyses draw on the most complete and relevant data available and deliver the most accurate results possible?
Pedro Cardoso> We have developed data crawlers that ensure we can capture, enrich, and standardize data from different sources worldwide, whether they come from the largest social networks (Twitter, Facebook, Sina Weibo, VKontakte, etc.), mainstream media sources, or blogs and forums (thanks to a dedicated sourcing team of 5 people). This approach allows us to deal with several million social mentions each day and to provide for each of them a sentiment assessment, a global influence ranking (proprietary algorithm), and potential reach (another proprietary algorithm), on an ongoing basis and in near real time. It takes less than 2 minutes for a piece of data to be crawled, parsed, enriched, and pushed into client interfaces. Once the data is structured with both metadata and enrichments, our clients can access it in their dashboards. They can work on global data volumes for main KPI tracking and trend analysis, on focused subsamples for deeper human qualitative analysis, or both.
Q6: Could you provide an example (or two) that illustrates really well what Synthesio’s customers have been able to accomplish via SMA and that demonstrates strong ROI?
Pedro Cardoso> Sure. One of our clients in the automotive industry has, through deep analysis of first-customer feedback in European forums, identified key barriers when it comes to acquiring an electric car. Based on the lessons, they were able to create a far more efficient digital and social media campaign. ROI was there before the campaign, through reduced costs in both message crafting and media planning. ROI was there after the campaign, which drove far more traffic to the Web site, and to dealerships for test drives, than previous efforts.
Another example is a telco that uses Synthesio for both listening to and engaging directly with its clients on social networks, regarding client questions and complaints. By defining a precise listening scope and by clustering, combined with precise workflows for answer validation and publication, the client was able to measure ROI based on average answer time for any given question. By socializing answers to the most frequent topics, they also built up a customer-to-customer (C2C) advice platform, which allows top users to directly address other customers’ questions. ROI is also achieved via fewer inbound calls to the call center.
Q7: Do you have recommendations to share, regarding choice of data sources, metrics, analytical methods, and visualizations, in order to best align with desired business outcome?
Pedro Cardoso> At Synthesio we hold two key principles when it comes to social data and metrics.
- We believe social analytics and intelligence have to be global. We have sources covering more than 200 countries, networks crawled natively in more than 50 languages, etc.
- And they have to be simple. We built business-oriented metrics, comparable KPIs, and customizable interfaces to make sure that every single client within a company (from PR to marketing, from CRM to sales) can access the right data at the right moment.
Furthermore we know that social analytics can’t be envisaged as another data silo. That’s why we pay so much attention to openness and interconnections with other digital marketing tools (such as consumer review platforms like Bazaarvoice, owned communities platforms like Lithium, social marketing platforms like Spredfast, etc.), CRM (Salesforce.com, Microsoft Dynamics, etc.), or BI (IBM, etc.) tools used by our clients. Our open API helps them both to push data to such tools and to integrate data from other sources, for instance to get a 360° view of customer feedback.
The last recommendation we would like to share is “Don’t get too focused on data: Next step is people.” To better measure ROI, our clients have to go back to where it all began: Business is conducted by people and not by a data set. Being customer-centric for better targeting, better personalization of messages, and better understanding of the brand relationship is what guides all of our present and future developments. Even though our roadmap is our best kept secret, be prepared to see more demographic profiling, audience targeting tools, and sales-oriented measurement and anticipation metrics.
Q8: I’m glad you’ll be speaking at LT-Accelerate. Your topic, exploiting languages’ morphology for automatic sentiment analysis, is fairly technical, although we do have a range of presentations on the program. Would you please tell me briefly about your presentation: What will attendees learn?
Pedro Cardoso> The first thing we need to understand is the definition of morphology. The morphology of a word describes its structure: the root, part-of-speech, gender, conjugation, etc. And this is the first giveaway of the presentation.
Continuing, I will show how the use of morphological information of words helped us at Synthesio in building sentiment analysis, in particular for less represented languages, those that offer less labeled [training] data. Also, it is an important part of the system for agglutinative languages, whose vocabulary is theoretically close to infinite.
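[Editor’s illustration] To make the idea concrete, here is a minimal sketch of how morphological information can stretch a small sentiment lexicon. It is not Synthesio’s system: the four-stem lexicon, the affix tables, and the function names `analyze` and `polarity` are hypothetical, and the toy covers only a few English affixes. A production system would rely on a real morphological analyzer for each language.

```python
from typing import NamedTuple, Optional

# Hypothetical toy resources, for illustration only.
STEM_POLARITY = {"help": 1, "use": 1, "harm": -1, "pain": -1}
NEGATING_PREFIXES = ("un",)
# Suffix -> True if the suffix flips the polarity of the stem.
SUFFIXES = {"ful": False, "less": True}


class Analysis(NamedTuple):
    prefix: str
    stem: str
    suffix: str


def analyze(word: str) -> Optional[Analysis]:
    """Split a word into (prefix, stem, suffix) using the toy tables above."""
    word = word.lower()
    for prefix in ("", *NEGATING_PREFIXES):
        for suffix in ("", *SUFFIXES):
            if not (word.startswith(prefix) and word.endswith(suffix)):
                continue
            stem = word[len(prefix): len(word) - len(suffix)]
            if stem in STEM_POLARITY:
                return Analysis(prefix, stem, suffix)
    return None  # no known stem found


def polarity(word: str) -> int:
    """Return +1, -1, or 0 (unknown) from the stem polarity and its affixes."""
    analysis = analyze(word)
    if analysis is None:
        return 0
    score = STEM_POLARITY[analysis.stem]
    if analysis.prefix in NEGATING_PREFIXES:
        score = -score  # a negating prefix flips the stem polarity
    if SUFFIXES.get(analysis.suffix, False):
        score = -score  # a privative suffix such as "-less" flips it too
    return score


if __name__ == "__main__":
    for w in ["helpful", "unhelpful", "helpless", "harmless", "painful", "table"]:
        print(w, analyze(w), polarity(w))
```

The lexicon stores only stems, so forms that were never listed explicitly still receive a score once they are decomposed. That is the property that matters when labeled data is scarce or when a language builds a practically unbounded vocabulary by stacking affixes.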
That wraps up this interview. I’m looking forward to Pedro Cardoso’s LT-Accelerate presentation. If you’re intrigued by what you read here, please do visit the conference Web site to learn more. And I hope you’ll join us 4-5 December 2014 in Brussels.