I recently posted an article, Metadata, Connection, and the Big Data Story, covering the big-data analysis process as applied to “human data”: information communicated in intentionally expressive sources such as text, video, and social likes and shares, and in implicit expressions of sentiment.
The article is spun out from Q&A interviews with four industry figures: Fernando Lucini (HP Autonomy), Marie Wallace (IBM), Elliot Turner (AlchemyAPI), and Stephen Pulman (University of Oxford and TheySay). (TheySay sponsored my recent New York Sentiment Analysis Symposium.) Read each interview by clicking or tapping on the name of the interviewee.
This interview is with Stephen Pulman, and as a bonus, you’ll find, embedded at the foot of this article, a video of Prof. Pulman’s March 6, 2014 Sentiment Analysis Symposium presentation, Deep Learning for Natural Language Processing. First —
Analytics, Semantics & Sense: Q&A with Stephen Pulman
1) What’s the really interesting information content that we’re not really getting at yet, and what’s interesting about it?
Well, “interesting” is relative to organizations and individuals. For organizations that are listening to customers, I’d say that properties of the message (e.g. likely to be fake/humorous/sincere) and the author (simple things like gender and age, and maybe more sophisticated things like influence, ideology, etc.) are the things that we are not always getting. They tell us how we should treat the content of the message, as well as being interesting in themselves.
2) How well are we doing with Natural Language Processing, noting that formally, “processing” includes both understanding and generation, two parts of a conversation?
Not up to speed on generation, I’m afraid: it does not seem a very active research area at present.
For analysis, we are making steady progress on parsing, semantic role labeling, etc. for well-behaved text. Performance goes down pretty steeply for texts like tweets or other more casual forms of language use, unfortunately. Finding ways to customise existing parsers and related tools to new genres is an important research task.
3) And how well are we able to mine and automate understanding of affective states, of mood, emotion, attitudes, and intent, in the spectrum of sources available to us?
Again, reasonably well in well-behaved text. But a very difficult task is to pre-filter the texts so that only genuine expressions of these states are counted, otherwise any conclusions drawn (if you include texts like advertisements etc) will be misleading. And in some areas, even recognizing the entities you are interested in can be challenging: one of my students, for example, was interested in what people were saying about horse racing, but we found it almost impossible to harvest relevant data because of the wide range of names for horses: e.g. “Ask the wife,” “Degenerate,” and even “Say.” And in tweets there’s often no capitalization to help. I think this kind of data cleaning or data pre-processing will become more important as the proportion of robot-generated text out there increases.
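Prof. Pulman’s point about capitalization is easy to see in a toy sketch (hypothetical code, not anything from TheySay or Oxford): a baseline entity spotter that keys on capitalized tokens finds nothing in an all-lowercase tweet, while a simple case-insensitive gazetteer lookup against a list of known horse names still can.

```python
import re

def naive_capitalized_entities(text):
    """A common baseline heuristic: treat runs of capitalized words
    as candidate entity names. Fails completely on lowercase tweets."""
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", text)

def gazetteer_entities(text, gazetteer):
    """Case-insensitive substring lookup against a known-name list.
    (Real systems need word-boundary checks: 'Say' would also match
    inside 'says', which is exactly the ambiguity described above.)"""
    lowered = text.lower()
    return [name for name in gazetteer if name.lower() in lowered]

# Horse names from the interview; a hypothetical tweet for illustration.
horse_names = ["Ask the wife", "Degenerate", "Say"]
tweet = "big win for ask the wife at the 2:30 today"

print(naive_capitalized_entities(tweet))        # prints []
print(gazetteer_entities(tweet, horse_names))   # prints ['Ask the wife']
```

The gazetteer approach trades one problem for another, of course: names like “Say” match ordinary words, so some disambiguation from context is still needed.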
4) Deep learning, active learning, or maybe some form of machine learning that’s being cooked up in a research lab: What business benefits are delivered by these technologies, and what are the limits to their usefulness, technical or other?
All forms of machine learning should deliver benefits in rapid adaptation to new languages and domains. But there is usually a long way to get from a neat research finding to an improved or novel product, and the things that researchers value (getting an extra 1% accuracy on a benchmark, for example) are often less important than speed, robustness, and scalability in a practical setting. But for a practical demonstration of the utility of deep learning we have Google’s speech recognition.
5) Mobile’s growth is only accelerating, complicating the data picture, accompanied by a desire for faster, more accurate, and more useful delivery of situational insights. How is your company keeping up?
Wearing my TheySay hat, we are about to release a service, MoodRaker, which will offer real-time analysis of a large choice of different text streams, along a variety of dimensions, configurable from any (sensible) browser.
6) Where does the greatest opportunity reside, for your company as a solution provider? Internationalization? Algorithms, visualization, or other technical advances? In data integration and synthesis and expansion to new data sources? In monetizing data, that is, yourselves, or via partners, or assisting your customers? In untapped business domains or in greater uptake in the domains you already serve?
I wish I knew! One thing we have learned at TheySay is that a combination of text analysis like sentiment along with other, often numerical, data gives insights that you would not get from either in isolation, particularly in the financial services or brand management domains. Finding the right partners with relevant domain expertise is key to unlocking this potential.
7) Anything to add, regarding the 2014 outlook for analytical and semantic and sensemaking technologies?
No predictions, but I’m looking forward to seeing what happens…
Thank you to Stephen!
Click on the links that follow to read other Analytics, Semantics & Sense Q&A responses: Fernando Lucini, Marie Wallace, and Elliot Turner. And click here for the article I spun out from them, Metadata, Connection, and the Big Data Story.