Text Analytics 2014: Tom Anderson, Anderson Analytics and OdinText

I post a yearly look at the Text Analytics industry — technologies and market developments — from the provider perspective. This year’s is Text Analytics 2014.

To gather background material for the article, and for my forthcoming report Text Analytics 2014: User Perspectives on Solutions and Providers (which should be out by late May), I interviewed a number of industry figures: Lexalytics CEO Jeff Catlin, Clarabridge CEO Sid Banerjee, Fiona McNeill of SAS, Daedalus co-founder José Carlos González, and Tom Anderson of Anderson Analytics and OdinText. (The links behind the names will take you to the individual Q&A articles.) This article is –

Text Analytics 2014: Q&A with Tom Anderson, Anderson Analytics and OdinText

Tom H.C. Anderson

Tom H.C. Anderson, founder of Anderson Analytics and OdinText

1) How has the market for text technologies, and text-analytics-reliant solutions, changed in the past year? Any surprises?

Customers are starting to become a little more savvy than before, which is something we really welcome. One of two things used to happen: either we had to explain what text analytics was and what its value was, or we had to deal with a representative from purchasing who represented various departments, all with different unrealistic and unnecessary expectations on their wish lists. The latter especially is a recipe for disaster when selecting a text analytics vendor.

Now we are more often talking directly to a manager who has used one of our competitors, knows what they like and don’t like, has very real needs, and wants to see a demo of how our software works. This more transparent approach is a win-win for both us and our clients.

2) Do you have a 2013 user story, from a customer, that really illustrates what text analytics is all about?

I have several great ones, but perhaps my favorite this year was how Shell Oil/Jiffy Lube used OdinText to leverage data from three different databases and discover exactly how to drive profits higher: http://adage.com/article/dataworks/jiffy-lube-net-promoter-score-goose-sales/243046/

3) How have perceptions and requirements surrounding sentiment analysis evolved? Where are sentiment capabilities heading, in your view?

OdinText handles sentiment quite a bit differently than other companies do. Without getting into detail, I will say that I’m pleased to see one good thing happen in the discourse around sentiment. Specifically, vendors have stopped making sentiment accuracy claims, as they seem to have figured out what we have known for quite some time: accuracy is unique to the data source.

Therefore the claims you used to hear, like “our software is 98% accurate,” have stopped. This is refreshing. Now you are likely to hear accuracy claims only from academia, since academics usually have very limited access to data and are less likely to work with custom data.

Equally important, alongside the industry’s realization that sentiment accuracy claims don’t make sense, is that clients have started to realize that comparing human coding to text analytics is apples to oranges. Humans are not accurate, and they are very limited. Text analytics is better, but also very different!

4) What new features or capabilities are top of your customers’ and prospects’ wish lists for 2014? And what new abilities or solutions can we expect to see from your company in the coming year?

We’ve been adding several powerful new features. What we’re struggling with is adding more functionality without making the user interface more difficult. We’ll be rolling some of these out in early 2014.

5) Mobile’s growth is only accelerating, complicating the data picture, accompanied by a desire for faster, more accurate, and more useful, situational insights delivery. How are you keeping up?

I know “real-time analytics” is almost as popular a buzzword as “big data,” but OdinText is meant for strategic and actionable insights. I joke with my clients when discussing various “real-time reporting” issues that (other than ad-hoc analysis, of course) “if you are providing standard reports any more often than quarterly, or at most monthly, then no one is going to take what you do very seriously.” I may say it as a joke, but of course it’s very true. Real-time analytics is an oxymoron.

6) Where does the greatest opportunity reside, for you as a solution provider? Internationalization? Algorithms, visualization, or other technical advances? In data integration and synthesis and expansion to new data sources? In providing the means for your customers to monetize data, or in monetizing data yourselves? In untapped business domains or in greater uptake in the domains you already serve?

I do think there’s a lot more opportunity in monetizing data; it is one of the things we are looking at.

7) Do you have anything to add, regarding the 2014 outlook for text analytics and your company?

The end of 2013 was better than expected, so we’re very excited about 2014.

Thank you to Tom! Click on the links that follow to read other Text Analytics 2014 Q&A responses: Lexalytics CEO Jeff Catlin, Clarabridge CEO Sid Banerjee, Fiona McNeill of SAS, and Daedalus co-founder José Carlos González. And click here for this year’s industry summary, Text Analytics 2014.

Text Analytics 2014: Sid Banerjee, Clarabridge

I post a yearly look at the Text Analytics industry — technologies and market developments — from the provider perspective. This year’s is Text Analytics 2014.

To gather background material for the article, and for my forthcoming report Text Analytics 2014: User Perspectives on Solutions and Providers (which should be out by late May), I interviewed a number of industry figures: Lexalytics CEO Jeff Catlin, Clarabridge CEO Sid Banerjee, Fiona McNeill of SAS, Daedalus co-founder José Carlos González, and Tom Anderson of Anderson Analytics and OdinText. (The links behind the names will take you to the individual Q&A articles.) This article is –

Text Analytics 2014: Q&A with Sid Banerjee, Clarabridge

1) How has the market for text technologies, and text-analytics-reliant solutions, changed in the past year? Any surprises?

The market has seen a lot more competition by way of historically non-text-analytics vendors adding various forms of text analytics solutions to their product mix. In 2013, several survey companies added text and sentiment analytics capability. Workforce management vendors highlighted their customer experience analytics capabilities, many powered by text (and speech) analytics that were, depending on the vendor, home grown or licensed from pure-play text companies. And even social CRM and social marketing vendors, whose primary focus until this year was social communication and marketing automation processes, started adding sentiment mining and text analytics capabilities into their product mix. As a result, the market got a bit more confusing from a vendor-selection perspective. Firms like Clarabridge have continued to tout “multichannel” customer experience intelligence and text/sentiment capabilities, partly because it’s always been our focus, but also to differentiate from the new crop of mostly point-solution providers of text analytics. This trend, with point solutions focused on single-source analytics and departmental users while enterprise providers focus more on multichannel analytics and enterprise deployments, is likely to continue in 2014.

2) Do you have a 2013 user story, from a customer, that really illustrates what text analytics is all about?

A few. An airline customer merged with a major airline and over the course of 2013 used text analytics (on surveys, social, and customer call centers) to ensure critical customer feedback was incorporated into the inevitable changes that occur when companies come together. Feedback was used to figure out how to manage switching a customer base to a new brand of coffee with minimum disruption. Feedback was used to identify which of the two airlines’ boarding processes was more acceptable to the other airline’s passengers. And branding, support, frequent flyer programs, and many other processes were monitored and modified as needed to ensure customer needs and wants were met.

My favorite story comes from a leading wireless vendor, who used Clarabridge during Hurricane Sandy. (While Sandy occurred in 2012, I learned about the story in early 2013.) The carrier suffered extensive network outages along the Jersey Shore, and of course the outages, along with the displacement and devastation their customers suffered, affected entire communities. As the carrier was tracking outages, customer feedback, and general recovery efforts after the hurricane, they caught wind, via social and other channels, of a trending topic: customers wondering if they were going to be charged for the days and weeks their service was out. The company realized that, left unaddressed, the issue would likely produce a growing chorus of credit requests flooding their call centers from unhappy and inconvenienced customers. After consulting with the business owners across the region, the carrier decided to proactively notify all affected customers that if they were residents of areas incurring outages, their charges would be suspended while reconstruction work was going on. The positive impact on customer satisfaction was immediate, and the company averted a frustrating and distracting impact on its customer relationships.

Both stories highlight a consistent theme about the value of text analytics in the context of customer experience. If you listen to your customers, you can find problems, you can fix things more proactively, you can avert cost and inconvenience for both you and your customers, and you can create a more loyal and lasting relationship between you and your customers. That’s really what it’s all about.

3) What new features or capabilities are top of your customers’ and prospects’ wish lists for 2014?  And what new abilities or solutions can we expect to see from your company in the coming year?

At a high level – expect to see the following:

More Big Data: we will support ever higher data volumes.

More Big Data Analytics: we will support more use of analytics to separate actionable data from non-actionable data, to identify trends and insights, to make recommendations and predictions, and to suggest interactions and conversations between companies and customers.

More Uses: In the past, our customers have generally looked to Clarabridge for insights and analytics, powered by text and sentiment analytics. They will continue to see business and application value in these areas, but our products evolved in 2013 and will continue to evolve in 2014 to include more mobile deployability, more collaboration and alerting capability, and more capability to recommend and enable customer engagement. Our solutions will increasingly be designed to support the specific usability and functionality requirements of key practitioners of customer experience analysis, customer engagement, and customer support.

4) Mobile’s growth is only accelerating, complicating the data picture, accompanied by a desire for faster, more accurate, and more useful, situational insights delivery. How are you keeping up?

We launched Clarabridge GO in fall 2013. With this application, Android, iPhone, and iPad users can run reports, get alerts, view dashboards, collaborate with fellow employees, and directly engage with end customers, all from their mobile devices. The application brings together social, survey, and feedback content into a single mobile portal, alerting, and engagement framework. We are seeing more and more of our customers looking for mobile access to insights and for a platform to engage and respond. Clarabridge GO is designed to package the Clarabridge capability for the mobile user.

5) Where does the greatest opportunity reside, for you as a solution provider? Internationalization? Algorithms, visualization, or other technical advances? In data integration and synthesis and expansion to new data sources? In providing the means for your customers to monetize data, or in monetizing data yourselves? In untapped business domains or in greater uptake in the domains you already serve?

More markets/languages – we will continue to internationalize our product for more markets, languages, and use cases.

Algorithms – we are continuing to invest in making our algorithms more scalable (to handle greater volumes), more “intelligent” (providing recommendation- and action-based findings, not just insights), more accurate and useful (separating useful data from noise, ensuring the most accurate mappings and tagging as we process text and unstructured content), and more linked (connecting more data points from more disparate sources into integrated, linked insights across more and more customer interaction touch points).

Extending the core platform to more “uses” for more “users” – we have lots of plans here and will be announcing more in 2014.

More content types. We intend to follow customer expression in whatever form factor it takes. Customers increasingly are mixing media: structured, semi-structured, and unstructured. We will continue to look for ways to follow customer conversations across media, and to apply intelligence to structure, deduce, provide insights, and help make recommendations.

6) Do you have anything to add, regarding the 2014 outlook for text analytics and your company?

More partnerships – 2014 is the year I expect to see major productive partnerships developing between technology, service, and support partners. Companies are making major investments in institutionalizing customer experience across the enterprise, powered by customer insights extracted from unstructured customer interaction and feedback data. To support those investments, expect to see the big systems integrators, business process outsourcers (BPOs), marketing services companies, and technology vendors working more and more closely together in common cause: helping customers realize value from customer experience insights. Making better products. Creating positive customer experiences. Running more relevant and successful marketing campaigns. And managing customer relationships more smartly. All aided by intelligent customer experience technologies like Clarabridge.

Thank you to Sid! Click on the links that follow to read other Text Analytics 2014 Q&A responses: Lexalytics CEO Jeff Catlin, Fiona McNeill of SAS, Daedalus co-founder José Carlos González, and Tom Anderson of Anderson Analytics and OdinText. And click here for this year’s industry summary, Text Analytics 2014.

Text Analytics 2014: Fiona McNeill, SAS

I post a yearly look at the Text Analytics industry — technologies and market developments — from the provider perspective. This year’s is Text Analytics 2014.

To gather background material for the article, and for my forthcoming report Text Analytics 2014: User Perspectives on Solutions and Providers (which should be out by late May), I interviewed a number of industry figures: Lexalytics CEO Jeff Catlin, Clarabridge CEO Sid Banerjee, Fiona McNeill of SAS, Daedalus co-founder José Carlos González, and Tom Anderson of Anderson Analytics and OdinText. (The links behind the names will take you to the individual Q&A articles.) This article is –

Text Analytics 2014: Q&A with Fiona McNeill, SAS

Fiona McNeill is Global Product Marketing Manager at SAS and co-author of Heuristics in Analytics: A Practical Perspective of What Influences Our Analytical World. The following are her December 2013 Q&A responses:

1) How has the market for text technologies, and text-analytics-reliant solutions, changed in the past year? Any surprises?

Text analytics is now much more commonly recognized as mainstream analysis, seen to improve business decisions and insights and to help drive more efficient operations. Historically, those of us in this field spent time gaining mindshare for the idea that text should be analyzed at all (beyond analysis of sentiment, mind you). Over the past year, the conversation has shifted to best-practice methods of describing the ROI from text analytics to upper management. This demonstrates common recognition within organizations that there is value in doing text analysis in the first place; the focus is now on how best to frame that value for senior stakeholders.

The ease of analyzing big text data (hundreds of millions or billions of documents) has also improved over the past year, including extensions of high-performance text mining (from SAS) to new distributed architectures, like Hadoop and Cloudera.  Such big content technologies will continue to expand and we can expect functionality to extend to more interactive and visual text analytics capabilities over the coming year.

2) Do you have a 2013 user story, from a customer, that really illustrates what text analytics is all about?

We can speak to customer applications that illustrate what text analytics is all about, though unfortunately without mentioning names. One is a retail client that recognized text data as a rich source for addressing a wide range of initial business challenges: real-time digital marketing, bricks-and-mortar risk monitoring, automatic detection of issues and sentiment in customer inquiries, internal problem identification from on-line help forums, improving web purchases with more relevant content, improving predictive model scores for job candidate suitability, and more. This SAS customer understood that text data is everywhere, which means that analysis of text data will help them better answer whatever business question they have.

Another customer is a manufacturer who strategically understands the power of text analytics and how it improves collaboration, communication, and productivity within an organization. As such, they wanted an extensible platform to address all types of text documents. They also had a wide range of written languages that they needed to integrate into existing search and discovery methods, in order to provide more accurate and more relevant information across their entire business. This SAS customer understood the innovation that can come when resources are freed from searching and are empowered to find the answers they need, when they need them, creating an organization with “The Power to Know.”

We have a European customer announcement [that came] out in February, focused on leveraging WiFi browsing behavior and visitor profiles to create prescriptive advertising promotions in real-time to in-store shoppers. This is big data, real-time, opportunistic marketing – driven by text insights and automated operational decision advertising execution. In other words, putting big text insights to work – before the data is out of date.

3) How have perceptions and requirements surrounding sentiment analysis evolved? Where are sentiment capabilities heading, in your view?

It is no longer necessary to explain why sentiment analysis is important; it’s been largely accepted that customer, prospect, and public perception of an organization is useful for understanding product and brand reputation. Historically, there was a focus on how well these models worked. It’s gradually being understood that there are tradeoffs between precision and recall associated with sentiment scores, at least in some domains. Acceptance, it appears (as with any new modeling technique), has occurred within the bounds of applicability: adding previously unknown insight to the context of comments, reviews, social posts, and the like. To that end, and when a generalized methodology is used, as is the case at SAS, the sentiment polarity algorithm is evolving to examine an even broader set of scenarios: employee satisfaction, author expertise, the mood of an individual, and so forth. Sentiment appears to be headed to the structured-data analysis realm, becoming a calculated field that is used in other analyses, like predictions, forecasts, and interactive visual interrogation. And as such, identifying the ROI of sentiment analysis efforts is expected to become easier.
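To make the “calculated field” idea concrete, here is a minimal Python sketch (my own illustration, not any SAS product) of deriving a sentiment polarity score from free text and attaching it to a structured record, where it can feed downstream predictions or visualizations. The tiny lexicon and the record fields are invented for the example.

```python
# Toy polarity lexicon; production systems use far richer linguistic models.
LEXICON = {"fantastic": 1.0, "good": 0.5, "slow": -0.5, "terrible": -1.0}

def polarity(text):
    """Average lexicon score over matched words; 0.0 if nothing matches."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

# A structured record gains a calculated sentiment field, usable as a
# predictor alongside conventional numeric and categorical variables.
record = {"customer_id": 42,
          "comment": "Support was slow but the product is fantastic"}
record["sentiment"] = polarity(record["comment"])
print(record)  # ... 'sentiment': 0.25
```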

4) What new features or capabilities are top of your customers’ and prospects’ wish lists for 2014? And what new abilities or solutions can we expect to see from your company in the coming year?

At SAS, all software development is driven by our customers’ needs – and so the products you see coming from SAS are based on what customers told us they require to solve business challenges and take advantage of market opportunities. For text analytics, our customers continue to want more interactive text visualizations – to make it even easier to explore data, both to derive analysis questions and to understand the insights from text results. They want easier methods to develop and deploy text models. Our customers also want more automation to simplify the more arduous text-related tasks, like taxonomy development. They want to easily access the text, understand it and the sentiment expressed in it, extract facts, and define semantic relationships – all in one easy-to-use environment. They don’t want to learn a programming language, spend time and resources integrating different technologies, or use multiple software packages. We’ve responded to this with the introduction of SAS Contextual Analysis, which will, by mid-year 2014, expand to provide an extremely comprehensive, easy-to-use, and highly visual environment for interactively examining and analyzing text data. It leverages the power of machine learning and combines it with end-user subject matter expertise.

We will also continue to extend technologies and methods for examining big text data – continuing to take advantage of multi-core processing and distributed memory architectures to address even the most complex operational challenges and decisions that our customers have. We have seen the power of analyzing big data with real-time data-driven operations and will continue to extend platforms, analytic methods, and deployment strategies for our customers. In October 2013, we announced our strategic partnership with SAP – to bring SAS in-memory analytics to the SAP HANA platform. You’ll see our joint market solutions announced over the coming year.

5) Mobile’s growth is only accelerating, complicating the data picture, accompanied by a desire for faster, more accurate, and more useful, situational insights delivery. How are you keeping up?

With a single platform for all SAS capabilities, we have the ability to interchange a wide range of technologies, which can easily be brought together to solve even the most complex analytic business challenges, for mobile or other types of real-time insight delivery. SAS offers a number of real-time deployment options, including SAS Decision Manager (for deploying analytically sound operational rules), SAS Event Stream Processing Engine (for analytic processing within event streams), SAS Scoring Accelerator for Hadoop (as well as other big data stores – for real-time model deployment), and real-time environments for analyzing and reporting data that operate on mobile devices, such as SAS Visual Analytics. SAS also has native read/write engines and support for web services, and as mentioned above, we have recently announced a strategic partnership with SAP for joint technology offerings that bring the power of analytics to the SAP HANA platform.

We are constantly extending such capabilities, recognizing that information processing is bigger, faster and more dependent on well-designed analytic insight (including that from text data) than ever before. This growing need will only continue.

6) Where does the greatest opportunity reside, for you as a solution provider? Internationalization? Algorithms, visualization, or other technical advances? In data integration and synthesis and expansion to new data sources? In providing the means for your customers to monetize data, or in monetizing data yourselves? In untapped business domains or in greater uptake in the domains you already serve?

Given our extensive portfolio of solutions, SAS continues to invest in the technology advances our customers tell us they want in order to address the growing complexities of their business. This includes ongoing advances in algorithms, deployment mechanisms, data access, processing routines, and other technical considerations. We continue to expand our extensive native language support, with over 30 languages and dialects already available in our text analytics products; additional languages will be added as customer needs dictate. And while we already offer solutions to virtually every industry, we continue to further develop these products to provide leading-edge capabilities for big data, high-performance, real-time, analytically driven results for our customers. You’ll see us moving more and more of our capabilities to cloud architectures. For SAS, another great opportunity is the production deployment of analytics to automate, streamline, and advance the activities of our customers. You’ll continue to see announcements from us over the coming year.

7) Do you have anything to add, regarding the 2014 outlook for text analytics and your company?

At SAS, text data is recognized as a rich source of insight that can improve data quality, accessibility, and decision-making. As such, you’ll see text-based processing capabilities in products outside the pure-play text analytics technologies. And because of the common infrastructure SAS has designed, all of these capabilities are readily integrated and can be used to address a specific business scenario. We will continue to extend text-based processing and insights into traditional predictive analysis, forecasting, and optimization – as well as into new solutions that include text analysis methods, and updates to existing products, like SAS Visual Analytics and our upcoming release of a new in-memory product for Hadoop (release announcement pending). From a foundational perspective, text-based processing continues to be extended throughout our platform, with pending linguistic rules augmenting business and predictive scoring in real-time data streams, extensions to analytically derived metadata from text, and more. And given the nature and recognition of text and what it can bring to improved insights, you’ll also see our industry solutions continue to extend the use of text-based knowledge.

Thank you to Fiona! Click on the links that follow to read other Text Analytics 2014 Q&A responses: Lexalytics CEO Jeff Catlin, Clarabridge CEO Sid Banerjee, Daedalus co-founder José Carlos González, and Tom Anderson of Anderson Analytics and OdinText. And click here for this year’s industry summary, Text Analytics 2014.

Text Analytics 2014: José Carlos González, Daedalus

I post a yearly look at the Text Analytics industry — technologies and market developments — from the provider perspective. This year’s is Text Analytics 2014.

To gather background material for the article, and for my forthcoming report Text Analytics 2014: User Perspectives on Solutions and Providers (which should be out by late May), I interviewed a number of industry figures: Lexalytics CEO Jeff Catlin, Clarabridge CEO Sid Banerjee, Fiona McNeill of SAS, Daedalus co-founder José Carlos González, and Tom Anderson of Anderson Analytics and OdinText. (The links behind the names will take you to the individual Q&A articles.) This article is –

Text Analytics 2014: Q&A with José Carlos González, Daedalus

1) How has the market for text technologies, and text-analytics-reliant solutions, changed in the past year? Any surprises?

Over the past year, there has been a lot of buzz around text analytics. We have seen a sharp increase of interest around the topic, along with some caution and distrust by people from markets where a combination of sampling and manual processing has been the rule until now.

We have perceived two (not surprising) phenomena:

- The blooming of companies addressing specific vertical markets by incorporating basic text processing capabilities. Most of the time, text analytics functionality is achieved through integration of general-purpose open source software, simple pattern matching, or purely statistical solutions. Such solutions can be built rapidly from large resources (corpora) available for free, which has lowered entry barriers for newcomers, at the cost of poor adaptation to the task and low accuracy.

- Providers have strengthened their efforts to create or educate markets. For instance, non-negligible investments have been made to make the technology easily integrable and demonstrable. However, the accuracy of text analytics tools depends to some extent on the typology of the text (language, genre, source) and on the purpose and interpretation of the client. General-purpose and do-it-yourself approaches may disappoint user expectations due to wrong parameterization or to goals outside the scope of particular tools.

2) Do you have a 2013 user story, from a customer, that really illustrates what text analytics is all about?

One of our most challenging projects in 2013 involved real-time analysis of social media content for a second-screen application, where text analytics has had multiple roles. First, providing capabilities to focus on aspects of the social conversation about TV programs (actors, characters, anchors, sponsors, brands, etc.), analyzing at the same time the sentiment expressed about them.

Second, recommending and delivering content extracted automatically from external sources. These sources can be general (Wikipedia), specialized (TV magazines and websites), personal or organizational (web, Facebook or LinkedIn pages of people or organizations), and popular content shared by users in social media.

Third, providing clues for contextual and intent-based advertising. Fourth, profiling user interests and (inferred) demographic features for targeted and personalized advertising (beyond contextual). The project, which we carried out in full for a telecom company owning a DTTV license with multiple TV channels, involved real-time text analytics in a big data environment, plus all the visualization and user interface design.

This use case demonstrates the versatility of text analytics in fulfilling multiple roles within a single project.

3) What new features or capabilities are top of your customers’ and prospects’ wish lists for 2014? And what new abilities or solutions can we expect to see from your company in the coming year?

Our goal for 2014 is to progressively cover the specific needs of our clients by helping them develop solutions in vertical markets, freeing them from the complexity of language processing technology. This involves developing our API offering in Textalytics (http://textalytics.com), our Meaning as a Service product.

The first new API to be delivered in 2014 is specialized for semantic publishing. A second one, planned on our roadmap, will be for Voice of Customer analysis.

As personalization is also a need perceived across markets, we are also integrating a user management API, allowing our clients to edit and manage by themselves specialized dictionaries, classification taxonomies and models.

4) Mobile’s growth is only accelerating, complicating the data picture, accompanied by a desire for faster, more accurate, and more useful, situational insights delivery. How are you keeping up?

As explained above, one of our main working topics in 2013 was real-time analysis of social media streams for different purposes, integrated in smartphone and tablet apps. In particular, we developed a second-screen app for the TV industry. We perceive that mobile apps will continue acting as a major force, creating new scenarios and driving further opportunities for text analytics technologies.

5) Where does the greatest opportunity reside, for you as a solution provider? Internationalization? Algorithms, visualization, or other technical advances? In data integration and synthesis and expansion to new data sources? In providing the means for your customers to monetize data, or in monetizing data yourselves? In untapped business domains or in greater uptake in the domains you already serve?

Having experience in many different industries (media, telecom, defense, banking, market intelligence, online marketing, etc.) and in many different languages, our greatest challenge is internationalization. The Textalytics brand implements our strategy for a global, multi-industry offering. Textalytics represents a new semantic/NLP API concept in the sense that it goes well beyond the basic horizontal functionality being offered in the market: we also offer pre-packaged, optimized functionality for several industries and applications, plus the possibility for customers to tailor the system with their own dictionaries and models. The customer benefit is much higher productivity, with low risk and cost.

6) Do you have anything to add, regarding the 2014 outlook for text analytics and your company?

In 2013 we developed a good number of paid pilots and proof-of-concept prototypes for clients in different areas. Clients are starting to understand the real potential, limitations, and place of text analytics technologies. In 2014, clients will be able to see the value already deployed in solutions addressing scenarios very similar to their own, which will foster more rapid adoption of text analytics solutions.

Thank you to José! Click on the links that follow to read other Text Analytics 2014 Q&A responses: Lexalytics CEO Jeff Catlin, Clarabridge CEO Sid Banerjee, Fiona McNeill of SAS, and Tom Anderson of Anderson Analytics and OdinText. And click here for this year’s industry summary, Text Analytics 2014.

Text Analytics 2014: Jeff Catlin, Lexalytics

I post a yearly look at the Text Analytics industry — technologies and market developments — from the provider perspective. This year’s is Text Analytics 2014.

To gather background material for the article, and for my forthcoming report Text Analytics 2014: User Perspectives on Solutions and Providers (which should be out by late May), I interviewed a number of industry figures: Lexalytics CEO Jeff Catlin, Clarabridge CEO Sid Banerjee, Fiona McNeill of SAS, Daedalus co-founder José Carlos González, and Tom Anderson of Anderson Analytics and OdinText. (The links behind the names will take you to the individual Q&A articles.) This article is –

Text Analytics 2014: Q&A with Jeff Catlin, Lexalytics

1) How has the market for text technologies, and text-analytics-reliant solutions, changed in the past year? Any surprises?

I just did a blog post on the state of the industry. The basic position is that the market/industry looks nothing like it did at the beginning of 2013. Most of the traditional players are gone or are focusing their businesses vertically. The entire post can be seen here: http://www.lexalytics.com/lexablog

2) How have perceptions and requirements surrounding sentiment analysis evolved? Where are sentiment capabilities heading, in your view?

This is a very interesting and important question… I believe sentiment should be heading to a simpler place, where broad applicability would ease its adoption. Lexalytics is pushing hard on this idea with the concept of “Juiciness.” If you ask people to explain sentiment, they really struggle; but if you ask them to tell you what a juicy story is, you get surprisingly similar answers. We believe this is where sentiment should go because its business value is mostly at the edges (very positive and very negative), and that’s what Juiciness captures. Others are going in totally the other direction and pouring emotional states into the mix, which is both difficult to do and even harder to get people to agree on. That is clearly where many people think sentiment is heading, but we don’t see how broad adoption can happen with such a complex idea.
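Lexalytics hasn’t published a Juiciness formula, but the edges-of-the-scale idea is easy to illustrate. Here is a minimal Python sketch, entirely my own invention, that flags content as juicy when its sentiment polarity sits far from neutral in either direction; the threshold value is arbitrary.

```python
def is_juicy(polarity, threshold=0.7):
    """Flag content whose sentiment sits at the extremes of a -1..+1 scale.

    The 0.7 cutoff is an arbitrary illustration, not a Lexalytics value.
    """
    return abs(polarity) >= threshold

for score in (0.9, -0.85, 0.1):
    print(score, "->", "juicy" if is_juicy(score) else "not juicy")
```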

3) What new features or capabilities are top of your customers’ and prospects’ wish lists for 2014? And what new abilities or solutions can we expect to see from your company in the coming year?

The push is on two fronts: one is simply to keep improving the basic features and functions, like better sentiment scoring and better ad-hoc categorization, while the other is around a couple of new features, document morphology and user intention analysis. Morphology simply means understanding the type and structure of a document. This is very important if you’re processing big PDF or Word docs, where knowing that a table is in fact a table, or that something is a section heading, matters. Intention analysis, on the other hand, is the ability to profile the author of a piece of text to predict that they might be planning a purchase (intent to buy), or to sell, or, in another domain, that there might be an intention to harm.

As a company, Lexalytics is tackling both the basic improvements and the new features with a major new release, Salience 6.0, which will be landing sometime in the second half of the year. The core text processing and grammatical parsing of content will improve significantly, which will in turn enhance all of the engine’s core features. Additionally, this improved grammatical understanding will be the key to detecting intention, which is the big new feature in Salience 6.0.

4) Mobile’s growth is only accelerating, complicating the data picture, accompanied by a desire for faster, more accurate, and more useful, situational insights delivery. How are you keeping up?

You can look at mobile from two different angles. One, it’s simply a delivery mechanism for results; two, it’s a generator of more user insights (how are people interacting with their applications?). The first aspect is the most interesting to us. As a delivery mechanism, mobile poses unique challenges for packing as much as possible into small, responsive real estate. This is where techniques like summarization and extraction of meaning make for some really interesting applications.

You use an interesting phrase, “situational insights delivery,” which I would take to mean “what’s relevant to me right now.” Let’s take the simplest, most common application that is still highly relevant to “situations”: email. Wouldn’t it be nice to be able to scan and see, across all those emails that you don’t really want to have to parse on your iPhone, just what exactly you need to do? In spring 2013, we used some of our rich contextual grammar language to ship “imperative sentence” (aka “action item”) extraction. If it’s “blah, blah, blah, please send me the powerpoint from yesterday,” we’ll strip out the blah and give you the “please send me the powerpoint from yesterday.” This technology is in use with companies providing customer support services and applications, to help their reps get to the meat of what they need to do – right now.
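Lexalytics does this with its own grammar rules, which aren’t public. As a rough stand-in, here is a minimal Python sketch that uses spaCy’s parser to flag likely imperative sentences, treating a clause as imperative when its root is a base-form verb with no explicit subject; the model name and heuristic are my assumptions, not the Salience implementation.

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def action_items(text):
    """Return sentences that look like imperatives (rough heuristic)."""
    items = []
    for sent in nlp(text).sents:
        root = sent.root
        # Imperative clue: root verb in base form (VB) with no explicit subject.
        has_subject = any(t.dep_ in ("nsubj", "nsubjpass") for t in root.children)
        if root.tag_ == "VB" and not has_subject:
            items.append(sent.text.strip())
    return items

email = ("Thanks for joining the call yesterday. "
         "Please send me the powerpoint from yesterday. "
         "We can sync again next week.")
print(action_items(email))  # ['Please send me the powerpoint from yesterday.']
```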

This same thinking applies to any number of other application sets. One way of looking at the whole science of text analytics is as extracting the essence of meaning and compacting it into the smallest possible representation of that information. “Who are the people involved?” “Is this a product that I care about?” “Whoa, that’s really bad, I should get right on that.” And, with small screens, data that has been compressed into units with high informational value is more useful than the original text stream.

5) Where does the greatest opportunity reside, for you as a solution provider? Internationalization? Algorithms, visualization, or other technical advances? In data integration and synthesis and expansion to new data sources? In providing the means for your customers to monetize data, or in monetizing data yourselves? In untapped business domains or in greater uptake in the domains you already serve?

As described in question 1, the industry is changing dramatically, and all vendors will have to change with it. We are progressing on two fronts: continuing to push to improve the core technology (algorithms, internationalization, and big data) while looking for new vertical market opportunities. Our specific focus has been on mobile content applications and the backend framework to enable their rapid development and deployment.

6) Do you have anything to add, regarding the 2014 outlook for text analytics and your company?

The walls are tumbling down. 2013 and 2014 are the enterprise data equivalent of the fall of the Berlin Wall, where data that was jealously guarded by individual groups is now available enterprise-wide. What this means, paradoxically, is that there is a lot more demand for small-group analysis. The data is corporate, but the results need to be local to help a sales team, or figure out where to go with the marketing of a single product. This is a really important driver for highly accessible text analytics that doesn’t look like text analytics – where it’s just a natural part of how you go about your day. We’re pushing partnerships and technology in 2014 that can help drive this once daunting technology to where it’s functionally invisible, just like search.

Thank you to Jeff! Click on the links that follow to read other Text Analytics 2014 Q&A responses: Clarabridge CEO Sid Banerjee, Fiona McNeill of SAS, Daedalus co-founder José Carlos González, and Tom Anderson of Anderson Analytics and OdinText. And click here for this year’s industry summary, Text Analytics 2014.

Text Analytics 2014

It’s past time for my yearly status/look-ahead report, on text analytics technology and market developments. Once again, I’ll cover the solution side — technical and financial — and also the conference scene and a few ways you can learn more about today’s text-analytics world.

My market analysis may surprise you. I’ll get to that point, and the rest of my report, after a quick invitation:

Where’s that secret decoder ring when I need it?

I plan to release a market study, Text Analytics 2014: User Perspectives on Solutions and Providers, later this spring, a follow-on to the studies I conducted in 2011 and 2009. I’ve held the survey open: if you are a current or prospective text-analytics user, please respond by April 16. The survey will take only 5-10 minutes to complete. I appreciate your help.

Text Analytics as a Market Category

I reported positive technology and market outlooks in each of the last few years, in 2013 and in 2012. This year is a bit different. While technology development is strong, driven by the continued explosion in online and social text volumes, I feel that the advance of text analytics as a market category has stalled. The question is not business value. The question is data focus and analysis scope. Blame big data.

“Text analytics” naturally implies work primarily or exclusively with text. Contrast that with big data analytics as a category (and put aside that we’ve been seeing a backlash against the “big data” label as, variously, a) vague to the point of being without referent, b) limited to Hadoop, and c) more talked about than done). The big-data concept captures, in its Variety V, the notion that we should assimilate and integrate data of all relevant sources and types. I’ve been preaching integrated analytics for years. Integrated analysis, whether labeled big data analytics, social intelligence, or something else, is preferable and possible for the majority of business needs. Text analytics, so often, isn’t, and shouldn’t be, enough.

Secondly, text-analytics technology is increasingly delivered embedded in applications and solutions, for customer experience, market research, investigative analysis, social listening, and many, many other business needs. These solutions do not bear the text-analytics label.

Affirmation?

Lexalytics CEO Jeff Catlin talks about “highly accessible text analytics that doesn’t look like text analytics – where it’s just a natural part of how you go about your day.” About his own company, Jeff says, “We’re pushing partnerships and technology in 2014 that can help drive this once daunting technology to where it’s functionally invisible, just like search.”

Basis Technology co-founder Steve Cohen has a similar perspective. Steve says, “I’m not sure that text analytics is a separable ‘thing,’ or at least enough of a thing, to stand on its own as a market. We find that we sell to a search market (which is well enough understood), a compliance market (also mature), and have a growing activity in a solutions market that you could call ‘text enabled information discovery’.”

Fiona McNeill, Text Analytics Product Marketing Manager at SAS, could have been speaking about a broad set of solution providers, and not just her own employer, when she told me, “we will continue to extend text-based processing and insights into traditional predictive analysis, forecasting, and optimization.”

According to Clarabridge co-founder and CEO Sid Banerjee, “the market has seen a lot more competition by way of historically non-text analytics vendors adding various forms of text analytics solutions to their product mix.” Sid continues, “workforce management vendors… and even social CRM, and social marketing vendors [have] started adding sentiment mining and text analytics capabilities into their product mix.”

Still, whether within or across market-category boundaries, there are significant text-analytics technology, market, and community developments to report.

Text Technology Developments

Certain text-technology developments track those of other technology domains. I’ll list several:

  • Modeling advances via deep learning and, especially, unsupervised and semi-supervised methods.

AlchemyAPI CEO Elliot Turner makes the case for these technologies as follows, explaining, “deep learning can produce more robust text and vision systems that hold their accuracy when analyzing data far different from what they were trained on. Plus, unsupervised techniques make it practical to keep up with the rapid evolution of everyday language.” According to Elliot, 15% of the queries Google receives each day (out of over 500 million daily) have never been seen before, a rate unchanged in 15 years. Elliot states, “the ability to understand today’s short, jargon-filled phrases, and keep up with tomorrow’s new words, is predicated on mastering unsupervised, deep learning approaches.”

  • Scale-out via parallelized and distributed technologies.

SAS’s Fiona McNeill says “the ease of analyzing big text data (hundreds of millions or billions of documents) has improved over the past year,” which for SAS means “extensions of high-performance text mining to new distributed architectures, like Hadoop and Cloudera.” Looking ahead, Fiona explained that SAS “will continue to extend technologies and methods for examining big text data – continuing to take advantage of multi-core processing and distributed memory architectures for addressing even the most complex operational challenges and decisions that our customers have.”

Let’s call this direction NoHadoop, as in, Not Only Hadoop. Attensity, Digital Reasoning, HPCC Systems, Pivotal Greenplum, and Teradata Aster, among others, are doing interesting scaling work, building (on) a variety of technologies.

  • The rise of data as-a-service and the ascendance of APIs/Web-services and cloud implementations.

We know about data providers such as Gnip, DataSift, and Xignite. José Carlos González, co-founder of Spanish semantic-analysis developer Daedalus, describes how his company’s Textalytics service “represents a new semantic/NLP API concept in the sense that it goes well beyond the basic horizontal functionality that is being offered in the market: we also offer pre-packaged, optimized functionality for several industries and applications and the possibility for the customer to tailor the system with their dictionaries and models.”
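As a concrete picture of the as-a-service delivery model, here is a minimal Python sketch of posting text to a semantic-analysis Web service. The endpoint URL, field names, and parameter values are placeholders of my own, not the actual Textalytics interface; consult a provider’s documentation for the real thing.

```python
import requests

# Hypothetical endpoint and parameters, for illustration only.
API_URL = "https://api.example.com/semantic/v1/analyze"

payload = {
    "key": "YOUR_API_KEY",                    # placeholder credential
    "text": "The new handset's battery life is fantastic.",
    "vertical": "consumer-electronics",       # assumed pre-packaged industry model
    "dictionary": "my-product-names",         # assumed customer-supplied dictionary
}

response = requests.post(API_URL, data=payload)
response.raise_for_status()
print(response.json())  # entities, topics, sentiment, etc., per the service's schema
```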

You’ll find comparable capabilities in competing Web services, each with its strengths, such as AlchemyAPI, Bitext, Coginov, ConveyAPI, DatumBox, and Semantria.

  • Knowledge-graph data representations.

Graphs are natural structures for management and query of data linked by complex interrelationships. Digital Reasoning builds one. So does Lexalytics, with its concept matrix. Expert System’s technology is likewise built around a semantic network. And check out the Dandelion work from SpazioDati.

  • In-memory processing, cloud deployment, and streaming data capabilities.

“In memory” is SAP HANA‘s middle name (figuratively of course), with text analysis an integral part of the platform, although I will admit that it isn’t yet a selling point for many other vendors. That latter situation is changing. Material that IBM has online, about InfoSphere Streams, provides a very helpful (even if vendor specific) illustration of an implementation of text analysis on data streams. SAS’s Fiona McNeill refers to “linguistic rules augmenting business and predictive scoring in real-time data streams” and “moving more and more of our capabilities to cloud architectures.” These big-company moves are just text-analytics examples of a general IT deployment trend.

Other advances are specific to text. Is there any other technology domain that relies so heavily on classification rules? In the text case, we’re talking about rules that discover meaning/sense (that is, context-dependent relationships) by capturing and applying lexical chains (word nets) and syntactic patterns and structures such as taxonomies and ontologies (those knowledge graphs).
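For readers who want the flavor of rule-based classification without the linguistics, here is a minimal Python sketch in which a toy taxonomy maps categories to lexical cues and a document is assigned to the best-matching category. Real systems layer on lexical chains, syntax, and full ontologies; the categories and cue words here are invented.

```python
import string

# Toy taxonomy: category -> lexical cues. Invented for illustration.
TAXONOMY = {
    "billing":  {"invoice", "charge", "refund", "overcharged"},
    "delivery": {"shipping", "package", "late", "tracking"},
    "support":  {"help", "agent", "hold", "callback"},
}

def classify(text):
    """Assign the category whose cues overlap the document's tokens most."""
    tokens = {w.strip(string.punctuation) for w in text.lower().split()}
    scores = {cat: len(tokens & cues) for cat, cues in TAXONOMY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"

print(classify("I was overcharged on my last invoice"))            # -> billing
print(classify("Where is my package? Tracking shows it is late."))  # -> delivery
```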

But that’s all high-concept stuff, and I’m not writing for scientists, so this bit of technology coverage will be enough for now. Now, on to the business side.

Follow the Money: Investments

Investment activity is a forward-looking indicator, suggesting optimism about companies’ growth potential and profitability.

The big 2013 (+ early 2014) funding news, in the text analytics space, was:

  • An $80 million equity investment in Clarabridge, “to further expand its global operations, power continued product innovation, grow its employee base and increase reach through marketing and strategic transactions to capitalize on escalating market demand for CEM solutions.” (Surely, a substantial portion of that funding went to buy out earlier investors.)
  • Expert System’s February 2014 IPO, with $27 million in shares sold on the AIM Italia exchange. The money raised will go in part to fuel expansion in North America. (Added April 12:) The company reported to me a pre-IPO valuation of €27 million.

And the most interesting 2013 acquisition is one that actually went down in the first week of 2014: interaction-analytics leader Verint’s purchase of customer-experience vendor Kana. What does this transaction have to do with text analytics, you ask? Part of the purchase is the Overtone listening/analysis technology that Kana acquired in 2011. (What the Kana acquisition will mean for Verint’s relationship with Clarabridge, I can’t say.)

Microsoft’s March 2013 acquisition of Netbreeze GmbH, which “combines modern methods from Natural Language Processing (NLP), data mining and semantic text analysis to support 28 different writing systems,” also fits the category, although Swiss Netbreeze wasn’t a prominent text-analytics player. Text analytics was a sideline for speech-analytics vendor Utopy, which Genesys acquired in early 2013 to create a customer-service “actionable (interaction) analytics” offering. One more, from September 2013: “Intel Has Acquired Natural Language Processing Startup Indisys, Price ‘North’ Of $26M, To Build Its AI Muscle,” as TechCrunch reported.

Other transactions in the space:

(While Allegiance and Networked Insights use outside text-analysis software, text capabilities are central to their offerings.)

Interestingly, market-research vendor Vision Critical divested itself of the DiscoverText software it had acquired in early 2013, selling it back, later in the year, to inventor Texifter.

In past years, I’ve reported solution providers’ sales results. I’m not going to do that this year. Revenue figures are always hard to get — non-publicly traded companies guard their numbers, and most numbers available are from larger, diversified providers whose business-line revenue is hard to determine. So I’m sorry to disappoint, but I won’t be relaying 2012-2013 revenue growth figures.

Reports and Community

Let’s finish with opportunities to learn more, starting with conferences, albeit unchronologically.

The market for business-focused text-analytics conferences is not thriving.

The Text Analytics Summit is skipping the spring — I chaired the event from its 2005 founding through last year’s Boston summit but am no longer involved — but reportedly will be back in the fall, in San Francisco. The Predictive Analytics World folks haven’t announced a repeat of their own fall, east-coast Text Analytics World event, and it appears their March 2014 event drew only a small audience of under 40 attendees. Finally, the Innovation Enterprise folks appear to have abandoned the field after running Text Analytics Innovation conferences in 2012 and 2013. Semantic Technology & Business may be your best business-conference choice, August 19-21 in San Jose.

My own Sentiment Analysis Symposium, which includes significant text-analysis coverage, has been doing well. The March 5-6, 2014 symposium in New York had our highest turnout yet, with 178 total registered. I have presentations posted and should have videos online some time in April. I’m planning an October 2014 conference in San Francisco.

On the vendor side, Clarabridge Customer Connections (C3) is slated for April 28-30 in Miami, and I enjoyed attending a day of the 2014 SAS Global Forum, which took place March 23-26 near Washington DC. Text analytics and sentiment analysis and their business applications are only a small (but growing, I believe) component of the overall SAS technology and solution suite, but there was enough coverage to keep me busy. The experience would be similar at other global-vendor conferences such as SAP’s and IBM’s. Again considering text-focused vendors, Linguamatics held a spring users’ conference April 7-9 in Cambridge, UK, but really, beyond that and Clarabridge’s conference, that’s all I know about.

Moving to non-commercial, research-focused and academic conferences:

NIST’s Text Analysis Conference is slated for November 17-18, 2014, near Washington DC.

I’ll be in Paris to speak at the International Conference on Statistical Analysis of Textual Data (JADT), June 3-6. JADT overlaps the 8th instance of the International Conference on Weblogs and Social Media (ICWSM), June 2-4 in Ann Arbor. A few weeks later, LT-Innovate, covering the broader set of language technologies, will take place June 24-25 in Brussels.

The annual meeting of the Association for Computational Linguistics is an academic conference to check out, June 23-25 in Baltimore.

And reports?

Worth reviewing is LT-Innovate’s March 2013 report, Status and Potential of the European Language Technology Markets. So is the Hurwitz Victory Index Report covering text analytics, which includes a number of useful vendor assessments. (Added April 16:) TDWI’s report, How to Gain Insight from Text, written by analyst Fern Halper, is useful for those seeking text-analytics implementation ideas.

Monitor my Twitter account, @SethGrimes, for notice of release of my own Text Analytics 2014: User Perspectives on Solutions and Providers, which I should have out some time in May.

Finally, for a bit more on the same topic as this article, read my December 2013 article, A Look at the Text Analytics Industry, and my February 2014 article, Sentiment Analysis Innovation: Making Sense of the Internet of People.

Who Am I?

As you may have picked up — in case you don’t know me — I make a living in part by understanding the various facets of the markets (academic, research, solution provider, and the spectrum of technology users) that include text analytics, sentiment analysis, semantics/synthesis, and other forms of data analysis and visualization. This understanding is the basis of my consulting work, of the writing I do in my own Breakthrough Analysis blog and for a variety of publications, and of my conference organizing.

Disclosure is in order. I’ve mentioned many companies. I consult for some of them. Some are sponsoring my in-the-works text-analytics market study. Some have sponsored my conferences and will sponsor my planned fall 2014 conference. I have taken money in the last year, for one or more of these activities, from: AlchemyAPI, Clarabridge, Converseon (ConveyAPI), Daedalus, Digital Reasoning, Gnip, IBM, Lexalytics, Luminoso, SAS, Teradata Aster, and Verint. Not included here are companies that have merely bought a ticket to attend one of my conferences.

I welcome the opportunity to learn about new and evolving technologies and applications, so if you’d like to discuss any of the points I’ve covered, as they relate to your own work, please do get in touch. Thanks for reading!

Can Sentiment Analysis Decode Cross-Cultural Social Media?

Can sentiment analysis shed light on cross-cultural social-media use?

Beyond use: What tools can help measure cross-cultural social-platform expansion and cross-cultural networks?

I like these questions. They originate with writer and researcher Lydia Laurenson, who contacted me after coming across a conference I organize, the Sentiment Analysis Symposium. (I did rephrase them a bit.) They recognize that sentiment analysis, which seeks to classify and quantify opinion, emotion, and connection, can help decode what she labels social’s “cultural dimensions.” (If you have your own references or thoughts to share, please contact Lydia. Her site is journalismforbrands.com and she’s on Twitter at @lydialaurenson.)

I’m going to riff on these questions, by which I mean, I’m going to both interpret the idea of cross-culture social media and explore responses.

Differentiation

Let’s start with an analysis of the questions, observing that they rightly differentiate social-media use from platforms from networks — content from channel from connections.

Social networks link individuals and organizations. They have directionality — I may follow you and see your messages, even though you don’t follow me — and temporality in that connections change over time. Social networks depend on but transcend platforms. The network is you and me (and Justin Bieber) and not the particular channel we happen to use to communicate.

Facebook, Twitter, Pinterest, Sina Weibo, VKontakte: These are platforms, channels. They support activities: People join them (and dozens of others), create profiles, make connections, and post and consume others’ content. And while my non-friend Justin and I are both on a diversity of platforms — some of the ones I list above, and others — with profiles tailored to the character of each platform, our networks span platforms.

Finally, social-media use covers our actual posting behaviors and the content we post. The more interesting forms of social-media measurement study use — the active utilization of connections, which are of little value when they sit idle — via computed values such as advocacy and engagement.

Multi-Cultural and Cross-Cultural

Each and every social platform is stylistically mono-cultural. Each is designed for certain types of content and certain activities, and each encourages, per the infamous Social Media and Donuts, a certain style of use. Certain platforms, for instance Sina Weibo and VKontakte, are largely mono-lingual or appeal only to a few language groups. But keeping in mind the platform/network/usage differentiation, we do observe that any platform that’s large enough will host a culturally diverse collection of users and content.

Social-media users share and interact with whoever’s interested, whether neighbors, family, a professional audience, or listeners, whether brands or the NSA. Culture and connection derive from the content: If my tweet

“Facial Analytics: What Are You Smiling at?” asks @M_Steinhart in @AllAnalytics, regarding #SAS14 prez by Jacob Whitehill of @EmotientInc

interests you, you’re my people. Of course, the same is true for those interested by a tweet from another account I use,

We Want Real, Healthy Food in the Montgomery County Public Schools – @RealFoodMCPS petition http://www.gopetition.com/petitions/we-want-real-healthy-food-in-mcps.html #MontCo #Maryland

LinkedIn InMap clustering my connections: Each of us is a bridge spanning cultures.

I’m not the only one who’s multi-cultural in the sense of having networks that extend to different audiences. I have multiple Twitter accounts, which I use to reach non-overlapping audiences, but on LinkedIn, my one account connects me to several different audience segments (as seen in the colored clusters in the InMap at right).

In a sense, each of us is a cross-network, cross-cultural bridge.

Organizations that use social media typically seek to reach and engage individuals from many cultures, using media and languages suited to reach and appeal to those cultures. Arguably, trying to communicate ideas and sell products, developed in/for one market, reflecting that market’s values and lifestyles, in/to another market, is a cross-cultural endeavor.

Sentiment as a Measurement Dimension

In order to evaluate cross-cultural social-media use, we first have to figure out how to measure cultural dimensions. This is a technology question given the premises that a) cultural elements are communicated or created via (online and) social media, b) cultural elements can be decoded from social-media postings and behaviors, and c) automated analysis allows you to systematize decoding or even uncover dimensions not before apparent. Natural language processing (NLP) technology is an answer, given the common notion that language and culture are tightly linked. (Search on Sapir-Whorf.)

Who is developing measures of cultural dimensions and using automated text-analysis technologies to compute them? Arguably, anyone who is doing sentiment analysis is measuring a cultural dimension, whether that work targets a business need — customer service, market research, financial-market trading strategy, something else — or analyzes depression and suicidal-mood indicators, or is applied for counter-terrorism.

We look to understand nuanced word senses, for example, the difference in meaning of “thin wallet” and “wallet is thin.” The first is possibly a product search, indicating buying intent. The second is an expression that says, I’m short on cash and unlikely to buy something. The user profile, wording, and social setting provide context for interpretation, and the needs of the person or organization doing the analysis determine the data treatment. All of this is cultural interpretation, even though a business analyst would never call it that.
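
To make the distinction concrete, here is a toy sketch, my own illustration rather than any vendor’s method: the same two words map to different interpretations depending on whether “thin” is attributive (“thin wallet”) or predicative (“wallet is thin”). A real system would rely on parsing, context, and learned models, not two regular expressions.

```python
import re

# Toy illustration: attributive vs. predicative uses of the same words
# signal different intents. (Hypothetical patterns, not production code.)
PRODUCT_QUERY = re.compile(r"\bthin wallet\b", re.I)     # attributive: likely product search
SHORT_ON_CASH = re.compile(r"\bwallet is thin\b", re.I)  # predicative: idiom for low funds

def interpret(text: str) -> str:
    """Map a snippet to a coarse buying-intent label."""
    if SHORT_ON_CASH.search(text):
        return "low purchase intent (idiom: short on cash)"
    if PRODUCT_QUERY.search(text):
        return "possible purchase intent (product search)"
    return "unknown"

for snippet in ("looking for a thin wallet, any suggestions?",
                "can't make it tonight, my wallet is thin"):
    print(snippet, "->", interpret(snippet))
```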

Roadmap and Milestones

Who is focusing on the sort of text interpretations most aligned with the idea of measuring cultural dimensions? I’ll name two people I know, and I invite you to let me (and Lydia Laurenson) know about other work. My two are:

There’s plenty of other work out there along the same lines. Consider one other example, Kanjoya, commercializing Stanford University research, modeling “everything from traditional English language usage to social conversations riddled with emoticons, colloquialisms, and slang… the different meanings words can have based on their topical context, as well as how language varies across age, gender, and even geography.”

The understanding of idiom is key, but I’d venture that the vast majority of work relates to cultural elements expressed in a single language. There isn’t much technology out there (that I know of) for cross/multi-cultural analysis. A direction to explore, however, would be automated machine translation, which, to be accurate, must deal with idiom that can’t be handled by simply translating words and syntax.

Sorry, Google, Translate’s rendering of the expression “my wallet is thin” into the French “mon portefeuille est mince” is not idiomatically correct. Nonetheless, advances in machine learning almost guarantee that translation will improve, in tandem with the many other promising “decoding” technologies that have emerged in research and industry settings. NLP, stylistic analysis and profile extraction, and contextual interpretation, along with the (nascent) ability to map idiom and other cultural elements, will facilitate cross-cultural analysis. Sentiment analysis points the way.

Innovation in Big Data and Analytics: Tweetchat Q&A

I’m an admirer of IBM’s, and not just because — I’ll get the quite-germane disclosure out of the way — the company’s jStart Emerging Technologies team is a sponsor of my up-coming (next week!) conference, the Sentiment Analysis Symposium. I’m a fan because the company’s technology is a not-as-common-as-it-should-be combination of enterprise-ready and innovative, in areas I follow that include text and social analytics and information synthesis, that is, sense-making.

Innovation is a lumpy process. Sometimes you leap forward; sometimes progress is steady even if undramatic; sometimes you seem stuck in place and can manage only incremental advances. These modes could be described as disruption, extension, and improvement: modes I describe in a recent article, Sentiment Analysis Innovation, in which I catalog industry innovation in my current-favorite technology/solution domain. Clearly the innovation topic has been on my mind, which is why I suggested it to the @IBMbigdata folks who approached me about participating in a #BigDataMgmt tweetchat, focusing in particular on Big Data and analytics innovation.

The IBMers and I jointly worked out the topics for the tweetchat, which took place on February 26, 2014. While we had a couple of dozen active participants (I haven’t counted them), I have pulled out only key thoughts, my own and, selectively, other participants’, to share now. Many of my responses were prepared, frankly, but many were on-the-fly responses to others’ tweets. I have reformatted tweets and edited for readability. This form is an experiment. My own tweets are unquoted and mixed in with a bit (although not too much) of text that creates a narrative.

So here we go, a series of questions and short answers on Innovation in Big Data and Analytics:

1) How do you define innovation, as related to Big Data & Analytics of course?

I see 3 innovation aspects: 1) Improve existing; 2) Extend to new; 3) Disrupt. But further, defining innovation involves defining the questions you can ask and the elements that contribute to answers.

Now, Big Data itself embodies an innovative attitude: that you can do more with more, via analytics of course. But those analytics approaches to Big Data have to be different, because conventional methods may not scale, and new methods, tailored to Big Data, may be able to extract insights that conventional analytics methods miss.

Participant Tracey Wallace had the astute observation, “Big data and analytics need to be actionable — do that and it’s innovative. I don’t want to have to use a data scientist.” My observation was that a data scientist may not always be needed, but judgment is always required and too often in short supply, prompting further tweets from Tim Crawford, “Title aside, it’s important to have someone that understands the business to provide context for data,” and Tracey Wallace, “That’s the innovation. Make big data easy to understand so journalists, etc. can make those judgements.”

(I’m pulling one thread, here, from a tangled tweetchat conversation, and I’ll do the same in my write-ups for the rest of the questions.)

My summation: Defining innovation involves defining the questions you can ask and the elements that contribute to answers.

2) What recent analytics innovations have high potential to be game changers?

I cite two:

  • Machine learning, in particular semi-supervised and unsupervised learning and also deep learning.
  • Also data integration/fusion (à la Watson, certainly!), of personal (profile), geospatial, transactional, and attitudinal data.

We did get into IBM products. For instance, David Pittman brought up that “stream computing like InfoSphere Streams does is innovating many uses, from online gaming to fraud detection.” I replied that I don’t see stream computing as so recent: Complex Event Processing (CEP) — analytical computing on streams — has been around for quite a while. Of course, it’s continually improving.

Let’s be real, however: Watson is still mostly potential. Stuff like Wolfram Alpha (a “computational knowledge engine”) is more here-and-now. There’s other IBM tech, such as I2 Analyst Workbench, which does synthesis and visualization, that is also more here-and-now. With Watson, the real innovations are the assembly of disparate technologies, operation at scale, and question-answering capability.

3) How are businesses applying and commercializing these analytics innovations?

The best candidate areas relate to pattern recognition, across locations, time, populations. But that’s VERY broad. Better to be more specific…

Language understanding and image recognition are 2 great examples of tech functions that can be applied for many purposes. But really any area where traditional methods don’t scale or adapt well to new data may be a candidate.

Yes, Watson is a great example of incipient commercialization of Big Data analytical and semantic technology. The concepts aren’t so new. I’ll cite an analytics/semantic solution that predated Watson by 10-12 years: MedTAKMI, text mining for life sciences, out of IBM Research Tokyo.

Bob Hayes suggested customer surveys as an example application of innovative analytics, including “us[e] of sentiment analysis for loyalty measurement in surveys instead of NPS (rated questions of loyalty).” (NPS is the Net Promoter Score.) I do see how surveys could be a Big Data challenge, if survey responses are cross-correlated with transactional and social data.

Tracey Wallace’s question/suggestion was that “transactional and social data seem more reliable than any survey, no?” I do agree but observe that transactional data is factual rather than attitudinal, so less ambiguous but not indicative of root causes, and social data is truer in a sense, since attitudes expressed are unsolicited, but it’s incredibly noisy.

Bob Hayes offered the thought that you find sentiment’s meaning by correlating sentiment with other measures. I add only that we compute indicators from measures in order to map collected data to outcomes.
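
To make that mapping concrete, here is a toy sketch with invented numbers: an indicator earns its keep only if it tracks an outcome you care about.

```python
from statistics import correlation  # Python 3.10+

# Hypothetical weekly figures: mean sentiment score vs. repeat-purchase rate.
sentiment   = [0.12, 0.35, 0.30, 0.55, 0.48, 0.70]
repeat_rate = [0.18, 0.22, 0.21, 0.30, 0.27, 0.36]

# Pearson's r: a first, crude check that the indicator moves with the outcome.
print(f"Pearson r = {correlation(sentiment, repeat_rate):.2f}")
```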

4) Where, and how, do organizations begin to address opportunities, and what risks are involved, what potential downside?

I’m big on experimenting & exploring, with different data sets, analytical methods, visualizations, etc. Certain technologies such as R and Python (popular among data science types) lend themselves to exploratory analysis. I don’t see huge risk, until you put yourself in a bet-the-store position. The real risk is in standing still.

The @IBMBigData tweeted response was, “We always encourage experimentation too. That’s where innovation sprouts. Try, try, try again.” And Bob Hayes offered, regarding risks and potential downsides, “With so much new data involved, it could take you down a rabbit hole.”

5) How about some examples of areas — business challenges — ripe for innovation responses, even for disruption?

Anything expensive/slow. I hate to say this, but areas often handled by people, whom automation could make redundant. Think customer service, certain logistics functions. Are truckers looking forward to self-driving rigs?

Steve Massi called out “better use of real-time traffic and routing” and Mark Salke, “customer service,” feeling the need to add, “I’m not kidding.”

6) What investment is involved, in people, software/services, in-house R&D, or other elements, in harnessing Big Data & Analytics innovation?

Talent is the biggest and most difficult investment, and I don’t mean just hiring data scientist types. Need starts with ability to decide where to invest effort & resources and how to operationalize, productize & monetize insights. You can outsource, or buy as-a-service, software, platform, and R&D elements that aren’t core to your business.

In response to this question, Tracey Wallace offered “I truly believe the investment needs to be in the SaaS product (a smart data stack) that is intuitive for users,” and Bob Hayes’s thought was “Need to consider organizational change agents that instill importance of Big Data and analytics in company culture,” both good points. Natasha Bishop offered, “culture must be in equation.”

Tim Crawford had a similar thought, that “Tech innovation is only part of the equation. Process, organization, and new paradigms are also needed to truly evoke innovation,” echoed in Marko Pitkanen’s tweet, “In many cases cultural change is needed if an organization wants to be data driven. Decisions, people & tech need to be lined up.”

Marie Wallace’s view was, “As well as data scientists we need biz folks that can understand and internalize analytics and integrate them into how they work.”

Bob Hayes added, “education in the research methodology would be good. Big Data does not speak for itself.”

7) Given the past few years Big Data & Analytics innovation experience: Where are we heading, and how can organizations stay nimble?

I don’t have a crystal ball, so I’ll fall back on the old (Bayesian?) standard of projecting from past experience…

We’re heading toward more data and better retrieval & analysis, so faster & more effective & more pervasive automation. How do you stay nimble? I’ll go back to my response to an earlier question: Experiment, explore. Stay aware and stay open.

I admit, that answer felt kind of obvious, platitudinous.

Tracey Wallace’s thought was, “Big Data is a necessity now. The guys who use it right will prevail. To stay nimble, you need Big Data and people who can use it.” Steve Massi contributed, “Nimbleness [is] driven by [an] open knowledge base. Narrow base and lack of access to data points leads to rigid decision making.”

These are good points, and a good conclusion to the February 26, 2014 #BigDataMgmt tweetchat.

Sentiment Analysis Innovation Sources

Sentiment analysis aims to make sense of the Internet of People. That’s what interests me, technologies that take on the thinking, feeling, social network of interconnected individuals. Built into business solutions and cognitive systems that “learn and interact naturally with people to extend what either humans or machine could do on their own” (IBM), these technologies enrich and enhance our personal and business interactions.

Contrast with all we’ve been hearing about the Internet of Things. Who isn’t eager for killer apps such as a smart fridge that will tell you when the milk is sour or even get an order in for you? Well, me for one, although I recognize that IOT technologies will make life more efficient. While the situation is not either/or, the mechanics of efficiency, behind the IOT, are simply not as intriguing as the human-understanding challenge posed by the Internet of People.

Human understanding entails the ability to decipher language, measure and decode expressions and movements, model behaviors, and grasp cognitive processes. These “inputs” are complex and ambiguous. Context, culture, and persona (situational profile) are critical elements. These factors make human data tougher to master than the IOT’s machine-data streams.

Fortunately, a little bit of sentiment analysis can take you far, in customer experience, market research, financial services, and social/media analysis. The technology has proven itself in these areas, even if most implementations have involved over-simplified positive/negative sentiment scoring. Fortunately also then that we do not lack for sentiment analysis innovation.

Three Innovation Categories

Let’s look at sentiment analysis innovation in three categories. I’ll provide innovator examples — technologies that work, outside the lab — and describe how each is advancing methods or applications.

My perspective is this: I work as an industry analyst and consultant covering business intelligence, text analytics, and sentiment analysis. I talk to a lot of people: Researchers, solution providers, business users, and fellow analysts, and I organize a conference that is unique in its coverage of industry applications of sensemaking technologies, the Sentiment Analysis Symposium, next up March 5-6 in New York. These technologies include text & speech analytics, natural language processing, semantics, etc., and the data acquisition, integration, and visualization components and industry adaptations that turn them into business solutions.

In this look at sentiment analysis innovation, let’s examine:

  1. Improvers: Who’s accomplishing mainstream tasks better — more accessibly, more accurately, and at greater scale.
  2. Extenders: Who’s accomplishing new tasks — bringing on-line new data sources, extracting fresh insights, answering new use cases.
  3. Disruptors: Who’s redefining the problem, via new methods and new solutions.

Some of the companies I cite could fit in more than one category, and without question, I’ll miss a few great start-ups. My aim is to provide examples, not an exhaustive or rigidly classified catalog of companies in the space. A final disclaimer: A number of the companies I will cite are sponsoring the symposium or a text-analytics market study I am currently conducting. I’ll list them in a disclosure toward the end of this article.

Improvers: More, Better

In this category, let’s start with participants in the API economy, analysis providers whose sentiment engines are accessed on-demand, via a Web-service application programming interface. (A sketch of the generic call pattern follows the list below.) Consider:

  • Semantria, providing via-API, on-demand access to the Lexalytics Salience engine, Lexalytics’ being the market’s leading pure-play sentiment-analysis provider. Semantria provides SDKs (software development kits) for a variety of programming environments and hooks into applications such as Excel. Net is that Semantria democratizes access to sentiment analysis.
  • ConveyAPI is breaking ground on accuracy and domain adaptation for brand social-media sentiment analysis. ConveyAPI was created by social-media agency Converseon, which recently spun it out into a subsidiary that provides on-demand, via-API access to the Conversation Miner engine. The technology relies on supervised learning from human-annotated social-media training data specific to a variety of industry domains.
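
The integration pattern behind both offerings is the same: POST a document, get back scores. Here is a minimal sketch; the endpoint, key, and response shape are my inventions, so consult each vendor’s documentation for the real interface.

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/sentiment"  # hypothetical endpoint
API_KEY = "YOUR-KEY"                              # hypothetical credential

def score_sentiment(text: str) -> dict:
    """POST text to a (hypothetical) sentiment service; return its JSON reply."""
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + API_KEY},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

print(score_sentiment("Quick service and friendly staff."))
# e.g. {"polarity": "positive", "score": 0.83} -- shape varies by vendor
```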

Platforms are the inverse of services: They provide application and orchestration frameworks that support via-API invocation of external Web and local services. Programming languages popular among data scientists, such as Python, R, Java, and Scala, provide, in a sense, platforms for application-building, but I have in mind something higher-level. Consider instead platforms designed specifically for language analysis, such as:

  • GATE, the General Architecture for Text Engineering, still innovative after all these years (around 15 since project founding). I base this assessment on developer Diana Maynard’s tweeted comment about my recent Social Sentiment’s Missing Measures article, that a GATE project is dealing with advanced sentiment measures I cited: density, variation, and volatility.

GATE is an open-source project with development centered at the University of Sheffield and world-wide community participation. Where there’s a platform, there’s often a community of users, builders, and partners, and — examples such as the Apple App Store and Google Play are old hat — a marketplace that allows community members to distribute and sell their contributions. Notable platform+marketplace examples, supporting sentiment solution building — unusual in the sentiment-analysis world — are:

  • TEMIS, whose Luxid text-analysis platform is built around the Apache UIMA framework, which allows plug-in of external resources, namely annotators that perform specialized information extraction and analysis functions. TEMIS characterizes the Luxid Community as “the first community platform for semantics.”
  • QlikView, which you may know as a BI tool for visual, exploratory data analysis, but which similarly boasts an open architecture, exploited by QlikMarket participants. I particularly like the work of QlikTech partner QVSource, which provides an array of connectors to social and online information sources, including for sentiment and text analysis.

In the social-analytics world, Radian6 — part of Salesforce Marketing Cloud — pioneered the platform/marketplace approach, but after repeatedly hearing messages like this one I received on February 6, “I am using Radian6 to pull in data, but their sentiment analysis is not useful and I ignore it,” I infer that the company has de-innovated on the sentiment-analysis front.

Extenders: New Frontiers

I’m particularly interested in ability to extract new types of information from existing and new sources and in new indicators (derived, computed values) that can be used with confidence to guide business decision-making.

The sentiment-analysis frontier is moving beyond systems that resolve sentiment at the feature level — sentiment about entities (named individuals, companies, and geographic locations, etc.) and topics — so my examples will focus on, let’s call it, extended subjectivity. Consider:

  • Datumbox, which aims to identify subjectivity, genre, gender, readability, and other beyond-meaning senses.
  • Social Market Analytics, bringing to Twitter-sourced analyses the sort of technicals beloved by financial-market traders, elements such as volatility and dispersion.
  • Kanjoya, a commercialization of the Experience Project at Stanford University, which analyzes expressions, emotions, and behaviors to further brand business goals.

Consider, also, information extraction from non-textual media, from speech, images, and video. Eye movement, facial expressions, speech tone and patterns, and other biometrics indicate mood and emotion. Without description, I’ll cite:

One special bit of coolness is indicated in a TechCrunch article: Affectiva Launches An SDK To Bring Emotion Tracking To Mobile Apps, offering the ability to decode emotion in images captured on mobile devices. Others, notably React Labs, created by Philip Resnik, a linguistics professor at the University of Maryland, are working toward real-time opinion gathering. But mobile sentiment measurement is no more a slam-dunk than any other cutting-edge implementation. Witness the demise of mobile feedback service Swipp despite $9 million in 2012-13 funding.

Finally, we have new indicators such as advocacy, engagement, and motivation, more sophisticated and useful than crude, first-generation quantities such as influence:

  • MotiveQuest calls its business online anthropology, an approach to understanding behaviors in order to promote customer advocacy and engagement. CEO David Rabjohns explains, “When we talk about motivations we use that term as a headline for the way people want to feel at a primal/tribal level (feel successful, feel creative, feel smart, etc.). With our MotiveScape tool we use sophisticated linguistic algorithms to explore and tag the different ways [people] talk about the 12 broad motivational areas.”
  • IBM, similarly, is pursuing new initiatives in engagement analytics, an innovation I’ve chosen to showcase at my up-coming Sentiment Analysis Symposium in the form of a keynote by social-business technology leader Marie Wallace.

Disruptors: Killers and Creators

I’ll admit in advance that if I had a crystal ball for about-to-appear-out-of-nowhere disruptors, I’d be in a different line of business. But I do acknowledge certain now widely recognized disruption-enabling technologies, namely machine learning, mobile, and cloud.

I’ll reserve the mobile topic for another occasion, and I’ve covered cloud in citing a number of sentiment-as-a-Web-service offerings. I’ll add only how impressed I am by the extension of the coverage, capacity, and value of data-as-a-service providers such as DataSift, Gnip, and Xignite, which include sentiment among the varieties of supported tagging. They’re not analytics providers — the move to relocate analytics to the cloud is obvious rather than earth-shaking — rather, the emerging data economy is built around them and others like them.

I’ve saved the best for last.

We all recognize the benefits that machine learning — in particular, unsupervised methods and deep learning — is bringing to a host of computing problems. The algorithms are newly effective and efficient, required computing hardware is powerful and cheap, and data is abundant and available, so we can now let the machines find their own ways. We can move away from exclusive reliance on rigid and hard-to-maintain rules and on supervised learning based on predefined categories (not that those systems aren’t and won’t remain right, best even, for a good many problems), where models apply well only for the language and business domain of the labeled training data.

Why unsupervised learning? Elliot Turner, who founded text-analysis provider AlchemyAPI in 2005, says “a never-ending challenge to understanding text is staying current with emerging slang and phrases. Unsupervised learning can enable machines to discover new words without human-curated training sets.” (Contrast with semi-supervised learning, which applies unsupervised learning to refine models built initially from labeled training data, and with non-learning unsupervised methods such as Latent Semantic Analysis for topic extraction.)
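
For readers who want to see that non-learning baseline in action, here is a minimal Latent Semantic Analysis sketch using scikit-learn; the corpus and component count are arbitrary illustrations.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the battery life of this phone is great",
    "battery drains too fast on this phone",
    "the hotel staff were friendly and helpful",
    "friendly staff made our hotel stay pleasant",
]

# Build a TF-IDF term-document matrix, then take a rank-2 SVD: each latent
# dimension groups co-occurring terms into a rough, unlabeled "topic".
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0).fit(X)

terms = tfidf.get_feature_names_out()
for i, component in enumerate(lsa.components_):
    top = component.argsort()[-3:][::-1]  # three highest-weighted terms
    print(f"topic {i}:", [terms[t] for t in top])
```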

Luminoso applies unsupervised learning to populate a “multi-dimensional semantic space” and brings the learned patterns to bear on text-analysis tasks. (Luminoso co-founder and CEO Catherine Havasi is co-author of a paper, New Avenues in Opinion Mining and Sentiment Analysis, that covers approaches to the sentiment challenges.)

And deep learning, involving multi-level, hierarchical models? “Deep learning can give us a much richer representation of text than is possible with traditional natural-language processing,… more robust text and vision systems that hold their accuracy when analyzing data far different from what they were trained on,” according to Turner, but it requires massive training data sets, technical innovations, and a lot of affordable computing power.

No Guarantees

Clearly I’m an innovation fan, but like any sensible market watcher, I understand the reality that innovation alone doesn’t ensure success. Frederik Hermann, former VP Global Marketing at Swipp, graciously allowed me to quote him on Swipp’s demise: “Unfortunately Swipp is dead. Corporate buy-in to use our APIs and technology took too long and therefore revenues remained flat, while the consumer app didn’t provide enough immediate value-add unless coupled with a larger corporate partnership for real scale.” Innovation is only one part of a larger market puzzle.


Disclosures and entanglements: Gnip, IBM, Lexalytics, and Converseon are four of the ten March Sentiment Analysis Symposium sponsors, and I engaged the guy who helped build Converseon’s Conversation Miner engine, computational linguist Jason Baldridge of the Univ. of Texas, to teach the Practical Sentiment Analysis tutorial at the symposium. Affectiva co-founder Rosalind Picard of the MIT Media Lab will be keynoting at the symposium, speaking on “Adventures in Emotion Recognition,” and we also have Jacob Whitehill of Emotient and Yuval Mor, CEO of Beyond Verbal, on the agenda. David Rabjohns and Kanjoya’s Moritz Sudhof will also speak at the symposium. Finally, Luminoso, Lexalytics, and AlchemyAPI are three of the eight sponsors of my Text Analytics 2014 market study.

Sentiment/emotion/intent analytics: SAS14 conference March 5-6, 2014

The 2014 Sentiment Analysis Symposium, March 5-6 in New York, opens with two half-day, technical-track workshops –

  • Practical Sentiment Analysis: A tutorial taught by Jason Baldridge of the University of Texas (co-founder of the Apache OpenNLP project).
  • Technology & Innovation: nine presentations by industry & academic researchers including Prof. Gerald Penn of the University of Toronto; Prof. Stephen Pulman, Oxford University; Andy Hickl of ARO Inc.; data scientist Mark Gingrich of SDL; and Dr. Robert Nolker, Analyze.

Visit sentimentsymposium.com/workshops.html for information on these workshops and the two business-track workshops, An Insider’s Guide to Social Media Measurement and The Road to Customer Intelligence: Data, Analytics, Insight.

The March 6 conference opens with keynotes by industry analytics leader Bob E. Hayes, PhD, speaking on Voice of the Customer analytics, and by Prof. Rosalind Picard of the MIT Media Lab, on emotion recognition.

Other speakers include –

  • Accenture social-business expert Chris Boudreaux.
  • Marie Wallace, who built IBM LanguageWare from a small research project into an enterprise technology that underpins IBM products including IBM Watson.
  • Prof. VS Subrahmanian of the Univ. of Maryland, speaking on sentiment & emotion propagation in social networks.
  • John Hoskins from Amazon Mechanical Turk.
  • Catherine Havasi, MIT Media Lab and start-up Luminoso.
  • Prof. Stephen Pulman on Bleeding Edge NLP; and
  • Technical leaders from Emotient (facial-expression recognition) and Beyond Verbal (emotion in speech).

The March 6 conference — agenda at sentimentsymposium.com/agenda.html — includes representatives of other start-ups and established companies, speaking on technologies and business applications that discover value in emotion, intent, and connection, expressed in online, social, and enterprise data sources.

Register by January 25 to save up to $200. Full-time academic and government employees benefit from a 50% discount, and we have a special low rate for full-time students, $100 for the March 6 conference and $50 for each workshop.

Please join us, for the March 5 workshops and the March 6 business-solutions conference. Register by January 25 for special early-registration rates, and do get in touch if you have questions or concerns.