[This article first appeared in the Clarabridge Bridgepoints newsletter, Q2 2008.]
Text analytics has evolved through the same stages as other would-be enterprise solutions. Technologies that started out stove-piped were transformed by market-conscious vendors into a set of functional solutions. Text analytics has since matured further into high-value, business-focused applications, primed to help non-technical users meet the “unstructured” data challenge on an enterprise scale, especially for business problems that hinge on hearing the “voice of the customer.”
Text analytics’ market trajectory has tracked that of complementary business intelligence (BI) solutions designed to extract insight from structured, transactional data. On one side of the structured-unstructured divide, BI has grown from its reporting and spreadsheet origins into increasingly collaborative, accessible, and pervasive solutions that meet corporate data-analysis needs. Where BI once focused exclusively on historical, transactional data, its techniques now extend to customer and market information gathered from a variety of stakeholder touch points. On the other side of the divide, text analytics has progressed from a collection of inaccessible statistical and linguistic tools into a similarly important asset for business analysts, executives, and other end users. Both technologies, conventional BI and text analytics, benefit users with sales and marketing, product development, service and support, legal and compliance, and research responsibilities. Together, they can tap both in-house and “outside the firewall” information sources, with broad data-integration possibilities.
Converged structured-unstructured technologies now deliver solutions to business-focused end users both through familiar BI interfaces and analysis workbenches and behind the scenes, embedded in line-of-business applications, as the task at hand requires. With the added flexibility of “as a service” implementation options (alongside traditional, installed-on-site deployment) and of Web-service application programming interfaces (APIs), leading-edge text-BI vendors can now deliver solutions that scale to meet the varied needs of enterprise users.
Feature convergence to meet enterprise needs is not limited to user interfaces and implementation modes. It extends to the back-end capabilities called for by the variety of user roles and assignments present in any larger organization. Enterprise solutions must provide security and access-control functions, they must accommodate multiple projects, mixed workloads, and collaboration, and they must not impose unreasonable management and administration burdens.
Enterprise readiness further entails handling diverse types and sources of information, often in very significant volumes. In these areas, text analytics can go beyond traditional BI and data warehousing. For a start, while we are now seeing the emergence of the first petabyte-scale data warehouses, the information they capture is typically homogeneous and well formatted, simple compared with the even larger volumes of textual information, both out on the Net and within the enterprise, that may be germane to a given business problem. BlogPulse claims to identify over 78 million blogs, which generate many hundreds of thousands of posts a day; tens of thousands of news sources and online forums publish millions of articles and user postings daily. With the right information-retrieval (IR) front end and sufficiently accurate and rich information-extraction capabilities, this beyond-BI textual information becomes accessible for analysis.
Text analytics looks to taxonomies, dictionaries, and linguistic rules, coupled with statistical techniques, to make sense of documents and their contents and to reduce them to usable form. As the technology has moved out of the lab to underpin business solutions, attention in recent years has shifted beyond tagging and extracting named entities, parts of speech, and basic facts, events, and relationships, toward understanding attitudinal data: the opinions and sentiments that make up the “voice of the customer.” Certain words, terms, and patterns can indicate sentiment polarity (positive, negative, or neutral), while others signal intensity. Sentiment modality, the form in which an opinion is expressed, is also a factor in correct interpretation. Enterprises want to access and exploit all this attitudinal information, which is found not only in blog, forum, and news text but also in survey responses, customer and corporate communications, contact-center notes and transcripts, and CRM systems. Enterprise readiness therefore entails the ability to discern and analyze sentiment in its many forms and modes.
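To make the distinction between polarity words and intensity words concrete, here is a minimal lexicon-based sketch in Python. The word lists, the simple negation rule, and the `score_sentence` function are hypothetical illustrations, not any vendor's method; production systems combine far richer dictionaries and linguistic rules with statistical models, and this sketch ignores modality entirely.

```python
# Hypothetical, tiny lexicons for illustration only.
POLARITY = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}
INTENSIFIERS = {"very": 2.0, "extremely": 3.0, "slightly": 0.5}
NEGATORS = {"not", "never", "no"}


def score_sentence(text: str) -> tuple[str, float]:
    """Return (polarity label, signed score) for one sentence.

    Polarity words set the direction of sentiment; intensifiers scale the
    next polarity word; a negator flips the sign of the next polarity word.
    """
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    score = 0.0
    multiplier = 1.0
    negate = False
    for tok in tokens:
        if tok in INTENSIFIERS:
            multiplier *= INTENSIFIERS[tok]
        elif tok in NEGATORS:
            negate = True
        elif tok in POLARITY:
            value = POLARITY[tok] * multiplier
            score += -value if negate else value
            # Modifiers apply only to the polarity word that follows them.
            multiplier, negate = 1.0, False
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score


if __name__ == "__main__":
    print(score_sentence("The support was very good"))  # ('positive', 2.0)
    print(score_sentence("Not great, slightly poor service."))  # ('negative', -1.5)
```

Keeping polarity and intensity in separate tables mirrors the split described above: one table decides the direction of an opinion, the other only how strongly it is held.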
Taxonomies organize entities and concepts in hierarchical form. They are classification schemes, analogous to the dimensions used in traditional BI data modeling. Pre-built taxonomies covering the entities and concepts important to particular analyses can give enterprise users a head start. But because textual sources typically have a high-dimensional “feature space,” and because predetermined taxonomies don’t always adequately structure the full set of relevant features, users who can customize taxonomies have an analytical edge. Enterprise-scale analysis is therefore further enhanced by integrated taxonomy-management capabilities.
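The analogy between taxonomies and BI dimensions can be illustrated with a roll-up: a mention of a leaf concept also counts toward each of its ancestors, much as store-level sales roll up to region and country. The sketch below assumes a hypothetical product taxonomy (`PARENT`) and a user-editable term dictionary (`TERMS`); both, along with the function names, are invented for illustration.

```python
from collections import Counter

# Hypothetical product taxonomy: child -> parent (None marks the root).
PARENT = {
    "Products": None,
    "Hardware": "Products",
    "Software": "Products",
    "Laptops": "Hardware",
    "Printers": "Hardware",
    "Billing App": "Software",
}

# Hypothetical term-to-node dictionary, the part a user would customize.
TERMS = {"laptop": "Laptops", "notebook": "Laptops", "printer": "Printers", "invoice": "Billing App"}


def ancestors(node):
    """Yield the node itself and every ancestor up to the root."""
    while node is not None:
        yield node
        node = PARENT[node]


def roll_up(documents):
    """Count documents mentioning each taxonomy node, rolling counts up the hierarchy."""
    counts = Counter()
    for doc in documents:
        words = {w.strip(".,!?").lower() for w in doc.split()}
        nodes = {TERMS[w] for w in words if w in TERMS}
        # Gather ancestors into a set so each document counts at most once per node.
        hit = set()
        for n in nodes:
            hit.update(ancestors(n))
        counts.update(hit)
    return counts


if __name__ == "__main__":
    docs = ["My laptop overheats.", "Printer jams daily.", "The invoice was wrong and my notebook froze."]
    counts = roll_up(docs)
    print(counts["Products"], counts["Laptops"])  # 3 2
```

Extending `TERMS`, say with `TERMS["toner"] = "Printers"`, is a small instance of the taxonomy customization that gives users an analytical edge: no code changes, yet new mentions flow into the existing hierarchy.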
Just as pre-built taxonomies offer enterprise text-analytics users a head start, the equivalents of the BI world’s packaged applications and guided analyses also boost enterprise suitability. Information and application templates, together with proven, standardized analytical approaches, make text-analytics tasks much easier for a broad set of less technically inclined end users.
BI integration and convergence, business-focused interfaces, enhanced application management and security, high text-processing capacity, provision of templates, taxonomy management, and sentiment-analysis capabilities: these collectively are hallmarks of enterprise readiness for text-analytics solutions. Forward-looking vendors have implemented these capabilities to transform text analytics from pure technology into a set of functional solutions and from there into high-value, business-focused applications that meet enterprise-scale needs.