Wisdom of the Sports Crowd: Good Odds with Sentibet Sentiment Analysis

Sentiment analysis is hot among financial-market traders, so why not for that other great betting domain, sports?

Serafim Scandalos, an exec at Neurolingo, a Greek natural language processing (NLP) specialist, asked my reactions to Sentibet, currently in beta, which performs text analytics on sports-related tweets. Sentibet looks for Wishes, Feelings, and Predictions in tweets related to specific sports and specific contests/games, producing Sentiment Based Forecast (SBF) for each event.

Serafim gave a lightning talk at an October 2010 conference I organized. Neurolingo’s Mnemosyne platform seemed promising then, and Sentibet seems on the right track now, worth a look. (I do not have any business relationship with Neurolingo although I’m hoping Serafim or a colleague will present Sentibet at my next conference, the May 8 Sentiment Analysis Symposium in New York.)

The Sentibet interface is nice enough although a refocusing on sports contests with available analyses is in order. It’s off-putting to see the “SBF info not available yet!” and “Optional empty text…” messages that now populate many screens. Tell me what you know, not what you don’t.

To get to analyses, try the Most Tweeted and Finished Games areas. (Be patient: Response is far slower than it should be, typically 8 seconds to bring up a page. Something to work on.) Clicking through a particular match…

[Sentibet screen shot]

The Prediction/Feeling/Wish categorization is conceptually quite interesting. It reflects the richness of human emotion, that we may have a variety of different purposes and meanings when we express seemingly similar feelings, attitudes, and opinions. That you hope Manchester United will beat Arsenal doesn’t mean you think a draw won’t be the outcome.

Similarly, I really like the Team A (Home)/Team B (Away)/Draw categorization. It’s goal-aligned and therefore far more useful than the positive/negative/neutral sentiment classification that’s typical of the Twitter-sentiment toys that are far too common.

The best part is that you can view the tweets behind the dual classifications, and you can apply filters to select only tweets in certain categories. I was skeptical of the analysis displayed: 2% of tweets on an Arsenal-Manchester United match Wished for a draw?! (Who hopes for a draw?) Well, Sentibet got it right. Witness:

@darrenpage1983 revenge weekend spurs bt man city or atleast don’t get stuffed 5-1 arsenal man u would like to c a draw there as never want the scum to win

@Slickon_CFC Good Morning guys. Its super sunday today and i’m hoping for a City win and a draw between United and Arsenal. #Top3

@annemacey02 @Betherz_BCFC I want my spurs to thrash manc and a manu arsenal draw :p

Note a few points:

  • Accurate co-reference that correctly sees “Manchester United” – “man u” – “manu” as one and the same.
  • Ability to distinguish two different matches in a single tweet.

This is strong text analytics.
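
To make the co-reference point concrete, here is a minimal Python sketch of alias normalization. The alias table and the function name are my own illustrations, limited to the name variants quoted above; I have no knowledge of how Neurolingo's Mnemosyne platform actually resolves team names, and a real system would need a far larger lexicon plus disambiguation.

```python
import re

# Hypothetical alias table, built only from the variants seen in the tweets above.
TEAM_ALIASES = {
    "manchester united": "Manchester United",
    "man u": "Manchester United",
    "manu": "Manchester United",
    "arsenal": "Arsenal",
}

# Try longer aliases first so a long name is never cut short by a shorter one.
_ALIAS_PATTERN = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, TEAM_ALIASES), key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def teams_mentioned(tweet):
    """Return canonical team names in order of first mention, without repeats."""
    seen = []
    for match in _ALIAS_PATTERN.finditer(tweet):
        team = TEAM_ALIASES[match.group(1).lower()]
        if team not in seen:
            seen.append(team)
    return seen


print(teams_mentioned("I want my spurs to thrash manc and a manu arsenal draw :p"))
```

A tweet can then be attached to a match only when both of that match's teams appear in it, which is one simple way to keep two fixtures in a single tweet apart.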

It’s a good sign when an app leaves you hungry for more, for more data, for additional analysis. Sentibet does. Within the Finished Games section, it would be interesting to be able to correlate scores and forecasts, to understand the correlation between tweet aggregates and match outcomes. How well do Sentibet forecasts predict final scores? It’s a Neurolingo work-in-progress as explained on the Sentibet Post Game Analysis page.

It would be even cooler to connect forecasts and betting lines, that is, bookmakers’ odds. As the expression goes, “The proof of the pudding is in the eating.” If you consider sports betting as a business, the payoff isn’t in scores predicted, it’s in your gambling winnings. Financial-market traders exploit the price-expectation gaps, and so do winning gamblers. Maybe another to-do for Neurolingo?
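
As a rough illustration of what a price-expectation gap could look like, the sketch below converts decimal bookmaker odds into implied probabilities and compares them with a crowd forecast. Everything here is an assumption made for illustration: the function names, the made-up odds, and the made-up forecast shares. Sentibet does not, as far as I know, publish anything in this form.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities, scaling out the bookmaker's margin."""
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    total = sum(raw.values())  # exceeds 1.0 by the bookmaker's margin
    return {outcome: p / total for outcome, p in raw.items()}


def expectation_gaps(forecast, decimal_odds):
    """Crowd forecast share minus bookmaker-implied probability, per outcome."""
    implied = implied_probabilities(decimal_odds)
    return {outcome: forecast[outcome] - implied[outcome] for outcome in decimal_odds}


# Made-up numbers, for illustration only.
odds = {"home": 1.9, "draw": 3.8, "away": 3.8}
forecast = {"home": 0.60, "draw": 0.15, "away": 0.25}

gaps = expectation_gaps(forecast, odds)
for outcome, gap in gaps.items():
    print(f"{outcome}: {gap:+.2f}")
```

A positive gap means the crowd rates an outcome as more likely than the bookmaker's price implies. Whether such gaps are profitable is exactly the open question; the sketch only shows how to measure them.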

Serafim Scandalos tells me Neurolingo is seeking financing to accelerate service development, also that Sentibet is simply one application of the underlying Mnemosyne platform, meant to demonstrate possibilities to potential customers and investors. Judging from Sentibet, Neurolingo seems like a good bet.

Posted in natural language processing, sentiment analysis, text analytics | 2 Comments

Stephen Arnold Blows a Gasket

Stephen Arnold, a provider of “news and information… about search and content processing,” has his hatchet out in Temis, Spammy PR, and Quite Silly Assertions.

Stephen Arnold, “Thy wit is a very bitter sweeting; it is a most sharp sauce,” well served in to the goose (as Arnold characterizes himself) himself. Steve, if you want to be paid for your work, ask up front. The payback you offer Temis, a text-mining solution vendor, says far more about you than it does about the company you target.

In the words of a marketing exec at a semantic-analysis company, not Temis, “What a super childish article.”

[Image: Sectumsempra -- For Enemies]

Arnold’s article is no dissection of the fine points of competitive market positioning. It deals broad cutting strokes (“Sectumsempra -- For Enemies”?) in an attempt to hack away, well, much more than just the credibility of a vendor’s public relations. Arnold brackets his article introduction of Temis (in a lede buried four paragraphs in, as deeply as his hatchet blade) around a Jean Genet quotation,

“I recognize in thieves, traitors, and murderers, in the ruthless and the cunning, a deep beauty – a sunken beauty.” (The Thief’s Journal)

WTF? Temis is (likened to) thieves, traitors, and murderers!?

What spurred this attack, which also devotes several paragraphs to an attempted take-down of Temis’s client, the American Society for Microbiology?

Arnold received a press release “attributed to an individual identified as Martine Fallon [sic]” that he characterizes as spam.

Arnold writes, “I considered that Martine Fallon may be a ruse like Betty Crocker.”

WTF redux!

For the record, it’s Martine Falhon. She is marketing and communications manager at Temis. Her full rap sheet is publicly accessible on LinkedIn. She’s a quite pleasant person, actually. I met her when I keynoted Temis’s 2009 user conference (which, by way of disclosure, was a paid gig, and Temis was one of seven sponsors of my Text Analytics 2009 market study).

Oy. A vendor sends a release to an analyst in its space — with an unsubscribe option that Arnold was able to use! — and Stephen Arnold blows a gasket. Yes, Arnold states he “previously asked the firm’s public relations expert, who seems to be more inclined to spam than research, to cease sending me meaningless spammy news releases. My request was ignored.”

“Bad, bad, bad public-relations expert,” I say. “Get back to your research. They need your help in the labs.”

Everything copacetic now? Well, not quite. There’s more to Arnold’s complaint. His complaint’s true core is not spam PR. It is here:

“What fascinated me is that Temis asked me to facilitate an introduction for them to a $1.2 billion company’s president. I did this and moved on. I assumed in the manner of French cultural norms that I would be rewarded with entrecote. Wrong. My reward has been spam.”

It’s about money. Temis didn’t pay Stephen Arnold for freely offered advice. Temis didn’t pay Stephen Arnold for freely offered advice so Temis gets slammed, for supposed spam, for claims Arnold questions, for being French. Yes, for being French. Arnold writes,

“I quite like Expert System SA in Bologna, Italy, and Bitext in Madrid, Spain. Great food, interesting culture, and -– nota bene –- no spam. One has to get the semantics correct. No spam from Italy. No spam from Spain. Hmmm. There’s a cultural message perhaps?”

There is some, slim attempt at substance in Arnold’s article. Arnold questions Temis’s claim to being “the leading provider of Semantic Content Enrichment solutions for the Enterprise.” Then he writes, “Leading? Semantic content enrichment. What’s that?” and asks, “What about outfits like Access Innovations, Concept Searching, Expert System SA, Smartlogic, and more than 75 other firms in the semantic space.”

Of course, it’s legit to question, but where’s the logic in offering specific other providers as contenders when you claim not to know what they’re contending for?

In case you doubted: Arnold’s list of contenders shows that he’s not completely clueless about “semantic content enrichment.” That said, of the companies he listed, only Expert System is squarely in that biz.

Myself, I describe the term functionally in a sponsored newsletter article I wrote for OpenText (a former consulting client): “Semantic content enrichment adds value to online information by tagging topics and providing context-sensitive links.” Content enrichment is a significant capability for digital-publishing platforms, for instance, MarkLogic’s. MarkLogic’s Open Enrichment Framework came out in 2008, integrating text-analysis products from companies that include both Temis and Access Innovations.

I (and OpenText, MarkLogic, and Temis) don’t stand alone in seeing enterprise value in semantic content enrichment. I’ll further point you to Barry Graubart, who cites a number of user organizations in a blog entry about a Software & Information Industry Association seminar last June (which I attended), Semantic Technology Driving Real Revenue for Publishers.

In the end, my only quibble, as an industry observer, with Temis’s leadership claim is that I see the company as “a” rather than “the” leading semantic-content-enrichment provider. My issue with Stephen Arnold’s Temis slur, while I don’t have a dog in the fight, is far more significant. I find Arnold’s vindictive payback quite distasteful and even destructive. It calls out for a response. I hope I’ve adequately answered the challenge.

Posted in marketing, semantics, text analytics | 12 Comments

What are the most powerful open-source sentiment-analysis tools?

I took a stab at a Quora question, What are the most powerful open-source sentiment-analysis tools?. Here’s my response:

I know of no open-source (software) tools dedicated to sentiment analysis. Instead, a variety of open-source text-analytics tools — natural-language processing for information extraction and classification — can be applied for sentiment analysis. They include –

- Python NLTK (Natural Language Toolkit), http://www.nltk.org/, but see also http://text-processing.com/demo/sentiment/

- R, TM (text mining) module, http://cran.r-project.org/web/packages/tm/index.html, including tm.plugin.sentiment.

- RapidMiner, http://rapid-i.com/content/view/184/196/.

- GATE, the General Architecture for Text Engineering, http://gate.ac.uk/sentiment/.

I’m sure you can also find UIMA-plug-in annotators for sentiment — Apache UIMA is the Unstructured Information Management Architecture, http://uima.apache.org/ — also sentiment classifiers for the WEKA data-mining workbench, http://www.cs.waikato.ac.nz/ml/weka/. See http://www.unal.edu.co/diracad/einternacional/Weka.pdf for one example.

I bet someone’s doing sentiment with the Stanford NLP tools, http://www-nlp.stanford.edu/software/, although my understanding is the maximum-entropy classification isn’t the best approach for sentiment. I’m no scientist so I won’t go into this.

Then there’s LingPipe, which can be characterized as pseudo-open source. See http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html.

Powerful, I can’t say. Where machine learning is involved, a lot will depend on your training set.
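
To show how much the training set matters, here is a toy naive Bayes classifier in plain Python, trained twice on different invented examples. It is a sketch under my own assumptions (whitespace tokenization, add-one smoothing, four made-up training texts) and is not the method used by any of the tools listed above.

```python
import math
from collections import Counter


def train(examples):
    """examples: list of (text, label). Returns (per-label word counts, label counts)."""
    word_counts, label_counts = {}, Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(model, text):
    """Pick the label with the highest log-probability, using add-one smoothing."""
    word_counts, label_counts = model
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Two invented training sets in which "sick" carries opposite sentiment.
product_reviews = [("great phone love it", "pos"), ("sick of this broken phone", "neg")]
slang_tweets = [("that goal was sick", "pos"), ("broken promises again", "neg")]

print(classify(train(product_reviews), "sick goal"))  # neg
print(classify(train(slang_tweets), "sick goal"))     # pos
```

The same two-word input gets opposite labels depending on which examples the classifier saw, which is the practical meaning of "a lot will depend on your training set."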

Note that the tools above work on textual sources. There may be open-source tools out there for information extraction from non-textual, sentiment-bearing sources such as speech (with the outputs fed into a classification engine such as some of the above), but I haven’t looked into them. If you know of any, or have additions for my list above, please send me a note (grimes(at)altaplana.com).


Want to catch up… or stay ahead? Check out the Sentiment Analysis Symposium, May 8, 2012 in New York; also the May 7 Practical Sentiment Analysis tutorial, to be presented by Prof. Bing Liu.

Posted in open source, sentiment analysis, text analytics | 1 Comment

Text Analytics in 2012

Will 2012 be The Year of Text Analytics?

But wait. Wasn’t 2011 — weren’t 2010 and a few years before that — for those in the know? I think so, and I think 2012 will keep up the pace, seeing the text technologies and solutions adopted, directly and indirectly (embedded in applications) by significantly more users than ever before.

The question originated with my friend Tom Anderson, who collected and published responses from a variety of industry figures to his Next Generation Market Research blog. I was late in answering myself. Tom will add my response, but I’ll also post directly here:

The easiest prediction to make with confidence about 2012 text analytics is continued strong market growth — my estimate is 25% on a base that likely topped $1 billion globally in 2011 — as uptake expands throughout the enterprise and as the technology becomes a must-have value-booster for broad-market survey, social/media analytics, and CRM platforms.

With less certainty: We may look back on 2012 as the Year of Question Answering, of the deployment of IBM Watson/Apple Siri-type technologies to respond to enterprise and consumer information-access needs ranging from customer (self-)service to medical diagnosis, as a semanticized replacement for tired old search.

And there are signs, from market leaders such as SAP and IBM and from innovative start-ups alike, that 2012 will be the year of effective data fusion across database and text (a.k.a. “unstructured”) sources. Business can’t, won’t, wait for prescriptivist, rigid Semantic Web approaches but is instead applying analytics to the job, to discover the connections that make for truly rich data. You need analytics to operate in real time, to keep up with the data torrent. Many of those efforts will incorporate information mined from audio (speech), image, and video sources as an evolution from text analytics to content analytics picks up speed.

Check back again a year from now and we’ll see how 2012 panned out!

Posted in text analytics | 1 Comment

From Sentiment Analysis to Enterprise Applications

If your perception of sentiment analysis was shaped by Twitter-sentiment toys, it’s time for a relook. These “toys” simplistically score tweets positive/negative/neutral based solely on keyword presence without regard for context. Semantically rooted sentiment technologies do better by getting at contextual word sense and by discerning sentiment at “feature” level, and they handle more than just social-media analyses. Online/on-social measurement and engagement are important, but businesses interact with customers and the market and collect feedback via many channels, for instance, contact centers, e-mail, and surveys. Ability to handle these diverse sources, and to integrate with enterprise systems that capture customer transactions and profiles, is an essential ingredient of enterprise-scale sentiment analysis.
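
Here is a minimal sketch of the keyword-presence scoring that the "toys" rely on, to show why ignoring context goes wrong. The word lists and function name are invented for illustration and are not taken from any particular product.

```python
# Invented keyword lists; real lexicons are larger but have the same blind spot.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def keyword_score(text):
    """Score by keyword presence alone: +1 per positive word, -1 per negative word."""
    words = [word.strip(".,!?'\"").lower() for word in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(keyword_score("The service was not good"))  # positive, although the writer is unhappy
print(keyword_score("Not bad at all!"))           # negative, although the writer is pleased
```

Both results are wrong because the scorer never looks at the negation next to the keyword, which is the kind of context that semantically rooted approaches try to capture.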

How does this bit of theory play out in practice, among people engaged in real-world customer relationship management (CRM), marketing and market research, and business intelligence (BI)?  I polled three industry figures to find out: Banafsheh Ghassemi, VP, Marketing – eCRM & Customer Experience (CE) at The American Red Cross; Marshall Toplansky, president of “mass opinion business intelligence” vendor WiseWindow; and Next-Generation Market Research guru Tom Anderson, who heads Anderson Analytics. I posed them three questions, exploring the path to enterprise-scale sentiment analysis.

My first question gets at a basic issue: essentially, is sentiment analysis worthwhile? The responses address ways sentiment analysis complements established methods and channels, in time frame and as a cross-check.

Seth> What has your sentiment-analysis experience been like?  Have you gained new customer or market insights, and have you been able to do anything new, anything you couldn’t have done without sentiment analysis?

Marshall> “The big revelation to us has been the volatile nature of both consumer sentiment and the business metrics they are indicators of.  When you use traditional marketing research to understand sentiment, you are dealing with long time frames.  Research does a good job of identifying long-term trends.  But, people are living increasingly in the short-term.  They capitalize on market moves quickly and have a set of short-term tactics.  Mass consumer sentiment from online sources, unlike marketing research, has been able to identify these tactics and measure results in the kind of fast timeframes contemporary businesses require.  You could never have even designed a survey in the time it takes to get in and out of moves indicated by sentiment.”

Banafsheh> “[Let me tell you about our] in-kind donation scenario. In recent years, due to harder economic times, people have shown more interest in contributing through unsolicited in-kind donations rather than money.  Due to cost structures associated with storage, transportation, and delivery of such contributions we are unable to accept such donations.  We have seen some negative emotional reaction to this on our social networks.  However, when we look for context in other channels where the same interest is voiced, such as calls to our public inquiry line, we have seen that we could do better to proactively communicate and educate the public in why we don’t accept these donations.  The proactive awareness could very well minimize, if not eliminate the negative sentiments that come with a preconceived expectation.”

Tom> “There’s no doubt that sentiment analysis has been useful on several projects and that we have gained market insights thanks to being able to segment and code data based on various sentiment approaches. Overall usefulness really depends on the data though, which is different on a case by case basis.

“Most of our data contains a lot of fields other than unstructured data, so we’ve been lucky to have a lot of options from the beginning. Sentiment becomes a lot more important when you are looking at data that is less rich, like Twitter which more or less is just 140 characters with a date stamp.”


On to Q2, how sentiment relates to other data elements. My one-sentence summary of the three answers is, There’s clear correlation, but you don’t want to make too much of it. But read the responses for yourself.

Seth> How are your organization and clients matching online or social sentiment or enterprise feedback with information from other data sources?

Banafsheh> “I wouldn’t say we ‘match’ the information, but look for context in our other more structured (e.g., CRM) and richer data sources (e.g., email which is extremely rich). [The in-kind donation scenario illustrates this.]”

Marshall> “We have seen a good deal of client interest in correlating online consumer sentiment to a number of important business metrics.  For instance, in one case, correlation analysis found that changes in online sentiment relating to product problems were a leading indicator of call center volume.  In another, changes in weekly sentiment related to product quality were found to lead weekly stock prices.  And, in another, sentiment related to a leading musical group was a strong indicator of changes in sales of music.

“To our minds, this is not surprising.  It seems obvious to us (with plenty of hindsight) that when you have hundreds of thousands of people expressing their preferences, their actions will tend to follow.”
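
A leading-indicator claim of this kind is usually checked with a lagged correlation: sentiment in week t against the business metric in week t + lag. The sketch below is an assumption about how one might run that check, not WiseWindow's method, and both weekly series are invented so that call volume echoes the previous week's negative mentions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def lagged_correlation(sentiment, metric, lag):
    """Correlate sentiment at week t with the metric at week t + lag."""
    if lag == 0:
        return pearson(sentiment, metric)
    return pearson(sentiment[:-lag], metric[lag:])


# Invented weekly series, for illustration only.
negative_mentions = [10, 30, 20, 50, 40, 60, 30, 20]
call_volume = [105, 110, 130, 120, 150, 140, 160, 130]

same_week = lagged_correlation(negative_mentions, call_volume, 0)
one_week_lead = lagged_correlation(negative_mentions, call_volume, 1)
print(round(same_week, 2), round(one_week_lead, 2))
```

If the correlation is clearly stronger at a positive lag than at lag zero, sentiment is behaving like a leading indicator in that data. It remains a correlation, so it is worth not making too much of it.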

Tom> “It depends on the project, there have been many so we’ve probably done just about everything at least once. We’ve merged LinkedIn data with survey data, CRM data with social media and survey data, and call center data with matching operator notes with field technicians notes, to name just a few.

“I think though that there is a misconception out there among some that it’s a good idea to pipe all text data into one source and make comparisons across data. Having worked with quantitative data for many years I can tell you that often times those cross comparisons sound better in theory than in reality. I think it’s far more important to think about the various resources you have, identify the most important ones and look at them individually. After that is done, you will have a better understanding of what can be gained by merging the data.”


My last question was intended to be practical and forward looking –

Seth> Do you have any guidance for folks who are new to sentiment analysis or who have been using sentiment technologies only for social-media analysis?

Banafsheh> “First, I would say using sentiment technologies with social data is valuable in any Voice of the Customer (VOC) program toolkit to the extent that it is not the only data source used. Otherwise, the insights produced will be very narrow and limited, as they would be if any one of the other data sources were used alone. If you examine each one of your touchpoints and channels where free format VOC is captured, such as surveys, call-center notes, customer e-mails, etc., in isolation, you will very likely find sentiment patterns in each that are more slanted in one direction than another.  For example, you may find your surveys show more positive sentiment, your letters may show more neutral, and your Twitter feed more negative, and so forth.  So if you took only one source you would draw incomplete conclusions.

“Second, these tools are still fairly challenged when it comes to social ‘parlance,’ the abbreviations, excessive snark, emoticons and so forth.  So be aware of those limitations and recognize that the outputs will still require human intervention and validation of the results.  Recently we compared coding using SPSS’ text analytics with the coding manually provided by Red Cross volunteers and found poor agreement. The text analysis software coded 26% of the positive comments as positive. The software was unable to assign a sentiment to more than half.  The 21% of positive comments coded as negative by the SPSS software were generally related to the use of words describing the seriousness of the hurricane or the extent of the damage.  Examples: ‘hit very hard by Irene,’ ‘Many blood drives were forced to close,’ or ‘Instead of whining about the dud, how about joining the Red Cross.’ There was modestly more success with negative comments, where 53% were coded correctly by the software.”
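
A comparison like this one comes down to tallying, for each human-assigned label, how the software coded the same comments. The sketch below shows one way to do that tally. The function name and the six sample pairs are invented for illustration; they are not the Red Cross data and this is not a description of how that study was run.

```python
from collections import Counter


def machine_coding_by_human_label(pairs):
    """pairs: (human_label, machine_label) per comment.

    Returns, for each human label, the share of comments per machine label.
    """
    by_human = {}
    for human, machine in pairs:
        by_human.setdefault(human, Counter())[machine] += 1
    return {
        human: {machine: n / sum(counts.values()) for machine, n in counts.items()}
        for human, counts in by_human.items()
    }


# Invented sample; "unassigned" means the software gave no sentiment.
sample = [
    ("positive", "positive"),
    ("positive", "negative"),
    ("positive", "unassigned"),
    ("positive", "unassigned"),
    ("negative", "negative"),
    ("negative", "positive"),
]

for human, shares in machine_coding_by_human_label(sample).items():
    print(human, shares)
```

Reading across a row shows both outright errors and the share the software could not code at all, which a single overall accuracy figure would hide.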

Marshall> “My advice is about WHO should be adopting the use of sentiment technologies.  Give it to the people that run real metrics in the company.  If you let market research people control this tool, you will move too slowly.  If you give it to the social media engagement group, you will get no strategic value from it.  Give it to the operations people, who have to create better forecasts and design real-time key performance indicators for the business.  This is where the real strategic and tactical value lies for sentiment analysis.”

Tom> “In regard to sentiment generally, the tendency is to compare machine coded sentiment to human coding. The problem is most of us are so far removed from human coding that we forget just how inaccurate it can be. We also seem to forget that a lot of data really is neither negative nor positive.

“Personally I don’t like to compare human coding to machine coding; we find that we can do so much more with machine coded data. If companies are looking at text analytics and sentiment analysis as just a way to cut or eliminate human coding costs, they’re not understanding the true benefits.”


I’ll split out part of Tom Anderson’s answer, regarding social media analysis, which he characterizes as a “pet peeve.” Tom’s thoughts on this particular topic –

Tom> “Everyone is talking about social media analysis. In reality though 99% of what’s being called social media analysis is just Twitter or Twitter plus blog data. This represents only about 10% of the population, and for those of us who blog or are on Twitter, we know just how ‘special’ this population and messages we propagate are. A lot of it is definitely rather promotional in nature.

“I’m not saying Twitter and blog analysis has no value. It’s good to understand what drives online buzz and consequently some of the Search Engine Optimization (SEO) efforts. However, until Facebook tears down their walled garden (I think it will happen soon), we’re not seeing what most regular people are saying. Until then more focused research among brand enthusiasts on special discussion boards will probably remain most useful. Sadly, relatively little of this is actually being done. Those projects we’ve done in that area have been rather successful, and sentiment analysis certainly was a critical component in all of them.”


I had posed my questions to Marshall Toplansky, Banafsheh Ghassemi, and Tom Anderson in connection to the November 9, 2011 Sentiment Analysis Symposium. Banafsheh was a panelist, Marshall gave a lightning talk and his company was a sponsor, and Tom would’ve attended if he hadn’t had a schedule conflict. The conference went really well. See for yourself: Videos are online at sentimentsymposium.com/SS2011w/presentations.html. And please do revisit the symposium site in mid-January, when information on our May symposium, the 3rd New York symposium, should be online.

Posted in CRM, marketing, sentiment analysis, social media, text analytics | 1 Comment

How I Estimate (Social/Sentiment/Text Analytics) Market Size

Business loves market-size estimates. They quantify opportunity, measured as the gap between an existing and an addressable market, with growth rates providing a reality check. They help an individual solution provider understand how it stacks up against the competition, and they guide investors in deciding where to place funds. Being an obliging consultant and industry analyst, I do my best to compute good estimates for certain technology sectors I cover, for instance, text analytics.  My methods are quite simple, actually. (The real smart-work is in the data collection.) Allow me to clue you in on how I estimate social/sentiment/text analytics market size.

First, why those particular analytical software technologies, and why is an independent analyst the best (or even the only) source of market-size figures?

Text analytics applies natural-language processing (NLP) to extract information from text, bringing text-sourced business intelligence to a broad array of applications. Sentiment analysis is a particular NLP application, although it can also be tackled via human analysis (a.k.a. “reading” and “listening”) including via crowd-sourcing. Sentiment analysis is about finding and exploiting subjective information in content: Attitudes, opinions, emotions, and intent. Social analytics, my number 3 area, applies text and sentiment analysis to content and network analysis to understand interconnectedness and message flow. These three related analytics species are often most interesting when they’re linked to analysis of enterprise transactional, operational, and profile data, leavened with geospatial and behavioral analyses.

It’s a great time to be in these fields, which are experiencing strong, steady market growth, reflecting the technologies’ ability to make sense of online, social, and enterprise ‘unstructured’ sources. Yet the dollar/euro/renminbi/rupee value realized by vendors lags far, far behind that of mainstream enterprise applications. In part, the gap stems from market maturity. While it took CRM a couple of decades to build to 2010’s $16.5 billion in applications revenue and BI and analytics 30+ years to get to $10.5 billion, I’d guess-date first significant commercial uses back 12 years for text analytics, 6 for sentiment analysis, and 3 for social analytics. Further, the newer, more specialized analytical technologies aren’t widely built into everyday business operations despite their ability to transform business-stakeholder interactions.

These are technologies that are broadly applied though too new or too narrow to have drawn the full attention of the big-firm analysts. Enterprise use still seems typically to be at an individual, small-group, or departmental level, or it forms only a small part of much larger applications such as e-discovery and customer experience management.  On the one hand, limited-scope enterprise use flies under the radar of enterprise CIOs and IT executives who buy expensive consulting and reports from the likes of Forrester, Gartner, and IDC, so that analysts at those firms, were they inclined to run market-size numbers (Forrester, at least, isn’t in that business), don’t have a business driver.  On the other hand, solution-focused analysts, concerned with e-discovery, market research, and CEM, very justifiably don’t typically get into the nitty-gritty of technology details.

Enough of the why. How do I compute market-size estimates?

My approach starts by identifying companies wholly or partly in the space. For social analytics, “partly in the space” includes Social CRM/engagement, competitive/market intelligence, and customer experience vendors. Similarly, text analytics is a contributor to e-discovery, publishing, intelligence, and other applications.

I then find or estimate revenues and growth rates. Sometimes it’s easy: A company is publicly traded or for some other reason is required to release figures. French companies, for instance, must release revenue figures, which can be found online via sites such as Bilan Gratuits. Companies that do business with the U.S. government have to provide figures, which become part of public records that are searchable at sites such as USAspending.gov. Data on fast-growing companies is exposed when they apply for Inc 5000 recognition. A tip: You can learn a lot by simply, directly asking the CEO for data. Sometimes she’ll tell you, about her company and about competitors. A promise to keep individual data points confidential — to release only aggregate figures — can help.

When figures aren’t current, an estimate of past years’ growth, for an individual company and for a sector, can be used to project them forward.
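
Projecting a stale figure forward is plain compound growth. A minimal sketch, with an invented revenue figure and a function name of my own choosing; the 25% rate is used only because it is the kind of sector growth estimate discussed on this blog:

```python
def project_revenue(last_known, growth_rate, years):
    """Compound a last-known revenue figure forward at a constant annual growth rate."""
    return last_known * (1 + growth_rate) ** years


# Invented example: $8.0 million reported two years ago, 25% assumed annual growth.
print(project_revenue(8.0, 0.25, 2))  # 12.5 (millions)
```

The assumption of a constant rate is the weak point; for a young company, one year's growth can differ sharply from the next.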

When I can get only whole-company figures for a vendor that’s in multiple markets — large companies such as Autonomy, IBM, SAP, and SAS are examples — I allocate a portion of overall revenue to the narrower space.

Where necessary, I create or adjust estimates based on elements such as a) typical/average deal size, multiplied by the number of customers, b) headcount multiplied by an estimate of revenue per employee for a company at the subject-company’s market stage, and c) investment, factoring in the proportion of ownership that a given investment likely bought, dividing by a typical valuation-to-revenue multiple.
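
The three adjustments above can be sketched as three small functions whose results are then compared. All inputs below are invented, and the function and parameter names are mine; the point is only the arithmetic behind each heuristic.

```python
def from_deals(avg_deal_size, customers):
    """(a) Typical deal size multiplied by the number of customers."""
    return avg_deal_size * customers


def from_headcount(employees, revenue_per_employee):
    """(b) Headcount multiplied by revenue per employee for the company's stage."""
    return employees * revenue_per_employee


def from_investment(amount_invested, ownership_share, valuation_to_revenue):
    """(c) Investment divided by the share it bought gives an implied valuation;
    dividing that by a valuation-to-revenue multiple gives implied revenue."""
    implied_valuation = amount_invested / ownership_share
    return implied_valuation / valuation_to_revenue


# Invented inputs for a hypothetical vendor.
estimates = [
    from_deals(50_000, 40),                  # 2,000,000
    from_headcount(15, 150_000),             # 2,250,000
    from_investment(5_000_000, 0.25, 8),     # 2,500,000
]
low, high = min(estimates), max(estimates)
print(f"estimated revenue between {low:,.0f} and {high:,.0f}")
```

When the three heuristics land close together, as in this made-up case, the spread between the lowest and highest gives a natural interval to report around a released figure.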

A fair amount of guesswork is involved, also many approximations, and boundary setting is part of the game. I bin the ankle-biters (low-revenue start-ups, many of which are flying under the radar) into a single guesstimate. I exclude the value of academic, government, and industrial research from my estimates. Work doesn’t contribute to a market valuation unless something, a product or service, is sold. I also exclude the sometimes-substantial value of work done for in-house use, for instance text analytics done by companies such as Thomson Reuters or Reed Elsevier in the course of creating information products.

I end up with estimates that I consider accurate enough to release to clients or in an article, often with a disclaimer that my work should be taken as inexact and that actual values likely lie in a defined interval around my released figures. The aim is to provide useful business guidance, to support confident decision making in fast-moving, dynamic technology markets.

Posted in sentiment analysis, social media, text analytics | Leave a comment

Entry-level Choices for Concept/Topic Extraction and Sentiment

I received an inquiry –

“I’m doing some industry research, and I’d like to run some documents through a text analytics program to understand sentiment and key concepts/topics present in the documents. Probably a ‘few hundred’ documents over all.

“Can you recommend something simple and low cost that I could try? Perhaps something in the open source community?”

Here’s my response –

“I’m guessing you want to do minimal or no training, that is, you want a tool that will discover concepts and topics on its own? You could try Wordstat from Provalis Research (http://www.provalisresearch.com/wordstat/Wordstat.html) or Leximancer (https://leximancer.com/). Neither does sentiment out of the box although in principle, sentiment is just a classification problem that either would be able to handle. If you’re willing to do some training: Provalis has a tool called QDA Miner that supports coding and is linked to Wordstat.

“Your best bet, however, may be RapidMiner, which is free, open source. See http://rapid-i.com/content/view/184/196/ .

“I hope this helps. Please let me know what tool you chose and how it worked out.”

What (else) would YOU recommend?

Seth

Posted in Uncategorized | Leave a comment

My New Approach to Blogging

It has been almost a year since UBM TechWeb folded Intelligent Enterprise into InformationWeek, a larger-circulation, better-resourced publication.  Yet I’ve written only four 2011 articles for IWK, the last published May 12.  IWK’s sense of itself — audience, topics, voice — is different from IE’s, and I no longer have the freedom to post whatever I feel will appeal to the BI/analytics audience.  My work is now edited; I used to be able to post directly. Editing may lead to clearer writing, and being read by a larger audience is great, but the need to respond to an editor’s requests imposes a significant time burden.

It’s time to accept the new (to me) rules and get back into the game.  I know, finally, how to do that in a way that nonetheless allows me to continue to work on my own terms.  A free-standing blog is the answer, this blog.  I will post here and also make entries available for selective republication by IWK or, when they’re not an IWK fit, other platforms.

This initial blog entry is my way of closing the door, yet my first step will include one last look back, an explanation why I felt cast adrift by the demise of Intelligent Enterprise.

I’d long written for IE, also occasionally for IWK and a variety of other outlets. I mentioned above the benefits of writing for InformationWeek, also the loss of freedom.  That IWK showed little regard for old Intelligent Enterprise content didn’t help. The old intelligententerprise.com URLs stopped working, forcing me and other authors to update our publication lists and, far worse, rendering useless Web-published links to our work. IWK couldn’t be bothered to devote a few programmer-days to mapping the old URLs to new ones?  An insult, frankly.  IWK did migrate old content, however with no regard for layout.  Check out an example.

These complaints made, let’s move on.  I have a variety of topics to write on.  It’s time to get back to article writing.

P.S. I’ve started a second blog for a part-time gig of mine.  Check it out at sethgrimes.blogspot.com.

Posted in Uncategorized | Leave a comment