2013 1H Conferences in Social Media, BI, Big Data, and Sentiment Applications

As a sometime conference organizer, I need to track events that will compete for the attention of my target audience. The competition consists of conferences with international draw, and of focused conferences in the New York region (where my next Sentiment Analysis Symposium will take place, on May 7-8, 2013), in areas related to or applying sentiment analysis or more general business intelligence technologies.

Why not share what I’ve found with others? I’ll be attending a number of the following events myself, and those who find sufficient value in my event will attend it regardless of the competition. If you can think of a conference I’ve missed that belongs on my list – that is, one with significant market presence, beyond local workshops – please drop me a line and I’ll track it.

So here goes — six months of social media / market research / customer experience / BI conferences — late January to July, 2013:

Reblogged: Is It Time For NoETL?

(TechWeb’s Intelligent Enterprise published my “Is It Time For NoETL?” article on Wednesday, March 24, 2010. IE was subsequently rolled into InformationWeek and TechWeb abandoned much of its old content, including my article.)

I’ve been bemused by NoSQL, the movement that propounds database-management diversity with the very valid claim that a one-size-fits-all relational approach is a poor match for emerging, demanding data challenges. Didn’t we all know that relational databases, based on tables and joins, aren’t always best? Hadn’t the issue been the lack of usable, reliable, enterprise-worthy alternatives? Similarly, haven’t we long understood that wiring up extract-transform-load (ETL) is laborious — all those adapters and rules and the need for hand-matching — even if necessary given the perceived need to gather, cleanse, and integrate diverse BI data sources? Is that preparatory data work still essential? Or is it now time for a NoETL movement, reflecting a new world of liberated, semantically enriched, analysis-ready, mashable data?

NoSQL

The “SQL” of NoSQL is Structured Query Language, which has been closely associated with relational databases since the 1970s and the early days of the RDBMS. With Oracle’s and IBM’s support, SQL vanquished superior alternatives such as Ingres’s Quel. SQL is an easy target for criticism, both on its own and as a proxy for relational systems.

SQL is stateless; to work around that limitation, vendors have wrapped it in diverse, incompatible procedural languages that support multi-step data processes. SQL’s set-oriented approach creates a data-handling burden for application programmers, so we have cursors, a row-/record-oriented retrieval kludge. Correlated subqueries are a usability nightmare, and the check-list demand for ACID compliance — transactional atomicity, consistency, isolation, durability — is simply overhead overkill for analytical applications.
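To see why correlated subqueries draw that complaint, here is a minimal sketch in Python over SQLite, with an invented table and an invented question: find everyone paid above their department’s average. The inner query is logically re-evaluated for each outer row, correlated on the department, which is compact but easy to misread and to get wrong.

    import sqlite3

    # Hypothetical table and data, invented for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE emp (name TEXT, dept TEXT, salary REAL);
        INSERT INTO emp VALUES
            ('Ann', 'sales', 90000), ('Bob', 'sales', 60000),
            ('Cat', 'ops',   70000), ('Dan', 'ops',   50000);
    """)

    # The subquery is correlated on e.dept, so it is conceptually
    # re-run for every row the outer query considers.
    rows = conn.execute("""
        SELECT e.name, e.salary
        FROM emp e
        WHERE e.salary > (SELECT AVG(e2.salary)
                          FROM emp e2
                          WHERE e2.dept = e.dept)
    """).fetchall()
    print(rows)  # [('Ann', 90000.0), ('Cat', 70000.0)]

Window functions, a later addition to SQL, express the same query more readably; kludges such as cursors grew up to fill gaps like this one.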

NoSQL is a catch-all term for a grab bag of relational alternatives. NoSQL is a New Testament that seeks to supplant the Codd of Old.

SQL’s deficiencies have been known for years; nonetheless, SQL has served the database community well and supported the creation of immense business value for the many, many millions of RDBMS end users. So have the ETL technologies that feed relational (and other) databases from flat-file, spreadsheet, operational-system, and database sources — technologies, plural. Is ETL still relevant in a world of semantic computing?

Semantic computing

Semantic computing relies on meaning-ful data. That data may be stored in RDBMS tables with an associated metadata repository. It may be modeled with a graph structure, described via RDF (the XML-based Resource Description Framework), and captured in a “triple store” for query via SPARQL. It may be mapped into an ontology, a mechanism for knowledge representation. (“Knowledge” here is a network of relationships, a.k.a. facts, that link entities within a subject-matter domain.)
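To give the triple-store model a concrete flavor, here is a minimal sketch using rdflib, an open-source Python library; the example.org namespace and the facts are invented for illustration. Each add() call asserts one subject-predicate-object fact, and the SPARQL query walks relationships rather than joining tables.

    from rdflib import Graph, Literal, Namespace, RDF

    # Invented namespace and facts, for illustration only.
    EX = Namespace("http://example.org/")
    g = Graph()  # an in-memory triple store
    g.add((EX.acme, RDF.type, EX.Company))
    g.add((EX.acme, EX.headquarteredIn, EX.newYork))
    g.add((EX.acme, EX.name, Literal("Acme Corp")))

    # Query with SPARQL instead of SQL; "a" abbreviates rdf:type.
    results = g.query("""
        PREFIX ex: <http://example.org/>
        SELECT ?name WHERE {
            ?c a ex:Company ;
               ex:headquarteredIn ex:newYork ;
               ex:name ?name .
        }
    """)
    for row in results:
        print(row.name)  # Acme Corp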

Semantic computing involves methods and software designed to mine meaning, relationships, and usages from sources both conventional and unconventional, from structured databases and from the chaos that is the Web. All that good stuff is inferred from whatever definitions, data profiles (i.e., information on the distributions of the values of variables), and context are available.

The payoff is that you have all the ingredients necessary to support dynamic integration, to enable as-you-like-it data mashability.

Dynamic integration: NoETL

A number of tools claim, or aim, to support dynamic integration, some of them metadata- or semantics-driven, so that integrations are essentially visually programmed without reliance, for the end user or behind the scenes, on the ETL equivalent of SQL. Vendors include Expressor, Progress Software, and JackBe, the last an enterprise mashup vendor.

I’ll credit JackBe with prompting me to think much more intently about this stuff than I would have otherwise. I wrote a short paper for them, Nimble Intelligence: Enterprise BI Mashup Best Practices, and presented on the same topic in a JackBe webinar yesterday. (I was paid for this work and for strategy consulting.) The thought is that mashups bring agility to BI: the possibility of integrating the data and application elements you need, when you need them, without much of the overhead typically associated with conventional BI.

It’s freedom, baby, yeah!

NoETL is an extension of this concept, actually a sort of retake on Enterprise Information Integration (EII), a once-promising but now neglected notion that one can successfully build and query a unified virtual schema, spanning data sources, without requiring data collection into a single data warehouse or repository. In considering NoETL, let’s recognize the value of traditional ETL and of EII and use them where they fit best. Let’s also understand the promise and power of semantics, and of the diversity of NoSQL-ite data representations, in seeking data integration approaches that enable truly agile BI.
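To make the virtual-schema idea concrete, here is a minimal Python sketch, with two invented in-memory stand-ins for live source systems (say, a CRM and an ERP). The point is that the join happens at query time; nothing is staged into a warehouse first.

    # Invented stand-ins for two live source systems.
    crm = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]
    erp = [{"customer": 1, "open_orders": 3}, {"customer": 2, "open_orders": 0}]

    def unified_view():
        """Join the sources on the fly; nothing is persisted or pre-staged."""
        orders = {r["customer"]: r["open_orders"] for r in erp}
        for row in crm:
            yield {"name": row["name"],
                   "open_orders": orders.get(row["cust_id"], 0)}

    # "Query" the virtual schema at request time.
    print([r for r in unified_view() if r["open_orders"] > 0])
    # [{'name': 'Acme', 'open_orders': 3}]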

Please check out the Text Analytics Summit, Boston, June 12-13

The next Text Analytics Summit is coming up in four weeks. The June 12-13 conference will be the 8th annual Boston summit, and the 8th I’ve been privileged to chair. Will you join us?

The summit series was the first business-focused conference dedicated to BI on text, to techniques that turn text into data in the service of diverse applications. It remains the best, a testimony to outstanding speakers, great networking opportunities, and the unparalleled role text plays in the Social, Big Data era.

As chair, I can extend to you a special $300 registration discount, via the code SG12. Use it and hear speakers on customer experience, marketing, e-discovery, financial services, and social-media analytics — from organizations that include American Express, eBay, Fidelity Investments, Maritz Research, Monster.com, NASA, and Walt Disney. Visit www.textanalyticsnews.com for information, and follow the Registration link to register now.

Whether you’re a veteran user or just getting started with text analytics, I hope you’ll join us next month in Boston!

Decoding Content at Tech@State: Real Time Awareness

I’m moderating the panel this afternoon at the Tech@State conference, convened by the State Department, taking place at George Washington University. We — David Broniatowski (Synexxus), Ravi Patel (Yahoo! Research), Noah Smith (Carnegie Mellon Univ), and V.S. Subrahmanian (Univ of Maryland) — will have 90 minutes of What Does It Tell Us?, looking at sense-making technologies that operate on social and online sources, within the context of the conference’s real-time awareness focus.

Here are my planned panel intro and starter discussion questions, shared in the hope that, even on their own, they will provide insight into the topic.

Panel intro

Our brief is to look at “Analyzing the vast amount of readily accessible data that flows constantly across the internet uncovers details, information and relationships that were unavailable a few years ago. This panel will examine methods and practices to glean sentiment from words and text, look at using this data to predict the future and discuss what information social networks can reveal – all accomplished with no limitation on language and on a real-time or near real-time basis.”

There’s a huge amount in that assignment. I count a dozen notions that are worth exploring. Start with “vast amount,” “readily accessible,” “data,” “flows,” “constantly,” “analyzing… to uncover details, information, and relationships,” and “unavailable a few years ago.” Then there’s “sentiment,” “predict the future,” “information social networks can reveal,” “no limitation on language,” and “real-time or near real-time.”

That makes twelve notions (putting aside that some of them aren’t even atomic), or perhaps the count is multiplied given the interplay among individual notions. How do we detect “events” in “flows” and use them to “predict the future”? How is “sentiment” “data”? Is it truly “readily accessible,” and is there really “no limitation on language,” particularly when seeking to understand subjective information such as sentiment?

Let’s hear what our panelists have to say on points such as these, in particular as they relate to today’s theme, Real Time Awareness.

… and Questions

It’s my job as moderator to prompt an interesting conversation. These questions will, I hope, serve the purpose. My expectation and hope, by the way, is that we’ll get through only a few of them. Here they are:

  1. Let’s start with sentiment. What role do sentiment, opinion, emotions, attitudes — various forms of subjectivity — play in analyses of the online and social worlds?
  2. Say you’re an analyst tasked with some business or research function (and I do include here study and formulation of policy, program analysis, intelligence, political strategy, and so on). There’s lots of information in text: “named entities” (people, places, organizations, and so on), geolocation, events, sentiment and opinion, identity clues, and so on. And then there are those imperatives: “real time,” prediction, flows. Where do you start, that is, what are the most important elements to understand, and the most important technical capabilities to have?
  3. We’re interested in social networks. Well, myself, I don’t view Facebook or Twitter as a social *network*. Instead, they’re platforms where networks consist of connected individuals and organizations whose links are rarely limited to any single platform. Certain technologies provide the ability to track individuals across platforms, although they’re as yet controversial. Anyway, to my question for you: How do analyses of content and of networks mesh? Analytically, how do you match what people say to their actions and interactions? What can be learned from this sort of analysis?
  4. Has there been anything really cool, on the language technology front, that has emerged recently? IBM Watson, Siri, Wolfram Alpha? Something else? Only one rule for responses to this question: Please tell us about something other than what you’ve been working on yourself.
  5. How important, and how doable, are cross-lingual or multi-lingual analyses?
  6. There’s a temporal dimension to our analyses. Information sources capture, and are themselves, events. Patterns both simple and complex emerge from studying sources over time. Even the meaning of information evolves, both because today’s observer has different concerns from yesterday’s and because language changes over time. What’s your view on temporality and temporal analyses?
  7. In our discussion before today, some of you wanted to talk about the interplay between technical approaches and social-science techniques. Please tell us about that interplay.
  8. How is our field evolving? Where have we been and where are we heading? How do we make our tools more relevant now and more adaptable to emerging needs?

Entry-level Choices for Concept/Topic Extraction and Sentiment

I received an inquiry –

“I’m doing some industry research, and I’d like to run some documents through a text analytics program to understand sentiment and key concepts/topics present in the documents. Probably a ‘few hundred’ documents over all.

“Can you recommend something simple and low cost that I could try? Perhaps something in the open source community?”

Here’s my response –

“I’m guessing you want to do minimal or no training, that is, you want a tool that will discover concepts and topics on its own? You could try Wordstat from Provalis Research (http://www.provalisresearch.com/wordstat/Wordstat.html) or Leximancer (https://leximancer.com/). Neither does sentiment out of the box, although in principle sentiment is just a classification problem that either could handle. If you’re willing to do some training: Provalis has a tool called QDA Miner that supports coding and is linked to Wordstat.

“Your best bet, however, may be RapidMiner, which is free and open source. See http://rapid-i.com/content/view/184/196/ .

“I hope this helps. Please let me know which tool you choose and how it works out.”
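If you, like my correspondent, can run a little Python, unsupervised topic discovery takes only a few lines with the open-source scikit-learn library. Here is a minimal sketch; the four toy documents stand in for the “few hundred,” and sentiment could be layered on as the classification problem mentioned above.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Toy stand-ins for the "few hundred documents."
    docs = [
        "The camera and screen on this phone are excellent",
        "Battery life is poor and the phone runs hot",
        "Great service at the hotel, friendly staff",
        "The hotel room was dirty and the staff were rude",
    ]

    # Bag-of-words counts, dropping common English stopwords.
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(docs)

    # Discover two latent topics; no training labels are needed.
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    terms = vec.get_feature_names_out()
    for i, component in enumerate(lda.components_):
        top = [terms[j] for j in component.argsort()[-5:][::-1]]
        print(f"topic {i}:", ", ".join(top))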

What (else) would YOU recommend?

Seth

My New Approach to Blogging

It has been almost a year since UBM TechWeb folded Intelligent Enterprise into InformationWeek, a larger-circulation, better-resourced publication.  Yet I’ve written only four 2011 articles for IWK, the last published May 12.  IWK’s sense of itself — audience, topics, voice — is different from IE’s, and I no longer have the freedom to post whatever I feel will appeal to the BI/analytics audience.  My work is now edited; I used to be able to post directly. Editing may lead to clearer writing, and being read by a larger audience is great, but the need to respond to an editor’s requests imposes a significant time burden.

It’s time to accept the new (to me) rules and get back into the game.  I know, finally, how to do that in a way that nonetheless allows me to continue to work on my own terms.  A free-standing blog is the answer: this blog.  I will post here and also make entries available for selective republication by IWK or, when they’re not an IWK fit, other platforms.

This initial blog entry is my way of closing the door, yet my first step will include one last look back, an explanation of why I felt cast adrift by the demise of Intelligent Enterprise.

I’d long written for IE, and occasionally for IWK and a variety of other outlets.  I mentioned above the benefits of writing for InformationWeek, as well as the loss of freedom.  That IWK showed little regard for old Intelligent Enterprise content didn’t help. The old intelligententerprise.com URLs stopped working, forcing me and other authors to update our publication lists and, far worse, rendering useless Web-published links to our work. IWK couldn’t be bothered to devote a few programmer-days to mapping the old URLs to new ones?  An insult, frankly.  IWK did migrate old content, though with no regard for layout.  Check out an example.

These complaints made, let’s move on.  I have a variety of topics to write on.  It’s time to get back to article writing.

P.S. I’ve started a second blog for a part-time gig of mine.  Check it out at sethgrimes.blogspot.com.