Machine learning is cool, but let’s spend a few minutes talking process – the application of data science to derive business insights. Let’s look in particular at capital markets, where news and mood drive trading strategies.
Trading is highly competitive, yet traders like to talk, and StockTwits – “the largest social network for investors and traders” – is where they often do it.
Traders flock to the platform to share assertions and perceptions, analyses and predictions. This activity produces a combination of hard data and subjective information that can be profitably modeled via natural language processing, sentiment analysis, and machine learning.
StockTwits Senior Data Scientist Garrett Hoffman will be speaking on March 27 at the Sentiment Analysis Symposium in New York, on Deep Learning Methods for Text Classification. In the run-up to the symposium, I’ve taken the opportunity to interview him about his work…
How StockTwits Applies Social and Sentiment Data Science
Seth Grimes> StockTwits hosts both a social network and a sophisticated data-tools platform. What’s the intersection?
Garrett Hoffman> We strive to consistently create new and innovative ways to use data with the express purpose of helping our users discover new trading opportunities at the right time. Idea generation is a huge barrier for active trading, and StockTwits is in a unique position to bring creative solutions to this problem. This remains crucial as the next generation starts to invest, as they are digitally native users who prefer social interactions around the decisions they make.
Seth Grimes> Your title is senior data scientist. What’s your StockTwits role?
Garrett Hoffman> At StockTwits, I use data to solve problems and build features and content for our platform. I break this down into three focus areas:
- Production Data Science involves building features into our platform that are driven by data and embedded into our technology stack. Our financial sentiment model that scores all messages as bearish or bullish based on their content, data-driven recommendations on who to follow, and internal tools that screen content for spam and abuse all fall into this category.
- Product Analytics looks at how our users are using the platform to assess the potential impact of future product decisions, particularly around changing features or adding new features. We may look at behavior around messages tagged with multiple stocks or around how users trade directly through StockTwits with their connected brokerage accounts.
- Insights Research involves looking at the content produced on our platform to derive unique insights from the StockTwits community as it relates to the market, or a specific event about the market. This might pertain to how sentiment and message volume on StockTwits changed for Tesla leading up to/following an earnings report. It could also summarize our community’s reaction to volatility in Bitcoin pricing. These insights are often published on our blog and in other financial media channels.
Seth> Your work applies machine learning to understand social dynamics. What’s the operative definition of “social dynamics”?
Garrett> Social dynamics is the study of social processes within a system. It involves analyzing individuals, groups, interaction-derived relationships, aggregate group and system behaviors, and how these things evolve over time. At the most macro level, this would be analyzing society as a whole, an extremely difficult problem that people have worked on for centuries. The internet gave rise to many digital micro-communities via social networks — StockTwits, Facebook, Twitter, Reddit, and so on. “Micro” is used loosely, of course, as these systems contain communities consisting of millions/billions of users. The explosion of data created by these communities as they interact makes these systems ripe for research and experimentation to draw conclusions about the social dynamics of the network. In the end, understanding these systems creates better experiences for users.
Seth> Whom are you studying and what are your data sources?
Garrett> Our main focus is understanding social dynamics within the StockTwits community (i.e. users and stocks, including ETFs, FX, cryptos, and what have you). For user-to-user and user-to-stock relationships, we can consider a user’s experience on our platform — the content they create and engage with, the stocks they watch and own, and the investors that they follow. Second-order stock-to-stock relationships can be derived from these same experiences, but are also considered directly through a discovery product that we offer called Lists. Lists are like a “playlist” for stocks and consist of a group of stocks that are related via manually curated conceptual themes. The “Self Driving Cars” list would contain car manufacturers like GM, Ford and Tesla, technology researchers like Baidu and Google, and hardware manufacturers like AMD and NVIDIA. There are also lists related by a technical theme based on algorithmically calculated technical indicators. “Crossing Below the 200 Day Moving Average,” a technical indicator that provides a bearish signal to investors, is a good example of this. As of today, we mainly focus on what happens inside of StockTwits, and the only external data we tie in is data from a user’s brokerage if they choose to connect it and trade directly through our platform.
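To make the technical-theme example concrete, here is a minimal sketch of how a “Crossing Below the 200 Day Moving Average” check could be computed from a series of daily closing prices. It is an illustration only, not StockTwits’ implementation: the function names, the use of a simple (unweighted) moving average, and the rule that the previous close must sit at or above its own average are all assumptions.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until a full window is available."""
    averages: List[Optional[float]] = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            # Drop the price that just fell out of the trailing window.
            running_sum -= prices[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


def crossed_below(prices: List[float], window: int = 200) -> bool:
    """True if the latest close moved below its moving average.

    Assumed rule: the previous close was at or above its own average
    and the latest close is strictly below its own average.
    """
    if len(prices) < window + 1:
        return False  # Not enough history for two consecutive averages.
    sma = simple_moving_average(prices, window)
    return prices[-2] >= sma[-2] and prices[-1] < sma[-1]
```

A stock would join such a list on the day `crossed_below(closes)` first returns `True`. A shorter `window` makes the rule easy to try on a handful of prices.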
Seth> You’ve written that you believe that data science is really about people – using what we know or can learn about complex systems to drive optimal decisions, experiences, and outcomes. What are a couple of examples illustrating how the insights you derive are used in decision making?
Garrett> Our Market Sentiment Model is a great example of this. Users incorporating StockTwits sentiment into their decision-making process often find it can be a great contrarian indicator; if the investor thinks s/he might want to buy a stock but the sentiment is overly bullish on StockTwits, they may think twice about it. Users can tag their content as bullish or bearish, though only about 20%-30% of all content is generally tagged, and these tags generally have a slightly bullish bias. Incorporating our real-time sentiment model increases this coverage to 100% of messages posted on StockTwits, providing a richer underlying data set to power metrics our investors incorporate into their trading flow.
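The coverage point can be illustrated with a small sketch: a label is taken from the user’s own tag when one exists and from a model otherwise, so every message ends up labeled. Everything here is hypothetical. `keyword_model` is a toy stand-in for a trained sentiment model, the message format is invented for the example, and preferring the user tag over the model score is an assumption (the production model is described as scoring all messages).

```python
from typing import Callable, Dict, List, Optional

BULLISH, BEARISH = "bullish", "bearish"

Message = Dict[str, Optional[str]]  # hypothetical shape: {"text": ..., "tag": ...}


def keyword_model(text: str) -> str:
    """Toy stand-in for a trained sentiment model (assumption, not the real one)."""
    bearish_words = {"sell", "short", "puts", "overvalued", "dump"}
    words = set(text.lower().split())
    return BEARISH if words & bearish_words else BULLISH


def tag_coverage(messages: List[Message]) -> float:
    """Share of messages that carry a user-supplied tag."""
    if not messages:
        return 0.0
    return sum(1 for m in messages if m.get("tag")) / len(messages)


def label_messages(messages: List[Message],
                   model: Callable[[str], str] = keyword_model) -> List[str]:
    """Keep the user's tag when present; otherwise fall back to the model."""
    return [m["tag"] if m.get("tag") else model(m["text"] or "") for m in messages]


def bullish_share(labels: List[str]) -> float:
    """Fraction of labels that are bullish."""
    if not labels:
        return 0.0
    return labels.count(BULLISH) / len(labels)
```

With this arrangement, `tag_coverage` reports how much of the stream users labeled themselves, while `bullish_share(label_messages(...))` is computed over every message.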
Product analytics also play a huge role in creating a better StockTwits experience. Many traders make short-term investments around companies reporting earnings. Researching the behavior around our Earnings Calendar revealed it’s a popular feature for those who use it. Insights like these may impact our internal decision making around the navigation flow of our apps to make useful features like these easier to discover for new users.
Seth> You’ll be presenting on Deep Learning Methods for Text Classification at the upcoming Sentiment Analysis Symposium. What’s your toolset and
Garrett> For data science work, we primarily use the “Python Data Science Stack” which consists of open source software such as Numpy, SciPy, Pandas, Scikit-Learn, Jupyter Notebooks (for research and prototyping), and Flask (for API deployment). For our research involving deep learning, we use TensorFlow and our infrastructure is hosted on AWS EC2 instances (to easily spin up GPUs when necessary). Specific deep learning methods we explore are Recurrent Neural Networks and their variants, Word Embedding methods like Word2Vec, and other methods for representation learning like Autoencoders.
Seth> How do you stay on top of machine learning developments?
Garrett> Engineering and data blogs, conference talks, and Twitter accounts of researchers from top tech companies like Google, Apple, and Spotify, as well as small tech startups, are great ways to keep up with how others are using machine learning. If you have a math and computer science foundation, free online courseware from platforms like Coursera, edX, or Udacity is a great resource for learning about machine learning and deep learning methodologies. Once you have a solid foundation in these methodologies, you can find research papers about complex methods and new developments published on arXiv to further your studies.
Seth> What are you able to do really well, and what’s on your to-do list for improvement?
Garrett> Our team at StockTwits does two things in particular really well: We put the problem we are trying to solve first and the methods/models for tackling the problem second.
One area I’d like to improve in, and we are seeing this as a major pain point for a lot of data science teams, is “DevOps” around data science. By this, I mean bridging the gap between the research/prototype phase and embedding the result within the rest of the tech stack to deploy the new feature into production. Data science is pretty new to a lot of companies, and usually sits somewhere between Product and Engineering. Companies can address this problem by creating a collaborative culture between the Data Scientists and Engineers, and putting the right resources together to move from prototype to production as quickly and seamlessly as possible.
Seth> And key lessons learned?
Garrett> Advancements in big data, data science, and machine learning breed the misconception that the most complex solutions are the best. Complexity is better if it leads to a deeper understanding of the underlying system that yields better outcomes. Yet complexity is a spectrum and not a binary. Even small incremental movements towards increased complexity can drive better outcomes. So simpler solutions (heuristics, traditional statistics, classical machine learning) are great ways to get started with a data product that solves your problem. From there, you can research, iterate, and prototype to improve the product with more advanced techniques.
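As one illustration of starting simple, the sketch below is a classical baseline for bullish/bearish text classification: multinomial Naive Bayes with add-one smoothing, written with only the Python standard library. It is not the StockTwits model. The class name, the whitespace tokenizer, and the training sentences in the usage note are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


class NaiveBayesBaseline:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def __init__(self) -> None:
        self.label_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: Set[str] = set()

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayesBaseline":
        """Count labels and per-label word frequencies from (text, label) pairs."""
        for text, label in examples:
            self.label_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, text: str) -> str:
        """Return the label with the highest log-posterior score."""
        if not self.label_counts:
            raise ValueError("fit must be called before predict")
        total_examples = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = "", -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total_examples)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the score.
                seen = self.word_counts[label][word]
                score += math.log((seen + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on a few labeled messages, for example `NaiveBayesBaseline().fit([("strong breakout buying calls", "bullish"), ("overvalued selling puts", "bearish")])`, the model gives a reference point against which more complex approaches such as recurrent networks can be compared.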
Seth> What’s next? What will you be focused on a year hence, or two years or five?
Garrett> Now that we have a market sentiment model, one of the next things to focus on is improving what our users can do with this data. We need to give them more flexibility and customizability over the analytics. Part of this involves digging deeper into different investment decision-making workflows and understanding how to make StockTwits a seamless part of that process. Another part of it is perfecting the interaction between the trader and our platform. This is just as much an exercise in UX as it is in data science, but it is something that we are excited about.
For the long term, I am interested in thinking about how newer advancements in AI around natural language processing and natural language understanding can be incorporated into the platform. As a product, this may look something like summarization of the conversation around certain stocks or topics, improved attribution of sentiment around a stock to a specific topic, or possibly interacting with the platform via speech.
Seth> Thanks, Garrett.
This interview with StockTwits data scientist Garrett Hoffman has described data science and machine learning applications that provide idea-generation solutions for active traders. Meet him at the Sentiment Analysis Symposium, March 27 in New York.