Exploring social media analytics

I’ve previously touched on the use of social media analytics as one useful measure of investor sentiment. It does occur to me that it might be interesting to explore a particular source of such analytics. This could be of service by making the concepts more tangible and by illustrating the growth of the social media in the past few years.

There are many forms and platforms of social media, but a good starting point is the stream of Tweets and StockTwits. This is not because there’s no relevant information in the news or long-form blogs. The microblogs just provide a generally more timely medium, and one which is more amenable to automated extraction of emotions and sentiment.

There are many providers of Twitter-based sentiment analytics out there, including the likes of Social Market Analytics, StockPulse, MarketProphit, iSentium, Dataminr, RavenPack and MarketPsych, and we should not forget the various offerings available on the institutional Bloomberg and Thomson Reuters platforms. I’ve seen some interesting opportunities in the data from some of these providers, but those will need to stay private for contractual reasons.

In this post, I use public domain data from PsychSignal. These free data are available on daily frequency and delayed by two days, but for the present purposes the lack of intraday signals and the delay don’t matter. There’s an API, but PsychSignal data are also available on Quandl. If you’re not yet familiar with Quandl, it is very much worth a look, as it brings a wealth of global financial time series data — free or premium — right to your fingertips.

But enough of the preliminaries, let’s get started. If you are interested in the social media sentiment for Apple, for instance, you’ll want to look at Quandl codes PSYCH/AAPL_I (for sentiment readings) and PSYCH/AAPL_R (for sentiment volume). The data are available separately for bullish and bearish messages, aggregated by the stock or instrument. There’s data from July 2009 for many stocks, but the inception date does vary. The Quandl data can be exported (as CSV, JSON, or XML) for use in a spreadsheet or in the analytical software of your choice. There are also libraries which allow you to pull the data directly into Python, R, Matlab, and so on.

As an example, the following API call pulls out the most recent 10 days’ worth of sentiment data for Apple:

https://www.quandl.com/api/v1/datasets/PSYCH/AAPL_I.csv?auth_token=XXXXXX&sort_order=desc&rows=10
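If you would rather assemble the request in code, here is a minimal Python sketch using only the standard library. The helper names `quandl_csv_url` and `parse_csv` are my own for illustration and not part of any Quandl library. The parser makes no assumption about the column names, since it simply uses whatever header row comes back.

```python
import csv
import io
from urllib.parse import urlencode

# Base URL taken from the API call quoted above.
QUANDL_BASE = "https://www.quandl.com/api/v1/datasets"


def quandl_csv_url(code, auth_token, rows=10, sort_order="desc"):
    """Build the CSV request URL for a Quandl code such as 'PSYCH/AAPL_I'."""
    query = urlencode(
        {"auth_token": auth_token, "sort_order": sort_order, "rows": rows}
    )
    return f"{QUANDL_BASE}/{code}.csv?{query}"


def parse_csv(text):
    """Turn CSV text (header row first) into a list of dicts keyed by column."""
    return list(csv.DictReader(io.StringIO(text)))


# "XXXXXX" is a placeholder; substitute your own token.
url = quandl_csv_url("PSYCH/AAPL_I", "XXXXXX")
print(url)
```

To fetch the data, you could pass the URL to `urllib.request.urlopen` and feed the decoded response body to `parse_csv`. The endpoint is the one quoted in this post, so it may well change over time.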

Note the use of an authorisation token (masked in the URL above), which identifies you to the Quandl server. You get your own token once you’ve signed up and created a Quandl account. With the above API call, this is the data you’d get, indicating a pretty much even balance between positive and negative commentary in the social media:

If you wanted to use the PsychSignal API directly, the following call would return the Apple sentiment.

https://api.psychsignal.com/v1/sentiments?api_key=XXXXXX&symbol=AAPL&period=d&from=2015-01-23

The API only returns data going back a few weeks at best, but the full history is available upon request. You’d receive your own API key (masked above) once you’ve signed up. The PsychSignal data are provided (by default) in JSON format.

At the time of writing in early February 2015, a technical caveat is in order. I’m told that PsychSignal is in the process of upgrading their API. Until the upgrade is complete, there are likely to be gaps and even errors in the sentiment data you see through Quandl.

Ok, so what does the social media data look like? The first thing to note is that there’s sentiment data for several asset classes: futures, currencies, stock market indices, ETFs, ETNs, and equities. For equities, the data covers most stocks on NYSE and Nasdaq, and close to 200 stocks on TSX (i.e. Toronto). There are also time series data for stock market sectors, i.e. equities aggregated at the sector level. As to the breadth of coverage, there are over 7,000 symbols in the PsychSignal data set. Of these, about 4,400 symbols are currently active with data as of 2015.

Let’s dive into the data and have a look at the growth of the social media traffic. We’ll do this by stock market sector, with the following chart showing the sum of the bullish and bearish volume since September 2009.

Chart 1. The volume of social sentiment traffic aggregated by U.S. stock market sector (source: Quandl/PsychSignal).
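For readers who want to try this kind of aggregation themselves, the sum of bullish and bearish volume per sector can be sketched roughly as follows. This is an illustration rather than the code behind Chart 1: the row layout `(date, sector, bullish, bearish)` and all the numbers are made up, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (date, sector, bullish_volume, bearish_volume).
# The layout and the numbers are invented for illustration only.
rows = [
    ("2015-01-26", "Technology", 120, 80),
    ("2015-01-26", "Financial", 40, 35),
    ("2015-01-27", "Technology", 150, 90),
    ("2015-01-27", "Financial", 30, 45),
]


def total_volume_by_sector(rows):
    """Return {sector: {date: bullish + bearish}}, summing duplicate rows."""
    totals = defaultdict(dict)
    for date, sector, bullish, bearish in rows:
        previous = totals[sector].get(date, 0)
        totals[sector][date] = previous + bullish + bearish
    return {sector: dict(by_date) for sector, by_date in totals.items()}


volume = total_volume_by_sector(rows)
print(volume["Technology"])
```

Rows sharing a date and sector are added together, so the same function would also work if the input were per-stock rows tagged with a sector.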

There’s a clear pecking order in the social media activity. Generally, the technology sector attracts the most traffic, followed by financials, healthcare, services, and consumer goods. Utilities and conglomerates have very little following, but there are, of course, way fewer companies in those sectors.

Smoothing gives us a better view of the time trends in the data. Here’s the same chart with a 250-day moving average applied to the sentiment volume.

Chart 2. The smoothed volume of social sentiment traffic aggregated by U.S. stock market sector (source: Quandl/PsychSignal).
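A moving average of this sort is easy to compute by hand. The sketch below assumes a simple trailing window (the mean of the current and preceding observations), which is one common choice; other variants, such as a centred window, would give a somewhat different picture. The function name and the sample numbers are invented for illustration.

```python
def trailing_moving_average(values, window):
    """Simple trailing mean over `window` observations.

    Positions with fewer than `window` observations so far yield None.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the observation that just fell out of the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


daily_volume = [10, 20, 30, 40, 50]  # made-up numbers
smoothed = trailing_moving_average(daily_volume, window=3)
print(smoothed)  # [None, None, 20.0, 30.0, 40.0]
```

With daily data, `window=250` corresponds roughly to one year of trading days, which is why the first year or so of a smoothed series is blank.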

It is apparent that the volume of the social media traffic exploded after 2010, although some of the growth can be attributed to wider coverage. There’s some evidence of cooling off in the past year or so, especially in the technology sector. It remains to be seen if this is an indication of a shift in focus, or just a temporary hiatus.

Moving on, there’s sentiment data for 53 futures markets. The social media attention is concentrated on relatively few instruments, though. The following pie chart (based on a sample of the last 250 days) shows that about 70% of the sentiment volume relates to either the S&P 500, Nasdaq, Russell 2000, crude oil, natural gas, or gold futures. I’ll omit a time series chart, but it may not come as a surprise that crude oil has overtaken the stock market as a focal point of social media in the past half a year or so.

Chart 3. The relative volume of social sentiment traffic in futures markets during the most recent 250 days (source: Quandl/PsychSignal).
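The concentration figures quoted in this post boil down to the combined share of the most active instruments in the total volume. Here is a minimal sketch of that calculation, with hypothetical instrument labels and made-up volumes standing in for the 250-day totals.

```python
def top_n_share(volume_by_symbol, n):
    """Combined share of total volume held by the n most active symbols."""
    total = sum(volume_by_symbol.values())
    if total <= 0:
        raise ValueError("total volume must be positive")
    top = sorted(volume_by_symbol.values(), reverse=True)[:n]
    return sum(top) / total


# Hypothetical labels and volumes, for illustration only.
sample = {
    "contract_a": 400,
    "contract_b": 300,
    "contract_c": 200,
    "contract_d": 60,
    "contract_e": 40,
}
share = top_n_share(sample, 3)
print(f"Top 3 share: {share:.0%}")
```

Summing the top volumes before dividing keeps the arithmetic in integers until the last step, which avoids small rounding differences from adding up individual shares.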

Although 38 currency pairs are covered by the social media data, the traffic is concentrated in the major pairs. As Chart 4 indicates, the euro, the yen, sterling, the Aussie and the Canadian dollar dominate the discussion. These are followed by gold (which is often treated as a currency in its own right) and various cross rates.

Chart 4. The relative volume of social sentiment traffic in currencies during the most recent 250 days (source: Quandl/PsychSignal).

There’s data for nearly 350 ETFs and ETNs, but about 90% of the attention is focused on just the top 20 exchange-traded funds and notes. SPDR S&P 500 is the undisputed king, followed by the other big stock market vehicles, gold and mining, crude oil and natural gas. The top 20 list also includes three VIX ETNs. Long-dated Treasury bonds, biotechnology, a country fund (Switzerland) and an industrial commodity (lithium) are also represented.

Chart 5. The relative volume of social sentiment traffic in the 20 most active U.S. ETFs and ETNs during the most recent 250 days (source: Quandl/PsychSignal).

There’s sentiment data for about 3800 stocks. Again, the bulk of the attention is focused on a much smaller subset. To give some flavour of the data, Chart 6 shows the relative sentiment volume in the top 20 most active stocks or ADRs.

Chart 6. The relative volume of social sentiment traffic in the most active 20 U.S. stocks during the most recent 250 days (source: Quandl/PsychSignal).

Apple is pretty much the perennial favourite in the social media, followed by Facebook and Tesla Motors. The top 20 includes other known quantities, such as Netflix, Amazon, and Google. Interestingly, there are many stocks which are not — at least not yet — household names. There is NetQin Mobile (a provider of mobile Internet in China), Glu Mobile (a games developer), FireEye (a cybersecurity firm), Himax Technologies (semiconductors), Amarin Corporation (a small cap biopharmaceutical), GoGo (in-flight Internet connectivity and entertainment), Canadian Solar (solar technology), Baidu (a provider of Internet search in China), and E-Commerce China DangDang (as it says).

But what does the social media sentiment actually look like for specific stocks? I’ll discuss a number of examples in another post. If you cannot bear the suspense, you can, of course, do some exploration on your own. Over and out for now.

4 Comments

  1. Seb

    Hi there

    Enjoying your posts

    Which statistical package do you use? And why did you choose it?

    I’ve stuck with excel for too long and i need to learn a language – would be interested to hear thoughts

    Seb

    • Risto (Author)

      Thanks! As to the statistical packages, I tend to do much of the initial data munging in Python these days. The Pandas module is particularly handy in getting the data into a shape that you can work with. For deeper analysis and pretty charts, I go for Mathematica, where you can get a lot done with a few lines of code or short programs. But if it is one language you’re looking to start with, I’d suggest Python. Others would vote for R or Matlab, though, and they have their strong points too.

      • Seb

        Thanks

        Why Python over R and Matlab?

        Are there any modules that cater to backtesting out of the box?

        Have heard Python is slower etc

        Really appreciate your help

        • Risto (Author)

          My sense is that Python has a somewhat wider following, so that there’s a large community out there, and lots of modules and folks who can help. That being said, R is popular too, and it’s really a matter of personal preference. Both are open source, while Matlab is not free (there’s Octave though, which is free and mostly compatible). I don’t think you’ll find Python too slow, unless you’re looking at HF trading (in which case you’ll be better off with C or C++ or similar). And there are ways to speed up computations significantly, such as using NumPy (for vectorized calculations) or weave or Cython (for generating C code). For backtesting in Python, look at PyAlgoTrade (I write my own code though). But I suggest you do some exploration on your own; there are plenty of language comparisons and tutorials available if you spend some time on Google.


© The Behavioural Quant. All rights reserved. Powered by WordPress.