Who influences the market?
To identify the people who might influence the stock market, we ran two different analyses. The first consists of finding the speakers whose sentiment, as expressed in their quotes, correlates significantly with the daily variation of the stock market value. The second looks for the speakers whose sentiment-carrying quotes coincide with a larger-than-average volume of traded shares. For this second analysis we study, for each speaker, the relationship between the number of non-neutral quotes and the volume of shares traded, i.e. positive and negative quotes are treated equally.
- The first analysis aims to detect whether a speaker’s sentiment could affect the value of the market, negatively or positively. We first map the positive, negative and neutral labels of the sentiment analysis to the integers 1, -1 and 0, respectively. Since many speakers have multiple quotes per day, we define the daily sentiment as the sum of the sentiments of all the quotes published on the same date, because we believe that a repeated sentiment should have a larger impact on the daily market variation. We then compute the Spearman rank correlation between the sentiment vector and the daily stock variation vector. We use the Spearman rank correlation because it also captures monotonic relationships that are not linear. We apply this procedure to every speaker.
- The second analysis aims to detect whether, for each speaker, the number of sentiment-carrying quotes published has an impact on the volume of shares traded. Here we don’t distinguish between positive and negative quotes because both, if influential, should lead to an increase in the volume traded. For this analysis, the quotes vector therefore has as components the daily count of all positive and negative quotes. For each speaker we perform a one-sided t-test that compares the mean volume on the days when some non-neutral quotes were published with the mean volume on the other days; the alternative hypothesis states that the volume on dates with no quotes is smaller. Since the mean is not robust to outliers, we remove them beforehand based on the interquartile range.
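The two analyses above can be sketched in a few lines of Python. This is only an illustrative sketch, not the exact code behind our results: the column names (`date`, `sentiment`), the function names, the choice to drop days without quotes in the first analysis, the 1.5 × IQR outlier rule applied to the whole volume series, and the use of Welch's variant of the t-test are all assumptions made for the example.

```python
import pandas as pd
from scipy import stats

# Integer coding of the sentiment labels, as described in the first analysis.
SENTIMENT_TO_INT = {"positive": 1, "negative": -1, "neutral": 0}


def daily_sentiment(quotes: pd.DataFrame) -> pd.Series:
    """Sum the integer-coded sentiment of all quotes published on the same date."""
    coded = quotes["sentiment"].map(SENTIMENT_TO_INT)
    return coded.groupby(quotes["date"]).sum()


def sentiment_correlation(quotes: pd.DataFrame, daily_variation: pd.Series):
    """Spearman rank correlation between daily sentiment and daily stock variation."""
    sentiment = daily_sentiment(quotes)
    # Assumption: only dates present in both series are compared.
    joined = pd.concat(
        [sentiment.rename("sentiment"), daily_variation.rename("variation")],
        axis=1,
        join="inner",
    )
    rho, p_value = stats.spearmanr(joined["sentiment"], joined["variation"])
    return rho, p_value


def remove_iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is an assumption)."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return values[(values >= q1 - k * iqr) & (values <= q3 + k * iqr)]


def volume_ttest(quotes: pd.DataFrame, volume: pd.Series):
    """One-sided t-test: is the mean volume larger on days with non-neutral quotes?"""
    non_neutral = quotes[quotes["sentiment"] != "neutral"]
    quote_dates = non_neutral["date"].unique()
    # Assumption: outliers are removed from the whole series before splitting it.
    volume = remove_iqr_outliers(volume)
    has_quote = volume.index.isin(quote_dates)
    # alternative="greater": mean(days with quotes) > mean(days without quotes).
    # equal_var=False (Welch's t-test) is an assumption of this sketch.
    t_stat, p_value = stats.ttest_ind(
        volume[has_quote], volume[~has_quote],
        alternative="greater", equal_var=False,
    )
    return t_stat, p_value
```

Both functions would be called once per speaker, with `quotes` restricted to that speaker and with `daily_variation` and `volume` indexed by trading date, yielding one correlation coefficient and two p-values per speaker.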
We now compare the results obtained with these two methods. Ideally, we would like to select the individuals who display both a correlation significantly different from zero and a small p-value for the t-test of the second analysis. The scatter plot below shows what we obtained: the x-axis represents the p-value of the t-test of the second analysis, while the y-axis represents the p-value of the Spearman rank test of the first analysis. The colors encode the Spearman correlation coefficients of the first analysis.