The Predictive Power of r/wallstreetbets

Norfolk Project
14 min read · May 5, 2021

The cross-correlation function (CCF) is a measure from signal processing that quantifies the similarity between two series as a function of the displacement of one relative to the other. We apply time delay analysis, an application of CCF in which we estimate the time delay between two series by finding the “lag” at which they are best correlated.

We use time delay analysis to find correlations between mentions of tickers on the Reddit trading forum r/wallstreetbets (WSB) and their price action (the change in the closing price of tickers on the stock market, in daily intervals) and volume (the number of shares traded in daily intervals). In our graphs, the y-axis shows the adjusted correlation between the two signals, while the x-axis shows the lag, in trading days, applied between them. Thus a high adjusted correlation at a negative lag indicates predictive power (mentions lead the market), while a high adjusted correlation at a positive lag indicates reactionary power (mentions follow the market).
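The lag convention described above can be made concrete with a minimal Python sketch. It assumes two daily series of equal length and uses a plain Pearson correlation at each lag; the article does not define how its "adjusted" correlation is computed, so that adjustment is not reproduced here. The function names `lagged_corr` and `ccf` and the synthetic data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t + lag] and y[t].

    A negative lag pairs earlier x with later y (x leads y);
    a positive lag pairs later x with earlier y (x follows y).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if lag >= 0:
        a, b = x[lag:], y[:len(y) - lag]
    else:
        a, b = x[:lag], y[-lag:]
    return float(np.corrcoef(a, b)[0, 1])

def ccf(x, y, max_lag=20):
    """Map each lag in [-max_lag, max_lag] to its correlation."""
    return {k: lagged_corr(x, y, k) for k in range(-max_lag, max_lag + 1)}

# Synthetic example (not WSB data): the second series repeats the first
# two days later, so the first series "predicts" the second.
rng = np.random.default_rng(0)
mentions = rng.normal(size=200)
price_change = np.roll(mentions, 2)  # price_change[t] = mentions[t - 2] for t >= 2

result = ccf(mentions, price_change, max_lag=10)
peak_lag = max(result, key=result.get)
print(peak_lag, round(result[peak_lag], 3))
```

In this toy case the peak falls at a negative lag, which is the pattern the article reads as mentions having predictive power.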

Our primary goal is to discover the strength of the correlation between mentions on WSB and action in select stock tickers and identify any predictive power that WSB may have on the stock market; particularly, to evaluate and quantify WSB’s impact on the recent short squeeze of $GME and related tickers.

Key terms: Cross-correlation function (CCF), lag, price action, r/wallstreetbets (WSB) mentions, volume

We apply CCF to a select basket of 11 tickers that represent various aspects of the stock market. These tickers are:

This is our basket of 11 tickers, which we later examine.

We made our basket to represent our goals while considering that r/wallstreetbets lacks mentions data for many non-tech companies.

Beyond company tickers, we include $SPY to consider derivatives trading in WSB and $SLV to determine the impact of the apparent silver “short squeeze” on WSB. Our basket is mostly tickers in tech or tech-adjacent areas, so we use $QQQ as a benchmark index for our basket.

Derivatives trading through QQQ on WSB is uncommon and unlikely to influence the CCF of $QQQ signals. The average $QQQ mentions per trading day are 78.2 with a standard deviation of 71.7, and since January 1, 2020 mentions have peaked at 526. It is implausible that so few retail investors could manipulate a fund whose constituent companies have a combined market cap of over $15 trillion.

These are the CCFs graphed for $QQQ signals.

The left CCF graph is between WSB mentions of $QQQ and its price. This graph is representative of the other stocks in our basket: though some tickers have higher adjusted correlation coefficients, the patterns remain similar. The highest correlation is located near zero lag; in this case, it is at lag = 3 (t = 3, r ≈ 0.20).

The right graph shows the CCF between $QQQ mentions on WSB and its volume. The highest correlation is at (t = 0, r = 0.25), and as we move away from t = 0 the correlations rapidly become insignificant (t ≠ 0, r < 0.2). This is consistent with most other mentions vs. volume graphs, in which the spike is at zero. Thus, stocks with significant mentions-to-volume correlations at any distance from t = 0 warrant further investigation. Stocks with high mentions-to-price correlations at substantial shifts from t = 0 should also be examined.
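One common way to decide whether a correlation at a given lag is "significant" is the approximate 95% bound ±1.96/√N, which holds when both series behave like uncorrelated noise. The article does not state which threshold it used, so the sketch below is an assumption for illustration; the sample size, the example correlations, and the function names are all hypothetical.

```python
import math

def approx_significance_bound(n_obs, z=1.96):
    """Approximate 95% bound for a sample correlation under a white-noise null."""
    return z / math.sqrt(n_obs)

def significant_lags(ccf_values, n_obs):
    """Keep only the lags whose |r| exceeds the approximate bound."""
    bound = approx_significance_bound(n_obs)
    return {lag: r for lag, r in ccf_values.items() if abs(r) > bound}

# Illustrative values only (not read off the article's graphs).
example_ccf = {-5: 0.08, 0: 0.25, 5: 0.10}
n_trading_days = 300  # hypothetical sample size

bound = approx_significance_bound(n_trading_days)
flagged = significant_lags(example_ccf, n_trading_days)
print(round(bound, 3), flagged)
```

Because daily mentions, price, and volume are strongly autocorrelated, this bound is likely too lenient for real data and should be read as a rough screen rather than a formal test.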

II.

Since the emergence of the COVID-19 pandemic, stock trading forums and associated platforms have seen a sharp increase in popularity. In the first half of 2020, there were 3 million new Robinhood users, almost as many as the total user growth in 2019. This rise in active users came along with unique market conditions and heightened interest from retail traders. One company that saw a considerable rise in media coverage was Tesla, whose price appreciation saw a peak return of nearly 900% since 2020. Additionally, online trading forums on Reddit, Twitter, and Discord flourished.

Tesla stock has been described as a bubble by many investment analysts. However, the cause of Tesla’s rapid price climb is heavily debated. Some research teams have attributed it to the rise in popularity of stock trading forums, such as the infamous forum WSB. Barclays is one of the biggest international banks and one that dedicates itself to investment research. The automotive research team at Barclays found a statistically significant positive correlation between the number of mentions of TSLA on WSB and the performance of Tesla stock. They suggested, “social media memes can matter more for TSLA share performance than actual financial metrics, or (dare we say) valuation.” While this relationship between the two variables does exist, it is not sound for the Barclays team to use that relationship to conclude that WSB and other online attention are manipulating Tesla stock. Rather than forums influencing Tesla, it is more likely that its historical price action is driven by something else entirely, as the relationships between TSLA price, mentions, and volume suggest.

The Barclays team made the bold conclusion that WSB posts affect how well TSLA stock performs. However, looking at the provided data and graphs, it seems that they found little evidence to support that claim. Below is a graphic that they provided in their article:

Barclays’ analysis of the relationship between WSB posts on TSLA and its returns.

The Barclays research team portrays the relationship between WSB mentions and TSLA returns in a way that makes it appear much more extreme and clear-cut than it is. The graph produced by Barclays shows how underwhelming the relationship is. The positive correlation is almost entirely driven by a few points with high leverage (points with extreme x-values that greatly influence correlation). If one were to take out the point furthest to the right, the one covering a day with more than 100 TSLA posts, the correlation would be a lot weaker. The correlation could even be negative given that the next two points with the highest leverage both show negative TSLA returns. The first section of the graph from 0 WSB posts to 25 WSB posts, which appears to be more than 90% of the data, is close to evenly distributed between negative and positive TSLA returns. The correlation for this section still seems positive, but due to the high variability of the percentage returns, the correlation would be very weak. Their graph is essentially just a scatterplot with a few outliers that cause a statistically significant correlation. Barclays even said themselves that the “situation has been so dynamic” and that they “cannot be confident of a stable process.” It is notable and surprising that the Barclays team ignored parts of their analysis and used this graph as a reasonable justification for their claim about the relationship between WSB and TSLA.
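The effect of a single high-leverage point can be illustrated with a short Python sketch. The numbers below are synthetic and chosen only to mimic the shape described above (a dense cluster of low-post days with mixed returns plus one extreme day); they are not Barclays' data, and the variable names are hypothetical.

```python
import numpy as np

# Synthetic stand-in for the scatterplot (not Barclays' data):
# 24 ordinary days with 1..24 posts and returns alternating +2% / -2%.
posts = np.arange(1, 25, dtype=float)
returns = np.where(np.arange(24) % 2 == 0, 2.0, -2.0)

# One high-leverage day: an extreme post count with a large positive return.
posts_all = np.append(posts, 120.0)
returns_all = np.append(returns, 15.0)

r_with = np.corrcoef(posts_all, returns_all)[0, 1]
r_without = np.corrcoef(posts, returns)[0, 1]

print(f"with the extreme day:    r = {r_with:.2f}")
print(f"without the extreme day: r = {r_without:.2f}")
```

In this toy data, one extreme day is enough to turn a near-zero correlation into a strongly positive one, which is the concern raised about the Barclays graph.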

By applying a cross-correlation function to a stock’s mentions on WSB and its price over the same period, our team produced results similar to those of the Barclays team. In the case of Tesla, the maximum of the CCF occurred at a negative shift in time, which indicates that mentions are more predictive of future prices than the converse. Based on price action and mentions alone, it appears that Tesla’s price action is weakly correlated (t < 0, r > 0.2) to shifts in mentions on WSB over the past 15 months. That correlation improved over shorter, more recent time intervals (t < 0, r > 0.3).

CCF for WSB mentions of $TSLA and its adjusted close across 15 months, 12 months, 6 months, and 3 months (top left, top right, bottom left, and bottom right respectively).

The next step was to apply the same process to a stock’s mentions on WSB and its trading volume over the same periods. When looking at Tesla and its CCF graphs, we observe consistent results across all periods. The graphs of these relationships appear to be bell curves with negative kurtosis. In all cases, the correlation was highest when no shift was applied (t = 0, r > .4). As with the relationship between mentions and the stock price, the correlations also improved over shorter, more recent time frames (t = 0, r > .6), with even lower correlations at the tails.

CCF for WSB mentions of $TSLA and its volume across 15 months, 12 months, 6 months, and 3 months (top left, top right, bottom left, and bottom right respectively).

At first glance, WSB mentions appear to have predictive power over price. However, even at its highest, the correlation is relatively weak. Additionally, the correlations for price predicting mentions are generally statistically insignificant. It is important to note that mentions may be a function of long-term historical performance and mainstream news coverage of Tesla. The price of Tesla has gone up significantly over the past year and a half, so mentioning Tesla at any point in time would have been likely to yield positive returns. This is displayed by the Tesla price autocorrelation graph, where a strong correlation exists (r > 0.9) for lags up to +/- 20. That is to say, the earlier Tesla is mentioned, the greater the returns are likely to be. The positive correlation can be explained by the fact that Tesla is a stock that has performed consistently well over the past year, so any media attention would give the impression that it predicted an increase in stock price.
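A small Python sketch shows why a steadily rising price has high autocorrelation even at long lags. The price path is synthetic (a constant upward drift plus noise), not Tesla data, and the article does not say which autocorrelation estimator it used; the Pearson-based `autocorr` helper here is an assumption for illustration.

```python
import numpy as np

def autocorr(series, lag):
    """Pearson correlation between a series and itself shifted by `lag` days."""
    s = np.asarray(series, dtype=float)
    if lag == 0:
        return 1.0
    k = abs(lag)
    return float(np.corrcoef(s[:-k], s[k:])[0, 1])

# Synthetic price path (not Tesla data): steady upward drift plus noise.
rng = np.random.default_rng(1)
daily_change = 1.0 + rng.normal(scale=1.0, size=320)
price = 100.0 + np.cumsum(daily_change)

r_level = autocorr(price, 20)          # price level vs. itself 20 days apart
r_change = autocorr(daily_change, 20)  # daily changes vs. themselves 20 days apart
print(round(r_level, 3), round(r_change, 3))
```

The trending price level stays highly correlated with itself 20 days apart, while the daily changes do not, so any signal that also grew over the period would tend to look correlated with the price level.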

The volume data provides a different perspective on the effect of WSB on price action. Volume is not affected by changes in mentions on WSB. The correlation coefficients are centered symmetrically around a shift of zero, meaning that volume and mentions move up and down simultaneously. Both volume and mentions have strong, symmetrical autocorrelation on lags -20 to 20, which is why “lagged” elevated correlations are around t = 0 when both are compared in a cross-correlation graph of the two variables. These factors indicate that WSB does not have any impact on the trading volumes of Tesla. This is further evidence that WSB does not significantly influence Tesla stock because, if it did, there would be an apparent increase in the trading volume. WSB mentions are only a reactionary function of previous price performance and previous mentions and do not significantly impact Tesla stock.

Autocorrelation functions between $TSLA mentions, price, and volume respectively.

While we have shown that Tesla is unlikely to be manipulated by WSB over a multi-day time frame, we have not looked into more granular data or other stocks. Smaller market cap stocks that show up on WSB warrant more investigation, but they do not get the same media attention that Tesla gets. Smaller stocks are more susceptible to social media-fueled manipulation, so more detailed research into small-cap and micro-cap stocks has a higher chance of protecting investors. Cryptocurrencies have recently been a widespread media subject, and small cryptocurrencies are very vulnerable to foul play. Given their unregulated nature and, in some cases, extremely high price swings, they are more suspect than large-cap stocks like Tesla.

III.

As of publication, there are 27.1 million results when Googling $GME (GameStop’s stock market ticker). This is twice that of $AAPL, the ticker for Apple stock.

Within the past few months, $GME has been the catalyst of a massive short squeeze and cultural movement. There has been a “rebellion in the stock market,” as described by Bloomberg Businessweek. GameStop’s share price has appreciated 3,910% from its 1-year low on July 20, 2020, and brought with it numerous other heavily shorted companies. This includes AMC Entertainment ($AMC), a movie theater chain that struggled due to the pandemic, and BlackBerry Limited ($BB), a tech company that expanded into cybersecurity and other industries. These three tickers compose our secondary basket, which we will use to observe the short squeeze.

Among the narratives circulating through financial media and retail investors is that WSB mentions are responsible for the positive $GME price action, and WSB acts efficiently and systematically. Bloomberg describes Reddit traders as “renegades” with the collective conscience to “single out hedge fund Melvin Capital… as particularly vulnerable” and “attack the [short-seller] establishment.”

As a result of this, WSB grew from a forum of fewer than 1 million members at the beginning of 2020 to 10 million members as of April 20, 2021. We ask whether the agent of price growth of tickers in our secondary basket was a short squeeze and other fundamental causes recognized by WSB members or the threat of WSB and its ever-growing hordes of online traders.

WSB mentions for the GameStop ticker and its share price in the past year or so.

There have been two notable peaks in GameStop’s share price: the primary short squeeze, occurring late January to early February of 2021; and a secondary bump in price beginning in late February that is still ongoing. This structure is typical in our secondary basket of stocks.

When observing the rapid growth of $GME mentions in WSB approaching the first short squeeze, we notice the relative change in mentions is considerably more significant than during the second price spike. Additionally, the second price spike is being sustained despite no meaningful increase in mentions, suggesting $GME share price can skyrocket without much WSB influence.

The CCFs between WSB mentions of $GME and its price, and between WSB mentions of $GME and its volume (in a 15-month interval), graphed.

Applying CCF to $GME data from January 1, 2020, to April 20, 2021, we see the highest spike near zero for both the function of WSB mentions to price and that of WSB mentions to volume. Over t = [-2, 2], r averages 0.5 between WSB mentions and adjusted close, and 0.6 between WSB mentions and volume. Thus WSB mentions on a particular trading day are most correlated to how a ticker performs on that trading day, which falls in line with what we expect from our benchmark and what we consider when observing other functions.

As lag gets farther from zero, the correlation to price tends to decrease. Between WSB mentions for $GME and volume, we see results similar to a normal distribution with two statistically significant bumps near t = [-20, -16]∪[16, 20]. What is noteworthy is that for t = [0,5] $GME mentions have a high correlation to volume, suggesting volume is a predictor of $GME mentions in short time frames.

The function of WSB mentions to $GME price action is consistent with what we saw in our benchmark QQQ index. The mentions-to-volume CCF for $GME is similar to that of the QQQ signals when lag is increased by 3 in the $GME function. The significance of this shift cannot be missed. The peak in price came three days before the peak in mentions, suggesting that the $GME frenzy on WSB was neither predictive nor a driving force. Hence WSB’s involvement in significant $GME price action is either feeble or reactionary.
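How a three-day shift shows up in a CCF can be sketched in Python with two synthetic signals, one a copy of the other delayed by three days. The data, the helper `lagged_corr`, and the use of plain Pearson correlation are assumptions for illustration; they do not reproduce the $GME series or the article's adjusted correlation.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t + lag] and y[t] (positive lag: x follows y)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if lag >= 0:
        a, b = x[lag:], y[:len(y) - lag]
    else:
        a, b = x[:lag], y[-lag:]
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic signals (not $GME data): a price bump on day 50 and an
# identical mentions bump that arrives three days later.
days = np.arange(100)
price = np.exp(-0.5 * ((days - 50) / 4.0) ** 2)
mentions = np.exp(-0.5 * ((days - 53) / 4.0) ** 2)

corrs = {k: lagged_corr(mentions, price, k) for k in range(-10, 11)}
peak_lag = max(corrs, key=corrs.get)
print(peak_lag)
```

Here the correlation peaks at a positive lag equal to the delay, which under the article's convention marks mentions as reacting to the market rather than leading it.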

The CCFs of $BB and $AMC signals in a 15-month interval.

Expanding to our secondary basket, we see that CCFs between WSB mentions and volume for our secondary basket tickers all follow a shape roughly similar to a normal distribution, where r spikes as t approaches 0. A difference is that the function for $BB’s signals is more thin-tailed and thus more platykurtic than that of the other tickers. We also discover a statistically significant bump near the extremities of the CCFs for $AMC and $GME, while $BB stays asymptotic to r = 0; however, the correlations are weak enough that this is irrelevant.

Apart from superficial differences, CCFs between WSB mentions and adjusted close for our secondary basket tickers are very similar. In all three CCFs, we again see a relationship where adjusted close is the predictor of WSB mentions. Similarly, we again see that the relation between signals weakens as t moves away from zero, especially in the positive direction.

Drawing back to our original question, it seems that both fundamental causes and the self-perpetuating inflows from thousands of retail traders were agents of the historic rise in the price of tickers in our secondary basket. Though the impact of WSB mentions on price action is reactionary, it is improbable that share price would have rocketed dramatically without the social movement and mainstream interest that the short squeeze cultivated.

WSB is not the primary agent of the short squeeze. The primary agents are most likely algorithmic participants, such as high-frequency traders, and fundamental causes. Ignoring WSB’s role in the price action of $GME and other tickers is naive, but portraying it as the only factor is more so.

It is understandable why the $GME narrative could spread so quickly. It satisfies all parties involved: the story of the WSB boogeyman allows hedge funds to feel victimized and WSB members to feel powerful, and to observers it seems believable and exciting.

WSB mentions of the iShares Silver Trust and its adjusted close in the past 8 months or so.

An alternative to our secondary basket is another recent “short squeeze,” that of silver. Despite the various obstacles to a silver “short squeeze” (most institutions are long on silver), there was a noticeable rise in the price of $SLV and related assets on February 1, 2021. Compared with our secondary basket, the WSB mentions of $SLV appear extremely peaked (leptokurtic): there are nearly zero mentions of $SLV in a six-month interval when excluding the trading days near February 1.

The CCF between WSB mentions of $SLV and its share price, and between WSB mentions of $SLV and its volume, over the past 15 months.

Applying CCF to $SLV signals, we see a similar story to our secondary basket. The relationship between WSB mentions and adjusted close weakens as t progresses, and mentions are reactionary to price in the same pattern as $GME. However, r peaks at approximately 0.34 at t = {-13, -6}, while peaking at 0.54 for $GME; in fact, the relationship between mentions and adjusted close is much weaker with $SLV than with our secondary basket.

The function between mentions and volume is leptokurtic compared with the CCFs of mentions and volume in the secondary basket. Furthermore, we see a sustained relationship that is statistically significant over t = [-11, 15], suggesting volume and mentions are tied to one another (though very weakly) even over long intervals.

The disparities between CCFs of $SLV and our secondary basket could be attributed to differences in sentiment as their “short squeeze(s)” progressed. As determined by Reddit’s search algorithm, the most relevant post on WSB when searching $GME is a “GME YOLO [‘You Only Live Once’] Update” of an investor with $30.938m of exposure to GameStop shares, while the most relevant post when searching $SLV proclaims that “SLV is a complete scam, its [sic] a scalp trade set up by banks to screw over investors.”

Future research: Though WSB’s relationship to the price action of these “short squeezes” and ticker events has been shown to be reactionary, our lag intervals are long considering how quickly social media can move from topic to topic; in fact, it is an anomaly that WSB has stayed engaged with the events of tickers in our secondary basket for months. Since these events are still ongoing, revisiting CCF once the volume and attention that these tickers are receiving have settled could produce different results.

Observing CCF and other transformations under smaller time intervals could show higher correlations in the short term. In this regard, applying wavelet transform to more granular data can identify the time interval where the correlation between WSB and ticker signals is highest, among other results. We could also discover the predictive power of WSB mentions in some instances. Furthermore, considering more WSB-specific information such as “megathreads” could impact how WSB’s involvement with tickers is measured when calculating predictive power.

Authored by: Diego Luca Gonzalez Gauss, Matvey Borodin, Phillip Forman, Robert Rogers, Benjamin Vyshedskiy, Liam Sheldon, Beatrix Metral, Sofia Noyes

References:

https://www.bloomberg.com/news/features/2021-02-04/gamestop-gme-how-wallstreetbets-and-robinhood-created-bonkers-stock-market

https://www.bloomberg.com/news/articles/2021-01-25/how-wallstreetbets-pushed-gamestop-shares-to-the-moon

https://www.bloomberg.com/opinion/articles/2021-02-02/wall-street-didn-t-see-reddit-s-wallstreetbets-coming-for-gamestop-gme

https://www.reddit.com/r/wallstreetbets/comments/msblc3/gme_yolo_update_apr_16_2021_final_update/

https://www.reddit.com/r/wallstreetbets/comments/mbx510/slv_is_a_complete_scam_its_a_scalp_trade_set_up/

Norfolk Project

Norfolk Project is a student research group using data science & abridged fields to try and solve real-world problems. Contact us at: contact@norfolkproject.com