The rise of social media has given access to a huge volume of discussion on almost any topic imaginable. One such topic is company-related news, with social media discussion acting as a proxy for public sentiment toward a given brand. We want to use discussion on social media to try to predict the stock market performance of large companies. We will start with the sentiments shared toward companies on Twitter as an initial benchmark to compare against stock market movement. Then we will look at investing communities on Reddit to see how discussion on another platform compares. If a correlation is found, we will have identified a valuable source for companies to get instantaneous snapshots of how they are viewed by the public and how that image affects their monetary value.
# General data manipulation libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from datetime import datetime, timedelta
# API libraries/credentials
import config
import yfinance as yf
import tweepy
import praw
# NLP libraries
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import TweetTokenizer
nltk.download('punkt')
nltk.download('stopwords')
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
# ML libraries
from sklearn import linear_model as lm
import statsmodels.api
import statsmodels.formula.api  # needed explicitly; not pulled in by statsmodels.api
This dataset contains financial tweets regarding stocks traded on NYSE, NASDAQ, and the S&P 500.
skiprows = [730, 2835, 3057, 3112, 3193, 3204, 3254, 3519, 4077, 4086, 4087, 4498]
tweet_df = pd.read_csv('./datasets/stockerbot-export.csv', skiprows=skiprows, parse_dates=['timestamp'])
# Drop last five rows since they seem to be in a different language
tweet_df = tweet_df[:-5]
tweet_df.head()
ax = tweet_df.company_names.value_counts().sort_values(ascending=False).head(10).plot.barh(figsize=(10, 6))
ax.set_title('Most Commonly Tweeted Stocks')
ax.set_xlabel('Number of Times Tweeted')
It is nice to see that this dataset has many tweets (around 100) for the most frequent companies. This will help us in determining the general sentiment around these companies over time to compare against stock prices. Many established names, such as 21st Century Fox, Alphabet (Google's parent company), and Netflix, also appear.
fig, (ax1, ax2) = plt.subplots(1, 2)
tweet_df.source.value_counts().head(10).plot.barh(ax=ax1, figsize=(15, 10))
ax1.set_title('Most Common Tweet Sources')
ax1.set_xlabel('Number of Times Tweeted')
tweet_df.source.value_counts().reset_index().plot(ax=ax2)
ax2.set_title('Distribution of Number of Tweets Per Source')
ax2.set_ylabel('Number of Tweets')
ax2.set_xlabel('Twitter Users')
This is also a nice feature of our data. The bar chart on the left raised the worry that our dataset would be dominated by a handful of users who tweet a lot, which would bias any correlations we find. The distribution graph on the right, however, shows that only a few users tweet very frequently; most users in this dataset tweet just a few times, which gives us a better sample. Later we will use the Twitter API to get an even broader collection of tweet sources.
ax = tweet_df.groupby([tweet_df.timestamp.dt.year, tweet_df.timestamp.dt.month])['id'].count().plot(kind='bar')
ax.set_title('Distribution of Tweet Times')
ax.set_xlabel('Year/Month')
ax.set_ylabel('Count')
The one unfortunate thing about this dataset is that it primarily includes tweets from July 2018, along with what seem to be a few tweets from February 2018. We will likely need to pull in more data using the Twitter API, but this dataset provides a good starting place.
We were originally going to compare this Twitter dataset against some Kaggle stocks datasets. Given that our Twitter dataset only covers one month and the Kaggle datasets are very large, we are instead going to use the yfinance Python module to collect data for this specific month.
# Let's identify the top 30 stocks (by tweet frequency) from our Twitter dataset and use that with yfinance
common_tickers = tweet_df.symbols.value_counts().sort_values(ascending=False).head(30)
common_tickers.head()
# Query the Yahoo Finance API for these stocks for July 2018
# (yfinance's `end` date is exclusive, so use Aug 1 to include July 31)
stocks_df = yf.download(tickers=list(common_tickers.index), start='2018-07-01', end='2018-08-01')
stocks_df.head()
# Only want Open, Close, and Volume. Drop the rest
stocks_df.drop(['Adj Close', 'Low', 'High'], axis=1, inplace=True)
# Columns are multi-indexed, so flatten this
stocks_df.columns = ['_'.join(multi_col) for multi_col in stocks_df.columns]
# Drop FOX, OMG, and ARNC cols since we weren't able to get data for them
stocks_df.drop(list(stocks_df.filter(regex='(FOX)|(OMG)|(ARNC)$')), axis=1, inplace=True)
stocks_df.head()
This is a lot easier to work with. The data is not strictly tidy, since the tickers are embedded in the column names, but every price is lined up by date, which makes it easy to graph. When graphing, we will use a regex to select the columns corresponding to a specific ticker. Later, during the Twitter API parsing, we will transform this layout into properly tidy data.
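As a quick illustration of the regex-based column selection described above, `DataFrame.filter` accepts a `regex` argument (the toy column names below mimic our flattened multi-index columns; the values are made up):

```python
import pandas as pd

# Toy frame mimicking our flattened multi-index columns
df = pd.DataFrame({
    'Open_NFLX': [395.0], 'Close_NFLX': [398.2],
    'Open_UPS': [108.1], 'Close_UPS': [107.5],
})

# Select every column for a single ticker by anchoring the regex at the end
nflx = df.filter(regex='NFLX$')
print(list(nflx.columns))
```

This is the same pattern used below to drop the tickers yfinance could not resolve.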
Now that we have some tweets and stock price information for the corresponding time period, we can begin to do some analysis. We will start by doing some NLP on the Tweets and then we can compare them to changes in stock price over time.
We will be using VADER (Valence Aware Dictionary and Sentiment Reasoner) to figure out the sentiment. This gives us a reliable way of determining sentiment without labeling our own data and generating models. This rule-based analysis tool was explicitly designed for social media data which is perfect for our scenario.
# First filter for Tweets talking about our common tickers
relevant_tickers = tweet_df[tweet_df.symbols.isin(common_tickers.index)].reset_index()
print(relevant_tickers.shape)
relevant_tickers.head()
tt = TweetTokenizer()
# Build the stop word set once rather than per word
stop_words = set(stopwords.words('english'))
def clean_tweet(s):
    # Remove punctuation and stop words (e.g. 'the', 'is', 'which')
    return ' '.join(wd for wd in tt.tokenize(s.lower()) if wd.isalnum() and wd not in stop_words)
# Sample usage
print(relevant_tickers.text[1])
print(clean_tweet(relevant_tickers.text[1]))
# Now apply this cleanup to every tweet
relevant_tickers.text = relevant_tickers.text.apply(clean_tweet)
relevant_tickers.head()
sent = SentimentIntensityAnalyzer()
relevant_tickers['sentiment'] = relevant_tickers.text.apply(lambda s: sent.polarity_scores(s)['compound'])
# a compound sentiment value is positive if >= 0.05, negative if <= -0.05, neutral if in between
relevant_tickers.head()
Each tweet now has a value between -1 and 1 representing how positive or negative its sentiment is. My current worry is that many of these tweets are factual in nature, meaning the sentiment score will not be indicative of stock performance. Later we will look at other sources, such as random tweets from the last week, and compare their sentiment to stock performance. For now, let's look at the sentiment distribution.
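Using the conventional VADER thresholds noted in the code comment above (compound >= 0.05 positive, <= -0.05 negative, neutral in between), we could bucket scores into categorical labels. `label_sentiment` is a small helper we define here for illustration:

```python
def label_sentiment(compound):
    """Map a VADER compound score to a categorical label
    using the standard +/-0.05 thresholds."""
    if compound >= 0.05:
        return 'positive'
    if compound <= -0.05:
        return 'negative'
    return 'neutral'

print(label_sentiment(0.62), label_sentiment(0.0), label_sentiment(-0.3))
```

We stick with the raw compound score in the analysis below, since the continuous value is more useful for correlation.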
# Violin plot to show sentiment distribution
ticker_subset = relevant_tickers[relevant_tickers.symbols.isin(['NFLX', 'IP', 'SQQQ', 'DFS', 'UPS'])]
ax = sns.violinplot(x='symbols', y='sentiment', data=ticker_subset)
ax.set_title('Company Sentiment Distribution July 2018')
ax.set_xlabel('Companies')
ax.set_ylabel('Sentiment')
Looking at the sentiment distributions, we see that these companies generally have a neutral sentiment score, which will not be very interesting for finding correlations with stock prices. SQQQ in particular has many neutral tweets, which makes sense: it is an index fund, and it is harder to have an opinion on a collection of stocks. Even with all the neutral tweets, this data still offers unique insight into how people feel about these companies, since each company has a slightly different distribution. It is also interesting that while the companies do have clusters of positive tweets, there are relatively few negative ones in the dataset.
Now let's see how these sentiments over time correlate with stock performance.
fig, axs = plt.subplots(2, 3)
syms = relevant_tickers.symbols
fig.tight_layout(pad=0.25)
# Graph Sentiment vs Time for 3 companies
axs[0, 0].set_ylim([-0.8, 0.8]); axs[0, 1].set_ylim([-0.8, 0.8]); axs[0, 2].set_ylim([-0.8, 0.8])
sentiment_kwargs = {'x': 'timestamp', 'y': 'sentiment', 'figsize': (20, 16), 'ylabel': 'Sentiment', 'xlabel': 'Timestamp'}
relevant_tickers[syms == 'NFLX'].plot.line(ax=axs[0, 0], title='Netflix Sentiment', **sentiment_kwargs)
relevant_tickers[syms == 'MOMO'].plot.line(ax=axs[0, 1], title='Momo Sentiment', **sentiment_kwargs)
relevant_tickers[syms == 'HON'].plot.line(ax=axs[0, 2], title='Honeywell Intl Sentiment', **sentiment_kwargs)
# Graph stock performance vs Time for 3 companies
price_kwargs = {'xlabel': 'Timestamp', 'ylabel': 'Close Prices', 'use_index': True}
stocks_df.plot.line(y='Close_NFLX', ax=axs[1, 0], title='Netflix Close Prices', **price_kwargs)
stocks_df.plot.line(y='Close_MOMO', ax=axs[1, 1], title='Momo Close Prices', **price_kwargs)
stocks_df.plot.line(y='Close_HON', ax=axs[1, 2], title='Honeywell Intl Close Prices', **price_kwargs)
We use close prices here since we want to allow time for the sentiments expressed in the tweets to be reflected in the company's stock price.
Unfortunately, it does not look like there is any real correlation between sentiment and stock prices here. I hypothesize that the problem lies with our current Twitter dataset: these tweets do not express significant sentiment, yet we still tag each one with a sentiment value. Let's move on to similar analysis with tweets pulled from the Twitter API to see if this remedies the issue.
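Rather than only eyeballing the plots, the relationship could also be quantified, for example with a Pearson correlation between daily mean sentiment and close price. A sketch on toy aligned series (the numbers below are invented, not our actual data):

```python
import pandas as pd

# Hypothetical daily mean sentiment and close prices for one ticker
daily = pd.DataFrame({
    'sentiment': [0.10, 0.05, -0.20, 0.30, 0.15],
    'close': [398.2, 395.1, 390.4, 401.7, 399.0],
})

# Series.corr computes Pearson correlation by default
r = daily['sentiment'].corr(daily['close'])
print(round(r, 3))
```

A value near +1 or -1 would suggest a strong linear relationship; values near 0 match the flat relationship we appear to see here.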
The Kaggle Twitter dataset was interesting and provided a starting place for analysis. Since this data was already labeled with information such as the source and the stocks mentioned, it was easy to compare against information pulled from the Yahoo Finance API. To try to find more insight into indicators of stock performance, we will use the Twitter API to get tweets that directly mention the companies we are looking at. Hopefully, this will produce more opinionated data that makes better use of our sentiment analysis.
# Connect to the Twitter API using our dev account credentials
auth = tweepy.AppAuthHandler(config.api_key, config.api_secret)
api = tweepy.API(auth)
# This will be the datastructure we build our dataframe from later
search_res = [["Company", "Symbol", "Date", "Tweet"]]
# Get today's date and all the companies we will want to search for on Twitter
date = datetime.today()
company_handle = ["Apple", "exxonmobil", "Walmart", "cvspharmacy", "CapitalOne"]
companies = ["Apple", "Exxon", "Walmart", "CVS", "Capital One"]
tickers = ["AAPL", "XOM", "WMT", "CVS", "COF"]
The five companies chosen here (Apple, Exxon Mobil, Walmart, CVS, and Capital One) were selected because they are among the biggest in the US and thus should have a lot of discussion around them, giving us a larger pool of tweets to look at. They also represent a diverse variety of business sectors so we aren't exclusively looking at, for example, tech or finance companies but instead the business world at large.
for i in range(len(companies)):
days_ago = 6
while days_ago >= 0:
# Construct our Twitter query for this exact day and tweets that mention the company
query = "to:" + company_handle[i] + " since:" + (date - timedelta(days=days_ago)).strftime('%Y-%m-%d') + " until:" + (date - timedelta(days=days_ago-1)).strftime('%Y-%m-%d')
tweets = api.search(q=query, lang="en", count=50)
for twt in tweets:
# Add to our dataframe proxy
search_res.append([companies[i], tickers[i], (date - timedelta(days=days_ago)).strftime('%Y-%m-%d'), twt.text])
days_ago -= 1
# Showing a slice of these tweets to avoid those with curse words
search_res[4:7]
Due to restrictions placed on us by the Twitter API, we can only retrieve tweets from up to a week ago, so for each company we collect 50 tweets (again due to API limits) per day we have access to. The tweets directly mention the company's Twitter account, so they should be more strongly opinionated than the press-release-style tweets we had in the Kaggle dataset.
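The day-by-day query construction used in the loop above can be sketched in isolation (pure string building, no API call; `day_query` is a helper we define here for illustration):

```python
from datetime import datetime, timedelta

def day_query(handle, day):
    """Build a Twitter search query for tweets mentioning `handle`
    on a single day, using the since:/until: search operators."""
    since = day.strftime('%Y-%m-%d')
    until = (day + timedelta(days=1)).strftime('%Y-%m-%d')
    return f"to:{handle} since:{since} until:{until}"

print(day_query("Apple", datetime(2020, 11, 1)))
```

Bucketing by day like this is what lets us later line each tweet up with that day's closing price.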
# Construct the new Tweet dataframe
tweet_df = pd.DataFrame(search_res)
tweet_df.columns = tweet_df.iloc[0] # The column names are in the first entry of search_res
tweet_df = tweet_df.iloc[1:] # Remove the names row
# Clean up the Tweets with NLP and add the Vader sentiment score
tweet_df["Tweet"] = tweet_df["Tweet"].apply(clean_tweet)
tweet_df["Sentiment"] = tweet_df["Tweet"].apply(lambda s: sent.polarity_scores(s)['compound'])
tweet_df[4:7]
At this point we have queried the Twitter API for tweets about relevant companies from the last week. We used our previous NLP code to strip the tweets of all the unimportant information, and then we tagged each tweet with a Vader sentiment score.
Now we will pull in stock price information using yfinance, lined up with the dates of these tweets.
# Download stock price info for these companies for the past week
stocks_df = yf.download(tickers=tickers, start=((tweet_df.iloc[0])["Date"]), end=(date + timedelta(days=1)).strftime('%Y-%m-%d'))
# Drop unimportant columns
stocks_df.drop(['Adj Close', 'Low', 'High', 'Open', 'Volume'], axis=1, inplace=True)
# Columns are multi-indexed, so flatten this
stocks_df.columns = ['_'.join(multi_col) for multi_col in stocks_df.columns]
stocks_df["Date"] = stocks_df.index
stocks_df.index = list(range(len(stocks_df.index))) # Was originally indexed by the date of the price
stocks_df.head()
Now we want to actually tidy our stocks dataset since we also want to line it up with our tweets. We will use pd.melt to do so. Then we can merge this dataframe with our Tweets dataframe to look at close prices corresponding to the date of tweets about that company.
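On a toy frame, pd.melt reshapes the wide per-ticker columns into tidy one-row-per-observation form like so (the prices below are made up):

```python
import pandas as pd

wide = pd.DataFrame({
    'Date': ['2020-11-02', '2020-11-03'],
    'Close_AAPL': [108.8, 110.4],
    'Close_WMT': [139.3, 140.3],
})

# Each (Date, ticker) pair becomes its own row
tidy = pd.melt(wide, id_vars=['Date'],
               value_vars=['Close_AAPL', 'Close_WMT'])
tidy.columns = ['Date', 'Symbol', 'Close']
tidy['Symbol'] = tidy['Symbol'].str[6:]  # strip the 'Close_' prefix
print(tidy)
```

With `Date` and `Symbol` as plain columns, the merge with the tweets dataframe becomes a simple key join.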
# Tidy the stocks dataframe
stocks_df = pd.melt(stocks_df, id_vars=["Date"], value_vars=["Close_AAPL", "Close_COF", "Close_CVS", "Close_WMT", "Close_XOM"])
stocks_df.columns = ["Date", "Symbol", "Close"]
# Remove the "Close_" from what used to be column names
stocks_df["Symbol"] = stocks_df["Symbol"].str[6:]
# Write out the date as a string so we can merge on it
stocks_df["Date"] = stocks_df["Date"].dt.strftime('%Y-%m-%d')
# Merge the Tweet dataframe with the stocks dataframe
tweet_df = tweet_df.merge(stocks_df, how="left", on=["Date", "Symbol"])
tweet_df[4:7]
This dataframe now has a lot of interesting information. Each record in the table tells us about how people felt about a company on a certain date along with a numeric sentiment score and a stock price on that day.
Something interesting that we saw was that many tweets contained curse words, and these tweets had very low sentiment scores, which makes sense and gives us some assurance that our sentiment tagging is working. We chose not to show these tweets since they contain expletives.
# Violin plot to show sentiment distribution
ax = sns.violinplot(x='Company', y='Sentiment', data=tweet_df)
ax.set_title('Company Sentiment Distribution')
ax.set_xlabel('Companies')
ax.set_ylabel('Sentiment')
These distributions tell us that the sentiment of most tweets is still neutral, but the sentiment is better distributed than in our last dataset, which is a nice indicator, and there are far more negative tweets than before, giving us a fuller spectrum of opinions. Let's see how the sentiment of these tweets fares against our close price data.
fig, axs = plt.subplots(2, 3)
syms = tweet_df.Symbol
fig.tight_layout(pad=0.25)
# Graph Sentiment vs Time for 3 companies
axs[0, 0].set_ylim([-0.8, 0.8]); axs[0, 1].set_ylim([-0.8, 0.8]); axs[0, 2].set_ylim([-0.8, 0.8])
sentiment_kwargs = {'x': 'Date', 'y': 'Sentiment', 'figsize': (20, 16), 'ylabel': 'Sentiment', 'xlabel': 'Timestamp'}
tweet_df[syms == 'AAPL'].plot.line(ax=axs[0, 0], title='Apple Sentiment', **sentiment_kwargs)
tweet_df[syms == 'XOM'].plot.line(ax=axs[0, 1], title='Exxon Sentiment', **sentiment_kwargs)
tweet_df[syms == 'COF'].plot.line(ax=axs[0, 2], title='Capital One Sentiment', **sentiment_kwargs)
# Graph stock performance vs Time for 3 companies
price_kwargs = {'x': 'Date', 'y': 'Close', 'xlabel': 'Timestamp', 'ylabel': 'Close Prices'}
tweet_df[syms == 'AAPL'].plot.line(ax=axs[1, 0], title='Apple Close Prices', **price_kwargs)
tweet_df[syms == 'XOM'].plot.line(ax=axs[1, 1], title='Exxon Close Prices', **price_kwargs)
tweet_df[syms == 'COF'].plot.line(ax=axs[1, 2], title='Capital One Close Prices', **price_kwargs)
These sentiment graphs are definitely more extreme than before, which is an improvement over our previous Twitter dataset since there is now actual sentiment to track. Since the raw sentiment is all over the place, we will need to aggregate it.
We do not have stock market data on every desired date since the market is not open on weekends.
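If we wanted a price for every calendar day despite those market closures, one common approach is to reindex onto a daily range and forward-fill the last trading day's close. A sketch on toy data (the prices are invented; Nov 6, 2020 was a Friday and Nov 9 the following Monday):

```python
import pandas as pd

# Closes for a Friday and the following Monday only
close = pd.Series([299.6, 306.4],
                  index=pd.to_datetime(['2020-11-06', '2020-11-09']))

# Reindex onto every calendar day, carrying the last close forward
daily = close.reindex(pd.date_range('2020-11-06', '2020-11-09')).ffill()
print(daily)
```

We do not apply this in our merge above; weekend tweets simply get a missing close price, which the left join leaves as NaN.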
The Twitter API definitely performed better than the Kaggle Twitter dataset since these tweets are much more emotional in nature which makes the Vader sentiment tagging more valuable. We are limited by the fact that we can only look at 7 days of tweets which means we can't assess long-term trends.
The raw Twitter sentiment above was too noisy. Let's try aggregating it for each day and looking at those trends.
# For each company on each day, get the average of the sentiment
rel_cols = ['Symbol', 'Date', 'Sentiment', 'Close']
net_df = tweet_df[rel_cols].groupby(['Symbol', 'Date']).mean().reset_index()
net_df.head()
Let's visualize the sentiment and close prices now.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
sns.lineplot(x='Date', y='Sentiment', hue='Symbol', data=net_df, ax=ax1)
ax1.set(title='Net Sentiment per Day', xlabel='Date', ylabel='Sentiment')
sns.lineplot(x='Date', y='Close', hue='Symbol', data=net_df, ax=ax2)
ax2.set(title='Close Price per Day', xlabel='Date', ylabel='Close Price')
The net sentiment is much easier to read now that it is aggregated per day. The close prices are not very interesting since they only span a week. With more Twitter data, this would likely yield more interesting results. Regardless, it serves as an interesting proof of concept.
Even though we are short on Twitter data, we thought it would be interesting to create linear regression models that try to predict close prices for each company based solely on sentiment. This is more of a proof of concept, since 7 days of data is likely not enough to make accurate predictions.
Let's train the linear regression models!
# Add each model to the dictionary, reg, after training
reg = {}
net_df.dropna(inplace=True)
for company in net_df.Symbol.unique():
comp = net_df.loc[net_df.Symbol == company, ['Sentiment', 'Close']]
reg[company] = lm.LinearRegression()
reg[company].fit(comp[['Sentiment']], comp['Close'])
Let's view the p-values of these linear models. We will use statsmodels here since scikit-learn does not expose p-values.
for company in net_df.Symbol.unique():
res = statsmodels.formula.api.ols(formula="Close ~ Sentiment", data=(net_df[net_df.Symbol == company])).fit()
print(f'{company}:\n {dict(res.pvalues)}')
There was a lot of variability, but unfortunately most of our models had p-values greater than 0.05, so we cannot call the results significant. Our data did not cover a wide enough timeframe, and the close prices were relatively static, so it is too soon to rule out a correlation entirely. We could probably do better if this analysis were repeated over a longer period with more room for movement in the stock price.
Twitter is commonly used by large companies to report news to the general public. We will now look at the Reddit API to see if we can find more interesting correlations that provide insight into stock performance.
# Connect to the Reddit API using our credentials
reddit = praw.Reddit(**config.reddit_creds)
We will come up with a list of tickers and search for posts about those tickers within the last month.
# Search r/investing for posts about our target stocks
subs = []
rel_tickers = ['AAPL', 'TSLA', 'WMT', 'CVS']
for ticker in rel_tickers:
for sub in reddit.subreddit('investing').search(ticker, time_filter='month', limit=50):
# Store general data from these submissions
subs.append([sub.title, sub.selftext, ticker, sub.score, sub.num_comments, datetime.fromtimestamp(sub.created)])
# Generate general dataframe
reddit_df = pd.DataFrame(subs, columns=['title', 'body', 'Symbol', 'reddit_score', 'num_comments', 'Date'])
reddit_df.Date = reddit_df.Date.dt.strftime('%Y-%m-%d')
reddit_df.head()
We will focus our NLP efforts on the post body rather than the title, since the body likely contains more information. Again, we will clean up the user text (reusing our clean_tweet function for simplicity) and then evaluate its sentiment.
# Drop irrelevant columns; it may be interesting to build a model on these other features in the future
reddit_df.drop(['title', 'reddit_score', 'num_comments'], axis=1, inplace=True)
# NLP for cleanup and sentiment score
reddit_df['body'] = reddit_df.body.apply(clean_tweet)
reddit_df['Sentiment'] = reddit_df.body.apply(lambda s: sent.polarity_scores(s)['compound'])
reddit_df.head()
Grab stock prices for relevant tickers. Only grab enough data for around a month since that is how we queried the Reddit posts.
today = datetime.today()
start = today - timedelta(days=31)
stocks_df = yf.download(tickers=rel_tickers, start=start.strftime('%Y-%m-%d'), end=today.strftime('%Y-%m-%d'))
# Clean up the same way we have before
stocks_df.drop(['Adj Close', 'Low', 'High'], axis=1, inplace=True)
stocks_df.columns = ['_'.join(multi_col) for multi_col in stocks_df.columns]
stocks_df['Date'] = stocks_df.index
stocks_df.index = list(range(len(stocks_df.index)))
stocks_df.head()
# Merge with the Reddit dataframe similar to how we did with the Twitter API dataframe
stocks_df = pd.melt(stocks_df, id_vars=['Date'], value_vars=[f'Close_{tkr}' for tkr in rel_tickers])
stocks_df.columns = ['Date', 'Symbol', 'Close']
stocks_df['Symbol'] = stocks_df['Symbol'].str[6:]
stocks_df['Date'] = stocks_df['Date'].dt.strftime('%Y-%m-%d')
reddit_df = reddit_df.merge(stocks_df, how='left', on=['Date', 'Symbol'])
reddit_df.head()
First, a violin plot of our sentiment distribution:
ax = sns.violinplot(x='Symbol', y='Sentiment', data=reddit_df)
ax.set_title('Company Sentiment Distribution')
ax.set_xlabel('Companies')
ax.set_ylabel('Sentiment')
These results are quite interesting: our Twitter datasets all had very neutral sentiment toward companies, but r/investing posts have very high sentiment on average. CVS follows the same sentiment distribution we saw in our Twitter datasets, but Apple, Tesla, and Walmart all have distributions centered near 1.
This likely means that Reddit tends to discuss stocks when they are doing well, so that people can invest. In our Twitter datasets, we have ordinary people talking about companies in their daily lives, and in such daily interactions people may have mixed feelings about these companies.
fig, axs = plt.subplots(2, 3)
syms = reddit_df.Symbol
fig.tight_layout(pad=0.25)
# Graph Sentiment vs Time for 3 companies
axs[0, 0].set_ylim([-0.8, 0.8]); axs[0, 1].set_ylim([-0.8, 0.8]); axs[0, 2].set_ylim([-0.8, 0.8])
sentiment_kwargs = {'x': 'Date', 'y': 'Sentiment', 'figsize': (20, 16), 'ylabel': 'Sentiment', 'xlabel': 'Timestamp'}
reddit_df[syms == 'AAPL'].plot.line(ax=axs[0, 0], title='Apple Sentiment', **sentiment_kwargs)
reddit_df[syms == 'TSLA'].plot.line(ax=axs[0, 1], title='Tesla Sentiment', **sentiment_kwargs)
reddit_df[syms == 'WMT'].plot.line(ax=axs[0, 2], title='Walmart Sentiment', **sentiment_kwargs)
# Graph stock performance vs Time for 3 companies
price_kwargs = {'x': 'Date', 'y': 'Close', 'xlabel': 'Timestamp', 'ylabel': 'Close Prices'}
reddit_df[syms == 'AAPL'].plot.line(ax=axs[1, 0], title='Apple Close Prices', **price_kwargs)
reddit_df[syms == 'TSLA'].plot.line(ax=axs[1, 1], title='Tesla Close Prices', **price_kwargs)
reddit_df[syms == 'WMT'].plot.line(ax=axs[1, 2], title='Walmart Close Prices', **price_kwargs)
These correlations still are not great, but they definitely look a lot more readable than what we were getting with Twitter. We might get better results by looking at a larger timeframe, such as a year instead of a month; we do not do so here because we might exhaust our Reddit API quota.
It would be interesting to repeat this entire process with larger Twitter and Reddit datasets. We were fairly limited by our APIs, but if we were to amass more data over time and save it locally, we could repeat our analysis. This might reveal specific correlations, or specific industries that are particularly sensitive to social media sentiment. It would also be interesting to see how well our Twitter linear regression models would perform in predicting stock prices on a rolling window. Similarly, it would be interesting to build linear regression models on the Reddit dataset, since beyond sentiment we have information such as how popular the posts are and how many comments they have. We encourage our readers to continue this research, as we believe we have provided a strong baseline for sentiment-driven market analysis.
In case our reader would like to do further work on this, there are a few academic papers we found that do more extensive work with sentiment-driven analysis.