Introduction/Background

Strong evidence suggests that stock prices are predictable and tend to drift in the months following public news about a company (Chan, 2003). The sentiment of news articles about a company significantly affects the direction of its stock price (Jariwala et al., 2020). Articles can also affect the hourly trading volume in the market (Jain, 1988). However, studies that predict stock prices from news often focus on articles and headlines about that specific stock symbol rather than on general public news. Our team is interested in how stock prices are affected by news stories from the preceding few days. Our study differs from this prior work by analyzing immediate rises and falls rather than long-term drifts in stock prices.

Problem Definition

This machine learning project will analyze news stories to predict the price movement of Bitcoin (BTC) and Apple stock (AAPL). It will involve vectorizing news articles and observing how they correlate with the movement of the volatile BTC and the more stable AAPL, in order to produce a model that predicts the rise and fall of their prices given news articles from the last few days. The model will not look exclusively at financial articles; instead, it will use news as a whole for its predictions. Our project could contribute to the existing quantitative models that hedge funds and large companies use to decide on investments by adding distinctly non-financial information to stock analysis.

Methods

Vectorizing a news story involves generating a document vector from its text content using a transformer-based model such as BERT. The news stories could be obtained from any news source, such as the NYT Developer API. Once the model has been fine-tuned on a static news dataset, it can be used to produce document-level vectors. For a model like BERT, each vector captures the differences and similarities between documents in a fixed number of abstract dimensions (768 for BERT-base). Stock prices and charts are also publicly available, which makes analysis easy. For raw or real-time data, we can use various API providers directly or use Python packages such as yfinance and Historic-Crypto.
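One common way to collapse a transformer's per-token outputs into a single document vector is masked mean pooling. The sketch below illustrates that step only, under the assumption that token vectors and a padding mask are already available; the function name `mean_pool` and the toy numbers are hypothetical, and the project may well use a different pooling strategy (for example the `[CLS]` token).

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (real tokens, not padding)."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Component-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy 3-dimensional "token embeddings" (hypothetical values);
# real encoder outputs are much wider.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [0.0, 0.0, 0.0]]  # last row is padding
mask = [1, 1, 0]
doc_vector = mean_pool(tokens, mask)
print(doc_vector)
```

Masking matters because padded positions would otherwise pull the average toward zero for short articles.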

The next task will be to predict the rises and falls of a stock given the vector representations of news stories. Methods such as neural networks, decision trees, or even Bayesian learning can be used to learn the mapping from vectors to rises and falls. Depending on how detailed the prediction model is, the content of news stories can be used to predict whether a price rises or falls, and possibly its percent change.
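Before any of these methods can be trained, the price series has to be turned into targets and paired with the news that preceded each day. The following is a minimal sketch of that preparation step, assuming daily closing prices and a two-day lookback; the helper names `label_moves` and `articles_in_window`, the lookback length, and all values are hypothetical toy choices rather than the project's settled design.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def label_moves(closes: Dict[date, float]) -> Dict[date, Tuple[float, int]]:
    """Map each day after the first to (percent change vs. previous close, 1 if rise else 0)."""
    days = sorted(closes)
    labels = {}
    for prev, cur in zip(days, days[1:]):
        pct = (closes[cur] - closes[prev]) / closes[prev] * 100.0
        labels[cur] = (pct, 1 if pct > 0 else 0)
    return labels


def articles_in_window(articles: List[Tuple[date, str]],
                       day: date, lookback_days: int) -> List[str]:
    """Return articles published in the lookback_days before `day`, excluding `day` itself."""
    start = day - timedelta(days=lookback_days)
    return [text for published, text in articles if start <= published < day]


# Toy closing prices and article placeholders (not real market data).
closes = {date(2021, 5, 3): 100.0, date(2021, 5, 4): 102.0, date(2021, 5, 5): 99.96}
articles = [(date(2021, 5, 2), "a"), (date(2021, 5, 3), "b"), (date(2021, 5, 4), "c")]

labels = label_moves(closes)
# Each training example pairs the preceding articles with the rise/fall label.
examples = [(articles_in_window(articles, day, 2), rise)
            for day, (pct, rise) in labels.items()]
print(examples)
```

Excluding the target day's own articles keeps the model from seeing news published after the price move it is asked to predict. In practice each article string would be replaced by its document vector.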

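  # Install and import yfinance (the "!" prefix runs a shell command in a notebook)
  !pip install yfinance
  import yfinance as yf
  # Download daily AAPL price history for May 2021
  aapl_history = yf.download('AAPL', start='2021-05-01', end='2021-06-01')
  aapl_history.head()
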
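  # Install and import Historic-Crypto
  !pip install Historic-Crypto
  from Historic_Crypto import HistoricalData
  # Retrieve BTC-USD candles for May 2021 at 300-second (5-minute) granularity
  btc_usd_history = HistoricalData('BTC-USD', 300, '2021-05-01-00-00', '2021-06-01-00-00').retrieve_data()
  btc_usd_history.head()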

Potential Results and Discussion

By using a BERT-based transformer and neural networks to find correlations between stock prices and the content of news stories, we aim to add a new signal to quantitative trading models. The most likely result of our project is a classification of news stories into categories associated with either a rise or a fall in the stock price. In addition, this method of dimensionality reduction for news articles could have significant applications outside the stock market.

References