Personal portfolio predictor

Abstract

This paper details a semester project in the field of Personalized Machine Learning (PML), focusing on the development of a simulated investor model for financial market analysis. The project utilizes publicly available data from the SEC EDGAR Database and financial data APIs to emulate the portfolio management tactics of institutional investors. By examining Form 13F filings, we gather insights into the historical stock holdings and investment behaviors of these investors. Our approach combines this historical investment data with current market trends to train an autoencoder neural network, specifically utilizing the EASE network.

The primary goal of this project is to apply PML techniques to create a model that predicts how institutional investors might alter their portfolios in response to changing market conditions. The model aims to predict stock buys, sells, and holds, attempting to mirror real-world investment decisions. We assess our model using basic performance metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and the F1 Score. These evaluations provide a preliminary understanding of the model's capabilities and highlight areas for future improvement.

This project contributes to the academic exploration of PML in the context of financial markets. It represents an initial step towards creating more nuanced and personalized predictive models in finance, offering a foundation for further research and development in this area.

Introduction

In this semester project, we apply Personalized Machine Learning (PML) to the financial sector by building a simulated investor model. The model aims to emulate the investment strategies of institutional investors, such as banks and mutual funds, whose decisions significantly influence market dynamics. We combine historical stock holdings data from Form 13F filings, available through the SEC EDGAR Database, with market data from financial APIs to understand and predict changes in these investors' portfolios.

The goal is to develop a model that predicts buying, selling, and holding actions of institutional investors, providing insights into their decision-making processes.

Methodology

Data Collection

The data collection for this project involved two main sources. First, we used the SEC EDGAR Database to gather historical Form 13F filings. These filings are valuable as they offer insights into the stock holdings of institutional investors, helping us understand their past investment behaviors. The database provided detailed information about institutional investors' stock portfolios, including the types of stocks held, the amount of each, and the dates of these holdings.

The second source was financial data APIs, such as Yahoo Finance or Alpha Vantage, which supplied current and historical stock market data. This data adds context to the historical investment patterns observed in the SEC filings. Through these APIs, we accessed data points including stock prices, market capitalization, trading volumes, and other market indicators, which helped us relate market trends to the investment decisions made by institutional investors.

Data Processing

The data processing phase of the project involved several steps to prepare the data for analysis and modeling:

  1. Quarter Date Calculation: A function, get_quarter_dates, was created to identify the start and end dates of a quarter in a given year. This function is helpful for breaking down the data into quarterly segments, aligning with the quarterly nature of Form 13F filings.

  2. Market Metrics Calculation: The calculate_market_metrics function was developed to extract various market metrics from the stock data. These metrics include the end-of-quarter closing price, quarterly return, volatility, average volume, total volume, high and low prices, mean and median closing prices, price change, percentage change, and the 50-day moving average.

  3. Financial Metrics for Stock Range: The function get_financial_metrics_for_range gathers and calculates financial metrics for specific stocks over selected quarters and years. It processes each stock, retrieves data for the chosen timeframe, and computes the market metrics for each quarter, resulting in a DataFrame that combines all these metrics.

  4. Data Integration and Cleaning: Stock information was integrated with the financial metrics by reading stock data from a CSV file and filtering it to match the stocks in the financial metrics data. Columns that were not needed were removed to make the dataset more manageable. The create_all_quarter_pivots function was then used to organize the data into pivot tables, which are helpful for analysis and modeling. This step includes calculating the changes in shares held by companies from one quarter to the next.

  5. Pivot and Change Calculation: Using the create_pivot function, the data for each quarter was rearranged into pivot tables, which align the shares held by companies with each stock symbol. This setup allowed for calculating the changes in holdings between consecutive quarters.

  6. Final Data Preparation: Lastly, the create_all_quarter_pivots function was utilized to process all unique year and quarter combinations in the dataset. It generated pivot tables for each pair of consecutive quarters and calculated the changes in stock holdings, thus preparing a detailed dataset for the next phase of modeling and analysis.
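The project's source code is not reproduced here, so the following is only a minimal sketch of how the quarter, metric, and pivot steps described above might look. The function names get_quarter_dates, calculate_market_metrics, and create_pivot come from the text; their signatures, the helper holding_changes, and the column names ("Close", "Volume", "investor", "symbol", "shares") are assumptions made for illustration, and only a subset of the listed metrics is computed.

```python
from datetime import date, timedelta

import pandas as pd


def get_quarter_dates(year: int, quarter: int) -> tuple[date, date]:
    """Return the first and last calendar day of a quarter (1 to 4)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    start = date(year, 3 * (quarter - 1) + 1, 1)
    # The quarter ends one day before the next quarter starts.
    if quarter == 4:
        next_start = date(year + 1, 1, 1)
    else:
        next_start = date(year, 3 * quarter + 1, 1)
    return start, next_start - timedelta(days=1)


def calculate_market_metrics(prices: pd.DataFrame) -> dict:
    """Summarize one quarter of daily rows with 'Close' and 'Volume' columns.

    Assumed layout; high and low are taken from closing prices here.
    """
    close = prices["Close"]
    first, last = close.iloc[0], close.iloc[-1]
    return {
        "end_close": last,
        "quarterly_return": last / first - 1.0,
        "volatility": close.pct_change().std(),  # std of daily returns
        "avg_volume": prices["Volume"].mean(),
        "total_volume": prices["Volume"].sum(),
        "high_close": close.max(),
        "low_close": close.min(),
        "price_change": last - first,
    }


def create_pivot(holdings: pd.DataFrame) -> pd.DataFrame:
    """Investor-by-symbol table of shares held; missing positions become 0."""
    return holdings.pivot_table(
        index="investor", columns="symbol", values="shares",
        aggfunc="sum", fill_value=0,
    )


def holding_changes(prev_q: pd.DataFrame, next_q: pd.DataFrame) -> pd.DataFrame:
    """Change in shares held between two consecutive quarters (hypothetical helper)."""
    prev_p, next_p = create_pivot(prev_q), create_pivot(next_q)
    # Align on the union of investors and symbols so that newly opened and
    # fully closed positions show up as positive and negative changes.
    prev_p, next_p = prev_p.align(next_p, join="outer", fill_value=0)
    return next_p - prev_p
```

Aligning on the union of rows and columns is one possible design choice: a stock that appears in only one of the two quarters is then treated as a position of zero shares in the other, rather than being dropped.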

Model Development

In the project, we constructed a basic autoencoder neural network aimed at predicting the buying or selling actions of institutional investors. The model's architecture was tailored to process both investor and stock information, with an input dimension designed to capture these varied data types. The output dimension was focused on yielding binary decisions: whether to buy or sell.

A key aspect of this model was its use of a masked binary cross-entropy loss function. This choice enabled the model to concentrate on significant trading actions while disregarding periods of inactivity, aligning the model's output closely with the practical aspects of institutional trading behavior. This approach facilitated a more targeted analysis of investor decision-making processes in the stock market.
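The exact loss implementation is not given in this writeup. As a rough illustration of the idea, the sketch below computes a masked binary cross-entropy with NumPy: the per-entry loss is averaged only over entries flagged as active. The function name masked_bce, the label encoding (1 = buy, 0 = sell), and the mask convention (1 where the investor traded, 0 where the position was unchanged) are assumptions, not the project's actual code.

```python
import numpy as np


def masked_bce(y_true, y_prob, mask, eps=1e-7):
    """Binary cross-entropy averaged only over entries where mask == 1."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    mask = np.asarray(mask, dtype=float)
    per_entry = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    n_active = mask.sum()
    if n_active == 0:
        return 0.0  # no trading activity, so nothing contributes to the loss
    return float((per_entry * mask).sum() / n_active)


# The third entry is masked out, so its confident wrong prediction is ignored.
loss = masked_bce(y_true=[1, 0, 0], y_prob=[0.9, 0.2, 0.99], mask=[1, 1, 0])
```

Dividing by the number of active entries, rather than by the total number of entries, keeps the loss scale comparable between investors who trade many stocks and those who trade few.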

Results

Our model's performance was evaluated using several key metrics, each providing insights into different aspects of its predictive accuracy:

  • Mean Absolute Error (MAE): 0.4985

  • Mean Squared Error (MSE): 0.4985

  • F1 Score: 0.0950
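How these metrics were computed is not described, so the following is only a small sketch of one plausible setup, assuming hard 0/1 predictions compared against 0/1 labels. The function name binary_metrics and the toy inputs are hypothetical and unrelated to the project's data.

```python
def binary_metrics(y_true, y_pred):
    """Return (MAE, MSE, F1) for equal-length sequences of 0/1 values."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    mae = sum(abs(t - p) for t, p in pairs) / n
    # With 0/1 values, each error is 0 or 1, so |e| equals e squared
    # and the MSE comes out identical to the MAE.
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return mae, mse, f1


mae, mse, f1 = binary_metrics([1, 0, 1, 0], [1, 1, 0, 0])
```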

Interpretation

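  • These metrics indicate that the model currently performs close to chance on the buy/sell task. The identical MAE and MSE values (0.4985) are what one would expect if both predictions and labels are binary, since absolute and squared errors then coincide; under that reading, roughly half of the evaluated predictions are wrong. The F1 Score of 0.0950 further shows that the model rarely identifies the positive class correctly. Together, these results point towards challenges in capturing the complex patterns inherent in stock market behavior and investor decision-making, and leave significant room for improvement.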

Future Work

  • Future iterations of the project will focus on model refinement, possibly exploring more complex neural network architectures or additional feature engineering to enhance prediction accuracy. Further tuning of the model parameters and training process could also address the current limitations.