Company Cluster Analysis

Project Overview

1,613 companies were clustered using 20 years of balance sheet and income statement data collected from the EDGAR API (developer.edgar-online.com).

The K-Modes clustering algorithm was used to cluster all publicly traded companies in America (excluding delisted companies, companies lacking data, and companies with extreme outliers) based on three financial ratios. Clustering on the three ratios grouped similar companies together so that future returns could be analyzed per cluster.

K-Modes was used because it separated the data best once each ratio was converted to a binary flag against its threshold. If a ratio met its threshold it was flagged as 1, and if not it was flagged as 0.
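To make the flag-then-cluster idea concrete, the following is a minimal, hand-written sketch of K-Modes on binary flag vectors. It is not the project's implementation: the function names, the sample flags, and the fixed initial modes are all assumptions chosen for illustration, and only two clusters are used to keep the example small.

```python
from collections import Counter

def hamming(a, b):
    """Count positions where two flag vectors differ (the K-Modes dissimilarity)."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent value in each column; this is a cluster's 'mode'."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(rows, initial_modes, max_iter=20):
    """Toy K-Modes: assign each row to its nearest mode, then recompute the modes."""
    modes = list(initial_modes)
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(modes)), key=lambda k: hamming(row, modes[k]))
            for row in rows
        ]
        new_modes = []
        for k, old_mode in enumerate(modes):
            members = [row for row, label in zip(rows, labels) if label == k]
            # Keep the old mode if a cluster ends up empty.
            new_modes.append(column_modes(members) if members else old_mode)
        if new_modes == modes:
            break  # assignments are stable
        modes = new_modes
    return labels, modes

# Hypothetical flags per company, one flag per ratio (1 = threshold met).
flags = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0),
    (0, 0, 1), (0, 0, 0), (0, 0, 1),
]
labels, modes = k_modes(flags, initial_modes=[(1, 1, 1), (0, 0, 0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [(1, 1, 1), (0, 0, 1)]
```

K-Modes suits this kind of data because it compares categorical values by mismatch count and summarizes each cluster by its most common values, rather than averaging as K-Means does.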

The analysis sought to test whether the three financial ratios can quantitatively filter for valuable investments that provide satisfactory returns over time.

The three financial ratios:
  • EBIT (Earnings Before Interest & Taxes)/Market Cap: Indicates whether the stock is cheap by comparing the company's earnings to the price paid for it. The preferred minimum is 7%, to cover the inevitable loss to inflation and the average risk-free government rate. Anything above 7% may indicate that the company is selling at a cheap price and returning a high earnings yield. This has to be weighed alongside other financial valuations (qualitative and quantitative) to avoid investing in a deeply troubled company.
  • EBIT/Tangible Assets: Measures the earning power of a company's tangible assets. High earning power can fund future returns to shareholders and/or let the company compound wealth by reinvesting retained earnings, which points to efficient capital allocation. The preferred minimum is 7%, to cover the inevitable loss to inflation and the average risk-free government rate.
  • Total Debt/Tangible Assets: Determines how leveraged an underlying company is. This gives the investor a gauge of the company's staying power and ability to make it through unfavorable economic periods and events. Preferably the threshold is 50% at maximum.
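The three ratios and their thresholds can be sketched as a small flagging function. The thresholds (7%, 7%, 50%) come from the descriptions above. The `Company` record, its field names, and the sample figures are hypothetical and are not taken from the project's data.

```python
from dataclasses import dataclass

# Thresholds taken from the ratio descriptions above.
MIN_EARNINGS_YIELD = 0.07      # EBIT / Market Cap, at least 7%
MIN_RETURN_ON_TANGIBLE = 0.07  # EBIT / Tangible Assets, at least 7%
MAX_DEBT_RATIO = 0.50          # Total Debt / Tangible Assets, at most 50%

@dataclass
class Company:
    # Hypothetical record layout; field names are illustrative only.
    name: str
    ebit: float
    market_cap: float
    tangible_assets: float
    total_debt: float

def ratio_flags(c):
    """Return one 0/1 flag per ratio: 1 if the threshold is met, else 0."""
    earnings_yield = c.ebit / c.market_cap
    return_on_tangible = c.ebit / c.tangible_assets
    debt_ratio = c.total_debt / c.tangible_assets
    return (
        int(earnings_yield >= MIN_EARNINGS_YIELD),
        int(return_on_tangible >= MIN_RETURN_ON_TANGIBLE),
        int(debt_ratio <= MAX_DEBT_RATIO),
    )

# Made-up figures, in the same currency units.
cheap = Company("ExampleCo", ebit=120.0, market_cap=1000.0,
                tangible_assets=800.0, total_debt=300.0)
levered = Company("LeveredCo", ebit=40.0, market_cap=1000.0,
                  tangible_assets=500.0, total_debt=400.0)

print(ratio_flags(cheap))    # (1, 1, 1)
print(ratio_flags(levered))  # (0, 1, 0)
```

A real pipeline would also need to handle missing values and zero or negative denominators, which the project addresses by excluding companies lacking data and those with extreme outliers.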

Note - All ratios were calculated using data from the year 2000 to simulate an investor analyzing companies for future returns from that starting point.
The analysis can be used by any individual or entity looking to filter for valuable investments when allocating capital.

Methods Used

image
image
image

Results

Cluster Outcomes:

PCA:
image
Cluster Frequency:
image
  • 10 clusters were generated
Cluster Comparison:
image
  • The clusters that best fit the threshold requirements (clusters 1 and 7 - Value Index) delivered the highest compounded returns, measured as cap-weighted capital gains.
image
image
  • Clusters 1 and 7 were grouped and called Value Index for comparison purposes.
  • The Value Index scored better than the S&P 500 Index and the Wilshire 5000 Index on all three financial ratios, and it also outperformed them in capital growth over 20 years.
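For readers unfamiliar with the return measure, here is a minimal sketch of a cap-weighted capital gain and the compounded annual rate it implies. The weighting by starting market cap, the function names, and the sample numbers are assumptions for illustration; the source does not spell out the exact formula used, and the 20-year period is taken from the text above.

```python
def cap_weighted_gain(holdings):
    """holdings: list of (starting_market_cap, capital_gain) pairs,
    where capital_gain is the fractional price change (1.0 = +100%).
    Each gain is weighted by the stock's share of starting market cap."""
    total_cap = sum(cap for cap, _ in holdings)
    return sum(cap * gain for cap, gain in holdings) / total_cap

def compound_annual_growth(total_gain, years):
    """Annual rate that compounds to the given total gain over `years`."""
    return (1.0 + total_gain) ** (1.0 / years) - 1.0

# Made-up cluster: a small stock that quadrupled and a large one that doubled.
holdings = [(100.0, 3.0), (300.0, 1.0)]
gain = cap_weighted_gain(holdings)
cagr = compound_annual_growth(gain, years=20)

print(gain)  # 1.5, i.e. +150% over the whole period
print(f"{cagr:.2%}")
```

Cap weighting lets larger companies dominate the result, which makes a cluster's return comparable to cap-weighted benchmarks such as the S&P 500 Index.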

Portfolio in year 2000:
image
Portfolio in year 2019:
image
  • By this analysis, the three-ratio combination appears to be effective for filtering large numbers of stocks for valuable investment opportunities with a longer holding period.
  • To improve the results, the analyst should also consider the qualitative advantages of the companies in the Value Index.

Note - The same companies in the Value Index may not be suitable picks in today's environment, because the analysis depends on purchase price and the companies' economics may have deteriorated. The analyst should run the algorithm again and revalue the companies.

Technologies

image
image
image
image
image