In a previous paper we found a mathematical formula for doing long-term stock forecasting. The formula was derived from the definition of annualized return and separated the stock-return into 3 components: Dividends, change in the Sales Per Share, and change in the P/Sales ratio. If you can predict these 3 components, then you can predict the future stock-return.
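This makes intuitive sense, because if you buy a stock and hold it for some years, then you get dividends during those years, and the change in share-price can be decomposed into the change in Sales Per Share and the change in P/Sales ratio using this simple identity:

$$\text{Share Price} = \text{Sales Per Share} \cdot \frac{\text{Share Price}}{\text{Sales Per Share}} = \text{Sales Per Share} \cdot \text{P/Sales}$$

So the change in share-price is equal to the change in Sales Per Share multiplied by the change in P/Sales ratio:

$$\frac{\text{Share Price}_{t+1}}{\text{Share Price}_t} = \frac{\text{Sales Per Share}_{t+1}}{\text{Sales Per Share}_t} \cdot \frac{\text{P/Sales}_{t+1}}{\text{P/Sales}_t}$$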
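Here is a minimal numerical sketch of this decomposition. It uses made-up numbers for a hypothetical company, and the helper name `price_change_decomposition` is ours, not part of SimFin. It checks that the ratio of share-prices equals the Sales Per Share ratio multiplied by the P/Sales ratio. Dividends are left out of this sketch.

```python
def price_change_decomposition(price_start, price_end, sps_start, sps_end):
    """
    Hypothetical helper: split the change in share-price into the
    change in Sales Per Share (SPS) and the change in P/Sales ratio.
    Returns the three ratios (end-value / start-value).
    """
    # P/Sales ratio at the start and end of the holding period.
    psales_start = price_start / sps_start
    psales_end = price_end / sps_end

    price_ratio = price_end / price_start
    sps_ratio = sps_end / sps_start
    psales_ratio = psales_end / psales_start
    return price_ratio, sps_ratio, psales_ratio


# Made-up example: the share-price goes from 50 to 80,
# while Sales Per Share goes from 10 to 12.5.
price_ratio, sps_ratio, psales_ratio = price_change_decomposition(
    price_start=50.0, price_end=80.0, sps_start=10.0, sps_end=12.5)

# The share-price ratio equals the product of the other two ratios.
print(price_ratio, sps_ratio * psales_ratio)
```

In this example Sales Per Share grows by 25% and the P/Sales ratio expands from 5.0 to 6.4 (28%). Together they give the 60% share-price increase (1.25 × 1.28 = 1.6).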
Note that we could also use the change in Earnings Per Share and P/E ratio instead, but Earnings (aka. Net Income) can be less stable than Sales because of temporary fluctuations in profit margins, as well as non-cash and/or non-recurring gains and losses. That is why we will use the P/Sales ratio here.
This paper is a basic statistical study of how to predict the future Sales Growth, and a previous paper studied how to predict the future P/Sales ratio.
%matplotlib inline
import matplotlib.pyplot as plt
from IPython.display import display_jpeg
import pandas as pd
import numpy as np
import seaborn as sns
from scipy.stats import linregress
import statsmodels.api as sm
# SimFin imports.
import simfin as sf
from simfin.names import *
# Version of the SimFin Python API.
sf.__version__
# SimFin data-directory.
sf.set_data_dir('~/simfin_data/')
# SimFin load API key or use free data.
sf.load_api_key(path='~/simfin_api_key.txt', default_key='free')
# Seaborn set plotting style.
sns.set_style("whitegrid")
We use SimFin to easily load and process financial data with the following settings:
hub_args = \
{
# We are interested in the US stock-market.
'market': 'us',
# Use last-known values to fill in missing values.
'fill_method': 'ffill',
# Refresh the fundamental datasets (Income Statements etc.)
# every 30 days.
'refresh_days': 30,
# Refresh the dataset with shareprices every 10 days.
'refresh_days_shareprices': 10
}
We can then create a StockHub object to handle all the data and signal processing:
%%time
hub = sf.StockHub(**hub_args)
%%time
# Calculate Growth Signals.
# We set variant='quarterly' to get 4 data-points per year,
# but the data used to calculate the growth signals is TTM.
df_growth_signals = hub.growth_signals(variant='quarterly')
# Calculate Financial Signals. Also 4 data-points per year.
df_fin_signals = hub.fin_signals(variant='quarterly')
# Calculate the 1-year change in the Financial Signals.
df_fin_signals_chg = hub.fin_signals(variant='quarterly',
func=sf.rel_change_ttm_1y)
# Rename columns for the 1-year changes.
def rename_chg(s):
    return s + ' (1Y Change)'
df_fin_signals_chg.rename(mapper=rename_chg, axis='columns', inplace=True)
Let us now create a new Pandas DataFrame for the Sales Growth signals that are not calculated by SimFin's built-in functions above. The new signals are:
SALES_GROWTH_1Y_FUTURE: The Sales Growth 1 year into the FUTURE.
SALES_GROWTH_3Y_PAST: The average Sales Growth for the PAST 3 years.
SALES_GROWTH_3Y_FUTURE: The average Sales Growth for the FUTURE 3 years.
These signals will be calculated using SimFin's functions sf.rel_change and sf.mean_log_change. See the SimFin documentation for a detailed explanation of what these functions calculate, as it would be too lengthy to explain here.
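To give a rough idea of what such signals measure, here is a simplified sketch on a single made-up series of quarterly TTM Sales. It is NOT SimFin's implementation. The helper names `rel_change_future` and `mean_annualized_growth` are hypothetical. The averaging scheme (the mean of annualized log-changes over 1, 2 and 3-year horizons, converted back to a growth-rate) is our own assumption, so consult the SimFin documentation for the exact definitions.

```python
import numpy as np
import pandas as pd


def rel_change_future(s, years=1, periods_per_year=4):
    """Hypothetical: relative change from now to `years` into the future."""
    shift = years * periods_per_year
    return s.shift(-shift) / s - 1.0


def mean_annualized_growth(s, max_years=3, periods_per_year=4, future=True):
    """
    Hypothetical simplified sketch: average the annualized log-changes
    over horizons of 1..max_years years, ignoring horizons that run past
    the ends of the data, and convert the mean back to a growth-rate.
    """
    log_changes = []
    for years in range(1, max_years + 1):
        shift = years * periods_per_year
        if future:
            ratio = s.shift(-shift) / s
        else:
            ratio = s / s.shift(shift)
        # Annualize the log-change by dividing by the number of years.
        log_changes.append(np.log(ratio) / years)
    # Mean over the available horizons (NaN horizons are skipped).
    mean_log = pd.concat(log_changes, axis=1).mean(axis=1)
    return np.exp(mean_log) - 1.0


# Made-up TTM Sales for one company growing exactly 10% per year,
# with 4 quarterly data-points per year.
sales = pd.Series(100.0 * 1.1 ** (np.arange(16) / 4))

growth_1y_future = rel_change_future(sales, years=1)
growth_3y_future = mean_annualized_growth(sales, future=True)
growth_3y_past = mean_annualized_growth(sales, future=False)
```

With constant 10% yearly growth, all three signals are 0.1 wherever enough data exists. They are NaN where the look-ahead or look-back runs past the ends of the series.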
%%time
# Create a new DataFrame to hold the Sales Growth signals.
# This will have the same index as the other DataFrame w. signals.
df_sales_growth = pd.DataFrame(index=df_growth_signals.index)
# Load the Income Statements TTM data and get the Revenue / Sales.
df_income_ttm = hub.load_income(variant='ttm')
df_sales = df_income_ttm[REVENUE]
# Calculate the FUTURE 1-year Sales Growth.
SALES_GROWTH_1Y_FUTURE = 'Sales Growth 1Y FUTURE'
df_sales_growth[SALES_GROWTH_1Y_FUTURE] = \
sf.rel_change(df=df_sales, freq='ttm', years=1, future=True)
# Calculate the PAST 3-year average Sales Growth.
SALES_GROWTH_3Y_PAST = 'Sales Growth 3Y Avg. PAST'
df_sales_growth[SALES_GROWTH_3Y_PAST] = \
sf.mean_log_change(df=df_sales, freq='ttm', future=False,
min_years=1, max_years=3, annualized=True)
# Calculate the FUTURE 3-year average Sales Growth.
SALES_GROWTH_3Y_FUTURE = 'Sales Growth 3Y Avg. FUTURE'
df_sales_growth[SALES_GROWTH_3Y_FUTURE] = \
sf.mean_log_change(df=df_sales, freq='ttm', future=True,
min_years=1, max_years=3, annualized=True)
# Combine all the signals we have calculated.
dfs = [df_growth_signals, df_fin_signals, df_fin_signals_chg,
df_sales_growth]
df_signals = pd.concat(dfs, axis=1)
# Remove outliers using "Winsorization".
# The outliers are removed rather than "clipped", because clipping
# distorts correlation-measures and line-fittings.
# We can also exclude some columns from the Winsorization.
exclude_columns = [LOG_REVENUE]
df_signals = sf.winsorize(df=df_signals, quantile=0.03, clip=False,
exclude_columns=exclude_columns)
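The difference between removing and clipping outliers can be illustrated with a small sketch on made-up data. The function `handle_outliers` is hypothetical and is not how `sf.winsorize` is implemented. It only shows the idea of quantile-based outlier handling with the same 3% quantile as above. Clipping piles the extreme values up at the boundary, which can distort line-fittings, while removing them sets them to NaN.

```python
import numpy as np
import pandas as pd


def handle_outliers(s, quantile=0.03, clip=False):
    """Hypothetical sketch of quantile-based outlier handling."""
    lower = s.quantile(quantile)
    upper = s.quantile(1.0 - quantile)
    if clip:
        # Move values outside the bounds onto the bounds.
        return s.clip(lower=lower, upper=upper)
    # Replace values outside the bounds with NaN.
    return s.where((s >= lower) & (s <= upper))


# Made-up data: 0, 1, ..., 98 and one extreme outlier of 1000.
s = pd.Series(np.arange(100, dtype=float))
s.iloc[99] = 1000.0

removed = handle_outliers(s, clip=False)
clipped = handle_outliers(s, clip=True)

print(removed.isna().sum())               # number of removed points
print((clipped == clipped.max()).sum())   # points piled up at the upper bound
```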
These are the resulting signals for ticker MSFT:
df_signals.loc['MSFT'].dropna(how='all').tail()
def data_years(df):
    """
    Calculate the number of years of data in DataFrame `df`.
    :param df:
        Pandas DataFrame assumed to have TTM data and be
        grouped by TICKER, and not have any empty NaN rows.
    :return:
        Pandas Series with number of years for each TICKER.
    """
    # Count the number of data-points for each ticker.
    df_len_data = df.groupby(TICKER).apply(lambda df_grp: len(df_grp))
    # Calculate the number of years of data for each ticker.
    # TTM data has 4 data-points per year.
    df_data_years = df_len_data / 4
    return df_data_years
On average we have nearly 6 years of Sales Growth data per stock:
# Calculate number of years of Sales Growth data for all stocks.
df_data_years = data_years(df=df_signals[SALES_GROWTH].dropna())
# Show statistics.
df_data_years.describe()
We can also plot a histogram, so we can see the distribution of how many years of Sales Growth data we have for all the individual companies:
df_data_years.plot(kind='hist', bins=50);
In some of the plots further below, we will compare e.g. the past year's Sales Growth to the 3-year FUTURE average Sales Growth. The summary statistics below show that on average there are about 3.5 years of such data per company, for a total of nearly 1400 companies:
columns = [SALES_GROWTH, SALES_GROWTH_3Y_FUTURE]
df = df_signals[columns].dropna(how='any')
data_years(df=df).describe()
Further below, we will also compare the PAST 3-year average Sales Growth to the FUTURE 3-year average Sales Growth. The statistics below show that there are only about 2.5 years of data-points per company, for a total of about 1100 companies:
columns = [SALES_GROWTH_3Y_PAST, SALES_GROWTH_3Y_FUTURE]
df = df_signals[columns].dropna(how='any')
data_years(df=df).describe()
This is a fairly short data-period, because the SimFin database does not currently have more historical data. This means we should interpret the results with some caution, as the data may contain trends that are unique to that period in time.
def plot_scatter(df, x, y, hue=None, num_samples=5000):
    """
    Make a scatter-plot using a random sub-sample of the data.
    :param df:
        Pandas DataFrame with columns named `x`, `y` and `hue`.
    :param x:
        String with column-name for the x-axis.
    :param y:
        String with column-name for the y-axis.
    :param hue:
        Either None or string with column-name for the hue.
    :param num_samples:
        Int with number of random samples for the scatter-plot.
    :return:
        matplotlib Axes object
    """
    # Select the relevant columns from the DataFrame.
    if hue is None:
        df = df[[x, y]].dropna()
    else:
        df = df[[x, y, hue]].dropna()
    # Only plot a random sample of the data-points?
    if num_samples is not None and len(df) > num_samples:
        idx = np.random.randint(len(df), size=num_samples)
        df = df.iloc[idx]
    # Ensure the plotting area is a square.
    plt.figure(figsize=(5,5))
    # Make the scatter-plot.
    ax = sns.scatterplot(x=x, y=y, hue=hue, s=20,
                         data=df.reset_index())
    # Move legend for the hue.
    if hue is not None:
        ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), ncol=1)
    return ax
def plot_scatter_fit(df, x, y, num_samples=5000):
    """
    Make a scatter-plot and fit a line through the points.
    If there are many data-points, you can use a random
    sample for the scatter-plot, but the linear formula
    is still found using all the data-points.
    :param df:
        Pandas DataFrame with columns named `x` and `y`.
    :param x:
        String with column-name for the x-axis.
    :param y:
        String with column-name for the y-axis.
    :param num_samples:
        Int with number of random samples for the scatter-plot.
    :return:
        seaborn JointGrid object
    """
    # Select the relevant columns from the DataFrame.
    df = df[[x, y]].dropna(how='any').reset_index()
    # Fit a line through all the data-points and get stats.
    slope, intercept, r_value, p_value, std_err = \
        linregress(x=df[x], y=df[y])
    # Show the fitted line and its stats.
    msg = 'y = {0:.2f} * x + {1:.2f} (R^2={2:.2f}, p={3:.0e})'
    msg = msg.format(slope, intercept, r_value**2, p_value)
    print(msg)
    # Only plot a random sample of the data-points?
    if num_samples is not None and len(df) > num_samples:
        idx = np.random.randint(len(df), size=num_samples)
        df = df.iloc[idx]
    # Make the scatter-plot with a fitted line.
    # This uses the smaller sample of data-points.
    grid = sns.jointplot(x=x, y=y, kind='reg', data=df,
                         line_kws={'color': 'red'},
                         scatter_kws={'s': 2})
    return grid
def regression(df, y, standardize=True, use_constant=True):
    """
    Perform multiple linear-regression on the given DataFrame.
    :param df:
        Pandas DataFrame with signals and returns.
    :param y:
        String with column-name for the dependent variable.
        This will be taken from the DataFrame `df`.
    :param standardize:
        Boolean whether to standardize the predictor variables
        so they have 0 mean and 1 standard deviation.
    :param use_constant:
        Boolean whether to add a 'Constant' column to
        find the bias.
    :return:
        StatsModels Regression Results.
    """
    # Remove rows with missing values.
    df = df.dropna(how='any').copy()
    # DataFrame for the x-signals.
    df_x = df.drop(columns=[y])
    # DataFrame for the y-signal.
    df_y = df[y]
    # Standardize the signals so they have mean 0 and std 1.
    if standardize:
        df_x = (df_x - df_x.mean()) / df_x.std()
    # Add a "constant" column so the regression can find the bias.
    if use_constant:
        df_x['Constant'] = 1.0
    # Perform the regression on this data.
    model = sm.OLS(df_y, df_x).fit()
    return model
%%time
df_corr = df_signals.corr()
# New column names.
SIGNALS_1Y = '1-Year Sales Growth'
SIGNALS_3Y = '3-Year Avg. Sales Growth'
# Create a new DataFrame with the correlations.
data = \
{
SIGNALS_1Y: df_corr[SALES_GROWTH_1Y_FUTURE],
SIGNALS_3Y: df_corr[SALES_GROWTH_3Y_FUTURE]
}
df = pd.DataFrame(data=data)
We can then show the correlations between the various signals and the FUTURE 1-year Sales Growth and the FUTURE 3-year average Sales Growth.
A correlation coefficient of 1 means the correlation is perfect so the two variables always move together, while a correlation of 0 means there is no linear relation between the two variables, and a correlation of -1 means the two variables always move perfectly opposite to each other.
We will show the absolute correlation values, because we are only concerned about the strength of the correlation and not its direction here.
df.abs().sort_values(by=SIGNALS_1Y, ascending=False)
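Here is a tiny sketch with made-up numbers showing what correlations of 1, -1 and 0 look like, computed with `numpy.corrcoef`. The last case is a reminder that a correlation of 0 only rules out a *linear* relation. There, y is fully determined by x, just not linearly.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# y moves perfectly together with x.
corr_same = np.corrcoef(x, 2.0 * x + 1.0)[0, 1]
# y moves perfectly opposite to x.
corr_opposite = np.corrcoef(x, -3.0 * x)[0, 1]
# y depends on x through a symmetric curve, so there is no linear relation.
corr_none = np.corrcoef(x, (x - 3.0) ** 2)[0, 1]

print(corr_same, corr_opposite, corr_none)
```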
Let us try to fit a Linear Regression Model to some of the signals with the highest correlation. We do not fit the regression model to all the signals, because a lot of them contain NaN (Not-a-Number) values, and every row containing NaN would be removed, so the dataset would become much smaller. The correlation numbers also already show that many of the signals are not linearly related to the FUTURE Sales Growth anyway.
The regression model's $R^2$, shown in the summary below, is fairly weak. Because the data is standardized to have zero mean and one standard deviation before the regression model is fitted, the coefficients show us which signals are most important in predicting the FUTURE 1-year Sales Growth. By far the most important is the PAST Year-Over-Year (YOY) Sales Growth, followed by the PAST YOY and Quarter-Over-Quarter (QOQ) Assets Growth.
columns = [SALES_GROWTH_1Y_FUTURE,
SALES_GROWTH_YOY, SALES_GROWTH_3Y_PAST,
ASSETS_GROWTH_YOY, ASSETS_GROWTH_QOQ,
ACQ_ASSETS_RATIO, LOG_REVENUE,
PAYOUT_BUYBACK_RATIO, ROE]
model = regression(df=df_signals[columns],
y=SALES_GROWTH_1Y_FUTURE, standardize=True)
model.summary()
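To illustrate why standardizing the predictors makes their coefficients comparable, here is a small sketch on synthetic data (not SimFin data) with two made-up predictors on very different scales. For simplicity it uses `numpy.linalg.lstsq` instead of `statsmodels`. It standardizes with the sample standard deviation (`ddof=1`), which matches what `df_x.std()` does in the `regression` function above.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 1000

# Two made-up predictors on very different scales.
x1 = rng.normal(loc=0.0, scale=100.0, size=n)
x2 = rng.normal(loc=0.0, scale=0.01, size=n)

# y depends equally strongly on both predictors when measured
# in standard deviations, plus a little noise.
y = 0.5 * (x1 / 100.0) + 0.5 * (x2 / 0.01) + rng.normal(scale=0.1, size=n)

X = np.column_stack([x1, x2])
ones = np.ones((n, 1))

# Regression on the raw predictors (with a constant column).
coef_raw, *_ = np.linalg.lstsq(np.hstack([X, ones]), y, rcond=None)

# Regression on standardized predictors (mean 0, std 1).
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
coef_std, *_ = np.linalg.lstsq(np.hstack([X_std, ones]), y, rcond=None)

print(coef_raw[:2])   # very different sizes, due only to the units
print(coef_std[:2])   # roughly equal, reflecting equal importance
```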
Let us now study whether the PAST Sales Growth can be used to predict the FUTURE Sales Growth.
First consider the PAST Year-Over-Year (YOY) Sales Growth, which compares the quarterly sales to the same quarter in the previous year. This was found above to have the strongest correlation with the FUTURE 1-year Sales Growth.
The $R^2$ of the fitted line is printed above the scatter-plot. Note that the plot only contains 5000 data-points that are randomly sampled from the full dataset of more than 30,000 points, but the linear formula is fitted to the entire dataset.
Each dot in the scatter-plot shows a single data-point, that is, the x-axis is the YOY Sales Growth for a single company and one of its quarterly financial reports. The y-axis shows the TTM Sales Growth starting in the same quarter and going 1 year into the future.
plot_scatter_fit(df=df_signals,
x=SALES_GROWTH_YOY,
y=SALES_GROWTH_1Y_FUTURE);
Now let us compare the PAST 3-year average Sales Growth to the FUTURE 1-year Sales Growth. The slope of the fitted line is the same as in the plot above, which used the YOY Sales Growth as the predictor on the x-axis, but now the fitted line has a much lower $R^2$ because the data-points are more dispersed. This means the PAST 3-year average Sales Growth is a worse predictor than the YOY Sales Growth.
plot_scatter_fit(df=df_signals,
x=SALES_GROWTH_3Y_PAST,
y=SALES_GROWTH_1Y_FUTURE);
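A sketch on synthetic data (not SimFin data) shows how two fitted lines can have the same slope but very different $R^2$. The two made-up datasets share the same true slope but have different amounts of noise, and both are fitted with `scipy.stats.linregress` as in `plot_scatter_fit`.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(seed=0)
x = rng.normal(size=5000)

# Same true slope of 0.5, but different amounts of dispersion.
y_tight = 0.5 * x + rng.normal(scale=0.2, size=x.size)
y_dispersed = 0.5 * x + rng.normal(scale=2.0, size=x.size)

fit_tight = linregress(x=x, y=y_tight)
fit_dispersed = linregress(x=x, y=y_dispersed)

for name, fit in [('tight', fit_tight), ('dispersed', fit_dispersed)]:
    print(name, round(fit.slope, 2), round(fit.rvalue ** 2, 2))
```

Both slopes come out close to 0.5, but the $R^2$ of the dispersed data is far lower, because the noise explains most of its variance.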
Now consider the YOY Sales Growth versus the FUTURE 3-year average Sales Growth. The scatter-plot does seem to have a trend, but due to the many outliers, the fitted line has a low $R^2$.
plot_scatter_fit(df=df_signals,
x=SALES_GROWTH_YOY,
y=SALES_GROWTH_3Y_FUTURE);
Finally, we can try the PAST 3-year average Sales Growth versus the FUTURE 3-year average Sales Growth, which gives a poor line-fit with a low $R^2$.
plot_scatter_fit(df=df_signals,
x=SALES_GROWTH_3Y_PAST,
y=SALES_GROWTH_3Y_FUTURE);
Now consider the PAST YOY Assets Growth as a predictor for the FUTURE 1-year Sales Growth:
plot_scatter_fit(df=df_signals,
x=ASSETS_GROWTH_YOY,
y=SALES_GROWTH_1Y_FUTURE);
The YOY Assets Growth has even less predictive power for the FUTURE 3-year average Sales Growth, where the fitted line has an even lower $R^2$.
plot_scatter_fit(df=df_signals,
x=ASSETS_GROWTH_YOY,
y=SALES_GROWTH_3Y_FUTURE);
We can also consider the Quarter-Over-Quarter (QOQ) Assets Growth, but that has even weaker predictive power for the FUTURE 1-year Sales Growth, with an even lower $R^2$.
plot_scatter_fit(df=df_signals,
x=ASSETS_GROWTH_QOQ,
y=SALES_GROWTH_1Y_FUTURE);
Now we will consider how Acquisitions and Divestitures may affect the FUTURE Sales Growth. This is when a company buys another company, or sells a part of its business to someone else. It would make sense if this affected the FUTURE Sales Growth. We will use the following ratio between the Net Acquisitions and Total Assets as the predictor signal:

$$\text{Acquisitions / Assets} = \frac{\text{Net Acquisitions \& Divestitures}}{\text{Total Assets}}$$
The scatter-plot shows that this ratio has a weak correlation with FUTURE 1-year Sales Growth.
plot_scatter_fit(df=df_signals,
x=ACQ_ASSETS_RATIO,
y=SALES_GROWTH_1Y_FUTURE);
We can also consider how this Acquisitions/Assets ratio may predict FUTURE 3-year average Sales Growth, which looks very similar:
plot_scatter_fit(df=df_signals,
x=ACQ_ASSETS_RATIO,
y=SALES_GROWTH_3Y_FUTURE);
Let us now zoom in on these scatter-plots by only considering Acquisitions/Assets ratios above 0.1, that is, where the Net Acquisitions are greater than 10% of the Total Assets. Although this plot has an $R^2$ close to zero because the data-points are so dispersed, it shows that higher Acquisitions/Assets ratios usually result in FUTURE 1-year Sales Growth between 0% and 40%, with some outliers.
# Select a subsection of the data.
mask = (df_signals[ACQ_ASSETS_RATIO] > 0.1)
df = df_signals.loc[mask]
plot_scatter_fit(df=df,
x=ACQ_ASSETS_RATIO,
y=SALES_GROWTH_1Y_FUTURE);
We can make the same plot for FUTURE 3-year average Sales Growth, which shows that higher Acquisitions/Assets ratios usually result in 3-year average Sales Growth between zero and 20% (these are annualized growth-rates), again with some outliers:
plot_scatter_fit(df=df,
x=ACQ_ASSETS_RATIO,
y=SALES_GROWTH_3Y_FUTURE);
Let us now consider a company's investment in Research & Development relative to its Revenue. This lets us test whether companies that spend more of their Revenue on R&D also tend to grow their future sales more. The ratio is simply defined as:

$$\text{R\&D / Revenue} = \frac{\text{Research \& Development}}{\text{Revenue}}$$
The scatter-plot has a lot of outliers:
plot_scatter_fit(df=df_signals,
x=RD_REVENUE,
y=SALES_GROWTH_3Y_FUTURE);
After removing the outliers for the R&D/Revenue ratio, the scatter-plot indicates that more R&D tends to result in higher FUTURE 3-year average Sales Growth, although the $R^2$ is near zero because the data-points are so dispersed.
# Remove outliers.
mask = (df_signals[RD_REVENUE] < 0.6)
df = df_signals.loc[mask]
plot_scatter_fit(df=df,
x=RD_REVENUE,
y=SALES_GROWTH_3Y_FUTURE);
We can also make a scatter-plot where the dots are colored by the size of the company's Revenue, to see if the combination of the R&D / Revenue ratio and the size of the Revenue can help predict the FUTURE 3-year average Sales Growth. But as we can see, there does not appear to be any relation between these.
plot_scatter(df=df, hue=LOG_REVENUE,
x=RD_REVENUE,
y=SALES_GROWTH_3Y_FUTURE);
The Dividend Payout Ratio measures how much of the company's earnings are paid out as dividends to shareholders. We might expect that companies which spend less money on dividends spend more money on growing their business. We will use a slightly different formula for the Payout Ratio than is normally used, as we will use the Free Cash Flow instead of the reported Net Income:

$$\text{Payout Ratio} = \frac{\text{Dividends}}{\text{Free Cash Flow}}$$

The scatter-plot shows a tendency for higher Dividend Payout Ratios to result in lower FUTURE 1-year Sales Growth, although the $R^2$ is nearly zero because the data-points are so dispersed.
plot_scatter_fit(df=df_signals,
x=PAYOUT_RATIO,
y=SALES_GROWTH_1Y_FUTURE);
In the scatter-plot above, there are many points where Dividends / FCF = 0, because many companies do not pay any dividends. If we remove those points, we still get a weak tendency between higher Dividend Payout Ratios and lower Sales Growth, again with an $R^2$ near zero.
# Remove data-points that are zero.
mask = (df_signals[PAYOUT_RATIO] != 0)
df = df_signals.loc[mask]
# Plot the result.
plot_scatter_fit(df=df,
x=PAYOUT_RATIO,
y=SALES_GROWTH_1Y_FUTURE);
It would make intuitive sense if the FUTURE Sales Growth would be related to both the Dividend Payout Ratio and the Return On Equity (ROE), because the Payout Ratio indicates how much of the earnings are retained by the company, and the ROE indicates how much the company makes on its retained earnings.
We can make a scatter-plot with the Dividend Payout Ratio on the x-axis, and the FUTURE 1-year Sales Growth on the y-axis as usual, but then we set the hue or color to be the ROE. As we can see from the scatter-plot below, the combination of Payout Ratio and ROE does not seem to improve the prediction of the FUTURE 1-year Sales Growth.
# Limit the ROE values to make it clearer.
df2 = sf.winsorize(df=df, columns=[ROE], clip=True)
plot_scatter(df=df2, hue=ROE,
x=PAYOUT_RATIO, y=SALES_GROWTH_1Y_FUTURE);
In the last few decades, share buybacks have overtaken dividends as the main form of "trying to" return capital to shareholders. I write "trying to" because share buybacks can be extremely destructive to shareholder value if the share-price is too high relative to the "intrinsic value" of the shares. I have written an elaborate theory on that subject elsewhere.
Regardless of whether the share buybacks will be good for long-term shareholder value, it is still a fact that they lower the amount of cash that the company could otherwise spend on growing the business. So let us augment the Dividend Payout Ratio with the share buybacks. The new ratio is defined as:

$$\text{Payout + Buyback Ratio} = \frac{\text{Dividends} + \text{Net Share Buyback}}{\text{Free Cash Flow}}$$
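Here is a sketch of both payout ratios using made-up numbers. The helper names are hypothetical. Passing all amounts as positive numbers, and returning NaN when the Free Cash Flow is not positive, are our own assumptions for this sketch, not necessarily how SimFin computes `PAYOUT_RATIO` or `PAYOUT_BUYBACK_RATIO`. The second company also shows why adding buybacks leaves fewer ratios that are exactly zero.

```python
import math


def payout_ratio(dividends, fcf):
    """Hypothetical: Dividends / Free Cash Flow, as positive amounts."""
    if fcf <= 0:
        # Ratio is hard to interpret for zero or negative FCF.
        return math.nan
    return dividends / fcf


def payout_buyback_ratio(dividends, net_buyback, fcf):
    """Hypothetical: (Dividends + Net Share Buyback) / Free Cash Flow."""
    if fcf <= 0:
        return math.nan
    return (dividends + net_buyback) / fcf


# Made-up company A: FCF 200, dividends 50, net buybacks 70.
ratio_div = payout_ratio(50.0, 200.0)
ratio_div_bb = payout_buyback_ratio(50.0, 70.0, 200.0)

# Made-up company B: pays no dividends but buys back shares.
no_div = payout_ratio(0.0, 100.0)
no_div_bb = payout_buyback_ratio(0.0, 30.0, 100.0)

# Made-up company C: negative Free Cash Flow.
ratio_negative_fcf = payout_ratio(10.0, -5.0)
```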
The scatter-plot looks quite similar to the one above for the Dividend Payout Ratio: once again it has a weak downwards-trend, but the $R^2$ is nearly zero because the data-points are so dispersed.
plot_scatter_fit(df=df_signals,
x=PAYOUT_BUYBACK_RATIO,
y=SALES_GROWTH_1Y_FUTURE);
Because the numerator contains both the Dividend Payout and Net Share Buyback, there are fewer data-points that are exactly zero. Nevertheless, let us remove those zero data-points and redo the scatter-plot. This also gives a weak downwards trend with an $R^2$ of nearly zero.
# Remove data-points that are zero.
mask = (df_signals[PAYOUT_BUYBACK_RATIO] != 0)
df = df_signals.loc[mask]
# Plot the result.
plot_scatter_fit(df=df,
x=PAYOUT_BUYBACK_RATIO,
y=SALES_GROWTH_1Y_FUTURE);
When a company buys an asset, the cash is paid immediately but its cost is subtracted from the operating income over a number of years, which is called Depreciation. A similar concept for intangible assets is known as Amortization. The SimFin database combines these two items into "Depreciation & Amortization" in the Cash-Flow Statement. It is important to understand that these are accounting-numbers and not cash-numbers.
The cash used to buy new assets is called Capital Expenditures (CapEx). The SimFin database lists this item as "Change in Fixed Assets & Intangibles" in the Cash-Flow Statement.
If the company has accurately estimated the Depreciation and Amortization costs, then a difference between those and the CapEx could theoretically be used to estimate whether the company is growing or shrinking its operating capabilities. For example, if the company is improving or expanding its factories more than regular maintenance, then that might indicate growth in future sales, and conversely, if the company is neglecting to maintain its factories, then that might indicate a decline in future sales.
Let us test this notion by considering the ratio between the CapEx and Depreciation & Amortization, defined as:

$$\text{CapEx / Depreciation} = \frac{\text{Capital Expenditures (CapEx)}}{\text{Depreciation \& Amortization}}$$

The scatter-plot below shows that there is a weak tendency for the FUTURE 1-year Sales Growth to increase as CapEx becomes larger than the Depreciation & Amortization. But the $R^2$ is nearly zero because the data-points are so dispersed.
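Here is a sketch of the ratio and the rough interpretation described above, with made-up numbers. The helper names and the labels 'expanding', 'maintaining' and 'neglecting' are hypothetical. We assume both amounts are given as positive numbers. In a Cash-Flow Statement, CapEx typically appears as a negative number (a cash outflow), so its sign may need to be flipped first.

```python
def capex_depr_ratio(capex, depr_amor):
    """Hypothetical: CapEx / (Depreciation & Amortization), positive amounts."""
    return capex / depr_amor


def interpret_ratio(ratio):
    """Hypothetical rough interpretation, following the reasoning in the text."""
    if ratio > 1.0:
        # Investing more than the estimated wear-and-tear.
        return 'expanding'
    if ratio < 1.0:
        # Investing less than the estimated wear-and-tear.
        return 'neglecting'
    return 'maintaining'


# Made-up companies: CapEx of 150, 100 and 50 against D&A of 100.
labels = [interpret_ratio(capex_depr_ratio(capex, 100.0))
          for capex in (150.0, 100.0, 50.0)]
print(labels)
```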
plot_scatter_fit(df=df_signals,
x=CAPEX_DEPR_RATIO,
y=SALES_GROWTH_1Y_FUTURE);
We can also try and make a scatter-plot for the FUTURE 3-year average Sales Growth. This shows practically no predictive power of the CapEx / Depreciation ratio.
plot_scatter_fit(df=df_signals,
x=CAPEX_DEPR_RATIO,
y=SALES_GROWTH_3Y_FUTURE);
We can also consider whether the 1-year change in the CapEx / Depreciation ratio has any predictive power for the FUTURE 1-year Sales Growth. The scatter-plot and fitted line show that there is practically no relation.
# Name of the data-column.
CAPEX_DEPR_RATIO_CHG = rename_chg(CAPEX_DEPR_RATIO)
plot_scatter_fit(df=df_signals,
x=CAPEX_DEPR_RATIO_CHG,
y=SALES_GROWTH_1Y_FUTURE);
There is a common belief that small-cap and mid-cap stocks tend to outperform large-cap stocks in the long run, because the smaller companies have higher sales-growth.
We can also test this notion by making a scatter-plot with the TTM Revenue on the x-axis and the FUTURE 1-year Sales Growth on the y-axis. We will actually use the Log10 of the Revenue, which basically counts the "number of zeros" in the Revenue, which makes the Revenue numbers on the x-axis more evenly spread out.
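Here is a quick illustration with made-up Revenue numbers of why Log10 "counts the number of zeros". Every 10x increase in Revenue adds exactly 1 to its Log10, so companies spanning many orders of magnitude end up evenly spaced on the x-axis.

```python
import numpy as np

# Made-up Revenues from 5 million to 50 billion.
revenue = np.array([5e6, 5e7, 5e8, 5e9, 5e10])
log_revenue = np.log10(revenue)

# Each step is a 10x increase in Revenue, i.e. +1.0 in Log10.
steps = np.diff(log_revenue)
print(log_revenue.round(2), steps)
```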
The scatter-plot shows that there is a weak tendency for companies with lower Revenue to have higher Sales Growth, but the data-points are so dispersed that the $R^2$ is nearly zero.
plot_scatter_fit(df=df_signals,
x=LOG_REVENUE,
y=SALES_GROWTH_1Y_FUTURE);
We can also make the scatter-plot for the FUTURE 3-year average Sales Growth, which looks very similar and has only a marginally better $R^2$ that is still close to zero.
plot_scatter_fit(df=df_signals,
x=LOG_REVENUE,
y=SALES_GROWTH_3Y_FUTURE);
Inventory Turnover is defined as the yearly Revenue divided by the Inventory, so it measures how many times per year the inventory is "turned over" by selling it to customers. Changes in a company's Inventory Turnover may indicate changes in demand for the company's products, which might predict future growth or decline in its sales.
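Here is a sketch of the Inventory Turnover and its change from the previous year, with made-up numbers. We assume the change is a relative change, as the name of the SimFin function `sf.rel_change_ttm_1y` used for the '(1Y Change)' columns suggests, but the exact definition should be checked in the SimFin documentation. The helper name is hypothetical.

```python
def inventory_turnover(revenue_ttm, inventory):
    """Hypothetical: yearly (TTM) Revenue divided by Inventory."""
    return revenue_ttm / inventory


# Made-up company: Revenue grows while Inventory shrinks.
turnover_last_year = inventory_turnover(revenue_ttm=1000.0, inventory=250.0)
turnover_this_year = inventory_turnover(revenue_ttm=1100.0, inventory=220.0)

# Relative 1-year change in the Inventory Turnover.
turnover_chg = turnover_this_year / turnover_last_year - 1.0
print(turnover_last_year, turnover_this_year, turnover_chg)
```

Here the inventory is "turned over" 4 times per year last year and 5 times this year, a 25% increase.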
We can test this notion by considering the change in the Inventory Turnover from the previous year, and seeing whether that has any relation to the FUTURE 1-year Sales Growth. The following scatter-plot shows only a weak tendency, and the $R^2$ is zero.
# Name of the data-column.
INVENTORY_TURNOVER_CHG = rename_chg(INVENTORY_TURNOVER)
plot_scatter_fit(df=df_signals,
x=INVENTORY_TURNOVER_CHG,
y=SALES_GROWTH_1Y_FUTURE);
Companies with higher Gross Profit Margins often have some unique product or service. Let us test if this also means they have higher Sales Growth.
The following scatter-plot shows a weak tendency for companies with a higher Gross Profit Margin to also have higher Sales Growth, but the data-points are so dispersed that the $R^2$ is nearly zero.
# Remove outliers in the data.
mask = (df_signals[GROSS_PROFIT_MARGIN] < 1.0)
df = df_signals.loc[mask]
plot_scatter_fit(df=df,
x=GROSS_PROFIT_MARGIN,
y=SALES_GROWTH_1Y_FUTURE);
If we use the FUTURE 3-year average Sales Growth, then the scatter-plot looks almost the same:
plot_scatter_fit(df=df,
x=GROSS_PROFIT_MARGIN,
y=SALES_GROWTH_3Y_FUTURE);
We might imagine that a significant change in Gross Profit Margin would reflect a change in demand for the company's product or services, which in turn might lead to changes in the company's future sales. We can test that notion with the following scatter-plot, which shows that there is no such relation.
# Name of the column.
GROSS_PROFIT_MARGIN_CHG = rename_chg(GROSS_PROFIT_MARGIN)
plot_scatter_fit(df=df_signals,
x=GROSS_PROFIT_MARGIN_CHG,
y=SALES_GROWTH_1Y_FUTURE);
A company's Interest Coverage measures how many times it can pay the interest on its debt from its Operating Income. Companies with low Interest Coverage risk bankruptcy in case of a period with poor business.
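Here is a sketch of the Interest Coverage with made-up numbers. The helper name is hypothetical. Both amounts are assumed to be positive, and returning infinity when there is no interest expense is our own choice for this sketch.

```python
import math


def interest_coverage(operating_income, interest_expense):
    """Hypothetical: Operating Income / Interest Expense, positive amounts."""
    if interest_expense == 0:
        # No interest to pay, so the coverage is unbounded.
        return math.inf
    return operating_income / interest_expense


# Made-up companies with the same interest expense of 50.
coverage_safe = interest_coverage(operating_income=500.0, interest_expense=50.0)
coverage_risky = interest_coverage(operating_income=60.0, interest_expense=50.0)
coverage_no_debt = interest_coverage(operating_income=60.0, interest_expense=0.0)
print(coverage_safe, coverage_risky, coverage_no_debt)
```

The second company could pay its interest only about 1.2 times from its Operating Income, so a period of poor business could quickly put it at risk.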
We might imagine that a company's Interest Coverage is related to its capabilities for growing future Sales. We can test this notion in the following scatter-plot, which shows that there is generally no relation between Interest Coverage and FUTURE 1-year Sales Growth.
plot_scatter_fit(df=df_signals,
x=INTEREST_COVERAGE,
y=SALES_GROWTH_1Y_FUTURE);
We can also test if a change in the Interest Coverage can predict the FUTURE 1-year Sales Growth, but once again, the scatter-plot below shows that there is no such relation.
# Name of the data-column.
INTEREST_COVERAGE_CHG = rename_chg(INTEREST_COVERAGE)
plot_scatter_fit(df=df_signals,
x=INTEREST_COVERAGE_CHG,
y=SALES_GROWTH_1Y_FUTURE);
Forecasting the Sales Growth is an essential component of long-term stock forecasting. This was a basic statistical study of which signals could be used to predict the FUTURE 1-year and 3-year average Sales Growth.
In summary, the signals we tested here were all very weak predictors for the FUTURE 1-year and 3-year average Sales Growth. Perhaps they could be useful if multiple signals are combined, or when making a diversified portfolio of many stocks.
These are the results:
The strongest predictor for the FUTURE 1-year Sales Growth was the PAST Year-Over-Year (YOY) Sales Growth, which compares the quarterly sales to the same quarter in the previous year. The data-points were very dispersed, so the line-fitting had a low $R^2$. When using the YOY Sales Growth to predict the FUTURE 3-year average Sales Growth, the line-fitting was even worse, with an even lower $R^2$.
The YOY Assets Growth was a weak predictor of the FUTURE 1-year Sales Growth, and an even weaker predictor of the FUTURE 3-year average Sales Growth.
Larger Net Acquisitions & Divestitures relative to the Total Assets had a positive correlation with FUTURE 1-year and 3-year average Sales Growth, although the data-points were so dispersed that the $R^2$ was nearly zero.
Companies whose Net Acquisitions were greater than 10% of the Total Assets mostly had positive Sales Growth between 0-20% per year on average for the following 3 years, with a few outliers beyond this range.
Larger Research & Development relative to the Revenue had a positive correlation with FUTURE 3-year average Sales Growth, but the data-points were so dispersed that the $R^2$ was nearly zero.
Larger Dividend Payouts and Share Buybacks relative to the Free Cash Flow had a negative correlation with the FUTURE 1-year Sales Growth, but the data-points were so dispersed that the $R^2$ was nearly zero.
Larger Capital Expenditures (CapEx) relative to the Depreciation & Amortization had a positive correlation with the FUTURE 1-year Sales Growth, but the data-points were so dispersed that the $R^2$ was zero.
Larger Revenue had a negative correlation with FUTURE 1-year Sales Growth, but the data-points were so dispersed that the $R^2$ was zero.
Changes in Inventory Turnover compared to the previous year had a negative correlation with the FUTURE 1-year Sales Growth, but the data-points were so dispersed that the $R^2$ was zero.
Larger Gross Profit Margins had a positive correlation with FUTURE 1-year Sales Growth, but the data-points were so dispersed that the $R^2$ was nearly zero.
Interest Coverage had no correlation with FUTURE 1-year Sales Growth.
Also note that we used data for a fairly short period of time, which ended in April 2020 and only went back about 6 years on average for the individual stocks, with the max period being 11 years for one stock. Ideally this kind of study would be done with 20-30 years of data, in which case we might be able to find stronger predictors for Sales Growth over 5-10 year periods.