A factor model, also known as a factor-based asset pricing model, is a statistical (or econometric) model used in finance to explain the expected returns and risk of a portfolio or individual security by relating them to a set of underlying common risk factors.
The basic idea is that a security’s return is not driven purely by its unique, company-specific characteristics (idiosyncratic risk), but also by its exposure or sensitivity to broader, market-wide economic forces (systematic risk).
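The general mathematical representation of a linear factor model with \(K\) factors for the return of an asset \(i\) at time \(t\) is: \[
R_{i,t} = \alpha_i + \sum_{j=1}^{K} \beta_{i,j} F_{j,t} + \epsilon_{i,t}
\]
where: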
\(R_{i,t}\): The return (often excess return) of asset \(i\) at time \(t\).
\(\alpha_i\): The intercept (or alpha), representing the average return not explained by the factors. In asset pricing theory, alpha should be zero for a correctly specified model, so estimated alphas should be statistically indistinguishable from zero.
\(F_{j,t}\): The return of factor \(j\) at time \(t\). These are the systematic sources of risk. Well-known examples include the market factor in the CAPM (Capital Asset Pricing Model) and the size and value factors in the Fama-French models, which are the focus of this tutorial.
\(\beta_{i,j}\): The factor loading or factor sensitivity of asset \(i\) to factor \(j\). This measures how much asset \(i\)’s return reacts to a one-unit change in factor \(j\).
\(\epsilon_{i,t}\): The idiosyncratic error or specific risk of asset \(i\) at time \(t\). This is the part of the return unique to the company and uncorrelated with the factors.
The theoretical backbone of factor models is that only systematic components of asset returns matter for pricing, while idiosyncratic risk can be diversified away in sufficiently large portfolios. In equilibrium-based models such as the CAPM, investors require compensation for bearing systematic risk, implying that expected returns are determined by an asset’s exposures to one or more common risk factors. From a no-arbitrage perspective, as formalized in the Arbitrage Pricing Theory (APT) of Ross (1976), if returns admit a factor structure and idiosyncratic risks are diversifiable, then the absence of arbitrage implies that expected asset returns must be linear functions of their sensitivities to these common factors, regardless of investor preferences. Together, these arguments provide the theoretical foundation for linear factor pricing models.
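The diversification argument can be illustrated with a small simulation. As a hedged sketch, assume \(N\) assets whose idiosyncratic shocks are independent and share a common, purely hypothetical volatility of 8%. The variance of the equally weighted portfolio's idiosyncratic component should then fall like \(\sigma^2/N\), whereas a component driven by a common factor would not shrink this way. The helper name below is illustrative only.

```python
import numpy as np


def idiosyncratic_variance_of_equal_weight_portfolio(n_assets, idio_vol=0.08, n_periods=20_000, seed=0):
    """Simulated variance of the idiosyncratic part of an equally weighted portfolio."""
    rng = np.random.default_rng(seed)
    # Hypothetical independent shocks: one column per asset, same volatility for all assets
    eps = rng.normal(0.0, idio_vol, size=(n_periods, n_assets))
    # Equal weights 1/N, so the portfolio's shock is the cross-sectional mean in each period
    portfolio_eps = eps.mean(axis=1)
    return portfolio_eps.var()


for n in (1, 10, 100):
    simulated = idiosyncratic_variance_of_equal_weight_portfolio(n)
    theory = 0.08**2 / n
    print(f"N = {n:>3}: simulated {simulated:.6f} vs sigma^2/N = {theory:.6f}")
```

The simulated variances should track \(\sigma^2/N\) closely, which is the sense in which idiosyncratic risk is "diversified away" in large portfolios.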
Factor models are generally categorized based on how the factors are determined (Connor 1995):
Connor, Gregory. 1995. “The Three Types of Factor Models: A Comparison of Their Explanatory Power.” Financial Analysts Journal 51 (3): 42–46. http://www.jstor.org/stable/4479845.
Macroeconomic Factor Models: Factors are observable macroeconomic variables, like GDP growth, inflation, or interest rates. See (Chen, Roll, and Ross 1986) for an example.
Fundamental Factor Models: Factors are attributes of the asset itself, such as market capitalization (Size), book-to-market ratio (Value), or industry classification. For example, the Fama-French models fall into this category.
Statistical Factor Models: Factors are derived from statistical techniques like Principal Component Analysis (PCA) on the historical return data of a large group of assets.
Chen, Nai-Fu, Richard Roll, and Stephen A. Ross. 1986. “Economic Forces and the Stock Market.” The Journal of Business 59 (3): 383–403. http://www.jstor.org/stable/2352710.
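To make the statistical approach concrete, here is a minimal NumPy sketch. It simulates a hypothetical one-factor market (50 assets, 600 periods, made-up loadings and volatilities) and extracts the first principal component of the sample return covariance matrix. That component should track the true common factor closely. Its sign is arbitrary, which is why the correlation is compared in absolute value.

```python
import numpy as np

rng = np.random.default_rng(42)
n_periods, n_assets = 600, 50

# Hypothetical one-factor world: common factor times asset loadings, plus independent noise
factor = rng.normal(0.0, 0.04, size=n_periods)
betas = rng.uniform(0.5, 1.5, size=n_assets)
noise = rng.normal(0.0, 0.03, size=(n_periods, n_assets))
returns = factor[:, None] * betas[None, :] + noise

# PCA via the eigen-decomposition of the sample covariance matrix
demeaned = returns - returns.mean(axis=0)
cov = np.cov(demeaned, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
first_pc = eigvecs[:, -1]                # loadings of the largest component
pc_returns = demeaned @ first_pc         # time series of the statistical factor

explained = eigvals[-1] / eigvals.sum()
corr = np.corrcoef(pc_returns, factor)[0, 1]
print(f"Share of variance explained by PC1: {explained:.2f}")
print(f"|corr(PC1, true factor)|: {abs(corr):.3f}")
```

In real applications the true factor is unobserved, so the extracted components have no built-in economic label; interpreting them is part of the analysis.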
3 Fama-French 3-Factor Model
A widely used, classic factor model is the Fama-French 3-Factor Model (FF3F) (Fama and French 1993). The FF3F model for an asset’s excess return (\(R_{i,t} - R_{f,t}\)) is: \[
R_{i,t} - R_{f,t} = \alpha_i + \beta_{i,MKT}(R_{M,t} - R_{f,t}) + \beta_{i,SMB} SMB_t + \beta_{i,HML} HML_t + \epsilon_{i,t}
\]
Fama, Eugene F., and Kenneth R. French. 1993. “Common Risk Factors in the Returns on Stocks and Bonds.” Journal of Financial Economics 33 (1): 3–56. https://doi.org/10.1016/0304-405X(93)90023-5.
The factors are:
\(R_{M,t} - R_{f,t}\): Market Risk Premium (MKT). The excess return of the broad market \((R_{M,t})\) over the risk-free rate \((R_{f,t})\).
\(SMB_t\) (Small Minus Big): The Size Factor. The return of a portfolio of small-cap stocks minus the return of a portfolio of large-cap stocks.
\(HML_t\) (High Minus Low): The Value Factor. The return of a portfolio of high book-to-market (value) stocks minus the return of a portfolio of low book-to-market (growth) stocks.
The factor loadings (\(\beta_{i,MKT}, \beta_{i,SMB}, \beta_{i,HML}\)) measure the sensitivity of the asset’s returns to each of these factors. The intercept (\(\alpha_i\)) represents any abnormal return not explained by the factors.
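Taking expectations of the FF3F equation (with a zero-mean error term) gives the model-implied expected excess return, \(\alpha_i + \beta_{i,MKT}E[R_M - R_f] + \beta_{i,SMB}E[SMB] + \beta_{i,HML}E[HML]\). The sketch below evaluates this expression. The loadings and average monthly factor returns are hypothetical values chosen only for illustration, not estimates, and the helper name is illustrative.

```python
def ff3_expected_excess_return(betas, premiums, alpha=0.0):
    """Model-implied expected excess return: alpha + sum_j beta_j * E[F_j]."""
    if set(betas) != set(premiums):
        raise ValueError("betas and premiums must cover the same factors")
    return alpha + sum(betas[name] * premiums[name] for name in betas)


# Hypothetical loadings and hypothetical average monthly factor returns (illustration only)
betas = {"MKT": 1.0, "SMB": 0.5, "HML": 0.4}
premiums = {"MKT": 0.006, "SMB": 0.002, "HML": 0.003}

print(f"{ff3_expected_excess_return(betas, premiums):.4f}")  # 0.0082
```

With these made-up inputs, the market contributes 0.60%, size 0.10%, and value 0.12% per month, for a total of 0.82% per month.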
3.1 An Extension of CAPM
The FF3F is considered a foundational extension of the CAPM because it builds upon the CAPM’s single risk factor by adding two empirically observed factors.
The CAPM is a single-factor model that posits that the expected excess return of an asset is determined solely by its sensitivity to market risk \((\beta_{MKT})\), that is, its exposure to the market portfolio’s excess return \((R_{M,t} - R_{f,t})\). It assumes that investors are compensated only for systematic (non-diversifiable) market risk.
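Written in the notation of the FF3F equation above, the CAPM time-series regression is the special case in which the size and value loadings are restricted to zero:

\[
\beta_{i,SMB} = \beta_{i,HML} = 0 \quad\Longrightarrow\quad R_{i,t} - R_{f,t} = \alpha_i + \beta_{i,MKT}(R_{M,t} - R_{f,t}) + \epsilon_{i,t}
\]

In this sense, the FF3F nests the CAPM. Estimating both regressions on the same data and comparing their fit (for example, with an F-test of the two zero restrictions) is one way to check whether the extra factors add explanatory power for a given asset.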
Fama and French introduced their model in 1993 to address empirical anomalies that the CAPM failed to explain, most notably:
The Size Effect: Small-cap stocks (low Market Equity) historically earned higher average returns than large-cap stocks.
The Value Effect: Value stocks (high Book-to-Market ratio) historically earned higher average returns than growth stocks (low Book-to-Market ratio).
By adding these two factors, the FF3F model is able to explain a significantly greater portion of the variation in diversified portfolio returns (often \(90\%+\)) compared to the CAPM (often \(\sim 70\%\)), leading to a more accurate model of expected returns and risk.
3.2 The Construction of the Three Factors
The Market Risk Premium factor is constructed as the value-weighted return of the broad market portfolio minus the risk-free rate. The value-weighted approach means that larger companies have a greater influence on the market return.
The SMB (Size) and HML (Value) factors are returns on zero-cost portfolios designed to isolate the returns associated with specific risk factors. Each portfolio goes long (buys) a group of stocks expected to benefit from the factor and short sells an equal dollar amount of a group expected to lose from, or not benefit from, the factor. Because the long side and the short side both generally move with the market, taking equal and opposite positions largely offsets their market exposures. This construction lets the factor capture the return attributable to the specific characteristic, largely independent of overall market movements.
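A quick simulation illustrates why equal and opposite positions largely cancel market exposure: the market beta of a $1-long/$1-short portfolio is the difference between the two legs' betas. The leg betas (1.15 and 1.05), the volatilities, and the sample size below are all hypothetical. The long-short beta should come out near 0.10, which is small but not exactly zero.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000
market = rng.normal(0.007, 0.045, size=n)   # hypothetical market excess returns

# Two hypothetical legs with similar market betas, each with its own noise
long_leg = 0.002 + 1.15 * market + rng.normal(0.0, 0.02, size=n)
short_leg = 0.000 + 1.05 * market + rng.normal(0.0, 0.02, size=n)
long_short = long_leg - short_leg           # $1 long, $1 short: zero net investment


def market_beta(r, m):
    """Slope of r on m: cov(r, m) / var(m)."""
    return np.cov(r, m, ddof=1)[0, 1] / np.var(m, ddof=1)


print(f"long leg beta:   {market_beta(long_leg, market):.2f}")
print(f"short leg beta:  {market_beta(short_leg, market):.2f}")
print(f"long-short beta: {market_beta(long_short, market):.2f}")
```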
| Factor | Concept | Construction (Long/Short) |
|---|---|---|
| MKT (Market Risk Premium) | Compensation for overall market risk. | Market portfolio (value-weighted return of all eligible stocks) minus the risk-free rate \((R_{M,t} - R_{f,t})\). |
| SMB (Small Minus Big) | Compensation for size risk (tendency of small stocks to outperform large stocks). | Long small-cap portfolios, short large-cap portfolios: average return of the small-cap portfolios minus average return of the large-cap portfolios. |
| HML (High Minus Low) | Compensation for value risk (tendency of value stocks to outperform growth stocks). | Long high book-to-market (value) portfolios, short low book-to-market (growth) portfolios: average return of the value portfolios minus average return of the growth portfolios. |
To be more specific, the process for constructing the SMB and HML factors involves a series of sorts on the universe of eligible stocks:
Form Portfolios: At the end of a given time period (e.g., June of year \(t\) for annual sorts), all eligible stocks are independently sorted on two characteristics:
Size (Market Equity, ME): Stocks are split into two groups, Small (S) and Big (B), using the median ME as the breakpoint.
Value (Book-to-Market Ratio, B/M): Stocks are split into three groups, Low (L), Neutral (N), and High (H), using the 30th and 70th percentiles of B/M as breakpoints.
The intersections of these groups create six portfolios: S/L, S/N, S/H, B/L, B/N, B/H. (In Kenneth French’s data library, the breakpoints are computed from NYSE stocks only, and the six portfolios are value-weighted.)
Calculate Factor Returns: The monthly factor returns are calculated using the returns of these six portfolios:
SMB (Small Minus Big): Takes the average return of the three small portfolios and subtracts the average return of the three big portfolios. \[\text{SMB} = \underbrace{\frac{1}{3} (R_{S/L} + R_{S/N} + R_{S/H})}_{\text{Long (Small Stocks)}} - \underbrace{\frac{1}{3} (R_{B/L} + R_{B/N} + R_{B/H})}_{\text{Short (Big Stocks)}}\]
HML (High Minus Low): Takes the average return of the two high B/M portfolios and subtracts the average return of the two low B/M portfolios. \[\text{HML} = \underbrace{\frac{1}{2} (R_{S/H} + R_{B/H})}_{\text{Long (Value Stocks)}} - \underbrace{\frac{1}{2} (R_{S/L} + R_{B/L})}_{\text{Short (Growth Stocks)}}\]
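The two steps above can be sketched in pandas for a single formation period. This is a simplified illustration under stated assumptions, not Ken French's production code: the input DataFrame and its column names (`me`, `bm`, `ret`) are hypothetical, the breakpoints are computed from all stocks in the sample rather than from NYSE stocks only, and the returns are simulated.

```python
import numpy as np
import pandas as pd


def ff_six_portfolio_factors(stocks):
    """Sketch of the 2x3 size/value sort and the SMB/HML formulas.

    `stocks` needs three (hypothetical) columns:
      'me'  - market equity at portfolio formation
      'bm'  - book-to-market ratio at portfolio formation
      'ret' - each stock's return over the following period
    Simplification: breakpoints come from the whole sample, not NYSE stocks only.
    """
    df = stocks.copy()
    size_break = df["me"].median()
    bm_lo, bm_hi = df["bm"].quantile([0.3, 0.7])

    # Independent sorts: 2 size groups and 3 book-to-market groups
    df["size_grp"] = np.where(df["me"] <= size_break, "S", "B")
    df["bm_grp"] = np.select([df["bm"] <= bm_lo, df["bm"] >= bm_hi], ["L", "H"], default="N")

    # Value-weighted return of each of the six intersections: sum(me * ret) / sum(me)
    df["me_x_ret"] = df["me"] * df["ret"]
    grouped = df.groupby(["size_grp", "bm_grp"])
    vw = grouped["me_x_ret"].sum() / grouped["me"].sum()
    six = {f"{s}/{b}": vw[(s, b)] for s, b in vw.index}

    smb = (six["S/L"] + six["S/N"] + six["S/H"]) / 3 - (six["B/L"] + six["B/N"] + six["B/H"]) / 3
    hml = (six["S/H"] + six["B/H"]) / 2 - (six["S/L"] + six["B/L"]) / 2
    return smb, hml, six


# Usage with simulated (hypothetical) data for 600 stocks in a single period
rng = np.random.default_rng(1)
n = 600
stocks = pd.DataFrame({
    "me": rng.lognormal(mean=7.0, sigma=1.5, size=n),
    "bm": rng.lognormal(mean=-0.5, sigma=0.6, size=n),
    "ret": rng.normal(0.01, 0.08, size=n),
})
smb, hml, six = ff_six_portfolio_factors(stocks)
print(sorted(six))
print(f"SMB = {smb:.4f}, HML = {hml:.4f}")
```

Repeating this for every month (with the sorts refreshed once a year) would produce SMB and HML time series analogous to those in French's library.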
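This design balances each factor across the other characteristic: SMB averages over the three B/M groups, so it is roughly neutral to value, and HML averages over the two size groups, so it is roughly neutral to size. The resulting factors are not perfectly orthogonal, but their correlations with each other and with the market are modest, so they capture sources of systematic risk that are distinct from the broad market.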
In practice, we typically use pre-calculated Fama-French factor returns available from Kenneth French’s data library, which provides factor returns for various regions at different frequencies.
3.3 Estimation (Time-Series Regression)
The estimation of the Fama-French 3-Factor Model parameters is typically done using time-series regression analysis. The steps are as follows:
Select Asset and Timeframe: Choose the asset (stock, mutual fund, etc.) and the historical period you want to analyze (e.g., monthly returns from 2010 to 2020).
Gather Data: Collect the following time-series data for the selected period:
The monthly return of the asset (\(R_{i,t}\)).
The monthly risk-free rate (\(R_{f,t}\)), typically the one-month U.S. Treasury bill rate (the RF column in French’s factor data).
The monthly returns for the three factors (\(MKT_t\), \(SMB_t\), \(HML_t\)). (These are often publicly available from databases like Kenneth French’s data library).
Run Regression: Perform an Ordinary Least Squares (OLS) multiple linear regression with the asset’s excess returns as the dependent variable and the three factor returns (\(MKT\), \(SMB\), \(HML\)) as the independent variables.
Analyze Results: Examine the regression output, focusing on:
Coefficients (\(\beta_{i,MKT}, \beta_{i,SMB}, \beta_{i,HML}\)): These indicate the asset’s sensitivity to each factor.
Intercept (\(\alpha_i\)): This indicates whether the asset has generated abnormal returns not explained by the factors.
R-squared: This indicates how well the model explains the variation in the asset’s returns.
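The regression in the steps above can be sketched with plain NumPy before we turn to statsmodels in the code section. The data here are synthetic: 132 months of made-up factor returns and an asset built with a known (hypothetical) alpha of 0 and betas of 1.0, 0.5, and 0.4, so the estimates can be checked against the truth. `ff3_ols` is an illustrative helper, not a library function.

```python
import numpy as np


def ff3_ols(excess_ret, factors):
    """OLS of excess returns on [1, MKT_RF, SMB, HML].

    excess_ret: shape (T,); factors: shape (T, 3) in the order MKT_RF, SMB, HML.
    Returns (coefficients [alpha, b_mkt, b_smb, b_hml], R^2, adjusted R^2).
    """
    T, k = factors.shape
    X = np.column_stack([np.ones(T), factors])   # prepend the intercept (alpha)
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    resid = excess_ret - X @ coef
    ss_res = resid @ resid
    ss_tot = ((excess_ret - excess_ret.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (T - 1) / (T - k - 1)
    return coef, r2, adj_r2


# Synthetic monthly data with known (hypothetical) parameters
rng = np.random.default_rng(0)
T = 132
F = rng.normal([0.008, 0.001, 0.0], [0.045, 0.025, 0.03], size=(T, 3))
true = np.array([0.0, 1.0, 0.5, 0.4])            # alpha, b_mkt, b_smb, b_hml
y = true[0] + F @ true[1:] + rng.normal(0.0, 0.01, size=T)

coef, r2, adj_r2 = ff3_ols(y, F)
print("alpha, b_mkt, b_smb, b_hml:", np.round(coef, 3))
print(f"R^2 = {r2:.3f}, adj. R^2 = {adj_r2:.3f}")
```

The adjusted R-squared penalizes the number of regressors, which is why it is slightly below the plain R-squared.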
4 Code
4.1 Example 1
We start by choosing an asset and a historical period for our analysis, and then we load and prepare the data for the FF3F model.
The Fama-French factor data can be downloaded manually from Kenneth French’s data library site. However, in our example, we use the pandas_datareader library to fetch the factor data directly (see note 1).
1 The pandas_datareader library hasn’t been updated since summer 2021. Nevertheless, as of Dec 2025, its Fama-French data fetching is still functional.
For the asset’s historical price data, we use the yfinance library to download prices from Yahoo Finance.
```python
import pandas as pd
import yfinance as yf
import pandas_datareader.data as web
import datetime

# Define the asset of interest
ticker = 'VBR'  # Vanguard Small-Cap Value ETF

# Define the time period for analysis
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2020, 12, 31)

# Download Fama-French Factors (Source: Ken French Library)
# The web.DataReader() call returns a dictionary (with the monthly factor data in its 0th item)
ff_data = web.DataReader('F-F_Research_Data_Factors', 'famafrench', start=start, end=end)

# Taking the monthly data (index 0) and converting to decimals
factors = ff_data[0] / 100
factors.rename(columns={'Mkt-RF': 'MKT_RF'}, inplace=True)

# The DataFrame now contains MKT_RF, SMB, HML, and RF
print(factors.head())
```
Next, we download the asset’s historical price data and compute its monthly returns. We use the Vanguard Small-Cap Value ETF (ticker: VBR) as an example.
```python
# Download Asset Data (Source: Yahoo Finance)
stock_data = yf.download(ticker, start=start, end=end, auto_adjust=False, progress=False)

# Resample to Monthly (end of month) and calculate returns
# 'Adj Close' is best for returns as it accounts for dividends/splits
stock_returns = stock_data['Adj Close'].resample('ME').last().pct_change().dropna()

# Formatting index to match French's data (Year-Month) for merging
stock_returns.index = stock_returns.index.to_period('M')

# Rename the returns column for clarity
stock_returns.columns = ['Asset_Return']
print(stock_returns.head())
```
Now we merge the asset returns with the factor data on the Date index and compute the excess returns.
```python
# Merge the stock returns with the factors on the Date index
df = pd.merge(stock_returns, factors, left_index=True, right_index=True, how='inner')

# Calculate Excess Returns (Stock Return minus Risk-Free Rate)
df['Excess_Return'] = df['Asset_Return'] - df['RF']
print(df.head())
```
We now start the regression analysis to estimate the FF3F model parameters. We use the statsmodels library to perform the Ordinary Least Squares (OLS) regression.
```python
import statsmodels.api as sm

# Define X (Independent Vars) and Y (Dependent Var)
X = df[['MKT_RF', 'SMB', 'HML']]
Y = df['Excess_Return']

# Add a constant (Alpha) to the model
X = sm.add_constant(X)

# Fit the Fama-French 3-Factor Model using OLS
model = sm.OLS(Y, X)
results = model.fit()

# Print the regression summary
print(results.summary())
```
Adjusted R-squared (0.980): This is extremely high. It indicates that roughly 98% of the variation in this asset’s excess returns is explained by its exposure to the Market, Size, and Value factors, leaving only about 2% to idiosyncratic (unique) risk or missing factors.
This isn’t surprising as the asset chosen for this analysis is the Vanguard Small-Cap Value ETF (ticker: VBR). The asset is a highly diversified portfolio rather than an individual stock.
Factor Sensitivities (The Betas): The estimated factor loadings (betas) are:
MKT_RF (1.0136): This is the “Market Beta.” It is very close to 1.0, meaning the asset moves almost exactly in line with the broad market: if the market’s excess return rises by 1%, this asset’s excess return rises by about 1.01% (holding the other factors constant). It is statistically significant (p < 0.001; statsmodels displays this as 0.000).
SMB (0.5574): This is a strong positive loading on the Size factor. Because it is positive, the asset behaves like a small-cap stock: it tends to outperform when small stocks beat large stocks.
HML (0.4030): This is a significant positive loading on the Value factor. It indicates the asset behaves like a value stock: it tends to outperform when stocks with high book-to-market ratios (value stocks) beat low book-to-market (growth) stocks.
Performance Assessment (Alpha)
const (-0.0006): This is the monthly alpha, the return not explained by the factors. It is often used as a performance measure because it indicates the manager’s ability to generate returns beyond what is expected given the risk factors. Here, the alpha is effectively zero (-0.06% per month).
Crucially, the p-value is 0.374, well above the 0.05 significance level, so we fail to reject the null hypothesis that alpha equals zero. This asset has no statistically significant alpha: its performance is in line with what its factor exposures predict. There is no evidence of “manager skill” or “abnormal return” here.
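A monthly alpha can be put on an annual scale in two common ways: the simple convention \(12\alpha\) (the one used by the run_fama_french_analysis() function in Example 2) or compounding, \((1+\alpha)^{12}-1\). For a monthly alpha as small as the -0.0006 reported above, both give nearly the same answer, as this short sketch shows. The helper name is illustrative.

```python
def annualize_monthly(r):
    """Return (simple x12, compounded) annualizations of a monthly rate r."""
    return 12 * r, (1 + r) ** 12 - 1


simple, compounded = annualize_monthly(-0.0006)   # the monthly alpha reported above
print(f"simple: {simple:.4%}, compounded: {compounded:.4%}")
```

Either way, the annualized alpha is well under 1% in magnitude, and it is not statistically different from zero in any case.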
Diagnostic Tests (Assumptions)
Durbin-Watson (1.665; see note 2): This is slightly below the ideal 2.0 but sits within the commonly used “acceptable” range (1.5 to 2.5). It suggests there is no major issue with first-order autocorrelation in the residuals.
Omnibus & Jarque-Bera (p < 0.05; see note 3): Both tests for normality have very low p-values, meaning the residuals are not normally distributed (consistent with the Skew of -0.82 and Kurtosis of 5.03). Note: In finance, this is very common due to “fat tails” (occasional large market moves). It does not bias the coefficient estimates, but it means you should be cautious about the exact precision of the p-values and confidence intervals.
2 The Durbin-Watson (DW) test is a statistical test used to detect the presence of autocorrelation in the residuals from a regression analysis. We want the residuals to be uncorrelated over time. If there is significant autocorrelation, it can violate the assumptions of the regression model and lead to inefficient estimates and misleading inference.
3 Omnibus and Jarque-Bera (JB) are statistical tests checking whether the residuals from the regression follow a normal distribution. This is important because many statistical inference techniques (like hypothesis testing) assume normality of errors. If the residuals are not normally distributed, it can affect the validity of p-values and confidence intervals derived from the regression.
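Both statistics have simple closed forms: \(DW = \sum_t (e_t - e_{t-1})^2 / \sum_t e_t^2\) and \(JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)\), where \(S\) is the skewness and \(K\) the (non-excess) kurtosis of the residuals. The sketch below computes them directly and compares the results with statsmodels' `durbin_watson` and `jarque_bera`. The residuals are hypothetical draws from a fat-tailed Student-t distribution, not the residuals of the VBR regression, and the Omnibus (D'Agostino-Pearson) test is not reproduced.

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson, jarque_bera


def durbin_watson_manual(resid):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2; roughly 2 * (1 - lag-1 autocorrelation)."""
    diff = np.diff(resid)
    return (diff @ diff) / (resid @ resid)


def jarque_bera_manual(resid):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4) using population (biased) moments."""
    n = resid.size
    e = resid - resid.mean()
    m2 = np.mean(e**2)
    skew = np.mean(e**3) / m2**1.5
    kurt = np.mean(e**4) / m2**2
    return n / 6 * (skew**2 + (kurt - 3) ** 2 / 4)


rng = np.random.default_rng(3)
resid = rng.standard_t(df=5, size=131)   # hypothetical fat-tailed residuals

print(durbin_watson_manual(resid), durbin_watson(resid))
print(jarque_bera_manual(resid), jarque_bera(resid)[0])
```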
In summary, this regression describes a passive small-cap value fund. It delivers the market, size, and value exposures one would expect, without adding any “active” value (zero alpha). If you were looking for a fund that combines market exposure with a tilt toward these two risk factors, this one fits the description.
Note that, for simplicity, we did not test for heteroscedasticity or adjust for it. In practice, these steps may be necessary for robust inference.
4.2 Example 2
Let’s try another asset, the SPDR S&P 500 ETF Trust (ticker: SPY), which tracks the S&P 500 index. This ETF represents a broad market portfolio of large-cap U.S. stocks.
This time, we encapsulate the entire process into a function for reusability.
```python
import pandas as pd
import yfinance as yf
import statsmodels.api as sm
import pandas_datareader.data as web
import datetime
from statsmodels.stats.diagnostic import het_white
from statsmodels.stats.stattools import durbin_watson


def run_fama_french_analysis(ticker, start_date, end_date):
    """
    Performs a complete Fama-French 3-Factor analysis including data download,
    regression, and post-regression diagnostics.
    """
    print(f"\n--- Starting Analysis for {ticker} ---")

    # ==========================================
    # 1. DATA RETRIEVAL
    # ==========================================
    try:
        start = datetime.datetime.strptime(start_date, '%Y-%m-%d')
        end = datetime.datetime.strptime(end_date, '%Y-%m-%d')

        # A. Download Fama-French Factors (Source: Ken French Library)
        # 'F-F_Research_Data_Factors' contains Mkt-RF, SMB, HML, and RF
        print("Downloading Fama-French Factors...")
        ff_data = web.DataReader('F-F_Research_Data_Factors', 'famafrench', start=start, end=end)

        # Taking the monthly data (index 0) and converting to decimals
        factors = ff_data[0] / 100
        factors.rename(columns={'Mkt-RF': 'MKT_RF'}, inplace=True)

        # B. Download Asset Data (Source: Yahoo Finance)
        print(f"Downloading data for {ticker}...")
        stock_data = yf.download(ticker, start=start, end=end, auto_adjust=False, progress=False)

        # Resample to Monthly (end of month) and calculate returns
        # 'Adj Close' is best for returns as it accounts for dividends/splits
        stock_returns = stock_data['Adj Close'].resample('ME').last().pct_change().dropna()

        # Formatting index to match French's data (Year-Month) for merging
        stock_returns.index = stock_returns.index.to_period('M')

        # Rename the returns column for clarity
        stock_returns.columns = ['Asset_Return']
    except Exception as e:
        print(f"Data Download Error: {e}")
        return None

    # ==========================================
    # 2. DATA PREPARATION
    # ==========================================
    # Merge the stock returns with the factors on the Date index
    df = pd.merge(stock_returns, factors, left_index=True, right_index=True, how='inner')

    # Calculate Excess Returns (Stock Return minus Risk-Free Rate)
    df['Excess_Return'] = df['Asset_Return'] - df['RF']

    # Define X (Independent Vars) and Y (Dependent Var)
    X = df[['MKT_RF', 'SMB', 'HML']]
    Y = df['Excess_Return']

    # Add a constant (Alpha) to the model
    X = sm.add_constant(X)

    # ==========================================
    # 3. OLS REGRESSION
    # ==========================================
    model = sm.OLS(Y, X)
    results = model.fit()

    # ==========================================
    # 4. DIAGNOSTICS & REPORTING
    # ==========================================
    print(results.summary())

    print("\n" + "=" * 40)
    print(f"FF3F Results Condensed Summary for {ticker}")
    print("=" * 40)

    # A. Factor Loadings (The Betas)
    print("\n--- Factor Sensitivities (Betas) ---")
    print(f"Market Beta: {results.params['MKT_RF']:.4f}")
    print(f"Size Beta (SMB): {results.params['SMB']:.4f}")
    print(f"Value Beta (HML): {results.params['HML']:.4f}")

    # B. Performance (Alpha)
    alpha = results.params['const']
    alpha_p = results.pvalues['const']
    is_significant = "YES" if alpha_p < 0.05 else "NO"
    print("\n--- Performance (Alpha) ---")
    print(f"Monthly Alpha: {alpha:.4f} (approx {alpha*12*100:.2f}% annualized)")
    print(f"Is Alpha Significant? {is_significant} (p-value: {alpha_p:.4f})")

    # C. Model Fit
    print("\n--- Model Fit ---")
    print(f"Adj. R-Squared: {results.rsquared_adj:.4f}")

    # D. Assumptions Testing
    print("\n--- Diagnostic Tests ---")

    # 1. Autocorrelation (Durbin-Watson)
    # Ideal is ~2.0. <1.5 or >2.5 indicates serial correlation
    dw_stat = durbin_watson(results.resid)
    print(f"Durbin-Watson: {dw_stat:.2f} (Target: ~2.0)")

    # 2. Heteroscedasticity (White Test)
    # H0: Variance is constant (Homoscedasticity)
    white_test = het_white(results.resid, results.model.exog)
    white_p = white_test[1]  # p-value of the LM statistic
    het_msg = "Warning: Heteroscedasticity detected!" if white_p < 0.05 else "Pass: Variance looks constant."
    print(f"White Test: {het_msg} (p-value: {white_p:.4f})")

    return results
```
Let’s run the analysis for SPY using the function we just built.
```python
# We analyze 'SPY', a broad market ETF.
_ = run_fama_french_analysis('SPY', '2010-01-01', '2020-12-31')
```
--- Starting Analysis for SPY ---
Downloading Fama-French Factors...
Downloading data for SPY...