8  __init__.py: Organize Code into Importable Modules

8.1 Motivation

As a research project grows, its code tends to end up scattered across many scripts. Organizing that code into modules and packages makes it easier to read and lets you reuse functions across different parts of the project, or even in other projects.

8.2 Creating Importable Modules

Suppose you have a research project with the following structure:

my-research-project/
├── src/
│   ├── process_data.py
│   ├── model.py
│   └── plot.py
├── main.py
├── data/
├── output/
├── README.md
├── pyproject.toml
└── uv.lock

The src directory contains scripts for data processing, modeling, and plotting. For example:

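# src/process_data.py
import numpy as np

def clean_data(df):
    # Drop rows with missing values (dropna returns a new DataFrame)
    df = df.dropna()
    # Add log-transformed income as a new column
    df["log_income"] = np.log(df["income"])
    return df

# src/model.py
from sklearn.linear_model import LinearRegression

def fit_model(X, y):
    # Fit a linear regression of y on X
    model = LinearRegression()
    model.fit(X, y)
    return model

# src/plot.py
import matplotlib.pyplot as plt

def plot_results(y_true, y_pred):
    # Scatter predicted values against actual values
    plt.scatter(y_true, y_pred)
    plt.xlabel("Actual")
    plt.ylabel("Predicted")
    plt.show()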

To make these scripts importable as modules, you can create an empty __init__.py file inside the src directory. This file tells Python that src is a package.
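
The mechanism can be checked in isolation. The following is a minimal sketch, not part of the project: it builds a throwaway package in a temporary directory (the names demo_pkg and stats.py are hypothetical), adds an empty __init__.py, and imports a function from a module inside it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Work in a temporary directory so nothing in the real project is touched.
# "demo_pkg" and "stats.py" are hypothetical names used only for this sketch.
with tempfile.TemporaryDirectory() as project_root:
    package_dir = Path(project_root) / "demo_pkg"
    package_dir.mkdir()

    # An empty __init__.py marks the directory as a package.
    (package_dir / "__init__.py").touch()

    # A module inside the package, analogous to src/process_data.py.
    (package_dir / "stats.py").write_text(
        "def mean(values):\n"
        "    return sum(values) / len(values)\n"
    )

    # Python looks for packages in the directories listed in sys.path.
    # When you run `python main.py`, the directory containing main.py is
    # placed at the front of sys.path, which is why main.py can import src.
    sys.path.insert(0, project_root)
    try:
        importlib.invalidate_caches()
        stats = importlib.import_module("demo_pkg.stats")
        result = stats.mean([1, 2, 3])
    finally:
        sys.path.remove(project_root)

print(result)  # 2.0
```

In the project above, the equivalent step is simply creating an empty file at src/__init__.py.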

In your main.py or other scripts, you can now import functions from the src package:

from src.process_data import clean_data
from src.model import fit_model
from src.plot import plot_results

import pandas as pd

# Load data
df = pd.read_csv("data/raw.csv")
df = clean_data(df)

# Fit model
model = fit_model(df[["log_income"]], df["consumption"])

# Evaluate
pred = model.predict(df[["log_income"]])
plot_results(df["consumption"], pred)

Even better, you could also add the following to src/__init__.py to simplify imports:

from .process_data import clean_data
from .model import fit_model
from .plot import plot_results

__all__ = ["clean_data", "fit_model", "plot_results"]

You can now import everything from src directly:

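from src import clean_data, fit_model, plot_results

import pandas as pd

# Load data
df = pd.read_csv("data/raw.csv")
df = clean_data(df)

# Fit model
model = fit_model(df[["log_income"]], df["consumption"])

# Evaluate
pred = model.predict(df[["log_income"]])
plot_results(df["consumption"], pred)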
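
To see what the re-exports and __all__ in src/__init__.py actually do, here is a small self-contained sketch. It again uses a throwaway package in a temporary directory; demo_reexport, cleaning.py, and clean are hypothetical names chosen for illustration, not part of the project.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_root:
    package_dir = Path(project_root) / "demo_reexport"
    package_dir.mkdir()

    # A module inside the package (hypothetical, stands in for process_data.py).
    (package_dir / "cleaning.py").write_text(
        "def clean(values):\n"
        "    return [v for v in values if v is not None]\n"
    )

    # __init__.py re-exports the function and lists it in __all__.
    (package_dir / "__init__.py").write_text(
        "from .cleaning import clean\n"
        "\n"
        "__all__ = ['clean']\n"
    )

    sys.path.insert(0, project_root)
    try:
        importlib.invalidate_caches()
        package = importlib.import_module("demo_reexport")

        # Run a star import in a scratch namespace to see which names it binds.
        namespace = {}
        exec("from demo_reexport import *", namespace)
    finally:
        sys.path.remove(project_root)

# The re-export makes the function available directly on the package.
cleaned = package.clean([1, None, 3])

# __all__ limits what `from demo_reexport import *` brings in.
star_imported = sorted(name for name in namespace if not name.startswith("__"))

print(cleaned)        # [1, 3]
print(star_imported)  # ['clean']
```

Without the __all__ line, the star import would also pick up the cleaning submodule itself, so listing the public names explicitly keeps the package's interface small and predictable.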

8.3 Learning Resources

To learn more about organizing Python code into modules and packages, consider the following resources: