Multiple Linear Regression (Toy Example)

Experiment: Multiple Linear Regression


Aim

To implement Multiple Linear Regression using multiple input features and evaluate the model using:

  • Mean Squared Error (MSE)

  • R-squared (R²)


Objectives

  • Understand multiple linear regression

  • Implement regression using matrix method

  • Predict output using multiple features

  • Compute MSE and R² manually

  • Visualize actual vs predicted values


🛠️ Tools Required

  • Python

  • NumPy

  • Matplotlib


📖 Theory

🔹 Multiple Linear Regression

Multiple Linear Regression models the relationship between one dependent variable and two or more independent variables. The prediction is a weighted sum of the input features plus an intercept, written compactly in matrix form.

🔹 Vector / Matrix Form

\hat{y} = X\theta

Where:

  • \hat{y} → predicted output vector

  • X → feature matrix (including the bias column)

  • \theta → parameter vector


🔹 Expanded Form (for 2 features)

\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2


🔹 General Form (n features)

\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n


🔹 Structure of θ (Theta Vector)

\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix}

  • \theta_0 → intercept (bias term)

  • \theta_1, \theta_2, \dots, \theta_n → feature coefficients


🔹 Structure of X (Feature Matrix)

X = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1n} \\ 1 & x_{21} & x_{22} & \dots & x_{2n} \\ 1 & x_{31} & x_{32} & \dots & x_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix}

  • First column = 1s (bias term)

  • Each row = one training example
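Constructing this matrix in code amounts to prepending a column of ones. A minimal sketch with a made-up 3×2 feature matrix:

```python
import numpy as np

# Hypothetical 3-sample, 2-feature matrix
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Prepend a column of ones so theta_0 acts as the intercept
ones = np.ones((X.shape[0], 1))
X_b = np.hstack((ones, X))  # shape (3, 3)

# Equivalent shorthand: np.c_[np.ones(X.shape[0]), X]
print(X_b)
```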


🔹 Final Parameter Equation (Normal Equation)

The least-squares parameters are obtained in closed form by the Normal Equation:

\theta = (X^T X)^{-1} X^T y
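The Normal Equation maps directly onto NumPy. A minimal sketch, with made-up coefficients (intercept 2, weights 3 and 0.5) so the recovered parameters are easy to check; np.linalg.pinv is used instead of inv because it degrades gracefully when X^T X is close to singular:

```python
import numpy as np

# Hypothetical data generated exactly as y = 2 + 3*x1 + 0.5*x2
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0]])
y = 2 + 3 * X[:, 0] + 0.5 * X[:, 1]

# Prepend the bias column of ones
X_b = np.hstack((np.ones((X.shape[0], 1)), X))

# Normal Equation: theta = (X^T X)^{-1} X^T y
# pinv (pseudo-inverse) is more robust than inv for near-singular X^T X
theta = np.linalg.pinv(X_b.T @ X_b) @ X_b.T @ y
print(theta)  # approximately [2.0, 3.0, 0.5]
```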


🔹 Mean Squared Error (MSE)

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

🔹 R-squared (R² Score)

R^2 = 1 - \frac{SS_{res}}{SS_{tot}}

where SS_{res} = \sum (y_i - \hat{y}_i)^2 is the residual sum of squares and SS_{tot} = \sum (y_i - \bar{y})^2 is the total sum of squares.
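Both metrics are a few lines of NumPy. A small sketch using hypothetical actual/predicted values (every prediction off by 0.5, so the expected MSE is 0.25):

```python
import numpy as np

# Hypothetical actual and predicted values, for illustration only
y      = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

n = len(y)
mse = np.sum((y - y_pred) ** 2) / n          # mean squared error

ss_res = np.sum((y - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y - np.mean(y)) ** 2)       # total sum of squares
r2 = 1 - ss_res / ss_tot

print("MSE:", mse)  # 0.25
print("R²:", r2)    # 0.95
```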

📋 Procedure

  1. Define dataset with multiple features

  2. Add bias (intercept term)

  3. Compute parameters using Normal Equation

  4. Predict values

  5. Calculate MSE

  6. Calculate R²

  7. Plot actual vs predicted values


💻 Program

import numpy as np
import matplotlib.pyplot as plt

# Sample multi-variable dataset
# Features: x1, x2
X = np.array([
    [1, 2], [2, 1], [3, 4], [4, 3],
    [5, 5], [6, 7], [7, 6], [8, 8]
])

# Target variable
y = np.array([3, 3, 7, 7, 10, 13, 13, 16]).reshape(-1, 1)

# -----------------------------
# Add bias term
# -----------------------------
ones = np.ones((X.shape[0], 1))
X_b = np.hstack((ones, X))

# -----------------------------
# Normal Equation
# -----------------------------
theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print("Parameters (theta):")
print(theta)

# -----------------------------
# Predictions
# -----------------------------
y_pred = X_b.dot(theta)

# -----------------------------
# Manual MSE
# -----------------------------
n = len(y)
mse = np.sum((y - y_pred) ** 2) / n
print("MSE:", mse)

# -----------------------------
# Manual R²
# -----------------------------
y_mean = np.mean(y)
SS_res = np.sum((y - y_pred) ** 2)
SS_tot = np.sum((y - y_mean) ** 2)
r2 = 1 - (SS_res / SS_tot)
print("R²:", r2)

# -----------------------------
# Visualization (Actual vs Predicted)
# -----------------------------
plt.scatter(y, y_pred)
plt.plot([y.min(), y.max()], [y.min(), y.max()])  # ideal line
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted (Multiple Linear Regression)")
plt.show()

Output

  • Model parameters (θ values)

  • Mean Squared Error (MSE)

  • R² score

  • Scatter plot of actual vs predicted values

Parameters (theta):
[[-6.79456491e-14]
 [ 1.00000000e+00]
 [ 1.00000000e+00]]
MSE: 1.7150822155636323e-27
R²: 1.0
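As a sanity check on the Normal Equation result (not part of the original experiment), the same parameters can be recovered with np.linalg.lstsq, which solves the least-squares problem via SVD without ever forming (X^T X)^{-1}:

```python
import numpy as np

X = np.array([[1, 2], [2, 1], [3, 4], [4, 3],
              [5, 5], [6, 7], [7, 6], [8, 8]], dtype=float)
y = np.array([3, 3, 7, 7, 10, 13, 13, 16], dtype=float)

X_b = np.hstack((np.ones((X.shape[0], 1)), X))

# lstsq minimizes ||X_b @ theta - y||^2 directly (numerically stable)
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta)  # approximately [0.0, 1.0, 1.0], matching the Normal Equation
```

Agreement between the two methods confirms the closed-form solution; here y = x1 + x2 exactly, hence the near-zero intercept and perfect R².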





Result

The multiple linear regression model was successfully implemented using the Normal Equation.

Model performance was evaluated using manual MSE and R² calculations.

  • Multiple regression handles more than one feature

  • Matrix form simplifies computation

  • R² indicates how well multiple features explain the target
