Multiple Linear Regression (Toy Example)

Experiment: Multiple Linear Regression


Aim

To implement Multiple Linear Regression using multiple input features and evaluate the model using:

  • Mean Squared Error (MSE)

  • R-squared (R²)


Objectives

  • Understand multiple linear regression

  • Implement regression using matrix method

  • Predict output using multiple features

  • Compute MSE and R² manually

  • Visualize actual vs predicted values


🛠️ Tools Required

  • Python

  • NumPy

  • Matplotlib


📖 Theory

🔹 Multiple Linear Regression

Multiple Linear Regression models the relationship between one dependent variable and two or more independent variables. The prediction is a weighted sum of the input features plus an intercept, written compactly in matrix form.

🔹 Vector / Matrix Form

\hat{y} = X\theta

Where:

  • \hat{y} → predicted output vector

  • X → feature matrix (including the bias column)

  • \theta → parameter vector


🔹 Expanded Form (for 2 features)

\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2


🔹 General Form (n features)

\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n


🔹 Structure of θ (Theta Vector)

\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix}

  • \theta_0 → intercept (bias term)

  • \theta_1, \theta_2, \dots, \theta_n → feature coefficients


🔹 Structure of X (Feature Matrix)

X = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1n} \\ 1 & x_{21} & x_{22} & \dots & x_{2n} \\ 1 & x_{31} & x_{32} & \dots & x_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix}

  • First column = 1s (bias term)

  • Each row = one training example
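Constructing this matrix in code amounts to prepending a column of ones. A minimal sketch with a made-up 3×2 feature matrix:

```python
import numpy as np

# Hypothetical 3-sample, 2-feature matrix
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Prepend a column of ones so theta_0 acts as the intercept
ones = np.ones((X.shape[0], 1))
X_b = np.hstack((ones, X))  # shape (3, 3)

# Equivalent shorthand: np.c_[np.ones(X.shape[0]), X]
print(X_b)
```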


🔹 Final Parameter Equation (Normal Equation)

The least-squares parameters are obtained in closed form by the Normal Equation:

\theta = (X^T X)^{-1} X^T y
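The Normal Equation maps directly onto NumPy. A minimal sketch, with made-up coefficients (intercept 2, weights 3 and 0.5) so the recovered parameters are easy to check; np.linalg.pinv is used instead of inv because it degrades gracefully when X^T X is close to singular:

```python
import numpy as np

# Hypothetical data generated exactly as y = 2 + 3*x1 + 0.5*x2
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0]])
y = 2 + 3 * X[:, 0] + 0.5 * X[:, 1]

# Prepend the bias column of ones
X_b = np.hstack((np.ones((X.shape[0], 1)), X))

# Normal Equation: theta = (X^T X)^{-1} X^T y
# pinv (pseudo-inverse) is more robust than inv for near-singular X^T X
theta = np.linalg.pinv(X_b.T @ X_b) @ X_b.T @ y
print(theta)  # approximately [2.0, 3.0, 0.5]
```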


🔹 Mean Squared Error (MSE)

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

🔹 R-squared (R² Score)

R^2 = 1 - \frac{SS_{res}}{SS_{tot}}

where SS_{res} = \sum (y_i - \hat{y}_i)^2 is the residual sum of squares and SS_{tot} = \sum (y_i - \bar{y})^2 is the total sum of squares.
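Both metrics are a few lines of NumPy. A small sketch using hypothetical actual/predicted values (every prediction off by 0.5, so the expected MSE is 0.25):

```python
import numpy as np

# Hypothetical actual and predicted values, for illustration only
y      = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

n = len(y)
mse = np.sum((y - y_pred) ** 2) / n          # mean squared error

ss_res = np.sum((y - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y - np.mean(y)) ** 2)       # total sum of squares
r2 = 1 - ss_res / ss_tot

print("MSE:", mse)  # 0.25
print("R²:", r2)    # 0.95
```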

📋 Procedure

  1. Define dataset with multiple features

  2. Add bias (intercept term)

  3. Compute parameters using Normal Equation

  4. Predict values

  5. Calculate MSE

  6. Calculate R²

  7. Plot actual vs predicted values


💻 Program

import numpy as np
import matplotlib.pyplot as plt

# Sample multi-variable dataset
# Features: x1, x2
X = np.array([
    [1, 2], [2, 1], [3, 4], [4, 3],
    [5, 5], [6, 7], [7, 6], [8, 8]
])

# Target variable
y = np.array([3, 3, 7, 7, 10, 13, 13, 16]).reshape(-1, 1)

# -----------------------------
# Add bias term
# -----------------------------
ones = np.ones((X.shape[0], 1))
X_b = np.hstack((ones, X))

# -----------------------------
# Normal Equation
# -----------------------------
theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print("Parameters (theta):")
print(theta)

# -----------------------------
# Predictions
# -----------------------------
y_pred = X_b.dot(theta)

# -----------------------------
# Manual MSE
# -----------------------------
n = len(y)
mse = np.sum((y - y_pred) ** 2) / n
print("MSE:", mse)

# -----------------------------
# Manual R²
# -----------------------------
y_mean = np.mean(y)
SS_res = np.sum((y - y_pred) ** 2)
SS_tot = np.sum((y - y_mean) ** 2)
r2 = 1 - (SS_res / SS_tot)
print("R²:", r2)

# -----------------------------
# Visualization (Actual vs Predicted)
# -----------------------------
plt.scatter(y, y_pred)
plt.plot([y.min(), y.max()], [y.min(), y.max()])  # ideal line
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted (Multiple Linear Regression)")
plt.show()

Output

  • Model parameters (θ values)

  • Mean Squared Error (MSE)

  • R² score

  • Scatter plot of actual vs predicted values

Parameters (theta):
[[-6.79456491e-14]
 [ 1.00000000e+00]
 [ 1.00000000e+00]]
MSE: 1.7150822155636323e-27
R²: 1.0
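As a sanity check on the Normal Equation result (not part of the original experiment), the same parameters can be recovered with np.linalg.lstsq, which solves the least-squares problem via SVD without ever forming (X^T X)^{-1}:

```python
import numpy as np

X = np.array([[1, 2], [2, 1], [3, 4], [4, 3],
              [5, 5], [6, 7], [7, 6], [8, 8]], dtype=float)
y = np.array([3, 3, 7, 7, 10, 13, 13, 16], dtype=float)

X_b = np.hstack((np.ones((X.shape[0], 1)), X))

# lstsq minimizes ||X_b @ theta - y||^2 directly (numerically stable)
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta)  # approximately [0.0, 1.0, 1.0], matching the Normal Equation
```

Agreement between the two methods confirms the closed-form solution; here y = x1 + x2 exactly, hence the near-zero intercept and perfect R².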





Result

The multiple linear regression model was successfully implemented using the Normal Equation.

Model performance was evaluated using manual MSE and R² calculations.

  • Multiple regression handles more than one feature

  • Matrix form simplifies computation

  • R² indicates how well multiple features explain the target
