Implementation of Multivariate Linear Regression using Gradient Descent (Toy Dataset)

 


🎯 Aim

To implement multivariate linear regression using gradient descent and evaluate its performance.


🎯 Objectives

  • Understand multivariate regression
  • Implement gradient descent manually
  • Train model on sample dataset
  • Compute MSE and R²
  • Visualize cost convergence

📖 Theory


🔹 Multivariate Linear Regression Model

$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

Matrix form:

$$\hat{y} = X\theta$$
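The matrix form can be sketched in NumPy by prepending a bias column of ones to the feature matrix, so that θ₀ acts as the intercept (the values below are arbitrary, for illustration only):

```python
import numpy as np

# Two samples, two features (arbitrary illustrative values)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
theta = np.array([5.0, 1.0, 0.5])  # [theta_0, theta_1, theta_2]

# Prepend a column of ones so theta_0 acts as the intercept
X_b = np.c_[np.ones(X.shape[0]), X]

y_hat = X_b @ theta  # y_hat = X theta, one prediction per row
```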

🔹 Cost Function (MSE)

$$J(\theta) = \frac{1}{n} \sum (y - \hat{y})^2$$

🔹 Gradient Descent Update Rule

$$\theta = \theta - \alpha \cdot \frac{2}{n} X^T (X\theta - y)$$
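A single application of this update rule can be sketched in NumPy with small hypothetical values, so the arithmetic is easy to verify by hand:

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0]])   # bias column already included
y = np.array([2.0, 3.0])
theta = np.zeros(2)
alpha = 0.1
n = len(y)

error = X @ theta - y          # residuals: [-2, -3]
grad = (2 / n) * (X.T @ error)  # gradient of the MSE cost
theta = theta - alpha * grad    # one descent step
```

With θ initialized to zero, the gradient is (2/n)·Xᵀ(−y) = [−5, −8], so one step moves θ to [0.5, 0.8].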

🔹 Evaluation Metrics

Mean Squared Error (MSE):

$$MSE = \frac{1}{n} \sum (y - \hat{y})^2$$

R-squared (R²):

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
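Both metrics follow directly from the predictions. A minimal sketch with illustrative numbers (not the experiment's actual output):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0])      # true values
y_hat = np.array([2.0, 5.0, 8.0])  # hypothetical predictions

mse = np.mean((y - y_hat) ** 2)          # (1 + 0 + 1) / 3

ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares = 2
ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares = 8
r2 = 1 - ss_res / ss_tot                 # 1 - 2/8 = 0.75
```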

📊 Dataset (Toy Example)

We simulate a housing-like dataset:

| Size (x₁) | Bedrooms (x₂) | Price (y) |
|-----------|---------------|-----------|
| 1000      | 2             | 200       |
| 1500      | 3             | 300       |
| 1800      | 3             | 350       |
| 2400      | 4             | 450       |
| 3000      | 4             | 500       |

📋 Procedure

  1. Define dataset
  2. Normalize features (important for GD)
  3. Add bias term
  4. Initialize parameters
  5. Apply gradient descent
  6. Predict values
  7. Compute MSE and R²
  8. Plot cost vs iterations
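Steps 2–3 above can be sketched as follows. Z-score scaling puts both features on a comparable range, which is what allows a single learning rate to work for both:

```python
import numpy as np

X = np.array([[1000, 2],
              [1500, 3],
              [1800, 3],
              [2400, 4],
              [3000, 4]], dtype=float)

# Step 2: z-score normalization, per feature column
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 3: prepend the bias column of ones
X_b = np.c_[np.ones(X_scaled.shape[0]), X_scaled]
```

After scaling, each feature column has mean 0 and standard deviation 1, and `X_b` has one extra column for the intercept.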

💻 Program

```python
import numpy as np
import matplotlib.pyplot as plt

# -----------------------------
# Step 1: Dataset
# -----------------------------
X = np.array([
    [1000, 2],
    [1500, 3],
    [1800, 3],
    [2400, 4],
    [3000, 4]
])
y = np.array([200, 300, 350, 450, 500])

# -----------------------------
# Step 2: Feature Scaling
# -----------------------------
X_mean = np.mean(X, axis=0)
X_std = np.std(X, axis=0)
X = (X - X_mean) / X_std

# -----------------------------
# Step 3: Add Bias Term
# -----------------------------
X = np.c_[np.ones(X.shape[0]), X]

# -----------------------------
# Step 4: Initialize Parameters
# -----------------------------
theta = np.zeros(X.shape[1])
learning_rate = 0.1
iterations = 1000
n = len(y)
cost_history = []

# -----------------------------
# Step 5: Gradient Descent
# -----------------------------
for i in range(iterations):
    y_pred = np.dot(X, theta)
    error = y_pred - y

    # Gradient
    gradient = (2 / n) * np.dot(X.T, error)

    # Update
    theta = theta - learning_rate * gradient

    # Cost
    cost = (1 / n) * np.sum(error ** 2)
    cost_history.append(cost)

# -----------------------------
# Step 6: Predictions
# -----------------------------
y_pred_final = np.dot(X, theta)

# -----------------------------
# Step 7: Evaluation
# -----------------------------
# MSE
mse = np.mean((y - y_pred_final) ** 2)

# R²
SS_res = np.sum((y - y_pred_final) ** 2)
SS_tot = np.sum((y - np.mean(y)) ** 2)
r2 = 1 - (SS_res / SS_tot)

print("Theta Values:", theta)
print("MSE:", mse)
print("R²:", r2)

# -----------------------------
# Step 8: Plot Cost Convergence
# -----------------------------
plt.plot(cost_history)
plt.xlabel("Iterations")
plt.ylabel("Cost")
plt.title("Cost Convergence (Gradient Descent)")
plt.show()
```

📊 Output

```
Theta Values: [360.          65.32586696  42.64908267]
MSE: 60.68601583257023
R²: 0.9946766652778447
```




📈 Interpretation

  • Cost decreases over iterations → model is learning
  • Low MSE → good fit
  • R² ≈ 1 → strong prediction accuracy

📉 Graph Explanation

  • X-axis → iterations
  • Y-axis → cost
  • Curve decreases → convergence

🎯 Key Observations

| Concept             | Insight                      |
|---------------------|------------------------------|
| Feature scaling     | Improves convergence         |
| Gradient descent    | Iteratively minimizes error  |
| Multivariate model  | Uses multiple inputs         |

✅ Result

Multivariate linear regression was successfully implemented using gradient descent and evaluated using MSE and R².

  • Gradient descent works effectively for multiple variables
  • Feature scaling is essential
  • Model converges to optimal parameters
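As an optional cross-check (not part of the original lab), the closed-form normal-equation solution can be computed with NumPy's least-squares solver; since linear regression with an intercept is unaffected by feature scaling, its R² is the best value gradient descent can converge toward:

```python
import numpy as np

X = np.array([[1000, 2], [1500, 3], [1800, 3],
              [2400, 4], [3000, 4]], dtype=float)
y = np.array([200, 300, 350, 450, 500], dtype=float)

# Add bias column and solve X_b theta ≈ y in the least-squares sense
X_b = np.c_[np.ones(len(y)), X]
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)

y_hat = X_b @ theta
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
```

The resulting R² should closely match the gradient-descent run reported above.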
