Simple Linear Regression to Predict Pass Percentage using Python (scikit-learn)

 

Experiment Title

Implementation of Simple Linear Regression to Predict Pass Percentage using Python


🎯 Aim

To implement a Simple Linear Regression model for predicting student pass percentage based on the year using Python and Scikit-learn.


🎯 Objectives

  • To understand the concept of simple linear regression
  • To build a regression model using Scikit-learn
  • To predict pass percentage using year as the predictor variable
  • To evaluate model performance using MSE and R²
  • To visualize the regression line with data points

📖 Theory


🔹 Simple Linear Regression

Simple Linear Regression is a supervised machine learning algorithm used to model the relationship between:

  • One independent variable (X)
  • One dependent variable (Y)

The regression equation is:

$$Y = \theta_0 + \theta_1 X$$

Where:

  • $Y$ → Predicted output
  • $X$ → Input feature
  • $\theta_0$ → Intercept
  • $\theta_1$ → Slope/Coefficient

🔹 In This Experiment

Variable           Description
Year               Predictor Variable (Independent Variable)
Pass Percentage    Response Variable (Dependent Variable)

🔹 Working Principle

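The model finds the best-fit straight line: the values of the intercept and slope that minimize the sum of squared differences between the actual and predicted values (ordinary least squares).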
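As a minimal sketch of what "best fit" means here, the least-squares slope and intercept for a single predictor can be computed directly with NumPy. The function name `fit_simple_ols` and the five data points are hypothetical and chosen only for illustration; they are not part of the experiment's dataset.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope = covariance of x and y divided by variance of x
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical, exactly linear data: pass percentage rises 2 points per year
years = np.array([2000, 2001, 2002, 2003, 2004])
pass_pct = np.array([60.0, 62.0, 64.0, 66.0, 68.0])

theta0, theta1 = fit_simple_ols(years, pass_pct)
print(f"Intercept = {theta0:.1f}, Slope = {theta1:.1f}")
```

For a single predictor, scikit-learn's `LinearRegression` should give the same intercept and slope up to floating-point rounding, since it also solves an ordinary least-squares problem.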


📊 Error Metrics


🔹 Mean Squared Error (MSE)

MSE measures the average squared difference between actual and predicted values.

$$MSE = \frac{1}{n} \sum (y - \hat{y})^2$$

Where:

  • $y$ → Actual value
  • $\hat{y}$ → Predicted value
  • $n$ → Number of observations

Interpretation:

  • Smaller MSE → Better model performance

🔹 R-Squared (R²)

R² measures how well the regression line explains the variability in the data.

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

Where:

$$SS_{res} = \sum (y - \hat{y})^2$$
$$SS_{tot} = \sum (y - \bar{y})^2$$

Here $\bar{y}$ is the mean of the actual values.

Interpretation:

R² Value    Meaning
1           Perfect fit (predictions match the actual values exactly)
0           Poor fit (no better than always predicting the mean)
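The two reference points in the table can be checked with a short hand computation of R². This is a sketch on made-up numbers; `r2_manual` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed directly from the definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Hypothetical actual pass percentages
actual = [70, 80, 90]

r2_close = r2_manual(actual, [72, 79, 87])    # small errors: SS_res = 14, SS_tot = 200
r2_mean = r2_manual(actual, [80, 80, 80])     # always predicting the mean
r2_perfect = r2_manual(actual, actual)        # predictions equal the actual values

print(r2_close, r2_mean, r2_perfect)
```

Predicting the mean for every observation makes SS_res equal to SS_tot, which is why that case yields 0, and a prediction worse than the mean would push the value below 0.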

🧰 Software Requirements

  • Python
  • NumPy
  • Pandas
  • Matplotlib
  • Scikit-learn

📋 Algorithm

  1. Import required libraries
  2. Create dataset containing year and pass percentage
  3. Separate predictor and target variables
  4. Train Linear Regression model
  5. Predict pass percentage values
  6. Calculate MSE and R² score
  7. Plot regression line
  8. Predict future values

💻 Program

# ============================================
# SIMPLE LINEAR REGRESSION MODEL
# Predictor Variable : Year
# Response Variable : Pass Percentage
# ============================================

# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# --------------------------------------------
# Create Sample Dataset
# --------------------------------------------

# For reproducibility
np.random.seed(10)

# Years from 2000 to 2025
years = np.arange(2000, 2026)

# Random pass percentages between 50 and 100
pass_percentage = np.random.randint(50, 101, len(years))

# Create dataframe
df = pd.DataFrame({
    'Year': years,
    'Pass_Percentage': pass_percentage
})

# Display dataset
print("\nDATASET\n")
print(df)

# --------------------------------------------
# Prepare Data
# --------------------------------------------

# Predictor variable (X)
X = df[['Year']]

# Target variable (y)
y = df['Pass_Percentage']

# --------------------------------------------
# Build Linear Regression Model
# --------------------------------------------

model = LinearRegression()

# Train model
model.fit(X, y)

# Predict values
y_pred = model.predict(X)

# --------------------------------------------
# Model Parameters
# --------------------------------------------

print("\nMODEL PARAMETERS\n")

print("Intercept :", model.intercept_)
print("Slope :", model.coef_[0])

# Regression Equation
print("\nRegression Equation:")
print(f"Pass Percentage = {model.coef_[0]:.4f} * Year + {model.intercept_:.4f}")

# --------------------------------------------
# Error Metrics
# --------------------------------------------

# Mean Squared Error
mse = mean_squared_error(y, y_pred)

# R^2 Score
r2 = r2_score(y, y_pred)

print("\nERROR METRICS\n")

print("MSE :", round(mse, 4))
print("R^2 :", round(r2, 4))

# --------------------------------------------
# Plot Actual Data and Regression Line
# --------------------------------------------

plt.figure(figsize=(10, 6))

# Scatter plot for actual data
plt.scatter(X, y, color='blue', label='Actual Data')

# Regression line
plt.plot(X, y_pred, color='red', linewidth=2, label='Regression Line')

# Labels and title
plt.title('Simple Linear Regression')
plt.xlabel('Year')
plt.ylabel('Pass Percentage')

# Show legend
plt.legend()

# Grid
plt.grid(True)

# Show plot
plt.show()

# --------------------------------------------
# Predict Future Values
# --------------------------------------------

future_years = pd.DataFrame({
    'Year': [2026, 2027, 2028]
})

future_predictions = model.predict(future_years)

print("\nFUTURE PREDICTIONS\n")

for year, pred in zip(future_years['Year'], future_predictions):
    print(f"Year {year} --> Predicted Pass Percentage = {pred:.2f}")

# ============================================
# END OF PROGRAM
# ============================================

📊 Sample Output

MODEL PARAMETERS

Intercept : -416.85470085470035
Slope     : 0.24547008547008523

Regression Equation:
Pass Percentage = 0.2455 * Year + -416.8547

ERROR METRICS

MSE  : 234.6639
R^2  : 0.0142




📈 Graph

The graph contains:

  • Blue scatter points → Actual pass percentages
  • Red line → Regression line (best fit line)

📉 Interpretation of Results

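  • The regression line shows the fitted linear relationship between year and pass percentage
  • The positive slope (about 0.25 percentage points per year) suggests a slight upward trend, but with R² = 0.0142 this trend is negligible
  • The MSE of 234.66 corresponds to a typical prediction error of about 15 percentage points (√234.66 ≈ 15.3)
  • The R² of 0.0142 means the model explains only about 1.4% of the variation in pass percentage, which is expected because the sample pass percentages were generated randomly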

✅ Result

The Simple Linear Regression model was successfully implemented to predict pass percentage based on year. The model performance was evaluated using MSE and R² score.

  • Linear regression models a linear relationship between a predictor and a response variable
  • The model predicts future pass percentages by extending the fitted line beyond the historical data
  • Evaluation metrics show how well the fitted line matches the data; here the low R² shows that year alone does not explain the randomly generated pass percentages
