Simple Linear Regression to Predict Pass Percentage using Python (scikit-learn)
Experiment Title
Implementation of Simple Linear Regression to Predict Pass Percentage using Python
🎯 Aim
To implement a Simple Linear Regression model for predicting student pass percentage based on the year using Python and Scikit-learn.
🎯 Objectives
- To understand the concept of simple linear regression
- To build a regression model using Scikit-learn
- To predict pass percentage using year as the predictor variable
- To evaluate model performance using MSE and R²
- To visualize the regression line with data points
📖 Theory
🔹 Simple Linear Regression
Simple Linear Regression is a supervised machine learning algorithm used to model the relationship between:
- One independent variable (X)
- One dependent variable (Y)
The regression equation is:
$$\hat{Y} = b_0 + b_1 X$$
Where:
- $\hat{Y}$ → Predicted output
- $X$ → Input feature
- $b_0$ → Intercept
- $b_1$ → Slope/Coefficient
🔹 In This Experiment
| Variable | Description |
|---|---|
| Year | Predictor Variable (Independent Variable) |
| Pass Percentage | Response Variable (Dependent Variable) |
🔹 Working Principle
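The model finds the best-fit straight line by ordinary least squares: it chooses the intercept and slope that minimize the sum of squared differences between the actual and predicted values.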
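As a minimal sketch of what "best fit" means here, the least-squares slope and intercept can be computed in closed form and compared with what `LinearRegression` returns. The small `x` and `y` arrays are made-up illustration values, not data from this experiment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (not from the experiment), lying exactly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Closed-form ordinary least squares:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# scikit-learn expects a 2-D feature array, hence the reshape
model = LinearRegression().fit(x.reshape(-1, 1), y)

print("Manual  : slope =", slope, " intercept =", intercept)
print("sklearn : slope =", model.coef_[0], " intercept =", model.intercept_)
```

Both approaches should agree up to floating-point rounding, since `LinearRegression` also solves an ordinary least-squares problem.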
📊 Error Metrics
🔹 Mean Squared Error (MSE)
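MSE measures the average squared difference between actual and predicted values.
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Where:
- $Y_i$ → Actual value
- $\hat{Y}_i$ → Predicted value
- $n$ → Number of observations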
Interpretation:
- Smaller MSE → Better model performance
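To make the definition concrete, the sketch below computes MSE directly as the mean of the squared differences and checks it against `mean_squared_error`. The actual and predicted values are hypothetical numbers chosen for easy arithmetic.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted pass percentages (illustration only)
y_true = np.array([60.0, 70.0, 80.0, 90.0])
y_hat = np.array([62.0, 68.0, 83.0, 89.0])

# Errors are -2, 2, -3, 1; their squares are 4, 4, 9, 1; the mean is 18 / 4 = 4.5
mse_manual = np.mean((y_true - y_hat) ** 2)
mse_sklearn = mean_squared_error(y_true, y_hat)

print("Manual MSE :", mse_manual)
print("sklearn MSE:", mse_sklearn)
```

Because the errors are squared, MSE is expressed in squared units of the target (here, squared percentage points), and large errors are penalized more heavily than small ones.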
🔹 R-Squared (R²)
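R² measures how well the regression line explains the variability in the data.
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
Where:
- $SS_{res} = \sum_i (Y_i - \hat{Y}_i)^2$ → Residual sum of squares (actual vs. predicted values)
- $SS_{tot} = \sum_i (Y_i - \bar{Y})^2$ → Total sum of squares (actual values vs. their mean)
Interpretation: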
| R² Value | Meaning |
|---|---|
| 1 | Perfect fit |
| 0 | Poor fit (no better than always predicting the mean) |
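The following sketch computes R² by hand as one minus the ratio of the residual sum of squares to the total sum of squares, and compares it with `r2_score`. The values are hypothetical; the last lines illustrate the "0" row of the table by scoring a model that always predicts the mean.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted pass percentages (illustration only)
y_true = np.array([60.0, 70.0, 80.0, 90.0])
y_hat = np.array([62.0, 68.0, 83.0, 89.0])

# Residual sum of squares: 4 + 4 + 9 + 1 = 18
ss_res = np.sum((y_true - y_hat) ** 2)
# Total sum of squares around the mean (75): 225 + 25 + 25 + 225 = 500
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot  # 1 - 18/500 = 0.964
r2_sklearn = r2_score(y_true, y_hat)

# A "model" that always predicts the mean explains none of the variation
mean_only = np.full_like(y_true, y_true.mean())
r2_baseline = r2_score(y_true, mean_only)

print("Manual R^2  :", r2_manual)
print("sklearn R^2 :", r2_sklearn)
print("Baseline R^2:", r2_baseline)
```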
🧰 Software Requirements
- Python
- NumPy
- Pandas
- Matplotlib
- Scikit-learn
📋 Algorithm
- Import required libraries
- Create dataset containing year and pass percentage
- Separate predictor and target variables
- Train Linear Regression model
- Predict pass percentage values
- Calculate MSE and R² score
- Plot regression line
- Predict future values
💻 Program
# ============================================
# SIMPLE LINEAR REGRESSION MODEL
# Predictor Variable : Year
# Response Variable : Pass Percentage
# ============================================
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# --------------------------------------------
# Create Sample Dataset
# --------------------------------------------
# For reproducibility
np.random.seed(10)
# Years from 2000 to 2025
years = np.arange(2000, 2026)
# Random pass percentages between 50 and 100 (synthetic data, generated independently of the year)
pass_percentage = np.random.randint(50, 101, len(years))
# Create dataframe
df = pd.DataFrame({
    'Year': years,
    'Pass_Percentage': pass_percentage
})
# Display dataset
print("\nDATASET\n")
print(df)
# --------------------------------------------
# Prepare Data
# --------------------------------------------
# Predictor variable (X)
X = df[['Year']]
# Target variable (y)
y = df['Pass_Percentage']
# --------------------------------------------
# Build Linear Regression Model
# --------------------------------------------
model = LinearRegression()
# Train model
model.fit(X, y)
# Predict values
y_pred = model.predict(X)
# --------------------------------------------
# Model Parameters
# --------------------------------------------
print("\nMODEL PARAMETERS\n")
print("Intercept :", model.intercept_)
print("Slope :", model.coef_[0])
# Regression Equation
print("\nRegression Equation:")
print(f"Pass Percentage = {model.coef_[0]:.4f} * Year + {model.intercept_:.4f}")
# --------------------------------------------
# Error Metrics
# --------------------------------------------
# Mean Squared Error
mse = mean_squared_error(y, y_pred)
# R^2 Score
r2 = r2_score(y, y_pred)
print("\nERROR METRICS\n")
print("MSE :", round(mse, 4))
print("R^2 :", round(r2, 4))
# --------------------------------------------
# Plot Actual Data and Regression Line
# --------------------------------------------
plt.figure(figsize=(10, 6))
# Scatter plot for actual data
plt.scatter(X, y, color='blue', label='Actual Data')
# Regression line
plt.plot(X, y_pred, color='red', linewidth=2, label='Regression Line')
# Labels and title
plt.title('Simple Linear Regression')
plt.xlabel('Year')
plt.ylabel('Pass Percentage')
# Show legend
plt.legend()
# Grid
plt.grid(True)
# Show plot
plt.show()
# --------------------------------------------
# Predict Future Values
# --------------------------------------------
future_years = pd.DataFrame({
    'Year': [2026, 2027, 2028]
})
future_predictions = model.predict(future_years)
print("\nFUTURE PREDICTIONS\n")
for year, pred in zip(future_years['Year'], future_predictions):
    print(f"Year {year} --> Predicted Pass Percentage = {pred:.2f}")
# ============================================
# END OF PROGRAM
# ============================================
📊 Sample Output
📈 Graph
The graph contains:
- Blue scatter points → Actual pass percentages
- Red line → Regression line (best fit line)
📉 Interpretation of Results
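- The regression line shows the fitted linear relationship between year and pass percentage
- A positive slope indicates an increasing trend over the years; a negative slope indicates a decreasing trend
- MSE measures the average squared prediction error, in squared percentage points
- R² indicates the fraction of the variation in pass percentage that the model explains
- Because the sample pass percentages are generated at random, independently of the year, a low R² should be expected for this dataset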
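To illustrate how slope and R² are read, here is a small sketch comparing two synthetic datasets: one where pass percentage is unrelated to the year, and one with an assumed steady upward trend. The seed, the 1.5 points-per-year trend, the noise level, and the helper `fit_and_score` are all hypothetical choices made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
years = np.arange(2000, 2026).reshape(-1, 1)

# Case 1: pass percentages drawn at random, unrelated to the year
random_pct = rng.integers(50, 101, size=len(years))

# Case 2: hypothetical steady rise of 1.5 points per year plus small noise
trend_pct = 50 + 1.5 * (years.ravel() - 2000) + rng.normal(0, 2, size=len(years))

def fit_and_score(X, y):
    """Fit a simple linear regression and return (slope, R^2 on the same data)."""
    model = LinearRegression().fit(X, y)
    return model.coef_[0], r2_score(y, model.predict(X))

slope_random, r2_random = fit_and_score(years, random_pct)
slope_trend, r2_trend = fit_and_score(years, trend_pct)

print(f"Random data: slope = {slope_random:.3f}, R^2 = {r2_random:.3f}")
print(f"Trend data : slope = {slope_trend:.3f}, R^2 = {r2_trend:.3f}")
```

With trending data the slope should come out close to the assumed 1.5 and R² close to 1, whereas for random data R² is typically small, which signals that the year explains little of the variation.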
✅ Result
The Simple Linear Regression model was successfully implemented to predict pass percentage based on year. The model performance was evaluated using MSE and R² score.
- Linear regression models a linear relationship between a predictor and a response variable
- The fitted model predicts future pass percentages by extrapolating from historical data
- Evaluation metrics such as MSE and R² help measure how well the model fits the data
