Multiple Linear Regression using Scikit-Learn for House Price Prediction
📌 Aim
To implement Multiple Linear Regression using Scikit-Learn for predicting house prices based on multiple input features.
🎯 Objectives
- Understand Multiple Linear Regression
- Create a small, realistic-style dataset
- Train a regression model using Scikit-Learn
- Predict house prices using multiple variables
- Evaluate model performance using:
  - Mean Squared Error (MSE)
  - R² Score
- Visualize Actual vs Predicted values
📖 Theory
🔹 Multiple Linear Regression
Multiple Linear Regression predicts a dependent variable using multiple independent variables.
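🔹 Mathematical Model
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$
Where:
- $\hat{y}$ → Predicted value
- $b_0$ → Intercept
- $b_1, \dots, b_n$ → Coefficients
- $x_1, \dots, x_n$ → Input features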
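To make the model concrete, here is a minimal sketch that evaluates it by hand: the prediction is the intercept plus the sum of each coefficient times its feature. The intercept and coefficients are copied from the sample output reported later in this post, and the feature values are those of the new house used in the program; the variable names are illustrative only.

```python
import numpy as np

# Intercept and coefficients copied from the sample output in this post
intercept = 151.9548847198012
coefficients = np.array([
    0.08786516356237986,   # Area
    36.348475116830265,    # Bedrooms
    0.9805984363579832,    # Age
    -19.51540386702398,    # Distance
])

# Feature values of the new house: Area, Bedrooms, Age, Distance
new_house = np.array([2800, 4, 2, 2])

# Prediction = intercept + sum(coefficient_i * feature_i)
predicted_price = float(intercept + coefficients @ new_house)
print(round(predicted_price, 2))  # 506.3, matching the sample output
```

This is the same arithmetic that `model.predict` performs once the model has been fitted.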
📊 Real-World Example
We predict House Price based on:
| Feature | Description |
|---|---|
| Area | Size of house (sq.ft) |
| Bedrooms | Number of bedrooms |
| Age | Age of house |
| Distance | Distance from city center |
📋 Dataset (Synthetic; first 5 of the 10 rows used in the program)
| Area | Bedrooms | Age | Distance | Price |
|---|---|---|---|---|
| 1200 | 2 | 10 | 5 | 250 |
| 1500 | 3 | 5 | 3 | 320 |
| 1800 | 3 | 8 | 4 | 340 |
| 2400 | 4 | 2 | 2 | 450 |
| 3000 | 5 | 1 | 1 | 600 |
📌 Algorithm
- Import libraries
- Create dataset
- Split data into features and target
- Train Multiple Linear Regression model
- Predict values
- Evaluate model using MSE and R²
- Predict price for a new house
- Visualize results
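The "train" step in the list above amounts to an ordinary least-squares fit. As a rough sketch of what that means, the snippet below solves the same problem with `np.linalg.lstsq` on the ten rows of this post's dataset and checks that its predictions agree with Scikit-Learn's `LinearRegression`. The names `X_design`, `w`, `manual_pred`, and `sk_pred` are illustrative and do not appear in the program.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same ten rows as the program's dataset: Area, Bedrooms, Age, Distance
X = np.array([
    [1200, 2, 10, 5], [1500, 3, 5, 3], [1800, 3, 8, 4], [2400, 4, 2, 2],
    [3000, 5, 1, 1], [3500, 5, 3, 2], [4000, 6, 2, 1], [4500, 6, 4, 3],
    [5000, 7, 1, 2], [5500, 7, 2, 1],
], dtype=float)
y = np.array([250, 320, 340, 450, 600, 650, 700, 720, 800, 850], dtype=float)

# Prepend a column of ones so the first fitted weight acts as the intercept
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_design @ w ≈ y
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
manual_pred = X_design @ w

# Scikit-Learn fit on the same data for comparison
sk_pred = LinearRegression().fit(X, y).predict(X)
print("Largest difference in predictions:", np.max(np.abs(manual_pred - sk_pred)))
```

Here `w[0]` plays the role of the intercept and `w[1:]` the coefficients; the difference printed at the end should be at the level of floating-point noise.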
💻 Program
# ============================================
# MULTIPLE LINEAR REGRESSION
# House Price Prediction
# ============================================
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# --------------------------------------------
# Create Dataset
# --------------------------------------------
data = {
    'Area': [1200, 1500, 1800, 2400, 3000,
             3500, 4000, 4500, 5000, 5500],
    'Bedrooms': [2, 3, 3, 4, 5,
                 5, 6, 6, 7, 7],
    'Age': [10, 5, 8, 2, 1,
            3, 2, 4, 1, 2],
    'Distance': [5, 3, 4, 2, 1,
                 2, 1, 3, 2, 1],
    'Price': [250, 320, 340, 450, 600,
              650, 700, 720, 800, 850]
}
# Create DataFrame
df = pd.DataFrame(data)
# Display Dataset
print("\nDATASET\n")
print(df)
# --------------------------------------------
# Prepare Input and Output
# --------------------------------------------
# Independent variables
X = df[['Area', 'Bedrooms', 'Age', 'Distance']]
# Dependent variable
y = df['Price']
# --------------------------------------------
# Build Model
# --------------------------------------------
model = LinearRegression()
# Train model
model.fit(X, y)
# --------------------------------------------
# Predictions
# --------------------------------------------
y_pred = model.predict(X)
# --------------------------------------------
# Model Parameters
# --------------------------------------------
print("\nMODEL PARAMETERS\n")
print("Intercept:")
print(model.intercept_)
print("\nCoefficients:")
for feature, coef in zip(X.columns, model.coef_):
    print(feature, ":", coef)
# --------------------------------------------
# Regression Equation
# --------------------------------------------
print("\nREGRESSION EQUATION\n")
print("Price = ")
for feature, coef in zip(X.columns, model.coef_):
    print(f"({coef:.4f} * {feature}) + ")
# The intercept is printed once, after all feature terms
print(model.intercept_)
# --------------------------------------------
# Evaluation Metrics
# --------------------------------------------
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
print("\nEVALUATION METRICS\n")
print("Mean Squared Error (MSE):", round(mse, 4))
print("R² Score:", round(r2, 4))
# --------------------------------------------
# Predict New House Price
# --------------------------------------------
new_house = pd.DataFrame({
    'Area': [2800],
    'Bedrooms': [4],
    'Age': [2],
    'Distance': [2]
})
predicted_price = model.predict(new_house)
print("\nNEW HOUSE PREDICTION\n")
print("Predicted House Price:",
      round(predicted_price[0], 2))
# --------------------------------------------
# Visualization
# --------------------------------------------
plt.figure(figsize=(8,6))
plt.scatter(y, y_pred, color='blue')
# Red diagonal marks perfect predictions (predicted = actual)
plt.plot([y.min(), y.max()],
         [y.min(), y.max()],
         color='red')
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted Prices")
plt.grid(True)
plt.show()
# ============================================
# END OF PROGRAM
# ============================================
📊 Sample Output
DATASET

   Area  Bedrooms  Age  Distance  Price
0  1200         2   10         5    250
1  1500         3    5         3    320
2  1800         3    8         4    340
3  2400         4    2         2    450
4  3000         5    1         1    600
5  3500         5    3         2    650
6  4000         6    2         1    700
7  4500         6    4         3    720
8  5000         7    1         2    800
9  5500         7    2         1    850

MODEL PARAMETERS

Intercept:
151.9548847198012

Coefficients:
Area : 0.08786516356237986
Bedrooms : 36.348475116830265
Age : 0.9805984363579832
Distance : -19.51540386702398

REGRESSION EQUATION

Price = 
(0.0879 * Area) + 
(36.3485 * Bedrooms) + 
(0.9806 * Age) + 
(-19.5154 * Distance) + 
151.9548847198012

EVALUATION METRICS

Mean Squared Error (MSE): 406.4522
R² Score: 0.9901

NEW HOUSE PREDICTION

Predicted House Price: 506.3
📈 Interpretation
| Observation | Meaning |
|---|---|
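| Positive coefficient | Predicted price rises as the feature increases, with the other features held fixed |
| Negative coefficient | Predicted price falls as the feature increases, with the other features held fixed |
| Low MSE | Predictions are close to the actual prices on average (here measured on the training data) |
| R² close to 1 | The model explains most of the variation in price (here on the training data) |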
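Because the program computes MSE and R² on the same ten rows it was trained on, those figures may be optimistic. One way to get a more cautious estimate on such a small dataset is leave-one-out cross-validation, sketched below. This is an addition for illustration and not part of the original program; `loo_pred`, `loo_mse`, and `loo_r2` are hypothetical names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Same ten rows as the program's dataset: Area, Bedrooms, Age, Distance
X = np.array([
    [1200, 2, 10, 5], [1500, 3, 5, 3], [1800, 3, 8, 4], [2400, 4, 2, 2],
    [3000, 5, 1, 1], [3500, 5, 3, 2], [4000, 6, 2, 1], [4500, 6, 4, 3],
    [5000, 7, 1, 2], [5500, 7, 2, 1],
], dtype=float)
y = np.array([250, 320, 340, 450, 600, 650, 700, 720, 800, 850], dtype=float)

# Training-set error, as computed in the program
train_pred = LinearRegression().fit(X, y).predict(X)
train_mse = mean_squared_error(y, train_pred)

# Leave-one-out: each row is predicted by a model fitted on the other nine
loo_pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
loo_mse = mean_squared_error(y, loo_pred)
loo_r2 = r2_score(y, loo_pred)

print("Training MSE:     ", round(train_mse, 4))
print("Leave-one-out MSE:", round(loo_mse, 4))
print("Leave-one-out R²: ", round(loo_r2, 4))
```

For ordinary least squares the leave-one-out MSE is never smaller than the training MSE, so the gap between the two gives a rough sense of how much the training metrics overstate performance.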
📉 Graph Explanation
Actual vs Predicted Plot
- X-axis → Actual prices
- Y-axis → Predicted prices
- Points near the red diagonal line (predicted = actual) → accurate predictions
🔍 Advantages of Multiple Linear Regression
- Handles multiple features
- Simple and interpretable
- Fast training
- Useful for prediction tasks
⚠️ Limitations
- Assumes linear relationship
- Sensitive to outliers
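- Multicollinearity (strongly correlated features, such as Area and Bedrooms here) makes individual coefficients unstable and harder to interpret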
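The multicollinearity point can be checked directly on this post's features. The sketch below prints the pairwise correlations and computes a variance inflation factor (VIF) for each feature by regressing it on the others. The VIF calculation is an illustrative addition rather than something the original program does, and the threshold of roughly 5 to 10 often quoted for "high" VIF is a rule of thumb, not a fixed standard.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Feature columns from the program's dataset
features = pd.DataFrame({
    'Area': [1200, 1500, 1800, 2400, 3000, 3500, 4000, 4500, 5000, 5500],
    'Bedrooms': [2, 3, 3, 4, 5, 5, 6, 6, 7, 7],
    'Age': [10, 5, 8, 2, 1, 3, 2, 4, 1, 2],
    'Distance': [5, 3, 4, 2, 1, 2, 1, 3, 2, 1],
})

# Pairwise Pearson correlations between features
corr = features.corr()
print(corr.round(3))

# VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on the rest
vif = {}
for col in features.columns:
    others = features.drop(columns=col)
    r2 = LinearRegression().fit(others, features[col]).score(others, features[col])
    vif[col] = 1.0 / (1.0 - r2)

for col, value in vif.items():
    print(f"VIF {col}: {value:.1f}")
```

Area and Bedrooms are almost perfectly correlated in this dataset, so their individual coefficients should be read with caution even though the overall fit is good.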
✅ Result
Multiple Linear Regression was successfully implemented using Scikit-Learn to predict house prices from multiple input variables (Area, Bedrooms, Age, and Distance).
