Multiple Linear Regression using Scikit-Learn for House Price Prediction

 



📌 Aim

To implement Multiple Linear Regression using Scikit-Learn for predicting house prices based on multiple input features.


🎯 Objectives

  • Understand Multiple Linear Regression
  • Create a small, real-world-style dataset
  • Train a regression model using Scikit-Learn
  • Predict house prices using multiple variables
  • Evaluate model performance using:
    • Mean Squared Error (MSE)
    • R² Score
  • Visualize Actual vs Predicted values

📖 Theory

🔹 Multiple Linear Regression

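Multiple Linear Regression predicts a dependent variable as a weighted sum of two or more independent variables plus an intercept. The weights (coefficients) are chosen to minimize the squared error between the predicted and actual values.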


🔹 Mathematical Model

$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \dots + \theta_n x_n$$

Where:

  • $\hat{y}$ → Predicted value
  • $\theta_0$ → Intercept
  • $\theta_1, \theta_2, \dots, \theta_n$ → Coefficients
  • $x_1, x_2, \dots, x_n$ → Input features
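To make the formula concrete, here is a minimal sketch that evaluates it by hand for one house. It plugs in the intercept and coefficients reported in the Sample Output section of this post; the helper name `predict_price` and the dictionary layout are illustrative choices, not part of the original program.

```python
# Parameters copied from the Sample Output section of this post.
intercept = 151.9548847198012          # theta_0
coefficients = {                       # theta_1 ... theta_4
    "Area": 0.08786516356237986,
    "Bedrooms": 36.348475116830265,
    "Age": 0.9805984363579832,
    "Distance": -19.51540386702398,
}


def predict_price(features):
    """Return theta_0 + sum(theta_i * x_i) for a single house.

    `features` maps each feature name to its value. This helper is
    hypothetical and only illustrates the equation above.
    """
    weighted_sum = sum(coefficients[name] * features[name] for name in coefficients)
    return intercept + weighted_sum


# The same new house that the program predicts for.
new_house = {"Area": 2800, "Bedrooms": 4, "Age": 2, "Distance": 2}
predicted = predict_price(new_house)
print(round(predicted, 2))  # 506.3
```

The result matches the "Predicted House Price" line in the sample output, which shows that `model.predict` is evaluating exactly this weighted sum.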

📊 Real-World Example

We predict the house price from four features:

Feature     Description
Area        Size of house (sq.ft)
Bedrooms    Number of bedrooms
Age         Age of house
Distance    Distance from city center

📋 Dataset (Randomly Generated)

First five of the ten rows used in the program:

Area    Bedrooms    Age    Distance    Price
1200    2           10     5           250
1500    3           5      3           320
1800    3           8      4           340
2400    4           2      2           450
3000    5           1      1           600

📌 Algorithm

  1. Import libraries
  2. Create dataset
  3. Split data into features and target
  4. Train Multiple Linear Regression model
  5. Predict values
  6. Evaluate model using MSE and R²
  7. Predict price for a new house
  8. Visualize results

💻 Program

# ============================================
# MULTIPLE LINEAR REGRESSION
# House Price Prediction
# ============================================

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# --------------------------------------------
# Create Dataset
# --------------------------------------------

data = {
    'Area': [1200, 1500, 1800, 2400, 3000,
             3500, 4000, 4500, 5000, 5500],

    'Bedrooms': [2, 3, 3, 4, 5,
                 5, 6, 6, 7, 7],

    'Age': [10, 5, 8, 2, 1,
            3, 2, 4, 1, 2],

    'Distance': [5, 3, 4, 2, 1,
                 2, 1, 3, 2, 1],

    'Price': [250, 320, 340, 450, 600,
              650, 700, 720, 800, 850]
}

# Create DataFrame
df = pd.DataFrame(data)

# Display Dataset
print("\nDATASET\n")
print(df)

# --------------------------------------------
# Prepare Input and Output
# --------------------------------------------

# Independent variables
X = df[['Area', 'Bedrooms', 'Age', 'Distance']]

# Dependent variable
y = df['Price']

# --------------------------------------------
# Build Model
# --------------------------------------------

model = LinearRegression()

# Train model
model.fit(X, y)

# --------------------------------------------
# Predictions
# --------------------------------------------

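# Predictions are made on the same rows used for training,
# so the metrics below measure fit, not generalization
y_pred = model.predict(X)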

# --------------------------------------------
# Model Parameters
# --------------------------------------------

print("\nMODEL PARAMETERS\n")

print("Intercept:")
print(model.intercept_)

print("\nCoefficients:")
for feature, coef in zip(X.columns, model.coef_):
    print(feature, ":", coef)

# --------------------------------------------
# Regression Equation
# --------------------------------------------

print("\nREGRESSION EQUATION\n")

print("Price = ")

for feature, coef in zip(X.columns, model.coef_):
    print(f"({coef:.4f} * {feature}) + ")

print(model.intercept_)

# --------------------------------------------
# Evaluation Metrics
# --------------------------------------------

mse = mean_squared_error(y, y_pred)

r2 = r2_score(y, y_pred)

print("\nEVALUATION METRICS\n")

print("Mean Squared Error (MSE):", round(mse, 4))

print("R² Score:", round(r2, 4))

# --------------------------------------------
# Predict New House Price
# --------------------------------------------

# Column names must match the features used for training
new_house = pd.DataFrame({
    'Area': [2800],
    'Bedrooms': [4],
    'Age': [2],
    'Distance': [2]
})

predicted_price = model.predict(new_house)

print("\nNEW HOUSE PREDICTION\n")

print("Predicted House Price:",
      round(predicted_price[0], 2))

# --------------------------------------------
# Visualization
# --------------------------------------------

plt.figure(figsize=(8,6))

plt.scatter(y, y_pred, color='blue')

# Reference line where predicted == actual
plt.plot([y.min(), y.max()],
         [y.min(), y.max()],
         color='red')

plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")

plt.title("Actual vs Predicted Prices")

plt.grid(True)

plt.show()

# ============================================
# END OF PROGRAM
# ============================================

📊 Sample Output

DATASET

   Area  Bedrooms  Age  Distance  Price
0  1200         2   10         5    250
1  1500         3    5         3    320
2  1800         3    8         4    340
3  2400         4    2         2    450
4  3000         5    1         1    600
5  3500         5    3         2    650
6  4000         6    2         1    700
7  4500         6    4         3    720
8  5000         7    1         2    800
9  5500         7    2         1    850

MODEL PARAMETERS

Intercept:
151.9548847198012

Coefficients:
Area : 0.08786516356237986
Bedrooms : 36.348475116830265
Age : 0.9805984363579832
Distance : -19.51540386702398

REGRESSION EQUATION

Price = 
(0.0879 * Area) + 
(36.3485 * Bedrooms) + 
(0.9806 * Age) + 
(-19.5154 * Distance) + 
151.9548847198012

EVALUATION METRICS

Mean Squared Error (MSE): 406.4522
R² Score: 0.9901

NEW HOUSE PREDICTION

Predicted House Price: 506.3
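As a cross-check on the fitted parameters, the same least-squares problem can be solved with NumPy alone. This is a sketch rather than the program's method: it assumes an explicit column of ones is added for the intercept and uses `np.linalg.lstsq`; the variable names are illustrative. If the fit is well-conditioned, the printed values should agree with the scikit-learn output up to floating-point rounding.

```python
import numpy as np

# Same ten rows as the dataset in this post.
area     = [1200, 1500, 1800, 2400, 3000, 3500, 4000, 4500, 5000, 5500]
bedrooms = [2, 3, 3, 4, 5, 5, 6, 6, 7, 7]
age      = [10, 5, 8, 2, 1, 3, 2, 4, 1, 2]
distance = [5, 3, 4, 2, 1, 2, 1, 3, 2, 1]
price    = [250, 320, 340, 450, 600, 650, 700, 720, 800, 850]

X = np.column_stack([area, bedrooms, age, distance]).astype(float)
y = np.array(price, dtype=float)

# Prepend a column of ones so the first fitted weight acts as the intercept.
X1 = np.column_stack([np.ones(len(y)), X])

# Ordinary least squares: minimize ||X1 @ theta - y||^2.
theta, *_ = np.linalg.lstsq(X1, y, rcond=None)
intercept, coefs = theta[0], theta[1:]

y_hat = X1 @ theta
residuals = y - y_hat

print("Intercept:", intercept)
for name, coef in zip(["Area", "Bedrooms", "Age", "Distance"], coefs):
    print(name, ":", coef)
print("MSE:", np.mean(residuals ** 2))
```

With an intercept in the model, the least-squares residuals sum to zero and are uncorrelated with every feature. That gives a quick sanity check on any fitted linear model.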




📈 Interpretation

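Observation             Meaning
Positive coefficient    Predicted price rises as the feature increases, other features held constant
Negative coefficient    Predicted price falls as the feature increases, other features held constant
Low MSE                 Predictions are close to the actual prices on average (MSE is in squared price units)
R² close to 1           The model explains most of the variance in price

For example, the Distance coefficient of about -19.52 means that each extra unit of distance from the city center lowers the predicted price by about 19.52, with the other features fixed. Both metrics are computed on the training data here, so they describe how well the model fits these ten rows rather than how it would perform on unseen houses.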
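The two metrics can also be computed directly from their definitions, which helps when reading the numbers above. The sketch below uses made-up illustrative values (not the house data), and the function names are hypothetical stand-ins for `mean_squared_error` and `r2_score`.

```python
def mse_manual(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_manual(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values, chosen only to keep the arithmetic easy to follow.
actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 7.5]

mse = mse_manual(actual, predicted)   # (0.25 + 0 + 0.25) / 3
r2 = r2_manual(actual, predicted)     # 1 - 0.5 / 8

print(round(mse, 4))  # 0.1667
print(r2)             # 0.9375
```

The same relationship holds for the sample output: the ten prices have a mean of 568 and a mean squared deviation of 41016, so R² = 1 - 406.4522 / 41016 ≈ 0.9901.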

📉 Graph Explanation

Actual vs Predicted Plot

  • X-axis → Actual prices
  • Y-axis → Predicted prices
  • Red line → the diagonal where predicted = actual
  • Points near the red line → accurate predictions; points far from it → large errors

🔍 Advantages of Multiple Linear Regression

  • Handles multiple features
  • Simple and interpretable
  • Fast training
  • Useful for prediction tasks

⚠️ Limitations

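  • Assumes a linear relationship between each feature and the price
  • Sensitive to outliers, because squaring the errors gives large residuals extra weight
  • Multicollinearity (strongly correlated features, such as Area and Bedrooms here) makes individual coefficients unstable and hard to interpret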
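One simple way to spot multicollinearity is to look at the pairwise correlations between the features. The following is a minimal sketch using `np.corrcoef` on the dataset from this post; a fuller diagnostic (for example, variance inflation factors) is outside the scope of this experiment, and the variable names are illustrative.

```python
import numpy as np

# Feature columns from the dataset in this post.
area     = [1200, 1500, 1800, 2400, 3000, 3500, 4000, 4500, 5000, 5500]
bedrooms = [2, 3, 3, 4, 5, 5, 6, 6, 7, 7]
age      = [10, 5, 8, 2, 1, 3, 2, 4, 1, 2]
distance = [5, 3, 4, 2, 1, 2, 1, 3, 2, 1]

names = ["Area", "Bedrooms", "Age", "Distance"]

# np.corrcoef treats each row as one variable by default.
features = np.array([area, bedrooms, age, distance], dtype=float)
corr = np.corrcoef(features)

# Print each distinct pair once.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]:>8} vs {names[j]:<8}: {corr[i, j]: .3f}")
```

A correlation close to +1 or -1 between two features means they carry nearly the same information, so the model can trade weight between them without changing the predictions much. In this dataset Area and Bedrooms move almost in lockstep, which is why their individual coefficients should be read with caution.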

✅ Result

Multiple Linear Regression was implemented with Scikit-Learn to predict house prices from four input features (Area, Bedrooms, Age and Distance). On the ten-row training set the model reached an MSE of 406.4522 and an R² of 0.9901.
