Explore California Housing Dataset
🔹 Dataset Characteristics
| Feature | Description |
|---|---|
| Number of Instances | 20,640 |
| Number of Attributes | 8 numerical predictive attributes + 1 target + 1 categorical |
| Target Variable | Median House Value |
🔹 Context
This dataset is used in the book “Hands-On Machine Learning with Scikit-Learn and TensorFlow” by Aurélien Géron.
It is widely used as an introductory dataset for machine learning because it:
- Requires basic preprocessing
- Has clear and interpretable features
- Is moderate in size (not too small, not too large)
🔹 Dataset Description
The dataset contains information about housing in California districts based on the 1990 U.S. Census.
Each row represents a census block group, which:
- Is the smallest geographical unit for which the U.S. Census Bureau publishes sample data
- Typically contains 600 to 3,000 people
🔹 Features (Attributes)
| No. | Attribute | Description |
|---|---|---|
| 1 | longitude | How far west a block is (values are negative in California, so a more negative value = farther west) |
| 2 | latitude | How far north a house is (higher = farther north) |
| 3 | housing_median_age | Median age of houses in a block |
| 4 | total_rooms | Total number of rooms in a block |
| 5 | total_bedrooms | Total number of bedrooms in a block |
| 6 | population | Total population in the block |
| 7 | households | Number of households in the block |
| 8 | median_income | Median income (in tens of thousands of USD) |
| 9 | median_house_value | Median house value (target variable) |
| 10 | ocean_proximity | Categorical feature: location relative to the ocean (e.g., INLAND, NEAR BAY, NEAR OCEAN) |
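As a rough sketch of how these ten columns might be read into Python, the snippet below parses the CSV with the standard library only. The helper name `read_housing` and the two sample rows are made up for illustration; they are not real census records. Blank numeric cells are turned into `None` so that missing values stay visible.

```python
import csv
import io

# The nine numeric columns from the table above; ocean_proximity stays text.
NUMERIC_COLUMNS = [
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
    "median_house_value",
]


def read_housing(handle):
    """Parse rows from an open CSV file; blank numeric cells become None."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"ocean_proximity": record["ocean_proximity"]}
        for name in NUMERIC_COLUMNS:
            cell = record[name].strip()
            row[name] = float(cell) if cell else None
        rows.append(row)
    return rows


# Two made-up rows for illustration only (not real census records).
SAMPLE = """longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
-122.0,37.5,30.0,1000.0,200.0,600.0,180.0,4.5,250000.0,NEAR BAY
-118.0,34.0,20.0,1500.0,,900.0,300.0,3.0,180000.0,INLAND
"""

rows = read_housing(io.StringIO(SAMPLE))
print(len(rows), rows[1]["total_bedrooms"])
```

To read the real file, something like `with open("housing.csv", newline="") as f: rows = read_housing(f)` should work, assuming the file sits in the working directory and uses the header names listed in the table.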
🎯 Target Variable
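- median_house_value
- Represents the median house price of a block
- In housing.csv the values are in U.S. dollars (e.g., 250000 → $250,000)
- The scikit-learn version of the dataset scales the target to hundreds of thousands of dollars (e.g., value = 2.5 → $250,000)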
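Because the target can appear either in plain dollars or in units of $100,000 depending on the copy of the dataset, a small conversion helper can be handy. The function names below are hypothetical, and it is worth checking which scale your copy uses before applying them.

```python
UNIT_DOLLARS = 100_000  # one scaled unit = $100,000


def to_dollars(scaled_value):
    """Convert a target value in units of $100,000 to plain dollars."""
    return scaled_value * UNIT_DOLLARS


def to_scaled(dollars):
    """Convert plain dollars to units of $100,000."""
    return dollars / UNIT_DOLLARS


print(to_dollars(2.5))
print(to_scaled(250_000))
```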
⚠️ Important Notes
- Dataset is not fully cleaned
- May require:
  - Handling missing values (e.g., total_bedrooms)
  - Encoding categorical variable (ocean_proximity)
  - Feature scaling
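The three preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration on made-up values, not a recommended pipeline: it fills missing entries with the median, one-hot encodes the category, and standardizes a column to zero mean and unit variance. The helper names are hypothetical. In practice, scikit-learn's `SimpleImputer`, `OneHotEncoder`, and `StandardScaler` cover the same ground.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def one_hot(labels):
    """Return (categories, rows); each row has a 1 in its category's slot."""
    categories = sorted(set(labels))
    rows = [[int(label == c) for c in categories] for label in labels]
    return categories, rows


def standardize(values):
    """Scale to zero mean and unit variance (population standard deviation).

    Assumes the column is not constant; a zero deviation would divide by zero.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up values for illustration only.
total_bedrooms = [200.0, None, 400.0, 300.0]
ocean_proximity = ["NEAR BAY", "INLAND", "INLAND", "NEAR OCEAN"]

bedrooms_filled = impute_median(total_bedrooms)
categories, encoded = one_hot(ocean_proximity)
bedrooms_scaled = standardize(bedrooms_filled)

print(bedrooms_filled)
print(categories, encoded[0])
```

When a train/test split is involved, the median, mean, and standard deviation would normally be computed on the training rows only and then reused on the test rows.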
🔍 Additional Insights
- A household is a group of people living in one home
- Some blocks may have:
  - Few households
  - Many rooms (e.g., vacation areas)
- This can lead to unusual feature values
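One way to spot such blocks is to compute rooms per household and flag large ratios. The sketch below uses made-up numbers, and the cutoff of 20 is an arbitrary choice for illustration, not a value taken from the dataset.

```python
def rooms_per_household(total_rooms, households):
    """Average number of rooms per household in one block."""
    return total_rooms / households


# Made-up blocks for illustration only.
blocks = [
    {"total_rooms": 1000.0, "households": 200.0},  # ordinary residential area
    {"total_rooms": 3000.0, "households": 30.0},   # few households, many rooms
]

THRESHOLD = 20.0  # arbitrary cutoff chosen for this sketch

ratios = [rooms_per_household(b["total_rooms"], b["households"]) for b in blocks]
unusual = [i for i, ratio in enumerate(ratios) if ratio > THRESHOLD]

print(ratios, unusual)
```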
📥 Dataset File
- File name: housing.csv
- Source: California Housing Dataset (1990 Census)