Pandas iloc
Cheatsheet for Machine Learning
The iloc
indexer in pandas is a powerful tool for data selection, slicing, and manipulation, essential for preparing datasets for machine learning tasks. Here's a comprehensive guide to help you master iloc
.
You can download the .ipynb file from here
Table of Contents
- Introduction to
iloc
- Basic Usage
- Selecting Rows
- Selecting Columns
- Advanced Indexing
- Slicing Rows and Columns
- Selecting Specific Rows and Columns
- Conditional Selection
- Modifying Data
- Practical Machine Learning Examples
- Splitting Data into Features and Target
- Handling Missing Data
- Data Normalization
- Oficial documentation
- Tutorial Videos
1. Introduction to iloc
The iloc
indexer is used for integer-location based indexing for selection by position. It is one of the primary indexers for Pandas data structures.
import pandas as pd
# Sample DataFrame
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12],
'D': [13, 14, 15, 16]
}
df = pd.DataFrame(data)
print(df)
A B C D
0 1 5 9 13
1 2 6 10 14
2 3 7 11 15
3 4 8 12 16
2. Basic Usage
Selecting Rows
To select rows using iloc
, you specify the row index.
# Select the first row
print(df.iloc[0])
A 1
B 5
C 9
D 13
Name: 0, dtype: int64
# Select the first three rows
print(df.iloc[:3])
A B C D
0 1 5 9 13
1 2 6 10 14
2 3 7 11 15
Selecting Columns
To select columns, you specify the column index.
# Select the first column
print(df.iloc[:, 0])
0 1
1 2
2 3
3 4
Name: A, dtype: int64
# Select the first two columns
print(df.iloc[:, :2])
A B
0 1 5
1 2 6
2 3 7
3 4 8
3. Advanced Indexing
Slicing Rows and Columns
You can slice both rows and columns simultaneously.
# Select the first two rows and the first two columns
print(df.iloc[:2, :2])
A B
0 1 5
1 2 6
Selecting Specific Rows and Columns
Specify exact row and column indices.
# Select the first and third rows and the second and fourth columns
print(df.iloc[[0, 2], [1, 3]])
B D
0 5 13
2 7 15
4. Conditional Selection
Using iloc
in combination with conditions.
# Example DataFrame
df_cond = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50],
'C': [100, 200, 300, 400, 500]
})
# Condition to select rows where column 'A' values are greater than 2
print(df_cond[df_cond['A'] > 2].iloc[:, [0, 2]]) # Select columns 'A' and 'C'
A C
2 3 300
3 4 400
4 5 500
5. Modifying Data
You can use iloc
to modify specific parts of the DataFrame.
# Set the value of the first cell to 0
df.iloc[0, 0] = 0
print(df)
# Set the values of the first column to 0
df.iloc[:, 0] = 0
print(df)
A B C D
0 0 5 9 13
1 2 6 10 14
2 3 7 11 15
3 4 8 12 16
A B C D
0 0 5 9 13
1 0 6 10 14
2 0 7 11 15
3 0 8 12 16
6. Practical Machine Learning Examples
Splitting Data into Features and Target
Separating features (X) and target (y) is a common task.
# Sample DataFrame with a target column
df_ml = pd.DataFrame({
'Feature1': [1, 2, 3, 4, 5],
'Feature2': [10, 20, 30, 40, 50],
'Target': [0, 1, 0, 1, 0]
})
# Features (all rows, all columns except the last one)
X = df_ml.iloc[:, :-1]
# Target (all rows, last column)
y = df_ml.iloc[:, -1]
print("Features:\n", X)
print("Target:\n", y)
Features:
Feature1 Feature2
0 1 10
1 2 20
2 3 30
3 4 40
4 5 50
Target:
0 0
1 1
2 0
3 1
4 0
Name: Target, dtype: int64
Handling Missing Data
Using iloc
to handle missing data by selecting specific parts of the DataFrame.
# Sample DataFrame with missing values
df_missing = pd.DataFrame({
'A': [1, 2, None, 4],
'B': [5, None, 7, 8],
'C': [None, 10, 11, 12]
})
# Fill missing values in the first two columns with 0
df_missing.iloc[:, :2] = df_missing.iloc[:, :2].fillna(0)
print(df_missing)
A B C
0 1.0 5.0 NaN
1 2.0 0.0 10.0
2 0.0 7.0 11.0
3 4.0 8.0 12.0
Data Normalization
Using iloc
to normalize data.
from sklearn.preprocessing import MinMaxScaler
# Sample DataFrame for normalization
df_norm = pd.DataFrame({
'Feature1': [1, 2, 3, 4, 5],
'Feature2': [10, 20, 30, 40, 50]
})
scaler = MinMaxScaler()
# Normalize the first two columns
df_norm.iloc[:, :2] = scaler.fit_transform(df_norm.iloc[:, :2])
print(df_norm)
Feature1 Feature2
0 0.00 0.00
1 0.25 0.25
2 0.50 0.50
3 0.75 0.75
4 1.00 1.00
Certainly! Here are some references to official documentation and YouTube videos that can help you learn more about using the iloc
indexer in pandas for machine learning:
Official Documentation
-
Pandas Documentation on Indexing and Selecting Data:
- Pandas Official Documentation - Indexing and Selecting Data
- This section of the pandas documentation provides comprehensive details on various indexing methods, including
iloc
.
-
Pandas API Reference for
iloc
:- Pandas API Reference - iloc
- This page contains detailed information about the
iloc
property and its usage.
Tutorial Videos
-
Corey Schafer - Python Pandas DataFrame Tutorial:
- Selecting Rows and Columns from a Pandas DataFrame
- This playlist covers various methods to select rows and columns in pandas DataFrames, including the use of
iloc
.
-
Data School - How do I select a subset of a DataFrame:
- Data School - Pandas iloc
- Data School provides an in-depth tutorial on selecting subsets of DataFrames using
iloc
.
-
Getting Started with Data Analysis:
- Pandas DataFrames in Python
- This video explains the basics of pandas DataFrames and covers various indexing techniques including
iloc
.
-
Pandas Tutorial:
- Pandas Tutorial (Data Analysis with Python)
- A comprehensive tutorial on pandas covering many aspects including data selection and manipulation using
iloc
.
These resources should provide you with a strong foundation for understanding and utilizing iloc
in pandas for your machine learning projects.
Conclusion
The iloc
indexer is a versatile and powerful tool for data manipulation in pandas, especially useful in the preprocessing stages of machine learning. Mastering iloc
allows for efficient and precise data selection and modification, essential for building robust machine learning models.