When cleaning data in Pandas, we often need to remove columns. While dropping by name (df.drop(columns=['Name'])) is common, there are many scenarios where it’s more efficient to drop columns by their index position.
This is particularly useful when: 1. You have dozens of columns and don’t want to type their names. 2. Your columns have names that change frequently, but their order remains the same. 3. You want to remove the “last X” columns of a dataset.
The Core Concept: iloc and columns
To drop columns by index, we first need to identify the column names at those specific positions using the iloc or columns property.
1. Dropping a Single Column by Index
Let’s say we want to remove the 3rd column (index 2).
import pandas as pd
df = pd.read_excel("data.xlsx")
# Get the name of the column at index 2
col_to_drop = df.columns[2]
# Drop it
df = df.drop(columns=[col_to_drop])
2. Dropping a Range of Columns
A more powerful technique is to use Python’s slicing notation. If you want to drop the first 3 columns:
# Get names of columns 0, 1, and 2
cols_to_drop = df.columns[:3]
df = df.drop(columns=cols_to_drop)
3. Removing the “Last X” Columns
This is a very common task in data cleaning, especially when Excel files have empty columns at the end. In Python, we can use negative indexing to refer to the end of the list.
To remove the last 2 columns:
# Get the names of the last two columns
last_two = df.columns[-2:]
df = df.drop(columns=last_two)
A “One-Liner” using iloc
Alternatively, you can keep only the columns you want using iloc. This is often cleaner:
# Keep everything from the start up to the last two columns
df = df.iloc[:, :-2]
The : before the comma means “keep all rows.” The :-2 after the comma means “keep all columns except the last two.”
Summary
| Goal | Code Snippet |
|---|---|
| Drop by Name | df.drop(columns=['ColName']) |
| Drop by Index | df.drop(df.columns[i], axis=1) |
| Drop First 3 | df.iloc[:, 3:] (Keeps from 3 onwards) |
| Drop Last 2 | df.iloc[:, :-2] |
Conclusion
Knowing how to manipulate DataFrames by index allows you to write more flexible and automated data cleaning scripts. Whether you’re dealing with raw Excel dumps or scraped web data, mastering iloc and drop is essential for any aspiring data scientist.
Related Posts:
Written by
Abdur-Rahmaan Janhangeer
Chef
Python author of 7+ years having worked for Python companies around the world
Suggested Posts
How to Prepare Excel Files for Data Analysis with Pandas
Excel is the world’s most popular data tool, but for advanced analysis and machine learning, we ofte...
Data Scaling Techniques in Machine Learning
Data and its quality affect machine learning models and their accuracy, and the quality of the data ...
Measures in Statistics for Data Science
Statistics is a critical component of data science and machine learning algorithms. Almost all the m...