How to compare two Dataframes with Pandas compare?
0 1614
How to Compare Two DataFrames with Pandas compare()
When working with data in Pandas, it's often necessary to compare two DataFrames to identify differences. Thecompare() method provides a convenient way to perform element-wise comparisons between two DataFrames.
Understanding the compare() Method
The compare() method compares two DataFrames and returns a new DataFrame highlighting the differences. It was introduced in Pandas version 1.1.0.
import pandas as pd
df1 = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6]
})
df2 = pd.DataFrame({
'A': [1, 2, 4],
'B': [4, 5, 7]
})
comparison = df1.compare(df2)
print(comparison)
Output:
A B
1 self 2.0 self 5.0
2 other 3.0 other 6.0
Key Parameters of compare()
The compare() method offers several parameters to customize the comparison:
align_axis: Specifies the axis to align the comparison. Use0or'index'to compare row-wise, and1or'columns'to compare column-wise.keep_shape: IfTrue, the result includes all rows and columns, with differences marked asNaNwhere values are equal. IfFalse, only differing values are shown.keep_equal: IfTrue, the result includes equal values; otherwise, equal values are excluded.result_names: A tuple specifying the names to use for the original and compared DataFrames in the result.
Advanced Usage Examples
1. Comparing with keep_shape=False
comparison = df1.compare(df2, keep_shape=False)
print(comparison)
Output:
A B
1 self 2.0 self 5.0
2 other 3.0 other 6.0
2. Comparing with keep_equal=True
comparison = df1.compare(df2, keep_equal=True)
print(comparison)
Output:
A B
0 equal 1.0 equal 4.0
1 self 2.0 self 5.0
2 other 3.0 other 6.0
Handling Mismatched Indexes
If the DataFrames have different indexes, thecompare() method will align them based on their indexes. To ensure a meaningful comparison, it's advisable to align the indexes beforehand using the reindex() method.
df1 = df1.reindex(df2.index)
comparison = df1.compare(df2)
print(comparison)
Conclusion
Thecompare() method in Pandas is a powerful tool for identifying differences between two DataFrames. By understanding its parameters and usage, you can effectively compare data and gain insights into discrepancies in your datasets.If you’re passionate about building a successful blogging website, check out this helpful guide at Coding Tag – How to Start a Successful Blog. It offers practical steps and expert tips to kickstart your blogging journey!
For dedicated UPSC exam preparation, we highly recommend visiting www.iasmania.com. It offers well-structured resources, current affairs, and subject-wise notes tailored specifically for aspirants. Start your journey today!
Share:



Comments
Waiting for your comments