How to compare two Dataframes with Pandas compare?

When working with data in Pandas, it's often necessary to compare two DataFrames to identify differences. The compare() method provides a convenient way to perform element-wise comparisons between two DataFrames.

Understanding the compare() Method

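The compare() method compares two DataFrames cell by cell and returns a new DataFrame describing the differences. By default, the result contains only the rows and columns where at least one value differs. Each of those columns is split into a self sub-column (values from the calling DataFrame) and an other sub-column (values from the DataFrame passed in). The method was introduced in pandas 1.1.0.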

import pandas as pd

df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

df2 = pd.DataFrame({
    'A': [1, 2, 4],
    'B': [4, 5, 7]
})

comparison = df1.compare(df2)
print(comparison)
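Output:
     A          B
  self other self other
2  3.0   4.0  6.0   7.0

Rows 0 and 1 are identical in both DataFrames, so they are dropped. Only row 2 remains, showing the df1 value (self) next to the df2 value (other) for each column. The values are floats because equal cells are internally replaced with NaN before the unchanged rows are filtered out.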
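Because the result uses a two-level column index (the original column name, then self or other), one side of the comparison can be pulled out with the standard DataFrame.xs() method. The sketch below rebuilds df1 and df2 from above; the names self_side and other_side are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

comparison = df1.compare(df2)

# The columns form a MultiIndex of (original column, 'self'/'other') pairs
print(list(comparison.columns))

# Select all df1 ('self') values and all df2 ('other') values separately;
# xs drops the selected level, leaving plain 'A' and 'B' columns
self_side = comparison.xs('self', axis=1, level=1)
other_side = comparison.xs('other', axis=1, level=1)
print(self_side)
print(other_side)
```

This is handy when you want to feed just the old or just the new values into further processing.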

Key Parameters of compare()

The compare() method offers several parameters to customize the comparison:

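  • align_axis: Determines how the two versions of each differing value are laid out. The default, 1 or 'columns', places them side by side as self/other sub-columns; 0 or 'index' stacks them vertically as self/other rows.
  • keep_shape: If True, the result keeps every row and column of the inputs, with equal values shown as NaN. If False (the default), only rows and columns containing at least one difference are kept.
  • keep_equal: If True, equal values are shown instead of being replaced with NaN. It does not bring back rows or columns that have no differences; combine it with keep_shape=True for that. Defaults to False.
  • result_names: A tuple of labels for the two DataFrames in the result, defaulting to ('self', 'other'). This parameter was added in pandas 1.5.0.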
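To see align_axis and result_names together, here is a small sketch on the same df1 and df2. It assumes pandas 1.5 or later (needed for result_names); the labels 'before' and 'after' are arbitrary choices for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# align_axis=0 stacks the two versions of each differing row vertically
# instead of placing them side by side as sub-columns.
# result_names replaces the default ('self', 'other') labels.
stacked = df1.compare(df2, align_axis=0, result_names=('before', 'after'))
print(stacked)
```

Row 2 now appears twice in a two-level row index, once labelled before and once after, while the columns stay as plain A and B.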

Advanced Usage Examples

1. Comparing with keep_shape=False

comparison = df1.compare(df2, keep_shape=False)
print(comparison)
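Output:
     A          B
  self other self other
2  3.0   4.0  6.0   7.0

Because keep_shape=False is the default, this is the same result as calling df1.compare(df2) with no extra arguments.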
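The non-default setting, keep_shape=True, is where the parameter actually changes the output. A minimal sketch on the same data, also showing how keep_equal=True fills in the cells that keep_shape=True would otherwise leave as NaN:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# keep_shape=True keeps all 3 rows and both columns; equal cells become NaN
full = df1.compare(df2, keep_shape=True)
print(full)

# Adding keep_equal=True shows the (identical) original values instead of NaN
full_equal = df1.compare(df2, keep_shape=True, keep_equal=True)
print(full_equal)
```

The first result has NaN throughout rows 0 and 1; the second is a complete side-by-side view of both DataFrames.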

2. Comparing with keep_equal=True

comparison = df1.compare(df2, keep_equal=True)
print(comparison)
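Output:
     A          B
  self other self other
2    3     4    6     7

With this data, row 2 is the only row that differs, and both of its columns differ, so keep_equal=True has no equal values to restore. The only visible change is that the values keep their integer dtype, because compare() no longer inserts NaN placeholders that force a float dtype.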
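keep_equal only makes a visible difference when a row that differs also contains some equal cells. The sketch below uses hypothetical DataFrames named old and new, where row 1 differs only in column A and row 2 differs only in column B:

```python
import pandas as pd

# Hypothetical data: row 1 differs only in A, row 2 differs only in B
old = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
new = pd.DataFrame({'A': [1, 9, 3], 'B': [4, 5, 0]})

# Default: equal cells inside the kept rows are shown as NaN
default = old.compare(new)

# keep_equal=True: those equal cells show their actual values
with_equal = old.compare(new, keep_equal=True)

print(default)
print(with_equal)
```

In both results row 0 is still dropped, since it has no differences at all; keep_equal only changes what is displayed in the rows that are kept.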

Handling Mismatched Indexes

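compare() does not align DataFrames for you. Both DataFrames must be identically labeled, meaning the same row labels and the same column labels in the same order; otherwise compare() raises a ValueError. To compare DataFrames whose labels differ, first bring them onto a common set of labels, for example with reindex(). In the snippet below, rows that exist only in df2 appear as NaN on the self side, and rows that exist only in df1 are dropped.

df1 = df1.reindex(index=df2.index, columns=df2.columns)  # match df2's row and column labels exactly
comparison = df1.compare(df2)
print(comparison)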
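If you would rather keep rows from both sides instead of dropping the ones missing from df2, DataFrame.align() with an outer join is one option. A minimal sketch with hypothetical frames named left and right whose row labels only partly overlap:

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap
left = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
right = pd.DataFrame({'A': [1, 5, 7]}, index=['x', 'y', 'w'])

# Comparing them directly fails because the labels differ
try:
    left.compare(right)
    raised = False
except ValueError as exc:
    raised = True
    print('compare() refused:', exc)

# align() with an outer join gives both frames the union of labels,
# filling labels missing from one side with NaN
left_aligned, right_aligned = left.align(right, join='outer')
diff = left_aligned.compare(right_aligned)
print(diff)
```

Label x is equal and drops out; y shows the changed value, while w and z show NaN on the side where the row did not exist.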

Conclusion

The compare() method in Pandas gives a compact, cell-level view of where two identically labeled DataFrames differ. Once you know how align_axis, keep_shape, keep_equal, and result_names shape its output, and remember to align the labels of the two DataFrames first when they differ, you can quickly pinpoint discrepancies in your datasets.


