Pandas DataFrame describe()

Exploring the Pandas describe() Method

The describe() method in Pandas provides a quick statistical summary of the numerical columns in a DataFrame. By default, it reports the count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles of each numeric column, which gives a fast first look at the scale and spread of your data.

Syntax

DataFrame.describe(percentiles=None, include=None, exclude=None)

Parameters:

percentiles: list of float, optional. Specifies which percentiles to include in the summary; each value must lie between 0 and 1. The default is [.25, .5, .75].
include: 'all', list-like of dtypes, or None, optional. Specifies which data types to include in the summary. The default is None, meaning only numeric columns are summarized; 'all' summarizes every column.
exclude: list-like of dtypes or None, optional. Specifies which data types to exclude from the summary. The default is None, meaning no types are excluded.
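
As a minimal sketch of how include and exclude change which columns are summarized, the snippet below uses a small mixed-type DataFrame. The frame and its variable names are made up for illustration and are not part of the examples that follow.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame, used only for illustration
mixed = pd.DataFrame({
    'Age': [25, 30, 35],
    'Salary': [50000.0, 60000.0, 70000.0],
    'Department': ['HR', 'Engineering', 'Finance'],
})

# Default (include=None, exclude=None): only the numeric columns are summarized
numeric_summary = mixed.describe()

# include='all': every column is summarized; statistics that do not apply are NaN
full_summary = mixed.describe(include='all')

# exclude=[np.number]: numeric columns are dropped, leaving the text column
non_numeric_summary = mixed.describe(exclude=[np.number])

print(list(numeric_summary.columns))
print(list(full_summary.columns))
print(list(non_numeric_summary.columns))
```

With include='all' the result contains both the numeric rows (mean, std, and so on) and the categorical rows (unique, top, freq) in a single table.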

Example Usage

import pandas as pd

# Sample DataFrame
data = {
    'Age': [25, 30, 35, 40, 22],
    'Salary': [50000, 60000, 70000, 80000, 55000]
}

df = pd.DataFrame(data)

# Generate descriptive statistics
print(df.describe())

Output:

             Age        Salary
count   5.000000      5.000000
mean   30.400000  63000.000000
std     7.300685  12041.594579
min    22.000000  50000.000000
25%    25.000000  55000.000000
50%    30.000000  60000.000000
75%    35.000000  70000.000000
max    40.000000  80000.000000
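
To make the rows of the summary less of a black box, here is a short sketch that recomputes two of them with the corresponding Series methods. It assumes the same Age and Salary data as the example; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 22],
    'Salary': [50000, 60000, 70000, 80000, 55000],
})

summary = df.describe()

# The std row is the sample standard deviation (ddof=1), the same as Series.std()
age_std = df['Age'].std()

# The percentile rows match Series.quantile(), which interpolates linearly by default
salary_q75 = df['Salary'].quantile(0.75)

print(summary.loc['std', 'Age'], age_std)
print(summary.loc['75%', 'Salary'], salary_q75)
```

Because describe() returns an ordinary DataFrame, individual statistics can be read with .loc[row, column] as shown.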

Customizing the Summary

You can customize the describe() method to include specific percentiles or data types:

# Include the 10th and 90th percentiles alongside the median
print(df.describe(percentiles=[.1, .5, .9]))

Output:

             Age        Salary
count   5.000000      5.000000
mean   30.400000  63000.000000
std     7.300685  12041.594579
min    22.000000  50000.000000
10%    23.200000  52000.000000
50%    30.000000  60000.000000
90%    38.000000  76000.000000
max    40.000000  80000.000000

Only the requested percentiles appear; the default 25% and 75% rows are no longer shown.
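
Percentile values such as the 10% row usually fall between two data points, so pandas interpolates. The sketch below reproduces the 10th percentile of Age by hand, assuming the default linear interpolation; linear_percentile is a hypothetical helper written for this illustration, not a pandas function.

```python
import pandas as pd

ages = pd.Series([25, 30, 35, 40, 22])

def linear_percentile(values, q):
    """Hypothetical helper: percentile by linear interpolation between sorted values."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)          # fractional index into the sorted data
    lower = int(position)                      # index of the value at or below the position
    upper = min(lower + 1, len(ordered) - 1)   # index of the next value up
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

manual_p10 = linear_percentile(ages, 0.1)
pandas_p10 = ages.quantile(0.1)

print(manual_p10, pandas_p10)
```

For the five sorted ages 22, 25, 30, 35, 40, the 10th percentile sits at fractional index 0.4, that is 40% of the way from 22 to 25.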

Describing Categorical Data

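To summarize text or categorical columns, set the include parameter to 'object', which selects string columns stored with the object dtype. Instead of numeric statistics, the summary then reports count, unique (the number of distinct values), top (the most frequent value), and freq (how often that value occurs):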

# Sample DataFrame with categorical data
data = {
    'Department': ['HR', 'Engineering', 'Finance', 'Engineering', 'Marketing']
}

df = pd.DataFrame(data)

# Generate descriptive statistics for categorical data
print(df.describe(include='object'))

Output:

         Department
count             5
unique            4
top     Engineering
freq              2
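
The unique, top, and freq rows can be cross-checked with nunique() and value_counts(). This is a minimal sketch on the same Department values; the variable names are illustrative.

```python
import pandas as pd

departments = pd.Series(
    ['HR', 'Engineering', 'Finance', 'Engineering', 'Marketing'],
    name='Department',
)

# value_counts() sorts by frequency, most common value first
counts = departments.value_counts()

n_unique = departments.nunique()   # corresponds to the 'unique' row
top_value = counts.index[0]        # corresponds to the 'top' row
top_freq = int(counts.iloc[0])     # corresponds to the 'freq' row

print(n_unique, top_value, top_freq)
```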

Conclusion

The describe() method in Pandas is an essential tool for quickly understanding the distribution and central tendencies of your dataset. Whether you're working with numerical or categorical data, this method provides a concise summary that aids in data exploration and analysis.
