Pandas DataFrame describe()
Exploring the Pandas describe() Method
The describe() method in Pandas provides a quick statistical summary of the numerical columns in a DataFrame. By default, it reports the count, mean, standard deviation, minimum, and maximum of each column, along with the 25th, 50th, and 75th percentiles.
Syntax
DataFrame.describe(percentiles=None, include=None, exclude=None)
Parameters:
percentiles: list of float, optional. The percentiles to include in the summary, each a value between 0 and 1. The default is [.25, .5, .75].
include: 'all', list of data types, or None, optional. The data types to include in the summary. The default is None, meaning only numeric columns are summarized; 'all' summarizes every column.
exclude: list of data types or None, optional. The data types to exclude from the summary. The default is None, meaning no types are excluded.
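As a minimal sketch of how include and exclude change which columns are summarized, the snippet below uses a small made-up DataFrame (the column names and values are assumptions for illustration only). Note that exclude cannot be combined with include='all'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing numeric and text columns
df = pd.DataFrame({
    "Age": [25, 30, 35],
    "Height": [1.70, 1.82, 1.65],
    "City": ["Pune", "Delhi", "Pune"],
})

# Default: only the numeric columns are summarized
default_summary = df.describe()

# include='all': every column, with NaN where a statistic does not apply
all_summary = df.describe(include="all")

# exclude numeric types: only the non-numeric columns remain
non_numeric_summary = df.describe(exclude=[np.number])

print(list(default_summary.columns))
print(list(all_summary.columns))
print(list(non_numeric_summary.columns))
```

Passing a list of dtypes such as [np.number] keeps the selection explicit, which can be easier to read than relying on the default.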
Example Usage
import pandas as pd
# Sample DataFrame
data = {
'Age': [25, 30, 35, 40, 22],
'Salary': [50000, 60000, 70000, 80000, 55000]
}
df = pd.DataFrame(data)
# Generate descriptive statistics
print(df.describe())
Output:
             Age        Salary
count   5.000000      5.000000
mean   30.400000  63000.000000
std     7.300685  12041.594579
min    22.000000  50000.000000
25%    25.000000  55000.000000
50%    30.000000  60000.000000
75%    35.000000  70000.000000
max    40.000000  80000.000000
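To see where these numbers come from, the sketch below recomputes two of the rows directly. It assumes the same sample data as above and relies on the fact that describe() reports the sample standard deviation (the same value DataFrame.std() returns by default) and percentiles that match DataFrame.quantile() with its default linear interpolation.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 22],
    "Salary": [50000, 60000, 70000, 80000, 55000],
})

summary = df.describe()

# Sample standard deviation (divides by n - 1), as used in the 'std' row
sample_std = df.std()

# 75th percentile with linear interpolation, as used in the '75%' row
q75 = df.quantile(0.75)

print(summary.loc["std"])
print(sample_std)
print(summary.loc["75%"])
print(q75)
```

Recomputing a row this way is a quick sanity check when a summary value looks surprising.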
Customizing the Summary
You can customize the describe() method to include specific percentiles or data types:
# Include 10th and 90th percentiles
print(df.describe(percentiles=[.1, .9]))
Output:
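             Age        Salary
count   5.000000      5.000000
mean   30.400000  63000.000000
std     7.300685  12041.594579
min    22.000000  50000.000000
10%    23.200000  52000.000000
50%    30.000000  60000.000000
90%    38.000000  76000.000000
max    40.000000  80000.000000
Passing percentiles replaces the default list, so the 25% and 75% rows no longer appear. Pandas has historically added the 50% row (the median) even when it is not requested; newer releases may omit it, so include .5 explicitly if you rely on it.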
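Because describe() returns an ordinary DataFrame, individual statistics can be pulled out by row label. The following is a small sketch using the same sample data; it lists .5 explicitly so the median row is present regardless of pandas version, and the variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 22],
    "Salary": [50000, 60000, 70000, 80000, 55000],
})

# Request the 10th, 50th, and 90th percentiles explicitly
summary = df.describe(percentiles=[.1, .5, .9])

# Row labels are strings such as '10%' and '90%'
tails = summary.loc[["10%", "90%"]]
median_salary = summary.loc["50%", "Salary"]

print(tails)
print(median_salary)
```

Selecting rows this way is handy when only a few statistics are needed for a report or a further calculation.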
Describing Categorical Data
To obtain a summary of categorical data, set the include parameter to 'object':
# Sample DataFrame with categorical data
data = {
'Department': ['HR', 'Engineering', 'Finance', 'Engineering', 'Marketing']
}
df = pd.DataFrame(data)
# Generate descriptive statistics for categorical data
print(df.describe(include='object'))
Output:
Department
count 5
unique 4
top Engineering
freq 2
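The four rows of the categorical summary can be reproduced with other pandas methods, which may help clarify what each one means. This is a sketch on the same Department data, not a description of how describe() is implemented internally.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "Engineering", "Finance", "Engineering", "Marketing"]
})

dept = df["Department"]
summary = dept.describe()

# count: number of non-missing values
n_values = dept.count()
# unique: number of distinct values
n_unique = dept.nunique()
# top and freq: the most frequent value and how often it occurs
counts = dept.value_counts()
top_value = counts.index[0]
top_freq = counts.iloc[0]

print(summary)
print(n_values, n_unique, top_value, top_freq)
```

When several values tie for the highest frequency, the value reported as top is chosen arbitrarily among them, so value_counts() gives a fuller picture in that case.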
Conclusion
The describe() method in Pandas is an essential tool for quickly understanding the distribution and central tendencies of your dataset. Whether you're working with numerical or categorical data, this method provides a concise summary that aids in data exploration and analysis.