How to make a time series plot with Rolling Average in Python? - Seaborn
Creating a Time Series Plot with Rolling Average in Python
Time series analysis is essential for understanding trends and patterns in data collected over time. However, raw time series data can be noisy, making it challenging to discern underlying trends. Applying a rolling average (or moving average) helps smooth out short-term fluctuations and highlights longer-term trends. In this tutorial, we'll learn how to create a time series plot with a rolling average using Python's Pandas and Seaborn libraries.
Understanding Rolling Average
A rolling average replaces each data point with the mean of a fixed-size window of observations ending at that point; the window then slides forward one observation at a time. In Pandas, it is computed by calling the rolling() method on a column and then mean() on the result. For example:
df['Rolling_Avg'] = df['Data'].rolling(window=7).mean()
Here, window=7 averages the current row and the six rows before it; with one row per day, that amounts to a 7-day window. The first six values are NaN because fewer than seven observations are available at those positions.
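To make the leading NaN values concrete, here is a minimal sketch on a made-up ten-day series (the dates and values are hypothetical, not taken from the births dataset). It also shows the min_periods parameter of rolling(), which, if leading NaN values are unwanted, lets the average be computed from however many observations are available so far.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: the values 1..10 on ten consecutive days.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
demo = pd.DataFrame({"Data": np.arange(1, 11, dtype=float)}, index=dates)

# Default behaviour: a result appears only once 7 observations are available.
demo["Rolling_Avg"] = demo["Data"].rolling(window=7).mean()

# min_periods=1 averages whatever is available so far, so there is no leading NaN.
demo["Rolling_Avg_Partial"] = demo["Data"].rolling(window=7, min_periods=1).mean()

nan_count = int(demo["Rolling_Avg"].isna().sum())
first_full = demo["Rolling_Avg"].iloc[6]  # mean of 1..7
print(demo)
```

Note that the partial averages at the start are based on fewer points, so they are noisier than the rest of the smoothed line.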
Step-by-Step Guide
Let's walk through the process of creating a time series plot with a rolling average:
Step 1: Import Libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Step 2: Load the Dataset
For this example, we'll use a dataset containing daily female births in California for the year 1959. The CSV file is hosted on GitHub, and Pandas can read it directly from the URL:
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv'
df = pd.read_csv(url, parse_dates=['Date'], index_col='Date')
df.head()
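Before smoothing, it can be worth checking that the index really was parsed as dates and that no days are missing, since a row-count window only corresponds to calendar days when the series is complete. The sketch below runs those checks on a small inline sample in the same Date/Births layout; the rows are made up for illustration (one day is deliberately left out), and the same checks could be applied to df.

```python
import io
import pandas as pd

# Hypothetical sample in the same two-column layout; the values are made up
# and 2024-01-04 is deliberately missing.
csv_text = """Date,Births
2024-01-01,35
2024-01-02,32
2024-01-03,30
2024-01-05,44
"""
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

is_datetime_index = isinstance(sample.index, pd.DatetimeIndex)
is_sorted = sample.index.is_monotonic_increasing

# Dates that a complete daily calendar would contain but the sample lacks.
full_range = pd.date_range(sample.index.min(), sample.index.max(), freq="D")
missing_dates = full_range.difference(sample.index)

print(is_datetime_index, is_sorted, list(missing_dates))
```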
Step 3: Plot the Original Time Series
Before applying the rolling average, let's visualize the original data:
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x=df.index, y='Births')
plt.title('Daily Female Births in California (1959)')
plt.xlabel('Date')
plt.ylabel('Number of Births')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
This plot shows the daily number of female births, which exhibits significant fluctuations.
Step 4: Compute the Rolling Average
Now, let's compute a 7-day rolling average to smooth the data:
df['7-day Rolling Avg'] = df['Births'].rolling(window=7).mean()
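A window given as an integer counts rows, so it spans seven days only if every day is present. As a possible alternative, rolling() also accepts a time offset string such as '7D' when the index is a sorted DatetimeIndex. The sketch below contrasts the two on a hypothetical series with a two-day gap, using a window of 3 for brevity; the series and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical series with a gap: 2024-01-04 and 2024-01-05 are missing.
idx = pd.to_datetime(
    ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-06", "2024-01-07"]
)
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=idx)

# Row-based window: always the last 3 rows, regardless of the dates they carry.
by_rows = s.rolling(window=3).mean()

# Time-based window: only the rows whose timestamps fall in the last 3 days.
by_days = s.rolling("3D").mean()

print(pd.DataFrame({"value": s, "by_rows": by_rows, "by_days": by_days}))
```

On 2024-01-06 the row-based average still includes values from before the gap, while the time-based average uses only that day's value. Time-based windows also default to min_periods=1, so they do not produce leading NaN values.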
Step 5: Plot the Rolling Average
Next, we'll plot both the original data and the rolling average on the same graph:
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x=df.index, y='Births', label='Original Data')
sns.lineplot(data=df, x=df.index, y='7-day Rolling Avg', label='7-day Rolling Average', color='red')
plt.title('Daily Female Births with 7-day Rolling Average')
plt.xlabel('Date')
plt.ylabel('Number of Births')
plt.xticks(rotation=45)
plt.legend()
plt.tight_layout()
plt.show()
The red line represents the 7-day rolling average, which smooths out the daily fluctuations and makes the underlying trend more apparent.
Customizing the Rolling Average
You can adjust the window size of the rolling average to suit your data:
- Shorter Window: A smaller window (e.g., 3 days) responds more quickly to changes but may still be noisy.
- Larger Window: A larger window (e.g., 30 days) provides a smoother trend but may lag behind rapid changes.
To apply a different window size, simply change the window parameter:
df['30-day Rolling Avg'] = df['Births'].rolling(window=30).mean()
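The lag mentioned above comes from the window looking only backwards. If the smoothed line is meant for retrospective inspection rather than real-time use, one option is center=True, which places each average at the middle of its window. The sketch below measures the difference on a hypothetical step series (an assumption for illustration, with a window of 5).

```python
import pandas as pd

# Hypothetical series: flat at 0 for ten days, then a step up to 10.
dates = pd.date_range("2024-01-01", periods=20, freq="D")
step = pd.Series([0.0] * 10 + [10.0] * 10, index=dates)

trailing = step.rolling(window=5).mean()               # window ends at each date
centered = step.rolling(window=5, center=True).mean()  # window is centered on each date

# First date on which each smoothed series reaches at least half the step height.
trailing_cross = trailing[trailing >= 5].index[0]
centered_cross = centered[centered >= 5].index[0]

print(trailing_cross.date(), centered_cross.date())
```

A centered window uses future observations, so it leaves NaN values at both ends of the series and is not suitable when the latest value must be smoothed as it arrives.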
Conclusion
Applying a rolling average to time series data is a powerful technique for smoothing out short-term fluctuations and highlighting longer-term trends. By using Pandas and Seaborn, you can easily compute and visualize rolling averages to gain better insights into your data. Experiment with different window sizes to find the best balance between smoothness and responsiveness for your specific dataset.