How to Make a Scatter Plot with a Regression Line Using Seaborn in Python
Creating Scatter Plots with Regression Lines using Seaborn in Python
Understanding the relationship between two continuous variables is a fundamental part of data analysis, and scatter plots are one of the most direct ways to visualize it. Adding a regression line summarizes the trend: its slope shows the direction and rate of change, and the spread of points around it indicates how strong the relationship is. Seaborn, a Python data visualization library built on Matplotlib, provides simple functions for drawing scatter plots with regression lines.
What is a Scatter Plot with a Regression Line?
A scatter plot displays individual data points on a two-dimensional plane, with each point representing one observation in the dataset. Fitting a regression line through these points, typically by least squares, models how one variable changes with the other. This line helps you understand trends, make predictions, and spot outliers that fall far from the overall pattern.
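To make "fitting a regression line" concrete: the straight line that regplot() draws by default is an ordinary least-squares fit, whose slope and intercept minimize the sum of squared vertical distances (residuals) between the points and the line. Here is a minimal NumPy sketch on a small made-up dataset (the numbers are illustrative only, not taken from any real data) that computes the fit with the closed-form formulas and checks it against np.polyfit:

```python
import numpy as np

# Illustrative data only: y is roughly 2x + 1 with a little noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Closed-form least-squares estimates:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# np.polyfit with deg=1 solves the same problem and returns [slope, intercept]
poly_slope, poly_intercept = np.polyfit(x, y, deg=1)

# Residuals: vertical distance of each point from the fitted line
residuals = y - (intercept + slope * x)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")  # prints: slope=1.970, intercept=1.090
```

Points with unusually large residuals are the natural candidates for the outliers mentioned above.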
Using Seaborn's regplot() Function
Seaborn's regplot() function plots the data together with a fitted linear regression model. Here's how you can use it:
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset
df = sns.load_dataset('tips')
# Create a scatter plot with a regression line
sns.regplot(x='total_bill', y='tip', data=df)
# Display the plot
plt.show()
This code generates a scatter plot of tip against total_bill from the tips example dataset, with a fitted regression line and a shaded 95% confidence band around it.
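Two practical details are worth knowing: sns.load_dataset() downloads the example data from Seaborn's online repository (so it needs an internet connection), and regplot() returns the Matplotlib Axes it drew on rather than the fitted coefficients. The sketch below assumes a synthetic DataFrame whose column names merely mirror the tips data, runs offline with the non-interactive Agg backend, and reads the drawn line back from the Axes to confirm it matches a least-squares fit from np.polyfit. It also assumes the regression line is the only Line2D on a fresh Axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data: tip is roughly 15% of the bill plus noise
rng = np.random.default_rng(0)
bill = rng.uniform(5, 50, size=100)
df = pd.DataFrame({"total_bill": bill, "tip": 0.15 * bill + rng.normal(0, 0.5, size=100)})

# regplot returns the Axes it drew on, not the regression coefficients
ax = sns.regplot(x="total_bill", y="tip", data=df)

# The regression line is the only Line2D on this Axes
line = ax.lines[0]
xs, ys = line.get_xdata(), line.get_ydata()

# Compare the drawn line with an ordinary least-squares fit computed directly
slope, intercept = np.polyfit(df["total_bill"], df["tip"], deg=1)
matches = np.allclose(ys, slope * xs + intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, line matches fit: {matches}")

# Under Agg, save with plt.savefig("regplot.png") instead of plt.show() if you need an image
plt.close()
```

If you need the coefficients themselves (for reporting or prediction), computing them separately like this is simpler than extracting them from the plot.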
Customizing the Plot
regplot() accepts several parameters for customizing the plot:
- Confidence interval: By default, regplot() draws a 95% confidence interval, estimated by bootstrap resampling, around the regression line. To remove it, set ci=None.
- Marker style: Change the marker style with the marker parameter, for example marker='o' for circles (the default) or marker='x' for crosses.
- Line color: Adjust the color of the regression line with the line_kws parameter. For instance, line_kws={'color': 'red'} makes the line red.
- Scatter size: Modify the size of the scatter points with scatter_kws. For example, scatter_kws={'s': 50} sets the marker area to 50 points squared, slightly larger than Matplotlib's usual default of 36.
Here's an example with some customizations:
sns.regplot(x='total_bill', y='tip', data=df, ci=None, marker='o', line_kws={'color': 'red'}, scatter_kws={'s': 50})
plt.show()
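One way to confirm where each of these parameters ends up is to draw the plot offline and inspect the resulting Matplotlib artists. The sketch below assumes a synthetic DataFrame in place of the tips data (the column names only mirror it) and uses marker='x' to show the other marker option. It also assumes that, with ci=None, the scatter points are the only collection and the regression line is the only Line2D on a fresh Axes, since no confidence band is drawn.

```python
import matplotlib
matplotlib.use("Agg")  # render offline, no window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical synthetic data standing in for the tips dataset
rng = np.random.default_rng(1)
bill = rng.uniform(5, 50, size=60)
df = pd.DataFrame({"total_bill": bill, "tip": 0.15 * bill + rng.normal(0, 0.5, size=60)})

fig, ax = plt.subplots()
sns.regplot(
    x="total_bill", y="tip", data=df, ax=ax,
    ci=None,                    # no confidence band is drawn
    marker="x",                 # cross markers for the scatter points
    line_kws={"color": "red"},  # forwarded to the call that draws the regression line
    scatter_kws={"s": 50},      # forwarded to ax.scatter; s is the marker area in points^2
)

points = ax.collections[0]  # the scatter points (a PathCollection)
line = ax.lines[0]          # the regression line (a Line2D)
print(len(ax.collections), line.get_color(), points.get_sizes())
plt.close(fig)
```

The design point is that line_kws and scatter_kws are plain dictionaries passed through to Matplotlib, so any keyword the underlying plot or scatter call understands can go in them.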
Using Seaborn's lmplot() Function
Seaborn also provides lmplot(), which combines regplot() with FacetGrid to fit regression models across subsets of the dataset, either as separately colored lines on one plot (hue) or as separate panels (col and row). This is useful when you want to explore how the relationship between variables changes across categories.
sns.lmplot(x='total_bill', y='tip', data=df, hue='sex')
plt.show()
In this example, hue='sex' fits a separate regression line for each value of the sex column (Male and Female in the tips data), colors each line and its points accordingly, and adds a legend, so the two trends can be compared directly.
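Unlike regplot(), lmplot() is a figure-level function: it returns a FacetGrid rather than a single Axes. The sketch below assumes a synthetic DataFrame with a hypothetical group column and two made-up linear trends, sets ci=None to skip the bootstrap bands, passes hue_order so the line order is known, and reads the slope of each fitted line back from the grid's single Axes (g.ax):

```python
import matplotlib
matplotlib.use("Agg")  # render offline, no window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two groups whose y grows with x at different rates
rng = np.random.default_rng(2)
frames = []
for group, true_slope in [("A", 0.5), ("B", 2.0)]:
    xv = rng.uniform(0, 10, size=100)
    yv = true_slope * xv + rng.normal(0, 1, size=100)
    frames.append(pd.DataFrame({"x": xv, "y": yv, "group": group}))
df = pd.concat(frames, ignore_index=True)

# lmplot returns a FacetGrid; with hue but no row/col there is a single Axes, g.ax
g = sns.lmplot(x="x", y="y", data=df, hue="group", hue_order=["A", "B"], ci=None)

# One straight regression line per hue level, drawn in hue_order;
# the slope of a straight line can be read from its two endpoints
fitted_slopes = []
for line in g.ax.lines:
    xs, ys = line.get_xdata(), line.get_ydata()
    fitted_slopes.append((ys[-1] - ys[0]) / (xs[-1] - xs[0]))
print([round(s, 2) for s in fitted_slopes])
plt.close("all")
```

The two slopes should come out close to the 0.5 and 2.0 used to generate the data.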
Conclusion
Seaborn's regplot() and lmplot() functions provide powerful tools for visualizing relationships between variables and fitting regression lines. By customizing these plots, you can gain deeper insights into your data and communicate your findings effectively.