Fit line scatter plot python


Regression - How to program the Best Fit Line

Welcome to the 9th part of our machine learning regression tutorial within our Machine Learning with Python tutorial series. We've been working on calculating the regression, or best-fit, line for a given dataset in Python. Previously, we wrote a function to calculate the slope, and now we need to calculate the y-intercept. Our code up to this point:

from statistics import mean
import numpy as np

xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

def best_fit_slope(xs,ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    return m

m = best_fit_slope(xs,ys)
print(m)

As a reminder, the calculation for the best-fit line's y-intercept is:

b = mean(ys) - (m*mean(xs))

This one will be a bit easier than the slope was. We can save a few lines by incorporating this into our other function. We'll rename it to best_fit_slope_and_intercept.

Next, we can fill in: b = mean(ys) - (m*mean(xs)), and return m and b:

def best_fit_slope_and_intercept(xs,ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    
    b = mean(ys) - m*mean(xs)
    
    return m, b

Now we can call upon it with: m, b = best_fit_slope_and_intercept(xs,ys)

Our full code up to this point:

from statistics import mean
import numpy as np

xs = np.array([1,2,3,4,5], dtype=np.float64)
ys = np.array([5,4,6,5,6], dtype=np.float64)

def best_fit_slope_and_intercept(xs,ys):
    m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
         ((mean(xs)*mean(xs)) - mean(xs*xs)))
    
    b = mean(ys) - m*mean(xs)
    
    return m, b

m, b = best_fit_slope_and_intercept(xs,ys)

print(m,b)

Output should be approximately: 0.3 4.3 (floating-point rounding may show a few extra digits)
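As a quick sanity check, the hand-written formula can be compared against NumPy's built-in least-squares fit. This is a minimal sketch rather than part of the tutorial's own code: it assumes np.polyfit with degree 1, which returns the coefficients with the slope first and the intercept second, and it uses NumPy's mean instead of statistics.mean for brevity.

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 5], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6], dtype=np.float64)

def best_fit_slope_and_intercept(xs, ys):
    # Same formula as in the tutorial, written with NumPy's mean
    m = ((xs.mean() * ys.mean()) - (xs * ys).mean()) / \
        ((xs.mean() * xs.mean()) - (xs * xs).mean())
    b = ys.mean() - m * xs.mean()
    return m, b

m, b = best_fit_slope_and_intercept(xs, ys)

# Degree-1 polyfit returns [slope, intercept]
m_check, b_check = np.polyfit(xs, ys, 1)

# Compare with a tolerance rather than ==, since both are floating-point results
print(np.isclose(m, m_check), np.isclose(b, b_check))
```

If both values print as True, the two approaches agree on this dataset.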

Now we just need to create a line for the data:

Recall that y=mx+b. We could make a function for this, or just knock it out with a one-line list comprehension:

regression_line = [(m*x)+b for x in xs]

The list comprehension above is equivalent to:

regression_line = []
for x in xs:
    regression_line.append((m*x)+b)
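Because xs is a NumPy array, the same line can also be computed without any explicit loop. The sketch below is an alternative to the tutorial's approach, not a replacement for it; the values of m and b are the rounded results for this dataset and are hard-coded here only to keep the example self-contained.

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 5], dtype=np.float64)
m, b = 0.3, 4.3  # rounded slope and intercept for this dataset

# List-comprehension version from the text
regression_line = [(m * x) + b for x in xs]

# Vectorized version: NumPy applies the arithmetic element-wise
regression_line_np = m * xs + b

print(np.allclose(regression_line, regression_line_np))
```

The vectorized form returns a NumPy array rather than a Python list, which plt.plot accepts just as well.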

Great, let's reap the fruits of our labor finally! Add the following imports:

import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')

This lets us draw graphs, and the ggplot style makes them look a bit nicer. Now, at the end of the script:

plt.scatter(xs,ys,color='#003F72')
plt.plot(xs, regression_line)
plt.show()

First we draw a scatter plot of the existing data, then we graph our regression line, and finally we show the figure. If you're not familiar with Matplotlib, you can check out the Data Visualization with Python and Matplotlib tutorial series.

Output: [figure: scatter plot of the data points with the regression line]

Congratulations on making it this far! So, how might you go about actually making a prediction based on the model you just made? Simple enough: you have your model, so you just fill in x. For example, let's predict a point:

predict_x = 7

We have our input data, our "feature" so to speak. What's the label?

predict_y = (m*predict_x)+b
print(predict_y)

Output: approximately 6.4 (floating-point rounding may show a few extra digits)
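To predict several points at once, the same y = mx + b calculation can be wrapped in a small function. This is only a sketch: predict is a hypothetical helper name that does not appear in the tutorial, and m and b are hard-coded to the rounded values for this dataset.

```python
import numpy as np

m, b = 0.3, 4.3  # rounded slope and intercept from the fit above

def predict(x, m, b):
    # Hypothetical helper: applies y = mx + b to a scalar or a sequence of x values
    return m * np.asarray(x, dtype=np.float64) + b

predictions = predict([7, 8, 9], m, b)
print(predictions)
```

The result is an array with one predicted label per input feature value.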

We can even graph it:

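predict_x = 7
predict_y = (m*predict_x)+b

plt.scatter(xs,ys,color='#003F72',label='data')
# plot the predicted point so it actually appears on the graph
plt.scatter(predict_x,predict_y,color='g',label='prediction')
plt.plot(xs, regression_line, label='regression line')
plt.legend(loc=4)
plt.show()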

Output: [figure: scatter plot of the data with the regression line and a legend]

We now know how to create our own models, which is great, but we're still missing something integral: how accurate is our model? This is the topic for discussion in the next tutorial!


Visualization and understanding with python

One of my favorite, if niche, charts is the scatterplot! If you work in Data Science and have a wide range of statistical analyses to perform, the scatterplot is a friendly tool. Scatterplots are extremely useful for examining the relationship between two numeric, quantitative series, and they are common in both technical and non-technical fields.

What is a scatterplot?

A scatterplot shows the relationship between two quantitative variables using the X and Y axes. These plots are more often used to understand data than to communicate it. Unlike line plots, scatterplots show dots to focus on individual data points. Scatterplots are best used to:
1. Unveil any patterns
2. Find the relationship between two sets of data

Read a Scatterplot

When using a scatterplot, we have to present the data thoughtfully for our audience. We may need to break the chart down to explain how to read it.

1. Scan each axis: When the data contains multiple variables, it may be difficult for our audience to determine which variable each axis represents.

2. Visualise section by section: We can create sections by grouping the points into quadrants. This helps reveal where natural breaks and groupings exist, and helps us make sense of the comparison.

3. Identify the shape: While plotting, it’s better to summarize the individual points into a unified shape. Some questions to ask ourselves:
i) Are all the dots moving in the same direction?
ii) Do they follow something like an exponential curve?
iii) Do the dots rise as my eyes move along the axis?

Now, we shall explore the relationship between weight and height in a dataset using Python, Pandas, and Jupyter Notebook to understand the scatterplot visually.

[Figure: Jupyter notebook screenshot]

In the above notebook, we use a dataset to see how height (y-axis) varies with weight (x-axis) for a group of school students.
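Since the notebook itself is not reproduced here, the following is a minimal sketch of how such a plot could be built with Pandas and Matplotlib. The column names weight_kg and height_cm and all of the values are made up for illustration; they are not the dataset from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative values, NOT the dataset from the notebook
df = pd.DataFrame({
    "weight_kg": [32, 35, 38, 41, 45, 50],
    "height_cm": [135, 138, 142, 147, 150, 156],
})

fig, ax = plt.subplots()
ax.scatter(df["weight_kg"], df["height_cm"])  # one dot per student
ax.set_xlabel("Weight (kg)")   # x-axis: weight
ax.set_ylabel("Height (cm)")   # y-axis: height

# A positive correlation means the dots rise from left to right
corr = df["weight_kg"].corr(df["height_cm"])
print(corr > 0)
```

Labelling both axes explicitly addresses the first reading tip above: the audience can tell at a glance which variable is on which axis.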

Best Fit Line

The line of best fit, or best-fit line (also called a “trend” line), is a straight line drawn on the scatterplot that may pass through the center of the data points, through none of the points, or through all of them.

As we know, the equation of a straight line is:

y = mx + b

where m is the slope of the line and b is the y-intercept. We already have our X and y values, so now we need to calculate m and b. The formulas for these can be written as:

m = ((mean(x)*mean(y)) - mean(x*y)) /
    ((mean(x)*mean(x)) - mean(x*x))

b = mean(y) - m*mean(x)

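The slope can equivalently be written with NumPy dot products, in which case the denominator is:

denom = X.dot(X) - X.mean()*X.sum()

For n data points this equals n*(mean(x*x) - mean(x)*mean(x)), which is the denominator above multiplied by -n, so the numerator must be scaled by the same factor.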
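The sketch below checks that the dot-product form and the mean-based form give the same slope. Only the denominator comes from the text; the matching numerators for m and b are an assumption, filled in from the standard least-squares solution, and the sample data is reused from the first tutorial.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=np.float64)
y = np.array([5, 4, 6, 5, 6], dtype=np.float64)

# Mean-based form of the slope
m_means = ((X.mean() * y.mean()) - (X * y).mean()) / \
          ((X.mean() * X.mean()) - (X * X).mean())

# Dot-product form: numerator and denominator are both scaled by -n,
# so the ratio (the slope) is unchanged
denom = X.dot(X) - X.mean() * X.sum()
m_dot = (X.dot(y) - y.mean() * X.sum()) / denom                # assumed numerator
b_dot = (y.mean() * X.dot(X) - X.mean() * X.dot(y)) / denom    # assumed numerator

print(np.isclose(m_means, m_dot))
```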

[Figure: scatterplot with the best-fit line drawn in green]

The green line above, which passes through the cluster of data points, is called the best-fit line of the data.

Conclusion: A best-fit line should be interpreted with care. If the trend in the underlying data is ambiguous, drawing a line through the points can confuse rather than clarify.

How do you fit a line on a scatter plot in Python?

How to plot a line of best fit in Python:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 3, 5, 7])
y = np.array([6, 3, 9, 5])
m, b = np.polyfit(x, y, 1)  # m = slope, b = intercept
plt.plot(x, y, 'o')  # create scatter plot
plt.plot(x, m*x + b)  # add line of best fit
plt.show()

How do you fit a graph in Python?

data = dataframe.values ...
x, y = data[:, 4], data[:, -1]
# curve fit
popt, _ = curve_fit(objective, x, y)
# summarize the parameter values
a, b = popt
print('y = %.5f * x + %.5f' % (a, b))
# plot input vs output
pyplot.scatter(x, y) ...
x_line = arange(min(x), max(x), 1) ...
y_line = objective(x_line, a, b)
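The snippet above omits the imports, the data, and the definition of objective. A self-contained sketch might look like the following, assuming a straight-line objective and made-up sample data in place of the elided dataframe columns; scipy.optimize.curve_fit is the only SciPy call used.

```python
import numpy as np
from scipy.optimize import curve_fit

def objective(x, a, b):
    # Assumed model: a straight line y = a*x + b
    return a * x + b

# Made-up sample data standing in for the elided dataframe columns
x = np.array([1, 2, 3, 4, 5], dtype=np.float64)
y = np.array([5, 4, 6, 5, 6], dtype=np.float64)

# curve_fit returns the optimal parameters and their covariance matrix
popt, _ = curve_fit(objective, x, y)
a, b = popt
print('y = %.5f * x + %.5f' % (a, b))

# Evenly spaced inputs covering the observed range, and the fitted line over them
x_line = np.arange(min(x), max(x) + 1, 1)
y_line = objective(x_line, a, b)
```

Swapping in a different objective function (for example, a quadratic) is what makes curve_fit more general than a plain linear fit.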

How do you fit a regression line in Python?

Multiple Linear Regression With scikit-learn:
Steps 1 and 2: Import packages and classes, and provide data. First, you import numpy and sklearn.linear_model.LinearRegression and provide known inputs and output: ...
Step 3: Create a model and fit it. ...
Step 4: Get results. ...
Step 5: Predict response.
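The steps above can be sketched as follows. The input values are made up, and the outputs are deliberately constructed as y = 1 + 2*x1 + 3*x2 so that the fitted coefficients are easy to check; none of this data comes from the original article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Steps 1 and 2: made-up inputs (two features per row) and outputs,
# constructed as y = 1 + 2*x1 + 3*x2 so the fit can be verified
x = np.array([[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15]])
y = np.array([4, 14, 37, 66, 104, 136])

# Step 3: create a model and fit it
model = LinearRegression().fit(x, y)

# Step 4: get results (R^2 score, intercept, and one coefficient per feature)
r_sq = model.score(x, y)
print(r_sq, model.intercept_, model.coef_)

# Step 5: predict the response for the known inputs
y_pred = model.predict(x)
```

With real, noisy data the score would be below 1 and the coefficients would only approximate the underlying relationship.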