Pearson Correlation Coefficient – A Simple Guide & Examples

The Pearson correlation coefficient is the most common measure of linear correlation between two variables.

It is widely used in experimental research.

This article explains how to calculate and interpret it, with worked examples.1

Pearson Correlation Coefficient – In a Nutshell

  • The Pearson correlation coefficient is a descriptive statistic that measures the strength and direction of the linear relationship between two variables.
  • In simpler terms, the Pearson correlation coefficient summarizes one characteristic of a dataset: how closely two variables move together.
  • This article explains the types of correlation the coefficient can indicate: positive, negative, and none.
  • It also outlines the steps for calculating the Pearson correlation coefficient.

Definition: Pearson correlation coefficient

The Pearson correlation coefficient is a descriptive statistic that measures the strength and direction of the linear relationship between two variables.

In simpler terms, it summarizes how closely two variables in a dataset move together.2

The Pearson correlation coefficient is also known as:

  • Bivariate correlation
  • The correlation coefficient
  • Pearson’s r
  • Pearson product-moment correlation coefficient (PPMCC)

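Its formula is:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Here, $n$ is the number of paired observations, and $x$ and $y$ are the values of the two variables.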

Types of Pearson correlation coefficients

The Pearson correlation coefficient is a number between -1 and 1 that measures the strength and direction of the relationship between two variables. The table below summarizes how to interpret it.

| Pearson correlation coefficient (r) | Correlation type | Interpretation | Example |
|---|---|---|---|
| Between 0 and 1 | Positive correlation | When one variable changes, the other tends to change in the same direction | Height and weight of a person: taller people tend to weigh more |
| 0 | No correlation | The variables are not linearly related | Cost of shoes and width of cars: the price of shoes has no bearing on the width of cars, and vice versa |
| Between 0 and -1 | Negative correlation | When one variable changes, the other tends to change in the opposite direction | Elevation and temperature: the higher you go, the lower the temperature |

[Figures: scatter plots illustrating positive correlation, negative correlation, and no correlation]

The interpretation of the effect size (the strength of the relationship) varies by discipline. However, the following general rules of thumb are widely used.3

| Pearson correlation coefficient (r) value | Strength | Direction |
|---|---|---|
| Greater than .5 | Strong | Positive |
| .3 to .5 | Moderate | Positive |
| 0 to .3 | Weak | Positive |
| 0 | None | None |
| 0 to -.3 | Weak | Negative |
| -.3 to -.5 | Moderate | Negative |
| Less than -.5 | Strong | Negative |
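
As a rough illustration, the rule-of-thumb table above can be written as a small Python function. The function name `describe_correlation` is hypothetical, and treating the exact boundary values .3 and .5 as moderate is an assumption, since the table does not say which band they belong to.

```python
def describe_correlation(r: float) -> tuple:
    """Return (strength, direction) for a Pearson r, following the rule-of-thumb table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return ("None", "None")

    size = abs(r)
    if size > 0.5:
        strength = "Strong"
    elif size >= 0.3:
        # Assumption: the boundary values .3 and .5 are counted as moderate.
        strength = "Moderate"
    else:
        strength = "Weak"

    direction = "Positive" if r > 0 else "Negative"
    return (strength, direction)


print(describe_correlation(0.46))   # ('Moderate', 'Positive')
print(describe_correlation(-0.7))   # ('Strong', 'Negative')
```

Because thresholds differ between disciplines, the cut-offs would likely need adjusting before being used in a specific field.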

The Pearson correlation coefficient is not only a descriptive statistic. It is also an inferential statistic, so it can be used to test statistical hypotheses.

Visualizing the Pearson correlation coefficient

You can visualize Pearson’s r as a measure of how close the observations are to a line of best fit. The sign of r also tells you whether the slope of that line is positive or negative.

For example, if the slope is negative, r is negative, and if the slope is positive, r is positive.

When r is 1 or -1, all the observations fall exactly on the line of best fit.

Pearson correlation coefficient vs. Spearman’s rank correlation coefficient

Besides the Pearson correlation coefficient, another popular correlation coefficient is Spearman’s rank correlation coefficient.

It is the go-to method when at least one of the following is true:

  • The variables are ordinal
  • The variables are not normally distributed
  • The data contain outliers
  • The variables have a non-linear but monotonic relationship
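
To make the difference concrete, here is a minimal Python sketch. It relies on the fact that Spearman’s coefficient is the Pearson coefficient computed on the ranks of the values. The helper names (`pearson_r`, `ranks`, `spearman_rho`) and the toy data are hypothetical, and the ranking helper assumes there are no tied values.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def ranks(values):
    """Rank values from 1 (smallest) upward. Assumption: no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Toy data with a monotonic but non-linear relationship (y = x cubed).
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

print(round(pearson_r(xs, ys), 3))     # below 1, because the relationship is not linear
print(round(spearman_rho(xs, ys), 3))  # 1.0, because the relationship is perfectly monotonic
```

Real data often contain ties, which this sketch does not handle; a statistics package would normally assign average ranks in that case.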

Pearson correlation coefficient: Calculation

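The formula for calculating the Pearson correlation coefficient is:4

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Although the formula can be applied by hand, software tools such as R or Excel can calculate the Pearson correlation coefficient for you.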
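
For readers who prefer code to a spreadsheet, the sum-based formula can be sketched in Python as follows. This is only an illustration: the function name `pearson_r` and the toy data are hypothetical, and only the standard library is used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient using the sum-based formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y
    spread = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if spread <= 0:
        # r is undefined when either variable is constant.
        raise ValueError("r is undefined when a variable has no variation")
    return numerator / math.sqrt(spread)


# Toy data: a perfectly linear increasing and decreasing relationship.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0
```

The explicit check for zero variation is a design choice of this sketch; without it, the division would fail with a less helpful error.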

Example:

You are researching the relationship between the weight and length of newborn babies and have data from 10 babies born within the last four weeks at a local clinic. After converting the imperial measurements to metric units, you enter the data in this table:3

| Weight (kg) | Length (cm) |
|---|---|
| 3.33 | 52.9 |
| 3.63 | 53.2 |
| 3.02 | 49.7 |
| 3.82 | 48.4 |
| 3.59 | 54.9 |
| 3.42 | 54.2 |
| 2.87 | 43.7 |
| 3.36 | 54.4 |
| 3.03 | 47.2 |
| 3.46 | 45.2 |

Step 1: Calculating the sums of x and y

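The first step is renaming the variables from weight and length to $x$ and $y$. Next, add up all the $x$ values and all the $y$ values, as indicated in the formula by the symbol $\sum$ (“take the sum of”).

Example:

Weight = $x$

Length = $y$

$\sum x = 3.33 + 3.63 + 3.02 + 3.82 + 3.59 + 3.42 + 2.87 + 3.36 + 3.03 + 3.46$

$\sum y = 52.9 + 53.2 + 49.7 + 48.4 + 54.9 + 54.2 + 43.7 + 54.4 + 47.2 + 45.2$

Therefore:

$\sum x = 33.53$ and $\sum y = 503.8$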

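Step 2: Calculating x² and y² and their sums

Next, create two new columns containing the squares of the values in $x$ and $y$. Then, calculate the sums of the new columns.

Example:

| x | y | x² | y² |
|---|---|---|---|
| 3.33 | 52.9 | 11.09 | 2798.4 |
| 3.63 | 53.2 | 13.18 | 2830.2 |
| 3.02 | 49.7 | 9.12 | 2470.1 |
| 3.82 | 48.4 | 14.59 | 2342.6 |
| 3.59 | 54.9 | 12.89 | 3014.0 |
| 3.42 | 54.2 | 11.70 | 2937.6 |
| 2.87 | 43.7 | 8.24 | 1909.7 |
| 3.36 | 54.4 | 11.29 | 2959.4 |
| 3.03 | 47.2 | 9.18 | 2227.8 |
| 3.46 | 45.2 | 11.97 | 2043.0 |

$\sum x^2 = 113.2421$ and $\sum y^2 = 25532.88$ (calculated from the unrounded squares)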
Step 3: Calculating the cross product and its sum

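Next, create a column with the products of $x$ and $y$, known as the cross products. Then, calculate the sum of the new column.

Example:

| x | y | x² | y² | xy |
|---|---|---|---|---|
| 3.33 | 52.9 | 11.09 | 2798.4 | 176.16 |
| 3.63 | 53.2 | 13.18 | 2830.2 | 193.12 |
| 3.02 | 49.7 | 9.12 | 2470.1 | 150.1 |
| 3.82 | 48.4 | 14.59 | 2342.6 | 184.9 |
| 3.59 | 54.9 | 12.89 | 3014.0 | 197.1 |
| 3.42 | 54.2 | 11.70 | 2937.6 | 185.4 |
| 2.87 | 43.7 | 8.24 | 1909.7 | 125.4 |
| 3.36 | 54.4 | 11.29 | 2959.4 | 182.8 |
| 3.03 | 47.2 | 9.18 | 2227.8 | 143.0 |
| 3.46 | 45.2 | 11.97 | 2043.0 | 156.4 |

$\sum xy = 1694.321$ (calculated from the unrounded products)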
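
The column sums from Steps 1 to 3 can be checked with a few lines of Python. This is a sketch using the weights and lengths from the example table; the variable names are hypothetical.

```python
# Data from the example table: x = weight in kg, y = length in cm.
weights = [3.33, 3.63, 3.02, 3.82, 3.59, 3.42, 2.87, 3.36, 3.03, 3.46]
lengths = [52.9, 53.2, 49.7, 48.4, 54.9, 54.2, 43.7, 54.4, 47.2, 45.2]

sum_x = sum(weights)                                    # Step 1
sum_y = sum(lengths)                                    # Step 1
sum_x2 = sum(x * x for x in weights)                    # Step 2
sum_y2 = sum(y * y for y in lengths)                    # Step 2
sum_xy = sum(x * y for x, y in zip(weights, lengths))   # Step 3

print(f"sum x   = {sum_x:.2f}")    # 33.53
print(f"sum y   = {sum_y:.2f}")    # 503.80
print(f"sum x^2 = {sum_x2:.2f}")   # 113.24
print(f"sum y^2 = {sum_y2:.2f}")   # 25532.88
print(f"sum xy  = {sum_xy:.2f}")   # 1694.32
```

Summing the unrounded squares and products, as done here, avoids the small rounding differences that arise when the rounded table entries are added up instead.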

 


Step 4: Calculating Pearson correlation coefficient r

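Substitute the sums from the previous steps and the sample size ($n = 10$) into the formula to calculate the Pearson correlation coefficient.

Example:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{10(1694.321) - (33.53)(503.8)}{\sqrt{\left[10(113.2421) - 33.53^2\right]\left[10(25532.88) - 503.8^2\right]}}$$

$$r = \frac{50.796}{\sqrt{(8.1601)(1514.36)}} \approx \frac{50.796}{111.16} \approx 0.457$$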
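
The substitution in Step 4 can also be sketched in Python. The five sums below are the ones obtained from the weight and length columns of the example table (using unrounded squares and products); the variable names are hypothetical.

```python
import math

n = 10                                 # number of babies
sum_x, sum_y = 33.53, 503.8            # sums of weights and lengths
sum_x2, sum_y2 = 113.2421, 25532.88    # sums of squared weights and lengths
sum_xy = 1694.321                      # sum of the cross products

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 3))   # 0.457
```

With the table values as listed, the result is roughly 0.46, which falls in the moderate positive range.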






 

 

Pearson correlation coefficient: Significance test

You can use the Pearson correlation coefficient to test if the relationship between two variables is significant.

The sample’s Pearson correlation coefficient, r, is an estimate of rho (ρ), the correlation in the population. Knowing r and the sample size n lets you test whether ρ is significantly different from 0.

  • Null hypothesis (H0): ρ = 0
  • Alternative hypothesis (Ha): ρ ≠ 0

You can use software such as R or Stata to test the hypothesis. Alternatively, you can follow these four steps:

Step 1: Calculating the t value

The t value is calculated with the following formula:5

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

Example:

The weight and length of the 10 babies have a Pearson correlation coefficient of about 0.46 (the table data give r ≈ 0.457). So, $r = 0.457$ and $n = 10$.

Therefore, using the formula above,

$$t = \frac{0.457\sqrt{10 - 2}}{\sqrt{1 - 0.457^2}} \approx 1.45$$

Step 2: Finding the critical value of t

You can find the critical value of t (t*) in a t table. To look it up, you need the following:

  • The degrees of freedom (df), calculated using the formula $df = n - 2$
  • The significance level α, which is usually 0.05
  • Whether the test is one-tailed or two-tailed: two-tailed is usually the appropriate option for correlations

Example:

For a two-tailed significance test at α = 0.05 and df = 8, the critical value is t* = 2.306.

Step 3: Comparing the t value to the critical value

Then, determine whether the absolute t value is greater than the critical value. “Absolute” means that you disregard the minus sign if the t value is negative.

Example:

t = 1.45 and critical value t* = 2.306

Therefore, the t value is less than the critical value.

Step 4: Deciding whether to reject the null hypothesis

  • If the absolute t value is larger than the critical value t*, the relationship is statistically significant (p < α). You can then reject the null hypothesis and support the alternative hypothesis.
  • If the absolute t value is smaller than the critical value t*, the relationship is not statistically significant (p > α). You then cannot reject the null hypothesis or support the alternative hypothesis.

Example:

In our previous example, the correlation between newborns’ weight and length, the t value is less than the critical value. Therefore, we do not reject the null hypothesis that the population correlation coefficient ρ is 0.(6)
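
The four steps of the significance test can be sketched in Python as follows. This is an illustration under stated assumptions: it uses r ≈ 0.457, the value that the example table’s data yield, and it hard-codes 2.306, the standard t-table value for a two-tailed test at α = 0.05 with 8 degrees of freedom. The names `t_statistic` and `T_CRITICAL` are hypothetical.

```python
import math


def t_statistic(r, n):
    """t value for testing whether the population correlation is 0."""
    # Note: this is undefined for r = 1 or r = -1 (division by zero).
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.457, 10            # sample correlation and sample size
t = t_statistic(r, n)       # Step 1
df = n - 2                  # Step 2: degrees of freedom
T_CRITICAL = 2.306          # Step 2: t table, two-tailed, alpha = 0.05, df = 8

reject_null = abs(t) > T_CRITICAL   # Steps 3 and 4

print(f"t = {t:.2f}, df = {df}, reject H0: {reject_null}")
# t = 1.45, df = 8, reject H0: False
```

For other sample sizes or significance levels, the critical value would need to be looked up again rather than reused.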

Pearson correlation coefficient in a thesis

The Pearson correlation coefficient is usually reported in the results section of an academic paper or thesis. Apply the rules below if you want to report it in APA style:3

  • You do not need to provide a reference for the Pearson correlation coefficient
  • Italicize r
  • Do not include a leading zero before the decimal point, because r cannot be greater than 1 or less than -1
  • Provide two digits after the decimal point
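
As a small illustration of the number formatting, the Python sketch below rounds r to two decimal places and omits the leading zero, which is the APA convention for statistics that cannot exceed 1 in absolute value. The function name `format_r_apa` is hypothetical, and plain text cannot show the italics that APA style requires for r.

```python
def format_r_apa(r):
    """Format r with two decimals and no leading zero, e.g. 0.457 -> 'r = .46'."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    text = f"{abs(r):.2f}"        # for example '0.46' or '1.00'
    if text.startswith("0"):
        text = text[1:]           # drop the leading zero: '.46'
    # Avoid printing a minus sign in front of a value that rounds to zero.
    sign = "-" if r < 0 and text != ".00" else ""
    return f"r = {sign}{text}"


print(format_r_apa(0.457))   # r = .46
print(format_r_apa(-0.3))    # r = -.30
```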

FAQs

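The Pearson correlation coefficient is calculated using the formula below:4

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Use the Pearson correlation coefficient in inferential or quantitative statistics, for example to test the correlation between two variables.

It helps test the relationship between two variables. It also indicates the direction in which one variable tends to change when the other changes.4

Calculating it by hand is not trivial. However, once you have mastered the formula, it becomes manageable.4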

Sources

1 Thakur, Madhuri. “Pearson Correlation Coefficient Formula.” EDUCBA. Accessed November 15, 2022. https://www.educba.com/pearson-correlation-coefficient-formula/.

2 Zach. “How to Calculate a Pearson Correlation Coefficient by Hand.” Statology. November 30, 2020. https://www.statology.org/correlation-coefficient-by-hand/.

3 QuestionPro. “Pearson correlation coefficient: Introduction, formula, calculation, and examples.” Accessed November 15, 2022. https://www.questionpro.com/blog/pearson-correlation-coefficient/.

4 Srivastav, Ashish Kumar. “Pearson Correlation Coefficient.” WallStreetMojo. Accessed November 15, 2022. https://www.wallstreetmojo.com/pearson-correlation-coefficient/.

5 Hartin, Travis. “Calculating and Interpreting the Correlation Coefficient.” Study.com. Accessed November 29, 2021. https://study.com/learn/lesson/pearson-correlation-formula-examples.html.

6 M&E studies. “How to Use Pearson Correlation.” Accessed November 15, 2022. http://mnestudies.com/research/how-use-pearson-correlation.