Linear Regression in R – A Guide With Examples

Linear regression is a foundational concept in statistics that fits a straight line to model the relationship between variables. R is a programming language well suited to running this analysis and to visualizing the resulting regression line. It offers built-in functions and add-on packages for statistical tests and graphics. Learn more in this article.

Index

1 Linear Regression in R – In a Nutshell
2 Definition: Linear regression in R
3 Linear regression in R: Getting started
4 Step 1 of simple linear regression in R: Loading data
5 Step 2 of linear regression in R: Assumptions
6 Step 3 of linear regression in R: Analysis
7 Step 4 of linear regression in R: Homoscedasticity
8 Step 5 of linear regression in R: Visualize
9 Step 6 of linear regression in R: Report
10 FAQs

Linear Regression in R – In a Nutshell

Linear regression uses a straight line to describe the relationship between variables.
Linear regression in R is a method to find the line of best fit.
The regression coefficients are chosen to minimize the model’s total error.
The two primary types of linear regression are simple and multiple linear regression.

Definition: Linear regression in R

Linear regression is a supervised learning method used to predict a continuous dependent variable from the values of one or more independent variables.

The two primary forms of linear regression are:

Simple linear regression
Multiple linear regression

Dataset for simple linear regression in R

The first dataset contains observations of adult incomes ranging from $20k to $80k and satisfaction rated on a scale from 1 to 10, in an imaginary sample of 400 individuals. The income values are divided by 10,000 so that they are on a scale similar to the satisfaction scores. An income value of 1 therefore represents $10,000, a value of 2 represents $20,000, and so on.

Dataset for multiple linear regression in R

The second set of data features observations of the percentage of people that drink alcohol, have ulcers, and drive to work every day in an imaginary sample of 500 cities.
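The datasets themselves are not included here. If you want to follow along without them, stand-in data can be simulated. The sketch below is only an assumption about how such data might look: the coefficients, noise levels, value ranges, and seed are invented for illustration. Only the data frame and column names (income.data, ulcers.data, income, satisfaction, driving, drinking, ulcers.disease) follow the examples in this guide.

```r
# Hypothetical stand-in data; all numeric settings below are invented for illustration.
set.seed(42)

# Simple regression: 400 individuals, income in units of $10,000 (2 to 8)
n.people <- 400
income <- runif(n.people, min = 2, max = 8)
satisfaction <- 0.5 + 0.7 * income + rnorm(n.people, sd = 0.7)
satisfaction <- pmin(pmax(satisfaction, 1), 10)  # keep scores on the 1 to 10 scale
income.data <- data.frame(income = income, satisfaction = satisfaction)

# Multiple regression: 500 cities, all variables are percentages
n.cities <- 500
driving <- runif(n.cities, min = 1, max = 75)
drinking <- runif(n.cities, min = 0.5, max = 30)
ulcers.disease <- 15 - 0.2 * driving + 0.178 * drinking + rnorm(n.cities, sd = 0.65)
ulcers.disease <- pmax(ulcers.disease, 0)  # a percentage cannot be negative
ulcers.data <- data.frame(driving = driving, drinking = drinking,
                          ulcers.disease = ulcers.disease)

# Inspect the structure of both data frames
str(income.data)
str(ulcers.data)
```

Because the values are simulated, the results you get from them will not match results from the real datasets.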

Linear regression in R: Getting started

The first step of running linear regression in R is to download and install R and RStudio. Next, open RStudio and click on File, New File, then R Script.

As you proceed from one step to the next, copy and paste the code from this guide directly into your script. Then, run the code by highlighting the relevant lines and clicking on “Run” or pressing Ctrl + Enter.

Run the code below to install the packages needed for your analysis (you only need to do this once):

install.packages("ggplot2")
install.packages("dplyr")
install.packages("broom")
install.packages("ggpubr")

Finally, run this code to load the packages into your R environment (do this each time you restart R):

library(ggplot2)
library(dplyr)
library(broom)
library(ggpubr)

Step 1 of simple linear regression in R: Loading data

Follow the steps below to load your data into R:

In RStudio, go to File, Import Dataset, then choose From Text
Select your data file and the Import Dataset window will show up
The Data Frame window will display an X (index) column along with columns listing the data for each of your variables
Finish by clicking on “Import”
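The same import can also be scripted instead of clicked. A minimal sketch using base R's read.csv() follows. The file here is a small temporary CSV written by the example itself so that the code runs anywhere; its contents are invented, and in practice you would pass the path to your own data file instead.

```r
# Write a tiny example CSV to a temporary file (stand-in for your real data file)
csv.path <- tempfile(fileext = ".csv")
example.rows <- data.frame(income = c(2.5, 4.0, 6.5),
                           satisfaction = c(2.1, 3.4, 5.0))
write.csv(example.rows, csv.path, row.names = FALSE)

# Read the file into a data frame; header = TRUE treats the first row as column names
income.data <- read.csv(csv.path, header = TRUE)

# Confirm that the columns and their types were read as expected
str(income.data)
```

Scripting the import keeps the whole analysis reproducible, since the script records exactly which file was loaded.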

Use summary() to check if the loaded data has been read correctly.

Simple regression

Use this code to see if the simple regression dataset has been correctly loaded:

summary(income.data)

Both variables in our dataset are quantitative, so this function returns a numeric summary table showing the minimum, median, mean, and maximum values of the independent variable (income) and of the dependent variable (satisfaction).

Multiple regression

Use this code to check if the multiple regression dataset has been correctly loaded:

summary(ulcers.data)

Running this function will yield a numeric summary of the data for the independent variables (driving and drinking) and the dependent variable (ulcers.disease).

Step 2 of linear regression in R: Assumptions

Using R, you can check whether your data meet the four key assumptions of linear regression. These assumptions are:

Independence of observations
Normality
Linearity
Homogeneity of variance

Simple regression

Simple regression: Independence of observations

Since there is only one independent variable and one dependent variable, you do not need to test for hidden correlations among the independent variables. However, if you know that your observations are autocorrelated (for example, repeated measurements of the same subject over time), do not use simple linear regression in R. Instead, use a structured model such as a linear mixed-effects model.

Simple regression: Normality

The hist() function will help you check if the dependent variable follows a normal data distribution.

Example

hist(income.data$satisfaction)
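A histogram is a visual check only. As a complement, a normal Q-Q plot and the Shapiro-Wilk test from base R can be used. The sketch below runs on simulated values rather than on the real dataset; the variable name satisfaction.scores and the simulation settings are assumptions made for illustration.

```r
# Simulated stand-in for income.data$satisfaction (invented values)
set.seed(1)
satisfaction.scores <- rnorm(400, mean = 3.5, sd = 1.2)

# Q-Q plot: points close to the reference line suggest an approximately normal distribution
qqnorm(satisfaction.scores)
qqline(satisfaction.scores)

# Shapiro-Wilk test: a small p-value is evidence against normality
normality.test <- shapiro.test(satisfaction.scores)
normality.test$p.value
```

With large samples, formal tests can flag small departures from normality that have little practical effect, so it is reasonable to read the test together with the plots.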

Simple regression: Linearity

You can test linearity using a scatter plot. If the points can be described reasonably well by a straight line, the relationship is linear.

Example

plot(satisfaction ~ income, data = income.data)

Simple regression: Homoscedasticity

This assumption is also known as homogeneity of variance. It means that the prediction error does not change significantly over the model’s prediction range. You can check this assumption after fitting the linear model for simple regression in R.

Multiple regression

Multiple regression: Independence of observations

Use the cor() function to check the correlation between your independent variables and make sure that they are not too highly correlated.

Example

cor(ulcers.data$driving, ulcers.data$drinking)
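With only two independent variables, a single cor() call is enough. When a model has more predictors, passing a data frame to cor() returns the full correlation matrix in one step. The sketch below uses simulated predictors in place of the real data, and the 0.6 cutoff used to flag pairs is an arbitrary value chosen for illustration, not a fixed rule.

```r
# Simulated predictors (invented values) standing in for the columns of ulcers.data
set.seed(2)
predictors <- data.frame(driving = runif(500, 1, 75),
                         drinking = runif(500, 0.5, 30))

# Pairwise correlations between all independent variables
cor.matrix <- cor(predictors)
round(cor.matrix, 3)

# Flag any pair whose absolute correlation exceeds the chosen cutoff.
# upper.tri() restricts the check to each pair once and skips the diagonal.
high.pairs <- which(abs(cor.matrix) > 0.6 & upper.tri(cor.matrix), arr.ind = TRUE)
nrow(high.pairs)  # number of flagged pairs
```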

Multiple regression: Normality

The hist() function will help you test if the dependent variable adheres to a normal distribution.

Example

hist(ulcers.data$ulcers.disease)

If the resulting histogram is roughly bell-shaped, you can continue with the linear regression in R.

Multiple regression: Linearity

You can use two scatter plots to check for linearity: one for driving and ulcers, and another for drinking and ulcers.

Example

plot(ulcers.disease ~ driving, data = ulcers.data)
plot(ulcers.disease ~ drinking, data = ulcers.data)

Proceed with linear regression in R if both relationships appear linear.

Multiple regression: Homoscedasticity

This assumption in linear regression in R is easier to check after model construction.

Step 3 of linear regression in R: Analysis

After determining that your data meet the assumptions of linear regression in R, you can proceed to the analysis for evaluating the link between your variables.

Simple regression

Here, you should check the relationship between income levels and satisfaction scales. You will need to run two code lines to perform simple linear regression in R and check out the results. The first line of code is the linear model, while the second produces the model summary.

Example

income.satisfaction.lm <- lm(satisfaction ~ income, data = income.data)
summary(income.satisfaction.lm)

The output first repeats the model formula and then summarizes the model residuals.

The Coefficients section displays:

The estimates of the model parameters
The standard error of the estimated values
The t-value (test statistic)
The p-value

The final three lines of the output are model diagnostics. The p-value reported there tells you whether there is a significant relationship between the two variables.
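The values printed by summary() can also be pulled out of the summary object, which is useful when you need them for further calculations or reporting. The sketch below fits the model on simulated data, so the numbers are not those of the real dataset; the helper names model.summary, coef.table, income.slope, and income.p are chosen here for illustration.

```r
# Simulated stand-in for income.data (invented values)
set.seed(3)
income <- runif(400, min = 2, max = 8)
income.data <- data.frame(income = income,
                          satisfaction = 0.5 + 0.7 * income + rnorm(400, sd = 0.7))

income.satisfaction.lm <- lm(satisfaction ~ income, data = income.data)
model.summary <- summary(income.satisfaction.lm)

# Coefficient table with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef.table <- coef(model.summary)
coef.table

# Individual values: the slope for income and its p-value
income.slope <- coef.table["income", "Estimate"]
income.p <- coef.table["income", "Pr(>|t|)"]

# Model diagnostics shown in the last lines of the printed summary
model.summary$sigma       # residual standard error
model.summary$r.squared   # multiple R-squared
model.summary$fstatistic  # F statistic with its degrees of freedom
```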

Multiple regression

You can use multiple regression to test the link between ulcers, drinking, and driving. You should use a linear model of ulcers as the dependent variable, and drinking and driving as the independent variables.

Example

ulcers.disease.lm <- lm(ulcers.disease ~ driving + drinking, data = ulcers.data)
summary(ulcers.disease.lm)
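Once the multiple regression model is fitted, it can be used to obtain confidence intervals for the coefficients and predictions for new observations. The following sketch assumes simulated data in place of the real dataset; the city described by new.city (30% driving, 10% drinking) is a made-up example.

```r
# Simulated stand-in for ulcers.data (invented values)
set.seed(4)
ulcers.data <- data.frame(driving = runif(500, 1, 75),
                          drinking = runif(500, 0.5, 30))
ulcers.data$ulcers.disease <- 15 - 0.2 * ulcers.data$driving +
  0.178 * ulcers.data$drinking + rnorm(500, sd = 0.65)

ulcers.disease.lm <- lm(ulcers.disease ~ driving + drinking, data = ulcers.data)

# 95% confidence intervals for the intercept and both slopes
confint(ulcers.disease.lm, level = 0.95)

# Predicted ulcer rate for a hypothetical city, with a confidence interval
new.city <- data.frame(driving = 30, drinking = 10)
predicted <- predict(ulcers.disease.lm, newdata = new.city, interval = "confidence")
predicted  # columns: fit (prediction), lwr and upr (interval bounds)
```

The column names in newdata must match the variable names used in the model formula, otherwise predict() cannot find the predictors.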

Step 4 of linear regression in R: Homoscedasticity

Plotting the residuals of the fitted model will help you check whether the homoscedasticity assumption of linear regression in R is met.

Simple regression

Run plot(income.satisfaction.lm) to check the assumption.

Example

par(mfrow = c(2, 2))
plot(income.satisfaction.lm)
par(mfrow = c(1, 1))

The first line splits the plotting window into two rows and two columns so that all four diagnostic plots are shown at once, and the last line resets it. You can use the resulting residual plots to determine whether the model meets the homoscedasticity assumption.
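The residuals-versus-fitted plot can also be drawn by hand, which makes it clearer what is being checked. The sketch below does this on simulated data and adds a rough numeric comparison of the residual spread in the lower and upper half of the fitted values. That comparison is an informal illustration, not a formal test, and the names model.residuals, model.fitted, and spread.ratio are chosen here for the example.

```r
# Simulated stand-in for income.data (invented values)
set.seed(5)
income <- runif(400, 2, 8)
income.data <- data.frame(income = income,
                          satisfaction = 0.5 + 0.7 * income + rnorm(400, sd = 0.7))
income.satisfaction.lm <- lm(satisfaction ~ income, data = income.data)

model.residuals <- resid(income.satisfaction.lm)
model.fitted <- fitted(income.satisfaction.lm)

# Residuals vs fitted values: look for an even band of points around zero
plot(model.fitted, model.residuals, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Informal check: compare residual spread in the lower and upper half of the fitted values
lower.half <- model.fitted <= median(model.fitted)
spread.ratio <- sd(model.residuals[!lower.half]) / sd(model.residuals[lower.half])
spread.ratio  # values near 1 suggest similar variance across the range
```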

Multiple regression

Use the same code with the multiple regression model:

Example

par(mfrow = c(2, 2))
plot(ulcers.disease.lm)
par(mfrow = c(1, 1))

If the residuals show no bias, the model meets the linear regression in R assumption of homoscedasticity.

Step 5 of linear regression in R: Visualize

The next step is data visualization using a graph. You can plot the data together with the regression line from the linear model so that the results can be shared.

Simple regression

Follow these steps to visualize the results of the simple regression:

1. Plot the data points on a graph

income.graph <- ggplot(income.data, aes(x = income, y = satisfaction)) + geom_point()

income.graph

2. Add linear regression lines to the plotted data

income.graph <- income.graph + geom_smooth(method = "lm", col = "black")

income.graph

3. Add the regression line equation

income.graph <- income.graph +
  stat_regline_equation(label.x = 3, label.y = 7)

income.graph

4. Prepare the graph for publication

income.graph +
  theme_bw() +
  labs(title = "Reported satisfaction as a function of income",
       x = "Income (x$10,000)",
       y = "Satisfaction score (1 to 10)")

This will produce a finished graph that you can include in your papers.
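The four steps can also be written as a single chain. The sketch below is one possible way to do this and makes two assumptions: it runs on simulated data, and it builds the equation label from the fitted coefficients with sprintf() and annotate() rather than with stat_regline_equation(), so that only ggplot2 is needed. The names fit, b, and equation.label are chosen here for illustration.

```r
library(ggplot2)

# Simulated stand-in for income.data (invented values)
set.seed(6)
income <- runif(400, 2, 8)
income.data <- data.frame(income = income,
                          satisfaction = 0.5 + 0.7 * income + rnorm(400, sd = 0.7))

# Fit the model and build the equation label from its coefficients
fit <- lm(satisfaction ~ income, data = income.data)
b <- unname(coef(fit))  # b[1] is the intercept, b[2] is the slope
equation.label <- sprintf("y = %.2f + %.2f x", b[1], b[2])

# Points, regression line, equation, theme, and labels in one chain
income.graph <- ggplot(income.data, aes(x = income, y = satisfaction)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, col = "black") +
  annotate("text", x = 3, y = 7, label = equation.label) +
  theme_bw() +
  labs(title = "Reported satisfaction as a function of income",
       x = "Income (x$10,000)",
       y = "Satisfaction score (1 to 10)")

income.graph
```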

Multiple regression

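Visualizing multiple regression is more challenging than visualizing simple regression, because two independent variables cannot both be placed on the x-axis. Here, driving is plotted on the x-axis and a separate regression line is drawn for each of three levels of drinking (minimum, mean, and maximum).

Follow these steps: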

1. Create a new data frame with the necessary information

plotting.data <- expand.grid(
  driving = seq(min(ulcers.data$driving), max(ulcers.data$driving), length.out = 30),
  drinking = c(min(ulcers.data$drinking), mean(ulcers.data$drinking), max(ulcers.data$drinking))
)

This will create a data frame in the Environment tab that you can click to review.

2. Predict the values of ulcers based on the linear model

plotting.data$predicted.y <- predict.lm(ulcers.disease.lm, newdata = plotting.data)

3. Round the drinking values to two decimals

plotting.data$drinking <- round(plotting.data$drinking, digits = 2)

4. Change the drinking variable into a factor

plotting.data$drinking <- as.factor(plotting.data$drinking)

5. Plot the original data

ulcers.plot <- ggplot(ulcers.data, aes(x = driving, y = ulcers.disease)) + geom_point()

6. Add the regression lines

ulcers.plot <- ulcers.plot +
  geom_line(data = plotting.data, aes(x = driving, y = predicted.y, color = drinking), size = 1.25)

ulcers.plot

7. Prep the graph for publication

ulcers.plot <- ulcers.plot +
  theme_bw() +
  labs(title = "Rates of ulcers (% of population) \n as a function of driving to work and drinking",
       x = "Driving to work (% of population)",
       y = "Ulcers (% of population)",
       color = "Drinking \n (% of population)")

ulcers.plot

8. Add the regression model equation to your graph

ulcers.plot + annotate(geom = "text", x = 30, y = 1.75, label = " = 15 + (-0.2*driving) + (0.178*drinking)")

You can add the finished graph to your paper.
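Two parts of the steps above are easier to follow with a small standalone example: what expand.grid() produces in step 1, and how the equation text in step 8 can be generated from the model instead of typed by hand. The sketch below uses simulated data, so its coefficients only approximate the ones shown in the annotation; the names b and equation.label are chosen here for illustration.

```r
# Simulated stand-in for ulcers.data (invented values)
set.seed(7)
ulcers.data <- data.frame(driving = runif(500, 1, 75),
                          drinking = runif(500, 0.5, 30))
ulcers.data$ulcers.disease <- 15 - 0.2 * ulcers.data$driving +
  0.178 * ulcers.data$drinking + rnorm(500, sd = 0.65)
ulcers.disease.lm <- lm(ulcers.disease ~ driving + drinking, data = ulcers.data)

# expand.grid() pairs every driving value with every drinking level: 30 x 3 = 90 rows
plotting.data <- expand.grid(
  driving = seq(min(ulcers.data$driving), max(ulcers.data$driving), length.out = 30),
  drinking = c(min(ulcers.data$drinking), mean(ulcers.data$drinking),
               max(ulcers.data$drinking))
)

# One predicted value per row of the grid
plotting.data$predicted.y <- predict(ulcers.disease.lm, newdata = plotting.data)
dim(plotting.data)

# Build the annotation text from the fitted coefficients
b <- unname(round(coef(ulcers.disease.lm), 3))  # intercept, driving slope, drinking slope
equation.label <- sprintf("ulcers = %s + (%s*driving) + (%s*drinking)", b[1], b[2], b[3])
equation.label
```

Generating the label from coef() keeps the annotation in sync with the model if the data or the formula change.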

Step 6 of linear regression in R: Report

Add the graph to your paper and include a brief statement explaining the results of the regression model.
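The numbers for such a statement can be taken straight from the fitted model. The sketch below shows one possible way to assemble a results sentence for the simple regression. It runs on simulated data, and the wording and rounding of the sentence are assumptions to adapt to the reporting style your field requires.

```r
# Simulated stand-in for income.data (invented values)
set.seed(8)
income <- runif(400, 2, 8)
income.data <- data.frame(income = income,
                          satisfaction = 0.5 + 0.7 * income + rnorm(400, sd = 0.7))
income.satisfaction.lm <- lm(satisfaction ~ income, data = income.data)

# Collect the estimate, standard error, p-value, and R-squared
coef.table <- coef(summary(income.satisfaction.lm))
slope <- coef.table["income", "Estimate"]
slope.se <- coef.table["income", "Std. Error"]
slope.p <- coef.table["income", "Pr(>|t|)"]
r.squared <- summary(income.satisfaction.lm)$r.squared

# Very small p-values are written as "< 0.001" instead of a long decimal
p.text <- ifelse(slope.p < 0.001, "< 0.001",
                 paste("=", format.pval(slope.p, digits = 2)))

# Assemble the results statement
report <- sprintf(
  "Income was associated with satisfaction (b = %.2f, SE = %.2f, p %s, R-squared = %.2f).",
  slope, slope.se, p.text, r.squared
)
report
```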

FAQs

What are the two types of linear regression?

Simple linear regression
Multiple linear regression

What is the difference between simple and multiple linear regression?

Simple linear regression uses one independent variable and one dependent variable, while multiple linear regression uses two or more independent variables and one dependent variable.
