Normal Distribution – Definition & Formula


A normal distribution is one of the most common types of distribution in statistical analysis. It is fully described by two parameters, the mean and the standard deviation. Many statistical tests are designed around the normal distribution because many variables are approximately normally distributed.

Normal Distribution – In a Nutshell

  • Normal distributions are assumed in various fields of statistical research, including social and health sciences.
  • The mode, median, and mean are identical in a normal distribution.
  • The main parameters of a normal distribution are the mean and the standard deviation.

Definition: Normal distribution

A normal distribution (Gaussian distribution) is a probability distribution that is symmetrical about the mean. Most observations cluster around the peak, and the probabilities of other values decrease gradually in both directions.1

[Figure: Normal distribution bell curve]

Why the normal distribution matters

Many variables used in scientific research display a normal or near-normal distribution. Commonly cited examples include height, weight, literacy levels, and exam test scores. Because so many variables are approximately normal, many scientists use tests designed for normal distributions.2

Knowing the characteristics of this distribution enables students and researchers to make verifiable conclusions and predictions from data samples representing larger populations. Such samples may be selected randomly or picked using the most representative elements of the population in inferential statistics.3

The properties of normal distribution

The standard deviation and the mean are the two main parameters in a normally distributed data set:

  • Mean – The mean is used as one of the main measures of central tendency in quantitative research. It is used in a distribution to define the peak, and most points tend to cluster around the mean.
  • Standard deviation – It measures how far the distribution’s data points vary from the mean. The standard deviation shows how spread out the data points are around the mean and is calculated as the square root of the variance.
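As a minimal sketch of the two parameters described above, the following Python computes the mean and the standard deviation as the square root of the variance. The function names and the `ages` list are hypothetical, chosen only for illustration, and the code assumes the population formula (dividing by n); a sample standard deviation would divide by n - 1 instead.

```python
import math


def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


def population_std(values):
    """Standard deviation as the square root of the (population) variance."""
    m = mean(values)
    # Variance: average squared distance of each point from the mean.
    variance = sum((v - m) ** 2 for v in values) / len(values)
    return math.sqrt(variance)


# Hypothetical data set, for illustration only.
ages = [12, 15, 18, 15, 15]

print("mean:", mean(ages))
print("standard deviation:", population_std(ages))
```

Python's built-in `statistics` module offers `mean`, `pstdev` (population) and `stdev` (sample) if you prefer not to write these by hand.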

Normal distributions exhibit the following characteristics:

  • Symmetry – It has a symmetrical shape. This implies that the curve can be divided into two equal halves. The distribution is symmetrical because half of the observations fall on each side of the mean.
  • The mean, mode and median are equal – The mode refers to the most frequently occurring data point in a data distribution. The median is the value that separates the upper from the lower half in an ordered data set. These measures of central tendency are equal.

[Figure: Standard normal distribution with mode, median and mean]

Normal distribution: Empirical rule

The empirical rule, also known as the three-sigma rule or the 68-95-99.7 rule, shows where most of the values fall in a normal distribution:

  • Approximately 68% of the values fall within 1 standard deviation of the mean.
  • Approximately 95% of the values lie within 2 standard deviations of the mean.
  • Around 99.7% of the values fall within 3 standard deviations of the mean.

Example:

You collect the ages of a group of students. The data set exhibits the properties of a normal distribution with a mean age of 15 and a standard deviation of 3. Using the empirical rule:

  • About 68% of the ages fall between 12 and 18, which is 1 standard deviation above and below the mean.
  • About 95% of the ages lie between 9 and 21, which is 2 standard deviations above and below the mean.
  • About 99.7% of the ages fall between 6 and 24, which is 3 standard deviations above and below the mean.
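The ranges in this example can be reproduced with a short Python sketch. It assumes the student ages follow an exact normal distribution with mean 15 and standard deviation 3, and uses the identity that the probability of falling within k standard deviations of the mean equals erf(k / √2). The helper names `interval` and `prob_within` are hypothetical.

```python
import math


def interval(mu, sigma, k):
    """Range covering k standard deviations on either side of the mean."""
    return (mu - k * sigma, mu + k * sigma)


def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


mu, sigma = 15, 3  # values taken from the student-age example
for k in (1, 2, 3):
    low, high = interval(mu, sigma, k)
    print(f"{k} SD: ages {low} to {high}, probability {prob_within(k):.4f}")
```

The computed probabilities are roughly 0.6827, 0.9545 and 0.9973, which the empirical rule rounds to 68%, 95% and 99.7%.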

The empirical rule can be used as a rough check of “normality”. If too many data points fall outside these three ranges, the distribution may not be normal. It can also reveal outliers in your data, i.e. values that are unusually small or large and may affect the shape of the curve.4

Normal distribution: Central limit theorem

The central limit theorem states that if you draw sufficiently large samples from a given population, the sample means will be approximately normally distributed, even if the population itself is not normally distributed. The central limit theorem is closely tied to the following points:

  • The law of large numbers, which states that as the sample size grows larger, the sample mean moves towards the population mean.
  • Across many large samples, the sampling distribution of the mean is approximately normal.

The central limit theorem implies that the assumption of “normality” matters less when conducting parametric tests if the researcher uses a sizeable sample. Parametric tests can generally be used with large samples from most types of distribution, as long as the groups have comparable variance and the observations are independent.5
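A small simulation can illustrate the central limit theorem. The sketch below is an illustration under stated assumptions, not a proof: it draws samples from an exponential population (which is strongly skewed, with mean 1 and standard deviation 1), and the sample size of 50, the 2,000 repetitions, the seed and the function names are all arbitrary choices made for this example.

```python
import random
import statistics


def sample_means(sample_size, n_samples, seed=0):
    """Draw n_samples samples from a skewed population and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        # Exponential population with rate 1.0: mean 1, standard deviation 1.
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means


def fraction_within_one_sd(values):
    """Share of values lying within one standard deviation of their own mean."""
    centre = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return sum(1 for v in values if abs(v - centre) <= spread) / len(values)


means = sample_means(sample_size=50, n_samples=2000)
print("mean of sample means:", statistics.fmean(means))
print("spread of sample means:", statistics.pstdev(means))
print("share within 1 SD:", fraction_within_one_sd(means))
```

If the sample means are close to normal, their average should sit near the population mean of 1, their spread near 1/√50 (about 0.14), and roughly 68% of them should fall within one standard deviation of their average, as the empirical rule suggests.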

Formula of the normal curve

A probability density function is used to plot a normal curve once the mean and standard deviation are known. The area under the curve represents probability, and the total area under the curve equals 1 (100%).

Normal probability density formula:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

  • f(x) – probability density
  • x – the value of the variable
  • μ – mean
  • σ – standard deviation
  • σ² – variance6
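The density of a normal distribution with mean μ and standard deviation σ can be evaluated directly in code. The Python sketch below implements the standard normal density, f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)), and checks numerically that the area under the curve is close to 1. The names `normal_pdf` and `area_under_curve`, the integration width of 8 standard deviations and the step count are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


def area_under_curve(mu, sigma, width=8.0, steps=10_000):
    """Midpoint-rule estimate of the area between mu - width*sigma and mu + width*sigma."""
    low = mu - width * sigma
    step = (2.0 * width * sigma) / steps
    return sum(normal_pdf(low + (i + 0.5) * step, mu, sigma) for i in range(steps)) * step


print("density at the mean:", normal_pdf(15, mu=15, sigma=3))
print("estimated total area:", area_under_curve(15, 3))
```

The density is highest at the mean and identical at equal distances on either side of it, which reflects the symmetry of the curve.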

Standard normal distribution

The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. It is also known as the z-distribution, as its observations are denoted with z rather than x. A z-score shows how many standard deviations a value lies above or below the mean.
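Converting a value to a z-score uses the usual standardisation z = (x − μ) / σ. The Python sketch below applies it to hypothetical student ages with a mean of 15 and a standard deviation of 3; the function name `z_score` is chosen for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical ages with mean 15 and standard deviation 3.
for age in (9, 15, 21):
    print(age, "->", z_score(age, mu=15, sigma=3))
```

An age of 21 has a z-score of 2 (two standard deviations above the mean), an age of 9 has a z-score of -2, and the mean itself has a z-score of 0.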

Calculating probability in a z-distribution

Every z-score is associated with a probability (p-value): the probability that a value falls at or below that z-score.
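The probability of falling below a given z-score is the cumulative distribution function of the standard normal distribution. A minimal Python sketch, assuming the well-known identity Φ(z) = ½ · (1 + erf(z / √2)), replaces the lookup in a z-table; the function name `standard_normal_cdf` is hypothetical.

```python
import math


def standard_normal_cdf(z):
    """Probability that a standard normal value falls at or below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for z in (-1.0, 0.0, 1.0, 1.96, 2.0):
    print(f"z = {z:5.2f}  ->  P(Z <= z) = {standard_normal_cdf(z):.4f}")
```

For example, a z-score of 0 gives a probability of 0.5, because half of the distribution lies below the mean, and a z-score of 1.96 gives roughly 0.975. Python 3.8 and later also provide `statistics.NormalDist().cdf(z)` for the same calculation.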

FAQs

The distribution is divisible into two equal halves. It features a peak represented by a curve where values are distributed around a central point.

In a normal distribution, data is not skewed, and the data points form a bell curve. Additionally, the main measures of central tendency, i.e. the mean, median, and mode, are equal.

A z-distribution is commonly known as a standard normal distribution. In a z-distribution, the mean is 0, and the standard deviation is 1.

These tests assume that a sample is obtained from a population that exhibits the properties of a normal distribution. They usually focus on comparisons between the variance and the mean.

Sources

1 Chen, James. “Normal Distribution.” Investopedia. July 22, 2022. https://www.investopedia.com/terms/n/normaldistribution.asp.

2 CFI Team. “Normal Distribution.” CFI. April 23, 2022. https://corporatefinanceinstitute.com/resources/knowledge/other/normal-distribution/.

3 Math is Fun. “Normal Distribution.” Accessed August 30, 2022. https://www.mathsisfun.com/data/standard-normal-distribution.html.

4 Hayes, Adam. “Empirical Rule.” Investopedia. March 05, 2022. https://www.investopedia.com/terms/e/empirical-rule.asp.

5 LaMorte, Wayne W. “Central Limit Theorem.” The Role of Probability. July 24, 2016. https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_probability/BS704_Probability12.html.

6 ScienceDirect. “Normal Density Functions.” Accessed August 30, 2022. https://www.sciencedirect.com/topics/earth-and-planetary-sciences/normal-density-functions.