Central tendency is a key concept in statistics that describes the central or typical value in a dataset. It relies on three primary measures: the mean, the median, and the mode, which are used to summarize and analyze data. These measures form the backbone of many statistical analyses and give insight into the distribution of data points. The sections below explain how these three measures of central tendency are used to summarize the results of a study.
Definition: Central tendency
A measure of central tendency is a summary measure that describes a data set with a single value representing the middle of the distribution. Here are the three most common measures of central tendency:
- Mean – This represents the average of the data set.
- Median – This represents the middle value.
- Mode – This is the most commonly occurring value in a data set.
When performing descriptive statistics, it is also crucial to understand measures of variability. You can also summarize the data set by describing its distribution.
Central tendency: Distributions
In statistics, a data set is a distribution of n values or scores.
Normal distribution
In a normal distribution, the data is distributed symmetrically. In this case, the values of the mean, median, and mode would be the same. Here is an example of a normally distributed data set:
Shoe size | Frequency |
4 | 1 |
5 | 4 |
6 | 8 |
7 | 4 |
8 | 1 |
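The claim that all three measures coincide for this table can be checked with a short sketch. It assumes Python's standard `statistics` module and expands the frequency table into individual observations; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Shoe-size frequency table from the example above: {shoe size: frequency}
shoe_sizes = {4: 1, 5: 4, 6: 8, 7: 4, 8: 1}

# Expand the table into the individual observations it summarizes
observations = [size for size, count in shoe_sizes.items() for _ in range(count)]

shoe_mean = mean(observations)
shoe_median = median(observations)
shoe_mode = mode(observations)

# For this symmetric table, all three measures equal 6
print(shoe_mean, shoe_median, shoe_mode)
```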
Skewed distributions
In a skewed distribution, more values fall on one side of the center than on the other. In a positively skewed distribution, the mean is typically greater than the median, and the median is greater than the mode.
In a negatively skewed distribution, the order is reversed: the mode is typically greater than the median, and the mean is less than both of these values.
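The ordering of the three measures under skew can be illustrated with two small data sets. Both lists are hypothetical and were chosen only to show one positively skewed and one negatively skewed case; real data will not always follow this ordering exactly.

```python
from statistics import mean, median, mode

# Hypothetical data sets, chosen only to illustrate the two skew directions
positively_skewed = [2, 2, 2, 3, 3, 4, 5, 8, 12]        # long tail toward high values
negatively_skewed = [2, 6, 9, 10, 11, 11, 12, 12, 12]   # long tail toward low values


def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(values), median(values), mode(values)


pos_mean, pos_median, pos_mode = summarize(positively_skewed)
neg_mean, neg_median, neg_mode = summarize(negatively_skewed)

# Positive skew: mean > median > mode
print(pos_mean > pos_median > pos_mode)
# Negative skew: mode > median > mean
print(neg_mode > neg_median > neg_mean)
```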
Central tendency – Mode
The mode is the value that appears most frequently in a distribution. To find the mode, count how often each value occurs and pick the value with the highest count; sorting the values in ascending or descending order makes repeated values easier to spot. Depending on the nature of the data set, you may get one mode, multiple modes, or no mode at all. In a frequency table, the mode is the value with the highest frequency. In a bar graph, the mode is represented by the highest bar. Let’s consider this example:
Shoe size | Frequency |
4 | 1 |
5 | 4 |
6 | 8 |
7 | 4 |
8 | 1 |
In this case, the mode is 6 because more people reported this shoe size than any other.
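As a minimal sketch, the mode can be read off a frequency table by picking every value that has the highest frequency. The helper `modes_from_table` is a hypothetical name used for illustration, and the two-mode list is made-up data; `statistics.multimode` is the standard-library function for raw (non-tabulated) values.

```python
from statistics import multimode

# Shoe-size frequency table from the example above: {shoe size: frequency}
shoe_sizes = {4: 1, 5: 4, 6: 8, 7: 4, 8: 1}


def modes_from_table(table):
    """Return every value whose frequency equals the highest frequency."""
    highest = max(table.values())
    return [value for value, count in table.items() if count == highest]


table_modes = modes_from_table(shoe_sizes)

# Hypothetical raw data with two modes (1 and 3 both occur twice)
two_modes = multimode([1, 1, 2, 3, 3])

print(table_modes, two_modes)
```

Returning a list rather than a single value keeps the multi-mode case visible instead of silently picking one of the tied values.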
When to use the mode
The mode is commonly used with nominal data, since this form of data is classified into mutually exclusive categories that cannot be ordered or averaged. With ratio data, the mode is often not useful, because the data can take many distinct values and each value may occur only once. Here is an example of ratio data:
Height | Frequency |
154 | 1 |
156 | 1 |
158 | 1 |
161.2 | 1 |
164 | 1 |
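A quick count shows why the mode says little about this height table: every value occurs exactly once, so all of them tie for the highest frequency. The sketch below uses `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter

# Heights from the table above; each value appears exactly once
heights = [154, 156, 158, 161.2, 164]

counts = Counter(heights)
highest_frequency = max(counts.values())

# Every height ties for the highest frequency, so no single mode stands out
tied_values = [height for height, count in counts.items() if count == highest_frequency]

print(highest_frequency, tied_values)
```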
Central tendency – Median
The median refers to the middle value in a data set, and you can find this value by arranging the data in ascending or descending order.
Income level | Frequency |
$0-$2,000 | 2 |
$2,001-$4,000 | 5 |
$4,001-$6,000 | 20 |
$6,001-$8,000 | 5 |
$8,001-$10,000 | 1 |
By ordering the data from low to high, you will be able to see that the middle value falls in the $4,001-$6,000 income class.
Median of an odd-numbered data set
In an odd-numbered data set, you can find the median by locating the value at the (n + 1) / 2 position. The n in the formula represents the number of values in the data set. In the above example, the total number of values is 33, so you can apply the formula as follows: (33 + 1) / 2 = 17.
By finding the value at the 17th position, you will be able to locate the median.
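For the income table, locating the 17th position can be sketched with a running (cumulative) frequency. This is only an illustration under the assumption that the classes are already ordered from low to high; it identifies the class that contains the median, not an exact income figure.

```python
# Income frequency table from the example above, ordered from low to high
income_table = [
    ("$0-$2,000", 2),
    ("$2,001-$4,000", 5),
    ("$4,001-$6,000", 20),
    ("$6,001-$8,000", 5),
    ("$8,001-$10,000", 1),
]

n = sum(frequency for _, frequency in income_table)
median_position = (n + 1) // 2  # exact because n is odd here

# Walk through the classes until the running total reaches the median position
cumulative = 0
median_class = None
for label, frequency in income_table:
    cumulative += frequency
    if cumulative >= median_position:
        median_class = label
        break

print(n, median_position, median_class)
```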
Median of an even-numbered data set
If the data set has an even number of values, you will have to find the values at the n / 2 and (n / 2) + 1 positions. After that, you can add the two numbers and divide the sum by two. In a data set with 60 values, the median will be the mean of the values at these positions:
60 / 2 = 30 and (60 / 2) + 1 = 31
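The even-numbered case can be sketched as follows. The data set is hypothetical: the integers 1 to 60 are used so that the value at each position equals the position itself, which makes the two middle positions easy to see. Note that Python lists are 0-based, so the 30th and 31st positions are indexes 29 and 30.

```python
from statistics import median

# Hypothetical data set of 60 values (1, 2, ..., 60)
values = list(range(1, 61))

ordered = sorted(values)
n = len(ordered)

lower_middle = ordered[n // 2 - 1]  # value at position n / 2 (the 30th value)
upper_middle = ordered[n // 2]      # value at position (n / 2) + 1 (the 31st value)
even_median = (lower_middle + upper_middle) / 2

# Cross-check against the standard-library implementation
library_median = median(values)

print(lower_middle, upper_middle, even_median)
```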
Central tendency – Mean
The arithmetic mean is the most commonly used measure of central tendency. It represents the average of the data set and is calculated by adding up all the values and dividing the sum by the number of values. The geometric mean, on the other hand, is calculated as the nth root of the product of all the values. In the data set (3, 4, 6, 8, 14), the sum of all the values is 35. You can find the arithmetic mean by dividing this number by n, which equals 5 in this example, giving a mean of 7.
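Both means can be computed directly from their definitions. The sketch below assumes Python 3.8 or later (for `math.prod` and `statistics.geometric_mean`) and uses the data set from the text; the variable names are illustrative.

```python
import math
from statistics import geometric_mean

values = [3, 4, 6, 8, 14]
n = len(values)

# Arithmetic mean: sum of the values divided by how many there are
arithmetic = sum(values) / n

# Geometric mean: nth root of the product of the values
geometric = math.prod(values) ** (1 / n)

# Cross-check against the standard-library implementation
library_geometric = geometric_mean(values)

print(arithmetic, round(geometric, 2))
```

For this data set the geometric mean (about 6.04) is lower than the arithmetic mean (7), which is the general pattern for positive values that are not all equal.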
Outlier effect on the mean
Data outliers are values that lie very far from the other values in a data set. They can pull the mean significantly higher or lower than most of the values. For example, in the data set (3, 5, 7, 9, 300), the mean is 64.8, which doesn’t represent the data set accurately because four of the five values are below 10.
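The effect is easy to see by comparing the mean and the median with and without the outlier. This sketch uses the data set from the text; the list without the outlier is added here only for comparison.

```python
from statistics import mean, median

with_outlier = [3, 5, 7, 9, 300]
without_outlier = [3, 5, 7, 9]  # same data with the outlier removed, for comparison

mean_with = mean(with_outlier)
median_with = median(with_outlier)
mean_without = mean(without_outlier)
median_without = median(without_outlier)

# The outlier moves the mean from 6 to 64.8 but the median only from 6 to 7
print(mean_with, median_with, mean_without, median_without)
```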
Population vs. sample mean
You can find the mean of a sample or a population. The sample mean and the population mean are calculated in the same way, but the notations are different. For example, the ‘n’ symbol represents the number of values in the sample data set, and the ‘N’ symbol represents the number of values in the population.
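In commonly used notation, the two formulas can be written side by side. The symbols for the means themselves are an assumption here, since the text only names n and N: the sample mean is usually written as x-bar and the population mean as the Greek letter mu.

```latex
\bar{x} = \frac{\sum x}{n}
\qquad
\mu = \frac{\sum X}{N}
```

In both cases the sum of the values is divided by the number of values; only the symbols change.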
Central tendency – Mean, median, or mode?
All three measures of central tendency are meant to be used together since they have different strengths and limitations. However, in some cases, you may not be able to use one or two measures of central tendency.
- The mode can be applied to all four levels of measurement, but it’s mostly used with nominal data and ordinal data.
- The median can only be used with ordinal data, ratio data, and interval data.
- The mean can only be used with interval or ratio levels of measurement.
Levels of measurement | Examples | Measure of central tendency |
Nominal | Gender, nationality | Mode |
Ordinal | Education level, satisfaction rating | Mode, median |
Interval and ratio | IQ grading, temperature | Mode, median, mean |
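The table above can be expressed as a simple lookup. This is a minimal sketch: the dictionary `MEASURES_BY_LEVEL` and the helper `applicable_measures` are hypothetical names introduced for illustration, not part of any library.

```python
# Measures of central tendency that apply at each level of measurement
MEASURES_BY_LEVEL = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}


def applicable_measures(level):
    """Return the measures that can be used for a given level of measurement."""
    key = level.strip().lower()
    if key not in MEASURES_BY_LEVEL:
        raise ValueError(f"unknown level of measurement: {level!r}")
    return MEASURES_BY_LEVEL[key]


print(applicable_measures("Ordinal"))
```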
When choosing a measure to use in a particular data set, you have to consider the distribution of the data. If it is normally distributed, you can use mean, median, or mode as they would all have the same value. For skewed data, you should use the median.
FAQs
The measures of central tendency include the mean, mode, and median.
If the distribution is strongly skewed, you should use the median.
You can use mode on all levels of data, but median and mean cannot be used on nominal data.
Mode is preferred when dealing with nominal data.