Averages, Spread & Cumulative Frequency
Mean, median, mode, range, interquartile range, and ogive curves.
Averages describe the centre of a dataset, while spread describes how far the values are from each other. Two classes can have the same mean score but very different consistency, so both centre and spread are needed to understand the data properly.
CSEC questions often move from calculation to interpretation: find the mean, median, range, quartiles, or cumulative frequency, then say what the result tells you. Do not treat these as isolated formulas. After calculating, write a short sentence explaining what the number means in the context of the question.
A measure of central tendency represents the "typical" or "average" value in a dataset.
Mean (Average)
The mean uses every value, so it is pulled towards very large or very small outliers. It is most useful when the data is fairly symmetric, with no extreme values.
For raw data:
Mean = sum of all values ÷ number of values
Scores: 5, 7, 8, 9, 5
Mean = (5 + 7 + 8 + 9 + 5) ÷ 5 = 34 ÷ 5 = 6.8
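For ungrouped frequency data, multiply each value by its frequency, add the products, and divide by the total frequency:
Mean = Σfx ÷ Σf
| Score (x) | Frequency (f) | f × x |
|---|---|---|
| 5 | 2 | 10 |
| 7 | 1 | 7 |
| 8 | 1 | 8 |
| 9 | 1 | 9 |
| Total | 5 | 34 |
Mean = 34 ÷ 5 = 6.8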
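For grouped data:
Use class midpoints as the x values, so the result is an estimate of the mean:
| Class | Frequency (f) | Midpoint (x) | f × x |
|---|---|---|---|
| 10-19 | 3 | 14.5 | 43.5 |
| 20-29 | 5 | 24.5 | 122.5 |
| 30-39 | 2 | 34.5 | 69 |
| Total | 10 | | 235 |
Mean ≈ 235 ÷ 10 = 23.5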
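The three cases above (raw data, a frequency table, and grouped classes) can be checked with a short Python sketch. The function names are illustrative only, and the grouped version assumes each class is given by its lower and upper limits so that the midpoint is their average.

```python
def mean_raw(values):
    """Mean of a plain list of values: sum divided by count."""
    return sum(values) / len(values)


def mean_from_frequencies(table):
    """Mean from (value, frequency) pairs: sum of f*x divided by sum of f."""
    total_fx = sum(value * freq for value, freq in table)
    total_f = sum(freq for _, freq in table)
    return total_fx / total_f


def mean_grouped(classes):
    """Estimated mean from (lower limit, upper limit, frequency) rows.

    Each class is represented by its midpoint, so the result is an estimate.
    """
    table = [((low + high) / 2, freq) for low, high, freq in classes]
    return mean_from_frequencies(table)


print(mean_raw([5, 7, 8, 9, 5]))                                # 6.8
print(mean_from_frequencies([(5, 2), (7, 1), (8, 1), (9, 1)]))  # 6.8
print(mean_grouped([(10, 19, 3), (20, 29, 5), (30, 39, 2)]))    # 23.5
```

The raw list and the frequency table describe the same five scores, which is why both give 6.8.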
Median
The median is the middle value once the data is arranged in order. It is useful when outliers would distort the mean.
For raw data:
- Arrange values from smallest to largest
- If odd number of values: median is the middle one
- If even number of values: median is the average of the two middle ones
Scores: 5, 5, 7, 8, 9 (5 values, odd)
Median = 7 (the 3rd value)
Scores: 5, 5, 7, 8, 9, 10 (6 values, even)
Median = (7 + 8) ÷ 2 = 7.5
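As a quick check of the odd and even cases, here is a minimal Python sketch of the same steps. The function name `median_raw` is hypothetical, chosen only for this illustration.

```python
def median_raw(values):
    """Median of raw data: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_raw([5, 7, 8, 9, 5]))      # 7
print(median_raw([5, 5, 7, 8, 9, 10]))  # 7.5
```

The first list is deliberately left unsorted to show that ordering must happen before the middle value is picked.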
For grouped data:
Use the cumulative frequency table and interpolation:
Median = L + ((n ÷ 2 − F) ÷ f) × c
Where:
- L = lower boundary of median class
- n = total frequency
- F = cumulative frequency before median class
- f = frequency of median class
- c = class width
| Class | Frequency | Cumulative |
|---|---|---|
| 10-19 | 3 | 3 |
| 20-29 | 5 | 8 |
| 30-39 | 2 | 10 |
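Total = 10, so median position = 10 ÷ 2 = 5
Median class is 20-29 (the cumulative frequency first reaches 5 in this class)
Lower boundary = 19.5, cumulative frequency before the class = 3, class frequency = 5, class width = 10
Median = 19.5 + ((5 − 3) ÷ 5) × 10 = 19.5 + 4 = 23.5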
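The interpolation for this table can be sketched in Python. This is an illustration, not a prescribed method: the function name is hypothetical, and it assumes each class is supplied with its boundaries (9.5 to 19.5, and so on) rather than its limits.

```python
def median_grouped(classes):
    """Estimate the median from (lower boundary, upper boundary, frequency) rows."""
    total = sum(freq for _, _, freq in classes)
    target = total / 2          # median position, n / 2
    cumulative_before = 0       # cumulative frequency before the current class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative_before + freq >= target:
            width = upper - lower
            # Move into the class in proportion to how far the target
            # lies past the cumulative frequency before the class.
            return lower + (target - cumulative_before) / freq * width
        cumulative_before += freq
    raise ValueError("no data in the table")


table = [(9.5, 19.5, 3), (19.5, 29.5, 5), (29.5, 39.5, 2)]
print(median_grouped(table))  # 23.5
```

The loop stops at the first class whose cumulative frequency reaches the median position, which is the median class 20-29 here.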
Mode
The mode is the value that appears most often. It is especially useful for categorical data, where the mean and median may not make sense.
Scores: 5, 5, 5, 7, 8, 9, 9
Mode = 5 (appears 3 times)
Data: 2, 5, 5, 7, 7, 9
Two modes: 5 and 7 (both appear twice), so the data is bimodal
Data: 2, 5, 7, 9
No mode (every value appears once)
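The three situations above (one mode, two modes, no mode) can be reproduced with a small Python sketch. The function name `modes` is hypothetical, and the rule "no mode when every value appears once" is taken from the example above rather than from any library.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of modes; an empty list means there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value appears exactly once, so no value stands out.
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([5, 5, 5, 7, 8, 9, 9]))  # [5]
print(modes([2, 5, 5, 7, 7, 9]))     # [5, 7]
print(modes([2, 5, 7, 9]))           # []
```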
Choosing Mean, Median, or Mode
Choosing the average is a reasoning skill. The best measure depends on the shape of the data and what the question is trying to describe.
Use MEAN when:
- Data is roughly symmetric
- No extreme outliers
- You want to use all values
- Example: class average on a test
Use MEDIAN when:
- Data has outliers or is skewed
- You want the "typical" middle value
- Example: house prices (skewed by luxury homes)
Use MODE when:
- Categorical data (colors, preferences)
- Discrete data with clear peaks
- Example: favorite color, most common shoe size
Example: Which average?
House prices: 120,000, 125,000, 130,000, 140,000, 2,000,000
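- Mean = (120,000 + 125,000 + 130,000 + 140,000 + 2,000,000) ÷ 5 = 2,515,000 ÷ 5 = 503,000 (far higher than four of the five prices)
- Median = 130,000 (better, because the luxury home is an outlier)
- Mode: none (every price appears once)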
Best answer: Median, because the data has an outlier.
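To see how strongly one outlier pulls the mean, the house-price example can be run through Python's standard `statistics` module. The extra line that drops the luxury home is an added illustration, not part of the original question.

```python
from statistics import mean, median

prices = [120_000, 125_000, 130_000, 140_000, 2_000_000]

print(f"mean   = {mean(prices):,.0f}")    # mean   = 503,000
print(f"median = {median(prices):,.0f}")  # median = 130,000

# Remove the luxury home to see what the mean looks like without the outlier.
without_outlier = prices[:-1]
print(f"mean without the luxury home = {mean(without_outlier):,.0f}")  # 128,750
```

The median barely moves when the outlier is removed, while the mean falls from 503,000 to 128,750. That contrast is the reason for preferring the median here.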
Part 5: Measures of Spread (Dispersion)
Spread measures how far apart the data values are from each other.
Range
Range gives a quick sense of spread, but it uses only the smallest and largest values, so one unusual value can make it misleading.
Range = largest value − smallest value
Scores: 5, 7, 8, 9, 5
Range = 9 − 5 = 4
Problem: the range says nothing about how the values in between are spread.
Quartiles and Interquartile Range
Quartiles split ordered data into four equal parts. The interquartile range focuses on the middle half of the data, so it is less affected by extremes.
- Q₁ (1st quartile) = 25th percentile
- Q₂ (2nd quartile) = 50th percentile = median
- Q₃ (3rd quartile) = 75th percentile
Interquartile Range (IQR):
IQR = Q₃ − Q₁
This shows the spread of the middle 50% of the data.
Test scores, already in order: 5, 6, 7, 7, 8, 8, 8, 9, 9, 10 (10 values)
Q₁ position = (10 + 1) ÷ 4 = 2.75 → between the 2nd and 3rd values: 6 + 0.75(7 − 6) = 6.75
Q₂ position = (10 + 1) ÷ 2 = 5.5 → between the 5th and 6th values, both 8, so Q₂ = 8
Q₃ position = 3(10 + 1) ÷ 4 = 8.25 → between the 8th and 9th values: 9 + 0.25(9 − 9) = 9
IQR = Q₃ − Q₁ = 9 − 6.75 = 2.25
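The position method used above, with positions (n + 1) ÷ 4, (n + 1) ÷ 2 and 3(n + 1) ÷ 4, can be sketched in Python as follows. The helper names are hypothetical. Other textbooks and software use slightly different quartile rules, so results may differ a little from other methods.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in ordered data."""
    # Keep the position inside the data for very small datasets.
    position = max(1, min(position, len(ordered)))
    whole = int(position)
    fraction = position - whole
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    upper = ordered[whole]
    # Move the given fraction of the way from one value to the next.
    return lower + fraction * (upper - lower)


def quartiles(values):
    """Return (Q1, Q2, Q3) using the (n + 1) position rule."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q2 = value_at_position(ordered, (n + 1) / 2)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q2, q3


scores = [5, 6, 7, 7, 8, 8, 8, 9, 9, 10]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3)        # 6.75 8.0 9.0
print(q3 - q1)           # 2.25   (interquartile range)
print((q3 - q1) / 2)     # 1.125  (semi-interquartile range)
```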
Semi-Interquartile Range
The semi-interquartile range is half of the IQR. It gives a compact measure of spread around the middle of the dataset.
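Semi-IQR = (Q₃ − Q₁) ÷ 2
Using the example above: Semi-IQR = (9 − 6.75) ÷ 2 = 1.125
Comparing the measures of spread:
- Range: uses only the extremes, so it is sensitive to outliers
- IQR: shows the spread of the middle 50% and is largely unaffected by outliers
- Semi-IQR: half of the IQR, useful for comparison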
Part 6: Cumulative Frequency and Ogives
Cumulative Frequency Table
Cumulative frequency is a running total: the count of values up to and including a given class. It answers questions like "how many values are less than or equal to this point?"
| Class | Frequency | Cumulative Frequency |
|---|---|---|
| 10-19 | 3 | 3 |
| 20-29 | 5 | 3+5 = 8 |
| 30-39 | 7 | 8+7 = 15 |
| 40-49 | 4 | 15+4 = 19 |
| 50-59 | 1 | 19+1 = 20 |
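The running total in the last column can be produced with `itertools.accumulate` from Python's standard library. The sketch below is only a check on the table; the function name is illustrative.

```python
from itertools import accumulate


def cumulative_frequencies(frequencies):
    """Running totals of a list of class frequencies."""
    return list(accumulate(frequencies))


classes = ["10-19", "20-29", "30-39", "40-49", "50-59"]
frequencies = [3, 5, 7, 4, 1]

for label, freq, total in zip(classes, frequencies,
                              cumulative_frequencies(frequencies)):
    print(label, freq, total)
# The final running total, 20, equals the total frequency.
```

A useful check in an exam is that the last cumulative frequency matches the sum of the frequency column.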
Cumulative Frequency Curve (Ogive)
An ogive is a graph of cumulative frequency, usually an S-shaped curve. It is useful for estimating medians, quartiles, and percentiles from grouped data.
How to draw:
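- Use class boundaries on the x-axis (not class limits)
- Use cumulative frequency on the y-axis
- Plot each cumulative frequency at the upper boundary of its class
- Start the curve at the lower boundary of the first class, where the cumulative frequency is 0
- Connect the points with a smooth curve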
Using the table above:
| Upper Boundary | Cumulative Frequency |
|---|---|
| 19.5 | 3 |
| 29.5 | 8 |
| 39.5 | 15 |
| 49.5 | 19 |
| 59.5 | 20 |
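The plotting points in this table can be generated from the class limits and frequencies. The sketch below assumes whole-number class limits with a gap of 1 between classes, so each upper boundary is the upper limit plus 0.5. The function name is hypothetical.

```python
from itertools import accumulate


def ogive_points(classes):
    """Return (upper boundary, cumulative frequency) pairs for plotting.

    classes: list of (lower limit, upper limit, frequency) rows with
    whole-number limits, e.g. 10-19 followed by 20-29.
    """
    upper_boundaries = [high + 0.5 for _, high, _ in classes]
    running_totals = accumulate(freq for _, _, freq in classes)
    return list(zip(upper_boundaries, running_totals))


table = [(10, 19, 3), (20, 29, 5), (30, 39, 7), (40, 49, 4), (50, 59, 1)]
print(ogive_points(table))
# [(19.5, 3), (29.5, 8), (39.5, 15), (49.5, 19), (59.5, 20)]
```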
Reading from an Ogive
To read from an ogive, move horizontally from the cumulative frequency value to the curve, then down to the data value. The result is an estimate, so draw the curve and read the scales carefully.
You can read:
- Quartiles: Q₁ at 25% of total, Q₂ at 50%, Q₃ at 75%
- Percentiles: the data value below which a given percentage of the data lies
- Median: where cumulative frequency = n/2
- Frequencies above/below a given value
From the ogive above (n=20):
Q₁ (25% of 20 = 5): read across from cumulative frequency 5 to the curve, then down to the x-axis ≈ 23.5
Median (50% of 20 = 10): read across from cumulative frequency 10 ≈ 32
Q₃ (75% of 20 = 15): read across from cumulative frequency 15 = 39.5 (a plotted point)
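Graph readings can be sanity-checked by joining the plotted points with straight lines and interpolating. This is only an approximation of a hand-drawn smooth curve, so values read from a graph may differ slightly. The function name and the starting point (9.5, 0) are assumptions made for this sketch.

```python
def read_ogive(points, target_cf):
    """Estimate the data value at a given cumulative frequency.

    points: (boundary, cumulative frequency) pairs in increasing order.
    Neighbouring points are joined by straight lines.
    """
    for (x0, cf0), (x1, cf1) in zip(points, points[1:]):
        if cf0 <= target_cf <= cf1:
            if cf1 == cf0:
                return x0
            # Go the same fraction along the x-axis as along the y-axis.
            return x0 + (target_cf - cf0) / (cf1 - cf0) * (x1 - x0)
    raise ValueError("target is outside the range of the ogive")


# Assumed starting point (9.5, 0), then the plotted points from the table.
points = [(9.5, 0), (19.5, 3), (29.5, 8), (39.5, 15), (49.5, 19), (59.5, 20)]
n = points[-1][1]

print(read_ogive(points, 0.25 * n))            # 23.5
print(round(read_ogive(points, 0.50 * n), 1))  # 32.4
print(read_ogive(points, 0.75 * n))            # 39.5
```

Q₃ comes out exactly because a cumulative frequency of 15 is one of the plotted points.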