Data Distribution
Learn how to read the shape of a histogram and understand what it tells us about the values inside the dataset.
The distribution of a dataset is the way the values are spread across the different ranges. When we draw a histogram, the bars form a shape, and that shape is what we call the distribution.
Different datasets have different shapes. Some are balanced and smooth, others lean to one side, and others have more than one peak. Each shape tells us something useful about the data.
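To make the idea of "values spread across ranges" concrete, here is a minimal sketch that counts how many values fall into each range and prints a rough text histogram. The scores, the bin width of 10, and the helper name `bin_counts` are all made up for illustration.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall into each range of size bin_width."""
    # Each value is mapped to the start of its range, e.g. 67 -> 60 when bin_width is 10.
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical test scores, used only for illustration.
scores = [52, 58, 61, 64, 65, 67, 68, 70, 71, 72, 73, 75, 77, 79, 83, 88]

width = 10
counts = bin_counts(scores, width)

# One row per range; the length of each row of '#' plays the role of a bar.
for start, count in counts.items():
    print(f"{start}-{start + width - 1}: {'#' * count}")
```

The row lengths form the shape: short at the ends and longest in the 70-79 range, so this made-up dataset has a single peak.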
Why Do We Look at the Shape?
The shape helps us understand the data quickly. For example:
- Are most values close to each other or are they spread out?
- Are the high values more common, or the low values?
- Is there a single common range, or are there a few common ranges?
Before doing any calculations, look at the shape of the histogram. The shape gives you a quick first impression of the data.
One of the most common shapes in real life is the symmetric, bell-shaped distribution, often called the Normal Distribution or the Bell Curve.
What Does It Look Like?
In a symmetric distribution, the bars are tallest in the middle and get shorter as we move to the left or right. Both sides look the same, like a mirror image.
What Does It Mean?
- Most values are close to the middle.
- Very high and very low values are uncommon.
- The mean, median, and mode are very close to each other.
Examples
- The heights of adult men or women in a country.
- The weights of bags of rice produced in a factory.
- Test scores in a fair exam where most students score around the average.
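The properties of a bell-shaped distribution can be checked on simulated data. The sketch below assumes heights centered at 170 cm with a typical spread of 7 cm; both numbers are invented for illustration, not real measurements.

```python
import random
from statistics import mean, median

random.seed(0)  # fixed seed so the run is repeatable

# Simulated heights in cm drawn from a bell-shaped (normal) distribution.
# The center (170) and spread (7) are assumptions chosen for this sketch.
heights = [random.gauss(170, 7) for _ in range(10_000)]

# How far apart are the mean and the median?
center_gap = abs(mean(heights) - median(heights))

# What share of the values lies within 7 cm of the center?
near_center = sum(163 <= h <= 177 for h in heights) / len(heights)

# What share lies more than 21 cm away from the center?
far_from_center = sum(abs(h - 170) > 21 for h in heights) / len(heights)

print(f"mean = {mean(heights):.1f}, median = {median(heights):.1f}")
print(f"within 7 cm of the center: {near_center:.0%}")
print(f"more than 21 cm from the center: {far_from_center:.1%}")
```

On data like this, the mean and median should nearly coincide, a clear majority of values should sit near the middle, and only a tiny share should be far out in either tail.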
When a histogram is not symmetric, we say it is skewed. A skewed distribution has a longer "tail" on one side. There are two types, right-skewed and left-skewed, shown here next to the symmetric shape for comparison:
Symmetric
Both sides look the same. Mean ≈ Median.
Example: heights of students.
Right-Skewed
The tail points to the right (higher values). Most values are small, but a few are very large.
Example: family income.
Left-Skewed
The tail points to the left (lower values). Most values are large, but a few are very small.
Example: scores on an easy quiz.
Right-Skewed (Positive Skew)
Most of the values are on the left side of the chart, with a tail stretching to the right. This means most people or items have small values, while a small number have large values.
Left-Skewed (Negative Skew)
Most of the values are on the right side of the chart, with a tail stretching to the left. This means most people or items have large values, while a small number have small values.
The skew is named after the side where the tail points, not where the bars are tallest.
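The direction of the tail can also be estimated with a number. One common choice is the third-moment skewness, which is positive when the tail points right and negative when it points left. The sketch below is one possible way to do this, not the only one; the datasets are invented, and the cutoff of 0.5 for calling data "roughly symmetric" is an arbitrary assumption.

```python
from statistics import mean

def skewness(values):
    """Third-moment skewness: positive for a right tail, negative for a left tail.

    Not defined when all values are identical (the spread would be zero).
    """
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # average squared distance from the mean
    m3 = sum((v - m) ** 3 for v in values) / n  # average cubed distance (keeps the sign)
    return m3 / m2 ** 1.5

def skew_direction(values, tolerance=0.5):
    """Name the shape from the sign of the skewness (tolerance is an assumed cutoff)."""
    g = skewness(values)
    if g > tolerance:
        return "right-skewed"
    if g < -tolerance:
        return "left-skewed"
    return "roughly symmetric"

# Hypothetical datasets, for illustration only.
incomes = [3, 3, 4, 4, 5, 5, 6, 8, 12, 40]    # a few very large values
quiz_scores = [2, 8, 9, 9, 10, 10, 10]         # a few very small values
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]         # mirror image around 3

for name, data in [("incomes", incomes), ("quiz", quiz_scores), ("balanced", balanced)]:
    print(name, skew_direction(data))
```

Note that the label follows the tail, not the peak: the income data piles up on the left yet counts as right-skewed.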
The shape of the data tells us which measure of center to trust more: the mean (average) or the median (middle value).
In a Symmetric Distribution
The mean and the median are very close. Either one is a good way to describe the center.
In a Skewed Distribution
The mean is affected by very large or very small values, so it gets pulled toward the tail. The median is not affected by extreme values as much, because it only depends on the middle position.
- Right-skewed: The mean is usually higher than the median.
- Left-skewed: The mean is usually lower than the median.
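A small experiment shows how the mean follows the tail while the median stays put. The quiz scores below are hypothetical, chosen to be left-skewed; the second list makes the lowest score even more extreme.

```python
from statistics import mean, median

# Hypothetical quiz scores out of 10: mostly high, with one low score (left-skewed).
scores = [2, 8, 9, 9, 10, 10, 10]

# Same scores, but the lowest value is pushed further into the tail.
more_extreme = [0] + scores[1:]

for label, data in [("original", scores), ("more extreme", more_extreme)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")
```

In both lists the mean sits below the median, as expected for left-skewed data. Changing only the extreme value moves the mean further down, while the median does not change because the middle position is still occupied by the same score.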
Imagine a small group of 5 friends whose monthly allowances are 100, 100, 200, 200, and 5,000 EGP.
- Mean = (100 + 100 + 200 + 200 + 5000) ÷ 5 = 1,120 EGP
- Median = middle value = 200 EGP
Here the mean of 1,120 EGP is misleading because four out of five friends actually have 100 or 200 EGP. The median of 200 EGP is a much better description of a "typical" friend in this group.
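The same arithmetic can be reproduced with Python's standard `statistics` module. This is just a quick check of the allowance example; the variable names are chosen for this sketch.

```python
from statistics import mean, median

# Monthly allowances of the five friends, in EGP.
allowances = [100, 100, 200, 200, 5000]

avg = mean(allowances)     # sum of the values divided by how many there are
mid = median(allowances)   # middle value after sorting

# How many friends actually receive less than the mean?
below_mean = sum(a < avg for a in allowances)

print(f"Mean   = {avg} EGP")
print(f"Median = {mid} EGP")
print(f"{below_mean} of {len(allowances)} friends are below the mean")
```

Four of the five friends fall below the mean, which is why it gives a poor picture of a typical allowance here.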
When a distribution is strongly skewed, the median is usually a more useful measure of the center than the mean.
Looking at the shape of a distribution helps us make better decisions. With one quick look at a histogram you can:
- Decide whether to use the mean or the median to describe the data.
- Spot if the data leans more to one side.
- Notice values that look different from the rest (we will learn about these in the next topics).
- Compare two groups easily by drawing two histograms side by side.
Quick Comparison
| Shape | What it looks like | Best center to use |
|---|---|---|
| Symmetric | Both sides equal, peak in the middle | Mean or Median (both are close) |
| Right-skewed | Long tail to the right | Median (mean is pulled higher) |
| Left-skewed | Long tail to the left | Median (mean is pulled lower) |
Summary
- The distribution is the shape made by the bars of a histogram.
- A symmetric distribution looks the same on both sides; the mean and median are close.
- A right-skewed distribution has a long tail on the right (a few large values).
- A left-skewed distribution has a long tail on the left (a few small values).
- In skewed data, the median usually describes the center better than the mean.