๐ŸŽฏWhat is an Outlier?
โ–ผ

An outlier is a value in a dataset that is very different from the rest of the values. It can be much higher or much lower than the other values.

For example, in a list of student heights from a class, most heights might be between 150 cm and 175 cm. If one student is recorded as 250 cm, that value is an outlier โ€” it does not fit with the others.

Why Outliers Happen

  • A real but rare value: a genuine measurement that just happens to be unusual (a very tall student, a very expensive purchase).
  • A data entry mistake: someone wrote 250 instead of 175, or added an extra zero.
  • A measurement error: the tool was broken or used in the wrong way.
๐Ÿ“
Important Note

Not every outlier is a mistake. Some are real and useful. The goal is to notice them first, then decide what they mean.

๐Ÿ”ŽHow Outliers Affect the Mean
โ–ผ

One value that is very far from the others can change the mean (average) a lot. The median, on the other hand, is much less affected.

Simple Example

Imagine the daily allowance of 5 students (in EGP):

20, 25, 30, 30, 35

  • Mean = (20 + 25 + 30 + 30 + 35) รท 5 = 28 EGP
  • Median = middle value = 30 EGP

Now imagine one student forgot a comma and wrote 350 instead of 35:

20, 25, 30, 30, 350

  • Mean = (20 + 25 + 30 + 30 + 350) รท 5 = 91 EGP
  • Median = middle value = 30 EGP (no change)

One outlier changed the mean from 28 to 91 โ€” more than three times as much! The median stayed exactly the same.

๐Ÿ’ก
Key Idea

The mean is sensitive to outliers. The median is not. When outliers are present, the median is usually a safer way to describe the typical value.

๐Ÿ“Finding Outliers with the IQR Rule
โ–ผ

To decide if a value is really an outlier, we can use a simple rule that uses the quartiles of the data. This is called the IQR Rule.

๐Ÿ“
Note: Other Methods Exist

The IQR rule and the boxplot are not the only ways to find outliers. Statisticians also use methods like the Z-score, the Modified Z-score, and several others built into machine-learning libraries. In this topic, we focus only on the boxplot and the IQR rule because they are the simplest, most visual, and most widely used in data analysis. As you grow in the field, you will meet the other methods too.

What Are Quartiles?

When we sort the data from smallest to largest, the quartiles split it into four equal parts:

  • Q1 (first quartile): the value at the 25% mark.
  • Q2 (second quartile): the value at the 50% mark โ€” this is the median.
  • Q3 (third quartile): the value at the 75% mark.

The IQR (Interquartile Range)

The IQR is the distance between Q1 and Q3. It tells us how spread out the middle half of the data is.

IQR = Q3 โˆ’ Q1

The Outlier Limits (Fences)

We then build two limits, one above the data and one below it. Any value outside these limits is called an outlier.

Lower Limit = Q1 โˆ’ 1.5 ร— IQR
Upper Limit = Q3 + 1.5 ร— IQR

Any value smaller than the Lower Limit or larger than the Upper Limit is treated as an outlier.

Quick Worked Example

Suppose Q1 = 20, Q3 = 40 for a dataset.

  • IQR = 40 โˆ’ 20 = 20
  • Lower Limit = 20 โˆ’ (1.5 ร— 20) = 20 โˆ’ 30 = โˆ’10
  • Upper Limit = 40 + (1.5 ร— 20) = 40 + 30 = 70

Any value below โˆ’10 or above 70 is an outlier. A value of 90, for example, is outside the upper limit, so it is an outlier.

๐Ÿ“Š
Visual Tool: The Boxplot

Boxplots are charts that show Q1, Q2 (median), and Q3 as a box, with lines (called whiskers) reaching to the limits. Any value outside the whiskers is drawn as a separate dot โ€” that dot is an outlier.

๐ŸงฐWhat to Do with an Outlier
โ–ผ

Finding an outlier is only the first step. The next step is to decide what to do with it. There are three common choices:

1. Remove It

Use this only when you are sure the value is a mistake (for example, an obvious typing error like a height of 999 cm).

2. Correct It

If you can find the right value (by checking the original record), replace the wrong value with the correct one.

3. Keep and Report

If the value is unusual but real, keep it. Mention it in your report and explain that it might affect the average.

โš ๏ธ
Be Careful

Never remove a value just because it looks "too big" or "too small." Always check first. Real outliers can carry important information.

โœ…Best Practice in Reports
โ–ผ

When you write a report or build a chart, follow these simple steps to handle outliers in a clear and honest way:

  1. Sort the data and check if any value looks very different from the rest.
  2. Use the IQR Rule to confirm whether it is really an outlier.
  3. Decide if the value is a mistake, a real rare value, or unclear.
  4. If you keep the outlier, report both the mean and the median so the reader sees the full picture.
  5. Mention any value you removed and explain why.

Summary Comparison

MeasureAffected by Outliers?Best Used When
MeanYes, very muchThe data has no extreme values.
MedianAlmost notThe data has outliers or is skewed.
IQRAlmost notYou want to measure the spread of the middle of the data.
?
Which measure of center is most affected by an outlier?
  • An outlier is a value that is far from the rest of the data.
  • The mean is strongly affected by outliers; the median is much less affected.
  • The IQR Rule uses Q1, Q3, and the formula 1.5 ร— IQR to set upper and lower limits.
  • For each outlier, decide whether to remove, correct, or keep and report it.
  • Always be honest in reports: explain any values you removed and show both the mean and the median when outliers are present.
๐Ÿ“šExternal Resources
โ–ผ