Level 3 AI&DS · Week 3

The Interquartile Range (IQR)

Discover how to analyze the middle 50% of your data distribution using a robust measure that resists the pull of extreme outliers.

📊 Section 1: The Context Bridge

You are analyzing apartment prices in a new urban development. Most properties range between 1,000,000 and 1,500,000 EGP. However, your dataset contains a few "Outliers": a tiny storage room listed for 5,000 EGP (entry error) and a luxury penthouse listed for 500,000,000 EGP.

If you include these extremes when calculating the mean or standard deviation, the results are badly distorted: the "average" would suggest a price that no typical buyer could afford. To produce a professional market report, you need to look at the middle 50% of the data, the heart of the market where the real business happens.
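
To put numbers on that distortion, here is a minimal sketch using Python's standard `statistics` module. The price list is hypothetical, invented only to match the scenario above: six typical listings plus the two extreme entries.

```python
from statistics import mean, median

# Hypothetical listings in EGP: six typical prices plus the two extremes
prices = [5_000, 1_000_000, 1_100_000, 1_200_000,
          1_300_000, 1_400_000, 1_500_000, 500_000_000]

# The same listings with the two extreme entries filtered out
typical = [p for p in prices if 1_000_000 <= p <= 1_500_000]

print("Mean with extremes:   ", mean(prices))
print("Mean without extremes:", mean(typical))
print("Median with extremes: ", median(prices))
```

In this made-up sample the mean lands above 63 million EGP, roughly fifty times a typical listing, while the median stays at 1.25 million EGP.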

[Figure: IQR Statistics Illustration]
⚙️ Section 2: The Core Logic

The Interquartile Range (IQR) is a robust statistical measure used to quantify the spread of the middle 50% of data. It is calculated as Q3 − Q1, where Q1 is the first quartile (25th percentile) and Q3 is the third quartile (75th percentile).

Unlike the mean and standard deviation, the IQR is robust: it is far less sensitive to extreme values because it depends on quartiles (positions in the sorted data) rather than on the magnitude of every data point. It does not remove outliers automatically, but it helps identify them using mathematical boundaries. Take care when computing it: reversing the subtraction (Q1 − Q3) produces a negative value whenever Q3 is greater than Q1, which signals a logic error in the implementation, since a spread cannot be negative.
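
The "mathematical boundaries" are not spelled out here. One common convention, assumed in this sketch, is the 1.5 × IQR rule: any value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR is flagged as a potential outlier. The helper names `quartiles` and `find_outliers`, the multiplier parameter `k`, and the price list are all hypothetical, introduced only for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd count, the overall median belongs to neither half
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(upper)

def find_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed convention)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical listings in EGP
prices = [5_000, 1_000_000, 1_100_000, 1_200_000,
          1_300_000, 1_400_000, 1_500_000, 500_000_000]

print("Quartiles:", quartiles(prices))
print("Flagged as outliers:", find_outliers(prices))
```

Note that the function only flags values. Whether a flagged listing is a data-entry error or a genuine luxury property is still a judgment call for the analyst.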

🛠️ Section 3: Technical Mastery

Let's build a robust IQR function in Python that handles any list length and sorts the data automatically before calculation.

Python
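def calculate_iqr(data):
    """Return Q3 - Q1 using the median-of-halves method."""
    # 1. Sort first: quartiles are positions in the ordered data
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        return 0  # fewer than two values have no spread

    def median_of(values):
        size = len(values)
        m = size // 2
        # Even length: average the two middle values; odd length: take the middle one
        return (values[m - 1] + values[m]) / 2 if size % 2 == 0 else values[m]

    # 2. Split into halves; for odd n, the overall median belongs to neither half
    mid = n // 2
    lower = ordered[:mid]
    upper = ordered[mid + 1:] if n % 2 != 0 else ordered[mid:]

    # 3. Q1 and Q3 are the medians of the halves; always subtract high - low
    q1 = median_of(lower)
    q3 = median_of(upper)
    return q3 - q1

apartment_prices = [1.2e6, 5000, 1.1e6, 1.3e6, 5e8]
# With only five prices, each outlier sits inside one of the two-value halves,
# so this tiny sample still yields a very large IQR (250097500.0).
print("Market IQR:", calculate_iqr(apartment_prices))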
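
Quartiles can be defined in more than one way, so a different tool may report a slightly different IQR for the same data. As a rough cross-check, and not as a replacement for the function above, this sketch uses `statistics.quantiles` from the Python standard library (available since Python 3.8) with both of its methods. The sample list is illustrative.

```python
from statistics import quantiles

# Illustrative list with an extreme value at each end
data = [1, 50, 52, 55, 58, 60, 5000]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
q1_ex, _, q3_ex = quantiles(data, n=4, method="exclusive")
q1_in, _, q3_in = quantiles(data, n=4, method="inclusive")

print("exclusive:", q1_ex, q3_ex, "IQR =", q3_ex - q1_ex)
print("inclusive:", q1_in, q3_in, "IQR =", q3_in - q1_in)
```

On this list the exclusive method gives Q1 = 50 and Q3 = 60, the same values the median-of-halves approach produces, while the inclusive method interpolates to 51 and 59. Neither is wrong, but a report should state which convention was used.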
🔬 Section 4: Under the Hood (Logic Defense)

In professional data analysis, a logic error is often more dangerous than a syntax error because the program keeps running and silently produces wrong numbers. If you calculate the IQR as q1 - q3, Python won't crash, but you will get a negative "spread", which makes no mathematical sense: a range is a distance and can never be negative. Always subtract in the order high - low, that is, q3 - q1.
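
One way to make this kind of mistake fail loudly instead of silently is to add a guard. The sketch below is only an illustration of that idea; the helper name `iqr_from_quartiles` is hypothetical and the quartile values are made up.

```python
def iqr_from_quartiles(q1, q3):
    """Return q3 - q1, refusing quartiles passed in the wrong order."""
    if q3 < q1:
        raise ValueError(f"Q3 ({q3}) is below Q1 ({q1}): check the argument order")
    return q3 - q1

print(iqr_from_quartiles(50, 60))   # correct order

try:
    iqr_from_quartiles(60, 50)      # arguments swapped by mistake
except ValueError as err:
    print("Caught logic error:", err)
```

Raising an exception turns a quiet wrong number into an error that is visible the first time the code runs.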

🀝Section 5: The Master Integration (Final Boss)
β–Ό

Write a final integration test for your IQR function to confirm that extreme values in a skewed dataset do not affect the result.

Python
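# Skewed list with outliers at both ends
skewed_data = [1, 50, 52, 55, 58, 60, 5000]
result = calculate_iqr(skewed_data)

print("Robust IQR Result:", result)
# Q1 = 50 and Q3 = 60, so the extremes 1 and 5000 do not affect the result
assert result == 10, "IQR should reflect only the middle 50% of the data"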
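
To make the robustness claim concrete, the following sketch compares two lists that share the same middle values but differ at the extremes. It is self-contained, so it defines a compact median-of-halves helper named `iqr` (a hypothetical name for this illustration); both lists are made up.

```python
from statistics import median

def iqr(values):
    # Median-of-halves IQR; assumes at least two values
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(upper) - median(lower)

mild = [40, 50, 52, 55, 58, 60, 70]        # modest endpoints
extreme = [1, 50, 52, 55, 58, 60, 5000]    # same middle, extreme endpoints

for name, values in (("mild", mild), ("extreme", extreme)):
    print(name, "range =", max(values) - min(values), "IQR =", iqr(values))

# The full range changes with the endpoints, but the IQR does not
assert iqr(mild) == iqr(extreme) == 10
```

The full range jumps from 30 to 4999 between the two lists, while the IQR stays at 10 in both.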
Why is the IQR considered "Robust"?
  • The Interquartile Range (IQR) measures the spread of the middle 50% of data.
  • IQR = Q3 − Q1. It is a Robust measure because it resists the pull of outliers.
  • Always sort data before finding quartiles to maintain positional logic.
  • Negative IQR results indicate a logical error in the subtraction order.
📚 External Resources