🚦
Level 3 Data Analysis Week 2

Control Flow: Filtering Logic

Acting as the gatekeeper of data quality by enforcing business rules and identifying anomalies.

🌉 The "Gatekeeper"

So far, you’ve learned how to store data. Now, you need to decide what data to keep and what to throw away. This is the core of Data Cleaning.

Imagine you are looking at a dataset of exam scores and you see a grade of -5, which no valid grading scale allows. Or you see an age of 200 in a customer list. To an analyst, these are Anomalies. We use Control Flow (if/elif/else) to act as the "Gatekeeper" of our dataset, ensuring only logical, clean data passes through to our final report.

⚙️ Short-Circuit Evaluation

When you combine conditions, as in if (age > 18) and has_id, Python uses Short-Circuit Evaluation: it evaluates the conditions from left to right and stops as soon as the final result is known.

  • In an and expression, if the first part is False, Python skips the second part, because the result must be False.
  • In an or expression, if the first part is True, Python skips the second part, because the result must be True.
Python
# Efficiency Example
price = 0
is_in_stock = True

# price > 0 is False here, so Python stops and never checks is_in_stock.
if price > 0 and is_in_stock:
    print("Valid Product")
else:
    print("Skipped: price must be above 0")
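To see short-circuiting in action, here is a small sketch. The helper check_stock and the calls list are hypothetical names added for illustration: the helper records every time it runs, so you can confirm that Python skips it when the first condition already decides the result.

```python
# Hypothetical helper: records every call so we can see when Python runs it.
calls = []

def check_stock(quantity):
    calls.append(quantity)
    return quantity > 0

price = 0
stock = 5

# price > 0 is False, so Python never calls check_stock().
if price > 0 and check_stock(stock):
    result = "Valid Product"
else:
    result = "Invalid Product"

print(result)  # Invalid Product
print(calls)   # [] (check_stock was skipped)

# With `or`, a True first operand stops evaluation in the same way.
is_free = price == 0 or check_stock(stock)
print(is_free)  # True
print(calls)    # [] (still skipped)
```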
ℹ️
Order Matters

To make your scripts run faster on millions of rows, put the cheapest check, or the one most likely to fail, first in an and condition. Python can then skip the remaining checks for most rows and move on early.
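The effect of ordering can be measured with a rough sketch like the one below. The sample prices, the counter, and the slow_lookup helper are all made up for illustration; slow_lookup simply stands in for any check that is costly to run. Putting the cheap check that fails often first means the costly one runs far less.

```python
# Hypothetical sample data and helper names, for illustration only.
prices = [-5, 0, 120, -1, 300, 0]
counter = {"slow": 0}

def slow_lookup(price):
    """Stand-in for an expensive check (e.g., a lookup in a large table)."""
    counter["slow"] += 1
    return price < 1000

# Cheap check first: slow_lookup only runs for rows that pass price > 0.
valid = [p for p in prices if p > 0 and slow_lookup(p)]
cheap_first_calls = counter["slow"]

# Expensive check first: slow_lookup runs for every row.
counter["slow"] = 0
valid_again = [p for p in prices if slow_lookup(p) and p > 0]
slow_first_calls = counter["slow"]

print(valid)              # [120, 300]
print(cheap_first_calls)  # 2
print(slow_first_calls)   # 6
```

Both orderings keep the same rows; only the amount of work differs.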

💥 The "Impossible" Data Points

In Data Analysis, filtering isn't just about removing empty rows. It's about enforcing Business Rules. A rule might be: "We only care about sales in Alexandria that are over 500 EGP."

If you don't filter out the records that break the rule early, they will distort your averages and charts.
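As a minimal sketch, the Alexandria rule could be applied to a list of sales like this. The records and the field names city and amount are assumptions made up for illustration, not a real dataset.

```python
# Hypothetical sample records; field names are assumptions for illustration.
sales = [
    {"city": "Alexandria", "amount": 750},
    {"city": "Cairo", "amount": 900},
    {"city": "Alexandria", "amount": 300},
    {"city": "Alexandria", "amount": 1200},
]

# Business rule: keep only Alexandria sales over 500 EGP.
kept = []
for sale in sales:
    if sale["city"] == "Alexandria" and sale["amount"] > 500:
        kept.append(sale)

average_all = sum(s["amount"] for s in sales) / len(sales)
average_kept = sum(s["amount"] for s in kept) / len(kept)

print(len(kept))     # 2
print(average_all)   # 787.5
print(average_kept)  # 975.0
```

The two averages differ noticeably, which is why the rule has to be applied before any summary statistics are computed.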

| Operator | Usage | Analogy |
|----------|-------|---------|
| == | Exact Match | Same name on a passport. |
| != | Exclude | "Don't include VIPs." |
| in | Membership | "Is Cairo in our branch list?" |
| is | Identity | Checking for specific None values. |
Python
price = -10.5
if price < 0:
    print("ALERT: Logically impossible price detected!")
elif price == 0:
    print("FREE: This is a promotional item.")
else:
    print(f"Standard Price: {price}")
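The four operators from the table can be tried side by side in a short sketch. All the values below are hypothetical and chosen only so that each operator has something to check.

```python
# Hypothetical values chosen to exercise each operator in the table.
customer_type = "Regular"
branches = ["Cairo", "Giza", "Alexandria"]
discount = None  # None here means "no discount recorded"

is_regular = customer_type == "Regular"  # exact match
not_vip = customer_type != "VIP"         # exclusion
has_branch = "Cairo" in branches         # membership
no_discount = discount is None           # identity check for None

print(is_regular, not_vip, has_branch, no_discount)  # True True True True

# A discount of 0 is a real value, not a missing one.
discount = 0
print(discount is None)  # False
```

Using is None keeps "missing" separate from legitimate values such as 0 or an empty string.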
🛡️ The Membership in Check

When you have multiple categories to check (e.g., Cairo, Giza, Alexandria), using if city == "Cairo" or city == "Giza"... is messy. The professional way is to use the in keyword with a collection.

Python
target_cities = ["Cairo", "Giza", "Alexandria"]
customer_city = "Luxor"

if customer_city in target_cities:
    print("Local Shipping available.")
else:
    print("Regional Shipping required.")
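Real city names often arrive with stray spaces or mixed capitalization, which makes a plain in check miss valid matches. The sketch below is one possible way to handle that, assuming you store the target cities in lowercase; the shipping_type helper is a hypothetical name. It also uses a set, which Python can search quickly even when the collection is large.

```python
# A set gives fast membership checks; lowercase names are an assumed convention.
target_cities = {"cairo", "giza", "alexandria"}

def shipping_type(raw_city):
    """Hypothetical helper: normalize the text, then check membership."""
    city = raw_city.strip().lower()
    if city in target_cities:
        return "Local"
    return "Regional"

print(shipping_type("  Giza "))  # Local
print(shipping_type("CAIRO"))    # Local
print(shipping_type("Luxor"))    # Regional
```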
?
Why is if (age > 0) and (age < 120) better than just checking if (age < 120)?
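One way to explore this question is to run both checks over a few sample ages and compare what each one lets through. The ages below are made up for illustration, and 0 < age < 120 is Python's chained form of the two-sided check.

```python
# Made-up sample ages, including two anomalies (-3 and 200) and a zero.
ages = [25, -3, 200, 0, 119]

upper_only = [age for age in ages if age < 120]
both_bounds = [age for age in ages if 0 < age < 120]

print(upper_only)   # [25, -3, 0, 119]
print(both_bounds)  # [25, 119]
```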
🔗 The "Marketplace Gatekeeper"

Let's build the filtering logic for a single data record from an Egyptian shop using safe, defensive programming.

Python
# Raw Record
record = {
    "product": "Gaming Mouse",
    "price": 1200,
    "stock": 0,
    "category": "Tech"
}

# 1. Safe extraction with .get(): a missing key falls back to the default
price = record.get("price", 0)
stock = record.get("stock", 0)
category = record.get("category", "Unknown")

# 2. Multi-level filtering logic
if price <= 0:
    status = "REJECTED: Price Error"
elif stock == 0:
    status = "OUT_OF_STOCK"
elif category not in ["Tech", "Home"]:
    status = "WRONG_CATEGORY"
else:
    status = "PROCESSED"

# 3. Final output
print(f"Record Status: {status}")
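In practice you rarely check one record at a time. A possible next step, sketched below, is to wrap the same rules in a function and run it over several records. The classify function, the ALLOWED_CATEGORIES name, and the extra sample records are hypothetical additions for illustration.

```python
ALLOWED_CATEGORIES = ["Tech", "Home"]

def classify(record):
    """Hypothetical helper applying the same gatekeeper rules to one record."""
    price = record.get("price", 0)
    stock = record.get("stock", 0)
    category = record.get("category", "Unknown")

    if price <= 0:
        return "REJECTED: Price Error"
    elif stock == 0:
        return "OUT_OF_STOCK"
    elif category not in ALLOWED_CATEGORIES:
        return "WRONG_CATEGORY"
    return "PROCESSED"

# Made-up sample records for illustration.
records = [
    {"product": "Gaming Mouse", "price": 1200, "stock": 0, "category": "Tech"},
    {"product": "Desk Lamp", "price": 350, "stock": 12, "category": "Home"},
    {"product": "Notebook", "price": -20, "stock": 40, "category": "Office"},
    {"product": "Backpack", "price": 600, "stock": 3},  # missing category
]

statuses = [classify(r) for r in records]
print(statuses)
# ['OUT_OF_STOCK', 'PROCESSED', 'REJECTED: Price Error', 'WRONG_CATEGORY']
```

Note that the order of the checks decides which status a record gets: the Notebook fails on both price and category, but it is reported as a price error because that rule comes first.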
  • if/elif/else acts as the gatekeeper for data quality.
  • Short-Circuit Evaluation (and/or) makes your scripts more efficient.
  • Membership in is the cleanest way to check against a list of categories.
  • Anomalies (negative prices, impossible ages) MUST be caught before analysis.
📚 External Resources