Control Flow: Filtering Logic
Acting as the gatekeeper of data quality by enforcing business rules and identifying anomalies.
So far, you’ve learned how to store data. Now, you need to decide what data to keep and what to throw away. This is the core of Data Cleaning.
Imagine you are looking at a dataset of exam scores. You see a grade of -5. You know that's impossible on the grading scale. Or you see an age of 200 in a customer list. To an analyst, these are Anomalies. We use Control Flow (`if`/`elif`/`else`) to act as the "Gatekeeper" of our dataset, ensuring only logical, clean data passes through to our final report.
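Here is a minimal sketch of that gatekeeper idea. The loop sorts a list of exam scores into clean values and anomalies. The scores are hypothetical sample values, and the valid range of 0 to 100 is an assumption about the grading scale.

```python
# Hypothetical exam scores; assumes a valid grading scale of 0 to 100
scores = [88, -5, 72, 140, 95]

clean_scores = []
anomalies = []

for score in scores:
    if score < 0:
        anomalies.append((score, "below minimum"))  # e.g. the -5 grade
    elif score > 100:
        anomalies.append((score, "above maximum"))
    else:
        clean_scores.append(score)  # passed the gate

print("Clean:", clean_scores)
print("Anomalies:", anomalies)
```

Only the scores that pass both range checks reach `clean_scores`. Everything else is kept aside with a reason so it can be reviewed rather than silently dropped.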
When you combine conditions, as in `if (age > 18) and has_id:`, Python uses Short-Circuit Evaluation:
- In an `and` expression, if the first part is `False`, Python doesn't even evaluate the second part.
- In an `or` expression, if the first part is `True`, it stops immediately.
```python
# Efficiency example
price = 0
is_in_stock = True

if (price > 0) and is_in_stock:
    # If price is 0, (price > 0) is False, so Python stops here.
    # It won't waste time checking if it's in stock.
    print("Valid Product")
```
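To make your scripts run faster on millions of rows, put the strictest check first: in an `and` chain, the condition most likely to be `False`; in an `or` chain, the one most likely to be `True`. When checks differ in cost, put the cheap ones before the expensive ones. This lets the script exit early more often.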
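One way to watch short-circuiting happen is to make the second check a function that records every call. In this sketch, `is_in_stock` and the product records are hypothetical. Only products whose price check passes ever reach the stock check.

```python
calls = []  # records which products reached the stock check

def is_in_stock(product):
    """Hypothetical stock check that logs each time it runs."""
    calls.append(product["name"])
    return product["stock"] > 0

products = [
    {"name": "Cable", "price": 0, "stock": 5},
    {"name": "Mouse", "price": 1200, "stock": 3},
]

for product in products:
    # Cheap price check first; is_in_stock() only runs if it passes
    if product["price"] > 0 and is_in_stock(product):
        print("Valid Product:", product["name"])

print("Stock check ran for:", calls)
```

The Cable's price check is `False`, so `is_in_stock` is never called for it, and `calls` ends up as `['Mouse']`.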
In Data Analysis, filtering isn't just about removing empty rows. It's about enforcing Business Rules. A rule might be: "We only care about sales in Alexandria that are over 500 EGP."
If you don't filter out records that break these rules early, your averages and charts will be wrong.
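Here is a minimal sketch of the Alexandria rule applied to a few hypothetical sales records. The values are made up for illustration. Comparing the average before and after filtering shows how rows outside the rule shift the result.

```python
# Hypothetical sales records (illustrative values only)
sales = [
    {"city": "Alexandria", "amount": 750},
    {"city": "Cairo", "amount": 2000},
    {"city": "Alexandria", "amount": 300},
    {"city": "Alexandria", "amount": 1250},
]

# Business rule: keep only Alexandria sales over 500 EGP
filtered = []
for sale in sales:
    if sale["city"] == "Alexandria" and sale["amount"] > 500:
        filtered.append(sale)

all_avg = sum(s["amount"] for s in sales) / len(sales)
rule_avg = sum(s["amount"] for s in filtered) / len(filtered)

print(f"Average of all rows:    {all_avg:.1f} EGP")
print(f"Average under the rule: {rule_avg:.1f} EGP")
```

With these sample values, the unfiltered average is 1075.0 EGP and the rule-compliant average is 1000.0 EGP. The Cairo sale and the small Alexandria sale pull the unfiltered number away from what the rule actually asks about.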
| Operator | Usage | Analogy |
|---|---|---|
| `==` | Exact match | Same name on a passport. |
| `!=` | Exclude (not equal) | "Don't include VIPs." |
| `in` | Membership | "Is Cairo in our branch list?" |
| `is` | Identity | Checking whether a value is `None` (`value is None`). |
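The four operators from the table can be tried on a single customer record. The field names, values, and branch list below are illustrative assumptions. The missing phone number is stored as `None`, which is why it is checked with `is` rather than `==`.

```python
# Hypothetical customer record; "phone" is missing (None)
customer = {"name": "Mona", "city": "Cairo", "tier": "Regular", "phone": None}
branches = ["Cairo", "Giza", "Alexandria"]

checks = {
    "exact match (==)": customer["city"] == "Cairo",
    "exclude VIPs (!=)": customer["tier"] != "VIP",
    "membership (in)": customer["city"] in branches,
    "missing phone (is)": customer["phone"] is None,
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

Every check evaluates to `True` for this record.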
A simple gatekeeper for a single price value:

```python
price = -10.5

if price < 0:
    print("ALERT: Logically impossible price detected!")
elif price == 0:
    print("FREE: This is a promotional item.")
else:
    print(f"Standard Price: {price}")
```

When you have multiple categories to check (e.g., Cairo, Giza, Alexandria), writing `if city == "Cairo" or city == "Giza" or ...` quickly gets messy. The professional way is to use the `in` keyword with a collection.
```python
target_cities = ["Cairo", "Giza", "Alexandria"]
customer_city = "Luxor"

if customer_city in target_cities:
    print("Local Shipping available.")
else:
    print("Regional Shipping required.")
```
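The same membership check extends naturally to many records. In this sketch, the customer cities are hypothetical and `target_cities` is written as a set rather than a list. Both support `in`, and set lookups are typically faster when the collection of categories is large. `not in` is the negated form of the same test.

```python
target_cities = {"Cairo", "Giza", "Alexandria"}  # a set also supports `in`
customer_cities = ["Cairo", "Luxor", "Giza", "Aswan"]  # hypothetical sample

local = []
regional = []
for city in customer_cities:
    if city in target_cities:
        local.append(city)
    else:
        regional.append(city)

print("Local:", local)
print("Regional:", regional)

# `not in` is the negated form of the same check
print("Aswan" not in target_cities)  # True
```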
Why is `if (age > 0) and (age < 120)` better than just checking `if (age < 120)`?

Let's build filtering logic for a single data record from an Egyptian shop, using safe, defensive programming.
```python
# Raw record
record = {
    "product": "Gaming Mouse",
    "price": 1200,
    "stock": 0,
    "category": "Tech"
}

# 1. Safe extraction with dict.get(): missing keys fall back to a default
price = record.get("price", 0)
stock = record.get("stock", 0)
category = record.get("category", "Unknown")

# 2. Multi-level filtering logic: the first failing rule decides the status
if price <= 0:
    status = "REJECTED: Price Error"
elif stock == 0:
    status = "OUT_OF_STOCK"
elif category not in ["Tech", "Home"]:
    status = "WRONG_CATEGORY"
else:
    status = "PROCESSED"

# 3. Final output
print(f"Record Status: {status}")  # → Record Status: OUT_OF_STOCK
```

- `if`/`elif`/`else` acts as the gatekeeper for data quality.
- Short-Circuit Evaluation (`and`/`or`) makes your scripts more efficient.
- Membership testing with `in` is the cleanest way to check a value against a list of categories.
- Anomalies (negative prices, impossible ages) MUST be caught before analysis.