Loops: Iterating Datasets
Turn manual tasks into automated pipelines by processing thousands of records in milliseconds using Python loops.
Imagine you are a teacher in Alexandria with 100 students. You have a list of their grades and you need to calculate everyone's final status. If you do this manually one by one, it's slow and you'll probably make mistakes. This is why we have Loops.
In Data Analysis, loops are our "Automatic Processor." We tell Python: "Take this entire container of 10,000 records, look at each record ONE BY ONE, and perform these three cleaning steps." What would take a human 10 hours takes Python 10 milliseconds.
When you write for item in list, Python creates an Iterator behind the scenes. Think of this as a "Pointer" that keeps track of which record is currently being processed. Compared with the index-counting loops common in other languages, this removes the manual counter bookkeeping and the off-by-one mistakes that come with it.
# The standard "For Each" loop
daily_sales = [120, 85, 140, 200]
for sale in daily_sales:
    # Logic is applied to 'sale' one at a time
    print(f"Processed sale: {sale} EGP")
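To make the "Pointer" idea concrete, here is a minimal sketch of roughly what a for loop does on your behalf, written out by hand with the built-in iter() and next() functions. The names sales_iterator and processed are illustrative only; in everyday code you would simply write the for loop.

```python
daily_sales = [120, 85, 140, 200]

# iter() asks the list for an iterator; next() pulls one item at a time.
sales_iterator = iter(daily_sales)
processed = []  # illustrative: records what the loop has seen

while True:
    try:
        sale = next(sales_iterator)
    except StopIteration:
        # The iterator signals that it is exhausted.
        # A for loop catches this signal silently and ends.
        break
    processed.append(sale)
    print(f"Processed sale: {sale} EGP")
```

The for loop version is shorter and does the same work, which is why it is the standard choice.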
- break: Stop the entire loop immediately (e.g., "Stop once we find the first error").
- continue: Skip the current item and move to the next (e.g., "If this row is empty, skip it").
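As a small illustration of break, the sketch below stops at the first faulty reading. The readings list and the rule that a negative value marks a fault are assumptions made up for this example.

```python
# Hypothetical data: a negative value marks a faulty reading.
readings = [12, 15, -1, 18, 20]

checked = []

for value in readings:
    if value < 0:
        print("Fault found, stopping the loop.")
        break  # leave the loop; 18 and 20 are never examined
    checked.append(value)

print(f"Readings checked before stopping: {checked}")
```

Only 12 and 15 are checked. Replacing break with continue would skip the faulty reading and keep going through 18 and 20.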
A common mistake is assuming every item in your loop will be "Perfect." If you are looping through 1,000 prices and item #452 is the string "N/A", the first arithmetic operation on it raises a TypeError and the script stops. Unless you saved intermediate results, the work done on the first 451 items is lost.
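The failure is easy to reproduce on a small scale. In this sketch the four-item prices list is a made-up stand-in for the 1,000 prices, and the try/except wrapper is there only so the demonstration can finish and report what happened; without it, the script would end with a traceback at the bad item.

```python
# Hypothetical sample: "N/A" plays the role of the bad item.
prices = [10, 20, "N/A", 40]

total = 0
try:
    for price in prices:
        total += price  # raises TypeError when price is the string "N/A"
except TypeError as error:
    print(f"Loop stopped early: {error}")

# 40 was never reached, so only 10 + 20 was added.
print(f"Total before the failure: {total}")
```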
| Technique | Goal | Analogy |
|---|---|---|
| enumerate() | Track row numbers. | Numbering every student on the list. |
| continue | Handle minor errors. | Skipping a student who was absent. |
| break | Stop for critical errors. | Stopping the exam if the fire alarm goes off. |
# Skipping Corrupted Data
mixed_data = [100, 200, "Messy", 300]
for item in mixed_data:
    if isinstance(item, str):
        print("Skipping corrupted row...")
        continue  # Moves directly to 300
    print(f"Adding {item} to total.")
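The same skipping pattern can keep an actual running total. The sketch below adds two counters, total and skipped, which are illustrative names and not part of the example above.

```python
mixed_data = [100, 200, "Messy", 300]

total = 0    # running sum of the valid values
skipped = 0  # how many rows were ignored

for item in mixed_data:
    if isinstance(item, str):
        skipped += 1
        continue  # nothing below runs for this item
    total += item

print(f"Total: {total}, skipped rows: {skipped}")  # Total: 600, skipped rows: 1
```

Counting the skipped rows is a cheap way to notice when a dataset is dirtier than expected.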
enumerate() for Error Reports
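When a loop fails, knowing "an error happened" isn't helpful. You need to know exactly WHICH row failed. enumerate() provides the index (row number) alongside the data. Counting starts at 0 by default, so the third item in a list is reported as Row 2; pass start=1 if you want human-style numbering.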
prices = [10, 20, "ERROR", 40]
for row_num, price in enumerate(prices):
    if isinstance(price, str):
        print(f"CRITICAL: Data corruption on Row {row_num}!")
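A possible variation, sketched below, uses the start parameter of enumerate() so that rows are numbered from 1, and collects the problems in a list instead of printing them immediately. The bad_rows list is an illustrative name chosen for this example.

```python
prices = [10, 20, "ERROR", 40]

bad_rows = []  # collects (row number, offending value) pairs

# start=1 makes the first item Row 1 instead of Row 0.
for row_num, price in enumerate(prices, start=1):
    if isinstance(price, str):
        bad_rows.append((row_num, price))

for row_num, value in bad_rows:
    print(f"Row {row_num}: unexpected value {value!r}")
```

Collecting the errors first lets the loop finish the whole dataset and report every bad row in one pass.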
In professional data science, we often create a sequence of automated steps to move data from a "raw" state to a "final" state. We call this a Data Pipeline.
# A list of dictionaries (Our dataset)
raw_inventory = [
{"id": 1, "price": 100},
{"id": 2, "price": -50}, # Logic Error
{"id": 3, "price": 200},
{"id": 4, "price": "Missing"} # Type Error
]
cleaned_prices = []
for item in raw_inventory:
    p = item.get("price")
    # 1. Check for Type Errors (The Negative Gatekeeper)
    if not isinstance(p, (int, float)):
        continue  # Skip "Jagged" or corrupted rows
    # 2. Check for Logic Errors
    if p <= 0:
        continue
    # 3. Collect Clean Data
    cleaned_prices.append(p)
# 4. Final Aggregation
print(f"Total Valid Revenue: {sum(cleaned_prices)}")
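One way to make the pipeline above reusable is to wrap it in a function. This is a sketch, not the only design: clean_prices is a hypothetical name, and rows 5 and 6 are made-up additions that exercise two edge cases. A missing key makes .get() return None, and True counts as an int in Python because bool is a subclass of int, so the sketch rejects booleans explicitly.

```python
def clean_prices(records):
    """Return the valid positive prices found in a list of record dictionaries."""
    valid = []
    for record in records:
        price = record.get("price")  # None when the key is missing
        # bool is a subclass of int, so True/False are rejected explicitly.
        if isinstance(price, bool) or not isinstance(price, (int, float)):
            continue
        if price <= 0:
            continue
        valid.append(price)
    return valid


raw_inventory = [
    {"id": 1, "price": 100},
    {"id": 2, "price": -50},        # Logic Error
    {"id": 3, "price": 200},
    {"id": 4, "price": "Missing"},  # Type Error
    {"id": 5},                      # hypothetical row: no price key at all
    {"id": 6, "price": True},       # hypothetical row: boolean instead of a number
]

cleaned = clean_prices(raw_inventory)
print(f"Total Valid Revenue: {sum(cleaned)}")  # Total Valid Revenue: 300
```

Returning the cleaned list instead of printing inside the function keeps the cleaning step separate from the final aggregation.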
- Negative Gatekeeper: In the code if not isinstance(p, (int, float)):, the not operator flips the logic. It tells Python: "If the data is NOT a number, skip this row and move on to the next one." This is a simple way to handle "jagged" data, where a value might be missing or corrupted. Because item.get("price") returns None when the key is absent, rows with no price at all are skipped too.
- sum() Function: This is a built-in Python tool that adds up every number inside a list in a single call. It is typically faster than writing your own loop to add numbers manually, and it is shorter and harder to get wrong.
- for loops automate repetitive data cleaning tasks.
- continue allows you to skip "dirty" or "jagged" rows without crashing.
- enumerate() tracks the row index for professional error reporting.
- break stops the loop as soon as a goal is reached or a critical failure occurs.
- Python for Loops (Real Python): https://realpython.com/python-for-loop/
- Looping Techniques (Official Docs): https://docs.python.org/3/tutorial/controlflow.html#looping-techniques
- Python enumerate() Function (W3Schools): https://www.w3schools.com/python/ref_func_enumerate.asp