🔄
Data Analysis Week 2

Loops: Iterating Datasets

Turn manual tasks into automated pipelines by processing thousands of records in milliseconds using Python loops.

đŸ› ī¸ The "Power of Automation"
â–ŧ

Imagine you are a teacher in Alexandria with 100 students. You have a list of their grades and you need to calculate everyone's final status. If you do this manually one by one, it's slow and you'll probably make mistakes. This is why we have Loops.

🎮
The Automatic Processor

In Data Analysis, loops are our "Automatic Processor." We tell Python: "Take this entire container of 10,000 records, look at each record ONE BY ONE, and perform these three cleaning steps." What would take a human 10 hours takes Python 10 milliseconds.

đŸ—ī¸ The Iterator Object
â–ŧ

When you write for item in list, Python creates an Iterator behind the scenes. Think of this as a "Pointer" that keeps track of which record is currently being processed. This is much more efficient than traditional counting loops used in other languages.

Python
# The standard "For Each" loop
daily_sales = [120, 85, 140, 200]
for sale in daily_sales:
    # Logic is applied to 'sale' one at a time
    print(f"Processed sale: {sale} EGP")
â„šī¸
Loop Controls

- break: Stop the entire loop immediately (e.g., "Stop once we find the first error").
- continue: Skip the current item and move to the next (e.g., "If this row is empty, skip it").

âš ī¸ The "Crash and Burn" Loop
â–ŧ

A common mistake is assuming every item in your loop will be "Perfect." If you are looping through 1,000 prices and item #452 is a string "N/A", your script will crash and lose all the work it did for the first 451 items.

TechniqueGoalAnalogy
enumerate()Track row numbers.Numbering every student on the list.
continueHandle minor errors.Skipping a student who was absent.
breakStop for critical errors.Stopping the exam if the fire alarm goes off.
Python
# Skipping Corrupted Data
mixed_data = [100, 200, "Messy", 300]
for item in mixed_data:
    if isinstance(item, str):
        print("Skipping corrupted row...")
        continue # Moves directly to 300
    print(f"Adding {item} to total.")
đŸ›Ąī¸ Using enumerate() for Error Reports
â–ŧ

When a loop fails, knowing "an error happened" isn't helpful. You need to know exactly WHICH row failed. enumerate() provides the index (row number) alongside the data.

Python
prices = [10, 20, "ERROR", 40]
for row_num, price in enumerate(prices):
    if isinstance(price, str):
        print(f"CRITICAL: Data corruption on Row {row_num}!")
?
Why would an analyst use continue inside a loop that is calculating total revenue?
đŸ›ī¸ The "Automatic Cleaner"
â–ŧ

In professional data science, we often create a sequence of automated steps to move data from a "raw" state to a "final" state. We call this a Data Pipeline.

Python
# A list of dictionaries (Our dataset)
raw_inventory = [
    {"id": 1, "price": 100},
    {"id": 2, "price": -50}, # Logic Error
    {"id": 3, "price": 200},
    {"id": 4, "price": "Missing"} # Type Error
]

cleaned_prices = []

for item in raw_inventory:
    p = item.get("price")
    
    # 1. Check for Type Errors (The Negative Gatekeeper)
    if not isinstance(p, (int, float)):
        continue # Skip "Jagged" or corrupted rows
        
    # 2. Check for Logic Errors
    if p <= 0:
        continue
        
    # 3. Collect Clean Data
    cleaned_prices.append(p)

# 4. Final Aggregation
print(f"Total Valid Revenue: {sum(cleaned_prices)}")
â„šī¸
The Logic of not and sum()

- Negative Gatekeeper: In the code if not isinstance(p, (int, float)):, the not operator flips the logic. It tells Python: "If the data is NOT a number, stop what you are doing and skip this row." This is the best way to handle Jagged Arrays where a column might be missing or corrupted.
- sum() Function: This is a built-in Python tool that adds up every number inside a list instantly. It is much faster than writing your own loop to add numbers manually!

  • for loops automate repetitive data cleaning tasks.
  • continue allows you to skip "dirty" or "jagged" rows without crashing.
  • enumerate() tracks the row index for professional error reporting.
  • break stops a process once a certain goal or critical failure is met.
📚 External Resources
â–ŧ