
Topic 6.3: Series Indexing and Slicing

Selecting, filtering, and modifying data with precision

🎯 Accessing Elements

📂 Dataset for this topic: 6_3_city_populations.csv

In the previous topic, we learned how to create a Series with a custom index. Now the practical question: how do you retrieve specific values from it? Pandas provides two access methods — loc and iloc — each built for a different scenario.

loc uses labels. You pass the index label exactly as it appears in the Series. This keeps your code readable: cities.loc['Cairo'] reads like a natural sentence.

Python
import pandas as pd

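# Load city population data. read_csv returns a one-column DataFrame;
# .squeeze('columns') turns it into a Series (the squeeze=True argument
# of read_csv was removed in pandas 2.0).
cities = pd.read_csv('6_3_city_populations.csv', index_col='City').squeeze('columns')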

print('Series with custom index:')
print(cities)
# Output:
# City
# Cairo         9500000
# Alexandria    5200000
# Giza          4800000
# Aswan          300000
# Luxor          500000
# Name: Population, dtype: int64

# Access by label
print(cities.loc['Cairo'])
# Output: 9500000
▶ Output
City
Cairo         9500000
Alexandria    5200000
Giza          4800000
Aswan          300000
Luxor          500000
Name: Population, dtype: int64
9500000

iloc uses positions — zero-indexed integers, just like Python lists. iloc[0] is the first element, iloc[-1] is the last.

Python
# Access by position
print(cities.iloc[0])   # First element
# Output: 9500000

print(cities.iloc[-1])  # Last element
# Output: 500000
▶ Output
9500000
500000
📘
When to Use Each

Use loc when you know the label and want readable, self-documenting code. Use iloc when you care about order — for example, getting the first five rows or iterating through elements by position. On a Series with string labels, passing a number to loc will raise a KeyError because no label with that number exists.
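
The KeyError mentioned above can be reproduced with a short sketch. The Series is built inline here as an assumed stand-in for the CSV data (same five cities), so the snippet runs without the file.

```python
import pandas as pd

# Assumed stand-in for the CSV data, built inline so the sketch runs on its own
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# iloc interprets 0 as a position, so this returns the first element
first_by_position = cities.iloc[0]
print(first_by_position)  # 9500000

# loc interprets 0 as a label; no city is labeled 0, so pandas raises KeyError
try:
    cities.loc[0]
    loc_error = None
except KeyError as exc:
    loc_error = exc

print(type(loc_error).__name__)  # KeyError
```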

Boolean Indexing

Boolean indexing lets you filter data with conditions. Instead of specifying which elements you want by name or position, you describe them with a condition. Pandas evaluates that condition for every element and returns only those that match.

The process happens in two steps. First, you create a condition that returns a Series of boolean values — True where the condition is met, False where it is not. This boolean Series is called a mask. Second, you use that mask to filter the original data.

Python
import pandas as pd

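# squeeze('columns') converts the one-column DataFrame into a Series
# (read_csv no longer accepts squeeze=True as of pandas 2.0)
cities = pd.read_csv('6_3_city_populations.csv', index_col='City').squeeze('columns')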

# Step 1: Create the boolean mask
mask = cities > 1000000
print('Boolean mask (population > 1 million):')
print(mask)
# Output:
# City
# Cairo         True
# Alexandria    True
# Giza          True
# Aswan         False
# Luxor         False

# Step 2: Apply the mask to filter
large_cities = cities[mask]
print('Cities with population > 1 million:')
print(large_cities)
# Output:
# City
# Cairo         9500000
# Alexandria    5200000
# Giza          4800000
▶ Output
City
Cairo          True
Alexandria     True
Giza           True
Aswan         False
Luxor         False
Name: Population, dtype: bool
City
Cairo         9500000
Alexandria    5200000
Giza          4800000
Name: Population, dtype: int64

You can combine both steps into a single line for cleaner code:

Python
# One-line filtering
large_cities = cities[cities > 1000000]
print(large_cities)
▶ Output
City
Cairo         9500000
Alexandria    5200000
Giza          4800000
Name: Population, dtype: int64
📘
Why Boolean Indexing is Fast

Internally, Pandas uses NumPy for these operations. When you write cities > 1000000, NumPy evaluates this condition across the entire array in a single vectorized operation — no Python loops involved. This makes boolean indexing fast even on datasets with millions of rows.
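
As a rough illustration of what the vectorized comparison replaces, the sketch below builds the same mask twice: once with the comparison operator and once with an explicit Python loop. The inline Series is an assumed stand-in for the CSV data, and the loop version is shown only for comparison, not as a recommended pattern.

```python
import pandas as pd

# Assumed stand-in for the CSV data, built inline so the sketch runs on its own
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# Vectorized: one comparison applied to the whole array at once
vectorized_mask = cities > 1000000

# Loop-based equivalent: one Python-level comparison per element
loop_mask = [value > 1000000 for value in cities]

# Both approaches produce the same True/False values
print(vectorized_mask.tolist() == loop_mask)  # True

# The mask itself is stored as a boolean array
print(vectorized_mask.dtype)  # bool
```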

Boolean indexing works with any comparison operator: <, <=, >, >=, ==, !=. You can also combine multiple conditions using & (and), | (or), and ~ (not):

Python
# Cities with population between 500,000 and 5,000,000
medium_cities = cities[(cities >= 500000) & (cities <= 5000000)]
print(medium_cities)
# Output:
# City
# Giza    4800000
# Luxor    500000

# Note: Use & and | instead of 'and' and 'or'
# Also, always use parentheses around each condition
▶ Output
City
Giza     4800000
Luxor     500000
Name: Population, dtype: int64
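
The example above only uses &. A minimal sketch of the other two operators, | and ~, follows, along with what happens when the parentheses are left out. The thresholds are arbitrary illustration values, and the inline Series is an assumed stand-in for the CSV data.

```python
import pandas as pd

# Assumed stand-in for the CSV data, built inline so the sketch runs on its own
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# OR: cities above 5 million or below 400,000
extremes = cities[(cities > 5000000) | (cities < 400000)]
print(list(extremes.index))   # ['Cairo', 'Alexandria', 'Aswan']

# NOT: invert a mask to keep everything that is not above 1 million
not_large = cities[~(cities > 1000000)]
print(list(not_large.index))  # ['Aswan', 'Luxor']

# Without parentheses, & binds more tightly than the comparisons, so Python
# ends up asking for the truth value of a whole Series and raises ValueError
try:
    cities[cities >= 500000 & cities <= 5000000]
    missing_parens_error = None
except ValueError as exc:
    missing_parens_error = exc

print(type(missing_parens_error).__name__)  # ValueError
```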
📍 Selecting Multiple Elements

Sometimes you know exactly which elements you want, and you want to select them by name. Instead of passing a single label to loc, pass a list of labels. Pandas returns a new Series containing only those elements.

Python
# Select specific cities by name
selected = cities.loc[['Alexandria', 'Aswan']]
print(selected)
# Output:
# City
# Alexandria    5200000
# Aswan          300000
▶ Output
City
Alexandria    5200000
Aswan          300000
Name: Population, dtype: int64

This works well when you have a predefined set of items to analyze — for example, comparing a specific group of products, or tracking certain stocks from a larger portfolio.
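
List selection also works by position with iloc, and the result follows the order of the list you pass rather than the order of the original Series. The sketch below illustrates both points, using an inline Series as an assumed stand-in for the CSV data.

```python
import pandas as pd

# Assumed stand-in for the CSV data, built inline so the sketch runs on its own
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# The same two cities, selected by label and by position
by_label = cities.loc[['Alexandria', 'Aswan']]
by_position = cities.iloc[[1, 3]]
print(by_label.tolist() == by_position.tolist())  # True

# The result keeps the order of the list, not the order of the Series
reordered = cities.loc[['Aswan', 'Cairo']]
print(list(reordered.index))  # ['Aswan', 'Cairo']
```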

✂️ Slicing: Selecting Ranges

Slicing lets you select a contiguous range of elements from a Series. It works similarly to Python list slicing, but with an important difference depending on whether you use iloc or loc.

With iloc (position-based slicing), the behavior matches standard Python slicing: the start is included, the stop is excluded.

Python
import pandas as pd

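# squeeze('columns') converts the one-column DataFrame into a Series
# (read_csv no longer accepts squeeze=True as of pandas 2.0)
cities = pd.read_csv('6_3_city_populations.csv', index_col='City').squeeze('columns')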

# Position-based slicing with iloc
print('First 3 cities using iloc[0:3]:')
print(cities.iloc[0:3])
# Output: positions 0, 1, 2 (position 3 is excluded)
# City
# Cairo         9500000
# Alexandria    5200000
# Giza          4800000

# Last 2 cities using negative indexing
print('Last 2 cities using iloc[-2:]:')
print(cities.iloc[-2:])
# Output:
# City
# Aswan     300000
# Luxor     500000
▶ Output
City
Cairo         9500000
Alexandria    5200000
Giza          4800000
Name: Population, dtype: int64
City
Aswan    300000
Luxor    500000
Name: Population, dtype: int64

With loc (label-based slicing), both the start and stop labels are included in the result. This is different from Python's default slicing behavior.

Python
# Label-based slicing with loc
print("Cairo to Giza using loc['Cairo':'Giza']:")
print(cities.loc['Cairo':'Giza'])
# Output: Cairo, Alexandria, AND Giza (all included)
# City
# Cairo         9500000
# Alexandria    5200000
# Giza          4800000
▶ Output
City
Cairo         9500000
Alexandria    5200000
Giza          4800000
Name: Population, dtype: int64
⚠️
Critical Difference: iloc vs loc Slicing

With iloc, the stop position is excluded: iloc[0:3] returns 3 elements (positions 0, 1, 2). With loc, the stop label is included: loc['Cairo':'Giza'] returns Cairo, Alexandria, and Giza. The difference exists because labels, unlike integer positions, have no general notion of "the next one". To exclude the stop label, you would have to know which label comes right after the last one you want. Including both endpoints lets you name exactly the first and last elements you need.
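
A side-by-side sketch makes the difference concrete. Giza sits at position 2, so the two slices below use "the same" stop and still return different numbers of elements. The inline Series is an assumed stand-in for the CSV data.

```python
import pandas as pd

# Assumed stand-in for the CSV data, built inline so the sketch runs on its own
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# Position-based: stop position 2 (Giza) is excluded
by_position = cities.iloc[0:2]
print(list(by_position.index))  # ['Cairo', 'Alexandria']

# Label-based: stop label 'Giza' is included
by_label = cities.loc['Cairo':'Giza']
print(list(by_label.index))     # ['Cairo', 'Alexandria', 'Giza']
```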

You can omit either the start or stop to slice from the beginning or to the end:

Python
# From start to position 2 (excluded)
print(cities.iloc[:2])
# Output: Cairo, Alexandria

# From position 3 to end
print(cities.iloc[3:])
# Output: Aswan, Luxor
▶ Output
City
Cairo         9500000
Alexandria    5200000
Name: Population, dtype: int64
City
Aswan    300000
Luxor    500000
Name: Population, dtype: int64
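
Open-ended slices work with loc as well. The sketch below is the label-based counterpart of the example above; note that the named endpoint is included in both cases. The inline Series is an assumed stand-in for the CSV data.

```python
import pandas as pd

# Assumed stand-in for the CSV data, built inline so the sketch runs on its own
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# From the start up to and including 'Giza'
up_to_giza = cities.loc[:'Giza']
print(list(up_to_giza.index))  # ['Cairo', 'Alexandria', 'Giza']

# From 'Aswan' (included) to the end
from_aswan = cities.loc['Aswan':]
print(list(from_aswan.index))  # ['Aswan', 'Luxor']
```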
✏️ Modifying Series Values

Series are mutable — you can change their values after creation. This is essential for data cleaning, updating records, or applying transformations.

To update a single value, use loc or iloc with assignment:

Python
import pandas as pd

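# squeeze('columns') converts the one-column DataFrame into a Series
# (read_csv no longer accepts squeeze=True as of pandas 2.0)
cities = pd.read_csv('6_3_city_populations.csv', index_col='City').squeeze('columns')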

# Update Cairo's population
print('Original Cairo population:', cities.loc['Cairo'])
# Output: 9500000

cities.loc['Cairo'] = 11000000
print('Updated Cairo population:', cities.loc['Cairo'])
# Output: 11000000
▶ Output
Original Cairo population: 9500000
Updated Cairo population: 11000000
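
Assignment by position works the same way with iloc. In the sketch below, the new population figure is a made-up value used only for illustration, and the inline Series is an assumed stand-in for the CSV data.

```python
import pandas as pd

# Assumed stand-in for the CSV data, built inline so the sketch runs on its own
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# Update the last element by position (520000 is a hypothetical figure)
cities.iloc[-1] = 520000

# The last position holds Luxor, so the change is visible through its label too
print(cities.loc['Luxor'])  # 520000
```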

You can combine modification with boolean indexing to update multiple values at once based on a condition.

Python
# Increase all small cities' populations by 10%
print('Small cities (before update):')
small_cities_mask = cities < 1000000
print(cities[small_cities_mask])
# Output:
# City
# Aswan     300000
# Luxor     500000

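# Apply 10% increase. Multiplying by 1.1 produces floats, so convert the
# Series to float64 first: recent pandas versions warn (and newer ones may
# raise) when float values are assigned into an int64 Series.
cities = cities.astype('float64')
cities[small_cities_mask] = cities[small_cities_mask] * 1.1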

print('Small cities (after 10% increase):')
print(cities[small_cities_mask])
# Output:
# City
# Aswan     330000.0
# Luxor     550000.0
▶ Output
City
Aswan    300000
Luxor    500000
Name: Population, dtype: int64
City
Aswan    330000.0
Luxor    550000.0
Name: Population, dtype: float64

Instead of writing a loop to check each element and conditionally update it, you express the entire operation in one line. Pandas handles the iteration internally, using fast NumPy operations.

📘
Vectorized Operations

When you write cities[small_cities_mask] * 1.1, Pandas multiplies every selected element by 1.1 in a single operation. No Python for-loop is involved. This pattern — operating on entire arrays at once — is called vectorization, and it is the foundation of Pandas performance. Vectorized code is shorter, clearer, and can be hundreds of times faster than equivalent loops on large datasets.
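
To see what the vectorized form replaces, the sketch below performs the same 10% update twice, once with a mask and once with an explicit loop, and checks that the results agree. The Series is built inline as an assumed stand-in for the CSV data and uses float64 from the start so that the scaled values fit its dtype. No timing is measured here; the point is only that the two forms are equivalent.

```python
import pandas as pd

# Assumed stand-in for the CSV data, stored as floats so scaled values fit
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
    dtype='float64',
)

# Vectorized update: one masked assignment
vectorized = cities.copy()
mask = vectorized < 1000000
vectorized[mask] = vectorized[mask] * 1.1

# Loop-based update, shown only for comparison
looped = cities.copy()
for label in looped.index:
    if looped.loc[label] < 1000000:
        looped.loc[label] = looped.loc[label] * 1.1

# Both versions change only Aswan and Luxor, and by the same amount
largest_difference = (vectorized - looped).abs().max()
print(largest_difference < 1e-6)  # True
```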

🔗 Combining Techniques

These selection methods can be combined for more complex operations. You can slice a Series and then filter the slice, or chain selections to narrow down to exactly the elements you need.

Python
import pandas as pd

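# squeeze('columns') converts the one-column DataFrame into a Series
# (read_csv no longer accepts squeeze=True as of pandas 2.0)
cities = pd.read_csv('6_3_city_populations.csv', index_col='City').squeeze('columns')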

# Slice first 3 cities, then filter for population > 4 million
first_three = cities.iloc[0:3]
large_from_first_three = first_three[first_three > 4000000]
print(large_from_first_three)
# Output:
# City
# Cairo         9500000
# Alexandria    5200000
# Giza          4800000
▶ Output
City
Cairo         9500000
Alexandria    5200000
Giza          4800000
Name: Population, dtype: int64

Each operation returns a new Series, which you can immediately use in the next operation. This makes it easy to build complex queries from simple building blocks.
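
Because each step returns a Series, the intermediate variable is optional. The sketch below chains a slice and a filter in one expression, passing a callable to loc so the condition is evaluated on the sliced result. The 5 million threshold is an arbitrary illustration value, and the inline Series is an assumed stand-in for the CSV data.

```python
import pandas as pd

# Assumed stand-in for the CSV data, built inline so the sketch runs on its own
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# Slice the first three cities, then keep those above 5 million.
# loc accepts a callable that receives the sliced Series and returns a mask.
large_from_first_three = cities.iloc[0:3].loc[lambda s: s > 5000000]

print(list(large_from_first_three.index))  # ['Cairo', 'Alexandria']
```

The two-step version with a named intermediate is often easier to read and debug; the chained form is mainly useful when the intermediate result is not needed elsewhere.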

  • loc accesses elements by label; iloc accesses by zero-based position.
  • With loc slicing, the stop label is included. With iloc slicing, the stop position is excluded — matching Python list behavior.
  • Boolean indexing uses a condition to produce a mask of True/False values, then filters the Series to return only matching elements.
  • You can select multiple elements by passing a list of labels to loc.
  • Series values can be modified using assignment with loc or iloc.
  • Boolean indexing combined with assignment enables vectorized updates — changing many values at once based on a condition.
  • These selection techniques can be combined and chained to build complex queries from simple operations.
Quick Check
You have a Series with city names as the index. Which code correctly retrieves only cities where the population exceeds 2,000,000?