Topic 6.3: Series Indexing and Slicing
Selecting, filtering, and modifying data with precision
Dataset for this topic: 6_3_city_populations.csv
In the previous topic, we learned how to create a Series with a custom index. Now the practical question: how do you retrieve specific values from it? Pandas provides two access methods — loc and iloc — each built for a different scenario.
loc uses labels. You pass the index label exactly as it appears in the Series. This keeps your code readable: cities.loc['Cairo'] reads like a natural sentence.
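import pandas as pd

# Load the city population data as a Series indexed by city name.
# read_csv returns a DataFrame; squeeze('columns') turns its single remaining
# column into a Series (the squeeze=True argument was removed in pandas 2.0).
cities = pd.read_csv('6_3_city_populations.csv', index_col='City').squeeze('columns')

print('Series with custom index:')
print(cities)
# Output:
# City
# Cairo         9500000
# Alexandria    5200000
# Giza          4800000
# Aswan          300000
# Luxor          500000
# Name: Population, dtype: int64

# Access by label
print(cities.loc['Cairo'])
# Output: 9500000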
iloc uses positions — zero-indexed integers, just like Python lists. iloc[0] is the first element, iloc[-1] is the last.
# Access by position
print(cities.iloc[0])   # First element
# Output: 9500000

print(cities.iloc[-1])  # Last element
# Output: 500000
Use loc when you know the label and want readable, self-documenting code. Use iloc when you care about order — for example, getting the first five rows or iterating through elements by position. On a Series with string labels, passing a number to loc will raise a KeyError because no label with that number exists.
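The KeyError described above can be reproduced with a small sketch. It assumes an inline Series that stands in for 6_3_city_populations.csv, built from the values shown in the earlier output; the variable names are illustrative only.

```python
import pandas as pd

# Assumption: inline stand-in for the CSV, using the values shown earlier.
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

loc_error = None
try:
    cities.loc[0]  # no label equal to 0 exists in a string index
except KeyError as err:
    loc_error = err
    print('loc[0] raised KeyError:', err)

# The same element reached the two supported ways
first_by_position = cities.iloc[0]
first_by_label = cities.loc['Cairo']
print(first_by_position, first_by_label)
```

Catching the exception here is only for demonstration; in real code the fix is to use iloc for positions and loc for labels.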
Boolean indexing lets you filter data with conditions. Instead of specifying which elements you want by name or position, you describe them with a condition. Pandas evaluates that condition for every element and returns only those that match.
The process happens in two steps. First, you create a condition that returns a Series of boolean values — True where the condition is met, False where it is not. This boolean Series is called a mask. Second, you use that mask to filter the original data.
import pandas as pd

# squeeze('columns') replaces the squeeze=True argument removed in pandas 2.0
cities = pd.read_csv('6_3_city_populations.csv', index_col='City').squeeze('columns')

# Step 1: Create the boolean mask
mask = cities > 1000000
print('Boolean mask (population > 1 million):')
print(mask)
# Output:
# City
# Cairo          True
# Alexandria     True
# Giza           True
# Aswan         False
# Luxor         False

# Step 2: Apply the mask to filter
large_cities = cities[mask]
print('Cities with population > 1 million:')
print(large_cities)
# Output:
# City
# Cairo         9500000
# Alexandria    5200000
# Giza          4800000
You can combine both steps into a single line for cleaner code:
# One-line filtering
large_cities = cities[cities > 1000000]
print(large_cities)
Internally, Pandas uses NumPy for these operations. When you write cities > 1000000, NumPy evaluates this condition across the entire array in a single vectorized operation — no Python loops involved. This makes boolean indexing fast even on datasets with millions of rows.
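To make the NumPy connection concrete, the sketch below inspects the mask itself. It assumes an inline Series standing in for the CSV (values taken from the outputs above); mask_values and n_large are illustrative names.

```python
import pandas as pd

# Assumption: inline stand-in for the CSV, using the values shown earlier.
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

mask = cities > 1000000

# The mask is an ordinary Series whose values live in a NumPy boolean array
mask_values = mask.to_numpy()
print(type(mask_values), mask_values.dtype)

# Because True counts as 1, summing the mask counts the matching elements
n_large = int(mask.sum())
print('Cities above 1 million:', n_large)
```

Summing or averaging a mask is a common way to count or measure the share of matching rows without filtering first.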
Boolean indexing works with any comparison operator: <, <=, >, >=, ==, !=. You can also combine multiple conditions using & (and), | (or), and ~ (not):
# Cities with population between 500,000 and 5,000,000
medium_cities = cities[(cities >= 500000) & (cities <= 5000000)]
print(medium_cities)
# Output:
# City
# Giza     4800000
# Luxor     500000

# Note: Use & and | instead of 'and' and 'or'
# Also, always use parentheses around each condition
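The other two operators, | and ~, follow the same pattern. The sketch below also shows what happens if Python's and is used by mistake. It assumes an inline Series standing in for the CSV, with the values shown in the outputs above.

```python
import pandas as pd

# Assumption: inline stand-in for the CSV, using the values shown earlier.
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# | (or): cities that are either very small or very large
extremes = cities[(cities < 500000) | (cities > 5000000)]
print(extremes)

# ~ (not): invert a condition
not_large = cities[~(cities > 1000000)]
print(not_large)

# Python's 'and' asks for a single True/False for the whole Series,
# which pandas refuses to guess, so it raises ValueError.
and_error = None
try:
    cities[(cities >= 500000) and (cities <= 5000000)]
except ValueError as err:
    and_error = err
    print('ValueError:', err)
```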
Sometimes you know exactly which elements you want, and you want to select them by name. Instead of passing a single label to loc, pass a list of labels. Pandas returns a new Series containing only those elements.
# Select specific cities by name
selected = cities.loc[['Alexandria', 'Aswan']]
print(selected)
# Output:
# City
# Alexandria    5200000
# Aswan          300000
This works well when you have a predefined set of items to analyze — for example, comparing a specific group of products, or tracking certain stocks from a larger portfolio.
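Two details of list-based selection are worth a quick sketch: the result follows the order of the list, and a label that is not in the index raises a KeyError. The example assumes an inline Series standing in for the CSV; 'Tanta' is a hypothetical label chosen because it is not in the dataset.

```python
import pandas as pd

# Assumption: inline stand-in for the CSV, using the values shown earlier.
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# The result follows the order of the list, not the order of the Series
selected = cities.loc[['Luxor', 'Cairo']]
print(selected)

# iloc accepts a list of positions in the same way
by_position = cities.iloc[[0, 2]]
print(by_position)

# A label that does not exist raises KeyError ('Tanta' is hypothetical)
missing_error = None
try:
    cities.loc[['Cairo', 'Tanta']]
except KeyError as err:
    missing_error = err
    print('KeyError:', err)
```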
Slicing lets you select a contiguous range of elements from a Series. It works similarly to Python list slicing, but with an important difference depending on whether you use iloc or loc.
With iloc (position-based slicing), the behavior matches standard Python slicing: the start is included, the stop is excluded.
import pandas as pd

# squeeze('columns') replaces the squeeze=True argument removed in pandas 2.0
cities = pd.read_csv('6_3_city_populations.csv', index_col='City').squeeze('columns')

# Position-based slicing with iloc
print('First 3 cities using iloc[0:3]:')
print(cities.iloc[0:3])
# Output: positions 0, 1, 2 (position 3 is excluded)
# City
# Cairo         9500000
# Alexandria    5200000
# Giza          4800000

# Last 2 cities using negative indexing
print('Last 2 cities using iloc[-2:]:')
print(cities.iloc[-2:])
# Output:
# City
# Aswan    300000
# Luxor    500000
With loc (label-based slicing), both the start and stop labels are included in the result. This is different from Python's default slicing behavior.
# Label-based slicing with loc
print("Cairo to Giza using loc['Cairo':'Giza']:")
print(cities.loc['Cairo':'Giza'])
# Output: Cairo, Alexandria, AND Giza (all included)
# City
# Cairo         9500000
# Alexandria    5200000
# Giza          4800000
With iloc, the stop position is excluded: iloc[0:3] returns 3 elements (positions 0, 1, 2). With loc, the stop label is included: loc['Cairo':'Giza'] returns Cairo, Alexandria, and Giza. The difference exists because labels, unlike positions, have no natural "next" value: to exclude the stop label you would need to know which label follows the last one you want. Including both endpoints lets you name exactly the first and last elements of the range.
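The two rules can be compared side by side. This sketch assumes an inline Series standing in for the CSV (values from the outputs above) and looks up Giza's position with Index.get_loc so that both slices use the same stop point.

```python
import pandas as pd

# Assumption: inline stand-in for the CSV, using the values shown earlier.
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

stop_pos = cities.index.get_loc('Giza')  # position of the label 'Giza'

up_to_giza_by_position = cities.iloc[0:stop_pos]    # stop excluded
up_to_giza_by_label = cities.loc['Cairo':'Giza']    # stop included

print(len(up_to_giza_by_position), 'elements with iloc')
print(len(up_to_giza_by_label), 'elements with loc')

# To get the same result with iloc, the stop must be one position further
same = cities.iloc[0:stop_pos + 1].equals(up_to_giza_by_label)
print('iloc[0:stop_pos + 1] matches the loc slice:', same)
```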
You can omit either the start or stop to slice from the beginning or to the end:
# From start to position 2 (excluded)
print(cities.iloc[:2])
# Output: Cairo, Alexandria

# From position 3 to end
print(cities.iloc[3:])
# Output: Aswan, Luxor
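Open-ended slices work with loc as well, with the stop label still included. A minimal sketch, assuming an inline Series that stands in for the CSV:

```python
import pandas as pd

# Assumption: inline stand-in for the CSV, using the values shown earlier.
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# From the start up to and including 'Giza'
up_to_giza = cities.loc[:'Giza']
print(up_to_giza)

# From 'Aswan' to the end
from_aswan = cities.loc['Aswan':]
print(from_aswan)
```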
Series are mutable — you can change their values after creation. This is essential for data cleaning, updating records, or applying transformations.
To update a single value, use loc or iloc with assignment:
import pandas as pd

# squeeze('columns') replaces the squeeze=True argument removed in pandas 2.0
cities = pd.read_csv('6_3_city_populations.csv', index_col='City').squeeze('columns')

# Update Cairo's population
print('Original Cairo population:', cities.loc['Cairo'])
# Output: 9500000

cities.loc['Cairo'] = 11000000
print('Updated Cairo population:', cities.loc['Cairo'])
# Output: 11000000
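Assignment through iloc works the same way, addressing the element by position instead of label. The sketch assumes an inline Series standing in for the CSV; the new value 520000 is a made-up figure for illustration.

```python
import pandas as pd

# Assumption: inline stand-in for the CSV, using the values shown earlier.
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# Update the last element by position (hypothetical new value)
cities.iloc[-1] = 520000

# The change is visible through the label as well
print('Luxor after update:', cities.loc['Luxor'])
```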
You can combine modification with boolean indexing to update multiple values at once based on a condition.
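# Increase all small cities' populations by 10%
small_cities_mask = cities < 1000000
print('Small cities (before update):')
print(cities[small_cities_mask])
# Output:
# City
# Aswan    300000
# Luxor    500000

# Multiplying by 1.1 produces floats, so convert the Series to float64 first.
# Assigning floats into an integer Series relies on implicit upcasting,
# which recent pandas versions deprecate.
cities = cities.astype('float64')

# Apply the 10% increase
cities[small_cities_mask] = cities[small_cities_mask] * 1.1
print('Small cities (after 10% increase):')
print(cities[small_cities_mask])
# Output:
# City
# Aswan    330000.0
# Luxor    550000.0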
Instead of writing a loop to check each element and conditionally update it, you express the entire operation in one line. Pandas handles the iteration internally, using fast NumPy operations.
When you write cities[small_cities_mask] * 1.1, Pandas multiplies every selected element by 1.1 in a single operation. No Python for-loop is involved. This pattern — operating on entire arrays at once — is called vectorization, and it is the foundation of Pandas performance. Vectorized code is shorter, clearer, and can be hundreds of times faster than equivalent loops on large datasets.
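For comparison, the sketch below writes the same update as an explicit loop and as a vectorized assignment, then checks that both give the same result. It assumes an inline float64 Series standing in for the CSV; it illustrates equivalence only and does not measure speed.

```python
import pandas as pd

# Assumption: inline stand-in for the CSV, stored as float64 so the
# 10% increase does not change the dtype.
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
    dtype='float64',
)

# Loop version: check each element and update it one at a time
looped = cities.copy()
for city, population in cities.items():
    if population < 1000000:
        looped.loc[city] = population * 1.1

# Vectorized version: one mask, one assignment
vectorized = cities.copy()
small = vectorized < 1000000
vectorized[small] = vectorized[small] * 1.1

print(vectorized)
print('Same result:', looped.equals(vectorized))
```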
These selection methods can be combined for more complex operations. You can slice a Series and then filter the slice, or chain selections to narrow down to exactly the elements you need.
import pandas as pd

# squeeze('columns') replaces the squeeze=True argument removed in pandas 2.0
cities = pd.read_csv('6_3_city_populations.csv', index_col='City').squeeze('columns')

# Slice first 3 cities, then filter for population > 4 million
first_three = cities.iloc[0:3]
large_from_first_three = first_three[first_three > 4000000]
print(large_from_first_three)
# Output:
# City
# Cairo         9500000
# Alexandria    5200000
# Giza          4800000
Each operation returns a new Series, which you can immediately use in the next operation. This makes it easy to build complex queries from simple building blocks.
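The same kind of query can be written as a single chained expression. This sketch assumes an inline Series standing in for the CSV and uses a 5 million threshold (chosen here for illustration) so that the filter actually removes an element. Passing a function to loc applies the condition to the result of the previous step.

```python
import pandas as pd

# Assumption: inline stand-in for the CSV, using the values shown earlier.
cities = pd.Series(
    [9500000, 5200000, 4800000, 300000, 500000],
    index=pd.Index(['Cairo', 'Alexandria', 'Giza', 'Aswan', 'Luxor'], name='City'),
    name='Population',
)

# Slice the first three cities, then keep those above 5 million.
# The lambda receives the sliced Series, so no intermediate variable is needed.
largest_of_first_three = cities.iloc[0:3].loc[lambda s: s > 5000000]
print(largest_of_first_three)

# The original Series is left untouched by the selection
print(len(cities), 'cities in the original Series')
```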
- loc accesses elements by label; iloc accesses by zero-based position.
- With loc slicing, the stop label is included. With iloc slicing, the stop position is excluded — matching Python list behavior.
- Boolean indexing uses a condition to produce a mask of True/False values, then filters the Series to return only matching elements.
- You can select multiple elements by passing a list of labels to loc.
- Series values can be modified using assignment with loc or iloc.
- Boolean indexing combined with assignment enables vectorized updates — changing many values at once based on a condition.
- These selection techniques can be combined and chained to build complex queries from simple operations.
- Pandas Documentation: Indexing and Selecting Data
  https://pandas.pydata.org/docs/user_guide/indexing.html
- Pandas Documentation: Boolean Indexing
  https://pandas.pydata.org/docs/user_guide/indexing.html#boolean-indexing
- Pandas Documentation: pandas.Series.loc
  https://pandas.pydata.org/docs/reference/api/pandas.Series.loc.html