Topic 6.2: Pandas Series
The fundamental building block of Pandas.
When you work with data in Pandas, you usually work with a DataFrame: a table with rows and columns. But what happens when you extract just one column from that table? You get a Series.
A Series is the fundamental building block of Pandas. Think of it as an intelligent column — not just a simple list of values, but a data structure that comes with labels, methods, and powerful capabilities built in.
What makes a Series different from a regular Python list? Three key features. First, it has an index: labels that identify each value. Second, it has a single data type (dtype) that describes how all of its values are stored; mixed content falls back to the general-purpose object dtype. Third, it comes packed with methods for analysis and manipulation that would take many lines of code to replicate with basic Python.
import pandas as pd

# Create a Series of Egyptian city populations
population = pd.Series([10200000, 5400000, 9000000, 1500000, 500000])
print(population)
Notice how the Series displays — vertically, one value per line. On the left, you see the index (row positions). On the right, you see the values (population numbers). At the bottom, you see the data type. This structure makes it easy to access specific data points by their labels rather than just numeric positions.
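As a rough illustration of the three differences from a plain list, the sketch below computes the same average both ways. The variable names (values, list_mean, series_mean) are chosen here for illustration only.

```python
import pandas as pd

# Same figures as the example above
values = [10200000, 5400000, 9000000, 1500000, 500000]

# Plain list: no labels, and the average is computed by hand
list_mean = sum(values) / len(values)

# Series: index, dtype, and analysis methods come built in
population = pd.Series(values)
series_mean = population.mean()

print(list(population.index))  # the automatic labels
print(population.dtype)        # one dtype for the whole Series
print(list_mean, series_mean)  # both approaches give the same average
```

The list version works, but every extra statistic means more hand-written code, while the Series already carries methods such as mean(), sum(), and max().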
You can create a Series in multiple ways, each useful in different situations. The most direct method is using pd.Series() and passing it a list of values.
# Method 1: From a list with automatic index
populations = pd.Series([10200000, 5400000, 9000000, 1500000])
print(populations)
When you create a Series without specifying an index, Pandas automatically assigns numeric labels starting from zero. This works, but lacks context. Custom labels give meaning to each value.
# Method 2: From a list with custom index
populations = pd.Series(
    [10200000, 5400000, 9000000, 1500000],
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan']
)
print(populations)
Now each population figure has a clear label. You can ask for the population of Cairo without needing to know that Cairo is at position zero. This makes your code more readable and less prone to errors.
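A minimal sketch of what asking for Cairo by label can look like, using the .loc (label-based) and .iloc (position-based) accessors. The result variable names are illustrative.

```python
import pandas as pd

populations = pd.Series(
    [10200000, 5400000, 9000000, 1500000],
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan']
)

# Look up a value by its label
cairo_by_label = populations.loc['Cairo']

# Look up the same value by its position
cairo_by_position = populations.iloc[0]

# Select several labels at once; the result is a smaller Series
two_cities = populations.loc[['Giza', 'Aswan']]

print(cairo_by_label, cairo_by_position)
print(two_cities)
```

The label-based lookup keeps working even if the order of the cities changes, which is the main reason to prefer it over hard-coded positions.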
Method 3: Creating a Series from a Dictionary
# Method 3: From a dictionary
city_data = {
    'Cairo': 10200000,
    'Alexandria': 5400000,
    'Giza': 9000000,
    'Aswan': 1500000
}
populations = pd.Series(city_data)
print(populations)
Creating a Series from a dictionary works well when your data naturally comes in key-value pairs. Pandas uses each key as the index label and each value as the corresponding Series value.
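One related behaviour worth knowing: if you pass both a dictionary and an index, Pandas keeps only the requested labels, in the requested order, and fills labels that are missing from the dictionary with NaN. The sketch below assumes a hypothetical 'Luxor' label that has no entry in the dictionary.

```python
import pandas as pd

city_data = {'Cairo': 10200000, 'Alexandria': 5400000, 'Giza': 9000000}

# 'Luxor' is a hypothetical label with no matching key in city_data
selected = pd.Series(city_data, index=['Giza', 'Cairo', 'Luxor'])

print(selected)
print(selected.dtype)  # float64, because NaN cannot be stored as an integer
```

Note that the dtype changes from integer to float here, since the missing value is represented as NaN.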
Method 4: Creating a Series from a Scalar Value
# Method 4: From a scalar value (same value repeated)
populations = pd.Series(
    0,
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan']
)
print(populations)
When you pass a single scalar value to pd.Series(), Pandas repeats that value for every label in the index. This is useful when you want to initialise a Series with a default value — for example, setting all cities to zero before filling in real data later.
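The "fill in real data later" step might look like the following sketch, which assigns values by label with .loc. Only two cities are filled in, to show that the remaining ones keep the default.

```python
import pandas as pd

cities = ['Cairo', 'Alexandria', 'Giza', 'Aswan']

# Start with a default of zero for every city
populations = pd.Series(0, index=cities)

# Fill in real figures as they become available
populations.loc['Cairo'] = 10200000
populations.loc['Giza'] = 9000000

print(populations)
```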
Every Series has two main components: the index and the values. These work together to enable flexible data manipulation.
The index is the set of labels for your data. It can hold numbers, strings, dates, or any other hashable values. When you print a Series, the index appears on the left side. You can access the index separately using the .index attribute.
populations = pd.Series(
    [10200000, 5400000, 9000000, 1500000],
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan']
)

# Access the index
print(populations.index)

# Convert to a list if needed
city_names = populations.index.tolist()
print(city_names)
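Since an index is not limited to strings, here is a small sketch with a date-based index. The readings Series and its numbers are made up purely for illustration.

```python
import pandas as pd

# Hypothetical daily readings, labelled by date instead of by city name
dates = pd.date_range('2024-01-01', periods=3, freq='D')
readings = pd.Series([12, 15, 11], index=dates)

# Look up a value using a date label
second_day = readings.loc[pd.Timestamp('2024-01-02')]

print(readings.index)
print(second_day)
```

The lookup works the same way as with city names; only the type of the label changes.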
The values are the actual data points — the numbers, strings, or other information you're analyzing. These are stored internally as a NumPy array, which is why Pandas operations are so fast. You can access the values using the .values attribute.
# Access the values
print(populations.values)

# This returns a NumPy array
print(type(populations.values))
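The first line of output shows the array contents, and the second confirms that its type is numpy.ndarray. NumPy is highly optimised for numerical computation, so operations apply to the whole array at once rather than value by value in a Python loop. For new code, the Pandas documentation recommends the .to_numpy() method over .values when you need a NumPy array.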
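To see what the NumPy backing buys you in practice, the sketch below applies arithmetic to every value at once, with no explicit loop. The variable names (arr, in_millions, total) are illustrative.

```python
import numpy as np
import pandas as pd

populations = pd.Series(
    [10200000, 5400000, 9000000, 1500000],
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan']
)

# Get the underlying data as a NumPy array
arr = populations.to_numpy()
print(type(arr))

# Vectorised arithmetic: one expression applies to every value
in_millions = populations / 1_000_000
print(in_millions)

# Aggregations also run over the whole array
total = populations.sum()
print(total)
```

The result of the division is itself a Series, so the city labels stay attached to the converted values.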
Three more tools help you understand your Series further. The dtype attribute tells you the data type stored in the Series: int64 for integers, float64 for decimal numbers, and traditionally object for strings (recent Pandas versions may report a dedicated string dtype instead). The built-in type() function tells you the Python type of the Series object itself, confirming it is a pandas.core.series.Series. The name attribute is an optional label that identifies what the Series represents. You can assign a name to make your data more descriptive and self-explanatory.
# Check the data type (dtype)
print(populations.dtype)

# Check the Python type of the Series object
print(type(populations))

# Give the Series a name
populations.name = 'Population'
print(populations.name)
The output shows three lines. The first line int64 is the dtype — it tells you the values in this Series are 64-bit integers. The second line <class 'pandas.core.series.Series'> is the Python type of the object itself, confirming you are working with a Pandas Series and not a plain list or NumPy array. The third line Population is the name you assigned — Pandas stores this label and uses it when displaying or joining data. Knowing the dtype helps you avoid type errors in calculations, and the name makes your Series self-documenting when it is part of a larger analysis.
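As a small sketch of where the name and dtype matter in practice: converting a named Series to a DataFrame with to_frame() uses the name as the column label, and astype() returns a copy with a different dtype. The variable names table and as_float are illustrative.

```python
import pandas as pd

populations = pd.Series(
    [10200000, 5400000, 9000000, 1500000],
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan'],
    name='Population'
)

# The Series name becomes the column label in the resulting DataFrame
table = populations.to_frame()
print(table.columns.tolist())

# astype() returns a new Series; the original keeps its dtype
as_float = populations.astype('float64')
print(populations.dtype, as_float.dtype)
```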
- A Series is a one-dimensional labeled array — the fundamental building block of Pandas.
- Every Series has an index (labels) and values (data), plus a dtype and an optional name.
- You can create a Series from a list with an automatic index (Method 1), a list with a custom index (Method 2), a dictionary (Method 3), or a scalar value repeated across an index (Method 4).
- The .index attribute returns the labels; the .values attribute returns a NumPy array.
- The dtype attribute tells you the data type stored in the Series (e.g. int64, float64, object).
- The type() function confirms the Python type of the Series object itself.
- The name attribute gives the Series an optional label to make it more descriptive.
- Series are built on NumPy arrays, which makes bulk numerical operations fast.
- pandas.Series - API Reference
https://pandas.pydata.org/docs/reference/api/pandas.Series.html
- Intro to Data Structures - Pandas User Guide
https://pandas.pydata.org/docs/user_guide/dsintro.html
- Real Python - Pandas: Explore Your Dataset
https://realpython.com/pandas-python-explore-dataset/