
Topic 6.2: Pandas Series

The fundamental building block of Pandas.

🎯 What is a Pandas Series?

When you work with data in Pandas, you usually work with a DataFrame: a table with rows and columns. But what happens when you extract just one column from that table? You get a Series.

A Series is the fundamental building block of Pandas. Think of it as an intelligent column — not just a simple list of values, but a data structure that comes with labels, methods, and powerful capabilities built in.

What makes a Series different from a regular Python list? Three key features. First, it has an index: labels that identify each value. Second, it has a single data type (dtype) that describes every value it holds, so numeric data is stored uniformly rather than as a mix of arbitrary Python objects. Third, it comes packed with methods for analysis and manipulation that would otherwise take many lines of basic Python to replicate.

Python
import pandas as pd

# Create a Series of Egyptian city populations
population = pd.Series([10200000, 5400000, 9000000, 1500000, 500000])

print(population)
▶ Output
0    10200000
1     5400000
2     9000000
3     1500000
4      500000
dtype: int64

Notice how the Series displays — vertically, one value per line. On the left, you see the index (row positions). On the right, you see the values (population numbers). At the bottom, you see the data type. This structure makes it easy to access specific data points by their labels rather than just numeric positions.
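As a rough illustration of the built-in methods mentioned earlier, the sketch below computes a few statistics on the same figures, once by hand on a plain list and once with Series methods. The variable names (raw, list_mean, and so on) are illustrative only.

```python
import pandas as pd

# Same figures as the example above, as a plain list and as a Series
raw = [10200000, 5400000, 9000000, 1500000, 500000]
population = pd.Series(raw)

# With a plain list, statistics are computed by hand
list_mean = sum(raw) / len(raw)

# A Series offers the same results as methods
series_mean = population.mean()
total = population.sum()
largest_label = population.idxmax()  # index label of the largest value

print(list_mean, series_mean)
print(total)
print(largest_label)
```

Because this Series uses the default index, idxmax() returns a numeric label here; with custom labels it would return the label itself.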

🔨 Creating a Series

You can create a Series in multiple ways, each useful in different situations. The most direct method is using pd.Series() and passing it a list of values.

Python
# Method 1: From a list with automatic index
populations = pd.Series([10200000, 5400000, 9000000, 1500000])

print(populations)
▶ Output
0    10200000
1     5400000
2     9000000
3     1500000
dtype: int64

When you create a Series without specifying an index, Pandas automatically assigns numeric labels starting from zero. This works, but lacks context. Custom labels give meaning to each value.

Python
# Method 2: From a list with custom index
populations = pd.Series(
    [10200000, 5400000, 9000000, 1500000],
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan']
)

print(populations)
▶ Output
Cairo         10200000
Alexandria     5400000
Giza           9000000
Aswan          1500000
dtype: int64

Now each population figure has a clear label. You can ask for the population of Cairo without needing to know that Cairo is at position zero. This makes your code more readable and less prone to errors.
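A minimal sketch of what asking for a value by label can look like, using the same Series. It also shows .loc (label-based) and .iloc (position-based) lookups for comparison; the result variable names are illustrative.

```python
import pandas as pd

populations = pd.Series(
    [10200000, 5400000, 9000000, 1500000],
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan']
)

cairo_by_label = populations['Cairo']         # lookup by label
cairo_by_loc = populations.loc['Cairo']       # explicit label-based lookup
cairo_by_position = populations.iloc[0]       # explicit position-based lookup
two_cities = populations[['Giza', 'Aswan']]   # a list of labels returns a Series

print(cairo_by_label)
print(two_cities)
```

Using .loc and .iloc makes it explicit whether you mean a label or a position, which helps when the labels themselves are numbers.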

Method 3: Creating a Series from a Dictionary

Python
# Method 3: From a dictionary
city_data = {
    'Cairo': 10200000,
    'Alexandria': 5400000,
    'Giza': 9000000,
    'Aswan': 1500000
}

populations = pd.Series(city_data)

print(populations)
▶ Output
Cairo         10200000
Alexandria     5400000
Giza           9000000
Aswan          1500000
dtype: int64

Creating a Series from a dictionary works well when your data naturally comes in key-value pairs. Pandas uses each key as the index label and each value as the corresponding Series value.
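One related behaviour worth knowing: if you pass both a dictionary and an index, Pandas looks up each index label in the dictionary. The sketch below assumes a hypothetical label, 'Luxor', that is not a key in the dictionary, to show what happens to a missing entry.

```python
import pandas as pd

city_data = {'Cairo': 10200000, 'Alexandria': 5400000, 'Giza': 9000000}

# 'Luxor' is a hypothetical label with no matching key in city_data
selected = pd.Series(city_data, index=['Giza', 'Cairo', 'Luxor'])

print(selected)
print(selected.dtype)
```

The result follows the order of the index you passed, keys not listed in the index are dropped, and the missing label gets NaN. Because NaN is a floating-point value, the dtype becomes float64 rather than int64.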

Method 4: Creating a Series from a Scalar Value

Python
# Method 4: From a scalar value (same value repeated)
populations = pd.Series(
    0,
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan']
)

print(populations)
▶ Output
Cairo         0
Alexandria    0
Giza          0
Aswan         0
dtype: int64

When you pass a single scalar value to pd.Series(), Pandas repeats that value for every label in the index. This is useful when you want to initialise a Series with a default value — for example, setting all cities to zero before filling in real data later.
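The "fill in real data later" step might look like the following sketch, which assigns values by label after the Series has been initialised with zeros.

```python
import pandas as pd

# Start with every city set to zero
populations = pd.Series(0, index=['Cairo', 'Alexandria', 'Giza', 'Aswan'])

# Fill in real figures by label as they become available
populations['Cairo'] = 10200000
populations.loc['Giza'] = 9000000

print(populations)
```

Cities that have not been filled in yet keep the default value of zero.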

🔍 Index and Values

Every Series has two main components: the index and the values. These work together to enable flexible data manipulation.

The index is the set of labels for your data. It can be numbers, strings, dates, or any other data type. When you print a Series, the index appears on the left side. You can access the index separately using the .index attribute.

Python
populations = pd.Series(
    [10200000, 5400000, 9000000, 1500000],
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan']
)

# Access the index
print(populations.index)

# Convert to a list if needed
city_names = populations.index.tolist()
print(city_names)
▶ Output
Index(['Cairo', 'Alexandria', 'Giza', 'Aswan'], dtype='object')
['Cairo', 'Alexandria', 'Giza', 'Aswan']

The values are the actual data points — the numbers, strings, or other information you're analyzing. These are stored internally as a NumPy array, which is why Pandas operations are so fast. You can access the values using the .values attribute.

Python
# Access the values
print(populations.values)

# This returns a NumPy array
print(type(populations.values))
▶ Output
[10200000  5400000  9000000  1500000]
<class 'numpy.ndarray'>

The output confirms both the array contents and its type. The first line shows the values on their own, without the index labels, and the second line shows that .values returned a numpy.ndarray. NumPy is highly optimised for numerical computation, so operations on a whole Series do not need a Python loop.
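To make the NumPy connection concrete, here is a small sketch of element-wise arithmetic on a whole Series. It also uses .to_numpy(), which the Pandas documentation recommends over .values when you need a NumPy array; the names in_millions and as_array are illustrative.

```python
import numpy as np
import pandas as pd

populations = pd.Series(
    [10200000, 5400000, 9000000, 1500000],
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan']
)

# Arithmetic applies to every value at once, and the labels are kept
in_millions = populations / 1_000_000

# Extract the underlying data as a NumPy array
as_array = populations.to_numpy()

print(in_millions)
print(type(as_array))
```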

Three more tools help you understand your Series further. The dtype attribute tells you the data type stored in the Series: int64 for integers, float64 for decimal numbers, object for strings. The built-in type() function tells you the Python type of the Series object itself, confirming it is a pandas.core.series.Series. The name attribute is an optional label that identifies what the Series represents. You can assign a name to make your data more descriptive and self-explanatory.

Python
# Check the data type (dtype)
print(populations.dtype)

# Check the Python type of the Series object
print(type(populations))

# Give the Series a name
populations.name = 'Population'
print(populations.name)
▶ Output
int64
<class 'pandas.core.series.Series'>
Population

The output shows three lines. The first line int64 is the dtype — it tells you the values in this Series are 64-bit integers. The second line <class 'pandas.core.series.Series'> is the Python type of the object itself, confirming you are working with a Pandas Series and not a plain list or NumPy array. The third line Population is the name you assigned — Pandas stores this label and uses it when displaying or joining data. Knowing the dtype helps you avoid type errors in calculations, and the name makes your Series self-documenting when it is part of a larger analysis.
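As a brief sketch of where the name and dtype matter in practice, the example below sets the name at creation time, converts the Series to a one-column DataFrame with to_frame(), and changes the dtype with astype(). The variable names table and as_float are illustrative.

```python
import pandas as pd

populations = pd.Series(
    [10200000, 5400000, 9000000, 1500000],
    index=['Cairo', 'Alexandria', 'Giza', 'Aswan'],
    name='Population'
)

# The Series name becomes the column name of the resulting DataFrame
table = populations.to_frame()
print(table.columns.tolist())

# astype() returns a new Series with the requested dtype
as_float = populations.astype('float64')
print(as_float.dtype)
```

The original Series is left unchanged by astype(); the converted copy keeps the same index and name.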

  • A Series is a one-dimensional labeled array — the fundamental building block of Pandas.
  • Every Series has an index (labels) and values (data), plus a dtype and an optional name.
  • You can create a Series from a list with an automatic index (Method 1), a list with a custom index (Method 2), a dictionary (Method 3), or a scalar value repeated across an index (Method 4).
  • The .index attribute returns the labels; the .values attribute returns a NumPy array.
  • The dtype attribute tells you the data type stored in the Series (e.g. int64, float64, object).
  • The type() function confirms the Python type of the Series object itself.
  • The name attribute gives the Series an optional label to make it more descriptive.
  • Series are built on NumPy arrays, which makes bulk numerical operations fast.
Quick Check
You create the following Series:

Python
scores = pd.Series([85, 92, 78], index=['Ali', 'Sara', 'Mona'])

What does scores.values return?