Topic 7.6: The Pipeline Payoff

Combining NumPy, Pandas, and Matplotlib into one seamless data-to-chart pipeline — plus styles and file export

🔗 Three Libraries, One Pipeline

In Week 6 you learned NumPy (fast numerical computation on arrays) and Pandas (labeled data structures for analysis). In Week 7 you have been learning Matplotlib, which turns numbers into charts. This topic is where all three converge.

The critical insight is that you do not need to manually convert between these libraries. Matplotlib understands Pandas Series and NumPy arrays natively. You can pass a DataFrame column directly into ax.bar() or ax.plot() — no loops, no list conversion, no intermediate steps.

🔗 The Python Data Science Stack
NumPy
  • Fast numerical arrays
  • Mathematical operations
  • Pandas .to_numpy() returns these arrays, ready for Matplotlib
  • Foundation of Pandas internally
Pandas
  • Labeled DataFrames & Series
  • Data loading with read_csv()
  • groupby, filter, aggregate
  • Columns feed directly into Matplotlib
Matplotlib
  • Renders data as visual charts
  • Accepts arrays, Series, or lists
  • plt.style sets the visual theme
  • savefig() exports to file
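
The claim that Matplotlib accepts arrays, Series, or lists can be checked with a minimal sketch. The three values are passenger figures from the transport dataset used in this topic; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# The same three values held in three different containers
values_list = [4.50, 1.20, 0.85]
values_array = np.array(values_list)
values_series = pd.Series(values_list, name="daily_passengers_m")

fig, ax = plt.subplots()
line_from_list, = ax.plot(values_list, marker="o", label="list")
line_from_array, = ax.plot(values_array, marker="s", label="ndarray")
line_from_series, = ax.plot(values_series, marker="^", label="Series")
ax.legend()

# Compare the y-data Matplotlib stored for each line
same_as_array = np.array_equal(np.asarray(line_from_list.get_ydata()),
                               np.asarray(line_from_array.get_ydata()))
same_as_series = np.array_equal(np.asarray(line_from_list.get_ydata()),
                                np.asarray(line_from_series.get_ydata()))
print(same_as_array, same_as_series)

plt.close(fig)  # replace with plt.show() to display the chart
```

All three lines are drawn on top of each other, which is the point: the container type does not change what is plotted.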
ℹ️
The Transport Dataset

All examples use a dataset of 15 Egyptian cities with transport statistics: city, metro_lines, bus_routes, daily_passengers_m (millions), avg_speed_kmh, and co2_emissions_kt (kilotons).

Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the Egyptian cities transport dataset
transport = pd.read_csv('../Datasets/7_6_transport_stats.csv')

print("Transport Stats Dataset:")
print(transport)
print("\nShape:", transport.shape)
print("Columns:", list(transport.columns))
▶ Output
Transport Stats Dataset:
                city  metro_lines  bus_routes  daily_passengers_m  avg_speed_kmh  co2_emissions_kt
0              Cairo            3         120                4.50             22               890
1         Alexandria            1          45                1.20             28               310
2               Giza            1          38                0.85             24               240
3   Shubra El Kheima            0          22                0.52             19               145
4          Port Said            0          18                0.38             35                95
5               Suez            0          15                0.31             38                82
6              Luxor            0          12                0.22             42                58
7              Asyut            0          20                0.41             36               110
8           Mansoura            0          25                0.48             30               125
9              Tanta            0          22                0.44             29               115
10          Ismailia            0          14                0.27             40                70
11            Faiyum            0          16                0.33             32                88
12           Zagazig            0          18                0.35             31                92
13             Aswan            0          10                0.19             45                50
14             Minya            0          17                0.37             34                98

Shape: (15, 6)
Columns: ['city', 'metro_lines', 'bus_routes', 'daily_passengers_m', 'avg_speed_kmh', 'co2_emissions_kt']
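
If the CSV file is not at hand, a small stand-in can be built inline. The sketch below copies the first five rows from the printed output above; the name `sample` is an assumption for illustration and is not part of the course files.

```python
import pandas as pd

# First five rows copied from the dataset printout; the other ten are omitted for brevity
sample = pd.DataFrame({
    "city": ["Cairo", "Alexandria", "Giza", "Shubra El Kheima", "Port Said"],
    "metro_lines": [3, 1, 1, 0, 0],
    "bus_routes": [120, 45, 38, 22, 18],
    "daily_passengers_m": [4.50, 1.20, 0.85, 0.52, 0.38],
    "avg_speed_kmh": [22, 28, 24, 19, 35],
    "co2_emissions_kt": [890, 310, 240, 145, 95],
})

print(sample.shape)
print(list(sample.columns))
```

The column names match the CSV, so any later example should run against `sample` with five bars or points instead of fifteen.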
🐼 Pandas DataFrames Directly into Matplotlib

Before Pandas, creating a bar chart from data required loading values into a list, iterating through rows, and managing index positions manually. With Pandas, a single column is all Matplotlib needs — no loop, no conversion.

When you pass transport['city'] as the x-axis argument, Matplotlib interprets the Pandas Series as a sequence of category labels. When you pass transport['daily_passengers_m'] as the y-axis argument, it reads the numeric values. Because the city values are strings, Matplotlib places one bar per label at evenly spaced positions and writes the labels on the x-axis ticks.

Python
# Plot: daily passengers per city (bar chart using DataFrame column directly)
fig, ax = plt.subplots(figsize=(12, 5))

ax.bar(transport['city'], transport['daily_passengers_m'],
       color='steelblue', alpha=0.85)

ax.set_title('Daily Passengers by Egyptian City (Millions)', fontsize=14, fontweight='bold')
ax.set_xlabel('City')
ax.set_ylabel('Daily Passengers (Millions)')
ax.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print("DataFrame column fed directly into ax.bar() — no loop, no manual list")
Chart output
▶ Output
DataFrame column fed directly into ax.bar() — no loop, no manual list
💡
fontweight='bold' on Titles

Adding fontweight='bold' to ax.set_title() makes the chart title visually heavier than the axis labels, establishing a clear typographic hierarchy. This small change makes a chart look significantly more professional.

🔢 NumPy Arrays — The .to_numpy() Bridge

Pandas Series and NumPy arrays are interchangeable in most Matplotlib operations. But when you explicitly need a NumPy array — perhaps for mathematical operations before plotting — the .to_numpy() method converts any Pandas Series to a bare NumPy ndarray.

Python
# Extract columns as NumPy arrays
speed     = transport['avg_speed_kmh'].to_numpy()
emissions = transport['co2_emissions_kt'].to_numpy()

print(f"Type of speed: {type(speed)}")

# Scatter: speed vs emissions
fig, ax = plt.subplots(figsize=(10, 6))

ax.scatter(speed, emissions, alpha=0.7)

ax.set_title('Speed vs. CO₂ Emissions — Egyptian Cities', fontsize=14, fontweight='bold')
ax.set_xlabel('Average Speed (km/h)')
ax.set_ylabel('CO₂ Emissions (kt)')

plt.tight_layout()
plt.show()
Chart output
▶ Output
Type of speed: <class 'numpy.ndarray'>

Both transport['avg_speed_kmh'] (a Pandas Series) and speed (a NumPy array) would work identically as input to ax.scatter(). The explicit conversion with .to_numpy() is useful when you need to apply NumPy functions to the data before plotting — like np.log(), np.cumsum(), or array slicing with pure NumPy syntax.
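
As a rough illustration of "NumPy first, plot second", the sketch below applies np.log10() and a boolean mask to the converted arrays before drawing. The two columns are copied from the dataset printout; the 30 km/h threshold and the name `transport_subset` are arbitrary choices made for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Speed and emissions for the 15 cities, copied from the dataset printout
transport_subset = pd.DataFrame({
    "avg_speed_kmh": [22, 28, 24, 19, 35, 38, 42, 36, 30, 29, 40, 32, 31, 45, 34],
    "co2_emissions_kt": [890, 310, 240, 145, 95, 82, 58, 110, 125, 115, 70, 88, 92, 50, 98],
})

speed = transport_subset["avg_speed_kmh"].to_numpy()
emissions = transport_subset["co2_emissions_kt"].to_numpy()

# NumPy step 1: compress the wide emissions range (50 to 890 kt) with a base-10 log
log_emissions = np.log10(emissions)

# NumPy step 2: boolean mask selecting cities slower than 30 km/h (threshold is arbitrary)
slow = speed < 30

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(speed[slow], log_emissions[slow], label="below 30 km/h")
ax.scatter(speed[~slow], log_emissions[~slow], label="30 km/h or faster")
ax.set_xlabel("Average Speed (km/h)")
ax.set_ylabel("log10 of CO2 Emissions (kt)")
ax.legend()

plt.close(fig)  # replace with plt.show() to display the chart
```

The log transform keeps Cairo's 890 kt from flattening every other point against the bottom of the chart.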

🎨 Professional Themes with plt.style.use()

Matplotlib's default appearance is functional but plain. The library ships with dozens of built-in themes that apply a complete visual makeover to any chart — changing background, gridlines, color palettes, and typography — in a single line of code.

Python
# Show available styles
print("Available styles (first 10):")
print(plt.style.available[:10])
print()

# Apply a style and redraw the bar chart
plt.style.use('ggplot')

fig, ax = plt.subplots(figsize=(12, 5))

ax.bar(transport['city'], transport['daily_passengers_m'],
       color='steelblue', alpha=0.9)

ax.set_title('Daily Passengers — ggplot style', fontsize=13, fontweight='bold')
ax.set_xlabel('City')
ax.set_ylabel('Daily Passengers (Millions)')
ax.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# Reset to default
plt.style.use('default')
print("Style applied and reset to default.")
Chart output
⚠️
Styles Persist for the Entire Session

Calling plt.style.use('ggplot') applies that style to every subsequent chart in the same Python session or notebook. If you only want to style one chart, reset with plt.style.use('default') immediately after. For temporary styling, use the context manager: with plt.style.context('ggplot'): ...
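
A minimal sketch of the context-manager approach, assuming the session starts from Matplotlib's default settings. It reads one style-controlled setting (axes.facecolor) before, inside, and after the with-block to show that the style applies only inside it.

```python
import matplotlib.pyplot as plt

before = plt.rcParams["axes.facecolor"]

with plt.style.context("ggplot"):
    # Inside the block, ggplot settings are active
    inside = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.bar(["Cairo", "Alexandria", "Giza"], [4.50, 1.20, 0.85])
    ax.set_title("Styled only inside the with-block")
    plt.close(fig)  # replace with plt.show() to display the chart

# Outside the block, the previous settings are restored automatically
after = plt.rcParams["axes.facecolor"]
print(before, inside, after)
```

No manual reset is needed with this form, so a forgotten plt.style.use('default') cannot leak the style into later charts.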

  • 'ggplot': gray background with clean white gridlines, popular for reports
  • 'bmh': Bayesian Methods for Hackers style, with a muted, academic feel
  • 'fivethirtyeight': FiveThirtyEight news site style, bold and journalistic
  • 'dark_background': dark mode, light lines on a black background, good for slides
  • 'seaborn-v0_8': clean, modern scientific style (the name used by recent Matplotlib releases; check plt.style.available)
  • 'default': Matplotlib's standard white-background style, always available
💾 Saving Your Work — plt.savefig()

plt.show() displays a chart inside a Jupyter Notebook. But the notebook is not a deliverable — a manager, client, or colleague who doesn't have Python cannot open it. fig.savefig() writes the chart as a real image file (PNG, PDF, SVG) that can be embedded in any document or presentation.

Python
# Save the bar chart to disk through the fig reference created above
fig.savefig('transport_report.png', dpi=150, bbox_inches='tight')

plt.show()
plt.style.use('default')

The two most important parameters are dpi and bbox_inches. DPI (dots per inch) controls resolution: 150 dpi is good for screen display; 300 dpi is standard for print. bbox_inches='tight' trims whitespace from the edges and ensures no labels are cut off — without it, axis labels and titles near the figure border may be partially clipped.
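
To make the effect of dpi concrete, the sketch below renders one figure to an in-memory PNG at three resolutions and reads the pixel dimensions from the PNG header. `png_size` is a hypothetical helper written for this example, not a Matplotlib function, and bbox_inches='tight' is left out on purpose so the sizes depend on figsize and dpi alone.

```python
import io
import struct
import matplotlib.pyplot as plt

def png_size(fig, dpi):
    """Hypothetical helper: save fig as PNG in memory and return (width, height) in pixels."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    # Bytes 16-23 of a PNG file hold width and height as big-endian 32-bit integers
    return struct.unpack(">II", buffer.getvalue()[16:24])

fig, ax = plt.subplots(figsize=(6, 4))  # size in inches
ax.plot([0, 1, 2], [0, 1, 4])

sizes = {dpi: png_size(fig, dpi) for dpi in (72, 150, 300)}
for dpi, (width, height) in sizes.items():
    print(f"dpi={dpi}: {width} x {height} pixels")

plt.close(fig)
```

Pixel size is figure size in inches multiplied by dpi, so a 6-inch-wide figure saved at 150 dpi is 900 pixels wide. Adding bbox_inches='tight' crops or pads the canvas to fit the content, so the final dimensions then differ from this simple product.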

  • dpi=72: screen-only resolution, small file size, lower clarity
  • dpi=150: good for reports and presentations
  • dpi=300: print-quality resolution, larger file size
  • bbox_inches='tight': trims excess whitespace and ensures all labels are included
  • '.png': raster format, good for reports, emails, slides
  • '.pdf': vector format, infinitely scalable, ideal for publications
🔬
savefig Must Come Before plt.show()

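Always call fig.savefig() before plt.show(). In a script, plt.show() blocks until the chart window is closed, after which pyplot discards the figure; in a notebook, the figure is closed once the cell has displayed it. A later plt.savefig() then finds no current figure, silently creates a new empty one, and saves a blank image. Saving through a kept fig reference may still work depending on the backend, but the order that works everywhere is: build chart → savefig → show.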
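
The mechanism behind the blank image can be sketched without opening a window. Here plt.close(fig) is used as a stand-in for what happens once a shown figure has been closed; that substitution is an assumption made so the example runs anywhere.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 8])

# The current figure is the one just built, with one Axes on it
axes_before = len(plt.gcf().axes)

# Stand-in for the figure being closed after it was shown
plt.close(fig)

# pyplot has no current figure now, so gcf() creates a new, empty one;
# plt.savefig() at this point would write that empty canvas to disk
axes_after = len(plt.gcf().axes)

print(axes_before, axes_after)
plt.close("all")
```

Calling fig.savefig() right after building the chart avoids the problem because it does not rely on pyplot's notion of the current figure.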

?
Why must plt.savefig() be called before plt.show() when saving a chart?

Key Takeaways
  • Matplotlib natively accepts Pandas Series and NumPy arrays — pass DataFrame columns directly into chart functions with no conversion required.
  • Use .to_numpy() to explicitly convert a Pandas Series to a NumPy array when you need NumPy-specific operations before plotting.
  • plt.style.use('ggplot') applies a complete visual theme to all subsequent charts in the session — reset with plt.style.use('default').
  • Styles affect background, gridlines, colors, and typography in one line — they do not change your data or code logic.
  • fig.savefig('filename.png', dpi=150, bbox_inches='tight') writes the chart as a real image file, suitable for any report or presentation.
  • Always call fig.savefig() before plt.show(): once the figure has been shown and closed, a later plt.savefig() writes a blank image.
  • The complete Python data science pipeline: Pandas reads and structures data → Matplotlib renders it visually → savefig exports the deliverable.
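
The whole pipeline can be sketched in a few lines. This is one possible arrangement under stated assumptions, not the only one: the five rows are copied from the transport dataset as a stand-in for pd.read_csv(), and the output filename and temporary directory are hypothetical.

```python
import os
import tempfile
import pandas as pd
import matplotlib.pyplot as plt

# Pandas: structure the data (five rows copied from the transport dataset)
transport_sample = pd.DataFrame({
    "city": ["Port Said", "Cairo", "Giza", "Aswan", "Alexandria"],
    "daily_passengers_m": [0.38, 4.50, 0.85, 0.19, 1.20],
})
ranked = transport_sample.sort_values("daily_passengers_m", ascending=False)

# Matplotlib: render inside a temporary style so later charts are unaffected
with plt.style.context("ggplot"):
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(ranked["city"], ranked["daily_passengers_m"])
    ax.set_title("Daily Passengers (sample of five cities)", fontweight="bold")
    ax.set_ylabel("Daily Passengers (Millions)")
    ax.tick_params(axis="x", rotation=45)

    # savefig: export before showing or closing the figure
    out_path = os.path.join(tempfile.mkdtemp(), "transport_sample.png")  # hypothetical filename
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)  # replace with plt.show() to display the chart

print("Saved:", os.path.exists(out_path))
```

Sorting in Pandas before plotting keeps the chart logic free of data handling: Matplotlib simply draws the columns in the order it receives them.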