
Python for Analytics

Python fundamentals for data analysts, covering core syntax, NumPy for numerical operations, and Jupyter Notebook workflow. This complements SQL skills and enables more flexible data manipulation than BI tools alone.

Key Facts

  • Python is dynamically typed (unlike Kotlin/Java which are statically typed)
  • NumPy provides vectorized operations that run in compiled C code and are typically one to two orders of magnitude faster than equivalent pure-Python loops
  • Jupyter Notebook is the de facto standard environment for exploratory, analytical Python work
  • Key libraries: NumPy (numerical), pandas (tabular data), matplotlib/seaborn (visualization)
  • List comprehensions and dict comprehensions provide concise data transformation syntax
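A minimal sketch of the vectorization claim, assuming NumPy is installed. It computes the same sum of squares once with a plain Python loop and once with a NumPy expression. Timings vary by machine, so the sketch prints the measured times instead of asserting a particular speedup. The function names `loop_sum_squares` and `vectorized_sum_squares` are hypothetical.

```python
import timeit

import numpy as np


def loop_sum_squares(values):
    """Sum of squares with a plain Python loop (one interpreter step per element)."""
    total = 0
    for v in values:
        total += v * v
    return total


def vectorized_sum_squares(arr):
    """Sum of squares with NumPy: the per-element loop runs in compiled code."""
    return int(np.sum(arr * arr))


n = 200_000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)  # int64 is large enough for this sum

# Check that both approaches agree before comparing their speed
assert loop_sum_squares(values) == vectorized_sum_squares(arr)

loop_time = timeit.timeit(lambda: loop_sum_squares(values), number=5)
vec_time = timeit.timeit(lambda: vectorized_sum_squares(arr), number=5)
print(f"loop: {loop_time:.3f}s  vectorized: {vec_time:.3f}s  ratio: {loop_time / vec_time:.0f}x")
```

The exact ratio depends on the operation, array size, and hardware. Simple arithmetic like this is where vectorization helps most.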

Patterns

Core Data Types and Structures

# Basic types
x = 5           # int
y = 3.14        # float
name = "Alice"  # str
active = True   # bool
nothing = None  # NoneType

# Lists
items = [1, 2, 3, 4, 5]
items.append(6)
items[0]         # first
items[-1]        # last
items[1:3]       # slice [2, 3]
squares = [x**2 for x in range(10)]
evens = [x for x in items if x % 2 == 0]

# Dictionaries
user = {"id": 1, "name": "Alice", "age": 28}
user.get("email", "")  # safe access with default
word_lengths = {word: len(word) for word in ["hello", "world"]}  # {"hello": 5, "world": 5}
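A common next step is counting how often each word occurs, which gives a frequency table. The sketch below builds one manually with `dict.get` and a default, and builds it again with `collections.Counter` from the standard library. The sample text is made up for illustration.

```python
from collections import Counter

text = "the cat sat on the mat the end"  # made-up sample text
words = text.split()

# Manual frequency count: dict.get returns 0 the first time a word is seen
freq = {}
for word in words:
    freq[word] = freq.get(word, 0) + 1

# Same counts using the standard-library Counter (a dict subclass)
counter = Counter(words)

print(freq["the"])             # 3
print(counter.most_common(1))  # [('the', 3)]
```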

Functions

def calculate_ltv(avg_order, frequency, months):
    """Calculate customer lifetime value."""
    return avg_order * frequency * months

def get_stats(numbers):
    return min(numbers), max(numbers), sum(numbers) / len(numbers)

min_val, max_val, avg = get_stats([1, 2, 3, 4, 5])
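Passing keyword arguments makes the meaning of each number explicit at the call site. As written, `get_stats` raises `ValueError` on an empty list, because `min([])` fails before the division is reached. A minimal sketch of a guarded variant follows. The function `get_stats_safe` and the sample LTV inputs are hypothetical.

```python
def calculate_ltv(avg_order, frequency, months):
    """Calculate customer lifetime value."""
    return avg_order * frequency * months


def get_stats_safe(numbers):
    """Return (min, max, mean), or (None, None, None) for an empty sequence."""
    if not numbers:
        return None, None, None
    return min(numbers), max(numbers), sum(numbers) / len(numbers)


# Keyword arguments document what each value means
ltv = calculate_ltv(avg_order=50.0, frequency=2, months=12)
print(ltv)  # 1200.0

# Tuple unpacking of the three returned values
min_val, max_val, avg = get_stats_safe([1, 2, 3, 4, 5])
print(min_val, max_val, avg)  # 1 5 3.0
print(get_stats_safe([]))     # (None, None, None)
```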

NumPy

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
zeros = np.zeros(10)
ones = np.ones((3, 4))

# Vectorized operations (no loops needed)
arr * 2          # [2, 4, 6, 8, 10]
arr ** 2         # [1, 4, 9, 16, 25]
np.sqrt(arr)     # element-wise sqrt

# Statistics
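np.mean(arr)                      # 3.0
np.median(arr)                    # 3.0
np.std(arr)                       # ~1.414; population std (ddof=0), whereas pandas .std() defaults to ddof=1
np.percentile(arr, [25, 50, 75])  # [2., 3., 4.]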

# 2D arrays (matrices)
matrix = np.array([[1, 2], [3, 4]])
matrix.shape     # (2, 2)
matrix.T         # transpose
matrix @ matrix  # matrix multiplication

# Boolean masking
arr[arr > 3]     # [4, 5]
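Most filtering in analytical NumPy code combines several boolean masks. A minimal sketch follows, using made-up order values. Conditions are joined with `&`, `|`, and `~` rather than `and`, `or`, and `not`, because `and` on arrays raises a "truth value is ambiguous" error. Each comparison also needs its own parentheses, since `&` binds more tightly than `>=`.

```python
import numpy as np

# Hypothetical order values, for illustration only
orders = np.array([12.0, 45.5, 7.25, 80.0, 33.0, 150.0])

# Combine conditions element-wise with & (parentheses are required)
mid_range = orders[(orders >= 10) & (orders <= 100)]
print(mid_range)  # keeps 12.0, 45.5, 80.0, 33.0

# True counts as 1, so sum() counts matches and mean() gives their share
is_large = orders > 50
print(is_large.sum(), is_large.mean())  # 2 and about 0.333

# np.where picks a value per element from two options
labels = np.where(orders > 50, "large", "small")
print(labels)
```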

Jupyter Notebook Workflow

Key shortcuts (A, B, D D, M, and Y work in command mode; press Esc to enter it):

  • Shift+Enter: run cell, move to next
  • Ctrl+Enter: run cell, stay on it
  • A / B: insert cell above / below
  • D, D (press D twice): delete cell
  • M: convert cell to Markdown; Y: convert cell to Code

Best practices:

  • Restart the kernel and run all cells before sharing
  • Use Markdown cells for documentation
  • Keep cells focused (one concept per cell)
  • Name variables descriptively

Gotchas

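  • Division: / always returns a float (5 / 2 = 2.5, whereas integer division in Java or C gives 2), and // is floor division (5 // 2 = 2). Floor division rounds toward negative infinity, so -5 // 2 = -3, while Java and C truncate toward zero and give -2
  • Mutable default arguments are a classic trap: in def f(lst=[]), the default list is created once and shared across calls. Use lst=None and create the list inside the function instead
  • NumPy arrays have a single fixed dtype. Building an array from mixed values silently upcasts (ints become floats if any float is present), and assigning a float into an existing int array silently truncates it
  • Jupyter cells can run out of order, creating hidden-state bugs; restart the kernel and run all cells regularly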
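Each of the first three gotchas can be reproduced in a few lines. A minimal sketch follows, assuming NumPy is installed. `append_bad` and `append_good` are hypothetical helper names chosen for illustration.

```python
import numpy as np

# 1. Division: / is true division, // is floor division (rounds toward -infinity)
print(5 / 2, 5 // 2, -5 // 2)  # 2.5 2 -3


# 2. Mutable default argument: one list object is reused across calls
def append_bad(item, lst=[]):
    lst.append(item)
    return lst


def append_good(item, lst=None):
    # Create a fresh list on each call when none is passed
    if lst is None:
        lst = []
    lst.append(item)
    return lst


append_bad(1)
print(append_bad(2))   # [1, 2]  (state leaked from the first call)
append_good(1)
print(append_good(2))  # [2]

# 3. dtype: mixing ints and floats upcasts the whole array to float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Assigning a float into an int array silently truncates the value
ints = np.array([1, 2, 3])
ints[0] = 9.7
print(ints)  # [9 2 3]
```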

See Also