What is database partitioning and its types?

Answer

Database partitioning is a technique that divides large tables or indexes into smaller, more manageable pieces called partitions. Each partition can be stored separately and managed independently while appearing as a single logical table to applications.

Benefits of Partitioning

Performance Benefits

-- Query performance improvement through partition elimination
SELECT * FROM sales_partitioned 
WHERE sale_date BETWEEN '2024-01-01' AND '2024-01-31';
-- Only January partition is scanned, not entire table

-- Parallel processing across partitions
SELECT region, SUM(sales_amount)
FROM sales_partitioned
GROUP BY region;
-- Each partition can be processed in parallel

Management Benefits

-- Partition-level maintenance operations
-- Drop old data quickly
ALTER TABLE sales_partitioned DROP PARTITION p_2020;

-- Backup specific partitions
BACKUP TABLE sales_partitioned PARTITION (p_2024_q1);

-- Index maintenance per partition
ALTER INDEX IX_sales_region ON sales_partitioned 
REBUILD PARTITION = p_2024_q1;

Types of Partitioning

1. Range Partitioning

Partitions data based on value ranges.

-- PostgreSQL range partitioning by date
CREATE TABLE sales (
    sale_id SERIAL,
    sale_date DATE,
    customer_id INT,
    amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);

-- Create partitions for each quarter
CREATE TABLE sales_2024_q1 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE sales_2024_q2 PARTITION OF sales
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

CREATE TABLE sales_2024_q3 PARTITION OF sales
FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');

CREATE TABLE sales_2024_q4 PARTITION OF sales
FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');

-- SQL Server range partitioning
CREATE PARTITION FUNCTION pf_sales_date (DATE)
AS RANGE RIGHT FOR VALUES 
('2024-01-01', '2024-04-01', '2024-07-01', '2024-10-01');

CREATE PARTITION SCHEME ps_sales_date
AS PARTITION pf_sales_date
TO (fg_2023, fg_2024_q1, fg_2024_q2, fg_2024_q3, fg_2024_q4);

CREATE TABLE sales (
    sale_id INT IDENTITY(1,1),
    sale_date DATE,
    customer_id INT,
    amount DECIMAL(10,2)
) ON ps_sales_date(sale_date);

2. Hash Partitioning

Distributes data evenly across partitions using a hash function.

-- PostgreSQL hash partitioning
CREATE TABLE customers (
    customer_id SERIAL,
    customer_name VARCHAR(100),
    email VARCHAR(100),
    registration_date DATE
) PARTITION BY HASH (customer_id);

-- Create hash partitions
CREATE TABLE customers_p0 PARTITION OF customers
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE customers_p1 PARTITION OF customers
FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE customers_p2 PARTITION OF customers
FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE customers_p3 PARTITION OF customers
FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- MySQL hash partitioning
CREATE TABLE customers (
    customer_id INT AUTO_INCREMENT,
    customer_name VARCHAR(100),
    email VARCHAR(100),
    registration_date DATE,
    PRIMARY KEY (customer_id)
)
PARTITION BY HASH(customer_id)
PARTITIONS 4;

3. List Partitioning

Partitions data based on explicit lists of values.

-- PostgreSQL list partitioning by region
CREATE TABLE orders (
    order_id SERIAL,
    customer_id INT,
    region VARCHAR(20),
    order_date DATE,
    amount DECIMAL(10,2)
) PARTITION BY LIST (region);

-- Create partitions for each region
CREATE TABLE orders_north PARTITION OF orders
FOR VALUES IN ('North', 'Northeast', 'Northwest');

CREATE TABLE orders_south PARTITION OF orders
FOR VALUES IN ('South', 'Southeast', 'Southwest');

CREATE TABLE orders_east PARTITION OF orders
FOR VALUES IN ('East', 'Central East');

CREATE TABLE orders_west PARTITION OF orders
FOR VALUES IN ('West', 'Central West');

-- Default partition for unspecified values
CREATE TABLE orders_other PARTITION OF orders DEFAULT;

4. Composite Partitioning

Combines multiple partitioning methods.

-- Range-Hash composite partitioning
CREATE TABLE sales_composite (
    sale_id SERIAL,
    sale_date DATE,
    customer_id INT,
    region VARCHAR(20),
    amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);

-- Create range partitions first
CREATE TABLE sales_2024 PARTITION OF sales_composite
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
PARTITION BY HASH (customer_id);

-- Then create hash sub-partitions
CREATE TABLE sales_2024_h0 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE sales_2024_h1 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE sales_2024_h2 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE sales_2024_h3 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Partitioning Strategies by Use Case

Time-Series Data

-- Monthly partitioning for time-series data
CREATE TABLE sensor_data (
    sensor_id INT,
    reading_time TIMESTAMP,
    temperature DECIMAL(5,2),
    humidity DECIMAL(5,2),
    pressure DECIMAL(8,2)
) PARTITION BY RANGE (reading_time);

-- Automated partition creation function
CREATE OR REPLACE FUNCTION create_monthly_partition(
    table_name TEXT,
    start_date DATE
) RETURNS VOID AS $$
DECLARE
    partition_name TEXT;
    end_date DATE;
BEGIN
    partition_name := table_name || '_' || to_char(start_date, 'YYYY_MM');
    end_date := start_date + INTERVAL '1 month';
    
    EXECUTE format('CREATE TABLE %I PARTITION OF %I
                    FOR VALUES FROM (%L) TO (%L)',
                   partition_name, table_name, start_date, end_date);
END;
$$ LANGUAGE plpgsql;

-- Create partitions for the year
SELECT create_monthly_partition('sensor_data', date_trunc('month', CURRENT_DATE) + (n || ' months')::INTERVAL)
FROM generate_series(0, 11) n;

Geographic Distribution

-- Partition by geographic regions
CREATE TABLE user_activities (
    activity_id BIGSERIAL,
    user_id INT,
    country_code CHAR(2),
    activity_type VARCHAR(50),
    activity_timestamp TIMESTAMP,
    data JSONB
) PARTITION BY LIST (country_code);

-- Create regional partitions
CREATE TABLE user_activities_us PARTITION OF user_activities
FOR VALUES IN ('US');

CREATE TABLE user_activities_eu PARTITION OF user_activities
FOR VALUES IN ('GB', 'DE', 'FR', 'IT', 'ES', 'NL', 'SE', 'NO', 'DK');

CREATE TABLE user_activities_asia PARTITION OF user_activities
FOR VALUES IN ('JP', 'KR', 'CN', 'IN', 'SG', 'HK');

CREATE TABLE user_activities_other PARTITION OF user_activities DEFAULT;

High-Volume Transactional Data

-- Hash partitioning for even distribution
CREATE TABLE transactions (
    transaction_id BIGSERIAL,
    account_id BIGINT,
    transaction_type VARCHAR(20),
    amount DECIMAL(15,2),
    transaction_date TIMESTAMP,
    description TEXT
) PARTITION BY HASH (account_id);

-- Create multiple hash partitions for load distribution
DO $$
BEGIN
    FOR i IN 0..15 LOOP
        EXECUTE format('CREATE TABLE transactions_p%s PARTITION OF transactions
                        FOR VALUES WITH (MODULUS 16, REMAINDER %s)', i, i);
    END LOOP;
END $$;

Partition Management

Automatic Partition Creation

-- PostgreSQL: pg_partman extension for automated management
CREATE EXTENSION pg_partman;

-- Set up automatic monthly partitioning
SELECT partman.create_parent(
    p_parent_table => 'public.sales',
    p_control => 'sale_date',
    p_type => 'range',
    p_interval => 'monthly',
    p_premake => 3  -- Create 3 future partitions
);

-- SQL Server: Automated partition management
CREATE FUNCTION [dbo].[fn_partition_range](@date DATE)
RETURNS INT
AS
BEGIN
    RETURN YEAR(@date) * 100 + MONTH(@date)
END

-- Create sliding window maintenance job
EXEC sp_add_job @job_name = 'Partition Maintenance';

Partition Pruning and Elimination

-- Query that benefits from partition elimination
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(amount)
FROM sales
WHERE sale_date BETWEEN '2024-06-01' AND '2024-06-30'
GROUP BY customer_id;

-- Execution plan shows:
-- -> Seq Scan on sales_2024_q2 (actual rows=50000)
-- Partitions eliminated: 3 (sales_2024_q1, q3, q4)

Partition-wise Joins

-- Tables partitioned on same key enable partition-wise joins
CREATE TABLE orders (
    order_id SERIAL,
    customer_id INT,
    order_date DATE,
    amount DECIMAL(10,2)
) PARTITION BY HASH (customer_id);

CREATE TABLE order_items (
    item_id SERIAL,
    order_id INT,
    customer_id INT,  -- Same partition key
    product_id INT,
    quantity INT
) PARTITION BY HASH (customer_id);

-- Join can be performed partition-wise
SELECT o.order_id, oi.product_id, oi.quantity
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.customer_id = 12345;

Advanced Partitioning Techniques

Interval Partitioning

-- Oracle-style interval partitioning (PostgreSQL equivalent)
CREATE TABLE sales_interval (
    sale_id SERIAL,
    sale_date DATE,
    amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);

-- Create initial partition
CREATE TABLE sales_202401 PARTITION OF sales_interval
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Function to auto-create partitions
CREATE OR REPLACE FUNCTION create_partition_if_not_exists()
RETURNS TRIGGER AS $$
DECLARE
    partition_date DATE;
    partition_name TEXT;
    start_date DATE;
    end_date DATE;
BEGIN
    partition_date := date_trunc('month', NEW.sale_date);
    partition_name := 'sales_' || to_char(partition_date, 'YYYYMM');
    start_date := partition_date;
    end_date := partition_date + INTERVAL '1 month';
    
    -- Check if partition exists
    IF NOT EXISTS (
        SELECT 1 FROM pg_class WHERE relname = partition_name
    ) THEN
        EXECUTE format('CREATE TABLE %I PARTITION OF sales_interval
                        FOR VALUES FROM (%L) TO (%L)',
                       partition_name, start_date, end_date);
    END IF;
    
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Trigger for automatic partition creation
CREATE TRIGGER trigger_create_partition
    BEFORE INSERT ON sales_interval
    FOR EACH ROW EXECUTE FUNCTION create_partition_if_not_exists();

Reference Partitioning

-- Child table inherits partitioning from parent
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    customer_name VARCHAR(100),
    region VARCHAR(20)
) PARTITION BY LIST (region);

CREATE TABLE customer_orders (
    order_id SERIAL,
    customer_id INT REFERENCES customers(customer_id),
    order_date DATE,
    amount DECIMAL(10,2)
) PARTITION BY LIST (region);  -- Same partitioning scheme

-- Partitions are created with matching constraints
CREATE TABLE customers_north PARTITION OF customers
FOR VALUES IN ('North');

CREATE TABLE customer_orders_north PARTITION OF customer_orders
FOR VALUES IN ('North');

Performance Optimization

Partition-wise Statistics

-- Collect statistics per partition
ANALYZE sales_2024_q1;
ANALYZE sales_2024_q2;
ANALYZE sales_2024_q3;
ANALYZE sales_2024_q4;

-- Automated statistics collection
DO $$
DECLARE
    partition_name TEXT;
BEGIN
    FOR partition_name IN 
        SELECT schemaname||'.'||tablename 
        FROM pg_tables 
        WHERE tablename LIKE 'sales_2024%'
    LOOP
        EXECUTE 'ANALYZE ' || partition_name;
    END LOOP;
END $$;

Constraint Exclusion

-- Enable constraint exclusion for better performance
SET constraint_exclusion = partition;

-- Add check constraints to ensure partition elimination
ALTER TABLE sales_2024_q1 
ADD CONSTRAINT chk_sales_2024_q1_date 
CHECK (sale_date >= '2024-01-01' AND sale_date < '2024-04-01');

Parallel Processing

-- Configure parallel processing for partitioned tables
SET max_parallel_workers_per_gather = 4;
SET parallel_tuple_cost = 0.1;
SET parallel_setup_cost = 1000;

-- Query that benefits from parallel partition processing
SELECT 
    date_trunc('month', sale_date) as month,
    COUNT(*) as transaction_count,
    SUM(amount) as total_amount
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY date_trunc('month', sale_date)
ORDER BY month;

Monitoring and Maintenance

Partition Information Queries

-- PostgreSQL: View partition information
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables 
WHERE tablename LIKE 'sales_%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Check partition constraints
SELECT 
    t.tablename,
    c.conname,
    c.consrc
FROM pg_tables t
JOIN pg_constraint c ON c.conrelid = (t.schemaname||'.'||t.tablename)::regclass
WHERE t.tablename LIKE 'sales_%'
AND c.contype = 'c';

Partition Maintenance Scripts

-- Drop old partitions
CREATE OR REPLACE FUNCTION drop_old_partitions(
    table_pattern TEXT,
    retention_months INT
) RETURNS VOID AS $$
DECLARE
    partition_name TEXT;
    cutoff_date DATE;
BEGIN
    cutoff_date := CURRENT_DATE - (retention_months || ' months')::INTERVAL;
    
    FOR partition_name IN
        SELECT tablename 
        FROM pg_tables 
        WHERE tablename ~ table_pattern
        AND tablename < table_pattern || '_' || to_char(cutoff_date, 'YYYY_MM')
    LOOP
        EXECUTE 'DROP TABLE ' || partition_name;
        RAISE NOTICE 'Dropped partition: %', partition_name;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT drop_old_partitions('sales', 24);  -- Keep 24 months

Best Practices

1. Choose Appropriate Partition Key

-- Good: Frequently filtered column with good distribution
PARTITION BY RANGE (order_date)  -- Time-series data

-- Good: Even distribution
PARTITION BY HASH (customer_id)  -- Large customer base

-- Avoid: Low cardinality columns
-- PARTITION BY LIST (status)  -- Only few status values

2. Partition Size Guidelines

-- Target partition sizes:
-- OLTP: 2-50 GB per partition
-- OLAP: 10-100 GB per partition
-- Time-series: Based on query patterns (daily/monthly/yearly)

-- Monitor partition sizes
SELECT 
    tablename,
    pg_size_pretty(pg_total_relation_size(tablename::regclass)) as size,
    (SELECT COUNT(*) FROM tablename::regclass) as row_count
FROM pg_tables 
WHERE tablename LIKE 'sales_%';

3. Index Strategy

-- Create indexes on each partition
CREATE INDEX CONCURRENTLY idx_sales_2024_q1_customer 
ON sales_2024_q1 (customer_id);

CREATE INDEX CONCURRENTLY idx_sales_2024_q1_amount 
ON sales_2024_q1 (amount);

-- Or use partitioned indexes (PostgreSQL 11+)
CREATE INDEX idx_sales_customer ON sales (customer_id);
-- Automatically creates matching indexes on all partitions

Interview Tips

Understand different partitioning types and when to use each
Know the benefits: performance, manageability, parallel processing
Be familiar with partition elimination and pruning concepts
Understand the trade-offs: complexity vs performance gains
Know how to monitor and maintain partitioned tables
Practice designing partitioning strategies for different scenarios
Be aware of database-specific partitioning features and limitations

Test Your Knowledge

Take a quick quiz to test your understanding of this topic.