What is database partitioning and its types?
Answer
Database partitioning is a technique that divides large tables or indexes into smaller, more manageable pieces called partitions. Each partition can be stored separately and managed independently while appearing as a single logical table to applications.
Benefits of Partitioning
Performance Benefits
-- Query performance improvement through partition elimination
SELECT * FROM sales_partitioned
WHERE sale_date BETWEEN '2024-01-01' AND '2024-01-31';
-- Only January partition is scanned, not entire table
-- Parallel processing across partitions
SELECT region, SUM(sales_amount)
FROM sales_partitioned
GROUP BY region;
-- Each partition can be processed in parallel
Management Benefits
-- Partition-level maintenance operations
-- Drop old data quickly
ALTER TABLE sales_partitioned DROP PARTITION p_2020;
-- Backup specific partitions
BACKUP TABLE sales_partitioned PARTITION (p_2024_q1);
-- Index maintenance per partition
ALTER INDEX IX_sales_region ON sales_partitioned
REBUILD PARTITION = p_2024_q1;
Types of Partitioning
1. Range Partitioning
Partitions data based on value ranges.
-- PostgreSQL range partitioning by date
CREATE TABLE sales (
sale_id SERIAL,
sale_date DATE,
customer_id INT,
amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);
-- Create partitions for each quarter
CREATE TABLE sales_2024_q1 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE sales_2024_q2 PARTITION OF sales
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
CREATE TABLE sales_2024_q3 PARTITION OF sales
FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');
CREATE TABLE sales_2024_q4 PARTITION OF sales
FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');
-- SQL Server range partitioning
CREATE PARTITION FUNCTION pf_sales_date (DATE)
AS RANGE RIGHT FOR VALUES
('2024-01-01', '2024-04-01', '2024-07-01', '2024-10-01');
CREATE PARTITION SCHEME ps_sales_date
AS PARTITION pf_sales_date
TO (fg_2023, fg_2024_q1, fg_2024_q2, fg_2024_q3, fg_2024_q4);
CREATE TABLE sales (
sale_id INT IDENTITY(1,1),
sale_date DATE,
customer_id INT,
amount DECIMAL(10,2)
) ON ps_sales_date(sale_date);
2. Hash Partitioning
Distributes data evenly across partitions using a hash function.
-- PostgreSQL hash partitioning
CREATE TABLE customers (
customer_id SERIAL,
customer_name VARCHAR(100),
email VARCHAR(100),
registration_date DATE
) PARTITION BY HASH (customer_id);
-- Create hash partitions
CREATE TABLE customers_p0 PARTITION OF customers
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE customers_p1 PARTITION OF customers
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE customers_p2 PARTITION OF customers
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE customers_p3 PARTITION OF customers
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
-- MySQL hash partitioning
CREATE TABLE customers (
customer_id INT AUTO_INCREMENT,
customer_name VARCHAR(100),
email VARCHAR(100),
registration_date DATE,
PRIMARY KEY (customer_id)
)
PARTITION BY HASH(customer_id)
PARTITIONS 4;
3. List Partitioning
Partitions data based on explicit lists of values.
-- PostgreSQL list partitioning by region
CREATE TABLE orders (
order_id SERIAL,
customer_id INT,
region VARCHAR(20),
order_date DATE,
amount DECIMAL(10,2)
) PARTITION BY LIST (region);
-- Create partitions for each region
CREATE TABLE orders_north PARTITION OF orders
FOR VALUES IN ('North', 'Northeast', 'Northwest');
CREATE TABLE orders_south PARTITION OF orders
FOR VALUES IN ('South', 'Southeast', 'Southwest');
CREATE TABLE orders_east PARTITION OF orders
FOR VALUES IN ('East', 'Central East');
CREATE TABLE orders_west PARTITION OF orders
FOR VALUES IN ('West', 'Central West');
-- Default partition for unspecified values
CREATE TABLE orders_other PARTITION OF orders DEFAULT;
4. Composite Partitioning
Combines multiple partitioning methods.
-- Range-Hash composite partitioning
CREATE TABLE sales_composite (
sale_id SERIAL,
sale_date DATE,
customer_id INT,
region VARCHAR(20),
amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);
-- Create range partitions first
CREATE TABLE sales_2024 PARTITION OF sales_composite
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
PARTITION BY HASH (customer_id);
-- Then create hash sub-partitions
CREATE TABLE sales_2024_h0 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sales_2024_h1 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sales_2024_h2 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sales_2024_h3 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
Partitioning Strategies by Use Case
Time-Series Data
-- Monthly partitioning for time-series data
CREATE TABLE sensor_data (
sensor_id INT,
reading_time TIMESTAMP,
temperature DECIMAL(5,2),
humidity DECIMAL(5,2),
pressure DECIMAL(8,2)
) PARTITION BY RANGE (reading_time);
-- Automated partition creation function
CREATE OR REPLACE FUNCTION create_monthly_partition(
table_name TEXT,
start_date DATE
) RETURNS VOID AS $$
DECLARE
partition_name TEXT;
end_date DATE;
BEGIN
partition_name := table_name || '_' || to_char(start_date, 'YYYY_MM');
end_date := start_date + INTERVAL '1 month';
EXECUTE format('CREATE TABLE %I PARTITION OF %I
FOR VALUES FROM (%L) TO (%L)',
partition_name, table_name, start_date, end_date);
END;
$$ LANGUAGE plpgsql;
-- Create partitions for the year
SELECT create_monthly_partition('sensor_data', date_trunc('month', CURRENT_DATE) + (n || ' months')::INTERVAL)
FROM generate_series(0, 11) n;
Geographic Distribution
-- Partition by geographic regions
CREATE TABLE user_activities (
activity_id BIGSERIAL,
user_id INT,
country_code CHAR(2),
activity_type VARCHAR(50),
activity_timestamp TIMESTAMP,
data JSONB
) PARTITION BY LIST (country_code);
-- Create regional partitions
CREATE TABLE user_activities_us PARTITION OF user_activities
FOR VALUES IN ('US');
CREATE TABLE user_activities_eu PARTITION OF user_activities
FOR VALUES IN ('GB', 'DE', 'FR', 'IT', 'ES', 'NL', 'SE', 'NO', 'DK');
CREATE TABLE user_activities_asia PARTITION OF user_activities
FOR VALUES IN ('JP', 'KR', 'CN', 'IN', 'SG', 'HK');
CREATE TABLE user_activities_other PARTITION OF user_activities DEFAULT;
High-Volume Transactional Data
-- Hash partitioning for even distribution
CREATE TABLE transactions (
transaction_id BIGSERIAL,
account_id BIGINT,
transaction_type VARCHAR(20),
amount DECIMAL(15,2),
transaction_date TIMESTAMP,
description TEXT
) PARTITION BY HASH (account_id);
-- Create multiple hash partitions for load distribution
DO $$
BEGIN
FOR i IN 0..15 LOOP
EXECUTE format('CREATE TABLE transactions_p%s PARTITION OF transactions
FOR VALUES WITH (MODULUS 16, REMAINDER %s)', i, i);
END LOOP;
END $$;
Partition Management
Automatic Partition Creation
-- PostgreSQL: pg_partman extension for automated management
CREATE EXTENSION pg_partman;
-- Set up automatic monthly partitioning
SELECT partman.create_parent(
p_parent_table => 'public.sales',
p_control => 'sale_date',
p_type => 'range',
p_interval => 'monthly',
p_premake => 3 -- Create 3 future partitions
);
-- SQL Server: Automated partition management
CREATE FUNCTION [dbo].[fn_partition_range](@date DATE)
RETURNS INT
AS
BEGIN
RETURN YEAR(@date) * 100 + MONTH(@date)
END
-- Create sliding window maintenance job
EXEC sp_add_job @job_name = 'Partition Maintenance';
Partition Pruning and Elimination
-- Query that benefits from partition elimination
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(amount)
FROM sales
WHERE sale_date BETWEEN '2024-06-01' AND '2024-06-30'
GROUP BY customer_id;
-- Execution plan shows:
-- -> Seq Scan on sales_2024_q2 (actual rows=50000)
-- Partitions eliminated: 3 (sales_2024_q1, q3, q4)
Partition-wise Joins
-- Tables partitioned on same key enable partition-wise joins
CREATE TABLE orders (
order_id SERIAL,
customer_id INT,
order_date DATE,
amount DECIMAL(10,2)
) PARTITION BY HASH (customer_id);
CREATE TABLE order_items (
item_id SERIAL,
order_id INT,
customer_id INT, -- Same partition key
product_id INT,
quantity INT
) PARTITION BY HASH (customer_id);
-- Join can be performed partition-wise
SELECT o.order_id, oi.product_id, oi.quantity
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.customer_id = 12345;
Advanced Partitioning Techniques
Interval Partitioning
-- Oracle-style interval partitioning (PostgreSQL equivalent)
CREATE TABLE sales_interval (
sale_id SERIAL,
sale_date DATE,
amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);
-- Create initial partition
CREATE TABLE sales_202401 PARTITION OF sales_interval
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- Function to auto-create partitions
CREATE OR REPLACE FUNCTION create_partition_if_not_exists()
RETURNS TRIGGER AS $$
DECLARE
partition_date DATE;
partition_name TEXT;
start_date DATE;
end_date DATE;
BEGIN
partition_date := date_trunc('month', NEW.sale_date);
partition_name := 'sales_' || to_char(partition_date, 'YYYYMM');
start_date := partition_date;
end_date := partition_date + INTERVAL '1 month';
-- Check if partition exists
IF NOT EXISTS (
SELECT 1 FROM pg_class WHERE relname = partition_name
) THEN
EXECUTE format('CREATE TABLE %I PARTITION OF sales_interval
FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date);
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Trigger for automatic partition creation
CREATE TRIGGER trigger_create_partition
BEFORE INSERT ON sales_interval
FOR EACH ROW EXECUTE FUNCTION create_partition_if_not_exists();
Reference Partitioning
-- Child table inherits partitioning from parent
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
customer_name VARCHAR(100),
region VARCHAR(20)
) PARTITION BY LIST (region);
CREATE TABLE customer_orders (
order_id SERIAL,
customer_id INT REFERENCES customers(customer_id),
order_date DATE,
amount DECIMAL(10,2)
) PARTITION BY LIST (region); -- Same partitioning scheme
-- Partitions are created with matching constraints
CREATE TABLE customers_north PARTITION OF customers
FOR VALUES IN ('North');
CREATE TABLE customer_orders_north PARTITION OF customer_orders
FOR VALUES IN ('North');
Performance Optimization
Partition-wise Statistics
-- Collect statistics per partition
ANALYZE sales_2024_q1;
ANALYZE sales_2024_q2;
ANALYZE sales_2024_q3;
ANALYZE sales_2024_q4;
-- Automated statistics collection
DO $$
DECLARE
partition_name TEXT;
BEGIN
FOR partition_name IN
SELECT schemaname||'.'||tablename
FROM pg_tables
WHERE tablename LIKE 'sales_2024%'
LOOP
EXECUTE 'ANALYZE ' || partition_name;
END LOOP;
END $$;
Constraint Exclusion
-- Enable constraint exclusion for better performance
SET constraint_exclusion = partition;
-- Add check constraints to ensure partition elimination
ALTER TABLE sales_2024_q1
ADD CONSTRAINT chk_sales_2024_q1_date
CHECK (sale_date >= '2024-01-01' AND sale_date < '2024-04-01');
Parallel Processing
-- Configure parallel processing for partitioned tables
SET max_parallel_workers_per_gather = 4;
SET parallel_tuple_cost = 0.1;
SET parallel_setup_cost = 1000;
-- Query that benefits from parallel partition processing
SELECT
date_trunc('month', sale_date) as month,
COUNT(*) as transaction_count,
SUM(amount) as total_amount
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY date_trunc('month', sale_date)
ORDER BY month;
Monitoring and Maintenance
Partition Information Queries
-- PostgreSQL: View partition information
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE tablename LIKE 'sales_%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Check partition constraints
SELECT
t.tablename,
c.conname,
c.consrc
FROM pg_tables t
JOIN pg_constraint c ON c.conrelid = (t.schemaname||'.'||t.tablename)::regclass
WHERE t.tablename LIKE 'sales_%'
AND c.contype = 'c';
Partition Maintenance Scripts
-- Drop old partitions
CREATE OR REPLACE FUNCTION drop_old_partitions(
table_pattern TEXT,
retention_months INT
) RETURNS VOID AS $$
DECLARE
partition_name TEXT;
cutoff_date DATE;
BEGIN
cutoff_date := CURRENT_DATE - (retention_months || ' months')::INTERVAL;
FOR partition_name IN
SELECT tablename
FROM pg_tables
WHERE tablename ~ table_pattern
AND tablename < table_pattern || '_' || to_char(cutoff_date, 'YYYY_MM')
LOOP
EXECUTE 'DROP TABLE ' || partition_name;
RAISE NOTICE 'Dropped partition: %', partition_name;
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- Usage
SELECT drop_old_partitions('sales', 24); -- Keep 24 months
Best Practices
1. Choose Appropriate Partition Key
-- Good: Frequently filtered column with good distribution
PARTITION BY RANGE (order_date) -- Time-series data
-- Good: Even distribution
PARTITION BY HASH (customer_id) -- Large customer base
-- Avoid: Low cardinality columns
-- PARTITION BY LIST (status) -- Only few status values
2. Partition Size Guidelines
-- Target partition sizes:
-- OLTP: 2-50 GB per partition
-- OLAP: 10-100 GB per partition
-- Time-series: Based on query patterns (daily/monthly/yearly)
-- Monitor partition sizes
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(tablename::regclass)) as size,
(SELECT COUNT(*) FROM tablename::regclass) as row_count
FROM pg_tables
WHERE tablename LIKE 'sales_%';
3. Index Strategy
-- Create indexes on each partition
CREATE INDEX CONCURRENTLY idx_sales_2024_q1_customer
ON sales_2024_q1 (customer_id);
CREATE INDEX CONCURRENTLY idx_sales_2024_q1_amount
ON sales_2024_q1 (amount);
-- Or use partitioned indexes (PostgreSQL 11+)
CREATE INDEX idx_sales_customer ON sales (customer_id);
-- Automatically creates matching indexes on all partitions
Interview Tips
- Understand different partitioning types and when to use each
- Know the benefits: performance, manageability, parallel processing
- Be familiar with partition elimination and pruning concepts
- Understand the trade-offs: complexity vs performance gains
- Know how to monitor and maintain partitioned tables
- Practice designing partitioning strategies for different scenarios
- Be aware of database-specific partitioning features and limitations
Test Your Knowledge
Take a quick quiz to test your understanding of this topic.