What is the difference between clustered and non-clustered indexes?

Answer

Clustered indexes physically reorder and store table data based on the index key, while non-clustered indexes create a separate structure that points to the actual data rows. Each table can have only one clustered index but multiple non-clustered indexes.

Key Differences

AspectClustered IndexNon-Clustered Index
Data StorageData pages stored in order of index keySeparate index structure points to data
Quantity per TableOnly oneMultiple allowed
Data RetrievalDirect access to dataLookup to find data location
Storage SpaceNo additional space for indexAdditional space required
Insert PerformanceSlower (may require page splits)Faster
Range QueriesVery fastFast but requires key lookup

Clustered Index

Characteristics

  • Physical ordering: Table data is physically stored in the order of the clustered index key
  • Leaf level contains data: The leaf pages of the index contain the actual table data
  • Automatic creation: Usually created automatically on primary key

Example: Clustered Index Creation

-- SQL Server: Create clustered index
CREATE CLUSTERED INDEX IX_Orders_OrderDate 
ON Orders (OrderDate);

-- MySQL: Primary key automatically creates clustered index
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,  -- Clustered index on OrderID
    CustomerID INT,
    OrderDate DATE,
    OrderTotal DECIMAL(10,2)
);

-- PostgreSQL: No explicit clustered indexes, but similar with CLUSTER
CREATE INDEX IX_Orders_OrderDate ON Orders (OrderDate);
CLUSTER Orders USING IX_Orders_OrderDate;

How Clustered Index Works

-- Table with clustered index on OrderDate
-- Physical storage order:
-- Page 1: Orders from 2024-01-01 to 2024-01-15
-- Page 2: Orders from 2024-01-16 to 2024-01-31
-- Page 3: Orders from 2024-02-01 to 2024-02-15

-- Range query benefits from clustered index
SELECT * FROM Orders 
WHERE OrderDate BETWEEN '2024-01-10' AND '2024-01-20';
-- Reads only pages 1 and 2, very efficient

Clustered Index Structure

-- Conceptual structure of clustered index
/*
Root Level:     [Jan] [Feb] [Mar] [Apr]
                 |     |     |     |
Intermediate:   [1-15][16-31] [1-14][15-28] [1-15][16-31] [1-15][16-30]
                 |     |       |     |       |     |       |     |
Leaf Level:    [Data][Data]  [Data][Data]  [Data][Data]  [Data][Data]
               (Actual table rows stored here)
*/

-- Query execution
SELECT * FROM Orders WHERE OrderDate = '2024-01-15';
-- 1. Start at root: Navigate to Jan
-- 2. Go to intermediate: Navigate to 1-15 range  
-- 3. Read leaf page: Get actual data rows

Non-Clustered Index

Characteristics

  • Separate structure: Index is stored separately from table data
  • Pointers to data: Leaf level contains pointers (row locators) to actual data
  • Multiple allowed: Can have many non-clustered indexes per table
  • Additional storage: Requires extra storage space

Example: Non-Clustered Index Creation

-- Create non-clustered index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID 
ON Orders (CustomerID);

-- MySQL: All secondary indexes are non-clustered
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);

-- Composite non-clustered index
CREATE INDEX IX_Orders_Customer_Date 
ON Orders (CustomerID, OrderDate);

How Non-Clustered Index Works

-- Non-clustered index structure
/*
Index Pages (IX_Orders_CustomerID):
CustomerID | Row Locator
101        | Page 5, Slot 3
102        | Page 2, Slot 7  
103        | Page 8, Slot 1
104        | Page 5, Slot 9

Table Pages (Heap or Clustered):
Page 2: [OrderID: 1002, CustomerID: 102, ...]
Page 5: [OrderID: 1001, CustomerID: 101, ...] [OrderID: 1004, CustomerID: 104, ...]
Page 8: [OrderID: 1003, CustomerID: 103, ...]
*/

-- Query using non-clustered index
SELECT * FROM Orders WHERE CustomerID = 102;
-- 1. Search non-clustered index for CustomerID = 102
-- 2. Find row locator: Page 2, Slot 7
-- 3. Go to Page 2, Slot 7 to get actual data (Key Lookup)

Performance Comparison

Range Queries

-- Clustered index on OrderDate
SELECT * FROM Orders 
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-01-31';
-- Very fast: Data is physically ordered by OrderDate
-- Reads contiguous pages

-- Non-clustered index on OrderDate  
SELECT * FROM Orders 
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-01-31';
-- Slower: Must perform key lookups for each matching row
-- Random I/O to get actual data

Point Lookups

-- Clustered index lookup
SELECT * FROM Orders WHERE OrderID = 1001;  -- OrderID is clustered key
-- Fast: Direct access to data page

-- Non-clustered index lookup
SELECT * FROM Orders WHERE CustomerID = 102;  -- CustomerID has non-clustered index
-- Good: Index seek + key lookup
-- Still efficient for single row

Insert Performance

-- Insert with clustered index on OrderDate
INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderTotal)
VALUES (2001, 105, '2024-01-15', 250.00);
-- May be slow: If inserted in middle of date range, 
-- may cause page splits and data movement

-- Insert with non-clustered indexes
INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderTotal)
VALUES (2001, 105, '2024-06-15', 250.00);
-- Faster for table data, but must update all non-clustered indexes

Practical Examples

E-commerce Order System

-- Orders table design
CREATE TABLE Orders (
    OrderID INT IDENTITY(1,1) PRIMARY KEY,  -- Clustered index
    CustomerID INT,
    OrderDate DATETIME,
    OrderTotal DECIMAL(10,2),
    Status VARCHAR(20)
);

-- Non-clustered indexes for common queries
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);
CREATE INDEX IX_Orders_OrderDate ON Orders (OrderDate);
CREATE INDEX IX_Orders_Status ON Orders (Status);
CREATE INDEX IX_Orders_Customer_Date ON Orders (CustomerID, OrderDate);

-- Query patterns and index usage:

-- 1. Customer order history (uses IX_Orders_CustomerID)
SELECT * FROM Orders WHERE CustomerID = 12345;

-- 2. Daily orders report (uses IX_Orders_OrderDate)
SELECT COUNT(*), SUM(OrderTotal) 
FROM Orders 
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2024-01-02';

-- 3. Customer orders in date range (uses IX_Orders_Customer_Date)
SELECT * FROM Orders 
WHERE CustomerID = 12345 
AND OrderDate >= '2024-01-01';

Employee Management System

-- Employees table with clustered index on EmployeeID
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,  -- Clustered index
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Email VARCHAR(100),
    DepartmentID INT,
    HireDate DATE,
    Salary DECIMAL(10,2)
);

-- Non-clustered indexes for different access patterns
CREATE INDEX IX_Employees_Email ON Employees (Email);  -- Login queries
CREATE INDEX IX_Employees_Department ON Employees (DepartmentID);  -- Department reports
CREATE INDEX IX_Employees_LastName_FirstName ON Employees (LastName, FirstName);  -- Name searches
CREATE INDEX IX_Employees_HireDate ON Employees (HireDate);  -- Seniority queries

-- Covering index for salary reports
CREATE INDEX IX_Employees_Dept_Salary 
ON Employees (DepartmentID) 
INCLUDE (FirstName, LastName, Salary);

Index Design Strategies

Choosing Clustered Index Key

-- Good clustered index candidates:
-- 1. Primary key (if sequential)
CREATE TABLE Orders (
    OrderID INT IDENTITY(1,1) PRIMARY KEY  -- Sequential, good for clustered
);

-- 2. Frequently used in range queries
CREATE CLUSTERED INDEX IX_Sales_Date ON Sales (SaleDate);  -- Date ranges common

-- 3. Narrow, unique, static
CREATE CLUSTERED INDEX IX_Products_SKU ON Products (ProductSKU);  -- Narrow key

-- Poor clustered index candidates:
-- 1. Wide keys
CREATE CLUSTERED INDEX IX_Bad_Wide ON Orders (CustomerName, ProductName, OrderDate);  -- Too wide

-- 2. Frequently updated
CREATE CLUSTERED INDEX IX_Bad_Updated ON Products (LastModified);  -- Changes often

-- 3. Random values
CREATE CLUSTERED INDEX IX_Bad_Random ON Orders (OrderGUID);  -- Random, causes page splits

Covering Indexes

-- Non-clustered covering index
CREATE INDEX IX_Orders_Customer_Covering
ON Orders (CustomerID)
INCLUDE (OrderDate, OrderTotal, Status);

-- Query satisfied entirely by index (no key lookup needed)
SELECT OrderDate, OrderTotal, Status
FROM Orders
WHERE CustomerID = 12345;
-- Index contains all needed columns

Database-Specific Implementations

SQL Server

-- Explicit clustered index creation
CREATE CLUSTERED INDEX IX_Orders_Date ON Orders (OrderDate);

-- Non-clustered with included columns
CREATE NONCLUSTERED INDEX IX_Orders_Customer
ON Orders (CustomerID)
INCLUDE (OrderTotal, Status);

-- Filtered index
CREATE INDEX IX_Orders_Active
ON Orders (CustomerID)
WHERE Status = 'ACTIVE';

MySQL (InnoDB)

-- Primary key is always clustered
CREATE TABLE Orders (
    OrderID INT AUTO_INCREMENT PRIMARY KEY,  -- Clustered
    CustomerID INT,
    OrderDate DATE
);

-- All other indexes are non-clustered (secondary indexes)
CREATE INDEX IX_Customer ON Orders (CustomerID);

-- InnoDB secondary indexes contain primary key value
-- No separate row locator needed

PostgreSQL

-- No true clustered indexes, but CLUSTER command
CREATE INDEX IX_Orders_Date ON Orders (OrderDate);
CLUSTER Orders USING IX_Orders_Date;  -- Physically reorders table once

-- Regular indexes (similar to non-clustered)
CREATE INDEX IX_Orders_Customer ON Orders (CustomerID);

-- Partial indexes
CREATE INDEX IX_Orders_Active ON Orders (CustomerID) 
WHERE Status = 'ACTIVE';

Performance Monitoring

Index Usage Analysis

-- SQL Server: Check index usage
SELECT 
    i.name as index_name,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates
FROM sys.indexes i
JOIN sys.dm_db_index_usage_stats s ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE OBJECT_NAME(i.object_id) = 'Orders';

-- MySQL: Check index usage
SELECT 
    TABLE_NAME,
    INDEX_NAME,
    SEQ_IN_INDEX,
    COLUMN_NAME,
    CARDINALITY
FROM information_schema.STATISTICS
WHERE TABLE_NAME = 'Orders';

Index Fragmentation

-- SQL Server: Check fragmentation
SELECT 
    i.name,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Orders'), NULL, NULL, 'DETAILED') ps
JOIN sys.indexes i ON ps.object_id = i.object_id AND ps.index_id = i.index_id;

-- Rebuild fragmented indexes
ALTER INDEX IX_Orders_Date ON Orders REBUILD;

Best Practices

1. Clustered Index Design

-- Choose narrow, unique, static keys
CREATE CLUSTERED INDEX IX_Orders_ID ON Orders (OrderID);  -- Good

-- Avoid wide or frequently updated keys
-- Bad: CREATE CLUSTERED INDEX IX_Bad ON Orders (CustomerName, ProductDescription);

2. Non-Clustered Index Strategy

-- Create indexes based on query patterns
-- Frequently filtered columns
CREATE INDEX IX_Orders_Status ON Orders (Status);

-- Composite indexes for multi-column filters
CREATE INDEX IX_Orders_Customer_Date ON Orders (CustomerID, OrderDate);

-- Covering indexes for frequently selected columns
CREATE INDEX IX_Orders_Summary 
ON Orders (CustomerID) 
INCLUDE (OrderDate, OrderTotal);

3. Maintenance

-- Regular index maintenance
-- Rebuild fragmented indexes
ALTER INDEX ALL ON Orders REBUILD;

-- Update statistics
UPDATE STATISTICS Orders;

-- Remove unused indexes
DROP INDEX IX_Unused ON Orders;

Interview Tips

  • Understand that clustered indexes physically order data while non-clustered indexes are separate structures
  • Know the “one clustered, many non-clustered” rule
  • Be able to explain key lookup operations in non-clustered indexes
  • Understand when each type provides better performance
  • Know how to choose appropriate clustered index keys
  • Be familiar with covering indexes and included columns
  • Understand the trade-offs between query performance and maintenance overhead

Test Your Knowledge

Take a quick quiz to test your understanding of this topic.