What is the difference between correlated and non-correlated subqueries?

Answer

The main difference lies in how the subquery executes and whether it depends on values from the outer query.

Non-Correlated Subquery

Definition: A subquery that executes independently of the outer query and can run on its own.

Characteristics:

Independent execution - Runs once before the outer query
No reference to outer query columns
Better performance - Executes only once
Self-contained - Can be run separately

-- Non-correlated subquery example
SELECT name, salary
FROM employees
WHERE department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE location = 'New York'
);

-- The subquery can run independently:
SELECT department_id FROM departments WHERE location = 'New York';

Correlated Subquery

Definition: A subquery that references columns from the outer query and executes once for each row processed by the outer query.

Characteristics:

Dependent execution - Runs for each outer query row
References outer query columns
Slower performance - Executes multiple times
Cannot run alone - Depends on outer query context

-- Correlated subquery example
SELECT name, salary
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id  -- References outer query
);

-- The subquery cannot run independently due to e1.department_id reference

Execution Flow Comparison

Non-Correlated Execution:

-- Step 1: Execute subquery once
SELECT department_id FROM departments WHERE location = 'New York';
-- Result: [10, 20, 30]

-- Step 2: Use result in outer query
SELECT name, salary FROM employees WHERE department_id IN (10, 20, 30);

Correlated Execution:

-- For each employee row:
-- Row 1: John (dept_id = 10)
SELECT AVG(salary) FROM employees WHERE department_id = 10;
-- Compare John's salary with department 10 average

-- Row 2: Jane (dept_id = 20)  
SELECT AVG(salary) FROM employees WHERE department_id = 20;
-- Compare Jane's salary with department 20 average

-- ... continues for each row

Performance Comparison

Non-Correlated (Faster)

-- Executes subquery once for 1000 employees
SELECT customer_name
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id 
    FROM orders 
    WHERE order_date >= '2024-01-01'
);
-- Subquery executions: 1

Correlated (Slower)

-- Executes subquery 1000 times for 1000 employees
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.order_date >= '2024-01-01'
);
-- Subquery executions: 1000

Practical Examples

Sample Data Setup

CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    salary DECIMAL(10,2),
    hire_date DATE
);

CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100),
    location VARCHAR(50)
);

Non-Correlated Examples

-- 1. Employees in specific locations
SELECT name, salary
FROM employees
WHERE department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE location IN ('New York', 'California')
);

-- 2. Employees with above-average company salary
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- 3. Employees in departments with more than 10 people
SELECT name, department_id
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM employees
    GROUP BY department_id
    HAVING COUNT(*) > 10
);

Correlated Examples

-- 1. Employees earning more than their department average
SELECT name, salary, department_id
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

-- 2. Employees who are the highest paid in their department
SELECT name, salary, department_id
FROM employees e1
WHERE salary = (
    SELECT MAX(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

-- 3. Customers with their latest order date
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date = (
        SELECT MAX(order_date)
        FROM orders o2
        WHERE o2.customer_id = c.customer_id
    )
);

Converting Between Types

Correlated to Non-Correlated (When Possible)

-- Correlated version
SELECT name FROM employees e1
WHERE salary > (
    SELECT AVG(salary) FROM employees e2 
    WHERE e2.department_id = e1.department_id
);

-- Non-correlated version using window functions
SELECT name
FROM (
    SELECT 
        name,
        salary,
        AVG(salary) OVER (PARTITION BY department_id) as dept_avg
    FROM employees
) t
WHERE salary > dept_avg;

Non-Correlated to Correlated

-- Non-correlated version
SELECT name FROM employees
WHERE department_id IN (
    SELECT department_id FROM departments WHERE location = 'New York'
);

-- Correlated version (less efficient)
SELECT name FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d 
    WHERE d.department_id = e.department_id 
    AND d.location = 'New York'
);

When to Use Each Type

Use Non-Correlated When:

Subquery result is independent of outer query
Performance is critical
Subquery returns a small result set
Logic can be expressed without row-by-row comparison

Use Correlated When:

Need row-by-row comparison with outer query
Logic requires context from outer query
Using EXISTS/NOT EXISTS for existence checks
Finding records based on relationships within the same table

Optimization Strategies

1. Replace Correlated with JOINs

-- Slow correlated subquery
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
);

-- Faster JOIN
SELECT DISTINCT c.customer_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

2. Use Window Functions

-- Instead of correlated subquery
SELECT name, salary
FROM employees e1
WHERE salary = (
    SELECT MAX(salary) FROM employees e2 
    WHERE e2.department_id = e1.department_id
);

-- Use window function
SELECT name, salary
FROM (
    SELECT 
        name, 
        salary,
        MAX(salary) OVER (PARTITION BY department_id) as max_dept_salary
    FROM employees
) t
WHERE salary = max_dept_salary;

Common Patterns

EXISTS Pattern (Usually Correlated)

-- Find customers who have placed orders
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
);

IN Pattern (Usually Non-Correlated)

-- Find employees in specific departments
SELECT name
FROM employees
WHERE department_id IN (
    SELECT department_id FROM departments WHERE budget > 100000
);

Interview Tips

Understand execution patterns: non-correlated runs once, correlated runs per row
Know performance implications: non-correlated is generally faster
Practice identifying which type a subquery is
Learn to convert between types when possible
Understand when each type is most appropriate

Test Your Knowledge

Take a quick quiz to test your understanding of this topic.