What is the difference between correlated and non-correlated subqueries?

Answer

The main difference lies in how the subquery executes and whether it depends on values from the outer query.

Non-Correlated Subquery

Definition: A subquery that executes independently of the outer query and can run on its own.

Characteristics:

  • Independent execution - Runs once before the outer query
  • No reference to outer query columns
  • Better performance - Executes only once
  • Self-contained - Can be run separately
-- Non-correlated subquery example
SELECT name, salary
FROM employees
WHERE department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE location = 'New York'
);

-- The subquery can run independently:
SELECT department_id FROM departments WHERE location = 'New York';

Correlated Subquery

Definition: A subquery that references columns from the outer query and executes once for each row processed by the outer query.

Characteristics:

  • Dependent execution - Runs for each outer query row
  • References outer query columns
  • Slower performance - Executes multiple times
  • Cannot run alone - Depends on outer query context
-- Correlated subquery example
SELECT name, salary
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id  -- References outer query
);

-- The subquery cannot run independently due to e1.department_id reference

Execution Flow Comparison

Non-Correlated Execution:

-- Step 1: Execute subquery once
SELECT department_id FROM departments WHERE location = 'New York';
-- Result: [10, 20, 30]

-- Step 2: Use result in outer query
SELECT name, salary FROM employees WHERE department_id IN (10, 20, 30);

Correlated Execution:

-- For each employee row:
-- Row 1: John (dept_id = 10)
SELECT AVG(salary) FROM employees WHERE department_id = 10;
-- Compare John's salary with department 10 average

-- Row 2: Jane (dept_id = 20)  
SELECT AVG(salary) FROM employees WHERE department_id = 20;
-- Compare Jane's salary with department 20 average

-- ... continues for each row

Performance Comparison

Non-Correlated (Faster)

-- Executes subquery once for 1000 employees
SELECT customer_name
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id 
    FROM orders 
    WHERE order_date >= '2024-01-01'
);
-- Subquery executions: 1

Correlated (Slower)

-- Executes subquery 1000 times for 1000 employees
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.order_date >= '2024-01-01'
);
-- Subquery executions: 1000

Practical Examples

Sample Data Setup

CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    salary DECIMAL(10,2),
    hire_date DATE
);

CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100),
    location VARCHAR(50)
);

Non-Correlated Examples

-- 1. Employees in specific locations
SELECT name, salary
FROM employees
WHERE department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE location IN ('New York', 'California')
);

-- 2. Employees with above-average company salary
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- 3. Employees in departments with more than 10 people
SELECT name, department_id
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM employees
    GROUP BY department_id
    HAVING COUNT(*) > 10
);

Correlated Examples

-- 1. Employees earning more than their department average
SELECT name, salary, department_id
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

-- 2. Employees who are the highest paid in their department
SELECT name, salary, department_id
FROM employees e1
WHERE salary = (
    SELECT MAX(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

-- 3. Customers with their latest order date
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date = (
        SELECT MAX(order_date)
        FROM orders o2
        WHERE o2.customer_id = c.customer_id
    )
);

Converting Between Types

Correlated to Non-Correlated (When Possible)

-- Correlated version
SELECT name FROM employees e1
WHERE salary > (
    SELECT AVG(salary) FROM employees e2 
    WHERE e2.department_id = e1.department_id
);

-- Non-correlated version using window functions
SELECT name
FROM (
    SELECT 
        name,
        salary,
        AVG(salary) OVER (PARTITION BY department_id) as dept_avg
    FROM employees
) t
WHERE salary > dept_avg;

Non-Correlated to Correlated

-- Non-correlated version
SELECT name FROM employees
WHERE department_id IN (
    SELECT department_id FROM departments WHERE location = 'New York'
);

-- Correlated version (less efficient)
SELECT name FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d 
    WHERE d.department_id = e.department_id 
    AND d.location = 'New York'
);

When to Use Each Type

Use Non-Correlated When:

  • Subquery result is independent of outer query
  • Performance is critical
  • Subquery returns a small result set
  • Logic can be expressed without row-by-row comparison

Use Correlated When:

  • Need row-by-row comparison with outer query
  • Logic requires context from outer query
  • Using EXISTS/NOT EXISTS for existence checks
  • Finding records based on relationships within the same table

Optimization Strategies

1. Replace Correlated with JOINs

-- Slow correlated subquery
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
);

-- Faster JOIN
SELECT DISTINCT c.customer_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

2. Use Window Functions

-- Instead of correlated subquery
SELECT name, salary
FROM employees e1
WHERE salary = (
    SELECT MAX(salary) FROM employees e2 
    WHERE e2.department_id = e1.department_id
);

-- Use window function
SELECT name, salary
FROM (
    SELECT 
        name, 
        salary,
        MAX(salary) OVER (PARTITION BY department_id) as max_dept_salary
    FROM employees
) t
WHERE salary = max_dept_salary;

Common Patterns

EXISTS Pattern (Usually Correlated)

-- Find customers who have placed orders
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
);

IN Pattern (Usually Non-Correlated)

-- Find employees in specific departments
SELECT name
FROM employees
WHERE department_id IN (
    SELECT department_id FROM departments WHERE budget > 100000
);

Interview Tips

  • Understand execution patterns: non-correlated runs once, correlated runs per row
  • Know performance implications: non-correlated is generally faster
  • Practice identifying which type a subquery is
  • Learn to convert between types when possible
  • Understand when each type is most appropriate

Test Your Knowledge

Take a quick quiz to test your understanding of this topic.