What is the difference between correlated and non-correlated subqueries?

Answer

The main difference lies in how the subquery executes and whether it depends on values from the outer query.

Non-Correlated Subquery

Definition: A subquery that executes independently of the outer query and can run on its own.

Characteristics:

  • Independent execution - Runs once before the outer query
  • No reference to outer query columns
  • Better performance - Executes only once
  • Self-contained - Can be run separately
-- Non-correlated subquery example
SELECT name, salary
FROM employees
WHERE department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE location = 'New York'
);

-- The subquery can run independently:
SELECT department_id FROM departments WHERE location = 'New York';

Correlated Subquery

Definition: A subquery that references columns from the outer query and executes once for each row processed by the outer query.

Characteristics:

  • Dependent execution - Runs for each outer query row
  • References outer query columns
  • Slower performance - Executes multiple times
  • Cannot run alone - Depends on outer query context
-- Correlated subquery example
SELECT name, salary
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id  -- References outer query
);

-- The subquery cannot run independently due to e1.department_id reference

Execution Flow Comparison

Non-Correlated Execution:

-- Step 1: Execute subquery once
SELECT department_id FROM departments WHERE location = 'New York';
-- Result: [10, 20, 30]

-- Step 2: Use result in outer query
SELECT name, salary FROM employees WHERE department_id IN (10, 20, 30);

Correlated Execution:

-- For each employee row:
-- Row 1: John (dept_id = 10)
SELECT AVG(salary) FROM employees WHERE department_id = 10;
-- Compare John's salary with department 10 average

-- Row 2: Jane (dept_id = 20)  
SELECT AVG(salary) FROM employees WHERE department_id = 20;
-- Compare Jane's salary with department 20 average

-- ... continues for each row

Performance Comparison

Non-Correlated (Faster)

-- Executes subquery once for 1000 employees
SELECT customer_name
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id 
    FROM orders 
    WHERE order_date >= '2024-01-01'
);
-- Subquery executions: 1

Correlated (Slower)

-- Executes subquery 1000 times for 1000 employees
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.order_date >= '2024-01-01'
);
-- Subquery executions: 1000

Practical Examples

Sample Data Setup

CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    salary DECIMAL(10,2),
    hire_date DATE
);

CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100),
    location VARCHAR(50)
);

Non-Correlated Examples

-- 1. Employees in specific locations
SELECT name, salary
FROM employees
WHERE department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE location IN ('New York', 'California')
);

-- 2. Employees with above-average company salary
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- 3. Employees in departments with more than 10 people
SELECT name, department_id
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM employees
    GROUP BY department_id
    HAVING COUNT(*) > 10
);

Correlated Examples

-- 1. Employees earning more than their department average
SELECT name, salary, department_id
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

-- 2. Employees who are the highest paid in their department
SELECT name, salary, department_id
FROM employees e1
WHERE salary = (
    SELECT MAX(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

-- 3. Customers with their latest order date
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date = (
        SELECT MAX(order_date)
        FROM orders o2
        WHERE o2.customer_id = c.customer_id
    )
);

Converting Between Types

Correlated to Non-Correlated (When Possible)

-- Correlated version
SELECT name FROM employees e1
WHERE salary > (
    SELECT AVG(salary) FROM employees e2 
    WHERE e2.department_id = e1.department_id
);

-- Non-correlated version using window functions
SELECT name
FROM (
    SELECT 
        name,
        salary,
        AVG(salary) OVER (PARTITION BY department_id) as dept_avg
    FROM employees
) t
WHERE salary > dept_avg;

Non-Correlated to Correlated

-- Non-correlated version
SELECT name FROM employees
WHERE department_id IN (
    SELECT department_id FROM departments WHERE location = 'New York'
);

-- Correlated version (less efficient)
SELECT name FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d 
    WHERE d.department_id = e.department_id 
    AND d.location = 'New York'
);

When to Use Each Type

Use Non-Correlated When:

  • Subquery result is independent of outer query
  • Performance is critical
  • Subquery returns a small result set
  • Logic can be expressed without row-by-row comparison

Use Correlated When:

  • Need row-by-row comparison with outer query
  • Logic requires context from outer query
  • Using EXISTS/NOT EXISTS for existence checks
  • Finding records based on relationships within the same table

Optimization Strategies

1. Replace Correlated with JOINs

-- Slow correlated subquery
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
);

-- Faster JOIN
SELECT DISTINCT c.customer_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

2. Use Window Functions

-- Instead of correlated subquery
SELECT name, salary
FROM employees e1
WHERE salary = (
    SELECT MAX(salary) FROM employees e2 
    WHERE e2.department_id = e1.department_id
);

-- Use window function
SELECT name, salary
FROM (
    SELECT 
        name, 
        salary,
        MAX(salary) OVER (PARTITION BY department_id) as max_dept_salary
    FROM employees
) t
WHERE salary = max_dept_salary;

Common Patterns

EXISTS Pattern (Usually Correlated)

-- Find customers who have placed orders
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
);

IN Pattern (Usually Non-Correlated)

-- Find employees in specific departments
SELECT name
FROM employees
WHERE department_id IN (
    SELECT department_id FROM departments WHERE budget > 100000
);

Interview Tips

  • Understand execution patterns: non-correlated runs once, correlated runs per row
  • Know performance implications: non-correlated is generally faster
  • Practice identifying which type a subquery is
  • Learn to convert between types when possible
  • Understand when each type is most appropriate

Test Your Knowledge

Take a quick quiz to test your understanding of this topic.

Test Your SQL Knowledge

Ready to put your skills to the test? Take our interactive SQL quiz and get instant feedback on your answers.