SQL Correlated Subqueries: Syntax and Example

Correlated Subquery In SQL

“SQL Correlated Subqueries” are a special type of subquery in SQL where the inner query depends on values from the outer query. Unlike a regular query that runs once, it runs repeatedly for each row processed.

If you already understand basic SQL subqueries, learning Correlated Subqueries is the next step in writing more advanced queries. They are commonly used when you need to compare a row with other rows in the same table or in a related table.

In this article, we will explore the concept of SQL correlated subqueries, understand their syntax, and look at simple examples to see how they work in real database queries.

TL;DR: Correlated Subqueries In SQL

Aspect

Summary

Definition of Correlated Subquery

A correlated subquery is a type of SQL subquery that depends on the outer query for its values. The inner query executes repeatedly for each row processed by the outer query.

Outer Query vs Inner Query

The outer query is the main query that produces the final result of the SQL statement. The inner query runs inside it and often provides conditions or values used for filtering.

Execution Behavior

Unlike regular subqueries that run once, correlated subqueries run multiple times. The database executes the inner query once for every row processed in the outer query.

Common Use Cases

Correlated subqueries are useful for row-by-row comparisons, dynamic filtering, and data validation. They are also used for hierarchical queries and checking related values within tables.

Performance Considerations

Since the inner query runs many times, correlated subqueries can affect performance on large datasets. Proper indexing, limiting rows, or replacing them with joins can improve efficiency.

What Are Correlated Subquery or Synchronized Subquery In SQL?

A Correlated Subquery is an important concept in SQL that depends on the outer query to retrieve its values. In a correlated subquery, the inner query references a column from the outer query, which creates a connection between the two queries.

Because of this relationship, the inner query is executed repeatedly for each row processed by the outer query. These subqueries are commonly written using the WHERE clause to compare values between the inner and outer queries.

Visual representation showing step by step workflow of how SQL correlated subquery works with Inner Query And Outer Query

At first glance, correlated subqueries may look similar to normal subqueries. However, the key difference is that a regular subquery usually runs once and returns a result, while a correlated subquery runs multiple times. This repeated execution allows it to perform row-by-row comparisons in a table.

While writing complex SQL queries, it is also important to understand the SQL Order of Operations. This helps you see how the database engine processes different clauses such as SELECT, WHERE, and GROUP BY, which becomes useful when you start working with Correlated Queries.

What Are Outer Query & Inner Query?

In the previous discussion, we used two important terms: Outer Query and Inner Query.

The Outer Query is the main SQL query that controls the overall result of the statement. It is the primary query that retrieves the final data from the table, which is why it is also called the Main Query.

The Inner Query, also known as a subquery, is a query written inside another SQL query. It is often placed inside the WHERE clause, but it can also appear in other parts of a SQL statement.

The inner query usually provides values or conditions that help the outer query filter or process the data. In many cases, like working with correlated subqueries, the inner query depends on the outer query to complete its execution.

As you continue learning about correlated subqueries, the relationship between the outer query and inner query will become clearer.

How To Create Correlated Subqueries In SQL?

Here, we are going to discuss the process of creating a Correlated Subquery. We will discuss the simple syntax of the Correlated Subquery.

				
					SELECT c1, c2 FROM table AS t1 WHERE c3 = (SELECT c4 FROM table AS t2 WHERE t2.c3 = t1.c3);
				
			

From the above query, you can notice that the WHERE clause is used to connect the outer query and the inner query. The part of the statement before the WHERE clause represents the outer query, which retrieves the main data from the table.

The portion written inside the parentheses after the WHERE clause represents the inner query. In this example, ‘t1’ is an alias for the table used in the outer query, while ‘t2’ is the alias used inside the inner query.

The correlated behavior occurs because the inner query references a column from the outer query using a condition such as ‘t2.column = t1.column’.  This connection allows the inner query to execute for each row processed by the outer query.

How To Use Correlated Subqueries In SQL?

Here, we are going to implement the Correlated Subqueries from scratch. That means we will first declare the Table & then provide some values there.

After that, with the help of the same query discussed above, the Correlated Subquery will be defined. In this case, we will declare a new table, Code Table.

Step 1: Create The Table First

The first thing will be to declare the Table in the SQL Program. Here, we will implement the Code Table with the SQL Create Command. The table will have the following entities.

  • ID: Each Programming Language will have a particular ID. The ID will also be the Primary key.

  • Language: Here, the programming languages used should be marked.

  • Lines: The total Lines of Code will be discussed here. It can only hold integer values.

  • Developer: The Name of the Developer will be displayed & stored here.

				
					-- Creating Table Code with ID, Language, Lines & Developer Name
CREATE TABLE Code (ID INT PRIMARY KEY, Language VARCHAR(50), Lines INT, Developer VARCHAR(50));
				
			

Step 2: Insert Values Into The Table

Now, as the formation of the table is completed, the following query will be used to Insert Values in the table.

This query returns an SQL Table with all the values mentioned here. As per the Datatype of the Entities, the values are assigned.

				
					-- Inserting Values in Code Table in Proper Sequence as Developed
INSERT INTO Code (ID, Language, Lines, Developer) VALUES
(1, 'Python', 150, 'Tom'),
(2, 'Java', 200, 'Bob'),
(3, 'C++', 120, 'Krish'),
(4, 'Python', 100, 'Eve'),
(5, 'JavaScript', 180, 'SG');
				
			

Output image showing the Code Table where different values are inserted like Python, Java, C++, JavaScript

Step 3: Implement The Correlated Subqueries In The Table

Now, after inserting values into the table, it is time to declare Correlated Subqueries there. For that purpose, you have to use the following query.

				
					-- Selecting Language & Lines as the Outer Query using Where
SELECT Language, Lines FROM Code c1 WHERE Lines > ( SELECT AVG(Lines) FROM Code c2 WHERE c2.Language = c1.Language);
				
			

Explanation:

  • The outer query retrieves the Language and Lines columns from the Code table, which is assigned the alias ‘c1’.

  • The inner query calculates the average number of lines of code for the same programming language using the Code table with the alias ‘c2’.

  • The condition ‘c2.Language = c1.Language’ creates the correlation between the inner and outer queries.

  • The outer query then compares the value of Lines in each row with the average returned by the inner query.

  • Only those rows where the number of lines is greater than the average for that specific language will appear in the final output.

Output:

Output screenshow showing the result of using Correlated Subqueries in the previously created Code Table which is giving 3 rows

From the above output, we can see that Python, Java & JavaScript are coming with the Number of Lines. As the Number of Lines is greater than the average number of lines, that is why they appear. So, the Command is executing well.

How To Implement Correlated Subquery With Exists Operator?

The EXISTS operator is used to check whether a subquery returns any records. If the subquery returns at least one row, the EXISTS condition evaluates to TRUE; otherwise, it evaluates to FALSE.

This operator is commonly used in the Outer Query to determine whether matching data exists in the subquery.

Let us examine the following query to understand how it works. For this example, we will use the same ‘Code’ table as used earlier.

				
					-- Selecting ID & Developer as the Outer Query using Where
SELECT ID, Developer FROM Code c1 WHERE EXISTS (SELECT 1 FROM Code c2 WHERE c1.Lines > c2.Lines );
				
			

Explanation:

  • The outer query retrieves the ID and Developer columns from the Code table, which is assigned the alias ‘c1’.

  • After the EXISTS operator, the subquery is written inside parentheses.

  • The inner query checks whether there exists at least one row in the table where the value of Lines is smaller than the Lines value of the current row in the outer query.

  • If the inner query returns at least one matching row, the EXISTS condition becomes true, and that row from the outer query will appear in the final output.

Output:

The image showing the output of using Correlated Subquery with Exists Operator which only gives the ID and Developer name as the result

From the above output, we can see that all the records are coming except ID Number 4. The ID Number 4 is not coming as the Exists Operator is filtering out the value. So, in this way, you can use other types of SQL Commands as well.

How To Use Aggregate Functions With Subqueries?

In this example, the subquery calculates the average number of lines written across the table. The outer query then filters rows where the number of lines is greater than this average value.

				
					-- Selecting Developer & ID and Counting ID in the Outer Query using Where & Group By Clause
SELECT Developer, COUNT(ID) AS CodeCount FROM Code c1 WHERE Lines > (SELECT AVG(Lines) FROM Code c2) GROUP BY Developer;
				
			

Explanation:

  • Here, the Outer Query will access the Developer and the Count of Every Programming ID.

  • The Count of every Programming ID will be termed as the CodeCount in the table.

  • Now, the Average Line will be selected after the Where Statement in the SQL Command.

  • At the end, the GROUP BY clause groups the results based on the Developer column.

  • This can be implemented if the Outer Query Lines are treated before the Inner One.

Output:

The output image showing the use Aggregate Functions with Subqueries giving Developer and Count as the columns

From the above output, we can see that only the Bob & SG Developer Names are coming with the Code Count as 1 because they have written the code only in one programming language. So, the SQL Command is working fine.

What Are Some Real-World Applications Of Correlated Subquery?

As a mentor, students often ask about the practical uses of SQL Correlated Subqueries, because many believe they are only used in academic examples.

In reality, correlated subqueries are widely used in database systems when a query needs to compare each row with related data from the same table or another table.

From my experience, I have drafted some real-world applications of SQL Correlated Subquery.

  • A correlated subquery can process each row of the outer query individually, allowing specific conditions to be applied to every record.
  • It is useful for dynamic comparisons, where a value from the current row must be compared with values from other rows in the same or related tables.
  • Correlated subqueries help implement advanced filtering conditions that may be difficult to express using only WHERE, GROUP BY, or other standard SQL clauses.
  • They are often used for data validation and integrity checks, where a row must be verified against other related rows before performing database operations.
  • In some database structures involving hierarchical relationships, correlated subqueries can help evaluate relationships between parent and child records.
  • Developers often use correlated subqueries to find employees whose salary is higher than the average salary in their department, or to identify products priced above the average in their category.

 

Comparison Table On Different Database-Specific Implementations:

While mentoring students on Correlated Subqueries, many often ask me whether the implementation is the same across different databases like PostgreSQL, SQL Server, etc., or if there are any changes.

To make them understand, I design a comparison table of different database-specific implementations in the following way.

Category

MySQL

PostgreSQL

SQL Server

Support Of Correlated Subquery

Fully Supported

Fully Supported

Fully Supported

Optimization

Subquery can be optimized.

Full Optimization can be done with JOIN.

It has built-in optimization techniques.

Large Dataset Performance

Performance may degrade

It can easily handle Large Datasets

It can support a large dataset with Optimization

Index Support

Indexing can be done on Columns with WHERE

Indexing can be done on Columns with JOIN

It also uses JOIN for Indexing

Materialized Support

No Direct Support present

Fully Supported

Temporary Tables can be used.

Comparison Table On Correlated Subquery And Its Alternatives:

After having 7 years of experience in Database mentoring, I have noticed a pattern that most students assume the correlated Subquery is an alternative to the Join Statement. I have to rectify students’ misconceptions about this many times.

So, here, I will draft a difference table between the Correlated Subquery and some of its so-called alternatives.

Category

Correlated Subquery

Join Statement

Common Table Expression

Definition

A Subquery that references any Column in SQL from the outer query

Combines two related rows from Two or More Tables

It is a Temporary Result Set using the WITH Clause

Execution

Executed Once for Each Row

Executed only once

Executed only once

Complexity

More Complex & Hard to Read

Less Complex & Easy to Understand

More Readable than Subquery

Performance

Slower Performance

Faster Performance

Performance can be improved by Reusing Results

Readability

Less Readable & Straightforward

More Readable & Straightforward

More Readable for Complex Quires

Maintenance

Difficult

Easy

Easy for Complex Logic

Optimization

Difficult

Easy

Typically Optimized

What Are Some Performance Optimization Tips On Correlated Subquery?

Over the years of tuning real production databases, I have learned that small changes in how you write a Correlated Subquery can make a big difference in speed.

Here, I will share a few practical tips I usually give to my students to help them write Correlated Subqueries that run faster and scale better with larger datasets.

  • We have to avoid the Correlated Subquery in some Large Datasets. Instead, in some Large Datasets, we can use the JOIN Statement as an alternative to the Subquery.
  • If we are using the WHERE Clause with the Correlated Subquery, then we should be extra careful. We have to use Indexes Effectively while working on SQL Columns.
  • While working with a correlated subquery, we have to limit the Row Number in the Subquery. Otherwise, the Database Engine will take more time to complete the work.
  • It will be a good practice to Precompute Subquery Results using the Temporary Table. This will reduce the work for recalculating results for each row.
  • We should not use the Functions on the indexed columns of the correlated subquery. Because, in that case, the Indexes will not work as efficiently as we need.

 

Key Areas Where Students Often Lose Marks In SQL Correlated Subqueries:

After understanding the concepts of SQL Correlated Subqueries, many students still lose marks in exams because they overlook some important areas while writing queries.

The following points highlight some key areas where students often face difficulties. These insights are based on my experience teaching database concepts.

  • Many students struggle to clearly differentiate between Correlated Subqueries and Non-Correlated Subqueries. Assuming both work in the same way can lead to inefficient query design and incorrect logic.
  • Another area where students lose marks is not considering the indexes on the relevant columns. When indexes are not properly created or used, the database performance can degrade significantly.
  • Students sometimes use Correlated Subqueries even when they are not required. In such cases, the query becomes unnecessarily long and complex. Instead, alternatives like JOINs or CTEs can often provide a more efficient solution.
  • Correlated Subqueries frequently involve the use of Aggregate Functions. If these functions are not applied correctly, the query may return incorrect results.
  • While writing queries, it is important to avoid unnecessary complexity. Restructuring the query to keep it simple and readable can help ensure better accuracy and understanding.

Conclusion:

To sum up, “SQL Correlated Subqueries” are a powerful SQL feature that helps you filter or compare data based on values from the outer query.

Since the inner query runs for each row of the outer query, it allows you to write queries that handle more detailed and conditional data checks. However, because correlated subqueries execute multiple times, they should be used carefully in large datasets to avoid performance issues.

When working with SQL queries, you may need to format or combine results from tables. For instance, developers sometimes merge multiple columns into a single output for easier reading or analysis. You can also learn about Concatenating Columns in SQL, a useful technique for handling database records.

 

Takeaways:

  • Correlated Subquery is an important concept of SQL that works on the same or different tables.
  • There is a big role for Inner Query & Outer Query in the Correlated Subquery of SQL.
  • The Correlated Subquery is different from the Join Statement in SQL.
  • The Correlated Subquery helps to filter the database with different queries simultaneously.
  • The Correlated Subquery can be implemented with other clauses as well.
  • If any other clauses are used with a Correlated Subquery, then the Nested Operation will be developed.

 

Frequently Asked Questions:

1) What is the main difference between a regular subquery and a correlated subquery in SQL?

A regular subquery runs once and returns a result that the outer query uses. But a correlated subquery depends on values from the outer query. Because of this dependency, it executes once for every row processed by the outer query.

2) Why can correlated subqueries sometimes cause performance issues?

A correlated subquery runs repeatedly for each row in the outer query. If the table contains a large number of rows, the database must execute the inner query many times. This repeated execution can slow down the query if it is not optimized.

3) When should you use a correlated subquery instead of a JOIN?

A correlated subquery is useful when you need row-by-row comparison or conditional filtering. It is often used in cases like finding rows with values greater than an average or related condition. However, in many large datasets, a JOIN with aggregation can perform faster.