When Replacing a JOIN with a Subquery Improves Performance
Although JOINs are generally efficient, there are specific scenarios where replacing a JOIN with a subquery (especially EXISTS or a scalar subquery) can significantly improve performance. This usually happens when the JOIN produces a large intermediate result set or multiplies rows that the query then has to discard.
A JOIN repeats each outer row once for every matching row in the joined table, so it produces duplicates whenever there are multiple matches.
This can lead to large intermediate result sets that MySQL needs to sort, buffer, or group.
A subquery (EXISTS or IN) avoids row multiplication because it only checks existence.
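For example, when customers is joined to orders to find customers who have placed an order, a customer with 1000 orders appears 1000 times in the join result. MySQL must process and then filter out these duplicate rows.
With EXISTS, MySQL can stop scanning orders for a customer as soon as it finds the first matching order. This reduces I/O, CPU usage, and memory consumption.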
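The row multiplication described above can be reproduced with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it illustrates the shape of the result sets rather than MySQL's execution strategy or timings. The customers and orders tables, their columns, and the index name are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; table, column, and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Linus")],
)
# Customer 1 has 1000 orders, customer 2 has one, customer 3 has none.
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(1,)] * 1000 + [(2,)],
)

# JOIN form: one result row per (customer, order) pair.
join_rows = conn.execute("""
    SELECT c.id, c.name
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
""").fetchall()

# EXISTS form: at most one result row per customer.
exists_rows = conn.execute("""
    SELECT c.id, c.name
    FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

print(len(join_rows))  # 1001
print(exists_rows)     # [(1, 'Ada'), (2, 'Grace')]
```

The JOIN returns 1001 rows for two distinct customers, while the EXISTS form returns each qualifying customer once. On MySQL, running EXPLAIN on both forms is the way to confirm which plan the optimizer actually picks.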
JOINs can end up reading millions of rows before a filter removes most of them.
A subquery can apply the filter first, using indexes efficiently.
This allows MySQL to eliminate unnecessary row comparisons.
If sales has a composite index on (year, product_id), the subquery can use that index to narrow sales down to only the relevant rows, which is far faster than joining millions of rows first.
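As a rough illustration of pushing the filter into the subquery, the sketch below builds a sales table with a composite index on (year, product_id) and selects the products sold in a given year. Only the sales table and its two indexed columns come from the text; the products table, the amount column, the index name, and the sample data are assumptions, and sqlite3 again stands in for MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; products, amount, and the index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        year INTEGER,
        amount REAL
    );
    CREATE INDEX idx_sales_year_product ON sales (year, product_id);
""")
conn.executemany(
    "INSERT INTO products (id, name) VALUES (?, ?)",
    [(1, "widget"), (2, "gadget"), (3, "gizmo")],
)
conn.executemany(
    "INSERT INTO sales (product_id, year, amount) VALUES (?, ?, ?)",
    [(1, 2022, 10.0), (1, 2023, 12.0), (1, 2023, 8.0), (2, 2022, 5.0), (3, 2023, 7.5)],
)

# The year filter lives inside the subquery, so only matching sales rows
# are considered before they are compared against products.
query = """
    SELECT p.id, p.name
    FROM products AS p
    WHERE p.id IN (SELECT s.product_id FROM sales AS s WHERE s.year = ?)
    ORDER BY p.id
"""
products_sold = conn.execute(query, (2023,)).fetchall()
print(products_sold)  # [(1, 'widget'), (3, 'gizmo')]

# Print the plan SQLite chose; the exact wording varies between versions.
for row in conn.execute("EXPLAIN QUERY PLAN " + query, (2023,)):
    print(row[-1])
```

Product 1 has two sales in 2023 but appears once in the result. The printed plan is SQLite's, not MySQL's; on MySQL the equivalent check is EXPLAIN on the same statement.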
JOINs involving GROUP BY or DISTINCT may trigger on-disk temporary tables.
EXISTS often avoids the sort and the temporary table, because it produces no duplicate rows that need to be collapsed.
This reduces disk I/O and memory pressure.
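A minimal sketch of the two equivalent forms follows: a JOIN that needs DISTINCT to collapse duplicates, and an EXISTS query that never creates them. It shows only that the results match, not how MySQL executes either form. The schema and data are hypothetical, and sqlite3 is used as a stand-in for MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; the schema and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")
conn.executemany(
    "INSERT INTO customers (id, country) VALUES (?, ?)",
    [(1, "DE"), (2, "FR"), (3, "DE"), (4, "US")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, 20.0), (1, 35.0), (1, 5.0), (2, 12.0), (3, 99.0), (3, 1.0)],
)

# JOIN form: DISTINCT is required to collapse the repeated customer rows.
join_distinct = conn.execute("""
    SELECT DISTINCT c.id, c.country
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# EXISTS form: no duplicates are produced, so no DISTINCT is needed.
exists_form = conn.execute("""
    SELECT c.id, c.country
    FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

print(join_distinct == exists_form)  # True
print(exists_form)                   # [(1, 'DE'), (2, 'FR'), (3, 'DE')]
```

Both queries return the same three customers. Whether the DISTINCT version spills to an on-disk temporary table in MySQL depends on data volume and server settings, so that part is best checked with EXPLAIN on the real workload.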
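A JOIN produces a result row for every match, even when the query only needs to know whether a match exists.
EXISTS can stop at the first match, because it only yields a true/false result.
This can be much faster on large tables where each outer row has many matches.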
Replacing a JOIN with a subquery can improve performance when the JOIN produces large intermediate results, when only existence checks are needed, when filtering can be pushed into the subquery, and when avoiding on-disk temporary tables is critical. EXISTS-based subqueries often perform better on large, selective datasets.