MySQL query optimization: a practical guide

A slow query is rarely just one slow statement. In a MySQL application it can increase API latency, hold locks longer, consume buffer pool and CPU, and make dashboards feel unreliable just when users need them most. The goal is not to add indexes until the query looks fast on your laptop. The goal is to understand what MySQL is doing, reduce the amount of work it has to do, and give the optimizer an access path that still works when the table is ten times larger. This guide uses MySQL 8 examples, but most ideas also apply to recent MySQL-compatible systems.

2026-05-22

Start with the plan, not the guess

When a query is slow, first answer three questions:

How many rows does MySQL need to read?
Which indexes, if any, does it use?
Does it sort, aggregate, or materialize a large intermediate result?

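Use EXPLAIN for a quick estimated plan. Use EXPLAIN ANALYZE (MySQL 8.0.18 and later) when you need actual row counts and timings. Note that EXPLAIN ANALYZE executes the query, so be careful with expensive statements on a busy server.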

EXPLAIN ANALYZE
SELECT o.id, o.order_date, o.total_amount
FROM orders o
WHERE o.customer_id = 42
ORDER BY o.order_date DESC
LIMIT 10;
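For a look at the plan without running the statement, plain EXPLAIN returns the optimizer's estimate as a table, and FORMAT=TREE prints the same estimate in the tree layout that EXPLAIN ANALYZE uses. A minimal sketch against the same orders query:

```sql
-- Estimated plan only; the query is not executed.
EXPLAIN
SELECT o.id, o.order_date, o.total_amount
FROM orders o
WHERE o.customer_id = 42
ORDER BY o.order_date DESC
LIMIT 10;

-- Same estimate in tree form (MySQL 8.0.16 and later).
EXPLAIN FORMAT=TREE
SELECT o.id, o.order_date, o.total_amount
FROM orders o
WHERE o.customer_id = 42
ORDER BY o.order_date DESC
LIMIT 10;
```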

In the tabular EXPLAIN output, pay special attention to these columns and messages:

type: const, ref, and range are usually better than ALL.
key: the index MySQL chose. If this is NULL, MySQL is scanning without an index.
rows: the estimated number of rows MySQL expects to inspect.
filtered: the estimated percentage of rows left after applying the condition.
Extra: look for Using index, Using where, Using temporary, and Using filesort.

Using filesort does not always mean a disk sort, and it is not always bad. It means MySQL cannot return rows in the required order directly from the chosen index. That becomes expensive when the input is large.

A useful first check is whether the plan shape changes from a broad scan to an indexed lookup. Before adding the customer/date index, the plan might look roughly like this:

type: ALL
key: NULL
rows: 1200000
Extra: Using where; Using filesort

After adding an index that matches the filter and order, you want to see a narrower access type and the expected index name:

type: ref
key: idx_orders_customer_date
rows: 42
Extra: Using where

The exact numbers depend on your data, but the direction matters: fewer rows examined, a chosen index, and less temporary sorting work.

A repeatable optimization workflow

1. Measure the real query

Run the query with realistic parameters and data volume. A query that is fast for one customer can be slow for another if the data distribution is skewed.

Useful measurements:

wall-clock runtime
rows returned
rows examined in EXPLAIN ANALYZE
slow query log or Performance Schema data for production-like workloads
whether the query creates temporary tables or filesorts
whether it runs alone or under concurrent write load
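Several of these measurements can be read from the Performance Schema statement digests. The query below is a sketch, assuming the Performance Schema is enabled and that the application schema is called shop (a hypothetical name):

```sql
-- Top statements by total time, with rows examined vs. rows sent.
SELECT
  DIGEST_TEXT,
  COUNT_STAR                      AS executions,
  ROUND(SUM_TIMER_WAIT / 1e12, 2) AS total_seconds,   -- timers are in picoseconds
  SUM_ROWS_EXAMINED               AS rows_examined,
  SUM_ROWS_SENT                   AS rows_sent,
  SUM_CREATED_TMP_DISK_TABLES     AS tmp_disk_tables,
  SUM_SORT_MERGE_PASSES           AS sort_merge_passes
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'shop'        -- hypothetical schema name
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```

A large gap between rows_examined and rows_sent is often the first hint that a statement reads far more than it returns.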

2. Reduce the rows early

Most query tuning starts here. Filter before joining when possible, avoid unnecessary columns, and make predicates easy for MySQL to use with an index.

3. Add the right index

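Indexes help when they let MySQL skip most of the table or avoid a large sort. A good index matches the query shape: equality filters first, then the range or ordering column. Key parts that come after a range column cannot narrow the range scan or supply the sort order, so when the range filter and the ORDER BY use different columns, decide which of the two matters more for that query.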

4. Re-check the plan

After each change, compare the old and new plan. A query rewrite that looks cleaner is not automatically faster, and an index that helps one query can hurt writes or be ignored by the optimizer.
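One low-risk way to test whether an index still earns its write cost is to hide it from the optimizer before dropping it. The sketch below uses MySQL 8 invisible indexes and the sys schema; the schema name shop is hypothetical, and idx_orders_date stands in for whichever index you are evaluating:

```sql
-- Indexes with no recorded reads since the server started (sys schema view).
SELECT object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = 'shop';     -- hypothetical schema name

-- Hide a candidate index from the optimizer without dropping it.
ALTER TABLE orders ALTER INDEX idx_orders_date INVISIBLE;

-- Re-run EXPLAIN on the affected queries, then either restore the index...
ALTER TABLE orders ALTER INDEX idx_orders_date VISIBLE;

-- ...or drop it once you are sure nothing depends on it.
-- DROP INDEX idx_orders_date ON orders;
```

An invisible index is still maintained on every write, so this only tests the read side; the write savings appear after the index is dropped.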

Example: latest orders for a customer

Imagine this dashboard query:

SELECT id, order_date, total_amount, status
FROM orders
WHERE customer_id = 42
ORDER BY order_date DESC
LIMIT 10;

Without a useful index, MySQL may scan the whole table to find the rows for customer_id = 42, sort them by order_date, and then keep only ten.

A better index matches the filter and the order:

CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date DESC, id);

Why this helps:

customer_id lets MySQL jump to one customer’s orders.
order_date DESC lets MySQL read the newest rows first.
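id gives a stable tie-breaker when multiple orders share the same timestamp, provided the query also sorts by it. For ORDER BY order_date DESC, id DESC, declare the key part as id DESC so the index order matches the sort.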

Now the query can usually stop after reading the first ten matching index entries instead of sorting every order for that customer.

If the query is critical and you want it to be covered by the index, include the selected columns in the index:

CREATE INDEX idx_orders_customer_date_covering
ON orders (customer_id, order_date DESC, id, total_amount, status);

MySQL does not have separate INCLUDE columns for covering indexes. Every column in a secondary index is part of the index structure, so covering indexes must be used deliberately. They can speed up reads, but they also take storage and slow down writes. Wide text columns are usually poor candidates for this pattern; cover the small columns that make a critical query cheaper, not every column the application might display.
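To check whether a covering index is actually covering, and what it costs in storage, something like the following can help. It is a sketch: the schema name shop is an assumption, and the size figure comes from InnoDB persistent statistics, so it is approximate.

```sql
-- With the covering index in place, you would expect Extra to include "Using index".
EXPLAIN
SELECT id, order_date, total_amount, status
FROM orders
WHERE customer_id = 42
ORDER BY order_date DESC
LIMIT 10;

-- Approximate size of each index on orders, in MB.
SELECT index_name,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 1) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = 'shop'      -- hypothetical schema name
  AND table_name = 'orders'
  AND stat_name = 'size';
```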

Write predicates MySQL can use

Avoid functions on indexed columns

This query looks reasonable, but it makes the indexed column harder to use:

-- Avoid
SELECT id, total_amount
FROM orders
WHERE DATE(order_date) = '2026-05-01';

Prefer a range predicate:

-- Prefer
SELECT id, total_amount
FROM orders
WHERE order_date >= '2026-05-01'
  AND order_date <  '2026-05-02';

This matters when order_date is a DATETIME or TIMESTAMP. The index is ordered by the original timestamp values, such as 2026-05-01 09:30:00, not by the result of DATE(order_date). When the query wraps the column in a function, MySQL may need to calculate that function for many rows before it can compare the date.

The range version keeps the indexed column unchanged. MySQL can jump to the first value on 2026-05-01, scan forward through that day, and stop before 2026-05-02. That usually means fewer rows examined, less CPU, and less pressure on the buffer pool.

The point is not that = is bad. Equality is excellent when it compares directly against the indexed column. If order_date were just a DATE column, = would be perfectly fine:

-- Good equality predicate
WHERE customer_id = 42

-- Good when order_date is a DATE column
WHERE order_date = '2026-05-01'

Use the range pattern when a timestamp column needs to match a whole day, month, or other time window.
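The same rewrite works for a month. In this sketch the upper bound is computed from a literal rather than from the column, so order_date itself stays untouched:

```sql
-- Avoid: YEAR() and MONTH() hide order_date from the index.
SELECT id, total_amount
FROM orders
WHERE YEAR(order_date) = 2026
  AND MONTH(order_date) = 5;

-- Prefer: a half-open range on the bare column.
SELECT id, total_amount
FROM orders
WHERE order_date >= '2026-05-01'
  AND order_date <  DATE '2026-05-01' + INTERVAL 1 MONTH;
```

The half-open form (>= start, < next start) avoids having to know the last day of the month or the precision of the timestamp.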

Avoid implicit type conversions

If customer_id is an integer, compare it to an integer:

-- Avoid
WHERE customer_id = '42'

-- Prefer
WHERE customer_id = 42

Implicit conversions can produce surprising plans: when the column type and the literal type differ, MySQL has to convert one side before comparing, and the index on the column may no longer be usable.
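The mismatch tends to hurt most in the opposite direction, when a string column is compared with a number. MySQL then compares the values as numbers and cannot use an index on the string column for the lookup. The sketch assumes a hypothetical products table whose sku column is an indexed VARCHAR:

```sql
-- Assumed schema: products.sku is VARCHAR with an index on it.

-- Avoid: numeric literal against a string column; each sku is converted before comparing.
SELECT id
FROM products
WHERE sku = 12345;

-- Prefer: the literal has the same type as the column.
SELECT id
FROM products
WHERE sku = '12345';
```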

Be careful with leading wildcards

Normal b-tree indexes can help with prefix searches, but not with arbitrary substring searches:

-- Can use an index on sku in many cases
WHERE sku LIKE 'ABC%'

-- Usually cannot use a normal b-tree index effectively
WHERE sku LIKE '%ABC%'

If substring search is a real feature, consider a dedicated search strategy instead of expecting a normal index to solve it.
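One such strategy inside MySQL is a FULLTEXT index with the ngram parser. This is only a sketch on an assumed products table: matching is based on character n-grams (two characters by default), so results and ranking differ from LIKE '%ABC%' and should be checked against real data.

```sql
-- Assumed schema: products.sku is a VARCHAR column.
CREATE FULLTEXT INDEX ft_products_sku
ON products (sku) WITH PARSER ngram;

-- Search for skus containing the character sequence ABC.
SELECT id, sku
FROM products
WHERE MATCH(sku) AGAINST('ABC' IN BOOLEAN MODE);
```

For heavier search features, an external search engine is often the more maintainable choice.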

Composite indexes: order matters

MySQL uses composite indexes from left to right. This is often called the leftmost prefix rule.

For this query:

SELECT id, order_date, total_amount
FROM orders
WHERE tenant_id = 'foo'
  AND customer_id = 42
  AND order_date >= '2026-01-01'
ORDER BY order_date DESC
LIMIT 20;

A useful index is:

CREATE INDEX idx_orders_tenant_customer_date
ON orders (tenant_id, customer_id, order_date DESC, id);

The equality columns come first: tenant_id, then customer_id. The range and ordering column order_date comes after them. This lets MySQL narrow the search before scanning the date range.

Avoid building indexes one column at a time and hoping MySQL combines them well:

CREATE INDEX idx_orders_tenant ON orders (tenant_id);
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_date ON orders (order_date);

MySQL can sometimes use index merge, but a composite index that matches the query is usually more predictable and faster.
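To make the leftmost prefix rule concrete, here is how a few query shapes relate to idx_orders_tenant_customer_date. The comments describe the expected behavior, not guaranteed plans; the key_len column in EXPLAIN shows how many key parts were actually used.

```sql
-- Index: (tenant_id, customer_id, order_date DESC, id)

-- Can seek on the first key part.
EXPLAIN SELECT id FROM orders
WHERE tenant_id = 'foo';

-- Can seek on the first two key parts.
EXPLAIN SELECT id FROM orders
WHERE tenant_id = 'foo' AND customer_id = 42;

-- Cannot seek on the leftmost prefix: tenant_id is missing.
-- MySQL may scan the whole index or pick another index instead.
EXPLAIN SELECT id FROM orders
WHERE customer_id = 42;

-- Seeks on tenant_id only: order_date cannot act as a seek bound
-- because customer_id is skipped.
EXPLAIN SELECT id FROM orders
WHERE tenant_id = 'foo' AND order_date >= '2026-01-01';
```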

Sorting and grouping: avoid large temporary work

In MySQL, expensive ORDER BY and GROUP BY queries often show Using temporary or Using filesort in EXPLAIN.

Consider a dashboard widget that shows the biggest customers this year:

SELECT customer_id, SUM(total_amount) AS revenue
FROM orders
WHERE order_date >= '2026-01-01'
  AND order_date <  '2027-01-01'
GROUP BY customer_id
ORDER BY revenue DESC
LIMIT 10;

MySQL has to aggregate all matching orders before it knows the top ten customers. An index can help with the date filter, but it cannot magically avoid calculating the totals.

The ORDER BY revenue DESC is especially important: revenue is computed after GROUP BY, so an index on (order_date, customer_id) can reduce the rows entering the aggregate, but it cannot return the final result already sorted by revenue.

Good options depend on the product requirement:

If the widget is shown often, maintain a summary table such as customer_revenue_daily.
If exact freshness is not required, precompute the result on a schedule.
If the date range is small, index the filter so MySQL reads fewer rows.

For the raw query, this index can reduce the input:

CREATE INDEX idx_orders_date_customer
ON orders (order_date, customer_id);

But if this query is run constantly over millions of rows, a summary table is often the real fix. Indexes reduce work; they do not remove the need to aggregate large datasets.
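A summary table for this widget could look like the sketch below. The column types, the index name, and the one-day refresh window are assumptions; how and when the refresh runs depends on your scheduler and freshness requirements.

```sql
-- Assumed types: customer_id BIGINT, amounts DECIMAL.
CREATE TABLE customer_revenue_daily (
  customer_id  BIGINT NOT NULL,
  revenue_date DATE NOT NULL,
  revenue      DECIMAL(14,2) NOT NULL,
  PRIMARY KEY (customer_id, revenue_date),
  INDEX idx_crd_date_customer (revenue_date, customer_id)
);

-- Refresh one day; re-running it overwrites that day's totals.
INSERT INTO customer_revenue_daily (customer_id, revenue_date, revenue)
SELECT * FROM (
  SELECT customer_id,
         DATE(order_date)  AS revenue_date,
         SUM(total_amount) AS revenue
  FROM orders
  WHERE order_date >= '2026-05-21'
    AND order_date <  '2026-05-22'
  GROUP BY customer_id, DATE(order_date)
) AS d
ON DUPLICATE KEY UPDATE revenue = d.revenue;

-- The widget now aggregates at most one row per customer per day.
SELECT customer_id, SUM(revenue) AS total_revenue
FROM customer_revenue_daily
WHERE revenue_date >= '2026-01-01'
  AND revenue_date <  '2027-01-01'
GROUP BY customer_id
ORDER BY total_revenue DESC
LIMIT 10;
```

The widget still aggregates and sorts, but over far fewer rows than the raw orders table.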

LIMIT helps only after MySQL can find rows cheaply

LIMIT 10 does not automatically make a query cheap. It helps when MySQL can use an index to find the first ten rows in the requested order.

Fast pattern:

SELECT id, order_date, total_amount
FROM orders
WHERE customer_id = 42
ORDER BY order_date DESC
LIMIT 10;

Matching index:

CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date DESC, id);

Slower pattern:

SELECT id, order_date, total_amount
FROM orders
WHERE status = 'paid'
ORDER BY total_amount DESC
LIMIT 10;

If status = 'paid' matches most rows and there is no useful order-preserving index, MySQL may still need to inspect and sort a large part of the table before returning ten rows.
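If this query matters enough, an order-preserving index is one possible fix. The index name below is hypothetical, and whether the extra write cost is justified depends on how often the query runs:

```sql
-- Equality column first, then the ordering column in the requested direction.
CREATE INDEX idx_orders_status_amount
ON orders (status, total_amount DESC);

-- With that index, MySQL can read the ten largest paid orders
-- from the front of the status = 'paid' range instead of sorting.
EXPLAIN
SELECT id, order_date, total_amount
FROM orders
WHERE status = 'paid'
ORDER BY total_amount DESC
LIMIT 10;
```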

For pagination, avoid deep offsets on large tables:

-- Expensive at high page numbers
SELECT id, order_date
FROM orders
WHERE customer_id = 42
ORDER BY order_date DESC, id DESC
LIMIT 50 OFFSET 50000;

Prefer keyset pagination:

SELECT id, order_date
FROM orders
WHERE customer_id = 42
  AND (order_date, id) < ('2026-05-01 12:00:00', 987654)
ORDER BY order_date DESC, id DESC
LIMIT 50;

With an index on (customer_id, order_date DESC, id DESC), MySQL can continue from the last seen row instead of counting through thousands of skipped rows.

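Because the query orders by both columns descending, the next page asks for rows that come before the last seen (order_date, id) pair. The same condition can be written without the tuple comparison, and the expanded form is often the safer choice: MySQL's optimizer may use the index less fully for row-constructor range comparisons, so compare both plans with EXPLAIN: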

WHERE customer_id = 42
  AND (
    order_date < '2026-05-01 12:00:00'
    OR (order_date = '2026-05-01 12:00:00' AND id < 987654)
  )
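Put together, a keyset pagination flow might look like the sketch below. The index name is hypothetical, and the user variables stand in for parameters that application code would normally bind.

```sql
CREATE INDEX idx_orders_customer_date_id_desc
ON orders (customer_id, order_date DESC, id DESC);

-- First page: no cursor yet.
SELECT id, order_date
FROM orders
WHERE customer_id = 42
ORDER BY order_date DESC, id DESC
LIMIT 50;

-- The application remembers the last row of the page it just showed...
SET @last_date = '2026-05-01 12:00:00';
SET @last_id   = 987654;

-- ...and asks for the rows that sort after it.
SELECT id, order_date
FROM orders
WHERE customer_id = 42
  AND (
    order_date < @last_date
    OR (order_date = @last_date AND id < @last_id)
  )
ORDER BY order_date DESC, id DESC
LIMIT 50;
```

The trade-off is that keyset pagination supports "next page" navigation but not jumping directly to page 1000.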

Existence checks, counts, and fetching rows are different jobs

Use EXISTS for boolean checks

If you only need to know whether a row exists, do not count all matches:

-- Avoid
SELECT COUNT(*)
FROM orders
WHERE customer_id = 42
  AND status = 'open';

-- Prefer
SELECT EXISTS (
  SELECT 1
  FROM orders
  WHERE customer_id = 42
    AND status = 'open'
);

EXISTS can stop as soon as MySQL finds the first matching row.
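The early exit only pays off if the first match is cheap to find. A sketch of a supporting index (the name is hypothetical), plus an equivalent form for clients that prefer a row-or-no-row result:

```sql
CREATE INDEX idx_orders_customer_status
ON orders (customer_id, status);

-- Returns one row if a match exists, and no rows otherwise.
SELECT 1
FROM orders
WHERE customer_id = 42
  AND status = 'open'
LIMIT 1;
```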

Use COUNT(*) when you really need the count

Counting is real work. Keep the predicate index-friendly:

SELECT COUNT(*)
FROM orders
WHERE order_date >= '2026-01-01'
  AND order_date <  '2027-01-01';

An index on order_date can help MySQL count the matching range, but it still has to examine the matching index entries.

Fetch one row without sorting unless order matters

If any matching row is fine:

SELECT id
FROM orders
WHERE customer_id = 42
LIMIT 1;

If the newest row matters, make the order explicit and index for it:

SELECT id, order_date
FROM orders
WHERE customer_id = 42
ORDER BY order_date DESC, id DESC
LIMIT 1;

Joins: reduce before you multiply

Joins become expensive when large row sets meet each other. The most common mistake is joining detail tables first and filtering or aggregating later.

Suppose you need revenue by product category for one month:

SELECT p.category_id, SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.order_date >= '2026-05-01'
  AND o.order_date <  '2026-06-01'
  AND o.status = 'paid'
GROUP BY p.category_id;

Helpful indexes:

CREATE INDEX idx_orders_status_date_id
ON orders (status, order_date, id);

CREATE INDEX idx_order_items_order_product
ON order_items (order_id, product_id);

The first index helps MySQL find the paid orders in the month. The second helps it join from those orders into their items.

If order_items is huge and this report runs often, pre-aggregating monthly revenue can be better than repeatedly joining and grouping raw order lines.

Also watch for accidental many-to-many joins. If a join unexpectedly multiplies rows, a later GROUP BY may hide the correctness issue while still forcing MySQL to process a much larger intermediate result.
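A quick way to spot that kind of fan-out is to compare the joined row count with the number of distinct detail rows. The sketch assumes order_items has a primary key column named id, which the examples above do not show:

```sql
SELECT COUNT(*)              AS joined_rows,
       COUNT(DISTINCT oi.id) AS distinct_order_items  -- assumes order_items.id exists
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.order_date >= '2026-05-01'
  AND o.order_date <  '2026-06-01'
  AND o.status = 'paid';
```

If joined_rows is larger than distinct_order_items, some join is multiplying rows and any SUM over the result is inflated.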

OR vs joins for index lookups

OR is not automatically bad. It is fine when the conditions are simple and MySQL can still use a good access path. But OR becomes risky when each branch wants a different index, or when the condition appears inside a join predicate.

Example:

SELECT id, order_date, total_amount
FROM orders
WHERE tenant_id = 'foo'
  AND (external_order_id = 'A-10001' OR marketplace_order_id = 'B-90001');

If you have separate indexes for each lookup shape, MySQL may not pick the best path for both sides of the OR:

CREATE INDEX idx_orders_tenant_external
ON orders (tenant_id, external_order_id);

CREATE INDEX idx_orders_tenant_marketplace
ON orders (tenant_id, marketplace_order_id);

One clear rewrite is to split the lookup into two index-friendly queries:

SELECT id, order_date, total_amount
FROM orders
WHERE tenant_id = 'foo'
  AND external_order_id = 'A-10001'

UNION ALL

SELECT id, order_date, total_amount
FROM orders
WHERE tenant_id = 'foo'
  AND marketplace_order_id = 'B-90001';

Each branch can use the index that matches its predicate. Use UNION ALL when duplicates are impossible or acceptable. Use UNION, or deduplicate by primary key in the application, when the same order can match both lookup values.
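A third option keeps UNION ALL and excludes from the second branch any row the first branch already returned. This sketch uses MySQL's NULL-safe comparison <=> so that rows with a NULL external_order_id are not dropped by accident:

```sql
SELECT id, order_date, total_amount
FROM orders
WHERE tenant_id = 'foo'
  AND external_order_id = 'A-10001'

UNION ALL

SELECT id, order_date, total_amount
FROM orders
WHERE tenant_id = 'foo'
  AND marketplace_order_id = 'B-90001'
  -- Skip rows the first branch already matched.
  AND NOT (external_order_id <=> 'A-10001');
```

The extra condition is applied after the index lookup on marketplace_order_id, so each branch keeps its own access path.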

The same issue can show up in joins. This pattern is hard to optimize because the join can match through two different columns:

-- Avoid when possible
SELECT o.id, m.mapping_id
FROM orders o
JOIN marketplace_mappings m
  ON m.tenant_id = o.tenant_id
 AND (m.external_order_id = o.external_order_id
      OR m.marketplace_order_id = o.marketplace_order_id)
WHERE o.tenant_id = 'foo';

When the data model has two alternative join keys, split the join into one lookup type per branch:

SELECT o.id, m.mapping_id
FROM orders o
JOIN marketplace_mappings m
  ON m.tenant_id = o.tenant_id
 AND m.external_order_id = o.external_order_id
WHERE o.tenant_id = 'foo'

UNION ALL

SELECT o.id, m.mapping_id
FROM orders o
JOIN marketplace_mappings m
  ON m.tenant_id = o.tenant_id
 AND m.marketplace_order_id = o.marketplace_order_id
WHERE o.tenant_id = 'foo';

That gives MySQL a simpler choice in each branch: use the index for the external-order lookup, then use the index for the marketplace-order lookup.

For larger or repeated lookup workloads, the cleaner design is often a normalized lookup table with one indexed key shape:

CREATE TABLE order_lookup_keys (
  tenant_id VARCHAR(16) NOT NULL,
  lookup_type VARCHAR(32) NOT NULL,
  lookup_value VARCHAR(128) NOT NULL,
  order_id BIGINT NOT NULL,
  PRIMARY KEY (tenant_id, lookup_type, lookup_value),
  INDEX idx_order_lookup_order (order_id)
);

Then the lookup becomes a straightforward indexed join:

SELECT o.id, o.order_date, o.total_amount
FROM order_lookup_keys lk
JOIN orders o ON o.id = lk.order_id
WHERE lk.tenant_id = 'foo'
  AND lk.lookup_type = 'external'
  AND lk.lookup_value = 'A-10001';

This matters because MySQL can use one predictable composite index lookup instead of reasoning about multiple optional paths in the same predicate. As always, confirm with EXPLAIN: sometimes MySQL handles the OR well, and sometimes the rewrite removes a large scan.
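Populating the lookup table from existing data, and then searching by either key, could look like this sketch. It assumes the column names used in the earlier examples and that NULL means "no key of that type":

```sql
-- One row per (order, key type) that actually has a value.
INSERT INTO order_lookup_keys (tenant_id, lookup_type, lookup_value, order_id)
SELECT tenant_id, 'external', external_order_id, id
FROM orders
WHERE external_order_id IS NOT NULL
UNION ALL
SELECT tenant_id, 'marketplace', marketplace_order_id, id
FROM orders
WHERE marketplace_order_id IS NOT NULL;

-- Look up by either key through the same primary key.
-- DISTINCT guards against one order matching both values.
SELECT DISTINCT o.id, o.order_date, o.total_amount
FROM order_lookup_keys lk
JOIN orders o ON o.id = lk.order_id
WHERE lk.tenant_id = 'foo'
  AND (lk.lookup_type, lk.lookup_value) IN (
    ('external', 'A-10001'),
    ('marketplace', 'B-90001')
  );
```

New and updated orders need the same rows written by the application or a trigger; that maintenance cost is the price of the simpler read path.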

UNION vs UNION ALL

UNION removes duplicates. That means MySQL may need extra temporary work. If duplicates are acceptable or impossible, use UNION ALL.

-- Avoid if deduplication is not needed
SELECT customer_id FROM orders_2025
UNION
SELECT customer_id FROM orders_2026;

-- Prefer when duplicates are acceptable or impossible
SELECT customer_id FROM orders_2025
UNION ALL
SELECT customer_id FROM orders_2026;

Keep statistics fresh

The optimizer chooses a plan based on statistics. If those statistics are stale or too coarse for skewed data, MySQL may choose a bad join order or the wrong index.

This can happen after ordinary product growth. Imagine an orders table that has only one tenant for several weeks. During that time, tenant_id = 'foo' matches almost every row, so MySQL may learn that tenant_id is not selective and prefer an index that starts with order_date.

CREATE INDEX idx_orders_date ON orders (order_date);
CREATE INDEX idx_orders_tenant_date ON orders (tenant_id, order_date);

Then a second tenant is added and starts importing a large amount of data. A tenant-filtered dashboard query now needs one tenant’s rows for a date range:

SELECT COUNT(*)
FROM orders
WHERE tenant_id = 'bar'
  AND order_date >= '2026-05-01'
  AND order_date <  '2026-06-01';

If MySQL still has statistics from the one-tenant phase, it may underestimate how useful tenant_id has become and choose idx_orders_date. That can force it to scan all tenants in the date range and filter afterward. After statistics are refreshed, MySQL has a better chance of choosing idx_orders_tenant_date, jumping directly to the tenant and date range it needs.
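To see how old the statistics are and what the optimizer currently believes, you can inspect InnoDB's persistent statistics. The sketch assumes persistent statistics are enabled (the default) and a hypothetical schema name shop:

```sql
-- When were table statistics last recalculated, and what row count do they hold?
SELECT table_name, n_rows, last_update
FROM mysql.innodb_table_stats
WHERE database_name = 'shop'      -- hypothetical schema name
  AND table_name = 'orders';

-- The Cardinality column shows the estimated number of distinct values per key part.
SHOW INDEX FROM orders;
```

A tenant_id cardinality of 1 on a table that now holds several tenants is a sign that the estimates predate the data.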

Refresh table statistics with:

ANALYZE TABLE orders;

ANALYZE TABLE does not force MySQL to choose a specific index. It gives the optimizer better estimates so its cost calculation is less likely to be based on old data.

For skewed columns, MySQL 8 histograms can sometimes help the optimizer estimate selectivity better:

ANALYZE TABLE orders
UPDATE HISTOGRAM ON status, tenant_id;

Use histograms selectively. They can help for columns that are filtered often but are not good index candidates, especially when values are unevenly distributed.
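In MySQL 8.0 a histogram is not refreshed automatically, so it helps to know which ones exist and when they were built. A sketch, again with the hypothetical schema name shop:

```sql
-- Build a histogram with an explicit bucket count.
ANALYZE TABLE orders
UPDATE HISTOGRAM ON status WITH 16 BUCKETS;

-- List existing histograms and their age.
SELECT COLUMN_NAME,
       HISTOGRAM->>'$."histogram-type"' AS histogram_type,
       HISTOGRAM->>'$."last-updated"'   AS last_updated
FROM information_schema.COLUMN_STATISTICS
WHERE SCHEMA_NAME = 'shop'        -- hypothetical schema name
  AND TABLE_NAME = 'orders';

-- Remove a histogram that no longer reflects the data.
ANALYZE TABLE orders
DROP HISTOGRAM ON status;
```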

A MySQL tuning checklist

[ ] Did I inspect the query with EXPLAIN or EXPLAIN ANALYZE?
[ ] Is MySQL scanning too many rows (type = ALL, high rows, low filtered)?
[ ] Can I return fewer columns?
[ ] Can I filter before joining or aggregating?
[ ] Are predicates sargable (usable for an index search), with no functions on indexed columns?
[ ] Do literal types match column types?
[ ] Does a composite index match the equality filters, range filters, and ordering?
[ ] Is Using temporary or Using filesort caused by a large input?
[ ] Is LIMIT backed by an index that returns rows in the requested order?
[ ] Am I counting rows when I only need to know whether one exists?
[ ] Did I re-test with realistic parameters and data volume?

Closing thoughts

Good MySQL optimization is mostly about reducing work. Start with the execution plan, make the query easy to reason about, then add indexes that match real access patterns. If a query still needs to aggregate or sort millions of rows for every request, the better solution may be a summary table, a cached result, or a different product interaction.

Fast queries are not the ones with the most indexes. They are the ones where MySQL has a short, predictable path to the rows the application actually needs.
