The Statistics Behind Every SQL Query
DBMS and the Query Optimizer
Hi, I’m Aki Matsumura from the operational efficiency project team at Hirose Paper. As I’ve been revisiting the fundamentals of database systems, I’ve found myself genuinely impressed by the sophistication built into modern DBMS engines.
When developers write SQL, most of us don’t think too hard about what happens next. We type a query, hit run, and results appear. But between those two moments, the database is doing something remarkable: figuring out the most efficient way to find what you asked for.
We might add an index on a column here and there. But whether that lookup is handled in linear time or logarithmic time? Most developers never think about that — and in most cases, they don’t have to. The DBMS handles it. That’s both its genius and the reason so many developers never learn what’s actually happening under the hood.
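The gap between those two is easy to see in miniature. Here's a toy sketch (not how a real DBMS implements its B-trees, just the complexity contrast): an unindexed lookup scans every row, while an index lets the engine binary-search sorted keys.

```python
# Toy illustration of lookup cost with and without an index.
# "rows" stands in for a sorted, indexed column of 500,000 values.
rows = list(range(0, 1_000_000, 2))

def linear_lookup(rows, key):
    """O(n): roughly what a full table scan does. Returns comparisons made."""
    steps = 0
    for value in rows:
        steps += 1
        if value == key:
            return steps
    return steps

def indexed_lookup(rows, key):
    """O(log n): roughly what a B-tree index lookup does. Returns comparisons made."""
    steps, lo, hi = 0, 0, len(rows)
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if rows[mid] < key:
            lo = mid + 1
        elif rows[mid] > key:
            hi = mid
        else:
            return steps
    return steps

print(linear_lookup(rows, 999_998))   # 500,000 comparisons
print(indexed_lookup(rows, 999_998))  # around 20 comparisons
```

Half a million comparisons versus about twenty: that's the difference the DBMS quietly makes on our behalf.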
Database Optimization and the Developer
In my previous role at a web development firm, the heaviest data work we did was reading from and writing to a database. We didn’t write algorithms to process large data sets directly in application code — the database absorbed that load. As a result, I never had much reason to think carefully about computational complexity.
That’s a reasonable situation to be in. But it has a downside: when you never have to optimize, you never really understand what’s being optimized for you. And as systems grow, small inefficiencies compound. A query that’s fine at a thousand rows starts dragging at a million.
That’s why I think it’s worth learning the underlying math — the basic statistics and algorithmic thinking that make database performance possible. Competitive programming is a good entry point for this. (I’ll be honest: I’m not good at it. AtCoder is mercilessly difficult.)
The Query Optimizer
Here’s the part that genuinely fascinated me: the query optimizer — the component of a DBMS that decides how to execute a query — makes its decisions using statistical data.
Since 2022, Japan’s high school math curriculum has included statistics, introducing concepts like population and sample. The idea: rather than measuring an entire population, you take a statistically meaningful sample and use it to draw inferences about the whole. If you want to know the average height of Japanese adults, you don’t measure every person in the country — you measure a representative sample.
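The height survey translates directly into code. This is a minimal sketch of that inference, with a made-up population of a million heights we pretend we can't measure in full:

```python
import random

random.seed(0)

# Hypothetical population: one million heights (cm), mean ~170, sd ~7.
population = [random.gauss(170, 7) for _ in range(1_000_000)]

# Measure only 0.1% of it...
sample = random.sample(population, 1_000)
estimate = sum(sample) / len(sample)

# ...yet the estimate lands very close to the true mean.
true_mean = sum(population) / len(population)
print(round(estimate, 2), round(true_mean, 2))
```

A thousand measurements stand in for a million, and the estimate is typically off by only a fraction of a centimetre.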
The query optimizer works the same way. When you run a query, it doesn’t scan every row of every table to figure out the best execution plan — that would defeat the purpose. Instead, it maintains pre-computed statistical summaries of the data and uses those to make an informed judgment: which index to use, which table to scan first, how to structure the join.
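To make that concrete, here's a deliberately simplified cost model. The histogram, row count, and per-row costs are all made-up numbers, not any real DBMS's formulas, but the shape of the decision is the same: estimate how many rows a predicate will match from pre-computed statistics, then pick the cheaper access path.

```python
from bisect import bisect_left, bisect_right

# Equi-depth histogram over a column: each adjacent pair of boundaries
# bounds roughly 10% of the rows. (Hypothetical statistics.)
histogram = [0, 3, 7, 12, 20, 35, 60, 110, 250, 800, 10_000]
total_rows = 1_000_000

def estimate_rows(lo, hi):
    """Estimate rows with lo <= value < hi from the histogram alone."""
    buckets = len(histogram) - 1
    start = max(bisect_right(histogram, lo) - 1, 0)
    end = bisect_left(histogram, hi)
    covered = max(end - start, 0)
    return total_rows * covered / buckets

def choose_plan(lo, hi):
    """Pick the cheaper plan using only the estimate, never the real data."""
    est = estimate_rows(lo, hi)
    seq_cost = total_rows * 1.0   # read every row once, sequentially
    index_cost = est * 4.0        # per-row index overhead (invented constant)
    return ("index scan" if index_cost < seq_cost else "seq scan"), est

print(choose_plan(0, 5))      # narrow range -> index scan
print(choose_plan(0, 9000))   # nearly the whole table -> seq scan
```

Note that the real table is never touched during planning: the decision comes entirely from the summary, which is exactly the point.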
It’s making a rational decision from a sample, just like a statistician would. And most of the time, it gets it right.
What struck me is how human this feels. The optimizer isn’t brute-forcing a solution — it’s reasoning under uncertainty, using incomplete information to make a smart judgment call. That kind of thinking, hidden inside a piece of software we use every day, is quietly remarkable.
DBMSs are genuinely clever.