Understanding query performance
A course on why your database queries are slow and how to fix them.
Become the person others come to when they need the database to work.
Why is my database so slow?
Your queries work, but they're slow. Five seconds to insert a row seems ridiculous. As the table has grown, it's slowed until its unusable. And why are queries to that other table slow when it's small and the machine is nearly idle? You know it shouldn't be this way.
But what do you change? Do you add more filters to your WHERE clause? And if you remove that ORDER BY it's suddenly fast. And how is aggregation slow in this query? Is this big enough to need indexes? But which columns do you index? Do you need to switch to a different database? What change will help seems like guesswork.
Your senior engineer at work told you something. The DBA told you something else. At this point you're hoping that something will work.
Imagine knowing exactly what to do
Imagine looking at a query and saying, "Oh, I know why this is probably slow." Imagine making a change and knowing what it's going to do to performance. Imagine knowing beforehand how to keep your queries fast as the tables get big. Imagine a junior engineer coming to you asking what columns to index and being able to explain exactly which ones to index and why.
Learn from a database veteran
Understanding Query Performance gives you the mental models used by the people who run extreme databases, distilled from many systems over twenty years, including:
- Distributed map-reduce for particle physics.
- Facebook's core ads database.
- Capturing gigabit network traffic for interactive analysis.
- High throughput sequencing analysis in biology.
- Microsoft's CosmosDB and managed Cassandra.
It doesn't matter if it's NoSQL or SQL, distributed or a single machine, ultra high write rate or giant queries over static data. The same mental models apply, the same way that big-O notation applies whether you're writing Python or Java or Rust.
The course is organized into four parts:
- How to make a query fast.
- Learn the mental models to reason about why adding that ORDER BY clause tanks your query performance, how to figure out the fast path for your JOINs, and what indexes are going to help and what are going to make things worse.
- How to make lots of queries fast.
- Sometimes your database machine's CPU is nearly idle, its memory is free, and its disk barely being touched, yet your queries are slow. Learn how untangle this extreme case, and many other less extreme ones.
- How to monitor performance.
- Once you've untangled the problems with the queries and indexes themselves, you're limited by your hardware. Learn what the primary metrics to monitor are, how they relate to query performance, and how to plan adjusting your hardware to get even better performance.
- How to make distributed queries fast.
- When you make the jump to distributed databases, a whole new source of friction arises. Learn how to reason about querying distributed databases and how to keep your queries running fast on them.