When I was a junior programmer I was sitting next to an experienced senior programmer that not only loved our PostgreSQL database but also knew SQL very well and a lot about databases in general.
Often, when I’d loudly ask the room why such is such or why something works the way it did, he would perk up, his spine seemingly realising that it’s lesson time, slide over with his chair, and start a first-principles explanation.
One day I was wondering what an index-only scan is.
Database indexes are like phone books, he said. You know how in a phone book you have the first name, last name, name of the street, number of the house, and phone number for each person living in a city?
Say you want to find a person and ask them how old they are, but you don’t know where they live. You could, of course, walk down every street in the city, knock at every door, and ask everyone who opens: are you who I’m looking for? That would be a sequential scan in a database — no index used. You look at every row in the table until you find the one you’re after.
With a phone book you don’t have to walk down every street. The phone book is an index of the columns first name, last name, street, house number, phone number. If you want to find someone named Enzo McNacho you look them up in the phone book by opening the pages that have the names starting with M, find the exact page that has Enzo, get their address, and walk to their house. Index used — no sequential street walking and door knocking needed. Find where the row is and open it directly.
So what’s an index-only scan then, I asked.
Say you want to find out on which street someone lives or what their phone number is. Both pieces of data are contained in the phone book itself — in the index. You don’t have to walk to their house, knock, and ask them. You find their entry in the phone book by looking up their name and there it is — street and phone number.
That’s what an index-only scan means in database land too: it gets you the data you want by only looking at the index, without having to actually open the main database files on disk. It’s a SELECT name FROM people WHERE name = ‘Enzo McNacho'
with an index on name
.
I’ve thought of that analogy many times in the past ten years because it’s so great, but only now did I realise that it also helps explain why sometimes a sequential scan is better: if you want to query a whole lot of rows, or all the rows of a tiny table, it makes sense to find the first entry via the index and then just scan the rest of the rows that are next to it. That’s faster than finding an entries and opening each row you come across.
Just like it’s faster to walk down every street in sequence if you want to create a survey and actually do want to knock on every door. Or when you live in a tiny town with a single street and you want to ask everybody what their favorite meal is.
In those cases, you don’t want to carry the phone book with you and instead just go knocking.
(Cover photo is also by Brewer Bob, CC BY-SA 4.0)
David j malan has explained this concept very well in CS50