Watching A Pathological System
The first time I came across the term pathological system I had to look up what it means. The definition I found was similar to this:
pathological system: exhibits extreme, abnormal, or self-destructive behaviors
The second time I came across it was in Bryan Cantrill’s talk on pathologically performing systems, where he defined it as “systems that don’t work.”
It stuck with me. (The idea, not the word of course. That I had to look up again just now.) There’s something really fascinating about systems that don’t perform as they should. I mean: behavior that no one designed, no one would expect, and possibly no one understands — what’s more fascinating than that?
In my life as a software developer I’ve had quite a few chances to observe pathological systems. Most of them came early on, when I worked on SaaS applications, the pet/cattle distinction wasn’t that common yet, and your production system was a nice pet that you checked in on every day, making sure it didn’t get sick.
But this week brought another chance, and I spent it with a colleague mainly doing one thing: watching a pathological system and trying to fix it, patch by patch.
Watching a pathological system is a contemplative act: you stare at graphs, you poke at the database and run some queries, you look at this dashboard and that dashboard and try to correlate behavior, you read logs to try to fill in gaps of knowledge. You create hypotheses and theories, some of them very short-lived, others maybe leading to a solution.
The system – running in data centers far away from you – starts to feel much more real and, dare I say it in times of AI, alive when you spend two days looking at three graphs that go up and down depending on which button you press.
You say things like these out loud:
“Oh, now why did this go up? It shouldn’t, right?”
“90% of these graphs are useless.”
“Okay, we need X.”
“Yup, no, that’s what the code does. Looks like I even reviewed and approved it.”
“Ahhh, you know wha- never mind.”
“Has anyone ever seen this?”
“Why would a restart even fix this?”
“This does not make sense.”
“Fuck it, let’s try it.”
Every line here is a lesson. Watching a pathological system, you find bugs and incorrect assumptions in code. You learn how the system actually behaves, and how wrong you were about what a correctly behaving system would look like.
As Bryan Cantrill puts it in his slides for another talk:
But odd behavior is worth understanding: at worst, it enhances our own understanding (that is, that the behavior is in fact expected) but odd behavior can be an indicator of something much more deeply amiss – and in fact represents an otherwise innocuous presentation of an important defect!
So, next time you come across something odd: treat it as a chance to learn something.