Joy & Curiosity #66
Interesting & joyful things from the previous week
The number one thing that I keep thinking about these days is how to reconcile the following in my head:
Friends of mine — very experienced and very good programmers — are saying loudly and publicly and in private too that agents are useless. It’s slop, they say. It’s all just averages. No brilliance, no creativity, and half of it doesn’t work. It can’t do what I do. They point out where it failed to make an edit. Where it gave a function a weird name. Where it didn’t run the tests.
Then there’s my experience with using them. I see Amp knock out stuff that would’ve taken me days if I had done it myself, and, as a junior engineer, possibly weeks. I saw Amp build a tiny and brilliant renderer for box-drawing characters in my terminal emulator. It does performance optimizations, it builds very good-looking animations in our TUI framework. It built this and this and this with no hands on the wheel. I saw it build a one-off migration tool for our production database, carefully balancing tradeoffs to build something that’s reliable and inspectable, but also one-off, with as little code as I would’ve written. It helped me deploy and run it and then refine it. When it can’t find jq on a machine, it uses Python for one-liners, doesn’t matter. Yesterday I had it help me configure Home Assistant on my Raspberry Pi and I just sat there and talked to it like a true assistant.
It’s very good. Very, very, very good. So good that I’m starting to think the whole “we will all just delegate work to agents” might not be as unrealistic as it sounds.
But then someone writes somewhere that these agents “can barely write code” and I can’t help but silently wonder: what the hell are you talking about, man?
My teammate Lewis wrote about how he does context management in Amp and how 200k tokens is plenty if you use handoff, references, and forks. And then he also shipped the thread map, which, when he first showed it off in Tallinn a week ago, caused a whole scene in the room, with people saying “whoa” and “holy shit” out loud. Not kidding. People literally gathered around Lewis saying ooh and aah while he showed off what he had built. That, in turn, made me get up and walk over and also say “holy shit, dude”.
Ryan and I talked to Mckay Wrigley about Opus 4.5 and how we see the future of software development. It’s all happening, isn’t it.
Obie Fernandez (the Obie Fernandez in case you also used Rails between 2010 and 2015) on “what happens when the coding becomes the least interesting part of the work”. This is some of the realest, truest, most experienced writing on the topic of agents taking over. I agree with nearly everything he writes here.
Related, Paul Buchheit: “It was always possible to clone software, but doing so was costly and time consuming, and the clone would need to be much cheaper, making any such venture financially non-viable. With AI, that equation is now changing.” Read the other two paragraphs.
Great: The Resonant Computing Manifesto. Before I started reading, just by name and context alone, I had assumed it was going to be another “we have to fight the slop!” chant, but it’s not. It actually sees a chance in AI: “This is where AI provides a missing puzzle piece. Software can now respond fluidly to the context and particularity of each human—at scale. One-size-fits-all is no longer a technological or economic necessity. Where once our digital environments inevitably shaped us against our will, we can now build technology that adaptively shapes itself in service of our individual and collective aspirations. We can build resonant environments that bring out the best in every human who inhabits them.” But it’s a chance we have to take: “Regardless of which path we choose, the future of computing will be hyper-personalized. The question is whether that personalization will be in service of keeping us passively glued to screens—wading around in the shallows, stripped of agency—or whether it will enable us to direct more attention to what matters.”
My productivity app is a never-ending .txt file. Mine is Apple Notes.
How Not to Waste Your Life: “it is only, in poet Mario Benedetti’s shimmering words, when we cease sparing ourselves and start spending ourselves that we come truly alive.” When we cease sparing ourselves and start spending ourselves. And then: “Because the mind is the crucible of experience and perception, there is no greater waste of life than the waste of mind.” And then, on Hawthorne: “What fortifies the spirit to do its work in the world, be it art or activism, often appears on the surface as wasted time — the hours spent walking in a forest and watching the clouds over the city skyline and pebble-hunting on the beach, the purposeless play of the mind daydreaming and body dancing, all the while ideas and fortitudes fermenting within.” How very hard it is to live your life like that, and how very easy it sounds, doesn’t it.
How to be exceptional at anything. Alternative name: how to be someone people want to work with.
GPT-5.2 came out. A new frontier model is very interesting, of course, but this part here, in the “economically valuable tasks” section of the post, really stuck with me: “GPT‑5.2 Thinking beats or ties top industry professionals on 70.9% of comparisons on GDPval knowledge work tasks, according to expert human judges. These tasks include making presentations, spreadsheets, and other artifacts. GPT‑5.2 Thinking produced outputs for GDPval tasks at >11x the speed and <1% the cost of expert professionals, suggesting that when paired with human oversight, GPT‑5.2 can help with professional work.” It’s been years since I’ve had to create a spreadsheet or a presentation for work, so it’s a bit of a blind spot for me, but all I could think was: yeah, they’re really going after office work now, just like Anthropic.
Hacker News user keepamovin used Gemini 3 to hallucinate the Hacker News frontpage ten years from now: Hacker News frontpage 2035. It’s very good! My favorite: “Show HN: A text editor that doesn’t use AI”
That Hacker News experiment caused Andrej Karpathy to look backwards in time. He built the Hacker News time capsule in which GPT-5.1 grades comments from ten years ago on their prescience.
Brian Cantrill on using LLMs at Oxide. There’s a lot of good stuff in there, but this, I think, is my favorite distillation: “Finally, LLM-generated prose undermines a social contract of sorts: absent LLMs, it is presumed that of the reader and the writer, it is the writer that has undertaken the greater intellectual exertion. (That is, it is more work to write than to read!) For the reader, this is important: should they struggle with an idea, they can reasonably assume that the writer themselves understands it — and it is the least a reader can do to labor to make sense of it. If, however, prose is LLM-generated, this social contract becomes ripped up: a reader cannot assume that the writer understands their ideas because they might not so much have read the product of the LLM that they tasked to write it.” Makes me think of all the times I’ve asked other engineers to write an RFC about something they propose we should do. Often the goal wasn’t the RFC, but that someone sits down and thinks.
Wish I could remember how I ended up reading this: “Sometimes, the most romantic thing a person can do is hand you a thought they’ve been carrying for years. They do so gently, as though it might break in your hands. It could be a memory wrapped in a metaphor or a belief they’ve never said aloud until now. These moments are quiet offerings as invitations to step into their interior world.”
Daniel Miessler in The Bubble Is Labor: “Companies only hire people because they can’t do all the work themselves. […] In other words, the only reason the current labor market (and our economy that’s based on it) exist at all is because there’s a group of founders/owners who need lots of help producing their goods and services. They are not required by anyone to hire me or you to help them if they don’t need that help. And the exact moment they can do the work themselves, they will, and not a second after.” I really don’t mean this in a smug why-isn’t-everyone-as-smart-as-me way and maybe it’s because I’ve only ever worked in start-ups or small companies or maybe it’s because my parents and grandparents have always been self-employed, but: yeah, of course, and the sooner you figure this out, the better your career will go.
“What happens when Cormac McCarthy rewrites your economics paper?”
One of my many computing blind spots comes from the fact that I don’t use a lot of Microsoft products. I have no clue how probably 90% of all office workers in the world use their computers. So this article here, on Microsoft’s AI offerings failing, was very interesting: “Dare I say it, Gemini is actually helpful, and can usually execute tasks you might actually need in a day to day job. ‘Find me a meeting slot on this date to accommodate these timezones’ — Gemini will actually do it. Copilot 365 doesn’t even have the capability to schedule a calendar event with natural language in the Outlook mobile app, or even provide something as basic as clickable links in some cases. At least Xbox’s Gaming Copilot has a beta tag to explain why it fails half of the time. It’s truly absurd how half-baked a lot of these features are, and it’s odd that Microsoft sought to ship them in this state. And Microsoft wants to make Windows 12 AI first? Please.”
I gave a five minute speech at the pub last night about this: “I’ll say it every month. But competent use of Excel or Google Docs could have wiped out 30% of white collar jobs, but that’s not how it works. Heck, 40% of roles could be eliminated if people just knew how to run a meeting better and could prioritize.” The big question isn’t whether these models are amazing (they are), but whether it’ll matter or not.
Admittedly, I didn’t get all of it, but this was very interesting: seeing like an agent. Good punchline too: “internal markets improve on hierarchy when coordination costs fall - False (GTM dominates Engineering even with full information)” And the article led me to read up on The Nature of the Firm, which I thought was very handy.
Adrien Grand and Morgan Gallant on the turbopuffer blog: Vectorized MAXSCORE over WAND, especially for long LLM-generated queries. The conclusion is very interesting: “‘Serial and dumb’ can often beat ‘smart and random’ on modern CPUs. Furthermore, agents write longer queries than humans, so it’s becoming increasingly important for text search to scale well with the number of terms. These factors force us to periodically revisit our choices of algorithms and their implementations. For text search, this means that the cursor has shifted more and more from WAND to MAXSCORE, which scales better with the number of terms and can be tuned to be more CPU-friendly.” (I pair-read this with GPT-5.2 to try it out and asked it about a bunch of the scoring-related algorithms, recommended.)
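For intuition, here’s a toy sketch of the MAXSCORE idea the post discusses: sort the query terms by their maximum possible score contribution, and once the top-k threshold rises high enough, the low-impact “non-essential” terms stop generating candidate documents and only contribute to the scores of candidates surfaced by the remaining “essential” terms. Everything below is made up for illustration (the postings, the impact scores, the function name) — a real engine like turbopuffer works over compressed posting lists with BM25-style impacts, not Python dicts:

```python
import heapq

# Toy postings: term -> sorted list of (doc_id, impact score).
# Scores are invented; a real engine derives them from BM25 etc.
POSTINGS = {
    "fast":   [(1, 0.25), (3, 0.25), (4, 0.5)],
    "vector": [(2, 1.5), (3, 1.25)],
    "search": [(1, 0.5), (2, 0.5), (4, 0.75)],
}

def max_score_topk(query_terms, k=2):
    """Document-at-a-time MAXSCORE over the toy postings above."""
    terms = sorted(
        (t for t in query_terms if t in POSTINGS),
        key=lambda t: max(s for _, s in POSTINGS[t]),  # ascending upper bound
    )
    # prefix[i] = summed upper bounds of the i lowest-impact terms: a doc
    # appearing *only* in those lists can't score higher than prefix[i].
    prefix = [0.0]
    for t in terms:
        prefix.append(prefix[-1] + max(s for _, s in POSTINGS[t]))

    topk = []        # min-heap of (score, doc_id)
    threshold = 0.0  # score of the current k-th best document
    scored = set()

    while True:
        # Terms whose cumulative upper bound can't beat the threshold
        # are "non-essential": they never generate new candidates.
        split = 0
        while split < len(terms) and prefix[split + 1] <= threshold:
            split += 1
        essential = terms[split:]
        candidates = sorted(
            {d for t in essential for d, _ in POSTINGS[t]} - scored)
        if not candidates:
            break
        for doc in candidates:
            # Full scoring still uses every query term.
            score = sum(s for t in terms for d, s in POSTINGS[t] if d == doc)
            scored.add(doc)
            if len(topk) < k:
                heapq.heappush(topk, (score, doc))
            elif score > topk[0][0]:
                heapq.heapreplace(topk, (score, doc))
            if len(topk) == k:
                threshold = topk[0][0]  # raising this shrinks `essential`
    return sorted(topk, reverse=True)

print(max_score_topk(["fast", "vector", "search"]))  # → [(2.0, 2), (1.5, 3)]
```

The payoff the post describes shows up with long, LLM-generated queries: the more terms a query has, the more of them fall below the threshold and become non-essential, so the pruning scales with query length in a way WAND’s pivot-based skipping doesn’t.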
More Nano Banana prompting tips, this time straight from Google.
Talking about that Nano Banana: let’s not forget how confusing and absurd and funny (in the Kafka way) it is to even get started with Google models. Raise your hand and leave a like if you too thought “what the hell is a model garden, man” the first time you saw it.
I can’t tell you what it’s about, to be honest, and I don’t even know whether this is the right name, but Shopify Editions Winter ‘26 “The Renaissance Edition” is a very beautiful and fascinating page. I’m usually a bit meh as soon as scroll effects show up, but this one? Very nice.
Very often I think of this pair of Emmett Shear tweets: “People who have only ever worked for companies structured into clear hierarchy are missing an entire mode of work and collaboration which is vastly more … alive, for lack of a better word. The trusted-peers-on-a-mission mode is incredible. Early startups, bands, movie crews, sports teams, happy families on vacation…they mostly operate in this mode especially at first. You’re just jamming together, people are looking out for the group and themselves as one.”
Good reminder: Notes on Internet Addiction. I’m no saint and I catch myself ending thoughts with “… but it’s also been so good to me!” The only thing that truly has worked: set a 30 minute per day limit for all social media apps on my phone, lock that limit behind a password, and — here comes the crucial bit that makes this work — and let my wife set the password. Believe me: you don’t want to ask your wife “can you give me 20 more minutes of Twitter?” very often.
Hell yes, SVGs rock: an SVG is all you need.
This was inspiring and made me want to build more things and change how I approach building them: Rebuilding Our Website for the Agent Era. “Teams are buying powerful tools and wondering why everything still feels slow. The answer is usually that they’re running agents on infrastructure designed for humans. Once you rebuild the infrastructure, everything becomes this fast.” I’ve been saying this for the past year: for decades developer tooling had to adapt to the codebase. “Oh, your tool doesn’t work in our monorepo? Sorry, no can do.” Now, though, with these agents, the monorepo will adapt, is my bet. The pull is too strong. The big question: what does the codebase of the future look like? That’s what we want to find out at Amp.
Eli Bendersky is revisiting Jack Crenshaw’s “Let’s Build a Compiler” — lovely!
My Christmas present to you: Tinker, Tailor, Soldier, Spy (1979) Edited, HD, English Subtitles. Yes, yes, this is the BBC Tinker Tailor, the one with Alec Guinness playing George Smiley, all in one video, on YouTube, with freaking subtitles. You know how rare that is? It’s rare, man. I love the 2011 movie and watched it many times and last week I finally had the good sense to just search on YouTube for the BBC version, which I’ve been wanting to watch for years now, but it’s very hard to find in Germany, in English, with subtitles. It’s so good watching this, comparing it to the movie, and seeing what choices they made differently.


