Joy & Curiosity #80
Interesting & joyful things from the previous week
Do you know how it should work? Does the agent? Or does the codebase?
Lately I’ve been thinking a lot about why sometimes using an agent leads to great results and other times it doesn’t. My current theory: it depends on what knowledge about the task at hand is encoded where.
If all the knowledge required to solve the task to your satisfaction is available either in your prompt, or in the codebase, or in the training data of the model, then things go fine.
Things go badly if there’s a gap. That is, if you wrongly assume the agent will know how to do something but it won’t because that knowledge is neither in the codebase nor in the training data.
If I ask the agent to fix a bug that has a very obvious solution, say, a button’s hover state doesn’t activate on hover, then everything the agent needs to know to fix it is available: the problem is in the prompt, the code should explain what the button is, and the training data covers what a hover state is.
But what if there’s a bug and you don’t even know how to explain what the bug is or what the desired state is? Not good.
Or what if you tell the agent to build you a feature and you assume it does so by going over here and adding that and then going over there and adding this, but the codebase allows fifteen other ways, and the training data doesn’t say those fifteen other ways are bad? Not good.
Sometimes the codebase and its documentation contain that information through types or tests or conventions. Other times the training data tells the agent that there’s only one way to add a new endpoint in Rails or Next.js or SvelteKit. But if it’s neither in the codebase nor in the training data, then you have to put it in the prompt.
Theory is too big a word for these thoughts, yes, but I’ve been asking myself “where is the knowledge?” a lot when working with Amp this week and found it useful, so there you go, maybe you get something out of it too.
Last week I asked whether software is turning into a liquid and David Soria Parra, Member of Technical Staff at Anthropic and creator of MCP (meaning: someone who’s seen things up close), replied: “I think people don’t run the AI maximalist simulation of what this actually means and how far it will go just yet. Most code will just be ephemeral one time use”
John Regehr: Zero-Degree-of-Freedom LLM Coding using Executable Oracles. This is excellent and resonated with my thoughts from above. “When an LLM has the option of doing something poorly, we simply can’t trust it to make the right choices. The solution, then, is clear: we need to take away the freedom to do the job badly. The software tools that can help us accomplish this are executable oracles. The simplest executable oracle is a test case—but test cases, even when there are a lot of them, are weak. […] When I look at the best software testing efforts out there, there’s invariably something creative and interesting hiding inside. I feel like a lot of projects leave easy testing wins sitting on the floor because nobody has carefully thought about what test oracles might be used. Finding executable oracles for LLMs feels the same to me: with a little effort and critical thinking, we can often find a programmatic way to pin down some degree of freedom that would otherwise be available to the LLM to screw up.” I also want to quote that lovely last paragraph, but I won’t, because I want you to read everything else that leads up to it too. This is good stuff.
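To make "executable oracle" a bit more concrete, here’s a minimal sketch of one flavor of it, differential testing, assuming you have a trusted (maybe slow) reference implementation lying around. The reference pins down the output for every input, so the candidate (the code an agent wrote) has no freedom left to get it wrong without the check failing. All names here (reference_sort, candidate_sort, check_against_oracle) are hypothetical illustrations, not taken from Regehr’s post.

```python
import random
from typing import Callable, List


def reference_sort(xs: List[int]) -> List[int]:
    # Hypothetical trusted reference: Python's built-in sorted().
    return sorted(xs)


def candidate_sort(xs: List[int]) -> List[int]:
    # Hypothetical stand-in for agent-written code: a simple insertion sort.
    out: List[int] = []
    for x in xs:
        i = len(out)
        # Walk left past every element greater than x, then insert.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def check_against_oracle(
    candidate: Callable[[List[int]], List[int]],
    oracle: Callable[[List[int]], List[int]],
    trials: int = 1000,
    seed: int = 0,
) -> None:
    # Generate random inputs and require the candidate to match the oracle exactly.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        expected = oracle(list(xs))
        actual = candidate(list(xs))
        if actual != expected:
            raise AssertionError(f"mismatch on {xs}: {actual} != {expected}")


check_against_oracle(candidate_sort, reference_sort)
print("candidate agrees with oracle on all trials")
```

An oracle like this only constrains what it checks: here, the output for each input. Other degrees of freedom, such as performance, would need their own oracle.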
And here’s Mary Rose Cook, singing harmonies on top of Regehr’s lines when talking about freedom of expression and constraints for agents: Code generation that just works.
Cheng Lou has “crawled through depths of hell to bring you, for the foreseeable years, one of the more important foundational pieces of UI engineering (if not in implementation then certainly at least in concept): Fast, accurate and comprehensive userland text measurement algorithm in pure TypeScript, usable for laying out entire web pages without CSS, bypassing DOM measurements and reflow.” It’s called Pretext and it’s impressive. I mean, look at this demo! Move the orbs around! Or the ASCII one or click on the logos in this one. According to Lou, this was “achieved through showing Claude Code and Codex the browsers ground truth, and have them measure & iterate against those at every significant container width, running over weeks.” And yet the README doesn’t mention that at all. That tells me we’re past a big milestone.
If you’re on desktop, see also this dragon that’s built with Pretext.
Marc Brooker is asking: What about juniors? This is one of the most inspiring and motivating pieces of writing I’ve read in the past few months. I love the Wellington quote on engineering: “to define it rudely but not inaptly, it is the art of doing that well with one dollar, which any bungler can do with two after a fashion.” And I love Marc’s very own definition: “I believe that this is the core work of engineering: deeply understanding the problem to be solved, the constraints, the tools available, and the environment in which it operates, and coming up with an optimal solution. This requires real creativity, because the constraints are typically over constrained, and real empathy because many of the constraints come directly from human irrationality. It also requires a deep understanding of the tools available, and what those tools can and can’t do.” I also think his answer to the question is interesting and the question itself is very important. (I said similar things on last year’s You’ve Been A Bad Agent episode.)
Marc’s previous post is also great: “Over the next couple of years, the most valuable people to have on a software team are going to be experienced folks who’re actively working to keep their heuristics fresh. Who can combine curiosity with experience. Among the least valuable people to have on a software team are experienced folks who aren’t willing to change their thinking. Beyond that, it’s hard to see.”
If you read both of Marc’s posts, you’ll enjoy Pieter Hintjens’ A Tale of Two Bridges. Engineering is the art of making the tradeoffs, not building the perfect thing.
Michael Nielsen: Which Future? I’m very glad I read this. Bikini Atoll and fire safety will stay with me.
Sad news: Tracy Kidder, author of The Soul of a New Machine, has died. I highly recommend reading this book. I last did so in March of last year. And here I am again, telling you: read it, it’s fantastic. And then read Bryan Cantrill’s reflections on it.
Rands has been bitten by the agent bug: “I’ve never built more interesting, random, and useless scripts, tools, and services than I have in the last six months. The cost to go from ‘Random Thought’ to ‘Working Something’ has never been lower”
Linear: Issue tracking is dead. Look up to the sky, there’s me, in a tiny plane that’s pulling a banner saying in big red letters: told you.
This is very, very on the nose and I wouldn’t sign it without making some big changes, but something in it resonates with what I’ve felt before, maybe not to this extent, maybe not in this exact shape, and makes parts of it feel true: “‘Collaboration’ is bullshit.” I don’t think Big Tech the Boogeyman is to blame (my 8-year-old had to do her first group project in school a few weeks ago, creating a stop-motion movie, and nearly lost her mind), but this much, I think, is true: “most complex, high-quality work is done by individuals or very small groups operating with clear authority and sharp accountability, then rationalized into the language of teamwork afterward. Dostoevsky wrote The Brothers Karamazov alone. The Apollo Guidance Computer came from a team at MIT small enough to have real ownership […] Communication matters, and shared context matters. But there’s a huge difference between communication and collaboration as infrastructure to support individual, high-agency ownership, and communication and collaboration as the primary activity of an organisation.”
Eoghan McCabe, CEO of Intercom, is saying the “age of vertical models is here.” I’m skeptical, because it all hinges on this idea of verticals and domain knowledge and I’m not sure that won’t be washed away by bigger models, but it is interesting: “the labs are in an interesting position where on one hand the horizontal, general purpose models are actually over-serving the market for specific use cases. E.g. their models are more generally intelligent than is needed for customer service. And on the other hand, the open-weight models are more than good enough where high quality domain specific post-training can make the resulting models superior at the special purpose jobs, and in the ways that matter to that particular job. E.g. in service, the soft factors really matter, like judgement, pleasantness, attentiveness (as well as the hard factors mentioned prior, like the ability to effectively resolve problems, quickly and cheaply).”
Google published TurboQuant, a “set of advanced theoretically grounded quantization algorithms that enable massive compression for large language models and vector search engines.” I won’t claim to understand all of it, but I do think I understand the bit about how “PolarQuant converts the vector into polar coordinates using a Cartesian coordinate system” and that’s very cool. It also goes to show that if AI progress weren’t a race towards AGI and everyone stopped building bigger and bigger models, there’d be so many optimizations left to make.
Systems Thinking is Brain Rot for Analysts. Refreshing.
This is the Gruber I love: “And the fucking autoplay videos, jesus. You read two paragraphs and there’s a box that interrupts you. You read another two paragraphs and there’s another interruption. All the way until the end of the article. We’re visiting their website to read a fucking article. If we wanted to watch videos, we’d be on YouTube. It’s like going to a restaurant, ordering a cheeseburger, and they send a marching band to your table to play trumpets right in your ear and squirt you with a water pistol while trying to sell you towels.”
And this is the Internet I love: 25 Years of Eggs. “Everyone needs a rewarding hobby. I’ve been scanning all of my receipts since 2001. I never typed in a single price - just kept the images. I figured someday the technology to read them would catch up, and the data would be interesting. This year I tested it. Two AI coding agents, 11,345 receipts. I started with eggs.”
Cursor’s crossroads: “It’s a story distinctly of the AI era: Cursor is four years old but already has an innovator’s dilemma, arguably outgunned by newer products in the market it popularized. Every AI startup fears OpenAI or Anthropic releasing a product directly in competition with theirs. It’s the nightmare scenario, and Cursor is living it, more quickly than Truell and his team ever expected. […] As Truell and I get ready to end our Zoom call, I notice the picture of Caro again. I think about how it took Caro six months to edit a single chapter of The Power Broker. Truell has less time than that before the next change.”
Great brain massage: Let’s see Paul Allen’s SIMD CSV parser.
Okay, now before you click the next link and close the tab right away, let me tell you: yes, I thought so too. I also thought that it’s not for me, that it doesn’t contain anything I didn’t know, that it’s boring old stuff, but it’s not! There are some real whoa moments in there: Google Has a Secret Reference Desk. Here’s How to Use It. The title is weird, yes, but, hot damn, the intitle:"index of" /pdf thing alone is worth it.
Satisfyingly meta: Joel Meyerowitz on Photographing Giorgio Morandi’s Studio.
Stripe launched projects.dev which “lets you or your agents provision multiple services, generate and store credentials, and manage usage and billing from the CLI.” Makes total sense when you want to increase the GDP of the Internet.
Finally! Edward, Nick, Rasmus, and Julia shared the “first iteration of the Playbit runtime, our vision for building playful personal-scale software”: playbit.app.
Dappled light: “Growing up, I loved this mix of shade and sun I called ‘shun.’ Sunlight slipped through the leaves, and its tiny gaps turned into pinholes that project little dancing suns. It felt like magic.”
Note from the producer: no newsletter next week. One weekend of vacation.


