Since I’m not the biggest fan of Christmas Specials — I want to see the guy surf and not go on some adventure involving reindeer — let’s keep the recipe as it is: a bag of links pointing to stuff that I found interesting this week.
You’ll next hear from me in about two weeks, because I plan to be offline (not really) until then and take a break (really). Or maybe not. Maybe I’ll write something.
Anything’s possible in the time between Christmas and New Year’s.
As you’ll see, this whole week was full of AI for me, and what kicked it off, I guess, was reading the emails sent between Elon Musk and the OpenAI founders. Have to admit that I can barely judge the significance of the emails being published, because I was so fascinated by the different writing styles. There are emails in there that were most certainly written on a phone, quickly, and others that went through multiple drafts. Read the “business of building AGI” email sent by Ilya.
One of the most Joy & Curiosity things this week: tldraw.computer. I haven’t digested it yet but plan to over the next two weeks.
Spent my whole morning today playing around with CerebrasCoder and building little games. The inference speed is a game changer. It’s unbelievably fast.
In the past few years I’ve been thinking a lot about the idea of a Minimum Viable Product (MVP) and how crucial it is to know what “minimum” means for a given product. I came to think that the bar is often set too low — for some products “minimum” might mean that one part of it is better than anything else out there. Then, this week, I came across this tweet by Nikita Bier that describes a similar thought, more eloquently: “Your product doesn’t need a lot of features but the thing that you’re testing needs to feel like a German car door—reaching a level of quality that doesn’t distort signal.” Keep that thought in mind…
… and pair it with Patrick Collison on startups: “The main thing that I think companies screw up at the pre-product/market fit stage is speed of iteration.” Build a German car door that you can change quickly while talking to a lot of users. Nobody said it was easy, right?
Yesterday Simon Willison wrote: “I had big plans for December: for one thing, I was hoping to get to an actual RC of Datasette 1.0, in preparation for a full release in January. Instead, I’ve found myself distracted by a constant barrage of new LLM releases.” Hours later, OpenAI released benchmarks for their new o3 model.
The post, OpenAI o3 Breakthrough High Score on ARC-AGI-Pub, is fascinating: “o3’s core mechanism appears to be natural language program search and execution within token space – at test time, the model searches over the space of possible Chains of Thought (CoTs) describing the steps required to solve the task.”
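That last sentence stuck with me. In its simplest possible form, “searching over the space of possible CoTs” is just best-of-N: sample a handful of reasoning traces, score them, keep the winner. Here’s a toy sketch of that shape in Python. To be clear, this is my illustration, not o3’s actual mechanism; `generate_cot` and `score_cot` are stand-ins for whatever really happens inside the model.

```python
# Toy illustration of test-time search over chains of thought (CoTs):
# sample N candidate reasoning traces, score each, keep the best.
# This is best-of-N sampling, the simplest form of the idea; the
# generate/score callables are placeholders, not o3's real internals.
import random
from typing import Callable

def best_of_n(
    task: str,
    generate_cot: Callable[[str], str],      # samples one chain of thought
    score_cot: Callable[[str, str], float],  # rates a chain for this task
    n: int = 16,
) -> str:
    candidates = [generate_cot(task) for _ in range(n)]
    return max(candidates, key=lambda cot: score_cot(task, cot))

# Dummy plumbing so the sketch actually runs:
if __name__ == "__main__":
    fake_generate = lambda task: f"step-by-step attempt #{random.randint(0, 999)}"
    fake_score = lambda task, cot: random.random()
    print(best_of_n("solve the ARC puzzle", fake_generate, fake_score))
```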
Talking about AI progress: this post, asking whether AI progress is slowing down, is surprisingly level-headed and contains some very (I think) correct lines, such as this one: “Here are a couple of analogies that help illustrate why it might take a decade or more to build products that fully take advantage of even current AI capabilities. The technology behind the internet and the web mostly solidified in the mid-90s. But it took 1-2 more decades to realize the potential of web apps.” And this post also contained some interesting ideas on the same topic.
Beautiful writing shed: “It measures eight-by-10 feet. There is no telephone or running water. Its walls are lined with more than 1,000 books, and the only furniture is a desk, a comfortable chair, and a lamp.”
Recently I’ve been nerd-sniped by using LLMs to judge or grade the output of other LLMs. This post, Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge), contains a lot (“49 min read”) of different twists on the idea. Still not over how everything in the ML space always sounds so much fancier than it turns out to be. Take this: “Then, they manually annotated sentence-level factuality on the generated data.” Manually annotated sentence-level factuality! Sentence-level factuality! So they went through the output and marked sentences as true or false. Incredible.
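Stripped of the fancy words, an LLM-as-judge for sentence-level factuality fits in a few lines. Here’s a minimal sketch, assuming the `openai` Python package and an API key in the environment; the model name, prompt, and labels are my own placeholders, not from the post.

```python
# Minimal LLM-as-judge sketch: grade a single sentence as supported/unsupported.
# Assumes the `openai` package and OPENAI_API_KEY in the environment;
# model name and prompt wording are illustrative, not from the post.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are a fact-checker. Given a source document and a
generated sentence, answer with exactly one word: SUPPORTED if the sentence
is backed by the document, UNSUPPORTED otherwise.

Document:
{document}

Sentence:
{sentence}
"""

def judge_sentence(document: str, sentence: str) -> bool:
    """Return True if the judge model labels the sentence as supported."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable judge model works here
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            document=document, sentence=sentence)}],
        temperature=0,  # we want a verdict, not creativity
    )
    return response.choices[0].message.content.strip().upper() == "SUPPORTED"
```

Run that over every sentence of a generation and there it is: sentence-level factuality.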
Another masterpiece by Bartosz Ciechanowski: Moon. If you haven’t (lucky you!): click through Bartosz’s archive.
Here’s a good pairing. First, read this post, The end is nigh and here's why, which includes this line: “the point is that when you take a stroll through history, you don’t encounter many people saying things like ‘the forces of evil and the flaws of human nature have always been among us.’ Instead, you meet a lot of people saying things like ‘the forces of evil and the flaws of human nature have JUST APPEARED what do we do now??’” Second, read this post, which was reposted into my timeline without any context at all, and reflect on our shared media experience of the last, say, ten years.
First impressions count for a lot, and the name of this post — Principal Engineer Roles Framework — combined with the URL containing linkedin dot com — well, it doesn’t exactly scream joy, does it? But I thought it was very good and wish I had read it years ago.
“Do you know how much your computer can do in a second?” Computers are fast.
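If you want a rough feel for it on your own machine, here’s a quick, unscientific sketch (numbers vary by orders of magnitude across languages and hardware, which is the whole point of the quiz):

```python
# How many loop iterations does this machine manage in one second?
# Unscientific: each iteration also pays for a clock call, so this
# undercounts raw loop speed, and compiled languages do vastly better.
import time

def iterations_per_second() -> int:
    count = 0
    deadline = time.perf_counter() + 1.0
    while time.perf_counter() < deadline:
        count += 1
    return count

print(f"{iterations_per_second():,} iterations in one second")
```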
Robin Sloan, writing: “Sometimes I think that, even amidst all these ruptures and renovations, the biggest divide in media exists simply between those who finish things, and those who don’t.”
One more thing on first impressions: this Cloudflare blog post is called “Resilient Internet connectivity in Europe mitigates impact from multiple cable cuts” when they should’ve gone with “dude, someone cut a fucking undersea cable and the internet did not go down, what the fuck, that’s amazing.”
Mind-blowing: “pushing single-GPU inference throughput to the edge without libraries” by “building an LLM inference engine using C++ and CUDA from scratch without libraries.” Performance, GPUs, low-level, a fancy word like inference, and code right there, front and center — hot.
From the same author: Notes From Figma II: Engineering Learnings. There’s gold in there: “If I had to pick a dividing line between the failures and successes, it’s that the successes always had a product goal in mind ahead-of-time.” And wisdom: “I spent my first couple years as a product engineer. I was not a very good product engineer.” And nerd-snipes: “It was a heroic effort, and succeeded I think because of engineers like Joey Liaw, who led the project and had also been part of the healthcare.gov rescue team.” — I now want to know all about the healthcare.gov rescue team. Joey Liaw, please talk to me.
I’m not a religious person but if I were and you asked me how I practice my religion I’d tell you that there are ten, fifteen videos floating around the internet that I’ve been watching regularly for the last ten, fifteen years and even though they’re corny, profane, trite, they’re also watering holes to me and I keep coming back to them to drink. Like this one: “If you was hit by a truck and you was lying out there in that gutter dying, and you had time to sing one song. Huh? One song that people would remember before you're dirt. One song that would let God know how you felt about your time here on Earth. One song that would sum you up. You tellin' me that's the song you'd sing?”