The idea of new human-machine interaction methods is so fascinating, even more so when this has to co-exist with a whole legacy of systems.
Sure, there will be AI-native languages, but I can't imagine not having years of hybrid programming first: tooling to embed prompts, different kinds of version control, and "containerization" that lets multiple versions of something coexist easily during development, so developers can pick parts of each and become a kind of feature curator in that step.
Funnily enough, this might be the moment for microservices and smaller units of code to shine, as they're probably the easiest way right now to "pin" something we don't want to change while letting everything else adapt (AI-assisted codegen). Will we leverage DDD to its finest or adapt to something completely new?
so many open questions, so many interesting possible answers ❤️
Fun times ahead for us 🚀
My biggest concern is how to maintain any reasonably sized code base when most of it is written by AI.
Companies try to avoid a low "bus factor" of 1 ("who can maintain and fix bugs in this code base when John gets hit by a bus?") but what if the factor is not even 1 but 0?
Does this lead to "IKEA code" that is built once, but when you want to make changes you throw it all out and simply generate everything from scratch again?
Have you ever seen the Chad Fowler talk about Immutable Infrastructure? https://www.youtube.com/watch?v=sAsRtZEGMMQ&t=1571s
He's talking a lot about the idea of regeneration — "let's just get into a habit of throwing this code away, and if it's that big, it's easy to throw away and rewrite" — and I often think of the talk.
Well, the obvious answer is, "the AI will maintain it". ;)
I don't think this is much different from people turnover today; if anything, we are likely to end up with clearer documentation and specifications as a positive side effect of AIs working well with them.
> Does this lead to "IKEA code" that is built once, but when you want to make changes you throw it all out and simply generate everything from scratch again?
I think yes. The reason we don't do it now is cost: keeping a codebase maintainable has proved more efficient long term than rewriting. But that changes once we reach a point where we can trust that the specs will always be implemented correctly, whether through testing, code-pinning, or other means.
Once the result is deterministic enough, that seems like a good place to be. I definitely see a possible shift away from writing application-level code and toward building tooling and debugging workflows.
But when you regenerate everything, you also have to review and test everything from scratch.
Unless you already have enough black-box tests in place to validate the new IKEA furniture, the cost of regenerating everything is still quite high.
Or you accept the risk that the new thing behaves differently than before, or has new bugs that weren't there before.
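To make the black-box idea concrete, here's a rough pytest sketch against a hypothetical `order_service` module whose implementation might be regenerated at any time (the module and its function are invented for illustration). The tests pin the observable contract rather than the internals, so a freshly generated implementation can be validated without re-reviewing its code.

```python
# Hedged sketch (pytest): black-box tests against a hypothetical `order_service`
# module whose implementation may be regenerated at any time. The tests pin the
# observable contract, not the internals.
import pytest

from order_service import calculate_total  # hypothetical, possibly regenerated module


def test_total_includes_tax():
    # 100.00 net at a 20% tax rate should always come out to 120.00
    assert calculate_total(net=100.00, tax_rate=0.20) == pytest.approx(120.00)


def test_empty_order_costs_nothing():
    assert calculate_total(net=0.00, tax_rate=0.20) == 0.00


def test_negative_amounts_are_rejected():
    with pytest.raises(ValueError):
        calculate_total(net=-1.00, tax_rate=0.20)
```

If the regenerated code passes this kind of suite, the review burden shrinks to the contract itself; if it doesn't, you've caught the behavioural drift before it ships.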
Well, when you compile higher-level code (C++ or Rust) to assembly or WASM, or Java to bytecode, you don't inspect the actual assembly output or expect anyone to read or modify it directly. You just do "application-level testing" for the known interactions.
That's exactly where the story about some way of _validating_ things comes in. Generation techniques may get quite reliable over time.
Amazing questions. I never thought about most of them and I certainly have no answers.
Regarding the first task, "Claude generated a 75-line Python script to turn the RSS feed of this very newsletter into a Markdown file": I could see this already being done by evaluating a prompt in your code instead of having the code generated for you.
Let's say the structure of your RSS feed changes; the generated code will break. If you just have a prompt that does the conversion for you, the logic will still work even if the RSS changes slightly.
There is a recently finished book that explains the concept (https://leanpub.com/php_and_llms); if you check out the sample chapter, you can see how the idea works: you run a prompt in your code to extract data from an email into JSON format.
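To sketch what that could look like in Python (using the Anthropic SDK; the feed URL, model id, and prompt wording are placeholders, not anything from the newsletter or the book): instead of shipping parsing code, the raw feed is handed to the model on every run.

```python
# Hedged sketch: evaluate a prompt at runtime instead of shipping parsing code.
# The feed URL, model id, and prompt wording are illustrative placeholders.
import urllib.request

import anthropic

FEED_URL = "https://example.com/feed.xml"  # placeholder feed

raw_feed = urllib.request.urlopen(FEED_URL).read().decode("utf-8")

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder; use whatever model you have
    max_tokens=4000,
    messages=[{
        "role": "user",
        "content": (
            "Convert this RSS feed into a Markdown file with one heading per "
            "entry, followed by the entry's link and summary:\n\n" + raw_feed
        ),
    }],
)

print(response.content[0].text)  # the generated Markdown
```

The trade-off is the one raised above: small changes in the feed structure get absorbed by the prompt, but you give up determinism and pay for a model call on every run.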
This brings up other questions, like the one you posed: what happens if the AI suddenly behaves differently and can no longer do the task it was asked to do, or does it differently?
For existing code bases, rather than trying to find a way to smoosh prompts into the mix, I think the easiest way would be to integrate AI tools into issue tracking systems.
Anyone can open an issue. It can be assigned to a person or an AI. When assigned to an AI, it writes some code to meet the requirements outlined in the issue, then opens a pull/merge request. Maybe even several alternatives are provided.
The team then has the opportunity to review the code written by the AI and validate that the system works as needed. If it doesn't, the AI can address the review comments and iterate on the code. If the change is good, the code can be merged!
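A rough sketch of that dispatch step, with the agent and pull-request parts left as hypothetical helpers rather than any existing API; the point is only that the issue tracker becomes the integration surface.

```python
# Hedged sketch of the issue-dispatch idea. `run_coding_agent` and
# `open_pull_request` are hypothetical helpers, not an existing API.
from dataclasses import dataclass


@dataclass
class Issue:
    number: int
    title: str
    body: str
    assignee: str  # e.g. "alice" or "ai-bot"


def run_coding_agent(issue: Issue) -> str:
    """Hypothetical: ask an LLM agent to produce a patch for the issue."""
    raise NotImplementedError


def open_pull_request(issue: Issue, patch: str) -> None:
    """Hypothetical: push a branch with the patch and open a PR for human review."""
    raise NotImplementedError


def handle_assignment(issue: Issue) -> None:
    if issue.assignee != "ai-bot":
        return  # a human picked it up; nothing for the bot to do
    patch = run_coding_agent(issue)
    open_pull_request(issue, patch)  # the team reviews and iterates from here
```

Everything after `open_pull_request` is the normal review loop, which is exactly what keeps a human in control of the merge.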
I don’t think LLMs converting prompts into a programming language is all that useful. The output we see from LLMs is really just for us because we write in high level programming languages. LLMs aren’t constrained to just writing high level programming languages; they’re capable of outputting almost anything.
It’d be more useful if LLMs converted a prompt directly into assembly, or some form of byte code, or something similar to an AST.
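To give a feel for those lower-level targets, Python's standard library can already show the AST and bytecode sitting beneath a trivial snippet; this is purely an illustration of the representations involved, not a claim about what any current model does.

```python
# Illustration only: the AST and bytecode that normally sit "below" the source a
# human reads. A model emitting these directly would skip the human-readable layer.
import ast
import dis

source = "def add(a, b):\n    return a + b\n"

# The AST: the structured form a prompt could, in principle, be compiled to.
print(ast.dump(ast.parse(source), indent=2))

# The CPython bytecode for the same function.
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)
dis.dis(namespace["add"])
```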
However, if the prompt is merely meant to be executed by AI agents, then possibly no program is necessary at all. The prompt can be executed again and again by some AI agent without any intermediate transformation… or, more likely, the prompt is mapped into whatever input format the AI agent requires.
Conversion of a prompt to output should in theory be deterministic. However, that determinism is lost when the model changes, or if randomisation of the output has been purposely added; that isn't a useful property if we want deterministic output or actions for a given input. This is why I think LLMs and AI systems would benefit from integrating with Constraint Satisfaction Programming techniques to ensure all requirements are met, and with A* search techniques to explore a problem space and find optimal solutions.
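As a hedged sketch of the "ensure all requirements are met" part, here is a much simpler cousin of full constraint solving: treat the model's output as a candidate and only accept it once a set of deterministic checks passes. The `generate_candidate` helper and the constraints themselves are invented for illustration.

```python
# Hedged sketch: wrap a non-deterministic generator in deterministic checks and
# retry until every hard constraint passes. `generate_candidate` stands in for a
# real LLM call; the constraints are invented for illustration.
import json


def generate_candidate(prompt: str) -> str:
    """Placeholder for an actual LLM API call that returns a JSON string."""
    raise NotImplementedError


def satisfies_constraints(text: str) -> bool:
    """Hard requirements the output must meet, checked deterministically."""
    try:
        data = json.loads(text)  # must be valid JSON
    except json.JSONDecodeError:
        return False
    # Required field, bounded length — both checked outside the model.
    return isinstance(data.get("title"), str) and len(data["title"]) <= 80


def generate_until_valid(prompt: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        candidate = generate_candidate(prompt)
        if satisfies_constraints(candidate):
            return candidate
    raise RuntimeError("no candidate satisfied the constraints")
```

Searching over candidates with an explicit objective, as in A*, would be the natural next step, but even this retry-and-check loop restores a deterministic accept/reject boundary around a non-deterministic model.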