I have not had great experiences with AI development tools. I’m not a Luddite, but if a tool takes me more time than just doing the work by hand, and I get the same results, it’s not a tool I want to use. In some cases, the code had subtle bugs or logic little better than the naive implementation. In other cases, the code was not modular or well laid out. Or the autocomplete took more work to clean up than it saved in typing. And sometimes it would bring in dependencies, but with version numbers dating from the early 2020s; they were out of date and didn’t match the current documentation. For the most part the code worked, but I knew that if I accepted it as is, I would open myself (or whoever followed me) up to maintenance issues at some later date. One might argue that the AI would be much better by then, and could do the yeoman’s work of making changes, but I wasn’t sold on that idea. (And I’m still skeptical.)
I would turn on the AI features, use them for a while, but eventually turn them off. I found they helped me with libraries I wasn’t familiar with. Give me a few working lines of code and a config to get me going, and I’ll fix it up. It would save me an internet search for an out-of-date Stack Overflow article, I guess. I used prompt files and tried to keep the requests narrow and focused. But writing a hundred lines of markdown and a four-sentence prompt just to get a function didn’t seem scalable.
Well, pigs are flying and I’ve found something that appears to be working. First, it involves a specific product: at the time of writing, Claude Opus/Sonnet 4.5 seem to be quite good. Second, I have a better way of incorporating prompt files into my workflow (more on that below). Third, language matters. Claude gave me the same problems listed above when working on a Jakarta EE code base, but Rust is great. (Rust also has the advantage of being statically typed, which eliminates some of the issues I’ve had when working with Python and LLMs.) Fourth, I apply the standard advice about keeping changes small and focused. Fifth, I refactor the code to make it crisp and tight (more on that below). Sixth, I ask the LLM for a quick code review (more on that below).
The first topic I’ll expand on is changing my relationship with prompt files. Instead of attempting to create prompt files for an existing code base, I started writing a prompt file for a brand-new project. I had Claude generate a starter and then added my edits. I believe in design (at least enough to show you’ve thought about the problem), and this actually dovetails with my need to think through the problem before coding. I still find writing prompt files for an existing code base tedious. But if I have to sit down and think about my architecture and what the pieces should do, the prompt file is as good a place as any.
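To make that concrete, here’s a trimmed-down sketch of the kind of prompt file I mean. The project name, layout, and conventions here are hypothetical placeholders for illustration, not my actual project:

```markdown
# Project: feedwatch (hypothetical example)

## What it does
A small CLI that polls a list of RSS feeds and prints new entries.

## Architecture
- `main.rs`: argument parsing and wiring only; no business logic.
- `feeds` module: fetching and parsing; exposes one public function.
- `store` module: tracks which entries have already been seen.

## Conventions
- Rust 2021 edition; dependency versions must match current crates.io.
- Keep changes small and focused; one module per request.
- Return `Result` from fallible functions; no `unwrap()` outside tests.
```

The point is less the specific contents than that writing it forces the design thinking up front, and the file then does double duty as context for the LLM.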
The other thing I want to cover is refactoring what the LLM hath wrought. Claude generated serviceable code. It was on par with the examples provided for the Rust libraries I was using (which also happen to be very popular, with plenty of online examples). Claude would have had access to rich training data, and it pulled in recent versions (although I had to help a little). But the code was not quite structured correctly. In this case I needed to move it out of the main function and into separate modules. Mostly that was cut and paste, letting the compiler tell me what was broken. The next step in the refactor is to minimize the publicly exposed elements. Now I have code that’s cohesive and loosely coupled. The LLM by itself does not do a great job at this. Taste is more a matter for meat minds than silicon minds at this stage.
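As an illustration of the shape of that refactor (the names and logic here are invented for the example, not taken from my project): logic that started life inline in `main` ends up in a module with private helpers and a single public entry point.

```rust
// Hypothetical after-refactor sketch: code that was generated inline
// in main() now lives in its own module, with the smallest public
// surface that still serves the rest of the crate.
mod titles {
    // Private helper: callers outside the module never see this.
    fn extract(body: &str) -> Option<&str> {
        let start = body.find("<title>")? + "<title>".len();
        let end = start + body[start..].find("</title>")?;
        Some(body[start..end].trim())
    }

    // The one public entry point; everything else stays private.
    pub fn title_or_default(body: &str) -> String {
        extract(body).unwrap_or("(no title)").to_string()
    }
}

fn main() {
    // main() is reduced to wiring; the logic lives in the module.
    let html = "<html><head><title> Example </title></head></html>";
    println!("{}", titles::title_or_default(html));
}
```

The compiler does most of the policing here: once the helper is private, any stray coupling elsewhere in the crate shows up as a visibility error.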
The final thing I want to touch on is using the LLM to review the code after refactoring. This gives me another data point on the quality of my code and on where I might have had blind spots during the refactoring. I work with lots of meat minds, and we review each other’s code on every commit. There are some good reviews and some poor reviews, and reviewing code is harder if you’re not familiar with the specific problem domain. But the machine can give a certain level of review before the PR goes out for team review.
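For what it’s worth, the review prompt doesn’t need to be elaborate. Something along these lines (the wording is mine, not a magic incantation) has been enough to surface useful comments:

```text
Review the code in src/ as if you were a teammate reviewing a PR.
Focus on: module boundaries, anything public that could be private,
error handling, and dependency versions. List concrete issues with
file and function names; skip style nits the formatter already handles.
```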
So that’s what I’ve found works well so far: green-field projects in highly popular languages and frameworks, doing the design work in the prompt file, following LLM best practices, refactoring the code after it’s generated, and getting a code review before submitting the PR. Auto-complete is still off in the IDE. And I’ll see whether this scales as code accumulates in the project. But for now, it produces a product with which I am satisfied.
[A small addendum on the nature of invention and why I think this works].
People’s ideas are not unique. As one of my relations by marriage pointed out years ago, when he moved above 190th Street in Manhattan, there seemed to be a sudden run to the tip of the island to find “affordable” housing. In a city of millions, even if very few people have the same idea at the same time, the demand quickly gobbles up the supply. Humans build ideas up mostly from the bits and pieces floating around in the ether. Even “revolutionary” ideas can often be traced back to an interesting recombination of existing ideas. Moreover, people were sometimes already doing that “revolutionary” thing but didn’t connect it to some other need, or didn’t bother repackaging the idea. What’s important about “ideas” is not the ownership of the idea but the execution of the idea.
There is still something about invention, even if it is largely derivative, that the LLMs don’t appear to possess. Nor do they have the ability to reason about problems from logical principles, as they are focused on the construction of language from statistical relationships. Some argue that with enough statistics you can approximate logical reasoning as well as a human can, but I haven’t seen solid examples to date. The LLM doesn’t understand what to create, but it does summarize all the relevant examples necessary to support the work of realizing the idea. Even then, there are gaps that we have to fill in with human intention. Does this revolutionize coding for me? No. I estimate it makes me somewhat more productive, in the 5–15% range. And of the time and effort necessary to realize an idea, coding is maybe a third or less. I also worry that we’ll never get to the point where this technology is available at a price that makes the provider a profit while remaining affordable to most people. After all, there’s a limit to how much you would spend for a few percentage points of additional productivity.