My LLM codegen workflow

522 points by lolptdr 4 months ago

briga 4 months ago

Absolutely LLMs are great for greenfield projects. They can get you to a prototype for a new idea faster than any tool yet invented. Where they start to break down, I find, is when you ask it to make changes/refactors to existing code and mature projects. They usually lack context, so they doesn't hesitate to introduce lots of extra complexity, add frameworks you don't need, and in general just make the situation worse. Or if they get you to some solution it will have taken so long that you might as well have just done the heavy lifting yourself. LLMs are still no substitute for actually understanding your code.

wilkystyle 4 months ago

100% agree.
My experience to date across the major LLMs is that they are quick to leap to complex solutions, and I find that the code often is much harder to maintain than if I were to do it myself.
But complex code is only part of the problem. Another huge problem I see is the rapid accumulation of technical debt. LLMs will confidently generate massive amounts of code with abstractions and design patterns that may be a good fit in isolation, but are absolutely the wrong pattern for problem you're trying to solve or the system you're trying to build. You run into the "existing code pattern" problem that Sandi Metz talked about in her fantastic 2014 RailsConf talk, "All the little things" [0]:
> "We have a bargain to follow the pattern, and if the pattern is a good one then the code gets better. If the pattern is a bad one, then we exacerbate the problem."
Rapidly generating massive amounts of code with the wrong abstractions and design patterns is insidious because it feels like incredible productivity. You see it all the time in posts on e.g. Twitter or LinkedIn. People gushing about how quickly they are shipping products with minimal to zero other humans involved. But there is no shortcut to understanding or maintainability if you care about building sustainable software for the medium to long-term.
EDIT: Forgot to add link
[0] https://www.youtube.com/watch?v=8bZh5LMaSmE&t=8m11s
- williamcotton 4 months ago
  
  But why follow the wrong abstraction and why try to build something that you don't fundamentally understand?
  I've built some rather complex systems:
  Guish, a bi-directional CLI/GUI for constructing and executing Unix pipelines: https://github.com/williamcotton/guish
  WebDSL, fast C-based pipeline-driven DSL for building web apps with SQL, Lua and jq: https://github.com/williamcotton/webdsl
  Search Input Query, a search input query parser and React component: https://github.com/williamcotton/search-input-query
  - jdlshore 4 months ago
    
    I'm not trying to throw shade when I say this: those codebases are very small. (I'm assuming what I found in the src/ directories is their code.) Working in large codebases is a different kind of experience than working in a small codebase. It's no longer possible to keep the whole system in mind, to keep the dozens+ people working on it in sync, or keep up to date with all the changes being made. In that environment, consistency is a useful mechanism to keep things under control, although it can be overused.
jrvarela56 4 months ago

I would recommend everyone reading this to think of it as a skill issue. You can learn to use the LLM/agent to document your code base, test isolated components and refactor your spaghetti into modular chunks easily understandable by the agent.
The greenfield projects turn into a mess very quick because if you let it code without any guidance (wrt documentation, interactivity, testability, modularity) it generates crap until you can't modify it. The greenfield project turns into legacy as fast as the agent can spit out new code.
- ukuina 4 months ago
  
  > turns into legacy as fast as the agent can spit out new code.
  This is an important point. Unconstrained code generation lets you witness accelerated codebase aging in real-time.
HPsquared 4 months ago

LLM coding is quite well-suited to projects that apply the Unix philosophy. Highly modular systems that can be broken into small components that do one thing well.
- StableAlkyne 4 months ago
  
  I've found modularity to be helpful in general whether there's an LLM or not.
  Easier to test, lower cognitive overload, and it's faster to onboard someone when they only need to understand a small part at a time.
  I almost wonder if these LLMs can be used to assess the barrier for onboarding. If it gets confused and generating shitty suggestions, I wonder if that could that be a good informal smoke alarm for trouble areas the next junior will run into.
infecto 4 months ago

You are right but that's also a good indication that the codebase itself is too complex and at a certain size / scale, too much for a human to reason over even or even if something a human could do, is not efficient doing so.
You should not be structuring the code for a LLM alone but I have found that trying to be very modular has helped both my code as well as the ability to utilize LLM on top of it.
ziml77 4 months ago

I've found they will also introduce subtle changes. I just used o1 recently to pull code from a Python notebook and remove all the intermediate output. It basically got it right except for one string that was used to look up info from an external source. It just dropped 2 characters from the end of the string. That issue required a bit of time to track down because I thought it was an issue with the test environment!
Eventually I ended up looking at the notebook and the extracted code side-by-side and carefully checking every line. Despite being split across dozens of cells, it would have been faster if I had started out by just manually copying the code out of each meaningful cell and pasted it all together.
- cheema33 4 months ago
  
  I have seen LLMs do this. This is why I try to create a git repo, even if it is just local and then diff my changes. This allows me to catch LLM changing something that was completely unrelated to the task it was working on.
sejje 4 months ago

You can use LLMs and actually understand your code.
- y1n0 4 months ago
  
  I agree, but where I run into problems is my existing projects are large. In the last couple weeks I’ve had two cases where I really wanted AI help but I couldn’t fit my stuff in the 128k context window.
  These are big legacy projects where I didn’t write the code to begin with, so having an AI partner would have been really nice.

jrexilius 4 months ago

The first part of this, where you told it to ask YOU questions, rather than laboriously building prompts and context yourself was the magic ticket for me. And I doubt I would have stumbled on that sorta inverse logic on my own. Really great write up!

danphilibin 4 months ago

This is the key to a lot of my workflows as well. I'll usually tack some form of "ask me up to 5 questions to improve your understanding of what I'm trying to do here" onto the end of my initial messages. Over time I've noticed patterns in information I tend to leave out which has helped me improve my initial prompts, plus it often gets me thinking about aspects I hadn't considered yet.
- daxfohl 4 months ago
  
  Frankly getting used to doing this may help our communication with other engineers as well.
  - fragmede 4 months ago
    
    promo from L5->L7 confirmed.
    
    daxfohl 4 months ago
    
    The big question is, which level will be replaced by GPT first?
treetalker 4 months ago

Indeed!
The example prompts are useful. They not only reduced the activation energy required for me to start installing this habit in my personal workflows, but also inspired the notion that I can build a library of good prompts and easily implement them by turning them into TextExpander snippets.
P.S.: Extra credit for the Insane Clown Posse reference!
CamperBob2 4 months ago

That's one of the key wins with o1-pro's deep research feature. The first thing it tends to do after you send a new prompt is ask you several questions, and they tend to be good ones.
One idea I really like here is asking the model to generate a todo list.
nijaru 4 months ago

I add something like “ask me any clarifying questions” to my my initial prompts. For larger requests, it seems to start a dialogue of refinement before providing solutions.
theturtle32 4 months ago

Can confirm, this is an excellent tactic when working with LLMs!

bcoates 4 months ago

That lonely/downtime section at the end is a giant red flag for me.

It looks like the sort of nonproductive yak-shaving you do when you're stuck or avoiding an unpleasant task--coasting, fooling around incrementally with your LLM because your project's fucked and you psychologically need some sense of progress.

The opposite of this is burnout--one of the things they don't tell you about successful projects with good tools is they induce much more burnout than doomed projects. There's a sort of Amdahl's Law in effect, where all the tooling just gives you more time to focus on the actual fundamentals of the product/project/problem you’re trying to address, which is stressful and mentally taxing even when it works.

Fucking around with LLM coding tools, otoh, is very fun, and like constantly clean-rebuilding your whole (doomed) project, gives you both some downtime and a sense of forward momentum--look how much the computer is chugging!

The reality testing to see if the tool is really helping is to sit down with a concrete goal and a (near) hard deadline. Every time I've tried to use an LLM under these conditions it just fails catastrophically--I don't just get stuck, I realize how basically every implicit decision embedded in the LLM output has an unacceptably high likelihood of being wrong, and I have an amount of debug cycles ahead of me exceeding the time to throw it all away and do it without the LLM by, like, an order of magnitude.

I'm not an LLM-coding hater and I've been doing AI stuff that's worked for decades, but current offerings I've tried aren't even close to productive compared to searching for code that already exists on the web.

getnormality 4 months ago

It sounds like LLMs are the new futzing with emacs configuration.
- wilkystyle 4 months ago
  
  Old and busted: Futzing around with my Emacs configuration. New hotness: Having an LLM do it for me.
krupan 4 months ago

Seriously!! Coding with LLM's is marketed as a huge time saver, but every time I've tried, it hasn't been. I'm told I just need to put in the time (ironic, no?) to learn to use the LLM properly. Why don't I just use that time to learn to write code better myself?
- anon7000 4 months ago
  
  It’s not really ironic. You could spend a couple hours making yourself twice as good as using AI tools, or a couple hours making yourself like .1% of a better programmer, assuming you’re not banging your head against the wall anyways.
  It’s one of those things where a little upskilling can make a big impact. So many things in life need a bit of practice before they’re useful to you.
  For starters, you need to change the default prompt in your editor to make it do what you want. If it does something annoying or weird, put it in the prompt to not take that approach. For me, that was absurdly long, useless explanations. And now it’s short and sweet.
- mdrzn 4 months ago
  
  Seriously!! Cars are marketed as a huge time saver, but every time I’ve tried one, they haven’t been. I’m told I just need to put in the time (ironic, no?) to learn to drive properly. Why don’t I just use that time to train my legs and run faster instead?
  - krupan 4 months ago
    
    I think the difference here is it is not at all obvious to me that an LLM is a force multiplier on same the order as cars to legs.
    Cars are pretty easy to observe in action doing what they promise to do. Driving a car is a very straightforward, mechanical, repeatable, intuitive operation.
    Working with an LLM is not repeatable or straightforward.
    I'm short, your analogy is not helping me
khqc 4 months ago

I guess you're not a big fan of rubber duck debugging then? Whenever I get stuck I like to ask myself a bunch of questions and thought experiments to get a better understanding of the problem/project, and with LLMs I'm forced to spell out each one of these questions/experiments coherently, which ends up being great documentation later on. I think LLMs are great if you're actually interested in the fundamentals of your problem/project, otherwise it just turns into a sinkhole that sucks you in.
- bcoates 4 months ago
  
  Sure, but I just use OneNote, the company wiki, or like physical sticky notes for that. I'm not seeing the value of having the LLM give feedback aside from entertainment
triyambakam 4 months ago

It's more like waiting for the code to compile (or node_modules to install before npm improved)
flir 4 months ago

> constantly clean-rebuilding your whole (doomed) project, gives you both some downtime and a sense of forward momentum
ouch. You've thought about this, haven't you? Your ideas are intriguing to me, and I wish to subscribe to your newsletter.

rotcev 4 months ago

This is the first article I’ve come across that truly utilizes LLMs in a workflow the right way. I appreciate the time and effort the author put into breaking this down.

I believe most people who struggle to be productive with language models simply haven’t put in the necessary practice to communicate effectively with AI. The issue isn’t with the intelligence of the models—it’s that humans are still learning how to use this tool properly. It’s clear that the author has spent time mastering the art of communicating with LLMs. Many of the conclusions in this post feel obvious once you’ve developed an understanding of how these models "think" and how to work within their constraints.

I’m a huge fan of the workflow described here, and I’ll definitely be looking into AIder and repomix. I’ve had a lot of success using a similar approach with Cursor in Composer Agent mode, where Claude-3.5-sonnet acts as my "code implementer." I strategize with larger reasoning models (like o1-pro, o3-mini-high, etc.) and delegate execution to Claude, which excels at making inline code edits. While it’s not perfect, the time savings far outweigh the effort required to review an "AI Pull Request."

Maximizing efficiency in this kind of workflow requires a few key things:

- High typing speed – Minimizing time spent writing prompts means maximizing time generating useful code.

- A strong intuition for "what’s right" vs. "what’s wrong" – This will probably become less relevant as models improve, but for now, good judgment is crucial.

- Familiarity with each model’s strengths and weaknesses – This only comes with hands-on experience.

Right now, LLMs don’t work flawlessly out of the box for everyone, and I think that’s where a lot of the complaints come from—the "AI haterade" crowd expects perfection without adaptation.

For what it’s worth, I’ve built large-scale production applications using these techniques while writing minimal human code myself.

Most of my experience using these workflows has been in the web dev domain, where there's an abundance of training data. That said, I’ve also worked in lower-level programming and language design, so I can understand why some people might not find models up to par in every scenario, particularly in niche domains.

brokencode 4 months ago

> “I appreciate the time and effort the author put into breaking this down.”
Let’s be honest. The author was probably playing cookie clicker while this article was being written.

rd 4 months ago

Has anyone who evolved from a baseline of just using Cursor chat and freestyling to a proper workflow like this got any anecdata to share on noticeable improvements?

Does the time invested into the planning benefit you? Have you noticed less hallucinations? Have you saved time overall?

I’d be curious to hear because my current workflow is basically

1. Have idea

2. create-next-app + ShadCN + TailwindUI boilerplate

3. Cursor Composer on agent mode with Superwispr voice transcription

I’m gonna try the author’s workflow regardless, but would love to hear others opinions.

ghuntley 4 months ago

If you steer it and build a stdlib, you get better outcomes. See https://ghuntley.com/stdlib
- fragmede 4 months ago
  
  > I'm hesitant to give this advice away for free
  With all the layoffs in our sector, I wouldn't blame you if you didn't share it, so thank you for sharing.
  - margalabargala 4 months ago
    
    Yeah, don't they know how to hustle? I bet they're still asleep at 5am.
    Seriously, though, it's really sad that not trying to profit off a discussion of industry tooling is something someone has to "push through".
  - risyachka 4 months ago
    
    How does it help and not make it worse when it comes to layoffs?
    
    fragmede 4 months ago
    
    Because ghuntly doesn't have to outrun the bear, just outrun the rest of us.
    Meaning, if ghuntly can provide more value to an employer than a different employee, who doesn't know this trick, it's the other employee that's getting laid off, not ghuntly. In sharing this trick, it means that ghuntly now also has to outperform the other employee who also has this trick.
    
    risyachka 4 months ago
    
    This is a race to the bottom and doesn’t help anyone
- MarkMarine 4 months ago
  
  I’ve been following this, my workflow doesn’t us cursor (VS Code descendants just aren’t my preference) but I’ve built your advice into my home made system using emacs and gptel. I keep a style guide that is super detailed for each language and project, and now I’ve been building the stdlib you recommended. It’s great, thanks for writing this!
  - ghuntley 4 months ago
    
    no problem <3
- newtwilly 4 months ago
  
  Hi, I appreciate you sharing. I've been starting to use this advice with a different tool. Just FYI, this sentence kind of came out of nowhere and it wasn't clear what you meant: > The foundational LLM models right now are what I'd estimate to be at circa 45% accuracy and require frequent steering
  Do your rules count as frequent steering and lead to increased 'accuracy', or is that the 'accuracy' you're seeing with your current workflow, rules and all?
- e12e 4 months ago
  
  Looks like 70% of those rules would benefit from being shared, just like dot files and editor configs.
mike_hearn 4 months ago

Aider + AI generated maps and user guides for internal modules has worked well for me. Just today I did my own version of a script that uses Gemini 2 Flash (1M context window) to generate maps of each module in my codebase, i.e. a short one or two sentence description of what's in every file. Aider's repo maps don't work well for me, so I disable them, and I think this will work better.
I also have a scratchpad file that I tell the model it can update to reflect anything new it learns, so that gives it a crude form of memory as it works on the codebase. This does help it use internal utility APIs.
- manmal 4 months ago
  
  LLMs forcing us to improve our documentation habits. Seriously though, many languages allow API doc generation out of comments. Maybe these docs can just be flattened into a file.
  - mike_hearn 4 months ago
    
    Yes sort of. This particular codebase is a mix of Java and Kotlin, and all my internal code is documented with proper Javadocs/KDocs already since years, just for myself and other people I work with. That's partly why Gemini can make such accurate maps.
    The problem isn't a lack of docs but rather birds-eye context: even with models that allow huge context windows and are fast, you can drown a model in irrelevant stuff and it's expensive. I'm still with Claude 3.5 for coding and its window is large but not unlimited. You really don't want to add a bunch of source files _and_ the complete API docs for tens of thousands of classes into every prompt, not unless you like waiting whilst money burns and getting problems due to the model getting distracted.
    It's also just wasteful, docs contain a lot of redundancy and stuff the model can guess. If you ask models to make notes about only the surprising stuff, it's a form of compression that lets you make smaller prompts.
    Aider provides a quick fix because it's easy to control what files are in the context. But to 'level up' I need to let the AI find and add files itself. Aider can do this: it gives the model tools for requesting files to be added to the chat. And in theory, Aider computes a PageRank over symbols and symbolic references to find the most important stuff in the repository and computes a map that's prepended to the prompt so the model knows what to ask for. In practice for reasons I don't understand, Aider's repo maps in this project are full of random useless stuff. Maybe it works better for Python.
    Finding the right way to digest codebases is still an open problem. I haven't tried RAG, for instance. If things are well abstracted it in theory shouldn't be needed.
  - throwup238 4 months ago
    
    Indexing automatically generated API docs using Cursor seems to work very well. I also index any guides/mdbooks libraries have available, depending on whether I’m trying to implement something new or modifying existing code.
- orsenthil 4 months ago
  
  > Aider + AI generated maps and user guides
  How do do that? Especially the AI generated map?
  - mike_hearn 4 months ago
    
    I have a custom script. It selects all the source files, strips any license headers and concatenates them like this:
    <source_file name="foo/bar/Baz.java"> ... </source_file>
    It then chunks them to fit within model context window limits, sends it to the LLM with a system prompt that asks it to summarize each file in a compact way, and writes the result back out to the tree.
    The ugly XML tag is to avoid conflicts. Some other scripts try to make a Markdown document of the tree which is silly because your tree is quite likely to contain Markdown already, and so it's confusing for the model to see ``` that doesn't really terminate the block. Using a marker pattern that's unlikely to occur in your code fixes that.
dimitri-vs 4 months ago

Yes, and then I keep going back to the basics:
- small .cursorrules file explaining what I am trying to build and why at a very high level and my tech stack
- a DEVELOPMENT.md file which is just a to-do/issue list for me that I tell cursor to update before every commit
- a temp/ directory where I dump contextual md and txt files (chat logs discussing feature, more detailed issue specs, etc.)
- a separate snippet management app that has my commonly used request snippets (write commit message, ask me clarify questions, update README, summarize chat for new session, etc.)
Otherwise it's pretty much what your workflow is.
cynicalpeace 4 months ago

I'm wondering the same thing.
Most of these workflows are just context management workflows and in Cursor it's so simple to manage the context.
For large files I just highlight the code and cmd+L. For short files, I just add them all by using /+downarrow
I constantly feed context like this and then usually come to a good solution for both legacy and greenfield features/products.
If I don't come to a good solution it's almost always because I didn't think through my prompt well enough and/or I didn't provide the correct context.

rollinDyno 4 months ago

Something I quickly learned while retooling this past week is that it’s preferable not to add opinionated frameworks to the project as they increase the size of the context the model should be aware of. This context will also not likely be available in the training data.

For example, rather than using Plasmo for its browser extension boilerplate and packaging utilities, I’ve chosen to ask the LLM to setup all of that for me as it won’t have any blindspots when tasked with debugging.

sampton 4 months ago

The end of artisan frameworks - probably for the better.
- balls187 4 months ago
  
  It's likely the end of a lot of abstractions that made programming easier.
  At some point, specialized code-gen transformer models should get really good at just spitting out the lowest level code required to perform the job.
  - yoz 4 months ago
    
    Disagree. Some abstractions are still vital, and it's for the same reasons as always: communicate purpose and complexity concisely rather than hiding it.
    The best code is that which explains itself most efficiently and readably to Whoever Reads It Next. That's even more important with LLMs than with humans, because the LLMs probably have far less context than the humans do.
    Developers often fall back on standard abstraction patterns that don't have good semantic fit with the real intent. Right now, LLMs are mostly copying those bad habits. But there's so much potential here for future AI to be great at creating and using the right abstractions as part of software that explains itself.
    
    balls187 4 months ago
    
    I’ve thought about your comment, and I think we’re both right.
    Fundamentally, computers are a series of high and low voltages, and everything above that is a combination of abstraction and interpretations.
    Fundamentally there will always be some level of this-it’s not like an A(G)I will interface directly using electrical signals (though in some distant future it could).
    However from what I believe, this current phase of AI (LLMs + Generators + Tools) are showing that computers do not need to solve problems the same way humans need to because computers face different constraints.
    So abstractions that programmers utilize to manage complexity won’t be necessary (in some future time).
  - hy4000days 4 months ago
    
    This.
    Future programming language designers are then answering questions like:
    "How low-level can this language be while considering generally available models and hardware available can only generate so many tokens per second?",
    "Do we have the language models generate binary code directly, or is it still more efficient time-wise to generate higher level code and use a compiler?"
    "Do we ship this language with both a compiler and language model?"
    "Do we forsake code readability to improve model efficiency?"
    
    superb_dev 4 months ago
    
    I’m excited for my new woodworking career if this ever becomes a reality. LLMs are truly sucking the art out of everything.
    
    Miraste 4 months ago
    
    I think the dichotomy between how developers have reacted to LLMs (mass adoption) and how authors, illustrators, etc. have reacted (derision, avoidance) demonstrates that coding was never an art to begin with. Code is not an end in itself, it's an obstacle in the way of an end.
    There are people who enjoy code for the sake of it, but they're a very, very small group.
  - bee_rider 4 months ago
    
    Surely no respectable professional would just ship code they don’t understand, right? So the LLM should probably spit out code in reasonably well known languages using reasonably well known libraries and other abstractions…
    
    balls187 4 months ago
    
    Right now, and perhaps the immediate future sure. But eventually I do think software that writes software will do it better than current programmers can.
    Do you ever think twice about the bayer filter applied to your cmos image sensor?
fastball 4 months ago

It's not just frameworks – I noticed this recently when starting a new project and utilizing EdgeDB. They have their own Typescript query builder, and [insert LLM] cannot write correct constructions with that query builder to save its life.

bambax 4 months ago

This is all fine for a solo dev, but how does this work with a team / squad, working on the same code base?

Having 7 different instances of an LLM analyzing the same code base and making suggestions would not just be economically wasteful, it would also be unpractical or even dangerous?

Outside of RAG, which is a different thing, are there products that somehow "centralize" the context for a team, where all questions refer to the same codebase?

sambo546 4 months ago

I've started substituting "human" for "LLM" when I read posts like these. Is having 7 different humans analyzing the same code base any less wasteful?
- bambax 4 months ago
  
  They are not analyzing the same code base, they are all contributing to the same code base, each in their own domain. It would seem relevant that any advice an LLM gives to one of them is kept consistent -- in real time -- with any other advice to any other dev, instead of having to wait for each commit or push.
staindk 4 months ago

I've only recently switched to Cursor so am not clued up about everything, but they mention that the embedded indexing they do on your code is shared with others (others who have access to that repository? Unsure).
It did seem to take a while to index, even though my colleagues had been using Cursor for a while, so I'm likely misunderstanding something.
adr1an 4 months ago

Cody/ sourcegraph provide workspaces for teams/ enterprise. Probably for this and other reasons.

tarkin2 4 months ago

Most new programmers forget the specification and execution plan part of programming.

I ended up finishing my side projects when I kept these in mind, rather than focusing on elegant code for elegant code's sake.

It seems the key to using LLMs successfully is to make them create a specification and execution plan, through making them ask /you/ questions.

If this skill--specification and execution planning--is passed onto LLMs, along with coding, then are we essentially souped-up tester-analysts?

fullstackwife 4 months ago

Looks similar to my experience, except this part:

> if it doesn’t work, Q&A with aider to fix

I fix errors myself, because LLMs are capable of producing large chunks of really stupid/wrong code, which needs to be reverted, and thats why it makes sense to see the code at least once.

Also I used to find myself in a situation when I tried to use LLM for the sake of using LLM to write code (waste of time)

codeisawesome 4 months ago

Would be great if there were more details on the costs of doing this work - especially when loading lots of tokens of context via repo mix and then generating code with context (context-loaded inference API calls are more expensive, correct?). A dedicated post discussing this and related considerations would be even better. Are there cost estimations in the tools like aider (vs just refreshing the LLM platform’s billing dashboard?)

keyle 4 months ago

I have been using LLM for a long time but these prompts ideas are fantastic; they really opened up a world for me.

Because a lot of the benefits of LLM is bringing ideas or questions I am not thinking of right now, and this really does that. Typically this would happen as I dig through a topic, not before hand. So that's a net benefit.

I also tried it and it worked a charm, the LLM did respect context and the step by step approach, poking holes in my ideas. Amazing work.

I still like writing codes and solving puzzles in my mind so I won't be doing the "execution" part. From there on, I mostly use LLM as auto complete and I'm stuck here or obscure bug solving. Otherwise, I don't get any satisfaction from programming, having learnt nothing.

ggulati 4 months ago

Nice, I coincidentally wrote a blog post today exploring workflows as well: https://ggulati.wordpress.com/2025/02/17/cursorai-for-fronte...

Your workflow is much more polished, will definitely try it out for my next project

fragmede 4 months ago

> paste in prompt into claude copy and paste code from claude.ai into IDE
is more polished? What's your workflow, banging rocks together?
- ggulati 4 months ago
  
  More or less, I tried out Cursor for the first time a week ago. So very much in the newbie stage and looking to learn
- shoemakersteve 4 months ago
  
  This made me laugh audibly. Thank you.
harper 4 months ago

let me know how it works!
- hnuser123456 4 months ago
  
  Looks like your blog crashed, I've been wanting to read it
- hnuser123456 4 months ago
  
  Thank you for fixing it

Isamu 4 months ago

I’m curious, is adding “do not hallucinate” to prompts effective in preventing hallucinations? The author does this.

watt 4 months ago

It will work - you can see it well with a Chain of Thought (CoT) model: it will keep asking itself: "am I hallucinating? let's double check" and then will self-reject thoughts if it can't find a proper grounding. In fact, this is the best part of CoT model, that you can see where it goes off rails and can add a message to fix it in the prompt.
For example, there is this common challenge, "count how many r letters in strawberry", and you can see the issue is not counting, but that model does not know if "rr" should be treated as single "r" because it is not sure if you are counting r "letters" or r "sounds" and when you sound out the word, there is a single "r" sound where it is spelled with double "r". so if you tell the model, double "r" stands for 2 letters, it will get it right.
simonw 4 months ago

Apple were using that in their Apple Intelligence system prompts last year, I don't know if they still have that in there. https://simonwillison.net/2024/Aug/6/apple-intelligence-prom...
I have no idea if it works or not!
- harper 4 months ago
  
  I added it because of the apple prompts! I figured it is worth a try. The results are good, but i did not test it extensively
becquerel 4 months ago

I don't know about this specific technique, but I have found it useful to add a line like 'it's OK if you don't know or this isn't possible' at the end of queries. Otherwise LLMs have a tendency to tilt at whatever windmill you give them. Managing tone and expectations with them is a subtle but important art.
krainboltgreene 4 months ago

It seems absurd, but I suppose it’s the same as misspelling with similar enough trigrams as to get the best autocorrect results.

cipehr 4 months ago

Am I the only one that doesn’t see the hype with Claude? I recently tried it, hit the usage limit, read around found tons of blogs and posts from devs saying Claude is the best code assistant LLM… so I purchased Claude pro… and I hate it. I have been asking it surface level questions about Apache spark (configuring the number of tasks retries, errors, error handling, etc.) and it hallucinated so much, so frequently. It reminds me of like ChatGPT 3…

What am I doing wrong or what am I missing? My experience has been so underwhelming I just don’t understand the hype for why people use Claude over something else.

Sorry I know there are many models out there, and Claude is probably better than 99% of them. Can someone help me understand the value of it over o1/o3? I honestly feel like I like 4o better.

/frustration-rant

btucker 4 months ago

It seems like you might be trying to use it like a search engine, which is a common mistake people make when first trying LLMs. LLMs are not like Google.
The key is to give it context so it can help you. For example, if you want it to help you with Spark configuration, give it the Spark docs. If you want it to help you write code, give it your codebase.
Tools like cursor and the like make this process very easy. You can also set up a local MCP server so the LLM can get the context and tools it needs on its own.
- cipehr 4 months ago
  
  Thank you very much for the ideas here, i will try the approach of giving it context. I havent got into cursor, since i use helix and intellij… i need to look into the MCP server thing
  Thanks again!
  - foretop_yardarm 4 months ago
    
    Giving examples of inputs and outputs can also help
    
    cipehr 4 months ago
    
    thank you, I will try this too. I feel like I didn't have to do this much work with other models like o1/o3/4o... but if it provide the return value I'm hearing from the hype around Claude I am willing to try.
dimitri-vs 4 months ago

Claude is exceptionally good at taking "here is two paragraphs of me rambling off about a feature I want in broken voice to text" and actually understanding what you want. It has really good prompt adherence but at the same time knows when to read between the lines.
- cipehr 4 months ago
  
  Awesome, thanks for the reassurance. I’ll stick with it and keep trying to improve my usage

hooverd 4 months ago

I think LLM codegen still requires a mental model of the problem domain. I wonder how many upcoming devs will simply never develop one. Calculators are tools for engineers /and/ way too many people can't even do basic receipt math.

jack_pp 4 months ago

Calculations are for calculators. I was good at math in school but now I struggle / take so much time doing receipt math and for what? What's the purpose of the time you spend doing it, when do you need to have your brain trained for this specific task?
- hooverd 4 months ago
  
  For me, being able to notice when you mess up your own calculations. It doesn't help that we teach arithmetic operations ass-backwards (smallest to largest instead of largest to smallest).

junto 4 months ago

Something I’ve started to do recently is mob programming with LLM’s.

I act as the director, creativity and ideas person. I have one LLM that implements, and a second LLM that critiques and suggests improvements and alternatives.

krupan 4 months ago

I hope this is a joke, but I'm guessing it isn't, lol!
- triyambakam 4 months ago
  
  It's a good idea. Get diverse model output.

jacooper 4 months ago

I find making the LLMs think and plan the project a bit worrying, I understand this helps with procrastination but when these systems eventually get better and more integrated, the most likely thing to happen to software devs is them moving away from purely coding to more of a solution architect role (aka Planning stuff), not taking into account the negative impact of giving up critical thinking to LLMs.

https://news.ycombinator.com/item?id=43057907

Other than that a great article! Very insightful.

harper 4 months ago

I actually think it is going to be way worse than you are suggesting. I think that the LLM codegen is going to replace most if not all of software eng workflow and teams that we see today.
Software is going to be prompt wrangling with some acceptance testing. Then just prompt wrangling.
I don't have a lot of hope for the software profession to survive.

avandekleut 4 months ago

This is pretty much my flow that I landed on as well. Dump existing relevant files into context, explain what we are trying to achieve, and ask it to analyze various approaches, considerations, and ask clarifying questions. Once we both align on direction, I ask for a plan of all files to be created/modified in dependency order with descriptions of the required changes. Once we align on the plan I say lets proceed one file at a time, that way I can ensure each file builds on the previous one and I can adjust as needed.

randomcatuser 4 months ago

> I really want someone to solve this problem in a way that makes coding with an LLM a multiplayer game. Not a solo hacker experience. There is so much opportunity to fix this and make it amazing.

This i think is the grand vision -- what could it look like?

in my mind programming should look like a map -- you can go anywhere, and there'll be things happening. and multiple people.

If anyone wants to work on this (or have comments, hit me up!)

blah2244 4 months ago

This is a great article -- I really appreciate the author giving specific examples. I have never heard of mise (https://mise.jdx.dev/) before either, and the integration with the saved prompts is a nifty idea -- excited to try it out!

abrookewood 4 months ago

Mise is great - it's an alternative to ASDF and remains call compatible from memory, but is much faster.

krupan 4 months ago

If I have to go to this much effort, what is AI buying us here? Why don't we just put the effort in to learn to write code ourselves? Instead of troubleshooting AI problems and coming up with clever workarounds for those problems, troubleshoot your code, solve those problems directly!

harper 4 months ago

the speed is way faster. I am a good programmer with more than 25 years of professional experience. The AI is a better programmer in every way. Why do it myself when i can outsource it and play cookie clicker?
The real thing that sold me is the entire workflow takes 10 minutes to plan, and then 10-15 minutes to execute (let's say a python script of medium complexity). after a solid ~20-30 min I am largely done. no debugging necessary.
it would have taken me an hour or two to do the same script.
this means i can spend a lot more time with the fam, hacking on more things, and messing about.
triyambakam 4 months ago

Speed. The abstraction layer has moved up. You probably aren't writing machine code anymore.

runoisenze 4 months ago

Great write up! Roughly how many Claude tokens are you using per month with this workflow? What’s your monthly API costs?

Also what do you mean by “I really want someone to solve this problem in a way that makes coding with an LLM a multiplayer game. Not a solo hacker experience.” ?

harper 4 months ago

most of this is aider / codegen:
Total tokens in: 26,729,994 Total tokens out: 1,553,284
Last month anthropic bill was $89.30
--
I want to program with a team, together. not a group of people individually coding with an agent, and then managing merges. I have been playing a lot with merging team context - but haven't gotten too far yet.
- fragmede 4 months ago
  
  Have you tried using OpenHands so you can just give it the todo.md until it gets stuck/is finished?
  - harper 4 months ago
    
    when it first was released i played with it - but haven't since. i should try again.

bionhoward 4 months ago

I don’t mind LLMs, but what irks me is the customer noncompete, you have these systems that can do almost anything and the legal terms explicitly say you’re not allowed to use the thing for anything that competes with the thing. But if the things can do almost anything then you really can’t use it for anything. Making a game with Grok? No, that competes with the xAI game studio. Making an agents framework with ChatGPT? No, that competes with Swarm. Making legal AI with Claude? No, that competes with Claude. Seems like the only American companies making AI we can actually use for work are HuggingFace and Meta.

thornewolf 4 months ago

Ignore the noncompetes. Never get sued. If you do, everyone else is on your side.
- bionhoward 4 months ago
  
  Meh, why pay to teach someone else’s bot? I’m sticking with open source
biddit 4 months ago

Form a Nonprofit X and a Corp Y:
Noprofit X publishes outputs from competing AI, which is not copyrightable.
Corp Y injests content published by Nonprofit X.

dfltr 4 months ago

> Legacy modern code

As opposed to Vintage Pioneer code?

harper 4 months ago

in my experience, there is quite a spectrum of legacy code.
Legacy modern code would be anything from the last 5-10 years. Vintage Pioneer code (which i have both initialized, and maintained) is more than 20 years old.
I am trying not to be a vintage pioneer these days.

mrklol 4 months ago

About the first step, you probably also need some kind of context that the LLM has most information to iterate with you about the new feature idea.

so either you put the whole codebase into the context (will mostly lead to problems as tokens are limited) or you have some kind of summary with your current features etc.

Or you do some kind of "black box" iterations, which I feel won’t be that useful for new features, as the model should know about current features etc?

What’s the way here?

maelito 4 months ago

Given a 3 648 318 tokens repository (number from Repomix), I'm not sure what would be the cost of using a leading LLM to analyse it and ask improvements.

Isn't the input token number way more limited than that ?

This is part is unclear to me in the "non-Greenfield" part of the article.

Iterating with aider on very limited scopes is easy, I've used it often. But what about understanding a whole repository and act on it ? Following imports to understand a Typescript codebase as a whole ?

kridsdale3 4 months ago

Well, do you as a human have the whole codebase loaded in to your memory with the ability to mentally reason with it? No, you work on a small scope at a time.
- layer8 4 months ago
  
  You may work in a limited scope at a time, but you are aware how it fits into the larger scope, and more often than not you actually have to connect things across different scopes.
  - kridsdale3 4 months ago
    
    I work on projects with hundreds of thousands of files and tens of millions of lines. I am honestly clueless how most of it fits together and working here as a human feels often not too different (just a few steps of scale higher) than LLM-based coding like Cursor.
  - jack_pp 4 months ago
    
    Well you can use an LLM similarly. Have it write docs for all your files including a summary for each function / class, ideally in order of dependency. Then use only the summaries in context. This should significantly lower your token count.
    Haven't tried it personally but it should work
    
    layer8 4 months ago
    
    In my experience, you often remember and/or discover relationships to other parts of the system during the current development task, by delving into the implementation. These relationships also aren't necessarily explicit in the code you're looking at. For example, they can relate to domain-level invariants or to shared resources, or simply shared patterns and conventions. In general you can't prepare everything that would be relevant up front.
  - lanthissa 4 months ago
    
    you do the same thing with the llm, you have it describe the api of modules not related to your code and that in place of those segments of the code.
- bcoates 4 months ago
  
  On larger codebases, I use tools heavily instead of just winging it. If the LLM can't either orchestrate those tools or apply its own whole-program analysis it quite impossible for it to do anything useful.

thedeep_mind 4 months ago

This is effing great...thanks for sharing your experience.

I was just wondering how to give my edits back to in-browser tools like Claude or ChatGPT, but the idea of repo mix is great, will try!

Although I have been flying bit with copilot in vscode, so right now I have essentially two AI, one for larger changes (in the browser), and then minor code fixes (in vscode).

ChrisRob 4 months ago

In our company we are only allowed to use GitHub Copilot with GPT or Claude, but not Claude directly. I'm quite struggling with getting good results from it, so I'll try to adapt your workflow into that setup. To the community: Do you have some additional guidance for that setup?

debian3 4 months ago

Use vs code insider if you can. They double the context size and it really makes a difference. You get 128k input token.

psadri 4 months ago

One more tool he could make is one to wrap that entire process so there is less copy/pasting needed.

sejje 4 months ago

Aider, a tool he uses, can do that automatically. He could just use that feature.
- harper 4 months ago
  
  Yea - i have had OK luck with the architect mode. but was using that as part of this (using deepseek for reasoning) and it good. but oh so slow.
  Ultimately, i would love to just use one tool
  - sejje 4 months ago
    
    I don't mean the architect mode, I mean the copy-paste mode.
    https://aider.chat/docs/usage/copypaste.html

pyreal 4 months ago

I'm curious to see his mise tasks. He lists a few of them near the end but I'm not sure what his LLM CLI is. Is that an actual tool or is he using it as a placeholder for "insert your LLM CLI tool here"?

nickrj 4 months ago

https://github.com/simonw/llm
It is linked to in the article - a brilliant utility from Simon.

fallinditch 4 months ago

Great post and discussion.

Also, don't forget that your favorite AI tools can be of great help with the factors that cause us to make software: research, subject expertise, marketing, business planning, etc.

insin 4 months ago

I liked the bit where he asked it not to hallucinate

jdenning 4 months ago

Question to folks with good workflows: Are you using tools like DSPy to generate prompts? Any other tools/tips about managing prompts?

harper 4 months ago

i really wanted to use DSPy to generate prompts, but it wasn't quite as compatible with my workflow as i wanted. I love the idea tho - code instead of strings.
i will dig in again. It is an exciting idea.

mark_mcnally_je 4 months ago

I'm a bit confused here, what promt do you use to start Aider and how do you just let Aider run wild so you can play cookie clicker?

harper 4 months ago

the prompts are generated from the planning steps. If you were to follow the prompts in the planning phase, you would get output that is clearly the "starting prompt"
that is the first thing you send to aider.
also - there was a joke below, but you can do --yes-always and it will not ask for confirmation. I find it does a pretty good job.
fragmede 4 months ago
Am I overthinking
```
    yes | aider
```
- mark_mcnally_je 4 months ago
  
  Hahahaa well that might work but I wish you could just say `aider --go-hog-wild`
  - fragmede 4 months ago
    
    fwiw it doesn't/that was a joke. In some cases the LLM will suggest running a command that doesn't terminate (eg npm run dev to run a webserver), so it'd get stuck running that command just waiting for user input.
    
    mark_mcnally_je 4 months ago
    
    :( big sad
    
    e12e 4 months ago
    
    Surely you can just ask the ai to make sure the code will terminate before it runs it? /s
    
    fragmede 4 months ago
    
    You jest, but given something that's in it's training data, ChatGPT's able to say the code won't terminate (eg a code to give all the Fibonacci numbers), so there are subsets of the halting problem it can solve for.
matsemann 4 months ago

Yeah, everyone is saying this article is great, but it's akin to "then draw rest of the owl". It's very detailed in planning, and then for execution just "paste prompt into claude". What prompt?

bill_lau19 4 months ago

I learn this things from this blog: 1. Use multi turn with LLM tools to finish a job. 2. Work step by step.

sprobertson 4 months ago

Strange capitalization of atm as ATM in the HN title, but great tips in there

Philpax 4 months ago

HN will implicitly modify the title, including uppercasing acronyms. Very possible this was one of those changes.

snowwrestler 4 months ago

Spelling nit:

“Over my skis” ~ in over my head.

“Over my skies” ~ very far overhead. In orbit maybe?

matsemann 4 months ago

> “Over my skis” ~ in over my head.
Is that correct? Never heard the expression before, but as a skier if you're over your skis you're in control of them, while if you're backseated the skis will control you.
- jdlshore 4 months ago
  
  It seems to be currently popular corporate-speak for "overextended." I've heard it a bunch lately. Never really thought about whether it was accurate, though!
pyreal 4 months ago

Thanks for that clarification! I was wondering what skies had to do with skiing.
harper 4 months ago

fixed! thanks

zackify 4 months ago

Cline over everything for me

oars 4 months ago

Good for greenfield projects.

timmyers 4 months ago

[dead]

dkkergoog 4 months ago

[dead]

tyiz 4 months ago

[dead]