Ewan Valentine | My thoughts on AI editors

I was initially fairly excited by the idea of AI editors, such as Cursor. I’d used Copilot for a while when it first came out. I found the auto-completions relatively useful, impressive at times, in fact. So the concept of agentic editors sounded cool. I tried Cursor in its infancy, prior to the agentic integration. It was scarcely more useful than the Copilot plugin in my existing editor. So I wasn’t blown away. However, when the agentic features started to be rolled out, I could see the potential.

I used Cursor for about a month, in a variety of different projects. For building simple websites, and setting up small projects from scratch, I found it to be incredible in terms of scaffolding a project and getting something up and running quickly. However, when I tried to use it for more complex projects, and existing projects, the cracks began to emerge.

Models

I found that the default, cheaper model that comes with Cursor varied a lot in quality. For smaller projects, it would perform fairly well, most of the time. But, often, it would take paths that overcomplicated the original issue, without actually fixing the real issue. Despite me describing the issue in great detail and providing lots of context, it would just keep adding more and more layers around the issue, without actually fixing it.

I started reading about how impressive models like Claude Sonnet 4 were, so I switched models. This reduced the previous flakiness fairly drastically. However, it would still often make problems more complicated than they needed to be. It would also dive down rabbit holes at an alarming rate.

You are absolutely right!

I caught Sonnet heading down a path that made no sense to me at all. So I intervened and said “hey, what about X”. I realised it would often say “you are absolutely right!”, change course, and do something entirely different. This irked me. There was no discussion, no attempt to further understand the problem or where it went wrong the first time. Just a complete acceptance and change of course.

This eroded a lot of the trust I had in what it was actually doing, so I realised that I had to do a lot of hand holding and review every single line it suggested, to ensure it had understood the task at hand. At that point, I stopped worrying about being replaced by AI. It was merely saving me typing out all of the code manually. But my experience, knowledge, and understanding of a code base was still absolutely vital.

Start again…

The speed of some of the premium models was slightly tricky to keep up with. Especially the Claude CLI, which presents all of the changes in a terminal window, in the form of a Git diff. So you’re essentially attempting to read hundreds of diffs in a short space of time, often growing increasingly nervous about what it’s up to. Eventually, it boldly declares it’s “Done! ✅🎉”. And you’re not entirely sure what it’s done through the barrage of commands being run, amidst a sea of diffs being presented to you in short order.

Often at this point, the only way to verify it’s performed its task as expected is to open up the app, or run the broken command again whilst holding your breath. It’s usually at this point, in my experience, that you realise it’s made things ten times worse. New, fancy, emoji-laden logs appear everywhere, and you find it’s added hundreds of changes, just to face the exact same problem as before, or, in many cases I found, made the original problem even worse.

$ git stash - sigh.

Usage Limits

Just at the point where I’ve had to stash all the broken, overly complex changes and start again, I get hit with “You’ve reached your individual usage limit”. Great, so I’ve just spanked millions of tokens, whilst probably already facing some kind of eye-watering bill at the end of the month, on an hour-long rabbit hole session, where it completely failed. I’d estimate roughly 50% of my usage is intervening and trying to guide it out of a rabbit hole.

Often, I’m still in a rabbit hole when I’m unceremoniously cut off. Usually, at that point, I try to switch back to the default cheaper models, which seem to have no clue what’s happening, and somehow manage to make the situation even worse. And I end up back at the drawing board, fixing it myself, having wasted an hour arguing with a bot (just like being on Twitter).

When I run out of premium credits on a personal project, I will just put my laptop down and do something else. On work projects, I would revert to my own pink squidgy AI model rather than use the default models. That’s how little value I get out of those most of the time. So having the current premium models readily available and affordable feels like the minimum acceptable baseline, in terms of me wanting to use these tools full time.

Dangerous Rabbit Holes

This is perhaps the most concerning one. Sometimes the premium models get so carried away, and so confident, whilst making changes so wrong that, had I not noticed, they would have caused serious problems. I’ve heard horror stories where these models have completely ruined people’s projects, permanently deleted files, and wiped databases.

Reminder: if you’re using an AI editor for anything other than a throwaway project, don’t give it complete access to your filesystem, and don’t allow it to run destructive actions where you don’t fully understand what the impact is. Don’t use it in projects that aren’t version controlled, and dear god man, don’t give it free rein over your production database.

Will I stop using it?

Probably not, is the short answer. I still see the value in it, and when it works, it saves me a lot of time. When these tools get to the point where more time is saved when things go right, than wasted when things go wrong, I’ll be pretty happy with it. But, for that to happen, the current batch of premium models needs to become much cheaper. I don’t begrudge paying £20 a month or whatever it is, for time genuinely saved. Hell, I’d pay a little more than that. But when you’re having to top that up multiple times per month, just to avoid a heavily degraded experience (and even that costly experience isn’t great, half of the time…), that is completely untenable.

I’m tempted to stop using them, for now… But a part of me knows I’ll deeply miss the times when it goes really well. I’ll probably continue to use them on less complex, personal projects, just to get them up and running. But for anything remotely serious, I’ll probably hold off, and try again in another 6 months or so when the landscape has inevitably changed again.