It’s no secret that I was highly skeptical of LLM-s.

Cool, there is this new thing that can spit out plausible text and create cheap-looking images and videos, resulting in a lot of low-quality content being shared.

It was also a huge disappointment to see a human-written post that’s excellent, only for it to be cheapened with a generic AI-generated image as the cover. 1

A few years into this wild ride, I have partially changed my view because some people have figured out how to make LLM-s actually useful, and I like that part a lot. Not the part where the industry is killing my hobby and increasing energy usage worldwide, but there are some parts that I genuinely find useful.

What changed?

If you’re not that excited about LLM-based tools or have otherwise strong opinions on them, then please read the final words first.

Background

I hate the online discourse around LLM-s, or “AI”, or now that I think about it, everything. Everyone has their own opinion that they hold as absolute truth, arguments spawn from a few sentences, but nobody in the threads provides the actual context around their views and experiences, which is critical to properly understanding and arguing with a statement.

This has especially been true around LLM-based tools.

“I tried LLM-s, I absolutely hate it.” -> someone who used fancy autocomplete powered by one of the many Copilot-branded services and was disappointed because hype had inflated their expectations and the result fell short of them by a large margin.

“LLM-s are great and will replace engineers, it’s a game changer.” -> they vibe-coded a to-do list app over a few hours with more vulnerabilities than working features.

It’s horrible in both extremes.

The sad part is that I understand both perspectives.

Looking at the whole situation from the outside in, it’s pure insanity. Sketchy financial deals, datacenter build-outs with very real environmental costs that we all pay for, supply chain crunches that are not helped by senile old men starting new wars before they’ve finished their existing ones, and every service adding some AI component even if it makes no sense: all of it is genuinely frustrating.

And yet when you jump in head-first with no assumptions, it feels like a world where previously impossible or frustrating tasks are now solvable by anyone who knows how to wield this type of tooling. It’s a world of optimism, experimentation and rapid development. No more “here’s a new JS framework” level of depressing churn, we have people who are experimenting with changing the whole landscape of software engineering!

The context that I’m working in has so far been one of the best case scenarios for experimenting with this type of tooling:

  • relatively young product with a predictable, classical technological stack (Java, Spring Boot, Svelte, Docker, Linux VMs)
  • small team with a modest degree of autonomy and encouragement to experiment with new tooling where it makes sense
  • a decade of professional experience in IT and a long list of incidents that I’ve had to resolve
  • need to be as productive as possible with a small team, to be able to build big things

What follows is my experience in a roughly correct timeline. Some of these findings and thoughts can feel like old news, but that’s the order in which I experienced them.

July 2025: ChatGPT and Copilot

One of the first things I picked up on was how easy it felt to ask about details of things that I know something about, but needed quick clarification or examples for. Googling was a two-step process: come up with a query, and then work through the results to find what you need.

LLM-s? I used Google search terms as prompts verbatim and got what I wanted, fast. Most of the time the answers were correct, or at least correct enough to be useful to me.

When it came to development, I thought that hey, let’s try out Copilot that we could use through our GitHub organization. I use IntelliJ IDEA, and this one had a plugin that integrated with it, so it felt like a good option to go with.

I tried the fancy autocomplete option first, and it was an immediate source of frustration. It was slow enough to be unusable, and once the results did arrive, they were useless.

The agent option was more useful, but you had to manually give it the necessary context, and it felt clunky. It did an okay job of writing new tests based on previous examples, but nothing revolutionary.

July-August 2025: the turning point

I was pretty disappointed at this point with this level of tooling.

Then, we had an urgent issue that we had to resolve, but based on our estimates it would’ve taken a few days to implement and properly test. We didn’t have that luxury.

My colleague was trying out Cursor at that time. They took it, looked at the problem, and figured out a tested, validated and correct solution in about two hours total. I know that because I validated that solution myself.

At that point I realized that there is something very interesting going on here with LLM-assisted tooling, and got curious.

August 2025: Codex and Claude Code

We agreed in the team to go into experimentation mode and to try out different tools to see what works for us. Cursor was already taken, so I looked at alternatives.

I’d used JetBrains products for over a decade at that point, so I looked at their AI offering, Junie, but I ruled it out pretty quickly after stumbling on some forum threads where users were tearing JetBrains to shreds for offering an AI product where you can run out of a month’s worth of token allowance within mere hours. In hindsight, it all makes sense now: tokens are actually expensive, and JetBrains made the tragic “mistake” of not subsidising the cost of tokens with billions of VC funding.

Then I looked into Claude Code. At that point, it was quite a young product, having been available for about half a year. Its main selling point for me was the fact that it ran in a terminal window, allowing me to keep using IntelliJ IDEA while operating in an environment that felt native to me as a Linux user.2

Claude Code felt like magic. I give it a prompt, it goes and finds relevant context by reading through potentially related files, right there on my disk, and it could also call tools and scripts. “Hey, let’s rename this enum from BAD_ENUM_PATTERN_HERE to something better”, and then it would actually do it. Doesn’t sound super impressive once you realize that an IDE can do the same thing much faster as long as you come up with the new name yourself, but it felt magical. The way that it showed the diffs and the overall progress and steps felt natural.

As a Pro tier user, I ran into the 5-hour quota a lot. Whenever that happened, I tried out Codex as I already had a ChatGPT subscription and I had nothing to lose.

Codex was a mixed bag. Sometimes it would do a good job, but in its default settings it felt slow, while with Claude Code I felt that it was just ripping through doing useful work. Tune Codex to be faster, and its output degraded noticeably. I realized quite soon that I prefer quick feedback and iterating more on a solution compared to trying to one-shot it with Codex, so after I upgraded to a Max 5x plan, I left Codex behind.

I have a strong technical background from an era before this type of tooling was available. Equipped with Claude Code, I felt like I had superpowers. I knew what needed to be done, what failure modes are common, what to protect against, what to keep in mind when rolling out new features, and how to resolve incidents. This tool just made all of that faster and even more accessible. And at the same time, I could more easily detect at a glance if it was giving me garbage answers.

As a relatively new joiner in the team, Claude Code was also a fantastic way to speed up my own onboarding to the product and the technical aspects. Previously, finding answers to project or domain specific questions was an exercise in good IDE usage and building a mental model for yourself. Now, anything I needed was a few well thought out prompts away.

For me, this marks the “oh shit” moment with LLM-assisted tooling.

September-December 2025: optimism, experimentation and crunch

Claude Code and Cursor soon went from a fun thing to experiment with to a critical tool that we had to make the most of out of necessity. Deadlines loomed, and even with great engineers, there is a practical limit to what you can achieve if there aren’t too many of them available.

This is the time when we pushed the tooling further and further. Claude skills became a thing around then, so we started collecting project-specific input and general guidance under those.

I found Claude Code to be the most useful by running it in its bypass permissions mode, but I also valued my home folder not being deleted by accident, so I vibe-coded a basic sandbox container in which I can safely run Claude Code with filesystem-level isolation. It also allowed me to run multiple Claude Code instances in a way that prevented them from interfering with each other, which opened the door for running some wild-ass ideas and experiments in the background.
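A minimal sketch of the filesystem-isolation idea, assuming Docker and a hypothetical image called `claude-sandbox` that has Claude Code installed. The real container may look quite different; the point is only that a single project directory is mounted, so nothing else on the host is visible from inside.

```python
from pathlib import Path


def sandbox_command(project_dir: str, instance: str, image: str = "claude-sandbox") -> list[str]:
    """Build a `docker run` argv that exposes only one project directory.

    `image` is a hypothetical name; it is assumed to contain Claude Code.
    """
    project = Path(project_dir).resolve()
    return [
        "docker", "run", "--rm", "-it",
        "--name", f"claude-{instance}",   # one container per instance
        "-v", f"{project}:/workspace",     # the only host path visible inside
        "-w", "/workspace",
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(sandbox_command(".", "experiment-1")))
```

Giving each instance its own checkout of the project, and therefore its own mount, is what would keep parallel instances from stepping on each other.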

Integration tests are taking too long to run? Let Claude Code come up with optimization ideas, and let it put together a benchmarking plan. Most of the recommendations did not do much, but a few lines made integration tests 10% faster!

Worried about your authorization setup containing holes? Give that hunch in as input, let Claude Code do some checks, and validate the findings. Whoops, some endpoints were unguarded? Write tests that demonstrate the issue, then let Claude Code fix it. What would’ve taken weeks took mere hours to improve.

We also started seeing first signs of what happens when you push too hard with this level of tooling. With a looming hard deadline and stress, it was not uncommon to see 5000-line PR-s which were hell to review. Vibe-coding artifacts slipped in, subtle bugs became issues that needed to be rectified. Transactional boundary related issues were especially easy to slip in, and difficult to rectify.

And no matter how much you instruct Claude Code, it will ignore a non-zero percentage of the instructions at all times. Using var or deciding to write out full package names for defining a variable type were common and yet basic annoyances.

Product and model churn

When using a tool like Claude Code for the better part of your work day, it will naturally become a critical part of your workflow.

A critical part that is under rapid product development.

Sometimes the improvements are positive and genuinely useful.

Sometimes you’re hit with a bug that results in a heavy memory leak, leading to all Claude Code sessions terminating after a few minutes due to being OOM-killed.

I feel like a subject in a grand experiment. It makes sense from Anthropic’s perspective, you have to experiment and try out new things to see what works, but as a heavy user of the tool, it makes every working day a game of lottery and introduces an additional source of uncertainty.

Lately the situation has improved somewhat, but then Anthropic has had constant scaling issues. I have the benefit of working in Europe, so I can get my work done before the US wakes up and demolishes their servers with high load or buggy releases, but even then I’m not immune to outages.

The models behind Claude Code have also seen a rapid release cadence, which seems to follow a pattern of:

  • new model released
  • it is better than previous ones
  • few weeks later you can feel some level of degradation, you see more complaints online
  • back to step 1

Purely vibes-based, but it certainly feels that way.

That leads me to one of the biggest frustration points with tools like Claude Code. When everything is changing so rapidly, and you have no idea what experiments you’re in or what toggles Anthropic has just changed, how are you supposed to reliably get useful output with this type of tooling? Not to mention that LLM-s are still fundamentally non-deterministic, which spices things up even more.

It feels very chaotic and could very well be normal “early adopter” pain, but it doesn’t change the fact that this level of uncertainty contributes to feeling burnt out.

Agentic coding may very well be the norm in the future, but in an era of wild experimentation I feel it doesn’t make sense to build a meaningful amount of supporting infrastructure on top of a foundation made out of sand.

The real AI productivity gains

A large language model alone is not that useful. Put a chat interface in front of it, and things get more interesting. Give it the ability to call tools and source the necessary information itself, and now you’re cooking.

AI based tooling has been marketed a lot as a major productivity booster and I agree that it does help with that, with a few dozen asterisks and nuances. However, I’ve observed that most of the actual gains seem to come from things like ignoring good practices. You will do more by putting Claude Code into auto mode or the spicier bypass permissions mode, and if you give it access to Slack, Notion, Jira, Linear, Google Drive, GitHub and more, it will have no issues gathering necessary context and performing boring actions on your behalf.

Need to mass-create Linear tickets and set proper dependencies between them? Claude Code is genuinely useful here.

But what happens when Claude is tricked into performing malicious actions? Or Claude just goes wild and deletes your company’s Google Drive?

It’s a lot of trust put into a rapidly growing company headquartered in the USA. A few years ago, you would have been fired for sharing your intellectual property and internal company information with a third party, but now it’s called AI-native something-something and you’ll fall behind if you don’t use it.

We’ve given everyone a loaded revolver without explaining things like risk management, threat modelling, data privacy and GDPR, and how to reasonably deal with all of that while balancing it with productivity gains. The pessimist in me says that it will have consequences sooner or later.

The bottleneck

Humans are still the bottleneck. In an established product, you will have actual paying clients, and people who depend on your product. I don’t believe that going full vibe-coding-superstar-engineer in such a context makes a lot of sense, which means you still have to understand, review and test your own changes. But that takes time and effort. It always has taken time and effort, but with code being cheaper to produce, it’s ballooning.

I’m working in a team where I have high trust in my fellow engineers, which means that we are trying things like reviewing the high-level plan of an intended change and not necessarily the final end result; reviewing that is left to the implementing engineer. This should help us achieve more while having basic architectural-level thinking and checks in place, and it discourages 5000-line PR-s because the author needs to review those themselves. Jury’s still out on that one, and we do have exceptions, like still reviewing junior engineers’ work to give them better feedback while they grow into experienced engineers.

Some try to solve the AI unreliability issue by adding more AI to review AI code. We’re also giving that a go with a custom skill that amounts to just calling each project skill depending on the context of the changes to try and flag some areas that may need more consideration or that don’t make sense given the intent of the changes. It’s okay, but not a replacement for a human review. Claude Code can complain about a function not being performant enough while a human reviewer can identify that the changes can be completely skipped because we can solve the problem with a product-level decision, or an existing query can solve the same issue in a more elegant way.

It seems that a combination of classical tooling (linters, formatters, static analysis) and LLM-level insights is an approach worth trying for doing reviews, but you’ll have to layer them on to have a chance to have meaningful and somewhat reliable results, which means a high token spend. What are you willing to pay for an LLM-assisted code review? 1 EUR? 10 EUR? 100 EUR?

But review is rarely only about the code. Does the solution achieve what it’s supposed to do? Is it the best way to solve that problem? Does it actually work when put into the hands of actual customers?

The good news is that you can also make it easier to set up local development environments with production-like data and custom convenience tooling using tools like Claude Code. The productivity gains from simple internal tools like that are insane and allow you to do more, safely. But it will still take time, focus and context switching, and you can’t really skip that because LLM-based tools often have weird failure modes with their output that may only come up during a manual test of the whole solution.

Bashing out e2e tests for each new feature that demonstrate its functionality and correctness seems to also be a solid approach in a greenfield project where you’re prototyping something quickly and then elevating it into something that can actually be used, reviewed and released.

The economics

Subscription-based pricing is still here for now and all I can say here is that we should take full advantage of that while we still can to improve parts of our world that we have control over.

Let the investors subsidize tackling the technical debt in your project, or performing that maintenance you postponed due to lack of resources, or experimenting with some wild-ass ideas. At some point that is going to change: API-based pricing is a better reflection of the actual costs, and it’s not looking great.

Screw tokenmaxxers though, you’re ruining it for the rest of us.

LLM-s as a force of good

A lot of discussions out there around LLM-s seem to be focused on the slop angle. These tools certainly make producing slop much easier compared to copying answers off of Stack Overflow, but that doesn’t mean that you have to use them to go fast and break a lot of things. You can use the same tooling and do what you’ve been doing already, but with more intent and much higher quality.

After adopting LLM-based tooling, I have observed these positive changes in my day-to-day work:

  • code is better tested
  • number of TODO-s is dropping
  • investigations into customer questions and fixes to one-off problems are way faster and more correct
  • improving platform security doesn’t have to wait for Q4 2027 any longer
  • I have more time to think about the high-level architecture of the solution and play around with different approaches, evaluating them against our requirements and limitations
  • existing parts of the platform are much more resilient now as a result of applying experience from past incidents
  • project patterns, practices and agreements are documented
  • moving towards infrastructure-as-code setup is much more approachable, especially to other engineers in the team that don’t have a lot of exposure to this area
  • we’ve resolved major performance issues on the fly and made proactive performance improvements that have avoided a lot of issues during periods of high load and scaling the platform

This aspect is what I love about LLM-assisted tooling. I can take my experience and strong technical background, plus all the countless painful incidents I’ve worked through, and apply those lessons in my current work, at a faster pace, and yet with better quality.

Feels like a superpower, but you have to apply it properly and with rigor to make the most of it.

AI vs my self-hosting hobby

This positivity has also expanded into my hobby, which involves managing my fleet of machines via Ansible and hosting a bunch of services in containers. Validating my existing Ansible playbooks and coming up with new roles on the fly whenever I add something to my setup is much more approachable. My free time is much more limited nowadays and games like Forza Horizon 6 don’t help there, so dabbling with my hobby for a few hours here and there and actually achieving something is genuinely great.

To balance that excitement out, the computer parts market has gone to shit. With everything being much more expensive, I’ve reworked my setup to use what I have and to pray that no expensive parts die. I’ve stopped watching most videos of new hardware as a result, because it’s hard to become excited about a new mini PC that is outside my budget.

I’m not sure where I stand with my hobby now. With LLM-assisted tooling, I’ve blasted through my ideas to-do list there and fixed issues that have bothered me a lot, and yet I’ve lost the excitement on the hardware side because I won’t be buying new platforms anyway.

One area that remains unexplored is running local LLM-s. Other than that, I’m not sure. I suppose I’m taking a small break from it for the first time in 10+ years, and that makes me sad.

LLM-s and this blog

This one has not changed, this blog is my voice and replacing that with the one from a machine is still a no-go for me.

I have featured content where the subject of the post was thrown together with LLM-assisted tools for jokes that realistically only a handful of people reading this blog will get. That’s still fine by me, and I encourage having fun. Otherwise, what’s the point of living?

The non-determinism

It has been 0 days since Claude Code has made up a link to a pull request within our own repository to which it has full access via the GitHub CLI.

It’s not a new phenomenon that LLM-s make up plausible shit, and yet it keeps frustrating me every time that it does that. The profuse apologising certainly does not help.

“Oh yeah mate I totally forgot that I shouldn’t wrap every line of code in a try-catch block, that is on me, I will do better.” and then it does the same thing 2 minutes later.

God, I hate that.

The solution to this is, once again, to layer more AI on top. I suppose if your tools are correct 95% of the time, and you do the same thing repeatedly, then eventually you’ll get close to being 100% correct, but never to 100% exactly.
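The arithmetic behind that intuition can be sketched in a few lines. It rests on a big assumption that is labeled in the code as well: each pass misses a mistake independently of the others, which is generous for checks that share the same model and the same blind spots. The function name is made up for illustration.

```python
def residual_error(accuracy: float, passes: int) -> float:
    """Chance that all `passes` checks miss the same mistake.

    ASSUMPTION: every pass fails independently with probability 1 - accuracy.
    Real LLM reviewers tend to share blind spots, so this is a best case.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if passes < 0:
        raise ValueError("passes must not be negative")
    return (1.0 - accuracy) ** passes


if __name__ == "__main__":
    for passes in range(1, 5):
        print(passes, f"{residual_error(0.95, passes):.6%}")
```

With 95% accuracy, the miss rate drops from 5% after one pass to 0.25% after two and 0.0125% after three. It shrinks fast, but it is a product of positive numbers, so it never reaches zero.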

The worst part is when it outputs Java package names belonging to actual software development consultancies in Estonia. Did they leak something, mix up some sessions, or does it come from the training data? Do I want to know the answer to that?

The dumb-ass babysitting

In the pursuit of “safety”, providers like Anthropic have crippled the functionality of their solutions in certain areas, such as cybersecurity. Ask Claude Code to help write a proof of concept for a known vulnerability against your own service, and it will politely refuse or hit you with an API error.

Great, I didn’t really need to test my own service that I’m responsible for against a type of actively exploited vulnerability that could end the business in one go. Thanks, Anthropic, you’ve really made the world safer now. /s

Judgement

Turns out that all the experience I’ve accumulated is not useless, it’s become much more critical.

More often than not, you need to use your own judgement when making changes, choosing between alternatives, and just plain thinking about the issue at hand.

I can give Claude Code a well-thought-out prompt, highlighting common patterns that we need to tackle and address, and it will do an okay job, or at least that’s what it looks like.

But when I investigate the result, I still see areas that it misses because it lacks the wider context, or is blissfully unaware of alternatives, or it just gets its investigations really wrong by making shit up on the fly or misunderstanding a functionality completely. Press it on some findings, and you’ll often find that it did a really shitty job, actually, and you can improve the solution a lot.

Interestingly, I’ve found myself arguing about a topic with Claude Code, only to then discover with a manual investigation that I was in fact very wrong and Claude Code was actually right. Usually that’s followed up by a documentation update or a refactor clarifying the solution, but those sessions serve as a good reminder that I’m not that infallible myself.

How I work vs how Claude Code works

It’s interesting to observe how Claude Code operates. In a lot of ways, it mirrors how I operate.

I have a problem that needs solving. Okay, let’s gather more context, search for relevant files, check some historic Jira tickets on that topic for good measure. Do some Slack searches. Try to get the full picture.

Now that I have that, I can try to come up with a solution. Often that ends up with minor changes, at other times I will copy-paste existing files to create a new endpoint, adjusted for my use case, named properly. Maybe I’ll add a few tests for good measure.

Claude Code does all of that, but better. I find it so much easier to judge a proposed solution than to write it all from scratch. I was never the person that enjoyed tackling compilation errors, or checking why once again my tests don’t work because of some Mockito nuance. All that focus is now spent on brainstorming a solution, improving its design, and thinking about security, performance, compliance, architecture and how it all fits together. I’ve rarely worked in a team where those items got the proper attention that they deserve.

Skill atrophy

There are concerns out there around skill atrophy when relying on LLM-s too much. I’m not too concerned with that.

I learned to write using a pen and paper, but picked up on writing on a keyboard at a modest speed3, and yet I’m much faster with it. I haven’t forgotten how to write in cursive, it just looks less beautiful than it did when I was younger, and that’s OK.

If LLM-s disappeared right this second, I’d revert to the old ways of working. Sure, the pace would be slower in the short term, but I’d make some choices and changes to ways of working and expected pace, and shed expectations and workloads that I wouldn’t have time for.

Did you forget how to ride a bicycle the moment you got your first car?

Good practices are socially acceptable now?

One interesting observation is that every good practice of classical software engineering has now become a requirement to use LLM-assisted tools effectively. You know, those items that you had to fight for prioritizing in a poorly functioning organization?

You should have documentation, and it should be kept up-to-date. Amazing insights!

Yes, you should tackle that tech debt now because otherwise Claude Code will make use of deprecated features and fields and introduce more legacy code!

Having tests that catch regressions is good!

Functional, stable, performant CI/CD pipelines and team processes are foundational to a well performing engineering team, who would have thought?

Those who were already doing a good job are now doing great, and the poorly performing teams are suffering when applying the same tools.

Async development

If you’ve followed my blog for a while, then you’ll know that I have a home server that’s on 24/7.

This has allowed me to spawn a Claude Code instance on a separate VM inside of it that mirrors my setup at work, and I’ve used that always-on playground as a way to tackle annoying long-running tasks or wild-ass investigations and tests that take hours to complete.

For example, we are firm believers in rebasing changes on top of the main branch, but if you have a bunch of PR-s ready to merge, it goes into an annoying cycle of rebase, update other branch, wait for CI to run, complete, start again. Turns out that you can prompt Claude Code with a simple automation loop and it will take care of that by itself, including the resolution of conflicts.
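The cycle itself is simple enough to write down. The sketch below only lists the commands for one branch at a time; using the GitHub CLI to wait for CI and to merge is an assumption about how such a loop could be wired up, and the function name is made up. Conflict resolution, the part where the agent earns its keep, is deliberately left out.

```python
def rebase_queue_steps(branches: list[str], main: str = "main") -> list[str]:
    """List the shell commands for merging `branches` one after another.

    Each branch is rebased on the freshly updated main branch, pushed,
    checked by CI and merged before the next branch starts.
    """
    steps: list[str] = []
    for branch in branches:
        steps += [
            f"git fetch origin {main}",
            f"git switch {branch}",
            f"git rebase origin/{main}",        # conflicts would surface here
            "git push --force-with-lease",      # rebasing rewrites history
            f"gh pr checks {branch} --watch",   # assumption: block until CI finishes
            f"gh pr merge {branch} --rebase",   # assumption: merge via the GitHub CLI
        ]
    return steps


if __name__ == "__main__":
    for step in rebase_queue_steps(["feature-a", "feature-b"]):
        print(step)
```

Every merge moves the main branch, which is why the fetch and rebase are repeated for each branch instead of being done once up front.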

For larger investigations and technical migrations, I have successfully set up a prompt with a goal, some guidance, and my expectation that it runs autonomously. I can come back to it the next morning and review its output. It is straight up magic to have the computer work on a Spring Boot 4 upgrade while I’m playing Forza Horizon 6 (after work, of course).

It’s also possible to schedule some work in advance. If my 5-hour quota gets refreshed at 19:00, I can set Claude up with a goal and instructions to start at that specific time, meaning that you can use your AI subscription plan to make the most of your AI subscription plan.
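How the start time gets scheduled is not spelled out here, so treat the following as one simple way to do it rather than the actual mechanism: compute how long to wait until the next 19:00 and sleep for that long before kicking off the work. The function name is hypothetical, and the calculation uses naive local time, so it ignores daylight saving changes.

```python
from datetime import datetime, timedelta


def seconds_until(now: datetime, hour: int, minute: int = 0) -> float:
    """Seconds from `now` until the next occurrence of hour:minute."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # already passed today, so use tomorrow
    return (target - now).total_seconds()


if __name__ == "__main__":
    delay = seconds_until(datetime.now(), 19)
    print(f"sleep {delay:.0f} seconds, then start the agent")
```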

I’ve long dreamed of setups where my laptop is a very basic machine with great battery life, and all the heavy lifting happens on a powerful remote server. With classical development, that approach would’ve included a remote desktop setup. The necessity of a good internet connection was a major blocker for using such a setup for all of my work, and video compression artifacts make text look like trash.

With Claude, you can just run it in a terminal, over SSH. All you’re moving is text back-and-forth, which is infinitely more performant even in low internet connectivity scenarios. May not be the best flow for front-end or design-heavy work, but you can successfully offload a wide variety of activities to a remote Claude Code instance hosted on your hardware.

This is what this tooling should allow us to do: achieve more while spending less time. We’re not there yet, but it’s a goal we should aspire towards instead of the productivity gains quietly slipping into the pockets of billionaires.

Zero predictions, many questions

At the current technical level, I don’t believe that we can reliably shift to a model where a coding agent takes in human input and you’ll have a reliable, tested and correctly architected solution that fits together with the rest of your project, with zero human review in the process.

If you put in a lot of effort into building a custom harness, adding layers of checks on top, and keeping that machinery running with active maintenance, you will likely reach a point where you can somewhat reliably use this approach to get solid results. To get to that point, you will need to shift your focus from building your product to becoming a professional harness engineer, and the end result might cost a lot of tokens to run.

Is that sacrifice worth it, and will that same approach remain working in 6 months?

We’ve already seen that you can build a spaghetti architecture and end up with an unmaintainable dumpster fire of a product using classical engineering approaches. Once you reach that point, any progress grinds to a halt and you’re stuck fighting fires while losing customers. You can reach that point faster if you build more with LLM-assisted tools without having a proper plan and architecture in place. What use is a harness if you can’t build anything impactful with it?

You can take that tooling and augment your own work in a positive way, making iterative changes and trying out new approaches and ideas at a sustainable pace that doesn’t steal focus from your product that you’re supposed to be working on.

It’s also clear that the demand for this type of tooling is there. 200 EUR/month subscriptions for a tool were not the norm even a few years ago, and here we are with people happily paying that and still finding that it brings great value to them.

Since the space keeps evolving and external forces are at play, such as infinite money glitches not being a thing in real life, there are some questions that I’m keeping a keen eye on, even if there is a factor of morbid curiosity there that stems from a desire to see how it all plays out in the end.

What will a successful engineering team look like a few years from now?

If the real cost of tokens is passed on to consumers or availability suffers dramatically due to an event, then what will happen to existing AI-first workflows?4

At which point is the tooling too expensive to use?

When will locally runnable open weights models and open source harnesses be good enough to replace tools like Claude Code?5

When will a state-of-the-art model from Anthropic or OpenAI be leaked?

When will Anthropic/OpenAI get hacked in a catastrophic way and what implications will it have for, well, everything?

Final words

If you work in an engineering position and you’ve avoided relying on this type of tooling, leave the very real downsides and risks aside for a moment and give it an honest try. Push its limits. Do something with it that brings joy. After that, you’ll at least have a more informed opinion on this type of technology, and perhaps it could end with renewed interest in a practical application of LLM-s that could branch to using open source coding agents and harnesses, and exploring various locally runnable open weights models that are desperately needed to seize the means of code production.6

If you’re heavily using this type of tooling already to move fast, then take a break. Move slowly. Act with intent. We have a choice to either build more and faster, or to build what we already wanted to build, but with much better quality. Before LLM-assisted tooling came into the picture, we were already in a software crisis where too much was built with not enough quality controls in place and with maintenance, security and performance being distant afterthoughts. Now, we have the means to better address those areas. Don’t waste this chance to make the software world a better place, and through that the real world.

Despite the challenges and very real near future risks around relying on this type of tooling, I remain cautiously optimistic and will keep using an LLM-first approach to building and maintaining services and infrastructure. For now, the productivity gains and enjoyment are outweighing the feeling of being burnt out.

If it doesn’t work out, then I will sleep well knowing that I have my beekeeper’s hat waiting for me.


  1. my unofficial policy on my own blog post covers is simple: if I don’t have a topical one, I’ll pick one with a cat from my personal collection, or scribble something together in GIMP. The one on this post is my beloved cat Tux sitting on top of a ThinkPad X230 that has one of those chonker docks on it. She is an absolute delight of a cat. In fact, she is the best cat, period. ↩︎

  2. btw I use Fedora ↩︎

  3. I learned to touch type one afternoon, but, like, half-way. ↩︎

  4. this is a topic that’s actively playing out with more providers moving to token-based pricing instead of subscription-based fixed price plans. ↩︎

  5. geopolitically motivated competition in the realm of AI could end up being beneficial for the rest of us after all. ↩︎

  6. I love the approach that Wendell from Level1Techs has taken: embrace the new technology, but be mindful of the very real downsides and risks. Instead of putting your head in the sand or trusting big providers blindly, fight for the right to run local models on hardware that you control! It’s self-hosting, but taken to LLM-s, and I’m fully on board with those ideals and ideas. ↩︎