<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>./techtipsy</title>
    
    
    
    <link>https://ounapuu.ee/tags/llm/</link>
    <description>Recent content on ./techtipsy, a blog written by Herman Õunapuu.</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <managingEditor>ihavesomethoughtsonyourblog@ounapuu.ee (Herman Õunapuu)</managingEditor>
    <webMaster>ihavesomethoughtsonyourblog@ounapuu.ee (Herman Õunapuu)</webMaster>
    <lastBuildDate>Mon, 08 Jun 2026 09:00:00 +0300</lastBuildDate>
    
	<atom:link href="https://ounapuu.ee/tags/llm/index.xml" rel="self" type="application/rss+xml" />
    
    
    
    <item>
      <title>My experience with LLM-assisted tools in software development</title>
      <link>https://ounapuu.ee/posts/2026/06/08/llm/</link>
      <pubDate>Mon, 08 Jun 2026 09:00:00 +0300</pubDate>
      <author>ihavesomethoughtsonyourblog@ounapuu.ee (Herman Õunapuu)</author>
      <guid>https://ounapuu.ee/posts/2026/06/08/llm/</guid>
      <description>
        
          &lt;img src=&#34;https://ounapuu.ee/posts/2026/06/08/llm/media/cover.jpg&#34;/&gt;
          
        
        
        &lt;p&gt;It&amp;rsquo;s no secret that I was &lt;em&gt;&lt;strong&gt;highly&lt;/strong&gt;&lt;/em&gt; skeptical of LLM-s.&lt;/p&gt;
&lt;p&gt;Cool, there is this new thing that can spit out plausible text and create cheap-looking images and videos, resulting in
a lot of low-quality content being shared.&lt;/p&gt;
&lt;p&gt;It was also a huge disappointment to see a human-written post that&amp;rsquo;s excellent, only for it to be cheapened with a
generic AI-generated image as the cover. &lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;A few years into this wild ride, I have partially changed my view because some people have figured out how to make LLM-s
actually useful, and I like that part a lot. Not the part where the industry is killing my hobby and increasing energy
usage worldwide, but there are some parts that I genuinely find useful.&lt;/p&gt;
&lt;p&gt;What changed?&lt;/p&gt;
&lt;p&gt;&lt;em&gt;If you&amp;rsquo;re not that excited about LLM-based tools or have otherwise strong opinions on them, then
please &lt;a href=&#34;https://ounapuu.ee/posts/2026/06/08/llm/#final-words&#34;&gt;read the final words first.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&#34;background&#34;&gt;Background&lt;/h3&gt;
&lt;p&gt;I hate the online discourse around LLM-s, or &amp;ldquo;AI&amp;rdquo;, or now that I think about it, &lt;em&gt;&lt;strong&gt;everything.&lt;/strong&gt;&lt;/em&gt; Everyone has their
own opinion that they hold as absolute truth, arguments spawn from a few sentences, but nobody in the threads provides
the actual context around their views and experiences, which is &lt;em&gt;&lt;strong&gt;critical&lt;/strong&gt;&lt;/em&gt; to properly understanding and arguing
with a statement.&lt;/p&gt;
&lt;p&gt;This has especially been true around LLM-based tools.&lt;/p&gt;
&lt;p&gt;&amp;ldquo;I tried LLM-s, I absolutely hate it.&amp;rdquo; -&amp;gt; someone who used fancy autocomplete powered by one of the many Copilot-branded
services and was disappointed due to hype inflating their expectations and the result falling short of them by a large
margin.&lt;/p&gt;
&lt;p&gt;&amp;ldquo;LLM-s are great and will replace engineers, it&amp;rsquo;s a game changer.&amp;rdquo; -&amp;gt; they vibe-coded a to-do list app over a few hours
with more vulnerabilities than working features.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s horrible in both extremes.&lt;/p&gt;
&lt;p&gt;The sad part is that I understand both perspectives.&lt;/p&gt;
&lt;p&gt;Looking at the whole situation from the outside in, it&amp;rsquo;s pure insanity.
Sketchy financial deals, datacenter build-outs having very real environmental costs that we all pay for, supply chain
crunches that are not helped by senile old men starting new wars before they&amp;rsquo;ve finished their existing ones, and every
service adding some AI component in there even if it makes no sense are all genuinely frustrating.&lt;/p&gt;
&lt;p&gt;And yet when you jump in head-first with no assumptions, it feels like a world where previously impossible or
frustrating tasks are now solvable by anyone who knows how to wield this type of tooling. It&amp;rsquo;s a world of optimism,
experimentation and rapid development. No more &amp;ldquo;here&amp;rsquo;s a new JS framework&amp;rdquo; level of depressing churn, we have people who
are experimenting with changing the whole landscape of software engineering!&lt;/p&gt;
&lt;p&gt;The context that I&amp;rsquo;m working in has so far been one of the best case scenarios for experimenting with this type of
tooling:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;relatively young product with a predictable, classical technological stack (Java, Spring Boot, Svelte, Docker, Linux
VMs)&lt;/li&gt;
&lt;li&gt;small team with a modest degree of autonomy and encouragement to experiment with new tooling where it makes sense&lt;/li&gt;
&lt;li&gt;a decade of professional experience in IT and a long list of incidents that I&amp;rsquo;ve had to resolve&lt;/li&gt;
&lt;li&gt;need to be as productive as possible with a small team, to be able to build big things&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What follows is my experience in a roughly correct timeline. Some of these findings and thoughts can feel like old news,
but that&amp;rsquo;s the order in which I experienced them.&lt;/p&gt;
&lt;h3 id=&#34;july-2025-chatgpt-and-copilot&#34;&gt;July 2025: ChatGPT and Copilot&lt;/h3&gt;
&lt;p&gt;One of the first things I picked up on was how easy it felt to ask about some details on things that I know something
about, but needed quick clarification or examples for. Googling was a two-step process: come up with a query, and then
work through the results to find what you need.&lt;/p&gt;
&lt;p&gt;LLM-s? I used Google search terms as prompts verbatim, and could get what I wanted, fast. Most of the time the answers
were correct, or at least correct enough to be useful to me.&lt;/p&gt;
&lt;p&gt;When it came to development, I thought that hey, let&amp;rsquo;s try out Copilot that we could use through our GitHub
organization. I use IntelliJ IDEA, and this one had a plugin that integrated with it, so it felt like a good option to
go with.&lt;/p&gt;
&lt;p&gt;I tried the fancy autocomplete option first, and it was an immediate source of frustration. It was slow
enough to be unusable, and once the results did arrive, they were useless.&lt;/p&gt;
&lt;p&gt;The agent option was more useful, but you had to manually give it the necessary context, and it felt clunky. It did an
okay job of writing new tests based on previous examples, but nothing revolutionary.&lt;/p&gt;
&lt;h3 id=&#34;july-august-2025-the-turning-point&#34;&gt;July-August 2025: the turning point&lt;/h3&gt;
&lt;p&gt;I was pretty disappointed at this point with this level of tooling.&lt;/p&gt;
&lt;p&gt;Then, we had an urgent issue that we had to resolve, but based on our estimates it would&amp;rsquo;ve taken a few days to
implement and properly test. We didn&amp;rsquo;t have that luxury.&lt;/p&gt;
&lt;p&gt;My colleague was trying out Cursor at that time. They took it, looked at the problem, and figured out a tested,
validated and correct solution in about two hours total. I know that because I validated that solution myself.&lt;/p&gt;
&lt;p&gt;At that point I realized that there is something very interesting going on here with LLM-assisted tooling, and got
curious.&lt;/p&gt;
&lt;h3 id=&#34;august-2025-codex-and-claude-code&#34;&gt;August 2025: Codex and Claude Code&lt;/h3&gt;
&lt;p&gt;We agreed in the team to go into experimentation mode and to try out different tools to see what works for us. Cursor
was already taken, so I looked at alternatives.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d used JetBrains products for over a decade at that point, so I looked at their AI offering Junie, but I ruled it
out pretty quickly after stumbling on some forum threads where users were tearing JetBrains to shreds for offering an AI
product where you can run out of a month&amp;rsquo;s worth of token allowance within mere hours. In hindsight, it all makes sense
now: tokens are actually expensive, and JetBrains made the tragic &amp;ldquo;mistake&amp;rdquo; of not subsidising the cost of tokens with
billions of VC funding.&lt;/p&gt;
&lt;p&gt;Then I looked into Claude Code. At that point, it was quite a young product, having been available for about half a year.
Its main selling point for me was the fact that it ran in a terminal window, allowing me to keep using IntelliJ IDEA
while operating in an environment that felt native to me as a Linux user.&lt;sup id=&#34;fnref:2&#34;&gt;&lt;a href=&#34;#fn:2&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Claude Code felt like magic. I give it a prompt, it goes and finds relevant context by reading through potentially
related files, right there on my disk, and it could also call tools and scripts. &amp;ldquo;Hey, let&amp;rsquo;s rename this enum from
BAD_ENUM_PATTERN_HERE to something better&amp;rdquo;, and then it would actually do it. Doesn&amp;rsquo;t sound super impressive once you
realize that an IDE can do the same thing much faster as long as you come up with the new name yourself, but it felt
magical. The way that it showed the diffs and the overall progress and steps felt natural.&lt;/p&gt;
&lt;p&gt;As a Pro tier user, I ran into the 5-hour quota a lot. Whenever that happened, I tried out Codex as I already had a
ChatGPT subscription and I had nothing to lose.&lt;/p&gt;
&lt;p&gt;Codex was a mixed bag. Sometimes it would do a good job, but in its default settings it felt slow, while with Claude
Code I felt that it was just ripping through doing useful work. Tune Codex to be faster, and its output degraded
noticeably. I realized quite soon that I prefer quick feedback and iterating more on a solution compared to trying to
one-shot it with Codex, so after I upgraded to a Max 5x plan, I left Codex behind.&lt;/p&gt;
&lt;p&gt;I have a strong technical background from an era before this type of tooling was available. Equipped with Claude Code, I
felt like I had superpowers. I knew what needed to be done, what failure modes are common, what to protect against, what
to keep in mind when rolling out new features, and how to resolve incidents. This tool just made all of that faster and
even more accessible. And at the same time, I could more easily detect if it was giving me garbage answers with a
glance.&lt;/p&gt;
&lt;p&gt;As a relatively new joiner in the team, Claude Code was also a fantastic way to speed up my own onboarding to the
product and the technical aspects. Previously, finding answers to project or domain specific questions was an exercise
in good IDE usage and building a mental model for yourself. Now, anything I needed was a few well thought out prompts
away.&lt;/p&gt;
&lt;p&gt;For me, this marks the &amp;ldquo;oh shit&amp;rdquo; moment with LLM-assisted tooling.&lt;/p&gt;
&lt;h3 id=&#34;september-december-2025-optimism-experimentation-and-crunch&#34;&gt;September-December 2025: optimism, experimentation and crunch&lt;/h3&gt;
&lt;p&gt;Claude Code and Cursor soon went from a fun thing to experiment with to a critical tool that we had to make the most of
out of necessity. Deadlines loomed, and even with great engineers, there is a practical limit to what you can achieve if
there aren&amp;rsquo;t too many of them available.&lt;/p&gt;
&lt;p&gt;This is the time when we pushed the tooling further and further. Claude skills became a thing around then, so we
started collecting project-specific input and general guidance under those.&lt;/p&gt;
&lt;p&gt;I found Claude Code to be the most useful when running in its bypass permissions mode, but I also valued my home folder
not being deleted by accident, so I vibe-coded a basic sandbox container in which I can safely run Claude Code with
filesystem-level isolation. It also allowed me to run multiple Claude Code instances in a way that prevented them from
interfering with each other, which opened the door for running some wild-ass ideas and experiments in the background.&lt;/p&gt;
&lt;p&gt;Integration tests are taking too long to run? Let Claude Code come up with optimization ideas, and let it put together a
benchmarking plan. Most of the recommendations did not do much, but a few lines made integration tests 10% faster!&lt;/p&gt;
&lt;p&gt;Worried about your authorization setup containing holes? Give that hunch in as input, let Claude Code do some checks,
and validate the findings. Whoops, some endpoints were unguarded? Write tests that demonstrate the issue, then let
Claude Code fix it. What would&amp;rsquo;ve taken weeks took mere hours to improve.&lt;/p&gt;
&lt;p&gt;We also started seeing the first signs of what happens when you push too hard with this level of tooling. With a looming
hard deadline and stress, it was not uncommon to see 5000-line PR-s which were hell to review. Vibe-coding artifacts
slipped in, subtle bugs became issues that needed to be rectified. Transactional boundary related issues were especially
easy to slip in, and difficult to rectify.&lt;/p&gt;
&lt;p&gt;And no matter how much you instruct Claude Code, it will ignore a non-zero percentage of the instructions at all times.
Using &lt;code&gt;var&lt;/code&gt; or deciding to write out full package names for defining a variable type were common and yet basic
annoyances.&lt;/p&gt;
&lt;h3 id=&#34;product-and-model-churn&#34;&gt;Product and model churn&lt;/h3&gt;
&lt;p&gt;When using a tool like Claude Code for the better part of your work day, it will naturally become a critical part of
your workflow.&lt;/p&gt;
&lt;p&gt;A critical part that is under a &lt;strong&gt;&lt;em&gt;rapid&lt;/em&gt;&lt;/strong&gt; pace of product development.&lt;/p&gt;
&lt;p&gt;Sometimes the improvements are positive and genuinely useful.&lt;/p&gt;
&lt;p&gt;Sometimes you&amp;rsquo;re hit with a bug that results in a heavy memory leak, leading to all Claude Code sessions terminating
after a few minutes due to being OOM-killed.&lt;/p&gt;
&lt;p&gt;I feel like a subject in a grand experiment. It makes sense from Anthropic&amp;rsquo;s perspective, you have to experiment
and try out new things to see what works, but as a heavy user of the tool, it makes every working day a game of lottery
and introduces an additional source of uncertainty.&lt;/p&gt;
&lt;p&gt;Lately the situation has improved somewhat, but then Anthropic has had constant scaling issues. I have the benefit of
working in Europe, so I can get my work done before the US wakes up and demolishes their servers with high load or buggy
releases, but even then I&amp;rsquo;m not immune to outages.&lt;/p&gt;
&lt;p&gt;The models behind Claude Code have also seen a rapid release cadence, which seems to follow a pattern of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;new model released&lt;/li&gt;
&lt;li&gt;it is better than previous ones&lt;/li&gt;
&lt;li&gt;a few weeks later you can feel some level of degradation, you see more complaints online&lt;/li&gt;
&lt;li&gt;back to step 1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Purely vibes-based, but it certainly feels that way.&lt;/p&gt;
&lt;p&gt;That leads me to one of the biggest frustration points with tools like Claude Code. When everything is changing
so rapidly, and you have no idea what experiments you&amp;rsquo;re in or what toggles Anthropic has just changed, how are you
supposed to reliably get useful output with this type of tooling? Not to mention that LLM-s are still fundamentally
non-deterministic, which spices things up even more.&lt;/p&gt;
&lt;p&gt;It feels very chaotic and could very well be normal &amp;ldquo;early adopter&amp;rdquo; pain, but it doesn&amp;rsquo;t change the fact that this level
of uncertainty contributes to feeling burnt out.&lt;/p&gt;
&lt;p&gt;Agentic coding may very well be the norm in the future, but in an era of wild experimentation I feel it doesn&amp;rsquo;t make
sense to build a meaningful amount of supporting infrastructure on top of a foundation made out of sand.&lt;/p&gt;
&lt;h3 id=&#34;the-real-ai-productivity-gains&#34;&gt;The real AI productivity gains&lt;/h3&gt;
&lt;p&gt;A large language model alone is not that useful. Put a chat interface in front of it, and things get more interesting.
Give it the ability to call tools and source the necessary information itself, and now you&amp;rsquo;re cooking.&lt;/p&gt;
&lt;p&gt;AI based tooling has been marketed a lot as a major productivity booster and I agree that it does help with that, with a
few dozen asterisks and nuances. However, I&amp;rsquo;ve observed that most of the actual gains seem to come from things like
ignoring good practices. You &lt;em&gt;&lt;strong&gt;will&lt;/strong&gt;&lt;/em&gt; do more by putting Claude Code into auto mode or the spicier bypass
permissions mode, and if you give it access to Slack, Notion, Jira, Linear, Google Drive, GitHub and more, it will have
no issues gathering necessary context and performing boring actions on your behalf.&lt;/p&gt;
&lt;p&gt;Need to mass-create Linear tickets and set proper dependencies between them? Claude Code is genuinely useful here.&lt;/p&gt;
&lt;p&gt;But what happens when Claude is tricked into performing malicious actions? Or Claude just goes wild and deletes your
company&amp;rsquo;s Google Drive?&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a lot of trust put into a rapidly growing company headquartered in the USA. A few years ago, you would have been
fired for sharing your intellectual property and internal company information with a third party, but now it&amp;rsquo;s called
AI-native something-something and you&amp;rsquo;ll fall behind if you don&amp;rsquo;t use it.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve given everyone a loaded revolver without explaining things like risk management, threat modelling, data privacy
and GDPR, and how to reasonably deal with all of that while balancing it with productivity gains. The pessimist in me says
that it will have consequences sooner or later.&lt;/p&gt;
&lt;h3 id=&#34;the-bottleneck&#34;&gt;The bottleneck&lt;/h3&gt;
&lt;p&gt;Humans are still the bottleneck. In an established product, you will have actual paying clients, and people who
depend on your product. I don&amp;rsquo;t believe that going full vibe-coding-superstar-engineer in such a context makes a lot of
sense, which means that you still have to understand, review and test your own changes. But that takes time and effort. It always has
taken time and effort, but with code being cheaper to produce, the amount of it to go through is ballooning.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m working in a team where I have high trust in my fellow engineers, which means that we are trying things like
reviewing the high-level plan of an intended change and not necessarily the final end result. Reviewing that is left to the
implementing engineer. This should help us achieve more while having basic architectural-level thinking and checks in
place, and it discourages 5000-line PR-s because the author needs to review that by themselves. Jury&amp;rsquo;s still out on that
one and we do have exceptions like still reviewing junior engineers&amp;rsquo; work to give them better feedback while they grow
into experienced engineers.&lt;/p&gt;
&lt;p&gt;Some try to solve the AI unreliability issue by adding more AI to review AI code. We&amp;rsquo;re also giving that a go with a
custom skill that amounts to just calling each project skill depending on the context of the changes to try and flag
some areas that may need more consideration or that don&amp;rsquo;t make sense given the intent of the changes. It&amp;rsquo;s okay, but not
a replacement for a human review. Claude Code can complain about a function not being performant enough while a human
reviewer can identify that the changes can be completely skipped because we can solve the problem with a product-level
decision, or an existing query can solve the same issue in a more elegant way.&lt;/p&gt;
&lt;p&gt;It seems that a combination of classical tooling (linters, formatters, static analysis) and LLM-level insights is an
approach worth trying for doing reviews, but you&amp;rsquo;ll have to layer them on to have a chance to have meaningful and
somewhat reliable results, which means a high token spend. What are you willing to pay for an LLM-assisted code review?
1 EUR? 10 EUR? 100 EUR?&lt;/p&gt;
&lt;p&gt;But review is rarely only about the code. Does the solution achieve what it&amp;rsquo;s supposed to do? Is it the best way to
solve that problem? Does it actually work when put into the hands of actual customers?&lt;/p&gt;
&lt;p&gt;The good news is that you &lt;em&gt;can&lt;/em&gt; also make it easier to set up local development environments with production-like data
and custom convenience tooling using tools like Claude Code. The productivity gains from simple internal tools like that
are insane and allow you to do more, safely. But it will still take time, focus and context switching, and you can&amp;rsquo;t
really skip that because LLM-based tools often have weird failure modes with their output that may only come up during a
manual test of the whole solution.&lt;/p&gt;
&lt;p&gt;Bashing out e2e tests for each new feature that demonstrate its functionality and correctness seems to also be a solid
approach in a greenfield project where you&amp;rsquo;re prototyping something quickly and then elevating it into something that
can actually be used, reviewed and released.&lt;/p&gt;
&lt;h3 id=&#34;the-economics&#34;&gt;The economics&lt;/h3&gt;
&lt;p&gt;Subscription-based pricing is still here for now and all I can say here is that we should take full advantage of that
while we still can to improve parts of our world that we have control over.&lt;/p&gt;
&lt;p&gt;Let the investors subsidize tackling the technical debt in your project, or performing that maintenance you postponed
due to lack of resources, or experimenting with some wild-ass ideas. At some point that&amp;rsquo;s going to change. API-based
pricing is a better reflection of the actual costs, and it&amp;rsquo;s not looking great.&lt;/p&gt;
&lt;p&gt;Screw &lt;a href=&#34;https://blog.pragmaticengineer.com/the-pulse-tokenmaxxing-as-a-weird-new-trend/&#34;&gt;tokenmaxxers&lt;/a&gt; though, you&amp;rsquo;re
ruining it for the rest of us.&lt;/p&gt;
&lt;h3 id=&#34;llm-s-as-a-force-of-good&#34;&gt;LLM-s as a force of good&lt;/h3&gt;
&lt;p&gt;A lot of discussions out there around LLM-s seem to be focused on the slop angle. LLM-s certainly make producing slop much easier
compared to copying answers off of Stack Overflow, but that doesn&amp;rsquo;t mean that you &lt;em&gt;have&lt;/em&gt; to use these tools to go fast
and break a lot of things. You can use the same tooling and do what you&amp;rsquo;ve been doing already, but with more intent and
much higher quality.&lt;/p&gt;
&lt;p&gt;After adopting LLM-based tooling, I have observed these positive changes in my day-to-day work:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;code is better tested&lt;/li&gt;
&lt;li&gt;number of TODO-s is dropping&lt;/li&gt;
&lt;li&gt;investigations into customer questions and fixes to one-off problems are way faster and more accurate&lt;/li&gt;
&lt;li&gt;improving platform security doesn&amp;rsquo;t have to wait for Q4 2027 any longer&lt;/li&gt;
&lt;li&gt;I have more time to think about the high-level architecture of the solution and play around with different approaches,
evaluating them against our requirements and limitations&lt;/li&gt;
&lt;li&gt;existing parts of the platform are much more resilient now as a result of applying experience from past incidents&lt;/li&gt;
&lt;li&gt;project patterns, practices and agreements are documented&lt;/li&gt;
&lt;li&gt;moving towards an infrastructure-as-code setup is much more approachable, especially to other engineers in the team
that don&amp;rsquo;t have a lot of exposure to this area&lt;/li&gt;
&lt;li&gt;we&amp;rsquo;ve resolved major performance issues on the fly and made proactive performance improvements that have avoided a lot
of issues during periods of high load and scaling the platform&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This aspect is what I love about LLM-assisted tooling. I can take my experience and strong technical background,
plus all the countless painful incidents I&amp;rsquo;ve worked through, and apply those lessons in my current work, at a faster
pace, and yet with better quality.&lt;/p&gt;
&lt;p&gt;Feels like a superpower, but you have to apply it properly and with rigor to make the most of it.&lt;/p&gt;
&lt;h3 id=&#34;ai-vs-my-self-hosting-hobby&#34;&gt;AI vs my self-hosting hobby&lt;/h3&gt;
&lt;p&gt;This positivity has also expanded into my hobby, which involves managing my fleet of machines via Ansible and hosting a
bunch of services in containers. Validating my existing Ansible playbooks and coming up with new roles on the fly
whenever I add something to my setup is much more approachable. My free time is much more limited nowadays and games
like &lt;a href=&#34;https://ounapuu.ee/posts/2026/05/24/forza/&#34;&gt;Forza Horizon 6&lt;/a&gt; don&amp;rsquo;t help there, so dabbling with my hobby for a few hours here and
there and actually achieving something is genuinely great.&lt;/p&gt;
&lt;p&gt;To balance that excitement out, the computer parts market has gone to shit. With everything being much more expensive,
I&amp;rsquo;ve reworked my setup to use what I have and to pray that no expensive parts die. I&amp;rsquo;ve stopped watching most videos of
new hardware as a result, because it&amp;rsquo;s hard to become excited about a new mini PC that is outside my budget.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not sure where I stand with my hobby now. With LLM-assisted tooling, I&amp;rsquo;ve blasted through my ideas to-do list there
and fixed issues that have bothered me a lot, and yet I&amp;rsquo;ve lost the excitement on the hardware side because I won&amp;rsquo;t be
buying new platforms anyway.&lt;/p&gt;
&lt;p&gt;One area that remains unexplored is running local LLM-s. Other than that, I&amp;rsquo;m not sure. I suppose I&amp;rsquo;m taking
a small break from it for the first time in 10+ years, and that makes me sad.&lt;/p&gt;
&lt;h3 id=&#34;llm-s-and-this-blog&#34;&gt;LLM-s and this blog&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://ounapuu.ee/posts/2024/09/06/blog/#:~:text=No%20generative%20AI%20garbage&#34;&gt;This one has not changed,&lt;/a&gt; this blog is my voice and
replacing that with the one from a machine is still a no-go for me.&lt;/p&gt;
&lt;p&gt;I have featured content where the subject of the post was thrown together with LLM-assisted tools for jokes that
&lt;a href=&#34;https://ounapuu.ee/posts/2026/02/15/btrfs/&#34;&gt;realistically only a handful of people reading this blog will get.&lt;/a&gt; That&amp;rsquo;s still fine by me,
and I encourage having fun. Otherwise, what&amp;rsquo;s the point of living?&lt;/p&gt;
&lt;h3 id=&#34;the-non-determinism&#34;&gt;The non-determinism&lt;/h3&gt;
&lt;p&gt;It has been 0 days since Claude Code made up a link to a pull request within our own repository to which it has full
access via the GitHub CLI.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not a new phenomenon that LLM-s make up plausible shit, and yet it keeps frustrating me every time that it does
that. The profuse apologising certainly does not help.&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Oh yeah mate I totally forgot that I shouldn&amp;rsquo;t wrap every line of code in a try-catch block, that is on me, I will do
better.&amp;rdquo; &lt;em&gt;and then it does the same thing 2 minutes later.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;God, I hate that.&lt;/p&gt;
&lt;p&gt;The solution to this is, once again, to layer more AI on top. I suppose if your tools are correct 95% of the time, and
you do the same thing repeatedly, then eventually you&amp;rsquo;ll get close to being 100% correct, but never to 100% exactly.&lt;/p&gt;
&lt;p&gt;The worst parts are times when &lt;strong&gt;it outputs Java package names belonging to actual software development consultancies in
Estonia.&lt;/strong&gt; Did they leak something, mix up some sessions, or does it come from the training data? Do I &lt;em&gt;&lt;strong&gt;want&lt;/strong&gt;&lt;/em&gt; to
know the answer to that?&lt;/p&gt;
&lt;h3 id=&#34;the-dumb-ass-babysitting&#34;&gt;The dumb-ass babysitting&lt;/h3&gt;
&lt;p&gt;In the pursuit of &amp;ldquo;safety&amp;rdquo;, providers like Anthropic have crippled the functionality of their solutions in certain
areas, such as cybersecurity. Ask Claude Code to help write a proof of concept for a known vulnerability against your
own service, and it will politely refuse or hit you with an API error.&lt;/p&gt;
&lt;p&gt;Great, I didn&amp;rsquo;t &lt;em&gt;&lt;strong&gt;really&lt;/strong&gt;&lt;/em&gt; need to test my own service that I&amp;rsquo;m responsible for against a type of actively exploited
vulnerability that could end the business in one go. Thanks, Anthropic, &lt;em&gt;you&amp;rsquo;ve really made the world safer now.&lt;/em&gt; &lt;code&gt;/s&lt;/code&gt;&lt;/p&gt;
&lt;h3 id=&#34;judgement&#34;&gt;Judgement&lt;/h3&gt;
&lt;p&gt;Turns out that all the experience I&amp;rsquo;ve accumulated is not useless, it&amp;rsquo;s become much more critical.&lt;/p&gt;
&lt;p&gt;More often than not, you need to use your own judgement when making changes, choosing between alternatives, and just
plain thinking about the issue at hand.&lt;/p&gt;
&lt;p&gt;I can give Claude Code a well-thought-out prompt, highlighting common patterns that we need to tackle and address, and
it will do an okay job, or at least that&amp;rsquo;s what it looks like.&lt;/p&gt;
&lt;p&gt;But when I investigate the result, I still see areas that it misses because it lacks the wider context,
or is blissfully unaware of alternatives, or it just gets its investigations really wrong by making shit up on the fly
or misunderstanding a functionality completely. Press it on some findings, and you&amp;rsquo;ll often find that it did a really
shitty job, actually, and you can improve the solution a lot.&lt;/p&gt;
&lt;p&gt;Interestingly, I&amp;rsquo;ve found myself arguing about a topic with Claude Code, only to then discover with a manual
investigation that I was in fact very wrong and Claude Code was actually &lt;em&gt;&lt;strong&gt;right&lt;/strong&gt;&lt;/em&gt;. Usually that&amp;rsquo;s followed up by a
documentation update or a refactor clarifying the solution, but those sessions serve as a good reminder that I&amp;rsquo;m not
that infallible myself.&lt;/p&gt;
&lt;h3 id=&#34;how-i-work-vs-how-claude-code-works&#34;&gt;How I work vs how Claude Code works&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s interesting to observe how Claude Code operates. In a lot of ways, it mirrors how I operate.&lt;/p&gt;
&lt;p&gt;I have a problem that needs solving. Okay, let&amp;rsquo;s gather more context, search for relevant files, check some historic
Jira tickets on that topic for good measure. Do some Slack searches. Try to get the full picture.&lt;/p&gt;
&lt;p&gt;Now that I have that, I can try to come up with a solution. Often that ends up with minor changes, at other times I will
copy-paste existing files to create a new endpoint, adjusted for my use case, named properly. Maybe I&amp;rsquo;ll add a few tests
for good measure.&lt;/p&gt;
&lt;p&gt;Claude Code does all of that, but better. I find it so much easier to judge a proposed solution than to write it all
from scratch. I was never the person that enjoyed tackling compilation errors, or checking why once again my tests don&amp;rsquo;t
work because of some Mockito nuance. All that focus is now spent on brainstorming a solution, improving its design, and
thinking about security, performance, compliance, architecture and how it all fits together. I&amp;rsquo;ve rarely worked in a
team where those items got the proper attention that they deserve.&lt;/p&gt;
&lt;h3 id=&#34;skill-atrophy&#34;&gt;Skill atrophy&lt;/h3&gt;
&lt;p&gt;There are concerns out there around skill atrophy when relying on LLM-s too much. I&amp;rsquo;m not too concerned with that.&lt;/p&gt;
&lt;p&gt;I learned to write with pen and paper and only later picked up typing on a keyboard at a modest speed&lt;sup id=&#34;fnref:3&#34;&gt;&lt;a href=&#34;#fn:3&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;3&lt;/a&gt;&lt;/sup&gt;, and yet I&amp;rsquo;m much
faster with the keyboard. I haven&amp;rsquo;t forgotten how to write in cursive, it just looks less beautiful than it did when I was younger,
and that&amp;rsquo;s OK.&lt;/p&gt;
&lt;p&gt;If LLM-s disappeared right this second, I&amp;rsquo;d revert to the old ways of working. Sure, the pace would be slower in
the short term, but I&amp;rsquo;d adjust my ways of working and the expected pace, and shed the expectations and
workloads that I wouldn&amp;rsquo;t have time for.&lt;/p&gt;
&lt;p&gt;Did you forget how to ride a bicycle the moment you got your first car?&lt;/p&gt;
&lt;h3 id=&#34;good-practices-are-socially-acceptable-now&#34;&gt;Good practices are socially acceptable now?&lt;/h3&gt;
&lt;p&gt;One interesting observation is that every good practice of classical software engineering has now become a requirement
for using LLM-assisted tools effectively. You know, those items that you had to fight to get prioritized in a poorly
functioning organization?&lt;/p&gt;
&lt;p&gt;You should have documentation, and it should be kept up-to-date. Amazing insights!&lt;/p&gt;
&lt;p&gt;Yes, you &lt;em&gt;&lt;strong&gt;should&lt;/strong&gt;&lt;/em&gt; tackle that tech debt now because otherwise Claude Code will make use of deprecated features and
fields and introduce more legacy code!&lt;/p&gt;
&lt;p&gt;Having tests that catch regressions is good!&lt;/p&gt;
&lt;p&gt;Functional, stable, performant CI/CD pipelines and team processes are foundational to a well-performing engineering
team, who would have thought?&lt;/p&gt;
&lt;p&gt;Those who were already doing a good job are now doing great, and the poorly performing teams are suffering when applying
the same tools.&lt;/p&gt;
&lt;h3 id=&#34;async-development&#34;&gt;Async development&lt;/h3&gt;
&lt;p&gt;If you&amp;rsquo;ve followed my blog for a while, then you&amp;rsquo;ll know that I have a home server that&amp;rsquo;s on 24/7.&lt;/p&gt;
&lt;p&gt;This has allowed me to spawn a Claude Code instance on a separate VM inside of it that mirrors my setup at work, and
I&amp;rsquo;ve used that always-on playground as a way to tackle annoying long-running tasks or wild-ass investigations and tests
that take hours to complete.&lt;/p&gt;
&lt;p&gt;For example, we are firm believers in rebasing changes on top of the main branch, but if you have a bunch of PR-s ready
to merge, it turns into an annoying cycle: rebase, update the other branch, wait for CI to run and complete, start again.
Turns out that you can prompt Claude Code with a simple automation loop and it will take care of that by itself,
including the resolution of conflicts.&lt;/p&gt;
&lt;p&gt;For larger investigations and technical migrations, I have successfully set up a prompt with a goal, some
guidance, and my expectation that it runs autonomously. I can come back to it the next morning and review its output.
It is straight up magic to have the computer work on a Spring Boot 4 upgrade while I&amp;rsquo;m playing Forza Horizon 6 (after
work, of course).&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also possible to schedule some work in advance. If my 5-hour quota gets refreshed at 19:00, I can set Claude up
with a goal and instructions to start at that specific time, meaning that you can use your AI subscription plan to make
the most of your AI subscription plan.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve long dreamed of setups where my laptop is a very basic machine with great battery life, and all the heavy lifting
happens on a powerful remote server. With classical development, that approach would&amp;rsquo;ve included a remote desktop setup.
The necessity of a good internet connection was a major blocker for using such a setup for all of my work, and video
compression artifacts make text look like trash.&lt;/p&gt;
&lt;p&gt;With Claude, you can just run it in a terminal, over SSH. All you&amp;rsquo;re moving is text back and forth, which is infinitely
more performant even in low internet connectivity scenarios. It may not be the best flow for front-end or design-heavy
work, but you can successfully offload a wide variety of activities to a remote Claude Code instance hosted on your
hardware.&lt;/p&gt;
&lt;p&gt;This is what this tooling &lt;em&gt;should&lt;/em&gt; allow us to do: achieve more while spending less time. We&amp;rsquo;re not there yet, but it&amp;rsquo;s
a goal we should aspire towards, instead of letting the productivity gains quietly slip into the pockets of billionaires.&lt;/p&gt;
&lt;h3 id=&#34;zero-predictions-many-questions&#34;&gt;Zero predictions, many questions&lt;/h3&gt;
&lt;p&gt;At the current technical level, I don&amp;rsquo;t believe that we can reliably shift to a model where a coding agent takes in
human input and produces a reliable, tested and correctly architected solution that fits together with the rest of
your project, with zero human review in the process.&lt;/p&gt;
&lt;p&gt;If you put a lot of effort into building a custom harness, adding layers of checks on top, and keeping that machinery
running with active maintenance, you will likely reach a point where you can somewhat reliably use this approach to get
solid results. To get to that point, you will need to shift your focus from building your product to becoming a
professional harness engineer, and the end result might cost a lot of tokens to run.&lt;/p&gt;
&lt;p&gt;Is that sacrifice worth it, and will that same approach still work in 6 months?&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve already seen that you can build a spaghetti architecture and end up with an unmaintainable dumpster fire of a
product using classical engineering approaches. Once you reach that point, any progress grinds to a halt and you&amp;rsquo;re
stuck fighting fires while losing customers. You can reach that point faster if you build more with LLM-assisted tools
without having a proper plan and architecture in place. What use is a harness if you can&amp;rsquo;t build anything impactful with
it?&lt;/p&gt;
&lt;p&gt;You &lt;em&gt;&lt;strong&gt;can&lt;/strong&gt;&lt;/em&gt; take that tooling and augment your own work in a positive way, making iterative changes and trying out new
approaches and ideas at a sustainable pace that doesn&amp;rsquo;t steal focus from the product that you&amp;rsquo;re supposed to be working
on.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also clear that the demand for this type of tooling is there. 200 EUR/month subscriptions for a tool were not the
norm even a few years ago, and here we are with people happily paying that and still finding that it brings great value
to them.&lt;/p&gt;
&lt;p&gt;The space keeps evolving and external forces are at play, such as infinite money glitches not being a thing in real life.
That raises some topics that I&amp;rsquo;m keeping a keen eye on, even if part of it is morbid curiosity that stems
from a desire to see how it all plays out in the end.&lt;/p&gt;
&lt;p&gt;What will a successful engineering team look like a few years from now?&lt;/p&gt;
&lt;p&gt;If the real cost of tokens is passed on to consumers or availability suffers dramatically due
to &lt;a href=&#34;https://www.reuters.com/world/middle-east/amazons-cloud-unit-reports-fire-after-objects-hit-uae-data-center-2026-03-01/&#34;&gt;an event&lt;/a&gt;,
then what will happen to existing AI-first workflows?&lt;sup id=&#34;fnref:4&#34;&gt;&lt;a href=&#34;#fn:4&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;At what point is the tooling too expensive to use?&lt;/p&gt;
&lt;p&gt;When will locally runnable open weights models and open source harnesses be good enough to replace tools like Claude
Code?&lt;sup id=&#34;fnref:5&#34;&gt;&lt;a href=&#34;#fn:5&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;5&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;When will a state-of-the-art model from Anthropic or OpenAI be leaked?&lt;/p&gt;
&lt;p&gt;When will Anthropic/OpenAI get hacked in a catastrophic way and what implications will it have for, well,
&lt;em&gt;&lt;strong&gt;everything&lt;/strong&gt;&lt;/em&gt;?&lt;/p&gt;
&lt;h3 id=&#34;final-words&#34;&gt;Final words&lt;/h3&gt;
&lt;p&gt;If you work in an engineering position and you&amp;rsquo;ve avoided relying on this type of tooling, leave the very real downsides
and risks aside for a moment and give it an honest try. Push its limits. Do something with it that brings joy.
After that, you&amp;rsquo;ll at least have a more informed opinion on this type of technology, and perhaps it could end with
renewed interest in a practical application of LLM-s that could branch to using open source coding agents and harnesses,
and exploring various locally runnable open weights models that are desperately needed to seize the means of code
production.&lt;sup id=&#34;fnref:6&#34;&gt;&lt;a href=&#34;#fn:6&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;6&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re heavily using this type of tooling already to move fast, then take a break. Move slowly. Act with intent. We
have a choice to either build more and faster, or to build what we already wanted to build, but with much better
quality. Before LLM-assisted tooling came into the picture, we were already in a software crisis where too much was
built with not enough quality controls in place and with maintenance, security and performance being distant
afterthoughts. Now, we have the means to better address those areas. Don&amp;rsquo;t waste this chance to make the software world
a better place, and through that the real world.&lt;/p&gt;
&lt;p&gt;Despite the challenges and very real near-future risks around relying on this type of tooling, I remain cautiously
optimistic and will keep using an LLM-first approach to building and maintaining services and infrastructure. For now,
the productivity gains and enjoyment are outweighing the feeling of being burnt out.&lt;/p&gt;
&lt;p&gt;If it doesn&amp;rsquo;t work out, then I will sleep well knowing that I have my beekeepers&amp;rsquo; hat waiting for me.&lt;/p&gt;
&lt;div class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34;&gt;
&lt;p&gt;my unofficial policy on my own blog post covers is simple: if I don&amp;rsquo;t have a topical one, I&amp;rsquo;ll pick one with a cat
from my personal collection, or scribble something together in GIMP. The one on this post is my beloved cat Tux sitting on
top of a ThinkPad X230 that has one of those chonker docks on it. She is an absolute delight of a cat. In fact, she is
the best cat, period.&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:2&#34;&gt;
&lt;p&gt;btw I use Fedora&amp;#160;&lt;a href=&#34;#fnref:2&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:3&#34;&gt;
&lt;p&gt;I learned to touch type one afternoon, but, like, half-way.&amp;#160;&lt;a href=&#34;#fnref:3&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:4&#34;&gt;
&lt;p&gt;this is a topic that&amp;rsquo;s actively playing out with more providers moving to token-based pricing instead of
subscription-based fixed price plans.&amp;#160;&lt;a href=&#34;#fnref:4&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:5&#34;&gt;
&lt;p&gt;geopolitically motivated competition in the realm of AI could end up being beneficial for the rest of us after
all.&amp;#160;&lt;a href=&#34;#fnref:5&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:6&#34;&gt;
&lt;p&gt;I love the approach that Wendell from Level1Techs has taken: embrace the new technology, but be mindful of the
very real downsides and risks. Instead of putting your head in the sand or trusting big providers blindly, fight for the
right to run local models on hardware that you control! It&amp;rsquo;s self-hosting, but taken to LLM-s, and I&amp;rsquo;m fully on board
with those ideals and ideas.&amp;#160;&lt;a href=&#34;#fnref:6&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

        
        </description>
    </item>
    
    
  </channel>
</rss>