I am starting to believe that this might actually be the objective. There is no other logical reason for the US to act this way (not that anything under Trump is logical anyway). They already have full access to Greenland for defence purposes (or pretty much any other purpose) under NATO. It's no secret that Trump is Russian aligned and that he has an intolerance (hatred?) for Europe.
There is no mystery here. Trump clearly explained his motivation in a recent interview:
"Donald Trump Says He Wants 'Ownership' of Greenland Because It's 'Psychologically Important for Me'"
He’s a rapist and does not take no for an answer. That’s it. That’s *really* it. It’s yet another score to be settled from his first term in office. Same with his Nobel Prize fixation.
I think a lot of people are having a really hard time grappling with the fact that the leader of a superpower is a literal maniac.
If you're into this sort of stuff then I highly recommend Adrian's Digital Basement on YouTube. There is a large back catalogue of videos that go in-depth on this very topic.
What do we think the future is going to look like?
"I need a caching component in my stack. Oh, I just let AI create it from scratch, so AI can also run it, maintain it etc."
or
"I need a caching component in my stack. Oh, let me just grab an existing, well maintained and supported cache component and let AI write the glue logic to run it, maintain it etc."
Personally I think it will be the second one. There definitely is a case for generating one-off tools, but I don't think AI will replace established, well written and maintained software (be it open-source or not).
Not all software is equal. I can get 80% of my new SaaS generated by AI, as it's mostly considered boilerplate and mostly the same as any other SaaS. It's the 20% that makes it unique, OR something that AI can't touch, like a special relationship with a supplier or a first-to-market position, etc.
Since Denmark and Greenland are already NATO allies, the US has pretty much full access to Greenland. The Trump admin says they need it for defence purposes; well, there already is a US base in Greenland and I'm sure the other NATO allies would happily allow the US to expand their presence for defence purposes. Re the minerals and whatnot: here too, the US has pretty much full access. However, there is no company that's interested, since digging out those minerals is very difficult and not really worth the investment at this point in time. So the "I need it" argument is really weak, as they already pretty much have it for the purposes they say they want it for.
What is really happening here is part of the Monroe Doctrine [1]. Trump is trying to consolidate the Western Hemisphere. Remember Canada being the 51st state?
So, can anyone point me to a video where someone is doing this to produce something meaningful, real-world stuff? I want to see the requirements being decided upfront, followed by the prompting and then "coding" by the agents. Not some fly-by-the-seat-of-my-pants, stop-when-"it kinda looks okay" thing. E.g. an end-to-end SaaS or something similarly non-trivial.
I've replaced a couple of apps that I would have previously paid for with my own vibe coded agent based setup similar to GasTown.
What I'm finding most beneficial, almost immediately, is a dedicated Telegram channel that I can post all sorts of unstructured data into. It's automatically routed via LLMs and stored in the right channel, and then other agents work on that data to provide me insights. I have a calorie counter, workout capture, reminders and daily diary prompts all up and running as of right now, and honestly it's better than anything I could have bought "off the shelf".
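In case it helps to see the shape of it, here's a minimal sketch of that kind of LLM routing. The llm_classify helper and the topic names are hypothetical placeholders for this sketch, not my actual setup:

    # Route an unstructured note to a topic chosen by an LLM classifier.
    # llm_classify(), TOPICS and route_message() are made-up names for illustration.
    from dataclasses import dataclass

    TOPICS = ["calories", "workouts", "reminders", "diary", "unknown"]

    @dataclass
    class Note:
        text: str
        topic: str

    def llm_classify(text: str) -> str:
        """Ask your model of choice to pick exactly one label from TOPICS.
        Stubbed here; a real version would call whatever LLM API you use."""
        raise NotImplementedError

    def route_message(text: str, store: dict[str, list[Note]]) -> Note:
        topic = llm_classify(text)
        if topic not in TOPICS:
            topic = "unknown"  # fall back instead of inventing a new channel
        note = Note(text=text, topic=topic)
        store.setdefault(topic, []).append(note)
        return note

Downstream agents (calorie counting, reminders, etc.) then just read from their own channel in the store.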
Last night I needed a C# console app to convert PDFs to a sprite sheet. I spent 30 seconds writing the prompt and another 30 seconds later the app was running and successfully converting PDFs on the first try. I then spent about another 2 mins adding a progress bar, tweaking the output format and moving the main logic into a new library.
Sure. I do that too. However, the article talks about something very different. What you describe is Stage 2 or 3 as listed in the article. I want to see a demonstration of Stage 8 in action.
> First, you should locate yourself on the chart. What stage are you in your AI-assisted coding journey?
> Stage 1: Zero or Near-Zero AI: maybe code completions, sometimes ask Chat questions
> Stage 2: Coding agent in IDE, permissions turned on. A narrow coding agent in a sidebar asks your permission to run tools.
> Stage 3: Agent in IDE, YOLO mode: Trust goes up. You turn off permissions, agent gets wider.
> Stage 4: In IDE, wide agent: Your agent gradually grows to fill the screen. Code is just for diffs.
> Stage 5: CLI, single agent. YOLO. Diffs scroll by. You may or may not look at them.
> Stage 6: CLI, multi-agent, YOLO. You regularly use 3 to 5 parallel instances. You are very fast.
> Stage 7: 10+ agents, hand-managed. You are starting to push the limits of hand-management.
> Stage 8: Building your own orchestrator. You are on the frontier, automating your workflow.
> *If you’re not at least Stage 7, or maybe Stage 6 and very brave, then you will not be able to use Gas Town. You aren’t ready yet.*
The way telecommunications works needs a complete overhaul. IMHO it needs something similar to the domain name system, where you register (and own) your phone number and control which provider your eSIM is pointing to (like DNS). But so many industries are rooted in control that it would be nearly impossible to make any meaningful change.
I want to quit my cosy, well-paying job and start building the products I have been wanting to build for quite some years now. I started using AI about 12 months ago and, as an engineer with 30+ years of professional experience, I am blown away by how productive it makes me. What I used to be able to do in a week now takes me a day, what I used to be able to do in a month now takes me a week, etc.
So, 2026 is going to be the year I'm going to run this experiment on myself and see what I can accomplish with this way of working.
How are you using AI and what sort of software are you building?
I have a similar number of years of experience and regularly try out AI for development, but I always find it’s slower for the things I want to build and/or that it produces less than satisfactory results.
Not sure if it’s how I use the models (I’ve experimented with all the frontier ones), or the types of things I’m building, or the languages I’m using, or if I’m not spending enough, or if my standards are just too high for the code that is produced, but I almost always end up going back to doing things by hand.
I try to keep the AI focused on small, well defined tasks, use AGENT.MD and skills, build out a plan first, followed by tests for spec-based development, keep context windows and chats a reasonable length, etc. But if I add up all that time, I could have done it myself and gained a better grasp of the program and the domain in the process.
I keep reading how AI is a force multiplier but I’m yet to see it play out for myself.
I see lots of posts talking about how much more productive AI has made people, but very few with actual specifics on setup, models, costs, workflows etc.
I’m not an AI doomer and would love to realize the benefits people are claiming they get.... but how to get there is the question
Then I wrote a large feature (ad pacing) on a site using LLMs. I learned the LLMs did not really understand what they were doing. The algorithm (a PID controller) itself was properly implemented (as there is plenty of data to train on), but it was trying to optimize the wrong thing. There were other similar findings where the LLM made very stupid mistakes. So I went through a disillusionment stage and kind of gave up for a while.
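For context, the controller itself really is textbook; here is a minimal generic sketch (not my actual pacing code, names made up). The update rule is trivial -- the design work is choosing what to measure and optimize, which is exactly where the LLM went wrong:

    # Textbook PID controller. For pacing, the error signal might be
    # planned spend so far vs actual spend so far; the output nudges the
    # serving rate or bid up or down.
    class PID:
        def __init__(self, kp: float, ki: float, kd: float):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_error = 0.0

        def update(self, setpoint: float, measured: float, dt: float) -> float:
            error = setpoint - measured
            self.integral += error * dt
            derivative = (error - self.prev_error) / dt
            self.prev_error = error
            return self.kp * error + self.ki * self.integral + self.kd * derivative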
Since then, I have learned how to use Claude Code effectively. I have used it mostly on existing Django code bases. I think everybody has a slightly different take on how it works well. Probably the most reasonable advice is to just keep going and try different kinds of things. Existing code bases seem easier, as does working from a spec beforehand, requiring tests, etc.: basic SWE principles.
> I have learned how to use Claude Code effectively
This is step 3 of “draw the rest of the owl” :-)
> the most reasonable advice is to just keep going and try different kinds of things.
This is where I’ve been at for a while now. Every couple of months I try again with latest models and latest techniques I hear people talking about but there’s very little concrete info there that works for me.
Then I wonder if it’s just my spend? I don’t mind spending $30/month to experiment but I’m not going to drop $300/month unless I can see evidence that it’ll be worth it, which I haven’t really seen, but maybe there’s a dependency and you don’t get the result without increased spend?
Some posts I’ve seen claim spending of $1,500/month, which would be worth it if it could increase productivity enough, but there’s very few specifics on workflows and results.
You just copy/paste, as in you copy/paste all the necessary context and the results? You don't give it access to your codebase for read or write, correct?
> You don't give it access to your codebase for read or write, correct?
I'm sure you can derive some benefit without doing that, but you're not going to see much of a speedup if you're still copy/pasting and manually prompting after each change. If anybody is copy/pasting and saying "I don't get it", yeah you don't.
> This is step 3 of “draw the rest of the owl” :-)
Fair enough :-)
This reminds me of the pigeon research by Skinner. Skinner placed hungry pigeons in a "Skinner box" and a mechanism delivered food pellets at fixed, non-contingent time intervals, regardless of the bird's behavior. The pigeons, seeking a pattern or control over the food delivery, began to associate whatever random action they were performing at the moment the food appeared with the reward.
I think we humans have similar psychology, i.e. we tend to form superstitions about the patterns in what we were doing when we got rewards, if those rewards happen at random intervals.
To me it seems we are at a phase where what works with LLMs (the reward) is still quite random, but it is psychologically difficult for us to admit it. Therefore we try to invent various kinds of theories of why something appears to work, which are closer to superstitions than real repeatable processes.
It seems difficult to really generalize repeatable processes of what really works, because it depends on too many things. This may be the reason why you are unsuccessful when using these descriptions.
But while it seems less useful to try to work based on theories of what works -- and although I had a skeptical attitude -- I have found that LLMs can be a huge productivity boost. It really depends on the context, though.
It seems you just need to keep trying various things, and eventually you may find out what works for you. There is no shortcut where you just read a blog post and then you can do it.
Things I have tried successfully:
- modifying existing large-ish Django projects, adding new apps to them. It can sometimes use Django components & HTMX/AlpineJS properly, but sometimes starts doing something else. One app uses tenants, and the LLM appears to constantly struggle with this.
- creating new Django projects -- this was less successful than modifying existing projects, because the LLM could not imitate existing practices
- Apple Swift mobile and watch applications. This was surprisingly successful. But these were not huge apps.
- a Python GUI app was more or less successful
- GitHub Pages static web sites based on certain content
I have not copied any CLAUDE.md or other files. Every time Claude Code does something I don't appreciate, I add a new line. Currently it is at 26 lines.
I have made a few skills. They are mostly there so that it can work independently in a loop, for example testing something that does not work.
Typically I try to limit the technologies to something I know really well. When something fails, I can often quickly figure out what is wrong.
I started with the basic plan (I guess it is that $30/month). I only upgraded to $100 Max and later to $180 2xMax because I was hitting limits.
But the reason I was hitting limits was that I was working on multiple projects in multiple environments at the same time. The only difference I have seen from upgrading is in the limits; I have not seen any difference in quality.
Thanks for the info. I try a mix of things I know well and things I want to play around with.
Swift and iOS was something that didn’t work so well for me. I wanted to play around with face capture and spent a day with Claude putting together a small app that showed realtime video of a face, put dots on/around various facial features, printed log messages if the person changed the direction they were looking (up, down, left, right), and played a sound when they opened their mouth.
I’ve done app development before, but it’s been a few years so was a little bit rusty and it felt like Claude was really helping me out.
Then I got to a point I was happy with, and I thought I’d go deeper into the code to understand what it was doing and how it worked (not a delegation issue as per another comment; this was a play/learning exercise for me, so I wanted to understand how it all worked). And right there in the Apple developer documentation was a sample that did basically the same thing as my app, only the code was far simpler, and after reading through the accompanying docs I realized the Claude version had a threading issue waiting to happen that was explicitly warned against in the docs of the api calls it was using.
If I’d gone to the developer docs in the beginning I would have had a better app, and better understanding in maybe a quarter of the time.
Appreciate the info on spend. The above session was on the $30/month version of Claude.
I guess I need to just keep flapping my wings until I can draw the owl.
Challenging my own LLM experiences cynically: for a period it really does feel like I’m interactively getting exactly what I need… but given that the end result is generated and I then have to learn it, I’m left in much the same situation you mentioned of looking at the developer docs, where a better, cleaner version already exists.
Subjectively interacting with an LLM gives a sense of progress, but objectively downloading a sample project and tutorial gets me to the same point with higher quality materials much faster.
I keep thinking about the research on file navigation via command line versus using a mouse. People’s subjective sense of speed and capability doesn’t necessarily line up with measurable outcomes.
LLMs can do some amazing things, but violently copy and pasting stack overflow & randomness from GitHub can too.
Right. This is how I feel. I can get the LLM to generate code that more or less does what I need, but if I objectively look at the result and the effort required to get there it's still not at the point where it's doing it faster and better than what I could have got manually (with exceptions for certain specific use cases that are not generally applicable to the work I want to do).
The time I save on typing out the program is lost to new activities I otherwise wouldn't be doing.
When did you try Claude and Swift? There was a dramatic improvement (imo, I haven't written my own swift, I'm mostly back end guy) with the latest releases, judging by how many iterations on stupid shit my programs have taken.
> I realized the Claude version had a threading issue waiting to happen that was explicitly warned against in the docs of the api calls it was using.
I am reading between the lines here, trying genuinely to be helpful, so forgive me if I am not on the right track.
But based on what you write, it seems to me you might not really have gone through the disillusionment phase yet. You seem to be assuming the models "understand" more than they are really capable of understanding, which creates expectations and then disappointment. It seems to me you are still expecting CC to work at the level of a senior professional in various roles, instead of assuming it is a junior professional.
I would probably have approached that iOS app by first investigating the various options for how the app could be implemented (especially as I don't have a deep understanding of the tech), and then exploring each option to understand myself which one is best.
The options in your example might include the Apple documentation page. Or it might be some open source repo that contains something that could be used as a starting point, etc.
Then I would have asked Claude to create a plan to implement the best option.
During either the option selection or planning, the threading issue would either come up or not. It might come up explicitly, in which case I could learn it from the plans. It might be implicit, just included in the generated code. Or it might not be included in the plans or in the code, even if it is explicitly stated in the documentation. If the suggested plan would be based on that documentation, then I would probably read it myself too, and might have seen the suggestion.
When reviewing the plan, I can use my prior knowledge to ask whether that issue has been taken into account. If not, then Claude would modify the plan. Of course, if I did not know about the threading issue beforehand, and did not have the general experience with the tech to suspect such an issue, nor read the documentation and see the recommendation, I could not find the issue myself either.
If the issue is not found in planning or programming, the issue would arise at a later stage, hopefully while unit/system testing the application, or in pilot use. I have not written complex iOS apps personally, so I might not have caught it either -- I am not senior enough to guide it. I would ask it to plan again how to comprehensively test such an app, to learn how it should be done.
What I meant by standard SWE practices is that there are various stages (requirements, specification, design, programming, testing, pilot use) where the solution is reviewed from multiple angles, so it becomes likely that these kinds of issues are caught. The best practices also include iteration. Start with something small that works: for example, first an iOS application that compiles, shows "Hello, world", and can be installed on your phone.
In my experience, CC cannot be expected to independently work as a senior professional in any role (architect, programmer, test manager, tester, pilot user, product manager, project manager). A junior might not take into account all instructions or guidance even if it is explicit. But it can act as a junior professional in any of these roles, so it can help a senior professional get the 10x productivity boost in any of these areas.
By project manager role, I mean that I am explicitly taking the CC through the various SWE stages and making sure they have been done properly, and also that I iterate on the solution. On each one of the stages, I take the role of the respective senior professional. If I cannot do it yet, I try to learn how to do it. At the same time, I work as a product manager/owner as well, to make decisions about the product, based on my personal "taste" and requirements.
I appreciate the reply, and you trying to be helpful, but this is not what is happening.
I mean I'm definitely still in the stage of disillusionment, but I'm not treating LLMs as senior or expecting much from them.
The example I gave played out much as you described above.
I used an iterative process with multiple self-contained smaller steps, each with a planning and discussion stage where I got the AI to identify ways to achieve what I was looking to do and weigh up tradeoffs that I then decided on. That was followed by a design clarification and finalisation stage, then finally getting it to write code (it's sometimes very hard to get the AI not to write code until the design has been finalised), followed by adjustments to that code as necessary.
The steps involved were something like:
- build the app skeleton
- open a camera feed
- display the feed full screen
- flip the feed so it responded as a mirror would if you were looking at it
- use the ios apis to get facial landmarks
- display the landmarks as dots
- detect looking in different directions and print a log message.
- detect the user opening their mouth
- play a sound when the mouth transitions from closed to open
- etc
Each step was relatively small and self-contained, with a planning stage first and me asking the AI probing/clarifying questions.
The threading issue didn't come up at all in any of this.
Once it came up, the AI tied itself in knots trying to sort it out, coming up with very complex dispatching logic that still got things incorrect.
It was a fun little project, but if I compare the output it just wasn't equivalent to what I could get if I'd just started with the Apple documentation (though maybe it's different now, as per another commenter's reply).
> By project manager role, I mean that I am explicitly taking the CC through the various SWE stages and making sure they have been done properly, and also that I iterate on the solution. On each one of the stages, I take the role of the respective senior professional. If I cannot do it yet, I try to learn how to do it. At the same time, I work as a product manager/owner as well, to make decisions about the product, based on my personal "taste" and requirements.
Right, this is what I do. I guess my point is that the amount of effort involved to use English to direct and correct the AI often outweighs the effort involved to just do it myself.
The gap is shrinking (I get much better results now than I did a year ago) but still there.
What I meant by "not treating the LLM as senior" is that the disillusionment phase culminates in an a-ha moment which could be described as "the LLM is not a senior developer". This a-ha moment is not intellectual but emotional. It is possible to think that the LLM is not a senior developer while not realizing it emotionally. This emotional realization in turn has consequences.
>The threading issue didn't come up at all in any of this.
>
>Once it came up, the AI tied itself in knots trying to sort it out, coming up with very complex dispatching logic that still got things incorrect.
Yes. These kinds of loops have happened to me as well. It sometimes requires clearing of context + some inventive step to help the LLM out of the loop. For example, my ad pacing feature required that I recognized it was trying to optimize the wrong variable. I consider this to be partly what I mean by "LLM is a junior" and that "I act as the project manager".
> I guess my point is that the amount of effort involved to use English to direct and correct the AI often outweighs the effort involved to just do it myself.
Could you really have done a complex mobile app alone in one day without knowing the stack well beforehand? I believe this sort of stuff used to take months from a competent team not long ago. I certainly could not have done one year ago what I can do today, with these tools.
I'm pretty sure I have the right intellectual and emotional understanding of the AI and its abilities.
> I consider this to be partly what I mean by "LLM is a junior" and that "I act as the project manager".
And this is partly what I mean when I say the time I spend instructing the "junior" LLM could be just as well spent implementing the code myself - because the "project manager" side of me can work with the "senior dev" side of me at the speed of thought and often in parallel, and solving the challenges and the design of something is often where most of the time is spent anyway.
Skills are changing this equation somewhat due to the way they can encode repeatable knowledge, but not so much for me yet especially if I'm trying things out in radically different areas (I'm still in my experimental stage with them).
> Could you really have done a complex mobile app alone in one day without knowing the stack well beforehand?
No, but that's not what happened here.
The mobile app wasn't complex (literally only does the things outlined above) and I've done enough mobile development and graphics/computer vision development before that the stack and concepts involved weren't completely unknown, just the specifics of the various iOS APIs and how to string them together - hence why I initially thought it would be a good use case for AI.
It was also an incredible coincidence that the toy app I wanted to build had an apple developer tutorial that did almost the same thing as what I was looking to build, and so yes, I clearly would have been better off using the documentation as a starting point rather than the AI.
That sort of coincidence won't always exist, but I've been thinking lately about another toy iOS/Apple Watch application, and I checked, and once again there is a developer tutorial that closely matches what I'm looking to build. If I ever get around to experimenting with that, the developer docs are going to be my first port of call rather than an AI.
> I certainly could not have done one year ago what I can do today, with these tools.
Right, and if you look back at my original reply (not to you), this is what I'm trying to understand: the what and how of AI productivity gains. Because if I evaluate the output I get, it's almost always either something I could have built faster and better, or if not faster then at least better, and not so much slower that the AI was enabling a week of work to be done in a day, and a month of work to be done in a week (claims from the GP, not you).
I would love to be able to realize those gains - and I can see the potential but just not the results.
>The mobile app wasn't complex (literally only does the things outlined above) and I've done enough mobile development and graphics/computer vision development before that the stack and concepts involved weren't completely unknown, just the specifics of the various iOS APIs and how to string them together - hence why I initially thought it would be a good use case for AI.
>
>It was also an incredible coincidence that the toy app I wanted to build had an apple developer tutorial that did almost the same thing as what I was looking to build, and so yes, I clearly would have been better off using the documentation as a starting point rather than the AI.
Ok. I have done similar, too. For example, when starting a new Django project, I would rather copy an old project as a basis than create a new one from scratch with an LLM.
If there already exists full documentation or a repo of exactly what you are trying to do, and/or it is something you have already done many times, then the LLM might not add much value, and may even be a hindrance.
> I am guessing: Maybe you are not used to or comfortable with delegating work?
The difference between delegating to a human vs an LLM is that a human is liable for understanding it, regardless of how it got there. Delegating to an LLM means you're just more rapidly creating liabilities for yourself, which indeed is a worthwhile tradeoff depending on the complexity of what you're losing intimate knowledge of.
The topic of liability is a difference but I think not an important one, if your objective is to get things done. In fact, humans being liable creates high incentives to obscure the truth, deceive, or move slowly to limit personal risk exposure, all of which are very real world hindrances.
In the end the person in charge is liable either way, in different ways.
These are real-world responsibilities to manage, and they can sometimes be hindrances at certain levels, but no functional society lets people do arbitrary things at any speed, regardless of the impact on others, in the name of a checklist. What I mean is that if I ask a person on my team that I trust to do something, they'll use a machine to do it, but if it's wrong, they're responsible for fixing it and for maintaining the knowledge of how to fix it. If a bridge fails, it's on the Professional Engineer who has sign-off on the project, as well as on the others doing the engineering work, to make sure they build a bridge that doesn't collapse. If software engineers can call themselves that without laughing, they need to consider their liability along the way, depending on circumstance.
As a technical manager, I'm liable for every line of code we produce - regardless of who in the team actually wrote the code. This is why I review every pull request :)
This is interesting. At what level and team size? There's going to have to be a point where you just give in to the 'vibes' (whether it's from a human, or a machine), otherwise you become the bottleneck, no?
I think there's a place for this; it's not rare for one person to be the PR bottleneck like this, but I don't think it would be for me in either position; people should be able to be responsible for reviewing each other's work imo. Incidentally, "Agile" with a capital A sucks and should die in a fire, but lowercase-a "agile" probably does by necessity mean smaller teams.
There is probably a case of people both being right here, just having gotten to, or found, different end results. For me, Claude has been a boon for prototyping stuff I always wanted to build but didn’t want to do the repetitive plumbing slog to get started, but I have found you hit a level of complexity where AIs bog down and start telling you they have fixed the bug you have just asked about for the sixth time without doing anything or bothering to check.
Maybe that’s just the level I gave up at and it’s a matter of reworking the Claude.md file and other documentation into smaller pieces and focusing the agent on just little things to get past it.
I’m perfectly comfortable and used to delegating, but delegation requires trust that the result will be fit for purpose.
It doesn’t have to be exactly how I would do it but at a minimum it has to work correctly and have acceptable performance for the task at hand.
This doesn’t mean being super optimized just that it shouldn’t be doing stupid things like n+1 requests or database queries etc.
See a sibling comment for one example on correctness. Another one, related to performance, was querying some information from a couple of database tables (the first with 50,000 rows, the next with 2.5 million).
After specifying things in enough detail to let the AI go, it got correct results but performance was rather slow. A bit more back and forthing and it got up to processing 4,000 rows a second.
It was so impressed with its new performance it started adding rocket ship emojis to the output summary.
There were still some obvious (to me) performance issues so I pressed it to see if it could improve the performance. It started suggesting some database config tweaks which provided some marginal improvements but was still missing some big wins elsewhere - namely it was avoiding “expensive” joins and doing that work in the app instead - resulting in n+1 db calls.
So I suggested getting the DB to do the join and just processing the fully joined data on the app side. This doubled throughput (8,000 rows/second) and led to claims from the AI that this was now enterprise-ready code.
There was still low hanging fruit though because it was calling the db and getting all results back before processing anything.
After suggesting switching to streaming results (good point!) we got up to 10,000 rows/second.
This was acceptable performance, but after a bit more wrangling we got things up to 11,000 rows/second and now it wasn’t worth spending much extra time squeezing out more performance.
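To make the stages concrete, here is a rough illustration of the three shapes the code went through (hypothetical table and column names, plain DB-API cursors, not the real schema or code):

    # Stage 1 (slow): n+1 queries -- one round trip per parent row.
    # (Parameter placeholder style depends on your driver.)
    def n_plus_one(conn):
        cur = conn.cursor()
        cur.execute("SELECT id FROM parents")  # ~50,000 rows
        for (parent_id,) in cur.fetchall():
            child_cur = conn.cursor()
            child_cur.execute(
                "SELECT value FROM children WHERE parent_id = %s", (parent_id,)
            )
            process(parent_id, child_cur.fetchall())

    # Stage 2: let the database do the join, one round trip.
    # Stage 3: stream the joined rows in batches instead of materialising
    # the whole result set before processing.
    def joined_and_streamed(conn, batch_size=1000):
        cur = conn.cursor()
        cur.execute(
            "SELECT p.id, c.value FROM parents p "
            "JOIN children c ON c.parent_id = p.id"
        )
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            for parent_id, value in batch:
                process(parent_id, value)

    def process(parent_id, data):
        ...  # placeholder for whatever per-row work the app actually does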
In the end the AI came to a good result, but, at each step of the way it was me hinting it in the correct direction and then the AI congratulating me on the incredible “world class performance” (actual quote but difficult to believe when you then double performance again).
If it had just been me, I would have finished it in half the time.
If I’d delegated to a less senior employee and we’d gone back and forth a bit, pairing to get it to this state, it might have taken the same amount of time and effort, but they would’ve at least learnt something.
Not so with the AI however - it learns nothing and I have to make sure I re-explain things and concepts all over again the next time and in sufficient detail that it will do a reasonable job (not expecting perfection, just needs to be acceptable).
And so my experience so far (much more than just these 2 examples) is that I can’t trust the AI to the point where I can delegate enough that I don’t spend more time supervising/correcting it than I would spend writing things myself.
Edit: using AI to explain existing code is a useful thing it can do well. My experience is it is much better at explaining code than producing it.
Not trying to downplay your grievances, but isn't this what [Skills](https://claude.com/skills) are for? After going back and forth on something like that, create a skill that's something along the lines of
`database-query-speed-optimization`
"Some rules of thumb for using database queries:
- Use joins
- Streaming results is faster
- etc.
"
That way, the next time you have to do something like this, you can remind it of / it will find the skill.
Yeah, it is, but firstly this example was from before skills were a thing, and secondly the rules might not be universally applicable.
In this case the two tables shared a 1:1 mapping of primary key to foreign key, so the join was fast and exact - but there are situations where that won’t be the case.
And yeah this means slowly building out skills with enough conditions and rules and advice.
I am honestly curious about your point on the productivity boost. Are you saying that you can write tests at the same speed as the AI can? Or is it that the tests written by the AI are of much lower quality, so that they are not worth using?
I am in the role of solo-preneur now and I see a lot of benefit from AI. But then I read posts like yours, where experienced devs don't see much value in AI, and I start to doubt the things I do. Are they bad quality (possibly), or is something else going on?
I’m not faster at writing tests than AI but my own code needs fewer tests.
When I’m writing my own code I can verify the logic as I go, and coupled with a strong type system and a judicious use of _some_ tests, it’s generally enough for my code to be correct.
By comparison the AI needs more tests to keep it on the right path otherwise the final code is not fit for purpose.
For example, in a recent use case I needed to take a JSON blob containing an array of strings that contained numbers, and return an array of Decimals sorted in ascending order.
This seemed a perfect use case - a short well defined task with clear success criteria so I spent a bunch of time writing the requirements and building out a test suite and then let the AI do its thing.
The AI produced OK code, but it sorted everything lexicographically before converting to Decimals, rather than converting to Decimals first and sorting numerically, so 1000 sorted as less than 900.
So I point it out, and the AI says good point, you’re absolutely correct, and we add a test for this, and it goes again and gets the right result - but that’s not a mistake I would have made or needed a test for (though you could argue it’s a good test to have).
You could also argue that I should have specified the problem more clearly, but then we come back to the point that if I’m writing every specific detail in English first, it’s faster for me just to write it in code in the first place.
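For concreteness, the whole fix is converting before sorting, and the regression test is one line. A sketch with illustrative names, not the actual code:

    import json
    from decimal import Decimal

    def parse_and_sort(blob: str) -> list[Decimal]:
        # Convert to Decimal *before* sorting so the order is numeric,
        # not lexicographic.
        return sorted(Decimal(s) for s in json.loads(blob))

    def test_numeric_not_lexicographic_order():
        # Lexicographically "1000" < "900"; numerically it's the other way round.
        assert parse_and_sort('["900", "1000"]') == [Decimal("900"), Decimal("1000")]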
> Are you saying that you can write tests at the same speed as AI can?
I feel this is a gross mischaracterization of any user flow involving using LLMs to generate code.
The hard part of generating code with LLMs is not how fast the code is generated. The hard part is verifying it actually does what it is expected to do. Unit tests too.
LLMs excel at spewing out test cases, but you need to review each and every single test case to verify it does anything meaningful or valid, and you need to iterate over the tests to check whether they even pass and what the code coverage is. That is the part that consumes time.
Claiming that LLMs are faster at generating code than you is like claiming that copy-and-pasting code out of Stack Overflow is faster than you writing it. Perhaps, but how can you tell if the code actually works?
"Write unit tests with full line and branch coverage for this function:
    def add_two_numbers(x, y):
        return x + y + 1
"
Sometimes the LLM will point out that this function does not, in fact, return the sum of x and y. But more often, it will happily write "assert add_two_numbers(1, 1) == 3", without comment.
The big problem is that LLMs will assume that the code they are writing tests for is correct. This defeats the main purpose of writing tests, which is to find bugs in the code.
Tip: teach it how to write tests properly. I’ll share what has worked pretty well for me.
Run Cursor in “agent” mode, or create a Codex or Claude Code “unit test” skill. I recommend Claude Code.
Explain to the LLM that after it creates or modifies a test, it must run the test to confirm it passes. If it fails, it’s not allowed to edit the source code; instead it must determine whether there is a bug in the test or in the source code. If the test is buggy it should try again; if there is a bug in the source code it should pause, propose a fix, and consult with you on next steps.
The key insight here is you need to tell it that it’s not supposed to randomly edit the source code to make the test pass. I also recommend reviewing the unit tests at a high level, to make sure it didn’t hallucinate.
"The decision of all decisions is to reject the default path. To answer the call to adventure. To finally begin writing the first chapter. To leave the tutorial and start level one. That's when your life starts."
Wow, you sound exactly like me - except I have a few fewer years of experience. Let me know if you want to connect to help each other stay motivated (email in profile).
What’s your take on the fact that everyone around gets this boost? I feel the same boost, but in our company we had a little competition using LLMs - the team I was leading won, but the victory was not decisive, only marginal.