What works well and what doesn't

In the last blog post I presented the basic pattern I have been following to build a beta of a rather complex product, with a web client and native iOS and Android clients, alone in three months. Is this sustainable? Absolutely not. I won’t be able to bring this product to scale by myself. My goal is to get it to a limited set of users and then scale it with a team, and I’m not that far from that. But first, back to the title of this post. So far I have run into two bigger challenges:

Addiction and balance

That might sound strange for a post about agentic coding, but hear me out.

By addiction I mean this: being able to deliver new features in a very short time is so compelling that it is very easy to ignore the basics of product development. It happened to me constantly.

Coding suddenly becomes easy, but you still need to think through what you actually want. And, maybe most importantly, you still need to take the time to test it yourself, run user tests and iterate on what you learn.

But I have to say that being able to ship things this quickly turned me into an addict. Gone were the days when you had an idea you wanted to test, went to a team, and heard: “Our backlog is full for the next 8 weeks, and have you read that book by Marty Cagan?”

And now I can just prompt it! It got so bad that I had so many agents running in parallel that I lost track of them. I either never committed their work or never tested it, because there was always a shiny new thing to build.

Which brings us to balance. After a wild period of losing control, I had to force myself to limit the number of agents running in parallel, check their work, test the features and iterate on them. For me the magic number is around three parallel agents, depending on the complexity of the tasks. That leaves me enough time to read the agents’ output, give meaningful feedback and actually test what was built. Without that discipline it is very easy to end up with an incoherent product that is little more than a pile of features.

Dark factory

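The term comes from manufacturing: a dark factory is fully automated and can run with the lights off, because no humans work on the floor. Here is how a dark factory looks in the ideal case: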

[Image: ideal factory]

This would be really nice, but I have to admit I am not there, and I probably never will be. My factory looks roughly like this:

[Image: realistic factory]

Let’s look at the two differences:

Manual gate

There are two main reasons for the manual gate:

I don’t believe it is possible (or I am not smart enough) to specify features so precisely that no human review is needed. I still believe in MVPs, so I let LLMs build them and then I iterate. This is especially true for everything user-facing in the frontend.
My automated backend testing is pretty good, but my automated tests for the web version are not good enough to give me full confidence in a release. I could write another blog post about how the velocity of AI makes frontend testing more difficult…

The role of the engineer

The factory doesn’t build itself, and that is where a big part of the engineering work goes. How do you make sure the factory produces production-ready code and that the architecture is sound (security, GDPR compliance and so on)?

So the factory is actually not that dark: the production is, but the layout isn’t. I believe this will be solved soon, but for now I see a big part of my job as building the factory. It took me a while to realize that this is the key part. Now I wake up every morning to the results of several CI runs:

Architecture Review
Bug Audit
Security and Performance Review
UX Review

These are my “checks and balances”, alongside all the markdown files that tell Claude what to do (and we all know it never follows those instructions 100% of the time).

This is Part 2 of a series on building a startup with AI as your only team.