TL;DR: I needed to build a mass image production pipeline. I tried it the normal way. Twice. Both attempts collapsed. Then I gave in to the AI hype.

A Quick Disclaimer Before We Start

Yes, this is another post about AI. I’m aware my blog has become a one-topic newsletter at this point, and I’m aware I sound like the kid who’s still pushing the same joke two months after the room went silent.

But I keep discovering things - almost daily - that genuinely shift how I think about building software. So until the well dries up or I develop a second personality trait, you’re getting my viewpoint on it.

Sorry. Not sorry. A little sorry.

The Task

A mass image production pipeline. Different products, different themes, same visual style, all generated by AI, all needing to look consistent.

The kind of thing that normally involves a designer, a project manager, three meetings about brand guidelines, and a Jira board that outlives the actual project. On paper, the requirements fit on one page. Requirements that fit on one page have a way of not doing that for very long.

So I did what developers do. I started building.

Act 1: Formageddon

First attempt: a frontend with options.

The naive belief was that if I gave users enough knobs, they’d have everything they need. Settings, dropdowns, checkboxes, validation rules, an “advanced” panel for the gnarly bits.

What I ended up with was a UI that looked like the cockpit of a 747 designed by someone who has never flown, for users who just wanted to land at the nearest airport.

I thought I had made it flexible. I’d watched enough ThePrimeagen videos to dodge that pitfall. The next one wasn’t on the channel.

The user came back the next day:

  • The output layout should look slightly different.
  • The naming should follow a different pattern.
  • This theme needs a special exception.
  • This product category needs a different crop.
  • This customer wants the results grouped differently.

Every one of those was a code change. Not because the pipeline was bad - because every possible decision had to be predicted ahead of time and turned into a form field, a toggle, a dropdown, or a config value. You can give users a thousand options and still miss the one they need.
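
To make that concrete, here is a minimal Python sketch of the options-first approach. The field names (`layout`, `naming_pattern`, `crop`) and the `apply_request` helper are hypothetical, not the actual pipeline's code. The only point is that a request the schema never anticipated has nowhere to go.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RenderOptions:
    # Every knob the developer thought of ahead of time (illustrative names).
    layout: str = "grid"
    naming_pattern: str = "{product}_{index}"
    crop: str = "square"


def apply_request(options: RenderOptions, request: dict) -> RenderOptions:
    """Apply a user's change request, but only if every key is a known knob."""
    known = {f.name for f in fields(RenderOptions)}
    unknown = sorted(set(request) - known)
    if unknown:
        # This is the wall: the request needs a code change, not a setting.
        raise KeyError(f"no option for: {', '.join(unknown)}")
    return replace(options, **request)
```

Asking for a different crop works. Asking for something like `group_by_customer`, which nobody predicted, raises an error until a developer adds the field.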

Formageddon, scrapped.

Act 2: Just One More Layer

Second attempt: if options-as-a-grid won’t work, make the workflow configurable. Build a node-based pipeline editor. Each node a step, each edge a transformation, the whole thing reshuffleable without redeploying.

Basically a homegrown n8n, but for image production.

A visual workflow builder. A config screen per node. A few magic nodes for the AI calls. Retry logic. State handling. Preview logic. Then the validator on top of all of it. Then the UI for the validator. Then the error states for the UI for the validator.

The parts kept falling apart. Every new node added combinatorial complexity. Every edge case broke two other things. The visual editor needed its own schema. The schema needed its own migrations. The migrations needed their own UI, because the people changing workflows weren’t going to write JSON.

I could have finished it.

There’s nothing technically impossible about building a domain-specific n8n. It’s a known shape of problem, the patterns exist, and I know how to grind through them. I just didn’t have the months it would take to do it properly - and “properly” was the only mode that would have shipped something that didn’t crumble the first time the user changed their mind. Six months of my life for one internal pipeline wasn’t going to happen.

I pulled the plug.

The Scope I’d Been Ignoring

Two dead attempts later, I finally looked at the actual problem.

The pipeline wasn’t the deliverable. The capacity to reshape the pipeline was the deliverable - on the user’s schedule, for reasons I couldn’t predict, indefinitely.

New themes. New customers. New naming conventions someone invented in a meeting on Tuesday. New approval steps because legal got involved. New output formats because the platform changed. New “small visual tweaks” that turned out to be load-bearing.

Every traditional approach - frontend with options, configurable workflow engine, even the dreaded internal admin panel - eventually hits the same wall: you can only expose the knobs you’ve already imagined.

Act 3: The AI Idea I Didn’t Want to Have

Yes, I’d thought about it. I’d dismissed it twice already that month.

I didn’t trust it.

“AI plugins” hit my LinkedIn feed a few times. Another AI-something. Scrolled past.

I don’t like non-determinism in production systems. I like things that do the same thing twice in a row. I like inputs that map to outputs without a coin flip in the middle. An agent that might do the right thing, or might confidently march toward a wall with a clipboard, was not what I wanted operating a pipeline that real customers were waiting on.

I’m the kind of developer who, if his life depended on it, would not trust AI. (More on that later.)

So I rejected the idea, went back to the n8n clone, and watched it continue to die.

Caving

A few days later, deadline approaching, both prior attempts dead, no good options left.

Fine. Quick and dirty. I’d build the AI version as a stopgap, see how badly it failed, then use the lessons to design the real solution. Throwaway. Tuesday-to-Friday job.

The plugin was embarrassingly small:

  • A few markdown files for instructions and rules.
  • Some scripts for the actual image generation pipeline.
  • A handful of HTML files that pass for a dashboard.
  • Codex sitting in the middle, reading the markdown, calling the scripts, refreshing the HTML.

That’s the whole plugin. A junior dev could build the scaffold in an afternoon.
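
For a sense of scale, here is a rough Python sketch of the kind of glue script this implies: something the agent could run to regenerate a static dashboard page from a list of job records. The record fields, `render_dashboard`, and `write_dashboard` are assumptions for illustration; the plugin's real scripts are not shown in this post.

```python
import html
from pathlib import Path


def render_dashboard(jobs: list[dict]) -> str:
    """Build a static HTML status page from job records (hypothetical shape)."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(job["name"]), html.escape(job["status"])
        )
        for job in jobs
    )
    return (
        "<!doctype html>\n<title>Pipeline status</title>\n"
        "<table>\n<tr><th>Job</th><th>Status</th></tr>\n" + rows + "\n</table>\n"
    )


def write_dashboard(jobs: list[dict], path: Path) -> None:
    # The agent would re-run this after each pipeline step to refresh the page.
    path.write_text(render_dashboard(jobs), encoding="utf-8")
```

Nothing here is clever, which is the idea: the script does one predictable thing, and the agent decides when to call it.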

I built it on a Tuesday afternoon expecting to start over by Friday morning.

Thursday

The user came to me on Thursday with a change request.

I almost said “give me a day, I’ll patch the scripts.” Instead I said “just tell Codex what you want.”

They did. Codex made a plan. They approved it. The pipeline ran. The new outputs were correct.

I had not touched the code.

It stopped feeling like a plugin somewhere around the third time the agent fixed something I hadn’t asked it to. What I’d built was less a plugin and more a tiny custom operating system for one very specific business case.

The workflow now looks almost insultingly simple:

  1. User tells Codex: “Create images with these input images”
  2. Codex creates a plan
  3. User approves the plan
  4. Scripts do their thing
  5. User watches progress on the website - in real time

And because Codex has a browser built in, all of this happens in one app. The agent sees the instructions. It sees the files. It sees the website. It sees the outputs. It sees the problems. Everything lives in the same loop.
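
The plan, approve, run steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how Codex works internally: `Plan`, `run_plan`, and the step names are hypothetical, and the `approve` callback stands in for the user reading the plan and saying yes or no.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Plan:
    # Human-readable steps the agent proposes; step names are illustrative.
    steps: list[str]


def run_plan(
    plan: Plan,
    approve: Callable[[Plan], bool],
    handlers: dict[str, Callable[[], str]],
) -> list[str]:
    """Run each step's script only after the user has approved the plan."""
    if not approve(plan):
        return []  # Rejected plans never touch the pipeline.
    # Fail before running anything if the plan names a step with no script.
    missing = [step for step in plan.steps if step not in handlers]
    if missing:
        raise ValueError(f"no script for steps: {missing}")
    return [handlers[step]() for step in plan.steps]
```

The approval gate sits in front of every run, so a weird plan costs the user a "no" rather than a batch of wrong images.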

The Trade

I’d assumed non-determinism would be the dealbreaker. An agent that sometimes does the right thing and sometimes confidently marches off a cliff is a terrible production component when the cost of a wrong output is high and the time-to-detect is slow.

Neither turned out to be true here.

The cost of a wrong output is low, because outputs are images, and bad images are visible the second they render. There is no silent corruption, no subtle off-by-one that surfaces six weeks later in a quarterly report.

And the time-to-detect is roughly zero, because the user is already watching the website while the pipeline runs. Bad output? They tell Codex to redo it differently. Plan looks weird? They reject it before it runs. The agent has the entire context - the markdown rules, the previous outputs, the user’s last three corrections - and adjusts.

The non-determinism is still there. It just stopped mattering, because the iteration loop got cheap enough that you catch wrong outputs the moment they happen, fix them with a sentence, and move on.

I gave up determinism. What I got back was the ability to steer. For image production, that turned out to be worth more.

Why This Matters Beyond One Pipeline

For about twenty years, business software has come in two flavors: too dumb to be useful beyond its intended scope, or too configurable to use without a consultant on retainer telling you which knobs to turn. Pick your discomfort.

Either way, it works like this:

Here’s the app. Here’s the workflow. Here are the buttons. Here’s the approved way to use it. Now please - please - adapt your business process to our product.

It’s the software equivalent of buying a suit and being told the tailor only works on the customer.

And if the business case changes? Open a ticket. Write requirements. Wait for a developer. Wait for a consultant. Wait for the next sprint. Wait for deployment. Wait for someone to explain why the system was never designed for that.

I’ve lived this. You’ve lived this. We’ve all sat in the meeting where someone says “the tool doesn’t support that” and everyone nods like it’s a law of physics rather than a design choice someone made in 2019.

The old loop vs. the new loop, roughly:

Business case changes
─────────────────────────────────────────────────
Old:  ticket → backlog → sprint → consultant →
      deployment → "actually it doesn't do that"
      → repeat
─────────────────────────────────────────────────
New:  "hey, can you also..." → plan → review → run
─────────────────────────────────────────────────

You don’t need a developer, a consultant, or an SAP-certified shaman charging €1,800/day to move a column in a report. The customization layer used to be a six-month enterprise project with three consultants and a sacred Excel sheet. Now it’s a conversation.

A non-technical user can finally change how their software behaves - meaningfully, not “I rearranged the dashboard widgets” - without a developer in the loop. Not by clicking through a settings page someone pre-imagined for them. By describing what they want, in their own words, to a thing that can read the scaffold and figure out the rest.

Enterprise vendors should be more nervous than they are. Not about the AI. About the fact that the customization layer - the thing they charge six figures and six months for - just became a chat window.

I Still Have a Job, Apparently

The other thing that’s quietly changed is the shape of the work I do.

I used to build whole products. Frontend, backend, validators, settings pages, the admin panel nobody asked for but everyone needs. End-to-end systems that tried to predict every decision a user might ever make and bake it into a form field.

Now I build ports. A CLI here, an API there, an MCP server if the agent needs to talk to something custom, a couple of scripts that do one thing well. Small surfaces with defined interfaces that behave the same way every time they’re called.

The agent docks onto those ports and does the rest. The ports do the deterministic work - the scripts don’t hallucinate, the API returns what it returns, the database query is the database query. The agent does the steering on top: when the business case changes, you don’t rewrite the ports, you tell the agent something different and it composes the same ports in a different order.
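
As an illustration of what a port can look like, here is a small Python sketch of a command-line tool that names output files. The tool, its flags, and the default pattern are made up for this example. What matters is the shape: the same arguments always produce the same name, and a new naming convention is a different `--pattern` value rather than a code change.

```python
import argparse
from typing import Optional


def output_name(product: str, theme: str, index: int, pattern: str) -> str:
    """Deterministic naming: same arguments in, same file name out."""
    return pattern.format(product=product, theme=theme, index=index) + ".png"


def main(argv: Optional[list[str]] = None) -> str:
    parser = argparse.ArgumentParser(description="Name one output image.")
    parser.add_argument("--product", required=True)
    parser.add_argument("--theme", required=True)
    parser.add_argument("--index", type=int, required=True)
    # The agent changes behavior by passing a different pattern, not by editing code.
    parser.add_argument("--pattern", default="{product}_{theme}_{index:03d}")
    args = parser.parse_args(argv)
    name = output_name(args.product, args.theme, args.index, args.pattern)
    print(name)
    return name
```

Calling `main(["--product", "mug", "--theme", "summer", "--index", "7"])` prints `mug_summer_007.png`. In a real script you would invoke `main()` under an `if __name__ == "__main__":` guard so the agent can run it from a shell.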

It’s the same trade I described earlier - give up determinism, get the ability to steer - viewed from the builder’s side. The ports stay deterministic. The composition stops being.

What disappears is most of the work I used to spend my time on. The CRUD admin panel. The advanced settings tab. The five-step wizard. The validator UI for the validator schema. What’s left is the half that actually has to be correct.

But Not for Everything

Would I build a TurboTax-style Codex plugin?

Absolutely not.

I look terrible in orange, and any soap-related fantasies stay firmly in the voluntary, non-state-sponsored part of my life.

I’m not letting a clanker hallucinate my taxes. That is not automation. That is how you speedrun your way into a dedicated WBS podcast episode.

The trade I described earlier - give up determinism, get the ability to steer - only works when wrong outputs are cheap and detectable. Taxes are neither. Medical dosing is neither. Anything where the wrong answer ships silently and surfaces weeks later is neither. Plug an agent into that and you’re not automating - you’re gambling with good UX.

But for creating content. Managing communication. Maintaining a knowledge base. Running repeatable production workflows where mistakes are annoying, not legally radioactive. That’s where the math works.

This is not autopilot. It still needs me to understand what it is doing. To catch the dumb mistakes. To know when “close enough” is not close enough. To decide when the output is good, when the plan is sane, and when the agent is confidently marching toward a wall with a clipboard.

But the ratio changes.

Less clicking, less copy-pasting, less config babysitting, less “please upload the same file into seven different enterprise portals and pray.”

What’s left is mostly thinking, reviewing, and noticing when the agent is wrong.

I also - because I am who I am - replaced the HTML files with a proper server that hosts images on S3 and handles the UI. A developer without the ability to know when enough is enough. Some things don’t change.

(P.S.: OpenAI, if you’re reading this - please let websites running in the built-in browser control the Codex session. I’m begging. Let the dashboard talk back to the brain. You’re so close.)

The future doesn’t feel like it’s coming. It feels like it showed up, sat down, and started filing its own tickets.