An earlier version of this post has since been revised to make explicit our studio’s stance: we refuse to adopt these tools in our daily practice.
My supermarket recently announced their “exciting new AI-powered feature” meant to give me a better shopping experience.
Was it a fix for their packing algorithm, which sometimes produces bags containing a single item? No. Was it an improvement to their inventory management system to prevent multi-day stockouts or food being thrown away? Also no.
Instead, they proudly announced a “smart shopping list” that suggests items based on my purchase history, essentially replicating the existing functionality of… lists. Their careers page also started featuring openings for prompt engineers and interns, foreshadowing what I can only assume will be more generative-AI-based features.
Our position and experience with these tools
We don’t use generative AI (GenAI) in our development work.
Our CTO spent years in ML, and our team has collectively spent many hours experimenting with these tools in past jobs, as well as reviewing and cleaning up coworkers’ output generated by these tools.
We know what they can do - and more importantly, what they can’t. We can confirm they are fine achievements and impressive in certain contexts. Their language capabilities are second to none - scarily so at times. We’re sure they fit some teams. But for us, building software meant to last years rather than quarters, they’re fundamentally unsuitable. The path from a chat tool for homework and emails to sustainable business value remains a long one.
Here’s why we’ve reached that conclusion.
1. The economics are broken (and you’ll pay for it)
The compute costs are huge and unsustainable. Big tech companies have invested over $200 billion in GenAI infrastructure with questionable return profiles. Claims of rapidly declining compute costs remain largely aspirational. The most capable reasoning models at the time of writing, like OpenAI’s o3, can consume more than $1,000 in compute for a single complex query, a figure multiplied across the parallel computations required to generate each response.
The two major LLM providers, OpenAI and Anthropic, are reportedly losing up to $14 billion annually, with no clear path to profitability. These services lose money even with monthly subscription fees of a few hundred dollars. But at some point, they will have to recoup these costs, and that’s where the people and businesses hooked on them come in.
The industry is following a classic bait-and-switch pattern. Providers offer the tools for free, then price them low enough to get you hooked, embedding themselves in your workflows and development stack. Once they are deeply integrated, price hikes become inevitable. We’ve seen this with Google’s and Microsoft’s office suites, which now cost more while offering marginally useful AI features. Once these dependencies are in your stack, removing them is nearly impossible without spending time and effort or upsetting some users, forcing you to accept whatever pricing model the providers come up with.
These companies lack convincing business strategies. The current plan is to spend as much money as possible on innovation and hope the value proposition emerges later. This requires regular cash injections, and, to get them, they must persuade more and more people that GenAI is an inevitable future. Companies with AI solutions to sell push out only positive coverage, amplified by publications that pass press releases off as news with little to no critical analysis. It’s the classic technology hype cycle in full swing, but with unprecedented capital behind it. Some analysts even caution that AI spending is doing a lot of heavy lifting to prop up the US economy at large.
When the market shakeout comes (and it will come), infrastructure choices will matter. New and open-source models are rapidly appearing and improving, creating market uncertainty. We’ve learned to be wary of dependencies on unstable or immature providers, especially when their future is this unclear.
2. The quality problem isn’t fixable
Model training data provenance is questionable and intentionally opaque. It’s not just that companies obtained training data illegally, although that would be a good enough objection on its own. With large models ingesting massive corpora, it’s hard to know how much real versus synthetic data went into their training, which makes them even less useful.
The outputs are probabilistic, not deterministic. If your business requires 100% accuracy, these tools are an unacceptable risk multiplier. And although better prompt engineering can mitigate some issues, the inherent stochastic nature of LLMs makes them fundamentally unsuitable for many business-critical applications.
High-profile real-world failures keep adding up. Air Canada learned the hard way that delegating customer service to a GenAI chatbot does not absolve it of legal responsibility for the chatbot’s misleading outputs. In other industries, attorneys have faced professional sanctions for submitting AI-generated legal filings containing fictitious case citations.
Previous high-profile product demos, including OpenAI’s video generation capabilities, have been shown to be carefully staged and not representative of actual capabilities. Newer Deep Research reports suffer from fundamental information quality issues, indiscriminately incorporating internet content without effective source evaluation.
Enthusiasts will say, “just prompt it better,” but this is a fallacy. It’s what leads some companies to hire prompt engineers to figure out which incantation might make the output moderately better, though never foolproof. And the more you prompt these tools, the more you’re essentially doing the work yourself.
These are not just minor limitations to work around; they are fundamental flaws in the way these systems are built. These tools are not reasoning machines.
3. The hidden cost to the team
As engineering and product leaders who value building strong teams, we have always asked ourselves: how do we attract and keep good people? Given what we’ve seen, we’re particularly concerned about GenAI’s impact on the development of cognitive skills.
For junior developers, handing them GenAI tools to boost productivity might be setting them up for failure if the tools aren’t used carefully. Microsoft’s own research shows that increased confidence in GenAI correlates with decreased critical thinking. People who uncritically accept AI outputs risk producing lower quality work, and unfortunately the tendency is to use these tools to do the work and put an undue amount of pressure on reviewers to spot the mistakes.
Senior developers may, in theory, stand to benefit more. Their extensive experience allows them to spot and correct subtle but critical flaws in GenAI outputs. But even they report being frustrated and demoralized as their roles shift from applying their expertise to endlessly correcting AI-generated content, and many are starting to realize that they might as well write the code themselves instead of going down a prompt spiral.
Both are prone to losing time in prompt engineering rabbit holes instead of solving actual problems. And this opportunity cost is real. Every hour spent wrestling with AI-generated code, every meeting discussing prompt engineering strategies, every debugging session tracking down subtle errors is time not spent understanding the problem to solve, talking to users, building the right solution, or gaining technical depth in a topic by reading documentation and tinkering.
4. Environmental externalities matter
As engineers who consider the full system impact of our designs, we can’t ignore the environmental footprint of GenAI infrastructure.
The data centers used to train and serve these models consume huge amounts of water and electricity, consumption that is systematically underreported due to a lack of regulation (although that’s changing in Europe). While data center owners publish papers about energy efficiency, these are hard to peer review, replicate, or verify. Google quietly removed its net-zero pledge from its website as rising AI energy demand threatens its 2030 climate goal.
These problems are not just ‘over there’, far from our homes and neighborhoods. Data centers compete with residential supply in the UK and the US as well. In the UK there are documented concerns about data centers being given cooling water ahead of residential supply, while the US power grid experiences increasing instability from these concentrated loads. As of today, hardly anyone expects to be able to do more with less compute. Instead, the race is to build more data centers and nuclear facilities to power these systems.
If you care about total system impact, this matters. Responsible system design requires accounting for all externalities, not just immediate business benefits. It’s incongruous to tout carbon credits on one side of the business while hand-waving away heavy AI usage in another.
Our position: we don’t use these tools in our day-to-day
We don’t use GenAI tools in our development or creative work.
Some teams find these tools useful. For us, they don’t fit.
Our work is building custom software for mission-driven organizations and people who need long-term solutions, and these tools aren’t a good match. We take pride in building things that work consistently and reliably. And when you’re building systems meant to last, vendor lock-in to unstable providers with unclear economics is not an acceptable risk or expense down the line.
As lifelong technology enthusiasts, we find ourselves unusually opposed to this current trend, which is surrounded by too much hype and not enough scrutiny. We’re firmly against technical debt and ballooning cloud bills disguised as progress.
What we recommend
If you’re a business considering GenAI adoption because everyone else seems to be doing it, first understand for yourself what these tools do, as opposed to what their marketing claims. It’s better to resist pressure to integrate them into critical systems before their limitations and long-term costs are fully understood. Ask yourself: What specific problem does it solve for you? What’s the cost if it fails? Who benefits if you become dependent on it? For mission-critical systems, we find the answer is usually “wait and see”.
If the paradigm shifts and these tools become genuinely reliable, we’ll revisit our position. Until then, we’re happily building without them.