A deep dive into GenAI limitations
Why it might be smart to be skeptical
My morning began with an email from my supermarket announcing their "exciting new AI-powered feature" meant to give me a better shopping experience.
Was it a solution to their overenthusiastic packing that sometimes leads to bags containing single items? No. Was it an improvement to their inventory management system to prevent multi-day stock outages or throwing away food? Also no.
Instead, they proudly unveiled a "smart shopping list" that suggests items based on my purchase history—essentially replicating the functionality of... lists. Their careers page now has openings for prompt engineers and AI interns, foreshadowing what I can only assume will be more LLM-based features.
AI will save us, say people who sell AI
Almost every industry publication seems determined to convince us that Generative AI (GenAI) will transform businesses overnight.
The pattern is familiar to anyone who's worked in Silicon Valley long enough: most positive coverage comes from companies with AI solutions to sell, amplified by tech journalists who rarely apply critical analysis. It's the classic technology hype cycle in full swing, but with unprecedented capital behind it.
But the more we read, the more we notice serious limitations to these tools that rarely get discussed.
As people who've experimented extensively with these tools and made a career out of being excited about technology, we can confirm they're impressive in certain contexts. Their language capabilities are excellent, and they can be a welcome alternative to often dysfunctional search engine results.
But the gap between impressive demos and sustainable business value remains huge.
When we see companies betting a big part of their business on these tools, making hiring decisions based on them, or outsourcing critical thinking to them, we feel compelled to offer a more nuanced perspective.
Here's our assessment for why it might be smart to be skeptical about GenAI adoption in the face of uncritical enthusiasm from everyone else.
The case of missing credible case studies
Let’s try a quick experiment: go and search for case studies where AI has helped, say, a small business.
We’ll wait.
What you'll find is a wasteland of generic content about how AI will "enhance operational efficiency" for businesses, conspicuously lacking the names or faces of the businesses and people involved, or any quantifiable results. It's always a convenient "local bakery" somehow revolutionized by AI (why would a bakery need AI?) rather than a verifiable success story.
We have yet to meet a real business owner documenting tangible revenue impacts from GenAI implementations. If you're that person, please reach out—we'd genuinely love to hear your story and understand what made your implementation successful.
The inevitable cost of bait-and-switch
Remember when Netflix was cheap and you could share your account with friends and family? The cost of GenAI services will follow a similar trajectory.
We're used to thinking about systems holistically, including their economic sustainability. The current GenAI economics don't add up.
Big tech companies have invested over $200 billion in GenAI infrastructure with questionable return profiles. Claims of rapidly declining compute costs remain largely aspirational. The most capable reasoning models, like OpenAI's o3, can consume more than $1,000 in compute resources for a single complex query—multiplied across the parallel computations required to generate each response.
These companies lose money even if you pay a monthly fee of a few hundred dollars. At some point they will have to recoup these costs, and that’s where you come in.
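To make the mismatch concrete, here is a rough back-of-envelope sketch using the figures above; the usage numbers are assumptions for illustration, not vendor data.

```python
# Back-of-envelope sketch: subscription revenue vs. compute cost per power user.
# All figures are rough assumptions based on the numbers cited above.

monthly_fee = 200             # a "pro" tier priced at a few hundred dollars
cost_per_heavy_query = 1000   # reported compute cost of one complex reasoning query
heavy_queries_per_month = 5   # assumed usage by a single heavy user

revenue = monthly_fee
compute_cost = cost_per_heavy_query * heavy_queries_per_month

print(f"Revenue per user:   ${revenue}")
print(f"Compute cost:       ${compute_cost}")
print(f"Monthly margin:     ${revenue - compute_cost}")  # deeply negative
```

Even if the per-query figure is off by an order of magnitude, the arithmetic only balances if prices rise, usage is throttled, or both.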
This creates a predictable trajectory: these services enter the market as free or low-cost offerings to get you hooked, embedding themselves in your workflows and development stack. Once deeply integrated enough, price increases become inevitable—exactly as we've seen with Google and Microsoft's office suites, which now cost more while offering marginally useful AI features.
This is a classic vendor lock-in pattern. Once these dependencies are well established in your stack, it'll be almost impossible to remove them, forcing you to accept whatever pricing model they come up with.
It's unclear which providers will be around after the market consolidates
Both major LLM providers, OpenAI and Anthropic, are reportedly losing up to $14 billion annually. Meanwhile, open-source models are rapidly improving, creating massive market uncertainty.
For a technology that's been commercially available for several years, the lack of a killer application at scale is concerning. More worrying is the absence of coherent business strategies from these companies' CEOs. The current plan is to spend as much money as possible and hope the value proposition emerges later. The vision is achieving an ill-defined, ever-changing artificial general intelligence that no one can agree on. How will we know when we get there? No one knows. That's why the money needs to keep flowing.
When designing critical systems, we've learned to be wary of dependencies on unstable providers. While swapping out models is easy in theory, simply having them in the stack adds complexity, on top of having to monitor their running costs versus benefits.
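As a hypothetical illustration of what "easy to swap" actually entails, here is a minimal sketch of the indirection teams end up owning and maintaining; the provider names, pricing fields, and the `complete` callable are placeholders, not any real vendor SDK.

```python
# Minimal sketch of a provider-agnostic wrapper used to avoid LLM vendor lock-in.
# Nothing here maps to a real SDK; it only shows the moving parts you now own.
from dataclasses import dataclass
from typing import Callable

@dataclass
class LLMProvider:
    name: str
    cost_per_1k_tokens: float        # must be re-checked every time pricing changes
    complete: Callable[[str], str]   # adapter around whatever the vendor exposes

@dataclass
class LLMRouter:
    providers: dict[str, LLMProvider]
    active: str
    spend: float = 0.0

    def complete(self, prompt: str) -> str:
        provider = self.providers[self.active]
        # Rough cost accounting; real token counts depend on the vendor's tokenizer.
        self.spend += len(prompt.split()) / 1000 * provider.cost_per_1k_tokens
        return provider.complete(prompt)
```

Switching `active` is one line, but the wrapper, its cost model, and the evaluation work of deciding whether a switch is worth it are now permanent residents of your stack.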
GenAI tools foster a critical thinking deficit
As engineering and product leaders who value building strong teams, we have always asked ourselves how to attract and keep good people. You likely have the same concerns. Given what we've seen, we're particularly concerned about GenAI's impact on cognitive skills development.
If you're considering giving less experienced team members GenAI tools to boost productivity, you might be setting them up for failure if not used correctly. Consider Microsoft's own research showing increased confidence in GenAI correlates with decreased critical thinking. People who uncritically accept AI outputs risk producing lower quality work, and unfortunately the tendency is to use these tools to do the work rather than use them as proverbial bicycles for the mind.
Ironically, the people who stand to benefit most are those with extensive experience—precisely the people who can spot and correct subtle but critical flaws in GenAI outputs. But even they report being frustrated and demoralized as their roles shift from applying their expertise to endlessly correcting AI-generated content.
You might make staff more productive in the short term, but at what cost?
Results are not 100% reliable and maybe never will be
From a reliability standpoint, even the newest research models wouldn't cut it in mission-critical systems.
Previous high-profile product demos, including OpenAI's video generation capabilities, have been shown to be carefully staged, and not representative of actual capabilities. The newer Deep Research reports suffer from fundamental information quality issues, indiscriminately incorporating internet content without effective source evaluation—not because of technical limitations but because of the fundamental architecture of these systems. These tools are not reasoning machines.
Consider Air Canada, who learned the hard way that by delegating customer service to a GenAI chatbot, they are legally responsible for its misleading outputs. Or look at other industries, where attorneys are being barred from their profession for submitting AI-generated legal filings containing fictitious case citations.
So maybe don't use these tools if your business requires you to be 100% accurate; they are an unacceptable risk multiplier.
While better prompt engineering can mitigate some issues, the inherent stochastic nature of LLMs makes them fundamentally unsuitable for many business-critical applications. Also, the more you prompt them, the more you're essentially doing the work yourself. Great products shouldn't make you feel inadequate about how you use them.
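And if you do go down the mitigation route, much of that work lands back on you. Below is a minimal guardrail sketch, assuming a hypothetical `call_model` function standing in for whichever vendor SDK you use; note that it only catches malformed output, not plausible-but-wrong output.

```python
# Minimal guardrail sketch: validate a model's "structured" answer before trusting it.
# `call_model` is a placeholder for a vendor SDK call, not a real API.
import json

REQUIRED_FIELDS = {"customer_id", "refund_amount", "reason"}

def safe_extract(prompt: str, call_model, max_retries: int = 3) -> dict | None:
    for _ in range(max_retries):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # the model returned prose instead of JSON; try again
        if REQUIRED_FIELDS.issubset(data):
            return data
    return None  # escalate to a human instead of acting on unreliable output
```

The retries, the schema, and the human escalation path are all engineering effort you are spending to paper over the tool's unreliability.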
The environment takes a huge hit with GenAI use
As engineers who consider the full system impact of our designs, we can't ignore the environmental footprint of GenAI infrastructure.
Big tech companies systematically underreport the water and electricity consumption of AI data centers, and in the absence of regulation the industry offers little transparency.
This is not an 'over there' problem away from our homes and neighborhoods, because data centers are everywhere and more are being built every day. The UK has documented concerns about data centers being prioritized for cooling water over residential supply, while the US power grid experiences increasing instability from these concentrated loads.
Responsible system design requires accounting for all externalities, not just immediate business benefits.
Where it can be useful
Despite these concerns, there are several contexts where GenAI tools could be useful, which we'll expand on in a future post:
GenAI is fantastic at text generation. That's its bread and butter. For improving email communications, customer service responses, or transforming voice notes into structured content, these tools can be a game-changer. Having said that, it's unclear how many people are eager to talk to a chatbot when all they want to do is get on the phone with a human. And if we plan on creating agents that talk to each other, might it not be better to rethink whether we need those conversations at all?
When supervised by domain experts, GenAI can function as a useful force multiplier. As Ben Evans aptly described, it's like having "a thousand interns"—helpful when properly managed and reviewed, if that’s something you’re comfortable spending time managing or supervising.
Systems that tolerate or even benefit from probabilistic outputs can make good use of GenAI's capabilities without suffering from its limitations. There are people out there who might enjoy having AI create a meal plan for them instead of seeing a dietician, but where do you complain when it suggests you eat a whole raw onion for breakfast?
But for most other purposes, good old-fashioned machine learning models with better-defined parameters and more predictable behavior might be a better fit.
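For contrast, here is a toy scikit-learn example of what we mean: a classic model whose parameters you can inspect and whose behavior is reproducible given a fixed seed. The dataset is synthetic and purely illustrative.

```python
# Illustrative contrast: a classic, well-understood model with inspectable parameters.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
print("Coefficients:", model.coef_)  # every parameter is visible and auditable
```

You can version it, test it, explain it to an auditor, and predict what it will cost to run next year.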
Parting thoughts: engineering in the age of hype
As lifelong technology enthusiasts who've built careers on technical innovation, we find ourselves in an unusual position of caution. The current wave of AI innovation has too much hype and not enough scrutiny.
Our professional recommendation: explore these tools, understand their capabilities, but resist pressure to integrate them into critical systems before their limitations and long-term costs are fully understood.
Think about it this way: if these tools disappeared tomorrow, would your core business operations be materially impacted? Would your team's effectiveness significantly decline? If the answer is no, then perhaps they aren't as essential as the messaging suggests.
If these models ever become 100% reliable, we will see a different kind of world emerge. That future hasn't arrived yet, and may never arrive, despite what AI vendors' marketing departments might claim.