The case for boring AI

The AI benchmark race has a winner. It just isn't you.

Every few months, a new model drops and a new leaderboard reshuffles. Labs compete to out-reason, out-code, and out-answer each other on tests designed to measure machine intelligence. The coverage follows. So does the funding.

What gets less attention is whether any of this is inevitable. The benchmarks, the arms race, the framing of AI as either salvation or catastrophe: these are choices, not laws of physics. They reflect what the industry decided to optimize for, and what it decided to fund. Technology that will take decades to pan out in ordinary, useful ways doesn't raise billions this quarter. Extreme narratives do.

Some researchers think the goal is simply wrong. Not that AI isn't important, but that important doesn't have to mean unprecedented. The printing press changed the world. So did electricity. Both did it gradually, through messy adoption, giving societies time to respond. If AI follows that pattern, the right questions aren't about superintelligence. They're about who benefits, who gets harmed, and whether the tools we're building actually work for the people using them.

Plenty of researchers have been asking those questions from very different directions. Here are three of them.

Useful, not general

Ruchir Puri has been building AI at IBM since before most people had heard of machine learning. He watched Watson beat the world's best Jeopardy players in 2011. He's watched several cycles of hype crest and recede since. When the current wave arrived, he had a simple test for it: is it useful?

Not impressive. Not general. Useful.

"I don't really care about artificial general intelligence," he says. "I care about the useful part of it."

That framing puts him at odds with much of the industry's self-image. The labs racing toward AGI are optimizing for breadth, building systems that can do anything, answer anything, reason about anything. Puri thinks that's the wrong target, and he has a benchmark he'd like to see the industry actually try to reach.

The human brain lives in 1,200 cubic centimeters, consumes 20 watts, the energy of a light bulb, and, as Puri points out, runs on sandwiches. A single Nvidia GPU consumes 1,200 watts, 60 times more than the entire brain, and you need thousands of them in a giant data center to do anything meaningful. If the brain is the benchmark, the industry isn't close to efficient. It's going in the wrong direction.

His alternative is what he calls hybrid architecture: small, medium, and large models working together, each assigned to the task it handles best. A large frontier model does the complex reasoning and planning. Smaller, purpose-built models handle execution. A task as simple as drafting an email doesn't need a system trained on half the internet. It needs something fast, cheap, and focused. Every nine months or so, Puri notes, the small model of the previous generation becomes roughly equivalent to what was considered large. Intelligence is getting cheaper. The question is whether anyone is building for that reality.
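The routing logic behind a hybrid architecture can be sketched in a few lines. This is a toy illustration of the idea, not IBM's actual stack; the model names, capability tiers, and cost figures are all hypothetical assumptions.

```python
# Minimal sketch of hybrid-architecture routing: send each task to the
# cheapest model whose capability tier covers it. Names and numbers are
# illustrative, not real products.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    capability: int  # 1 = simple execution, 3 = complex reasoning/planning
    cost: float      # relative cost per call

REGISTRY = [
    Model("small-draft", capability=1, cost=1.0),
    Model("medium-tool", capability=2, cost=5.0),
    Model("large-planner", capability=3, cost=40.0),
]

def route(task_complexity: int) -> Model:
    """Pick the cheapest registered model able to handle the task."""
    candidates = [m for m in REGISTRY if m.capability >= task_complexity]
    return min(candidates, key=lambda m: m.cost)

# A simple email draft goes to the small model; multi-step planning
# falls through to the frontier model.
print(route(1).name)  # small-draft
print(route(3).name)  # large-planner
```

The design choice is the same one Puri describes: breadth lives in one expensive model at the top, while the high-volume, low-complexity traffic never touches it.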

The approach has real-world backing. Airbnb uses smaller models to resolve a significant share of customer-service issues faster than its human representatives can. Meta doesn't use its biggest models to deliver ads; it distills that knowledge into smaller ones built for that task alone. The pattern is consistent enough that researchers have started calling it a knowledge assembly line: data flows in, specialized models handle discrete steps, and something useful comes out the other end.

IBM has been building that assembly line longer than most. A hybrid agent combining models from several companies has shown a 45% productivity improvement across a large engineering workforce. Systems running on smaller, purpose-built models now help the engineers who keep 84% of the world's financial transactions processing get the right information at the right time. These aren't flashy applications. They're also not failing.

None of them require a system that can write poetry or solve your kid's math homework. They require something narrower and, for that reason, more trustworthy. A model trained to do one thing well knows when a question falls outside its scope. It says so. That calibrated uncertainty, knowing what you don't know, is something the big frontier models still struggle with.
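One simple mechanism for "it says so" is a confidence threshold: the model answers only when its top candidate clears a bar, and abstains otherwise. The sketch below is a generic illustration of that idea, with a made-up threshold and example inputs; it is not any particular vendor's method.

```python
# Toy sketch of scope detection via a confidence threshold. `scores`
# maps candidate answers to normalized confidences summing to 1; the
# 0.7 threshold is an illustrative assumption.
def answer_or_abstain(scores: dict, threshold: float = 0.7) -> str:
    best, conf = max(scores.items(), key=lambda kv: kv[1])
    if conf < threshold:
        return "out of scope"
    return best

# A narrow customer-service model: confident on its own domain,
# explicit when the question falls outside it.
print(answer_or_abstain({"reset password": 0.92, "billing": 0.08}))
print(answer_or_abstain({"reset password": 0.40, "billing": 0.35, "shipping": 0.25}))
```

The first call returns the answer; in the second, no candidate dominates, so the system declines rather than guesses. Real calibration is harder than a fixed cutoff, but the contract, answer or admit ignorance, is the same.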

"I want to build agents and systems for those processes," Puri says. "Not something that answers two million things."

Tools, not agents

Ben Shneiderman has a simple test for whether an AI system is well designed. Does the person using it feel like they did something, or does it feel like something was done for them?

The distinction matters more than it sounds. Shneiderman, a computer scientist at the University of Maryland who helped lay the foundations for modern interface design, has spent decades arguing that the goal of technology should be to amplify human ability, not replace it. Good tools build what he calls user self-efficacy, or the confidence that comes from knowing you can do something yourself. Bad ones quietly transfer that agency somewhere else.

He thinks most of the AI industry is building bad tools, and he thinks the agentic turn makes it worse. The pitch for AI agents is that they act on your behalf, handling tasks end to end without your involvement. To Shneiderman, that's not a feature. It's the problem. When something goes wrong, and it will, who is responsible? When something goes right, who learned anything?

The trap he's been fighting for a long time has a name. Anthropomorphism, the impulse to make technology seem human, is what keeps winning, and what keeps failing. In the 1970s, banks experimented with ATMs that greeted customers with "How can I help you?" and gave themselves names like Tilly the Teller and Harvey the World Banker. They were replaced by machines that showed you three options: balance, cash, deposit. Utilization shot up. Citibank had 50% higher usage than its competitors. People didn't want a synthetic relationship. They wanted to get their money.

The same pattern has repeated across decades, through Microsoft Bob, the AI pin from Humane, and waves of humanoid robots. Each time, the anthropomorphic version fails and gets replaced by something more tool-like. Shneiderman calls it a zombie idea. It doesn't die, it just keeps coming back.

What's different now is scale and sophistication. The current generation of AI is genuinely impressive, he acknowledges, startlingly so. But impressive and useful aren't the same thing, and systems designed to seem human, to say "I," to simulate relationship, are optimizing for the wrong quality. The question he wants designers to ask is simpler: does this give people more power, or less?

"There is no I in AI," he says. "Or at least, there shouldn't be."

People, not benchmarks

Karen Panetta has a simple answer for why AI development looks the way it does. Follow the money.

Panetta, a professor of electrical and computer engineering at Tufts University and an IEEE fellow, studies AI ethics and has a clear view of where the technology should be going. Assistive pets for Alzheimer's patients, adaptive learning tools for children with different cognitive styles, smart-home monitoring for elderly people aging in place. The technology to do this well, she says, largely exists. The investment doesn't.

"The humans don't care about benchmarks," she says. "They care about, does it work when I buy it, and is it going to really make my life easier?"

The problem is that the people who would benefit most from well-designed assistive AI are also the least compelling pitch to a venture capitalist. A system that transforms manufacturing processes, reduces workplace injuries, and cuts healthcare costs for a company's employees has an obvious return. A robotic companion that keeps an Alzheimer's patient calm and connected requires a different kind of math entirely. So the money goes where the money goes, and the populations with the most to gain keep waiting.

What's changed, Panetta says, is that the expensive engineering problems are finally being solved at scale. Sensors are cheaper. Batteries are lighter. Wireless protocols are ubiquitous. The same investment that built industrial robots for factory floors has quietly made consumer robotics viable in a way it wasn't five years ago. The path from warehouse to living room is shorter than it looks.

But she has a concern that the excitement around that transition tends to skip over. Physical robots have natural constraints. You know the force limits. You know the kinematics. You can anticipate, simulate, and design around how they'll fail. Generative AI doesn't come with those guarantees. It's non-deterministic. It hallucinates. Nobody has fully mapped what happens when you put it inside a system that is physically present in the home of someone with dementia, or a child who can't identify when something has gone wrong.

She's seen what happens when a sensor gets dirty and a robot loses its spatial awareness. She's thought about what it means to build something that learns intimate details about a person's life, their routines, their cognitive state, their moments of confusion, and then acts on that information autonomously. The fail-safes, she says, haven't kept up.

"I'm not worried about the robot," she says. "I'm worried about the AI."
