The case for boring AI

The AI benchmark race has a winner. It just isn't you.

Every few months, a new model drops and a new leaderboard reshuffles. Labs compete to out-reason, out-code, and out-answer each other on tests designed to measure machine intelligence. The coverage follows. So does the funding.

What gets less attention is whether any of this is inevitable. The benchmarks, the arms race, the framing of AI as either salvation or catastrophe: these are choices, not laws of physics. They reflect what the industry decided to optimize for, and what it decided to fund. Technology that will take decades to pan out in ordinary, useful ways doesn't raise billions this quarter. Extreme narratives do.

Some researchers think the goal is simply wrong. Not that AI isn't important, but that important doesn't have to mean unprecedented. The printing press changed the world. So did electricity. Both did it gradually, through messy adoption, giving societies time to respond. If AI follows that pattern, the right questions aren't about superintelligence. They're about who benefits, who gets harmed, and whether the tools we're building actually work for the people using them.

Plenty of researchers have been asking those questions from very different directions. Here are three of them.

Useful, not general

Ruchir Puri has been building AI at IBM since before most people had heard of machine learning. He watched Watson beat the world's best Jeopardy players in 2011. He's watched several cycles of hype crest and recede since. When the current wave arrived, he had a simple test for it: is it useful?

Not impressive. Not general. Useful.

"I don't really care about artificial general intelligence," he says. "I care about the useful part of it."

That framing puts him at odds with much of the industry's self-image. The labs racing toward AGI are optimizing for breadth, building systems that can do anything, answer anything, reason about anything. Puri thinks that's the wrong target, and he has a benchmark he'd like to see the industry actually try to reach.

The human brain lives in 1,200 cubic centimeters, consumes 20 watts, the energy of a light bulb, and, as Puri points out, runs on sandwiches. A single Nvidia GPU consumes 1,200 watts, 60 times more than the entire brain, and you need thousands of them in a giant data center to do anything meaningful. If the brain is the benchmark, the industry isn't close to efficient. It's going in the wrong direction.

His alternative is what he calls hybrid architecture: small, medium, and large models working together, each assigned to the task it handles best. A large frontier model does the complex reasoning and planning. Smaller, purpose-built models handle execution. A task as simple as drafting an email doesn't need a system trained on half the internet. It needs something fast, cheap, and focused. Every nine months or so, Puri notes, the small model of the previous generation becomes roughly equivalent to what was considered large. Intelligence is getting cheaper. The question is whether anyone is building for that reality.
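The routing logic behind a hybrid architecture can be sketched in a few lines. This is a toy illustration of the idea, not IBM's actual stack; the model names, capability tiers, and cost figures are all hypothetical assumptions.

```python
# Minimal sketch of hybrid-architecture routing: send each task to the
# cheapest model whose capability tier covers it. Names and numbers are
# illustrative, not real products.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    capability: int  # 1 = simple execution, 3 = complex reasoning/planning
    cost: float      # relative cost per call

REGISTRY = [
    Model("small-draft", capability=1, cost=1.0),
    Model("medium-tool", capability=2, cost=5.0),
    Model("large-planner", capability=3, cost=40.0),
]

def route(task_complexity: int) -> Model:
    """Pick the cheapest registered model able to handle the task."""
    candidates = [m for m in REGISTRY if m.capability >= task_complexity]
    return min(candidates, key=lambda m: m.cost)

# A simple email draft goes to the small model; multi-step planning
# falls through to the frontier model.
print(route(1).name)  # small-draft
print(route(3).name)  # large-planner
```

The design choice is the same one Puri describes: breadth lives in one expensive model at the top, while the high-volume, low-complexity traffic never touches it.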

The approach has real-world backing. Airbnb uses smaller models to resolve a significant share of customer-service issues faster than its human representatives can. Meta doesn't use its biggest models to deliver ads; it distills that knowledge into smaller ones built for that task alone. The pattern is consistent enough that researchers have started calling it a knowledge assembly line: data flows in, specialized models handle discrete steps, and something useful comes out the other end.

IBM has been building that assembly line longer than most. A hybrid agent combining models from several companies has shown a 45% productivity improvement across a large engineering workforce. Systems running on smaller, purpose-built models now help the engineers who keep 84% of the world's financial transactions processing get the right information at the right time. These aren't flashy applications. They're also not failing.

None of them require a system that can write poetry or solve your kid's math homework. They require something narrower and, for that reason, more trustworthy. A model trained to do one thing well knows when a question falls outside its scope. It says so. That calibrated uncertainty, knowing what you don't know, is something the big frontier models still struggle with.
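One simple mechanism for "it says so" is a confidence threshold: the model answers only when its top candidate clears a bar, and abstains otherwise. The sketch below is a generic illustration of that idea, with a made-up threshold and example inputs; it is not any particular vendor's method.

```python
# Toy sketch of scope detection via a confidence threshold. `scores`
# maps candidate answers to normalized confidences summing to 1; the
# 0.7 threshold is an illustrative assumption.
def answer_or_abstain(scores: dict, threshold: float = 0.7) -> str:
    best, conf = max(scores.items(), key=lambda kv: kv[1])
    if conf < threshold:
        return "out of scope"
    return best

# A narrow customer-service model: confident on its own domain,
# explicit when the question falls outside it.
print(answer_or_abstain({"reset password": 0.92, "billing": 0.08}))
print(answer_or_abstain({"reset password": 0.40, "billing": 0.35, "shipping": 0.25}))
```

The first call returns the answer; in the second, no candidate dominates, so the system declines rather than guesses. Real calibration is harder than a fixed cutoff, but the contract, answer or admit ignorance, is the same.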

"I want to build agents and systems for those processes," Puri says. "Not something that answers two million things."

Tools, not agents

Ben Shneiderman has a simple test for whether an AI system is well designed. Does the person using it feel like they did something, or does it feel like something was done for them?

The distinction matters more than it sounds. Shneiderman, a computer scientist at the University of Maryland who helped lay the foundations for modern interface design, has spent decades arguing that the goal of technology should be to amplify human ability, not replace it. Good tools build what he calls user self-efficacy, or the confidence that comes from knowing you can do something yourself. Bad ones quietly transfer that agency somewhere else.

He thinks most of the AI industry is building bad tools, and he thinks the agentic turn makes it worse. The pitch for AI agents is that they act on your behalf, handling tasks end to end without your involvement. To Shneiderman, that's not a feature. It's the problem. When something goes wrong, and it will, who is responsible? When something goes right, who learned anything?

The trap he's been fighting for a long time has a name. Anthropomorphism, the impulse to make technology seem human, is what keeps winning, and what keeps failing. In the 1970s, banks experimented with ATMs that greeted customers with "How can I help you?" and gave themselves names like Tilly the Teller and Harvey the World Banker. They were replaced by machines that showed you three options: balance, cash, deposit. Utilization shot up. Citibank had 50% higher usage than its competitors. People didn't want a synthetic relationship. They wanted to get their money.

The same pattern has repeated across decades, through Microsoft Bob, the AI pin from Humane, and waves of humanoid robots. Each time, the anthropomorphic version fails and gets replaced by something more tool-like. Shneiderman calls it a zombie idea. It doesn't die, it just keeps coming back.

What's different now is scale and sophistication. The current generation of AI is genuinely impressive, he acknowledges, startlingly so. But impressive and useful aren't the same thing, and systems designed to seem human, to say "I," to simulate relationship, are optimizing for the wrong quality. The question he wants designers to ask is simpler: does this give people more power, or less?

"There is no I in AI," he says. "Or at least, there shouldn't be."

People, not benchmarks

Karen Panetta has a simple answer for why AI development looks the way it does. Follow the money.

Panetta, a professor of electrical and computer engineering at Tufts University and an IEEE fellow, studies AI ethics and has a clear view of where the technology should be going. Assistive pets for Alzheimer's patients, adaptive learning tools for children with different cognitive styles, smart-home monitoring for elderly people aging in place. The technology to do this well, she says, largely exists. The investment doesn't.

"The humans don't care about benchmarks," she says. "They care about, does it work when I buy it, and is it going to really make my life easier?"

The problem is that the people who would benefit most from well-designed assistive AI are also the least compelling pitch to a venture capitalist. A system that transforms manufacturing processes, reduces workplace injuries, and cuts healthcare costs for a company's employees has an obvious return. A robotic companion that keeps an Alzheimer's patient calm and connected requires a different kind of math entirely. So the money goes where the money goes, and the populations with the most to gain keep waiting.

What's changed, Panetta says, is that the expensive engineering problems are finally being solved at scale. Sensors are cheaper. Batteries are lighter. Wireless protocols are ubiquitous. The same investment that built industrial robots for factory floors has quietly made consumer robotics viable in a way it wasn't five years ago. The path from warehouse to living room is shorter than it looks.

But she has a concern that the excitement around that transition tends to skip over. Physical robots have natural constraints. You know the force limits. You know the kinematics. You can anticipate, simulate, and design around how they'll fail. Generative AI doesn't come with those guarantees. It's non-deterministic. It hallucinates. Nobody has fully mapped what happens when you put it inside a system that is physically present in the home of someone with dementia, or a child who can't identify when something has gone wrong.

She's seen what happens when a sensor gets dirty and a robot loses its spatial awareness. She's thought about what it means to build something that learns intimate details about a person's life, their routines, their cognitive state, their moments of confusion, and then acts on that information autonomously. The fail-safes, she says, haven't kept up.

"I'm not worried about the robot," she says. "I'm worried about the AI."
