Harness, Lin Junyang, the trillion-dollar track, and Anthropic's palm

The Infra of the Agent era: the opportunities and imagination are far bigger than “lobsters.”

In March 2026, the hottest word in the AI industry is not the name of any particular model, but an English word that sounds completely unrelated to AI: Harness.

Its original meaning is horse tack—everything from the bridle and reins to the saddle gear, the whole set of items that gets fitted onto a horse.

Used as a verb, it means to bridle and control: to "harness" something.

You wouldn't say "harness a calculator," but you would say "harness the wind" or "harness a horse." Anyone using this word, intentionally or not, is implicitly admitting one thing: what they face isn't a passive tool but a powerful, autonomous entity. They aren't "using" it; they are "harnessing" it.

This word is becoming the core industrial concept of the AI Agent era.

Around it, an infrastructure layer on the scale of a trillion-dollar market is beginning to grow. And the one setting the rules for this infrastructure layer is also coming into view.

Harness is the New Infra

In the AI context, Harness has two evolutionary lines.

The first is the rhetorical layer. “Harness AI” as a general expression has been circulating in tech for a long time—its meaning is nothing more than “the ability to harness AI.”

The second, more important line is the technical layer. At the end of 2025, Anthropic began using “harness” to describe the infrastructure built around AI Agents—context management, tool calls, memory, guardrails, orchestration. The official definition of the Claude Agent SDK is “a general-purpose agent harness.”

In early 2026, HashiCorp co-founder Mitchell Hashimoto proposed “AI Harness” as a formal concept, and “Harness Engineering” quickly spread as a new engineering practice category.

But what truly makes this word worth taking seriously isn’t its popularity—it’s that it precisely describes a new relationship forming between humans and AI: symbiotic, asymmetric collaboration.

Humans provide intent, judgment, and direction. AI provides capabilities, speed, and scale.

Harness acknowledges two asymmetries, one of capability and one of authority, and they point in opposite directions: the AI's capabilities may far exceed the harnesser's, but the harnesser holds the final say on direction.

A horse runs much faster and is much stronger than a human, but where it goes—that’s up to the person.

Humans need to harness AI that is stronger than they are. That, intentionally or not, is probably the most precise thing Anthropic expressed in choosing this word.

And the word Harness is, indeed, a bit "Anthropic" (from the Greek for "human"): it carries a human-centered sensibility.

Some say: Harness is the New Datasets.

The intuition in that line is sharp, but the conclusion is inaccurate. When foundation models converge, the quality of Harness really does become the key variable determining whether Agents are good or bad—just like data quality can determine the life or death of a foundation model.

But Datasets and Harness exist in fundamentally different ways: Datasets occupy a single position in the technical architecture—inputs during the training stage. Harness is not a specific layer; it’s a stack, a combination of layers.

Context engineering and memory are the storage layer, tool access is the network layer, orchestration is the container layer, guardrails are the security layer, evaluation is the observability layer, and skill packaging is the middleware. Each layer can spawn independent companies, standards, and business models. This is perfectly isomorphic to the stacked architecture of cloud-computing Infra.
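As a rough illustration of that stacked view, the layers can be sketched as independently swappable slots on one object. Every name below is hypothetical, a sketch of the idea rather than any real SDK:

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of the Harness stack described above; not a real SDK.
@dataclass
class Harness:
    context: list[str] = field(default_factory=list)         # context/memory ("storage layer")
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)   # tool access ("network layer")
    guardrails: list[Callable[[str], bool]] = field(default_factory=list)  # security layer
    evaluators: list[Callable[[str], float]] = field(default_factory=list) # observability layer

    def allowed(self, action: str) -> bool:
        """Security layer: every guardrail must approve the action."""
        return all(check(action) for check in self.guardrails)

    def score(self, output: str) -> float:
        """Observability layer: mean evaluator score, 0.0 if no evaluators."""
        if not self.evaluators:
            return 0.0
        return sum(e(output) for e in self.evaluators) / len(self.evaluators)

h = Harness()
h.guardrails.append(lambda action: not action.startswith("rm "))
print(h.allowed("ls"), h.allowed("rm -rf /"))  # True False
```

The point of the stack framing is visible even in this toy: each slot can be replaced without touching the others, which is exactly why each layer can grow into its own companies and standards.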

In that sense, Harness is the New Infra: it isn’t infrastructure for pretraining models, but infrastructure for building Agents—giving Agents autonomy, while strictly following human instructions, ensuring safety, and complying with rules.

Harness itself isn’t new Datasets, but Harness’s healthy operation will generate good datasets for Agents and build a data flywheel. When a harness accumulates enough user behavior data and domain knowledge, it’s no longer just a “sidecar” system design—it starts to have data attributes: the more you use it, the better it gets; the more you use it, the harder it is to replace.

From this, you can derive an almost definition-level equation:

Foundation model + Harness = Agent.

Foundation models provide raw capabilities—reasoning, generation, understanding. But they’re static, passive, directionless. They can do everything, so they end up meaning nothing in particular. Harness provides structure, direction, and constraints—constraining infinite possibilities into finite, purposeful actions. The instant the two combine, AI turns from an object that gets asked to a subject that acts.

The same horse, outfitted with different tack, can pull a cart, carry people, work the fields, or race in competitions. The design of Harness determines the shape and purpose of the Agent.

Junyang Lin’s Pitch Deck

On March 26, 2026, Qwen’s former technical leader Junyang Lin posted a long article on X titled “From ‘Reasoning’ Thinking to ‘Agentic’ Thinking”. Within two days: 700,000 reads, 2,800 likes, 677 reposts.

Three weeks earlier, on March 4, he had left Alibaba. Three weeks later, he published a systematic piece of industry judgment.

The article’s core argument is that AI is shifting from “thinking longer” to “thinking for action.”

Reasoning Thinking is essentially static monologue—inside a closed space, the model generates longer and longer reasoning chains, trying to make up for the lack of interaction with the environment using more text. Agentic Thinking is to continuously advance tasks while interacting with the environment. The training object undergoes three jumps: from training models, to training agents, to training systems.

This isn’t hand-waving. He uses Qwen’s own hard-won lessons to back it up: merging thinking and instruct modes is far harder than people imagine. The data distributions and optimization objectives of the two behaviors fundamentally pull against each other—instruct pursues conciseness, speed, and format compliance; thinking pursues spending more tokens exploring alternative paths. After Qwen3 tried to merge them, it split back into separate lines.

This lesson points to a deeper insight: Instruct is the replacement for Harness in the pre-Agent era.

Instruct “burns” behavioral norms into model weights via SFT and RLHF—essentially sewing the reins into the horse’s muscles. In the one-question-one-answer era, it’s sufficient. But in the Agent era, models need to run autonomously, call tools, and make continuous decisions—behavior space explodes, making it impossible to train all constraints into the weights. The focus of control must shift from inside the model to outside the model.

Instruct’s capability boundaries get punctured by the Agent paradigm—Harness is the inevitable evolution.

In his article, Junyang Lin mentions “harness” four times, with a very clear progression:

From “the external environment where the agent runs,” to “an independent engineering practice—harness engineering,” to “part of the training object—agent and the harness around it.”

His article proves from the training side that Harness isn’t only the infrastructure for Agent runtime—it’s also the infrastructure for training Agents.

In the closed loop of Agentic RL, the Agent runs inside the Harness; the environment produces feedback signals; feedback drives RL to update policy; and policy changes the Agent’s behavior. Remove Harness, and it’s not just that Agents get slower—the training fundamentally can’t run.
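A minimal version of that closed loop, with a two-action toy policy and a hand-written reward function standing in for the harness environment (all names illustrative):

```python
import random

def environment(action: int) -> float:
    """Stand-in environment inside the harness: rewards action 1 only."""
    return 1.0 if action == 1 else 0.0

def train(policy: list[float], steps: int = 2000, lr: float = 0.01) -> list[float]:
    """Agent acts -> environment scores -> reward nudges the policy."""
    for _ in range(steps):
        action = random.choices([0, 1], weights=policy)[0]  # agent behaves
        reward = environment(action)                        # feedback signal
        policy[action] += lr * reward                       # RL update
        total = sum(policy)
        policy = [p / total for p in policy]                # keep it a distribution
    return policy

trained = train([0.5, 0.5])
# Probability mass migrates almost entirely onto the rewarded action.
```

Delete `environment` from this loop and there is nothing to optimize against, which is the point: without the Harness, training doesn't merely slow down, it cannot run.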

And he explicitly proposes that the biggest bottleneck of Agentic RL isn’t algorithms or model architecture, but environment quality and rollout infrastructure. The bottleneck constraining Agent evolution is in the Infra layer.

Thanks, Junyang, for supplying the other half of the argument that "Harness is the New Infra."

Harness is an indispensable runtime infrastructure for Agents—that was an assertion from earlier. Junyang’s article tells us that Harness is also the Infra for Agent training. In the closed loop of Agentic RL, the environment generates feedback signals, feedback drives policy updates, policy changes Agent behavior, and Agent behavior in turn triggers new environmental feedback.

Only a systems layer that’s indispensable at both training and inference ends truly qualifies as infra—which is Harness.

In his article, Junyang Lin says something loaded with meaning: “Environment construction is shifting from a side project into a real startup category.”

"Environment construction" is not the same thing as Harness; it's a subset, but an important one. "Environment" mainly corresponds to tool access and evaluation feedback in the Harness architecture: the world Agents interact with during training, such as code execution sandboxes, browser simulators, test-case suites, and API simulation layers. Its core function is to produce feedback signals so that Agentic RL has something to optimize. Think of it as the containers, benchmarks, and Hugging Face of Agent training.

The environment is the playground for Agent training, and Harness is the full suite of gear when an Agent runs. The playground is part of the gear, but not all of it.
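A toy version of such an environment, using a subprocess as the "sandbox" and a hypothetical two-case test set as the feedback source (a real environment would isolate far more aggressively):

```python
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical test set; the feedback signal is the pass fraction.
TESTS = [("add(2, 3)", "5"), ("add(-1, 1)", "0")]

def score_solution(code: str) -> float:
    """Run candidate code against each test case in a subprocess and
    return the fraction of cases it passes."""
    passed = 0
    for expr, expected in TESTS:
        program = textwrap.dedent(code) + f"\nprint({expr})"
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(program)
            path = f.name
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=5)
        if result.stdout.strip() == expected:
            passed += 1
    return passed / len(TESTS)

reward = score_solution("def add(a, b):\n    return a + b")  # -> 1.0
```

Everything that makes this hard at scale, isolation, determinism, coverage, and cost of rollouts, is exactly the "environment quality and rollout infrastructure" bottleneck named above.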

However, when a former technical leader of an open-source model starts defining, separately, an entire startup category for one of Harness’s submodules—this itself is a signal. It means this stack has become complex enough and valuable enough that it’s starting to grow into layers with independent commercial entities, just like a real Infra stack.

And so, in a long, academic-sounding piece, he defines a startup track. If you still don't think this is Junyang Lin's startup pitch deck, you shouldn't be doing VC.

A trillion-dollar startup track

If Junyang Lin really does build Agent training environment infrastructure—that direction he personally defined as “a real startup category”—which layer of the Harness cake is he facing? How big is that slice of cake?

Inside Harness there’s a complete multi-layer architecture that can be broken down into seven core modules: context engineering, memory systems, tool access, skill packaging, guardrails and permissions, evaluation and feedback, and orchestration and state management.

Besides the tool access layer (MCP), every layer has startups running.

The context and memory layers have Cognee (€7.5 million raised) and Interloom ($16.5 million seed, with Sequoia participating).

The tool access layer has been standardized by the MCP protocol: 97 million monthly SDK downloads, with Anthropic, OpenAI, Google, Microsoft, and Amazon all connected, and not many startups.

In the security access layer, Runlayer ($11 million, led by Khosla) emerged; for guardrails and compliance: Guardrails AI, Vigilant AI, Runtime, Alter.

Evaluation and observability is the hottest space. Arize AI raised $70 million Series C; its customers include Uber and PepsiCo. Langfuse became an open-source community standard.

The orchestration layer shows a “three-strong” landscape: LangGraph, CrewAI ($18 million funding, with 60% of Fortune 500 companies using it), and Microsoft Agent Framework—two of which are startups.

And the skill packaging layer's startups more often show up as agent products in vertical industry tracks. The benchmarks: Harvey (legal AI, $11 billion valuation, $1 billion cumulative funding, $190 million ARR) and Abridge (healthcare AI, $5.3 billion valuation).

The training environment layer is in the earliest stage, with about 20 seed-stage companies; Wing VC predicts that by 2030 it will consolidate into 3–5 companies.

But not every module is a good track.

A core criterion for judging whether a track is good or bad is: does this module solve a “model capability problem” or a “systems design problem”?

The former gets swallowed by foundation models—context windows expand from 128K to 1M and then to even larger sizes. Today’s clever compression strategies may be useless tomorrow.

Modules in the systems design layer have persistent value—like tool access, which is an ecosystem position problem; security guardrails, which is a compliance problem; evaluation, which is an independence problem. These can’t be resolved just because models get stronger.

Their exit paths are also completely different. Tool access and skill packaging are too close to the model—model companies have strong incentives to absorb them. Anthropic does MCP and Skills; OpenAI does Plugins and GPTs—both are “swallowing” these two layers.

For startups in these directions, the ceiling is being acquired. Guardrails compliance and evaluation observability are exactly the opposite—they naturally require third-party independence. Banks won’t trust Anthropic’s own compliance audit tools, just like you wouldn’t let the audited party write its own audit report. Independence isn’t a business strategy; it’s the product value itself. The former makes good acquisition targets; the latter makes good IPO targets.

They all belong to Harness—the Agent’s Infra. So how big is the overall market for the Harness track?

Bottom-up, summing the valuation space of seven sub-tracks, by 2030 the total valuation of independently operating startups is about $500 billion to $800 billion. Skill packaging and vertical knowledge is the largest ($250 billion–$350 billion). Guardrails and compliance has the fastest growth (CAGR 65.8%: from $700 million in 2024 to a forecast $109.9 billion in 2034; the more autonomous Agents are, the more expensive the “reins,” and the earlier the value becomes obvious). The training environment is the earliest stage but has the highest certainty.

The overall AI Agent market is forecast to reach $50–100 billion in revenue by 2030, and Harness as the Infra layer accounts for 40–50% of that. Apply the 10–15x price-to-sales multiple typical of SaaS/Infra, and the top-down number lines up with the bottom-up estimate.

A startup track of nearly a trillion dollars.

If we also count Harness revenue embedded inside model companies, the overall valuation opportunity for the Harness infrastructure layer is $2.5–3.8 trillion—roughly equivalent to the market capitalization total of today’s entire cloud computing Infra layer.

So back to Junyang Lin: if he really enters the training environment and RL infrastructure sub-track of Harness, he’s facing a market with only about 20 seed-stage companies today, but a valuation opportunity of $20–50 billion by 2030. Wing VC predicts that this track will ultimately consolidate into 3–5 top players.

With his standing as Qwen's former technical leader, if he launches in Silicon Valley, seed-round valuations might land between $200 million and $500 million. What the market is pricing isn't the company; it's the person. Junyang Lin no longer needs to write a business plan; that post is enough. If he instead takes a China-based dollar fund, valuations might start from $50 million, and $100 million isn't impossible. An RMB fund? That's another story.

Anthropic’s palm

Now we need to answer a truly important question: for this trillion-dollar infrastructure layer of Harness, who is defining the rules?

Let’s look at the brutal facts:

MCP is the standard protocol pushed by Anthropic. Claude Code is an Anthropic-made harness product with $2.5 billion in annualized revenue. The Agent SDK is Anthropic’s developer entry point. The Skills system was designed by Anthropic. And even the popularity of the word “harness” in the AI Agent context—its biggest driver is Anthropic.

The deeper reason is the business model.

OpenAI's core narrative is "the strongest model," and its revenue comes mainly from ChatGPT subscriptions. Anthropic, by contrast, no longer pursues multimodality or world models, yet is increasingly regarded as the strongest model for the Agent era. Claude's selling point isn't topping benchmarks; it's being "the model best suited for agent workflows": more reliable, more controllable, and better at long-running autonomous execution.

This positioning means Anthropic’s competitiveness comes not only from the model itself, but from the quality of the Harness around it. The improvement of each Harness layer widens its moat. The prosperity of the Harness ecosystem directly equals Anthropic’s business interests.

This also explains why OpenAI started trying to build an ecosystem as early as 2023—Plugins, GPTs, the GPT Store—yet none of it really took off, while Anthropic’s MCP was only released at the end of 2024, a year and a half late, but became the de facto standard.

The fundamental reason is: OpenAI builds an application ecosystem; Anthropic builds an infrastructure ecosystem.

OpenAI’s GPT Store is like the logic of the App Store—I have the biggest user base, so you can open a shop here. But when the model itself can do everything, applications don’t really have a reason to exist. GPTs don’t have differentiation walls, because the underlying capabilities and ChatGPT itself are basically the same thing.

And Anthropic's MCP isn't an app store; it's a protocol store. It doesn't invite developers to open shops on Claude. Instead, it defines a connection standard so that any tool can work with any model. This is the logic of HTTP, not the logic of the App Store.

The more open the protocol, the stronger the control over the ecosystem. Everyone now uses MCP, and since Anthropic designed MCP, it doesn't need to lock in users; it locks in developers' minds and toolchains.
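For a sense of what "protocol, not product" means concretely: MCP messages are JSON-RPC 2.0, and a tool invocation is just a request/response pair. The sketch below shows only the wire shape (the tool name and result are made up, and the real protocol also covers initialization, capabilities, and notifications):

```python
import json

# Simplified shape of an MCP tools/call request over JSON-RPC 2.0.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",            # hypothetical tool
        "arguments": {"city": "Berlin"},
    },
}

# Any MCP server, whatever model sits behind the client, answers in the
# same shape; that sameness is what makes it HTTP-like.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "12°C, cloudy"}]},
}

wire = json.dumps(request)
assert json.loads(wire)["method"] == "tools/call"
```

Nothing in the request names a model or a vendor; the standard lives entirely in the message shape, which is exactly where a protocol ecosystem's control lives.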

At the capital level: Anthropic and its early investor Menlo Ventures founded the $100 million Anthology Fund, which invested in more than 30 harness-oriented startups within a year. The structure is smart: Menlo provides the money; Anthropic takes no share of the fund's economics, but gives each portfolio company $25,000 in model credits and sends its Chief Product Officer Mike Krieger and President Daniela Amodei to demo days.

Anthropic puts in not a dollar, locks more than 30 startups into the Claude ecosystem, and captures the most cutting-edge demand signals along the way. That's a free option.

But have we thought about why, in the Agentic AI era, Anthropic’s protocol ecosystem is more important than OpenAI’s application ecosystem?

Because Agents are not “applications” in the traditional sense. The interaction interface of a traditional app is fixed and limited—users call a car, the app follows a predefined process to call an API, match drivers, and calculate routes. Agents are different: they decide what tools to call, in what order, and when. The interaction interface is infinite and dynamic. And Agents also need to collaborate with each other—an orchestrator schedules specialized Agents, and specialized Agents schedule sub-Agents. That’s a distributed systems coordination problem.

When the interaction interface is fixed, you can integrate one by one; when it’s infinite, you can only define standards.

TCP/IP lets any two computers communicate, HTTP lets any client access any server, and MCP lets any Agent call any tool. The basic unit of an application ecosystem is “product,” while the basic unit of a protocol ecosystem is “connection.” In the Agentic era, the number and quality of connections determine everything.

Every Harness entrepreneur is turning somersaults inside that palm. If Junyang Lin really builds training-environment infrastructure, the "real startup category" he himself named, his product will most likely end up integrating with the Claude ecosystem; or, in China, building a parallel one. Because Anthropic defined the protocol, built the SDK, laid out an ecosystem fund, and captured developers' mindshare.

Maybe only Chinese Agent entrepreneurs have a chance to break out of Anthropic's palm, and that is less a choice than a matter of force majeure.
