The Agentic Web: How People (and AI Agents) Will Use Websites Next
The question that keeps coming up on our discovery calls is no longer “how do I rank in ChatGPT?”
That ground is broken and there is a playbook for it.
The new question is what comes after chat.
Voice? Agents talking to agents? Glasses? Autonomous web browsers? All of it at once?
There is a real answer, and it is not any of the things in the question.
The answer the industry has settled on is the agentic web: a web where AI agents do the browsing, the deciding, and increasingly the buying on behalf of people, alongside the people doing those things directly.
The visitors to your website are still humans. The visitors are also, increasingly, the AI agents acting on their behalf. Both will keep visiting for the next decade.
The site, the phone line, the CRM, and the rules that govern who can do what on which surface are about to change shape to accommodate both.
The agentic web is the web as it is being reshaped by AI agents that browse, decide, and transact on behalf of people.
Microsoft calls it “the open agentic web.” IEEE Spectrum calls it the redefinition of the internet.
We are calling it what businesses need to plan for in the next 18 months, because the alternative is being passed over by the next wave of buyer behavior without ever seeing it land.

33% of user experiences will shift from native applications to agentic front ends by 2028.
Source: Gartner 2025
The agent-task horizon is doubling roughly every seven months (METR, cited by Sequoia in “2026: This is AGI,” January 14, 2026).
Today’s agents reliably handle about 30 minutes of work. By 2028, a single agent does a workday. By 2034, a workyear.
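The arithmetic behind that curve is easy to check. The sketch below assumes a steady seven-month doubling from a 30-minute starting point; the 8-hour workday and 2,000-hour workyear are our own round-number assumptions, not METR's definitions.

```python
import math

DOUBLING_MONTHS = 7        # doubling period quoted from METR above
START_HORIZON_HOURS = 0.5  # "about 30 minutes of work" today

WORKDAY_HOURS = 8          # assumption: one working day
WORKYEAR_HOURS = 2000      # assumption: roughly 50 weeks x 40 hours


def months_until(target_hours: float) -> float:
    """Months of steady doubling needed to reach target_hours."""
    doublings = math.log2(target_hours / START_HORIZON_HOURS)
    return doublings * DOUBLING_MONTHS


print(f"workday:  {months_until(WORKDAY_HOURS) / 12:.1f} years out")
print(f"workyear: {months_until(WORKYEAR_HOURS) / 12:.1f} years out")
```

Under those assumptions a workday is four doublings away (28 months) and a workyear is about twelve (roughly seven years), which is in the same range as the 2028 and 2034 figures.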
The next default UX surface for SMBs is not the agentic browser, not voice, not glasses, and not the chat box. It is the agent inbox. The morning triage view where the SMB owner sees what their agents did overnight, what got escalated, and what needs a human decision. Everything else (voice, hybrid browsers, agent-to-agent protocols, even glasses) is a channel feeding into that triage.
This piece walks the four scenarios serious investors and labs are actively betting on (voice-first, agent-to-agent protocols, ambient/spatial, and hybrid multimodal), scores each for a business like yours, and then commits to where we think they converge inside the larger agentic web.
Four scenarios. One bet.

Key Takeaways:
- No single mode wins; the agent inbox is where all four scenarios converge for SMBs. Voice handles phone calls. Agent-to-agent protocols handle commerce and integration. Hybrid browsers handle research and execution. Glasses are early. All four drop their output into one place: a morning triage view that has more in common with email than with chat.
- Voice (Scenario A) and agent-to-agent (Scenario B) are the highest-ROI SMB bets in 2026. Voice has crossed the production threshold for inbound and outbound phone work. Agent-to-agent protocols (MCP at 78% enterprise adoption, plus ACP, AP2, UCP, x402, WebMCP) are the new mobile-readiness for SMB websites. Spatial (Scenario C) is a “do not invest yet” call we are making publicly. Hybrid (Scenario D) is the base case to design for.
- The site that is callable, not just browseable, wins twice. If your customer’s agent can read your structured data, your MCP server, and your ACP endpoint, and your own agents can act on the bookings, quotes, and questions that result, you are positioned for the agent inbox era. If your site is beautiful but only browseable, you are losing agent-driven traffic that is already compounding.

Why “What Comes After Chat?” Is the Wrong Question
The current public debate frames the future as a horse race between surfaces. ChatGPT Atlas versus Perplexity Comet versus The Browser Company’s Dia. Voice versus typing. Glasses versus phones. Pick one, the framing says, and optimize for it.
That framing misses what the people actually building these systems are saying.
a16z’s headline 2026 thesis, “Big Ideas 2026: The Agentic Interface” (December 22, 2025), frames the shift as three concurrent moves: interfaces from chat to action, design from human-first to agent-readable, and work from individual contributor to agent delegation.
Sequoia’s “2026: This is AGI” (January 14, 2026) calls it the “talkers to doers” turn.
Microsoft titled their February 2026 launch “Copilot Tasks: From Answers to Actions.”
Perplexity used “From Answers to Action” as the Comet design thesis.
OpenAI shipped Operator (January 2025) and then Atlas (October 21, 2025) to act, not just to answer.
The thing they are all saying, in different vocabularies, is the same thing:
Chat as the endpoint was a phase, and that phase is over. Chat is still the input. The value is in what happens after the input.
The follow-on question is which surfaces, which protocols, and which interaction patterns become the default. That is a real question.
It is also a question with an obvious framing trap: every player is incentivized to argue for the surface they own.
OpenAI argues for browsers (Atlas) and chat-as-app-shell (Apps SDK).
Anthropic argues for protocols (MCP). Apple has yet to argue for anything that ships.
The Browser Company argues for browsers-as-OS.
Google argues for “Gemini in every existing surface” because it already owns most existing surfaces.
The under-argued bet, the one neither the AI labs nor the analyst houses are fully selling, is the one made independently by Sequoia’s Always-On Economy (April 21, 2025) and by a16z’s “From System of Record to System of Intelligence” (Spring 2026):
The dominant new surface is not a live UI at all. It is the morning agent inbox.
That bet is the conclusion we land on. Four scenarios first, then the bet.
Why This is the Agentic Web and Not Just “AI Search” or “AI Chatbots”
The four scenarios below are not really four AI features. They are four manifestations of a single, larger shift the industry has started calling the agentic web. The vocabulary matters because the framing changes what an SMB owner does next.
“AI search” is a vendor-side framing.
It assumes the work of a search engine, plus AI summarization. The output is a better answer to a query.
The unit of work is a question, the surface is still a SERP-like result, and the SMB response is some flavor of “optimize for the new SERP.” That framing is too small, because it treats the agent as a smarter search box.
“AI chatbots” is a tool-side framing.
It assumes a synchronous conversation between a person and a model, often pinned to a single website or app. The output is a dialog. The unit of work is a turn. The SMB response is some flavor of “add a widget to the site.”
That framing is also too small, because it treats the agent as a smarter contact form.
The agentic web is a system-level framing.
It assumes AI agents that browse, decide, transact, and persist across sessions on behalf of a person or a business. The unit of work is an outcome, not a turn. The surface is wherever the agent goes (your website, your phone line, your MCP server, your ACP endpoint, your CRM API), and increasingly wherever the agent answers without you (the agent’s own inbox, the agent’s own chat thread, the agent’s own daily briefing).
The SMB response stops being “add a feature” and becomes “make the business legible and callable from the outside.”
That is what Microsoft means when it calls it “the open agentic web.”
It is what IEEE Spectrum means when it argues AI agents will redefine the internet.
It is what Sequoia and a16z mean when they describe ambient agents and systems of intelligence.
None of those framings reduce to AI search or to chatbots. All of them assume the four scenarios below are happening simultaneously, mediated by a single agent on each side of every transaction.
The reason this article uses “agentic web” instead of “AI search” or “chatbot” is not branding. It is precision.
If we said “AI search,” we would not be talking about the voice agent on your phone line, the MCP server that lets your CRM negotiate a quote, or the morning inbox where you triage what your agents did overnight.
If we said “chatbot,” we would not be talking about agent-to-agent commerce, the protocol stack, or the systems-of-intelligence layer. The agentic web is the only term that holds all of it at once.
Four Ways the Agentic Web Will Show Up (A Working Map)
Each of the four scenarios below is a credible “what comes after chat” thesis with serious champions, real shipping product, and a non-trivial chance of becoming the default within 24 months.
We score each on three dimensions:
- Adoption maturity (is the technology in production?)
- SMB readiness (is it usable by a 5-100 employee business this quarter?)
- 24-month likelihood (does this become a default surface by mid-2028?)
The full side-by-side scoring matrix is in the “Putting the Scenarios Side by Side” section below.
Scenario A: Voice-First as the Primary Interface

State of Play in 2026
Voice crossed the production threshold for narrow, high-value SMB workloads in late 2025. The infrastructure now exists to put a competent AI on the other end of an SMB’s phone, with sub-second turn-taking, pricing under a quarter per minute, and reasoning capability that did not exist 12 months ago.
ElevenLabs closed 2025 with more than $330M in ARR and raised a $500M Series D at an $11B valuation in February 2026, with two million voice agents deployed across web, mobile, and telephony.
OpenAI shipped GPT-Realtime-2 with GPT-5-class reasoning, exited the Realtime API beta, and added MCP, image, and SIP telephony support on the same release (The New Stack, late 2025).
Zillow, an early production user, saw its call-success rate on an adversarial benchmark go from 69% to 95%.
Hume’s Octave 2 and EVI 4 pushed emotion-aware prosody into general availability for support and coaching.
The SMB-tier voice-agent market (Bland, Synthflow, Retell, Vapi) commoditized at roughly $0.09 to $0.14 per minute inclusive of LLM and TTS, with $299 per month entry tiers.
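To turn the per-minute range into a monthly number, here is a rough sketch. The call volume and call length are hypothetical, and the function deliberately ignores any platform or entry-tier fee, because how the monthly tiers interact with per-minute pricing varies by vendor.

```python
RATE_LOW = 0.09   # USD per minute, low end of the range quoted above
RATE_HIGH = 0.14  # USD per minute, high end of the range quoted above


def monthly_usage_cost(calls_per_month: int, avg_minutes: float) -> tuple[float, float]:
    """Return the (low, high) usage cost in USD for one month of calls."""
    minutes = calls_per_month * avg_minutes
    return (round(minutes * RATE_LOW, 2), round(minutes * RATE_HIGH, 2))


# Hypothetical volume: 400 calls a month averaging 3 minutes each.
low, high = monthly_usage_cost(400, 3.0)
print(f"${low:.2f} to ${high:.2f} per month in usage")
```

At that hypothetical volume (1,200 minutes), usage alone lands between $108 and $168 a month.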
The contrarian data point is also significant.
Apple’s LLM-powered Siri rebuild has slipped repeatedly, most recently to iOS 26.4 in 2026 and partly to 2027. The most-funded consumer voice-assistant rebuild of the era is the most public failure-to-ship.
GPT-Realtime-1.5 shipped a regression in voice expressiveness and accent quality severe enough that production builders complained publicly.
The AI voice generator market is forecast to grow from $4.16B in 2025 to $20.71B in 2031 at a 30.7% CAGR (Markets and Markets, 2025), but most of that growth is enterprise and telephony, not consumer assistants.
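As a sanity check on that forecast, the standard compound-annual-growth formula can be applied to the two endpoints. This is only a quick verification sketch of the quoted figures, treating 2025 to 2031 as six compounding years.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# Market size in billions of USD, as quoted above.
rate = cagr(4.16, 20.71, 2031 - 2025)
print(f"{rate:.1%}")
```

The result rounds to the 30.7% the forecast states, so the endpoints and the growth rate are at least internally consistent.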
Voice is winning on the phone-call surface. Voice is still losing on the general consumer-assistant surface.
What This Changes for a Business Like Yours
The inbound phone call is now an AI surface, not just a fallback channel.
The hours-of-operation page, the pricing FAQ, the cancellation policy, and the service-area map are all training data for a voice agent that may handle the first call from a new lead.
“Click to call” is becoming “click to talk to an AI that already has your context.”
Voice does not kill websites.
It raises the cost of a bad voice answer.
The top ten questions a customer would ask over the phone today are the same ten that need a published answer your voice agent can ground itself in.
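A toy illustration of what "grounding" means here: the agent answers only from what the business has published, and escalates everything else. The FAQ entries, the keyword matching, and the names `PUBLISHED_FAQ` and `answer_from_faq` are all hypothetical; a production voice agent would use retrieval rather than substring checks.

```python
from dataclasses import dataclass


@dataclass
class VoiceReply:
    text: str
    escalate: bool  # True when a human needs to follow up


# Hypothetical published answers, keyed by topic keyword.
PUBLISHED_FAQ = {
    "hours": "We are open Monday to Friday, 8 AM to 6 PM.",
    "cancellation": "You can cancel up to 24 hours ahead at no charge.",
    "service area": "We serve addresses within 25 miles of downtown.",
}


def answer_from_faq(question: str) -> VoiceReply:
    """Answer only from published content; otherwise hand off to a person."""
    normalized = question.lower()
    for topic, answer in PUBLISHED_FAQ.items():
        if topic in normalized:
            return VoiceReply(answer, escalate=False)
    return VoiceReply("Let me have someone from the team call you back.", escalate=True)


print(answer_from_faq("What are your hours?"))
print(answer_from_faq("Do you price match?"))
```

The design point is the fallback: an unpublished answer becomes an escalation, not an improvised reply.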
Strongest Argument FOR
SMB phone calls were the original conversational interface and they never went away.
The historical AI voice failure mode was that the model was not good enough to converse without breaking the customer’s patience.
That ceiling moved in late 2025.
With sub-second turn-taking, GPT-5-class reasoning in the loop, MCP and SIP integration, and a 26-point production improvement on the Zillow adversarial benchmark, the case for putting voice on inbound and outbound calls is now a near-term ROI conversation, not a research conversation.
Strongest Argument AGAINST
The general-purpose voice case is unproven.
Apple has slipped Siri at least three times since WWDC 2024.
The most public quality regression of 2025 was an OpenAI voice release. Most ChatGPT users still type, not talk.
Accent, interruption, ambient noise, and social context still trip current models in ways an enterprise can absorb and an everyday consumer cannot.
Voice for SMB phone work is a confident bet. Voice as a daily-driver assistant for general consumers is still a 2027+ promise.
Scoring
| Dimension | Score (1-5) | Reason |
|---|---|---|
| Adoption maturity | 3 | Production-ready for phone calls; not yet for general assistants |
| SMB readiness | 4 | Vendors and unit economics are ready today for inbound and outbound |
| 24-month likelihood | 4 | Likely to be a default channel for SMB phone work by 2028 |
Scenario B: Agent-to-Agent (Machine-to-Machine) Protocols

State of Play in 2026
The protocol layer of the agent era was contested in 2024 and is mostly settled in 2026.
Anthropic open-sourced the Model Context Protocol (MCP) on November 25, 2024. By April 2026, 78% of enterprise AI teams reported at least one MCP-backed agent in production, and 67% of CTOs surveyed said MCP would be their default agent-integration standard within 12 months.
The public registry grew to more than 9,400 servers, up from roughly 1,200 in Q1 2025, a 7.8x annual jump, with over 110M SDK downloads per month. Median time-to-integrate a new SaaS tool dropped from about 18 hours of custom function-calling to 4.2 hours with MCP, a 4.3x productivity multiplier (Digital Applied MCP Adoption Stats 2026).
Anthropic’s MCP co-creator David Soria Parra summed it up at the MCP Dev Summit in 2026: “Behind every corporate firewall, we’re quietly wiring MCPs to systems of record and company data all day long. Salesforce, Jira, internal wikis, Snowflake, HR systems. You never really see or hear about it on Twitter.”
Above MCP, the industry has split into two protocol families:
- Agent coordination. Google’s A2A Protocol launched in April 2025 with 50+ partners and was handed to the Linux Foundation in June 2025 with 100+ supporters; it crossed 150+ by year-end.
- Agent commerce. Stripe and OpenAI’s Agentic Commerce Protocol (ACP) launched in September 2025 and went generally available in Q1 2026 inside the Stripe Agentic Commerce Suite, with URBN (Anthropologie, Free People, Urban Outfitters), Etsy, Coach, Kate Spade, plus Wix, WooCommerce, BigCommerce, and Squarespace at launch. Google and PayPal launched AP2 (Agent Payments Protocol) in late 2025. Google and Shopify launched the Universal Commerce Protocol (UCP) in October 2025. Coinbase launched x402 on Solana in summer 2025, with 35M+ transactions and $10M+ volume processed by 2026.
Browser agents got their own native standard on February 10, 2026, when Google Chrome shipped WebMCP behind a flag in Chrome 146 Canary, co-authored with Microsoft. WebMCP lets a website expose structured tools (book appointment, request quote, check stock) directly to in-browser agents.
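To make "structured tools" concrete, here is a rough sketch of what a tool description can look like. It is loosely modeled on the name, description, and input-schema shape that MCP tool definitions use; it is not the WebMCP browser API, and the tool name, fields, and `missing_arguments` helper are hypothetical.

```python
import json

# Hypothetical tool description for a service business.
BOOK_APPOINTMENT_TOOL = {
    "name": "book_appointment",
    "description": "Book a 30-minute consultation.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "customer_name": {"type": "string"},
            "start_time": {"type": "string", "description": "ISO 8601 start time"},
        },
        "required": ["customer_name", "start_time"],
    },
}


def missing_arguments(tool: dict, arguments: dict) -> list[str]:
    """List required arguments an agent left out of its call."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


print(json.dumps(BOOK_APPOINTMENT_TOOL, indent=2))
print(missing_arguments(BOOK_APPOINTMENT_TOOL, {"customer_name": "Ada"}))
```

The agent never has to guess which button to click: the tool states what it does and exactly which inputs it needs.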
Forrester predicts 30% of enterprise app vendors will launch their own MCP servers in 2026. Gartner projected (August 27, 2025) that 73% of customer-service organizations would have implemented agent-assist solutions by the end of 2025.
What This Changes for a Business Like Yours
The SMB website is no longer the only customer-facing surface.
The MCP, A2A, and WebMCP descriptors of the site are also surfaces. Discovery shifts from search-engine ranking to agent-catalog inclusion, which is why Stripe’s Agentic Commerce Suite, OpenAI’s Apps SDK, and Google’s UCP integrations are starting to look like the new “Yelp listing.”
Checkout shrinks from a six-step funnel to a single signed mandate (AP2’s verifiable credentials) or an HTTP 402 response (x402). Cart abandonment as a metric loses meaning.
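The HTTP 402 pattern is simple enough to sketch in a few lines. This is a simplified illustration of the idea, not the x402 specification: the header name `X-Payment-Proof`, the price, and the response fields are hypothetical, and a real endpoint would have to verify the payment rather than merely check that a header is present.

```python
from http import HTTPStatus

PAYMENT_HEADER = "X-Payment-Proof"  # hypothetical header name, not from any spec


def handle_quote_request(headers: dict[str, str]) -> tuple[int, dict]:
    """Return (status, body): ask for payment first, serve the resource after."""
    if PAYMENT_HEADER not in headers:
        # 402 Payment Required: tell the calling agent what it owes.
        return (HTTPStatus.PAYMENT_REQUIRED.value, {"error": "payment required", "amount_usd": "0.05"})
    # Assumption: the proof would be verified here before responding.
    return (HTTPStatus.OK.value, {"quote": "Sample quote payload"})


print(handle_quote_request({}))
print(handle_quote_request({PAYMENT_HEADER: "signed-proof"}))
```

The whole "funnel" is one request that is refused with a price and one retry that carries payment.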
Stripe’s framing is the bluntest in the agent commerce literature: “Agents don’t shop the way humans do. They directly query structured data. Visual merchandising is irrelevant to them. What matters is whether your data is accessible and complete.” (Agentic Commerce: A Guide for Businesses)
Agent-readability is the new mobile-readiness.
The historical analog is responsive design and schema.org adoption in the 2010s. Expect the same multi-year compliance arc.
Strongest Argument FOR
MCP is now the closest thing the agent ecosystem has to a universal standard.
Full client coverage across Claude, ChatGPT, Gemini, Cursor, Windsurf, and Vercel. 78% enterprise adoption with 9,400+ public servers in April 2026.
A 4.3x integration-time improvement that creates real budget pressure. When this much of the lower stack converges in 18 months, the ceiling moves with it.
Strongest Argument AGAINST
Protocol consensus is real at the MCP layer and fragmented above it.
A2A versus ACP versus AP2 versus UCP versus x402 means SMBs face protocol shopping rather than a single “agent-ready” target.
67% of MCP servers still run as local, desktop-only STDIO processes, not network-callable cross-org surfaces. For an SMB today, “agent-to-agent” is mostly developer plumbing, not yet a customer-facing front door.
The right move for most businesses in 2026 is structured data plus schema plus llms.txt, not a hand-rolled MCP server.
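For a sense of how small the structured-data step is, here is a minimal sketch that builds schema.org `LocalBusiness` markup as JSON-LD. The business details are placeholders and the helper name `local_business_jsonld` is our own; the property names come from the schema.org vocabulary.

```python
import json


def local_business_jsonld(name: str, telephone: str, street: str, city: str,
                          region: str, postal_code: str) -> dict:
    """Build a schema.org LocalBusiness description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "telephone": telephone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
        "openingHoursSpecification": [{
            "@type": "OpeningHoursSpecification",
            "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
            "opens": "08:00",
            "closes": "18:00",
        }],
    }


# Placeholder business details for illustration only.
markup = local_business_jsonld("Example Plumbing Co.", "+1-555-0100",
                               "100 Main St", "Springfield", "CA", "90000")
print(json.dumps(markup, indent=2))
```

The printed JSON would go inside a `<script type="application/ld+json">` tag on the page, where crawlers and agents can read it without rendering anything.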
Scoring
| Dimension | Score (1-5) | Reason |
|---|---|---|
| Adoption maturity | 4 | MCP has won the lower layer; commerce protocols still fragmented |
| SMB readiness | 2 | Schema and llms.txt are ready; bespoke MCP servers are not |
| 24-month likelihood | 5 | Agent-to-agent surfaces become the new “mobile site” by 2028 |
Scenario C: Ambient, Spatial, and Wearables

State of Play in 2026
The single biggest verdict of 2025 was that the smartphone-replacement standalone AI device failed, while the AI-glasses-on-existing-eyewear category broke out.
Humane shut down the AI Pin on February 18, 2025, nine months after launch; HP bought the assets for $116M; existing $700 devices were bricked with limited refunds. Rabbit’s R1 Large Action Model “failed to deliver in practice,” with the core thesis collapsing into a glorified app launcher.
The most ambitious screen-free device still in development, OpenAI’s acquisition of Jony Ive’s “io” startup, was delayed to early 2027 and stripped of the io brand.
Apple Vision Pro shipped an estimated 45,000 units in Q4 2025 (IDC), versus 390,000 for all of 2024. Apple cut digital advertising spend for Vision Pro by more than 95% year over year. The native app catalog reached roughly 3,000 in almost two years.
Global VR headset shipments fell 14% year over year. Apple shifted Vision Pro engineering staff to smart-glasses projects through 2025, conceding the form factor before shipping a sequel.
Meanwhile, EssilorLuxottica sold seven million Meta smart glasses in 2025, more than triple 2024, with annual capacity now targeted at 20 to 30 million units.
Meta shipped six smart-glasses SKUs across price points (including Ray-Ban Meta gen 2, Oakley Meta HSTN, Oakley Meta Vanguard, and Meta Ray-Ban Display).
Google re-entered with Gemini for Android XR plus Project Aura with XREAL and partnerships with Warby Parker, Gentle Monster, and Kering Eyewear.
Samsung is reportedly shipping a competitor in 2026.
The standalone screen-free AI hardware bet keeps failing. The glasses-on-eyewear bet is breaking out.
When Meta, Apple, Google, Samsung, and the world’s largest eyewear company all reorganize their roadmap around the same form factor in the same year, that is a real signal.
What This Changes for a Business Like Yours
Glasses are an audio plus camera plus heads-up surface, not a typing surface.
Most glasses-AI use cases are “what is in front of me right now,” which means local discovery wins.
Google Business Profile, hours, menu, address legibility, and a clear physical sign matter more than a parallax landing page.
The Vision Pro flop is a clear signal for SMBs to skip VR headset campaigns and spatial-app development for at least 24 months.
Privacy and recording-disclosure norms will harden, the way ADA compliance and cookie banners did. Track local “no-record” glasses ordinances.
They are coming.
The website does not disappear in this scenario. It gets read aloud or summarized aloud, so well-structured content wins again.
Strongest Argument FOR
The fastest-moving consumer-AI hardware category in 2025 was glasses, by a wide margin.
EssilorLuxottica sold more Meta smart glasses in 2025 (7M units) than Valve has sold Steam Decks across the entire product’s lifetime.
The production target jump to 20-30M units per year is a manufacturing commitment, not a press release. Two of the world’s three largest hardware companies are reorganizing engineering around the form factor.
That kind of multi-vendor commit usually precedes a real category shift by 18 to 36 months.
Strongest Argument AGAINST
For every Meta-Ray-Ban win there is a Vision Pro and a Humane Pin.
Most smart-glasses sessions in 2025 were short camera and audio interactions, not the all-day ambient computing the category is sold as.
The one screen-free AI device with serious ambition (OpenAI/Ive) is delayed to February 2027 with no packaging or marketing yet.
SMBs reorganizing their stack around “glasses-first” today are betting on a capability curve that has not yet shown up in real usage time.
For most SMBs, our recommendation is concrete: do not invest in spatial or VR application development for at least 24 months.
If anything, make sure your local discovery surfaces (Google Business Profile, address schema, hours, menus) are in order. That serves voice and glasses simultaneously.
Scoring
| Dimension | Score (1-5) | Reason |
|---|---|---|
| Adoption maturity | 2 | Glasses category breaking out; standalone devices failing |
| SMB readiness | 1 | Local discovery hygiene only; no spatial dev recommended |
| 24-month likelihood | 3 | Real but slow; 36-month story, not 24 |
Scenario D: Hybrid Multimodal

State of Play in 2026
The biggest UI bet of late 2025 was OpenAI launching ChatGPT Atlas, an entire Chromium-class browser with ChatGPT and an in-browser agent mode, on October 21, 2025. Atlas shipped worldwide on macOS to all Free, Plus, Pro, and Go users with agent mode in preview for paid tiers.
Anthropic had already shipped Claude Computer Use as a tool in the Messages API (October 22, 2024) and followed with Claude Cowork, where Claude controls the desktop directly through the standard tool surface.
Perplexity shipped Comet on July 9, 2025.
Microsoft launched Copilot Vision in October 2024 and then Copilot Tasks in February 2026, moving from “see and explain” to “click and act.”
Google ships Gemini in Search (AI Mode), Chrome (Project Mariner), Android (Gemini Intelligence), and ambient hardware (Project Astra), the everything-everywhere bet.
a16z’s December 2025 framing crystallizes the hybrid thesis:
The interface is not collapsing into one mode. It is collapsing into one agent, addressed across many modes. Chat for instructions. Screen for context. Voice for ambient. Programmatic for execution.
a16z partner Stephenie Zhang’s line: “In 2026, people will start interfacing with the web through their agents. As agents take over retrieval and interpretation, visual design becomes less central to comprehension.”
The friction is real.
Claude Cowork runs at $20/mo and “still falls short” on multi-app workflows. Computer Use models hallucinate clicks and need explicit guardrails. Atlas browser memory and agent mode raise novel prompt-injection and silent-data-leak surfaces. OpenAI itself warns “our safeguards will not stop every attack.”
The hybrid bet works in aggregate, not yet in any single moment.
What This Changes for a Business Like Yours
The default reader of an SMB site in 2026 is now a hybrid agent.
Chat asks. Browser fetches. Agent acts.
The SMB site has to be readable to humans, parseable to crawlers, and callable by agents at the same time. CTAs need to be unambiguous tool calls (“Book a 30-minute consultation,” “Get a quote on this SKU”), not feel-good prose like “let’s start something great together.”
Visual design is still doing the work for human-only visitors. It increasingly does no work for the agent reader.
The default architecture in 2026 is two outputs from one CMS: one beautiful, one structured. The competitive moat shifts from brand visuals to brand legibility.
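A minimal sketch of "two outputs from one CMS," assuming a single content record that feeds both a human-facing HTML fragment and a machine-readable JSON document. The record fields, the URL, and the function names are hypothetical.

```python
import html
import json

# Hypothetical CMS record: one source of truth for both outputs.
SERVICE = {
    "name": "Drain cleaning",
    "price_usd": 149,
    "cta": "Book a 30-minute consultation",
    "booking_url": "/book/drain-cleaning",
}


def render_html(record: dict) -> str:
    """The 'beautiful' output: markup for human visitors."""
    name = html.escape(record["name"])
    cta = html.escape(record["cta"])
    url = html.escape(record["booking_url"], quote=True)
    return (
        f"<section><h2>{name}</h2>"
        f"<p>From ${record['price_usd']}</p>"
        f'<a href="{url}">{cta}</a></section>'
    )


def render_structured(record: dict) -> str:
    """The 'structured' output: the same facts as JSON for agents."""
    return json.dumps(record, sort_keys=True)


print(render_html(SERVICE))
print(render_structured(SERVICE))
```

Because both renderers read the same record, the price an agent quotes and the price a human sees cannot drift apart.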
The metrics shift too.
Time on page, scroll depth, and session duration are degrading metrics the moment a non-trivial share of visits are agentic.
Outcome metrics (booked appointments, paid invoices, quote requests, replied DMs) move up the funnel.
Strongest Argument FOR
Hybrid is the only thesis that does not require any of the other three to fully win.
Voice handles phone calls. Glasses handle ambient. MCP and A2A handle machine-to-machine. Chat and browser handle everything else.
All four reach a single agent.
Atlas, Comet, Dia, Cowork, and Computer Use are evidence the frontier labs are betting on this exact hybrid topology.
If you are designing for one scenario, design for this one, because it absorbs the others as channels.
Strongest Argument AGAINST
Hybrid is a thesis with no single forcing function.
Atlas is still a Mac-only beta in early 2026. Cowork falls short on real desktop tasks today.
The most-cited a16z prediction (the end of the screen-time KPI) is explicitly an end-state, not a 2026 statement.
Without a default mode that wins, SMBs have to invest in voice, structured data, visual design, and agent compatibility simultaneously, which is the most expensive answer.
The discipline this scenario requires is choosing what not to do.
Scoring
| Dimension | Score (1-5) | Reason |
|---|---|---|
| Adoption maturity | 3 | All major surfaces in market; quality and trust still maturing |
| SMB readiness | 3 | Affordable to design for; expensive to invest in for every surface |
| 24-month likelihood | 5 | The base case if no single mode wins |
Putting the Scenarios Side by Side
| Scenario | Adoption maturity | SMB readiness | 24-month likelihood | One-line read |
|---|---|---|---|---|
| A. Voice-first | 3 | 4 | 4 | Production-ready for phone calls; not a general interface |
| B. Agent-to-agent protocols | 4 | 2 | 5 | MCP has won the lower layer; “agent-readable” is the new “mobile-ready” |
| C. Spatial / wearables | 2 | 1 | 3 | Glasses category breaking out; do not invest in spatial dev for 24 months |
| D. Hybrid multimodal | 3 | 3 | 5 | The base case to design for if no single mode wins |
Read the matrix and the picture is clear.
Scenarios B and D are the most likely co-winners over the next 24 months.
MCP, A2A, ACP, AP2, UCP, and WebMCP are the plumbing. The hybrid surfaces (Atlas, Comet, Cowork, and Computer Use) are the visible layer on top.
Voice (Scenario A) becomes one channel that runs on top of that plumbing, dominant in phone calls, narrow elsewhere.
Spatial (Scenario C) lags both for at least 24 months because the standalone hardware bets keep failing and the breakout glasses category is still mostly camera and audio, not interface.
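The reading above can be reproduced directly from the matrix. The sketch below encodes the scores from the table and picks out the leaders; the dictionary keys are our own shorthand for the three dimensions.

```python
# Scores copied from the side-by-side matrix (1-5 scale).
SCORES = {
    "A. Voice-first": {"maturity": 3, "smb_readiness": 4, "likelihood_24mo": 4},
    "B. Agent-to-agent protocols": {"maturity": 4, "smb_readiness": 2, "likelihood_24mo": 5},
    "C. Spatial / wearables": {"maturity": 2, "smb_readiness": 1, "likelihood_24mo": 3},
    "D. Hybrid multimodal": {"maturity": 3, "smb_readiness": 3, "likelihood_24mo": 5},
}


def leaders(dimension: str) -> list[str]:
    """Scenarios tied for the top score on one dimension."""
    top = max(scores[dimension] for scores in SCORES.values())
    return [name for name, scores in SCORES.items() if scores[dimension] == top]


print(leaders("likelihood_24mo"))  # most likely defaults by 2028
print(leaders("smb_readiness"))    # most usable by an SMB this quarter
```

Scenarios B and D tie on 24-month likelihood, while Scenario A leads on SMB readiness today, which is why voice is the near-term bet and hybrid is the design target.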
But the matrix also misses the underlying question, which is not “which scenario wins?”
The right question is “which surface does an SMB owner actually look at every day?”
Because however many scenarios technically win, only one of them gets the owner’s attention in the morning.
That is where the bet lives.
Our Bet: The Agent Inbox is Where the Scenarios Converge for SMBs

In 2026, your customer’s agent will read your site, talk to your voice agent, pay your processor, and book your time slot, mostly without visiting your homepage as a human.
Stripe’s framing is the bluntest version of this future, but a16z, Sequoia, Anthropic, and OpenAI are all building toward the same world.
The piece nobody is publishing for SMBs is what happens on your side of that interaction.
The agent your customer sent did something. Your agents responded. Some of it required a human decision. Some of it did not.
Somebody has to triage the result.
That somebody is the SMB owner, and the surface they triage on is what Sequoia calls the agent inbox.
Where the Thesis Comes From
The agent-inbox idea is nearly three years old now.
Sequoia’s Lauren Reeder and Stephanie Zhan named the original “agents on the brain” pattern in May 2023.
Konstantine Buhler’s Always-On Economy piece in April 2025 made the economic case: “the real transformation AI brings in the next 5-7 years isn’t about replacing human intelligence. It’s about removing temporal constraints from our economy.”
Sequoia returned in 2025 with Harrison Chase on Ambient Agents and the New Agent Inbox, framing it as a real product pattern (LangChain Inbox, Microsoft’s agent surface inside Outlook, internal tools at the frontier labs).
a16z arrived at the same place from a different direction.
From System of Record to System of Intelligence (Spring 2026) argues that the system of record (CRM, ERP) gets eclipsed by a new “system of intelligence” layer that reads, writes, and reasons over it. The user surface for that layer is a prioritized feed, not a chat box.
The piece that has not been written is the SMB version.
Every framework above assumes a Fortune 500 IT shop, a Series B product team, or a venture-funded brand. None of them write the version where the company is a 12-person law firm, a five-restaurant group, a $4M ecommerce DTC brand, or a real estate brokerage.
That is the version that matters because that is the version most of our readers run.
Why the Agent Inbox is the Right Surface for SMBs Specifically
Four reasons.
1. Small teams have finite attention and infinite inbound channels. An SMB owner already triages email, Slack, voicemail, text, Instagram DMs, and a CRM. Adding “chat with my agent” as a new live conversation is a tax. Adding “review what my agents did overnight” is a relief.
2. The bookings, quotes, and questions are already async. Most SMB sales motion is not synchronous. A lead asks at 9 PM. The owner responds at 7 AM. The agent inbox formalizes a rhythm that already exists.
3. The SMB metrics that matter are outcomes, not sessions. Bookings made. Quotes sent. Replies returned. Reviews collected. Calls handled. Those metrics already live in tools (CRM, scheduler, voice provider, review platform). The inbox is the morning aggregation of them.
4. Trust ramps with review, not autonomy. Only 6% of companies fully trust AI agents to autonomously run core business processes (HBR survey via Fortune, December 9, 2025). The inbox is the surface where SMB trust ramps incrementally: see what got done, approve the harder calls, intervene where the agent was wrong, watch the false-positive rate fall over time.
What the Morning Triage Actually Looks Like
The mental model is closer to email than to chat.
Here is a working sketch for a five-person service business with a website, a voice agent on inbound calls, and a booking and CRM system connected through MCP.
The agent inbox at 7 AM Monday might read:
- 8 inbound calls handled overnight. 6 qualified and booked into Calendly. 2 escalated for human review (one outside service area, one pricing question outside the published menu). Average call length 3 minutes 14 seconds.
- 4 quote requests received via web form. 3 drafted and queued for owner approval, 1 escalated as a complex multi-property situation. Drafted quotes link inline to the customer record.
- 2 review responses drafted. Both queued for approval. The agent flags one as a recoverable customer-experience issue (delayed start time) with a suggested service-recovery offer.
- 1 returning client at risk. The CRM agent flags a 90-day silence from a key account whose last interaction was a billing dispute; a re-engagement draft is queued.
- Daily revenue snapshot. Bookings confirmed, projected close from outstanding quotes, calls answered versus baseline, NPS pulse from the week.
The owner spends ten minutes triaging this view. Approves three quotes. Edits a review response. Escalates one call to the lead service technician. The rest happens automatically.
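The triage view sketched above reduces to a very small data model: items from every channel, each marked as either handled or waiting on a person, with the waiting items sorted to the top. The class, field names, and sample items below are hypothetical and only loosely mirror the Monday example.

```python
from dataclasses import dataclass


@dataclass
class InboxItem:
    channel: str       # e.g. "voice", "web form", "reviews"
    summary: str
    needs_human: bool  # True if the owner has to approve or decide


def triage(items: list[InboxItem]) -> list[InboxItem]:
    """Put items that need a human decision first, keeping original order otherwise."""
    return sorted(items, key=lambda item: not item.needs_human)


ITEMS = [
    InboxItem("voice", "6 calls qualified and booked", needs_human=False),
    InboxItem("voice", "2 calls escalated for review", needs_human=True),
    InboxItem("web form", "3 quotes drafted, awaiting approval", needs_human=True),
    InboxItem("reviews", "2 review responses drafted", needs_human=True),
]

for item in triage(ITEMS):
    flag = "REVIEW" if item.needs_human else "done"
    print(f"[{flag}] {item.channel}: {item.summary}")
```

Everything the agents finished on their own sinks to the bottom as a record; the owner's ten minutes go to the items at the top.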
That ten-minute review is the new daily-driver SMB UX surface.
It is where voice ends (the agent took the call, the inbox triages the result). It is where agent-to-agent protocols end (the customer’s agent talked to your MCP server, the inbox shows what it did).
It is where hybrid browsers end (Atlas’s agent mode took an action, the inbox shows whether it worked).
It is where ambient ends (the glasses-driven local query routed through your booking flow, the inbox shows the booking).
Four scenarios. One inbox.
The Receipt
We are running a version of this with a Los Angeles real estate brokerage we onboarded recently.
Before our work, they ran prompt checks across ChatGPT, Claude, Gemini, and Perplexity for “best luxury real estate agents in Los Angeles” and “best broker to help with a high end real estate sale in Los Angeles.”
They appeared in roughly zero of those answers. Competitors with weaker pages but better-structured content appeared often.
The first 30 days of the engagement are not a redesign.
They are a measurement system, a content restructuring around the queries that matter, and the machine-readable layer (schema, pricing.md, FAQPage markup, allowed AI crawlers in robots.txt) that lets the answer engines and the customer agents see the brokerage at all.
The next 30 days will layer in agent-callable surfaces (a voice agent for inbound calls grounded in the published FAQs, an MCP server exposing inquiry intake) so that an agent acting on a buyer’s behalf can actually book a consultation without rendering JavaScript.
The final 30 days are the inbox layer: a daily summary of inbound calls, web form submissions, AI-citation tracking, and review status that fits into a ten-minute morning review.
It is not a finished system yet.
It will take all 90 days to settle. But the shape of the result is already visible: the brokerage’s website becomes legible to the agents that increasingly mediate buyer research, and the owner’s morning shifts from “what did email send me?” to “what did my agents do?”
That is the bet.

What This Changes for Your Website, CRM, Voice Stack, and Metrics
Four practical implications, one per layer of the stack, if the agent inbox is the dominant new SMB surface.
Your Website Becomes Callable, Not Just Browseable
The website still needs to be beautiful for humans, fast for Core Web Vitals, and structured for traditional SEO. That work does not go away. What gets added is the machine-readable and agent-callable layer:
- Clean schema graph (Organization, WebSite, WebPage, BreadcrumbList, page-type-specific schema, FAQPage on every page with an FAQ block)
- llms.txt at site root as hygiene
- pricing.md at site root if pricing is published (and it should be for as much of the catalog as ethics allow)
- robots.txt that explicitly allows the modern AI crawlers (GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot, ClaudeBot, anthropic-ai, Google-Extended, Applebot-Extended)
- Where appropriate, an MCP server that exposes structured tools (book appointment, request quote, check availability) for in-browser agents through WebMCP
- An ACP or UCP endpoint if commerce is a meaningful share of revenue
The first four items are the foundational AI-Ready layer we mapped in the AI-Ready Website Playbook and should be in place before the rest.
The last two are the new 2026 layer that most SMB sites have not yet built.
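As a rough illustration of the robots.txt item, here is a minimal Python sketch that generates a file with one explicit allow group per crawler. The crawler names are the ones listed above; the function name, the catch-all group, and the example.com sitemap URL are our assumptions for illustration. Check each vendor's current documentation for its user-agent token before shipping.

```python
from typing import List, Optional

# Crawler names as listed in this article; confirm each token with the vendor.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
    "ClaudeBot", "anthropic-ai", "Google-Extended", "Applebot-Extended",
]


def build_robots_txt(crawlers: List[str], sitemap_url: Optional[str] = None) -> str:
    """Return robots.txt text with one explicit allow group per crawler."""
    lines: List[str] = []
    for name in crawlers:
        # Each group names one crawler and allows the whole site.
        lines += [f"User-agent: {name}", "Allow: /", ""]
    # Catch-all group for every other crawler (an assumption; adjust to taste).
    lines += ["User-agent: *", "Allow: /"]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Hypothetical sitemap URL, for illustration only.
ROBOTS_TXT = build_robots_txt(AI_CRAWLERS, "https://example.com/sitemap.xml")
```

Writing ROBOTS_TXT to the site root is the whole deployment. On most managed platforms the same text can be pasted into the robots.txt editor instead.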
Your CRM Treats Agents as Actors
The CRM stops being a system that only humans write to.
Every record gets a source tag: human-led, agent-led, co-led.
Every action gets an actor (the human rep, the voice agent, the booking agent, the review agent).
Every escalation gets a reason code so the inbox can sort and the owner can audit.
Without this layer, the inbox becomes a noisy stream of decontextualized actions. With it, the inbox becomes triageable.
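One way to picture the three tags is as fields on every CRM action. The sketch below is a minimal Python data model under our own assumptions: the class names, field names, and reason codes are hypothetical, not taken from any particular CRM.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Source(Enum):
    HUMAN_LED = "human-led"
    AGENT_LED = "agent-led"
    CO_LED = "co-led"


@dataclass(frozen=True)
class CrmAction:
    record_id: str
    source: Source                            # who led the interaction
    actor: str                                # e.g. "voice agent", "human rep"
    summary: str
    escalation_reason: Optional[str] = None   # reason code, None if not escalated

    @property
    def escalated(self) -> bool:
        return self.escalation_reason is not None


def group_escalations(actions: List[CrmAction]) -> Dict[str, List[CrmAction]]:
    """Group escalated actions by reason code so the inbox can sort them."""
    grouped: Dict[str, List[CrmAction]] = {}
    for action in actions:
        if action.escalated:
            grouped.setdefault(action.escalation_reason, []).append(action)
    return grouped


# Hypothetical overnight activity.
ACTIONS = [
    CrmAction("call-101", Source.AGENT_LED, "voice agent", "Booked consultation"),
    CrmAction("call-102", Source.AGENT_LED, "voice agent",
              "Caller outside service area", "out_of_service_area"),
    CrmAction("quote-17", Source.CO_LED, "booking agent",
              "Multi-property quote", "complex_quote"),
    CrmAction("call-103", Source.AGENT_LED, "voice agent",
              "Pricing question not on the menu", "unpublished_pricing"),
]
ESCALATIONS = group_escalations(ACTIONS)
```

The reason code is what makes the audit possible later: the owner can pull every escalation with the same code and decide whether the agent should start handling that case.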
Your Voice Stack Is Production, Not Experimental
Pick a voice vendor that supports preambles (“let me check that for you”), parallel tool calls, MCP integration, and graceful escalation to a human.
For most SMBs in 2026, that vendor is one of ElevenLabs, OpenAI’s Realtime API used through a wrapper (Vapi, Retell, Synthflow), or a managed solution from a category-specific provider.
The unit economics are no longer prohibitive (under $0.25 per minute, inclusive of LLM and TTS), so the threshold for “this pencils out” is whether your inbound call volume justifies a fractional FTE of attention.
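To make the "pencils out" arithmetic concrete, here is a small Python sketch. The inputs are illustrative assumptions, not measurements: 8 calls a night for 30 nights, the 3 minute 14 second average from the sample inbox earlier in this article, and the quarter-per-minute ceiling as the rate.

```python
def monthly_voice_cost(calls_per_month: int, avg_minutes: float,
                       rate_per_minute: float) -> float:
    """Estimated monthly spend: calls x average minutes x per-minute rate."""
    return round(calls_per_month * avg_minutes * rate_per_minute, 2)


# Hypothetical inputs, for illustration only.
CALLS_PER_MONTH = 8 * 30        # 8 calls a night for 30 nights
AVG_MINUTES = 3 + 14 / 60       # 3 minutes 14 seconds
RATE_PER_MINUTE = 0.25          # upper bound: "under a quarter per minute"

COST = monthly_voice_cost(CALLS_PER_MONTH, AVG_MINUTES, RATE_PER_MINUTE)
```

Under these assumptions the ceiling works out to $194 a month for 776 minutes of calls. Compare that number against what the same hours of human attention cost your business.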
Ground the voice agent in the same published FAQs and pricing your website serves.
Same source, three audiences (humans on the call, humans on the web, agents on the web), same answer.
Your Metrics Shift from Sessions to Outcomes
Stop optimizing for time on page, scroll depth, and bounce rate.
Those metrics were proxies for outcomes in a world where every visitor was a human reading the page. They degrade the moment a non-trivial share of visits are agentic.
They lie in both directions: an agent that reads the page in two seconds and books an appointment is your best visitor; a human who scrolls for ten minutes and bounces is your worst.
Move down the funnel.
The metrics that survive the agent era are outcome metrics: appointments booked, quotes sent, replies returned, reviews collected, calls handled, escalations resolved, AI citations on tracked queries.
Pick five. Put them in the inbox. Review them in the morning.
The 90-Day SMB Readiness Sequence

If you are starting from “I have a website, an email inbox, and a phone line that goes to voicemail after hours,” here is what the 90 days look like.
Days 1-30: Foundation
This is the work in the AI-Ready Website Playbook.
Core Web Vitals fixed. Schema graph audited. llms.txt shipped. pricing.md shipped if pricing is public. AI crawlers allowed in robots.txt. FAQ blocks with FAQPage schema on every long-form page.
If you are running on a managed hosting platform that handles caching and image optimization automatically, this layer is mostly a content and configuration project, not a development project.
The output of these 30 days is a site that is legible to the agents already shopping on your behalf, before you touch the inbox layer.
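For the FAQPage item in the foundation list, a minimal Python sketch of generating the JSON-LD from question and answer pairs follows. The schema.org types (FAQPage, Question, Answer) are real; the function names and the sample question are our own illustration, and most CMS SEO plugins will produce equivalent markup without custom code.

```python
import json
from typing import Dict, List, Tuple


def faq_page_jsonld(faqs: List[Tuple[str, str]]) -> Dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }


def as_script_tag(data: Dict) -> str:
    """Wrap the JSON-LD in the script tag that goes in the page HTML."""
    body = json.dumps(data, indent=2)
    # Escape "</" so answer text can never close the script tag early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical FAQ content, for illustration only.
FAQ_DATA = faq_page_jsonld([("Do you serve Pasadena?", "Yes, we do.")])
FAQ_TAG = as_script_tag(FAQ_DATA)
```

The question and answer text in the markup should match the visible FAQ block on the page word for word, since the same published answers are what the voice agent is grounded in.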
Days 31-60: Agent-callable surfaces
Pick one inbound channel and make it agent-ready.
For most SMBs, that channel is the phone.
Stand up a voice agent (Bland, Synthflow, Retell, or ElevenLabs directly), ground it in the FAQ content you just shipped on the website, give it the ability to book appointments through your scheduler, and define an escalation policy (which call types route to a human, on what threshold). Run it on inbound only for the first two weeks. Listen to recordings. Tune.
If commerce is a meaningful share of your revenue, this is also the window to enable an ACP, UCP, or Stripe Agentic Commerce Suite endpoint, depending on your platform.
The output of these 30 days is at least one channel where your business is callable by an agent (your customer’s, yours, or both), not just browseable.
Days 61-90: The triage layer
Build the inbox.
The minimum-viable version is a single daily-summary email or Slack message generated by a workflow tool (n8n, Zapier, Make.com, Lindy, OpenAI AgentKit).
The summary pulls from the voice provider, the scheduler, the CRM, the web form, and the review platform. It surfaces what got handled, what got escalated, and what the owner needs to decide today. Ten minutes of attention should clear it.
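The rendering half of that summary is small. Here is a minimal Python sketch that turns per-channel counts into the plain-text message a workflow tool could email or post to Slack. The data class, field names, and sample counts are hypothetical, and pulling the counts from each provider is left out because it depends on your tools.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChannelSummary:
    channel: str
    handled: int             # items the agent processed
    escalated: int           # items routed to a human
    awaiting_approval: int   # drafts queued for the owner


def render_digest(date_label: str, channels: List[ChannelSummary]) -> str:
    """Render one line per channel plus a total of decisions needed."""
    lines = [f"Agent inbox for {date_label}"]
    for c in channels:
        lines.append(
            f"- {c.channel}: {c.handled} handled, {c.escalated} escalated, "
            f"{c.awaiting_approval} awaiting approval"
        )
    decisions = sum(c.escalated + c.awaiting_approval for c in channels)
    lines.append(f"Decisions needed today: {decisions}")
    return "\n".join(lines)


# Counts modeled loosely on the sample Monday inbox earlier in this article.
SAMPLE = [
    ChannelSummary("Inbound calls", handled=8, escalated=2, awaiting_approval=0),
    ChannelSummary("Quote requests", handled=4, escalated=1, awaiting_approval=3),
    ChannelSummary("Review responses", handled=2, escalated=0, awaiting_approval=2),
]
DIGEST = render_digest("Monday", SAMPLE)
```

The last line is the number to watch: if it keeps climbing, the ten-minute review stops being ten minutes.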
Define the escalation rules explicitly. Bookings outside the service area, calls about competitor pricing, quote requests over a dollar threshold, reviews under four stars, billing disputes. Each gets a rule. Each rule produces an inbox row.
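The rule-per-row idea can be sketched in Python as a list of named predicates. Everything here is an assumption for illustration: the event fields, the reason codes, and the $5,000 quote threshold are placeholders for whatever your own policy says.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    kind: str                              # "booking", "call", "quote", "review"
    in_service_area: bool = True
    mentions_competitor_pricing: bool = False
    quote_amount: float = 0.0
    review_stars: Optional[int] = None
    is_billing_dispute: bool = False


@dataclass(frozen=True)
class Rule:
    reason_code: str
    matches: Callable[[Event], bool]


QUOTE_THRESHOLD = 5000.0  # hypothetical dollar threshold

RULES = [
    Rule("out_of_service_area",
         lambda e: e.kind == "booking" and not e.in_service_area),
    Rule("competitor_pricing",
         lambda e: e.kind == "call" and e.mentions_competitor_pricing),
    Rule("quote_over_threshold",
         lambda e: e.kind == "quote" and e.quote_amount > QUOTE_THRESHOLD),
    Rule("review_under_four_stars",
         lambda e: e.kind == "review" and e.review_stars is not None
         and e.review_stars < 4),
    Rule("billing_dispute", lambda e: e.is_billing_dispute),
]


def inbox_rows(events: List[Event]) -> List[Dict]:
    """Return one inbox row for every (event, matching rule) pair."""
    rows = []
    for index, event in enumerate(events):
        for rule in RULES:
            if rule.matches(event):
                rows.append({"event": index, "kind": event.kind,
                             "reason": rule.reason_code})
    return rows


# Hypothetical overnight events.
EVENTS = [
    Event("booking", in_service_area=False),
    Event("quote", quote_amount=7500.0),
    Event("review", review_stars=5),
    Event("call", is_billing_dispute=True),
]
ROWS = inbox_rows(EVENTS)
```

Events that match no rule produce no row, which is the point: the five-star review above never reaches the owner.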
The output of these 30 days is a morning surface that compresses every agent-handled inbound from the prior 24 hours into a single review.
After the 90 days, the system runs daily. Every quarter, audit the false-positive rate (escalations the agent should have handled), the false-negative rate (handled actions that should have been escalated), and the AI-citation tracking. Adjust scope.
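The quarterly audit comes down to two ratios. The Python sketch below uses the definitions from the paragraph above; the choice of denominators (all escalations for the false-positive rate, all handled actions for the false-negative rate) is our assumption, and the sample counts are hypothetical.

```python
from typing import Tuple


def audit_rates(escalated_total: int, escalated_unnecessarily: int,
                handled_total: int, handled_missed: int) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    False positive: an escalation the agent should have handled itself.
    False negative: a handled action that should have been escalated.
    """
    fp_rate = escalated_unnecessarily / escalated_total if escalated_total else 0.0
    fn_rate = handled_missed / handled_total if handled_total else 0.0
    return fp_rate, fn_rate


# Hypothetical quarter: 40 escalations (10 unnecessary),
# 400 handled actions (8 that should have gone to a human).
FP_RATE, FN_RATE = audit_rates(40, 10, 400, 8)
```

With these sample counts the false-positive rate is 25% and the false-negative rate is 2%. A falling false-positive rate argues for widening the agent's scope; a rising false-negative rate argues for tightening it.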
Four SMB Scenarios Applied
What the agent inbox looks like for four common SMB shapes.
⚖️ A 14-Attorney Law Firm
The voice agent on inbound calls intakes new-matter inquiries against the firm’s practice-area scope, asks the conflict-check screening questions, and books a paid consultation in the right attorney’s calendar.
The agent inbox at 8 AM shows: 5 calls handled overnight, 3 booked, 1 escalated as a conflict-check failure, 1 escalated as outside scope (criminal matter, referred to a partner firm).
2 new client intake forms completed via the website overnight, both queued for the partner’s morning review.
1 Google review draft response queued for approval.
🍝 A Four-Location Restaurant Group
The voice agent handles reservation requests, special-occasion bookings, and dietary-restriction questions across all four locations. The agent inbox at 7 AM shows: 22 reservations confirmed overnight, 3 escalated as large-party requests, 1 flagged as a possible no-show pattern, 1 review responded to with the owner’s approval. The agent also surfaces a daily reading on which locations have available capacity that hasn’t filled by the morning prep deadline.
🛒 A $4M eCommerce DTC Brand
The MCP server exposes structured product, pricing, availability, and policy data. The agent inbox at 6 AM shows: 47 orders fulfilled overnight (no human intervention required); 3 orders flagged for fraud review; 8 return-or-exchange requests drafted for owner approval (the agent suggests one as a full refund based on the policy match); 2 Instagram DM customer-service questions drafted and queued. The morning task is approving the eight returns and replying to two DMs. Fifteen minutes total.
🏡 A Real Estate Brokerage
The voice agent handles initial inquiries grounded in the brokerage’s published practice-area pages (probate, trust, conservatorship, court-confirmed sales). The agent inbox at 7 AM shows: 4 inbound calls handled overnight, 2 booked into Calendly with the appropriate specialist, 1 escalated as a complex multi-property estate, 1 escalated as out of service area. 3 inbound web form quote requests drafted. 1 AI-citation tracking alert: the brokerage was cited in a ChatGPT answer for “probate real estate Los Angeles” for the first time, with the cited page identified.
The shape is consistent across all four. Inbound channels feed an inbox. The inbox compresses the night into a ten-minute review. The human spends attention on what matters and not on what does not.
What We Still Do Not Know
Five things this article cannot answer yet, and that we will revisit on a quarterly cadence as the data fills in.
- WebMCP adoption beyond developer flags. The Chrome WebMCP API is in Canary as of February 2026. Whether SMB platforms (WordPress, Shopify, Squarespace, Wix) ship native support, and on what timeline, will decide whether agent-callable websites become a normal SMB feature or a custom-build differentiator.
- Which commerce protocol consolidates. ACP, UCP, AP2, and x402 are not yet a single standard. If one absorbs the others in the next 12 months, SMB integration becomes simple. If they split by vertical or by ecosystem, SMBs end up implementing more than one. Watch Stripe, Shopify, Google, and Apple positioning over the second half of 2026.
- Voice-first beyond phone calls. The phone is the proven SMB use case. Whether voice becomes a daily-driver consumer assistant (a re-attempt at the “talk to your phone instead of typing” dream) is unproven, and Apple’s repeated Siri slips are evidence the timeline is not 2026.
- The trust-gap closure curve. HBR’s “only 6% fully trust agents” stat is the cleanest single data point on the business side of the agent era. Where that number is in 12 months will tell us how fast the agent inbox shifts from “review every action” to “approve exceptions only.”
- Whether the open web survives as a commercial substrate. Stratechery’s argument that the ad-supported open web requires a native digital-payments and AI-citation-auction layer to stay viable is a thought experiment, not a shipped system. The replacement model has not been publicly tested. SMBs benefiting from organic traffic should plan for less of it in 2027 and beyond, and shift more pipeline to direct, referral, and agent-mediated channels in parallel.
FAQs About the Agentic Web Era
An agent inbox is the morning triage surface where an SMB owner reviews what their AI agents did during the prior 24 hours, sees what got escalated for human decision, and approves or edits the items that need attention. Sequoia named the pattern in 2025; LangChain, Microsoft (inside Outlook), and internal tools at the frontier labs ship versions of it. For SMBs, the minimum-viable version is a daily summary email or Slack message generated by a workflow tool that aggregates voice calls handled, bookings made, quotes drafted, reviews queued, and exceptions flagged from the prior day.
Email is reactive and unstructured. A chatbot is synchronous and live. The agent inbox sits between them. It is structured (every row is an action with an outcome and a status), asynchronous (compiled overnight, reviewed in the morning), and outcome-oriented (the metric is “how many escalations need a decision today,” not “how many messages do I have”). It is the morning review surface for work agents did while you slept.
No, but it is going to replace some calls to your business. Voice has crossed the production threshold for inbound and outbound phone work for SMBs in 2026, with sub-second turn-taking, GPT-5-class reasoning, and SIP telephony integration. Voice has not crossed the threshold for general-purpose consumer assistants; Apple’s Siri 2.0 has slipped at least three times. Your website still serves human visitors, search engines, AI research agents, and now your own voice agent (which grounds its answers in the FAQs you publish). It is not going away.
For most SMBs, no, not yet. The foundational AI-Ready layer (schema, llms.txt, pricing.md, allowed AI crawlers in robots.txt, FAQPage markup, semantic HTML, clean DOM) is the right work for 2026. A custom MCP server is the right next step if you are in a vertical where agent-to-agent commerce is already real (ecommerce on platforms enabling Stripe ACP, or B2B with agent-led quote negotiation), or if you have a meaningful API-first product. For everyone else, watch the WebMCP adoption curve and the ACP/UCP consolidation through Q4 2026 before committing engineering time.
You already are if you have done the foundational AI-Ready work. Atlas, Comet, and Dia reward the same things traditional accessibility and structured-content work rewards: clean DOM, real semantic HTML, exposed pricing, labeled buttons, accessible forms, working pagination. They punish the same things SMBs already know to fix: JavaScript-gated content, login walls, popups blocking primary content, unlabeled div elements pretending to be buttons. There is no separate “agentic browser optimization” project. There is just the structure and accessibility work, done well.
Probably not. Over 40% of agentic AI projects will be canceled by end of 2027 (Gartner, June 25, 2025), and most of the cancellations are chatbots that did not improve any metric the business actually cared about. Nielsen Norman Group’s lab studies show users treat chatbots like a search bar and abandon long, sycophantic responses. The better SMB investment in 2026 is structured answers (FAQs with schema, published pricing, clear policies) that any agent (the customer’s, yours, ChatGPT, Perplexity) can ground itself in. Build a chatbot if there is a specific high-volume, well-scoped question your team is answering manually that has a clear knowledge base behind it. Otherwise, ship the structured answer instead.
They already are. Forrester predicts one-third of retail marketplace projects will be deserted as answer engines steal traffic in 2026, and 20% of B2B sellers will be forced to engage in agent-led quote negotiations in 2026. The Stripe Agentic Commerce Suite GA in Q1 2026 onboarded URBN, Etsy, Coach, Kate Spade, Wix, Squarespace, BigCommerce, and WooCommerce on day one. For most SMB categories, the question is not “when will this start,” it is “what share of your inbound is already being routed by an agent that you cannot see in your analytics.”
Building agent-ready is the same work as building traditional SEO-ready and accessibility-ready (Core Web Vitals, schema, semantic HTML, clean DOM, accessible forms), with a thin additional machine-readable layer (llms.txt, pricing.md, AI-crawler allow lines in robots.txt, FAQPage schema). The cost is one engineer-week to one engineer-month depending on the current state. The benefit is compounding: every month you are not cited in agent-mediated answer environments is a month of pipeline routing to your better-structured competitor. Waiting is rarely the cheaper option, because the work is foundational hygiene, not a moonshot.
Both, sort of. Ahrefs’ 300,000-keyword study found AI Overviews now correlate with a 58% lower clickthrough rate for the top organic result (December 2025 data). That does not mean SEO is dead. It means the organic traffic that does land is more valuable, more buyer-intent, and more concentrated, and the rest is being arbitrated by an AI system that decides whether to surface you in the answer at all. The replacement model for the ad-supported open web (Stratechery proposes citation auctions paid in stablecoins) has not been publicly tested. Plan for less organic traffic in 2027 and beyond; rebalance toward direct, referral, and agent-mediated channels in parallel.
For most SMB sites on WordPress (or Shopify, Squarespace, Webflow, or Wix), 90 days. Days 1-30 are the foundation: Core Web Vitals, schema graph, llms.txt, pricing.md, AI crawlers allowed in robots.txt, FAQPage schema on long-form pages. Days 31-60 stand up at least one agent-callable channel (most often a voice agent on the phone line). Days 61-90 build the agent inbox: a daily-summary triage surface that aggregates voice, scheduler, CRM, web form, and review platform data into a ten-minute morning review. The sequence is mapped in the 90-Day SMB Readiness Sequence above. A complete rebuild is rarely required.
The agentic web is the web reshaped by AI agents that browse, decide, and transact on behalf of people. The agents read your website, talk to your voice agent, pay your processor, and book your time slot, often without a human ever rendering your homepage. Microsoft calls it “the open agentic web.” IEEE Spectrum calls it the redefinition of the internet. Sequoia and a16z describe it as the system-of-intelligence layer above the system-of-record. For a small business, it is the next 18-month shift in who actually visits the site and how they decide to buy.
AI search assumes a smarter SERP and treats the agent as a better search box. AI chatbots assume a synchronous conversation pinned to a single site and treat the agent as a smarter contact form. The agentic web assumes agents that browse, decide, transact, and persist across sessions on multiple surfaces (the website, the phone, the MCP server, the ACP endpoint, the CRM API, the agent’s own inbox). It is a system-level framing, not a feature framing, and the SMB response that follows is “make the business legible and callable from the outside,” not “add a widget.”
It has already arrived in measurable ways. 78% of enterprise AI teams reported at least one MCP-backed agent in production by April 2026. Stripe’s Agentic Commerce Suite went generally available in Q1 2026 with URBN, Etsy, Coach, Kate Spade, Wix, WooCommerce, BigCommerce, and Squarespace on day one. Forrester predicts 30% of enterprise app vendors will ship MCP servers in 2026 and 20% of B2B sellers will be forced into agent-led quote negotiations. The arrival pattern for SMBs is uneven by vertical: ecommerce and service-business inbound calls are early, B2B SaaS is fast behind, regulated verticals (legal, medical, financial) lag by 12 to 18 months.
Three things in order. (1) Make the website legible to agents: clean schema graph, llms.txt, pricing.md, AI crawlers explicitly allowed in robots.txt, FAQPage schema on long-form pages. That is the foundational layer. (2) Make one channel callable by an agent: usually the phone line, with a voice agent grounded in the same FAQ content the website serves. (3) Build the inbox: a daily-summary triage surface that compresses every agent-handled inbound from the prior 24 hours into a ten-minute morning review. The 90-Day SMB Readiness Sequence above walks the sequence step by step.
Where to Start
The next era of user experience on the web is not the agentic browser, not voice, not glasses. It is the morning agent inbox.
Voice handles phone calls. Agent-to-agent protocols handle commerce and integration. Hybrid browsers handle research and execution. Glasses are early.
All four drop their output into one place: the surface where the SMB owner triages what their agents did overnight.
If your website is not yet agent-readable, that is week one.
The foundational layer (semantic HTML, schema graph, llms.txt, pricing.md, allowed AI crawlers in robots.txt, FAQPage markup) is the same work that already makes your site faster for humans, cleaner for search engines, and more accessible for assistive technology. The AI-Ready Website Playbook covers the foundational layer in detail.
If your foundational layer is already in good shape, the next move is to make one channel agent-callable.
For most SMBs in 2026, that channel is the phone.
Stand up a voice agent grounded in your published FAQs, give it the ability to book appointments through your scheduler, and define explicit escalation rules. Run it on inbound only for the first two weeks. Listen to recordings. Tune.
If one channel is already agent-callable, the next move is the inbox itself.
Pick a workflow tool (n8n, Zapier, Make.com, Lindy, OpenAI AgentKit). Aggregate the voice provider, the scheduler, the CRM, the web form, and the review platform into one daily-summary message. Define escalation rules explicitly. Make sure the ten-minute morning review covers everything the agents did and everything the owner needs to decide.
The agent inbox is not a product you buy. It is a discipline you build.
The businesses that build it in 2026 set the rhythm their teams will run on for the next decade.
The agentic web is not a 2030 problem. It is a 2026 build, and the small businesses that prepare their websites and their inboxes for both customers and AI agents will be the ones who win the next decade of growth.
The agents are arriving on both sides of the conversation. The inbox is where they meet you in the morning.

Want help working through our 90-Day Sequence?
Book an Agentic Web Readiness consultation with us. The call is free, the audit is real, and you will leave with a prioritized plan whether or not you hire us to execute it.