
Migrating Your Remote MCP Server to the Stateless 2026-07-28 Spec: A Field Guide

Malik Hamza Shabbir · June 11, 2026 · Updated June 11, 2026 · 9 min read

In short

The 2026-07-28 MCP spec (release candidate locked 21 May 2026, final 28 July 2026) lets remote servers run stateless: no initialize round-trip, no Mcp-Session-Id header, and no sticky sessions. Migrate now, while you still have a validation window: stateless servers sit behind a plain round-robin load balancer and scale horizontally without affinity. This guide covers the exact code changes and the SDK gotchas.


On this page

What actually changed in the 2026-07-28 spec?
How do I drop the initialize handshake safely?
How do I remove the Mcp-Session-Id header?
What about sessions I actually need, like auth or rate limits?
How do I swap sticky load balancing for round-robin?
What are the Tier-1 SDK gotchas in the validation window?
What does a sane 10-week migration plan look like?

The short version: the MCP 2026-07-28 spec lets a remote server run fully stateless, which means you can delete the initialize handshake round-trip, stop emitting the Mcp-Session-Id header, and put your server behind a plain round-robin load balancer instead of a sticky one. The release candidate locked on 21 May 2026 and the final spec publishes 28 July 2026, so you have a real validation window right now. In this guide I walk through the exact code changes I make on a migration, the load-balancer swap, and the Tier-1 SDK gotchas that bite people who flip the switch without reading the transport defaults.

What actually changed in the 2026-07-28 spec?

The headline change is that statelessness becomes a first-class transport mode rather than a workaround. A compliant 2026-07-28 server can treat every HTTP request as self-contained, so it does not need to remember anything about the client between calls.

Before this, the Streamable HTTP transport leaned on a session. The client sent an initialize request, the server replied with capabilities and a session id in the Mcp-Session-Id response header, and the client echoed that header on every subsequent request. That session id is what forced sticky load balancing: request two had to land on the same process that handled request one, because that process held the live session in memory.

The new spec keeps that flow working for backward compatibility, but it blesses a mode where:

The client can send a tool call without ever issuing initialize.

The server never mints an Mcp-Session-Id, and the client never sends one.

Each request carries enough context (auth token, declared protocol version) to be served by any replica.

The practical payoff is horizontal scale. Once no single process owns a session, you can run ten replicas behind a dumb load balancer and add an eleventh during a traffic spike with zero affinity logic.


Aspect	Stateful (pre-2026-07-28 default)	Stateless (2026-07-28)
`initialize` handshake	Required before any tool call	Optional, can be skipped
`Mcp-Session-Id` header	Server-issued, client-echoed	Not used
Load balancer	Sticky sessions / affinity	Round-robin, no affinity
Per-connection memory	Held in the process	None, or in a shared store
Horizontal scaling	Painful, replicas not interchangeable	Trivial, every replica equal
SSE long-lived streams	Common	Discouraged, prefer per-request responses

How do I drop the initialize handshake safely?

You do not delete initialize so much as stop depending on it. The safe move is to make your tool handlers work whether or not an init ever happened, then let the stateless transport skip it.

In a stateful server, people stash negotiated state during init and read it later. That is exactly the coupling you remove. Pull anything you stored at init time into per-request context instead. In my experience the only things that genuinely live in init are the protocol version and the client capabilities, and both can travel on the request or be assumed from a sane default.

Here is the shape of a handler that no longer assumes a prior handshake:

// Before: handler assumes init ran and populated session state
server.tool("search_orders", async (args, ctx) => {
  const cfg = ctx.session.negotiatedConfig; // dies if no init
  return doSearch(args, cfg);
});

// After: everything the handler needs is derived per request
server.tool("search_orders", async (args, ctx) => {
  const cfg = resolveConfig({
    protocolVersion: ctx.protocolVersion ?? "2026-07-28",
    auth: ctx.auth,            // from the bearer token on THIS request
  });
  return doSearch(args, cfg);
});

The key discipline: a tool call must be answerable from the request alone. If a handler reaches for something only initialize could have set, that is a migration bug, not a feature.
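The resolveConfig helper used in the "after" handler is not defined in this article. One way it might look is sketched below. Every name here (AuthInfo, RequestInput, ResolvedConfig, the orders:read scope) is hypothetical and not part of any MCP SDK; the point is only that the config is built from the current request and nothing else.

```typescript
// Hypothetical shapes: none of these names come from an MCP SDK.
interface AuthInfo {
  subject: string;   // who the bearer token identifies
  scopes: string[];  // what the token is allowed to do
}

interface RequestInput {
  protocolVersion?: string; // may be absent when no handshake happened
  auth?: AuthInfo;          // undefined if the request had no valid token
}

interface ResolvedConfig {
  protocolVersion: string;
  subject: string;
  canSearch: boolean;
}

const DEFAULT_PROTOCOL_VERSION = "2026-07-28";

// Build everything a handler needs from the current request only.
// Throws instead of falling back to remembered per-connection state.
function resolveConfig(input: RequestInput): ResolvedConfig {
  if (!input.auth) {
    throw new Error("unauthenticated: no identity on this request");
  }
  return {
    protocolVersion: input.protocolVersion ?? DEFAULT_PROTOCOL_VERSION,
    subject: input.auth.subject,
    canSearch: input.auth.scopes.includes("orders:read"),
  };
}

// A request that never went through initialize still resolves cleanly.
const cfg = resolveConfig({ auth: { subject: "user-42", scopes: ["orders:read"] } });
console.log(cfg.protocolVersion, cfg.canSearch);
```

Failing loudly on a missing identity is a deliberate choice in this sketch: a silent fallback is exactly the kind of hidden state the migration is trying to remove.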

One thing I always verify is capability declaration. Without init, the client never receives your capabilities up front in an initialize response, so make sure your tool list and schemas are correct and discoverable on the first call, from any replica. If you are also shipping interactive UI, the same per-request discipline applies, and I cover that pattern in shipping your first MCP app with interactive UI.

How do I remove the Mcp-Session-Id header?

Stop reading it, stop writing it, and stop requiring it. The header was the load balancer's leash, and once your handlers are request-scoped, it has no job left to do.

There are two failure modes I watch for. The first is a server that still sets Mcp-Session-Id on responses after you "went stateless", because the SDK transport defaults to stateful. The client then dutifully echoes it, and a naive proxy may start pinning traffic again. The second is middleware that rejects requests lacking the header. Both are easy to miss when you only test with a client that still performs the handshake: nothing throws, and the server just keeps behaving statefully under the hood.

A quick guard I add during migration is an assertion in a test that the response carries no session header:

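// Assumed base URL for the server under test; MCP_BASE_URL is a hypothetical variable name.
const BASE = process.env.MCP_BASE_URL ?? "http://localhost:8080";

// Cold call: no initialize request is sent before this tools/call.
const res = await fetch(`${BASE}/mcp`, {
  method: "POST",
  headers: { "content-type": "application/json", "mcp-protocol-version": "2026-07-28" },
  body: JSON.stringify({
    jsonrpc: "2.0", id: 1, method: "tools/call",
    params: { name: "search_orders", arguments: { q: "INV-1024" } },
  }),
});

// The request must succeed without any prior handshake.
if (!res.ok) {
  throw new Error(`Cold tools/call failed with HTTP ${res.status}`);
}
// And the response must not hand out a session id.
if (res.headers.get("mcp-session-id")) {
  throw new Error("Server is still emitting Mcp-Session-Id; transport is not stateless");
}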

Notice there is no initialize call before the tools/call. That single request is the whole interaction. If it returns a valid result and no session header, your transport is genuinely stateless.

What about sessions I actually need, like auth or rate limits?

Stateless does not mean amnesiac. It means the state does not live in the process. Move anything you truly need to persist into a shared store or carry it on the request, so any replica can reconstruct it.

In practice I split server-side memory into three buckets:

Auth identity. This already belongs on the request as a bearer token. Validate it per call. Never derive identity from a session id.

Short-lived working state, like a multi-step tool flow. Put it in Redis keyed by something the client sends, or encode it in an opaque token you return and the client passes back.

Rate limits and quotas. Move counters to a shared store (Redis, a database, or your API gateway) instead of an in-process counter that only works when the same replica sees every request.

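// Assumption: the ioredis client, whose set() accepts the positional "EX" argument used below.
import Redis from "ioredis";
const redis = new Redis(); // defaults to localhost:6379; point this at your shared instance

// Stateless session: state lives in Redis, keyed by a client-supplied id
async function loadFlow(flowId: string) {
  const raw = await redis.get(`flow:${flowId}`);
  return raw ? JSON.parse(raw) : { step: 0 }; // unknown flow: start at step 0
}
async function saveFlow(flowId: string, state: object) {
  // Expire after 900 seconds (15 minutes) so abandoned flows clean themselves up
  await redis.set(`flow:${flowId}`, JSON.stringify(state), "EX", 900);
}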

The mental model that keeps me honest: any single request could be served by a process that has been alive for three seconds and has never seen this client before. If that is true, you are stateless.
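The third bucket, rate limits, follows the same pattern. Below is a minimal sketch of a fixed-window limiter whose counter lives behind a small store interface. CounterStore, InMemoryCounterStore, windowKey and allowRequest are all hypothetical names, and the in-memory store exists only so the example runs on its own; behind a round-robin balancer the store would need to be a shared one, for example Redis with INCR and EXPIRE.

```typescript
// Hypothetical interface: any shared store that can atomically increment
// a counter and expire it (Redis INCR + EXPIRE is one common choice).
interface CounterStore {
  incrementWithTtl(key: string, ttlSeconds: number): Promise<number>;
}

// Stand-in store so the sketch runs on its own. It lives in one process
// and ignores the TTL, so it is NOT suitable for production.
class InMemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();

  async incrementWithTtl(key: string, _ttlSeconds: number): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// All requests from one caller inside the same window share a key,
// no matter which replica computes it.
function windowKey(subject: string, nowMs: number, windowSeconds: number): string {
  const windowIndex = Math.floor(nowMs / (windowSeconds * 1000));
  return `rate:${subject}:${windowIndex}`;
}

// Resolves to true if the request is within the limit for the current window.
async function allowRequest(
  store: CounterStore,
  subject: string,
  limit: number,
  windowSeconds: number,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const key = windowKey(subject, nowMs, windowSeconds);
  const count = await store.incrementWithTtl(key, windowSeconds);
  return count <= limit;
}
```

Because the key is derived from the caller's identity and the clock, a replica that started three seconds ago computes the same key as one that has been up for a week, which is the property the mental model above asks for.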

How do I swap sticky load balancing for round-robin?

Once no replica owns a session, you delete the affinity rules and switch the balancer to round-robin or least-connections. That is the whole point of the migration, and it is usually a few lines of config plus a deploy.

If you are on NGINX, drop the ip_hash or sticky cookie directive:


## Before: sticky, pins each client to one upstream
upstream mcp_backend {
    ip_hash;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

## After: plain round-robin, every replica interchangeable
upstream mcp_backend {
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
    server 10.0.1.13:8080;  # added with zero affinity wiring
}

On a cloud load balancer the equivalent is turning off session affinity (sometimes called sticky sessions or session persistence) on the target group. The order matters: deploy the stateless server first, confirm it answers cold requests correctly, and only then flip the balancer. If you flip the balancer while the server still depends on in-process state, you get intermittent failures that are miserable to debug because they only show up when a request lands on the "wrong" replica.
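To confirm that each replica answers cold requests before the balancer flip, one option is to bypass the balancer and send the same handshake-free call to every upstream directly. The sketch below assumes the replica addresses from the NGINX example, plus the /mcp path and search_orders tool used earlier in this article; the helper names are hypothetical. It only checks the HTTP status and the session header, not the JSON-RPC payload.

```typescript
// Hypothetical helper names; the tool name and /mcp path follow the
// earlier examples in this article and may differ on your server.
interface ColdCheckResult {
  replica: string;
  ok: boolean;
  reason?: string;
}

// Build the single self-contained request: no initialize, no session header.
function buildColdCall(id: number): { headers: Record<string, string>; body: string } {
  return {
    headers: {
      "content-type": "application/json",
      "mcp-protocol-version": "2026-07-28",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id,
      method: "tools/call",
      params: { name: "search_orders", arguments: { q: "INV-1024" } },
    }),
  };
}

// Decide whether one replica's answer counts as stateless-ready.
function evaluate(replica: string, status: number, sessionId: string | null): ColdCheckResult {
  if (status < 200 || status >= 300) {
    return { replica, ok: false, reason: `HTTP ${status}` };
  }
  if (sessionId) {
    return { replica, ok: false, reason: "still emitting Mcp-Session-Id" };
  }
  return { replica, ok: true };
}

// Hit every replica directly, bypassing the load balancer.
async function checkReplicas(replicas: string[]): Promise<ColdCheckResult[]> {
  return Promise.all(
    replicas.map(async (replica, i) => {
      const { headers, body } = buildColdCall(i + 1);
      try {
        const res = await fetch(`http://${replica}/mcp`, { method: "POST", headers, body });
        return evaluate(replica, res.status, res.headers.get("mcp-session-id"));
      } catch (err) {
        return { replica, ok: false, reason: `unreachable: ${String(err)}` };
      }
    }),
  );
}
```

A call such as checkReplicas(["10.0.1.11:8080", "10.0.1.12:8080"]) would return one result per upstream. Flip the balancer only once every entry reports ok.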

I also lower the balancer's idle timeout, which is usually raised to keep long-lived SSE streams open. The 2026-07-28 stateless style prefers a request to get its response and close, rather than holding an open stream that effectively re-introduces affinity through the back door.

What are the Tier-1 SDK gotchas in the validation window?

The single biggest gotcha is that the official SDKs still default to the stateful transport, so "going stateless" is an explicit opt-in, not the absence of config. If you forget the flag, your server looks migrated but is not.

These are the traps I hit most often when I do this for clients:

Transport defaults to stateful. You must construct the Streamable HTTP transport in stateless mode explicitly. Leaving the session-id generator at its default means the SDK keeps minting Mcp-Session-Id.

Handlers that read init-time state compile fine and fail at runtime. Static typing does not catch "this field was only ever set during initialize." Test a cold tools/call with no prior handshake.

Version pinning. During the window between the 21 May 2026 release candidate and the 28 July 2026 final spec, pin the exact SDK version in your lockfile and re-run your conformance tests when you bump it. Behavior around the stateless transport is the thing most likely to shift between RC builds.

Protocol version negotiation. Without init, decide how you handle the MCP-Protocol-Version header. Set a sane default and reject only versions you truly cannot serve.

Clients that still send the header. Older clients keep echoing a session id even when you ignore it. Make sure ignoring it is harmless and does not trip validation middleware.

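// Import path as published in the official TypeScript SDK
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

// The opt-in that actually makes it stateless
const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: undefined, // <- stateless: do NOT mint a session id
});
await server.connect(transport); // `server` is your existing MCP server instance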

That sessionIdGenerator: undefined line is the whole migration in one place for a lot of servers. Everything else is making sure your handlers respect it.
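The protocol-version and lingering-header items from the list above can be handled in one small place. The sketch below is an assumption about how that might look, not SDK behavior: the helper names are hypothetical, the supported-version list holds example values only, and headers are taken as a plain map with lower-cased keys so the code stays framework-agnostic.

```typescript
// Example values only: list the versions YOUR server can actually serve.
const SUPPORTED_VERSIONS = ["2026-07-28", "2025-06-18"];
const DEFAULT_VERSION = "2026-07-28";

type HeaderMap = Record<string, string | undefined>;

type VersionDecision =
  | { ok: true; version: string }
  | { ok: false; error: string };

// Hypothetical helper: pick the protocol version for this request.
function decideProtocolVersion(headers: HeaderMap): VersionDecision {
  const requested = headers["mcp-protocol-version"];
  // No handshake and no header: fall back to the default instead of rejecting.
  if (requested === undefined || requested.trim() === "") {
    return { ok: true, version: DEFAULT_VERSION };
  }
  const version = requested.trim();
  // Reject only versions this server truly cannot serve.
  if (!SUPPORTED_VERSIONS.includes(version)) {
    return { ok: false, error: `unsupported protocol version: ${version}` };
  }
  return { ok: true, version };
}

// Older clients may still echo a session id. Drop it so nothing downstream
// (validation middleware, logging, routing) can start depending on it again.
function stripSessionHeader(headers: HeaderMap): HeaderMap {
  const copy: HeaderMap = { ...headers };
  delete copy["mcp-session-id"];
  return copy;
}
```

Running both helpers at the edge of the request pipeline keeps the handlers themselves unaware that a session header ever existed.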

What does a sane 10-week migration plan look like?

Treat the window as a validation exercise, not a big-bang cutover. Stage it so each step is independently reversible, and the load-balancer flip comes last.

A schedule I have used:

Weeks 1-2. Audit handlers for init-time and session-time coupling. List every place state is read.

Weeks 3-5. Make handlers request-scoped. Move working state to a shared store. Add the cold-call test.

Weeks 6-7. Flip the transport to stateless mode in staging. Verify no Mcp-Session-Id.

Weeks 8-9. Run both modes in parallel if your platform allows, watch error rates on cold replicas.

Week 10. Switch the load balancer to round-robin, remove affinity, scale out.
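For the weeks 1-2 audit, a small scan can produce the first draft of that list. This is a rough sketch under stated assumptions: the patterns are drawn from the examples in this article (such as ctx.session) and will need extending for a real codebase, and the function takes source text as a string so it runs without touching the file system.

```typescript
// Patterns are assumptions drawn from the examples in this article;
// extend them to match the names your own handlers use.
const COUPLING_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "reads session state", pattern: /\bctx\.session\b/ },
  { label: "touches session header", pattern: /mcp-session-id/i },
];

interface Finding {
  line: number;  // 1-based line number in the scanned source
  label: string; // which kind of coupling was matched
  text: string;  // the offending line, trimmed
}

// Report every line that appears to depend on init-time or session state.
function findSessionCoupling(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const { label, pattern } of COUPLING_PATTERNS) {
      if (pattern.test(text)) {
        findings.push({ line: index + 1, label, text: text.trim() });
      }
    }
  });
  return findings;
}

// The "before" handler from earlier in this article, as sample input.
const sample = [
  'server.tool("search_orders", async (args, ctx) => {',
  "  const cfg = ctx.session.negotiatedConfig;",
  "  return doSearch(args, cfg);",
  "});",
].join("\n");

console.log(findSessionCoupling(sample));
```

A text scan like this will miss indirect access (a helper that reads the session on the handler's behalf), so treat its output as a starting list rather than proof that a handler is clean. The cold-call test remains the real check.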

While you are in the auditing mindset, it is a good time to apply the same scrutiny to any third-party servers you depend on, which I cover in auditing a third-party MCP server before you trust it. And if you maintain agent tooling across multiple hosts, keeping that logic portable pairs well with this work, which is the subject of building portable agent skills.

If you want this migration handled cleanly before the deadline, that is the kind of work I take on through AI agents and automation.

The engineering here is genuinely worth doing: stateless servers are cheaper to run and far less fragile under load. If you want a second pair of hands to run the audit or own the cutover, my contact page is the fastest way to reach me, and I am happy to look at your transport config before you flip anything in production.

FAQ

What changes in the MCP 2026-07-28 stateless spec?

Remote servers can skip the initialize handshake and the Mcp-Session-Id header, treating every request as self-contained so they no longer need sticky sessions or per-connection state.

Do I have to migrate my MCP server by 28 July 2026?

No, the handshake and session header remain supported for backward compatibility, but stateless is the path that lets you scale horizontally, so I migrate eagerly rather than waiting.

Can a stateless MCP server still keep sessions?

Yes, but you move the state out of the process into a shared store like Redis or a signed token, so any replica behind the load balancer can serve any request.

What load balancer do I need after going stateless?

A plain round-robin or least-connections balancer with no session affinity, because every replica is interchangeable once the per-connection state is gone.

What is the biggest SDK gotcha during migration?

Tier-1 SDKs still default to a stateful transport, so you must explicitly opt into the stateless mode or your server will keep emitting an Mcp-Session-Id and silently break round-robin routing.


Malik Hamza Shabbir · Full-Stack & AI Engineer

I build full-stack and AI products solo: a reputation SaaS in production, RAG pipelines, and React Native apps. I write from what I ship, not from documentation summaries.
