Scaling WebSocket - Architecture Guide
A single server can handle tens of thousands of concurrent WebSocket connections. But when you need more, you run into problems that HTTP applications never face. WebSocket connections are stateful, long-lived, and tied to a specific server process.
This guide covers the practical steps for scaling WebSocket infrastructure from a single process to hundreds of thousands of concurrent connections.
Why WebSocket Is Hard to Scale
HTTP requests are stateless. A load balancer can send each request to any backend server, and the response comes back the same way. WebSocket breaks this model completely.
When a client opens a WebSocket connection, it stays open. The server holds that connection in memory. If the client sends a message, it must reach the same server process that holds the connection. If another client on a different server needs to receive that message, you need a way to route it across servers.
The core problems:
- Stateful connections - Each connection is bound to one server process. You cannot move a live connection between servers.
- Sticky sessions required - The load balancer must route a client to the same backend for the upgrade handshake and then keep it there for the lifetime of the connection.
- Memory per connection - Every open connection consumes memory for its socket buffer, application state, and any queued messages.
- No standard load balancing - Round-robin and least-connections algorithms do not account for long-lived connections. A server that was “least connected” five minutes ago may now be the most loaded.
Single Server Limits
Before scaling horizontally, understand what one server can do.
The operating system tracks each WebSocket connection as an open file descriptor. The default limit on most Linux systems is 1,024 file descriptors per process. You will hit this wall fast.
With proper tuning, a single Node.js process can handle 100,000 or more concurrent WebSocket connections. The old “C10K problem” (handling 10,000 concurrent connections) is solved. The C1M problem (one million connections) is where real architectural work begins.
Practical limits per server:
| Resource | Default | Tuned |
|---|---|---|
| File descriptors (per process) | 1,024 | 1,000,000+ |
| Memory per connection | ~10 KB idle | ~2 KB optimized |
| Connections per 1 GB RAM | ~25,000 | ~100,000 |
| CPU overhead (idle connections) | Minimal | Minimal |
The bottleneck shifts depending on your workload. Idle connections cost memory. Active connections cost CPU. High-throughput messaging costs both.
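To see where a running server actually stands, you can count its open file descriptors and established sockets from the shell (the PID and port below are placeholders):

# Open file descriptors held by the WebSocket process (replace 12345 with its PID)
ls /proc/12345/fd | wc -l

# Established TCP connections on the WebSocket port
ss -tan state established '( sport = :8080 )' | wc -l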
Vertical Scaling
Start here. Vertical scaling is simpler than horizontal and takes you further than most people expect.
Increase File Descriptor Limits
Check your current limit:
ulimit -n
Set it higher for the current session:
ulimit -n 1000000
For a permanent change, edit /etc/security/limits.conf:
* soft nofile 1000000
* hard nofile 1000000
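Note that limits.conf applies to login sessions. If your WebSocket process runs as a systemd service, set the limit in the unit file instead (the unit name here is a placeholder):

# /etc/systemd/system/websocket.service — or a drop-in override
[Service]
LimitNOFILE=1000000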
Tune Kernel Parameters
Add these to /etc/sysctl.conf and apply with sysctl -p:
# Increase the maximum number of open files system-wide
fs.file-max = 2000000
# Increase the range of local ports available
net.ipv4.ip_local_port_range = 1024 65535
# Increase socket buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# Increase the maximum number of connections in the backlog
net.core.somaxconn = 65535
# Enable TCP keepalive for detecting dead connections
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 6
# Reuse sockets in TIME_WAIT state
net.ipv4.tcp_tw_reuse = 1
Memory Optimization
Each WebSocket connection allocates send and receive buffers. The default buffer sizes in most WebSocket libraries are larger than necessary for typical messages. If your messages are small (under 1 KB), reduce the buffer sizes in your WebSocket server configuration.
For the ws library in Node.js:
const WebSocket = require('ws');
const wss = new WebSocket.Server({
port: 8080,
perMessageDeflate: false, // Disable compression to save CPU and memory
maxPayload: 64 * 1024, // Limit message size to 64 KB
});
Disabling per-message compression avoids allocating a zlib context for every connection, which can cost tens of kilobytes of memory each, and significantly reduces CPU usage under load.
Horizontal Scaling with Sticky Sessions
When one server is not enough, you add more servers behind a load balancer. The load balancer must support WebSocket connections and sticky sessions.
How Sticky Sessions Work
The load balancer inspects the initial HTTP upgrade request and routes it to a specific backend. All subsequent traffic on that connection goes to the same backend. Common routing strategies:
- IP hash - Route based on the client’s IP address. Simple but breaks when clients share IPs (corporate NAT, mobile carriers).
- Cookie-based - The load balancer sets a cookie on the first response and routes subsequent requests by that cookie. More reliable than IP hash, but it only works for clients that store and send cookies, which non-browser WebSocket clients often do not.
- Connection ID header - The application sets a custom header that the load balancer uses for routing. Most flexible but requires application-level coordination.
Nginx Configuration for Sticky Sessions
upstream websocket_servers {
ip_hash;
server 10.0.1.1:8080;
server 10.0.1.2:8080;
server 10.0.1.3:8080;
}
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
server {
listen 80;
server_name ws.example.com;
location /ws {
proxy_pass http://websocket_servers;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
# Timeout for idle WebSocket connections (seconds)
proxy_read_timeout 3600;
proxy_send_timeout 3600;
}
}
The proxy_read_timeout value is critical. Nginx closes idle connections after 60 seconds by default. For WebSocket, set this to a much higher value or implement application-level ping/pong frames to keep connections alive.
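If you rely on application-level ping/pong, the server can ping each connection on an interval and drop sockets that never answer. A minimal sketch with the ws library, assuming wss is the server created earlier (the 30-second interval is an arbitrary choice):

wss.on('connection', (ws) => {
  ws.isAlive = true;
  ws.on('pong', () => { ws.isAlive = true; });
});

// Ping every client periodically; terminate any that did not answer the last ping.
const heartbeat = setInterval(() => {
  for (const ws of wss.clients) {
    if (!ws.isAlive) {
      ws.terminate();
      continue;
    }
    ws.isAlive = false;
    ws.ping();
  }
}, 30000);

wss.on('close', () => clearInterval(heartbeat));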
Pros and Cons of Sticky Sessions
Sticky sessions solve the routing problem but introduce new ones. If a server goes down, all connections on that server are lost. Clients must reconnect, and the load balancer routes them to a surviving server. This creates a thundering herd effect where one server failure causes a spike of reconnections across remaining servers.
Uneven distribution is another issue. Over time, some servers accumulate more long-lived connections than others. Even when the load balancer sends new connections to the least-loaded server, existing connections stay where they are.
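One common mitigation for the reconnection spike is client-side exponential backoff with random jitter, so clients from a failed server do not reconnect in lockstep. A browser-side sketch (the URL and the 30-second cap are placeholders):

let attempt = 0;

function connect() {
  const ws = new WebSocket('wss://ws.example.com/ws');

  ws.onopen = () => { attempt = 0; };

  ws.onclose = () => {
    // Double the delay on every failure, cap it, and add jitter to spread the herd.
    const base = Math.min(30000, 1000 * 2 ** attempt);
    const delay = base / 2 + Math.random() * (base / 2);
    attempt++;
    setTimeout(connect, delay);
  };
}

connect();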
Pub/Sub Architecture
Sticky sessions get connections to the right server. But what happens when server A needs to send a message to a client connected to server B? You need a message bus.
Redis pub/sub is the most common solution. Every WebSocket server subscribes to relevant channels on Redis. When a message needs to reach clients across servers, the originating server publishes to Redis, and all subscribed servers receive it.
Redis Pub/Sub with Socket.IO
const { Server } = require('socket.io');
const { createAdapter } = require('@socket.io/redis-adapter');
const { createClient } = require('redis');
async function createWebSocketServer(httpServer) {
const io = new Server(httpServer);
const pubClient = createClient({ url: 'redis://10.0.2.1:6379' });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);
io.adapter(createAdapter(pubClient, subClient));
io.on('connection', (socket) => {
socket.join('global-room');
socket.on('message', (data) => {
// This broadcast reaches clients on ALL servers
io.to('global-room').emit('message', data);
});
});
return io;
}
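On the client side nothing changes; the adapter is invisible. A socket.io-client sketch (the URL is a placeholder):

const { io } = require('socket.io-client');

const socket = io('https://ws.example.com', { transports: ['websocket'] });
socket.on('message', (data) => console.log('received:', data));
socket.emit('message', 'hello'); // delivered to clients on every server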
Raw Redis Pub/Sub with the ws Library
If you are not using Socket.IO, you can wire Redis pub/sub directly:
const crypto = require('crypto'); // for crypto.randomUUID()
const WebSocket = require('ws');
const Redis = require('ioredis');
const wss = new WebSocket.Server({ port: 8080 });
const publisher = new Redis('redis://10.0.2.1:6379');
const subscriber = new Redis('redis://10.0.2.1:6379');
// Map of local connections
const clients = new Map();
subscriber.subscribe('chat-channel');
subscriber.on('message', (channel, message) => {
const data = JSON.parse(message);
// Send to all local clients (skip the original sender if local)
for (const [id, ws] of clients) {
if (id !== data.senderId && ws.readyState === WebSocket.OPEN) {
ws.send(data.payload);
}
}
});
wss.on('connection', (ws) => {
const clientId = crypto.randomUUID();
clients.set(clientId, ws);
ws.on('message', (raw) => {
publisher.publish('chat-channel', JSON.stringify({
senderId: clientId,
payload: raw.toString(),
}));
});
ws.on('close', () => {
clients.delete(clientId);
});
});
Redis pub/sub adds latency (typically 0.1 to 0.5 ms on a local network) but allows you to scale to dozens of WebSocket servers without any server needing to know about the others.
Dedicated WebSocket Servers vs Application Servers
Separating your WebSocket tier from your HTTP API tier is one of the most effective architectural decisions you can make.
Your HTTP API handles short-lived request/response cycles. It benefits from stateless horizontal scaling. Your WebSocket servers hold long-lived connections and manage real-time state. These two workloads have different resource profiles, scaling needs, and failure modes.
Benefits of separation:
- Scale each tier independently. If you need more WebSocket capacity, add WS servers without touching your API.
- Deploy API changes without dropping WebSocket connections. Your WS servers can run a stable, rarely-updated codebase.
- Tune server configurations for each workload. WS servers get high file descriptor limits and large memory. API servers get more CPU cores.
- Isolate failures. A crash in your API does not disconnect WebSocket clients.
The HTTP API communicates with the WebSocket tier through the same Redis pub/sub bus. When an API endpoint needs to push a real-time update, it publishes a message to Redis. The WebSocket servers pick it up and deliver it to connected clients.
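As a sketch of that flow (the route and event shape are assumptions, reusing the chat-channel from the raw ws example above), an API process publishes to the same Redis instance the WebSocket tier subscribes to:

const Redis = require('ioredis');

const publisher = new Redis('redis://10.0.2.1:6379');

// Express route in the HTTP API tier; `app` is assumed to exist elsewhere.
app.post('/notify/:userId', async (req, res) => {
  await publisher.publish('chat-channel', JSON.stringify({
    senderId: 'api', // no local WebSocket client to skip
    payload: JSON.stringify({ type: 'notification', userId: req.params.userId }),
  }));
  res.sendStatus(202);
});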
Connection State Management
When you run multiple WebSocket servers, connection state becomes a distributed systems problem. Each server knows about its own connections, but no single server has the complete picture.
Store connection metadata in Redis:
// On connection (`redis` is an ioredis client; SERVER_ID identifies this process)
await redis.hset(`ws:connections:${clientId}`, {
serverId: SERVER_ID,
userId: authenticatedUserId,
connectedAt: Date.now(),
rooms: JSON.stringify(['global', 'team-42']),
});
await redis.expire(`ws:connections:${clientId}`, 86400);
// On disconnect
await redis.del(`ws:connections:${clientId}`);
This shared state enables several important operations:
- Presence queries - Check which users are currently online across all servers.
- Targeted messaging - Look up which server holds a specific user’s connection and route messages accordingly.
- Reconnection handling - When a client reconnects to a different server, the new server can restore the client’s room subscriptions and replay missed messages.
For reconnection, the client should send a last-received message ID. The server looks up the client’s previous state in Redis and replays any messages published after that ID. This requires storing recent messages in a Redis list or stream, not just publishing them through pub/sub.
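A sketch of that replay path using a Redis Stream (the key and field names are assumptions; the exclusive-start form of XRANGE requires Redis 6.2 or later):

const Redis = require('ioredis');
const redis = new Redis('redis://10.0.2.1:6379');

// Alongside every publish, append the message to a capped per-room stream.
async function recordMessage(room, payload) {
  // MAXLEN ~ 10000 keeps roughly the last 10,000 messages for the room.
  return redis.xadd(`ws:history:${room}`, 'MAXLEN', '~', 10000, '*', 'payload', payload);
}

// On reconnect, replay everything after the client's last-received stream ID.
async function replayMissed(ws, room, lastId) {
  const entries = await redis.xrange(`ws:history:${room}`, `(${lastId}`, '+');
  for (const [, fields] of entries) {
    ws.send(fields[1]); // fields is a flat [name, value, ...] array
  }
}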
Common Architectures
Single Server (Under 50K Connections)
One server, no load balancer, no Redis. This is the right starting point for most applications. A tuned Node.js server with the ws library handles 50,000 idle connections on a machine with 2 GB of RAM.
Good for: MVPs, internal tools, small SaaS products.
Multiple Servers with Redis Pub/Sub (Under 500K Connections)
Three to ten WebSocket servers behind a load balancer with sticky sessions. Redis handles cross-server messaging. Each server runs the same application code.
Good for: Production SaaS, chat applications, collaborative tools, real-time dashboards.
Dedicated WebSocket Gateway with Microservices (Over 500K Connections)
A stateless gateway tier handles connection management, authentication, and message routing. Backend microservices handle business logic and publish events through a message bus (Redis Streams, Kafka, or NATS). The gateway tier translates these events into WebSocket frames for connected clients.
Good for: Large-scale platforms, gaming, financial data feeds.
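A gateway-side sketch of that translation, using a Redis Stream consumer group (the stream, group, and field names are assumptions):

const WebSocket = require('ws');
const Redis = require('ioredis');

const redis = new Redis('redis://10.0.2.1:6379');

async function consumeEvents(wss) {
  // Create the consumer group once; ignore the error if it already exists.
  await redis.xgroup('CREATE', 'events', 'ws-gateway', '$', 'MKSTREAM').catch(() => {});

  while (true) {
    const results = await redis.xreadgroup(
      'GROUP', 'ws-gateway', `gateway-${process.pid}`,
      'COUNT', 100, 'BLOCK', 5000,
      'STREAMS', 'events', '>',
    );
    if (!results) continue;

    for (const [, entries] of results) {
      for (const [id, fields] of entries) {
        const payload = fields[1]; // flat [name, value, ...] array
        for (const client of wss.clients) {
          if (client.readyState === WebSocket.OPEN) client.send(payload);
        }
        await redis.xack('events', 'ws-gateway', id);
      }
    }
  }
}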
Load Balancer Configuration
Nginx WebSocket Proxy with Health Checks
upstream websocket_backend {
least_conn;
server 10.0.1.1:8080 max_fails=3 fail_timeout=30s;
server 10.0.1.2:8080 max_fails=3 fail_timeout=30s;
server 10.0.1.3:8080 max_fails=3 fail_timeout=30s;
}
server {
listen 443 ssl;
server_name ws.example.com;
ssl_certificate /etc/ssl/certs/ws.example.com.pem;
ssl_certificate_key /etc/ssl/private/ws.example.com.key;
location /ws {
proxy_pass http://websocket_backend;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_read_timeout 86400s;
proxy_send_timeout 86400s;
# Buffer settings
proxy_buffering off;
proxy_buffer_size 8k;
}
# Health check endpoint (served by your WS servers)
location /health {
proxy_pass http://websocket_backend;
proxy_read_timeout 5s;
}
}
Key details: proxy_buffering off prevents Nginx from buffering WebSocket frames, which reduces latency. The proxy_read_timeout of 86400 seconds (24 hours) allows long-lived connections. Adjust this based on your application’s expected connection lifetime.
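The /health endpoint itself can be a trivial handler in the same process as the WebSocket server. A sketch, assuming an Express app alongside the ws server (the capacity number is a placeholder):

const MAX_CONNECTIONS = 100000; // your tuned per-server limit

app.get('/health', (req, res) => {
  // Report unhealthy near capacity so the load balancer stops sending new upgrades here.
  const healthy = wss.clients.size < MAX_CONNECTIONS * 0.95;
  res.status(healthy ? 200 : 503).json({ connections: wss.clients.size });
});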
HAProxy Alternative
If you prefer HAProxy, the relevant configuration:
frontend ws_frontend
bind *:443 ssl crt /etc/ssl/certs/ws.example.com.pem
default_backend ws_backend
backend ws_backend
balance source
option httpchk GET /health
timeout server 86400s
timeout tunnel 86400s
server ws1 10.0.1.1:8080 check inter 5s fall 3 rise 2
server ws2 10.0.1.2:8080 check inter 5s fall 3 rise 2
server ws3 10.0.1.3:8080 check inter 5s fall 3 rise 2
The timeout tunnel setting is specific to HAProxy: once the connection is upgraded, it supersedes the client and server timeouts and controls how long the tunneled WebSocket connection may stay idle before HAProxy closes it.
Monitoring
You cannot scale what you cannot measure. Track these metrics for every WebSocket server:
- Connections per server - Current open connection count. Alert if any server exceeds 80% of its tuned capacity.
- Message throughput - Messages sent and received per second, per server. Spikes may indicate broadcast storms or misbehaving clients.
- Memory usage - RSS memory of the WebSocket process. A steady climb indicates a connection or buffer leak.
- Reconnection rate - How often clients reconnect. A sudden spike indicates server instability, network issues, or aggressive timeouts.
- Redis pub/sub latency - Time between publishing a message and receiving it on another server. Should stay under 1 ms on a local network.
- Message queue depth - If your servers queue outbound messages, track the queue size. Growing queues mean your servers cannot keep up with the message rate.
Expose these metrics through a /metrics endpoint in Prometheus format. Use Grafana dashboards to visualize connection distribution, message throughput, and per-server resource usage.
A simple metrics endpoint for a Node.js WebSocket server:
// Per-process counters. Assumes `wss` (a ws WebSocket.Server) and `app`
// (an Express app) are created elsewhere in the same process.
let connectionCount = 0;
let messagesSent = 0; // increment wherever the server calls ws.send()
let messagesReceived = 0;
wss.on('connection', (ws) => {
  connectionCount++;
  ws.on('message', () => { messagesReceived++; });
  ws.on('close', () => { connectionCount--; });
});
// Expose for Prometheus scraping
app.get('/metrics', (req, res) => {
res.set('Content-Type', 'text/plain');
res.send([
`ws_connections_current ${connectionCount}`,
`ws_messages_sent_total ${messagesSent}`,
`ws_messages_received_total ${messagesReceived}`,
`ws_memory_rss_bytes ${process.memoryUsage().rss}`,
].join('\n'));
});
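On the Prometheus side, a scrape job pointed at each WebSocket server is enough (the targets and port are placeholders):

# prometheus.yml
scrape_configs:
  - job_name: 'websocket-servers'
    scrape_interval: 15s
    static_configs:
      - targets: ['10.0.1.1:8080', '10.0.1.2:8080', '10.0.1.3:8080']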
What to Read Next
- What Is WebSocket - If you need a refresher on how the protocol works before tackling scale.
- WebSocket Security - Authentication, authorization, and rate limiting for production deployments.
- Load Testing WebSockets - How to simulate thousands of concurrent connections and find your server’s breaking point.
- Building a WebSocket Server with Node.js - A practical guide to building the server you will eventually need to scale.