Load Testing WebSocket Applications
WebSocket load testing is fundamentally different from HTTP load testing because connections are long-lived and bidirectional. An HTTP load test measures how fast your server can handle discrete request-response cycles, but a WebSocket load test must account for thousands of persistent connections, continuous message flow in both directions, and the memory overhead each connection carries. This article walks you through the tools, techniques, and system tuning required to accurately load test WebSocket servers.
What to Measure
Before you pick a tool or write a script, you need to decide what metrics matter for your application. WebSocket servers fail in different ways than HTTP servers, and the metrics that signal trouble are different too.
Concurrent Connections
The most basic metric is how many simultaneous WebSocket connections your server can hold open. Each connection consumes memory for the socket buffer, any application-level state you attach to it, and the overhead of your WebSocket library. A chat application might need to support 50,000 concurrent connections per node, while a real-time dashboard might only need 5,000. Know your target before you start testing.
Message Throughput
Throughput measures how many messages per second your server can process, both inbound and outbound. This is especially important for applications like multiplayer game servers or collaborative editing tools where every connected client sends frequent updates. Measure inbound and outbound throughput separately, because bottlenecks often appear on only one side.
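Since inbound and outbound rates need to be tracked independently, a minimal sketch of the bookkeeping might look like this (the counter shape and function names are illustrative, not from any particular tool):

```javascript
// Sketch: compute inbound and outbound message rates separately, since
// bottlenecks often appear on only one side. In a real test harness you
// would increment counts.inbound on every received message and
// counts.outbound on every send.
function messageRates(counts, elapsedMs) {
  const seconds = Math.max(elapsedMs / 1000, 0.001); // avoid divide-by-zero
  return {
    inboundPerSec: counts.inbound / seconds,
    outboundPerSec: counts.outbound / seconds,
  };
}

// After a 10-second window in which clients received 500 messages and sent 100:
const rates = messageRates({ inbound: 500, outbound: 100 }, 10000);
console.log(rates); // { inboundPerSec: 50, outboundPerSec: 10 }
```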
Latency Percentiles
Average latency is nearly useless for WebSocket applications. You want the 50th, 95th, and 99th percentile latencies. A server might show a 5ms average but have a 99th percentile of 800ms, meaning one in every hundred messages takes nearly a second to process. For real-time applications, that tail latency is what your users actually feel.
Memory Per Connection
Track your server’s resident memory as you increase the connection count. Divide total memory growth by the number of connections to get a per-connection cost. This number tells you how many connections a single server instance can hold before you hit memory limits. If you see 50KB per connection, a server with 4GB of available memory caps out around 80,000 connections in theory; in practice, garbage collection pressure and other overhead lower that ceiling.
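The arithmetic can be sketched as a small helper. This is illustrative only: in a real test you would sample the server's process.memoryUsage().rss before and after ramping up connections, and the sample numbers below are hypothetical.

```javascript
// Sketch: estimate per-connection memory cost from two RSS samples taken
// on the server, before and after opening N connections.
function perConnectionBytes(rssBefore, rssAfter, connections) {
  if (connections <= 0) return 0;
  return (rssAfter - rssBefore) / connections;
}

// Example with the numbers from the article: if 10,000 connections grew RSS
// by 50KB each, 4GB of memory caps out near 80,000 connections in theory.
const costBytes = perConnectionBytes(
  200 * 1024 * 1024,                      // hypothetical baseline RSS
  200 * 1024 * 1024 + 50 * 1024 * 10000,  // RSS after 10,000 connections
  10000
);
const maxConnections = Math.floor((4 * 1024 * 1024 * 1024) / costBytes);
console.log(`${costBytes / 1024} KB/connection, ~${maxConnections} connections in 4GB`);
```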
Load Testing with Artillery
Artillery is one of the most accessible tools for WebSocket load testing. It supports WebSocket scenarios out of the box and uses a YAML configuration format that is easy to read and modify.
Installation
Install Artillery globally via npm:
npm install -g artillery
Verify the installation:
artillery --version
Writing a WebSocket Scenario
Create a file called ws-load-test.yml:
config:
  target: "ws://localhost:8080"
  phases:
    - duration: 60
      arrivalRate: 10
      name: "Warm up"
    - duration: 120
      arrivalRate: 50
      name: "Sustained load"
    - duration: 60
      arrivalRate: 100
      name: "Peak load"

scenarios:
  - engine: ws
    flow:
      - send: '{"type": "auth", "token": "test-user-token"}'
      - think: 1
      - send: '{"type": "subscribe", "channel": "prices"}'
      - think: 2
      - loop:
          - send: '{"type": "ping"}'
          - think: 5
        count: 10
This configuration ramps from 10 new connections per second up to 100, simulating a traffic spike. Each virtual user authenticates, subscribes to a channel, and then sends periodic pings to keep the connection alive.
Running the Test
artillery run ws-load-test.yml
Artillery prints a summary every 10 seconds showing connection counts, message rates, and latency percentiles. After the test completes, you get a full report. For a more detailed HTML report:
artillery run ws-load-test.yml --output results.json
artillery report results.json
Reading the Results
Focus on these fields in the Artillery output:
- websocket.send_rate and websocket.recv_rate: messages per second in each direction.
- websocket.response_time.p95 and websocket.response_time.p99: the tail latencies that matter most.
- websocket.errors: connection failures, timeouts, and protocol errors.
- vusers.created vs vusers.completed: if created far exceeds completed, your server is dropping connections.
Load Testing with k6
k6 by Grafana Labs is a developer-centric load testing tool written in Go. It runs tests defined in JavaScript and has built-in WebSocket support. If you prefer writing test logic in code rather than YAML, k6 is a strong choice.
Installation
On macOS:
brew install k6
On Linux:
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6
Writing a k6 WebSocket Test
Create a file called ws-load-test.js:
import ws from "k6/ws";
import { check, sleep } from "k6";
import { Counter, Trend } from "k6/metrics";

const messageLatency = new Trend("ws_message_latency", true);
const messagesReceived = new Counter("ws_messages_received");

export const options = {
  stages: [
    { duration: "30s", target: 100 },
    { duration: "1m", target: 500 },
    { duration: "30s", target: 1000 },
    { duration: "1m", target: 1000 },
    { duration: "30s", target: 0 },
  ],
  thresholds: {
    ws_message_latency: ["p(95)<200", "p(99)<500"],
    ws_connecting: ["p(95)<1000"],
  },
};

export default function () {
  const url = "ws://localhost:8080/ws";

  const res = ws.connect(url, {}, function (socket) {
    socket.on("open", function () {
      socket.send(
        JSON.stringify({ type: "subscribe", channel: "updates" })
      );

      // Send a timestamped ping every 2 seconds so round-trip latency
      // can be computed from the matching pong.
      socket.setInterval(function () {
        const start = Date.now();
        socket.send(JSON.stringify({ type: "ping", ts: start }));
      }, 2000);
    });

    socket.on("message", function (msg) {
      messagesReceived.add(1);
      try {
        const data = JSON.parse(msg);
        if (data.type === "pong" && data.ts) {
          messageLatency.add(Date.now() - data.ts);
        }
      } catch (e) {
        // Non-JSON message, skip
      }
    });

    socket.on("error", function (e) {
      console.error("WebSocket error:", e.error());
    });

    // Close each connection after 30 seconds.
    socket.setTimeout(function () {
      socket.close();
    }, 30000);
  });

  check(res, {
    "WebSocket connection established": (r) => r && r.status === 101,
  });

  sleep(1);
}
Running and Setting Thresholds
k6 run ws-load-test.js
The thresholds block in the options tells k6 to fail the test run if the 95th percentile message latency exceeds 200ms or the 99th percentile exceeds 500ms. This is useful for CI pipelines where you want load tests to act as a gate. For more on testing WebSocket connections in general, see How to Test WebSocket Connections.
Custom Load Test with Node.js
Sometimes you need more control than Artillery or k6 offers. A custom Node.js script using the ws library gives you full flexibility to model complex client behavior.
Basic Connection Spawner
const WebSocket = require("ws");

const TARGET = "ws://localhost:8080";
const TOTAL_CONNECTIONS = 5000;
const RAMP_RATE = 50; // connections per second
const MESSAGE_INTERVAL = 3000; // ms between messages per client

let connected = 0;
let errors = 0;
let messagesSent = 0;
let messagesReceived = 0;
const latencies = [];

function createConnection(id) {
  return new Promise((resolve) => {
    const ws = new WebSocket(TARGET);

    ws.on("open", () => {
      connected++;
      ws.send(JSON.stringify({ type: "hello", clientId: id }));

      // Send a timestamped ping on a fixed interval while the socket is open.
      const interval = setInterval(() => {
        if (ws.readyState === WebSocket.OPEN) {
          const ts = Date.now();
          ws.send(JSON.stringify({ type: "ping", ts }));
          messagesSent++;
        }
      }, MESSAGE_INTERVAL);

      ws.on("close", () => {
        connected--;
        clearInterval(interval);
      });

      ws.on("message", (data) => {
        messagesReceived++;
        try {
          const msg = JSON.parse(data);
          if (msg.ts) {
            latencies.push(Date.now() - msg.ts);
          }
        } catch (e) {
          // ignore
        }
      });

      resolve(ws);
    });

    ws.on("error", (err) => {
      errors++;
      resolve(null);
    });
  });
}

function percentile(arr, p) {
  const sorted = arr.slice().sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

async function run() {
  console.log(`Starting load test: ${TOTAL_CONNECTIONS} connections at ${RAMP_RATE}/sec`);
  const connections = [];

  for (let i = 0; i < TOTAL_CONNECTIONS; i++) {
    connections.push(createConnection(i));
    // Pause for one second after each batch of RAMP_RATE connections.
    if ((i + 1) % RAMP_RATE === 0) {
      await new Promise((r) => setTimeout(r, 1000));
      console.log(
        `Progress: ${i + 1} initiated, ${connected} connected, ${errors} errors`
      );
    }
  }

  // Let messages flow for 60 seconds
  await new Promise((r) => setTimeout(r, 60000));

  console.log("\n--- Results ---");
  console.log(`Connected: ${connected}`);
  console.log(`Errors: ${errors}`);
  console.log(`Messages sent: ${messagesSent}`);
  console.log(`Messages received: ${messagesReceived}`);
  if (latencies.length > 0) {
    console.log(`Latency p50: ${percentile(latencies, 50)}ms`);
    console.log(`Latency p95: ${percentile(latencies, 95)}ms`);
    console.log(`Latency p99: ${percentile(latencies, 99)}ms`);
  }

  // Clean up
  const allWs = await Promise.all(connections);
  allWs.forEach((ws) => ws && ws.close());
  process.exit(0);
}

run();
Run this with:
node --max-old-space-size=4096 ws-load-test.js
The --max-old-space-size flag gives Node.js more heap memory, which you will need when holding thousands of open connections. For background on why WebSocket connections consume memory this way, see WebSocket Scalability.
Choosing a Load Testing Tool
For most teams, k6 is the best starting point. It balances scripting power with ease of use. Artillery works well for simpler scenarios defined in YAML. Custom scripts give you maximum control when your test logic is complex. If your organization already uses JMeter for HTTP load testing, JMeter WebSocket testing is possible via the WebSocket Samplers plugin, though configuration is more involved than k6 or Artillery.
OS-Level Tuning
Your load testing client machine and your server both need OS-level adjustments to handle thousands of concurrent connections. Without these changes, you will hit limits long before your application code becomes the bottleneck.
File Descriptors
Each WebSocket connection uses a file descriptor. The default limit on most Linux systems is 1024. Check your current limit:
ulimit -n
Increase it for the current session:
ulimit -n 65535
For a permanent change on Linux, edit /etc/security/limits.conf:
* soft nofile 65535
* hard nofile 65535
On macOS, the process is different:
sudo launchctl limit maxfiles 65535 200000
ulimit -n 65535
Ephemeral Port Range
Each outbound connection from your load test client uses an ephemeral port. The default range is often 32768-60999, giving you about 28,000 ports. If you need more connections from a single client machine, expand the range:
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"
This gives you roughly 64,000 ports. For even more connections, bind to multiple local IP addresses.
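One way to use multiple local IPs is to spread connections across them round-robin. This is a sketch: the addresses are examples, and it assumes the machine actually owns them. With the ws library, the chosen IP would be passed as the localAddress option, which is forwarded to Node's underlying http.request.

```javascript
// Sketch: distribute client connections across several local source IPs to
// multiply the available ephemeral ports.
function pickLocalAddress(connectionIndex, localAddresses) {
  // Simple round-robin: connection i uses address i mod N.
  return localAddresses[connectionIndex % localAddresses.length];
}

// Hypothetical usage with the ws client:
// const ws = new WebSocket(TARGET, {
//   localAddress: pickLocalAddress(i, ["10.0.0.10", "10.0.0.11"]),
// });
```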
TCP Tuning
On Linux, these settings help when running high-connection-count tests:
# Allow reuse of sockets in TIME_WAIT state
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
# Increase the maximum number of connection tracking entries
sudo sysctl -w net.netfilter.nf_conntrack_max=1000000
# Increase socket buffer sizes
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
Common Bottlenecks
When your load test starts showing failures or high latency, the cause usually falls into one of these categories.
CPU Saturation
If your server process is pinned at 100% CPU, it cannot process incoming messages fast enough. Common causes include JSON parsing of large messages, expensive business logic running synchronously in the message handler, and broadcasting messages to large subscriber lists without batching. Profile your application with the relevant tools for your runtime (in Node.js, use --prof or the Chrome DevTools profiler) and optimize the hot paths first.
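To illustrate the batching point, here is one possible shape for a batched broadcast. The batch size and the subscriber send() interface are assumptions, not part of any specific framework:

```javascript
// Sketch: broadcast to a large subscriber list in batches, yielding to the
// event loop between batches so a single broadcast cannot monopolize the CPU.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

function broadcastBatched(subscribers, payload, batchSize = 500) {
  const batches = toBatches(subscribers, batchSize);
  function sendNext(index) {
    if (index >= batches.length) return;
    for (const sub of batches[index]) {
      sub.send(payload); // assumes each subscriber exposes a send() method
    }
    // Yield between batches so pings, pongs, and other messages interleave.
    setImmediate(() => sendNext(index + 1));
  }
  sendNext(0);
}
```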
Memory Exhaustion
Memory issues surface as increasing garbage collection pauses (visible as latency spikes) or outright crashes. Track your server’s RSS (Resident Set Size) over time during the load test. If memory climbs linearly with connections and never flattens, you likely have a per-connection memory leak, often from event listeners that are never removed or buffers that grow unboundedly.
File Descriptor Limits
You will see “EMFILE: too many open files” errors when you hit this limit. The fix is the OS-level tuning described above. This is often the first wall people hit when load testing WebSocket servers for the first time.
Event Loop Blocking
In Node.js and similar single-threaded runtimes, any synchronous operation that takes more than a few milliseconds blocks all other connections. A single JSON.parse() call on a 1MB message can stall the event loop for 50ms or more. During that time, no other messages are processed and no new connections are accepted. Use worker threads or process the data in chunks for heavy operations.
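You can observe this blocking directly. The sketch below builds a synthetic payload of roughly 1MB and times a single synchronous JSON.parse; the exact stall depends on hardware and payload shape:

```javascript
// Sketch: measure how long one synchronous JSON.parse of a large message
// blocks the event loop. The payload is synthetic.
function buildLargePayload(targetBytes) {
  const items = [];
  let size = 2; // account for the enclosing brackets
  while (size < targetBytes) {
    const item = { id: items.length, value: "x".repeat(32) };
    items.push(item);
    size += JSON.stringify(item).length + 1; // +1 for the joining comma
  }
  return JSON.stringify(items);
}

const payload = buildLargePayload(1024 * 1024);
const start = process.hrtime.bigint();
const parsed = JSON.parse(payload); // nothing else runs during this call
const blockedMs = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`Parsed ${parsed.length} items, event loop blocked ~${blockedMs.toFixed(1)}ms`);
```

On a real server, work like this belongs in a worker thread (Node's worker_threads module) or should be split into smaller chunks.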
Interpreting Results
Knowing what “good” looks like depends on your application, but some general guidelines apply.
Connection Establishment
A healthy server should accept new WebSocket connections (including the HTTP upgrade handshake) in under 100ms at the 99th percentile. If connection times exceed 500ms under load, your server is struggling to keep up with the accept queue. This often points to a CPU bottleneck or a slow authentication step in the upgrade handler.
Message Latency
For real-time applications like chat or gaming, aim for a p95 message latency under 100ms and a p99 under 250ms. For less time-sensitive applications like dashboards that update every few seconds, a p95 under 500ms is usually acceptable. If you see a large gap between p50 and p99, your server has periodic stalls, likely from garbage collection or event loop blocking.
Throughput Ceiling
Increase your message rate gradually until latency starts climbing or errors appear. The point just before degradation is your server’s throughput ceiling. Document this number. It tells you how much headroom you have and when you need to add capacity or optimize. For architectures that distribute load across multiple servers, see WebSocket Scalability.
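Finding the ceiling from ramp data can be reduced to a small helper. This is a sketch; the sample shape and the latency budget are illustrative:

```javascript
// Sketch: given (rate, p95 latency) samples collected while ramping up the
// message rate, report the highest rate where p95 stayed under budget.
function throughputCeiling(samples, p95BudgetMs) {
  let ceiling = 0;
  for (const { rate, p95 } of samples) {
    if (p95 <= p95BudgetMs && rate > ceiling) {
      ceiling = rate;
    }
  }
  return ceiling;
}
```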
Error Rates
Any error rate above 0.1% during normal load warrants investigation. During peak load tests, error rates up to 1% might be acceptable depending on your use case. Connection resets, timeouts, and protocol errors each point to different root causes.
CI Integration
Running load tests in your CI pipeline catches performance regressions before they reach production. The key is keeping the tests fast enough to run on every merge but meaningful enough to detect real problems.
Lightweight Smoke Tests
For every pull request, run a short load test (30-60 seconds) with modest concurrency (100-500 connections). This catches obvious regressions like accidentally making a message handler synchronous or introducing a memory leak. Use k6 thresholds to fail the pipeline if key metrics exceed your limits:
k6 run --duration 30s --vus 200 ws-smoke-test.js
If the test exits with a non-zero code (threshold exceeded), the pipeline fails.
Nightly Full Load Tests
Run longer, more intensive load tests on a schedule, perhaps nightly or before releases. These tests ramp to production-level concurrency and run for 10-30 minutes to expose issues that only appear under sustained load, like slow memory leaks or connection pool exhaustion.
A sample GitHub Actions workflow step:
- name: Run WebSocket load test
  run: |
    docker compose up -d websocket-server
    sleep 5
    k6 run \
      --out json=results.json \
      --duration 10m \
      --vus 1000 \
      ws-full-load-test.js
    docker compose down
Storing Historical Results
Store load test results (connection counts, latencies, throughput) in a time-series database or even a simple CSV file committed to your repository. Tracking these numbers over time lets you spot gradual performance degradation that a single test run cannot reveal. For a broader view of testing strategies, see How to Test WebSocket Connections.
Frequently Asked Questions
How many WebSocket connections can a single server handle?
The answer depends on your hardware, your application logic, and your OS configuration. A well-tuned Node.js server doing minimal per-message processing can hold 100,000 or more concurrent connections on a single machine with 8GB of RAM. If each connection triggers database queries or heavy computation, that number drops significantly. The only way to find your specific limit is to run the load tests described in this article against your actual application.
Should I load test from a single machine or multiple machines?
A single machine can typically generate 30,000-50,000 outbound WebSocket connections before hitting ephemeral port or file descriptor limits. For tests beyond that, use multiple client machines. Both Artillery and k6 support distributed execution. Artillery has a --cluster mode, and k6 offers the k6 Cloud service or you can orchestrate multiple k6 instances with Kubernetes. If you run from a single machine, make sure it is not the bottleneck: monitor its CPU and network usage during the test.
How do I load test WebSocket servers that require authentication?
Most load testing tools let you set custom headers on the initial HTTP upgrade request. In Artillery, use the headers block in your config. In k6, pass headers as the second argument to ws.connect(). For token-based auth, generate a pool of valid tokens before the test and distribute them across virtual users. For cookie-based auth, make an HTTP request to your login endpoint first, capture the cookie, and pass it to the WebSocket connection. The k6 HTTP and WebSocket APIs share a cookie jar, making this straightforward.
What is the difference between load testing and stress testing for WebSocket?
Load testing confirms your server handles the expected production load with acceptable latency and error rates. Stress testing pushes beyond expected limits to find the breaking point and observe how the server fails. Both are valuable. Run load tests regularly in CI to prevent regressions. Run stress tests periodically to understand your capacity ceiling and to verify that your server degrades gracefully, dropping new connections rather than crashing, for example, when it reaches its limit.