
Memory Leak With Concurrent Requests #3897

@winrid


Version
1.6

[package]
name = "TEST"
version = "0.1.0"
edition = "2021"

[profile.release]
debug = true

[dependencies]
tokio = { version = "1.45", features = ["full"] }
hyper = { version = "1.6.0", features = ["server", "http1", "http2"] }
hyper-util = { version = "0.1", features = ["server", "tokio"] }
http-body-util = "0.1"

tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

Platform

6.11.0-25-generic #25~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 15 17:20:50 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Description
Hello :)

I am working on migrating a Java service to Rust, but so far the Hyper version's memory use seems unbounded: it reaches twice what the Java version used before we have to kill it. This service does very little CPU work, but it does handle lots of concurrent requests.

After the first run, the service sits at 270 MB RSS.
After the second run, it sits at 484 MB RSS.
3rd run: 699 MB.
4th run: 868 MB.

A simple NodeJS server completes the test with 100 MB of RAM used. I am not sure what I am doing wrong :(

  1. I have tried with and without keepalive.
  2. It only happens with concurrent requests. Sequential requests do not cause the problem.
  3. I have tried with different allocators.
  4. Heaptrack shows the allocations from Hyper.

If it was reusing buffers I'd expect the memory to go up and sit there, but it seems to continually increase.

use hyper::body::Bytes;
use hyper::server::conn::http1;
use hyper::service::service_fn;
use hyper::{Method, Request, Response, StatusCode};
use hyper::header::{CONNECTION, HeaderValue};
use hyper_util::rt::TokioIo;
use http_body_util::Full;
use std::convert::Infallible;
use std::net::SocketAddr;
use tokio::net::TcpSocket;
use tracing::{error, info};

async fn handle_request(req: Request<hyper::body::Incoming>) -> Result<Response<Full<Bytes>>, Infallible> {
    match (req.method(), req.uri().path()) {
        (&Method::GET, "/test") => {
            let resp = Response::new(Full::new(Bytes::from("OK")));
            Ok(resp)
        }
        _ => {
            let mut response = Response::new(Full::new(Bytes::from("Not Found")));
            *response.status_mut() = StatusCode::NOT_FOUND;
            Ok(response)
        }
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    tracing_subscriber::fmt()
        .with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
        .init();

    let port = 3000;
    info!("Starting HTTP server on port {}", port);
    let addr = SocketAddr::from(([0, 0, 0, 0], port));

    let socket = TcpSocket::new_v4()?;
    // socket.set_keepalive(false)?;
    socket.bind(addr)?;
    let listener = socket.listen(1024)?;
    info!("HTTP server listening on {}", addr);

    loop {
        let (stream, _) = listener.accept().await?;
        let io = TokioIo::new(stream);

        tokio::task::spawn(async move {
            if let Err(err) = http1::Builder::new()
                .serve_connection(io, service_fn(handle_request))
                .await
            {
                error!("Error serving connection: {:?}", err);
            }
        });
    }
}

Running it:

ulimit -n 65000
cargo run --release

To run the load test, if you have npm:

npm i -g autocannon
autocannon -c 20000 -d 10 http://localhost:3000/test

(will fire a total of around 100k req on my machine)

Activity

Label added: C-bug (Category: bug. Something is wrong. This is bad!) on Jun 3, 2025

seanmonstar (Member) commented on Jun 3, 2025

Hey there! People have occasionally mentioned a leak before, but it seems to be very hard to identify if it is indeed in hyper. I haven't been able to reproduce it on various systems.

A common fix is to use something like jemallocator. Beyond that, if it's possible to run something like valgrind to trace where the memory is being held, that'd make it possible for a contributor to handle this, or to identify if it's somewhere else.
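
For anyone who wants to try that, here is a minimal sketch of swapping in jemalloc as the global allocator (the tikv-jemallocator crate name and version are assumptions; the older jemallocator crate works the same way):

// Cargo.toml (assumed dependency): tikv-jemallocator = "0.6"
use tikv_jemallocator::Jemalloc;

// Route every heap allocation in the binary through jemalloc instead of the
// system allocator; the rest of the server code is unchanged.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;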

winrid (Author) commented on Jun 3, 2025

Hello @seanmonstar

As I said above, I already tried several allocators, and heaptrack shows the leak happening from hyper: hundreds of MB of allocations from hyper allocating buffers via Vecs. It only happens under concurrent load and is extremely easy to reproduce. I provided all the code and steps to reproduce it.

So far Valgrind hasn't been usable: the server runs too slowly under it to reproduce the issue.

dayvejones commented on Jun 3, 2025

Hello! @seanmonstar @winrid

  1. Add connection counters (or use an alternative from the OS) and make sure that connections (not requests) are closed successfully and are not hanging idle with keep-alive. Hyper really has very few capabilities for controlling and monitoring connections.

  2. Try mimalloc

I also did several tests (but with the k6 tool).

cargo.toml

[package]
name = "example-hello-world"
version = "0.1.0"
edition = "2021"
publish = false

[dependencies]
tokio = { version = "1.45.1", features = ["full"] }
hyper = { version = "1.6.0", default-features=false, features = ["server", "http1"] }
hyper-util = { version = "0.1", features = ["server", "tokio"] }
http-body-util = "0.1"
mimalloc = { version = "0.1.46"}
tracing = "0.1.41"
tracing-subscriber = { version = "0.3.19", features = ["env-filter"] } 

main.rs

use hyper::body::Bytes;
use hyper::server::conn::http1;
use hyper::service::service_fn;
use hyper::{Request, Response};
use hyper_util::rt::{TokioIo, TokioTimer};
use http_body_util::Full;
use tokio::time;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;
use tracing_subscriber::{fmt, EnvFilter};
use std::convert::Infallible;
use std::net::SocketAddr;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::Duration;
use tokio::net::TcpListener;
use tracing::{error, info};
use mimalloc::MiMalloc;

#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

#[derive(Clone)]
struct ConnectionCounter {
    count: Arc<AtomicUsize>,
}

impl ConnectionCounter {
    fn new() -> Self {
        Self {
            count: Arc::new(AtomicUsize::new(0)),
        }
    }

    fn increment(&self) {
        self.count.fetch_add(1, Ordering::SeqCst);
    }

    fn decrement(&self) {
        self.count.fetch_sub(1, Ordering::SeqCst);
    }

    fn current(&self) -> usize {
        self.count.load(Ordering::SeqCst)
    }
}

async fn test(_: Request<impl hyper::body::Body>) -> Result<Response<Full<Bytes>>, Infallible> {
    //tokio::time::sleep(Duration::from_micros(50)).await;

    Ok(Response::new(Full::new(Bytes::from("Test!"))))
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let fmt_layer = fmt::layer()
        .with_target(false);

    let filter_layer = EnvFilter::try_from_default_env()
        .or_else(|_| EnvFilter::try_new("info"))
        .unwrap();

    tracing_subscriber::registry()
        .with(filter_layer)
        .with(fmt_layer)
        .init();

    let addr: SocketAddr = ([0, 0, 0, 0], 3000).into();
    let listener = TcpListener::bind(addr).await?;
    let counter: ConnectionCounter = ConnectionCounter::new();

    tokio::spawn(track_connections_counter(counter.clone()));

    loop {
        let (stream, _) = listener.accept().await?;

        let counter = counter.clone();

        let io = TokioIo::new(stream);

        tokio::task::spawn(async move {
            counter.increment();

            if let Err(err) = http1::Builder::new()
                //.header_read_timeout(Duration::from_secs(60*10))
                .timer(TokioTimer::new())
                .serve_connection(io, service_fn(test))
                .await
            {
                error!("Error serving connection: {:?}", err);
            }

            counter.decrement();
        });
    }

}

async fn track_connections_counter (connection_counter: ConnectionCounter) {
    let mut interval = time::interval(Duration::from_secs(1));

    loop {
        interval.tick().await;
        info!("Current connections counters: {}", connection_counter.current());
    }
}

script.js (for k6)

import http from 'k6/http';
import { sleep } from 'k6';

export let options = {
  vus: 10000,
  duration: '10s',
  maxRedirects: 0,
  userAgent: 'k6-load-test',
  setupTimeout: '12s',
  summaryTrendStats: ['avg', 'min', 'med', 'max', 'p(90.0)', 'count'],
};

export default function () {
  let res = http.get('http://localhost:3000', {
    headers: {
      'Connection': 'keep-alive',
    },
    tags: { name: 'hello_request' },
  });
}

Build the app:
cargo build --release

Start the load test:
k6 run script.js

[screenshots attached]

I tried different configurations many times and saw no memory leaks.

winrid (Author) commented on Jun 3, 2025

Thanks for taking a look @dayvejones but you didn't show what memory usage you observed. I don't think it actually shows up as a leak in the leak detector, but the buffers that hyper uses seem to keep growing. I will try your examples. I had already tried mimalloc but will check again.

But again, not sure why the allocator would matter at all. Hyper should just reuse the memory it allocated. It certainly shouldn't balloon to several GB like I was seeing.

seanmonstar (Member) commented on Jun 3, 2025

I didn't mean to sound dismissive, sorry if it came across that way. While it might seem easy to reproduce, many of us have tried many times and in many different ways over the years and have struggled to reproduce it. For background, here was a previous report: #1790

I tried again today, and also could not get a leak. heaptrack does show that a lot of memory is allocated by hyper, yes, but it all gets deallocated when the connections close. One possibility is that the allocator has cached memory; another is that the OS hasn't bothered to reclaim the pages yet.
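
One hedged way to test the allocator-caching theory on Linux with the default glibc allocator (an assumption; it does not apply to jemalloc or mimalloc) is to periodically ask malloc to hand free pages back to the OS and watch whether RSS drops:

use std::time::Duration;

// If RSS falls after malloc_trim, the growth was allocator caching or fragmentation,
// not memory that hyper still holds. malloc_trim is glibc-specific, via the libc crate.
fn spawn_malloc_trim_loop() {
    std::thread::spawn(|| loop {
        std::thread::sleep(Duration::from_secs(30));
        unsafe {
            libc::malloc_trim(0);
        }
    });
}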

winrid (Author) commented on Jun 3, 2025

Hello @seanmonstar

it all gets deallocated when the connections close

In this case the connections should be reused with keepalive, no? I never see the memory use go back down. Also, again, a very simple Vert.x or NodeJS server easily outperforms hyper in memory usage. I have tried with keepalive on and off. I will investigate further.

winrid (Author) commented on Jun 3, 2025

  1. @dayvejones the k6 tool somehow is not able to reproduce the issue even with vus: 20000. I can reproduce it only with autocannon -c 20000.
  2. The connections are closed (at least, from the hyper side):
2025-06-03T20:09:29.655467Z  INFO Current connections counters: 0
2025-06-03T20:09:30.655652Z  INFO Current connections counters: 0
2025-06-03T20:09:31.655166Z  INFO Current connections counters: 0
2025-06-03T20:09:32.655481Z  INFO Current connections counters: 0
2025-06-03T20:09:33.655601Z  INFO Current connections counters: 0
2025-06-03T20:09:34.655621Z  INFO Current connections counters: 14207
2025-06-03T20:09:35.654928Z  INFO Current connections counters: 14731
2025-06-03T20:09:36.655336Z  INFO Current connections counters: 15332
2025-06-03T20:09:37.655184Z  INFO Current connections counters: 15773
2025-06-03T20:09:38.655304Z  INFO Current connections counters: 16213
2025-06-03T20:09:39.654722Z  INFO Current connections counters: 16693
2025-06-03T20:09:40.655227Z  INFO Current connections counters: 17167
2025-06-03T20:09:41.655065Z  INFO Current connections counters: 17641
2025-06-03T20:09:42.655115Z  INFO Current connections counters: 18204
2025-06-03T20:09:43.655389Z  INFO Current connections counters: 18777
2025-06-03T20:09:44.655292Z  INFO Current connections counters: 19358
2025-06-03T20:09:45.655272Z  INFO Current connections counters: 19871
2025-06-03T20:09:46.654677Z  INFO Current connections counters: 20000
2025-06-03T20:09:47.654667Z  INFO Current connections counters: 8325
2025-06-03T20:09:48.654742Z  INFO Current connections counters: 8325
2025-06-03T20:09:49.655326Z  INFO Current connections counters: 0
2025-06-03T20:09:50.655555Z  INFO Current connections counters: 0
2025-06-03T20:09:51.655561Z  INFO Current connections counters: 0

For some reason, even after the test is done (I've waited a few minutes), the memory usage has continued to climb from 490 MB to 528 MB. I will continue to monitor...

dayvejones commented on Jun 3, 2025

I had large memory leaks when one of the server's clients used Envoy as a reverse proxy, which opened more than 10,000 connections for 500 rps with an average request duration of 10 milliseconds via http1. 99.9% of the connections were simply sitting idle. Until I made a connection counter, I couldn't find the problem, because rps was low.

And I don't want to disable keep-alive completely because of a few misbehaving clients. And there seem to be no other settings for http1.

winrid (Author) commented on Jun 3, 2025

With Dayve's code, locally, not handling any traffic, with mimalloc, the memory usage grew from 490 MB to 621 MB in 5 minutes. No connections were made in that time.

@dayvejones In production this service does not sit behind a proxy. It terminates SSL itself. I will add the connection counter.

seanmonstar (Member) commented on Jun 3, 2025

That sounds a lot like it's something else, then. Because if there are no connections running, then hyper isn't doing anything.

winrid (Author) commented on Jun 3, 2025

That was the RSS of the server itself. Although I think maybe that was just behavior from this allocator: maybe after the test it decided to allocate more memory to do some sort of background work, because it then evened off.

Running it a few more times, I can't repro the issue; memory is staying flat at around 600 MB, so maybe there is some sort of heap fragmentation or something that is affecting libc and jemalloc but not mimalloc. I thought I had tried mimalloc before, but maybe I messed up the test.

I'll continue to investigate and close if I don't find anything...

dayvejones commented on Jun 3, 2025

@seanmonstar how do you recommend measuring connections? Is my code OK, or is there something better?

And is it possible to monitor how many connections are active and how many are idle?

winrid (Author) commented on Jun 3, 2025

I've added Dayve's connection tracking to grafana in prod:

[screenshot of the Grafana connection-count graph]

The counts are not very high :)

Still seeing memory usage grow, but will monitor for a couple days and then go from there. I've also switched prod to mimalloc.

dayvejones commented on Jun 3, 2025

@winrid the 20,000 connections from autocannon are not visible :)

winrid (Author) commented on Jun 3, 2025

@dayvejones the traffic tends to be very spiky :) Also, if the requests complete very quickly, which most do, they won't be included in the count. For example, right now there are 5k connected clients, but the graph never goes above 5. But that's OK, it's just for detecting leaks.

dayvejones commented on Jun 3, 2025

I see that during the entire load test there are 10 thousand continuously open connections, and requests are flying in the hundreds of thousands. Your problem is something else.

winrid (Author) commented on Jun 3, 2025

@dayvejones whether requests complete quickly has nothing to do with the design causing heap fragmentation or memory use...

winrid (Author) commented on Jun 3, 2025

Anyways, thanks for your patience looking into this. If I still observe the issue I'll try to debug it and create a PR, or at the very least see if the fragmentation can be reduced by reducing small heap allocations or something.

dayvejones commented on Jun 4, 2025

@seanmonstar hello Sean, am I measuring the number of connections correctly? And is it possible to distinguish idle and active connections for metrics?

#3897 (comment)

seanmonstar (Member) commented on Jun 4, 2025

That looks fine. Each connection is literally a Connection type, and they don't know anything about each other, so a counter like you have is fine.

There is not currently a mechanism to observe the active/idle state from the outside. You can set a header read timeout to prevent a connection from idling for too long.
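
A minimal sketch of what that looks like with the hyper 1.x http1 server builder (the 30-second value is arbitrary, and a timer must be supplied for the timeout to actually fire):

use std::time::Duration;
use hyper::server::conn::http1;
use hyper::service::service_fn;
use hyper_util::rt::{TokioIo, TokioTimer};
use tracing::error;

// Inside the accept loop, given `stream` from the listener and the `handle_request`
// service from the earlier example:
let io = TokioIo::new(stream);
tokio::task::spawn(async move {
    if let Err(err) = http1::Builder::new()
        .timer(TokioTimer::new())
        // Close the connection if complete request headers don't arrive within 30s,
        // which (per the comment above) also bounds how long a keep-alive connection
        // can sit idle waiting for its next request.
        .header_read_timeout(Duration::from_secs(30))
        .serve_connection(io, service_fn(handle_request))
        .await
    {
        error!("Error serving connection: {:?}", err);
    }
});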

winrid (Author) commented on Jun 4, 2025

@seanmonstar there is no such header read timeout option for http2. AFAIK the server has to handle it.
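
For reference, a hedged sketch of the knobs the hyper 1.x http2 server builder does expose: keep-alive PINGs, which detect unresponsive peers rather than enforcing an idle timeout (the intervals are arbitrary examples, and TokioExecutor/TokioTimer come from hyper-util):

use std::time::Duration;
use hyper::server::conn::http2;
use hyper::service::service_fn;
use hyper_util::rt::{TokioExecutor, TokioIo, TokioTimer};
use tracing::error;

// Inside the accept loop, given `stream` and the `handle_request` service:
let io = TokioIo::new(stream);
tokio::task::spawn(async move {
    if let Err(err) = http2::Builder::new(TokioExecutor::new())
        .timer(TokioTimer::new())
        // Send an HTTP/2 PING every 20s while the connection is otherwise idle...
        .keep_alive_interval(Duration::from_secs(20))
        // ...and close the connection if the PING isn't acknowledged within 10s.
        .keep_alive_timeout(Duration::from_secs(10))
        .serve_connection(io, service_fn(handle_request))
        .await
    {
        error!("Error serving connection: {:?}", err);
    }
});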

winrid (Author) commented on Jun 4, 2025

Related: hyperium/h2#827
We should try to get this improvement made to h2 first, I guess, along with an idle timeout for http2, and then we can update hyper, I think.

winrid (Author) commented on Jun 4, 2025

But yeah, it looks like this is my issue, as the http2 server doesn't seem to ever be closing connections:

ss -s
Total: 80715
TCP:   80611 (estab 80541, closed 3, orphaned 62, timewait 2)

Transport Total     IP        IPv6
RAW       0         0         0        
UDP       0         0         0        
TCP       80608     80605     3        
INET      80608     80605     3        
FRAG      0         0         0    

after restarting the server it goes back down to 8044 :)

Using Dayve's metric tracking, there's no leak from unresolved futures on the app side, so it's not a bug in the app layer (thank heavens).

I don't see any way to control this from axum, but I can rip that out and just use raw hyper once it's ready.

winrid (Author) commented on Jun 7, 2025

I did some more testing; it looks like hyper properly closes the http2 connections once they are idle. Sorry for the trouble.

winrid (Author) commented on Jun 17, 2025

So to resolve this problem in desperation I switched to Actix, and the issue went away:

[screenshot attached]

I also carefully switched back to Hyper just to double check...

I think now the problem I have isn't actually a memory issue in Hyper but a connection leak. I am not sure why yet, or under what circumstances, as I can verify individual connections are properly closed. The CPU usage actually comes from the sidecar on the machines scanning the connections; as it gets up to 50k+ connections the sidecar is not happy. So that's certainly a problem, but it doesn't occur with Actix.

@seanmonstar LMK if you want access to the code to see what is happening.


Metadata

Assignees: none
Labels: C-bug (Category: bug. Something is wrong. This is bad!)
Milestone: none
Participants: @seanmonstar, @winrid, @dayvejones