Description
We are running Gunicorn with multiple workers and the Gunicorn max_requests option enabled, and we found that memory usage in our containers increases with every worker restart. Lowering the max_requests value so that workers restart very frequently made the problem obvious.
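For context, here is a minimal sketch of the relevant part of our Gunicorn config (the file name, worker count, and max_requests values below are illustrative, not our exact settings):

# gunicorn.conf.py (sketch, illustrative values only)
workers = 4                  # multiple workers
max_requests = 100           # restart each worker after this many requests
max_requests_jitter = 10     # stagger the restarts a little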
Using the method described in the readme:
from prometheus_client import multiprocess

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)
While running a performance test, I see the following in the container:
[root@mock-adapter-8568fdcf6d-lz8tx app]# cd /app/prometheus_tmp
[root@mock-adapter-8568fdcf6d-lz8tx prometheus_tmp]# ls
counter_10.db counter_214.db counter_319.db counter_41.db counter_91.db histogram_196.db histogram_296.db histogram_401.db histogram_69.db
counter_111.db counter_222.db counter_31.db counter_425.db counter_97.db histogram_1.db histogram_305.db histogram_409.db histogram_77.db
counter_117.db counter_232.db counter_328.db counter_433.db counter_9.db histogram_202.db histogram_311.db histogram_419.db histogram_84.db
counter_130.db counter_23.db counter_338.db counter_444.db histogram_10.db histogram_214.db histogram_319.db histogram_41.db histogram_91.db
counter_138.db counter_244.db counter_345.db counter_450.db histogram_111.db histogram_222.db histogram_31.db histogram_425.db histogram_97.db
counter_148.db counter_251.db counter_353.db counter_458.db histogram_117.db histogram_232.db histogram_328.db histogram_433.db histogram_9.db
counter_154.db counter_258.db counter_361.db counter_470.db histogram_130.db histogram_23.db histogram_338.db histogram_444.db
counter_175.db counter_269.db counter_36.db counter_48.db histogram_138.db histogram_244.db histogram_345.db histogram_450.db
counter_187.db counter_277.db counter_374.db counter_56.db histogram_148.db histogram_251.db histogram_353.db histogram_458.db
counter_18.db counter_286.db counter_384.db counter_61.db histogram_154.db histogram_258.db histogram_361.db histogram_470.db
counter_196.db counter_296.db counter_401.db counter_69.db histogram_175.db histogram_269.db histogram_36.db histogram_48.db
counter_1.db counter_305.db counter_409.db counter_77.db histogram_187.db histogram_277.db histogram_374.db histogram_56.db
counter_202.db counter_311.db counter_419.db counter_84.db histogram_18.db histogram_286.db histogram_384.db histogram_61.db
Watching that directory, I see new files created for every new worker, and the old ones are never cleaned up. Deleting the files in that directory reduces the memory usage of that container, but it then starts building up again.
What is the proper way of dealing with these database files? Is this mark_process_dead's responsibility?
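For reference, the manual cleanup described above is essentially the sketch below. It is our own workaround, not something taken from the prometheus_client docs: /app/prometheus_tmp is the multiprocess directory shown in the listing above, and the refinement of only removing files whose worker PID is no longer alive is our assumption about what is safe to delete.

import glob
import os

# Path taken from the container above; adjust to your own multiprocess directory.
MULTIPROC_DIR = "/app/prometheus_tmp"

def pid_is_alive(pid):
    """Best-effort liveness check: signal 0 only probes whether the PID exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True

def remove_stale_db_files(directory=MULTIPROC_DIR):
    """Delete counter_<pid>.db / histogram_<pid>.db files left behind by dead workers."""
    for path in glob.glob(os.path.join(directory, "*_*.db")):
        name = os.path.basename(path)             # e.g. "counter_419.db"
        pid_part = name.rsplit("_", 1)[-1][:-3]   # -> "419"
        if pid_part.isdigit() and not pid_is_alive(int(pid_part)):
            os.remove(path)

That said, this feels like a workaround rather than the intended approach, hence the question above about whose responsibility the cleanup is.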
And a general question about Prometheus: do we need to keep metric values around after they have been scraped? Could we wipe our metrics after Prometheus scrapes our metrics endpoint, and if so, can this be done via prometheus_client?