The Architecture and Persistence of Prometheus


What is Prometheus

Prometheus is an open-source system monitoring and alerting toolkit. Its main features are:

  • A multidimensional data model (time series are identified by a metric name and a set of key/value labels)

  • A flexible query language (PromQL) to exploit this dimensionality

  • No reliance on distributed storage; single server nodes are autonomous

  • Time series data is collected via a pull model over HTTP

  • Pushing time series is supported through an intermediary push gateway

  • Scrape targets are discovered via service discovery or static configuration

  • Support for various visual charts and dashboards

Pull mode

Prometheus uses a pull model to collect data: it scrapes metrics over HTTP. As long as an application can expose an HTTP endpoint, it can be integrated into the monitoring system. Compared with a proprietary or binary protocol, this is open and simple.
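Any HTTP server that returns metrics in the Prometheus text exposition format can be scraped. A minimal sketch in Python, using only the standard library (the metric names and port are made-up examples):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_metrics(request_count, temp_celsius):
    """Render sample metrics in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Total HTTP requests handled.",
        "# TYPE app_requests_total counter",
        f"app_requests_total {request_count}",
        "# HELP app_temperature_celsius Current temperature.",
        "# TYPE app_temperature_celsius gauge",
        f"app_temperature_celsius {temp_celsius}",
    ]
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics(42, 21.5).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# To actually serve (Prometheus would then scrape http://localhost:8000/metrics):
# HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

A scrape is then nothing more than an HTTP GET against `/metrics`.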

Push mode

For short-lived scheduled tasks, the pull mode may miss data: the task may already have finished before Prometheus gets a chance to scrape it. In that case an intermediate layer can be added: the client pushes its data to a Push Gateway, which caches it, and Prometheus pulls the metrics from the Push Gateway. (An additional Push Gateway must be deployed, and a new job must be added to collect data from the gateway.)
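The Pushgateway accepts the text exposition format via an HTTP request to `/metrics/job/<job_name>`. A sketch of building such a push with the Python standard library (the job name, metric name, and address are made-up examples):

```python
from urllib.request import Request, urlopen

def build_push_request(gateway, job, metrics_text):
    """Build an HTTP request that pushes metrics to a Pushgateway.

    The Pushgateway groups pushed metrics under /metrics/job/<job_name>;
    Prometheus then scrapes the gateway itself.
    """
    url = f"http://{gateway}/metrics/job/{job}"
    return Request(url, data=metrics_text.encode(),
                   headers={"Content-Type": "text/plain"}, method="POST")

# A short-lived cron job could push its result just before exiting:
req = build_push_request("localhost:9091", "cron_backup",
                         "backup_last_duration_seconds 42.1\n")
# urlopen(req)  # requires a running Pushgateway at localhost:9091
```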

Composition and structure

  • Prometheus server
    Responsible for scraping and storing data, and for serving the PromQL query language.

  • Client SDKs
    Official client libraries exist for Go, Java, Scala, Python, and Ruby; third-party libraries support Node.js, PHP, Erlang, etc.

  • Push Gateway
    An intermediate gateway that allows short-lived jobs to actively push their metrics.

  • PromDash
    A dashboard developed in Rails for visualizing metric data.

  • exporters
    Export metrics from other data sources into Prometheus; exporters exist for databases, hardware, message middleware, storage systems, HTTP servers, JMX, etc.

  • alertmanager
    Experimental component for handling alerts.

  • prometheus_cli
    Command-line tool.

  • Other auxiliary tools

The architecture diagram is as follows:

[Figure: Prometheus architecture overview]

Default configuration

docker exec -it a9bd827a1d18 less /etc/prometheus/prometheus.yml


# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']
  • scrape_interval
    The interval at which data is scraped (every 15 seconds here).

  • evaluation_interval
    The interval at which rules are evaluated.
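New scrape targets are added as further entries under `scrape_configs`. A sketch of such an addition (the `node` job name and target address are made-up examples, assuming a node_exporter is running there):

```yaml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Hypothetical additional job scraping a node_exporter every 5s:
  - job_name: 'node'
    scrape_interval: 5s        # overrides the global scrape_interval
    static_configs:
      - targets: ['192.168.1.10:9100']
```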

Push Gateway

The Pushgateway has a separate Docker image.

docker pull prom/pushgateway

Applications that prefer the push mode can use a dedicated push gateway as an adapter.


Persistence

Prometheus uses Google's LevelDB for indexing (PromQL relies heavily on LevelDB), and has its own storage layer for the large volume of sampled data: Prometheus creates a local file for each time series, organized in 1024-byte chunks.

Disk files

Prometheus stores its files under the path specified by storage.local.path, which defaults to ./data. There are three kinds of chunk encodings:

  • type 0

First generation encoding format, simple delta encoding

  • type 1

The current default encoding format, double-delta encoding

  • type 2

Variable bit-width encoding, the encoding method used by Beringei, Facebook's time series database
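The intuition behind double-delta encoding can be sketched in a few lines of Python: timestamps scraped at a near-constant interval have first deltas that are almost equal, so the deltas-of-deltas are mostly zeros and can be stored in very few bits. This is a simplified illustration, not Prometheus's actual chunk code:

```python
def deltas(values):
    """First differences of a sequence."""
    return [b - a for a, b in zip(values, values[1:])]

def double_delta(values):
    """Keep the first value and the first delta, then store deltas of deltas.

    For regularly spaced timestamps the delta-of-delta stream is almost
    all zeros, which a bit-level encoder can pack very compactly.
    """
    d = deltas(values)
    return values[0], d[0], deltas(d)

# Timestamps scraped every 15s (in seconds):
ts = [1000, 1015, 1030, 1045, 1060]
first, first_delta, dod = double_delta(ts)
print(first, first_delta, dod)  # 1000 15 [0, 0, 0]
```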

Memory usage

Prometheus keeps the most recently used chunks in memory. The maximum number of chunks can be set with storage.local.memory-chunks; the default value is 1048576, i.e. 1048576 chunks of 1024 bytes each, about 1 GiB in total.
Besides the sampled data itself, Prometheus also performs various operations on the data, so the overall memory footprint will certainly be larger than the configured storage.local.memory-chunks size. The official recommendation is therefore to reserve 3 times the storage.local.memory-chunks memory size.

As a rule of thumb, you should have at least three times more RAM available than needed by the memory chunks alone
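The sizing rule above can be checked with quick arithmetic (a sketch, using the 1024-byte chunk size stated earlier):

```python
CHUNK_SIZE_BYTES = 1024

def estimated_ram_bytes(memory_chunks, headroom_factor=3):
    """Estimate RAM to reserve: chunk storage times the recommended headroom."""
    return memory_chunks * CHUNK_SIZE_BYTES * headroom_factor

# Default storage.local.memory-chunks = 1048576 -> exactly 1 GiB of chunks,
# so roughly 3 GiB of RAM should be available.
chunk_bytes = 1048576 * CHUNK_SIZE_BYTES
print(chunk_bytes == 2**30)                  # True: exactly 1 GiB
print(estimated_ram_bytes(1048576) / 2**30)  # 3.0
```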

You can check prometheus_local_storage_memory_chunks and process_resident_memory_bytes through the server's own metrics endpoint.

  • prometheus_local_storage_memory_chunks

The number of chunks currently held in memory, excluding cloned chunks.

  • process_resident_memory_bytes

The resident memory size of the process, in bytes.

  • prometheus_local_storage_persistence_urgency_score
    A score between 0 and 1. Prometheus enters rushed mode when the score rises above 0.8, and leaves rushed mode when it drops back to 0.7 or below.

  • prometheus_local_storage_rushed_mode
    1 indicates rushed mode is active, 0 indicates it is not. In rushed mode, Prometheus adjusts the behavior of storage.local.series-sync-strategy and storage.local.checkpoint-interval to speed up the persistence of chunks.
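The enter/leave thresholds form a simple hysteresis. A sketch of the threshold logic described above (a simplified illustration, not Prometheus's actual code):

```python
ENTER_THRESHOLD = 0.8  # enter rushed mode when the urgency score exceeds this
LEAVE_THRESHOLD = 0.7  # leave rushed mode at or below this

def next_rushed_state(rushed, urgency_score):
    """Apply the hysteresis: the gap between the two thresholds
    prevents rapid flapping in and out of rushed mode."""
    if not rushed and urgency_score > ENTER_THRESHOLD:
        return True
    if rushed and urgency_score <= LEAVE_THRESHOLD:
        return False
    return rushed

# A score of 0.75 keeps whatever mode we are already in:
state = False
for score in [0.5, 0.85, 0.75, 0.7, 0.75]:
    state = next_rushed_state(state, score)
    print(score, state)
```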

Storage parameters

docker run -p 9090:9090 \
-v /tmp/prometheus-data:/prometheus-data \
prom/prometheus \
-storage.local.retention 168h0m0s \
-storage.local.max-chunks-to-persist 3024288 \
-storage.local.memory-chunks=50502740


  • storage.local.memory-chunks
    Sets the maximum number of chunks kept in Prometheus's memory; the default is 1048576 chunks, about 1 GiB in total.


  • storage.local.retention
    Configures how long data is retained; 168h0m0s is 24×7 hours, i.e. 1 week.


  • storage.local.series-file-shrink-ratio
    Controls when series files are rewritten; by default a rewrite is triggered when 10% of the chunks have been removed. If disk space is plentiful and you want to rewrite less frequently, increase the value, e.g. to 0.3, so that a rewrite is triggered only once 30% of the chunks have been removed.


  • storage.local.max-chunks-to-persist
    Controls the maximum number of chunks waiting to be persisted to disk. If this number is exceeded, Prometheus throttles ingestion until the number drops to 95% of the threshold. It is recommended to set this to about 50% of storage.local.memory-chunks. Prometheus will try its best to speed up persistence so as to avoid throttling.


  • storage.local.num-fingerprint-mutexes
    When the Prometheus server performs a checkpoint, or processes expensive queries, metric collection may briefly stall because the mutexes Prometheus pre-allocates for time series are insufficient. The number of pre-allocated mutexes can be increased with this parameter; sometimes it can be set to tens of thousands.


  • storage.local.series-sync-strategy
    Controls whether data is synced to disk after being written; the options are 'never', 'always', and 'adaptive'. Syncing reduces data loss if the operating system crashes, but lowers write performance.
    The default strategy is 'adaptive': data is not synced to disk immediately after writing; instead the operating system's page cache is used for batched syncs.


  • storage.local.checkpoint-interval
    The interval between checkpoints of in-memory chunks that have not yet been persisted to disk.