# Alertmanager [![CircleCI](https://circleci.com/gh/prometheus/alertmanager/tree/main.svg?style=shield)][circleci]

[![Docker Repository on Quay](https://quay.io/repository/prometheus/alertmanager/status "Docker Repository on Quay")][quay]
[![Docker Pulls](https://img.shields.io/docker/pulls/prom/alertmanager.svg?maxAge=604800)][hub]

The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct [receiver integrations](https://prometheus.io/docs/alerting/latest/configuration/#receiver) such as email, PagerDuty, OpsGenie, or many other [mechanisms](https://prometheus.io/docs/operating/integrations/#alertmanager-webhook-receiver) thanks to the webhook receiver. It also takes care of silencing and inhibition of alerts.

* [Documentation](http://prometheus.io/docs/alerting/alertmanager/)

## Install

There are various ways of installing Alertmanager.

### Precompiled binaries

Precompiled binaries for released versions are available in the
[*download* section](https://prometheus.io/download/)
on [prometheus.io](https://prometheus.io). Using the latest production release binary
is the recommended way of installing Alertmanager.

### Docker images

Docker images are available on [Quay.io](https://quay.io/repository/prometheus/alertmanager) or [Docker Hub](https://hub.docker.com/r/prom/alertmanager/).

You can launch an Alertmanager container for trying it out with

    $ docker run --name alertmanager -d -p 127.0.0.1:9093:9093 quay.io/prometheus/alertmanager

Alertmanager will now be reachable at http://localhost:9093/.
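
To try it with your own configuration, you can mount a config file into the container; a minimal sketch, assuming your file lives at `./alertmanager.yml` and that the image reads its config from `/etc/alertmanager/alertmanager.yml`:

    $ docker run --name alertmanager -d -p 127.0.0.1:9093:9093 \
        -v $(pwd)/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
        quay.io/prometheus/alertmanager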

### Compiling the binary

You can either install it with `go install`:

```
$ go install github.com/prometheus/alertmanager/cmd/...@latest
$ alertmanager --config.file=<your_file>
```

Or clone the repository and build manually:

```
$ mkdir -p $GOPATH/src/github.com/prometheus
$ cd $GOPATH/src/github.com/prometheus
$ git clone https://github.com/prometheus/alertmanager.git
$ cd alertmanager
$ make build
$ ./alertmanager --config.file=<your_file>
```

You can also build just one of the binaries in this repo by passing a name to the build target:
```
$ make build BINARIES=amtool
```

## Example

This is an example configuration that should cover most relevant aspects of the YAML configuration format. The full documentation of the configuration can be found [here](https://prometheus.io/docs/alerting/configuration/).

```yaml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@example.org'

# The root route on which each incoming alert enters.
route:
  # The root route must not have any matchers as it is the entry point for
  # all alerts. It needs to have a receiver configured so alerts that do not
  # match any of the sub-routes are sent to someone.
  receiver: 'team-X-mails'

  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  #
  # To aggregate by all possible labels use '...' as the sole label name.
  # This effectively disables aggregation entirely, passing through all
  # alerts as-is. This is unlikely to be what you want, unless you have
  # a very low alert volume or your upstream notification system performs
  # its own grouping. Example: group_by: [...]
  group_by: ['alertname', 'cluster']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This ensures that multiple alerts for the same group that start firing
  # shortly after one another are batched together in the first
  # notification.
  group_wait: 30s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend it.
  repeat_interval: 3h

  # All the above attributes are inherited by all child routes and can be
  # overwritten on each.

  # The child route trees.
  routes:
  # This route performs a regular expression match on alert labels to
  # catch alerts that are related to a list of services.
  - match_re:
      service: ^(foo1|foo2|baz)$
    receiver: team-X-mails

    # The service has a sub-route for critical alerts; any alerts
    # that do not match, i.e. severity != critical, fall back to the
    # parent node and are sent to 'team-X-mails'.
    routes:
    - match:
        severity: critical
      receiver: team-X-pager

  - match:
      service: files
    receiver: team-Y-mails

    routes:
    - match:
        severity: critical
      receiver: team-Y-pager

  # This route handles all alerts coming from a database service. If there's
  # no team to handle it, it defaults to the DB team.
  - match:
      service: database

    receiver: team-DB-pager
    # Also group alerts by affected database.
    group_by: [alertname, cluster, database]

    routes:
    - match:
        owner: team-X
      receiver: team-X-pager

    - match:
        owner: team-Y
      receiver: team-Y-pager


# Inhibition rules allow muting a set of alerts while another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_matchers:
    - severity="critical"
  target_matchers:
    - severity="warning"
  # Apply inhibition if the alertname is the same.
  # CAUTION:
  #   If all label names listed in `equal` are missing
  #   from both the source and target alerts,
  #   the inhibition rule will apply!
  equal: ['alertname']


receivers:
- name: 'team-X-mails'
  email_configs:
  - to: 'team-X+alerts@example.org, team-Y+alerts@example.org'

- name: 'team-X-pager'
  email_configs:
  - to: 'team-X+alerts-critical@example.org'
  pagerduty_configs:
  - routing_key: <team-X-key>

- name: 'team-Y-mails'
  email_configs:
  - to: 'team-Y+alerts@example.org'

- name: 'team-Y-pager'
  pagerduty_configs:
  - routing_key: <team-Y-key>

- name: 'team-DB-pager'
  pagerduty_configs:
  - routing_key: <team-DB-key>
```
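
`amtool` (described below) can check a configuration file such as this one before you load it; for example, assuming it is saved as `alertmanager.yml`:

```
$ amtool check-config alertmanager.yml
```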

## API

The current Alertmanager API is version 2. This API is fully generated via the
[OpenAPI project](https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md)
and [Go Swagger](https://github.com/go-swagger/go-swagger/) with the exception
of the HTTP handlers themselves. The API specification can be found in
[api/v2/openapi.yaml](api/v2/openapi.yaml). An HTML-rendered version can be
accessed [here](http://petstore.swagger.io/?url=https://raw.githubusercontent.com/prometheus/alertmanager/main/api/v2/openapi.yaml).
Clients can be easily generated via any OpenAPI generator for all major languages.

With the default config, endpoints are accessed under a `/api/v1` or `/api/v2` prefix.
The v2 `/status` endpoint would be `/api/v2/status`. If `--web.route-prefix` is set then API routes are
prefixed with that as well, so `--web.route-prefix=/alertmanager/` would
resolve to `/alertmanager/api/v2/status`.
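
For example, with an Alertmanager running locally on the default port, you could query the status endpoint or push an alert through the v2 API; a minimal sketch (the alert payload is illustrative):

```
$ curl -s http://localhost:9093/api/v2/status

$ curl -s -XPOST http://localhost:9093/api/v2/alerts \
    -H 'Content-Type: application/json' \
    -d '[{"labels": {"alertname": "Test_Alert", "instance": "node0"}, "annotations": {"summary": "This is a testing alert!"}}]'
```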

_API v2 is still under heavy development and therefore subject to change._

## amtool

`amtool` is a CLI tool for interacting with the Alertmanager API. It is bundled with all releases of Alertmanager.

### Install

Alternatively, you can install it with:
```
$ go install github.com/prometheus/alertmanager/cmd/amtool@latest
```

### Examples

View all currently firing alerts:
```
$ amtool alert
Alertname        Starts At                Summary
Test_Alert       2017-08-02 18:30:18 UTC  This is a testing alert!
Test_Alert       2017-08-02 18:30:18 UTC  This is a testing alert!
Check_Foo_Fails  2017-08-02 18:30:18 UTC  This is a testing alert!
Check_Foo_Fails  2017-08-02 18:30:18 UTC  This is a testing alert!
```

View all currently firing alerts with extended output:
```
$ amtool -o extended alert
Labels                                        Annotations                                                    Starts At                Ends At                  Generator URL
alertname="Test_Alert" instance="node0"       link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
alertname="Test_Alert" instance="node1"       link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
alertname="Check_Foo_Fails" instance="node0"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
alertname="Check_Foo_Fails" instance="node1"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
```

In addition to viewing alerts, you can use the rich query syntax provided by Alertmanager:
```
$ amtool -o extended alert query alertname="Test_Alert"
Labels                                   Annotations                                                    Starts At                Ends At                  Generator URL
alertname="Test_Alert" instance="node0"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
alertname="Test_Alert" instance="node1"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local

$ amtool -o extended alert query instance=~".+1"
Labels                                        Annotations                                                    Starts At                Ends At                  Generator URL
alertname="Test_Alert" instance="node1"       link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
alertname="Check_Foo_Fails" instance="node1"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local

$ amtool -o extended alert query alertname=~"Test.*" instance=~".+1"
Labels                                   Annotations                                                    Starts At                Ends At                  Generator URL
alertname="Test_Alert" instance="node1"  link="https://example.com" summary="This is a testing alert!"  2017-08-02 18:31:24 UTC  0001-01-01 00:00:00 UTC  http://my.testing.script.local
```

Silence an alert:
```
$ amtool silence add alertname=Test_Alert
b3ede22e-ca14-4aa0-932c-ca2f3445f926

$ amtool silence add alertname="Test_Alert" instance=~".+0"
e48cb58a-0b17-49ba-b734-3585139b1d25
```

View silences:
```
$ amtool silence query
ID                                    Matchers              Ends At                  Created By  Comment
b3ede22e-ca14-4aa0-932c-ca2f3445f926  alertname=Test_Alert  2017-08-02 19:54:50 UTC  kellel

$ amtool silence query instance=~".+0"
ID                                    Matchers                            Ends At                  Created By  Comment
e48cb58a-0b17-49ba-b734-3585139b1d25  alertname=Test_Alert instance=~.+0  2017-08-02 22:41:39 UTC  kellel
```

Expire a silence:
```
$ amtool silence expire b3ede22e-ca14-4aa0-932c-ca2f3445f926
```

Expire all silences matching a query:
```
$ amtool silence query instance=~".+0"
ID                                    Matchers                            Ends At                  Created By  Comment
e48cb58a-0b17-49ba-b734-3585139b1d25  alertname=Test_Alert instance=~.+0  2017-08-02 22:41:39 UTC  kellel

$ amtool silence expire $(amtool silence query -q instance=~".+0")

$ amtool silence query instance=~".+0"

```

Expire all silences:
```
$ amtool silence expire $(amtool silence query -q)
```

To try out how a template works, let's say you have this in your configuration file:
```
templates:
  - '/foo/bar/*.tmpl'
```

Then you can test how a template would render with example data by using this command:
```
amtool template render --template.glob='/foo/bar/*.tmpl' --template.text='{{ template "slack.default.markdown.v1" . }}'
```

### Configuration

`amtool` allows a configuration file to specify some options for convenience. The default configuration file paths are `$HOME/.config/amtool/config.yml` or `/etc/amtool/config.yml`.

An example configuration file might look like the following:

```
# Define the URL at which `amtool` can find your `alertmanager` instance
alertmanager.url: "http://localhost:9093"

# Override the default author. (unset defaults to your username)
author: me@example.com

# Force amtool to give you an error if you don't include a comment on a silence
comment_required: true

# Set a default output format. (unset defaults to simple)
output: extended

# Set a default receiver
receiver: team-X-pager
```

### Routes

`amtool` allows you to visualize the routes of your configuration in the form of a text tree view.
You can also use it to test the routing by passing it the label set of an alert;
it prints out all receivers the alert would match, ordered and separated by `,`.
(If you use `--verify.receivers`, amtool returns error code 1 on a mismatch.)

Example of usage:
```
# View routing tree of remote Alertmanager
$ amtool config routes --alertmanager.url=http://localhost:9093

# Test if alert matches expected receiver
$ amtool config routes test --config.file=doc/examples/simple.yml --tree --verify.receivers=team-X-pager service=database owner=team-X
```

## High Availability

Alertmanager's high availability is in production use at many companies and is enabled by default.

> Important: Both UDP and TCP are needed in alertmanager 0.15 and higher for the cluster to work.
>  - If you are using a firewall, make sure to whitelist the clustering port for both protocols.
>  - If you are running in a container, make sure to expose the clustering port for both protocols.

To create a highly available cluster of the Alertmanager, the instances need to
be configured to communicate with each other. This is configured using the
`--cluster.*` flags.

- `--cluster.listen-address` string: cluster listen address (default "0.0.0.0:9094"; empty string disables HA mode)
- `--cluster.advertise-address` string: cluster advertise address
- `--cluster.peer` value: initial peers (repeat flag for each additional peer)
- `--cluster.peer-timeout` value: peer timeout period (default "15s")
- `--cluster.gossip-interval` value: cluster message propagation speed
  (default "200ms")
- `--cluster.pushpull-interval` value: lower values will increase
  convergence speeds at the expense of bandwidth (default "1m0s")
- `--cluster.settle-timeout` value: maximum time to wait for cluster
  connections to settle before evaluating notifications.
- `--cluster.tcp-timeout` value: timeout value for tcp connections, reads and writes (default "10s")
- `--cluster.probe-timeout` value: time to wait for an ack before marking a node unhealthy
  (default "500ms")
- `--cluster.probe-interval` value: interval between random node probes (default "1s")
- `--cluster.reconnect-interval` value: interval between attempting to reconnect to lost peers (default "10s")
- `--cluster.reconnect-timeout` value: length of time to attempt to reconnect to a lost peer (default: "6h0m0s")

The chosen port in the `cluster.listen-address` flag is the port that needs to be
specified in the `cluster.peer` flag of the other peers.

The `cluster.advertise-address` flag is required if the instance doesn't have
an IP address that is part of [RFC 6890](https://tools.ietf.org/html/rfc6890)
with a default route.

To start a cluster of three peers on your local machine use [`goreman`](https://github.com/mattn/goreman) and the
Procfile within this repository.

    goreman start
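
If you prefer to see the flags spelled out, a manual sketch of a similar three-instance cluster on one host (ports, storage paths, and the config file name are illustrative, and not necessarily identical to the Procfile):

```
$ alertmanager --config.file=alertmanager.yml --storage.path=/tmp/am0 \
    --web.listen-address=":9093" --cluster.listen-address="127.0.0.1:8001"

$ alertmanager --config.file=alertmanager.yml --storage.path=/tmp/am1 \
    --web.listen-address=":9094" --cluster.listen-address="127.0.0.1:8002" \
    --cluster.peer=127.0.0.1:8001

$ alertmanager --config.file=alertmanager.yml --storage.path=/tmp/am2 \
    --web.listen-address=":9095" --cluster.listen-address="127.0.0.1:8003" \
    --cluster.peer=127.0.0.1:8001
```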

To point your Prometheus 1.4 or later instance to multiple Alertmanagers, configure them
in your `prometheus.yml` configuration file, for example:

```yaml
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager1:9093
      - alertmanager2:9093
      - alertmanager3:9093
```

> Important: Do not load balance traffic between Prometheus and its Alertmanagers, but instead point Prometheus to a list of all Alertmanagers. The Alertmanager implementation expects all alerts to be sent to all Alertmanagers to ensure high availability.

### Turn off high availability

If running Alertmanager in high availability mode is not desired, setting `--cluster.listen-address=` prevents Alertmanager from listening for incoming peer requests.
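
A minimal single-instance invocation with clustering disabled might then look like this (the config file name is illustrative):

```
$ alertmanager --config.file=alertmanager.yml --cluster.listen-address=
```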

## Contributing

Check the [Prometheus contributing page](https://github.com/prometheus/prometheus/blob/main/CONTRIBUTING.md).

To contribute to the user interface, refer to [ui/app/CONTRIBUTING.md](ui/app/CONTRIBUTING.md).

## Architecture

![](doc/arch.svg)

## License

Apache License 2.0, see [LICENSE](https://github.com/prometheus/alertmanager/blob/main/LICENSE).

[hub]: https://hub.docker.com/r/prom/alertmanager/
[circleci]: https://circleci.com/gh/prometheus/alertmanager
[quay]: https://quay.io/repository/prometheus/alertmanager
