label: Config Connector MonitoringAlertPolicy
markdownDescription: Creates yaml for a MonitoringAlertPolicy resource
insertText: |
  apiVersion: monitoring.cnrm.cloud.google.com/v1beta1
  kind: MonitoringAlertPolicy
  metadata:
    labels:
      \${1:checking}: \${2:website-health}
      \${3:oncall-treatment}: \${4:stay-aware}
    name: \${5:monitoringalertpolicy-name}
  spec:
    displayName: \${6:Sample Website Network Connectivity Alert Policy}
    enabled: \${7:true}
    notificationChannels:
    - name: \${8:monitoringalertpolicy-dep1-networkconnectivity}
    - name: \${9:monitoringalertpolicy-dep2-networkconnectivity}
    combiner: \${10:OR}
    conditions:
    - displayName: \${11:Failure of uptime check_id uptime-check-for-google-cloud-site}
      conditionThreshold:
        filter: \${12:metric.type="monitoring.googleapis.com/uptime_check/check_passed"
          AND metric.label.check_id="uptime-check-for-google-cloud-site" AND resource.type="uptime_url"}
        aggregations:
        - perSeriesAligner: \${13:ALIGN_NEXT_OLDER}
          alignmentPeriod: \${14:1200s}
          crossSeriesReducer: \${15:REDUCE_COUNT_FALSE}
          groupByFields:
          - \${16:resource.label.*}
        comparison: \${17:COMPARISON_GT}
        thresholdValue: \${18:1}
        duration: \${19:600s}
        trigger:
          count: \${20:1}
    - displayName: \${21:SSL Certificate for google-cloud-site expiring soon}
      conditionThreshold:
        filter: \${22:metric.type="monitoring.googleapis.com/uptime_check/time_until_ssl_cert_expires"
          AND metric.label.check_id="uptime-check-for-google-cloud-site" AND resource.type="uptime_url"}
        aggregations:
        - alignmentPeriod: \${23:1200s}
          perSeriesAligner: \${24:ALIGN_NEXT_OLDER}
          crossSeriesReducer: \${25:REDUCE_MEAN}
          groupByFields:
          - \${26:resource.label.*}
        comparison: \${27:COMPARISON_LT}
        thresholdValue: \${28:15}
        duration: \${29:600s}
        trigger:
          count: \${30:1}
    - displayName: \${31:Uptime check running}
      conditionAbsent:
        filter: \${32:metric.type="monitoring.googleapis.com/uptime_check/check_passed"
          AND metric.label.check_id="uptime-check-for-google-cloud-site" AND resource.type="uptime_url"}
        duration: \${33:3900s}
    - displayName: \${34:Ratio of HTTP 500s error-response counts to all HTTP response
        counts}
      conditionThreshold:
        filter: \${35:metric.label.response_code>="500" AND metric.label.response_code<"600"
          AND metric.type="appengine.googleapis.com/http/server/response_count" AND
          resource.type="gae_app"}
        aggregations:
        - alignmentPeriod: \${36:300s}
          perSeriesAligner: \${37:ALIGN_DELTA}
          crossSeriesReducer: \${38:REDUCE_SUM}
          groupByFields:
          - \${39:project}
          - \${40:resource.label.module_id}
          - \${41:resource.label.version_id}
        denominatorFilter: \${42:metric.type="appengine.googleapis.com/http/server/response_count"
          AND resource.type="gae_app"}
        denominatorAggregations:
        - alignmentPeriod: \${43:300s}
          perSeriesAligner: \${44:ALIGN_DELTA}
          crossSeriesReducer: \${45:REDUCE_SUM}
          groupByFields:
          - \${46:project}
          - \${47:resource.label.module_id}
          - \${48:resource.label.version_id}
        comparison: \${49:COMPARISON_GT}
        thresholdValue: \${50:0.5}
        duration: \${51:0s}
        trigger:
          count: \${52:1}
    documentation:
      content: |-
        \${53:This sample is a synthesis of policy samples found at https://cloud.google.com/monitoring/alerts/policies-in-json. It is meant to illustrate what is possible rather than to serve as a realistic alerting policy on its own.

        Combiner OR
        A policy with the OR combiner triggers an incident when any of its conditions is met. OR should be considered the default for most purposes.

        Uptime-check conditions
        The first three conditions in this policy involve an uptime check with the ID 'uptime-check-for-google-cloud-site'.

        The first condition, "Failure of uptime check_id uptime-check-for-google-cloud-site", tests whether the uptime check fails.
        The second condition, "SSL Certificate for google-cloud-site expiring soon", tests whether the SSL certificate on the Google Cloud site will expire in under 15 days.

        Metric-absence condition
        The third condition in this policy, "Uptime check running", tests whether the uptime check metric has received no data for approximately an hour.
        Unlike the previous conditions, this one uses conditionAbsent rather than conditionThreshold, because it tests for the absence of metric data.

        Metric ratio
        The fourth and last condition in this policy, "Ratio of HTTP 500s error-response counts to all HTTP response counts", triggers when 5XX error responses make up more than half of all HTTP responses. It uses App Engine metrics rather than the uptime-check metrics used by the other conditions.

        Altogether, this policy monitors for any of the above conditions threatening the health of the website.}