osp-alarms
Prometheus Supervision (Beta)
Warning
The Prometheus supervision feature is in beta; we would be very glad to receive feedback at info@swissdotnet.ch. Its behavior can change without notice in future versions.
The module exposes a Prometheus endpoint on TCP port 9100 and provides the metrics listed below.
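To collect these metrics, point Prometheus at the module's endpoint. A minimal scrape configuration sketch, assuming the module is reachable under the hostname `osp-alarms` (adapt the target to your deployment):

```yaml
scrape_configs:
  - job_name: 'osp-alarms'
    static_configs:
      # Hypothetical hostname; replace with the host running the alarms module.
      # The metrics are exposed on TCP port 9100.
      - targets: ['osp-alarms:9100']
```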
Insertion

- insertion_execution_seconds
- processing_execution_seconds, with the following labels:
  - buffer
  - total_batch
  - batch
  - pre-alarms
  - alarm
  - check
  - aggregate
  - post-alarms
  - write
- pre_insert_rule_execution_seconds, with a label (itemId) for each pre-insert rule.
- processing_batch_size, with the following labels:
  - total_batch
  - batch
- buffer_size, measured before running the processing.
History

- history_execution_seconds

Deduplication

- deduplication_execution_seconds
Action

- action_rule_execution_seconds, with a label (itemId) for each action rule.
- alarm_action_execution_seconds, with the following labels:
  - acknowledge
  - escalate
  - lock
  - unlock
  - clear
  - tag
  - untag
  - journal
  - edit
  - create
  - insert
Front view

- deduplicated_change_execution_seconds, with the following labels:
  - step:
    - find: the duration of the find on the database.
    - compute: the duration of computing the difference between two states of the alarms, so that only the difference is published to the front-end.
  - filter: a label for each filter.
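As an illustration, the step label can be used to break the front-view duration down per step. A sketch of a Prometheus recording rule, assuming the standard histogram series names (`_sum`, `_count`) derived from the metric above; the record name is hypothetical:

```yaml
groups:
  - name: osp-alarms-front-view
    rules:
      # Average duration of each deduplication step over the last
      # 5 minutes, broken down by the step label (find / compute).
      - record: step:deduplicated_change_execution_seconds:avg5m
        expr: |
          sum(rate(deduplicated_change_execution_seconds_sum[5m])) by (step)
          /
          sum(rate(deduplicated_change_execution_seconds_count[5m])) by (step)
```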
Each metric is a histogram with the following buckets (the 0.05 s bucket, for example, contains the number of computations that take up to 0.05 s to execute):
- 0.001 seconds
- 0.05 seconds
- 0.1 seconds
- 0.25 seconds
- 0.5 seconds
- 0.75 seconds
- 1.0 seconds
- 5.0 seconds
- 10.0 seconds
- 20.0 seconds
- 60.0 seconds
- 300.0 seconds
- inf
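These buckets can be queried with histogram_quantile to estimate percentiles. A sketch of a recording rule computing the 95th percentile of the insertion duration, assuming the standard Prometheus `_bucket` series naming; the record name is hypothetical:

```yaml
groups:
  - name: osp-alarms-latency
    rules:
      # Estimated 95th percentile of insertion_execution_seconds,
      # computed from the histogram buckets over the last 5 minutes.
      - record: job:insertion_execution_seconds:p95
        expr: |
          histogram_quantile(0.95,
            sum(rate(insertion_execution_seconds_bucket[5m])) by (le))
```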
An example of a Grafana dashboard: Dashboard
Performance considerations

The following points can impact the performance of alarm handling:

- The size of the alarms.
- The complexity of the filters.
- The number of occurrences.

The main symptom is a long delay before a change is displayed on the front-end. The Grafana dashboard can help detect these points; see the alerting sketch below for a way to be notified automatically.
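A sketch of a Prometheus alerting rule watching the front-view latency; the alert name, the 5-second threshold, and the 10-minute duration are assumptions to tune to your installation, and the `_bucket` series follows the standard histogram convention:

```yaml
groups:
  - name: osp-alarms-performance
    rules:
      - alert: SlowAlarmFrontEndUpdates
        # Fires when the estimated 95th percentile of the time needed to
        # publish a change to the front-end stays above 5s for 10 minutes.
        expr: |
          histogram_quantile(0.95,
            sum(rate(deduplicated_change_execution_seconds_bucket[5m])) by (le)) > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Alarm front-end updates are slow
```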
Migration

The migration must be performed to adapt the alarms to the new deduplication: the ids used have changed in order to improve performance.
Database migration from a previous version must be done for:

- <= 0.7.18 to 0.7.19
- 1.0.0 and 1.0.1 to 1.0.2
The alarms module includes an automatic migration feature, but it can be slow if the database holds a lot of data. For big databases (more than 3 GB of data), we strongly recommend using the migration tool.
Warning
The alarms module must not be started in the new version before the migration tool has been started; doing so can cause errors during the migration process.
The tool is available as a Docker image: `nexus.onsphere.ch/osp-alarms-migration:<version>`.
It has the following parameters:
```
usage: Alarms migration [-h] [--connectionString CONNECTIONSTRING] [--database DATABASE] [--rollback]
                        [--disable-move] collection [collection ...]

Migrate alarms id and severity lock from 0.7.18 to 0.7.19.
This tool will copy each collection to a new one (example -> example_old).
Then it will migrate them back to the original collection.

positional arguments:
  collection            A collection on which run the migration

named arguments:
  -h, --help            show this help message and exit
  --connectionString CONNECTIONSTRING
                        The connection string for mongo database. (default: mongodb://localhost)
  --database DATABASE   Name of the database (default: alarms)
  --rollback            Rollback the migration (default: false)
  --disable-move        Don't move the existing collection to collection_old, only apply the change (default: false)
```
Warning
The migration must not be run on the deduplicated collection.
Note
You can see the collections created in the alarms database by running `show collections`. The collections to migrate are the following:
- archiveFrom*To*
- buffer
- failure
- history
- ignored
- live
The following approaches are possible:
Run it on the command line:

```
docker run --rm -it nexus.onsphere.ch/osp-alarms-migration:<version> --connectionString mongodb://modules_mongodb_osp-mongo-1/?replicaSet=sdn0 --database=alarms history live
```
Run it as a one-time service (from inside the stack, or as an external service attached to the osp-stack-1_mongo network):

```yaml
osp-alarms-migration:
  image: nexus.onsphere.ch/osp-alarms-migration:<version>
  deploy:
    replicas: 1
    restart_policy:
      condition: none
  command: --connectionString mongodb://modules_mongodb_osp-mongo-1/?replicaSet=sdn0 --database=alarms history live
  networks:
    - mongo
```
You can use one of the following strategies for the migration:

- Migrate all collections with the migration utility. (All alarms will disappear and reappear as the migration progresses.)
- Migrate all collections other than live and buffer with the migration utility.
- Don't use the migration utility and let the module do all the work (acceptable for small installations). The module may restart multiple times during the migration.
List of configuration files

| Filename | Short description | Format | Documentation |
|---|---|---|---|
| module.service | Each service is described in its own file and then assembled | yml | See the Swarm administration or Official documentation |
| module.alarms | The module description | json | |
| severity.ospp | Defines the properties of a severity (name, description). | json | |
| severity.alarms | Defines a severity for the alarms to use. | json | |
| filter.alarms | Defines the filter to apply on MongoDB. | json | |
| filter.ospp | Defines the properties of a filter (name, description). | json | |
| owner.alarms | Defines a value generated based on the current alarms. | json | |
| view.alarms | Defines the mapping of the fields between an alarm and the names defined in view.ospp. | json | |
| view.ospp | Defines the properties of a view (name, description, fields). | json | |
| pre-insert.alarms | Defines a processing to execute when a new alarm is inserted. | json | |
| action.alarms | Defines a processing to execute on a subset of the alarms, triggered either periodically or manually. | json | |
| output.alarms | Generates an alarm when a value changes. | json | |