osp-alarms

Prometheus supervision (Beta)

Warning

The Prometheus supervision feature is in beta, and we welcome feedback at info@swissdotnet.ch. Its behavior can change without notice in future versions.

The module exposes a Prometheus exporter on TCP port 9100. It provides the following metrics:

Insertion

  • insertion_execution_seconds

  • processing_execution_seconds with the following labels:

    • buffer

    • total_batch

      • batch

      • pre-alarms

      • alarm

        • check

        • aggregate

      • post-alarms

      • write

  • pre_insert_rule_execution_seconds with a label (itemId) for each pre-insert rule.

  • processing_batch_size with the following labels:

    • total_batch

    • batch

  • buffer_size: the size of the buffer, measured before the processing runs.

History

  • history_execution_seconds

Deduplication

  • deduplication_execution_seconds

Action

  • action_rule_execution_seconds with a label (itemId) for each action rule.

  • alarm_action_execution_seconds with the following labels:

    • acknowledge

    • escalate

    • lock

    • unlock

    • clear

    • tag

    • untag

    • journal

    • edit

    • create

    • insert

Front view

  • deduplicated_change_execution_seconds with the following labels:

    • step

      • find: The duration of the find on the db.

      • compute: The duration of computing the difference between two states of the alarms, so that only the difference is published to the front-end.

    • filter: a label for each filter.

Each metric is a histogram with the following buckets (the 0.05 s bucket contains the number of computations that take up to 0.05 s to execute):

  • 0.001 seconds

  • 0.05 seconds

  • 0.1 seconds

  • 0.25 seconds

  • 0.5 seconds

  • 0.75 seconds

  • 1.0 seconds

  • 5.0 seconds

  • 10.0 seconds

  • 20.0 seconds

  • 60.0 seconds

  • 300.0 seconds

  • inf
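
The bucket semantics described above can be illustrated with a minimal sketch. Prometheus histogram buckets are cumulative, so each bucket counts every observation less than or equal to its upper bound. The function name and the sample durations below are hypothetical and only serve the illustration; they are not produced by the module.

```python
from bisect import bisect_left

# Upper bounds (in seconds) of the buckets listed above; the last one is infinity.
BUCKETS = [0.001, 0.05, 0.1, 0.25, 0.5, 0.75, 1.0, 5.0, 10.0, 20.0, 60.0, 300.0, float("inf")]


def cumulative_bucket_counts(durations):
    """Return {upper_bound: number of durations <= upper_bound}."""
    counts = [0] * len(BUCKETS)
    for duration in durations:
        # Index of the first bucket whose upper bound is >= duration.
        first = bisect_left(BUCKETS, duration)
        # Cumulative buckets: that bucket and every larger one count the observation.
        for index in range(first, len(BUCKETS)):
            counts[index] += 1
    return dict(zip(BUCKETS, counts))


# Hypothetical execution times, in seconds.
sample = [0.0004, 0.03, 0.03, 0.2, 2.5]
counts = cumulative_bucket_counts(sample)
for bound, count in counts.items():
    print(f"le={bound}: {count}")
```

With these buckets, an execution of 0.03 s is counted in the 0.05 s bucket and in every larger one, which is why the last bucket always equals the total number of observations.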

An example Grafana dashboard: Dashboard

Performance consideration

The following points can affect the performance of alarm handling:

  • The size of alarms.

  • The complexity of filters.

  • The number of occurrences.

The main symptom is a long delay before a change is displayed on the front-end. The Grafana dashboard can help detect these issues.

Migration

A migration is required to adapt the alarms to the new deduplication. The id used has changed in order to improve performance.

A database migration from the previous version is required for the following upgrades:

  • <= 0.7.18 to 0.7.19

  • 1.0.0 and 1.0.1 to 1.0.2
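
As a rough illustration of the two upgrade paths above, the following sketch decides whether an installed version falls in a range that requires the migration. It assumes plain x.y.z version strings and assumes that any version not listed above needs no migration; the function names are hypothetical and not part of the module.

```python
def parse_version(text):
    """Turn a string such as '0.7.18' into the comparable tuple (0, 7, 18)."""
    return tuple(int(part) for part in text.split("."))


def needs_migration(installed):
    """Return True if upgrading from `installed` requires the database migration."""
    version = parse_version(installed)
    # First path: any version up to and including 0.7.18 (target 0.7.19).
    if version <= (0, 7, 18):
        return True
    # Second path: 1.0.0 and 1.0.1 (target 1.0.2).
    return version in {(1, 0, 0), (1, 0, 1)}


for candidate in ["0.7.18", "0.7.19", "1.0.1", "1.0.2"]:
    print(candidate, needs_migration(candidate))
```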

The alarms module includes an automatic migration feature, but it can be slow if the database holds a lot of data. For large databases (more than 3 GB of data), we strongly recommend using the migration tool.

Warning

The alarms module must not be started in the new version before the migration tool is started. Doing so can cause errors during the migration process.

The tool is available as a Docker image: nexus.onsphere.ch/osp-alarms-migration:<version>. It accepts the following parameters:

usage: Alarms migration [-h] [--connectionString CONNECTIONSTRING] [--database DATABASE] [--rollback]
                        [--disable-move] collection [collection ...]

Migrate alarms id and severity lock from 0.7.18 to 0.7.19.
This tool will copy each collection to a new one (example -> example_old).
Then it will migrate them back to the original collection.


positional arguments:
  collection             A collection on which run the migration

named arguments:
  -h, --help             show this help message and exit
  --connectionString CONNECTIONSTRING
                        The connection string for mongo database. (default: mongodb://localhost)
  --database DATABASE    Name of the database (default: alarms)
  --rollback             Rollback the migration (default: false)
  --disable-move         Don't move the existing collection  to  collection_old,  only apply the change (default:
                        false)

Warning

The migration must not be run on the deduplicated collection.

Note

You can list the collections in the alarms database by running show collections. The collections to migrate are the following:

  • archiveFrom*To*

  • buffer

  • failure

  • history

  • ignored

  • live
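
Selecting the collections to pass to the tool can be sketched as below. This is only an illustration: the list of existing collections is a hypothetical result of show collections, and the archive collection name is invented to match the archiveFrom*To* pattern. The deduplicated collection is left out because it matches none of the patterns.

```python
from fnmatch import fnmatchcase

# Name patterns taken from the list above; "archiveFrom*To*" is a wildcard pattern.
MIGRATABLE_PATTERNS = ["archiveFrom*To*", "buffer", "failure", "history", "ignored", "live"]


def collections_to_migrate(existing):
    """Keep only the collection names that match one of the migratable patterns."""
    return [
        name
        for name in existing
        if any(fnmatchcase(name, pattern) for pattern in MIGRATABLE_PATTERNS)
    ]


# Hypothetical output of `show collections`.
existing = ["archiveFrom2021To2022", "buffer", "deduplicated", "history", "live", "live_old"]
selected = collections_to_migrate(existing)
print(" ".join(selected))
```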

The tool can be run in the following ways:

  • Run it on the command line: docker run --rm -it nexus.onsphere.ch/osp-alarms-migration:<version> --connectionString mongodb://modules_mongodb_osp-mongo-1/?replicaSet=sdn0 --database=alarms history live

  • Run it as a one-time service (from inside the stack, or as an external service using the osp-stack-1_mongo network):

    osp-alarms-migration:
      image: nexus.onsphere.ch/osp-alarms-migration:<version>
      deploy:
        replicas: 1
        restart_policy:
          condition: none
      command: --connectionString mongodb://modules_mongodb_osp-mongo-1/?replicaSet=sdn0 --database=alarms history live
      networks:
        - mongo
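
When the migration is scripted, the command line can be assembled programmatically. The sketch below only uses the flags shown in the tool's usage text and their documented defaults; the helper name migration_command is hypothetical, and the image tag is left as the <version> placeholder.

```python
import shlex


def migration_command(version, collections, connection_string="mongodb://localhost",
                      database="alarms", rollback=False, disable_move=False):
    """Build the docker argument list for the migration tool."""
    if not collections:
        # The usage text requires at least one positional collection.
        raise ValueError("at least one collection is required")
    args = [
        "docker", "run", "--rm", "-it",
        f"nexus.onsphere.ch/osp-alarms-migration:{version}",
        "--connectionString", connection_string,
        "--database", database,
    ]
    if rollback:
        args.append("--rollback")
    if disable_move:
        args.append("--disable-move")
    args.extend(collections)
    return args


cmd = migration_command(
    "<version>",
    ["history", "live"],
    connection_string="mongodb://modules_mongodb_osp-mongo-1/?replicaSet=sdn0",
)
# shlex.join quotes arguments that contain shell-special characters such as "?".
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps each argument separate, so it can be passed directly to subprocess.run without shell quoting issues.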
    

You can use one of the following strategies for the migration:

  • Migrate all collections with the migration utility. (All alarms will disappear and reappear as the migration progresses.)

  • Migrate all collections other than live and buffer with the migration utility.

  • Don’t use the migration utility and let the module do all the work (fine for small installations). The module can restart multiple times during the migration.

List of configuration files

| Filename | Short description | Format | Documentation |
|---|---|---|---|
| module.service | Each service is described in its own file and then assembled. | yml | See the Swarm administration or Official documentation |
| module.alarms | The module description. | json | module.alarms |
| severity.ospp | Defines the properties of a severity (name, description). | json | severity.ospp |
| severity.alarms | Defines a severity for the alarms to use. | json | severity.alarms |
| filter.alarms | Defines the filter to apply on MongoDB. | json | filter.alarms |
| filter.ospp | Defines the properties of a filter (name, description). | json | filter.ospp |
| owner.alarms | Defines a value generated based on the current alarms. | json | owner.alarms |
| view.alarms | Defines the mapping of the fields between an alarm and the names defined in view.ospp. | json | view.alarms |
| view.ospp | Defines the properties of a view (name, description, fields). | json | view.ospp |
| pre-insert.alarms | Defines a processing to execute when a new alarm is inserted. | json | pre-insert.alarms |
| action.alarms | Defines a processing to execute on a subset of the alarms, triggered either periodically or manually. | json | action.alarms |
| output.alarms | Generates an alarm when a value changes. | json | output.alarms |