Alarms¶
Alarms represent events and can be queried through filters and views. They are stored in a MongoDB instance.
When an alarm is created, it includes the following fields:
The creation time
A summary describing it
A location
A source indicating its origin
A severity indicating its criticality
An alarm can be created from multiple sources; the most common are:
A script
An SNMP trap
A value
An ip-rct
Because an alarm can evolve over time, we provide history and deduplication mechanisms.
History: every modification of an alarm can be displayed.
Deduplication: two occurrences of the same alarm are merged to give a more concise view of the current state.
What can be done¶
Modification¶
An alarm can be modified either by an automated action or by a user action.
The possible actions are:
Change the summary
Change the location
Change the source
Increase or decrease the severity
Add a journal entry
Acknowledge it
Un-acknowledge it
Add or remove a tag
Edit the additional data
Remove an alarm
Publication¶
A modification of one or more alarms can trigger the publication of one or more values. The publication can generate the following values:
The current alarm count
The minimum or maximum severity
The list of the modified or created alarms
Every value generated by the alarms is based on a filter.
Filter¶
A filter is used to observe a subset of all alarms, for example to react only to alarms located in Switzerland.
Display¶
To show alarms to a user, we use a combination of filter and view.
View¶
A view defines how to display a field and whether or not it should be displayed. See the alarm table widget for more information.
Processing¶
Before insertion¶
During the insertion process, a series of scripts can be run to analyze the incoming alarms and trigger different mechanisms:
Add or modify alarms based on values, current alarms or state.
Reject alarms.
On existing alarms¶
Some scripts can be triggered either periodically or manually. They can add, modify or remove alarms based on parameters, values, current alarms or state.
Usage¶
Monitor an infrastructure¶
The state of an infrastructure can be represented in OnSphere. When this state is not correct, an alarm can be generated to notify users that a problem has appeared. An infrastructure can also report problems through SNMP traps or webhooks, which create alarms.
Examples¶
Manage the maintenance¶
When a device has a problem, it can be useful to inhibit its alarms until a technician has fixed the device.
How it is done¶
Requirement¶
Module osp-alarms
Alarms persistence¶
The current and past alarms are stored in a MongoDB database.
Note
By default, we provide a MongoDB instance with the alarms module. It is not recommended for production, as the instance is configured for testing purposes without any access management.
Warning
MongoDB defines its own collections, which are not the Collections defined by OnSphere. MongoDB collections are the equivalent of SQL tables.
Alarms are mainly stored in five kinds of collections.
Buffer: This collection is used as a buffer for fast insertion. Elements in this collection are not yet considered to be part of the alarms.
Live: This collection stores the current “live” alarms. Alarms in this collection are used to generate a deduplicated view of the alarms. All the actions possible on alarms are applied to this collection (escalate, add journal entry, …).
Deduplicated: This collection stores the result of the deduplication process. Alarms in this collection are deduplicated (merged) and are used to generate alarm list views and logic.
History: This collection stores the whole history of received alarms. Alarms in this collection are used to generate the alarm history list views. The history only keeps the most recent alarms; older ones are moved to the archive collections.
Archive: These collections store the older parts of the history. They are automatically generated for a defined time period (for example, archiveFrom01012023To03312023 for the alarms between 1 January 2023 and 31 March 2023).
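The archive name in the example appears to encode both period boundaries as MMddyyyy. The Python sketch below rebuilds such a name from two dates; the archive_collection_name helper is hypothetical, and the pattern is inferred from the single example above rather than taken from the module's code.

```python
from datetime import date


def archive_collection_name(start: date, end: date) -> str:
    """Build an archive collection name such as archiveFrom01012023To03312023.

    Assumption: both boundaries use the MMddyyyy pattern seen in the example.
    """
    fmt = "%m%d%Y"  # zero-padded month, zero-padded day, four-digit year
    return f"archiveFrom{start.strftime(fmt)}To{end.strftime(fmt)}"


if __name__ == "__main__":
    print(archive_collection_name(date(2023, 1, 1), date(2023, 3, 31)))
```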
Detailed content of an alarm¶
Alarms exist in two formats: one used in the live collection and one used in the deduplicated collection. They share most of their fields, with a few differences.
Common fields

| Field name | Description |
|---|---|
| serial | Unique identifier of an alarm. It is used for alarm deduplication (i.e., all alarms with the same serial are merged). |
| summary | The description of the alarm. |
| severity | The alarm severity (numeric value), as defined in severity.ospp. |
| source | The origin of the alarm. For example: router-1, value 1. |
| location | The location of the alarm. For example: Corminboeuf, Showroom. |
| tags | An array of tags associated with the alarm. A tag is represented as a String. |
| acknowledged | The acknowledgement status (boolean value). |
| userTags | The list of tags added by users. |
| hideUntil | A timestamp in nanoseconds indicating until when the alarm should be hidden. |
| forceHide | A boolean indicating whether the alarm must be hidden. |
| journal | A list of entries recording the actions performed, comments on the current state, and so on. |
| additionalData | Any other useful information, as key/value pairs. |
Live fields

| Field name | Description |
|---|---|
| timestamp | The occurrence timestamp of the alarm. |
| userUntags | The list of tags removed by users. |
Deduplicated fields

| Field name | Description |
|---|---|
| firstTimestamp | The first occurrence timestamp of the alarm, as a long representing nanoseconds. |
| lastTimestamp | The last occurrence timestamp of the alarm, as a long representing nanoseconds. |
| origSeverity | The original severity of the alarm (numeric value). |
| isSeverityLocked | Flag indicating whether the severity is locked. |
| count | The total number of alarms using this serial. |
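To make the field lists concrete, the following Python sketch builds a dictionary shaped like a live alarm (common fields plus live fields). The make_live_alarm helper is hypothetical, the default values (empty lists, hideUntil set to 0, not acknowledged) are assumptions, and the live timestamp is assumed to be in nanoseconds like the deduplicated timestamps. In practice, alarms are created through the insertion mechanisms described below.

```python
import time
from typing import Any, Dict, List, Optional


def make_live_alarm(serial: str, summary: str, severity: int, source: str,
                    location: str, tags: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return a dict shaped like a live alarm (hypothetical defaults)."""
    return {
        # Common fields
        "serial": serial,            # alarms sharing a serial are merged
        "summary": summary,
        "severity": severity,        # numeric, e.g. 500 for Major
        "source": source,
        "location": location,
        "tags": list(tags or []),
        "acknowledged": False,
        "userTags": [],
        "hideUntil": 0,              # nanoseconds; 0 = not hidden (assumption)
        "forceHide": False,
        "journal": [],
        "additionalData": {},
        # Live-only fields
        "timestamp": time.time_ns(), # occurrence time, assumed in nanoseconds
        "userUntags": [],
    }
```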
Insertion¶
There are two ways to insert alarms into the database: standard insertion and direct insertion.
Direct insertion¶
Direct insertion is used for internal and scripted alarm insertion. It inserts alarms directly into the database, without any pre-processing. Alarms inserted this way are inserted one by one and the insertion result is returned to the caller, so this mechanism is not designed to insert large numbers of alarms efficiently.
Standard insertion¶
The standard insertion mechanism, on the other hand, is designed to efficiently insert batches of alarms, and it allows pre-processing alarms right before they are inserted so they can be filtered out or enhanced with additional information.
The standard insertion pipeline has three steps:
Buffer
Incoming alarms are first inserted into a buffer collection without any pre-processing operation to keep those insertions as efficient as possible. This buffer is stored on the disk, so alarms are guaranteed to be handled by the system once they are inserted in it, even if the osp-alarms module restarts.
Every 500 ms, or when the buffer holds enough alarms, a fixed-size batch of alarms is taken from the buffer and passed to the rest of the pipeline.
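The polling rule can be sketched in Python as follows. This models the buffer as an in-memory deque rather than a MongoDB collection; FLUSH_INTERVAL_MS comes from the text, while BATCH_SIZE and the oldest-first order are assumptions, since the actual batch size is not given here.

```python
from collections import deque
from typing import Deque, List

FLUSH_INTERVAL_MS = 500  # from the documentation
BATCH_SIZE = 1000        # hypothetical: the real size is a module setting


def should_flush(elapsed_ms: int, buffered: int, batch_size: int = BATCH_SIZE) -> bool:
    """Take a batch when the interval has elapsed or a full batch is waiting."""
    return elapsed_ms >= FLUSH_INTERVAL_MS or buffered >= batch_size


def take_batch(buffer: Deque[dict], batch_size: int = BATCH_SIZE) -> List[dict]:
    """Remove and return at most batch_size alarms, oldest first (assumed order)."""
    batch: List[dict] = []
    while buffer and len(batch) < batch_size:
        batch.append(buffer.popleft())
    return batch
```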
Pre-insertion
The batch of alarms taken from the buffer is then given to the pre-insertion rules. For each alarm of the batch, each pre-insertion rule evaluates a condition to decide whether it must handle the alarm, and all pre-insertion rules matching the alarm are applied to it. Alarms not matching any pre-insertion rule are simply kept for insertion.
Pre-insertion rules can:
insert a new alarm (directly or by injecting it into the buffer)
remove an alarm
update the alarm and forward it further in the pipeline
drop the alarm
or any combination of these.
Operations execution
Once all the alarms have been handled by the pre-insertion rules, all operations requested by the rules are applied to the database within a transaction. In particular, alarms that have not been filtered out are inserted into the live collection, so they can be used in the deduplication process.
Alarm deduplication¶
Alarm deduplication is a process that aggregates individual alarms from the live collection into deduplicated alarms representing the current state of an alarm based on its occurrences. For example, a deduplicated alarm can contain the number of occurrences, the first and last occurrence, the highest severity, etc…
Alarm deduplication is an automatic process triggered from changes in the live collection.
To improve performance, deduplication triggering is buffered: if many modifications are performed in a short amount of time, they are grouped and the deduplication process is not triggered for every single modification.
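As an illustration of what deduplication computes, the Python sketch below groups live alarms by serial and derives the count, the first and last occurrence, and the highest severity (the example aggregations mentioned above). It is a simplified in-memory model: the real process runs as a MongoDB aggregation driven by the configured deduplication fields, roughly comparable to a $group on serial with $sum, $min and $max accumulators.

```python
from typing import Dict, Iterable


def deduplicate(live_alarms: Iterable[dict]) -> Dict[str, dict]:
    """Group live alarms by serial into deduplicated summaries."""
    result: Dict[str, dict] = {}
    for alarm in live_alarms:
        serial = alarm["serial"]
        ts, sev = alarm["timestamp"], alarm["severity"]
        dedup = result.get(serial)
        if dedup is None:
            # First occurrence seen for this serial
            result[serial] = {"serial": serial, "firstTimestamp": ts,
                              "lastTimestamp": ts, "severity": sev, "count": 1}
        else:
            dedup["firstTimestamp"] = min(dedup["firstTimestamp"], ts)
            dedup["lastTimestamp"] = max(dedup["lastTimestamp"], ts)
            dedup["severity"] = max(dedup["severity"], sev)  # keep the highest severity
            dedup["count"] += 1
    return result
```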
Note
The deduplication fields directly define the deduplicated alarms. Before adding a new field, we highly recommend reading the MongoDB aggregation documentation. The Mongo Playground is a good tool for trying out a solution beforehand.
Note
Deduplication fields are defined in two steps. The first, groupingPolicy, describes how to aggregate the alarms' content. The second, processingPolicy, describes how to build the field; by default, it simply puts the result under the fieldName field. If multiple steps define the same field, steps with a processingPolicy take precedence over those without one, and after that the last defined step is used.
View¶
A view filters what is shown to users. It can be used to display only the information useful to the operator and to keep sensitive information private.
Filter¶
A filter defines a dynamic sub-set of alarms based on information available inside them. It can be used to only show alarms related to a location or type of device.
A filter can also be used to display a count, the maximum severity, the minimum severity or the count for a specific severity. These values make it easy to build an overview of the system.
Query¶
Filter queries are written using MongoDB query operators.
The simplest query filter is the empty filter {}, which returns all alarms without filtering.
A slightly more advanced filter could be to only retrieve alarms from a particular source:
```json
{
  "source": {
    "$eq": "localhost"
  }
}
```
This filter returns all alarms whose source is localhost. It can be abbreviated as:
```json
{
  "source": "localhost"
}
```
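To show what such queries select, here is a tiny Python evaluator for a handful of MongoDB query operators ($eq, $ne, $gte, $lte, $in) plus the implicit-equality shorthand. It is only a sketch: actual filters are evaluated by MongoDB, and this version ignores MongoDB's array semantics and fails with a KeyError on operators it does not know.

```python
from typing import Any, Callable, Dict

# Supported operators: each receives (field value, operator argument).
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$gte": lambda value, arg: value is not None and value >= arg,
    "$lte": lambda value, arg: value is not None and value <= arg,
    "$in": lambda value, arg: value in arg,
}


def matches(alarm: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Return True if the alarm satisfies every condition of the query."""
    for field, condition in query.items():
        value = alarm.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"source": {"$eq": "localhost"}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Shorthand form, e.g. {"source": "localhost"} means $eq
            return False
    return True  # the empty filter {} matches every alarm
```

Both forms of the localhost filter select the same alarms, and conditions on several fields are combined with an implicit AND.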
Empty clear severity¶
An alarm filter's query result might be empty (i.e., no alarm matches the filter query). However, we might still want to display the maximum severity of an empty filter (e.g., to show that everything is fine). For this purpose, alarm filters can define an “empty clear severity”, which is used as the maximum severity when the filter is empty.
As it might be cumbersome to define an “empty clear severity” for every filter, it is also possible to define one severity as the “default empty clear severity”.
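The resulting maximum-severity rule can be sketched in Python. The function name is hypothetical, the default of 100 (Clear, from the default severity table) is only an example, and the precedence (a filter's own empty clear severity before the default one) is an assumption.

```python
from typing import Iterable, Optional

DEFAULT_EMPTY_CLEAR_SEVERITY = 100  # hypothetical choice: Clear


def filter_max_severity(severities: Iterable[int],
                        empty_clear_severity: Optional[int] = None,
                        default_empty_clear: Optional[int] = DEFAULT_EMPTY_CLEAR_SEVERITY
                        ) -> Optional[int]:
    """Maximum severity of a filter, falling back to an empty clear severity."""
    values = list(severities)
    if values:
        return max(values)
    # Empty filter: the filter's own setting wins, then the global default.
    if empty_clear_severity is not None:
        return empty_clear_severity
    return default_empty_clear
```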
Values¶
Values provided by alarms are linked to a filter. New values are published after deduplication.
ALARM_COUNT: The number of alarms matching the filter.
MAX_SEVERITY: The maximum severity of the alarms matching the filter.
MIN_SEVERITY: The minimum severity of the alarms matching the filter.
ON_ALARM: The list of alarms updated and deleted since the last publication.
SEVERITY_COUNT: The number of alarms with the defined severity.
Note
Computing filter values requires performing requests on the database. Additionally, ON_ALARM values require the module to keep a list of active alarms, so the memory footprint of the module increases with every ON_ALARM value that is configured. For the best performance, only define the filter values you actually need.
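The following Python sketch computes the four numeric values from the severities of the alarms matching a filter, and the ON_ALARM difference between two publications. Function names are hypothetical, and treating new alarms as "updated" is an assumption; on_alarm_diff also shows why ON_ALARM needs the previously published alarms to be kept in memory.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def filter_values(severities: List[int], watched_severity: int) -> Dict[str, Optional[int]]:
    """Numeric values for the severities of the alarms matching one filter."""
    return {
        "ALARM_COUNT": len(severities),
        "MAX_SEVERITY": max(severities, default=None),  # None when the filter is empty
        "MIN_SEVERITY": min(severities, default=None),
        "SEVERITY_COUNT": Counter(severities)[watched_severity],
    }


def on_alarm_diff(previous: Dict[str, dict],
                  current: Dict[str, dict]) -> Tuple[List[dict], List[str]]:
    """Alarms updated (new ones included, by assumption) and serials deleted.

    Both arguments map serial -> deduplicated alarm; `previous` is the state
    kept in memory since the last publication.
    """
    updated = [alarm for serial, alarm in current.items() if previous.get(serial) != alarm]
    deleted = [serial for serial in previous if serial not in current]
    return updated, deleted
```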
Severity¶
The severity represents the criticality of alarms. When an unknown severity is used, it is replaced by the default severity.
The default severity levels are:
| Name | Value | Default | ItemId |
|---|---|---|---|
| Clear | 100 | root.alarms.severities.clear | |
| Intermediate | 200 | root.alarms.severities.intermediate | |
| Warning | 300 | root.alarms.severities.warning | |
| Minor | 400 | root.alarms.severities.minor | |
| Major | 500 | root.alarms.severities.major | |
| Critical | 600 | root.alarms.severities.critical | |
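The replacement of unknown severities can be sketched in Python using the values of the default severity table. Which level is the default is a configuration choice, so it is passed in as a parameter here; the helper name and the case-insensitive name matching are assumptions.

```python
from typing import Dict, Union

# Default severity levels (name -> value) from the severity table.
DEFAULT_SEVERITIES: Dict[str, int] = {
    "clear": 100,
    "intermediate": 200,
    "warning": 300,
    "minor": 400,
    "major": 500,
    "critical": 600,
}


def normalize_severity(severity: Union[int, str], default: int) -> int:
    """Return a known severity value; unknown severities are replaced by `default`."""
    if isinstance(severity, str):
        # Lookup by name; case-insensitive matching is an assumption.
        return DEFAULT_SEVERITIES.get(severity.lower(), default)
    return severity if severity in DEFAULT_SEVERITIES.values() else default
```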
Tip
In some cases, it may be useful to reverse the severity scale so that the highest value is the least critical, for example when you have few critical alarms but many non-critical ones.
Tip
When choosing the severities, keep in mind that the number of needed severities may grow with time.
Expiration and Escalation¶
A severity can define a policy for an automatic escalation or expiration. For example, a clear alarm can be automatically removed after 2 hours or a minor alarm can be escalated to a major one if it was not acknowledged after 30 minutes.
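The two example policies can be sketched in Python with nanosecond timestamps. This is a hypothetical model: the function name is invented, and measuring the alarm's age from lastTimestamp is an assumption, since the reference time used by the real policies is not specified here.

```python
from typing import Optional

NS_PER_MINUTE = 60 * 1_000_000_000
CLEAR, MINOR, MAJOR = 100, 400, 500  # default severity values


def apply_policies(alarm: dict, now_ns: int) -> Optional[dict]:
    """Return the alarm after the example policies, or None if it expired."""
    age_ns = now_ns - alarm["lastTimestamp"]  # reference time is an assumption
    if alarm["severity"] == CLEAR and age_ns >= 120 * NS_PER_MINUTE:
        return None  # expiration: clear alarms are removed after 2 hours
    if (alarm["severity"] == MINOR and not alarm["acknowledged"]
            and age_ns >= 30 * NS_PER_MINUTE):
        return {**alarm, "severity": MAJOR}  # escalation: minor -> major after 30 min
    return alarm
```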
Lock¶
It is possible to lock an alarm severity. When the severity of an alarm is locked, it won’t change even if new alarms are received afterward with another severity and deduplicated with the current alarm.
Locking an alarm severity can be useful in two cases (among others):
when you want to change the severity of an alarm that is generated frequently (using alarm severity escalation will only change the severity until a new alarm is received)
when receiving “flipping alarms” (i.e., alarms that are generated repeatedly within a short period of time)
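The effect of the lock during deduplication can be sketched in Python. The helper is hypothetical; computed_severity stands for whatever the configured deduplication policy would otherwise produce (for example, the latest or the highest severity).

```python
def deduplicated_severity(dedup: dict, computed_severity: int) -> int:
    """Severity stored on a deduplicated alarm after a new occurrence arrives.

    computed_severity is what the deduplication policy would produce from
    the occurrences if the severity were not locked.
    """
    if dedup.get("isSeverityLocked", False):
        return dedup["severity"]  # locked: new occurrences do not change it
    return computed_severity
```

With a locked severity, a flipping source alternating between Clear and Critical leaves the stored severity untouched.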
Outputs¶
Alarm outputs allow creating an alarm from a Value change.
An output is composed of:
A condition (optional, defaults to true) using Lua syntax
The alarm to generate when the condition is true
The alarm to generate when the condition is false (optional)
Warning
The condition must be a Lua expression that evaluates to a boolean.
The condition can access the value that triggered its evaluation through the trigger object (for example, trigger.content == true), but it cannot access other Values in the hierarchy. Alarm fields are statically defined: their content cannot be built with Lua syntax.
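The output logic can be sketched in Python, with a Python callable standing in for the Lua condition (for example, lambda t: t["content"] is True in place of trigger.content == true). The evaluate_output name is hypothetical; the boolean check mirrors the requirement that the condition evaluates to a boolean.

```python
from typing import Any, Callable, Dict, Optional

Alarm = Dict[str, Any]
Trigger = Dict[str, Any]


def evaluate_output(trigger: Trigger,
                    alarm_if_true: Alarm,
                    alarm_if_false: Optional[Alarm] = None,
                    condition: Callable[[Trigger], bool] = lambda t: True) -> Optional[Alarm]:
    """Return the alarm to generate for a Value change, or None."""
    result = condition(trigger)  # the condition defaults to true
    if not isinstance(result, bool):
        raise TypeError("the condition must evaluate to a boolean")
    if result:
        return dict(alarm_if_true)
    # The "false" alarm is optional
    return dict(alarm_if_false) if alarm_if_false is not None else None
```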
Pre-insertion rules¶
When an alarm is inserted, a pre-insertion rule can apply a conditional operation before the alarm is actually inserted.
There are three steps for each pre-insertion rule:
for: evaluated on each new alarm received to determine whether the alarm must be processed by the pre-insertion rule.
if: expresses conditions based on the live alarms collection and computes the operation to run.
thenExecute / elseExecute: the operations to run when the condition is true or when it is false.
Warning
If the if condition is false and there is no else_execute, the alarm is dropped.
Operations can be a combination of the following (operations being the third parameter of the thenExecute or elseExecute function):
operations:create(): Alarm : Returns an alarm generated from the currently processed alarm that will be inserted in live collection (direct insertion). Returned alarm can be modified before the alarm is really inserted.
operations:insert(): Alarm : Returns an alarm generated from the currently processed alarm that will be inserted in buffer collection (standard insertion). Returned alarm can be modified before the alarm is really inserted.
operations:forward(): Alarm : Returns an alarm generated from the currently processed alarm that will be forwarded to the next pre-insertion rule if any. Returned alarm can be modified before the alarm is really forwarded.
[DEPRECATED] operations:remove(criteria: string) : Finds alarms matching a MongoDB criteria that will be removed from the live collection (this operation is a bit out-of-scope of pre-insertion rules and could be easily replaced by an action rule so it may be removed in the future).
[DEPRECATED] operations:ignore() : Ignores the alarm (this is mostly equivalent to not forwarding nor inserting the alarm, and may be removed in the future).
The pre-insertion rules are applied to alarms from highest to lowest priority (1 is higher than 10), and they can be chained to form a full pre-processing pipeline.
Signature
The signatures of the methods used by pre-insertion rules are as follows:
beginBatch: functionName() : void
for: functionName(alarm: Alarm) : boolean
Alarm: A live alarm
predicate: functionName(alarm: Alarm) : String
Alarm: A live alarm
String (Mongo filter)
is: functionName(alarm: Alarm, filterResult: FilterResult) : boolean
Alarm: A live alarm
thenExecute: functionName(alarm: Alarm, filterResult: FilterResult, operations: Operations) : void
Alarm: A live alarm
elseExecute: functionName(alarm: Alarm, filterResult: FilterResult, operations: Operations) : void
Alarm: A live alarm
endBatch: functionName() : void
FilterResult
| Field | Description | Type |
|---|---|---|
| count | The number of alarms matching the filter | Integer |
| highestSeverity | The highest severity of all matching alarms | Integer |
| lowestSeverity | The lowest severity of all matching alarms | Integer |
| latestSeverity | The latest severity of all matching alarms | Integer |
| firstTimestamp | The first timestamp of all matching alarms | Timestamp |
| lastTimestamp | The last timestamp of all matching alarms | Timestamp |
| hideUntil | The last value of the field hideUntil | Integer |
| acknowledge | The last value of the field acknowledge | Boolean |
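The pre-insertion hooks can be tied together in a Python sketch of one batch going through a chain of rules. It is a simplified model, not the module's implementation: for_ and if_ are renamed because for and if are Python keywords, predicate returns a dictionary of equality conditions instead of a Mongo filter string, FilterResult carries only three of its fields, beginBatch/endBatch are omitted, and alarms that leave the last rule are assumed to go to the live collection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Alarm = Dict[str, object]


@dataclass
class FilterResult:
    count: int
    highestSeverity: Optional[int]
    lowestSeverity: Optional[int]


class Operations:
    """Mirrors create()/insert()/forward(): each returns a modifiable copy."""

    def __init__(self, current: Alarm):
        self._current = current
        self.created: List[Alarm] = []    # direct insertion into live
        self.inserted: List[Alarm] = []   # standard insertion via the buffer
        self.forwarded: List[Alarm] = []  # passed to the next rule

    def _copy_into(self, target: List[Alarm]) -> Alarm:
        alarm = dict(self._current)
        target.append(alarm)
        return alarm

    def create(self) -> Alarm:
        return self._copy_into(self.created)

    def insert(self) -> Alarm:
        return self._copy_into(self.inserted)

    def forward(self) -> Alarm:
        return self._copy_into(self.forwarded)


@dataclass
class PreInsertionRule:
    priority: int                                    # 1 runs before 10
    for_: Callable[[Alarm], bool]
    predicate: Callable[[Alarm], Dict[str, object]]  # stands in for the Mongo filter
    if_: Callable[[Alarm, FilterResult], bool]
    then_execute: Callable[[Alarm, FilterResult, Operations], object]
    else_execute: Optional[Callable[[Alarm, FilterResult, Operations], object]] = None


def filter_result(live: List[Alarm], query: Dict[str, object]) -> FilterResult:
    severities = [a["severity"] for a in live
                  if all(a.get(k) == v for k, v in query.items())]
    return FilterResult(len(severities), max(severities, default=None),
                        min(severities, default=None))


def run_batch(batch: List[Alarm], rules: List[PreInsertionRule],
              live: List[Alarm]) -> Tuple[List[Alarm], List[Alarm]]:
    """Return (alarms for the live collection, alarms re-injected into the buffer)."""
    ordered = sorted(rules, key=lambda r: r.priority)
    to_live: List[Alarm] = []
    to_buffer: List[Alarm] = []
    for alarm in batch:
        pending = [alarm]
        for rule in ordered:
            next_pending: List[Alarm] = []
            for current in pending:
                if not rule.for_(current):
                    next_pending.append(current)  # not handled: passes through
                    continue
                result = filter_result(live, rule.predicate(current))
                ops = Operations(current)
                if rule.if_(current, result):
                    rule.then_execute(current, result, ops)
                elif rule.else_execute is not None:
                    rule.else_execute(current, result, ops)
                # A handled alarm continues only if the rule forwarded it.
                to_live.extend(ops.created)
                to_buffer.extend(ops.inserted)
                next_pending.extend(ops.forwarded)
            pending = next_pending
        to_live.extend(pending)  # survivors of the chain are inserted into live
    return to_live, to_buffer
```

For instance, a maintenance rule can forward alarms from router-1 only when no live alarm with the hypothetical serial maintenance/router-1 exists; otherwise, since it has no else step, those alarms are dropped.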
Action rules¶
An action rule is used to run a task on alarms, either triggered manually or periodically.
There are two steps for each action rule:
find: select the alarms matching a criterion.
execute: run an action for each alarm found.
Actions can be a combination of the following (actions being the third parameter of the execute function):
actions:create(serial: string, severity: int): Alarm : Returns a new empty alarm that will be inserted in the live collection (direct insertion). Returned alarm can be modified before the alarm is really inserted.
actions:insert(serial: string, severity: int): Alarm : Returns a new empty alarm that will be inserted in the buffer collection (standard insertion). Returned alarm can be modified before the alarm is really inserted.
actions:remove() : Removes the alarm.
actions:acknowledge() : Acknowledges the alarm.
actions:unacknowledge() : Un-acknowledges the alarm.
actions:lock_severity(severity: int) : Locks the severity of the alarm to severity.
actions:unlock_severity() : Unlocks the severity of the alarm.
actions:tag(tags: string[]) : Adds tags to the alarm tags.
actions:untag(tags: string[]) : Removes tags from the alarm tags.
actions:escalate(severity: int) : Changes the severity of the alarm to severity.
actions:edit(summary: string, location: string, source: string, hideUntil: int, forceHide: boolean, additionalData: object) : Updates any of the fields summary, location, source, hideUntil, forceHide and additionalData of the alarm. Any field of additionalData can be added or updated, but they cannot be removed. The value null can be provided for fields that do not need to be updated.
actions:journal(message: string, user: string) : Adds a journal entry to the alarm.
Signature
The signatures of the methods used by action rules are as follows:
criteria: functionName(data: Map) : String
Map (User data provided by the trigger (front-end, script or timer). nil for the timer.)
String (Mongo filter)
beginBatch: functionName(data: Map) : void
Map (User data provided by the trigger (front-end, script or timer). nil for the timer.)
execute: functionName(alarm: DeduplicatedAlarm, data: Map, actions: Actions) : void
DeduplicatedAlarm: A deduplicated alarm
Map (User data provided by the trigger (front-end, script or timer). nil for the timer.)
endBatch: functionName() : void
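An action rule run can be sketched in Python the same way: criteria builds a query from the trigger data (None for the timer), and execute is called once per matching deduplicated alarm. The Actions class only records a few of the documented actions, the query is a dictionary of equality conditions instead of a Mongo filter string, and the example escalation rule is hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Alarm = Dict[str, Any]
Data = Optional[Dict[str, Any]]  # None when triggered by the timer


class Actions:
    """Records requested actions for one alarm (a few of the documented ones)."""

    def __init__(self, alarm: Alarm, requested: List[Tuple[Any, ...]]):
        self._serial = alarm["serial"]
        self._requested = requested

    def acknowledge(self) -> None:
        self._requested.append(("acknowledge", self._serial))

    def escalate(self, severity: int) -> None:
        self._requested.append(("escalate", self._serial, severity))

    def journal(self, message: str, user: str) -> None:
        self._requested.append(("journal", self._serial, message, user))


def run_action_rule(deduplicated: List[Alarm],
                    criteria: Callable[[Data], Dict[str, Any]],
                    execute: Callable[[Alarm, Data, Actions], None],
                    data: Data = None) -> List[Tuple[Any, ...]]:
    """find: select alarms matching criteria(data); execute: run once per alarm."""
    query = criteria(data)
    requested: List[Tuple[Any, ...]] = []
    for alarm in deduplicated:
        if all(alarm.get(k) == v for k, v in query.items()):
            execute(alarm, data, Actions(alarm, requested))
    return requested


# Hypothetical periodic rule: escalate unacknowledged Minor (400) alarms to Major (500).
def minor_unacknowledged(data: Data) -> Dict[str, Any]:
    return {"severity": 400, "acknowledged": False}


def escalate_to_major(alarm: Alarm, data: Data, actions: Actions) -> None:
    actions.escalate(500)
    actions.journal("Escalated: not acknowledged in time", "timer")
```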
Lua script function¶
Warning
Lua functions defined for the alarms module behavior cannot be declared as “local”.
Alarms Lua scripts provide the following functions and objects:
Log: this object can be used to log messages. Use the symbol {} and a list of values to inject values into the message text (e.g., log.info("Logging the value of content: {}", {{name = "Automate 1"}})). You can log messages at different log levels (the current log level depends on the scriptLogLevel defined in the module.alarms configuration file):
trace : log.trace(String message, List<> args)
debug : log.debug(String message, List<> args)
info : log.info(String message, List<> args)
warn : log.warn(String message, List<> args)
error : log.error(String message, List<> args)
Timestamp: This object is used to manipulate the timestamp of an alarm.
isOlderThan(Timestamp other): The method can be used to determine if the current timestamp is older than the other one.
isNewerThan(Timestamp other): The method can be used to determine if the current timestamp is newer than the other one.
plusMillis(int value): This method adds value milliseconds to a timestamp. (Millis can be replaced by Seconds, Minutes, Hours or Days.)
minusMillis(int value): This method subtracts value milliseconds from a timestamp. (Millis can be replaced by Seconds, Minutes, Hours or Days.)
getValue(): This method returns the underlying number of nanoseconds since the Unix epoch.
Timestamp.format(String format, int epochNano, String timezone): This method returns a formatted date in the given timezone. For example, the call Timestamp.format("yyyy-MM-dd hh:mm:ss", 1644307200000000000, "CET") produces 2022-02-08 09:00:00. The timezone is UTC by default. The following symbols are available:
| Symbol | Meaning | Presentation | Examples |
|---|---|---|---|
| G | era | text | AD; Anno Domini; A |
| u | year | year | 2004; 04 |
| y | year-of-era | year | 2004; 04 |
| D | day-of-year | number | 189 |
| M/L | month-of-year | number/text | 7; 07; Jul; July; J |
| d | day-of-month | number | 10 |
| Q/q | quarter-of-year | number/text | 3; 03; Q3; 3rd quarter |
| Y | week-based-year | year | 1996; 96 |
| w | week-of-week-based-year | number | 27 |
| W | week-of-month | number | 4 |
| E | day-of-week | text | Tue; Tuesday; T |
| e/c | localized day-of-week | number/text | 2; 02; Tue; Tuesday; T |
| F | week-of-month | number | 3 |
| a | am-pm-of-day | text | PM |
| h | clock-hour-of-am-pm (1-12) | number | 12 |
| K | hour-of-am-pm (0-11) | number | 0 |
| k | clock-hour-of-day (1-24) | number | 24 |
| H | hour-of-day (0-23) | number | 0 |
| m | minute-of-hour | number | 30 |
| s | second-of-minute | number | 55 |
| S | fraction-of-second | fraction | 978 |
| A | milli-of-day | number | 1234 |
| n | nano-of-second | number | 987654321 |
| N | nano-of-day | number | 1234000000 |
Timestamp.now(): This function returns a timestamp for the current time.
Timestamp.from(): This function creates a timestamp from nanoseconds.
values.get(String id): This function allows getting a value in the hierarchy. The value must be declared in the action or pre-insertion to be able to access it.
values.update(String id, Any content): This function updates the value in the hierarchy with the content and waits for the status.
values.blindUpdate(String id, Any content): This function updates the value in the hierarchy with the content without waiting for the status.
Severity.has(int value): This function checks if the severity exists.
Severity.has(String name): This function checks if the severity exists.
Severity.get(String name): This function returns the value of the severity.
store.get(String identifier): This function gets a value from the database; if the identifier doesn’t exist, nil is returned. The store is accessible by any script.
store.set(String identifier, any content): This function stores a value into the database. The store is accessible by any script.
actionRules.run(String id, Table data, int delay): This function calls an action rule identified by its id. The data can be used to pass arguments to the rule. The delay defines an interval in milliseconds between the call and the actual execution.
script: Run a script (module script)
Methods:
run(String id, Table data, int timeout): Runs the script and waits for it to finish.
runBlind(String id, Table data, int timeout): Starts the script and returns immediately.
Parameters:
id: The id of the script to run.
data: (Optional) Parameters to forward to the script.
timeout: (Optional) Maximum time to wait for a result, in seconds.
Returns a table:
success: boolean to indicate if the execution was successful.
message: a message returned by the script.
content: a table containing the result of the script.
Example:
```lua
script.run('root.test.script')
script.runBlind('root.test.script', {1, 2, 3})
script.run('root.test.script', {}, 10)
```
collections: Makes requests to the osp-collections module.
List requests. Return an object with a totalCount property and a collections property containing all results.
list(String schemaId, int pageSize, int pageNumber): sends a list request on a collection without any filter.
listWithFilter(String schemaId, int pageSize, int pageNumber, String filterId): sends a list request on a collection with a filter defined in the schema.
listWithCustomFilter(String schemaId, int pageSize, int pageNumber, String filter): sends a list request on a collection with a custom filter.
Get requests. Return the document (type: Map<String, Object>) found.
get(String schemaId, String documentId): sends a get request on a collection and on one specific entry.
getWithFilterId(String schemaId, String filterId): sends a get request on a collection with a filter defined in the schema. Returns the first match.
getWithCustomFilter(String schemaId, String filter): sends a get request on a collection with a custom filter. Returns the first match.
Create/Update requests. Each request returns the _id of the created/updated document.
insert(String schemaId, Map<String, Object> data): inserts a new element inside a collection.
update(String schemaId, Map<String, Object> data): updates an element inside a collection. If data doesn’t contain an _id, it will create a new element instead.
updateDiff(String schemaId, String documentId, List<Map<String, Object>> data): updates an element inside a collection by passing a list of Updates to apply.
delete(String schemaId, String documentId): deletes an element of a collection.
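The Timestamp helpers listed above can be approximated in Python to check conversions and formats before writing a script. The class mirrors a few documented method names (plus/minus shown for minutes only) and returns new objects, which is an assumption; format_nanos uses strftime directives instead of the pattern symbols (hh corresponds to %I, HH to %H), and CET is modeled as a fixed UTC+1 offset without daylight saving.

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000


class Timestamp:
    """Nanoseconds since the Unix epoch, mirroring a few documented helpers."""

    def __init__(self, nanos: int):
        self._nanos = nanos

    def getValue(self) -> int:
        return self._nanos

    def isOlderThan(self, other: "Timestamp") -> bool:
        return self._nanos < other._nanos

    def isNewerThan(self, other: "Timestamp") -> bool:
        return self._nanos > other._nanos

    def plusMinutes(self, value: int) -> "Timestamp":
        return Timestamp(self._nanos + value * 60 * NS_PER_SECOND)

    def minusMinutes(self, value: int) -> "Timestamp":
        return Timestamp(self._nanos - value * 60 * NS_PER_SECOND)


def format_nanos(epoch_nano: int, pattern: str, tz: timezone = timezone.utc) -> str:
    """Format epoch nanoseconds with a strftime pattern in the given timezone."""
    seconds, nanos = divmod(epoch_nano, NS_PER_SECOND)
    dt = datetime.fromtimestamp(seconds, tz) + timedelta(microseconds=nanos // 1000)
    return dt.strftime(pattern)


CET = timezone(timedelta(hours=1), "CET")  # fixed UTC+1, no daylight saving
```

With these definitions, format_nanos(1644307200000000000, "%Y-%m-%d %I:%M:%S", CET) returns 2022-02-08 09:00:00, matching the Timestamp.format example.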
Communications¶
A script can trigger communications with Notifications.
Sends a notification to a static contacts.coms entry with sendStaticNotification:
Parameters
notificationId : The id of the notification.coms to send.
contactId : The id of the contacts.coms to send the notification to.
trigger : An object containing arbitrary information that will be provided to the message generation.
originator : The originator (See below) of the notification.
providers : The list of providers (See below) to use.
Example
```lua
communications.sendStaticNotification(
  "root.communication.notification",                 -- notificationId
  "root.communication.directory",                    -- contactId
  { name = "Alarm", content = true, timestamp = 0 }, -- trigger
  {                                                  -- originator
    smsSender = "alarm",
    emailSenderAddress = "alarm@localhost.com",
    emailSenderName = "my super alarms"
  },
  { { providerId = "root.communication.provider.maildev" } } -- providers
)
```
Sends a notification with a contact list extracted from a collection with sendCollectionNotification:
Parameters
notificationId : The id of the notification.coms to send.
collectionId : The id of the Collections from which to extract the contacts to whom to send the notification.
collectionFilter : The filter to apply on the collection when extracting contacts.
trigger : An object containing arbitrary information that will be provided to the message generation.
originator : The originator (See below) of the notification.
providers : The list of providers (See below) to use.
Sends a notification with a contact list extracted from Keycloak with sendKeycloakGroupNotification:
Parameters
notificationId : The id of the notification.coms to send.
wantedGroups : The list of groups to send the notification to. If empty, everyone will be notified.
trigger : An object containing arbitrary information that will be provided to the message generation.
originator : The originator (See below) of the notification.
providers : The list of providers (See below) to use.
Originator¶
| Field | Type | Mandatory |
|---|---|---|
| smsSender | String | Only if an SMS provider is used |
| emailSenderAddress | String | Only if an email provider is used |
| emailSenderName | String | No |
Provider¶
| Field | Type | Mandatory |
|---|---|---|
| providerId | String | Yes |
| backup | Provider | No |