An Event would be generated by a node or the Trustgrid cloud based on a change in status (e.g. a node disconnects).
If the Event matched the conditions of a configured Filter, an Alert would be generated and sent to the configured notification integrations (e.g. email, PagerDuty, OpsGenie, Slack or Teams).
The new system introduces a few new terms. It operates like this:
An Event is generated based on a change in status. This is the same as before.
The Event will now generate an Alert.
The Alert is compared to the conditions defined within an Alarm.
Matching Alarms will send the details to any selected Channel.
The Channel will then push the information through its configured notification integrations (e.g. email, PagerDuty, OpsGenie, Slack or Teams).
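The flow above can be sketched in a few lines of Python. This is only an illustrative model, not Trustgrid's implementation: the class names mirror the terms in this section, while the fields, the `process_event` helper, the node name, and the event type string "Node Disconnect" are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    node: str
    event_type: str  # hypothetical type name, e.g. "Node Disconnect"


@dataclass
class Alert:
    event: Event


@dataclass
class Channel:
    name: str
    integrations: list  # e.g. ["email", "PagerDuty"]
    delivered: list = field(default_factory=list)

    def push(self, alert):
        # Stand-in for real delivery: record what would be sent where.
        for integration in self.integrations:
            self.delivered.append(
                (integration, alert.event.node, alert.event.event_type)
            )


@dataclass
class Alarm:
    name: str
    event_types: set
    channels: list

    def matches(self, alert):
        return alert.event.event_type in self.event_types


def process_event(event, alarms):
    """Event -> Alert -> matching Alarms -> Channels -> integrations."""
    alert = Alert(event)
    for alarm in alarms:
        if alarm.matches(alert):
            for channel in alarm.channels:
                channel.push(alert)
    return alert


ops = Channel("ops", ["email", "PagerDuty"])
node_down = Alarm("node-down", {"Node Disconnect"}, [ops])

process_event(Event("edge-1", "Node Disconnect"), [node_down])
process_event(Event("edge-1", "Config Change"), [node_down])  # no Alarm matches
print(ops.delivered)
```

Only the first event reaches the channel, because the second does not match any Alarm.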
As part of the release deployment, Trustgrid will convert all existing Filters to Alarms and Channels automatically. No action is required to maintain the current behavior.
The new system introduces the idea of alerts being resolved. While an alert is in the unresolved state any repeat of that event will not generate a new alert. This should reduce the noise of repetitive alerts.
There are three ways to resolve alerts:
Manually in the portal.
This can be done for all Alerts across the organization via the Alarms > Events page.
Or, to resolve alerts for a specific node you can click the Info panel at the top right.
Alerts will resolve themselves after 24 hours.
Alerts may resolve if a corresponding event indicates the status has changed. Node connectivity events are an example:
When a node disconnects, it generates an alert. If you have a system such as PagerDuty configured, this creates an incident within that system.
When the node reconnects, the alert is resolved within the Trustgrid system. In the PagerDuty system, the incident is automatically resolved.
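The resolution rules described above (repeats are ignored while an alert is unresolved, alerts resolve after 24 hours, and a corresponding event can resolve an alert) can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the actual Trustgrid logic: the `AlertTracker` class, the event type names, the `RESOLVED_BY` mapping, and the choice to check expiry lazily on each incoming event are all hypothetical.

```python
from datetime import datetime, timedelta

AUTO_RESOLVE_AFTER = timedelta(hours=24)

# Hypothetical mapping: an event type and the alert type it clears.
RESOLVED_BY = {"Node Connect": "Node Disconnect"}


class AlertTracker:
    def __init__(self):
        self.unresolved = {}  # (node, event_type) -> time the alert was raised

    def handle(self, node, event_type, now):
        # Alerts older than 24 hours resolve themselves.
        for key, raised in list(self.unresolved.items()):
            if now - raised >= AUTO_RESOLVE_AFTER:
                del self.unresolved[key]

        # A corresponding event resolves the matching unresolved alert.
        cleared_type = RESOLVED_BY.get(event_type)
        if cleared_type is not None:
            if self.unresolved.pop((node, cleared_type), None) is not None:
                return "resolved"
            return "nothing to resolve"

        # A repeat of an unresolved alert does not create a new one.
        key = (node, event_type)
        if key in self.unresolved:
            return "duplicate ignored"
        self.unresolved[key] = now
        return "new alert"

    def resolve_manually(self, node, event_type):
        # Models the manual resolve action in the portal.
        return self.unresolved.pop((node, event_type), None) is not None


tracker = AlertTracker()
t0 = datetime(2024, 1, 1, 12, 0)
minute = timedelta(minutes=1)

first = tracker.handle("edge-1", "Node Disconnect", t0)
repeat = tracker.handle("edge-1", "Node Disconnect", t0 + 5 * minute)
reconnect = tracker.handle("edge-1", "Node Connect", t0 + 10 * minute)
again = tracker.handle("edge-1", "Node Disconnect", t0 + 15 * minute)
next_day = tracker.handle(
    "edge-1", "Node Disconnect", t0 + 15 * minute + AUTO_RESOLVE_AFTER
)
print(first, repeat, reconnect, again, next_day, sep=" | ")
```

In an integration such as PagerDuty, the "new alert" and "resolved" outcomes would correspond to opening and closing an incident.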
The Alarm configuration is very similar to the previous Filter configuration but has been enhanced to allow for greater flexibility.
You can now select multiple Nodes, Event Types or Tag matches. Previously you had to create a separate Filter for each.
You can choose whether All, Any or None of the criteria must match to trigger the Alarm. Previously all criteria had to be true for a match to occur.
The notification integration settings have been moved to Channels.
Also, the “Extra Info to send with alert” section has been renamed Description, but its contents are still included in the Alert details sent to any integration.
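To make the All / Any / None option concrete, here is a small sketch of how the three modes differ for the same event. It is an assumption-laden illustration: the `alarm_matches` function, the event fields, and the rule that multiple selected values within one criterion match if any of them matches are not taken from the Trustgrid documentation.

```python
def alarm_matches(mode, criteria, event):
    """Evaluate each criterion against the event, then combine by mode."""
    results = [check(event) for check in criteria]
    if mode == "all":
        return all(results)
    if mode == "any":
        return any(results)
    if mode == "none":
        return not any(results)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical criteria: selected nodes, selected event types, one tag match.
criteria = [
    lambda e: e["node"] in {"edge-1", "edge-2"},
    lambda e: e["type"] in {"Node Disconnect"},
    lambda e: e["tags"].get("env") == "prod",
]

# The node and tag criteria match; the event type criterion does not.
event = {"node": "edge-1", "type": "Node Connect", "tags": {"env": "prod"}}

for mode in ("all", "any", "none"):
    print(mode, alarm_matches(mode, criteria, event))
```

With this event, only the "any" mode triggers; under the old Filter behaviour (equivalent to "all") it would not have matched.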
In the old system, you had to define the notification integrations information (email addresses, API keys, and Webhooks) for each filter you defined. If one of these changed you had to update each filter manually.
Channels separate the integration information from the Alarm configuration. Within a channel, you define one or more notification integrations. Then from the Alarm, you can select one or more channels.
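The benefit of this separation is easiest to see with a toy data model. The structure below is purely illustrative (the dictionary layout, names, and addresses are made up): Alarms refer to Channels by name, so a changed address is edited in one place.

```python
# Integration details live only in the channel definitions.
channels = {
    "ops": {"emails": ["oncall@example.com"]},
}

# Alarms only name the channels they notify.
alarms = [
    {"name": "node-down", "channels": ["ops"]},
    {"name": "cert-expiry", "channels": ["ops"]},
]


def recipients(alarm):
    """Collect the email recipients for an alarm via its channels."""
    found = []
    for channel_name in alarm["channels"]:
        found.extend(channels[channel_name]["emails"])
    return found


# One edit to the channel updates every alarm that uses it.
channels["ops"]["emails"] = ["noc@example.com"]
print([recipients(a) for a in alarms])
```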
Testing Alarms & Channels
After you’ve configured your Alarms and Channels you can test them to confirm they behave as expected.
Navigate to Alarms > Events
Find an Event that matches your Alarm criteria (or that you expect to match).
Click the Test button to the right of the event
A brand new feature is the ability to configure suppression windows that will stop any Alarms from triggering. This should enable performing maintenance without generating extraneous notifications to your integrated systems.
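Conceptually, a suppression window is a time range during which Alarms do not trigger. The sketch below shows one way to express that check; the `SuppressionWindow` class, the `is_suppressed` helper, and the choice of treating the end time as exclusive are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SuppressionWindow:
    start: datetime
    end: datetime

    def covers(self, moment):
        # Assumed half-open interval: start inclusive, end exclusive.
        return self.start <= moment < self.end


def is_suppressed(moment, windows):
    """Return True if any window covers the given moment."""
    return any(window.covers(moment) for window in windows)


# Hypothetical maintenance window from 01:00 to 03:00.
maintenance = SuppressionWindow(
    datetime(2024, 1, 6, 1, 0), datetime(2024, 1, 6, 3, 0)
)

during = is_suppressed(datetime(2024, 1, 6, 2, 0), [maintenance])
after = is_suppressed(datetime(2024, 1, 6, 3, 0), [maintenance])
print(during, after)
```

An alert raised at 02:00 would not trigger any Alarm, while one raised at 03:00 would be processed normally.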
Finally, we’ve added the ability to test the WAN connection of a node. This will test the bandwidth between the Node and the Trustgrid control plane.
To run the tool:
Navigate to the desired Node’s page
Select Network > Interfaces
Under “Interface Tools” click the Speed Test button
Click the GO button to start the test. It will test both upload and download speeds.
When the test completes, it displays the average results.