Allow for shorter interval times
Currently the shortest interval time allowed is 15 minutes. Fifteen minutes is far too long for some of our most vital services, and it doesn't let us separate true issues from small outliers. Allowing a shorter interval would fix this.
Example: Monitor request times and report back if the service takes longer than 6000 ms.
Current (option 1)
15 minute interval
after one failure send an email notifying the administrator about the issue
Problems:
The service could have had issues for 15 minutes
The service could have had a single network issue (a small blip) that does not reflect a true problem, causing Geocortex Analytics to spam emails over small non-issues.
Current (option 2)
15 minute interval
after two consecutive failures send an email notifying the administrator about the issue
Problem:
The service could have had issues for 30 minutes (unacceptable for vital programs)
Shorter Time Intervals
3 minute interval
after two consecutive failures send an email notifying the administrator about the issue
What it solves:
The service could have had issues for 6 minutes or less (acceptable for vital programs)
Alerts the administrator about real issues instead of a bunch of network blips
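The numbers above all come from the same rough estimate: interval length multiplied by the number of consecutive failures needed before an email goes out. Here is a small Python sketch of that arithmetic. The function name is made up for illustration and is not part of Geocortex Analytics.

```python
def worst_case_minutes(interval_minutes, consecutive_failures):
    """Rough estimate used in this post: how long a service could be
    failing before the alert email is sent."""
    return interval_minutes * consecutive_failures


# The three setups compared in this post.
options = [
    ("Current (option 1)", 15, 1),
    ("Current (option 2)", 15, 2),
    ("Shorter interval", 3, 2),
]

for label, interval, failures in options:
    minutes = worst_case_minutes(interval, failures)
    print(f"{label}: up to {minutes} minutes before an alert")
```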
Shorter Time Intervals (even better)
3 minute interval
after two consecutive failures send an email notifying the administrator about the issue
after six failures in a 24 hr period, notify the administrator
What it solves:
The service could have had issues for 6 minutes or less (acceptable for vital programs)
Alerts the administrator about real issues instead of a bunch of network blips
Notifies the administrator about service consistency. A bunch of long response times over 24 hrs could indicate a bigger issue or a misconfiguration (which is not good for our users).
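To make the two rules concrete, here is a minimal Python sketch of how they might be evaluated together. It only illustrates the logic I am asking for, not how Geocortex Analytics works internally. The class name, the constants, and the alert labels are all my own assumptions.

```python
from collections import deque
from datetime import timedelta

# Assumed values taken from the example in this post.
THRESHOLD_MS = 6000              # a check slower than this counts as a failure
CONSECUTIVE_LIMIT = 2            # alert after two failures in a row
DAILY_LIMIT = 6                  # alert after six failures in 24 hours
WINDOW = timedelta(hours=24)


class AlertRule:
    """Hypothetical rule evaluator; call record() once per check."""

    def __init__(self):
        self.consecutive = 0
        self.failure_times = deque()

    def record(self, when, response_ms):
        """Return the list of alerts triggered by this check."""
        alerts = []
        # Forget failures that fell out of the 24 hour window.
        while self.failure_times and when - self.failure_times[0] > WINDOW:
            self.failure_times.popleft()

        if response_ms > THRESHOLD_MS:
            self.consecutive += 1
            self.failure_times.append(when)
            # Fire each alert once, at the moment its limit is reached.
            if self.consecutive == CONSECUTIVE_LIMIT:
                alerts.append("consecutive")
            if len(self.failure_times) == DAILY_LIMIT:
                alerts.append("daily")
        else:
            # A healthy check resets the consecutive counter (a blip).
            self.consecutive = 0
        return alerts
```

With checks every 3 minutes, a single slow response followed by a healthy one returns no alerts, two slow responses in a row return "consecutive", and the sixth slow response inside 24 hours returns "daily" even if the failures were spread out.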
There is a certain finesse when it comes to server and service monitoring. Too much chatter in the data can make it difficult to read, comprehend, and see issues. Too little data doesn't tell the user about serious service issues.
Thank you for your time,
RGT