How to design alert handling for a commercial website, including slow responses and other abnormal conditions
- First, work out exactly which events need attention.
- Have the program itself detect “active” events, such as processing errors or full queues, and report them as they happen.
- Write a script that detects “passive” events, such as slow processing or an unresponsive service, and reports them.
- Run that script on a schedule so that passive events become active ones (a heartbeat).
- Finally, set up a central service that receives reports from all sources and handles incidents of different severity according to a defined policy.
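The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: every name here (`Severity`, `Event`, `AlertCenter`, `enqueue_job`, `probe`, `heartbeat`) is hypothetical, reports are delivered in-process instead of over the network, and the severity policy is simply "CRITICAL pages someone, WARNING opens a ticket". It shows an active report raised where the error occurs (a full queue), a passive check that times a call from the outside, and a heartbeat loop that runs those checks on a schedule and forwards anything abnormal to the central receiver.

```python
from __future__ import annotations

import queue
import time
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, Optional


class Severity(IntEnum):
    INFO = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Event:
    source: str
    kind: str  # e.g. "queue_full", "slow_response", "unresponsive"
    severity: Severity
    detail: str = ""
    ts: float = field(default_factory=time.time)


class AlertCenter:
    """Hypothetical central receiver: routes each event to the handlers for its severity."""

    def __init__(self) -> None:
        self.handlers: dict[Severity, list[Callable[[Event], None]]] = {s: [] for s in Severity}

    def on(self, severity: Severity, handler: Callable[[Event], None]) -> None:
        self.handlers[severity].append(handler)

    def report(self, event: Event) -> None:
        for handler in self.handlers[event.severity]:
            handler(event)


# Active event: the application reports the problem at the moment it sees it.
def enqueue_job(q: queue.Queue, job: object, center: AlertCenter) -> bool:
    try:
        q.put_nowait(job)
        return True
    except queue.Full:
        center.report(Event("job-queue", "queue_full", Severity.WARNING, f"size={q.qsize()}"))
        return False


# Passive event: an external check times a call and classifies the result.
def probe(name: str, call: Callable[[], object], slow_threshold_s: float) -> Optional[Event]:
    start = time.monotonic()
    try:
        call()
    except Exception as exc:  # in this sketch any failure counts as "unresponsive"
        return Event(name, "unresponsive", Severity.CRITICAL, repr(exc))
    elapsed = time.monotonic() - start
    if elapsed > slow_threshold_s:
        return Event(name, "slow_response", Severity.WARNING, f"{elapsed:.3f}s")
    return None


# Heartbeat: run the checks on a schedule so passive problems become active reports.
def heartbeat(center: AlertCenter, probes: list[Callable[[], Optional[Event]]],
              interval_s: float, rounds: int) -> None:
    for i in range(rounds):
        for check in probes:
            event = check()
            if event is not None:
                center.report(event)
        if i < rounds - 1:
            time.sleep(interval_s)


if __name__ == "__main__":
    center = AlertCenter()
    center.on(Severity.CRITICAL, lambda e: print("PAGE on-call:", e.source, e.kind))
    center.on(Severity.WARNING, lambda e: print("open ticket:", e.source, e.kind))

    jobs: queue.Queue = queue.Queue(maxsize=1)
    enqueue_job(jobs, "a", center)
    enqueue_job(jobs, "b", center)  # queue is full, so a WARNING is reported

    def database_down() -> None:
        raise ConnectionError("connection refused")

    heartbeat(center, [lambda: probe("db", database_down, 0.5)], interval_s=0.0, rounds=1)
```

In a real deployment the heartbeat would usually run as a separate cron job or service, so that it keeps reporting even when the application it watches has stopped responding.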
Take care to limit how often reports are sent, so that when the network is degraded the reporting traffic itself does not saturate your network card and make the problem worse.
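One common way to cap the reporting rate is a token bucket; the original advice does not name a technique, so this is a swapped-in choice. The Python sketch below assumes a hypothetical `ReportThrottle` class and `send_report` helper: it allows a short burst of up to `capacity` reports, then at most `rate_per_s` per second, and counts what it drops instead of sending it. The clock is injectable only so the behavior can be checked without real waiting.

```python
import time
from typing import Callable


class ReportThrottle:
    """Hypothetical token bucket: bursts up to `capacity`, then `rate_per_s` reports per second.

    Reports over the limit are counted in `suppressed` instead of being sent.
    """

    def __init__(self, capacity: int, rate_per_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = float(capacity)
        self.rate = rate_per_s
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()
        self.suppressed = 0

    def allow(self) -> bool:
        now = self.clock()
        # Refill tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.suppressed += 1
        return False


def send_report(throttle: ReportThrottle, send: Callable[[str], None], message: str) -> bool:
    """Send `message` only if the throttle permits; otherwise drop it and count it."""
    if throttle.allow():
        send(message)
        return True
    return False


if __name__ == "__main__":
    throttle = ReportThrottle(capacity=3, rate_per_s=1.0)
    for i in range(10):
        send_report(throttle, print, f"alert {i}")
    print("suppressed:", throttle.suppressed)
```

Counting suppressed reports rather than queueing them keeps memory and bandwidth bounded during an outage; a periodic summary such as "N alerts suppressed" can then be sent once the network recovers. In practice you would likely keep one throttle per event type, so a flood of one kind of error does not hide a different, rarer one.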