Monitoring is a key and integral part of maintaining a vigilant computing community. In the Computer Operations, CyberSecurity and Network worlds, monitoring is a means to assess what is "Normal" for an environment when it works properly, and what is "Not Normal" in cases of service failure or cyber attack. Monitoring is there to help spot when things fail, as they will; when things change, as they will; and when things appear that are not supposed to, as they might.
Monitoring is about the static inventory of what is there, confirming how that inventory is configured to operate, and watching how its operations are actually proceeding. Although this is a small set of activities, it covers a large number of objects within an environment of high complexity that is rarely documented in enough detail to be certain which paths are taken. It is because of this huge volume of complex paths to follow that monitoring offers such a trove of options, tools, and approaches. The problem also spans and overlaps contractual, operational, service level, security and cybersecurity considerations; when you do something for one of these, it is highly probable that the result is also useful for another.
Monitoring can also become overbearing and complex in its own right. Doing no monitoring is one extreme; monitoring so heavily that it erodes the minimum processing capability is the other, and the cost of that can be high. Note that the value of monitoring follows Pareto's 80:20 rule: the largest proportion of value comes from the first and second iterations of effort. Also note that even if you are very good at one type of monitoring, other aspects still need to be covered; focusing on one aspect leaves the others open as bypass routes around your efforts. It is proposed here that some action, across many monitoring surfaces, provides a Defence in Depth approach. Each layer adds obstacles that may trip a wire, and the differing styles, timings and mechanisms of review help flush out bad actors caught between them.
Monitoring should also be a source of information to assist in the diagnostic phase of a failure. It should help decrease the Mean Time To Repair (MTTR) of a known failure form, and it should be capable of eliminating known failure forms from consideration when the failure is a new one. It should be tuned to detect which components are not in their proper states, so that detection is fast and so that it can confirm after the fact that a fix has been fully applied. Optimally, monitoring may also be capable of seeing a failure while it is developing and before it fully occurs; this directly affects Mean Time Between Failures (MTBF).
Monitoring also has the problem of "TOO MUCH INFORMATION". It is critical that the level of noise be balanced against the need to know what is really going on and when. Knowing what the right information is evolves as attacks change, as people change, as the environment changes and as the community learns. This is continuous process/service improvement (CPI or CSI) at its most fundamental. Any of the monitoring processes here must be able to learn, expand and adapt to change. That is the constant.
These Tools
Over a number of years I had the responsibility to oversee a small Computer Community that was not of a scale to justify capital and staff resources for large scale monitoring. Along the way things would fail and my community would require better. So I built tools, like those below, slowly assembling a repertoire of monitoring solutions that allowed me to spot trouble points before the user community felt them, or to flag what was "Not Normal" to the operations groups and shorten debugging periods. The tools also allowed me to react to requests for machine assessments, look for tell-tale trails, and report on high level Computer Community health with a level of detail and timeliness not otherwise available. The language of choice here is PowerShell. It is native to the Windows world, it is the de facto configuration tool for many Microsoft products, and it has access into the .NET underpinnings of the Windows Operating System like no other. Many CyberSecurity groups ban it outright rather than taking advantage of the controls it does have and the growing set of "Compiler/Registration" controls available. I hope these help. Use them once you understand them.
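As a quick, hedged illustration (and assuming the "controls" referred to include the standard execution policy, language mode and script signing mechanisms), these are the kinds of native PowerShell controls a CyberSecurity group can lean on instead of an outright ban:

# 1. Execution policy at each scope (e.g. AllSigned forces script signing).
Get-ExecutionPolicy -List

# 2. Language mode of the current session; ConstrainedLanguage sharply limits
#    access to .NET when enforced through application control policy.
$ExecutionContext.SessionState.LanguageMode

# 3. Authenticode signature check on a script before trusting it.
#    '.\SomeTool.ps1' is a placeholder path, not one of these tools by name.
Get-AuthenticodeSignature -FilePath '.\SomeTool.ps1' |
    Select-Object Status, SignerCertificate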
Standards Applied/Incorporated/Being Incorporated
Results of a run are always pushed to a text file in the .\Results folder with the name
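As a minimal sketch of that convention only (the file name pattern shown here is hypothetical; each tool defines its own naming), writing a run's results might look like this:

# Ensure the .\Results folder exists next to the script, then append to a
# timestamped text file. 'MyTool_yyyyMMdd_HHmmss.txt' is an illustrative name only.
$ResultsDir = Join-Path -Path $PSScriptRoot -ChildPath 'Results'
if (-not (Test-Path -Path $ResultsDir)) {
    New-Item -Path $ResultsDir -ItemType Directory | Out-Null
}
$ResultFile = Join-Path -Path $ResultsDir -ChildPath ("MyTool_{0}.txt" -f (Get-Date -Format 'yyyyMMdd_HHmmss'))
"Run started $(Get-Date)" | Out-File -FilePath $ResultFile -Append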
The tools are built to drive off support manifests wherever possible. These files are typically CSV files, either by file type or by content. This manifest-driven approach allowed me to build specific profiles for specific situations without needing multiple tools. It also let me add to what I knew, when I knew it: when I understood which pieces needed to be available for DHCP to operate, I could add those; when we had finished debugging an Exchange failure, I added those components. The manifest also allowed me to cut the size of any one run down to a reasonable time slice.
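To illustrate the pattern (the manifest path and the ComputerName/CheckType/Target columns below are hypothetical, not the actual manifest layout used by these tools), a manifest-driven check could be as simple as:

# Read a CSV manifest and perform the check each row describes.
$Manifest = Import-Csv -Path '.\Manifests\DHCP_Profile.csv'

foreach ($Entry in $Manifest) {
    switch ($Entry.CheckType) {
        'Service' {
            # Confirm a required service is running on the listed machine.
            Get-Service -ComputerName $Entry.ComputerName -Name $Entry.Target |
                Select-Object MachineName, Name, Status
        }
        'Port' {
            # Confirm a required TCP port is reachable.
            Test-NetConnection -ComputerName $Entry.ComputerName -Port ([int]$Entry.Target) |
                Select-Object ComputerName, RemotePort, TcpTestSucceeded
        }
    }
}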
The tools are also set up to accept passed arguments. In this way, what you want done, which manifest to use, and the timeframe of the review can be set to what is needed rather than being driven by defaults. The tools can be run directly from PowerShell or the PowerShell ISE, but also from shortcuts on the Desktop or from Scheduled Tasks, providing a constant perspective on the Computer Community you are interested in. In interactive PowerShell you are prompted for the inputs. Through arguments you can tell a tool to run in a few different operational modes: