Monitoring Manifesto

What is Monitoring?

This is a living document. It will continue to change as current philosophies, events, and new and evolving technologies reshape its content.

This page is primarily focused on monitoring an environment consisting of devices and applications centered around information technology, but the principles can be applied to other environments as well.

Definitions

  • Monitoring: 
    • The systematic collection of data from a specified environment in a way that any changes or variations can be analyzed, recognized, and acted upon if needed.
  • Environment: 
    • The location of the entity to be monitored. This could be a physical location or an online service spanning multiple physical locations. It consists of services, composed of products and applications, that internal and external entities depend upon for daily operations and functional duties.
  • Services: 
    • High-level names for a set of products and applications made available to customers. Many combined services make up an overall environment. Each service comprises many components, including but not limited to servers, services, daemons, processes, applications, databases, network devices, endpoints, websites (internal and external), and related hardware.
  • Servers: 
    • A physical or virtual hardware platform that supports software applications, which in turn provide data and information to customers or other applications.
  • Daemons: 
    • Processes and applications that run on servers and provide access to customers or other applications and databases.
  • Notifications: 
    • Methods of contacting individuals identified as being responsible for services.

Structure

  • Component Configurations
    • Individual components may be classified at any level listed below, but the overall service and environment are classified by the sum of their components. If ANY component or service is classified as Customized, the overall environment is Hybrid at best. An environment cannot be Structured overall when any one component or service is classified as Customized.
      • Structured
        • Services are the same, standardized, automated and not modified from initial deployment. Updates and changes can be completed through automated processes.
        • Auto-Discovery possible
        • Automation encouraged
        • Changes are regular and automated
        • Procedures are developed, built, and modified to fit available service offerings
      • Customized
        • Each service is different, modified from initial deployment, and requires individual, specialized attention for updates and changes.
        • Auto-Discovery discouraged and problematic
        • Automation is difficult
        • Changes and updates can be regular but require special attention and close supervision
        • Service offerings are developed, built, and modified to fit current procedures
      • Hybrid
        • A mixture of Structured and Customized
        • Auto-Discovery difficult and limited
        • Automation encouraged but limited
        • Changes and updates can be regular but require special attention and close supervision
        • Service offerings are developed, built, and modified to fit current procedures
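
The roll-up rule above can be sketched in a few lines of Python. This is only an illustration: the names Level and classify_environment are hypothetical, and treating an environment made up entirely of Customized components as Customized is an assumption drawn from the phrase "Hybrid at best".

```python
from enum import Enum


class Level(Enum):
    STRUCTURED = "Structured"
    CUSTOMIZED = "Customized"
    HYBRID = "Hybrid"


def classify_environment(component_levels):
    """Roll component classifications up into one environment classification."""
    levels = set(component_levels)
    if not levels:
        raise ValueError("at least one component is required")
    # Structured overall only when every single component is Structured.
    if levels == {Level.STRUCTURED}:
        return Level.STRUCTURED
    # Assumption: nothing but Customized components means Customized overall.
    if levels == {Level.CUSTOMIZED}:
        return Level.CUSTOMIZED
    # Any mixture is Hybrid at best.
    return Level.HYBRID
```

For example, classify_environment([Level.STRUCTURED, Level.CUSTOMIZED]) returns Level.HYBRID, because a single Customized component rules out a Structured result.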

Methods of monitoring

  • Data Collection Methods
    • SNMP
      • Polling
        • Periodically contacting endpoints on systems which provide performance and status information
      • Traps
        • Maintaining an endpoint for systems to contact in the event of an issue or failure
    • Endpoints
      • SNMP
      • PING
      • SSH
      • HTTP(S)
    • XML/JSON/Prometheus
    • Port scanning
    • Logs
    • Agents
  • Alerts / Notification
    • Methods
      • Alerting can use one or more of the following approaches and technologies.
      • Predictive
        • Sending a notification when an algorithm that compares historically collected data with current rates of change crosses a set threshold, indicating a future problem or outage.
      • Reactive
        • Sending a notification when currently collected data crosses a set threshold.
      • Email
      • Pagers
      • Server or workstation Agents
      • Internal Application Popup
      • Website Display
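
The difference between reactive and predictive alerting can be illustrated with a short Python sketch. The function names, the (timestamp, value) sample format, and the use of simple linear extrapolation are all assumptions made for illustration; a real predictive algorithm may be considerably more sophisticated.

```python
def reactive_alert(current_value, threshold):
    """Reactive: alert as soon as the current reading reaches the threshold."""
    return current_value >= threshold


def predictive_alert(samples, threshold, horizon):
    """Predictive: alert if the trend would reach the threshold within `horizon` seconds.

    `samples` is a list of (timestamp_seconds, value) pairs, oldest first.
    Assumption: the rate of change is estimated linearly from the first
    and last sample.
    """
    if len(samples) < 2:
        return False  # not enough history to estimate a rate of change
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        return False  # timestamps must increase to compute a rate
    rate = (v1 - v0) / (t1 - t0)
    projected = v1 + rate * horizon
    return projected >= threshold
```

With disk usage rising from 70% to 80% over one hour and a 90% threshold, reactive_alert(80.0, 90.0) stays quiet, while predictive_alert([(0, 70.0), (3600, 80.0)], 90.0, 7200) fires because the trend would pass the threshold within the next two hours.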

What is monitored

  • Discovery of Services to monitor
    • Auto Discovery
      • A series of automated applications and processes that systematically 'search' a network, subnet, or list of endpoints while documenting AND/OR updating configurations based upon results in order to create an overall snapshot or picture of the target environment
    • Manual Discovery
      • A series of manual tasks to gather information that is used to document AND/OR update configurations in order to create an overall snapshot or picture of the target environment
    • Auto Discovery
      • Efficiently creates and documents endpoints on a mass scale
      • Accurate if automated on a mass scale
      • Reliable and repeatable on a mass scale
      • Drawbacks
        • Can create more trouble than it's worth
        • Not efficient in a highly customized legacy environment
        • Can cause a lot of work if automated incorrectly
    • Manual Discovery
      • Precautionary
        • Individuals reach out to service owners to identify and create endpoints for data collection from multiple aspects of a service to prevent future outages.
        • Service owners reach out to the monitoring team to identify and create endpoints for data collection from multiple aspects of a service to prevent future outages.
      • Reactive
        • Incidents and outages are analyzed and reviewed to identify and create endpoints for data collection from multiple aspects of a service to prevent future outages.
    • Data Collection Roadblocks
      • Past
        • Old services that need special attention during discovery. Current and new employees may lack knowledge of the initial deployment and may not know what is available to be monitored.
      • Present
        • Current projects that do not take monitoring into consideration when deploying an application.
        • Services are enabled slowly over time to prevent issues and allow for close attention but are never declared fully production-ready.
      • Future
        • Monitoring is not included in the planning and deployment of the service.
        • Monitoring continues after it is no longer needed, costing money and creating false positives when issues occur.
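
As a rough sketch of the auto discovery idea described above, the following Python probes a list of endpoints and updates a configuration inventory with the results. The names tcp_probe and discover, and the inventory layout (host mapped to a sorted list of open ports), are hypothetical. A plain TCP connect is only one of many possible probes and is used here as an assumption.

```python
import socket


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def discover(hosts, ports, inventory, probe=tcp_probe):
    """Probe every host/port pair and update `inventory` in place.

    `inventory` maps each host to a sorted list of its open ports.
    Returns the set of hosts whose entry was created or changed.
    """
    changed = set()
    for host in hosts:
        open_ports = sorted(p for p in ports if probe(host, p))
        if inventory.get(host) != open_ports:
            inventory[host] = open_ports
            changed.add(host)
    return changed
```

Passing the probe as a parameter keeps the sketch testable without touching a network, and the returned set of changed hosts gives a person something to review before the results are trusted, which matters given how much work an incorrect automated discovery can cause.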
