This blog post is the first in a trilogy in which we look at full-stack observability into your IT environment. In this first post we concentrate on monitoring vs. observability in general. In the second post we will examine the observability solution from Elastic. And in the third and final post we focus on our own PeopleSoft Manager Performance & Health solution that has been built on the Elastic Stack.
Obtain full-stack observability into your IT environment with Elastic and Blis Digital.
Observability: An origin story
Observability has gained popularity recently because it addresses the specific needs of modern, complex architectures: it accelerates incident response, improves reliability, and ultimately enhances the user experience. That may sound new, but at its core observability is simply the ability to see, or monitor, what is going on inside an application. So, what makes observability different from the monitoring we have done since the dawn of the Information Age? To answer that question, we need to look back at how monitoring evolved in our IT world.
Availability Monitoring
Our story starts in the 90s, when the World Wide Web started to transition from altruistic knowledge sharing to making money through e-commerce. As more and more money was put into internet-based companies, outages and failures became more and more costly, so knowing whether your website was available became a necessity. This resulted in the concept of availability monitoring. Availability (or uptime) monitoring is an automated way of checking (pinging) whether a service such as a website or an application is available. Unavailability would usually result in an email, SMS or other type of message sent to an administrator, who could then resolve the problem.
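To make the idea concrete, here is a minimal sketch of such an availability check in Python, using only the standard library. The function names, the `notify` callback and the injectable `opener` parameter are our own illustrative assumptions, not part of any particular monitoring product.

```python
import urllib.error
import urllib.request


def is_available(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx or 3xx status code."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout or HTTP error status.
        return False


def check_and_alert(url, notify, opener=urllib.request.urlopen):
    """Run one check and call notify(message) when the service is down.

    `notify` is a hypothetical callback; in practice it could send an
    email or SMS to an administrator.
    """
    up = is_available(url, opener=opener)
    if not up:
        notify(f"ALERT: {url} is unavailable")
    return up
```

A scheduler such as cron could call `check_and_alert("https://example.com", print)` every minute; `print` merely stands in for a real notification channel here.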
System Monitoring (SM)
Merely being informed that a website or application is unavailable does not tell you much about what happened, so with the start of the new decade the need to see what was going on inside became more apparent. This resulted in the rise of system monitoring: at first through simple scripts that checked system internals against thresholds, and later with the help of specialised system monitoring tools. Those tools would usually send an email or alert to an administrator whenever a threshold was breached.
Currently, system monitoring is an umbrella category of software that enables organizations to manage, operate, and monitor IT systems in a centralized manner. Nagios was one of the first well-known system monitoring tools to be widely adopted across industries.
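A threshold script of this kind could look roughly like the sketch below. The metric name `disk_used_percent` and the 90% limit are assumptions chosen for illustration; a real script would read whatever internals matter for the system at hand.

```python
import shutil

# Hypothetical limits; every environment will pick its own values.
THRESHOLDS = {"disk_used_percent": 90.0}


def read_metrics(path="/"):
    """Read one system internal: how full the disk holding `path` is."""
    usage = shutil.disk_usage(path)
    return {"disk_used_percent": 100.0 * usage.used / usage.total}


def find_breaches(metrics, thresholds):
    """Return one message per metric that exceeds its threshold."""
    return [
        f"{name}={value:.1f} exceeds {thresholds[name]:.1f}"
        for name, value in sorted(metrics.items())
        if name in thresholds and value > thresholds[name]
    ]


if __name__ == "__main__":
    for message in find_breaches(read_metrics(), THRESHOLDS):
        print("ALERT:", message)  # stand-in for an email to an administrator
```

Note the limitation this illustrates: the script can only report on the thresholds someone thought to define in advance.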
Real User Monitoring (RUM)
Knowing what is going on inside systems is nice, but it will not tell you how your users are perceiving your service. In the late 2000s the focus shifted from monitoring systems to monitoring users. The ability to monitor transactions from real users of a service gave great insight into what users were actually experiencing. Real User Monitoring passively collects data from real users in real time, making it possible to optimize a service based on real data. At this time a service like Pingdom quickly became popular because it also offered website performance insights just as user experience was becoming crucial for businesses.
Application Performance Monitoring (APM)
By the mid-2010s we had both our systems and our users covered, but there was still a void between them. Our applications sit in that void as a black box that comes in many different shapes. By tracing and timing calls within the application, we can inspect the path calls take and where time is spent. Dynatrace is one of the well-known pioneers in the Application Performance Monitoring space.
Today’s Application Performance Monitoring has developed, as Gartner defines it, into a suite of monitoring software, comprising:
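As a rough illustration of tracing and timing calls, the Python sketch below wraps functions in a decorator that records how long each call took and how deeply it was nested. The names `traced`, `SPANS`, `handle_request` and `query_database` are made up for this example; commercial APM agents instrument code automatically and are far more sophisticated. The sketch is also not thread-safe.

```python
import functools
import time

SPANS = []   # collected (depth, function name, seconds) records
_depth = 0   # current nesting level of traced calls


def traced(func):
    """Record the duration and nesting depth of every call to `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        global _depth
        depth = _depth
        _depth += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            _depth -= 1
            SPANS.append((depth, func.__name__, time.perf_counter() - start))
    return wrapper


@traced
def query_database():
    time.sleep(0.01)  # simulated slow call
    return ["row"]


@traced
def handle_request():
    return {"rows": query_database()}


if __name__ == "__main__":
    handle_request()
    # Spans are stored in completion order, so inner calls come first.
    for depth, name, seconds in SPANS:
        print(f"{'  ' * depth}{name}: {seconds * 1000:.1f} ms")
```

Reading the recorded spans shows both the path of the request (which call happened inside which) and where the time went.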
- Digital Experience Monitoring (DEM)
- Application Discovery, Tracing and Diagnostics (ADTD)
- Purpose-built Artificial Intelligence for IT Operations (AIOps).
Digital Experience Monitoring (DEM)
Digital Experience Monitoring is a software tool that supports the optimization of the operational experience and behaviour of a digital agent, human or machine, as it interacts with enterprise applications and services. The primary tools for Digital Experience Monitoring are synthetic monitoring, which actively emulates user interactions, and real user monitoring.
Application Discovery, Tracing, and Diagnostics (ADTD)
Application Discovery, Tracing, and Diagnostics tools seek to understand the relationships between applications using methods like Bytecode Instrumentation (BCI) or profiling, for example by adding bytecode to a Java class at run time.
Artificial Intelligence for IT Operations (AIOps)
AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination.
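To give a feel for anomaly detection, the sketch below flags values that lie far from the average of a series, using a simple z-score. This is a deliberately basic statistical stand-in, not the machine learning an AIOps product would apply, and the function name and the default threshold of 3 standard deviations are our own assumptions.

```python
import statistics


def find_anomalies(values, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold`
    standard deviations away from the mean of the series."""
    if len(values) < 2:
        return []
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, so nothing stands out
    return [
        (index, value)
        for index, value in enumerate(values)
        if abs(value - mean) / stdev > threshold
    ]
```

Fed with, say, response times in milliseconds, a single 500 ms outlier among otherwise steady 100 ms measurements would be reported, while normal fluctuation would not.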
Shift towards observability
While Application Performance Monitoring filled the gap between our systems and users, it wasn’t the answer to the sheer growth in the complexity of our IT landscape in the early 2020s. With the rise of microservices, containerization, distributed systems and cloud computing, traditional monitoring tools were no longer sufficient. This new complexity required a shift from reactive monitoring to a more proactive approach. And this is where observability entered the room. Tools like Elasticsearch, Logstash and Kibana (ELK) address these needs by centrally combining metrics, logs, and traces to capture the whole picture of a system’s health.
Modern observability
Observability today has become a central practice, blending tools and best practices from monitoring, logging, tracing, and even user experience analytics to give a real-time, holistic view of a system. Observability is increasingly focused on automation and intelligent insights, with AI and machine learning algorithms aiding in anomaly detection and predictive analytics. As organizations adopt Site Reliability Engineering (SRE) practices, observability is the cornerstone that allows engineers to ask questions about system behaviour and proactively manage performance, reliability, and user experience. OpenTelemetry and the Elastic Stack are examples of modern observability solutions.
Monitoring vs. observability
According to Gartner, today’s observability is an evolution of established monitoring, emphasizing visibility of the state of the digital service by exploring high-cardinality data outputs from the application. This is in contrast to traditional forms of monitoring, which focus on the individual components that make up the service.
Monitoring and observability are related but distinct concepts. While monitoring is about ensuring known metrics are within expected ranges, observability is about understanding the overall system’s health and diagnosing unknown issues. Observability enables engineers to explore, ask new questions, and better understand root causes, especially in dynamic environments where traditional monitoring alone falls short. Put side by side, these are the key differences:
Monitoring:
- The when and what
- Measure and report specific metrics
- Identify anomalous system effects
- Focused on standalone systems.
Observability:
- The why and how
- Collect metrics, events, logs and traces for deep investigation
- Find the root cause of anomalous system effects
- Focused on multiple systems and chains.
Elastic Observability
With its observability solution, the Elastic Stack is a good example of a modern observability platform. In the next blog post in this series we will take a closer look at this observability solution from Elastic.
Never miss a beat
The observability solution from Elastic enables you to achieve business and operational excellence. Enable cross-team collaboration to proactively detect, correlate and resolve issues. With the AI-powered observability platform from Elastic, implemented by Blis Digital, you gain full-stack observability into your IT environment.
Never miss a beat of your application with Blis Digital.
Can we help you?
Are you interested in learning more about observability, or are you facing a specific performance challenge? Feel free to contact Cees Schrijen, Account Manager Enterprise IT (c.schrijen@blisdigital.com | +31 6-17503107).
Read more about our Performance Monitoring and Analytics services.