Observability solutions are crucial for modern software development and operations teams, allowing them to gain insight into their systems, detect anomalies, and troubleshoot issues quickly.
Many organizations rely on third-party vendors such as Datadog, Dynatrace, or New Relic (among many others) for SaaS-based observability. While these vendors are typically reliable, they are still susceptible to outages, which can have a significant impact on an organization's ability to monitor and troubleshoot its systems.
During January 2023, Dynatrace subscribers faced login issues that lasted several hours. While agent, synthetic, and RUM data remained intact, the inability to monitor and visualize crucial infrastructure posed a significant challenge. In another incident this month (March 2023), Datadog suffered a complete outage that resulted in both customer data loss and limited access. The Datadog support team worked for hours to restore the lost data, while customers experienced application latency, further exacerbating the situation. New Relic experienced similar issues in October 2022. Although that incident lasted only 2-3 hours, it's still 2-3 hours of being blindfolded.
To address the potential risks of relying solely on a third-party observability vendor, it is crucial to establish a backup plan or PLAN B. One effective strategy is to leverage a data pipeline and observability platform such as Logiq.ai, which empowers customers to own and store their first-mile data locally before forwarding it to their preferred SaaS platform. Logiq.ai serves as more than a mere forwarder; it functions as an additional set of eyes, providing an observability PLAN B in the event of a future outage. Its in-line architecture ensures that every aspect is monitored twice, providing redundancy in the event of a system failure. In essence, even if one eye fails, the other eye remains functional, ensuring that crucial data remains visible and actionable.
With Logiq.ai, logs, metrics, and traces can be safely captured and forwarded, or analyzed before being sent to a SaaS platform. This provides customers with more control over their data and reduces their dependency on any single vendor for observability. By owning their first-mile data, customers can still capture and store important data even if their SaaS observability vendor experiences an outage. If a vendor is unavailable, no data is lost: you can replay it once the vendor is available again.
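To make the store-and-replay idea concrete, here is a minimal Python sketch of a first-mile buffer. It is not Logiq.ai's API or implementation; the `StoreAndForward` class and the `send` callback are hypothetical names used only for illustration, and the sketch assumes the vendor client raises `ConnectionError` when the SaaS platform is unreachable.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, str]


class StoreAndForward:
    """Hypothetical first-mile buffer: keep every event locally, forward when possible."""

    def __init__(self, send: Callable[[Event], None]) -> None:
        self.send = send                      # assumed: ships one event to the SaaS vendor
        self.local_store: List[Event] = []    # every event is retained locally
        self.pending: Deque[Event] = deque()  # events the vendor has not accepted yet

    def ingest(self, event: Event) -> None:
        """Store the event locally first, then try to forward it."""
        self.local_store.append(event)
        self.pending.append(event)
        self.flush()

    def flush(self) -> int:
        """Forward pending events in order; stop at the first failure.

        Returns the number of events delivered in this call.
        """
        sent = 0
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                break  # vendor still unavailable; keep the backlog for a later replay
            self.pending.popleft()
            sent += 1
        return sent
```

Calling `flush()` again after the vendor recovers replays the backlog in its original order, while `local_store` keeps the full copy regardless of what the vendor received.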
Real User Monitoring (RUM) and Synthetics have distinct differences. RUM is designed to "phone home," making it difficult to intercept. Meanwhile, Cloud Synthetics reside in external clouds. Nonetheless, if your network team employs path traceability tools like Kentik, you can diminish the impact of top-level domain monitoring and bolster your resilience.
In addition to providing a backup plan for observability, Logiq.ai can also help customers optimize their data pipeline by allowing them to pre-process and enrich their data before forwarding it to their SaaS platform. This can lead to faster and more accurate troubleshooting and analysis, reducing the time it takes to resolve issues.
By using Logiq.ai, your operations team can stay informed, and anomaly detection can continue to function effectively on your data. Furthermore, service management tools like ServiceNow or PagerDuty will still receive alerts, notifying operators of any significant anomalies such as a higher-than-normal occurrence of 500 errors or a host running at 100% CPU capacity.
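As a rough illustration of the two alert conditions mentioned above, the following Python sketch applies simple thresholds to locally held data. It is deliberately naive and is not how Logiq.ai detects anomalies; the `find_anomalies` function, the 5% error-ratio threshold, and the choice to count every 5xx status as a "500 error" are all assumptions made for the example.

```python
from typing import Dict, List


def find_anomalies(
    status_codes: List[int],
    cpu_by_host: Dict[str, float],
    max_5xx_ratio: float = 0.05,   # assumed threshold, tune per service
    cpu_limit: float = 100.0,      # percent
) -> List[str]:
    """Return human-readable alerts for high 5xx ratios and saturated hosts."""
    alerts: List[str] = []

    if status_codes:
        errors = sum(1 for code in status_codes if 500 <= code <= 599)
        ratio = errors / len(status_codes)
        if ratio > max_5xx_ratio:
            alerts.append(f"5xx ratio {ratio:.0%} exceeds {max_5xx_ratio:.0%}")

    # Sort hosts so the alert order is stable between runs.
    for host, cpu in sorted(cpu_by_host.items()):
        if cpu >= cpu_limit:
            alerts.append(f"{host} CPU at {cpu:.0f}%")

    return alerts
```

The returned strings stand in for whatever payload would be posted to a service management tool; the delivery step is left out because it depends on the tool's own integration.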
Another major advantage of utilizing Logiq.ai is that your data remains within your cloud, allowing you to retain as much data as you need without incurring exorbitant costs. Once implemented, you have full control over the data flow to SaaS providers, enabling you to selectively filter out data based on tags like "Environment=DEV, INT, UAT, PP" if desired. This can result in significant cost savings, potentially covering the expense of Logiq.ai tenfold.
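The tag-based filtering described above can be sketched in a few lines of Python. This is an illustration only, not Logiq.ai configuration syntax; the `should_forward` function and the assumption that each event is a dictionary with an `Environment` key are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

# Environments kept locally and not forwarded to the SaaS provider (from the tag example above).
NON_PROD_ENVS: FrozenSet[str] = frozenset({"DEV", "INT", "UAT", "PP"})


def should_forward(event: Dict[str, str], blocked: FrozenSet[str] = NON_PROD_ENVS) -> bool:
    """Forward an event unless its Environment tag is in the blocked set."""
    return event.get("Environment") not in blocked


def filter_for_saas(events: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the events that should be sent on to the SaaS provider."""
    return [event for event in events if should_forward(event)]
```

Events without an `Environment` tag are forwarded in this sketch; a stricter policy could drop them instead.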
There are several other options to consider for a PLAN B. For example, if you use AWS or Azure as your cloud platform, you can take advantage of native tools like Amazon CloudWatch or Azure Monitor to monitor your systems and keep your operations team updated. However, it is worth noting that you may still face comparable issues with RUM and Synthetics. Additionally, with this approach there is no data pipeline in place, which can result in data gaps following an outage.