You Cannot ‘Fly Blind’ in the Cloud Native Landscape
#1 Support In-depth Data Collection for the container infrastructure including the IaaS Layer
-
Capacity issues with the underlying IaaS hosts, where pods can run out of resources because of CPU, memory, disk, or network bottlenecks
-
Failed containers that need to be migrated, and issues with the k8s master components themselves, such as the API server and etcd
-
Metrics on pods, Deployments, and ReplicaSets across Namespaces, including their status, with a real-time, dynamic view of their usage patterns
-
The status of all the microservices and components such as service meshes, serverless functions, etc
-
Simple and advanced dashboards on all the above
-
Enable the prioritization of monitoring; start with the most frequent or highest business value components & then the others in relative order of priority
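As an illustration of the capacity checks listed above, here is a minimal sketch that flags hosts whose CPU, memory, disk, or network utilization crosses a threshold. The `NodeUsage` type, the `find_bottlenecks` helper, the 85% threshold, and the sample numbers are all hypothetical; a real deployment would read these values from whatever metrics pipeline it already runs.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Hypothetical snapshot of one IaaS host; values are fractions of capacity (0.0 to 1.0)."""
    name: str
    cpu: float
    memory: float
    disk: float
    network: float


def find_bottlenecks(nodes, threshold=0.85):
    """Return {node name: [resources at or above the threshold]} for constrained nodes only."""
    report = {}
    for node in nodes:
        hot = [
            resource
            for resource in ("cpu", "memory", "disk", "network")
            if getattr(node, resource) >= threshold
        ]
        if hot:
            report[node.name] = hot
    return report


# Made-up sample data, for illustration only.
nodes = [
    NodeUsage("node-a", cpu=0.42, memory=0.91, disk=0.30, network=0.10),
    NodeUsage("node-b", cpu=0.35, memory=0.40, disk=0.20, network=0.05),
    NodeUsage("node-c", cpu=0.97, memory=0.88, disk=0.55, network=0.15),
]
print(find_bottlenecks(nodes))
# {'node-a': ['memory'], 'node-c': ['cpu', 'memory']}
```

Only constrained hosts appear in the report, which keeps a dashboard or alert built on top of it focused on the nodes where pods are at risk of running out of resources.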
#2 Serve as the centralized single source of Monitoring truth
-
The platform performs constant 24x7 monitoring across the stack: applications, the container platform, and the underlying compute, network, and storage. Provide the ability to monitor IaaS resources by ‘tags’ and the container infrastructure using Kubernetes metadata.
-
Provide a self-service monitoring capability across the business that enables the delivery of reports, dashboards, and analytics that matter to each relevant stakeholder. The “single pane of glass” paradigm is even more important here.
-
Long-term data storage that supports not just ephemeral data collection but also advanced analytics for quick fault diagnostics
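Monitoring IaaS resources by tags and containers by Kubernetes metadata both come down to the same idea: select resources by key/value pairs. The sketch below shows equality-based selection over a mixed inventory. The `matches` and `select` helpers and the inventory records are hypothetical and stand in for whatever a cloud API or cluster API would return.

```python
def matches(labels, selector):
    """True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(resources, selector):
    """Return the names of resources whose tags or labels satisfy the selector."""
    return [r["name"] for r in resources if matches(r["labels"], selector)]


# Made-up inventory mixing IaaS hosts (tags) and pods (labels).
inventory = [
    {"name": "vm-01", "labels": {"env": "prod", "team": "payments"}},
    {"name": "vm-02", "labels": {"env": "dev", "team": "payments"}},
    {"name": "checkout-7d9f", "labels": {"env": "prod", "team": "payments", "app": "checkout"}},
    {"name": "search-5c2a", "labels": {"env": "prod", "team": "search"}},
]

print(select(inventory, {"env": "prod", "team": "payments"}))
# ['vm-01', 'checkout-7d9f']
```

Using one selector across both layers is what lets a single view answer a question such as "show everything the payments team runs in production", regardless of whether the resource is a VM or a pod.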
#3 Integrate with the Application development lifecycle – DevOps process & CI/CD pipelines
-
API based integration with CI/CD services
-
Flexible and easy-to-use UX that can be embedded in third-party applications
-
Ability to perform root cause identification and troubleshooting for test environments right from the development pipeline itself
-
Autoscaling of monitoring services based on application scale up/scale down and alerting
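One way to picture the CI/CD integration described above is a quality gate: after a build is deployed to a test environment, the pipeline compares observed metrics against limits and fails the stage if any are violated. The sketch below assumes the metrics have already been fetched into a dictionary; the `evaluate_gate` function, the metric names, and the limits are hypothetical and not tied to any particular CI/CD service.

```python
def evaluate_gate(metrics, limits):
    """Compare observed metrics against upper limits.

    Returns (passed, violations), where violations lists human-readable reasons.
    A metric missing from the observations counts as a violation, so a broken
    collector cannot silently pass the gate.
    """
    violations = []
    for name, limit in limits.items():
        observed = metrics.get(name)
        if observed is None:
            violations.append(f"{name}: no data")
        elif observed > limit:
            violations.append(f"{name}: {observed} exceeds limit {limit}")
    return (not violations, violations)


# Hypothetical limits and observations for a test-environment deployment.
limits = {"error_rate": 0.01, "p95_latency_ms": 300, "pod_restarts": 0}
observed = {"error_rate": 0.002, "p95_latency_ms": 412}

passed, violations = evaluate_gate(observed, limits)
print("gate passed" if passed else "gate failed")
for reason in violations:
    print(" -", reason)
```

A pipeline step could exit with a non-zero status when `passed` is false, and the violation messages give developers a starting point for root cause identification without leaving the pipeline.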
#4 Easy to Extend & must support fast time to root cause identification
-
Users are often drowning in alerts. They need to understand the importance of what they are looking at as quickly as possible, so they can decide whether an alert or trend merits further investigation.
-
The visualization tool should enable slicing and dicing of large datasets that span time series data and system events, while offering multiple perspectives: user, metadata, business, and statistical. From a visualization standpoint, support needs to be provided for all the usual suspects (min, max, standard deviation, percentiles, and so on) at a minimum.
-
Alerts need to be provided for the desired range of conditions, comparing actual against expected values. Prominent examples include CPU, memory, network, number of pod replicas, and internal Kubernetes components, along with the ability to set a custom alert.
-
A key point about notifications is to ensure that as much business context as possible is added to system and application notifications. At the time of writing, the vast majority of monitoring notification messages are meant for systems, not humans, to read and interpret. Helpful hints also need to be provided on how to act on the notification.
-
While this may introduce AI-Ops into the discussion, the system also needs to continually learn about the environment, with the goal of increasing monitoring automation over time. Why is this important? My wager is that around 60% of monitoring notifications, alerts, and workflows can be resolved with automated actions, freeing up valuable and high-cost human resources for other value-added tasks, such as helping onboard different line-of-business applications onto the central platform.
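Two of the points above lend themselves to a short sketch: summarizing a metric with the usual statistics, and raising a replica-count alert that carries business context and a hint on how to act. The `summarize` and `replica_alert` functions, the sample values, and the context fields (`impact`, `owner`, `hint`) are hypothetical. Percentiles use the standard library's `statistics.quantiles` with the inclusive method, which is one of several reasonable percentile definitions.

```python
import statistics


def summarize(samples):
    """Summary statistics for one time series (needs at least two samples)."""
    # With n=100 there are 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
        "p50": cuts[49],
        "p95": cuts[94],
    }


def replica_alert(deployment, desired, actual, context):
    """Return a human-readable alert when actual replicas fall short of desired, else None."""
    if actual >= desired:
        return None
    return (
        f"{deployment}: {actual}/{desired} replicas available. "
        f"Business impact: {context['impact']}. "
        f"Owner: {context['owner']}. "
        f"Suggested action: {context['hint']}."
    )


# Made-up CPU utilization samples for one pod.
cpu_samples = [0.31, 0.35, 0.29, 0.40, 0.38, 0.92, 0.33, 0.36, 0.30, 0.34]
summary = summarize(cpu_samples)
print({key: round(value, 3) for key, value in summary.items()})

# Made-up business context attached to the alert.
message = replica_alert(
    "checkout",
    desired=5,
    actual=3,
    context={
        "impact": "online orders may fail at peak load",
        "owner": "payments team",
        "hint": "check node capacity and recent rollouts",
    },
)
print(message)
```

The alert text is written for a person rather than a parser: it states what is wrong, who is affected, and what to try first, which is the kind of context that helps a reader decide quickly whether the alert merits further investigation.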
#5 This one is culture – understand what needs to be monitored and avoid monitoring anti-patterns as much as you can
-
Applying a “one size fits all” or a “cookie cutter” approach to monitoring across the organization by using the same set of metrics & templates for every application. Every application is different and needs to be monitored in a fundamentally different manner
-
Not building a business case for monitoring that takes the interdisciplinary nature of monitoring into account. By that we mean understanding the core concerns of the business, infrastructure teams/SREs, and developers.
-
Not providing self service & automation across a unified monitoring platform
-
Not being proactive in using monitoring to detect failures or unfavorable trends in applications before they happen
-
Not infusing monitoring early on in the development cycle
Conclusion
This article originated from http://www.vamsitalkstech.com/?p=7619
Vamsi Chemitiganti is a Tigera guest blogger. Vamsi Chemitiganti is Chief Strategist at Platform9 Systems. Vamsi works with Platform9’s Client CXOs and Architects to help them on key business transformation initiatives. He holds a BS in Computer Science and Engineering as well as an MBA from the University of Maryland, College Park.