Top-level pages for Kubernetes objects allow you to drill into the hierarchy of Kubernetes objects in your fleet. Main pages include lists of Clusters, namespaces, workloads, and Nodes.
For example, the Cluster main page shows the list of your Clusters. When you click a Cluster in the list, the Cluster detail page opens. That page shows detailed information for the Cluster along with a list of Nodes within the Cluster.
You can continue to drill into a Node and see the list of Pods for that Node, all the way to the container level.
There are also main pages for you to view alerts, configuration, and data for cost and efficiency.
For additional navigation tips, refer to Navigation tips for Kubernetes Monitoring.
Start with a high-level snapshot
At the Kubernetes Overview home page, you can get a high-level look at your Clusters and alerts.
Refine counts of Kubernetes objects
Adjust the time range selector and filter by Cluster and namespace to view the counts for:
Clusters, Nodes, namespaces, workloads, Pods, and containers
You can use the time range selector to focus on a time period while looking for any spikes in CPU and memory usage in your Clusters.
When spikes occur:
Zoom in on the graph to narrow the time selection.
Hover over and click the peak of the spike to see the percentage of use compared to capacity. In the following example, the spike shows 46.5% of CPU usage compared to capacity.
Click the link to view the Cluster. The Cluster page shows the time range you set when zooming in on the graph.
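As a rough illustration of what the percentage in the spike tooltip represents, the following Python sketch divides usage by capacity. The function name and the sample values (1.86 CPU cores in use against 4 cores of capacity) are assumptions chosen to reproduce the 46.5% figure from the example; they are not taken from Kubernetes Monitoring itself.

```python
def usage_percent(used: float, capacity: float) -> float:
    """Return resource usage as a percentage of capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity


# Hypothetical values: 1.86 CPU cores in use on a Cluster with 4 cores of capacity.
print(f"{usage_percent(1.86, 4.0):.1f}%")  # prints 46.5%
```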
You can review costs at a high level and by Kubernetes object, from Cluster all the way to the container level. At the Cost page, use the Overview and Savings tabs to gain a high-level understanding of what Kubernetes is costing and how you can save. You can also see the cost of each item in a list view as well as on the detail pages.
CPU and memory prediction can help you ensure resources are available during spikes in resource usage and help you decrease the amount of unused resources due to overprovisioning. To use prediction tools, first enable the Machine Learning plugin.
The following buttons are available in various views. Click them to show a prediction for Clusters, namespaces, workloads, Nodes, Pods, and containers:
Predict Mem Usage: Shows a predictive graph for memory usage one week in the future. Calculations are based on metrics from the previous week.
Predict CPU: Shows a predictive graph for CPU usage one week in the future. Calculations are based on metrics from the previous week.
With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development.
This feature can be seen on this Node details page.
Detect outlier Pod CPU usage
Within a workload detail page, click the Detect Outlier CPU Usage amongst Pods button to identify a Pod that has CPU usage different from the other Pods.
This feature can be seen on this namespace details page.
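To make the idea of an outlier Pod more concrete, here is a minimal Python sketch that flags Pods whose CPU usage deviates strongly from the rest of the workload. It uses a modified z-score based on the median absolute deviation, which is an assumption made for illustration and not necessarily the method the Machine Learning plugin applies. The Pod names, usage values, and the `find_outlier_pods` helper are hypothetical.

```python
from statistics import median


def find_outlier_pods(cpu_by_pod: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Return Pods whose CPU usage deviates strongly from the group median.

    Illustrative only: uses the modified z-score (median absolute deviation),
    which is an assumed technique, not the plugin's documented algorithm.
    """
    values = list(cpu_by_pod.values())
    if len(values) < 3:
        return []  # too few Pods to say what "typical" usage is
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most Pods are identical, so any Pod that differs is flagged.
        return [pod for pod, v in cpu_by_pod.items() if v != med]
    return [
        pod
        for pod, v in cpu_by_pod.items()
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical CPU usage (in cores) for the Pods of one workload.
usage = {"web-1": 0.21, "web-2": 0.19, "web-3": 0.22, "web-4": 0.95, "web-5": 0.20}
print(find_outlier_pods(usage))  # prints ['web-4']
```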
Use Explore for troubleshooting
Click Explore this query in the Machine Learning plugin to view the raw data and troubleshoot issues. Here you can adjust parameters and see a more detailed graph of the findings.
Select a time range to see your historical data for any time frame you choose. As you navigate from page to page, the time range stays set to the period you chose until you change it again.
As an example, the Pod optimization section of the Pod detail page shows a time range over several hours. You can use this to understand the historical pattern of CPU usage and memory usage.
Zoom into an area of any graph on the detail pages to narrow the time range selector even further. The time range remains selected until you click Back to default.
You can find deleted Clusters, namespaces, workloads, Nodes, Pods, and containers to understand what occurred in the past. To do so, set the time range selector to a past time period.
The following example sets a time range of the previous 30 days and then filters for Nodes with the condition “No data”. The Node detail page shows a graph depicting when the Node expired.
Grafana Cloud has a default 30-day limit for queries. If your Kubernetes object was deleted more than 30 days before the current date, use the time range selector to choose a specific 30-day time frame in the past.
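If you know roughly when an object was deleted, a small calculation can suggest which 30-day range to enter in the time range selector. The Python sketch below is only an illustration: the `window_for_deletion` helper, the one-day padding after the deletion, and the sample dates are all assumptions, and the 30-day constant simply mirrors the default limit described above.

```python
from datetime import datetime, timedelta, timezone

QUERY_LIMIT = timedelta(days=30)  # default query limit described in the text


def window_for_deletion(
    deleted_at: datetime,
    now: datetime,
    padding: timedelta = timedelta(days=1),  # assumed margin after the deletion
    limit: timedelta = QUERY_LIMIT,
) -> tuple[datetime, datetime]:
    """Return a (start, end) range of length `limit` that ends shortly after the deletion."""
    end = min(deleted_at + padding, now)  # never select a range in the future
    start = end - limit
    return start, end


# Hypothetical dates: an object deleted on 15 January, investigated on 1 June.
deleted = datetime(2024, 1, 15, tzinfo=timezone.utc)
today = datetime(2024, 6, 1, tzinfo=timezone.utc)
start, end = window_for_deletion(deleted, today)
print(start.date(), "to", end.date())  # prints 2023-12-17 to 2024-01-16
```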
Use the network panels to understand when bandwidth limits are causing network saturation, which can lead to dropped packets.
On any detail page for Cluster, namespace, workload, Node, and Pod, click the Network tab to view:
Network Bandwidth Rx/Tx: Shows the rate of received and transmitted bytes
Network Saturation Rx/Tx dropped packets: Shows the rate of dropped received and transmitted packets
Network Bandwidth and Network Saturation by Node, workload, or Pod: Shows the bandwidth and saturation by object
This feature can be seen on the Network tab of this namespace details page.
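The network panels report rates, such as bytes or dropped packets per second. As a sketch of how a rate can be derived, the Python example below assumes the underlying data is a cumulative counter sampled at intervals and averages its increase over the elapsed time, treating any drop in value as a counter reset. The `per_second_rate` function and the sample data are hypothetical and are not the queries Kubernetes Monitoring runs.

```python
def per_second_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a cumulative counter.

    `samples` holds (timestamp_seconds, counter_value) pairs in time order.
    A drop in value is treated as a counter reset that restarted from zero.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed


# Hypothetical received-bytes counter sampled every 30 seconds.
rx_bytes = [(0.0, 1000.0), (30.0, 4000.0), (60.0, 7000.0)]
print(per_second_rate(rx_bytes))  # prints 100.0
```

Comparing this rate for received and transmitted bytes against the rate of dropped packets over the same interval is one way to reason about whether bandwidth limits coincide with saturation.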
View logs and events
From any detail page, click the Logs & Events tab to view the logs and events for that Kubernetes object.
Navigate from the Kubernetes Monitoring app to other capabilities in Grafana Cloud to analyze, troubleshoot, and solve issues.
Start an automated diagnostic
From a Pod, Cluster, namespace, or workload detail view, you can begin an incident investigation by clicking Run Sift investigation.
Sift performs a set of automated system checks and surfaces potential issues in your Kubernetes environment, and works to identify the root cause of an incident.
To access root cause analysis tools in Asserts, enable Asserts on your stack.
You can take troubleshooting deeper by understanding relationships between components and what is occurring between them.
Within Kubernetes Monitoring, access Asserts Workbench to perform root cause analysis.
In any list of Clusters, Nodes, workloads, namespaces, or Pods, select the box to the left of each item you want to examine, and then click the Compare in Asserts Workbench button.
The RCA Workbench opens in a new tab.
To return to Kubernetes Monitoring, click the browser back button.
View queries to troubleshoot with Explore
To further query data, use any of the Explore buttons available throughout the interface (such as Explore namespaces or Explore alerts). You see a view that provides additional query tools for troubleshooting.
Here are some tips and shortcuts for getting around in Kubernetes Monitoring.
Give it a try using Grafana Play
With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development.
This feature can be seen on the Kubernetes Monitoring Overview.
Jump between main pages
From any main page, click the icon beside the page title to see the menu of all main pages. Then click the page you want to open.
Throughout the views in Kubernetes Monitoring, you see color used as an additional means of indicating status or condition.
For example, sometimes text is a different color for Pod status: