PromQL Playground
The PromQL Playground provides an interactive interface for querying and exploring Prometheus metrics from your Kubernetes clusters. It lets you write, test, and execute PromQL (Prometheus Query Language) queries in real time.
Overview
The PromQL Playground is designed to help you:
- Explore available metrics in your clusters
- Test and validate PromQL queries before creating alert rules
- Debug metric collection and labeling
- Learn PromQL syntax with immediate feedback
Getting Started
Accessing the Playground
- Navigate to Alerts in the main menu
- Select PromQL Playground from the Kubernetes section
- Choose a cluster from the dropdown
Basic Usage
- Select a Cluster: Choose the Kubernetes cluster you want to query
- Write Your Query: Enter a PromQL expression in the query input
- Execute: Click "Execute Query" or press Cmd/Ctrl + Enter
- View Results: Results are displayed in a table format with metric names, labels, and values
Sample Queries
Basic Queries
Check Pod Status
up
Returns the up/down status (1/0) of all monitored targets.
CPU Usage by Pod
container_cpu_usage_seconds_total
Shows cumulative CPU time consumed by containers.
Memory Usage
container_memory_usage_bytes
Displays current memory usage in bytes for all containers.
Filtering Queries
Metrics for Specific Namespace
up{namespace="kube-system"}
Returns the up/down status only for targets in the kube-system namespace.
Metrics for Specific Pod
container_cpu_usage_seconds_total{pod="my-app-pod"}
Shows cumulative CPU time for a specific pod.
Multiple Label Filters
container_memory_usage_bytes{namespace="production",container="app"}
Filters by multiple labels simultaneously.
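Besides exact matches, PromQL label matchers also support negation (`!=`) and regular expressions (`=~`, `!~`). The following is a small sketch of these matchers; the `monitoring` namespace is a hypothetical name used only for illustration.

```promql
# Targets in every namespace whose name starts with "kube-"
# (regex matchers are anchored, so the pattern must match the whole value)
up{namespace=~"kube-.*"}

# Memory usage for all namespaces except kube-system
container_memory_usage_bytes{namespace!="kube-system"}

# Exclude several namespaces at once ("monitoring" is a hypothetical name)
container_memory_usage_bytes{namespace!~"kube-system|monitoring"}
```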
Aggregation Queries
Total CPU Usage per Namespace
sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))
Aggregates CPU usage rate over 5 minutes, grouped by namespace.
Average Memory Usage
avg(container_memory_usage_bytes) by (namespace)
Calculates average memory usage per namespace.
Pod Count per Namespace
count(kube_pod_info) by (namespace)
Counts the number of pods in each namespace.
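The aggregations above all group with `by`. As a complementary sketch, `without` drops the listed labels and keeps the rest, and other aggregators such as `max` follow the same pattern. The metrics are the ones already used in this section.

```promql
# Largest memory usage of any single container in each namespace
max by (namespace) (container_memory_usage_bytes)

# Sum the CPU rate across containers: drop only the "container" label
# and keep every other label (namespace, pod, and so on)
sum without (container) (rate(container_cpu_usage_seconds_total[5m]))
```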
Rate and Increase Queries
HTTP Request Rate
rate(http_requests_total[5m])
Calculates the per-second rate of HTTP requests over the last 5 minutes.
Network Traffic Rate
rate(container_network_receive_bytes_total[1m])
Shows the rate of network bytes received per second.
Disk I/O Operations
rate(container_fs_writes_total[5m])
Displays the rate of filesystem write operations.
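The examples above use `rate()`. For comparison, here is a short sketch of `increase()`, which reports the total growth of a counter over the range instead of a per-second value. It reuses metric names from this section; whether they exist depends on what your cluster exports.

```promql
# Total HTTP requests over the last hour
# (the result is extrapolated, so it can be a non-integer)
increase(http_requests_total[1h])

# Bytes received over the last 5 minutes, summed per pod
sum by (pod) (increase(container_network_receive_bytes_total[5m]))
```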
Advanced Queries
CPU Usage Percentage
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
Calculates CPU usage as a percentage.
Memory Usage Percentage
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
Shows memory usage as a percentage of total memory.
Top 5 Pods by CPU Usage
topk(5, sum by (pod) (rate(container_cpu_usage_seconds_total[5m])))
Returns the top 5 pods with highest CPU usage.
Pods with High Memory Usage
container_memory_usage_bytes > 1000000000
Lists containers using more than 1GB of memory.
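The threshold query above works per container. A possible variation, sketched here, sums usage per pod first and then applies the threshold, with the limit written as an arithmetic expression (1 GiB, that is 2^30 bytes) rather than a literal.

```promql
# Pods whose containers together use more than 1 GiB
sum by (namespace, pod) (container_memory_usage_bytes) > 1024 * 1024 * 1024

# The 5 pods with the highest total memory usage
topk(5, sum by (namespace, pod) (container_memory_usage_bytes))
```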
Helper Labels
The playground provides quick access to common label values through the Helper Labels drawer:
- Organization ID: Your organization identifier
- Project ID: Current project identifier
- Environment IDs: All environment identifiers in your project
These can be used in your queries for filtering:
up{organization_id="<your-org-id>"}
Query Tips
Time Ranges
Use square brackets to specify time ranges:
- [5m]: Last 5 minutes
- [1h]: Last 1 hour
- [1d]: Last 1 day
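As a brief illustration of how a range selector is used in practice, the sketch below applies ranges to a gauge and a counter, and adds the `offset` modifier to shift the window back in time. The metric names are the ones used elsewhere on this page.

```promql
# Average memory usage of each series over the last hour
avg_over_time(container_memory_usage_bytes[1h])

# Request rate over a 5-minute window, evaluated as of one day ago
rate(http_requests_total[5m] offset 1d)
```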
Functions
Common PromQL functions:
- rate(): Calculate per-second rate
- increase(): Calculate increase over time range
- sum(): Sum values
- avg(): Average values
- max() / min(): Maximum/minimum values
- count(): Count number of time series
- topk() / bottomk(): Top/bottom K values
Operators
- Arithmetic: +, -, *, /, %, ^
- Comparison: ==, !=, >, <, >=, <=
- Logical: and, or, unless
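The operator categories listed above could be combined as in the following sketch. It uses only metrics that appear elsewhere on this page.

```promql
# Arithmetic: convert bytes to GiB
container_memory_usage_bytes / 1024 / 1024 / 1024

# Comparison: keep only the targets that are currently down
up == 0

# Logical: targets that are down, except those in kube-system
# ("unless" removes left-hand series that have a matching series on the right)
up == 0 unless up{namespace="kube-system"}
```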
Best Practices
- Start Simple: Begin with basic metric queries and add filters gradually
- Use Time Ranges: Always specify appropriate time ranges for rate/increase functions
- Filter Early: Apply label filters to reduce the data set before aggregation
- Test Before Alerting: Validate queries in the playground before creating alert rules
- Monitor Performance: Complex queries may take longer to execute
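One way to apply the practices above is to build a query in stages and run each stage in the playground before moving on. The sketch below assumes a namespace named `production`, as in the earlier examples.

```promql
# Step 1: start with the raw metric
container_cpu_usage_seconds_total

# Step 2: filter early with a label matcher
container_cpu_usage_seconds_total{namespace="production"}

# Step 3: apply rate() with an explicit time range
rate(container_cpu_usage_seconds_total{namespace="production"}[5m])

# Step 4: aggregate the filtered result
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="production"}[5m]))
```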
Common Use Cases
Debugging Application Issues
# Check if pods are running
up{namespace="my-app"}
# Per-second rate of error log messages over the last 5 minutes, by pod
sum(rate(log_messages_total{level="error"}[5m])) by (pod)
Capacity Planning
# Current memory usage in GiB
sum(container_memory_usage_bytes{namespace="production"}) / 1024 / 1024 / 1024
# Trend over time (this metric is a counter, so take its rate instead of averaging raw values)
rate(container_cpu_usage_seconds_total[1h])
Performance Monitoring
# Request latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Error rate
rate(http_requests_total{status=~"5.."}[5m])
Troubleshooting
No Results Returned
- Verify the cluster is connected and metrics are being collected
- Check that the metric name is spelled correctly
- Ensure label filters match existing labels
- Try a simpler query without filters first
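The checks above can be carried out with a few discovery queries, sketched below. The `container_` prefix is only an example; substitute the prefix of the metric you expect. Note that matching on `__name__` can be expensive on large clusters.

```promql
# Which namespaces have any scrape targets at all?
count by (namespace) (up)

# Which metric names starting with "container_" exist?
count by (__name__) ({__name__=~"container_.+"})

# Run the metric without filters and inspect the labels column
# to see which label names and values are actually present
container_memory_usage_bytes
```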
Query Timeout
- Reduce the time range
- Add more specific label filters
- Simplify aggregations
- Consider breaking complex queries into smaller parts
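As an illustration of breaking a complex query into smaller parts, the CPU usage percentage query from the Advanced Queries section could be evaluated from the inside out. This is a sketch of the approach, not a required procedure.

```promql
# Part 1: per-CPU idle rate; confirm it returns data and check how many series it yields
rate(node_cpu_seconds_total{mode="idle"}[5m])

# Part 2: average the idle rate per instance
avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Part 3: the full expression
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```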
Next Steps
- Use validated queries to create Alert Rules
- Set up Probes for health checks
- Configure Alert Routing for notifications