Monitoring Your Services

This document explains how to monitor the health, performance, and usage of your services with the platform's built-in monitoring tools, and describes recommended best practices.

Overview

Monitoring is essential to keeping your applications reliable and performant. The platform's monitoring tools show how your system behaves, so you can identify and resolve issues before they affect your users.

Key Monitoring Metrics

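We recommend tracking uptime, 95th-percentile (p95) response time, and error rate. The metrics table later in this document gives a recommended threshold for each.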

Using the Monitoring Dashboard

Navigate to the "Monitoring" section in your dashboard to access the following features:

Setting Up Alerts

To configure alerts:

  1. Go to the "Alerts" tab within the Monitoring section.
  2. Click "Create New Alert".
  3. Select the service(s) and metric(s) you want to monitor.
  4. Define the trigger condition (e.g., "Response Time > 500ms for 5 minutes").
  5. Choose your notification channels (e.g., email, Slack, PagerDuty).
  6. Save your alert configuration.
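The example trigger condition in step 4 ("Response Time > 500ms for 5 minutes") fires only when a breach is sustained, not on a single slow sample. The sketch below shows one way such a condition can be evaluated; it is not the platform's implementation. It assumes the metric arrives as timestamped samples and that any sample at or below the threshold resets the window. The class name SustainedThresholdRule is hypothetical.

```python
from typing import Optional


class SustainedThresholdRule:
    """Hypothetical rule: fire when a metric stays above `threshold`
    continuously for at least `duration_s` seconds."""

    def __init__(self, threshold: float, duration_s: float) -> None:
        self.threshold = threshold
        self.duration_s = duration_s
        # Timestamp of the first sample in the current breach, or None.
        self._breach_started: Optional[float] = None

    def observe(self, timestamp: float, value: float) -> bool:
        """Record one sample and return True if the alert condition holds."""
        if value > self.threshold:
            if self._breach_started is None:
                self._breach_started = timestamp
            return timestamp - self._breach_started >= self.duration_s
        # A sample at or below the threshold ends the breach and resets the clock.
        self._breach_started = None
        return False


# "Response Time > 500ms for 5 minutes", assuming one sample per minute.
rule = SustainedThresholdRule(threshold=500.0, duration_s=300.0)
fired = [rule.observe(t, 650.0) for t in range(0, 301, 60)]
```

Here the condition holds only at the sixth sample, once five full minutes of breach have elapsed. A real alerting system would also deduplicate notifications while the condition keeps holding.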

Best Practice: Configure alerts for anomalies in addition to absolute thresholds. Anomaly-based alerts trigger on unusual behavior rather than on every threshold crossing, which helps reduce alert fatigue.
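To illustrate the difference between an anomaly and an absolute threshold, here is a minimal rolling z-score detector: a value is flagged when it lies more than a set number of standard deviations from the mean of recent values. This is one common anomaly-detection technique, chosen for illustration; the platform's own anomaly detection may work differently. The class name, window size, and cutoff are hypothetical.

```python
import statistics
from collections import deque


class RollingZScoreDetector:
    """Flag a sample that lies more than `z_threshold` standard deviations
    from the mean of the previous `window` samples."""

    def __init__(self, window: int = 30, z_threshold: float = 3.0) -> None:
        self.history = deque(maxlen=window)  # oldest samples drop off automatically
        self.z_threshold = z_threshold

    def is_anomaly(self, value: float) -> bool:
        anomalous = False
        # A sample standard deviation needs at least two points.
        if len(self.history) >= 2:
            mean = statistics.fmean(self.history)
            stdev = statistics.stdev(self.history)
            if stdev > 0:
                anomalous = abs(value - mean) / stdev > self.z_threshold
            else:
                # Perfectly flat history: any change counts as unusual.
                anomalous = value != mean
        # Every sample, anomalous or not, becomes part of the baseline.
        self.history.append(value)
        return anomalous


detector = RollingZScoreDetector(window=10, z_threshold=3.0)
baseline = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101]  # e.g. p95 latency in ms
flags = [detector.is_anomaly(v) for v in baseline]
spike = detector.is_anomaly(250)
```

Normal jitter around 100 ms is never flagged, while the jump to 250 ms is, even though a fixed threshold would need to be tuned by hand to catch it. A larger window gives a steadier baseline but adapts more slowly to legitimate changes in traffic.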

Log Aggregation and Analysis

In addition to metrics, our platform aggregates logs from your services, providing valuable context for diagnosing issues. You can search, filter, and analyze logs directly from the "Logs" tab.

Example log search query to find errors in the last hour:

level:error AND timestamp:now-1h
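To make explicit what the query above selects (records at error level whose timestamp falls within the last hour), the same filter can be expressed in code. This sketch assumes newline-delimited JSON records with "level" and "timestamp" fields, where timestamps are ISO 8601 strings that include a UTC offset; that record format and the function name recent_errors are hypothetical, not the platform's log format.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Optional


def recent_errors(
    lines: Iterable[str],
    window: timedelta = timedelta(hours=1),
    now: Optional[datetime] = None,
) -> Iterator[dict]:
    """Yield error-level records whose timestamp is within `window` of `now`.

    Mirrors `level:error AND timestamp:now-1h`, assuming each line is a JSON
    object (hypothetical format) with an ISO 8601 "timestamp" including a
    UTC offset.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - window
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        timestamp = datetime.fromisoformat(record["timestamp"])
        if record.get("level") == "error" and cutoff <= timestamp <= now:
            yield record
```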

Health Checks

Ensure your services expose a health check endpoint (e.g., /health) that returns a 200 OK status when the service is healthy. Our monitoring system periodically polls these endpoints to verify service availability.
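A minimal sketch of such an endpoint, using only Python's standard-library http.server module. The service_is_healthy function is a hypothetical placeholder for your real dependency checks, and returning 503 when the service is unhealthy is an assumption: the monitoring system is only documented to expect 200 OK from a healthy service.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def service_is_healthy() -> bool:
    """Hypothetical placeholder: replace with real checks (database ping, etc.)."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        healthy = service_is_healthy()
        body = b"OK" if healthy else b"UNAVAILABLE"
        # 200 tells the poller the service is up; 503 signals it is not.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the health endpoint until interrupted (the port is arbitrary)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling serve() listens on port 8080. In most services you would add the /health route to your existing web framework instead of running a separate server; the important part is that the status code reflects the service's real state.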

Recommended thresholds for the key metrics:

Metric               Description                                        Recommended Threshold
Uptime               Percentage of time the service is available        > 99.9%
Response Time (p95)  95th-percentile response time                      < 200ms
Error Rate           Percentage of responses with a non-2xx/3xx status  < 0.1%
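The sketch below shows how the three metrics in the table can be computed from raw samples and compared against the recommended thresholds. It assumes per-request records (status code and latency in milliseconds) and approximates uptime as the share of successful health-check polls taken at a fixed interval; p95 uses the nearest-rank method. The platform may compute these values differently, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # response time in milliseconds


def p95(latencies: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method."""
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def error_rate_pct(requests: Sequence[Request]) -> float:
    """Percentage of responses whose status is not 2xx or 3xx."""
    errors = sum(1 for r in requests if not 200 <= r.status < 400)
    return 100.0 * errors / len(requests)


def uptime_pct(health_checks: Sequence[bool]) -> float:
    """Approximate uptime as the percentage of successful health-check polls."""
    return 100.0 * sum(health_checks) / len(health_checks)


def meets_thresholds(
    requests: Sequence[Request], health_checks: Sequence[bool]
) -> Dict[str, bool]:
    """Compare each metric with the recommended threshold from the table."""
    return {
        "uptime": uptime_pct(health_checks) > 99.9,
        "response_time_p95": p95([r.latency_ms for r in requests]) < 200.0,
        "error_rate": error_rate_pct(requests) < 0.1,
    }
```

Note how strict the error-rate threshold is: a single failed request out of 101 is already an error rate of about 0.99%, nearly ten times the recommended limit.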

For detailed API endpoints related to monitoring data, please refer to the API Reference.

Troubleshooting Common Issues

If you encounter monitoring issues, check the following: