Monitoring & Observability
This document covers error tracking, logging, health checks, and the admin health dashboard for the Groundtruth Platform.
Sentry Error Tracking
The platform uses Sentry for error tracking across both the Next.js web application and the Python engine.
Next.js Integration
Sentry is integrated via @sentry/nextjs, which instruments three runtime environments:
- Client — Browser-side errors, unhandled promise rejections, React error boundaries
- Server — API route errors, server component errors, middleware failures
- Edge — Edge runtime errors (middleware, edge API routes)
Configuration is applied in next.config.ts using the withSentryConfig wrapper. This enables automatic instrumentation of:
- API route handlers
- Server-side data fetching
- Client-side navigation and rendering
Source maps are uploaded to Sentry during CI builds for readable stack traces in production. Source map uploads are disabled in local development to avoid slowing down builds.
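As a rough illustration of the wrapper described above, a next.config.ts might look like the following. This is a sketch, not the platform's actual configuration: the empty nextConfig object and the use of a CI environment variable to toggle source map uploads are assumptions made for the example.

```typescript
// next.config.ts (illustrative sketch; values are assumptions)
import type { NextConfig } from "next";
import { withSentryConfig } from "@sentry/nextjs";

// Placeholder for the application's real Next.js settings.
const nextConfig: NextConfig = {};

export default withSentryConfig(nextConfig, {
  // Organization and project slugs used when uploading source maps.
  org: process.env.SENTRY_ORG,
  project: process.env.SENTRY_PROJECT,
  // Assumption: CI sets the CI variable, so uploads are skipped locally.
  sourcemaps: { disable: !process.env.CI },
});
```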
Python Engine Integration
The engine uses sentry-sdk[fastapi] to capture errors from:
- FastAPI request handlers
- Background crew execution tasks
- Unhandled exceptions in engine modules
Required Environment Variables
| Variable | Description |
|---|---|
| SENTRY_DSN | Sentry Data Source Name (the ingest URL for your project) |
| SENTRY_ORG | Sentry organization slug (used for source map uploads) |
| SENTRY_PROJECT | Sentry project slug (used for source map uploads) |
Sentry is optional. If SENTRY_DSN is not set, the platform operates normally without error tracking. See environment-variables.md for the full list of configuration variables.
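One way to picture the optional behaviour is a small guard that reads SENTRY_DSN and reports whether error tracking should be switched on. The helper name resolveSentrySettings and its return shape are hypothetical and exist only for this sketch.

```typescript
// Hypothetical helper: decides whether Sentry should be initialised.
type SentrySettings =
  | { enabled: false }
  | { enabled: true; dsn: string };

function resolveSentrySettings(
  env: Record<string, string | undefined>,
): SentrySettings {
  const dsn = env.SENTRY_DSN?.trim();
  // Missing or blank DSN: run without error tracking instead of failing.
  if (!dsn) return { enabled: false };
  return { enabled: true, dsn };
}
```

A caller could pass process.env and initialise the SDK only when enabled is true.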
Structured JSON Logging
Logger Module
A custom logger is implemented at src/lib/logger.ts. It provides structured, contextual logging with environment-aware formatting:
- Production — JSON format, machine-parseable. Each log entry is a single JSON object per line, suitable for ingestion by log aggregation services (Datadog, Logflare, etc.).
- Development — Human-readable format with color coding for quick scanning in the terminal.
Log Entry Fields
Every log entry includes structured metadata:
| Field | Description |
|---|---|
| level | Log level: info, warn, error, debug |
| message | Human-readable log message |
| timestamp | ISO 8601 timestamp |
| tenantId | Tenant context (when available) |
| engagementId | Engagement context (when available) |
| action | The action being performed (e.g., engagement.start, deliverable.write) |
| metadata | Additional key-value pairs specific to the log event |
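The fields in the table can be expressed as a TypeScript type, together with a formatter that switches between the two output modes. This is a minimal sketch under assumptions: the names LogEntry and formatEntry are hypothetical, the real src/lib/logger.ts may be structured differently, and color coding is left out to keep the example short.

```typescript
type LogLevel = "info" | "warn" | "error" | "debug";

// Mirrors the fields listed in the table; context fields are optional.
interface LogEntry {
  level: LogLevel;
  message: string;
  timestamp: string; // ISO 8601
  tenantId?: string;
  engagementId?: string;
  action?: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical formatter, not the platform's actual implementation.
function formatEntry(entry: LogEntry, production: boolean): string {
  if (production) {
    // One JSON object per line, ready for a log aggregator.
    return JSON.stringify(entry);
  }
  // Development: compact human-readable line (colors omitted here).
  const context = [entry.tenantId, entry.engagementId, entry.action]
    .filter((value): value is string => value !== undefined)
    .join(" ");
  const head = `${entry.timestamp} ${entry.level.toUpperCase()} ${entry.message}`;
  return context ? `${head} [${context}]` : head;
}
```

Because JSON.stringify drops undefined properties, context fields that are not available simply do not appear in the production output.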
Migration from console.error
All console.error calls throughout the codebase have been migrated to logger.error to ensure consistent structured output. Direct console.log and console.error usage should be avoided in favor of the logger module.
Usage Example
import { logger } from '@/lib/logger';

logger.info('Engagement started', {
  tenantId: tenant.id,
  engagementId: engagement.id,
  action: 'engagement.start',
});

logger.error('Engine request failed', {
  tenantId: tenant.id,
  action: 'engine.request',
  metadata: { statusCode: 500, endpoint: '/run' },
});

Admin Health Dashboard
Location
/dashboard/admin/health
Access Control
Restricted to users with the admin or owner role. Other roles (member, viewer) receive a 403 Forbidden response. Role enforcement is handled by the RBAC middleware. See security.md for details on the role hierarchy.
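The access rule can be sketched as a lookup from role to HTTP status. The function name healthDashboardStatus is hypothetical, and the actual RBAC middleware may enforce the rule in a different way. The sketch only restates the rule from the paragraph above.

```typescript
type Role = "owner" | "admin" | "member" | "viewer";

// Roles allowed to open /dashboard/admin/health, per the rule above.
const CAN_VIEW_HEALTH: Record<Role, boolean> = {
  owner: true,
  admin: true,
  member: false,
  viewer: false,
};

// Hypothetical helper: maps a role to the HTTP status the page would return.
function healthDashboardStatus(role: Role): 200 | 403 {
  return CAN_VIEW_HEALTH[role] ? 200 : 403;
}
```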
System Status Overview
The dashboard displays an overall system status:
| Status | Meaning |
|---|---|
| Healthy | All services responding normally |
| Degraded | One or more services slow or partially failing |
| Down | Critical service unreachable |
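One plausible way to derive the overall status from per-service results is shown below. The types, the summarize function, and the critical flag are assumptions made for this sketch. In particular, treating an unreachable non-critical service as Degraded rather than Down is a guess at the intended behaviour.

```typescript
type ServiceState = "ok" | "degraded" | "unreachable" | "not_configured";
type SystemStatus = "Healthy" | "Degraded" | "Down";

interface ServiceReport {
  name: string;
  state: ServiceState;
  critical: boolean; // assumption: each service is flagged critical or not
}

// Hypothetical aggregation of individual checks into one overall status.
function summarize(reports: ServiceReport[]): SystemStatus {
  // Any critical service that cannot be reached takes the system down.
  if (reports.some((r) => r.critical && r.state === "unreachable")) {
    return "Down";
  }
  // Slow, partially failing, or unreachable non-critical services degrade it.
  if (reports.some((r) => r.state === "degraded" || r.state === "unreachable")) {
    return "Degraded";
  }
  // "not_configured" services do not count against the overall status.
  return "Healthy";
}
```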
Service Checks
Three services are monitored with individual status and latency:
- Database (PostgreSQL) — Executes a lightweight query against the Supabase PostgreSQL instance. Reports connection success/failure and round-trip latency in milliseconds.
- Redis (Upstash) — Pings the Upstash Redis instance. Reports connection success/failure and latency. If Redis is not configured, the check reports "not configured" rather than "down" (Redis is optional).
- Engine (Railway) — Calls the Railway-hosted Python engine's /health endpoint. Reports availability, latency, and engine version.
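The three checks share a pattern: run a probe, time it, and report the outcome, with a separate result when the service is not configured. The sketch below shows that pattern in isolation. The names runCheck and CheckResult are hypothetical, and the probe argument stands in for whatever query, ping, or HTTP call the real check performs.

```typescript
interface CheckResult {
  service: string;
  status: "up" | "down" | "not configured";
  latencyMs?: number;
}

// Hypothetical helper: times a probe and converts its outcome to a result.
async function runCheck(
  service: string,
  probe: (() => Promise<unknown>) | undefined,
): Promise<CheckResult> {
  // No probe means the service is optional and not set up (e.g. Redis).
  if (!probe) return { service, status: "not configured" };

  const started = Date.now();
  try {
    await probe();
    return { service, status: "up", latencyMs: Date.now() - started };
  } catch {
    // Any thrown error or rejected promise counts as a failed check.
    return { service, status: "down", latencyMs: Date.now() - started };
  }
}
```

A dashboard could run several of these with Promise.all so that one slow service does not delay the others.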
Dashboard Statistics
In addition to service health, the dashboard displays:
- Active runs — Number of currently running engagements
- Total engagements — Aggregate count across the platform
- Recent errors (24h) — Count of errors captured in the last 24 hours
Auto-Refresh
The dashboard refreshes health data periodically, providing near-real-time status without manual reloading.
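A framework-free sketch of periodic refreshing is shown below. The function startAutoRefresh is hypothetical, and the source does not state the refresh interval, so the interval is left as a parameter. The actual dashboard may use a React hook or a data-fetching library instead.

```typescript
// Hypothetical helper: calls refresh immediately, then on every interval.
// Returns a function that stops further refreshes.
function startAutoRefresh(
  refresh: () => void,
  intervalMs: number,
): () => void {
  refresh(); // load data once right away
  const timer = setInterval(refresh, intervalMs);
  return () => clearInterval(timer);
}
```

The returned stop function should be called when the dashboard is closed so the timer does not keep running.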
Health Endpoints
Web Application Health
GET /api/health
No authentication required. Returns:
{
  "status": "ok",
  "service": "groundtruth-web",
  "timestamp": "2026-02-18T12:00:00.000Z"
}
This endpoint is suitable for uptime monitoring services (Vercel checks, UptimeRobot, Pingdom, etc.).
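The response body can be produced by a small pure function, sketched here. The name buildHealthPayload is hypothetical; only the three fields and their example values come from the response shown above.

```typescript
interface HealthPayload {
  status: "ok";
  service: "groundtruth-web";
  timestamp: string; // ISO 8601
}

// Hypothetical builder for the /api/health response body.
// Accepting the clock as a parameter keeps the function easy to test.
function buildHealthPayload(now: Date = new Date()): HealthPayload {
  return {
    status: "ok",
    service: "groundtruth-web",
    timestamp: now.toISOString(),
  };
}
```

In a Next.js route handler, the result could be returned with Response.json(buildHealthPayload()). The location of that handler file is not given in this document.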
Engine Health
GET /health
Hosted on the Railway Python engine. Returns the engine's status and version. Used by the admin health dashboard to verify engine availability.
Related Documentation
- Security — RBAC, RLS, rate limiting, audit logging
- Environment Variables — Full list of configuration variables including Sentry, Redis, and engine URL