Monitoring & Observability

This document covers error tracking, logging, health checks, and the admin health dashboard for the Groundtruth Platform.

Sentry Error Tracking

The platform uses Sentry for error tracking across both the Next.js web application and the Python engine.

Next.js Integration

Sentry is integrated via @sentry/nextjs, which instruments three runtime environments:

Client — Browser-side errors, unhandled promise rejections, React error boundaries
Server — API route errors, server component errors, middleware failures
Edge — Edge runtime errors (middleware, edge API routes)

Configuration is applied in next.config.ts using the withSentryConfig wrapper. This enables automatic instrumentation of:

API route handlers
Server-side data fetching
Client-side navigation and rendering

Source maps are uploaded to Sentry during CI builds for readable stack traces in production. Source map uploads are disabled in local development to avoid slowing down builds.
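
The wrapper setup described above might look roughly like the following. This is a sketch, not the project's actual configuration: it assumes a recent @sentry/nextjs release in which withSentryConfig takes the Next.js config plus a single options object, and it assumes CI builds are identified by a CI=true environment variable.

```typescript
// next.config.ts (illustrative sketch only)
import type { NextConfig } from "next";
import { withSentryConfig } from "@sentry/nextjs";

const nextConfig: NextConfig = {
  // Existing Next.js options go here.
};

// Assumption: the CI system sets CI=true; local builds do not.
const isCI = process.env.CI === "true";

export default withSentryConfig(nextConfig, {
  // Slugs used to locate the Sentry project for source map uploads.
  org: process.env.SENTRY_ORG,
  project: process.env.SENTRY_PROJECT,
  // Keep Sentry build output quiet outside CI.
  silent: !isCI,
  // Skip source map upload for local builds so they stay fast.
  sourcemaps: { disable: !isCI },
});
```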

Python Engine Integration

The engine uses sentry-sdk[fastapi] to capture errors from:

FastAPI request handlers
Background crew execution tasks
Unhandled exceptions in engine modules

Required Environment Variables

Variable	Description
`SENTRY_DSN`	Sentry Data Source Name (the ingest URL for your project)
`SENTRY_ORG`	Sentry organization slug (used for source map uploads)
`SENTRY_PROJECT`	Sentry project slug (used for source map uploads)

Sentry is optional, so these variables are required only when error tracking is enabled. If SENTRY_DSN is not set, the platform operates normally without error tracking. See environment-variables.md for the full list of configuration variables.
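
One way to express the "optional" behavior is to resolve the SDK options up front and skip initialization when no DSN is present. The helper name resolveSentryOptions and the use of NODE_ENV for the environment label are assumptions made for illustration.

```typescript
// Hypothetical helper: returns Sentry init options, or null when Sentry is disabled.
interface SentryInitOptions {
  dsn: string;
  environment: string;
}

function resolveSentryOptions(
  env: Record<string, string | undefined>,
): SentryInitOptions | null {
  const dsn = env.SENTRY_DSN?.trim();
  if (!dsn) {
    // No DSN configured: the caller should skip initialization entirely.
    return null;
  }
  // Assumption: NODE_ENV is used as the environment label.
  return { dsn, environment: env.NODE_ENV ?? "development" };
}

// Usage (requires the Sentry SDK, so it is left commented out here):
// const options = resolveSentryOptions(process.env);
// if (options) Sentry.init(options);
```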

Structured JSON Logging

Logger Module

A custom logger is implemented at src/lib/logger.ts. It provides structured, contextual logging with environment-aware formatting:

Production — JSON format, machine-parseable. Each log entry is a single JSON object per line, suitable for ingestion by log aggregation services (Datadog, Logflare, etc.).
Development — Human-readable format with color coding for quick scanning in the terminal.

Log Entry Fields

Each log entry carries the following structured fields. The context fields (tenantId, engagementId) appear only when available:

Field	Description
`level`	Log level: `info`, `warn`, `error`, `debug`
`message`	Human-readable log message
`timestamp`	ISO 8601 timestamp
`tenantId`	Tenant context (when available)
`engagementId`	Engagement context (when available)
`action`	The action being performed (e.g., `engagement.start`, `deliverable.write`)
`metadata`	Additional key-value pairs specific to the log event
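
The fields and the environment-aware formatting can be sketched as below. This is not the contents of src/lib/logger.ts: the names LogContext, LogEntry, buildEntry, and formatEntry are hypothetical, and the development format omits color coding for brevity.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Context accepted by each log call; every field is optional.
interface LogContext {
  tenantId?: string;
  engagementId?: string;
  action?: string;
  metadata?: Record<string, unknown>;
}

interface LogEntry extends LogContext {
  level: LogLevel;
  message: string;
  timestamp: string; // ISO 8601
}

function buildEntry(
  level: LogLevel,
  message: string,
  context: LogContext = {},
  now: Date = new Date(),
): LogEntry {
  return { level, message, timestamp: now.toISOString(), ...context };
}

// Production: one JSON object per line. Development: a readable one-liner.
function formatEntry(entry: LogEntry, production: boolean): string {
  if (production) {
    return JSON.stringify(entry);
  }
  const { level, message, timestamp, ...context } = entry;
  const suffix =
    Object.keys(context).length > 0 ? ` ${JSON.stringify(context)}` : "";
  return `${timestamp} ${level.toUpperCase()} ${message}${suffix}`;
}
```

Keeping the entry builder separate from the formatter means the same entry can be rendered either way, depending on the environment.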

Migration from console.error

All console.error calls in the codebase have been migrated to logger.error so that error output is consistently structured. Avoid calling console.log or console.error directly; use the logger module instead.

Usage Example

import { logger } from '@/lib/logger';

logger.info('Engagement started', {
  tenantId: tenant.id,
  engagementId: engagement.id,
  action: 'engagement.start',
});

logger.error('Engine request failed', {
  tenantId: tenant.id,
  action: 'engine.request',
  metadata: { statusCode: 500, endpoint: '/run' },
});

Admin Health Dashboard

Location

/dashboard/admin/health

Access Control

Restricted to users with the admin or owner role. Other roles (member, viewer) receive a 403 Forbidden response. Role enforcement is handled by the RBAC middleware. See security.md for details on the role hierarchy.
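
The access rule reduces to a small allow-list check. The sketch below is illustrative only: the actual RBAC middleware is not shown in this document, and the names HEALTH_DASHBOARD_ROLES and healthDashboardStatus are hypothetical.

```typescript
type Role = "owner" | "admin" | "member" | "viewer";

// Roles permitted to view /dashboard/admin/health.
const HEALTH_DASHBOARD_ROLES: ReadonlySet<Role> = new Set<Role>(["owner", "admin"]);

// Returns the HTTP status a guard would produce for the given role.
function healthDashboardStatus(role: Role): 200 | 403 {
  return HEALTH_DASHBOARD_ROLES.has(role) ? 200 : 403;
}
```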

System Status Overview

The dashboard displays an overall system status:

Status	Meaning
Healthy	All services responding normally
Degraded	One or more services slow or partially failing
Down	Critical service unreachable

Service Checks

Three services are monitored with individual status and latency:

Database (PostgreSQL) — Executes a lightweight query against the Supabase PostgreSQL instance. Reports connection success/failure and round-trip latency in milliseconds.
Redis (Upstash) — Pings the Upstash Redis instance. Reports connection success/failure and latency. If Redis is not configured, the check reports "not configured" rather than "down" (Redis is optional).
Engine (Railway) — Calls the Railway-hosted Python engine's /health endpoint. Reports availability, latency, and engine version.
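
A minimal sketch of how per-service checks could be timed and rolled up into the overall status follows. The roll-up rule is an assumption, since this document does not say which services count as critical: here any configured service that is down makes the system "down", any degraded service makes it "degraded", and "not configured" is ignored. The 1000 ms slowness threshold and all names are likewise hypothetical.

```typescript
type ServiceState = "ok" | "degraded" | "down" | "not_configured";
type SystemStatus = "healthy" | "degraded" | "down";

interface ServiceResult {
  name: string;
  state: ServiceState;
  latencyMs: number | null; // null when no round trip completed
}

// Assumption: a successful check slower than this counts as degraded.
const SLOW_THRESHOLD_MS = 1000;

// Runs a probe and records success or failure plus round-trip latency.
async function timeCheck(
  name: string,
  probe: () => Promise<void>,
  slowMs: number = SLOW_THRESHOLD_MS,
): Promise<ServiceResult> {
  const started = Date.now();
  try {
    await probe();
    const latencyMs = Date.now() - started;
    return { name, state: latencyMs > slowMs ? "degraded" : "ok", latencyMs };
  } catch {
    return { name, state: "down", latencyMs: null };
  }
}

// Assumed roll-up: "down" wins over "degraded"; "not_configured" is ignored.
function overallStatus(results: readonly ServiceResult[]): SystemStatus {
  if (results.some((r) => r.state === "down")) return "down";
  if (results.some((r) => r.state === "degraded")) return "degraded";
  return "healthy";
}
```

Under this scheme an unconfigured Redis would be reported as { name: "redis", state: "not_configured", latencyMs: null } without running a probe, so it never drags the overall status down.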

Dashboard Statistics

In addition to service health, the dashboard displays:

Active runs — Number of currently running engagements
Total engagements — Aggregate count across the platform
Recent errors (24h) — Count of errors captured in the last 24 hours

Auto-Refresh

The dashboard refreshes health data automatically at regular intervals, so status stays near real time without a manual reload.
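
Periodic refresh of this kind is commonly built on a timer that fetches once immediately and then on every tick. The helper below is a hypothetical sketch; the 30-second default is an assumption, because this document does not state the dashboard's refresh interval.

```typescript
// Hypothetical polling helper. Returns a function that stops the polling.
function startPolling(
  refresh: () => Promise<void>,
  intervalMs: number = 30000, // assumed default, not taken from the platform
): () => void {
  const run = (): void => {
    // Ignore individual failures so one bad refresh does not stop polling.
    refresh().catch(() => undefined);
  };
  run(); // Fetch once immediately, then again on every tick.
  const timer = setInterval(run, intervalMs);
  return () => clearInterval(timer);
}
```

In a React component, the returned stop function would typically be used as the effect cleanup so that polling ends when the page unmounts.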

Health Endpoints

Web Application Health

GET /api/health

No authentication required. Returns:

{
  "status": "ok",
  "service": "groundtruth-web",
  "timestamp": "2026-02-18T12:00:00.000Z"
}

This endpoint is suitable for uptime monitoring services (Vercel checks, UptimeRobot, Pingdom, etc.).
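
The response shape above can be captured as a type, with a builder for the handler side and a guard that an uptime probe could apply to the parsed body. All names here (WebHealthResponse, buildHealthPayload, isWebHealthResponse) are hypothetical and are not taken from the platform's source.

```typescript
interface WebHealthResponse {
  status: "ok";
  service: "groundtruth-web";
  timestamp: string; // ISO 8601
}

// Builds the payload a health route handler would serialize.
function buildHealthPayload(now: Date = new Date()): WebHealthResponse {
  return {
    status: "ok",
    service: "groundtruth-web",
    timestamp: now.toISOString(),
  };
}

// Type guard for a parsed response body of unknown shape.
function isWebHealthResponse(body: unknown): body is WebHealthResponse {
  if (typeof body !== "object" || body === null) return false;
  const record = body as Record<string, unknown>;
  return (
    record.status === "ok" &&
    record.service === "groundtruth-web" &&
    typeof record.timestamp === "string" &&
    !Number.isNaN(Date.parse(record.timestamp))
  );
}
```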

Engine Health

GET /health

Served by the Python engine hosted on Railway. Returns the engine's status and version. The admin health dashboard calls this endpoint to verify engine availability.

See also: security.md (RBAC, RLS, rate limiting, audit logging) and environment-variables.md (full list of configuration variables, including Sentry, Redis, and the engine URL).
