# Analytics
## Overview
The analytics dashboard provides visibility into engagement quality, task performance, and platform-wide improvement trends. Access it at `/dashboard/analytics`.
## Period Selector
At the top of the dashboard, a period selector lets you filter all data by time range:
- 7d -- last 7 days
- 30d -- last 30 days (default)
- 90d -- last 90 days
- All time -- all data since your first engagement
All cards, charts, and tables on the page update to reflect the selected period.
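The period filter can be sketched as follows. This is an illustrative model only: the record shape (a `scored_at` timestamp) and function name are assumptions, not the platform's actual schema.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch of the period selector: each option maps to a cutoff,
# and only records on or after the cutoff are kept. "all" means no cutoff.
PERIODS = {"7d": 7, "30d": 30, "90d": 90}

def filter_by_period(records, period="30d", now=None):
    if period == "all":
        return list(records)
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=PERIODS[period])
    return [r for r in records if r["scored_at"] >= cutoff]
```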
## Overview Cards
Four summary cards are displayed at the top of the dashboard:
| Card | Description |
|---|---|
| Average Score | The mean score across all scored tasks in the period, out of 10. Shows "--" if no scores exist. |
| Tasks Scored | The number of individual tasks that have received a quality score. |
| Excellent (9-10) | Count of tasks scoring 9 or 10 (green text). |
| Needs Work (<6) | Count of tasks scoring below 6, combining the weak and failing bands (red text). |
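The four cards above reduce to simple aggregates over the period's scores. A minimal sketch, assuming scores arrive as a flat list of 1-10 values (the function name is illustrative):

```python
# Hedged sketch of the four overview cards. "--" matches the dashboard's
# empty state for Average Score when no scored tasks exist.
def overview_cards(scores):
    avg = round(sum(scores) / len(scores), 1) if scores else "--"
    return {
        "average_score": avg,
        "tasks_scored": len(scores),
        "excellent": sum(1 for s in scores if s >= 9),  # 9-10, green text
        "needs_work": sum(1 for s in scores if s < 6),  # weak + failing, red text
    }
```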
## Quality Score Distribution
A horizontal bar chart breaks down all scored tasks into five bands. Each bar shows the count and its proportional width relative to the total:
| Band | Score Range | Color |
|---|---|---|
| Excellent | 9 -- 10 | Green |
| Good | 7 -- 8 | Blue |
| Acceptable | 5 -- 6 | Yellow |
| Weak | 3 -- 4 | Orange |
| Failing | 1 -- 2 | Red |
This section is hidden if no scored tasks exist in the selected period.
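The banding and proportional widths can be sketched like this, assuming integer scores from 1 to 10 (the data shape is illustrative):

```python
# Sketch of the five-band breakdown from the table above. Bar width is the
# band's share of all scored tasks in the period.
BANDS = [
    ("Excellent", 9, 10, "green"),
    ("Good", 7, 8, "blue"),
    ("Acceptable", 5, 6, "yellow"),
    ("Weak", 3, 4, "orange"),
    ("Failing", 1, 2, "red"),
]

def distribution(scores):
    total = len(scores)
    rows = []
    for name, lo, hi, color in BANDS:
        count = sum(1 for s in scores if lo <= s <= hi)
        width = count / total if total else 0.0
        rows.append({"band": name, "count": count, "width": width, "color": color})
    return rows
```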
## Score Trend
A vertical bar chart showing the daily average quality score over the selected period. Each bar represents one day, with its height proportional to the average score (out of 10). Bars are color-coded by score level: green (8+), yellow (6--7), red (below 6). Hovering a bar shows the date, average score, and task count for that day.
This section requires at least two days of data to display.
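The per-day aggregation behind the trend chart can be sketched as below. The `(date, score)` pair shape is an assumption for illustration; the color thresholds follow the text (green 8+, yellow 6-7, red below 6).

```python
from collections import defaultdict

def score_color(avg):
    # Color-coding from the text: green for 8+, yellow for 6-7, red below 6.
    if avg >= 8:
        return "green"
    if avg >= 6:
        return "yellow"
    return "red"

def daily_trend(scored_tasks):
    # scored_tasks: iterable of (date, score) pairs -- an illustrative shape.
    by_day = defaultdict(list)
    for day, score in scored_tasks:
        by_day[day].append(score)
    return [
        {"date": day, "avg": sum(v) / len(v), "count": len(v),
         "color": score_color(sum(v) / len(v))}
        for day, v in sorted(by_day.items())
    ]
```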
## Performance by Rubric Type
Tasks are evaluated using different rubric types (research, strategy, creative, analytical, etc.). The dashboard shows a breakdown for each rubric type that has scored tasks:
- Rubric type name (e.g., "research")
- Task count -- how many tasks used this rubric
- Average score -- displayed as a proportional bar (out of 10), color-coded by score level
- Rerun rate -- percentage of tasks where the observer recommended a rerun. Shown in yellow if above 0%.
A high rerun rate may indicate that a particular type of task needs prompt or agent adjustments.
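Grouping by rubric type and computing the rerun rate can be sketched as follows; the task record fields are assumptions, not the platform's schema.

```python
from collections import defaultdict

# Hypothetical sketch of the per-rubric breakdown: task count, average
# score, and the share of tasks the observer recommended rerunning.
def rubric_breakdown(tasks):
    groups = defaultdict(list)
    for t in tasks:
        groups[t["rubric"]].append(t)
    out = {}
    for rubric, ts in groups.items():
        reruns = sum(1 for t in ts if t["rerun_recommended"])
        out[rubric] = {
            "count": len(ts),
            "avg_score": sum(t["score"] for t in ts) / len(ts),
            "rerun_rate": reruns / len(ts),  # shown in yellow if above 0
        }
    return out
```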
## Task Rankings
Two ranked lists highlight the extremes:
- Top performers (green header) -- the five tasks with the highest average scores across the period.
- Needs improvement (red header) -- the five tasks with the lowest average scores.
Each task shows its name and a score bar. Use the bottom list to identify tasks that may benefit from revised prompts, different agent assignments, or additional context.
This section is hidden if no scored tasks exist.
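The two lists are a single sort viewed from both ends. A minimal sketch, with an illustrative task shape:

```python
# Sketch of the two ranked lists: sort by average score, then take the five
# highest (top performers) and five lowest (needs improvement).
def task_rankings(tasks, n=5):
    by_score = sorted(tasks, key=lambda t: t["avg_score"])
    return {
        "top_performers": by_score[::-1][:n],   # highest first
        "needs_improvement": by_score[:n],      # lowest first
    }
```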
## Pattern Analysis
Admins can trigger a cross-tenant anonymized pattern analysis by clicking the Run Pattern Analysis button at the top of the dashboard. This analysis:
- Strips all client-specific data before processing.
- Runs across engagement results from all tenants.
- Identifies patterns such as consistently low scores in specific rubric types, common issues flagged by the observer, or recurring quality gaps.
- Creates improvement proposals based on its findings.
Pattern analysis runs in the background and shows a loading state ("Analyzing...") while processing.
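The anonymization step described above can be sketched as a field strip before results cross tenant boundaries. The field names below are assumptions for illustration only.

```python
# Hypothetical sketch: remove client-specific fields from a result record
# before it enters the cross-tenant analysis.
CLIENT_FIELDS = {"client_name", "tenant_id", "contact_email"}

def anonymize(result):
    return {k: v for k, v in result.items() if k not in CLIENT_FIELDS}
```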
## Improvement Proposals
Improvement proposals are generated by pattern analysis, the observer model, or the red-team reviewer. Each proposal card shows:
| Field | Description |
|---|---|
| Title | Short summary of the proposed improvement. |
| Description | Detailed explanation of the issue and suggested fix. |
| Source | Where the proposal came from: pattern_analysis, observer, or red_team. |
| Date | When the proposal was created. |
From the analytics dashboard, you can:
- Approve a proposal (green button) to apply the suggested change.
- Reject a proposal (red button) to dismiss it.
Only approved proposals are applied. Nothing changes automatically. If no proposals are pending, the section displays "No pending proposals. Run pattern analysis to generate insights."
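The review flow is a simple status transition: a proposal stays pending until an admin approves or rejects it, and only approval triggers a change. A sketch (the class itself is illustrative; the statuses and sources come from the text):

```python
class Proposal:
    # Valid sources per the table above.
    SOURCES = {"pattern_analysis", "observer", "red_team"}

    def __init__(self, title, source):
        assert source in self.SOURCES
        self.title = title
        self.source = source
        self.status = "pending"

    def approve(self):
        self.status = "approved"  # only now is the suggested change applied

    def reject(self):
        self.status = "rejected"  # dismissed; nothing is applied
```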
## Quality Scores per Engagement
On each engagement detail page, a quality scores section shows per-task scoring information in expandable cards. Each card shows:
**Collapsed view:**
- Score badge (color-coded: green for 8+, yellow for 6--7, red for below 6)
- Task name
- Rubric type
**Expanded view:**
- Criteria scores -- per-criterion mini bars showing how the task performed on individual scoring dimensions.
- Insights (green-tinted section) -- observations extracted by the observer about what worked well.
- Issues (red-tinted section) -- problems or weaknesses identified.
- Rerun recommendation -- if the observer recommends rerunning the task, a yellow banner is displayed.
The quality scores section is hidden entirely if no scored tasks exist for the engagement.
## Scoring Details
- Scores range from 1 to 10 per task.
- Scoring is performed by the observer model (GPT-4o-mini), which evaluates each task output against rubric criteria after the task completes.
- Scores are stored in the `TaskScore` table and aggregated for dashboard display.
## Related Guides
- Engagements -- creating and running engagements that produce scored deliverables
- Deliverables -- viewing deliverable content, versions, and approval status