# Analytics
## Overview
The analytics dashboard provides visibility into engagement quality, task performance, and platform-wide improvement trends. Access it at `/dashboard/analytics`.
## Period Selector
At the top of the dashboard, a period selector lets you filter all data by time range:
- 7d -- last 7 days
- 30d -- last 30 days (default)
- 90d -- last 90 days
- All time -- all data since your first engagement
All cards, charts, and tables on the page update to reflect the selected period.
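The period filter can be sketched as follows. This is an illustrative model only: the record shape (a `scored_at` timestamp) and function name are assumptions, not the platform's actual schema.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch of the period selector: each option maps to a cutoff,
# and only records on or after the cutoff are kept. "all" means no cutoff.
PERIODS = {"7d": 7, "30d": 30, "90d": 90}

def filter_by_period(records, period="30d", now=None):
    if period == "all":
        return list(records)
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=PERIODS[period])
    return [r for r in records if r["scored_at"] >= cutoff]
```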
## Overview Cards
Four summary cards are displayed at the top of the dashboard:
| Card | Description |
|---|---|
| Average Score | The mean score across all scored tasks in the period, out of 10. Shows "--" if no scores exist. |
| Tasks Scored | The number of individual tasks that have received a quality score. |
| Excellent (9-10) | Count of tasks scoring 9 or 10 (green text). |
| Needs Work (<6) | Count of tasks scoring below 6, combining the weak and failing bands (red text). |
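The four cards above reduce to simple aggregates over the period's scores. A minimal sketch, assuming scores arrive as a flat list of 1-10 values (the function name is illustrative):

```python
# Hedged sketch of the four overview cards. "--" matches the dashboard's
# empty state for Average Score when no scored tasks exist.
def overview_cards(scores):
    avg = round(sum(scores) / len(scores), 1) if scores else "--"
    return {
        "average_score": avg,
        "tasks_scored": len(scores),
        "excellent": sum(1 for s in scores if s >= 9),  # 9-10, green text
        "needs_work": sum(1 for s in scores if s < 6),  # weak + failing, red text
    }
```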
## Quality Score Distribution
A horizontal bar chart breaks down all scored tasks into five bands. Each bar shows the count and its proportional width relative to the total:
| Band | Score Range | Color |
|---|---|---|
| Excellent | 9 -- 10 | Green |
| Good | 7 -- 8 | Blue |
| Acceptable | 5 -- 6 | Yellow |
| Weak | 3 -- 4 | Orange |
| Failing | 1 -- 2 | Red |
This section is hidden if no scored tasks exist in the selected period.
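The banding and proportional widths can be sketched like this, assuming integer scores from 1 to 10 (the data shape is illustrative):

```python
# Sketch of the five-band breakdown from the table above. Bar width is the
# band's share of all scored tasks in the period.
BANDS = [
    ("Excellent", 9, 10, "green"),
    ("Good", 7, 8, "blue"),
    ("Acceptable", 5, 6, "yellow"),
    ("Weak", 3, 4, "orange"),
    ("Failing", 1, 2, "red"),
]

def distribution(scores):
    total = len(scores)
    rows = []
    for name, lo, hi, color in BANDS:
        count = sum(1 for s in scores if lo <= s <= hi)
        width = count / total if total else 0.0
        rows.append({"band": name, "count": count, "width": width, "color": color})
    return rows
```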
## Score Trend
A vertical bar chart showing the daily average quality score over the selected period. Each bar represents one day, with its height proportional to the average score (out of 10). Bars are color-coded by score level: green (8+), yellow (6--7), red (below 6). Hovering a bar shows the date, average score, and task count for that day.
This section requires at least two days of data to display.
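The per-day aggregation behind the trend chart can be sketched as below. The `(date, score)` pair shape is an assumption for illustration; the color thresholds follow the text (green 8+, yellow 6-7, red below 6).

```python
from collections import defaultdict

def score_color(avg):
    # Color-coding from the text: green for 8+, yellow for 6-7, red below 6.
    if avg >= 8:
        return "green"
    if avg >= 6:
        return "yellow"
    return "red"

def daily_trend(scored_tasks):
    # scored_tasks: iterable of (date, score) pairs -- an illustrative shape.
    by_day = defaultdict(list)
    for day, score in scored_tasks:
        by_day[day].append(score)
    return [
        {"date": day, "avg": sum(v) / len(v), "count": len(v),
         "color": score_color(sum(v) / len(v))}
        for day, v in sorted(by_day.items())
    ]
```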
## Performance by Rubric Type
Tasks are evaluated using different rubric types (research, strategy, creative, analytical, etc.). The dashboard shows a breakdown for each rubric type that has scored tasks:
- Rubric type name (e.g., "research")
- Task count -- how many tasks used this rubric
- Average score -- displayed as a proportional bar (out of 10), color-coded by score level
- Rerun rate -- percentage of tasks where the observer recommended a rerun. Shown in yellow if above 0%.
A high rerun rate may indicate that a particular type of task needs prompt or agent adjustments.
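Grouping by rubric type and computing the rerun rate can be sketched as follows; the task record fields are assumptions, not the platform's schema.

```python
from collections import defaultdict

# Hypothetical sketch of the per-rubric breakdown: task count, average
# score, and the share of tasks the observer recommended rerunning.
def rubric_breakdown(tasks):
    groups = defaultdict(list)
    for t in tasks:
        groups[t["rubric"]].append(t)
    out = {}
    for rubric, ts in groups.items():
        reruns = sum(1 for t in ts if t["rerun_recommended"])
        out[rubric] = {
            "count": len(ts),
            "avg_score": sum(t["score"] for t in ts) / len(ts),
            "rerun_rate": reruns / len(ts),  # shown in yellow if above 0
        }
    return out
```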
## Task Rankings
Two ranked lists highlight the extremes:
- Top performers (green header) -- the five tasks with the highest average scores across the period.
- Needs improvement (red header) -- the five tasks with the lowest average scores.
Each task shows its name and a score bar. Use the bottom list to identify tasks that may benefit from revised prompts, different agent assignments, or additional context.
This section is hidden if no scored tasks exist.
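The two lists are a single sort viewed from both ends. A minimal sketch, with an illustrative task shape:

```python
# Sketch of the two ranked lists: sort by average score, then take the five
# highest (top performers) and five lowest (needs improvement).
def task_rankings(tasks, n=5):
    by_score = sorted(tasks, key=lambda t: t["avg_score"])
    return {
        "top_performers": by_score[::-1][:n],   # highest first
        "needs_improvement": by_score[:n],      # lowest first
    }
```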
## Pattern Analysis
Admins can trigger a cross-tenant anonymized pattern analysis by clicking the Run Pattern Analysis button at the top of the dashboard. This analysis:
- Strips all client-specific data before processing.
- Runs across engagement results from all tenants.
- Identifies patterns such as consistently low scores in specific rubric types, common issues flagged by the observer, or recurring quality gaps.
- Creates improvement proposals based on its findings.
Pattern analysis runs in the background and shows a loading state ("Analyzing...") while processing.
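The anonymization step described above can be sketched as a field strip before results cross tenant boundaries. The field names below are assumptions for illustration only.

```python
# Hypothetical sketch: remove client-specific fields from a result record
# before it enters the cross-tenant analysis.
CLIENT_FIELDS = {"client_name", "tenant_id", "contact_email"}

def anonymize(result):
    return {k: v for k, v in result.items() if k not in CLIENT_FIELDS}
```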
## Improvement Proposals
Improvement proposals are generated by pattern analysis, the observer model, or the red-team reviewer. Each proposal card shows:
| Field | Description |
|---|---|
| Title | Short summary of the proposed improvement. |
| Description | Detailed explanation of the issue and suggested fix. |
| Source | Where the proposal came from: pattern_analysis, observer, or red_team. |
| Date | When the proposal was created. |
From the analytics dashboard, you can:
- Approve a proposal (green button) to apply the suggested change.
- Reject a proposal (red button) to dismiss it.
Only approved proposals are applied. Nothing changes automatically. If no proposals are pending, the section displays "No pending proposals. Run pattern analysis to generate insights."
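The review flow is a simple status transition: a proposal stays pending until an admin approves or rejects it, and only approval triggers a change. A sketch (the class itself is illustrative; the statuses and sources come from the text):

```python
class Proposal:
    # Valid sources per the table above.
    SOURCES = {"pattern_analysis", "observer", "red_team"}

    def __init__(self, title, source):
        assert source in self.SOURCES
        self.title = title
        self.source = source
        self.status = "pending"

    def approve(self):
        self.status = "approved"  # only now is the suggested change applied

    def reject(self):
        self.status = "rejected"  # dismissed; nothing is applied
```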
## Quality Scores per Engagement
On each engagement detail page, a quality scores section shows per-task scoring information in expandable cards. Each card shows:
**Collapsed view:**
- Score badge (color-coded: green for 8+, yellow for 6--7, red for below 6)
- Task name
- Rubric type
**Expanded view:**
- Criteria scores -- per-criterion mini bars showing how the task performed on individual scoring dimensions.
- Insights (green-tinted section) -- observations extracted by the observer about what worked well.
- Issues (red-tinted section) -- problems or weaknesses identified.
- Rerun recommendation -- if the observer recommends rerunning the task, a yellow banner is displayed.
The quality scores section is hidden entirely if no scored tasks exist for the engagement.
## Scoring Details
- Scores range from 1 to 10 per task.
- Scoring is performed by the observer model (GPT-4o-mini), which evaluates each task output against rubric criteria after the task completes.
- Scores are stored in the `TaskScore` table and aggregated for dashboard display.
## Related Guides
- Engagements -- creating and running engagements that produce scored deliverables
- Deliverables -- viewing deliverable content, versions, and approval status