Calafai Docs

Troubleshooting

This guide covers common issues you may encounter when using the Groundtruth platform. Each section lists symptoms, likely causes, and resolution steps.

Engagement Run Failures

Run times out

Symptoms: Run status shows "failed" with a timeout-related error message. Deliverables are partially generated or missing.

Likely causes:

  • Engagement has too many tasks for the configured budget/timeout
  • An LLM provider is experiencing high latency
  • A task is stuck waiting for a dependency that failed silently

Resolution:

  1. Check the run logs (engagement detail page → Mission Control → Logs tab) for the last activity before the timeout
  2. If a specific task stalled, try reducing the engagement scope by removing or simplifying that task
  3. The engine's watchdog timeout is 30 minutes. If your engagement consistently needs more time, contact support
  4. Re-run the engagement — transient provider issues often resolve on retry

Budget exceeded

Symptoms: Run status shows "failed" with error containing "BUDGET" or "budget exceeded." Deliverables stop mid-generation.

Likely causes:

  • LLM costs exceeded the engagement's budget limit
  • Tasks using expensive models (strategy/writing tier) consumed the budget before cheaper tasks could run
  • Re-runs or retries multiplied the cost

Resolution:

  1. Check the run's cost breakdown in the engagement detail page
  2. Increase the engagement budget in the engagement settings
  3. Consider reordering tasks so expensive strategy tasks run last (after cheaper analytical tasks confirm the direction)
  4. If using BYOK keys, costs may differ from platform estimates — check your provider dashboard
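
To see why task order matters for a fixed budget, the following minimal sketch totals cost per model tier and finds the task during which cumulative spend first crosses the limit. The record shape (task, tier, cost), the task names, and the dollar figures are all hypothetical; the platform's actual cost breakdown may be structured differently.

```python
def cost_by_tier(task_costs):
    """Sum cost per model tier from (task, tier, cost) records."""
    totals = {}
    for _task, tier, cost in task_costs:
        totals[tier] = totals.get(tier, 0.0) + cost
    return totals


def first_task_over_budget(task_costs, budget):
    """Return the task during which cumulative cost first exceeds the budget."""
    spent = 0.0
    for task, _tier, cost in task_costs:
        spent += cost
        if spent > budget:
            return task
    return None  # the whole run fits within the budget


# Hypothetical run, listed in execution order.
run = [
    ("positioning-strategy", "strategy", 4.0),
    ("report-draft", "writing", 3.5),
    ("data-cleanup", "analysis", 0.5),
]

print(cost_by_tier(run))
print(first_task_over_budget(run, budget=6.0))  # report-draft

# Running cheaper tasks first lets more of them finish before the limit is hit.
cheapest_first = sorted(run, key=lambda record: record[2])
print(first_task_over_budget(cheapest_first, budget=6.0))  # positioning-strategy
```

In the reordered run, both cheaper tasks complete before the budget is reached, which is the effect step 3 aims for.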

Run stalls (no progress)

Symptoms: Mission Control shows a task as "running" for an extended period with no new log entries or agent activity.

Likely causes:

  • The LLM provider is rate-limiting requests
  • The router's fallback chain exhausted all providers
  • A CrewAI agent entered a delegation loop

Resolution:

  1. Check the SSE stream (Mission Control feed) for the last agent.activity event
  2. If there has been no activity for more than 5 minutes, stop the run and retry
  3. If the issue recurs on the same task, it may be a prompt or agent configuration issue — try simplifying the task description
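
The 5-minute rule above can be sketched as a small check over events captured from the Mission Control feed. The event shape used here (a dictionary with type and timestamp keys) is an assumption for illustration; only the agent.activity event name and the 5-minute threshold come from this guide.

```python
from datetime import datetime, timedelta

STALL_THRESHOLD = timedelta(minutes=5)  # threshold suggested in this guide


def is_stalled(events, now):
    """True if the newest agent.activity event is older than the threshold."""
    activity_times = [
        datetime.fromisoformat(event["timestamp"])
        for event in events
        if event["type"] == "agent.activity"
    ]
    if not activity_times:
        return True  # this sketch treats "no activity at all" as stalled
    return now - max(activity_times) > STALL_THRESHOLD


# Hypothetical events, in the assumed shape.
events = [
    {"type": "task.started", "timestamp": "2024-05-01T10:00:00"},
    {"type": "agent.activity", "timestamp": "2024-05-01T10:01:30"},
    {"type": "agent.activity", "timestamp": "2024-05-01T10:03:00"},
]

print(is_stalled(events, now=datetime(2024, 5, 1, 10, 6, 0)))   # False
print(is_stalled(events, now=datetime(2024, 5, 1, 10, 9, 0)))   # True
```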

API key invalid

Symptoms: Run fails immediately with an authentication or API key error.

Likely causes:

  • Platform API keys have expired or been rotated
  • BYOK keys were entered incorrectly
  • A provider has changed their API key format

Resolution:

  1. If using platform keys: this is a platform issue — contact support
  2. If using BYOK keys: go to Settings → LLM Keys and re-validate the failing provider's key
  3. Check that the key has the correct permissions (some providers require specific scopes)

Model unavailable

Symptoms: Run fails with error mentioning a specific model name and "unavailable" or "not found."

Likely causes:

  • A model has been deprecated by its provider
  • The xAI grok-4 model is temporarily unavailable (the platform will attempt fallback to Claude Sonnet)
  • Ollama models are only available in local development, not on Railway

Resolution:

  1. The router has automatic fallback chains — most model unavailability is handled transparently
  2. If the fallback also fails, retry after a few minutes
  3. If the issue persists, contact support — the model configuration may need updating
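
For readers unfamiliar with fallback chains, the sketch below shows the general pattern: try each model in order and fail only when the whole chain is exhausted. This is an illustration of the idea, not the platform's router; the function names, the exception class, and the model identifiers are hypothetical.

```python
class ModelUnavailable(Exception):
    """Raised by a provider call when the requested model cannot be used."""


def call_with_fallback(chain, call):
    """Try each model in order; return (model, result) for the first success."""
    errors = []
    for model in chain:
        try:
            return model, call(model)
        except ModelUnavailable as exc:
            errors.append(f"{model}: {exc}")
    raise ModelUnavailable("all models in chain failed: " + "; ".join(errors))


def fake_call(model):
    """Stand-in for a provider request; the first model is simulated as down."""
    if model == "grok-4":
        raise ModelUnavailable("temporarily unavailable")
    return f"response from {model}"


used, reply = call_with_fallback(["grok-4", "claude-sonnet"], fake_call)
print(used)  # claude-sonnet
```

When every model in the chain fails, the error surfaces to the caller, which corresponds to the case in step 2 where retrying after a few minutes is the next move.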

Deliverable Issues

Empty deliverables

Symptoms: A deliverable appears in the list but contains no content or only a filename.

Likely causes:

  • The CrewAI task produced output but the temp-file-to-database pipeline failed
  • The agent completed the task but didn't produce meaningful output
  • The task's output format specification was unclear

Resolution:

  1. Check the run logs for the specific task — look for output text in the task.completed event
  2. Re-run the engagement — the temp file issue is often transient
  3. If the problem recurs, review the task description in the engagement configuration — agents need clear output format instructions

JSON formatting in deliverables

Symptoms: Deliverable content contains raw JSON instead of formatted markdown.

Likely causes:

  • The agent returned structured data instead of a narrative deliverable
  • The task description didn't specify markdown output format
  • The formatting directive was overridden by a conflicting task instruction

Resolution:

  1. This is a known issue (backlog item #3)
  2. Edit the engagement and update the task description to explicitly request "formatted markdown output"
  3. As a workaround, the deliverable viewer renders JSON in a code block

Missing deliverables

Symptoms: Fewer deliverables than expected after a completed run.

Likely causes:

  • Some tasks were skipped due to budget gating
  • A dependency task failed, preventing downstream tasks from running
  • The observer recommended a re-run but the budget was exhausted

Resolution:

  1. Check the run status for skippedOperations — these are tasks the engine intentionally skipped to stay within budget
  2. Review the DAG view in Mission Control to see which tasks completed and which were skipped
  3. Increase the budget and re-run, or remove lower-priority tasks
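
A failed dependency blocks everything downstream of it, not just its direct dependents. The sketch below works that out for a small task graph. The dictionary representation and task names are hypothetical and do not reflect how the engine stores its DAG.

```python
def blocked_by_failure(dependencies, failed):
    """Return tasks that cannot run because a direct or transitive dependency failed."""
    blocked = set()
    changed = True
    while changed:  # repeat until no new task becomes blocked
        changed = False
        for task, deps in dependencies.items():
            if task in failed or task in blocked:
                continue
            if any(dep in failed or dep in blocked for dep in deps):
                blocked.add(task)
                changed = True
    return blocked


# Hypothetical engagement: each task maps to the tasks it depends on.
dag = {
    "research": [],
    "analysis": ["research"],
    "strategy": ["analysis"],
    "summary": ["strategy"],
    "appendix": ["research"],
}

print(sorted(blocked_by_failure(dag, failed={"analysis"})))  # ['strategy', 'summary']
```

Here a failure in one task removes two further deliverables, while the unrelated appendix task still runs.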

Authentication Issues

Magic link expired

Symptoms: Clicking a sign-in magic link shows "Link expired" or redirects to the login page.

Likely causes:

  • Magic links expire after 1 hour
  • The link was already used (single-use)
  • Email was delayed and the link expired before delivery

Resolution:

  1. Request a new magic link from the login page
  2. Check your spam folder — delayed emails may contain expired links
  3. If magic links consistently fail, try password-based login instead
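
The three causes above reduce to two checks: whether the link was already used and whether more than an hour has passed since it was issued. The sketch below models that logic to show how a delayed email produces an expired link. The function and its arguments are hypothetical and are not the platform's authentication code.

```python
from datetime import datetime, timedelta

MAGIC_LINK_TTL = timedelta(hours=1)  # expiry stated in this guide


def magic_link_status(issued_at, used, now):
    """Classify a magic link as 'already used', 'expired', or 'valid'."""
    if used:
        return "already used"
    if now - issued_at > MAGIC_LINK_TTL:
        return "expired"
    return "valid"


issued = datetime(2024, 5, 1, 9, 0, 0)

# Email arrives 20 minutes after the request: the link still works.
print(magic_link_status(issued, used=False, now=datetime(2024, 5, 1, 9, 20, 0)))  # valid

# Email arrives 75 minutes after the request: the link is already expired.
print(magic_link_status(issued, used=False, now=datetime(2024, 5, 1, 10, 15, 0)))  # expired
```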

Session timeout

Symptoms: You are redirected to the login page while using the platform, or API calls return 401.

Likely causes:

  • Supabase session tokens expire after the configured period
  • Browser cookies were cleared
  • Multiple tabs may have conflicting session states

Resolution:

  1. Log in again — your data and engagements are not affected
  2. If sessions expire too quickly, check that your browser isn't clearing cookies aggressively

SSO not working

Symptoms: SSO login redirects fail or produce a "provider not configured" error.

Likely causes:

  • SSO is an Enterprise plan feature — it is not available on Starter or Professional plans
  • The IdP metadata URL is incorrect or unreachable
  • SAML assertion attributes don't match expected fields

Resolution:

  1. Verify your plan includes SSO (Enterprise only)
  2. Check the IdP metadata URL is accessible from the public internet
  3. Contact support with the IdP provider name and configuration details

Billing Issues

Checkout fails

Symptoms: Stripe checkout session returns an error or the page doesn't load.

Likely causes:

  • Network connectivity issue to Stripe
  • The payment method was declined
  • Browser extensions blocking Stripe scripts

Resolution:

  1. Try again in a different browser or incognito mode
  2. Check that your payment method is valid and has sufficient funds
  3. Disable ad blockers or script blockers temporarily — Stripe requires JavaScript

Plan switching issues

Symptoms: After upgrading/downgrading, the old plan features still appear or new features are missing.

Likely causes:

  • Stripe webhook processing is delayed
  • The plan change is scheduled for the next billing cycle (downgrades)

Resolution:

  1. Wait about a minute, since webhook processing can take up to 60 seconds
  2. Refresh the page or log out and back in
  3. Check Settings → Billing to confirm the plan change was processed
  4. If the issue persists after 5 minutes, contact support

Credit exhaustion (Starter plan)

Symptoms: Run fails with a credit-related error. The Starter plan includes a monthly credit allowance.

Likely causes:

  • Monthly credit balance is depleted
  • A large engagement consumed remaining credits

Resolution:

  1. Check Settings → Billing for current credit balance
  2. Wait for the next billing cycle to receive new credits
  3. Upgrade to Professional for higher credit limits
  4. Review recent run costs to understand credit consumption

Client Portal Issues

Invalid portal token

Symptoms: Client receives "invalid token" or a blank page when accessing the portal URL.

Likely causes:

  • The token has expired (if an expiration date was set)
  • The token was revoked by the tenant admin
  • The URL was copied incorrectly (missing characters)

Resolution:

  1. Check the Portal Token Manager on the engagement detail page for token status
  2. If the token is expired or revoked, create a new one and send the updated link
  3. Verify the full URL was copied — portal tokens are long strings

Client can't comment or approve

Symptoms: Client can view deliverables but the comment or approve/reject buttons are missing.

Likely causes:

  • The token was created with view_only permission level
  • The token was created with view_comment but approve/reject requires view_approve

Resolution:

  1. Check the token's permission level in the Portal Token Manager
  2. Create a new token with the appropriate permission level
  3. Send the new portal URL to the client
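
The relationship between permission levels and portal actions can be sketched as an ordered list. The level names come from this guide; the assumption that each level includes the abilities of the one below it (so view_approve also allows commenting) is ours and should be confirmed in the Portal Token Manager.

```python
# Ordered from least to most capable (the ordering is an assumption).
LEVELS = ["view_only", "view_comment", "view_approve"]

# Minimum level assumed for each portal action.
REQUIRED_LEVEL = {
    "view": "view_only",
    "comment": "view_comment",
    "approve": "view_approve",
    "reject": "view_approve",
}


def is_allowed(token_level, action):
    """True if a token at token_level can perform the given action."""
    return LEVELS.index(token_level) >= LEVELS.index(REQUIRED_LEVEL[action])


print(is_allowed("view_comment", "comment"))  # True
print(is_allowed("view_comment", "approve"))  # False
print(is_allowed("view_only", "comment"))     # False
```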

Branding not applied

Symptoms: The portal shows default Groundtruth branding instead of the tenant's custom logo and colors.

Likely causes:

  • Tenant branding settings haven't been configured
  • The logo URL is inaccessible (CORS or broken link)

Resolution:

  1. Go to Settings → Branding and verify your logo URL and primary color are set
  2. Test the logo URL directly in a browser — it must be publicly accessible
  3. Refresh the portal page — branding changes take effect immediately

Attachment Issues

Upload fails (size limit)

Symptoms: File upload returns an error about file size.

Likely causes:

  • Files are limited to 50 MB per attachment
  • The Supabase storage bucket has a size limit

Resolution:

  1. Reduce file size (compress images, simplify documents)
  2. For large datasets, upload a summary or representative sample instead
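
Checking a file's size locally before uploading avoids a failed round trip. The sketch below assumes the 50 MB limit means 50 × 1024 × 1024 bytes; the platform may count megabytes differently, so treat files near the limit with caution.

```python
import os
import tempfile

MAX_ATTACHMENT_BYTES = 50 * 1024 * 1024  # assumes "50 MB" means 50 MiB


def within_size_limit(path):
    """True if the file at path is no larger than the assumed attachment limit."""
    return os.path.getsize(path) <= MAX_ATTACHMENT_BYTES


# Demonstrate with a small temporary file standing in for an attachment.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1024)
    sample_path = tmp.name

sample_ok = within_size_limit(sample_path)
os.remove(sample_path)
print(sample_ok)  # True
```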

Parse failure

Symptoms: Attachment uploads successfully but parsing fails or returns an error.

Supported formats: PDF, XLSX, DOCX, PPTX, CSV, PNG, JPG, JSON, and TXT (plain text).

Likely causes:

  • The file is password-protected or encrypted
  • The file is corrupted or uses an unsupported variant of the format
  • Image files are too large for the PIL parser

Resolution:

  1. Remove password protection from the file and re-upload
  2. Try converting to a different format (e.g., DOCX → PDF)
  3. For spreadsheets, try exporting as CSV if XLSX parsing fails
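
A quick extension check can rule out unsupported formats before upload. The sketch below maps the supported formats listed above to file extensions. That mapping is an assumption; in particular, whether files ending in .jpeg are accepted alongside .jpg is not stated in this guide, so the sketch leaves .jpeg out.

```python
from pathlib import Path

# Extensions inferred from the supported-format list in this guide.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".xlsx", ".docx", ".pptx", ".csv",
    ".png", ".jpg", ".json", ".txt",
}


def is_supported(filename):
    """True if the file extension (case-insensitive) is in the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("Q3-Report.PDF"))  # True
print(is_supported("notes.md"))       # False
```

An extension match does not guarantee a successful parse: password-protected or corrupted files will still fail.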

Attachment summaries missing

Symptoms: Attachment appears in the list but the summary column is empty.

Likely causes:

  • The LLM summarization step failed (API error)
  • The parsed text was empty (image-only PDF, for example)

Resolution:

  1. Try re-parsing the attachment from the engagement detail page
  2. For image-only PDFs, OCR is not currently supported — convert to text-based PDF first

BYOK Key Issues

Validation fails

Symptoms: Adding a BYOK API key shows a validation error.

Likely causes:

  • The key format is incorrect for the provider
  • The key doesn't have the required permissions
  • The provider's validation endpoint is temporarily unavailable

Resolution:

  1. Double-check the key was copied completely (no leading/trailing spaces)
  2. Verify the key works in the provider's own dashboard or playground
  3. Check provider-specific format requirements:
    • OpenAI: Starts with sk-
    • Anthropic: Starts with sk-ant-
    • xAI: Check xAI dashboard for key format
    • Google: Service account JSON or API key
    • DeepSeek: Check DeepSeek dashboard
    • Mistral: Check Mistral dashboard
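
Steps 1 and 3 can be combined into a local pre-check run before pasting a key into Settings. The sketch below covers only the two prefixes documented above; other providers are skipped because this guide does not state their formats. The function name and return shape are hypothetical, and the keys shown are dummy values.

```python
# Only prefixes stated in this guide; other providers are not checked.
KNOWN_PREFIXES = {"openai": "sk-", "anthropic": "sk-ant-"}


def precheck_key(provider, raw_key):
    """Return a list of likely problems with a key before submitting it."""
    problems = []
    if raw_key != raw_key.strip():
        problems.append("leading or trailing whitespace")
    prefix = KNOWN_PREFIXES.get(provider)
    if prefix and not raw_key.strip().startswith(prefix):
        problems.append(f"expected prefix {prefix!r}")
    return problems


print(precheck_key("openai", " sk-example "))    # ['leading or trailing whitespace']
print(precheck_key("anthropic", "sk-example"))   # ["expected prefix 'sk-ant-'"]
print(precheck_key("mistral", "any-value"))      # []
```

An empty list means only that no obvious formatting problem was found; the platform's own validation still decides whether the key is accepted.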

Provider errors with BYOK keys

Symptoms: Runs fail with provider-specific errors when using your own API keys.

Likely causes:

  • The key has been rotated or revoked at the provider
  • The key has insufficient quota or billing limits
  • The key doesn't have access to the specific model being requested

Resolution:

  1. Verify the key is still valid in the provider's dashboard
  2. Check the provider's usage/billing page for quota limits
  3. Some models require specific API access levels — verify the key has access to the model being used

Key rotation

Symptoms: You have rotated a key at the provider and need to update the BYOK key stored in the platform.

Resolution:

  1. Go to Settings → LLM Keys
  2. Remove the old key for the provider
  3. Add the new key and validate it
  4. BYOK keys are encrypted at rest — old keys are permanently deleted on removal

Getting Help

Collecting information for a support request

When contacting support, include:

  1. Engagement name and slug — found on the engagement detail page
  2. Run ID — found in the run details or URL
  3. Timestamp — when the issue occurred (with timezone)
  4. Error message — exact text from the UI or logs
  5. Steps to reproduce — what you were doing when the issue occurred
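
If you file support requests from a script or template, a small completeness check helps ensure nothing on the list above is missing. The field names below are hypothetical labels for the five items, and the sample values are placeholders. The timestamp is generated in ISO 8601 format with an explicit UTC offset so the timezone is unambiguous.

```python
from datetime import datetime, timezone

# Hypothetical field names for the five items listed above.
REQUIRED_FIELDS = [
    "engagement",
    "run_id",
    "timestamp",
    "error_message",
    "steps_to_reproduce",
]


def missing_fields(report):
    """Return the required fields that are absent or empty in the report."""
    return [field for field in REQUIRED_FIELDS if not report.get(field)]


report = {
    "engagement": "Example Engagement (example-slug)",
    "run_id": "example-run-id",
    "timestamp": datetime.now(timezone.utc).isoformat(),  # includes +00:00 offset
    "error_message": "",
}

print(missing_fields(report))  # ['error_message', 'steps_to_reproduce']
```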

Where to get help

  • Platform issues: Contact support via the Settings page
  • Billing issues: Contact support or manage directly through the Stripe billing portal (Settings → Billing → Manage)
  • Security concerns: Report immediately via the security contact in the platform footer
