Data handling

What the upload script collects

AI coding agent transcripts

Claude Code — JSONL files from ~/.claude/projects/. The Docker container reads these locally to generate narratives and extract behavioral signals. Only the processed output leaves your machine: per-session narratives (~2,000 characters each), bounded tool-use summaries (session_events — file paths + truncated command text + action types, capped at ~3,000 events per session with text fields shortened within each event), a 200-character excerpt of your first prompt for each session, user-highlight excerpts (representative quotes drawn from your prompts, capped at 10,000 characters per session), steering traces (counts and timestamps of course-corrections), and dispatch metadata for subagent tasks (task descriptions ≤200 characters, passed through the credential redactor). Raw conversation history, full prompts, full agent responses, and full tool outputs do not leave your machine.
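
The caps described above amount to truncation applied before anything is uploaded. The following is a minimal sketch of that idea, not Paxel's implementation: the function name, field names, and data shapes are hypothetical, and only the numeric limits come from the description above.

```python
# Illustrative only: names and shapes are assumptions, not Paxel's code.
FIRST_PROMPT_MAX_CHARS = 200     # first-prompt excerpt per session
HIGHLIGHTS_MAX_CHARS = 10_000    # total user-highlight characters per session
MAX_EVENTS_PER_SESSION = 3_000   # approximate cap on session_events


def bound_session(first_prompt: str, highlights: list[str], events: list[dict]) -> dict:
    """Return only the bounded fields for one session."""
    kept: list[str] = []
    budget = HIGHLIGHTS_MAX_CHARS
    for quote in highlights:
        if budget <= 0:
            break  # per-session character budget is spent
        clipped = quote[:budget]
        kept.append(clipped)
        budget -= len(clipped)
    return {
        "first_prompt_excerpt": first_prompt[:FIRST_PROMPT_MAX_CHARS],
        "user_highlights": kept,
        "session_events": events[:MAX_EVENTS_PER_SESSION],
    }
```

In this sketch the highlight budget is shared across all quotes in a session, so a long early quote reduces the room left for later ones. Whether the real pipeline allocates the budget this way is not stated here.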

Cursor IDE — extracted from ~/Library/Application Support/Cursor (macOS) or the equivalent on Linux/Windows, and merged into the same pipeline.

Codex CLI — JSONL files from ~/.codex/sessions, uploaded and analyzed alongside Claude Code sessions.

Cursor IDE and Codex CLI sessions are included only when their git remote matches the selected project, unless you pass --all.

Git repository data (unless --no-repo)

Your source code, file contents, and diffs stay on your machine. The Docker container mounts your repo read-only for on-device code quality analysis; only aggregate metrics (file counts, language ratios, complexity scores) are uploaded.

The following git metadata is uploaded to the server: per-author numstat totals (insertions/deletions per commit), velocity signals (commits/day, LOC/day, active days), commit metadata for up to ~1,000 recent commits (sha, short sha, author name, author email, date, subject — no diffs, no file contents), and your git remote URL. Use --no-repo to skip all repo analysis entirely.

Metadata sidecar

Git remote URLs, local directory paths, and PR links extracted from transcripts.

What the script removes before sending

Paxel applies these redaction patterns to all content destined for upload (decision text, session events, narratives, first-prompt excerpts, user-highlight quotes, and commit subjects) inside the Docker container, before any data leaves your machine — both at the source as each field is extracted and again, fail-closed, at the upload boundary. The same patterns also run server-side on persisted LLM payloads as defense-in-depth:

Pattern	What it catches
sk-ant-*	Anthropic API keys
sk-*	OpenAI API keys
sk_live_, rk_live_	Stripe secret keys
AKIA*	AWS access keys
AC, SK	Twilio Account / API-key SIDs
gh[pousr]_*	GitHub tokens (PATs, OAuth, fine-grained)
xoxb-, xapp-	Slack tokens
hf_*	HuggingFace tokens
npm_*	npm tokens
pypi-*	PyPI tokens
yk_*	YC / Paxel API tokens
AIza*	Google API keys
1//0*	Google OAuth refresh tokens
AccountKey=*	Azure storage keys
eyJ.eyJ.*	JSON Web Tokens (JWTs)
Bearer *	Bearer authorization tokens
-----BEGIN * PRIVATE KEY-----	PEM private keys (RSA/EC/OpenSSH/PKCS#8)
postgres://, redis://, … with credentials	Database connection strings
API_KEY=, SECRET_KEY=, etc.	Environment variable assignments
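
To make the table above concrete, here is a minimal sketch of pattern-based redaction followed by a fail-closed check. The regular expressions are rough approximations of a few rows only; the exact expressions Paxel uses are not given in this document, and the function names and placeholder string are hypothetical.

```python
import re

# Hypothetical approximations of a few table rows, not Paxel's actual patterns.
PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_\-]+"),          # Anthropic-style API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                # AWS access key IDs
    re.compile(r"gh[pousr]_[A-Za-z0-9]+"),          # GitHub tokens
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=\-]+"),   # Bearer authorization tokens
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),                                              # PEM private keys
]


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of every pattern with a placeholder."""
    for pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def assert_clean(text: str) -> None:
    """One reading of 'fail-closed': refuse to proceed if anything still matches."""
    for pattern in PATTERNS:
        if pattern.search(text):
            raise ValueError("credential-like pattern still present; refusing to upload")
```

Running `redact` at extraction time and `assert_clean` at the upload boundary mirrors the two-stage description above: the second stage blocks the upload rather than silently sending a field the first stage missed.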

Excluded from code quality analysis

The on-device code quality analyzer skips these directories: node_modules, vendor, .git, build, dist, tmp, log.

How to limit what you send

--no-repo	Skip repo mounting and all git analysis. Only transcripts are analyzed and uploaded.
--since 2m	Only include sessions from the last 2 months (supports days, weeks, months).
--project NAME	Select a specific project by repository name instead of all projects.
--no-sentry	Disable client-side error reporting to Sentry for this run.
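
As an illustration of how a value such as `--since 2m` could translate into a cutoff date, here is a small sketch. The unit letters `d` and `w`, the 30-day month, and the function name are assumptions; the table above only shows `2m` and says days, weeks, and months are supported.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: units are written d, w, or m; a month is approximated as 30 days.
_UNIT_DAYS = {"d": 1, "w": 7, "m": 30}


def since_cutoff(value: str, now: datetime) -> datetime:
    """Return the earliest session start time that a --since value would keep."""
    match = re.fullmatch(r"(\d+)([dwm])", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized --since value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return now - timedelta(days=amount * _UNIT_DAYS[unit])


if __name__ == "__main__":
    print(since_cutoff("2m", datetime.now(timezone.utc)))
```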

Account data

Email. Used for magic link authentication. Tokens are hashed with SHA-256 at rest, expire after 15 minutes, and are single-use.

Session cookie. _paxel_session, 1-week expiry. HttpOnly, Secure, SameSite=Lax.

API tokens. Used for Docker client authentication. Hashed with SHA-256 at rest. Admins can revoke tokens and set usage limits.
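
The token properties above (hashed at rest, 15-minute expiry, single use) can be sketched as follows. This is an illustration under stated assumptions, not the server's code: the in-memory `store` dictionary stands in for a database table, and the function names are hypothetical.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(minutes=15)  # magic-link lifetime stated above


def issue_token(store: dict, now: datetime) -> str:
    """Create a token, persist only its SHA-256 digest, and return the raw value."""
    raw = secrets.token_urlsafe(32)
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    store[digest] = {"expires_at": now + TOKEN_TTL, "used": False}
    return raw  # sent to the user; never written to the store


def redeem_token(store: dict, raw: str, now: datetime) -> bool:
    """Accept a token once, and only before it expires."""
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    record = store.get(digest)
    if record is None or record["used"] or now > record["expires_at"]:
        return False
    record["used"] = True
    return True
```

Because only the digest is stored, a leaked copy of the store does not reveal usable tokens; the raw value exists only in the email and in the request that redeems it.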

What we generate from your data

LLM narratives. High-level behavioral summaries of each session, generated by Anthropic's Claude or OpenAI's GPT models (Paxel's analysis models, routed through a proxy).
Behavior scores. Numeric scores across 5 axes: Execution Leverage, Steering, Engineering Quality, Product Thinking, Planning.
Decision patterns. Structured records of how you directed the AI during coding sessions.
Most-questionable-prompts surface. For each upload, an LLM picks up to 3 of your decision-text user directives that read as vague or scoped-too-loosely and writes a short one-line reason per pick. Stored as {prompt, reason} pairs on the upload row and shown on your profile page under "Your most questionable prompts". Subject to the same admin-access and retention rules as your other upload data.
Subagent dispatch metadata. When you run Claude Code's Task or Agent tool to spawn a subagent, we record counts of dispatches, returns, the subagent's run_in_background flag, and a short dispatch description (≤200 chars, passed through the same redactor as decision text to strip code identifiers and file paths). The raw dispatch prompt is never uploaded — only the first 12 hex characters of its SHA-1 hash, used as a fallback identifier for matching subagent sessions to their parent when the parent session was filtered out upstream.
Evidence excerpts. Transcript and commit excerpts with vector embeddings, used for search and analysis.
Episode groupings. Sessions grouped into coherent work episodes.
Commit group analysis. Git diffs grouped and reviewed by LLM for code quality signals.
LLM call logs. Every LLM call Paxel makes, whether on behalf of your upload (via the client proxy) or for internal system tasks, is recorded in Postgres. Prompts, responses, and row metadata (model, token counts, cost, HMAC nonce, timestamps) all live in the same table. The retention schedule is described under "Data retention and deletion".
Upload error messages. When an upload fails, we store the normalized exception message and — for LLM proxy errors whose response body was not our usual JSON envelope (e.g. an infrastructure-layer "Forbidden" response) — a scrubbed preview of that response body. Total stored length is capped at 500 characters. Scrubbing removes Anthropic/OpenAI API keys, GitHub tokens, Bearer tokens, YC tokens, JWTs, email addresses, and IPv4 addresses before storage. These error messages are visible to you on your results page and to YC admins.
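
The subagent dispatch identifier described in the list above, the first 12 hex characters of a SHA-1 hash, is straightforward to reproduce locally. This sketch assumes the prompt is hashed as UTF-8 text with no normalization; the function name is hypothetical.

```python
import hashlib


def dispatch_fingerprint(prompt: str) -> str:
    """Return the first 12 hex characters of the prompt's SHA-1 digest."""
    return hashlib.sha1(prompt.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    # The raw prompt stays local; only this short identifier would be sent.
    print(dispatch_fingerprint("abc"))  # a9993e364706
```

Twelve hex characters carry 48 bits, which is enough to match a subagent session to its parent within one upload while revealing nothing readable about the prompt text.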

Local cache on your machine

The Paxel upload script keeps two kinds of working data on your machine. The LLM-result cache (which avoids re-billing identical prompts across runs) lives in a Docker-managed named volume (paxel-cache-<your-uid>) — reachable only through your Docker daemon, not a file in your home directory; clear it any time with --clean. When an upload fails after three retries, a stashed copy of the upload payload is written to ~/.paxel/data/pending-uploads/<id>.json.gz. The stash contains the same narratives, scores, decisions, session events, steering traces, and dispatch metadata that would have been sent to our server, plus a SHA-256 fingerprint of the API token used to create it — never the raw token. Stash files are mode 0600 inside a mode-0700 directory. Your next upload automatically replays a stashed upload and exits; a subsequent run proceeds with a fresh analysis. Stashes older than 14 days, or that cannot be replayed (token changed, server rejected, endpoint changed), are moved to a pending-uploads/failed/ quarantine subdirectory alongside a short .error.json marker that records the quarantine reason and — for server-rejected stashes — the first 500 characters of the server's response body. To remove all pending and quarantined stashes, re-run the upload command with the --clear-pending flag.
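
A stash file with the properties described above (gzip-compressed JSON, mode 0600 inside a mode-0700 directory, token fingerprint instead of the raw token) could be written roughly like this on a POSIX system. The function name and the `token_fingerprint` key are hypothetical; the actual field names in the stash are not specified in this document.

```python
import gzip
import hashlib
import json
import os
from pathlib import Path


def write_stash(base_dir: Path, upload_id: str, payload: dict, api_token: str) -> Path:
    """Write payload to <base_dir>/<upload_id>.json.gz with restrictive permissions."""
    base_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(base_dir, 0o700)  # directory readable by the owner only

    record = dict(payload)
    # Store a SHA-256 fingerprint of the token, never the token itself.
    record["token_fingerprint"] = hashlib.sha256(api_token.encode("utf-8")).hexdigest()

    path = base_dir / f"{upload_id}.json.gz"
    # Create the file with 0600 from the start rather than tightening it afterwards.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as raw, gzip.GzipFile(fileobj=raw, mode="wb") as gz:
        gz.write(json.dumps(record).encode("utf-8"))
    os.chmod(path, 0o600)  # also covers a pre-existing file with looser permissions
    return path
```

You can inspect a real stash the same way in reverse: `gzip.open(path, "rt")` followed by `json.load` shows exactly what would have been sent.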

Third-party services

Anthropic Claude API	Transcript text sent for behavioral analysis. Requests are routed through either Anthropic's API or Microsoft Foundry. Processes condensed conversation excerpts (not source code files).
OpenAI GPT API	Transcript text sent for behavioral analysis. Requests are routed through either OpenAI's API or Microsoft Foundry. Processes condensed conversation excerpts (not source code files).
Google AI Studio (Gemini)	Code evidence and transcript excerpts sent for vector embedding generation. Used for semantic search in the chat feature. No data is stored by Google.
Mailgun	Email delivery only (magic link authentication). We send your email address; Mailgun delivers the message.
Google Fonts	Loaded browser-side. Exposes your IP address and user agent to Google.
Amazon Web Services	Application hosting (ECS Fargate, us-west-2 Oregon). Manages web server, background workers, database (RDS PostgreSQL), Redis (ElastiCache), and file storage (S3).
Cloudflare	Proxies both paxel.ycombinator.com (the main app: login, results, admin, the curl | bash upload script, and the CLI device-auth flow) and paxel-llm.ycombinator.com (the LLM proxy that handles per-call requests from the Docker client to the configured LLM providers: Anthropic, OpenAI, or Microsoft Foundry). TLS terminates at Cloudflare for both, so it sees plaintext request and response bodies in transit, including LLM prompts and model responses (Claude, GPT, or provider-routed) on the proxy domain, and login/results/admin traffic on the main domain. Several user-private values are passed in URL query strings or paths and therefore appear in Cloudflare's request logs alongside other URL metadata: CLI device-auth codes (?code=, 8 chars, valid for 30 minutes), personalized upload-script API tokens (?token= on /upload.sh; the value lasts as long as the API token does, typically 90 days), and magic-link login tokens (in the URL path on /auth/verify/:token, single-use and valid for 15 minutes). Cloudflare applies its WAF and bot-detection rules, and logs request metadata (IPs, paths, status codes, security events) per Cloudflare's standard logging. Cloudflare's standard request logs do not include body content. Used for DDoS mitigation and WAF protection.
Sentry (server-side)	Error tracking in production only. On LLM-pipeline errors, we attach your upload slug, your account email (so we can proactively reach out about a failure), and — when the error is tied to a specific transcript session — that session's identifier. For LLM proxy errors whose response body was not our usual JSON envelope, a scrubbed preview of the response body is attached as event extra (same 500-character cap and same redaction list as the first-party storage above: Anthropic/OpenAI keys, GitHub tokens, Bearer tokens, YC tokens, JWTs, emails, IPv4 addresses). When an upload's /api/v1/results submission is rejected by server-side nonce verification (the anti-tampering check that compares each submitted nonce to its matching proxy log), we also fire a low-severity event with your API token's database id, your user id, the stable reason code (e.g. no_nonces_matched), the upload's idempotency key, and the counters that drove the decision (submitted/matched/mismatched counts and up to five truncated request-id identifiers from the unmatched sample). No nonce values, prompts, transcripts, or response content are attached. These are the same identifiers we would use to look you up in our own admin UI; they are not shared with third parties.
Sentry (client-side)	Exception class, message, and stack traces sent from the Docker container on your machine when the pipeline errors. Before sending, the client redacts home paths, API tokens, email addresses, git remote URLs, database connection strings (postgres/redis/mongodb/mysql/mssql/amqp URLs with embedded credentials), and long strings; strips OS/device/runtime context and local variables; and drops HTTP request bodies so prompts and transcript content never leave your machine via error telemetry. The upload slug and the failing transcript session's identifier are attached as tags so we can correlate a reported error to your upload and — if needed — email you to retry. No API tokens are attached. For LLM proxy errors where the response body was not our usual JSON envelope (e.g. an infrastructure-layer "Forbidden" response), a scrubbed preview of the response body — capped at 500 characters, with Anthropic/OpenAI keys, Bearer tokens, YC tokens, JWTs, emails, and IPv4 addresses redacted — is attached as event extra so we can diagnose the failure without asking you to share logs. The response body's content-type and a boolean "preview-present" flag are attached as low-cardinality tags. When the final upload POST to /api/v1/results is rejected (any 4xx) or fails after all retries (network, timeout, or 5xx), we also fire an event with the HTTP status, the server's "error" field (if the response was JSON), the idempotency key for this upload attempt, and a preview of the response body truncated to 1,000 characters. This response body is a structured Rails JSON error (e.g. {"error": "Verification failed: ..."}) or, for edge cases like a CDN intermediate page, a short HTML string; it never contains your upload payload, prompts, transcripts, or nonce values. The DSN is baked into published images so telemetry is on by default. Disable per-run with --no-sentry.
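
The scrubbed response-body preview mentioned in the Sentry rows above (redacted, then capped at 500 characters) can be sketched as below. The patterns cover only emails, IPv4 addresses, and Bearer tokens and are approximations; the function name and placeholder are hypothetical, and the real redaction list is longer.

```python
import re

MAX_PREVIEW_CHARS = 500  # cap stated in the rows above

# Approximate patterns for three of the listed categories (assumptions).
_SCRUBBERS = [
    re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}"),  # email addresses
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),                       # IPv4 addresses
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=\-]+"),                     # Bearer tokens
]


def scrub_preview(body: str) -> str:
    """Redact sensitive substrings, then truncate to the preview cap."""
    for pattern in _SCRUBBERS:
        body = pattern.sub("[REDACTED]", body)
    return body[:MAX_PREVIEW_CHARS]
```

Scrubbing before truncating is a deliberate ordering in this sketch: cutting first could leave half of a token or address at the boundary, where the patterns would no longer recognize it.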

Who can see your results

You. Results pages require login. Apart from the YC admins described below, only the account that uploaded the data can view the results.

YC admins. Employees with @ycombinator.com email addresses have admin access to all uploads.

Chat conversations. Stored server-side and tied to your upload. Same access rules apply.

Security

HTTPS with HSTS enforced on all connections
Secure, HttpOnly, SameSite cookies
API tokens and magic link tokens hashed with SHA-256 at rest
Rate limiting: 100 requests/hour per IP, 500 LLM calls/day per token
Sensitive parameters filtered from server logs
CORS restricted to application origin
Anti-gaming: 5-layer verification system (including HMAC nonces, score re-derivation, and anomaly detection)
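
The rate limits in the list above (100 requests/hour per IP, 500 LLM calls/day per token) could be enforced with a counter per key and time window. This is a generic fixed-window sketch, not a description of how the server does it; the class name is hypothetical, and a production limiter would typically keep its counters in a shared store and evict old windows.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` events per key within each fixed time window."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        # Events are bucketed by window index; old buckets are never evicted here.
        bucket = (key, int(now // self.window_seconds))
        if self._counts[bucket] >= self.limit:
            return False
        self._counts[bucket] += 1
        return True


# The two limits from the list above, expressed with this sketch.
per_ip = FixedWindowLimiter(limit=100, window_seconds=3600)
per_token = FixedWindowLimiter(limit=500, window_seconds=86400)
```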

What we do not collect

No source code files, working-tree snapshots, or git archive tarballs
No per-commit diffs or patch content
No raw transcript JSONL — we upload narratives and bounded-length session events, not the original conversation history
No analytics services (no Google Analytics, Mixpanel, Segment, Hotjar)
No tracking pixels or beacons
No browser fingerprinting
No cookies beyond the session cookie
No third-party ad or marketing trackers
localStorage is used client-side only (chat UI state)

Data retention and deletion

Upload data (scores, narratives, metrics) is stored indefinitely. When an upload is deleted, its associated data is cascade-deleted: projects, sessions, episodes, decisions, evidence chunks, chat conversations, and internal LLM calls.

LLM call payloads (prompts and responses) are stored on Postgres and are nulled out on a tiered schedule by an hourly job: proxy payloads (from client-pipeline LLM calls) are eligible for deletion 48 hours after successful verification, or 14 days if a submission never lands; internal system-call payloads are eligible at 30 days. Actual deletion runs once per hour, so the effective window is up to one additional hour. Payloads flagged for active fraud or legal investigation are retained until the hold expires or is explicitly released by an admin; a newly-banned API token holds its payloads for 180 days past the ban expiry set at the time of the ban.
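
The tiered schedule above can be written down as a small function that returns when a payload first becomes eligible for deletion. This is a sketch under assumptions: the function and parameter names are hypothetical, and the 14-day and 30-day clocks are assumed to start when the row is created, which the text above does not state explicitly.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def payload_eligible_at(
    kind: str,
    created_at: datetime,
    verified_at: Optional[datetime] = None,
    on_hold: bool = False,
) -> Optional[datetime]:
    """Return when a payload may be nulled out, or None while a hold applies."""
    if on_hold:
        return None  # fraud, legal, or ban-linked holds suspend deletion
    if kind == "internal":
        return created_at + timedelta(days=30)
    if kind == "proxy":
        if verified_at is not None:
            return verified_at + timedelta(hours=48)
        return created_at + timedelta(days=14)  # submission never landed
    raise ValueError(f"unknown payload kind: {kind!r}")
```

Because the deletion job runs hourly, actual removal can trail the returned time by up to one hour.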

Metadata rows (request id, HMAC nonce of the response, API token id, model, token counts, cost, timestamps, rejection reason) are retained indefinitely on Postgres for fraud review and anti-replay detection. Client IP addresses on proxy logs are retained for 90 days, then cleared — unless the row is under an active fraud, legal, or ban-linked hold, in which case the IP is preserved alongside the payload until the hold expires or is released.

Erasure. Proxy logs are scoped to your API token, not to an individual upload, so deleting an upload does not cascade to them. To erase your data ahead of the scheduled retention windows, email us at the address below. On request we immediately purge every LLM call's content — prompts, responses, system prompts, the stored client IP, and the cold-storage payloads — and delete your uploaded reports and your shared score projection. The content-free metadata described above (model, token counts, cost, timestamps) is retained: it no longer holds your content and it underpins our cost and abuse accounting. If you ask us to delete your account, we additionally anonymize the account itself — your email, profile, and git identity are removed and your API tokens are revoked — so nothing that identifies you remains.

Contact

Questions about privacy or data handling: oss@ycombinator.com

Example: what gets uploaded to YC

Anonymized example of the JSON payload sent from the Docker container. No source code, no file contents, no raw transcripts.

{
  "episode_scores": [
    {
      "title": "Authentication refactor",
      "scores": { "throughput": 7.5, "steering": 8.0, "eng_quality": 7.0, "product_thinking": 6.5, "planning": 7.0 },
      "confidence": "high"
    }
  ],
  "session_summaries": {
    "abc123": "Developer refactored auth middleware across 3 sessions. Started with a clear plan, tested edge cases, caught a regression early..."
  },
  "session_metadata": [
    { "session_id": "abc123", "started_at": "2026-04-15T09:12:03Z", "duration_minutes": 47, "project": "acme" }
  ],
  "git_remote": "git@github.com:acme/service.git",
  "recent_commits": [
    { "sha": "a1b2c3def4567890", "short": "a1b2c3d", "author": "Avery Builder", "email": "avery@acme.example", "date": "2026-04-15T09:12:03Z", "subject": "Refactor auth middleware" }
  ],
  "decisions": [
    { "title": "Chose JWT over session cookies", "rationale": "Rejected session cookies because the app is stateless. [identifier] handles refresh; [path] stores the decision log." }
  ],
  "git_metrics": {
    "velocity": { "loc_per_day": 3200, "commits_per_day": 12 },
    "total_commits": 47,
    "loc_stats": { "test_ratio": 1.2 }
  },
  "client_telemetry": {
    "pipeline_duration_s": 612,
    "llm_stats": { "total_calls": 84, "total_cost_cents": 31 },
    "system_info": { "ruby_version": "3.4.8", "rails_env": "production", "pid": 4321 }
  },
  "nonces": ["req_001:a1b2c3d4", "req_002:e5f6g7h8"]
}
