
BRANCH · ef-100-api-access-flag

API Access Feature Flag & Token Lifecycle

EF-100 · Persona: Tenant admin + integration developer · Stage: Public API productization · Roots in: admin-shell-access · Matrix=Not Ready

Tenant-level enablement for the public Voyage API. Without the feature flag, public-API endpoints return 404 (anti-probing: the response does not reveal that the API exists). With the flag, a tenant admin issues scoped API tokens, tracks usage against a per-tenant rate limit, sees an audit log of every token-authenticated request, and can revoke tokens with cascading session invalidation. Tier-3 tightening: per-token request history now references ui-audit-log-viewer as the foundation for the audit display.

Preconditions

  • Tenant has a contracted SKU that includes the api_access.enabled feature flag.
  • Tenant admin signs in to the admin shell.
  • Public API endpoints are deployed (independent of feature flag — flag controls access, not deploy).

Happy path

  1. Tenant admin opens Settings → API Access.

    Section gated by api_access.enabled. If disabled, section is hidden entirely; if enabled, page renders: existing tokens list (paginated ui-data-table with token name, scopes, created-by, last-used-at, status pill), "Create token" CTA, usage chart (calls per day, latency p50/p95/p99), rate-limit summary (calls remaining this period).

  2. Admin creates a new token.

    Modal: name (required, human-readable), scopes (multi-select via ui-permissions-matrix-style picker — read-events, write-events, read-guests, write-guests, read-reports, etc.), optional expiry (default 365 days), optional IP allow-list (CIDR). On submit, server creates token, returns the token string ONCE, never again.

  3. Token is displayed once.

    Modal shows the full token in a copy-to-clipboard region with an explicit warning: "This is your only chance to copy this token. Store it securely." The copy button copies the token to the clipboard; clicking anywhere else dismisses the modal and returns to the list. Server stores only a hash; the original cannot be retrieved later.

  4. Developer uses the token.

    Token is sent in the Authorization: Bearer <token> header. Server rate-limits per token and per tenant. Each request writes an audit log row (request method, path, status, response time, IP; the IP is truncated to /24 for IPv4 and /48 for IPv6 for privacy). Each token is limited to its configured scopes; out-of-scope requests return 403.

  5. Admin views usage.

    API Access dashboard shows per-token usage. Drill-in shows audit log (paginated; older than 90 days summarized). Usage chart shows latency tiers and rate-limit consumption.

  6. Admin revokes a token.

    Per-row revoke uses ui-destructive-confirmation. On confirm, token's hash record transitions to revoked. Within 60s (token cache TTL), all requests with that token return 401. Audit log row written for the revoke event.

Failure modes

Feature gated by SKU — anti-probing 404

Trigger: tenant on free tier (no api_access.enabled). External request hits a public API endpoint with any token.

Server returns 404 (NOT 403 or 402 — anti-probing avoids leaking that the API exists). Audit log captures the attempt. UI hides the API Access settings section entirely. Harness: stub free tier, dispatch GET /v1/api/events with bearer token, assert 404; admin shell — assert "API Access" menu item hidden.
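The gate ordering above can be sketched as follows. This is a minimal illustration, not the real middleware: the Tenant record and the gate function are hypothetical names. The point it shows is that the flag check runs before any token inspection, so an ungated tenant produces the same 404 whatever token is sent.

```python
from dataclasses import dataclass
from typing import Optional

FLAG = "api_access.enabled"  # flag name taken from the story


@dataclass(frozen=True)
class Tenant:
    """Hypothetical tenant record; only the feature set matters here."""
    tenant_id: str
    features: frozenset


def gate(tenant: Tenant, bearer_token: Optional[str]) -> Optional[int]:
    """Return 404 for ungated tenants, or None to continue to token checks.

    The token is deliberately ignored: a valid token, an invalid token,
    and no token at all must be indistinguishable from the outside.
    """
    if FLAG not in tenant.features:
        return 404
    return None


free = Tenant("t-free", frozenset())
paid = Tenant("t-paid", frozenset({FLAG}))
```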

Token displayed only once

Trigger: admin creates token; modal closes; admin needs the token again.

Server stores only the SHA-256 hash. Re-fetching the token row returns metadata only — never the secret. Admin must create a new token if lost. UI explicitly warns at creation. Harness: create token, navigate away, attempt to re-fetch, assert response has no plaintext token field.
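A minimal sketch of hash-only storage, using only the Python standard library. The function names, the record shape, and the underscore separator between tenant prefix and secret are assumptions for illustration; the story only states that a SHA-256 hash is stored and the plaintext is returned once.

```python
import hashlib
import secrets


def create_token(tenant_prefix: str) -> tuple[str, dict]:
    """Return (plaintext token, stored record). Plaintext is never persisted."""
    secret = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoded
    token = f"{tenant_prefix}_{secret}"  # separator is an assumption
    record = {
        "token_hash": hashlib.sha256(token.encode()).hexdigest(),
        "status": "active",
    }
    return token, record


def matches(presented: str, record: dict) -> bool:
    """Hash the presented token and compare in constant time."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(digest, record["token_hash"])


token, record = create_token("tnta")
```

Because the record holds only the digest, a later re-fetch of the token row has nothing but metadata to return.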

Token in URL query string

Trigger: developer (or our SDK) accidentally puts the token in ?api_key= instead of the Authorization header.

Server REJECTS tokens in query strings (returns 400 TOKEN_IN_QUERY) — never accepts. Logs/CDN access logs would otherwise leak the token. Documentation explicitly forbids it. Harness: dispatch with token in query, assert 400.
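A sketch of the query-string rejection, assuming the check runs on the raw request URL before authentication. Only api_key is named in the story; the other parameter names in the set are assumptions about what a real deployment might also want to block.

```python
from urllib.parse import parse_qs, urlsplit

# "api_key" comes from the story; the others are illustrative assumptions.
FORBIDDEN_QUERY_KEYS = {"api_key", "token", "access_token"}


def check_query_for_token(url: str):
    """Return (400, "TOKEN_IN_QUERY") if a credential-like parameter is present."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if query.keys() & FORBIDDEN_QUERY_KEYS:
        return 400, "TOKEN_IN_QUERY"
    return None
```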

Out-of-scope API call

Trigger: token has read-events scope only; developer attempts POST /v1/api/events.

Server returns 403 INSUFFICIENT_SCOPE with body listing the scope required. Audit log row written. Harness: stub read-only token, attempt POST, assert 403, assert audit row.
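The scope check can be sketched like this. The route-to-scope table and the response body field name required_scope are assumptions; the story specifies only the 403 INSUFFICIENT_SCOPE status and that the body lists the required scope.

```python
# Hypothetical mapping from (method, path) to the scope it requires.
REQUIRED_SCOPE = {
    ("GET", "/v1/api/events"): "read-events",
    ("POST", "/v1/api/events"): "write-events",
}


def authorize(method: str, path: str, token_scopes: set) -> tuple[int, dict]:
    """Return (status, body) for a request made with the given token scopes."""
    required = REQUIRED_SCOPE[(method, path)]
    if required not in token_scopes:
        return 403, {"error": "INSUFFICIENT_SCOPE", "required_scope": required}
    return 200, {}
```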

Rate limit hit

Trigger: developer's token exceeds tenant's per-minute rate limit.

Server returns 429 with Retry-After header in seconds and X-RateLimit-Reset showing the next-reset time. Token-specific limit applies first; tenant-wide limit applies as a ceiling. UI dashboard surfaces the rate-limit hit visibly. Harness: exceed limit, assert 429 + Retry-After + Reset headers present.
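One way to realize "token limit first, tenant limit as a ceiling" is a fixed-window counter, sketched below. The fixed-window algorithm, the in-memory store, and the use of epoch seconds for X-RateLimit-Reset are all assumptions; the story fixes only the status code and header names.

```python
import math
from collections import defaultdict

WINDOW_S = 60  # per-minute window, as in the trigger above


class RateLimiter:
    def __init__(self, token_limit: int, tenant_limit: int):
        self.token_limit = token_limit
        self.tenant_limit = tenant_limit
        self.counts = defaultdict(int)  # (kind, id, window index) -> calls

    def check(self, token_id: str, tenant_id: str, now: float):
        """Return (status, headers); count the call only if it is allowed."""
        window = int(now // WINDOW_S)
        reset_at = (window + 1) * WINDOW_S
        token_key = ("token", token_id, window)
        tenant_key = ("tenant", tenant_id, window)
        over_token = self.counts[token_key] >= self.token_limit
        over_tenant = self.counts[tenant_key] >= self.tenant_limit
        if over_token or over_tenant:
            retry_after = max(1, math.ceil(reset_at - now))
            return 429, {
                "Retry-After": str(retry_after),
                "X-RateLimit-Reset": str(reset_at),
            }
        self.counts[token_key] += 1
        self.counts[tenant_key] += 1
        return 200, {}
```

Passing the clock in as `now` keeps the limiter deterministic under test, which suits the harness steps described above.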

IP allow-list bypass attempt

Trigger: token has IP allow-list configured (CIDR 10.0.0.0/24); request comes from outside that range.

Server returns 403 IP_NOT_ALLOWED. Logs the source IP (truncated for privacy). Harness: stub source IP outside allow-list, assert 403; stub IP inside, assert 200.
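A sketch of the allow-list check and the privacy truncation, using Python's standard ipaddress module. The function names are hypothetical, and treating an empty allow-list as "no restriction" is an assumption; the /24 and /48 truncation widths come from the happy path.

```python
import ipaddress


def ip_allowed(source_ip: str, allow_list: list[str]) -> bool:
    """True if the allow-list is empty or the address falls in any CIDR."""
    if not allow_list:
        return True  # assumption: no allow-list means no IP restriction
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allow_list)


def truncate_for_log(source_ip: str) -> str:
    """Truncate to /24 (IPv4) or /48 (IPv6) before writing to the audit log."""
    addr = ipaddress.ip_address(source_ip)
    prefix = 24 if addr.version == 4 else 48
    # strict=False masks off the host bits instead of raising.
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))
```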

Token revocation cascades within 60s

Trigger: admin revokes a token; a request from a developer tool using that token arrives 30s later.

Token cache TTL is 60s. Within that window, the cached state may still allow some requests through (acceptable). After 60s, revoked-token requests return 401 TOKEN_REVOKED. The audit log captures both pre-revoke usage and post-revoke 401s. Harness: revoke at T0; dispatch at T+5 (may succeed); dispatch at T+90 assert 401.
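The timing behavior above follows from a read-through cache with a 60s TTL. The sketch below is illustrative only: the class name and the load_status callback are hypothetical, and the real cache may be shared or distributed rather than in-process.

```python
class TokenStatusCache:
    """Read-through cache of token status with a fixed TTL."""

    def __init__(self, load_status, ttl_s: float = 60.0):
        self.load_status = load_status  # callable: token_hash -> status
        self.ttl_s = ttl_s
        self.entries = {}  # token_hash -> (status, fetched_at)

    def status(self, token_hash: str, now: float) -> str:
        cached = self.entries.get(token_hash)
        if cached is not None and now - cached[1] < self.ttl_s:
            return cached[0]  # may be stale for up to ttl_s seconds
        status = self.load_status(token_hash)
        self.entries[token_hash] = (status, now)
        return status


# A dict stands in for the token table.
db = {"hash-1": "active"}
cache = TokenStatusCache(db.get)
at_t0 = cache.status("hash-1", now=0.0)    # populates the cache
db["hash-1"] = "revoked"                   # admin revokes just after T0
at_t5 = cache.status("hash-1", now=5.0)    # still served from cache
at_t90 = cache.status("hash-1", now=90.0)  # entry expired, reloaded
```

The stale read at T+5 is the "may succeed" case in the harness; the reload at T+90 is where 401 TOKEN_REVOKED becomes guaranteed.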

Token expiry honored

Trigger: token reaches its expiry timestamp.

Expired tokens return 401 TOKEN_EXPIRED. Admin dashboard shows status pill = expired. Tokens auto-disabled at expiry; renewable only by creating a new token (no extend-in-place — forces audit). Harness: stub past expires_at, dispatch request, assert 401 TOKEN_EXPIRED.
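The expiry rule reduces to one timestamp comparison, sketched here. The helper names are hypothetical; the 365-day default comes from the create-token modal, and treating the exact expiry instant as already expired is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_EXPIRY = timedelta(days=365)  # default from the create-token modal


def expires_at(created_at: datetime, expiry: Optional[timedelta] = None) -> datetime:
    """Compute the expiry timestamp, falling back to the 365-day default."""
    return created_at + (expiry if expiry is not None else DEFAULT_EXPIRY)


def check_expiry(expires: datetime, now: datetime):
    """Return (401, "TOKEN_EXPIRED") once the expiry timestamp is reached."""
    if now >= expires:
        return 401, "TOKEN_EXPIRED"
    return None


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
exp = expires_at(created)
```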

Cross-tenant token forgery

Trigger: attacker forges a token shaped like a valid one, OR uses a real tenant-A token against tenant-B URLs.

Token format includes a tenant prefix. The hash lookup is tenant-scoped. Cross-tenant attempts return 404 (anti-probing on tenant existence). Forged tokens never match any hash. Harness: forge structurally valid token, assert 401; use real cross-tenant token against wrong tenant URL, assert 404.
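A sketch of how the tenant prefix and tenant-scoped lookup produce the two different outcomes. The token layout (prefix, underscore, secret), the function names, and the in-memory hash store are assumptions for illustration.

```python
import hashlib


def sha(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def authenticate(url_tenant: str, token: str, hashes_by_tenant: dict) -> int:
    """Return the HTTP status for a token presented against a tenant URL."""
    prefix, _, _ = token.partition("_")
    if prefix != url_tenant:
        return 404  # cross-tenant: do not confirm the tenant exists
    # Lookup is scoped to the URL's tenant, never global.
    if sha(token) not in hashes_by_tenant.get(url_tenant, set()):
        return 401  # right shape, but no stored hash matches
    return 200


store = {
    "tnta": {sha("tnta_real-secret")},
    "tntb": {sha("tntb_real-secret")},
}
```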

Audit log per request

Trigger: developer makes 1000 API requests over an hour.

Each request writes one audit row with token_id, method, path, status, response_time_ms, source_ip (truncated), user_agent. Audit retention: 90 days hot (queryable via dashboard), 7 years cold (compliance-archived to R2 with HMAC-signed URLs). Harness: dispatch 100 requests, assert 100 audit rows.

Token reveal via error message

Trigger: a server-side bug causes a 500 with a stack trace that includes the token in scope.

Error responses NEVER include the token. Tokens are tagged with a Sentry/scrubber filter that redacts before logging. Generic 500 response: "Internal server error. Reference: [trace-id]." Harness: instrument a deliberate 500, assert response body does NOT contain the token, assert error log scrubs the token.
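A minimal sketch of the scrub-then-log step, independent of any particular error-reporting SDK. The function names and the [REDACTED] marker are assumptions; the generic response text is the one quoted above.

```python
import re
from typing import Optional

# Matches the credential that follows "Bearer " in a logged header.
BEARER_RE = re.compile(r"(Bearer\s+)[^\s'\"]+", re.IGNORECASE)


def scrub(text: str, presented_token: Optional[str] = None) -> str:
    """Redact the request's token wherever it appears, plus any Bearer value."""
    if presented_token:
        text = text.replace(presented_token, "[REDACTED]")
    return BEARER_RE.sub(r"\1[REDACTED]", text)


def error_response(trace_id: str) -> tuple[int, str]:
    """Generic body: carries a trace reference and nothing from the request."""
    return 500, f"Internal server error. Reference: {trace_id}."


raw_log = "KeyError in handler; auth='Bearer tnta_s3cr3t' locals: token=tnta_s3cr3t"
clean_log = scrub(raw_log, "tnta_s3cr3t")
```

Redacting the exact presented token covers stack-trace locals that a pattern alone would miss; the Bearer pattern is a second net for logged headers.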

Bulk-revoke on tenant compromise

Trigger: tenant suspects compromise; admin clicks "Revoke all tokens" emergency button.

Bulk revoke transitions ALL active tokens to revoked. It writes one audit row per token plus a summary "bulk_revoke_emergency" event recording operator and reason. Within 60s (cache TTL), all tokens fail. Confirmation requires typing the exact phrase "REVOKE ALL TOKENS". Harness: bulk revoke, assert N audit rows + 1 summary event, assert all tokens 401 within 60s.
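The bulk-revoke bookkeeping can be sketched as below. The per-token event name token_revoked, the revoked_count field, and the dict-based token records are assumptions; the summary event name and the confirmation phrase come from the story.

```python
CONFIRM_PHRASE = "REVOKE ALL TOKENS"


def bulk_revoke(tokens: list, operator: str, reason: str, typed_phrase: str) -> list:
    """Revoke every active token; return N per-token rows plus one summary row."""
    if typed_phrase != CONFIRM_PHRASE:  # exact match, case-sensitive
        raise ValueError("confirmation phrase mismatch")
    audit = []
    for token in tokens:
        if token["status"] == "active":
            token["status"] = "revoked"
            audit.append({"event": "token_revoked", "token_id": token["id"]})
    audit.append({
        "event": "bulk_revoke_emergency",
        "operator": operator,
        "reason": reason,
        "revoked_count": len(audit),  # evaluated before this row is appended
    })
    return audit


tokens = [
    {"id": "a", "status": "active"},
    {"id": "b", "status": "active"},
    {"id": "c", "status": "expired"},
]
audit = bulk_revoke(tokens, "admin@example.test", "suspected leak", "REVOKE ALL TOKENS")
```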

Stable test attributes

data-test                      Where                     Purpose
api-access-page                Settings → API Access     Hidden if api_access.enabled=false
api-access-tokens-list         Page                      ui-data-table with token name + scopes + status
api-access-create-token-cta    Page header               Visible when role has api:tokens:create
api-access-create-token-modal  Modal                     name + scopes + expiry + ip-allowlist
api-access-token-reveal        Modal: post-create        Shows token ONCE; copy-to-clipboard
api-access-token-warning       Modal: post-create        "Only chance to copy" warning
api-access-revoke-button       Per row                   ui-destructive-confirmation
api-access-bulk-revoke         Page header               Emergency bulk revoke; type-to-confirm
api-access-usage-chart         Page                      Per-token usage chart
api-access-rate-limit-summary  Page                      Calls remaining + reset-at
api-access-audit-log           Page; drill-in per token  Per-request rows; paginated
api-access-token-status-pill   Per token                 Active | Expired | Revoked

Agent test plan

Probe list
- feature-gate-anti-probing-404: free tier, GET /v1/api/events, 404
- menu-hidden-when-disabled: free tier, api-access-page not in menu
- token-displayed-only-once: create token, navigate away + back, no plaintext token retrievable
- token-in-query-rejected: GET ?api_key=..., 400 TOKEN_IN_QUERY
- out-of-scope-call: read-only token POSTs, 403 INSUFFICIENT_SCOPE
- rate-limit-429-with-retry-after: exceed limit, 429 + Retry-After + X-RateLimit-Reset headers
- ip-allow-list-rejects: outside CIDR, 403 IP_NOT_ALLOWED
- ip-allow-list-allows: inside CIDR, 200
- token-revocation-cascades-within-60s: revoke + dispatch at T+90, 401 TOKEN_REVOKED
- token-expiry-honored: stub past expires_at, 401 TOKEN_EXPIRED
- cross-tenant-token-404: real cross-tenant token, 404
- forged-token-401: structurally valid forgery, 401
- audit-row-per-request: 100 requests, 100 audit rows
- error-response-no-token-leak: deliberate 500, response body lacks token; logs scrubbed
- bulk-revoke-type-to-confirm: emergency revoke, requires "REVOKE ALL TOKENS" exact phrase
- bulk-revoke-affects-all-tokens-within-60s: post-bulk, all tokens 401 within cache TTL
- audit-log-90-days-hot-7-years-cold: query 91-days-old log, comes from cold archive
- create-token-server-stores-hash-only: db inspection shows only sha256 hash, no plaintext