
BRANCH · ef-062-offline-roster-sync

Offline Roster & Sync Monitor

EF-062 · Persona: Event-day staff + organizer monitor · Stage: Day-of (offline-first) · Roots in: native-app-shell

Large-venue events must assume an unreliable network. The roster downloads at event entry, persists encrypted at rest, and refreshes on demand and via background fetch. Staff scans are queued locally and replayed in chronological order on reconnect. The organizer sees a sync monitor in admin showing per-device queue depth and last-seen timestamps, which answers the question "is my staff checking people in or are their phones lying?"

Preconditions

  • Inherits native-app-shell trunk (signed in, event roster bootstrapped).
  • Event has at least 100 registrations (offline contract is most-tested at scale).
  • Local encrypted storage (iOS Keychain / Android EncryptedSharedPreferences) provisioned.

Happy path

  1. Roster download at event entry.

    First time staff opens an event, full roster downloads in chunks of 500 with progress display (uses ui-progress-bar equivalent). Each row includes guest_id, name, email, access_type_id, status, check_in_note, arrival_alert_json. After download, mark "ready for offline" with timestamp.

  2. Roster refresh: pull-to-refresh + bg-fetch.

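    Pull-to-refresh fetches a roster delta (rows changed since last sync). Background fetch (iOS BGAppRefreshTask, Android WorkManager) is requested roughly every 15 min while the app is backgrounded and uses the same delta endpoint. The interval is a target, not a guarantee: iOS schedules refresh tasks at its own discretion, and 15 min is the minimum period WorkManager accepts for periodic work.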

  3. Going offline.

    Native-app-shell offline banner activates. Check-in flow remains functional. Scans queue locally with idempotency-key (deviceId + guestId + scanTimestamp). Pending-sync banner increments.

  4. Reconnect — replay in chronological order.

    When network returns, queue replays one row at a time in the order they were recorded (preserves audit timeline). Server dedupes by idempotency-key so replays are safe even if some leaked through earlier. Pending banner decrements per success.

  5. Organizer sync monitor.

    Admin → Event → Sync Monitor shows: per-device row with device-id (fingerprinted), staff name, last-seen timestamp, last-roster-fetched, pending-sync count. Devices not seen in 10+ minutes get a yellow "stale" pill; 30+ minutes → red "offline" pill.
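
The local queueing described in step 3 can be sketched as follows. This is a minimal illustration, not the app's implementation: the class names `QueuedScan` and `OfflineQueue` and the `|` delimiter inside the key are assumptions; the source only states that the idempotency-key is built from deviceId + guestId + scanTimestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedScan:
    """One offline check-in scan (hypothetical shape)."""
    device_id: str
    guest_id: str
    scan_ts: str  # ISO-8601 UTC timestamp captured at scan time

    @property
    def idempotency_key(self) -> str:
        # deviceId + guestId + scanTimestamp; the "|" delimiter is an assumption.
        return f"{self.device_id}|{self.guest_id}|{self.scan_ts}"


class OfflineQueue:
    """In-memory stand-in for the persisted pending-sync queue."""

    def __init__(self) -> None:
        self._scans: list[QueuedScan] = []
        self._keys: set[str] = set()

    def enqueue(self, scan: QueuedScan) -> bool:
        """Append a scan; return False if the same key is already queued."""
        if scan.idempotency_key in self._keys:
            return False
        self._keys.add(scan.idempotency_key)
        self._scans.append(scan)
        return True

    @property
    def pending_count(self) -> int:
        # The value the pending-sync banner would display.
        return len(self._scans)
```

Rejecting a duplicate key locally is an optional extra; the server-side dedupe in step 4 remains the source of truth.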

Failure modes

Initial download interrupted

Trigger: roster download starts but network fails partway through.

Resumable download — each chunk has a server-side cursor + a continuation token. Resume picks up from the last completed chunk. Until 100% complete, app shows "Roster incomplete — connect to network to finish" instead of allowing scans (avoid scanning against partial roster). Harness: stub network to fail at 50%, reconnect, resume to 100%.
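
A resumable download of this kind might look like the sketch below. It assumes a hypothetical `fetch_chunk(token)` callable that returns `(rows, next_token)` and raises `ConnectionError` when the network fails; the real endpoint, cursor format, and token format are not specified in this story.

```python
from typing import Callable, Optional

Chunk = tuple[list[dict], Optional[str]]


class RosterDownload:
    """Tracks progress of a chunked roster download so it can resume."""

    def __init__(self) -> None:
        self.rows: list[dict] = []
        self.token: Optional[str] = None  # token after the last completed chunk
        self.complete = False

    def resume(self, fetch_chunk: Callable[[Optional[str]], Chunk]) -> bool:
        """Fetch chunks until done; return False if the network drops.

        On failure the rows and token from completed chunks are kept,
        so the next call continues from the last completed chunk.
        """
        while not self.complete:
            try:
                rows, next_token = fetch_chunk(self.token)
            except ConnectionError:
                return False
            self.rows.extend(rows)
            self.token = next_token
            # A missing next token is taken to mean "last chunk" (assumption).
            self.complete = next_token is None
        return True

    def can_scan(self) -> bool:
        # Scans stay blocked until the roster is 100% complete.
        return self.complete
```

A caller would show the "Roster incomplete" banner whenever `can_scan()` is false and call `resume()` again once connectivity returns.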

Roster delta after row deleted server-side

Trigger: organizer cancels a registration while staff devices have it cached.

Delta endpoint returns tombstones for deleted rows. Local store removes the row + adds it to a per-device "deleted-since-cache" set so a subsequent scan of that guest's QR shows "Registration cancelled — refer to organizer." Harness: cancel registration server-side, fetch delta, scan cancelled guest, banner shows cancelled.
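
As a rough sketch of the tombstone handling, assuming each delta entry is a dict with a `guest_id` and an optional `deleted` flag (the actual delta payload shape is not given here, and `LocalRoster` is a hypothetical name):

```python
class LocalRoster:
    """In-memory stand-in for the device's cached roster."""

    def __init__(self, rows: list[dict]) -> None:
        self.rows = {row["guest_id"]: row for row in rows}
        # Guests removed by a tombstone since the roster was cached.
        self.deleted_since_cache: set[str] = set()

    def apply_delta(self, delta: list[dict]) -> None:
        for change in delta:
            guest_id = change["guest_id"]
            if change.get("deleted"):
                # Tombstone: drop the row but remember the guest was cancelled.
                self.rows.pop(guest_id, None)
                self.deleted_since_cache.add(guest_id)
            else:
                # Added or updated row.
                self.rows[guest_id] = change
                self.deleted_since_cache.discard(guest_id)

    def scan(self, guest_id: str) -> str:
        """Return which state the scan screen should show."""
        if guest_id in self.deleted_since_cache:
            return "cancelled"      # cancelled-guest-banner
        if guest_id in self.rows:
            return "ok"
        return "not_in_roster"      # roster-stale-row-banner
```

Keeping the deleted set separate from the roster is what lets the app distinguish "cancelled" from "never seen", which need different copy.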

Roster delta after row added server-side

Trigger: organizer adds a walk-in registration after staff started the event with a stale roster.

Delta endpoint returns the new row. If a delta hasn't fetched yet, the staff's manual-entry autocomplete won't find the guest; app shows "Guest not in roster — refresh and try again" with a refresh CTA. Harness: server-side add new guest, delta not yet fetched, attempt manual-entry, refresh, find guest.

Replay preserves chronological order

Trigger: 10 offline scans queued at scan_ts T0..T9; reconnect at T10.

Replay sends each in order. Server's audit log shows check-ins at scan_ts in original order. Harness: 10 offline scans with synthetic timestamps, reconnect, server audit log timestamps match.
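
One way the server side could satisfy this, sketched under assumptions: `CheckInServer` and its fields are hypothetical, and the point illustrated is only that the audit entry stores the client's scan_ts rather than the server clock at replay time, with duplicates ignored by idempotency-key.

```python
class CheckInServer:
    """Toy in-memory model of the check-in endpoint."""

    def __init__(self) -> None:
        self._seen_keys: set[str] = set()
        self.audit_log: list[dict] = []

    def record(self, idempotency_key: str, guest_id: str, scan_ts: str) -> str:
        """Store a replayed scan once; repeat deliveries are no-ops."""
        if idempotency_key in self._seen_keys:
            return "duplicate"
        self._seen_keys.add(idempotency_key)
        # Use the client-supplied scan_ts, not the time the replay arrived.
        self.audit_log.append({"guest_id": guest_id, "scan_ts": scan_ts})
        return "created"
```

Replaying the same ten scans twice against this model leaves ten audit entries in their original order.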

Replay during second offline window

Trigger: queue has 5 pending; reconnect briefly; 2 replays succeed; network drops again with 3 still pending.

Successful sends are removed from queue. Failed (network-dropped mid-send) stay in queue. Subsequent reconnect resumes from the remaining 3. Harness: stub network with brief reconnect, assert 2 sent + 3 still queued + final reconnect drains all.
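
The drain behaviour can be sketched as a loop that removes an item only after its send succeeds. The `send` callable and its use of `ConnectionError` to signal a dropped network are assumptions made for illustration.

```python
from collections import deque
from typing import Callable


def drain(queue: deque, send: Callable[[object], None]) -> int:
    """Send queued scans oldest-first; stop at the first network failure.

    An item is removed only after send() returns, so a scan whose
    request was dropped mid-send stays at the head of the queue.
    Returns the number of scans sent in this pass.
    """
    sent = 0
    while queue:
        try:
            send(queue[0])
        except ConnectionError:
            break
        queue.popleft()
        sent += 1
    return sent
```

Because the failed item stays at the head, the next reconnect resumes with the oldest unsent scan and chronological order is preserved.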

Local storage encrypted at rest

Trigger: device is lost or stolen; attacker boots into developer mode.

Roster + pending queue stored in iOS Keychain (kSecAttrAccessibleAfterFirstUnlockThisDeviceOnly) / Android EncryptedSharedPreferences. Cannot be read by a USB-attached debugger without device PIN. Harness: physical device inspection with debugger, assert storage path returns encrypted blobs (not plaintext).

Sync monitor — stale device

Trigger: a staff device hasn't reported in 12 minutes.

Organizer sync-monitor row shows yellow "Stale" pill. After 30 minutes — red "Offline" pill with last-seen timestamp. Notification optional (organizer-configured): "Device [name] hasn't synced in 30 minutes." Harness: stub device with last_seen=12min ago, monitor shows yellow; 30min ago, red.
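
The pill thresholds reduce to a small classification function. The sketch below uses the 10-minute and 30-minute cut-offs from this story; the function name and the "ok" label for a fresh device are assumptions.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(minutes=10)    # yellow "Stale" pill
OFFLINE_AFTER = timedelta(minutes=30)  # red "Offline" pill


def device_status(last_seen: datetime, now: datetime) -> str:
    """Classify a device by how long ago it last reported."""
    age = now - last_seen
    if age >= OFFLINE_AFTER:
        return "offline"
    if age >= STALE_AFTER:
        return "stale"
    return "ok"
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the stubbed last_seen probes deterministic.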

Sync monitor — no staff devices online

Trigger: organizer opens monitor; all staff devices show offline.

Empty/offline state has clear copy: "No devices have synced in the last 30 minutes. If your event is live, this may indicate a network problem at the venue." Includes a "Send urgent message to staff" deeplink to ops contact. Harness: stub all devices stale, monitor shows distinct empty state.

Pending queue swells beyond reasonable

Trigger: device has 500+ pending check-ins (network has been out for hours).

App surfaces a warning banner: "500 check-ins pending sync. The longer you stay offline, the slower reconnection will be." Replay still functions: the queue is sent in chunks of 50, oldest first, to avoid swamping the server on reconnect. Harness: stub 500 pending, banner visible; reconnect, server receives in chunks of 50.
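
A minimal sketch of the warning threshold and the chunking, assuming the pending queue is an ordered list with the oldest scan first. The constant and function names are illustrative only.

```python
from typing import Iterator

PENDING_WARNING_THRESHOLD = 500  # show the warning banner at this depth
REPLAY_BATCH_SIZE = 50           # scans sent per chunk on reconnect


def should_warn(pending_count: int) -> bool:
    return pending_count >= PENDING_WARNING_THRESHOLD


def replay_batches(pending: list, size: int = REPLAY_BATCH_SIZE) -> Iterator[list]:
    """Yield consecutive slices of the queue, preserving recorded order."""
    for start in range(0, len(pending), size):
        yield pending[start:start + size]
```

With 500 pending scans this yields 10 chunks of 50; a final short chunk is produced when the queue depth is not a multiple of 50.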

App force-quit during replay

Trigger: replay is mid-flight; user force-quits the app.

In-flight requests are cancelled on the client. Requests that already reached the server complete there; requests still in progress may be only partially sent, and the client cannot tell which is which. Idempotency-key dedupe means re-replay on relaunch is safe either way. Pending queue persists across force-quit (per native-app-shell trunk contract). Harness: simulate mid-replay force-quit, relaunch, queue resumes, server has no duplicates.

Cross-tenant device fingerprint collision

Trigger: same physical device used at multiple tenants' events; sync-monitor must scope to the current tenant.

Device fingerprint is tenant-scoped (hash of device_id + tenant_id), so the same physical device produces a different fingerprint under each tenant and its rows cannot merge across tenants, barring a hash collision. Sync monitor only shows devices reporting under current tenant context. Harness: register same device under tenant A + tenant B, monitor in tenant A doesn't show tenant B's check-ins.
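
A tenant-scoped fingerprint could be derived as in the sketch below. SHA-256 and the length-prefixed encoding are assumptions; the story only says the fingerprint is a hash of device_id + tenant_id.

```python
import hashlib


def device_fingerprint(device_id: str, tenant_id: str) -> str:
    """Hash tenant_id and device_id into one tenant-scoped fingerprint.

    Each part is length-prefixed so that different (tenant, device)
    pairs never produce the same hash input, which plain string
    concatenation would allow ("ab" + "c" equals "a" + "bc").
    """
    material = f"{len(tenant_id)}:{tenant_id}{len(device_id)}:{device_id}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

The monitor query still needs its own tenant filter; the scoped fingerprint only ensures that one device does not appear as the same row under two tenants.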

Stable test attributes

| Identifier | Where | Purpose |
|---|---|---|
| roster-download-progress | Initial sync | ui-progress-bar style |
| roster-incomplete-banner | Until 100% complete | Blocks scans |
| roster-refresh-pull | Pull-to-refresh on scan tab | Fetches delta |
| roster-stale-row-banner | If scan finds no row | "Refresh and try again" |
| cancelled-guest-banner | Scan of cancelled guest | "Refer to organizer" |
| pending-queue-warning-banner | 500+ pending | "Reconnection will be slow" |
| sync-monitor-page | Admin → Event → Sync Monitor | Organizer-side dashboard |
| sync-monitor-device-row | Per device | Last-seen + queue depth + status |
| sync-monitor-stale-pill | Per stale device | Yellow at 10min, red at 30min |
| sync-monitor-empty-state | All devices offline | "No devices synced recently" copy |

Agent test plan

Probe list
- (manual) initial-roster-download-resumable: stub fail at 50%, reconnect, resume to 100%
- (manual) roster-incomplete-blocks-scan: until 100%, scan disabled
- (manual) delta-fetch-removes-cancelled-row: cancel server-side, delta, scan shows cancelled
- (manual) delta-fetch-adds-new-row: add server-side, delta, manual entry finds guest
- (manual) replay-chronological: 10 offline scans, reconnect, server timeline preserved
- (manual) replay-resume-after-second-offline: stub brief reconnect, queue partially drains, drain completes after second reconnect
- (manual) local-storage-encrypted: device debugger inspection, storage is opaque
- sync-monitor-stale-pill-at-10min: stub last_seen=10min, yellow pill
- sync-monitor-offline-pill-at-30min: stub last_seen=30min, red pill
- sync-monitor-empty-state-all-offline: stub all stale, distinct empty state visible
- pending-queue-warning-at-500: stub queue=500, banner visible
- replay-chunked-on-reconnect: stub 500 pending, server receives in chunks of 50
- (manual) force-quit-during-replay-no-duplicates: relaunch resumes, no dup server rows
- cross-tenant-device-scoping: same device in two tenants, monitor only shows current tenant's check-ins
- audit-log-replay-preserves-scan-ts: server's check-in row uses client scan_ts (not server's now-at-replay-time)