QA · Backend

Rolled Back QA Stress-Test Fixes for Security-Audit Persistence and Cached-Response Errors

PR #199 · StrangeNoob · QA · Jun 18, 2026 · 10:35 UTC

Executive summary

This change reverts a prior backend fix (PR #197) on the QA environment. That fix bundled two reliability changes: persisting security-audit records for elevated-access events, and preventing intermittent server errors on cached list responses. The revert returns the affected backend code in QA to its pre-#197 state so the bundled change can be reworked or re-validated before it is promoted any further.

Why this was needed

The reverted work (PR #197) combined two distinct concerns into one change set during QA stress testing. Reverting on QA is the standard way to remove a recently merged change that needs rework or further validation, keeping the QA branch in a known-good state. The revert is a clean, mechanical undo of every line, file, and test that PR #197 introduced.

Client / user impact

This change adds no new user-facing capability. Functionally, the QA environment returns to its pre-#197 behavior, so the two issues that #197 targeted are present again in QA: (1) security-audit log entries for GLOBAL_SCOPE (elevated-access) events on certain admin/read endpoints are rolled back rather than saved, and (2) the admin Rules list endpoint (/admin/api/rules) can intermittently return HTTP 500 under concurrent load when serving a cached response. Both issues remain QA-only; this revert did not reach Production.

Technical scope

Reverts PR #197 in full (12 lines restored, 637 removed) across 8 files:

  • app/services/auth/tenant_scope_policy.py — removes the await write_db.commit() that persisted GLOBAL_SCOPE elevation audits, plus the per-request de-dup marker (_GLOBAL_SCOPE_AUDIT_KEY) that audited each (user, surface, reason) at most once.
  • app/utils/cache/manager.py — removes the systemic cache_result re-hydration helpers (_cached_return_model, _rehydrate_cached_value, _resolve_forward_ref) that coerced cache-hit dicts back into the declared Pydantic return type; cache hits again return raw dicts.
  • app/admin_portal/routers/admin_rules.py — removes the inline RuleListResponse.model_validate(...) guard in admin_list_rules.
  • Deletes 3 regression tests (test_global_scope_audit_dedup.py, test_admin_rules_cache_rehydration.py, test_cache_rehydration.py) and 2 design spec docs.
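The cache re-hydration behavior removed from app/utils/cache/manager.py can be illustrated with a minimal sketch. It is not the reverted code: a stdlib dataclass stands in for the Pydantic RuleListResponse model, an in-memory dict stands in for the cache backend, and the rehydrate flag, list_rules_raw, and list_rules_typed are hypothetical names used only for illustration. The sketch shows the general pattern the bullets describe: a cache hit comes back as a plain dict unless it is coerced into the declared return type.

```python
import functools
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import get_type_hints

_CACHE = {}  # in-memory stand-in for the real cache backend (assumption)


@dataclass
class RuleListResponse:  # dataclass stand-in for the Pydantic model (assumption)
    items: list
    total: int


def cache_result(rehydrate):
    """Cache results as JSON; optionally coerce hits back to the declared type."""
    def decorator(fn):
        # Read the declared return annotation once, at decoration time.
        return_type = get_type_hints(fn).get("return")

        @functools.wraps(fn)
        def wrapper(*args):
            key = (fn.__name__, args)
            if key in _CACHE:
                value = json.loads(_CACHE[key])  # a hit is always a plain dict
                if rehydrate and is_dataclass(return_type):
                    return return_type(**value)  # re-hydrate into the declared type
                return value  # post-revert behavior: raw dict
            result = fn(*args)
            _CACHE[key] = json.dumps(asdict(result))
            return result
        return wrapper
    return decorator


@cache_result(rehydrate=False)
def list_rules_raw(tenant: str) -> RuleListResponse:
    return RuleListResponse(items=["r1", "r2"], total=2)


@cache_result(rehydrate=True)
def list_rules_typed(tenant: str) -> RuleListResponse:
    return RuleListResponse(items=["r1", "r2"], total=2)
```

On the second call, list_rules_raw returns a dict, so any caller that reads hit.total raises AttributeError, while list_rules_typed returns a RuleListResponse on both the miss and the hit. Whether this is the exact path that produced the HTTP 500 in QA is not stated in this entry; the sketch only shows why a type mismatch between miss and hit is fragile.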

Risk & mitigation

Risk is moderate and reintroduced rather than newly created. Reverting re-exposes two known QA issues: lost security-audit rows for elevated-access events (an auditability gap on sensitive-data-access tracking) and possible 500s on the cached admin Rules list under load. Mitigation: the revert is a faithful, mechanical undo confined to QA (not Production), and the original design specs and regression tests should be restored alongside the eventual re-implementation so the fixes can be reapplied cleanly.
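The audit-loss risk can be sketched in a few lines, under stated assumptions. The example below uses the stdlib sqlite3 module in place of the project's async write session, and record_elevation, end_of_request, and AUDIT_KEY are hypothetical names (AUDIT_KEY stands in for _GLOBAL_SCOPE_AUDIT_KEY). It assumes request teardown rolls back any open transaction, which is one way an uncommitted audit row ends up discarded.

```python
import sqlite3

# Hypothetical stand-in for the per-request marker (_GLOBAL_SCOPE_AUDIT_KEY).
AUDIT_KEY = "_global_scope_audit_seen"


def record_elevation(conn, request_state, actor, surface, reason, commit):
    """Write at most one audit row per (actor, surface, reason) per request.

    Returns True if a row was written, False if it was de-duplicated.
    """
    seen = request_state.setdefault(AUDIT_KEY, set())
    key = (actor, surface, reason)
    if key in seen:
        return False
    conn.execute(
        "INSERT INTO audit (actor, surface, reason) VALUES (?, ?, ?)", key
    )
    if commit:
        # Analogue of the commit that #197 added; without it the row
        # exists only inside the still-open transaction.
        conn.commit()
    seen.add(key)
    return True


def end_of_request(conn):
    # Assumed teardown for a read endpoint: discard any open transaction.
    conn.rollback()


def demo(commit):
    """Return how many audit rows survive one simulated request."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE audit (actor TEXT, surface TEXT, reason TEXT)")
    conn.commit()
    state = {}
    record_elevation(conn, state, "u1", "rules", "support", commit)
    record_elevation(conn, state, "u1", "rules", "support", commit)  # de-duplicated
    end_of_request(conn)
    rows = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
    conn.close()
    return rows
```

Calling demo(commit=False) leaves no surviving rows, which mirrors the post-revert state in QA, while demo(commit=True) leaves exactly one row even though the elevation was recorded twice in the same request.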

QA validation focus

  • Confirm the QA backend builds and the unit suite passes now that the 3 regression test files have been deleted.
  • Verify GET /admin/api/rules responds correctly under normal load. Under concurrent load, expect the intermittent cache-hit HTTP 500 to recur, since the revert reintroduces that regression.
  • Spot-check that GLOBAL_SCOPE / elevated-access audit logging on the affected admin/read endpoints reflects pre-#197 behavior (audit rows may not persist).
  • Run make lint-scope to confirm tenant-scope policy integrity is unchanged by the revert.
  • Track the follow-up re-implementation of audit persistence and cache re-hydration before any promotion toward Production.
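For the concurrent-load check on the Rules list endpoint, a small tally helper may be enough to make intermittent failures visible. This is a sketch rather than an existing project tool: tally_statuses and fake_fetch are hypothetical, and in real use the fetch callable would wrap an authenticated HTTP client that issues GET /admin/api/rules against QA and returns the response status code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tally_statuses(fetch, total=200, workers=20):
    """Call fetch() `total` times across `workers` threads and count status codes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch) for _ in range(total)]
        # result() re-raises any exception from fetch, so failures are not hidden.
        return Counter(f.result() for f in futures)


def fake_fetch():
    # Placeholder so the sketch runs offline; replace with a real request
    # that returns the HTTP status code of GET /admin/api/rules.
    return 200
```

A run such as tally_statuses(fake_fetch, total=50, workers=5) returns a Counter keyed by status code. Against QA after this revert, any non-zero count for 500 would be consistent with the reintroduced cache-hit issue.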