Backend Concurrency, Connection-Pool, and Grid Performance Hardening (QA)
Executive summary
A backend stability and performance release reached QA. It reworks how the API handles database concurrency and connection pooling, makes default grid sorting align with database indexes, and reduces per-request overhead from logging and monitoring. No user-facing features change; the goal is faster, more reliable list views and fewer connection-related errors under load.
Why this was needed
Under concurrent load the API risked exhausting the database connection limit, and some list endpoints (mails, batches, documents, folders) used default sort columns that did not match existing indexes, producing slower queries. Hard-coded assumptions (a fixed 103-connection ceiling, always-on file logging, always-on performance monitoring, and pool_pre_ping even behind PgBouncer) added avoidable overhead and made the service harder to tune per environment.
Client / user impact
- List/grid views for mails, inbox, batches, documents, document types, folders, and organizations should load faster because default sorts now use indexed columns.
- Fewer connection-pool exhaustion errors under heavy concurrent usage, with limits now configurable per environment.
- Lower request latency and less log noise, because request logging and performance monitoring are now opt-in outside development.
- Permission and user-context checks are de-duplicated within a single request, reducing redundant database queries.
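The request-scoped de-duplication in the last bullet can be pictured with the sketch below. It is a minimal illustration, not the code in app/utils/request_cache.py: the names reset_request_cache, cached_per_request, and load_user_context are hypothetical, and the use of contextvars is an assumption about how a per-request store might be kept isolated between concurrent requests.

```python
import contextvars

# Hypothetical per-request store; None means "no request in progress".
_request_cache = contextvars.ContextVar("request_cache", default=None)


def reset_request_cache():
    """Start a fresh cache; a middleware would call this at the start of each request."""
    _request_cache.set({})


def cached_per_request(key, loader):
    """Return the cached value for key, calling loader at most once per request."""
    cache = _request_cache.get()
    if cache is None:
        # Outside a request there is nothing to scope to, so do not cache.
        return loader()
    if key not in cache:
        cache[key] = loader()
    return cache[key]


# Usage: two permission checks in one request trigger a single lookup.
calls = []


def load_user_context():
    calls.append("query")  # stands in for a database round trip
    return {"user_id": 7, "entity_type": "organization"}


reset_request_cache()
first = cached_per_request(("user_context", 7), load_user_context)
second = cached_per_request(("user_context", 7), load_user_context)
```

Including the user ID in the key keeps entries for different users apart, and resetting the store per request means nothing cached for one request is visible to the next.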
Technical scope
Grounded in the diff (+1579 / -5398 across 44 files):
- New app/services/db_concurrency.py: helpers (dms_execute, identity_execute, with_new_dms_session, etc.) so parallel batch loaders run on separate sessions instead of sharing one.
- New app/utils/request_cache.py + middleware reset in main.py: request-scoped memoization for get_user_context, get_entity_type, accessible-entity IDs, and permission sets in dynamic_permission_service_optimized.py.
- grid_columns.py: index-friendly default sorts per module (e.g. mails/batches/documents created_on desc, folders/document types name asc); strips self-referencing filters in column-value discovery.
- database.py / pool_monitor.py: configurable DB_MAX_CONNECTIONS / DB_RESERVED_CONNECTIONS (removes hard-coded 103), disables pool_pre_ping when USE_PGBOUNCER.
- main.py / logging_config.py / monitoring.py: request logging, file logging, and performance middleware/monitoring are now env-gated (default off in production); metric persistence is sampled/slow-only; blocking psutil and Celery health checks offloaded to threads.
- Cache-key fixes in batch, audit, and permissions services to avoid cross-user/global key collisions.
- Deletes 12 internal docs/*.md planning files (~5,200 lines); minor schema fix (assigned_by int) and an added requirements.txt dependency.
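To illustrate the "separate sessions" point for parallel batch loaders, here is a rough sketch under stated assumptions. It does not reproduce the helpers in db_concurrency.py: FakeSession, new_session, run_loaders_in_parallel, and the load_* functions are hypothetical stand-ins, and threads are used only because they keep the example self-contained (the real helpers may well be async).

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
import itertools
import threading

_ids = itertools.count(1)
_ids_lock = threading.Lock()


class FakeSession:
    """Stand-in for a database session, which is generally unsafe to share across workers."""

    def __init__(self):
        with _ids_lock:
            self.session_id = next(_ids)
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def new_session(factory=FakeSession):
    session = factory()
    try:
        yield session
    finally:
        session.close()  # always hand the connection back


def run_loaders_in_parallel(loaders, max_workers=4):
    """Run each loader on its own session; results keep the order of loaders."""

    def run_one(loader):
        with new_session() as session:
            return loader(session)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, loaders))


def load_mails(session):
    return ("mails", session.session_id)


def load_documents(session):
    return ("documents", session.session_id)


def load_folders(session):
    return ("folders", session.session_id)


results = run_loaders_in_parallel([load_mails, load_documents, load_folders])
```

Each loader holds a session of its own for the duration of its work, so max_workers also bounds how many connections one request can take from the pool at a time.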
Risk & mitigation
Medium. The DB-session and connection-pool changes touch hot paths shared by many endpoints. Disabling pool_pre_ping behind PgBouncer, combined with the new env defaults, could surface stale-connection errors or other behavior differences if env vars are misconfigured. Mitigation: changes are largely additive and env-gated; default sort changes are deterministic per module; verify QA environment variables (DB_MAX_CONNECTIONS, USE_PGBOUNCER, monitoring/logging toggles) before promoting.
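When checking the QA environment variables, it may help to see how such settings could be derived. The sketch below is an assumption-laden illustration, not the logic in database.py: PoolSettings, load_pool_settings, the default values, and the accepted truthy strings are all hypothetical. Only the variable names DB_MAX_CONNECTIONS, DB_RESERVED_CONNECTIONS, and USE_PGBOUNCER come from the diff.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSettings:
    max_connections: int
    reserved_connections: int
    usable_connections: int
    pool_pre_ping: bool


def _flag(value):
    """Interpret common truthy strings; the accepted spellings are an assumption."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_pool_settings(env=os.environ):
    # Defaults are illustrative only, not the service's real defaults.
    max_conn = int(env.get("DB_MAX_CONNECTIONS", "100"))
    reserved = int(env.get("DB_RESERVED_CONNECTIONS", "10"))
    if max_conn <= 0 or reserved < 0 or reserved >= max_conn:
        raise ValueError(
            "DB_RESERVED_CONNECTIONS must be non-negative and below DB_MAX_CONNECTIONS"
        )
    use_pgbouncer = _flag(env.get("USE_PGBOUNCER", "false"))
    return PoolSettings(
        max_connections=max_conn,
        reserved_connections=reserved,
        usable_connections=max_conn - reserved,
        # Pre-ping is skipped when PgBouncer fronts the database, per the diff.
        pool_pre_ping=not use_pgbouncer,
    )


settings = load_pool_settings(
    {"DB_MAX_CONNECTIONS": "50", "DB_RESERVED_CONNECTIONS": "5", "USE_PGBOUNCER": "true"}
)
```

Failing fast on an inconsistent pair of limits is one way to catch the misconfiguration risk at startup instead of under load.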
QA validation focus
- Open grid/list views for mails, inbox, batches, documents, document types, folders, and organizations; confirm correct default ordering and improved load times.
- Drive concurrent batch/list traffic and watch for connection-pool errors or stalls; confirm pool metrics/alerts report sensible limits.
- Confirm permission-gated flows (admin hub access, tab access, system folders) still resolve correctly with request-scoped caching.
- Verify column-value/filter dropdowns still populate after the self-filter change.
- Confirm startup health logs and Celery worker detection still work with monitoring/logging defaults in QA.
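For the concurrent-traffic check in the list above, a small driver along these lines could tally failures by type. It is a sketch only: drive_concurrently and fake_fetch are hypothetical, and fake_fetch simulates errors instead of calling the real API, so QA would swap in a function that requests an actual list endpoint.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def drive_concurrently(fetch, total_requests=50, concurrency=10):
    """Call fetch(i) total_requests times across worker threads and tally outcomes."""

    def attempt(i):
        try:
            fetch(i)
            return "ok"
        except Exception as exc:
            # Group failures by exception type so pool errors stand out.
            return type(exc).__name__

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(attempt, range(total_requests)))


def fake_fetch(i):
    """Placeholder for an HTTP call; fails on every tenth request to simulate a stall."""
    if i % 10 == 0:
        raise TimeoutError("pool checkout timed out")


outcomes = drive_concurrently(fake_fetch, total_requests=50, concurrency=10)
```

Comparing the tally at several concurrency levels against the configured connection limits gives a quick signal of whether errors begin before the pool should be saturated.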