All updates
QABackend

Database Migration Reliability Fix: Corrected Migration Chain and Long Revision ID Support

PR #89aisupport-datagainSep 15, 2025 · 21:55 UTC
QASep 15, 2025

Executive summary

This change repairs the backend database migration sequence so that scheduled schema updates apply cleanly. It corrects how three migrations link together and adds a step that lets the migration tracker store longer migration identifiers, preventing deployment failures. Despite the placeholder PR title ("test"), this is a substantive infrastructure fix that has reached the QA environment.

Why this was needed

Several recent migrations were configured incorrectly in two ways. First, one migration referenced another by its filename rather than its actual internal revision identifier, breaking the dependency chain so migrations could fail to find their predecessor. Second, some migration identifiers (e.g. merge_notifications_into_activitylog_0915) exceed the 32-character limit of the column the migration tool uses to record applied versions, which would cause the upgrade to error out when recording its state. A third issue: status-cleanup queries used the JSONB-only ? membership operator against a column that is stored as plain JSON, which would raise a SQL error at migration time.

Client / user impact

No visible feature or UI change for end users. The impact is operational reliability: backend schema deployments now complete without erroring on revision-chain mismatches, version-length overflow, or invalid JSON operators. This unblocks the related work (mail status exclusivity cleanup and merging notifications into the activity log) from applying successfully in QA and subsequent environments.

Technical scope

  • Added migration 20250915_expand_alembic_version_len_0915.py (revision expand_verlen_0915, down_revision add_fk_cleanup_0907) that widens alembic_version.version_num from VARCHAR(32) to VARCHAR(255), with a raw-SQL fallback and a guarded downgrade that only narrows the column if no row would be truncated.
  • Re-chained 20250915_merge_notifications_into_activitylog.py so its down_revision is now expand_verlen_0915, ensuring the column is widened before this long-named revision records itself.
  • Fixed 20250915_cleanup_mail_status_exclusivity.py down_revision to the real revision string db_timeouts_relax_hotfix_0911 (previously pointed at a filename, breaking the chain).
  • In the same cleanup migration, replaced JSONB value ? 'resolved' / 'is_read' membership checks with value->>'...' IS NOT NULL because the MailUserStatusLookup.value column is JSON, not JSONB.
  • Net change: 3 files, +67/-8 lines, all within alembic/versions/ (no application/runtime code touched).

Risk & mitigation

Risk is moderate and confined to the migration layer. Altering alembic_version is a metadata table change; the migration uses a try/except raw-SQL fallback and a non-destructive guarded downgrade, lowering risk of a failed upgrade. Re-pointing down_revision values changes the migration graph, so the chain must be validated to ensure no branches or duplicate heads result. Mitigation: run the full migration on a copy of QA/production data and confirm a single head before promoting beyond QA.

QA validation focus

  • Run alembic upgrade head on a fresh DB and on a QA-data copy; confirm all three migrations apply with no errors.
  • Verify alembic heads returns a single head and alembic history shows the expected order: add_fk_cleanup_0907 -> expand_verlen_0915 -> merge_notifications_into_activitylog_0915.
  • Confirm alembic_version.version_num is VARCHAR(255) after upgrade and that the long revision id is recorded successfully.
  • Confirm the mail status cleanup migration runs without JSON-operator SQL errors against MailUserStatusLookup.
  • Test alembic downgrade one step to validate the guarded narrowing logic does not truncate existing version values.