AuditTrail Sentinel — ALCOA+ Data-Integrity Scanner

Methodology · How AuditTrail Sentinel works

The premise. Across the last decade of FDA warning letters and 483 observations, the data-integrity findings cited most often — shared logins, after-hours edits without reason, deletion-after-creation patterns, post-batch edits, audit-trail discontinuity, vague or missing reasons for change — are all mechanically detectable from audit-trail exports. Most sites don't look. AuditTrail Sentinel looks.

Architecture

Pure Python, ~1,400 LOC, fully unit-tested. pandas for tabular work, lxml for XML audit-trail exports, regex for pattern detection inside reason-for-change strings, SQLite for inter-rule queries and finding persistence. CLI for batch runs (cron-able), Streamlit front-end for QA review. Designed and documented to run as a controlled, validated utility under GAMP 5 Category 5 with versioned URS, IQ/OQ/PQ, and audit trail of its own findings.
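
As an illustration of the regex layer mentioned above, the following is a minimal sketch of a reason-for-change classifier. The pattern list, the function name classify_reason, and the three labels are assumptions made for this example; they are not taken from the tool's actual rule modules, and a real deployment would tune the vocabulary per site.

```python
import re
from typing import Optional

# Hypothetical vocabulary of low-information reasons (an assumption, not the
# tool's real list). The whole string must consist of one such token.
VAGUE_REASON = re.compile(
    r"^\s*(?:n/?a|none|updated?|corrected|correction|error|typo|fixed|changed?|test)[\s.!]*$",
    re.IGNORECASE,
)


def classify_reason(reason: Optional[str]) -> str:
    """Return 'missing', 'vague', or 'ok' for a reason-for-change string."""
    if reason is None or not reason.strip():
        return "missing"
    if VAGUE_REASON.match(reason):
        return "vague"
    return "ok"
```

Anchoring the pattern at both ends means a longer explanation that merely starts with a word like "Corrected" is not flagged; only strings consisting solely of a vague token are.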

Ingestion layer

Plugins for each system family — LIMS (LabWare, SampleManager), CDS (Empower, Chromeleon), MES (Werum PAS-X, Aspen PEM), instrument-resident logs (LabX, KQCL, FTIR vendor logs). Each plugin maps the system's native audit-trail schema to a normalized internal record: record_id · sequence_num · timestamp · system · user_id · user_role · action · entity_type · entity_id · field · old_value · new_value · reason_for_change · ip_address · hostname · session_id · batch_id · batch_status.
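
The normalized record above could be expressed as a frozen dataclass, with each plugin reduced to a mapping function. This is a sketch only: the field names come from the list above, but the field types, the class name AuditRecord, and the vendor-side column names (AuditID, SeqNo, EventTime, and so on) are hypothetical and do not reflect any real LIMS export schema.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """Normalized internal record; types are assumed for illustration."""
    record_id: str
    sequence_num: int
    timestamp: datetime
    system: str
    user_id: str
    user_role: str
    action: str
    entity_type: str
    entity_id: str
    field: Optional[str] = None
    old_value: Optional[str] = None
    new_value: Optional[str] = None
    reason_for_change: Optional[str] = None
    ip_address: Optional[str] = None
    hostname: Optional[str] = None
    session_id: Optional[str] = None
    batch_id: Optional[str] = None
    batch_status: Optional[str] = None


def map_hypothetical_lims_row(row: dict) -> AuditRecord:
    """Map one row of an invented vendor export to the normalized record."""
    return AuditRecord(
        record_id=str(row["AuditID"]),
        sequence_num=int(row["SeqNo"]),
        timestamp=datetime.fromisoformat(row["EventTime"]),
        system="LIMS",
        user_id=row["UserName"].strip().lower(),  # normalize case for matching
        user_role=row.get("Role", ""),
        action=row["Event"].upper(),
        entity_type=row["ObjectType"],
        entity_id=str(row["ObjectID"]),
        field=row.get("Field"),
        old_value=row.get("OldValue"),
        new_value=row.get("NewValue"),
        reason_for_change=row.get("Reason"),
        ip_address=row.get("IP"),
        hostname=row.get("Host"),
        session_id=row.get("Session"),
        batch_id=row.get("Batch"),
        batch_status=row.get("BatchStatus"),
    )
```

Keeping the record immutable means rules cannot alter the evidence they evaluate, which suits a tool whose own output has to be defensible.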

Rule pack architecture

Each rule is a separate Python module with a documented detection function, a unit-test suite, configurable thresholds, and a default severity weight. The orchestrator runs all enabled rules sequentially against the SQLite store, persists findings with cross-references to the source records, and emits a JSON manifest plus a human-readable report. Thresholds and weights can be overridden per site through configuration, so each deployment can tune sensitivity without touching code.
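
The orchestration loop described above might look roughly like the sketch below. Everything named here is an assumption for illustration: the Rule dataclass, the audit and findings table layouts, the R-DEMO rule ID, and the missing-reason detector are invented and are not the tool's actual modules or schema.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A detector returns (record_id, detail) pairs for every offending record.
Detector = Callable[[sqlite3.Connection], List[Tuple[str, str]]]


@dataclass
class Rule:
    rule_id: str
    name: str
    detect: Detector
    default_weight: int
    enabled: bool = True


def detect_missing_reason(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Hypothetical rule: modifications with an empty reason for change."""
    rows = conn.execute(
        "SELECT record_id FROM audit WHERE action = 'MODIFY' "
        "AND (reason_for_change IS NULL OR TRIM(reason_for_change) = '') "
        "ORDER BY record_id"
    ).fetchall()
    return [(r[0], "modification without reason for change") for r in rows]


def run_rules(
    conn: sqlite3.Connection,
    rules: List[Rule],
    site_weights: Optional[Dict[str, int]] = None,
) -> dict:
    """Run enabled rules in order, persist findings, return a manifest dict."""
    site_weights = site_weights or {}
    conn.execute(
        "CREATE TABLE IF NOT EXISTS findings "
        "(rule_id TEXT, record_id TEXT, weight INTEGER, detail TEXT)"
    )
    manifest = {"rules_run": [], "finding_count": 0}
    for rule in rules:
        if not rule.enabled:
            continue
        # A per-site override wins over the rule's default weight.
        weight = site_weights.get(rule.rule_id, rule.default_weight)
        hits = rule.detect(conn)
        conn.executemany(
            "INSERT INTO findings VALUES (?, ?, ?, ?)",
            [(rule.rule_id, rid, weight, detail) for rid, detail in hits],
        )
        manifest["rules_run"].append(
            {"rule_id": rule.rule_id, "weight": weight, "findings": len(hits)}
        )
        manifest["finding_count"] += len(hits)
    conn.commit()
    return manifest
```

Each finding row keeps the source record_id, which is the cross-reference back to the audit-trail record; the returned dict can be serialized with json.dumps to produce the manifest.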

ALCOA+ attribute mapping

Rule	Name	Primary ALCOA+ attributes	MHRA 2018 citation
R-001	Shared Account Detection	Attributable	§6.2 · Access control
R-002	Unauthorized Privilege Escalation	Attributable · Accurate	§6.3 · User access management
R-003	Abnormal Time-Stamp Clustering	Contemporaneous · Accurate	§6.6 · Data review
R-004	Deletion-After-Creation	Original · Enduring	§6.16 · Data lifecycle
R-005	Sequence Gap Detection	Complete	§3.5 · Audit trail completeness
R-006	Post-Batch Edit	Contemporaneous · Enduring	§6.6 · Contemporaneous record
R-007	After-Hours Edit Pattern	Contemporaneous · Attributable	§6.6 · Data review timing
R-008	Audit Trail Gap (System-Wide)	Complete · Enduring	§3.5 · Audit trail continuity
R-009	Original Record Modification	Original · Accurate	§6.16 · Reason for change
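
To make one row of the table concrete, here is a minimal sketch of the idea behind R-005 (Sequence Gap Detection). It assumes that sequence_num is expected to be contiguous within each system, which may not hold for every source; the function name and the return shape are hypothetical and not taken from the actual rule module.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_sequence_gaps(
    records: Iterable[Tuple[str, int]],
) -> Dict[str, List[Tuple[int, int]]]:
    """Return, per system, the inclusive ranges of missing sequence numbers.

    `records` is an iterable of (system, sequence_num) pairs in any order.
    """
    by_system = defaultdict(set)
    for system, seq in records:
        by_system[system].add(seq)

    gaps: Dict[str, List[Tuple[int, int]]] = {}
    for system, seqs in by_system.items():
        ordered = sorted(seqs)
        # Compare each number with its successor; a jump > 1 is a gap.
        missing = [
            (a + 1, b - 1) for a, b in zip(ordered, ordered[1:]) if b - a > 1
        ]
        if missing:
            gaps[system] = missing
    return gaps
```

Reporting the missing range rather than a bare flag gives the reviewer something to request from the source system directly.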

Synthetic data disclosure

All records, users, IPs, hostnames, and findings shown above are fabricated for portfolio demonstration purposes. The dataset is generated client-side from a seeded random process designed to plant a known number of violations across each rule pack so the demo produces verifiable, repeatable results. No real GxP audit-trail data is present anywhere in this artifact.
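
The seeded-generation idea can be sketched as follows. The demo itself generates its data client-side; this Python version only illustrates the principle of planting a known number of violations under a fixed seed. The function names, the 08:00 to 18:00 working window, and the record fields used are assumptions for the example.

```python
import random
from datetime import datetime, timedelta


def generate_records(seed: int, n: int = 200, planted_after_hours: int = 5):
    """Generate n synthetic edits, exactly `planted_after_hours` of them
    outside the assumed 08:00-18:00 working window."""
    rng = random.Random(seed)  # private generator: same seed, same dataset
    start = datetime(2024, 1, 1)
    planted = set(rng.sample(range(n), planted_after_hours))
    records = []
    for i in range(n):
        day = start + timedelta(days=rng.randrange(28))
        if i in planted:
            hour = rng.choice([0, 1, 2, 3, 22, 23])  # planted violation
        else:
            hour = rng.randrange(8, 18)  # normal working hours
        records.append({
            "record_id": f"REC-{i:05d}",
            "sequence_num": i + 1,
            "timestamp": day.replace(hour=hour, minute=rng.randrange(60)),
            "user_id": f"user{rng.randrange(1, 9):02d}",
            "action": "MODIFY",
        })
    return records, planted


def count_after_hours(records, open_hour: int = 8, close_hour: int = 18) -> int:
    """Count records whose timestamp falls outside the working window."""
    return sum(
        1 for r in records if not open_hour <= r["timestamp"].hour < close_hour
    )
```

Because the planted count is known in advance, a detector's output can be checked against it, which is what makes the demo results verifiable and repeatable.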

What would go in version 2

Multivariate scoring (instead of per-rule independent weights), an ML-based anomaly layer trained on per-site baselines, automatic ticket creation into the QMS via API, and a Power BI reporting layer so QA can trend findings the same way they trend deviations. The path is clear; the rule layer comes first because it's deterministic and inspector-explainable.

Rule packs · 9 active