AuditTrail Sentinel · ALCOA+ Data-Integrity Scanner
9 rule packs · MHRA 2018 GxP DI Guidance aligned · Portfolio piece by Justin Arndt
github.com/j-arndt

Rule packs · 9 active

Findings

Raw audit-trail sample · first 80 of the records scanned
Record | Timestamp | System | User | Role | Action | Entity | Field | Reason | IP / Host | Batch

Methodology · How AuditTrail Sentinel works

The premise. Across the last decade of FDA warning letters and 483 observations, the data-integrity findings cited most often — shared logins, after-hours edits without reason, deletion-after-creation patterns, post-batch edits, audit-trail discontinuity, vague or missing reasons for change — are all mechanically detectable from audit-trail exports. Most sites don't look. AuditTrail Sentinel looks.

Architecture

Pure Python, ~1,400 LOC, fully unit-tested. pandas for tabular work, lxml for XML audit-trail exports, regex for pattern detection inside reason-for-change strings, SQLite for inter-rule queries and finding persistence. CLI for batch runs (cron-able), Streamlit front-end for QA review. Designed and documented to run as a controlled, validated utility under GAMP 5 Category 5, with a versioned URS, IQ/OQ/PQ, and an audit trail of its own findings.
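As an illustration of the regex layer, here is a minimal sketch of how a vague or missing reason-for-change string could be flagged. The filler-phrase list and the name `is_vague_reason` are assumptions made for this example, not the tool's actual rule set.

```python
import re

# Hypothetical filler phrases that say nothing about WHY a value changed.
# The optional group also lets blank or whitespace-only strings match.
VAGUE_REASON = re.compile(
    r"^\s*(?:n/?a|none|tbd|update[ds]?|change[ds]?|correct(?:ion|ed)?"
    r"|edit(?:ed)?|fix(?:ed)?|error|typo|\.+|-+)?\s*$",
    re.IGNORECASE,
)


def is_vague_reason(reason):
    """Return True when the reason is missing, blank, or a bare filler phrase."""
    if reason is None:
        return True
    return VAGUE_REASON.match(reason) is not None


print(is_vague_reason("  N/A "))
print(is_vague_reason("Corrected transcription error per DEV-1234"))
```

Anchoring the pattern at both ends means a filler word only counts as vague when it is the entire reason; a longer justification that happens to contain "corrected" is left alone.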

Ingestion layer

Plugins for each system family — LIMS (LabWare, SampleManager), CDS (Empower, Chromeleon), MES (Werum PAS-X, Aspen PEM), instrument-resident logs (LabX, KQCL, FTIR vendor logs). Each plugin maps the system's native audit-trail schema to a normalized internal record: record_id · sequence_num · timestamp · system · user_id · user_role · action · entity_type · entity_id · field · old_value · new_value · reason_for_change · ip_address · hostname · session_id · batch_id · batch_status.
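The normalized record and a plugin's mapping step could be sketched as follows. The field names come from the schema above; the field types, the class name `AuditRecord`, the function `normalize_row`, and every native column name in `EXAMPLE_COLUMN_MAP` are invented for illustration and do not reflect any vendor's real export format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """Normalized internal record; types are assumptions for this sketch."""
    record_id: str
    sequence_num: int
    timestamp: datetime
    system: str
    user_id: str
    user_role: str
    action: str
    entity_type: str
    entity_id: str
    field: Optional[str]
    old_value: Optional[str]
    new_value: Optional[str]
    reason_for_change: Optional[str]
    ip_address: Optional[str]
    hostname: Optional[str]
    session_id: Optional[str]
    batch_id: Optional[str]
    batch_status: Optional[str]


# Hypothetical native-column names for an imaginary export; NOT a real vendor schema.
EXAMPLE_COLUMN_MAP = {
    "AuditID": "record_id", "SeqNo": "sequence_num", "EventTime": "timestamp",
    "UserName": "user_id", "Role": "user_role", "Event": "action",
    "ObjectType": "entity_type", "ObjectID": "entity_id", "Attribute": "field",
    "OldValue": "old_value", "NewValue": "new_value", "Comment": "reason_for_change",
    "ClientIP": "ip_address", "ClientHost": "hostname", "SessionID": "session_id",
    "BatchNo": "batch_id", "BatchState": "batch_status",
}


def normalize_row(native_row: dict, system: str) -> AuditRecord:
    """Rename native columns, coerce types, and treat empty strings as missing."""
    mapped = {
        norm: (native_row.get(native) or None)
        for native, norm in EXAMPLE_COLUMN_MAP.items()
    }
    mapped["system"] = system
    mapped["sequence_num"] = int(mapped["sequence_num"])
    mapped["timestamp"] = datetime.fromisoformat(mapped["timestamp"])
    return AuditRecord(**mapped)
```

Keeping the record frozen means no rule can alter ingested data after normalization, which suits a tool whose own output has to be defensible.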

Rule pack architecture

Each rule is a separate Python module with a documented detection function, a unit-test suite, configurable thresholds, and a default severity weight. The orchestrator runs all enabled rules sequentially against the SQLite store, persists findings with cross-references to the source records, and emits a JSON manifest plus a human-readable report. Rule weights can be overridden per site, so each deployment can tune sensitivity without touching code.
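A minimal sketch of this rule-plus-orchestrator shape, using only the standard library. The `Rule` dataclass, the table names `audit_records` and `findings`, the demo rule `R-DEMO`, and the manifest keys are all assumptions for illustration; the real modules may be organized differently.

```python
import json
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    rule_id: str
    name: str
    weight: int          # severity weight shipped with the rule
    detect: Callable     # (conn, thresholds) -> list of (record_id, detail)
    thresholds: dict


def detect_missing_reason(conn, thresholds):
    """Demo rule: MODIFY/DELETE actions whose reason is shorter than a threshold."""
    rows = conn.execute(
        "SELECT record_id, action FROM audit_records "
        "WHERE action IN ('MODIFY', 'DELETE') "
        "AND LENGTH(TRIM(COALESCE(reason_for_change, ''))) < ?",
        (thresholds["min_reason_chars"],),
    ).fetchall()
    return [(rid, f"{action} without adequate reason") for rid, action in rows]


def run_rules(conn, rules, site_weights=None):
    """Run rules in order, persist findings, and return a JSON manifest string."""
    site_weights = site_weights or {}
    conn.execute(
        "CREATE TABLE IF NOT EXISTS findings "
        "(rule_id TEXT, record_id TEXT, weight INTEGER, detail TEXT)"
    )
    manifest = {"rules_run": [], "finding_count": 0}
    for rule in rules:
        # A per-site override wins over the weight shipped with the rule.
        weight = site_weights.get(rule.rule_id, rule.weight)
        hits = rule.detect(conn, rule.thresholds)
        conn.executemany(
            "INSERT INTO findings VALUES (?, ?, ?, ?)",
            [(rule.rule_id, rid, weight, detail) for rid, detail in hits],
        )
        manifest["rules_run"].append(
            {"rule_id": rule.rule_id, "weight": weight, "findings": len(hits)}
        )
        manifest["finding_count"] += len(hits)
    conn.commit()
    return json.dumps(manifest, indent=2)
```

Each finding row carries the source `record_id`, which is what lets a reviewer trace a finding back to the audit-trail entry that triggered it.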

ALCOA+ attribute mapping

Rule | Name | Primary ALCOA+ attributes | MHRA 2018 citation
R-001 | Shared Account Detection | Attributable | §6.2 · Access control
R-002 | Unauthorized Privilege Escalation | Attributable · Accurate | §6.3 · User access management
R-003 | Abnormal Time-Stamp Clustering | Contemporaneous · Accurate | §6.6 · Data review
R-004 | Deletion-After-Creation | Original · Enduring | §6.16 · Data lifecycle
R-005 | Sequence Gap Detection | Complete | §3.5 · Audit trail completeness
R-006 | Post-Batch Edit | Contemporaneous · Enduring | §6.6 · Contemporaneous record
R-007 | After-Hours Edit Pattern | Contemporaneous · Attributable | §6.6 · Data review timing
R-008 | Audit Trail Gap (System-Wide) | Complete · Enduring | §3.5 · Audit trail continuity
R-009 | Original Record Modification | Original · Accurate | §6.16 · Reason for change
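To make one row of the table concrete, here is a hedged sketch of what a sequence-gap check in the spirit of R-005 might look like. It assumes each system numbers its audit-trail entries consecutively, so any missing number suggests a removed or unexported record; the function name and tuple layout are illustrative only.

```python
from collections import defaultdict


def find_sequence_gaps(records):
    """Find missing sequence-number ranges per system.

    records: iterable of (system, sequence_num) pairs, in any order.
    Returns a list of (system, first_missing, last_missing) tuples.
    """
    by_system = defaultdict(set)
    for system, seq in records:
        by_system[system].add(seq)

    gaps = []
    for system in sorted(by_system):
        seqs = sorted(by_system[system])
        # Compare each sequence number with its successor.
        for prev, curr in zip(seqs, seqs[1:]):
            if curr - prev > 1:
                gaps.append((system, prev + 1, curr - 1))
    return gaps


sample = [("LIMS", 1), ("LIMS", 2), ("LIMS", 5), ("CDS", 10), ("CDS", 11), ("LIMS", 6)]
print(find_sequence_gaps(sample))  # [('LIMS', 3, 4)]
```

A check like this only sees gaps between the lowest and highest sequence numbers present in an export; records trimmed from either end would need a separate comparison against the system's own counters.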

Synthetic data disclosure

All records, users, IPs, hostnames, and findings shown above are fabricated for portfolio demonstration purposes. The dataset is generated client-side from a seeded random process designed to plant a known number of violations across each rule pack so the demo produces verifiable, repeatable results. No real GxP audit-trail data is present anywhere in this artifact.
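The idea of a seeded generator that plants a known number of violations can be sketched in Python as follows. This is not the demo's client-side generator, only an equivalent illustration; the function name, the user and action lists, and the choice of sequence gaps as the planted violation are assumptions.

```python
import random


def generate_records(seed, n_records, n_planted_gaps):
    """Build sequence-numbered records, then drop a known number of them.

    Returns (records, dropped_sequence_numbers).
    """
    rng = random.Random(seed)  # private generator, so the output depends only on the seed
    users = ["user_a", "user_b", "user_c"]
    actions = ["CREATE", "MODIFY", "REVIEW"]
    records = [
        {"sequence_num": i, "user_id": rng.choice(users), "action": rng.choice(actions)}
        for i in range(1, n_records + 1)
    ]
    # Only even, interior sequence numbers are candidates. No two are adjacent,
    # so every dropped record produces exactly one detectable gap.
    candidates = list(range(2, n_records, 2))
    dropped = set(rng.sample(candidates, n_planted_gaps))
    kept = [r for r in records if r["sequence_num"] not in dropped]
    return kept, sorted(dropped)


records, dropped = generate_records(seed=42, n_records=100, n_planted_gaps=5)
print(len(records), len(dropped))  # 95 5
```

Because the planted count is known in advance, a test can assert that the matching rule reports exactly that many findings, and rerunning with the same seed reproduces the same dataset.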

What would go in version 2

Multivariate scoring (instead of independent per-rule weights), an ML-based anomaly layer trained on per-site baselines, automatic ticket creation in the QMS via API, and a Power BI reporting layer so QA can trend findings the same way they trend deviations. The path is clear; the rule layer comes first because it's deterministic and inspector-explainable.