21 CFR Part 11-compliant immunogenicity analysis in R with regulog

Vaccine immunogenicity analyses supporting regulatory submissions fall under 21 CFR Part 11 whenever they create or rely on electronic records: analytical decisions, data access events, and results must be traceable, time-stamped, and tamper-evident. In many organisations, this means maintaining a separate paper audit trail that is assembled manually after the analysis. Such a trail is disconnected from the code, cannot be verified against it, and is a persistent inspection risk.

regulog solves this at the source. Every step of the analysis writes a hash-chained log entry in real time — who did what, when, and why — producing a tamper-evident audit trail that is a direct artefact of the analysis itself, not a retrospective reconstruction of it.

This walkthrough covers an immunogenicity analysis for a Phase III RSV vaccine trial from data access to sign-off: GMT computation, seroconversion rates, outlier review, electronic sign-off, and audit trail export. The Day 181 persistence analysis follows the same logging pattern and is not shown here.

The trial

RSV-VAC-301 is a randomised, double-blind, placebo-controlled Phase III trial of a novel adjuvanted subunit RSV vaccine in adults aged 60 and older. The primary immunogenicity endpoint is the geometric mean titre (GMT) ratio of RSV-neutralising antibodies at Day 29 relative to placebo. Secondary endpoints include seroconversion rate (≥4-fold rise from baseline) and GMT persistence at Day 181.

Setup

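library(regulog)
library(dplyr)   # filter(), mutate(), group_by(), summarise(), n_distinct()

# Simulated CDISC-structured ADaM datasets (ADSL, ADIS); the simulation code
# is not shown, and the data are read from data/adsl.csv and data/adis.csv below
# 300 subjects, 3 visits (Day 1, 29, 181)
# RSV neutralising antibody titres (IU/mL)
# Log-normal distribution, 4% missingness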
set.seed(2026)
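The walkthrough does not show the simulation itself. A minimal sketch that writes data/adsl.csv and data/adis.csv with the properties listed above might look like this. The treatment labels, 1:1 allocation, ~5% protocol-deviation rate, log2-scale means and SD, and subject ID format are all assumptions, so the numbers it produces will not reproduce the entry counts shown later.

```r
# Hypothetical simulation sketch: every parameter below is an illustrative assumption.
library(dplyr)
set.seed(2026)

n_subj <- 300
visits <- c(1, 29, 181)

adsl <- data.frame(
  USUBJID = sprintf("RSV301-%04d", seq_len(n_subj)),
  TRT01P  = rep(c("Vaccine", "Placebo"), length.out = n_subj),  # 1:1 allocation
  PPROTFL = ifelse(runif(n_subj) < 0.95, "Y", "N")              # ~5% deviations (assumed)
)

adis <- adsl[rep(seq_len(n_subj), each = length(visits)), ] |>
  mutate(
    AVISITN = rep(visits, times = n_subj),
    # Assumed log2-scale means: flat for placebo; vaccine peaks at Day 29, then wanes
    mu = case_when(
      TRT01P == "Placebo" | AVISITN == 1 ~ 8,
      AVISITN == 29                      ~ 11,
      TRUE                               ~ 10
    ),
    AVAL = 2^rnorm(n(), mean = mu, sd = 1.5),         # log-normal titres
    AVAL = ifelse(runif(n()) < 0.04, NA_real_, AVAL)  # ~4% missing
  ) |>
  group_by(USUBJID) |>
  mutate(
    BASE  = AVAL[AVISITN == 1],   # Day 1 value as baseline
    RATIO = AVAL / BASE           # ratio to baseline, as used in the seroconversion step
  ) |>
  ungroup() |>
  select(-mu) |>
  as.data.frame()

dir.create("data", showWarnings = FALSE)
write.csv(adsl, "data/adsl.csv", row.names = FALSE)
write.csv(adis, "data/adis.csv", row.names = FALSE)
```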

Opening the audit session

regulog_init() creates the session object and writes the genesis record immediately — this starts the SHA-256 hash chain that links every subsequent log entry. Study-level context (SAP version, data cut, analysis set) is captured as the first logged note.

log <- regulog_init(
  app     = "RSV-VAC-301-primary-immunogenicity",
  version = "1.0.0",
  user    = "jsmith",
  path    = "logs/audit_RSV301_primary_v1.rlog"
)

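# paste() keeps the logged note on one line; a string literal spanning
# several source lines would embed newlines and indentation in the entry
log_note(log, paste(
  "Primary immunogenicity analysis per SAP v2.0, Section 5.1.",
  "Protocol: RSV-VAC-301. Data cut: 2026-05-15.",
  "Analysis set: immunogenicity per-protocol population (PPROTFL = Y)."
))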

log
# 
#   App:     RSV-VAC-301-primary-immunogenicity v1.0.0
#   User:    jsmith
#   Entries: 1
#   Path:    logs/audit_RSV301_primary_v1.rlog
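regulog's internal record format is not documented here. As a generic illustration of how a genesis record anchors a SHA-256 hash chain, the sketch below uses the digest package. The helper `chain_append()`, the "previous hash | content" hashing input, and the 64-zero placeholder before the genesis record are all assumptions, not regulog's implementation.

```r
library(digest)

# Hypothetical helper (not regulog's API): each entry stores the previous
# entry's hash and a SHA-256 hash over "previous hash | own content".
chain_append <- function(chain, content) {
  prev <- if (length(chain) == 0) strrep("0", 64) else chain[[length(chain)]]$hash
  hash <- digest(paste(prev, content, sep = "|"), algo = "sha256", serialize = FALSE)
  c(chain, list(list(content = content, prev = prev, hash = hash)))
}

chain <- list()
chain <- chain_append(chain, "GENESIS|RSV-VAC-301-primary-immunogenicity|1.0.0|jsmith")
chain <- chain_append(chain, "NOTE|Primary immunogenicity analysis per SAP v2.0")

identical(chain[[2]]$prev, chain[[1]]$hash)  # TRUE: entry 2 is bound to the genesis record
```

Because each hash covers its predecessor's hash, changing the genesis record would change every hash after it.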

Logging data access

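21 CFR Part 11 §11.10(e) requires secure, computer-generated, time-stamped audit trails of operator entries and actions that create, modify, or delete electronic records. Logging every read goes beyond that minimum, but it documents exactly which inputs the analysis used. with_log() provides a local read() binding scoped to the block. Every read inside it is logged automatically, capturing the resolved file path, row count, and column count: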

with_log(log, {
  adsl <- read(read.csv, "data/adsl.csv")
  adis <- read(read.csv, "data/adis.csv")
})
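regulog's with_log() and read() handle this internally. As a generic illustration only, a reader wrapper that records the resolved path and dimensions of whatever it loads might look like this. `logged_read()` and `access_log` are hypothetical names and are not part of regulog.

```r
# Hypothetical sketch (not regulog's implementation): wrap any reader function
# and record a UTC timestamp, the resolved path, and the dimensions it returned.
access_log <- new.env()
access_log$entries <- list()

logged_read <- function(reader, file, ...) {
  data <- reader(file, ...)
  # access_log is an environment, so this modifies the shared log in place
  access_log$entries[[length(access_log$entries) + 1]] <- list(
    time = format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"),
    path = normalizePath(file, mustWork = TRUE),
    nrow = nrow(data),
    ncol = ncol(data)
  )
  data
}

# Usage with a throwaway CSV
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(USUBJID = c("S1", "S2", "S3"), AVAL = c(40, 80, 160)),
          tmp, row.names = FALSE)
dat <- logged_read(read.csv, tmp)
str(access_log$entries[[1]])
```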

Applying the analysis population

The primary analysis uses the per-protocol population (PPROTFL = Y). Exclusions are documented at two levels. The per-protocol restriction is logged with ITT, PP, and excluded counts, and each subject excluded for missing data gets an individual note:

# Filter to per-protocol population
adis_pp <- adis |> filter(PPROTFL == "Y")

log_action(log,
  action = "apply_pp_population",
  object = "RSV-VAC-301 per-protocol population",
  reason = sprintf(
    # paste() keeps the logged text on one line
    paste("Restricted to per-protocol population per SAP Section 3.2.",
          "ITT: %d subjects | PP: %d subjects | Excluded: %d (protocol deviations)"),
    n_distinct(adis$USUBJID),
    n_distinct(adis_pp$USUBJID),
    n_distinct(adis$USUBJID) - n_distinct(adis_pp$USUBJID)
  )
)

Handling missing Day 29 titres

Subjects with missing primary timepoint assessments are excluded from the primary GMT analysis. Each exclusion is individually documented with log_note():

miss_d29 <- adis_pp |> filter(AVISITN == 29, is.na(AVAL))

log_note(log, sprintf(
  "Missing data review: Day 29 titre missing for %d subjects (%.1f%%); excluded per SAP",
  nrow(miss_d29),
  nrow(miss_d29) / nrow(filter(adis_pp, AVISITN == 29)) * 100
))

# Document each excluded subject individually
for (subj in miss_d29$USUBJID) {
  log_note(log, sprintf(
    "Subject excluded (missing data) — USUBJID %s: Day 29 titre missing",
    subj
  ))
}

Primary analysis: GMT ratio

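GMTs are computed as the back-transformed mean of log2 titres, 2^mean(log2(AVAL)), with t-based 95% confidence intervals. The GMT ratio (Vaccine/Placebo) is estimated and tested with a two-sample t-test on the log2 scale. t.test() uses the Welch (unequal-variance) form by default: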

adis_primary <- adis_pp |> filter(AVISITN == 29, !is.na(AVAL))

gmt_d29 <- adis_primary |>
  group_by(TRT01P) |>
  summarise(
    n   = n(),
    gmt = 2^mean(log2(AVAL)),
    gmt_lo = 2^(mean(log2(AVAL)) - qt(0.975, n()-1) * sd(log2(AVAL)) / sqrt(n())),
    gmt_hi = 2^(mean(log2(AVAL)) + qt(0.975, n()-1) * sd(log2(AVAL)) / sqrt(n()))
  )

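# t.test() orders groups by factor level. With alphabetical levels "Placebo"
# is group 1, so estimate = c(Placebo, Vaccine) and conf.int is for the log2
# difference Placebo - Vaccine. Relevel TRT01P if the labels sort differently.
ttest     <- t.test(log2(AVAL) ~ TRT01P, data = adis_primary)  # Welch by default
gmt_ratio <- 2^diff(ttest$estimate)    # 2^(Vaccine - Placebo) = Vaccine/Placebo
ratio_ci  <- 2^(-rev(ttest$conf.int))  # negate Placebo - Vaccine bounds and swap order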

log_action(log,
  action = "compute_gmt_ratio",
  object = "GMT ratio (Vaccine/Placebo)",
  reason = sprintf(
    "Computed per SAP Section 5.1. GMT ratio = %.2f (95%% CI: %.2f, %.2f), p %s",
    gmt_ratio, ratio_ci[1], ratio_ci[2],
    ifelse(ttest$p.value < 0.001, "< 0.001", sprintf("= %.3f", ttest$p.value))
  )
)
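Because t.test() reports its estimates and interval in factor-level order, the direction of the ratio is easy to get backwards. This sketch checks it on simulated data with an assumed true Vaccine/Placebo GMT ratio of 8 (2^3). The labels, sample sizes, and seed are arbitrary choices for illustration.

```r
set.seed(1)
# Simulated log2 titres; assumed true Vaccine/Placebo GMT ratio = 2^3 = 8
dat <- data.frame(
  TRT01P = rep(c("Placebo", "Vaccine"), each = 200),
  AVAL   = 2^c(rnorm(200, mean = 8, sd = 1.5), rnorm(200, mean = 11, sd = 1.5))
)

tt <- t.test(log2(AVAL) ~ TRT01P, data = dat)   # Welch; group 1 = "Placebo"
gmt_ratio <- as.numeric(2^diff(tt$estimate))     # 2^(Vaccine - Placebo)
ratio_ci  <- 2^(-rev(tt$conf.int))               # CI for Vaccine/Placebo

# Cross-check against the ratio of per-group GMTs computed directly
gmt <- tapply(dat$AVAL, dat$TRT01P, function(x) 2^mean(log2(x)))
isTRUE(all.equal(gmt_ratio, as.numeric(gmt["Vaccine"] / gmt["Placebo"])))  # TRUE
ratio_ci[1] < gmt_ratio && gmt_ratio < ratio_ci[2]                          # TRUE
```

With the sign reversed (2^(-diff(...))), the point estimate would come out near 1/8 while the interval stayed near 8, which is the kind of inconsistency this check catches.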

Seroconversion analysis

A seroconverter is defined as a subject with a ≥4-fold rise from baseline. The definition is logged before the result is computed, so the audit trail shows which definition was applied and cites the SAP in which it was pre-specified:

log_note(log,
  "Seroconversion defined as >= 4-fold rise from baseline (pre-specified per SAP)"
)

sc_rates <- adis_pp |>
  filter(AVISITN == 29, !is.na(AVAL), !is.na(BASE)) |>
  mutate(SEROCONV = as.integer(AVAL / BASE >= 4)) |>   # ratio to baseline (ADaM R2BASE)
  group_by(TRT01P) |>
  summarise(
    n    = n(),
    n_sc = sum(SEROCONV),
    rate = mean(SEROCONV)
  )

log_action(log,
  action = "compute_seroconversion",
  object = "Seroconversion rates",
  reason = sprintf(
    "Computed per SAP Section 5.2. %s",
    paste(sprintf("%s: %d/%d (%.1f%%)", sc_rates$TRT01P, sc_rates$n_sc,
                  sc_rates$n, sc_rates$rate * 100), collapse = " | ")
  )
)
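In code, a ≥4-fold rise is AVAL / BASE >= 4, the ADaM ratio to baseline (R2BASE). A worked example with made-up titres shows the boundary. A rise from 40 to 160 is exactly 4-fold and counts, while 40 to 150 (3.75-fold) does not.

```r
# Made-up titres for illustration only
toy <- data.frame(
  TRT01P = c("Vaccine", "Vaccine", "Vaccine", "Placebo", "Placebo"),
  BASE   = c(40, 40, 100, 50, 80),
  AVAL   = c(160, 150, 800, 60, 80)
)
toy$SEROCONV <- as.integer(toy$AVAL / toy$BASE >= 4)   # fold rises: 4, 3.75, 8, 1.2, 1
toy$SEROCONV
# [1] 1 0 1 0 0
aggregate(SEROCONV ~ TRT01P, data = toy, FUN = mean)   # seroconversion rate by arm
```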

Outlier review

Observations that are flagged but retained need the retention decision and its rationale on record. log_note() makes this natural:

# Flag extreme observations (|z| > 3 on log2 scale)
outlier_review <- adis_primary |>
  group_by(TRT01P) |>
  mutate(
    z_score = (log2(AVAL) - mean(log2(AVAL))) / sd(log2(AVAL)),
    outlier = abs(z_score) > 3
  ) |> ungroup()

log_note(log, sprintf(
  paste("Outlier screen (|z| > 3, log2 scale): %d flagged; retained per SAP",
        "(no clinical basis for exclusion; sensitivity analysis planned)"),
  sum(outlier_review$outlier)
))

# Log each flagged subject individually
for (i in which(outlier_review$outlier)) {
  log_note(log, sprintf(
    "Outlier flagged — USUBJID %s: z = %.2f — retained, flagged for sensitivity",
    outlier_review$USUBJID[i], outlier_review$z_score[i]
  ))
}
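As a sanity check on the screen itself, the same |z| > 3 rule can be run on simulated log-normal titres with one injected extreme value. The seed, sample size, and injected value are arbitrary assumptions.

```r
set.seed(42)
# Simulated log-normal titres with one injected extreme value (assumptions)
titres <- 2^rnorm(150, mean = 11, sd = 1.5)
titres[1] <- 2^(11 + 6 * 1.5)   # ~6 SD above the mean on the log2 scale

z <- (log2(titres) - mean(log2(titres))) / sd(log2(titres))
which(abs(z) > 3)               # includes index 1
```

Because an extreme value inflates the standard deviation it is judged against, several large values can mask one another. A median/MAD-based z-score is a common robust alternative for a sensitivity analysis.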

Electronic sign-off

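Under 21 CFR Part 11 §11.50, signed electronic records must show the signer's printed name, the date and time of signing, and the meaning of the signature (such as review or approval). §11.100 and §11.200 add requirements that each signature be unique to one individual and be built from defined identification components. log_signature() resolves the signer identity automatically from the session user set at regulog_init() (it cannot be overridden at signing time) and records the number of entries covered without any extra input. Because that identity is the string passed to regulog_init(), the §11.200 requirement for at least two distinct identification components (such as an ID code and password) has to be met by the environment that authenticates the analyst: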

log_signature(log, paste(
  "I confirm that this analysis was conducted in accordance with SAP v2.0",
  "and that the results presented are accurate and complete to the best",
  "of my knowledge."
))

Verifying the hash chain

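Modifying any log entry after the fact changes its hash and breaks the chain at that entry. A plain SHA-256 chain cannot by itself catch a rewrite that recomputes every hash after the edited entry, so the final hash is worth retaining somewhere the analyst cannot alter. verify_log() recomputes the full chain from the first entry and reports the first broken link, or confirms that the chain is intact: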

verify_log(log)
# regulog: Log intact: 34 entries, chain unbroken
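verify_log() performs this walk internally. The self-contained sketch below shows the idea using the digest package, with a hypothetical chain layout (each hash covers the previous hash and the entry text, with 64 zeros before the genesis record). That layout is an assumption, not regulog's documented format. The sketch also shows the limit of an unkeyed chain: editing one entry is caught, but recomputing every hash from the edit onward produces a chain that verifies cleanly. Only a copy of the original final hash kept elsewhere exposes that kind of rewrite.

```r
library(digest)

# Hypothetical layout: hash_i = SHA-256(hash_{i-1} | content_i)
link <- function(prev, content) {
  digest(paste(prev, content, sep = "|"), algo = "sha256", serialize = FALSE)
}

build_chain <- function(contents) {
  prev <- strrep("0", 64)
  hashes <- character(length(contents))
  for (i in seq_along(contents)) {
    hashes[i] <- link(prev, contents[i])
    prev <- hashes[i]
  }
  data.frame(content = contents, hash = hashes, stringsAsFactors = FALSE)
}

# Index of the first entry whose stored hash does not match, or NA if intact
first_broken_link <- function(chain) {
  prev <- strrep("0", 64)
  for (i in seq_len(nrow(chain))) {
    if (link(prev, chain$content[i]) != chain$hash[i]) return(i)
    prev <- chain$hash[i]
  }
  NA_integer_
}

chain <- build_chain(c("GENESIS", "read adis.csv", "compute_gmt_ratio", "SIGN jsmith"))
first_broken_link(chain)                    # NA: intact

tampered <- chain
tampered$content[2] <- "read adis_v2.csv"   # edit entry 2 only
first_broken_link(tampered)                 # 2

rewritten <- build_chain(tampered$content)  # recompute every hash after the edit
first_broken_link(rewritten)                # NA: a full rewrite verifies cleanly
rewritten$hash[4] == chain$hash[4]          # FALSE: only the retained head hash reveals it
```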

Exporting the audit trail

export_audit_trail(log,
  format = "csv",
  signed = TRUE,
  path   = "outputs/audit_trail_RSV301_primary_v1.csv"
)
# regulog: exported 34 row(s) to outputs/audit_trail_RSV301_primary_v1.csv

The exported CSV contains: entry ID, timestamp (UTC), analyst, entry type, action, reason, and the SHA-256 hash linking each entry to its predecessor. With signed = TRUE, every row is also stamped with chain_intact and verified_at from a fresh verification run at export time. This file is the submission-ready audit trail — self-contained, independently verifiable, and directly connected to the analysis that produced the results.

What the audit trail proves

The complete record covers:

Who ran the analysis — analyst identity and electronic sign-off
When every step was executed — UTC timestamps on every entry
What data were loaded and from where, with row/column counts captured at read time
Why each decision was made — mandatory reason fields throughout
Which subjects were excluded and why — individual records
Which observations were flagged and what action was taken
What results were produced — exact values logged at computation

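The SHA-256 chain means that an edit to any single entry after the fact is detectable by recomputing the chain, without reference to the original system. With a plain (unkeyed) chain, a wholesale rewrite that recomputes every hash from the edited entry onward is detectable only against an independently retained copy of the final hash, such as an archived export. Together these properties support the audit-trail requirements of 21 CFR Part 11 §11.10(e) and EU GMP Annex 11 Clause 9 directly from the analysis code, without any parallel documentation process.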

Using regulog in practice

The pattern above generalises to any regulated R analysis. The key discipline is to log before and after every consequential decision, not just at the end. Decisions that feel routine in isolation (which population flag to use, how to handle a missing timepoint, whether to retain an outlier) are exactly the decisions regulators ask about, and regulog puts the answer permanently on record.

For integration with data provenance tracking (row-level lineage across the derivation pipeline), pair regulog with lineager. For reproducibility certification of the analysis environment, pair with reproducr. All three packages are designed to work together.

Ndoh Penn is a biostatistician based in Antwerp, Belgium, and the author of regulog, reproducr, and bayprior. Questions or corrections — hello@reprostats.org or open an issue on GitHub.