Reproducible Statistical Computing

Tools, frameworks, and practices for reproducible research in biostatistics

2026-06-07 18:16 PDT

Overview

Reproducibility is not a stylistic preference but a scientific requirement: an analysis that cannot be independently rerun, verified, and modified is not a completed analysis. In practice, achieving computational reproducibility in clinical research requires attention to software environment management, data provenance, workflow automation, and the tooling that connects each step of the analysis pipeline from raw data to rendered report.

This program develops and maintains a suite of R packages and command-line tools that operationalize reproducible workflows for biostatistical research. The central organizing framework is zzcollab, which instantiates a Docker-based, renv-pinned research compendium from a single command. The surrounding tools handle specific tasks – longitudinal visualization, Table 1 construction, power analysis, electronic data capture, and output formatting – that arise repeatedly across projects.

Tools

  • zzcollab – Docker-based reproducible research compendium framework. Creates a complete project scaffold (Dockerfile, renv.lock, .Rprofile, source code, data directory) from a single CLI command, targeting five research profiles (minimal, analysis, modeling, publishing, shiny).

  • zzedc – Electronic data capture system for clinical trials, providing form design, validation, and data export in a reproducible R-based pipeline.

  • zzrenvcheck – Validation tool for renv package dependency graphs, detecting mismatches between renv.lock and the active R library.

  • zzlongplot – Longitudinal data visualization for clinical trials, with MMRM-style trajectory displays and individual-profile overlays.

  • zztable1 – Table 1 (baseline characteristics) construction for clinical research, supporting multi-format output (LaTeX, HTML, plain text) from a single function call.

  • zzobj2fig – Renders any R modeling output as a publication-quality LaTeX or Typst table.

  • zzworld – WORLD-backwards cognitive test scoring and edit-distance analysis, implementing five scoring rules from the MMSE literature.

  • zzfisher – Fisher’s exact test for r×2 contingency tables with exact power analysis.

  • zzpower – Interactive power analysis calculator for clinical-trial designs.

  • nof1power – Power analysis and simulation for N-of-1 and parallel-group trial designs.

  • zzgit – Interactive git add/commit/push for zsh, with a Conventional Commits wizard and secret scanning.

  • zzvim-R – Vim/Neovim plugin for R integration: send code to an R session from the editor.
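To make the lockfile-validation idea in the zzrenvcheck entry concrete, here is a minimal sketch of comparing the package versions recorded in an renv.lock file against an installed library. It is not zzrenvcheck's implementation. The function names, the example lockfile, and the installed-library dictionary are all hypothetical, and the sketch is written in Python only so that it is self-contained. The one thing it relies on is the renv.lock layout itself, a JSON document with a top-level "Packages" object whose records carry a "Version" field.

```python
import json


def lockfile_versions(lock_text):
    """Return {package: version} parsed from the text of an renv.lock file."""
    lock = json.loads(lock_text)
    return {name: rec.get("Version") for name, rec in lock.get("Packages", {}).items()}


def compare_library(locked, installed):
    """Classify differences between locked and installed package versions.

    Both arguments are {package: version} dictionaries. Hypothetical helper,
    not part of renv or zzrenvcheck.
    """
    # Packages the lockfile requires but the library lacks.
    missing = sorted(p for p in locked if p not in installed)
    # Packages present in both, at different versions: (name, locked, installed).
    mismatched = sorted(
        (p, locked[p], installed[p])
        for p in locked
        if p in installed and installed[p] != locked[p]
    )
    # Packages installed but not recorded in the lockfile.
    unlocked = sorted(p for p in installed if p not in locked)
    return {"missing": missing, "mismatched": mismatched, "unlocked": unlocked}


# Illustrative data only: versions are made up for the example.
EXAMPLE_LOCK = """
{
  "R": {"Version": "4.4.1"},
  "Packages": {
    "dplyr": {"Package": "dplyr", "Version": "1.1.4", "Source": "Repository"},
    "ggplot2": {"Package": "ggplot2", "Version": "3.5.1", "Source": "Repository"},
    "lme4": {"Package": "lme4", "Version": "1.1-35.5", "Source": "Repository"}
  }
}
"""
EXAMPLE_INSTALLED = {"dplyr": "1.1.4", "ggplot2": "3.4.4", "renv": "1.0.7"}

report = compare_library(lockfile_versions(EXAMPLE_LOCK), EXAMPLE_INSTALLED)
print(report)
```

In a real project the installed versions would come from the active R library rather than a hand-written dictionary. The three-way split (missing, mismatched, unlocked) is one reasonable way to report drift, chosen here for illustration.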

Current research

  • WORLD-backwards scoring: empirical comparison. Multi-cohort comparison of five WORLD-backwards scoring rules applied to seven ADCS and ADNI MMSE datasets, validating the zzworld implementation against legacy Perl, SAS, C, and R reference implementations. Whitepaper near submission.

  • zzcollab framework paper. Methods paper describing the five-pillar zzcollab architecture (Dockerfile, renv.lock, .Rprofile, source code, data) and its application to clinical research compendia.

  • zzedc methods paper. Description and evaluation of the zzedc electronic data capture system for clinical trials.
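For readers unfamiliar with why WORLD-backwards scoring rules can disagree, the sketch below scores a response against the target "DLROW" in two illustrative ways: by counting letters in the correct position, and by subtracting the Levenshtein edit distance from the target length. Both rules are assumptions chosen for illustration. They are not claimed to be among the five rules implemented in zzworld, and the function names are hypothetical.

```python
TARGET = "DLROW"  # "WORLD" spelled backwards


def levenshtein(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def position_score(response, target=TARGET):
    """Illustrative rule: number of letters in the correct serial position."""
    resp = response.strip().upper()
    return sum(1 for r, t in zip(resp, target) if r == t)


def edit_distance_score(response, target=TARGET):
    """Illustrative rule: target length minus edit distance, floored at zero."""
    resp = response.strip().upper()
    return max(0, len(target) - levenshtein(resp, target))


for response in ["DLROW", "DLORW", "DROW", ""]:
    print(repr(response), position_score(response), edit_distance_score(response))
```

A single omitted letter ("DROW") shows the divergence: every later letter shifts out of position, so the positional rule gives little credit, while the edit-distance rule penalizes only the one omission.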

Publications

Publications at the intersection of statistical computing and methodology appear in the full publications list; filter it on statistical-computing or data-visualization.