The Reproducibility Quest: Modern Tools for R and Python Data Science

Python

Reproducibility

DevContainers

Docker

Package Management

Best Practices

VS Code

GxP

Pharma

Setup

A comprehensive guide to achieving reproducible data science workflows — environment setup, package management, containerization, and what reproducibility actually means in GxP-regulated pharma contexts.

Author

Antoine Lucas

Published

June 10, 2025

Introduction

Reproducibility is the cornerstone of credible data science. Whether you’re in academia, pharma, finance, or tech, the ability to recreate your analysis environment and results is essential. Yet achieving true reproducibility has historically been painful — a maze of version conflicts, missing dependencies, and the dreaded “works on my machine” syndrome.

The good news? We’re living in a golden age of reproducibility tooling. Both R and Python ecosystems have evolved dramatically, offering modern solutions that make reproducible workflows not just possible, but practical. This post covers the complete picture: from setting up a solid development environment in the first place, through package management and containerization, to what reproducibility actually requires when you’re working under regulatory constraints.

We start with a consistent development environment. Then, before going deeper into tooling, we address a crucial question: how much reproducibility do you actually need?

Setting Up Your Environment

Before you can have a reproducible workflow, you need a consistent development environment. This matters more than most people realise — subtle differences between machines (R versions, system libraries, editor settings) cause problems that are hard to debug and impossible to prevent without a baseline setup.

VS Code as your polyglot IDE

If you write both R and Python, VS Code is the only reasonable choice. RStudio is excellent for pure R work but does not handle Python well. PyCharm goes the other way. VS Code handles both, plus Quarto, plus Bash, plus whatever else you need.

The extensions that actually matter for a data science setup:

Extension	Purpose
`REditorSupport.r`	R language support, inline output, completion
`Posit.air-vscode`	R formatter (air) — format-on-save
`ms-python.python`	Python language support
`charliermarsh.ruff`	Python formatter + linter (ruff) — format-on-save
`quarto.quarto`	Quarto rendering, preview, syntax
`mcanouil.quarto-wizard`	Extension manager for Quarto
`ms-toolsai.jupyter`	Jupyter kernel support
`marimo-team.vscode-marimo`	marimo reactive notebook support
`usernamehw.errorlens`	Inline linting errors (R and Python)

A minimal settings.json that wires everything together:

{
  "[r]": {
    "editor.defaultFormatter": "Posit.air-vscode",
    "editor.formatOnSave": true
  },
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true
  },
  "r.useRenvLibPath": true,
  "r.rterm.option": ["--no-save", "--no-restore-data", "--quiet"],
  "python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python",
  "editor.rulers": [88]
}

1: Use air (Posit’s Rust-based formatter) as the default R formatter — replaces styler.
2: Auto-format on every save — no manual air format . needed during editing.
3: Use ruff as formatter for Python — one tool replaces black, isort, and flake8.
4: Tell the R extension's background processes (language server, help server) to use the renv project library instead of the system library.
5: Start R sessions without loading .RData — prevents stale objects from contaminating fresh runs.
6: Point Python to the project-local .venv — ensures uv-managed packages are picked up.
7: Draw a vertical ruler at column 88 — the default line-length limit used by ruff.

R: install via rig, not CRAN

rig is the R Installation Manager. Install it once, then install and switch between R versions cleanly:

# macOS
brew tap r-lib/rig && brew install --cask rig

# Linux
curl -L https://rig.r-lib.org/rig-linux-latest.tar.gz | sudo tar xz -C /usr/local

# Install a specific R version
rig add 4.5
rig default 4.5

1: Install rig via Homebrew on macOS — the one-time setup step.
2: Linux alternative: download and extract the binary to /usr/local with sudo.
3: Install R 4.5 — rig add downloads and sets up that exact version without touching other installations.
4: Set R 4.5 as the default version system-wide.

Never install R by downloading from CRAN. rig makes version switching trivial and doesn’t pollute your system.

Python: install via uv, not the system Python

uv manages Python versions and virtual environments. The system Python should never be touched for project work:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install and pin a Python version for a project
uv python install 3.12
uv python pin 3.12

1: One-liner installer — writes the uv binary to ~/.local/bin (older versions of the installer used ~/.cargo/bin). No admin rights needed.
2: Download and install CPython 3.12 — managed entirely by uv, isolated from the system Python.
3: Write a .python-version file pinning this project to 3.12 — respected by uv run and uv sync.
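A pin only helps if something enforces it. uv respects .python-version automatically, but a script launched with a different interpreter (for example from a system-wide Jupyter kernel) will not. Below is a minimal guard. It assumes the pin is a plain numeric version such as 3.12 or 3.12.4; uv accepts other request formats that this sketch does not parse. The function names are illustrative.

```python
import sys
from pathlib import Path


def pin_matches(pin_text: str, running: tuple = tuple(sys.version_info[:3])) -> bool:
    """Return True if `running` satisfies a plain numeric pin.

    "3.12" matches any 3.12.x; "3.12.4" must match exactly.
    An empty pin file imposes no constraint.
    """
    lines = [ln.strip() for ln in pin_text.splitlines() if ln.strip()]
    if not lines:
        return True
    wanted = tuple(int(part) for part in lines[0].split("."))
    # Compare only as many components as the pin specifies.
    return tuple(running[: len(wanted)]) == wanted


def check_project_pin(pin_file: Path = Path(".python-version")) -> None:
    """Exit with an error if the running interpreter does not match the pin."""
    if pin_file.exists() and not pin_matches(pin_file.read_text()):
        sys.exit(f"Python {sys.version.split()[0]} does not match pin in {pin_file}")
```

Calling check_project_pin() at the top of an analysis script turns a silent version mismatch into an immediate, explicit failure.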

Formatting and linting

R: air — a Rust-based formatter from Posit. Fast, opinionated, and it replaces styler. Never use formatR. Configure via air.toml at project root.
Python: ruff — formatter and linter in one. Replaces black, isort, flake8. Configure via pyproject.toml.

Both integrate with VS Code for format-on-save. This matters for reproducibility in a non-obvious way: consistent formatting means diffs are about logic, not whitespace, which makes code review and change tracking actually useful.
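To see the effect, compare a formatting-only edit as a line diff (what code review shows) with the same edit at the level of the parsed program. This sketch uses only the Python standard library, and the two code strings are made up for illustration.

```python
import ast
import difflib

# Hypothetical before/after: identical logic, different formatting.
before = "result = compute(x,y)\nif result>0 :\n    print( result )\n"
after = "result = compute(x, y)\nif result > 0:\n    print(result)\n"


def text_diff(a: str, b: str) -> list[str]:
    """Changed lines reported by a plain line-based diff (headers excluded)."""
    return [
        line
        for line in difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="")
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def same_logic(a: str, b: str) -> bool:
    """True if both sources parse to the same syntax tree, ignoring layout."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


print(len(text_diff(before, after)))  # every line shows up as changed
print(same_logic(before, after))      # yet the program is unchanged
```

When format-on-save is enforced for everyone, edits like this never reach the history, so the diff lines that remain reflect changes in behaviour.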

Notebooks: marimo over Jupyter

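If you write Python notebooks, marimo is worth evaluating. It is reactive (running a cell automatically re-runs the cells that depend on its variables), stores notebooks as pure Python scripts (Git-friendly, no JSON blob), and has a better reproducibility story than classical Jupyter. Execution order follows variable dependencies rather than the order in which you happened to run cells, so the notebook cannot accumulate the hidden state that makes many Jupyter notebooks fail when re-run top to bottom. For R, Quarto documents are the right answer for anything that isn’t pure exploration.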
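marimo builds this dependency graph by statically analysing which variables each cell defines and which it reads. The toy below illustrates the idea only and is not marimo's implementation. The notebook, its cell names, and its variable names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical notebook: the names each cell defines and the names it reads.
cells = {
    "load": {"defines": {"raw"}, "reads": set()},
    "clean": {"defines": {"df"}, "reads": {"raw"}},
    "model": {"defines": {"fit"}, "reads": {"df"}},
    "plot": {"defines": set(), "reads": {"df"}},
    "notes": {"defines": set(), "reads": set()},
}


def cells_to_rerun(changed: str) -> list[str]:
    """Return the changed cell plus every downstream cell, in dependency order."""
    definer = {name: cell for cell, spec in cells.items() for name in spec["defines"]}
    # cell -> cells it depends on (the cells that define the names it reads)
    graph = {
        cell: {definer[name] for name in spec["reads"] if name in definer}
        for cell, spec in cells.items()
    }
    # Grow the stale set until no further cell depends on a stale one.
    stale = {changed}
    grew = True
    while grew:
        grew = False
        for cell, deps in graph.items():
            if cell not in stale and deps & stale:
                stale.add(cell)
                grew = True
    # Order stale cells so each runs after the cells it depends on.
    return list(TopologicalSorter({c: graph[c] & stale for c in stale}).static_order())
```

Editing the clean cell re-runs clean, then model and plot, and leaves load and notes alone. Because re-execution is derived from the graph, there is no way to end up with a plot computed from an outdated df.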

The Reproducibility Spectrum: Scoping Your Needs

Not every project requires the same level of reproducibility investment. A quick exploratory analysis for an internal meeting doesn’t need the same rigor as a clinical trial submission to the U.S. Food and Drug Administration (FDA). Understanding where your project falls on this spectrum is essential for efficient resource allocation.

Levels of Reproducibility

Level	Description	Typical Use Cases	Investment
L1: Minimal	Code runs on author’s machine	Personal exploration, quick prototypes	Low
L2: Documented	README with manual setup instructions	Team projects, handoffs	Low-Medium
L3: Lockfile	Pinned package versions	Production analyses, shared projects	Medium
L4: Containerized	Full environment specification	Regulated industries, long-term archives	Medium-High
L5: Fully Isolated	Container + data versioning + workflow	Clinical trials, financial audits	High

Scoping Questions

Before setting up your reproducibility stack, ask:

Who needs to reproduce this?
- Just me, in 6 months? → L2-L3
- My team? → L3
- External auditors? → L4-L5
- The entire scientific community? → L5
What’s the regulatory context?
- Internal exploration → L1-L2
- Published research → L3-L4
- Regulatory submission → L5
What’s the project lifespan?
- One-off analysis → L1-L2
- Quarterly reports → L3
- Multi-year project → L4-L5
What’s the cost of failure?
- Learning exercise → L1
- Business decision → L3
- Patient safety → L5
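One way to turn these questions into a decision is to let the strictest answer win: a project needs the level its most demanding dimension calls for. The sketch below encodes that rule. It assumes the upper end of each range above, and the answer keys are invented labels for the bullet points.

```python
# Upper end of the level range suggested for each answer above (an assumption).
AUDIENCE = {"me_in_6_months": 3, "team": 3, "external_auditors": 5, "scientific_community": 5}
CONTEXT = {"internal_exploration": 2, "published_research": 4, "regulatory_submission": 5}
LIFESPAN = {"one_off": 2, "quarterly_reports": 3, "multi_year": 5}
COST_OF_FAILURE = {"learning_exercise": 1, "business_decision": 3, "patient_safety": 5}


def recommend_level(audience: str, context: str, lifespan: str, cost: str) -> int:
    """Return the reproducibility level (1-5) implied by the strictest answer."""
    return max(
        AUDIENCE[audience],
        CONTEXT[context],
        LIFESPAN[lifespan],
        COST_OF_FAILURE[cost],
    )
```

For example, an internal team project that produces quarterly reports lands at L3. It moves to L5 as soon as the cost of failure is patient safety.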

The Reproducibility Tax

Every level of reproducibility adds friction:

L1→L2: Writing documentation (minutes)
L2→L3: Managing lockfiles, occasional dependency conflicts (hours)
L3→L4: Container setup, debugging build issues (days initially)
L4→L5: Data versioning, workflow management, continuous integration/continuous deployment (CI/CD) (ongoing effort)

The goal is to pay the right amount of tax for your context — not more, not less.

The Convergence: R and Python Are Aligning

Something remarkable is happening: the R and Python ecosystems are converging on similar reproducibility paradigms. This isn’t coincidental — it reflects hard-won lessons about what works.

Shared Principles

Both communities have arrived at the same conclusions:

Declarative over imperative — Define desired state, not installation steps
Lockfiles are essential — Capture exact versions at installation time
TOML for configuration — Human-readable, standardized format
Rust for tooling — Performance matters for developer experience
Holistic resolution — Solve the entire dependency tree before installing

The Modern Stack Comparison

Concept	Python	R	Convergence
Package manager	uv	rv	Both Rust, declarative, TOML-based
Config file	`pyproject.toml`	`rproject.toml`	Same format, similar structure
Lockfile	`uv.lock`	`rv.lock`	Same purpose, similar approach
Version manager	uv, pyenv	rig	CLI-based, multiple versions
Formatter	ruff	air	Fast, opinionated, Rust-based
Containers	devcontainers	devcontainers + Rocker	Same specification

This convergence means skills transfer between languages, and polyglot projects become easier to manage.

Package Management: The Foundation

Python: The uv Revolution

uv has fundamentally changed Python package management. Written in Rust, it’s blazingly fast and consolidates what used to require multiple tools (pyenv, pip, virtualenv, pip-tools) into one coherent experience.

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a new project
uv init my-analysis
cd my-analysis

# Add dependencies
uv add pandas numpy scikit-learn matplotlib

# Sync environment
uv sync

1: Scaffold a new project: creates pyproject.toml, README.md, and a main.py stub (hello.py in older uv versions).
2: Add packages to pyproject.toml and install them into the project’s .venv in one step.
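3: Recreate .venv from the lockfile — every collaborator gets the same pinned versions, and the hashes recorded in uv.lock ensure the same distribution files are installed on a given platform. In CI, use uv sync --locked, which errors out instead of updating an out-of-date lockfile.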

The pyproject.toml becomes your single source of truth:

[project]
name = "my-analysis"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "pandas>=2.0",
    "numpy>=1.24",
    "scikit-learn>=1.3",
    "matplotlib>=3.7",
]

[dependency-groups]
dev = [
    "pytest>=7.0",
    "ruff>=0.1",
]

1: Minimum Python version constraint — uv enforces this and will refuse to create an incompatible environment.
2: Dev-only dependencies (tests, linting) — uv sync installs them by default, and uv sync --no-dev leaves them out, which keeps the deployment environment lean.

Key features that make uv essential:

Lockfile (uv.lock) — Captures exact versions for reproducibility
Python version management — No need for separate pyenv
Blazing fast — 10-100x faster than pip
Declarative — Describe what you want, not how to get there

# uv can manage Python versions too
uv python install 3.12
uv python pin 3.12

# Run scripts with automatic environment setup
uv run python analysis.py

R: The rv Revolution

On the R side, rv brings the same declarative philosophy. If you’ve been using renv, rv will feel like a significant upgrade.

# Install rv
curl -sSL https://raw.githubusercontent.com/A2-ai/rv/refs/heads/main/scripts/install.sh | bash

# Create a new project
rv init my-analysis
cd my-analysis

# Add dependencies
rv add tidyverse arrow DBI

# Sync
rv sync

1: One-liner install — downloads the rv binary and places it on your PATH.
2: Initialise the project: creates rproject.toml with the current R version pre-filled.
3: Add packages to rproject.toml — rv resolves and installs them immediately.
4: Install all declared packages into the project library — the equivalent of renv::restore() but driven by a declarative manifest.

Your rproject.toml:

[project]
name = "my-analysis"
r_version = "4.5"

repositories = [
    { alias = "PPM", url = "https://packagemanager.posit.co/cran/latest" },
]

dependencies = [
    "tidyverse",
    "arrow",
    "DBI",
]

1: Pins the project's R version (major.minor) — rv will warn if you try to sync on a different version.
2: Use Posit Package Manager (PPM) instead of CRAN — provides pre-built binaries, dramatically faster installs on Linux.

renv: The Established Alternative

While rv is the future, renv remains the most widely adopted solution:

# Initialize renv
renv::init()

# Install packages as usual
install.packages("tidyverse")

# Snapshot your dependencies
renv::snapshot()

# Restore on another machine
renv::restore()

The key difference: renv is imperative (install packages, then snapshot what you ended up with), while rv is declarative (define the desired state, then sync).
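In code terms, a declarative sync is a diff between two states. The sketch below is a hypothetical illustration of that idea and not rv's actual algorithm. Given the desired package versions and what is currently installed, it derives the actions needed, so running it again after those actions are applied yields an empty plan.

```python
def plan_sync(desired: dict[str, str], installed: dict[str, str]) -> dict[str, list[str]]:
    """Derive the actions that turn `installed` into `desired` (name -> version)."""
    plan: dict[str, list[str]] = {"install": [], "change": [], "remove": []}
    for name, version in sorted(desired.items()):
        if name not in installed:
            plan["install"].append(f"{name} {version}")
        elif installed[name] != version:
            plan["change"].append(f"{name} {installed[name]} -> {version}")
    # In this sketch, anything not declared is treated as something to remove.
    plan["remove"] = sorted(set(installed) - set(desired))
    return plan


# Hypothetical states; the version numbers are illustrative only.
desired = {"dplyr": "1.1", "arrow": "2.0"}
installed = {"dplyr": "1.0", "data.table": "1.0"}
print(plan_sync(desired, installed))
```

The plan depends only on the two states, not on the history of commands that produced the current library. That is what makes the declarative approach repeatable.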

When to Use What

Scenario	Python	R
New projects	uv	rv
Legacy projects	pip + requirements.txt	renv
Corporate environments	uv (or pip if restricted)	renv (more established)
Bleeding edge	uv	rv
Maximum compatibility	pip	renv

Language Version Management

R: rig

rig is the R Installation Manager — think of it as pyenv for R:

# Install rig (macOS)
brew tap r-lib/rig && brew install --cask rig

# Install rig (Linux)
curl -L https://rig.r-lib.org/rig-linux-latest.tar.gz | sudo tar xz -C /usr/local

# List available versions
rig available

# Install specific version
rig add 4.5
rig add 4.4

# Switch between versions
rig default 4.5

# List installed versions
rig list

Python: uv or pyenv

uv now handles Python version management directly:

# Install Python 3.12
uv python install 3.12

# Pin project to specific version
uv python pin 3.12

# List installed versions
uv python list

Alternatively, pyenv remains a solid choice:

# Install pyenv
curl https://pyenv.run | bash

# Install Python version
pyenv install 3.11.6

# Set local version for project
pyenv local 3.11.6

Environment Control: DevContainers

Package managers handle libraries, but what about system dependencies? OpenSSL, GDAL, database drivers, C compilers… these often cause the most painful reproducibility issues.

What Are DevContainers?

Development Containers provide a complete, containerized development environment. They are Docker containers configured for development work, with the editor (for example VS Code) attached to the container so that terminals, extensions, and language servers all run inside it.

A .devcontainer/devcontainer.json defines your environment:

{
    "name": "R & Python Data Science",
    "image": "ghcr.io/rocker-org/devcontainer/tidyverse:4.4",
    "features": {
        "ghcr.io/rocker-org/devcontainer-features/quarto-cli:1": {},
        "ghcr.io/devcontainers/features/python:1": {
            "version": "3.11"
        }
    },
    "customizations": {
        "vscode": {
            "extensions": [
                "REditorSupport.r",
                "ms-python.python",
                "quarto.quarto"
            ],
            "settings": {
                "r.rterm.linux": "/usr/local/bin/R"
            }
        }
    },
    "postCreateCommand": "uv sync && rv sync"
}

1: Base image from the Rocker Project — a battle-tested R 4.4 environment with the tidyverse and system libraries pre-installed.
2: Add Quarto CLI as a devcontainer feature — installs on top of the base image without needing a custom Dockerfile.
3: Add Python 3.11 as a feature — layered cleanly alongside R in the same container.
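4: Run after the container is created: restore the Python virtual environment and the R package library from their respective lockfiles. This assumes uv and rv are on PATH. The Rocker base image does not ship them, so install them first, for example in a custom Dockerfile as in the multi-language setup below.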

When to Use DevContainers

DevContainers are L4 reproducibility — use them when:

System dependencies are complex (geospatial, databases, specific compilers)
Team onboarding needs to be instant
You need CI/CD parity with development
Long-term archival is required

Skip them when:

Pure R/Python with no system dependencies
Quick personal projects
Environments where Docker isn’t available

R-Specific DevContainers with Rocker

The Rocker Project provides excellent base images:

{
    "name": "R Development",
    "image": "ghcr.io/rocker-org/devcontainer/tidyverse:4.4",
    "features": {
        "ghcr.io/rocker-org/devcontainer-features/quarto-cli:1": {},
        "ghcr.io/rocker-org/devcontainer-features/r-rig:1": {}
    }
}

Available images:

rocker/r-ver — Base R
rocker/tidyverse — R + tidyverse packages
rocker/verse — tidyverse + LaTeX
rocker/geospatial — Includes GDAL (Geospatial Data Abstraction Library), GEOS (Geometry Engine — Open Source), PROJ (cartographic projections library)

Multi-Language DevContainer

For projects using both R and Python:

{
    "name": "R + Python Data Science",
    "build": {
        "dockerfile": "Dockerfile"
    },
    "features": {
        "ghcr.io/rocker-org/devcontainer-features/quarto-cli:1": {}
    },
    "postCreateCommand": "rv sync && uv sync",
    "customizations": {
        "vscode": {
            "extensions": [
                "REditorSupport.r",
                "ms-python.python",
                "quarto.quarto",
                "posit.publisher"
            ]
        }
    }
}

With a custom Dockerfile:

FROM ghcr.io/rocker-org/devcontainer/tidyverse:4.4

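# Install uv for Python: copy the static binaries from the official uv image
# into /bin so they are on PATH for every user, not only root.
# Pin a specific uv image tag instead of latest for reproducible builds.
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# Install rv for R (check where the script puts the binary and make sure that
# directory is on PATH for the container's non-root user)
RUN curl -sSL https://raw.githubusercontent.com/A2-ai/rv/refs/heads/main/scripts/install.sh | bash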

# System dependencies
RUN apt-get update && apt-get install -y \
    libpq-dev \
    libgdal-dev \
    && rm -rf /var/lib/apt/lists/*
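System libraries are invisible to uv sync and rv sync, so a missing library often surfaces only when a package fails to compile. A cheap preflight check, run for example from postCreateCommand, can fail fast instead. This sketch assumes the Debian packages above provide the pg_config helper (from libpq-dev) and the gdal-config helper (from libgdal-dev). Adjust the list to your own system dependencies.

```python
import shutil
from collections.abc import Sequence

# Assumption: helper scripts shipped by the -dev packages in the Dockerfile.
REQUIRED_TOOLS = ("pg_config", "gdal-config")


def missing_tools(tools: Sequence[str]) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_system_deps(tools: Sequence[str] = REQUIRED_TOOLS) -> None:
    """Exit with a readable message if any required tool is missing."""
    missing = missing_tools(tools)
    if missing:
        raise SystemExit("Missing system dependencies: " + ", ".join(missing))
```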

Workflow Orchestration

Beyond package management, you need tools to manage the execution flow of your analysis. Snakemake is a widely adopted cross-language option: its rules can run Python scripts, R scripts, or shell commands.

Snakemake

# Snakefile
rule all:
    input: "results/final_report.html"

rule clean_data:
    input: "data/raw.csv"
    output: "data/clean.csv"
    script: "scripts/clean.py"

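rule train_model:
    input: "data/clean.csv"
    output: "models/model.pkl"
    script: "scripts/train.py"

# rule all needs a rule that produces its target, otherwise Snakemake stops
# with a missing-input error. scripts/report.py is a placeholder, like the
# scripts above.
rule render_report:
    input: "models/model.pkl"
    output: "results/final_report.html"
    script: "scripts/report.py"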

Documentation: Quarto

Quarto unifies documentation across R, Python, and Julia. A typical Quarto document combines markdown narrative with code chunks in either language.

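Key reproducibility feature: with freeze: auto set under execute: in _quarto.yml (or in a document's front matter within a Quarto project), Quarto stores computational outputs in the _freeze/ directory and re-executes a document only when its source changes. Committing _freeze/ lets anyone re-render the project without re-executing all the code, which improves both reproducibility and build times. Note that freezing applies to full project renders; rendering a single document on its own always executes its code.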

Quarto documents use fenced code blocks with language specifiers like {python} or {r} to indicate executable code, and can output to HTML, PDF, Word, and many other formats.
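The decision behind freeze: auto is simple to state: re-execute only when the source has changed, otherwise reuse the stored outputs. Below is a conceptual sketch of that decision using a content hash. The file layout and the execute callback are assumptions for illustration, and Quarto's actual _freeze/ format is different.

```python
import hashlib
import json
from collections.abc import Callable
from pathlib import Path


def run_with_freeze(source: Path, freeze_dir: Path, execute: Callable[[Path], dict]) -> dict:
    """Return outputs for `source`, re-executing only if its content changed.

    `execute` is a caller-supplied (hypothetical) function that runs the
    document's code and returns JSON-serialisable outputs.
    """
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    frozen = freeze_dir / f"{source.stem}.json"
    if frozen.exists():
        record = json.loads(frozen.read_text())
        if record["hash"] == digest:
            return record["outputs"]  # unchanged source: reuse stored outputs
    outputs = execute(source)  # new or changed source: run the code
    freeze_dir.mkdir(parents=True, exist_ok=True)
    frozen.write_text(json.dumps({"hash": digest, "outputs": outputs}))
    return outputs
```

Because the stored outputs are plain files, committing them means a collaborator without the data or the compute can still rebuild the rendered document.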

Organizational Considerations

Team Standards

Different teams within an organization may need different reproducibility levels:

Team	Typical Level	Rationale
Data Science R&D	L2-L3	Fast iteration, exploration
Production Analytics	L3-L4	Reliability, handoffs
Regulatory/Compliance	L4-L5	Audit requirements
External Publications	L4-L5	Peer review, credibility

Building a Reproducibility Culture

Start with templates — Provide project templates at appropriate levels
Automate checks — CI/CD that validates reproducibility
Document decisions — Record why a certain level was chosen
Review periodically — Projects may need to level up over time

The Cost-Benefit Reality

The key insight: match your investment to your actual needs. Over-engineering reproducibility for a throwaway analysis wastes time. Under-engineering for a regulatory submission creates risk.

As project risk and longevity increase, so should your reproducibility investment — from L1 (personal exploration) through L5 (regulatory/clinical).

Regulated Environments: Reproducibility in Pharma and Good Practices (GxP)

Everything above applies to all data science work. But if you operate in a regulated environment — pharma, medical devices, clinical research — there are additional requirements that change the calculus significantly.

What FDA and EMA actually expect

Neither the FDA’s 21 CFR Part 11 nor the comparable EU guidance applied in European Medicines Agency (EMA) contexts (EU GMP Annex 11 on computerised systems) prescribes specific tools. What they require is traceability backed by an audit trail. Given a result in a regulatory submission, you must be able to demonstrate exactly which code, which data, and which environment produced it, and reproduce it on demand.

In practice, this means:

Every analysis is traceable to a specific software version. Not “R 4.x”, but “R 4.4.2, with the renv.lock from commit abc123”.
Changes are documented. A git history is not optional; it’s part of the validation evidence.
The environment is validated. IQ (Installation Qualification), OQ (Operational Qualification), PQ (Performance Qualification) aren’t bureaucratic theatre — they’re the documented proof that your platform behaves as expected.

sessionInfo() is necessary but not sufficient

sessionInfo() is the minimum. It tells you what packages were loaded in the R session at the time of analysis. Include it at the end of every regulatory-grade report:

sessionInfo()

But sessionInfo() alone doesn’t let you reproduce the environment. It describes it. To actually restore it, you need a lockfile.
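Python has no built-in sessionInfo(), so the Python half of a polyglot report needs its own footer. The sketch below uses only the standard library, and the function name is illustrative. Like sessionInfo(), it describes the environment but does not restore it.

```python
import platform
import sys
from importlib import metadata


def session_info(packages: list[str]) -> str:
    """Return a plain-text footer with interpreter, platform and package versions."""
    lines = [
        f"Python {platform.python_version()} ({sys.executable})",
        f"Platform: {platform.platform()}",
    ]
    for name in sorted(packages):
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return "\n".join(lines)


print(session_info(["numpy", "pandas"]))
```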

renv + Docker as a practical IQ/OQ/PQ story

The validation stack for a Shiny platform running on Posit Connect looks like this when done right:

renv — lockfile in the repository, pinned to an exact snapshot date. renv::restore() gives any auditor the exact library used in production.
Docker — the base image is pinned (rocker/r-ver:4.4.2, not rocker/r-ver:latest). This ensures system libraries are frozen, not just R packages.
GitHub Actions — CI runs the test suite and renders key reports on every push. The archived run logs and test results, not just the green badge, are part of the OQ evidence.
Posit Connect — deployment is version-controlled. Each deployed bundle has a manifest and can be rolled back.

The IQ document answers: “Is the software correctly installed?” — covered by the Docker image build log and the renv::restore() log.

The OQ document answers: “Does it perform as specified?” — covered by the automated test suite.

The PQ document answers: “Does it perform as required in production?” — covered by user acceptance testing (UAT) sign-off and monitoring dashboards.

This isn’t a heavy process once it’s set up. The overhead is in the documentation, not the tooling. The tooling is Git, renv, Docker, and a CI runner — none of which require purchasing a validated platform.

The practical minimum for regulatory R work

If you’re not running a full validated platform but still need to produce defensible analyses (clinical biostatistics reports, ingredient safety dossiers, etc.):

Git — every script in version control, analysis tied to a commit hash
renv — lockfile committed alongside the code
sessionInfo() in every report — as a footer, not optional
Quarto with freeze: auto — computational outputs frozen and committed; the report can be re-rendered without re-executing the code

This gets you to L3 on the reproducibility spectrum at low overhead (L4 once you add a pinned container image, as in the validated stack above). It won’t satisfy IQ/OQ/PQ for a validated system, but it satisfies the traceability requirement for analytical work and will hold up to a reasonably thorough audit.

The Complete Reproducibility Checklist

Project Structure

my-project/
├── .devcontainer/
│   └── devcontainer.json     # Container environment (L4+)
├── .github/
│   └── workflows/
│       └── ci.yml            # Automated testing
├── data/
│   ├── raw/                  # Immutable raw data
│   └── processed/            # Generated data (gitignored)
├── src/                      # Source code
├── notebooks/                # Exploratory work
├── reports/                  # Quarto documents
├── tests/                    # Unit tests
├── pyproject.toml            # Python dependencies
├── uv.lock                   # Python lockfile
├── rproject.toml             # R dependencies
├── rv.lock                   # R lockfile
├── .gitignore
└── README.md

By Level

L1 (Minimal):

Code in version control

L2 (Documented):

README with setup instructions
List of required packages

L3 (Lockfile):

uv.lock / rv.lock / renv.lock
Language version specified
CI that runs tests

L4 (Containerized):

DevContainer configuration
System dependencies documented
CI uses same container

L5 (Fully Isolated):

Workflow management (Snakemake)
Complete audit trail
Archived container images
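The checklist lends itself to the “automate checks” practice from earlier. The rough, hypothetical audit below infers a project's level from marker files, following the project structure above. File presence is only a proxy, since a lockfile can be stale and a devcontainer unused, so treat the result as a prompt for review rather than as evidence.

```python
from pathlib import Path

# Marker files per level (any one marker satisfies a level). A level counts
# only if every lower level is met too. These heuristics are assumptions.
LEVEL_MARKERS = {
    1: [".git"],
    2: ["README.md"],
    3: ["uv.lock", "rv.lock", "renv.lock"],
    4: [".devcontainer/devcontainer.json"],
    5: ["Snakefile", "workflow/Snakefile"],
}


def reproducibility_level(root: Path) -> int:
    """Return the highest level whose markers, and all lower levels' markers, exist."""
    level = 0
    for lvl in sorted(LEVEL_MARKERS):
        if any((root / marker).exists() for marker in LEVEL_MARKERS[lvl]):
            level = lvl
        else:
            break  # a gap at this level caps the result
    return level
```

Run in CI, a check like this can flag a project whose declared level (as recorded under “Document decisions”) no longer matches what is in the repository.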

Summary: The Modern Reproducibility Toolkit

Need	Python	R
Package management	uv	rv, renv
Version management	uv, pyenv	rig
Containerization	devcontainers	devcontainers + Rocker
Workflow	Snakemake	Snakemake
Documentation	Quarto, Jupyter	Quarto
Formatting	ruff	air
Linting	ruff	lintr

Conclusion

Reproducibility is no longer optional — it’s table stakes for professional data science. The tooling has matured to the point where reproducible workflows are not just possible but practical.

But remember: reproducibility is a spectrum, not a binary. The right approach depends on your context — who needs to reproduce the work, for how long, and at what stakes.

Start with package management (uv for Python, rv or renv for R), add language version control (rig, uv python), and when system dependencies become complex, embrace devcontainers. Layer in Snakemake for complex multi-step pipelines, and document everything with Quarto.

The investment in reproducibility pays dividends: faster onboarding, fewer “works on my machine” incidents, easier debugging, and the confidence that your results can be verified and trusted. Just make sure you’re investing the right amount for your actual needs.