@inutano
Last active February 22, 2026 04:41
WES Test Report: nf-core/rnaseq via Sapporo on macOS (Apple Silicon)

Environment

  • Host: macOS (Apple Silicon / arm64), Darwin 24.5.0
  • Docker Desktop: v29.2.1, API v1.53 (minimum API v1.44)
  • Sapporo WES: sapporo-wes-2.1.0 (ghcr.io/sapporo-wes/sapporo-service:latest)
  • Nextflow: 25.10.4 (via nextflow/nextflow:25.10.4 container)
  • Pipeline: nf-core/rnaseq (test profile)
  • Date: 2026-02-22

Step 1: Clone and start Sapporo

cd ~/work/wes-test
git clone https://github.com/sapporo-wes/sapporo-service.git
cd sapporo-service
docker compose up -d

Verified with:

curl -s localhost:1122/service-info | jq .

Confirmed Nextflow (NFL / DSL2) listed in workflow_type_versions and workflow_engine_versions.
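
The same check can be scripted. The sketch below assumes the service-info response keys Nextflow as "NFL" under workflow_type_versions and lists an engine named "nextflow" under workflow_engine_versions; the exact key spelling in Sapporo's response is an assumption, and supports_nextflow / fetch_service_info are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def supports_nextflow(service_info: dict) -> bool:
    """Return True if a service-info document advertises Nextflow.

    Assumes the type key is "NFL" and the engine key is "nextflow"
    (case-insensitive); adjust if the server spells them differently.
    """
    type_versions = service_info.get("workflow_type_versions", {})
    engine_versions = service_info.get("workflow_engine_versions", {})
    has_type = "NFL" in type_versions
    has_engine = any(name.lower() == "nextflow" for name in engine_versions)
    return has_type and has_engine


def fetch_service_info(base_url: str = "http://localhost:1122") -> dict:
    """Fetch and decode GET /service-info from a running WES server."""
    with urlopen(f"{base_url}/service-info") as resp:
        return json.load(resp)
```

With the server up, `supports_nextflow(fetch_service_info())` should return True if the assumptions about key names hold.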

Step 2: Prepare workflow parameters

workflow_params.json:

{
  "outdir": "results",
  "max_memory": "6.GB",
  "max_cpus": 2
}

workflow_engine_parameters.json:

{
  "-profile": "test,docker"
}
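
For reference, here is a rough sketch of how these two JSON documents could be folded into the form fields of a WES POST /runs request. The field names follow the GA4GH WES specification; the workflow_url value and the build_run_request helper are illustrative assumptions, not taken from the actual submission.

```python
import json


def build_run_request(workflow_url: str, params: dict, engine_params: dict) -> dict:
    """Assemble multipart form fields for a WES run submission.

    WES expects workflow_params and workflow_engine_parameters as
    JSON-encoded strings, not nested objects.
    """
    return {
        "workflow_type": "NFL",
        "workflow_type_version": "DSL2",
        "workflow_engine": "nextflow",
        "workflow_url": workflow_url,
        "workflow_params": json.dumps(params),
        "workflow_engine_parameters": json.dumps(engine_params),
    }


form = build_run_request(
    "nf-core/rnaseq",  # assumption: the report does not state the exact workflow_url
    {"outdir": "results", "max_memory": "6.GB", "max_cpus": 2},
    {"-profile": "test,docker"},
)
```

The resulting dict can be handed to any HTTP client that sends multipart/form-data.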

Step 3: Submit and iterate

Run 1 — OOM

Process requirement exceeds available memory -- req: 12 GB; avail: 7.7 GB

The default nf-core/rnaseq test profile requested 12 GB for FQ_LINT, more than the 7.7 GB available to Docker Desktop.

Fix: Added "max_memory": "6.GB" and "max_cpus": 2 to workflow_params.json.
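
The arithmetic behind this fix can be sketched as follows, assuming max_memory acts as an upper bound on each process's memory request. The helper names are hypothetical, and the parser only covers the simple "6.GB" / "12 GB" spellings seen in this report.

```python
import re

_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_mem(text: str) -> int:
    """Parse strings such as '6.GB', '12 GB' or '7.7 GB' into bytes."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*\.?\s*([KMGT]?B)\s*", text, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognised memory string: {text!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2).upper()])


def fits(requested: str, available: str, max_memory=None) -> bool:
    """Check whether a request fits, optionally capped by max_memory first."""
    need = parse_mem(requested)
    if max_memory is not None:
        need = min(need, parse_mem(max_memory))  # the cap wins over the request
    return need <= parse_mem(available)
```

Under these assumptions, `fits("12 GB", "7.7 GB")` is False (Run 1), while `fits("12 GB", "7.7 GB", max_memory="6.GB")` is True.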

Run 2 — Docker API version mismatch

docker: Error response from daemon: client version 1.32 is too old.
Minimum supported API version is 1.44, please upgrade your client to a newer version.

The nextflow/nextflow:25.10.4 image bundles Docker client API v1.32, but Docker Desktop requires >= v1.44.

Fix: Added -e DOCKER_API_VERSION=1.44 to the run_nextflow() function in sapporo/run.sh. Also added a Nextflow config with docker.envWhitelist = 'DOCKER_API_VERSION' to propagate the variable into child process containers.

Note: the stock run.sh already applied this fix for cwltool, toil, and ep3, but not for Nextflow. For the local edits to run.sh to take effect, the file also had to be bind-mounted into the container via compose.yml:

volumes:
  - ${PWD}/sapporo/run.sh:/app/sapporo/run.sh:ro
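
The mismatch in Run 2 is a plain numeric version comparison. A minimal sketch (helper names are hypothetical) showing why 1.32 is rejected against a minimum of 1.44, and why the components must be compared as integers and not as strings:

```python
def parse_api_version(version: str) -> tuple:
    """Split a Docker API version such as '1.44' into integer components."""
    return tuple(int(part) for part in version.split("."))


def client_is_supported(client: str, minimum: str) -> bool:
    """True if the client API version is at least the daemon's minimum."""
    return parse_api_version(client) >= parse_api_version(minimum)
```

Comparing tuples avoids the string-ordering trap where "1.9" would sort after "1.44".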

Run 3 — Mount denied for pipeline assets

docker: Error response from daemon: mounts denied:
The path /.nextflow/assets/nf-core/rnaseq/bin is not shared from the host
and is not known to Docker.

Nextflow stores cloned pipeline assets at /.nextflow/assets/ inside its own container. When it spawns child containers, it asks the Docker daemon to bind-mount that path, but the daemon resolves bind-mount sources on the host, where /.nextflow does not exist and is not shared with Docker Desktop.

Fix: Set NXF_HOME and NXF_ASSETS environment variables to point into the shared run directory (${run_dir}/nxf_home), which is host-mounted and accessible to child containers.
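
As an illustration of the environment that the Run 2 and Run 3 fixes add to the Nextflow container, the sketch below builds the corresponding `-e` flags for a given run directory. The real change lives in the shell function run_nextflow(); this Python helper is hypothetical, and the NXF_ASSETS subdirectory name is an assumption, since the report only says both variables point into ${run_dir}/nxf_home.

```python
from pathlib import PurePosixPath


def nextflow_env_flags(run_dir: str, api_version: str = "1.44") -> list:
    """Return docker-run style '-e KEY=VALUE' flags for the Nextflow container."""
    nxf_home = PurePosixPath(run_dir) / "nxf_home"
    env = {
        "DOCKER_API_VERSION": api_version,
        "NXF_HOME": str(nxf_home),
        "NXF_ASSETS": str(nxf_home / "assets"),  # assumption: exact layout not given
    }
    flags = []
    for key, value in env.items():
        flags += ["-e", f"{key}={value}"]
    return flags
```

Because the run directory is host-mounted, paths derived from it resolve the same way for the daemon and for child containers.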

Run 4 — Success

Pipeline ran for ~16 minutes and completed successfully.

Step 4: Verify results

curl -s localhost:1122/runs/25fedfa2-9792-4f62-a391-6a2da2a72628 | jq '.state'
# "COMPLETE"
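
For longer runs it may be more convenient to poll until the run reaches a terminal state. A minimal sketch, assuming the state names from the GA4GH WES specification and the same /runs/<run_id> endpoint queried above; wait_for_run and get_state are hypothetical helpers.

```python
import json
import time
from urllib.request import urlopen

# Terminal states as defined by the WES specification.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def get_state(base_url: str, run_id: str) -> str:
    """Read the 'state' field from GET /runs/<run_id>."""
    with urlopen(f"{base_url}/runs/{run_id}") as resp:
        return json.load(resp)["state"]


def wait_for_run(run_id, base_url="http://localhost:1122",
                 fetch=get_state, interval=30.0, max_polls=120):
    """Poll until the run reaches a terminal state, then return that state."""
    for _ in range(max_polls):
        state = fetch(base_url, run_id)
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

The fetch argument is injectable so the loop can be exercised without a live server.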

955 output files produced across 9 directories:

| Directory | Contents |
|---|---|
| bbsplit | Contamination screening stats |
| custom | Merged genome + GTF (with GFP spike-in) |
| fastqc | Raw read quality reports |
| fq_lint | FASTQ format validation |
| multiqc | Aggregated QC report |
| pipeline_info | Execution metadata and resource usage |
| salmon | Transcript-level quantification |
| star_salmon | STAR alignment + Salmon quantification |
| trimgalore | Adapter-trimmed reads and trim reports |

Summary of changes to Sapporo

Two files were modified from the upstream defaults:

sapporo/run.sh: run_nextflow() function

  • Added DOCKER_API_VERSION=1.44 env var
  • Added NXF_HOME / NXF_ASSETS env vars pointing to the shared run directory
  • Added a Nextflow config file (sapporo.config) with docker.envWhitelist

compose.yml

  • Bind-mounted the local sapporo/run.sh into the container at /app/sapporo/run.sh:ro

Step 5: Generate run summary from RO-Crate

After a successful run, Sapporo generates a Workflow Run RO-Crate (ro-crate-metadata.json) containing structured metadata about the execution. A Python script (summarize_crate.py) was created to parse this file and produce a human-readable Markdown summary.

Script overview

  • Location: summarize_crate.py
  • Dependencies: Python stdlib only (json, sys, datetime, collections)
  • Usage: python summarize_crate.py <path/to/ro-crate-metadata.json> > summary.md

Generated sections

  1. Header — Run name, ID, and completion status
  2. Run Overview — Workflow name/URL, language (Nextflow DSL2), engine versions (nextflow, sapporo 2.2.2), container image (nextflow/nextflow:25.10.4), start/end times, duration (15m 21s), exit code
  3. Input Parameters — 3 parameters: outdir, max_memory, max_cpus
  4. Output Summary — 955 files totalling 75.7 MB, broken down by 9 top-level directories, with a list of 14 output MIME types
  5. Alignment Statistics — Per-sample stats (total reads, mapped reads/rate, duplicate reads/rate) for 5 samples, derived from FileStats entities linked to BAM files
  6. Software Versions — All SoftwareApplication entities: nextflow, samtools (1.23), sapporo (2.2.2)
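
The Output Summary section could be produced along the following lines. This is a sketch, not the actual summarize_crate.py: it assumes output files appear in the crate's @graph as entities typed "File" with an @id under outputs/ and a contentSize in bytes, and that sizes are reported in 1024-based units.

```python
from collections import defaultdict


def human_size(num_bytes: float) -> str:
    """Format a byte count as B, KB, MB or GB (1024-based; an assumption)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def summarize_outputs(graph: list, prefix: str = "outputs/") -> dict:
    """Count files and bytes per first path component after the prefix."""
    groups = defaultdict(lambda: {"files": 0, "bytes": 0})
    for entity in graph:
        path = entity.get("@id", "")
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "File" not in types or not path.startswith(prefix):
            continue
        # A file directly under the prefix is grouped under its own name.
        top = path[len(prefix):].split("/", 1)[0]
        groups[top]["files"] += 1
        groups[top]["bytes"] += int(entity.get("contentSize", 0) or 0)
    return dict(groups)
```

Summing the per-directory byte counts and passing the total to human_size would give a figure in the same style as the 75.7 MB reported above.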

Implementation notes

  • Builds an @id → entity lookup dict for reference resolution
  • Parses actionStatus URLs to friendly text (e.g. CompletedActionStatus → Completed)
  • Computes duration from ISO 8601 timestamps
  • Formats file sizes human-readably (B, KB, MB, GB)
  • Groups output files by first path component after outputs/
  • Links FileStats back to parent BAM File entities via reverse lookup on the stats field
  • Handles missing fields gracefully
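
Several of these notes can be sketched in a few lines of stdlib Python. The functions below are illustrative reconstructions with hypothetical names, not the code of summarize_crate.py; they assume schema.org-style actionStatus URLs and that a File entity references its FileStats entity as {"@id": ...} in a stats field.

```python
from datetime import datetime


def build_index(graph: list) -> dict:
    """Map each @id to its entity for reference resolution."""
    return {entity["@id"]: entity for entity in graph if "@id" in entity}


def friendly_status(url: str) -> str:
    """Turn '.../CompletedActionStatus' into 'Completed'."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    suffix = "ActionStatus"
    return name[: -len(suffix)] if name.endswith(suffix) else name


def format_duration(start: str, end: str) -> str:
    """Compute a duration such as '15m 21s' from two ISO 8601 timestamps."""
    def parse(ts):
        # Older Python versions do not accept a trailing 'Z' in fromisoformat.
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))

    total = int((parse(end) - parse(start)).total_seconds())
    minutes, seconds = divmod(total, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}h {minutes}m {seconds}s"
    return f"{minutes}m {seconds}s"


def stats_owners(graph: list) -> dict:
    """Reverse lookup: FileStats @id -> the File entity whose 'stats' points at it."""
    owners = {}
    for entity in graph:
        ref = entity.get("stats")
        if isinstance(ref, dict) and "@id" in ref:
            owners[ref["@id"]] = entity
    return owners
```

Using .get() with defaults throughout is one simple way to tolerate missing fields.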