Giang Nguyen
04 May 2026
We evaluated both BSBolt and Rastair for methylation calling. BSBolt provides accurate and fast bisulfite sequencing alignments and methylation calls, outperforming Bismark, BSSeeker2, BISCUIT, and BWA-Meth based on alignment accuracy and methylation calling accuracy. Rastair achieves F1 scores exceeding 0.99 for datasets above 30x depth while processing a 30x depth file in under 30 minutes with 32 CPU cores.
nf-core/methylation is widely regarded as a leading pipeline for methylation calling for short-read sequencing, backed by a large and active community. However, it does come with several limitations:
Therefore, we developed nf-short-read-methylation specifically to support large-scale methylation analysis from bisulfite sequencing data (BSBolt) and TAPS sequencing data (Rastair).
Key Features
BiSulfite Bolt is a bisulfite sequencing analysis platform that provides accurate and fast bisulfite sequencing alignments and methylation calls. It outperforms Bismark, BSSeeker2, BISCUIT, and BWA-Meth based on alignment accuracy and methylation calling accuracy.
Key advantages:
Rastair is an integrated software toolkit for simultaneous SNP detection and methylation calling from mC→T sequencing data (such as TAPS+ and Illumina's 5-Base chemistries). It combines machine-learning-based variant detection with genotype-aware methylation estimation.
Key advantages:
According to the BiSulfite Bolt publication, BSBolt outperforms existing bisulfite alignment tools:
| Tool | Alignment Accuracy | Methylation Calling |
|---|---|---|
| BSBolt | Highest | Most accurate |
| Bismark | Lower | Good |
| BSSeeker2 | Lower | Good |
| BISCUIT | Moderate | Moderate |
| BWA-Meth | Lower | Lower |
According to the Rastair publication on NA12878 benchmark datasets:
Default configuration uses:
(taps: true for Rastair)

pixi run nextflow run main.nf -profile docker -resume
Create a CSV samplesheet with your input. The pipeline supports three input modes:
- FASTQ Input (Full Pipeline)
sample,lane,fastq_1,fastq_2
sample1,L001,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz
sample2,L001,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz
- BAM Input (Skip Alignment)
sample,lane,bam,bai
sample1,L001,/path/to/sample1.bam,/path/to/sample1.bam.bai
- CRAM Input (Skip Alignment + Auto-Convert)
sample,lane,cram,crai
sample1,L001,/path/to/sample1.cram,/path/to/sample1.cram.crai
CRAM Benefits:
Samplesheet Columns:
- sample: Sample identifier
- lane: Sequencing lane (optional, defaults to L001)
- fastq_1, fastq_2 (gzipped FASTQ files)
- bam, bai (aligned BAM + index)
- cram, crai (compressed alignment + index)

The pipeline will automatically detect the input format (FASTQ, BAM, or CRAM) to run the appropriate steps:
nextflow run main.nf \
--input samplesheet.csv \
-profile docker \
-resume
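The format auto-detection described above can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the pipeline's actual implementation: the function names `detect_input_mode` and `read_samplesheet` and the mode labels are hypothetical, and only the column names and the `L001` default come from the samplesheet description.

```python
import csv
import io

# Column pairs taken from the samplesheet examples; the mode labels are illustrative.
MODE_COLUMNS = {
    "fastq": ("fastq_1", "fastq_2"),
    "bam": ("bam", "bai"),
    "cram": ("cram", "crai"),
}


def detect_input_mode(header):
    """Return 'fastq', 'bam', or 'cram' based on which column pair is present."""
    columns = {name.strip() for name in header}
    if "sample" not in columns:
        raise ValueError("samplesheet needs a 'sample' column")
    matches = [
        mode
        for mode, required in MODE_COLUMNS.items()
        if all(col in columns for col in required)
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one input mode, found: {matches}")
    return matches[0]


def read_samplesheet(text):
    """Parse samplesheet text and fill in the default lane when it is missing."""
    reader = csv.DictReader(io.StringIO(text))
    mode = detect_input_mode(reader.fieldnames or [])
    rows = []
    for row in reader:
        # The lane column is optional and defaults to L001.
        row["lane"] = row.get("lane") or "L001"
        rows.append(row)
    return mode, rows
```

Checking the header before launching a run is a cheap way to catch a samplesheet that mixes input modes or lacks the `sample` column.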
Advanced Options
# Run with Rastair (TAPS-based methylation)
nextflow run main.nf \
--input samplesheet.csv \
--taps true \
-profile docker \
-resume
# Run with custom trim parameters (Rastair)
nextflow run main.nf \
--input samplesheet.csv \
--taps true \
--trim_OT 10 \
--trim_OB 10 \
-profile docker \
-resume
# BSBolt with pre-built index
nextflow run main.nf \
--input samplesheet.csv \
--bsbolt_index /path/to/bsbolt_index \
-profile docker \
-resume
# Custom reference genome
nextflow run main.nf \
--input samplesheet.csv \
--reference /path/to/reference.fa \
-profile docker \
-resume
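When launching many runs from a script, the options above can be assembled programmatically. The following is a minimal sketch: `build_command` is a hypothetical helper, and it only uses parameters that appear in the examples. Note that Nextflow's own options (`-profile`, `-resume`) take a single dash, while pipeline parameters (`--input`, `--taps`, and so on) take two.

```python
import shlex


def build_command(samplesheet, profile="docker", taps=False, trim_ot=None,
                  trim_ob=None, bsbolt_index=None, reference=None, resume=True):
    """Assemble a nextflow command as an argument list (hypothetical helper)."""
    cmd = ["nextflow", "run", "main.nf", "--input", samplesheet]
    if taps:
        # Switch from BSBolt to Rastair (TAPS-based methylation).
        cmd += ["--taps", "true"]
    if trim_ot is not None:
        cmd += ["--trim_OT", str(trim_ot)]
    if trim_ob is not None:
        cmd += ["--trim_OB", str(trim_ob)]
    if bsbolt_index is not None:
        cmd += ["--bsbolt_index", bsbolt_index]
    if reference is not None:
        cmd += ["--reference", reference]
    # Nextflow core options use a single dash.
    cmd += ["-profile", profile]
    if resume:
        cmd.append("-resume")
    return cmd


if __name__ == "__main__":
    # Print a shell-safe command line for a Rastair run with custom trimming.
    print(shlex.join(build_command("samplesheet.csv", taps=True,
                                   trim_ot=10, trim_ob=10)))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with paths that contain spaces.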
For test mode with sample data:
nextflow run main.nf -profile docker,test -resume
Output files will be generated in the results/ directory. File structure depends on pipeline mode:
- BSBolt Results:
  - results/alignment/*.bam - Aligned BAM files
  - results/bsbolt/methylation_calls/*.cgmap.gz - CGmap format methylation calls
  - results/bsbolt/methylation_calls/*.bedGraph.gz - BedGraph format for visualization
  - results/bsbolt/aggregate_matrix/*_matrix.txt - Cross-sample methylation matrix
  - results/deduplicated/*.bam - Deduplicated BAM files
- Rastair Results:
  - results/rastair/mbias/ - M-bias calculation results and plots
  - results/rastair/call/*.txt - Methylation calls
  - results/rastair/methylkit/*.methylkit.txt.gz - MethylKit format for R analysis
- Common output files:
  - results/multiqc_report.html - Interactive quality control report
  - results/pipeline_info/ - Execution timeline and trace logs

For more advanced usage and configuration options, see the Pipeline Architecture documentation.
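To check which of the outputs listed above a finished run actually produced, a short inventory script can help. This is a sketch under assumptions: `list_outputs` and the labels in `OUTPUT_PATTERNS` are hypothetical names, and the glob patterns are copied from the paths listed in this section.

```python
from pathlib import Path

# Glob patterns relative to the results directory, copied from the output list.
OUTPUT_PATTERNS = {
    "aligned_bam": "alignment/*.bam",
    "deduplicated_bam": "deduplicated/*.bam",
    "bsbolt_cgmap": "bsbolt/methylation_calls/*.cgmap.gz",
    "bsbolt_bedgraph": "bsbolt/methylation_calls/*.bedGraph.gz",
    "bsbolt_matrix": "bsbolt/aggregate_matrix/*_matrix.txt",
    "rastair_calls": "rastair/call/*.txt",
    "rastair_methylkit": "rastair/methylkit/*.methylkit.txt.gz",
}


def list_outputs(results_dir="results"):
    """Map each output label to the sorted file names found for its pattern."""
    root = Path(results_dir)
    return {
        label: sorted(path.name for path in root.glob(pattern))
        for label, pattern in OUTPUT_PATTERNS.items()
    }


if __name__ == "__main__":
    for label, files in list_outputs().items():
        print(f"{label}: {len(files)} file(s)")
```

A BSBolt run is expected to leave the Rastair entries empty and vice versa, so an empty list is only a warning sign for the mode that was actually run.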