Pipeline Development

Nextflow is an awesome workflow manager that lets you write a computational pipeline by chaining together many different tasks, even ones written in different programming languages. Nextflow makes parallelism easy and schedules jobs so you don’t have to! It also supports Docker containers and Conda environments to keep pipelines reproducible, and it can be easily adapted to run on different systems! Not convinced? Check out this intro.

Some of the genomics/data analysis pipelines I contribute to:

NemaScan and cegwas2-nf - Genome-wide association mapping and simulations
alignment-nf - A Nextflow pipeline for genome sequence alignment
trim-fq-nf - Performs FASTQ trimming to remove low-quality sequences and technical sequences such as adapters
concordance-nf - Pipeline to calculate genetic relatedness between strains
linkagemapping-nf - Perform QTL mapping with linkage mapping
nil-ril-nf - Pipeline to assign genotypes to near-isogenic lines and recombinant inbred lines (not wild isolates)
post-gatk-nf and annotation-nf - Annotate VCF and perform species-wide population genetics analyses
wi-gatk - Generate species-wide VCF using GATK haplotype-aware variant calling

💡 I also maintain the Andersen Lab dry guide, which details how to execute all of the above pipelines and contains tips and tricks for learning to code in R, the command line, and Nextflow.

Coding skills highlights (/5⭐):

General problem solving: ⭐⭐⭐⭐⭐
R: ⭐⭐⭐⭐⭐
R-Tidyverse: ⭐⭐⭐⭐⭐
Data wrangling: ⭐⭐⭐⭐⭐
Data visualization: ⭐⭐⭐⭐⭐
Git/GitHub: ⭐⭐⭐⭐
R-Shiny Web App: ⭐⭐⭐⭐
Nextflow: ⭐⭐⭐⭐
Command line scripting: ⭐⭐⭐⭐
Conda environments: ⭐⭐⭐⭐
High Performance Cluster Computing: ⭐⭐⭐⭐
Docker/Singularity images: ⭐⭐⭐
Google Cloud Platform: ⭐⭐⭐
Python: ⭐⭐
Java/JavaScript: ⭐