A lot has happened since we last wrote about how best to learn Nextflow, over a year ago. Several new resources have been released including a new Nextflow Software Carpentries course and an excellent write-up by 23andMe.
We have collated some links below from a diverse collection of resources to help you on your journey to learn Nextflow. Nextflow is a community-driven project - if you have any suggestions, please make a pull request to this page on GitHub.
Without further ado, here is the definitive guide for learning Nextflow in 2022. These resources will support anyone in the journey from total beginner to Nextflow expert.
Before you start writing Nextflow pipelines, we recommend that you are comfortable using the command line and understand the basic concepts of scripting languages such as Python or Perl. Nextflow is widely used for bioinformatics applications and scientific data analysis, and the examples and guides below often focus on these areas. However, Nextflow has now been adopted in a number of other data-intensive domains such as image analysis, machine learning, astronomy and geoscience.
We estimate that it will take at least 20 hours to complete the material. How quickly you finish will depend on your background and how deep you want to dive into the content. Most of the content is introductory but there are some more advanced dataflow and configuration concepts outlined in the workshop and pattern sections.
Nextflow is an open-source workflow framework for writing and scaling data-intensive computational pipelines. It is designed around the Linux philosophy of simple yet powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. Combined with support for containerization, support for major cloud providers and on-premise architectures, Nextflow simplifies the writing and deployment of complex data pipelines on any infrastructure.
The following are some high-level motivations on why people choose to adopt Nextflow:
This informative post begins with the basic concepts of Nextflow and builds towards how Nextflow is used at 23andMe. It includes a detailed use case for how 23andMe run their imputation pipeline in the cloud, processing over 1 million individuals per day with over 10,000 CPUs in a single compute environment.
Nextflow at 23andMe
This hands-on tutorial from Seqera Labs will guide you through implementing a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, using channels to pass data around and writing processes to perform tasks. It includes all scripts, input data and resources and is perfect for getting a taste of Nextflow.
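To give a flavour of the three concepts mentioned above, here is a minimal sketch in Nextflow DSL2. It is not taken from the tutorial itself: the parameter `params.greeting`, the process name `SAY_HELLO` and the input values are made up for illustration.

```groovy
// main.nf: a minimal DSL2 sketch. All names here are illustrative only.
nextflow.enable.dsl = 2

// A pipeline parameter with a default value.
// It can be overridden on the command line, e.g. --greeting 'Hi'
params.greeting = 'Hello'

// A process wraps a task; here it runs a single shell command per input value.
process SAY_HELLO {
    input:
    val name

    output:
    stdout

    script:
    """
    echo "${params.greeting}, ${name}"
    """
}

workflow {
    // A channel carries data between processes; this one emits two values.
    names_ch = Channel.of('world', 'Nextflow')

    // The process runs once per value; view() prints each task's output.
    SAY_HELLO(names_ch).view()
}
```

Assuming Nextflow is installed, this could be run with `nextflow run main.nf --greeting 'Hi'`. Tasks run in parallel, so the order of the printed lines is not guaranteed.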
Here you'll dive deeper into Nextflow's most prominent features and learn how to apply them. The full workshop includes an excellent section on containers, how to build them and how to use them with Nextflow. The written materials come with examples and hands-on exercises. Optionally, you can also follow along with a series of videos from a live training workshop.
The workshop includes topics on:
Workshop & YouTube playlist.
The Nextflow Software Carpentry workshop (in active development) motivates the use of Nextflow and nf-core as development tools for building and sharing reproducible data science workflows. The intended audience is those with little programming experience, and the course provides a foundation to comfortably write and run Nextflow and nf-core workflows. Adapted from the Seqera training material above, the workshop has been updated by Software Carpentries instructors within the nf-core community to fit The Carpentries style of training. The Carpentries emphasize feedback to improve teaching materials, so we would like to hear from you about what you thought was well explained and what needs improvement. Pull requests to the course material are very welcome.
The workshop can be opened on Gitpod where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.
You can find the course in The Carpentries incubator.
This on-demand webinar features Phil Ewels from SciLifeLab and nf-core, Brendan Boufler from Amazon Web Services and Evan Floden from Seqera Labs. The wide-ranging discussion covers the significance of scientific workflows, examples of Nextflow in production settings and how Nextflow can be integrated with other processes.
Watch the webinar
This advanced section discusses recurring patterns and solutions to many common implementation requirements. Code examples are available with notes to follow along, as well as a GitHub repository.
Nextflow Patterns & GitHub repository.
A tutorial covering the basics of using and creating nf-core pipelines. It provides an overview of the nf-core framework including:
nf-core usage tutorials and nf-core developer tutorials
A collection of awesome Nextflow pipelines.
Awesome Nextflow on GitHub
The following resources will help you dig deeper into Nextflow and other related projects like the nf-core community, who maintain curated pipelines and a very active Slack channel. There are plenty of Nextflow tutorials and videos online, and the following list is in no way exhaustive. Please let us know if we are missing anything.
The reference for the Nextflow language and runtime. These docs should be your first point of reference while developing Nextflow pipelines. The newest features are documented in the edge documentation pages, which are released every month, while the latest stable releases come out every three months.
Latest stable & edge documentation.
An index of documentation, deployment guides, training materials and resources for all things Nextflow and Tower.
Seqera Labs docs
nf-core is a growing community of Nextflow users and developers. You can find curated sets of biomedical analysis pipelines written in Nextflow and built by domain experts. Each pipeline is stringently reviewed and has been implemented according to best practice guidelines. Be sure to sign up to the Slack channel.
nf-core website and nf-core Slack
Nextflow Tower is a platform to easily monitor, launch and scale Nextflow pipelines on cloud providers and on-premise infrastructure. The documentation provides details on setting up compute environments, monitoring pipelines and launching them using the graphical web interface, the CLI or the API.
Nextflow Tower and user documentation.
A quickstart for deploying a genomics analysis environment on Amazon Web Services (AWS) cloud, using Nextflow to create and orchestrate analysis workflows and AWS Batch to run the workflow processes.
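As a rough idea of what this looks like on the Nextflow side, the configuration sketch below points task execution at AWS Batch. It is not part of the quickstart: the queue name, bucket and region are placeholders that would need to match resources in your own AWS account.

```groovy
// nextflow.config: a sketch only. Every value below is a placeholder.
process.executor = 'awsbatch'             // submit each task as an AWS Batch job
process.queue    = 'my-batch-queue'       // hypothetical AWS Batch job queue
workDir          = 's3://my-bucket/work'  // hypothetical S3 bucket for intermediate files
aws.region       = 'eu-west-1'            // region hosting the queue and the bucket
```

AWS Batch runs tasks inside containers, so each process in the pipeline also needs a container image defined.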
Nextflow on Azure requires at minimum two Azure services, Azure Batch and Azure Storage. Follow the guides below to set up both services on Azure, and to get your storage and batch account names and keys.
Azure Blog and GitHub repository.
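Once both services exist, the account names and keys go into the Nextflow configuration. The following is a minimal sketch, assuming the Azure Batch executor with automatically created pools. The bracketed values and the container name in `workDir` are placeholders, not real accounts.

```groovy
// nextflow.config: a sketch only. Replace the placeholders with your own values.
process.executor = 'azurebatch'        // run tasks on Azure Batch
workDir = 'az://my-container/work'     // hypothetical Blob Storage container for work files

azure {
    storage {
        accountName = '<YOUR STORAGE ACCOUNT NAME>'
        accountKey  = '<YOUR STORAGE ACCOUNT KEY>'
    }
    batch {
        location     = '<YOUR LOCATION>'            // Azure region of the Batch account
        accountName  = '<YOUR BATCH ACCOUNT NAME>'
        accountKey   = '<YOUR BATCH ACCOUNT KEY>'
        autoPoolMode = true                         // let Nextflow create a pool automatically
    }
}
```

Keys are secrets, so it is safer to keep a file like this out of version control.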
A step-by-step guide to launching Nextflow Pipelines in Google Cloud.
This Nextflow Tutorial - Variant Calling Edition has been adapted from the Nextflow Software Carpentry training material and Data Carpentry: Wrangling Genomics Lesson. Learners will have the chance to learn Nextflow and nf-core basics, to convert a variant-calling bash-script into a Nextflow workflow and to modularize the pipeline using DSL2 modules and sub-workflows.
The workshop can be opened on Gitpod where you can try the exercises in an online computing environment at your own pace, with the course material in another window alongside.
You can find the course in Nextflow Tutorial - Variant Calling Edition.
Special thanks to Mahesh Binzer-Panchal for reviewing the latest revision of this post and contributing the Software Carpentry workshop section.