Chapter 2 Basic concepts
Nextflow is a reactive workflow framework and a programming DSL that eases the writing of data-intensive computational pipelines.
It is designed around the idea that the Linux platform is the lingua franca of data science. Linux provides many simple but powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations.
Nextflow extends this approach, adding the ability to define complex program interactions and a high-level parallel computational environment based on the dataflow programming model.
2.1 Nextflow scripting
The Nextflow scripting language is an extension of the Groovy programming language. Groovy is a powerful programming language for the Java virtual machine. The Nextflow syntax has been specialized to ease the writing of computational pipelines in a declarative manner.
Nextflow can execute any piece of Groovy code or use any library for the JVM platform.
For example:
println "Hello, World!" // #1
x = 1 // #2
println x
x = new java.util.Date() // #2
println x
x = -3.1499392 // #2
println x
x = false // #2
println x
x = "Hi" // #2
println x
myList = [1776, -1, 33, 99, 0, 928734928763] // #3
println myList
square = { it * it } // #4
println square(9)
printMapClosure = { key, value ->
    println "$key = $value"
} // #4
map_example = [ "Yue" : "Wu", "Mark" : "Williams", "Sudha" : "Kumari" ] // #5
map_example.each(printMapClosure) // #6
- #1: To print something, call one of the print or println methods.
- #2: To define a variable, simply assign a value to it. The same variable can later hold a value of a different type.
- #3: A List object is defined by placing the list items in square brackets.
- #4: A closure is a block of code that can be passed as an argument to a function. Thus, you can define a chunk of code and then pass it around as if it were a string or an integer.
- #5: Maps are used to store associative arrays or dictionaries. They are collections of heterogeneous, named data. A Groovy map literal creates a LinkedHashMap, so entries are iterated in insertion order.
- #6: The method Map.each() can take a closure with two arguments, to which it binds the key and the associated value for each key-value pair in the Map.
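The points about closures and maps can be tried in isolation. The following is a minimal sketch in plain Groovy, not part of the original script: applyToAll is a hypothetical helper introduced only to show a closure being passed as an argument, and the map reuses the names from the example above.

```groovy
// Hypothetical helper: applies the closure `op` to every element of `items`.
def applyToAll(List items, Closure op) {
    items.collect(op)
}

// A closure stored in a variable and passed around like any other value.
def square = { it * it }
def squares = applyToAll([1, 2, 3], square)
println squares

// Map.each() with a two-argument closure receives the key and the value.
def surnames = ["Yue": "Wu", "Mark": "Williams", "Sudha": "Kumari"]
def lines = []
surnames.each { key, value -> lines << "$key = $value".toString() }
println lines.join('\n')
```

Because a map literal is backed by a LinkedHashMap, the three lines are printed in the order in which the entries were written.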
To test this, create a file called main_1.nf using your favorite editor (I use nano). Copy and paste the script above, save the file (Ctrl+O, then Enter), and exit (Ctrl+X).
Run the following command:
nextflow run main_1.nf
Groovy scripts can do much more than this. For details, see the Nextflow scripting documentation.
2.2 Processes and channels
In practice, a Nextflow pipeline script is made by joining together different processes. Each process can be written in any scripting language that can be executed by the Linux platform (Bash, Perl, Ruby, Python, etc.).
Processes are executed independently and are isolated from each other, i.e. they do not share a common (writable) state. The only way they can communicate is via asynchronous FIFO queues, called channels in Nextflow.
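The idea of isolated tasks that communicate only through a FIFO queue can be illustrated in plain Groovy. This is an analogy under stated assumptions, not how Nextflow implements channels: a LinkedBlockingQueue stands in for a channel, two threads stand in for processes, and endMarker is a hypothetical end-of-stream signal.

```groovy
import java.util.concurrent.LinkedBlockingQueue

// Stand-in for a channel: a thread-safe first-in, first-out queue.
def channel = new LinkedBlockingQueue()
def endMarker = new Object()   // hypothetical end-of-stream signal
def received = []

// Producer "process": emits three values, then the end marker.
def producer = Thread.start {
    [1, 2, 3].each { channel.put(it) }
    channel.put(endMarker)
}

// Consumer "process": takes values in arrival order until the marker shows up.
def consumer = Thread.start {
    while (true) {
        def item = channel.take()   // blocks until a value is available
        if (item.is(endMarker)) break
        received << item * item
    }
}

producer.join()
consumer.join()
println received
```

The producer never touches the consumer's data directly. Everything it wants to hand over goes through the queue, and the consumer sees the values in the order they were put.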
Any process can define one or more channels as input and output. The interaction between these processes, and ultimately the pipeline execution flow itself, is defined by the workflow declaration.
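A small pipeline along these lines might look as follows. This is a sketch rather than a script from this chapter: the process names sayHello and shout, their inputs and their commands are invented for illustration, and it assumes a Nextflow release in which DSL2 is the default and a python3 interpreter on the PATH.

```nextflow
// Illustrative sketch: process names and commands are hypothetical.

// First process, written in Bash: turns each name into a greeting.
process sayHello {
    input:
    val name

    output:
    stdout

    script:
    """
    echo "Hello, ${name}"
    """
}

// Second process, written in Python: upper-cases each greeting.
process shout {
    input:
    val greeting

    output:
    stdout

    script:
    """
    #!/usr/bin/env python3
    print('''${greeting}'''.strip().upper())
    """
}

// The workflow declaration connects the processes through channels.
workflow {
    sayHello(Channel.of('Yue', 'Mark', 'Sudha'))
    shout(sayHello.out)
    shout.out.view()
}
```

Each value emitted by the input channel triggers one independent sayHello task, and each greeting triggers one shout task. Since the tasks run in parallel, the order in which the results are printed is not guaranteed.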