Chapter 7 Configuration file

When a pipeline script is launched, Nextflow looks for configuration files in multiple locations. Since each configuration file can contain conflicting settings, the sources are ranked to decide which settings are applied. All possible configuration sources are listed below, in order of priority:

  1. Parameters specified on the command line (--something value)

  2. Parameters provided using the -params-file option

  3. Config file specified using the -c my_config option

  4. The config file named nextflow.config in the current directory

  5. The config file named nextflow.config in the workflow project directory

  6. The config file $HOME/.nextflow/config

  7. Values defined within the pipeline script itself (e.g. main.nf)
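For instance, suppose the nextflow.config in the current directory defines a hypothetical parameter foo (the name and value here are placeholders, just to illustrate precedence):

// nextflow.config
params.foo = 'from config'

Launching the pipeline with nextflow run main.nf --foo bar makes params.foo evaluate to bar inside the script, since the command line (source 1) outranks the local nextflow.config (source 4).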

Writing a config file is easy: it is just a text file in which you assign variables. These variables are then readily available for use in the pipeline. Comments are written using //:

propertyOne = 'world'
anotherProp = "Hello $propertyOne"
customPath = "$PATH:/my/app/folder"
// comment!

7.1 Config scopes

Configuration settings can be organized in different scopes by dot prefixing the property names with a scope identifier or grouping the properties in the same scope using the curly brackets notation.

alpha.x  = 1
alpha.y  = 'string value..'

beta {
     p = 2
     q = 'another string ..'
}

There are many important default scopes in Nextflow; the full list is available in the Nextflow documentation.

7.1.1 Scope params

The params scope allows you to define parameters that will be accessible in the pipeline script. Simply prefix the parameter names with the params scope or group them within curly brackets.

params {
    mzMLFilesInput = '/crex/proj/uppmax2024-2-11/metabolomics/mzMLData/*.mzML' // #1
    ppm_input = 10 // #2
}

In the example above, we define two parameters:

  1. mzMLFilesInput, which points to the location of our input files.
  2. ppm_input, which is used in a bash script.

We can then use these two parameters anywhere inside our pipeline. For example:

mzMLFiles = Channel.fromPath( params.mzMLFilesInput ) // #1

process featureFinder {
    debug true

    input:
    file x

    "echo processing $x with $params.ppm_input ppm" // #2
}

workflow {
    featureFinder(mzMLFiles)
}

In this example:

  1. We created a file channel using our parameter params.mzMLFilesInput.
  2. We accessed the value of ppm_input by prefixing the parameter name with params. (interpolated as $params.ppm_input).

Try the above code! Create two files, main_12.nf and nextflow_12.config, and run the example!

A pipeline script can use an arbitrary number of parameters, each of which can be overridden either on the command line or in the Nextflow configuration file. Any script parameter can be specified on the command line by prefixing the parameter name with two dash characters, e.g.:

nextflow run <my script> --foo Hello

Then, the parameter can be accessed in the pipeline script using the params.foo identifier.
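As a minimal sketch of the script side (the parameter name foo and its default value are placeholders, not part of any real pipeline here):

params.foo = 'world' // default, used when --foo is not given on the command line

workflow {
    println "Hello, ${params.foo}!"
}

Running nextflow run main.nf --foo Nextflow prints Hello, Nextflow!, because the command-line value overrides the default.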

How would you run the above script so that ppm_input is sent to the pipeline from the command line?

nextflow run main_12.nf -c  nextflow_12.config --ppm_input 20

What happened to the default value of ppm_input?

7.1.2 Scope process

The process configuration scope allows you to provide the default configuration for the processes in your pipeline.

You can specify here any property described in the process directives and executor sections. For example:

process {
    executor = 'slurm' // #1
    clusterOptions = { "-A $params.project ${params.clusterOptions ?: ''}" } // #2
}

In the example above:

  1. We set the executor to slurm, so that all jobs will be sent to Slurm.
  2. We set some additional cluster options, for example -A for the project ID and clusterOptions for extra arguments.

7.1.3 Scope executor

The executor configuration scope allows you to set the optional executor settings. The executor settings can be defined as shown below:

executor {
    name = 'slurm'
    queueSize = 200
    clusterOptions = { "-A $params.project ${params.clusterOptions ?: ''}" }
}

When using two (or more) different executors in your pipeline, you can specify their settings separately by prefixing the executor name with the symbol $ and using it as a special scope identifier. For example:

executor {
    $slurm {
        queueSize = 200
        clusterOptions = { "-A $params.project ${params.clusterOptions ?: ''}" }
    }

    $local {
        cpus = 8
        memory = '32 GB'
    }
}

7.1.4 Configuration on UPPMAX

UPPMAX uses the Slurm job scheduler. We can use the process scope to instruct Nextflow to use Slurm.

Create a file called nextflow_13.config with the following content:

process.executor = 'slurm'
process.clusterOptions = { "-A $params.project ${params.clusterOptions ?: ''}"}

Create a file called main_13.nf with the following content:

// This is main_13.nf
mzMLFiles = Channel.fromPath( '/crex/proj/uppmax2024-2-11/metabolomics/mzMLData/*.mzML' )

process featureFinder {
    debug true

    input:
    file x

    "echo processing $x"
}

workflow {
    featureFinder(mzMLFiles)
}

You can now run the pipeline using

nextflow main_13.nf -c nextflow_13.config --project "uppmax2024-2-11" --clusterOptions "-M snowy"

The processes are now running on UPPMAX. OK! This is probably going to take some time. Use Ctrl+C to kill Nextflow, then cancel all of your jobs:

scancel -u $USER -M snowy

7.1.5 Containers

The docker configuration scope controls how Docker containers are executed by Nextflow.

For example:

process.container = 'nextflow/examples' // #1

docker {
    enabled = true // #2
}

In the example above:

  1. We first set process.container to the location of the Docker image, for example on Docker Hub.
  2. We enable Docker.

All the processes will then use the Docker container.
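The docker scope supports further options besides enabled. As an illustrative sketch (the extra flag shown is just an example, not a required setting), options can be passed through to the underlying docker run command:

docker {
    enabled = true
    runOptions = '-u $(id -u):$(id -g)' // e.g. run the container as the current user
}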

Similarly, we can use Singularity.

Please be aware that if you instruct Nextflow to use a Docker image together with Singularity, Nextflow will convert it into a Singularity image. This conversion takes time. To save you time, we have already done the conversion. To use what we have built, first run the following command:

export NXF_SINGULARITY_CACHEDIR=/crex/proj/uppmax2024-2-11/metabolomics/singularity

The above command will set the location of the image for Nextflow.
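Equivalently, assuming the same path, the cache location can also be set in the configuration file itself instead of via the environment variable:

singularity.cacheDir = '/crex/proj/uppmax2024-2-11/metabolomics/singularity'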

We can then create a file named nextflow_14.config with the following content:

singularity.enabled = true // #1
process.container = "bigdatacourse.sif" // #2

  1. Instructs Nextflow to use Singularity.
  2. Sets the name of the image.

We can now create another file main_14.nf with the following content:

mzMLFiles = Channel.fromPath( '/crex/proj/uppmax2024-2-11/metabolomics/mzMLData/*.mzML' )

process featureFinder {
    debug true

    input:
    file x

    "echo processing $x"
}

workflow {
    featureFinder(mzMLFiles)
}

Now we can run it using:

nextflow main_14.nf -c nextflow_14.config 

7.1.5.1 Configuration singularity on Uppmax

Can you combine Singularity and Slurm to run containers through Slurm? What config file would you create? How would you run the Nextflow pipeline? Remember to kill Nextflow, then cancel all of your jobs after you are done!

nextflow_15.config:

singularity.enabled = true
process.container = "bigdatacourse.sif"
process.executor = 'slurm'
process.clusterOptions = { "-A $params.project ${params.clusterOptions ?: ''}" }

main_15.nf:

mzMLFiles = Channel.fromPath( '/crex/proj/uppmax2024-2-11/metabolomics/mzMLData/*.mzML' )

process featureFinder {
    debug true

    input:
    file x

    "echo processing $x"
}

workflow {
    featureFinder(mzMLFiles)
}

Run it with:

nextflow main_15.nf -c nextflow_15.config --project "uppmax2024-2-11" --clusterOptions "-M snowy"

Now let us cancel all the jobs

scancel -u $USER --state=pending
scancel -u $USER -t running

7.1.6 Config profiles

Configuration files can contain the definition of one or more profiles. A profile is a set of configuration attributes that can be activated/chosen when launching a pipeline execution by using the -profile command line option.

Configuration profiles are defined using the special scope profiles, which groups the attributes belonging to the same profile under a common prefix.

profiles {

    standard { // #1
        process.executor = 'local' // #2
    }

    uppmax_singularity { // #3
        process.executor = 'slurm' // #4
        process.clusterOptions = { "-A $params.project ${params.clusterOptions ?: ''}" }
        singularity.enabled = true // #5
    }
}

Save the above in a file named nextflow.config in the same directory as main.nf.

In this example, we have created two profiles:

  1. standard uses the local executor (#2), meaning that jobs run on the local environment when this profile is activated.
  2. uppmax_singularity uses Slurm (#4) and Singularity (#5) for running jobs on UPPMAX.

Now if we want to run the pipeline on the local environment we use:

nextflow main.nf -profile standard

How would you run it if you wanted to run on UPPMAX?

nextflow main.nf -profile uppmax_singularity --project "uppmax2024-2-11" --clusterOptions "-M snowy"
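Note that multiple profiles can be combined by separating their names with a comma, with later profiles overriding earlier ones where settings overlap. Assuming a hypothetical extra profile named debug were defined in the config, the invocation would look like:

nextflow main.nf -profile standard,debug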