Process reference

This page lists the task properties, input/output methods, and directives available in process definitions.

Task properties

The following task properties are defined in the process body:

task.attempt

The current task attempt.

task.exitStatus

Available only in script: and shell: blocks

The exit code returned by the task script.

The exit code is only available after the task has been executed, for example when evaluating a dynamic errorStrategy directive.
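For example, the exit code can drive a dynamic errorStrategy; the exit statuses checked below are illustrative:

process hello {
    errorStrategy { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
    maxRetries 3

    script:
    """
    your_command --here
    """
}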

task.hash

Available only in exec: blocks

The task hash.

task.index

The process-level task index.

task.name

Available only in exec: blocks

The task name.

task.previousException

New in version 24.10.0.

The exception reported by the previous task attempt.

Since the exception is available after a failed task attempt, it can only be accessed when retrying a failed task execution, i.e., when task.attempt is greater than 1.

task.previousTrace

New in version 24.10.0.

The trace record associated with the previous task attempt.

Since the trace record is available after a failed task attempt, it can only be accessed when retrying a failed task execution, i.e., when task.attempt is greater than 1. See Trace file for a list of available fields.

Note

The trace fields %cpu and %mem can be accessed as pcpu and pmem, respectively.
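For example, the previous trace record can be used to scale task resources on retry. The following sketch assumes that the memory trace field holds the memory requested by the previous attempt:

process hello {
    errorStrategy 'retry'
    maxRetries 3
    memory { task.attempt == 1 ? 1.GB : task.previousTrace.memory * 2 }

    script:
    """
    your_command --here
    """
}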

task.process

The name of the process that spawned the task.

task.workDir

Available only in exec: blocks

The unique directory path for the task.

Note

Directive values for a task can be accessed via task.<directive>. See Using task directive values for more information.

Inputs and outputs (typed)

New in version 25.10.0.

Note

Typed processes require the nextflow.preview.types feature flag to be enabled in every script that uses them. The syntax and behavior may change in future releases.

Stage directives

The following directives can be used in the stage: section of a typed process:

env( name: String, String value )

Declares an environment variable with the specified name and value in the task environment.

stageAs( value: Path, filePattern: String )

Stages a file into the task directory under the given alias.

stageAs( value: Iterable<Path>, filePattern: String )

Stages a collection of files into the task directory under the given alias.

stdin( value: String )

Stages the given value as the standard input (i.e., stdin) to the task script.

Outputs

The following functions are available in the output: and topic: sections of a typed process:

env( name: String ) -> String

Returns the value of an environment variable from the task environment.

eval( command: String ) -> String

Returns the standard output of the specified command, which is executed in the task environment after the task script completes.

file( pattern: String, [options] ) -> Path

Returns a file from the task environment that matches the specified pattern.

Available options:

followLinks: Boolean

When true, target files are returned in place of any matching symlink (default: true).

glob: Boolean

When true, the file name is interpreted as a glob pattern (default: true).

hidden: Boolean

When true, hidden files are included in the matching output files (default: false).

includeInputs: Boolean

When true and the file name is a glob pattern, any input files matching the pattern are also included in the output (default: false).

maxDepth: Integer

Maximum number of directory levels to visit (default: no limit).

optional: Boolean

When true, the task will not fail if the given file is missing (default: false).

type: String

Type of paths returned, either file, dir or any (default: any, or file if the given file name contains a double star (**)).

files( pattern: String, [options] ) -> Set<Path>

Returns files from the task environment that match the given pattern.

Supports the same options as file() (except for optional).

stdout() -> String

Returns the standard output of the task script.

Inputs and outputs (legacy)

Inputs

val( identifier )

Declare a variable input. The received value can be any type, and it will be made available to the process body (i.e. script, shell, exec) as a variable given by identifier.

file( identifier | stageName )

Deprecated since version 19.10.0: Use path instead.

Declare a file input. The received value can be any type, and it will be staged into the task directory. If the received value is not a file or collection of files, it is implicitly converted to a string and written to a file.

The argument can be an identifier or string. If an identifier, the received value will be made available to the process body as a variable. If a string, the received value will be staged into the task directory under the given alias.

path( identifier | stageName )

Declare a file input. The received value should be a file or collection of files and will be staged into the task directory.

Tip

See Multiple input files for more information about accepting collections of files.

The argument can be an identifier or string. If an identifier, the received value will be made available to the process body as a variable. If a string, the received value will be staged into the task directory under the given alias.

Available options:

arity

New in version 23.10.0.

Specify the number of expected files. Can be a number, e.g. '1', or a range, e.g. '1..*'. If a task receives an invalid number of files for this path input, it will fail.

name

Specify how the file should be named in the task work directory. Can be a name or a pattern.

stageAs

Alias of name.
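For example, these options can be combined in a single path input declaration; the file pattern is illustrative:

input:
path('reads_*.fastq', arity: '1..*', name: 'inputs/*')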

env( name )

Declare an environment variable input. The received value should be a string, and it will be exported to the task environment as an environment variable given by name.

stdin

Declare a stdin input. The received value should be a string, and it will be provided as the standard input (i.e. stdin) to the task script. It can be declared at most once per process.

tuple( arg1, arg2, ... )

Declare a tuple input. Each argument should be an input declaration such as val, path, env, or stdin.

The received value should be a tuple with the same number of elements as the tuple declaration, and each received element should be compatible with the corresponding tuple argument. Each tuple element is treated the same way as if it were a standalone input.
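Taken together, a process can mix these input declarations; the tool and file names below are illustrative:

process align {
    input:
    path genome
    env BATCH_ID
    tuple val(sample_id), path(reads)

    script:
    """
    echo "aligning $sample_id in batch \$BATCH_ID"
    your_aligner --ref $genome --reads $reads
    """
}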

Outputs

val( value )

Declare a variable output. The argument can be any value, and it can reference any output variables defined in the process body (i.e. variables declared without the def keyword).

file( pattern )

Deprecated since version 19.10.0: Use path instead.

Declare a file output. It receives the output files from the task environment that match the given pattern.

Multiple patterns can be specified using the colon separator (:). The union of all files matched by each pattern will be collected.

path( pattern, [options] )

Declare a file output. It receives the output files from the task environment that match the given pattern.

Available options:

arity

New in version 23.10.0.

Specify the number of expected files. Can be a number or a range. If a task produces an invalid number of files for this path output, it will fail.

If the arity is 1, a single file will be emitted. Otherwise, a list will always be emitted, even if only one file is produced.

Warning

If the arity is not specified, a single file or a list will be emitted based on whether one or multiple files are produced at runtime, potentially resulting in an output channel that mixes single files and file collections.

followLinks

When true, target files are returned in place of any matching symlink (default: true).

glob

When true, the specified name is interpreted as a glob pattern (default: true).

hidden

When true, hidden files are included in the matching output files (default: false).

includeInputs

When true and the output path is a glob pattern, any input files matching the pattern are also included in the output (default: false).

maxDepth

Maximum number of directory levels to visit (default: no limit).

type

Type of paths returned, either file, dir or any (default: any, or file if the specified file name pattern contains a double star (**)).

env( name )

Declare an environment variable output. It receives the value of the environment variable (given by name) from the task environment.

Changed in version 24.04.0: Prior to this version, if the environment variable contained multiple lines of output, the output would be compressed to a single line by converting newlines to spaces.

stdout

Declare a stdout output. It receives the standard output of the task script.

eval( command )

New in version 24.04.0.

Declare an eval output. It receives the standard output of the given command, which is executed in the task environment after the task script.

If the command fails, the task will also fail.

tuple( arg1, arg2, ... )

Declare a tuple output. Each argument should be an output declaration such as val, path, env, stdin, or eval. Each tuple element is treated the same way as if it were a standalone output.
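Likewise, a process can mix these output declarations; the tool and file names below are illustrative:

process align {
    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path('*.bam')
    env ALIGN_MODE
    eval('your_aligner --version')

    script:
    """
    export ALIGN_MODE=local
    your_aligner --reads $reads > ${sample_id}.bam
    """
}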

Generic options

The following options are available for all process outputs:

emit: <name>

Defines the name of the output channel.

optional: true | false

When true, the task will not fail if the specified output is missing (default: false).

topic: <name>

New in version 25.04.0.

Send the output to a topic channel with the given name.
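For example, these options can be attached to individual output declarations; the file names are illustrative:

process hello {
    output:
    path 'result.txt', emit: result
    path 'extra.txt', emit: extra, optional: true
    path 'versions.yml', topic: versions

    script:
    """
    echo ok > result.txt
    echo 'hello: 1.0' > versions.yml
    """
}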

Directives

accelerator

The accelerator directive defines the number of hardware accelerators (e.g. GPUs) required by each task execution. For example:

process hello {
    accelerator 4, type: 'nvidia-tesla-k80'

    script:
    """
    your_gpu_enabled --command --line
    """
}

The above example requests 4 GPUs of type nvidia-tesla-k80 for each task.

Note

This directive is only used by certain executors. Refer to the Executors page to see which executors support this directive.

Note

Additional options may be required to fully enable the use of accelerators. When using containers with GPUs, you must pass the GPU drivers through to the container. For Docker, this requires the option --gpus all in the docker run command. For Apptainer/Singularity, this requires the option --nv. The specific implementation details depend on the accelerator and container type being used.

The following options are available:

request: Integer

The number of requested accelerators.

Specifying this directive with a number (e.g., accelerator 4) is equivalent to the request option (e.g., accelerator request: 4).

type: String

The accelerator type.

The meaning of this option depends on the target execution platform. See the platform-specific documentation for more information about the available accelerators:

This option is not supported for AWS Batch. You can control the accelerator type indirectly through the allowed instance types in your Compute Environment. See the AWS Batch FAQs for more information.

afterScript

The afterScript directive executes a custom (Bash) snippet immediately after the main process has run. This may be useful to clean up your staging area.

When combined with the container directive, the afterScript is executed outside the specified container. In other words, the afterScript is always executed in the host environment.
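For example, an afterScript can clean up temporary files created by the task; the directory name is illustrative:

process hello {
    afterScript 'rm -rf ./scratch'

    script:
    """
    mkdir scratch
    your_command --tmpdir ./scratch --here
    """
}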

arch

The arch directive defines the CPU architecture for which to build the software used by the process. For example:

process blast {
    spack 'blast-plus@2.13.0'
    arch 'linux/x86_64', target: 'cascadelake'

    script:
    """
    blastp -query input_sequence -num_threads ${task.cpus}
    """
}

The example above declares that the CPU generic architecture is linux/x86_64 (X86 64 bit), and more specifically that the microarchitecture is cascadelake (a specific generation of Intel CPUs).

This directive is currently used by the following Nextflow functionalities:

  • by the spack directive, to build microarchitecture-optimized applications;

  • by the Wave containers service, to build containers for one of the generic families of CPU architectures (see below);

  • by the spack strategy within Wave containers, to optimize the container builds for specific CPU microarchitectures.

Allowed values for the arch directive are as follows, grouped by equivalent family (choices available for the sake of compatibility):

  • X86 64 bit: linux/x86_64, x86_64, linux/amd64, amd64

  • ARM 64 bit: linux/aarch64, aarch64, linux/arm64, arm64, linux/arm64/v8

  • ARM 64 bit, older generation: linux/arm64/v7

Examples of values for the architecture target option are cascadelake, icelake, zen2 and zen3. See the Spack documentation for the full and up-to-date list of meaningful targets.

array

New in version 24.04.0.

The array directive submits tasks as job arrays for executors that support it.

A job array is a collection of jobs with the same resource requirements and the same script (parameterized by an index). Job arrays incur significantly less scheduling overhead compared to individual jobs, and as a result they are preferred by HPC schedulers where possible.

The directive should be specified with a given array size, along with an executor that supports job arrays. For example:

process hello {
    executor 'slurm'
    array 100

    script:
    """
    your_command --here
    """
}

Nextflow currently supports job arrays for the following executors:

A process using job arrays collects tasks and submits each batch as a job array when it is ready. Any “leftover” tasks are submitted as a partial job array.

Once a job array is submitted, each “child” task is executed as an independent job. Any tasks that fail (and can be retried) are retried without interfering with the tasks that succeeded. Retried tasks are submitted individually rather than through a job array, in order to allow for the use of dynamic resources.

The following directives must be uniform across all tasks in a process that uses job arrays, because these directives are specified once for the entire job array:

For cloud-based executors like AWS Batch, or when using Fusion with any executor, the following additional directives must be uniform:

When using Wave, the following additional directives must be uniform:

beforeScript

The beforeScript directive executes a custom (Bash) snippet before the main process script is run. This may be useful to initialize the underlying cluster environment or for other custom initialization.

For example:

process hello {
    beforeScript 'source /cluster/bin/setup'

    script:
    """
    echo 'hello'
    """
}

When the process is containerized (using the container directive), the beforeScript is executed in the container only if the executor is container-native (e.g. cloud batch executors, Kubernetes). Otherwise, the beforeScript is executed outside the container.

cache

The cache directive controls whether and how task executions are cached.

By default, cached task executions are re-used when the pipeline is launched with the resume option. The cache directive can be used to disable caching for a specific process:

process hello {
    cache false

    // ...
}

See Caching and resuming for more information.

The following options are available:

false

Disable caching.

true (default)

Enable caching. Input file metadata (name, size, last updated timestamp) are included in the cache keys.

'deep'

Enable caching. Input file content is included in the cache keys.

'lenient'

Enable caching. Minimal input file metadata (name and size only) are included in the cache keys.

This strategy provides a workaround for incorrect caching invalidation observed on shared file systems due to inconsistent file timestamps.
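For example, the lenient strategy can be applied to all processes in the Nextflow configuration:

process.cache = 'lenient'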

clusterOptions

The clusterOptions directive specifies additional submission options for grid executors. You can use it to specify options for your cluster that are not supported directly by other process directives.

The cluster options can be a string:

process hello {
    clusterOptions '-x 1 -y 2'

    // ...
}

Changed in version 24.04.0: Prior to this version, grid executors that require each option to be on a separate line in the job script would attempt to split multiple options using a variety of different conventions. Multiple options can now be specified more clearly using a string list as shown below.

The cluster options can also be a string list:

process hello {
    clusterOptions '-x 1', '-y 2', '--flag'

    // ...
}

Grid executors that require one option per line will write each option to a separate line, while grid executors that allow multiple options per line will write all options to a single line, the same as with a string. This form is useful to control how the options are split across lines when it is required by the scheduler.

Note

This directive is only used by grid executors. Refer to the Executors page to see which executors support this directive.

Warning

While you can use the clusterOptions directive to specify options that are supported as process directives (queue, memory, time, etc), you should not use both at the same time, as it will cause undefined behavior. Most HPC schedulers will either fail or simply ignore one or the other.

conda

The conda directive defines the set of Conda packages required by each task. For example:

process hello {
    conda 'bwa=0.7.15'

    script:
    """
    your_command --here
    """
}

Nextflow automatically creates an environment for each unique set of Conda packages.

The name of the desired channel for a specific package can be specified using the standard Conda notation, e.g. bioconda::bwa=0.7.15. Multiple packages can be specified separating them with a blank space, e.g. bwa=0.7.15 fastqc=0.11.5.

The conda directive can also accept a Conda environment file path or the path of an existing Conda environment. See Conda environments for more information.

container

The container directive defines the container required by each task. For example:

process hello_docker {
    container 'busybox:latest'

    script:
    """
    your_command --here
    """
}

The corresponding container runtime (e.g. Docker, Singularity) should be running on the compute nodes where tasks are executed. See Containers for the container runtimes supported by Nextflow.

Note

This directive is ignored by native processes (i.e. exec processes).

containerOptions

The containerOptions directive specifies additional container options for the underlying container runtime (e.g. Docker, Singularity). For example:

process hello_docker {
    container 'busybox:latest'
    containerOptions '--volume /data/db:/db'

    output:
    path 'output.txt'

    script:
    """
    your_command --data /db > output.txt
    """
}

The above example provides a custom volume mount for a specific process.

Warning

This directive is not supported by the Kubernetes executor.

cpus

The cpus directive defines the number of CPUs required by each task execution. For example:

process blast {
    cpus 8

    script:
    """
    blastp -query input_sequence -num_threads ${task.cpus}
    """
}

This directive is required for tasks that execute multi-process or multi-threaded commands and tools, and it is meant to reserve enough CPUs when a pipeline task is executed through a cluster resource manager.

See also: disk, memory, time, queue, Dynamic task resources

debug

The debug directive prints the standard output of each task to the pipeline standard output.

For example:

process hello {
    debug true

    script:
    """
    echo Hello
    """
}

Prints:

Hello

Removing the debug directive or setting it to false in the above example will cause Hello to not be printed.

disk

The disk directive defines the amount of disk storage required by each task execution. For example:

process hello {
    disk 2.GB

    script:
    """
    your_command --here
    """
}

The following suffixes can be used to specify disk values:

  • B: Bytes

  • KB: Kilobytes

  • MB: Megabytes

  • GB: Gigabytes

  • TB: Terabytes

See MemoryUnit for more information.

Note

The disk directive is only used by certain executors. Refer to the Executors page to see which executors support this directive.

See also: cpus, memory, time, queue, Dynamic task resources

errorStrategy

The errorStrategy directive defines how to handle task failures.

A task failure occurs when the executed script returns a non-zero exit code. By default, the pipeline run is aborted.

The following error strategies are available:

'terminate' (default)

When a task fails, terminate the pipeline immediately and report an error. Pending and running jobs are killed.

'finish'

When a task fails, wait for submitted and running tasks to finish and then terminate the pipeline, reporting an error.

'ignore'

When a task fails, ignore it and continue the pipeline execution. If the workflow.failOnIgnore config option is set to true, the pipeline will report an error (i.e. return a non-zero exit code) upon completion. Otherwise, the pipeline will complete successfully.

See the workflow namespace for more information.

'retry'

When a task fails, retry it.

When the errorStrategy directive is set to ignore, the process doesn’t stop on an error condition; it simply reports a message notifying you of the error event.

For example:

process hello {
    errorStrategy 'ignore'

    // ...
}

In this case, the workflow will complete successfully and return an exit status of 0. However, if you set workflow.failOnIgnore = true in your Nextflow configuration, the workflow will return a non-zero exit status and report the failed tasks as an error.

The retry error strategy retries failed tasks. For example:

process hello {
    errorStrategy 'retry'

    // ...
}

The number of times a failing process is re-executed is defined by the maxRetries and maxErrors directives.

Tip

More complex strategies depending on the task exit status or other parametric values can be defined using a dynamic errorStrategy. See Dynamic directives for details.
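For instance, a dynamic errorStrategy can choose a strategy based on the task exit status; the exit status shown is illustrative:

process hello {
    errorStrategy { task.exitStatus == 140 ? 'retry' : 'finish' }
    maxRetries 2

    // ...
}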

See also: maxErrors, maxRetries, Dynamic task resources

executor

The executor directive defines the underlying system where tasks are executed. For example:

process hello {
    executor 'slurm'

    // ...
}

Commonly used executors include:

Each executor supports additional configuration options under the executor config scope. See Executors for more information.

ext

The ext directive is a generic namespace for user-defined properties. For example:

process star {
    container "biocontainers/star:${task.ext.version}"

    input:
    path genome
    tuple val(sampleId), path(reads)

    script:
    """
    STAR --genomeDir $genome --readFilesIn $reads ${task.ext.args ?: ''}
    """
}

In the above example, the process container version is controlled by ext.version, and the script supports additional command line arguments through ext.args.

The ext directive can be set in the process definition:

process hello {
    ext version: '2.5.3', args: '--alpha --beta'

    // ...
}

Or in the Nextflow configuration:

process.ext.version = '2.5.3'
process.ext.args = '--alpha --beta'

fair

New in version 23.04.0.

The fair directive, when enabled, guarantees that process outputs will be emitted in the order in which they were received. For example:

process hello {
    fair true

    input:
    val x

    output:
    tuple val(task.index), val(x)

    script:
    """
    sleep \$((RANDOM % 3))
    """
}

workflow {
    channel.of('A','B','C','D') | hello | view
}

The above example produces:

[1, A]
[2, B]
[3, C]
[4, D]

label

The label directive attaches a custom label to the process. For example:

process hello {
    label 'big_mem'

    script:
    """
    your_command --here
    """
}

A label may contain alphanumeric characters or _. It must start and end with an alphabetic character.

The same label can be applied to multiple processes. Multiple labels can be applied to the same process by using the label directive multiple times.

Process labels are used to apply shared process configuration via withLabel selectors. They are not recorded in execution logs, trace reports, or lineage metadata. See Process selectors for more information.

Note

To tag individual task executions for logging and debugging, use tag. To tag cloud computing resources for cost tracking, use resourceLabels. To attach metadata labels to output files for lineage tracking, use the label output directive in the output block.

machineType

The machineType directive can be used to specify a predefined Google Compute Platform machine type when using Google Batch, or when using auto-pools with Azure Batch.

For example:

process hello {
    machineType 'n1-highmem-8'

    script:
    """
    your_command --here
    """
}

See also: cpus, memory

maxErrors

The maxErrors directive defines the maximum number of task failures allowed for a process when using the retry error strategy. For example:

process hello {
    errorStrategy 'retry'
    maxErrors 5

    script:
    """
    echo 'do this as that .. '
    """
}

In the above example, the run will fail if the hello process accrues more than 5 failures across all of its task executions.

By default, there is no limit. However, the run can still fail if an individual task exceeds the number of retries allowed by the maxRetries directive.

See also: errorStrategy, maxRetries

maxForks

The maxForks directive defines the maximum number of concurrent task executions for a process. For example:

process hello {
    maxForks 1

    script:
    """
    your_command --here
    """
}

The above example forces the hello process to execute tasks sequentially.

By default, there is no limit. However, the number of concurrent tasks can still be limited globally by the number of CPUs (for local tasks) and the executor.queueSize config option.

maxRetries

The maxRetries directive defines the maximum number of times a task can be retried when using the retry error strategy. For example:

process hello {
    errorStrategy 'retry'
    maxRetries 3

    script:
    """
    echo 'do this as that .. '
    """
}

In the above example, the run will fail if any task executed by hello fails more than three times.

By default, only one retry per task is allowed. However, the run can still fail if the total number of failures for the process exceeds the number allowed by the maxErrors directive.

See also: errorStrategy, maxErrors

maxSubmitAwait

The maxSubmitAwait directive defines how long a task can remain in the submission queue without being executed. Tasks that exceed this duration will fail.

It can be used with the retry error strategy to re-submit tasks to a different queue or with different resource requirements. For example:

process hello {
    errorStrategy 'retry'
    maxSubmitAwait 10.m
    maxRetries 3
    queue "${task.submitAttempt==1 ? 'spot-compute' : 'on-demand-compute'}"

    script:
    """
    your_command --here
    """
}

In the above example, each task is submitted to the spot-compute queue on the first attempt (task.submitAttempt==1). If a task remains in the queue for more than 10 minutes, it fails and is re-submitted to the on-demand-compute queue.

memory

The memory directive defines how much memory is required by each task execution. For example:

process hello {
    memory 2.GB

    script:
    """
    your_command --here
    """
}

The following suffixes can be used to specify memory values:

  • B: Bytes

  • KB: Kilobytes

  • MB: Megabytes

  • GB: Gigabytes

  • TB: Terabytes

See MemoryUnit for more information.

See also: cpus, disk, time, queue, Dynamic task resources

module

The module directive defines the set of Environment Modules required by each task, if supported by your compute environment. For example:

process blast {
    module 'ncbi-blast/2.2.27'

    script:
    """
    blastp -query <etc..>
    """
}

Multiple modules can be specified using the : separator:

process blast {
    module 'ncbi-blast/2.2.27:t_coffee/10.0:clustalw/2.1'

    script:
    """
    blastp -query <etc..>
  """
}

penv

The penv directive defines the parallel environment to use when submitting tasks to the SGE resource manager. For example:

process blast {
    cpus 4
    penv 'smp'
    executor 'sge'

    script:
    """
    blastp -query input_sequence -num_threads ${task.cpus}
    """
}

Refer to your cluster documentation or your system administrator to determine whether this feature is supported in your environment.

pod

The pod directive defines pod-specific settings, such as environment variables, secrets, and config maps, when using the Kubernetes executor.

For example:

process echo {
    pod env: 'MESSAGE', value: 'hello world'

    script:
    """
    echo $MESSAGE
    """
}

The above snippet defines an environment variable named MESSAGE whose value is 'hello world'.

Pod settings can be specified in Nextflow configuration:

// single setting
process.pod = [env: 'MESSAGE', value: 'hello world']

// multiple settings
process.pod = [
    [env: 'MESSAGE', value: 'hello world'],
    [secret: 'my-secret/key1', mountPath: '/etc/file.txt']
]

The following options are available:

affinity: <config>

Specifies the pod affinity with the given configuration.

annotation: '<name>', value: '<value>'

Can be specified multiple times

Defines a pod annotation with the given name and value.

automountServiceAccountToken: true | false

Specifies whether to auto-mount the service account token into the pod (default: true).

config: '<configMap>/<key>', mountPath: '</absolute/path>'

Can be specified multiple times

Mounts a ConfigMap with name and optional key to the given path. If the key is omitted, the path is interpreted as a directory and all entries in the ConfigMap are exposed in that path.

csi: '<config>', mountPath: '</absolute/path>'

Can be specified multiple times

Mounts a CSI ephemeral volume with the given configuration to the given path.

emptyDir: <config>, mountPath: '</absolute/path>'

Can be specified multiple times

Mounts an emptyDir with the given configuration to the given path.

env: '<name>', config: '<configMap>/<key>'

Can be specified multiple times

Defines an environment variable whose value is defined by the given ConfigMap and key.

env: '<name>', fieldPath: '<fieldPath>'

Can be specified multiple times

Defines an environment variable whose value is defined by the given field path value.

For example, the following pod option:

process.pod = [env: 'MY_NODE_NAME', fieldPath: 'spec.nodeName']

Maps to the following pod spec:

env:
  - name: MY_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

env: '<name>', secret: '<secret>/<key>'

Can be specified multiple times

Defines an environment variable whose value is defined by the given Secret and key.

env: '<name>', value: '<value>'

Can be specified multiple times

Defines an environment variable with the given name and value.

hostPath: '/host/absolute/path', mountPath: '</pod/absolute/path>'

New in version 23.10.0.

Can be specified multiple times

Creates a hostPath volume and mounts it into the pod at the specified mountPath.

imagePullPolicy: 'IfNotPresent' | 'Always' | 'Never'

Specifies the image pull policy used by the pod to pull the container image.

imagePullSecret: '<name>'

Specifies the image pull secret used to access a private container image registry.

label: '<name>', value: '<value>'

Can be specified multiple times

Defines a pod label with the given name and value.

nodeSelector: <config>

Specifies the node selector with the given configuration.

The configuration can be a map or a string:

// map
process.pod = [nodeSelector: [disktype: 'ssd', cpu: 'intel']]

// string
process.pod = [nodeSelector: 'disktype=ssd,cpu=intel']

priorityClassName: '<name>'

Specifies the priority class name for pods.

privileged: true | false

Specifies whether the pod should run as a privileged container (default: false).

runAsUser: '<uid>'

Specifies the user ID with which to run the container. Shortcut for the securityContext option.

runtimeClassName: '<name>'

Specifies the runtime class.

schedulerName: '<name>'

Specifies which scheduler is used to schedule the container.

secret: '<secret>/<key>', mountPath: '</absolute/path>'

Can be specified multiple times

Mounts a Secret with name and optional key to the given path. If the key is omitted, the path is interpreted as a directory and all entries in the Secret are exposed in that path.

securityContext: <config>

Specifies the pod security context with the given configuration.

toleration: <config>

Can be specified multiple times

Specifies the pod toleration with the given configuration.

The configuration should be a map corresponding to a single toleration rule. For example, the following pod options:

process.pod = [
    [toleration: [key: 'key1', operator: 'Equal', value: 'value1', effect: 'NoSchedule']],
    [toleration: [key: 'key1', operator: 'Exists', effect: 'NoSchedule']],
]

Maps to the following pod spec:

tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
  - key: "key1"
    operator: "Exists"
    effect: "NoSchedule"

ttlSecondsAfterFinished

New in version 24.04.0.

Specifies the time-to-live (TTL) for finished jobs, in seconds. Applies to both successful and failed jobs.

volumeClaim: '<name>', mountPath: '</absolute/path>' [, subPath: '<path>', readOnly: true | false]

Can be specified multiple times

Mounts a PersistentVolumeClaim with the given name to the given path.

The subPath option can be used to mount a sub-directory of the volume instead of its root.

The readOnly option can be used to mount the volume as read-only (default: false).
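
For example, the following pod option (the claim name and paths are illustrative) mounts a sub-directory of a claim as read-only:

```nextflow
process.pod = [
    volumeClaim: 'shared-data',
    mountPath: '/data',
    subPath: 'results',
    readOnly: true
]
```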

publishDir

Note

Workflow outputs can be used instead of publishDir. See Migrating to workflow outputs to learn how to migrate existing code.

The publishDir directive publishes matching process output files to a target directory. For example:

process hello {
    publishDir '/data/chunks'

    output:
    path 'chunk_*'

    script:
    """
    printf 'Hola' | split -b 1 - chunk_
    """
}

The above example publishes the chunk_* output files into the /data/chunks directory.

Only files that match the declaration in the output block are published, not all the outputs of the process.

The publishDir directive can be specified more than once in order to publish output files to different target directories based on different rules.
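
For example, the following sketch (the file names are illustrative) publishes reports and logs to different directories using the pattern option described below:

```nextflow
process hello {
    publishDir '/data/reports', pattern: '*.html'
    publishDir '/data/logs', pattern: '*.log', mode: 'copy'

    output:
    path 'report.html'
    path 'run.log'

    script:
    """
    your_command --report report.html --log run.log
    """
}
```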

By default, files are published via symbolic link from the task directory to the target directory. Use the mode option to control this behavior:

process hello {
    publishDir '/data/chunks', mode: 'copy', overwrite: false

    output:
    path 'chunk_*'

    script:
    """
    printf 'Hola' | split -b 1 - chunk_
    """
}

Warning

Output files are published asynchronously after the task execution, so they may not be immediately available in the publish directory during the pipeline run. Downstream processes should access output files through the declared process outputs, not the publish directory.

Available options:

contentType

New in version 22.10.0.

Experimental: currently only supported for S3.

Allows specifying the media content type (MIME type) of the published file. If set to true, the content type is inferred from the file extension (default: false).

enabled

Enable or disable the publish rule depending on the boolean value specified (default: true).

failOnError

Changed in version 24.04.0: The default value was changed from false to true

When true, abort the execution if any file cannot be published to the specified target directory or bucket (default: true).

mode

The file publishing method. Can be one of the following values:

  • 'copy': Copies the output files into the publish directory.

  • 'copyNoFollow': Copies the output files into the publish directory without following symlinks, i.e. copies the links themselves.

  • 'link': Creates a hard link in the publish directory for each output file.

  • 'move': Moves the output files into the publish directory. Note: this should only be used for a terminal process, i.e. a process whose output is not consumed by any downstream process.

  • 'rellink': Creates a relative symbolic link in the publish directory for each output file.

  • 'symlink': Creates an absolute symbolic link in the publish directory for each output file (default).

overwrite

When true, any existing file in the target directory will be overwritten (default: true during normal pipeline execution, false when the pipeline execution is resumed).

path

Specifies the directory where files need to be published. Note: the syntax publishDir '/some/dir' is a shortcut for publishDir path: '/some/dir'.

pattern

Specifies a glob file pattern that selects which files to publish from the overall set of output files.

saveAs

A closure which, given the name of the file being published, returns the actual file name or full path where the file should be stored. This can be used to dynamically rename published files or change their destination directory. Return null from the closure to skip publishing a file, which is useful when the process has multiple output files but only some of them should be published.
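
For example, the following sketch (the file extension and sub-directory are illustrative) publishes only FASTA files, renaming them into a sub-directory:

```nextflow
process hello {
    publishDir '/results', mode: 'copy', saveAs: { filename ->
        // publish FASTA files under 'sequences/', skip everything else
        filename.endsWith('.fa') ? "sequences/${filename}" : null
    }

    output:
    path '*'

    script:
    """
    your_command --here
    """
}
```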

storageClass

New in version 23.04.0.

Experimental: currently only supported for S3.

Allows specifying the storage class to be used for the published file.

tags

Experimental: currently only supported for S3.

Allows associating arbitrary tags with the published file, e.g. tags: [MESSAGE: 'Hello world'].

queue

The queue directive defines the queue to which tasks should be submitted, for executors that support queues. For example:

process hello {
    queue 'long'
    executor 'slurm'

    script:
    """
    your_command --here
    """
}

Some executors can accept multiple queue names as a comma-separated string:

queue 'short,long,cn-el6'

However, this is not generally supported by cloud executors such as AWS Batch, Azure Batch, and Google Batch.

See Executors to see which executors support this directive.
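
Like most directives, queue can be evaluated dynamically with a closure, for example to fall back to a longer queue on retry (the queue names are illustrative):

```nextflow
process hello {
    executor 'slurm'
    errorStrategy 'retry'
    maxRetries 2
    // first attempt goes to the short queue, retries to the long queue
    queue { task.attempt == 1 ? 'short' : 'long' }

    script:
    """
    your_command --here
    """
}
```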

resourceLabels

The resourceLabels directive attaches custom name-value pairs to task executions, for executors that support it. For example:

process hello {
    resourceLabels region: 'some-region', user: 'some-username'

    script:
    """
    your_command --here
    """
}

Resource labels are attached to underlying resources such as cloud VMs, and are intended for operational purposes such as cost tracking. They are not recorded in lineage metadata.
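
Resource labels can also be set in the Nextflow configuration, for example to apply a common set of labels to all processes (the label keys and values are illustrative):

```nextflow
process.resourceLabels = [
    team: 'genomics',
    costCenter: 'cc-1234'
]
```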

Resource labels are currently supported by the following executors:

Note

The limits and the syntax of the corresponding executor should be taken into consideration when using resource labels.

New in version 23.10.0: Resource labels in Azure are added to auto-pools, rather than jobs, in order to facilitate cost analysis. A new pool will be created for each new set of resource labels. Therefore, it is recommended to also set azure.batch.deletePoolsOnCompletion = true when using process-specific resource labels.

See also: label (for shared process configuration), tag (for per-task identification)

resourceLimits

New in version 24.04.0.

The resourceLimits directive defines environment-specific limits for task resource requests.

Resource limits can be specified in a process:

process hello {
    resourceLimits cpus: 24, memory: 768.GB, time: 72.h

    script:
    """
    your_command --here
    """
}

Or in the Nextflow configuration:

process.resourceLimits = [
    cpus: 24,
    memory: 768.GB,
    time: 72.h
]

Resource limits can be defined for the following directives:

When a task resource request exceeds the corresponding limit, the task resources are automatically reduced to comply with these limits before the job is submitted.

Resource limits are a useful way to prevent tasks with dynamic resources from requesting more resources than can be provided by an executor (e.g. a task requests 32 cores but the largest node in the cluster has 24).
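
Resource limits combine naturally with dynamic retry resources: in the following sketch, the memory request grows with each attempt but is capped at the configured limit instead of exceeding what the environment can provide:

```nextflow
process hello {
    // requests 32 GB, 64 GB, then 96 GB capped to 64 GB by the limit
    memory { 32.GB * task.attempt }
    resourceLimits memory: 64.GB
    errorStrategy 'retry'
    maxRetries 3

    script:
    """
    your_command --here
    """
}
```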

scratch

The scratch directive executes each task in a temporary directory that is local to the compute node.

This is useful when executing tasks on an executor with a shared filesystem, because it decreases the network overhead of reading and writing files. Only the files declared as process outputs are copied to the pipeline work directory.

For example:

process hello {
    scratch true

    output:
    path 'data_out'

    script:
    """
    your_command --here
    """
}

It can also be specified in the Nextflow configuration:

process.scratch = true

By default, the scratch directive uses the $TMPDIR environment variable in the underlying node as the base scratch directory. If $TMPDIR is not defined, then it creates a scratch directory using the mktemp command.

Each task creates a subdirectory within the base scratch directory and automatically deletes it upon completion.

Note

Cloud-based executors enable scratch by default since the pipeline work directory resides in object storage.

The following values are supported:

false

Do not use a scratch directory.

true

Create a scratch directory in the directory defined by the $TMPDIR environment variable, or $(mktemp /tmp) if $TMPDIR is not set.

'$YOUR_VAR'

Create a scratch directory in the directory defined by the given environment variable, or $(mktemp /tmp) if that variable is not set. The value must use single quotes, otherwise the environment variable will be evaluated in the pipeline script context.

'/my/tmp/path'

Create a scratch directory in the specified directory.

'ram-disk'

Create a scratch directory in the RAM disk /dev/shm/.
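
For example, to use a node-local directory exposed through an environment variable (the variable name is illustrative; note the single quotes, which defer evaluation to the task environment):

```nextflow
process hello {
    scratch '$LOCAL_SCRATCH'

    script:
    """
    your_command --here
    """
}
```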

secret

The secret directive allows a process to access secrets.

For example:

process hello_secret {
    secret 'MY_ACCESS_KEY'
    secret 'MY_SECRET_KEY'

    script:
    """
    your_command --access \$MY_ACCESS_KEY --secret \$MY_SECRET_KEY
    """
}

Each secret is provided to the task as an environment variable.

See Secrets for more information.

Note

Secrets can only be used with the local or grid executors (e.g., Slurm or Grid Engine). Secrets can be used with AWS Batch and Google Batch when launched from Seqera Platform.
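
Secrets are defined outside the pipeline code, for example with the nextflow secrets command (the names and values shown are illustrative):

```console
nextflow secrets set MY_ACCESS_KEY "my-access-key-value"
nextflow secrets set MY_SECRET_KEY "my-secret-key-value"
```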

shell

The shell directive defines a custom shell command for process scripts. By default, script blocks are executed with /bin/bash -ue.

process hello {
    shell '/bin/bash', '-euo', 'pipefail'

    script:
    """
    your_command --here
    """
}

It can also be specified in the Nextflow configuration:

process.shell = ['/bin/bash', '-euo', 'pipefail']

spack

The spack directive defines the set of Spack packages required by each task. For example:

process hello {
    spack 'bwa@0.7.15'

    script:
    """
    your_command --here
    """
}

Nextflow automatically creates a Spack environment for each unique set of packages.

Multiple packages can be specified by separating them with a space, e.g. bwa@0.7.15 fastqc@0.11.5.

The spack directive also accepts a Spack environment file path or the path of an existing Spack environment. See Spack environments for more information.

stageInMode

The stageInMode directive defines how input files are staged into the task work directory.

The following modes are supported:

'copy'

Input files are staged in the task work directory by creating a copy.

'link'

Input files are staged in the task work directory by creating a hard link for each of them.

'rellink'

Input files are staged in the task work directory by creating a symbolic link with a relative path for each of them.

'symlink'

Input files are staged in the task work directory by creating a symbolic link with an absolute path for each of them (default).
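
The stage-in mode can also be set in the Nextflow configuration, for example to copy inputs only for a specific process (the process name is illustrative):

```nextflow
process {
    withName: 'make_blast_db' {
        stageInMode = 'copy'
    }
}
```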

stageOutMode

The stageOutMode directive defines how output files are staged out from the scratch directory to the task work directory.

The following modes are supported:

'copy'

Output files are copied from the scratch directory to the work directory.

'fcp'

New in version 23.04.0.

Output files are copied from the scratch directory to the work directory by using the fcp utility (note: it must be available in the task environment).

'move'

Output files are moved from the scratch directory to the work directory.

'rclone'

New in version 23.04.0.

Output files are copied from the scratch directory to the work directory by using the rclone utility (note: it must be available in the task environment).

'rsync'

Output files are copied from the scratch directory to the work directory by using the rsync utility.
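
These modes only take effect when a scratch directory is used. A configuration sketch combining the two:

```nextflow
process {
    scratch = true
    stageOutMode = 'rsync'
}
```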

See also: scratch

storeDir

The storeDir directive stores task outputs in a permanent store directory instead of the work directory.

On subsequent runs, each task is executed only if the declared output files do not exist in the store directory. When the files are present, the task is skipped and these files are used as the task outputs.

The following example shows how to use the storeDir directive to create a directory containing a BLAST database for each species specified by an input parameter:

process make_blast_db {
    storeDir '/db/genomes'

    input:
    path species

    output:
    path "${dbName}.*"

    script:
    dbName = species.baseName
    """
    makeblastdb -dbtype nucl -in ${species} -out ${dbName}
    """
}

Warning

If a process uses storeDir and all of its outputs are optional, the process will always be skipped, even if the store directory is empty. This issue can be avoided by specifying at least one required file output.

Warning

The storeDir directive should not be used to publish outputs. Use the publishDir directive or workflow outputs instead.

tag

The tag directive defines a custom identifier for each task execution. For example:

process hello {
    tag "$code"

    input:
    val code

    script:
    """
    echo $code
    """
}

workflow {
    ch_codes = channel.of('alpha', 'gamma', 'omega')
    hello(ch_codes)
}

The above example logs each task with its corresponding tag:

[6e/28919b] Submitted process > hello (alpha)
[d2/1c6175] Submitted process > hello (gamma)
[1c/3ef220] Submitted process > hello (omega)

Tags are a useful way to track related tasks in a pipeline run. Tasks can be identified by tag in the execution log and the trace report.

Note

The name of a task in both reports and lineage is defined as <process> (<tag>).

Note

The tag directive is not related to the label directive. Process labels are only used for shared process configuration, not for tracking.

time

The time directive defines the maximum runtime for each task. For example:

process hello {
    time 1.h

    script:
    """
    your_command --here
    """
}

The following suffixes can be used to specify duration values:

  • ms: milliseconds

  • s: seconds

  • m: minutes

  • h: hours

  • d: days

See Duration for more information.
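
Like other resource directives, time can be computed dynamically, for example to extend the limit on each retry:

```nextflow
process hello {
    // 1 hour on the first attempt, 2 hours on the first retry, etc.
    time { 1.h * task.attempt }
    errorStrategy 'retry'
    maxRetries 2

    script:
    """
    your_command --here
    """
}
```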

Note

This directive is only used by certain executors. Refer to the Executors page to see which executors support this directive.

See also: cpus, disk, memory, queue, Dynamic task resources