Resources

Shell and scheduler arguments

When submitting a workflow, hpcFlow generates jobscripts that are submitted to the scheduler (if using one), or invoked directly (if not). Depending on how the scheduler is configured by your HPC administrators, you may need to add extra arguments to the shebang line of the jobscript. A shebang line usually looks something like this:

#!/bin/bash

For example, on an HPC system, you might need to execute the job submission script via a bash login shell, meaning the first line in your jobscript should look like this:

#!/bin/bash --login

To achieve this in hpcFlow, we can edit the configuration’s shells block to look like this (note this excerpt is not a valid configuration on its own!):

config:
  shells:
    bash:
      defaults:
        executable_args: [--login]

In this way, we ensure that wherever a bash shell command is constructed (such as when constructing the shebang line for a jobscript), --login will be appended to the shell executable command.

We can also modify the shell executable path like this:

config:
  shells:
    bash:
      defaults:
        executable: /bin/bash # /bin/bash is the default value
        executable_args: [--login]

There is also one other place where the shell command is constructed: when hpcFlow invokes a commands file to execute a run. Typically, the shell command configured as above is sufficient for both cases. However, if you need these two scenarios to use different shell executables or executable arguments, you can additionally modify the scheduler's shebang_executable default value in the configuration, which overrides the shell configuration for the shebang line only:

config:
  shells:
    bash:
      defaults:
        executable_args: [--login] # applied when invoking command files
  schedulers:
    sge:
      defaults:
        shebang_executable: [/path/to/bash/executable, arg_1, arg_2] # applied to scheduler shebang only

Note that in this case, shebang_executable must include the shell executable path in addition to the shell arguments.
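With the scheduler configuration above, the items of shebang_executable are joined to form the shebang line of generated jobscripts. As a sketch, reusing the placeholder path and arguments from the example, the resulting first line would look like this:

```shell
#!/path/to/bash/executable arg_1 arg_2
```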

Random seeds

By default, hpcFlow sets a workflow-level resource item random_seed (using NumPy's np.random.SeedSequence.generate_state(1)). This integer is then exposed as a (string) environment variable called HPCFLOW_RUN_RANDOM_SEED. The value of this seed can be overridden in the usual way of setting resource items (e.g. at the workflow level, or at the task level for a given action scope).
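For example, the seed could be pinned for a whole workflow by setting the resource item in the workflow template, as in this sketch (the any scope follows the resources.any.random_seed path used elsewhere on this page; the value 1234 is illustrative):

```yaml
resources:
  any:
    random_seed: 1234
```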

If a repeats value is specified in a task (or element set), a value sequence will be auto-generated by default to assign a distinct random_seed value to each individual repeat (using the path resources.any.random_seed). By default, a master seed is generated and stored in the repeats descriptor, from which the sequence's seed values are derived. For reproducibility, the master seed can be specified manually in the repeats descriptor:

tasks:
  - schema: my_schema
    inputs:
      p1: 100
    repeats:
      number: 2
      master_seed: 1234

To prevent a random_seed sequence from being generated, set the repeats attribute generate_seed_sequence to false. For example, in YAML, instead of:

tasks:
  - schema: my_schema
    inputs:
      p1: 100
    repeats: 2

you can write:

tasks:
  - schema: my_schema
    inputs:
      p1: 100
    repeats:
      number: 2
      generate_seed_sequence: false