Resources#
Shell and scheduler arguments#
When submitting a workflow, hpcFlow generates jobscripts that are submitted to the scheduler (if using one), or invoked directly (if not). Depending on how the scheduler is configured by your HPC administrators, you may need to add extra arguments to the shebang line of the jobscript. A shebang line usually looks something like this:
#!/bin/bash
For example, on an HPC system, you might need to execute the job submission script via a bash login shell, meaning the first line in your jobscript should look like this:
#!/bin/bash --login
To achieve this in hpcFlow, we can edit the configuration’s shells block to look like this (note this excerpt is not a valid configuration on its own!):
config:
  shells:
    bash:
      defaults:
        executable_args: [--login]
In this way, we ensure that wherever a bash shell command is constructed (such as the shebang line of a jobscript), --login will be appended to the shell executable command.
We can also modify the shell executable path like this:
config:
  shells:
    bash:
      defaults:
        executable: /bin/bash # /bin/bash is the default value
        executable_args: [--login]
The shell command is also constructed in one other place: when hpcFlow invokes a commands file to execute a run. Typically, the shell configuration set above is sufficient for both cases. However, if you need these two scenarios to use different shell executables or executable arguments, you can additionally set the scheduler's shebang_executable default value in the configuration (which overrides the shell configuration for the shebang line) like this:
config:
  shells:
    bash:
      defaults:
        executable_args: [--login] # applied when invoking command files
  schedulers:
    sge:
      defaults:
        shebang_executable: [/path/to/bash/executable, arg_1, arg_2] # applied to scheduler shebang only
Note that in this case (for shebang_executable), the shell executable path must also be specified, in addition to the shell arguments.
Random seeds#
By default, hpcFlow sets a workflow-level resource item random_seed (generated using NumPy's np.random.SeedSequence.generate_state(1)). This integer is then exposed as a (string) environment variable called HPCFLOW_RUN_RANDOM_SEED. The value of this seed can be overridden in the usual way of setting resource items (e.g. at the workflow level, or at the task level for a given action scope).
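For example, a minimal sketch of pinning the seed at the task level for the any action scope (the schema name my_schema, input p1, and seed value are illustrative; check the resources syntax against your hpcFlow version):

```yaml
tasks:
- schema: my_schema
  inputs:
    p1: 100
  resources:           # set resource items for this task
    any:               # action scope: applies to all actions
      random_seed: 42  # overrides the auto-generated workflow-level seed
```

Pinning the seed like this makes the value of HPCFLOW_RUN_RANDOM_SEED reproducible across submissions.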
If a repeats value is specified in a task (or element set), a value sequence is auto-generated by default to assign a distinct random_seed value to each individual repeat (using the path resources.any.random_seed). By default, a master seed is generated and stored in the repeats descriptor, from which the sequence's seed values are derived. For reproducibility, the master seed can be specified manually in the repeats descriptor:
tasks:
- schema: my_schema
  inputs:
    p1: 100
  repeats:
    number: 2
    master_seed: 1234
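Conceptually, the auto-generated sequence is equivalent to writing an explicit value sequence over the resources.any.random_seed path yourself. A hedged sketch of that equivalent form (the seed values shown are illustrative stand-ins for values derived from the master seed, and the exact sequences syntax should be checked against your hpcFlow version):

```yaml
tasks:
- schema: my_schema
  inputs:
    p1: 100
  repeats: 2
  sequences:
  - path: resources.any.random_seed  # one seed per repeat
    values: [8232, 1759]             # illustrative; normally derived from the master seed
```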
To prevent a random_seed sequence from being generated, you can set the repeats attribute generate_seed_sequence to False. For example, in YAML, instead of:
tasks:
- schema: my_schema
  inputs:
    p1: 100
  repeats: 2
you can write:
tasks:
- schema: my_schema
  inputs:
    p1: 100
  repeats:
    number: 2
    generate_seed_sequence: false
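Within a run, the seed is available to commands via the HPCFLOW_RUN_RANDOM_SEED environment variable described above. A minimal sketch of a task schema command that consumes it (the schema name, input, and command are illustrative; check the task-schema keys against the hpcFlow schema reference):

```yaml
task_schemas:
- objective: my_schema
  inputs:
  - parameter: p1
  actions:
  - commands:
    # the seed is read from the environment at run time
    - command: echo "using seed $HPCFLOW_RUN_RANDOM_SEED for p1=<<parameter:p1>>"
```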