Installation#
There are two ways of using hpcFlow:
The hpcFlow command-line interface (CLI)
The hpcFlow Python package
Both of these options allow workflows to be designed and executed. The hpcFlow CLI is recommended for beginners and strongly recommended if you want to run hpcFlow on a cluster. The Python package allows workflows to be designed and explored via the Python API and is recommended for users comfortable working with Python. If you are interested in contributing to the development of hpcFlow, the Python package is the place to start.
The CLI and the Python package can be used simultaneously.
Using pip#
The recommended way to install hpcFlow is to use pip to install the Python package from PyPI:
pip install hpcflow-new2
This installs the Python package, which also provides the hpcFlow CLI.
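Once installed, you can confirm that the hpcflow command is available on your PATH (the version reported should match the installed package):
hpcflow --version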
Release notes#
Release notes for this version (0.2.0a238) are available on GitHub. Use the version switcher in the top-right corner of the page to download/install other versions.
Alternative installation methods#
Although not currently recommended, advanced users may wish to use one of the alternative installation methods.
Configuration#
hpcFlow uses a config file to control details of how it executes workflows. A default config file is created the first time you submit a workflow. This works without modification on a personal machine; however, if you are using hpcFlow on HPC, you will likely need to modify it to describe the job scheduler, to add settings for multi-core jobs, and to point to your hpcFlow environments file.
Some examples are given for the University of Manchester’s CSF.
The path to your config file can be found using hpcflow manage get-config-path; to open the config file directly, use hpcflow open config.
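As an illustrative sketch only, the HPC-related config entries typically cover the path to your environments file and the scheduler. The environment_sources key is named below in the Environments section, but default_scheduler is an assumed key name here; consult the CSF examples and the configuration documentation for the exact schema.
environment_sources:
  - /path/to/my/environments.yaml   # path to your environments file
default_scheduler: slurm            # assumed key name; check the config docs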
Environments#
hpcFlow has the concept of environments, similar to Python virtual environments.
These are required so that tasks can run using the specific software they require.
Your hpcFlow environments must be defined in your environments (YAML) file before hpcFlow can run workflows, and this environments file must be pointed to in the config file via the environment_sources key.
Once this has been done, your environments file can be opened using hpcflow open env-source.
Below is an example environments file that defines an environment for running Python scripts. Domain-specific tools can be added to the environments file as required, each with their own setup instructions for loading that tool on your machine.
- name: python_env
  executables:
    - label: python_script
      instances:
        - command: python "<<script_path>>" <<args>>
          num_cores:
            start: 1
            stop: 32
          parallel_mode: null
Note also that any hpcFlow environment which activates a Python virtual environment as part of its setup must also have the hpcFlow Python package installed, and it must be the same version as is used to submit the workflow. In practice, this is most easily achieved by creating one Python virtual environment and using it both in each of these hpcFlow environments and to submit workflows.
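As a minimal sketch of such an environment (the venv path is illustrative, and the setup key is assumed here to run shell commands before the executable's command; check the environments documentation for the exact schema):
- name: python_env
  setup: |
    # activate the venv that has hpcFlow installed (hypothetical path)
    source /path/to/my-venv/bin/activate
  executables:
    - label: python_script
      instances:
        - command: python "<<script_path>>" <<args>>
          num_cores:
            start: 1
            stop: 32
          parallel_mode: null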
Tips for SLURM#
hpcFlow currently has a known limitation: it does not select a SLURM partition based on the resources requested in your workflow file. Users must therefore define the partition manually in their workflow files, e.g.:
resources:
  any:
    scheduler_args:
      directives:
        --time: 00:30:00
        --partition: serial
Note that for many SLURM schedulers a time limit must also be specified, as shown above.
A default time limit and partition can be set in the config file; these will be used for tasks which don't set them explicitly in a resources block like the example above.
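As a hedged sketch only, such defaults might sit under the scheduler's entry in the config file. The layout and key names below (schedulers, defaults) are illustrative assumptions, not the confirmed schema, so check the configuration documentation before copying this:
schedulers:
  slurm:
    defaults:
      directives:
        --time: 00:30:00      # assumed layout; applied when a task sets no time limit
        --partition: serial   # assumed layout; applied when a task sets no partition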