hpcflow.app.Workflow#

class hpcflow.app.Workflow(workflow_ref, store_fmt=None, fs_kwargs=None, **kwargs)#

Bases: Workflow

A concrete workflow.

Parameters:

workflow_ref (str | Path | int) – Either the path to a persistent workflow, or an integer that will interpreted as the local ID of a workflow submission, as reported by the app show command.
store_fmt (str | None) – The format of persistent store to use. Used to select the store manager class.
fs_kwargs (dict[str, Any] | None) – Additional arguments to pass when resolving a virtual workflow reference.
kwargs – For compatibility during pre-stable development phase.

Methods

`abort_run`	Abort the currently running action-run of the specified task/element.
`add_loop`	Add a loop to a subset of workflow tasks.
`add_submission`	Add a new submission.
`add_task`	Add a task to this workflow.
`add_task_after`	Add a new task after the specified task.
`add_task_before`	Add a new task before the specified task.
`batch_update`	A context manager that batches up structural changes to the workflow and commits them to disk all together when the context manager exits.
`cached_merged_parameters`
`cancel`	Cancel any running jobscripts.
`check_parameters_exist`	Check if all the parameters exist.
`copy`	Copy the workflow to a new path and return the copied workflow path.
`delete`	Delete the persistent data.
`elements`	Get the elements of the workflow's tasks.
`ensure_commands_file`	Ensure a commands file exists for the specified run.
`execute_combined_runs`	Execute a combined script (multiple runs) via a subprocess.
`execute_run`	Execute commands of a run via a subprocess.
`from_JSON_file`	Generate from a JSON file.
`from_JSON_string`	Generate from a JSON string.
`from_YAML_file`	Generate from a YAML file.
`from_YAML_string`	Generate from a YAML string.
`from_file`	Generate from either a YAML or JSON file, depending on the file extension.
`from_template`	Generate from a WorkflowTemplate object.
`from_template_data`	Generate from the data associated with a WorkflowTemplate object.
`get_EAR_IDs_of_tasks`	Get EAR IDs belonging to multiple tasks.
`get_EAR_skipped`	Check if an EAR is to be skipped.
`get_EARs_from_IDs`	Get element action run objects from a list of IDs.
`get_EARs_of_tasks`	Get EARs belonging to multiple tasks.
`get_all_EARs`	Get all runs in the workflow.
`get_all_element_iterations`	Get all iterations in the workflow.
`get_all_elements`	Get all elements in the workflow.
`get_all_parameter_data`	Retrieve all workflow parameter data.
`get_all_parameter_sources`	Retrieve all persistent parameters sources.
`get_all_parameters`	Retrieve all persistent parameters.
`get_all_submission_run_IDs`	Get the run IDs of all submissions.
`get_element_IDs_from_EAR_IDs`	Get the element IDs of EARs.
`get_element_iteration_IDs_from_EAR_IDs`	Get the element iteration IDs of EARs.
`get_element_iterations_from_IDs`	Return element iteration objects from a list of IDs.
`get_element_iterations_of_tasks`	Get element iterations belonging to multiple tasks.
`get_elements_from_IDs`	Return element objects from a list of IDs.
`get_iteration_task_pathway`	Get the iteration task pathway.
`get_parameter`	Get a single parameter.
`get_parameter_data`	Get the data relating to a parameter.
`get_parameter_set_statuses`	Get whether some parameters are set.
`get_parameter_source`	Get the source of a particular parameter.
`get_parameter_sources`	Get parameter sources known to the workflow.
`get_parameters`	Get parameters known to the workflow.
`get_process_IDs`	Return jobscript process IDs from all submissions of this workflow.
`get_run_directories`
`get_running_elements`	Retrieve elements that are running according to the scheduler.
`get_running_runs`	Retrieve runs that are running according to the scheduler.
`get_scheduler_job_IDs`	Return jobscript scheduler job IDs from all submissions of this workflow.
`get_store_EARs`	Get the persistent element action runs.
`get_store_element_iterations`	Get the persistent element iterations.
`get_store_elements`	Get the persistent elements.
`get_store_tasks`	Get the persistent tasks.
`get_task_IDs_from_element_IDs`	Get the task IDs of elements.
`get_task_elements`	Get the elements of a task.
`get_task_unique_names`	Return the unique names of all workflow tasks.
`get_text_file`	Retrieve the contents of a text file stored within the workflow.
`is_parameter_set`	Test if a particular parameter is set.
`list_jobscripts`	Print a table listing jobscripts and associated information from the specified submission.
`list_task_jobscripts`	Print a table listing the jobscripts associated with the specified (or all) tasks for the specified submission.
`process_shell_parameter_output`	Process the shell stdout/stderr stream according to the associated Command object.
`rechunk`	Rechunk metadata/runs and parameters/base arrays, making them more efficient.
`rechunk_parameter_base`	Reorganise the stored data chunks for parameters to be more efficient.
`rechunk_runs`	Reorganise the stored data chunks for EARs to be more efficient.
`reload`	Reload the workflow from disk.
`resolve_jobscripts`	Resolve this workflow to a set of jobscripts to run for a new submission.
`save_parameter`	Save a parameter where an EAR can find it.
`set_EAR_end`	Set the end time and exit code on an EAR.
`set_EAR_skip`	Record that an EAR is to be skipped due to an upstream failure or loop termination condition being met.
`set_EAR_start`	Set the start time on an EAR.
`set_EARs_initialised`	Set `EARs_initialised` to True for the specified iteration.
`set_multi_run_ends`	Set end times and exit codes on multiple runs.
`set_multi_run_starts`	Set the start time on multiple runs.
`set_parameter_value`	Set the value of a parameter.
`set_parameter_values`
`show_all_EAR_statuses`	Print a description of the status of every element action run in the workflow.
`submit`	Submit the workflow for execution.
`temporary_rename`	Rename an existing same-path workflow (directory) so we can restore it if workflow creation fails.
`unzip`	Convert the workflow to an unzipped form.
`wait`	Wait for the completion of specified/all submitted jobscripts.
`zip`	Convert the workflow to a zipped form.

Attributes

`artifacts_path`	Path to artifacts of the workflow (temporary files, etc).
`creation_info`	The creation descriptor for the workflow.
`execution_path`	Path to working directory path for executing.
`id_`	The ID of this workflow.
`input_files_path`	Path to input files for the workflow.
`loops`	The loops in this workflow.
`name`	The name of the workflow.
`num_EARs`	The total number of element action runs.
`num_added_tasks`	The total number of added tasks.
`num_element_iterations`	The total number of element iterations.
`num_elements`	The total number of elements.
`num_loops`	The total number of loops.
`num_submissions`	The total number of job submissions.
`num_tasks`	The total number of tasks.
`store_format`	The format of the workflow's persistent store.
`submissions`	The job submissions done by this workflow.
`submissions_path`	Path to submission data for ths workflow.
`task_artifacts_path`	Path to artifacts of tasks.
`tasks`	The tasks in this workflow.
`template`	The template that this workflow was made from.
`template_components`	The template components used for this workflow.
`ts_fmt`	The timestamp format.
`ts_name_fmt`	The timestamp format for names.
`url`	An fsspec URL for this workflow.

abort_run(submission_idx=-1, task_idx=None, task_insert_ID=None, element_idx=None)#

Abort the currently running action-run of the specified task/element.

Parameters:

task_idx (int | None) – The parent task of the run to abort.
element_idx (int | None) – For multi-element tasks, the parent element of the run to abort.
submission_idx (int) – Defaults to the most-recent submission.
task_insert_ID (int | None)

add_loop(loop)#

Add a loop to a subset of workflow tasks.

Parameters:: loop (Loop)
Return type:: None

add_submission(tasks=None, JS_parallelism=None, force_array=False, status=True)#

Add a new submission.

Parameters:

force_array (bool) – Used to force the use of job arrays, even if the scheduler does not support it. This is provided for testing purposes only.
tasks (list[int] | None)
JS_parallelism (bool | Literal['direct', 'scheduled'] | None)
status (bool)

Return type:

Submission | None

add_task(task, new_index=None)#

Add a task to this workflow.

Parameters:

task (Task)
new_index (int | None)

Return type:

None

add_task_after(new_task, task_ref=None)#

Add a new task after the specified task.

Parameters:

task_ref (Task | None) – If not given, the new task will be added at the end of the workflow.
new_task (Task)

Return type:

None

add_task_before(new_task, task_ref=None)#

Add a new task before the specified task.

Parameters:

task_ref (Task | None) – If not given, the new task will be added at the beginning of the workflow.
new_task (Task)

Return type:

None

property artifacts_path: Path#: Path to artifacts of the workflow (temporary files, etc).

batch_update(is_workflow_creation=False)#

A context manager that batches up structural changes to the workflow and commits them to disk all together when the context manager exits.

Parameters:: is_workflow_creation (bool)
Return type:: Iterator[None]

cached_merged_parameters()#

cancel(status=True)#

Cancel any running jobscripts.

Parameters:: status (bool)

check_parameters_exist(id_lst)#

Check if all the parameters exist.

Parameters:: id_lst (int | list[int])
Return type:: bool

copy(path='.')#

Copy the workflow to a new path and return the copied workflow path.

Parameters:: path (str | Path)
Return type:: Path

property creation_info: CreationInfo#: The creation descriptor for the workflow.

delete()#

Delete the persistent data.

Return type:: None

elements()#

Get the elements of the workflow’s tasks.

Return type:: Iterator[Element]

ensure_commands_file(submission_idx, js_idx, run)#

Ensure a commands file exists for the specified run.

Parameters:

submission_idx (int)
js_idx (int)
run (ElementActionRun)

Return type:

Path | bool

execute_combined_runs(submission_idx, jobscript_idx)#

Execute a combined script (multiple runs) via a subprocess.

Parameters:

submission_idx (int)
jobscript_idx (int)

Return type:

None

execute_run(submission_idx, block_act_key, run_ID)#

Execute commands of a run via a subprocess.

Parameters:

submission_idx (int)
block_act_key (BlockActionKey)
run_ID (int)

Return type:

None

property execution_path: Path#: Path to working directory path for executing.

classmethod from_JSON_file(JSON_path, path=None, name=None, name_add_timestamp=None, name_use_dir=None, overwrite=False, store='zarr', ts_fmt=None, ts_name_fmt=None, store_kwargs=None, variables=None, status=None)#

Generate from a JSON file.

Parameters:

JSON_path (PathLike) – The path to a workflow template in the JSON file format.
path (PathLike) – The directory in which the workflow will be generated. If not specified, the config item default_workflow_path will be used; if that is not set, the current directory is used.
name (str | None) – The name to use for the workflow. If not provided, the name will be set to that of the template (optionally suffixed by a date-timestamp if name_add_timestamp is True).
name_add_timestamp (bool | None) – If True, suffix the name with a date-timestamp. A default value can be set with the config item workflow_name_add_timestamp; otherwise set to True.
name_use_dir (bool | None) – If True, and name_add_timestamp is also True, the workflow directory name will be just the date-timestamp, and will be contained within a parent directory corresponding to the workflow name. A default value can be set with the config item workflow_name_use_dir; otherwise set to False.
overwrite (bool) – If True and the workflow directory (path + name) already exists, the existing directory will be overwritten.
store (str) – The persistent store to use for this workflow.
ts_fmt (str | None) – The datetime format to use for storing datetimes. Datetimes are always stored in UTC (because Numpy does not store time zone info), so this should not include a time zone name.
ts_name_fmt (str | None) – The datetime format to use when generating the workflow name, where it includes a timestamp.
store_kwargs (dict[str, Any] | None) – Keyword arguments to pass to the store’s write_empty_workflow method.
variables (dict[str, str] | Literal[False] | None) – String variables to substitute in the file given by JSON_path. Substitutions will be attempted if the JSON file looks to contain variable references (like “<<var:name>>”). If set to False, no substitutions will occur, which may result in an invalid workflow template!
status (Status | None)

Return type:

Workflow

classmethod from_JSON_string(JSON_str, path=None, name=None, name_add_timestamp=None, name_use_dir=None, overwrite=False, store='zarr', ts_fmt=None, ts_name_fmt=None, store_kwargs=None, variables=None, status=None)#

Generate from a JSON string.

Parameters:

JSON_str (str) – The JSON string containing a workflow template parametrisation.
path (PathLike) – The directory in which the workflow will be generated. If not specified, the config item default_workflow_path will be used; if that is not set, the current directory is used.
name (str | None) – The name to use for the workflow. If not provided, the name will be set to that of the template (optionally suffixed by a date-timestamp if name_add_timestamp is True).
name_add_timestamp (bool | None) – If True, suffix the name with a date-timestamp. A default value can be set with the config item workflow_name_add_timestamp; otherwise set to True.
name_use_dir (bool | None) – If True, and name_add_timestamp is also True, the workflow directory name will be just the date-timestamp, and will be contained within a parent directory corresponding to the workflow name. A default value can be set with the config item workflow_name_use_dir; otherwise set to False.
overwrite (bool) – If True and the workflow directory (path + name) already exists, the existing directory will be overwritten.
store (str) – The persistent store to use for this workflow.
ts_fmt (str | None) – The datetime format to use for storing datetimes. Datetimes are always stored in UTC (because Numpy does not store time zone info), so this should not include a time zone name.
ts_name_fmt (str | None) – The datetime format to use when generating the workflow name, where it includes a timestamp.
store_kwargs (dict[str, Any] | None) – Keyword arguments to pass to the store’s write_empty_workflow method.
variables (dict[str, str] | Literal[False] | None) – String variables to substitute in the string JSON_str. Substitutions will be attempted if the JSON string looks to contain variable references (like “<<var:name>>”). If set to False, no substitutions will occur, which may result in an invalid workflow template!
status (Status | None)

Return type:

Workflow

classmethod from_YAML_file(YAML_path, path=None, name=None, name_add_timestamp=None, name_use_dir=None, overwrite=False, store='zarr', ts_fmt=None, ts_name_fmt=None, store_kwargs=None, variables=None)#

Generate from a YAML file.

Parameters:

YAML_path (PathLike) – The path to a workflow template in the YAML file format.
path (PathLike) – The directory in which the workflow will be generated. If not specified, the config item default_workflow_path will be used; if that is not set, the current directory is used.
name (str | None) – The name to use for the workflow. If not provided, the name will be set to that of the template (optionally suffixed by a date-timestamp if name_add_timestamp is True).
name_add_timestamp (bool | None) – If True, suffix the name with a date-timestamp. A default value can be set with the config item workflow_name_add_timestamp; otherwise set to True.
name_use_dir (bool | None) – If True, and name_add_timestamp is also True, the workflow directory name will be just the date-timestamp, and will be contained within a parent directory corresponding to the workflow name. A default value can be set with the config item workflow_name_use_dir; otherwise set to False.
overwrite (bool) – If True and the workflow directory (path + name) already exists, the existing directory will be overwritten.
store (str) – The persistent store to use for this workflow.
ts_fmt (str | None) – The datetime format to use for storing datetimes. Datetimes are always stored in UTC (because Numpy does not store time zone info), so this should not include a time zone name.
ts_name_fmt (str | None) – The datetime format to use when generating the workflow name, where it includes a timestamp.
store_kwargs (dict[str, Any] | None) – Keyword arguments to pass to the store’s write_empty_workflow method.
variables (dict[str, str] | Literal[False] | None) – String variables to substitute in the file given by YAML_path. Substitutions will be attempted if the YAML file looks to contain variable references (like “<<var:name>>”). If set to False, no substitutions will occur, which may result in an invalid workflow template!

Return type:

Workflow

classmethod from_YAML_string(YAML_str, path=None, name=None, name_add_timestamp=None, name_use_dir=None, overwrite=False, store='zarr', ts_fmt=None, ts_name_fmt=None, store_kwargs=None, variables=None, status=None)#

Generate from a YAML string.

Parameters:

YAML_str (str) – The YAML string containing a workflow template parametrisation.
path (PathLike) – The directory in which the workflow will be generated. If not specified, the config item default_workflow_path will be used; if that is not set, the current directory is used.
name (str | None) – The name to use for the workflow. If not provided, the name will be set to that of the template (optionally suffixed by a date-timestamp if name_add_timestamp is True).
name_add_timestamp (bool | None) – If True, suffix the name with a date-timestamp. A default value can be set with the config item workflow_name_add_timestamp; otherwise set to True.
name_use_dir (bool | None) – If True, and name_add_timestamp is also True, the workflow directory name will be just the date-timestamp, and will be contained within a parent directory corresponding to the workflow name. A default value can be set with the config item workflow_name_use_dir; otherwise set to False.
overwrite (bool) – If True and the workflow directory (path + name) already exists, the existing directory will be overwritten.
store (str) – The persistent store to use for this workflow.
ts_fmt (str | None) – The datetime format to use for storing datetimes. Datetimes are always stored in UTC (because Numpy does not store time zone info), so this should not include a time zone name.
ts_name_fmt (str | None) – The datetime format to use when generating the workflow name, where it includes a timestamp.
store_kwargs (dict[str, Any] | None) – Keyword arguments to pass to the store’s write_empty_workflow method.
variables (dict[str, str] | Literal[False] | None) – String variables to substitute in the string YAML_str. Substitutions will be attempted if the YAML string looks to contain variable references (like “<<var:name>>”). If set to False, no substitutions will occur, which may result in an invalid workflow template!
status (Status | None)

Return type:

Workflow

classmethod from_file(template_path, template_format=None, path=None, name=None, name_add_timestamp=None, name_use_dir=None, overwrite=False, store='zarr', ts_fmt=None, ts_name_fmt=None, store_kwargs=None, variables=None, status=None)#

Generate from either a YAML or JSON file, depending on the file extension.

Parameters:

template_path (PathLike) – The path to a template file in YAML or JSON format, and with a “.yml”, “.yaml”, or “.json” extension.
template_format (Literal['json', 'yaml'] | None) – If specified, one of “json” or “yaml”. This forces parsing from a particular format regardless of the file extension.
path (str | None) – The directory in which the workflow will be generated. If not specified, the config item default_workflow_path will be used; if that is not set, the current directory is used.
name (str | None) – The name to use for the workflow. If not provided, the name will be set to that of the template (optionally suffixed by a date-timestamp if name_add_timestamp is True).
name_add_timestamp (bool | None) – If True, suffix the name with a date-timestamp. A default value can be set with the config item workflow_name_add_timestamp; otherwise set to True.
name_use_dir (bool | None) – If True, and name_add_timestamp is also True, the workflow directory name will be just the date-timestamp, and will be contained within a parent directory corresponding to the workflow name. A default value can be set with the config item workflow_name_use_dir; otherwise set to False.
overwrite (bool) – If True and the workflow directory (path + name) already exists, the existing directory will be overwritten.
store (str) – The persistent store to use for this workflow.
ts_fmt (str | None) – The datetime format to use for storing datetimes. Datetimes are always stored in UTC (because Numpy does not store time zone info), so this should not include a time zone name.
ts_name_fmt (str | None) – The datetime format to use when generating the workflow name, where it includes a timestamp.
store_kwargs (dict[str, Any] | None) – Keyword arguments to pass to the store’s write_empty_workflow method.
variables (dict[str, str] | Literal[False] | None) – String variables to substitute in the file given by template_path. Substitutions will be attempted if the file looks to contain variable references (like “<<var:name>>”). If set to False, no substitutions will occur, which may result in an invalid workflow template!
status (Status | None)

Return type:

Workflow

classmethod from_template(template, path=None, name=None, name_add_timestamp=None, name_use_dir=None, overwrite=False, store='zarr', ts_fmt=None, ts_name_fmt=None, store_kwargs=None, status=None)#

Generate from a WorkflowTemplate object.

Parameters:

template (WorkflowTemplate) – The WorkflowTemplate object to make persistent.
path (PathLike) – The directory in which the workflow will be generated. If not specified, the config item default_workflow_path will be used; if that is not set, the current directory is used.
name (str | None) – The name to use for the workflow. If not provided, the name will be set to that of the template (optionally suffixed by a date-timestamp if name_add_timestamp is True).
name_add_timestamp (bool | None) – If True, suffix the name with a date-timestamp. A default value can be set with the config item workflow_name_add_timestamp; otherwise set to True.
name_use_dir (bool | None) – If True, and name_add_timestamp is also True, the workflow directory name will be just the date-timestamp, and will be contained within a parent directory corresponding to the workflow name. A default value can be set with config the item workflow_name_use_dir; otherwise set to False.
overwrite (bool) – If True and the workflow directory (path + name) already exists, the existing directory will be overwritten.
store (str) – The persistent store to use for this workflow.
ts_fmt (str | None) – The datetime format to use for storing datetimes. Datetimes are always stored in UTC (because Numpy does not store time zone info), so this should not include a time zone name.
ts_name_fmt (str | None) – The datetime format to use when generating the workflow name, where it includes a timestamp.
store_kwargs (dict[str, Any] | None) – Keyword arguments to pass to the store’s write_empty_workflow method.
status (Status | None)

Return type:

Workflow

classmethod from_template_data(template_name, tasks=None, loops=None, resources=None, environments=None, config=None, path=None, workflow_name=None, name_add_timestamp=None, name_use_dir=None, overwrite=False, store='zarr', ts_fmt=None, ts_name_fmt=None, store_kwargs=None)#

Generate from the data associated with a WorkflowTemplate object.

Parameters:

template_name (str) – The name to use for the new workflow template, from which the new workflow will be generated.
tasks (list[Task] | None) – List of Task objects to add to the new workflow.
loops (list[Loop] | None) – List of Loop objects to add to the new workflow.
resources (Resources) – Mapping of action scopes to resource requirements, to be applied to all element sets in the workflow. resources specified in an element set take precedence of those defined here for the whole workflow.
environments (Mapping[str, Mapping[str, Any]] | None) – Environment specifiers, keyed by environment name.
config (dict | None) – Configuration items that should be set whenever the resulting workflow is loaded. This includes config items that apply during workflow execution.
path (PathLike | None) – The directory in which the workflow will be generated. If not specified, the config item default_workflow_path will be used; if that is not set, the current directory is used.
workflow_name (str | None) – The name to use for the workflow. If not provided, the name will be set to that of the template (optionally suffixed by a date-timestamp if name_add_timestamp is True).
name_add_timestamp (bool | None) – If True, suffix the workflow name with a date-timestamp. A default value can be set with the config item workflow_name_add_timestamp; otherwise set to True.
name_use_dir (bool | None) – If True, and name_add_timestamp is also True, the workflow directory name will be just the date-timestamp, and will be contained within a parent directory corresponding to the workflow name. A default value can be set with the config item workflow_name_use_dir; otherwise set to False.
overwrite (bool) – If True and the workflow directory (path + name) already exists, the existing directory will be overwritten.
store (str) – The persistent store to use for this workflow.
ts_fmt (str | None) – The datetime format to use for storing datetimes. Datetimes are always stored in UTC (because Numpy does not store time zone info), so this should not include a time zone name.
ts_name_fmt (str | None) – The datetime format to use when generating the workflow name, where it includes a timestamp.
store_kwargs (dict[str, Any] | None) – Keyword arguments to pass to the store’s write_empty_workflow method.

Return type:

Workflow

get_EAR_IDs_of_tasks(id_lst)#

Get EAR IDs belonging to multiple tasks.

Parameters:: id_lst (Iterable[int])
Return type:: list[int]

get_EAR_skipped(EAR_ID)#

Check if an EAR is to be skipped.

Parameters:: EAR_ID (int)
Return type:: int

get_EARs_from_IDs(ids, as_dict=False)#

Get element action run objects from a list of IDs.

Parameters:

ids (Iterable[int] | int)
as_dict (bool)

Return type:

list[ElementActionRun] | dict[int, ElementActionRun] | ElementActionRun

get_EARs_of_tasks(id_lst)#

Get EARs belonging to multiple tasks.

Parameters:: id_lst (Iterable[int])
Return type:: Iterator[ElementActionRun]

get_all_EARs()#

Get all runs in the workflow.

Return type:: list[ElementActionRun]

get_all_element_iterations()#

Get all iterations in the workflow.

Return type:: list[ElementIteration]

get_all_elements()#

Get all elements in the workflow.

Return type:: list[Element]

get_all_parameter_data(**kwargs)#

Retrieve all workflow parameter data.

Keyword Arguments:: dataset_copy (bool) – For Zarr stores only. If True, copy arrays as NumPy arrays.
Return type:: dict[int, Any]

get_all_parameter_sources(**kwargs)#

Retrieve all persistent parameters sources.

Return type:: list[ParamSource]

get_all_parameters(**kwargs)#

Retrieve all persistent parameters.

Keyword Arguments:: dataset_copy (bool) – For Zarr stores only. If True, copy arrays as NumPy arrays.
Return type:: list[StoreParameter]

get_all_submission_run_IDs()#

Get the run IDs of all submissions.

Return type:: Iterable[int]

get_element_IDs_from_EAR_IDs(id_lst)#

Get the element IDs of EARs.

Parameters:: id_lst (Iterable[int])
Return type:: list[int]

get_element_iteration_IDs_from_EAR_IDs(id_lst)#

Get the element iteration IDs of EARs.

Parameters:: id_lst (Iterable[int])
Return type:: list[int]

get_element_iterations_from_IDs(id_lst)#

Return element iteration objects from a list of IDs.

Parameters:: id_lst (Iterable[int])
Return type:: list[ElementIteration]

get_element_iterations_of_tasks(id_lst)#

Get element iterations belonging to multiple tasks.

Parameters:: id_lst (Iterable[int])
Return type:: Iterator[ElementIteration]

get_elements_from_IDs(id_lst)#

Return element objects from a list of IDs.

Parameters:: id_lst (Iterable[int])
Return type:: list[Element]

get_iteration_task_pathway(ret_iter_IDs=False, ret_data_idx=False)#

Get the iteration task pathway.

Parameters:

ret_iter_IDs (bool)
ret_data_idx (bool)

Return type:

Sequence[tuple]

get_parameter(index, **kwargs)#

Get a single parameter.

Parameter#

index:: The index of the parameter to retrieve.

keyword dataset_copy:: For Zarr stores only. If True, copy arrays as NumPy arrays.
kwtype dataset_copy:: bool

Parameters:: index (int)
Return type:: StoreParameter

get_parameter_data(index, **kwargs)#

Get the data relating to a parameter.

Parameters:: index (int)
Return type:: Any

get_parameter_set_statuses(id_lst)#

Get whether some parameters are set.

Parameters:: id_lst (Iterable[int])
Return type:: list[bool]

get_parameter_source(index)#

Get the source of a particular parameter.

Parameters:: index (int)
Return type:: ParamSource

get_parameter_sources(id_lst)#

Get parameter sources known to the workflow.

Parameters:: id_lst (Iterable[int])
Return type:: list[ParamSource]

get_parameters(id_lst, **kwargs)#

Get parameters known to the workflow.

Parameter#

id_lst:: The indices of the parameters to retrieve.

keyword dataset_copy:: For Zarr stores only. If True, copy arrays as NumPy arrays.
kwtype dataset_copy:: bool

Parameters:: id_lst (Iterable[int])
Return type:: Sequence[StoreParameter]

get_process_IDs()#

Return jobscript process IDs from all submissions of this workflow.

Return type:: tuple[int, …]

get_run_directories(run_ids=None, dir_indices_arr=None)#

Parameters:

run_ids (list[int] | None)
dir_indices_arr (ndarray | None)

Return type:

list[Path | None]

get_running_elements(submission_idx=-1, task_idx=None, task_insert_ID=None)#

Retrieve elements that are running according to the scheduler.

Parameters:

submission_idx (int)
task_idx (int | None)
task_insert_ID (int | None)

Return type:

list[Element]

get_running_runs(submission_idx=-1, task_idx=None, task_insert_ID=None, element_idx=None)#

Retrieve runs that are running according to the scheduler.

Parameters:

submission_idx (int)
task_idx (int | None)
task_insert_ID (int | None)
element_idx (int | None)

Return type:

list[ElementActionRun]

get_scheduler_job_IDs()#

Return jobscript scheduler job IDs from all submissions of this workflow.

Return type:: tuple[str, …]

get_store_EARs(id_lst)#

Get the persistent element action runs.

Parameters:: id_lst (Iterable[int])
Return type:: Sequence[StoreEAR]

get_store_element_iterations(id_lst)#

Get the persistent element iterations.

Parameters:: id_lst (Iterable[int])
Return type:: Sequence[StoreElementIter]

get_store_elements(id_lst)#

Get the persistent elements.

Parameters:: id_lst (Iterable[int])
Return type:: Sequence[StoreElement]

get_store_tasks(id_lst)#

Get the persistent tasks.

Parameters:: id_lst (Iterable[int])
Return type:: Sequence[StoreTask]

get_task_IDs_from_element_IDs(id_lst)#

Get the task IDs of elements.

Parameters:: id_lst (Iterable[int])
Return type:: list[int]

get_task_elements(task, idx_lst=None)#

Get the elements of a task.

Parameters:

task (WorkflowTask)
idx_lst (list[int] | None)

Return type:

list[Element]

get_task_unique_names(map_to_insert_ID=False)#

Return the unique names of all workflow tasks.

Parameters:: map_to_insert_ID (bool) – If True, return a dict whose values are task insert IDs, otherwise return a list.
Return type:: Sequence[str] | Mapping[str, int]

get_text_file(path)#

Retrieve the contents of a text file stored within the workflow.

Parameters:: path (str | Path)
Return type:: str

property id_: str#: The ID of this workflow.

property input_files_path: Path#: Path to input files for the workflow.

is_parameter_set(index)#

Test if a particular parameter is set.

Parameters:: index (int)
Return type:: bool

list_jobscripts(sub_idx=0, max_js=None, jobscripts=None, width=None)#

Print a table listing jobscripts and associated information from the specified submission.

Parameters:

sub_idx (int) – The submission index whose jobscripts are to be displayed.
max_js (int | None) – Maximum jobscript index to display. This cannot be specified with jobscripts.
jobscripts (list[int] | None) – A list of jobscripts to display. This cannot be specified with max_js.
width (int | None) – Width in characters of the printed table.

Return type:

None

list_task_jobscripts(sub_idx=0, task_names=None, max_js=None, width=None)#

Print a table listing the jobscripts associated with the specified (or all) tasks for the specified submission.

Parameters:

sub_idx (int) – The submission index whose jobscripts are to be displayed.
task_names (list[str] | None) – List of sub-strings to match to task names. Only matching task names will be included.
max_js (int | None) – Maximum jobscript index to display.
width (int | None) – Width in characters of the printed table.

property loops: WorkflowLoopList#: The loops in this workflow.

property name: str#

The name of the workflow.

The workflow name may be different from the template name, as it includes the creation date-timestamp if generated.

property num_EARs: int#: The total number of element action runs.

property num_added_tasks: int#: The total number of added tasks.

property num_element_iterations: int#: The total number of element iterations.

property num_elements: int#: The total number of elements.

property num_loops: int#: The total number of loops.

property num_submissions: int#: The total number of job submissions.

property num_tasks: int#: The total number of tasks.

process_shell_parameter_output(name, value, EAR_ID, cmd_idx, stderr=False)#

Process the shell stdout/stderr stream according to the associated Command object.

Parameters:

name (str)
value (str)
EAR_ID (int)
cmd_idx (int)
stderr (bool)

Return type:

Any

rechunk(chunk_size=None, backup=True, status=True)#

Rechunk metadata/runs and parameters/base arrays, making them more efficient.

Parameters:

chunk_size (int | None)
backup (bool)
status (bool)

rechunk_parameter_base(chunk_size=None, backup=True, status=True)#

Reorganise the stored data chunks for parameters to be more efficient.

Parameters:

chunk_size (int | None)
backup (bool)
status (bool)

rechunk_runs(chunk_size=None, backup=True, status=True)#

Reorganise the stored data chunks for EARs to be more efficient.

Parameters:

chunk_size (int | None)
backup (bool)
status (bool)

reload()#

Reload the workflow from disk.

Return type:: Self

resolve_jobscripts(cache, tasks=None, force_array=False)#

Resolve this workflow to a set of jobscripts to run for a new submission.

Parameters:

force_array (bool) – Used to force the use of job arrays, even if the scheduler does not support it. This is provided for testing purposes only.
cache (ObjectCache)
tasks (Sequence[int] | None)

Return type:

list[Jobscript]

save_parameter(name, value, EAR_ID)#

Save a parameter where an EAR can find it.

Parameters:

name (str)
value (Any)
EAR_ID (int)

set_EAR_end(block_act_key, run, exit_code)#

Set the end time and exit code on an EAR.

If the exit code is non-zero, also set all downstream dependent EARs to be skipped. Also save any generated input/output files.

Parameters:

block_act_key (BlockActionKey)
run (ElementActionRun)
exit_code (int)

Return type:

None

set_EAR_skip(skip_reasons)#

Record that an EAR is to be skipped due to an upstream failure or loop termination condition being met.

Parameters:: skip_reasons (dict[int, SkipReason])
Return type:: None

set_EAR_start(run_id, run_dir, port_number)#

Set the start time on an EAR.

Parameters:

run_id (int)
run_dir (Path | None)
port_number (int | None)

Return type:

None

set_EARs_initialised(iter_ID)#

Set EARs_initialised to True for the specified iteration.

Parameters:: iter_ID (int)
Return type:: None

set_multi_run_ends(runs)#

Set end times and exit codes on multiple runs.

If the exit code is non-zero, also set all downstream dependent runs to be skipped. Also save any generated input/output files.

Parameters:: runs (dict[BlockActionKey, list[tuple[ElementActionRun, int, Path | None]]])
Return type:: None

set_multi_run_starts(run_ids, run_dirs, port_number)#

Set the start time on multiple runs.

Parameters:

run_ids (list[int])
run_dirs (list[Path | None])
port_number (int)

Return type:

None

set_parameter_value(param_id, value, commit=False)#

Set the value of a parameter.

Parameters:

param_id (int | list[int])
value (Any)
commit (bool)

Return type:

None

set_parameter_values(values, commit=False)#

Parameters:

values (dict[int, Any])
commit (bool)

Return type:

None

show_all_EAR_statuses()#

Print a description of the status of every element action run in the workflow.

Return type:: None

property store_format: str#: The format of the workflow’s persistent store.

property submissions: list[Submission]#: The job submissions done by this workflow.

property submissions_path: Path#: Path to submission data for ths workflow.

submit(*, ignore_errors=False, JS_parallelism=None, print_stdout=False, wait=False, add_to_known=True, return_idx=False, tasks=None, cancel=False, status=True)#

Submit the workflow for execution.

Parameters:

ignore_errors (bool) – If True, ignore jobscript submission errors. If False (the default) jobscript submission will halt when a jobscript fails to submit.
JS_parallelism (bool | Literal['direct', 'scheduled'] | None) – If True, allow multiple jobscripts to execute simultaneously. If ‘scheduled’/’direct’, only allow simultaneous execution of scheduled/direct jobscripts. Raises if set to True, ‘scheduled’, or ‘direct’, but the store type does not support the jobscript_parallelism feature. If not set, jobscript parallelism will be used if the store type supports it, for scheduled jobscripts only.
print_stdout (bool) – If True, print any jobscript submission standard output, otherwise hide it.
wait (bool) – If True, this command will block until the workflow execution is complete.
add_to_known (bool) – If True, add the submitted submissions to the known-submissions file, which is used by the show command to monitor current and recent submissions.
return_idx (bool) – If True, return a dict representing the jobscript indices submitted for each submission.
tasks (list[int] | None) – List of task indices to include in the new submission if no submissions already exist. By default all tasks are included if a new submission is created.
cancel (bool) – Immediately cancel the submission. Useful for testing and benchmarking.
status (bool) – If True, display a live status to track submission progress.

Return type:

Mapping[int, Sequence[int]] | None

property task_artifacts_path: Path#: Path to artifacts of tasks.

property tasks: WorkflowTaskList#: The tasks in this workflow.

property template: WorkflowTemplate#: The template that this workflow was made from.

property template_components: TemplateComponents#: The template components used for this workflow.

classmethod temporary_rename(path, fs)#

Rename an existing same-path workflow (directory) so we can restore it if workflow creation fails.

Renaming will occur until the successfully completed. This means multiple new paths may be created, where only the final path should be considered the successfully renamed workflow. Other paths will be deleted.

Parameters:

path (str)
fs (AbstractFileSystem)

Return type:

str

property ts_fmt: str#: The timestamp format.

property ts_name_fmt: str#: The timestamp format for names.

unzip(path='.', *, log=None)#

Convert the workflow to an unzipped form.

Parameters:

path (str) – Path at which to create the new unzipped workflow. If this is an existing directory, the new workflow directory will be created within this directory. Otherwise, this path will represent the new workflow directory path.
log (str | None)

Return type:

str

property url: str#: An fsspec URL for this workflow.

wait(sub_js=None)#

Wait for the completion of specified/all submitted jobscripts.

Parameters:: sub_js (Mapping[int, Sequence[int]] | None)

zip(path='.', *, log=None, overwrite=False, include_execute=False, include_rechunk_backups=False)#

Convert the workflow to a zipped form.

Parameters:

path (str) – Path at which to create the new zipped workflow. If this is an existing directory, the zip file will be created within this directory. Otherwise, this path is assumed to be the full file path to the new zip file.
log (str | None)
overwrite (bool)
include_execute (bool)
include_rechunk_backups (bool)

Return type:

str

hpcflow.app.Workflow#

Parameter#

Parameter#

This Page