kedro.runner.ThreadRunner

class kedro.runner.ThreadRunner(max_workers=None, is_async=False)

ThreadRunner is an AbstractRunner implementation. It uses threads to run the nodes of a Pipeline in parallel, in groups formed by topological sort (toposort).
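The idea of "parallel groups formed by toposort" can be pictured with a small standard-library sketch. It is not Kedro's implementation: `run_in_parallel_groups`, the task names, and the dependency mapping are hypothetical, and it waits for a whole group to finish before starting the next one, which is a simplification.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_in_parallel_groups(tasks, dependencies, max_workers=None):
    """Run tasks group by group, one thread per task within a group.

    tasks: name -> callable taking the dict of results computed so far.
    dependencies: name -> set of task names that must finish first.
    (Both structures are illustrative, not part of Kedro.)
    """
    sorter = TopologicalSorter({name: dependencies.get(name, set()) for name in tasks})
    sorter.prepare()
    results, groups = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every task returned here has all of its prerequisites done,
            # so the members of one group are independent of each other.
            group = sorted(sorter.get_ready())
            groups.append(group)
            values = list(pool.map(lambda name: tasks[name](results), group))
            results.update(zip(group, values))
            sorter.done(*group)
    return results, groups


tasks = {
    "load_a": lambda done: 2,
    "load_b": lambda done: 3,
    "combine": lambda done: done["load_a"] + done["load_b"],
}
dependencies = {"combine": {"load_a", "load_b"}}

results, groups = run_in_parallel_groups(tasks, dependencies, max_workers=2)
print(groups)   # the two loaders form one group, "combine" the next
print(results)
```

Threads suit this pattern when the tasks spend most of their time waiting on I/O or on libraries that release the interpreter lock.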

Methods

`create_default_data_set`(ds_name)	Factory method for creating the default dataset for the runner.
`run`(pipeline, catalog[, hook_manager, ...])	Run the `Pipeline` using the datasets provided by `catalog` and save results back to the same objects.
`run_only_missing`(pipeline, catalog, hook_manager)	Run only the missing outputs from the `Pipeline` using the datasets provided by `catalog`, and save results back to the same objects.

__init__(max_workers=None, is_async=False)

Instantiates the runner.

Parameters:

max_workers (Optional[int]) – Number of worker threads to spawn. If not set, calculated automatically based on the pipeline configuration and CPU core count.
is_async (bool) – Not supported by ThreadRunner. If True is passed, it is reset to False, because ThreadRunner doesn’t support loading and saving the node inputs and outputs asynchronously with threads. Defaults to False.

Raises:

ValueError – Raised when invalid parameters are passed.
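The parameter handling described above can be sketched as a standalone helper. `resolve_runner_args` is a hypothetical name, and the specific rule that a non-positive `max_workers` counts as a "bad parameter" is an assumption made for illustration, as is the use of a warning when `is_async` is reset.

```python
import warnings
from typing import Optional, Tuple


def resolve_runner_args(
    max_workers: Optional[int] = None, is_async: bool = False
) -> Tuple[Optional[int], bool]:
    """Hypothetical helper mirroring the documented constructor behaviour."""
    if is_async:
        # Asynchronous load/save is not supported with threads,
        # so the flag is reset rather than honoured.
        warnings.warn("is_async is not supported with threads; using False.")
        is_async = False
    # Assumption: "bad parameters" means a worker count below 1.
    if max_workers is not None and max_workers <= 0:
        raise ValueError("max_workers should be positive")
    return max_workers, is_async


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    settings = resolve_runner_args(max_workers=4, is_async=True)

print(settings)     # is_async comes back as False
print(len(caught))  # one warning was recorded
```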

create_default_data_set(ds_name)

Factory method for creating the default dataset for the runner.

Parameters: ds_name (str) – Name of the missing dataset.
Return type: MemoryDataset
Returns: An instance of MemoryDataset to be used for all unregistered datasets.
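The role of a default-dataset factory can be illustrated without Kedro: when a node refers to a dataset name that was never registered, the runner creates an in-memory placeholder for it. In the sketch below, `InMemoryDataset`, `create_default_dataset`, and `resolve_dataset` are hypothetical stand-ins, not Kedro classes or functions.

```python
class InMemoryDataset:
    """Hypothetical stand-in for a dataset that keeps its data in memory."""

    _EMPTY = object()

    def __init__(self):
        self._data = self._EMPTY

    def save(self, data):
        self._data = data

    def load(self):
        if self._data is self._EMPTY:
            raise ValueError("Data has not been saved yet.")
        return self._data


def create_default_dataset(ds_name):
    # The name is accepted for symmetry with the documented signature;
    # this sketch returns the same kind of dataset for every name.
    return InMemoryDataset()


def resolve_dataset(registered, ds_name):
    """Return the registered dataset, creating a default one if it is missing."""
    if ds_name not in registered:
        registered[ds_name] = create_default_dataset(ds_name)
    return registered[ds_name]


registered = {}
intermediate = resolve_dataset(registered, "intermediate")
intermediate.save([1, 2, 3])
print(resolve_dataset(registered, "intermediate").load())
```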

run(pipeline, catalog, hook_manager=None, session_id=None)

Run the Pipeline using the datasets provided by catalog and save results back to the same objects.

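Parameters:

pipeline (Pipeline) – The Pipeline to run.
catalog (DataCatalog) – The DataCatalog from which to fetch data.
hook_manager (Optional[PluginManager]) – The PluginManager to activate hooks.
session_id (Optional[str]) – The id of the session.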

Raises:

ValueError – Raised when Pipeline inputs cannot be satisfied.

Returns:

Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.
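The contract of `run` (fail when an input is missing, save registered outputs back, return the rest) can be sketched with a plain dictionary standing in for the catalog. `sketch_run` and its arguments are hypothetical and only illustrate the documented behaviour; they are not Kedro's API.

```python
from typing import Any, Callable, Dict, Set


def sketch_run(
    pipeline_inputs: Set[str],
    produce: Callable[[Dict[str, Any]], Dict[str, Any]],
    catalog: Dict[str, Any],
) -> Dict[str, Any]:
    """Illustrative only: `catalog` maps registered dataset names to their data."""
    unsatisfied = pipeline_inputs - set(catalog)
    if unsatisfied:
        raise ValueError(f"Pipeline inputs not found in the catalog: {sorted(unsatisfied)}")

    produced = produce({name: catalog[name] for name in pipeline_inputs})

    free_outputs = {}
    for name, value in produced.items():
        if name in catalog:
            catalog[name] = value       # registered output: saved back to the catalog
        else:
            free_outputs[name] = value  # unregistered output: handed back to the caller
    return free_outputs


def summarise(inputs):
    raw = inputs["raw"]
    return {"total": sum(raw), "mean": sum(raw) / len(raw)}


catalog = {"raw": [1, 2, 3], "total": None}  # "mean" is deliberately not registered
returned = sketch_run({"raw"}, summarise, catalog)
print(returned)          # only the unregistered output is returned
print(catalog["total"])  # the registered output was saved back
```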

run_only_missing(pipeline, catalog, hook_manager)

Run only the missing outputs from the Pipeline using the datasets provided by catalog, and save results back to the same objects.

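Parameters:

pipeline (Pipeline) – The Pipeline to run.
catalog (DataCatalog) – The DataCatalog from which to fetch data.
hook_manager (PluginManager) – The PluginManager to activate hooks.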

Raises:

ValueError – Raised when Pipeline inputs cannot be satisfied.

Returns:

Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.
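One simplified way to picture "run only the missing outputs" is a single pass over the nodes in topological order. The sketch assumes every dataset can be checked for existence and treats any absent dataset as needing a rebuild; Kedro's own selection rule is more involved and is not reproduced here. `select_nodes_to_rerun` and the node tuples are hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Node = Tuple[str, Sequence[str], Sequence[str]]  # (name, inputs, outputs)


def select_nodes_to_rerun(nodes: Iterable[Node], existing: Set[str]) -> List[str]:
    """Pick nodes whose outputs are missing, plus everything downstream of them.

    `nodes` must already be in topological order.
    """
    rebuilt: Set[str] = set()
    selected: List[str] = []
    for name, inputs, outputs in nodes:
        produces_missing = any(out not in existing for out in outputs)
        consumes_rebuilt = any(inp in rebuilt for inp in inputs)
        if produces_missing or consumes_rebuilt:
            selected.append(name)
            rebuilt.update(outputs)
    return selected


nodes = [
    ("clean", ["raw"], ["cleaned"]),
    ("train", ["cleaned"], ["model"]),
    ("report", ["model"], ["summary"]),
]

# "cleaned" is already on disk, so only the later steps are selected.
to_rerun = select_nodes_to_rerun(nodes, existing={"raw", "cleaned"})
print(to_rerun)
```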