kazu.utils.build_and_test_model_packs

Module Attributes

DEFAULT_RAY_TIMEOUT

A default timeout in seconds for Ray to finish building the model packs within.

Functions

build_all_model_packs(...[, debug, ray_timeout])

Build multiple model packs.

wait_for_model_pack_completion(futures[, ...])

Classes

BuildConfiguration

Dataclass that controls how a base model pack and config should be merged with a target model pack.

ModelPackBuilder

Exceptions

exception kazu.utils.build_and_test_model_packs.ModelPackBuildError[source]

Bases: Exception

class kazu.utils.build_and_test_model_packs.BuildConfiguration[source]

Bases: object

Dataclass that controls how a base model pack and config should be merged with a target model pack.

__init__(requires_base_config, resources, has_own_config, run_acceptance_tests=False, acceptance_test_json_path=None, run_consistency_checks=False, sanity_test_strings=<factory>)[source]
Parameters:
  • requires_base_config (bool)

  • resources (dict[str, list[str]])

  • has_own_config (bool)

  • run_acceptance_tests (bool)

  • acceptance_test_json_path (str | None)

  • run_consistency_checks (bool)

  • sanity_test_strings (list[str])

Return type:

None

acceptance_test_json_path: str | None = None

If run_acceptance_tests is set, the path to the serialised Label Studio tasks.

has_own_config: bool

Does this model pack have its own config dir? (If used with requires_base_config, these files will override any config files from the base config.)

requires_base_config: bool

should this model pack use the base config as a starting point?

requires_resources: bool

Whether resources (e.g. model binaries) are required to build this model pack. This is set automatically based on the values of the other fields; it cannot be set when instantiating the class.

resources: dict[str, list[str]]

What resource directories should this model pack include? The structure is <parent_directory>: [paths within parent].

run_acceptance_tests: bool = False

should acceptance tests be run?

run_consistency_checks: bool = False

should consistency checks be run on the gold standard?

sanity_test_strings: list[str]

A list of strings to run through the pipeline after the model pack is built. If any exceptions are detected, the build will fail.
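
The way requires_resources is excluded from the constructor can be illustrated with a standard dataclass pattern. The sketch below is a hypothetical stand-in (BuildConfigurationSketch), not the kazu class itself, and the rule used to derive requires_resources (true whenever resources is non-empty) is an assumption made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BuildConfigurationSketch:
    """Hypothetical stand-in that mirrors the documented fields."""

    requires_base_config: bool
    resources: dict[str, list[str]]
    has_own_config: bool
    run_acceptance_tests: bool = False
    acceptance_test_json_path: Optional[str] = None
    run_consistency_checks: bool = False
    sanity_test_strings: list[str] = field(default_factory=list)
    # init=False keeps this field out of the generated __init__
    requires_resources: bool = field(init=False)

    def __post_init__(self) -> None:
        # Assumed rule: resources are required whenever any are listed
        self.requires_resources = len(self.resources) > 0


# Example values are invented for illustration
config = BuildConfigurationSketch(
    requires_base_config=True,
    resources={"example_parent_dir": ["example_subdir"]},
    has_own_config=False,
)
```

Passing requires_resources as a keyword argument to this sketch raises a TypeError, which matches the documented behaviour that the field is not available at instantiation.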

class kazu.utils.build_and_test_model_packs.ModelPackBuilder[source]

Bases: object

__init__(logging_config_path, target_model_pack_path, kazu_version, build_dir, maybe_base_configuration_path, skip_tests, zip_pack, *, _ray_trace_ctx=None)[source]

A ModelPackBuilder is a helper class to assist in the building of a model pack.

Danger

WARNING! Because this class configures the kazu global cache, executing multiple builds within the same Python process can pollute the cache: this object modifies the KAZU_MODEL_PACK env variable, which should not normally happen. Rather than instantiating this object directly, use build_all_model_packs(), which controls this process for you.

Parameters:
  • logging_config_path (Path | None) – passed to logging.config.fileConfig()

  • target_model_pack_path (Path) – path to model pack to process

  • kazu_version (str) – version of kazu used to generate model pack

  • build_dir (Path) – build the pack in this directory

  • maybe_base_configuration_path (Path | None) – if this pack requires the base configuration, specify path

  • skip_tests (bool) – don’t run any tests

  • zip_pack (bool) – zip the pack at the end (requires the ‘zip’ CLI tool)

Return type:

None
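
The cache pollution risk described above comes from sharing one process environment across builds. As a general illustration of process isolation (this is not how build_all_model_packs() is implemented, and run_with_model_pack is a hypothetical helper), each child process below receives its own KAZU_MODEL_PACK value without the parent environment being modified.

```python
import os
import subprocess
import sys


def run_with_model_pack(model_pack_path: str) -> str:
    """Run a child Python process that sees its own KAZU_MODEL_PACK value."""
    # Copy the parent environment and override the variable for the child only
    child_env = {**os.environ, "KAZU_MODEL_PACK": model_pack_path}
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['KAZU_MODEL_PACK'])"],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


parent_value_before = os.environ.get("KAZU_MODEL_PACK")
# "pack_a" and "pack_b" are placeholder paths
seen = [run_with_model_pack(path) for path in ("pack_a", "pack_b")]
parent_value_after = os.environ.get("KAZU_MODEL_PACK")
```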

apply_merge_configurations(*, _ray_trace_ctx=None)[source]
Return type:

None

build_caches_and_run_sanity_checks(cfg, *, _ray_trace_ctx=None)[source]

Execute all processes required to build model pack caches.

Parameters:

cfg (DictConfig)

Returns:

pipeline that was used to run sanity checks

Return type:

Pipeline

build_model_pack(*, _ray_trace_ctx=None)[source]

Execute the build process.

Returns:

path of new pack

Return type:

Path

clear_cached_resources_from_model_pack_dir(*, _ray_trace_ctx=None)[source]

Delete any cached data from the input path.

Return type:

None

copy_resources_to_target(*, _ray_trace_ctx=None)[source]
Return type:

None

load_build_configuration(*, _ray_trace_ctx=None)[source]

Try to load a build configuration from the model pack root.

The merge configuration should be a json file called build_config.json.

Raises:

ModelPackBuildError – if the merge config isn’t found at the expected path

Return type:

BuildConfiguration
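
A build_config.json file could be produced as in the sketch below. The file name comes from the text above; the JSON keys are assumed to mirror the BuildConfiguration fields, the exact schema is not confirmed here, and write_build_config is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

# Assumed schema: one key per documented BuildConfiguration field
build_config = {
    "requires_base_config": True,
    "has_own_config": True,
    "resources": {"example_parent_dir": ["example_subdir"]},
    "run_acceptance_tests": False,
    "run_consistency_checks": False,
    "sanity_test_strings": ["example text"],
}


def write_build_config(model_pack_root: Path, config: dict) -> Path:
    """Write the configuration to build_config.json in the model pack root."""
    path = model_pack_root / "build_config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


# A temporary directory stands in for a real model pack root
with tempfile.TemporaryDirectory() as tmp:
    written = write_build_config(Path(tmp), build_config)
    loaded = json.loads(written.read_text())
```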

report_tested_dependencies(*, _ray_trace_ctx=None)[source]
Return type:

None

run_acceptance_tests(cfg, *, _ray_trace_ctx=None)[source]
Parameters:

cfg (DictConfig)

Return type:

None

zip_model_pack(*, _ray_trace_ctx=None)[source]

Call the zip subprocess to compress the model pack (requires zip on the CLI), then move the archive to the parent dir.

Return type:

None

kazu.utils.build_and_test_model_packs.build_all_model_packs(maybe_base_configuration_path, model_pack_paths, zip_pack, output_dir, skip_tests, logging_config_path, max_parallel_build, debug=False, ray_timeout=10800.0)[source]

Build multiple model packs.

Parameters:
  • maybe_base_configuration_path (Path | None) – Path to the base configuration, if required

  • model_pack_paths (list[Path]) – list of paths to model pack resources

  • zip_pack (bool) – should the packs be zipped at the end?

  • output_dir (Path) – directory to build model packs in

  • skip_tests (bool) – don’t run any tests

  • logging_config_path (Path | None) – passed to logging.config.fileConfig

  • max_parallel_build (int | None) – build at most this many model packs simultaneously. If None, use all available CPUs

  • debug (bool) – Disables Ray parallelization, enabling the use of debugger tools

  • ray_timeout (float | None) – A timeout for Ray to complete model pack building within. Defaults to DEFAULT_RAY_TIMEOUT

Return type:

None
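
A call to build_all_model_packs() might be assembled as follows. The keyword names are taken from the signature above; the paths and chosen values are placeholders, and make_build_kwargs and run_build are hypothetical helpers. The kazu import is deferred so the sketch can be read and loaded without kazu installed.

```python
from pathlib import Path
from typing import Any


def make_build_kwargs(pack_dirs: list[Path], output_dir: Path) -> dict[str, Any]:
    """Collect arguments matching the documented signature."""
    return {
        # None assumes that no pack requires the base configuration
        "maybe_base_configuration_path": None,
        "model_pack_paths": pack_dirs,
        "zip_pack": False,
        "output_dir": output_dir,
        "skip_tests": True,
        "logging_config_path": None,
        "max_parallel_build": 1,
        # debug=True disables Ray parallelization, per the parameter docs
        "debug": True,
        # 3 hours in seconds, the same value as DEFAULT_RAY_TIMEOUT
        "ray_timeout": float(3 * 60 * 60),
    }


def run_build(build_kwargs: dict[str, Any]) -> None:
    """Invoke the real function; requires kazu to be installed."""
    from kazu.utils.build_and_test_model_packs import build_all_model_packs

    build_all_model_packs(**build_kwargs)


# Placeholder paths for illustration
kwargs = make_build_kwargs([Path("my_model_pack")], Path("build_output"))
```

Calling run_build(kwargs) would start the build; it is left uncalled here because it depends on a real model pack directory.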

kazu.utils.build_and_test_model_packs.wait_for_model_pack_completion(futures, timeout=10800.0)[source]
Parameters:
Return type:

list[ObjectRef]

kazu.utils.build_and_test_model_packs.DEFAULT_RAY_TIMEOUT = 10800.0

A default timeout in seconds for Ray to finish building the model packs within. This is equal to 3 hours.