kazu.ontology_matching.assemble_pipeline

Functions

main(output_dir, parsers[, span_key])

Generates, serializes and returns a spaCy pipeline with an OntologyMatcher.

kazu.ontology_matching.assemble_pipeline.main(output_dir, parsers, span_key='RAW_HITS')

Generates an English spaCy pipeline consisting of a tokenizer, a sentencizer with its default config, and an OntologyMatcher built from the supplied parsers. The pipeline is written to output_dir and is also returned to the caller.

If a parser has no human-curated OntologyStringResource configured, the OntologyMatcher is built from autogenerated OntologyStringResources (along with any associated generated synonyms). This is useful for working out which strings are ‘noisy’, but is not recommended for production, because raw ontology data tends to need some curation before it can be applied.
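A minimal sketch of calling this function, based only on the signature documented on this page. The wrapper name build_matcher_pipeline is hypothetical, and how the OntologyParser instances are constructed is not covered here, so they are taken as an argument. Creating the output directory up front is a precaution rather than a documented requirement.

```python
from pathlib import Path


def build_matcher_pipeline(output_dir, parsers, span_key="RAW_HITS"):
    """Hypothetical convenience wrapper around assemble_pipeline.main.

    `parsers` is assumed to be a list of already-configured OntologyParser
    instances; building them is outside the scope of this sketch.
    """
    # Imported lazily so that this helper can be defined even where
    # kazu is not installed; the import only runs when it is called.
    from kazu.ontology_matching import assemble_pipeline

    out = Path(output_dir)
    # Precaution (an assumption, not a documented requirement):
    # make sure the target directory exists before serialising.
    out.mkdir(parents=True, exist_ok=True)

    # main() writes the pipeline to `out` and also returns it.
    return assemble_pipeline.main(
        output_dir=out,
        parsers=parsers,
        span_key=span_key,
    )
```

The returned object is the spaCy Language pipeline, so it can be applied to text directly without reloading it from disk.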

Parameters:
  • output_dir (str | Path) – the output directory to write the pipeline into.

  • parsers (list[OntologyParser]) – build the pipeline using these parsers as a data source.

  • span_key (str) – the key under which recognised NER spans are stored in, and accessed from, the spans attribute of each generated spaCy Doc.

Returns:

the generated spaCy pipeline, which has also been written to output_dir.

Return type:

Language
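As an illustration of how span_key is used, the sketch below reads the recognised spans back out of a processed Doc. The helper name recognised_spans is hypothetical. It assumes the spans are stored under Doc.spans[span_key], as the parameter description suggests, and relies only on standard spaCy Span attributes (text, label_, start_char, end_char). The values the OntologyMatcher assigns as labels are not specified on this page.

```python
def recognised_spans(nlp, text, span_key="RAW_HITS"):
    """Run `text` through `nlp` and list the spans stored under `span_key`.

    `nlp` is assumed to be the Language object returned by
    assemble_pipeline.main (or any callable that returns a spaCy Doc).
    """
    doc = nlp(text)
    # Doc.spans behaves like a dict of span groups; fall back to an
    # empty list if nothing was stored under this key.
    spans = doc.spans.get(span_key, [])
    return [
        (span.text, span.label_, span.start_char, span.end_char)
        for span in spans
    ]
```

If the pipeline was built with a non-default span_key, the same value has to be passed here, otherwise the result is an empty list.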