kazu.ontology_matching.assemble_pipeline
Functions
- kazu.ontology_matching.assemble_pipeline.main(output_dir, parsers, span_key='RAW_HITS')
Generates, serializes and returns a spaCy pipeline with an OntologyMatcher.
Generates an English spaCy pipeline with a tokenizer, a sentencizer with default config, and an OntologyMatcher based on the input parameters. The pipeline is written to disk, and also returned to the caller.
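As a rough illustration of the spaCy side of this assembly, the sketch below builds a blank English pipeline (which carries the tokenizer), adds a sentencizer with its default config, and writes the result to disk. It is not kazu's implementation: the helper name assemble_skeleton is hypothetical, and the OntologyMatcher step is deliberately left out because its construction is not described here.

```python
def assemble_skeleton(output_dir):
    """Hypothetical helper: build and serialize a pipeline WITHOUT the OntologyMatcher."""
    # Imported lazily so this sketch can be loaded without spaCy installed.
    import spacy

    # A blank English pipeline already includes the tokenizer.
    nlp = spacy.blank("en")
    # Add the rule-based sentence splitter with its default config.
    nlp.add_pipe("sentencizer")
    # Assumption: main() would add an OntologyMatcher at this point; omitted here.
    nlp.to_disk(output_dir)
    return nlp
```

Calling assemble_skeleton("some_dir") requires spaCy to be installed and should leave a serialized pipeline in that directory.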
If a parser has no human-curated OntologyStringResource configured, the OntologyMatcher is built using autogenerated OntologyStringResources (with any associated generated synonyms). This is useful for trying to understand which strings are ‘noisy’, but not recommended for production as raw ontology data tends to need some curation before it can be applied.
- Parameters:
output_dir (str | Path) – the output directory to write the pipeline into.
parsers (list[OntologyParser]) – build the pipeline using these parsers as a data source.
span_key (str) – the key to use within the generated spaCy Docs’ span attribute to store and access recognised NER spans.
- Returns:
- Return type:
Language
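A minimal usage sketch, assuming kazu is installed and that a list of already configured OntologyParser instances is available (building them is outside the scope of this page). The helper name build_and_inspect and its text argument are hypothetical. The spans are read from the Doc's spans attribute under span_key, following the parameter description above.

```python
from pathlib import Path


def build_and_inspect(output_dir, parsers, text, span_key="RAW_HITS"):
    """Hypothetical helper: build the pipeline, run it on text, list the raw hits."""
    # Imported lazily so this sketch can be loaded without kazu installed.
    from kazu.ontology_matching.assemble_pipeline import main

    # main() writes the pipeline to output_dir and also returns it.
    nlp = main(output_dir=Path(output_dir), parsers=parsers, span_key=span_key)

    doc = nlp(text)
    # Assumption: hits are stored in doc.spans[span_key]; .get() avoids a
    # KeyError if nothing was stored for this document.
    return [(span.text, span.label_) for span in doc.spans.get(span_key, [])]
```

Passing a different span_key here changes the key under which the spans are looked up, so it should match the value used when the pipeline was built.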