kazu.ontology_matching.ontology_matcher
Classes
- OntologyMatcher: String matching to synonyms.
- OntologyMatcherConfig(span_key: str, match_id_sep: str, labels: list[str], parser_name_to_entity_type: dict[str, str])
- class kazu.ontology_matching.ontology_matcher.OntologyMatcher
Bases:
object
String matching to synonyms.
Core strict matching is done by spaCy’s PhraseMatcher.
- __init__(nlp, name='ontology_matcher', *, span_key='RAW_HITS', match_id_sep=':::', parser_name_to_entity_type)
- Parameters:
nlp (Language) – a spacy model, used for its vocab and tokenizer
name (str) – the name of this component. Used for spacy config
span_key (str) – the key for doc.spans to store the matches in
match_id_sep (str) – a separator that splits fields in the match id
parser_name_to_entity_type (dict[str, str]) – a mapping from parsers to their entity class
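As a rough illustration of what match_id_sep and parser_name_to_entity_type are for, the sketch below joins and splits a match id using the default ':::' separator. The field layout (parser name, entity class, synonym), the helper names and the parser name are assumptions made for this example, not Kazu's actual match id format.

```python
MATCH_ID_SEP = ":::"  # the documented default for match_id_sep

# Hypothetical mapping; real parser names depend on your Kazu configuration.
parser_name_to_entity_type = {"hypothetical_gene_parser": "gene"}


def build_match_id(parser_name: str, synonym: str, sep: str = MATCH_ID_SEP) -> str:
    """Join the (assumed) fields of a match id with the separator."""
    ent_class = parser_name_to_entity_type[parser_name]
    fields = (parser_name, ent_class, synonym)
    for field in fields:
        # A field containing the separator could not be split back unambiguously.
        if sep in field:
            raise ValueError(f"field {field!r} contains the separator {sep!r}")
    return sep.join(fields)


def split_match_id(match_id: str, sep: str = MATCH_ID_SEP) -> tuple[str, ...]:
    """Recover the individual fields from a match id."""
    return tuple(match_id.split(sep))


match_id = build_match_id("hypothetical_gene_parser", "BRCA1")
fields = split_match_id(match_id)
```

A multi-character separator such as ':::' is unlikely to occur inside a synonym, which is what makes the round trip reliable.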
- create_phrasematchers(parsers)
Create spaCy PhraseMatchers.
OntologyStringResources are produced by the OntologyParser.populate_databases() method.
- Parameters:
parsers (list[OntologyParser])
- Returns:
- Return type:
tuple[PhraseMatcher | None, PhraseMatcher | None]
- filter_by_contexts(doc, spans)
These filters work best when there is sentence segmentation available.
- from_disk(path, *, exclude=[])
Load the pipe from disk.
Modifies the object in place and returns it.
- Parameters:
- Return type:
- span_in_FP_context(doc, ent_class)
When an entity type has a FP matcher defined, spans that match are regarded as FPs.
- span_in_FP_coocc(doc, span, ent_class)
When an entity type has a FP co-occ dict defined, a hit defined in the dict is regarded as a false positive when it matches at least one of its co-occ terms.
- span_in_TP_context(doc, ent_class)
When an entity type has a TP matcher defined, it should match for this span to be regarded as a true hit.
- span_in_TP_coocc(doc, span, ent_class)
When an entity type has a TP co-occ dict defined, a hit defined in the dict is only regarded as a true hit when it matches at least one of its co-occ terms.
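The TP and FP co-occurrence rules described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names, the dict-of-sets layout, the use of a token set as context, and the order in which the two rules are combined are all hypothetical and do not reproduce Kazu's implementation, which operates on spaCy Doc and Span objects.

```python
def matches_coocc(hit: str, context_tokens: set[str], coocc: dict[str, set[str]]) -> bool:
    """True if the hit is defined in the dict and at least one of its
    co-occurrence terms appears in the context."""
    terms = coocc.get(hit)
    if terms is None:
        return False
    return not terms.isdisjoint(context_tokens)


def keep_hit(
    hit: str,
    context_tokens: set[str],
    tp_coocc: dict[str, set[str]],
    fp_coocc: dict[str, set[str]],
) -> bool:
    """Combine both rules (the ordering here is an assumption)."""
    # FP rule: a hit defined in the FP dict is dropped when a co-occ term is present.
    if matches_coocc(hit, context_tokens, fp_coocc):
        return False
    # TP rule: a hit defined in the TP dict is kept only when a co-occ term is present.
    if hit in tp_coocc and not matches_coocc(hit, context_tokens, tp_coocc):
        return False
    # Hits not mentioned in either dict are left untouched.
    return True


# Hypothetical dictionaries for a single entity class.
tp_coocc = {"cat": {"gene", "enzyme"}}
fp_coocc = {"impact": {"journal"}}

kept = keep_hit("cat", {"the", "cat", "gene"}, tp_coocc, fp_coocc)
dropped = keep_hit("impact", {"impact", "journal"}, tp_coocc, fp_coocc)
```

In this sketch the context would typically be the tokens of the sentence containing the hit, which is consistent with the note that these filters work best when sentence segmentation is available.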