kazu.ontology_matching.ontology_matcher

Classes

OntologyMatcher

String matching to synonyms.

OntologyMatcherConfig

OntologyMatcherConfig(span_key: str, match_id_sep: str, labels: list[str], parser_name_to_entity_type: dict[str, str])

class kazu.ontology_matching.ontology_matcher.OntologyMatcher[source]

Bases: object

String matching to synonyms.

Core strict matching is done by spaCy’s PhraseMatcher.

__call__(doc)[source]

Call self as a function.

Parameters:

doc (Doc)

Return type:

Doc

__init__(nlp, name='ontology_matcher', *, span_key='RAW_HITS', match_id_sep=':::', parser_name_to_entity_type)[source]
Parameters:
  • nlp (Language) – a spaCy model, used for its vocab and tokenizer

  • name (str) – the name of this component, used in the spaCy config

  • span_key (str) – the key for doc.spans to store the matches in

  • match_id_sep (str) – a separator that splits the fields in the match id

  • parser_name_to_entity_type (dict[str, str]) – a mapping from parsers to their entity class
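
As a rough illustration of what match_id_sep is for, the sketch below joins and splits a match id using the default ':::' separator from the signature above. The helper names and the field layout (parser name followed by entity class) are assumptions made for illustration; the fields Kazu actually packs into a match id are not listed on this page.

```python
MATCH_ID_SEP = ":::"  # default value of match_id_sep in the __init__ signature


def make_match_id(parser_name: str, entity_class: str, sep: str = MATCH_ID_SEP) -> str:
    """Hypothetical helper: pack two fields into a single match id string."""
    for field in (parser_name, entity_class):
        if sep in field:
            # A field containing the separator would make splitting ambiguous.
            raise ValueError(f"field {field!r} contains the separator {sep!r}")
    return sep.join((parser_name, entity_class))


def split_match_id(match_id: str, sep: str = MATCH_ID_SEP) -> list[str]:
    """Hypothetical helper: recover the individual fields from a match id."""
    return match_id.split(sep)


# "example_parser" and "gene" are made-up values, not real Kazu parser names.
match_id = make_match_id("example_parser", "gene")
fields = split_match_id(match_id)
```

A separator that is unlikely to occur inside any field value keeps the split unambiguous, which is presumably why the parameter is configurable.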

create_phrasematchers(parsers)[source]

Create spaCy PhraseMatchers.

OntologyStringResources are produced by the OntologyParser.populate_databases() method.

Parameters:

parsers (list[OntologyParser])

Return type:

tuple[PhraseMatcher | None, PhraseMatcher | None]
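
The two-element return type, together with the nr_strict_rules and nr_lowercase_rules properties, suggests one case-sensitive matcher and one lowercase matcher, either of which may be None. The sketch below illustrates that idea in plain Python. TinyPhraseMatcher and build_matchers are hypothetical stand-ins, not spaCy's PhraseMatcher or Kazu's implementation, and the (strict, lowercase) ordering of the tuple is an assumption.

```python
from typing import Optional


class TinyPhraseMatcher:
    """Plain-Python stand-in for a phrase matcher (NOT spaCy's PhraseMatcher)."""

    def __init__(self, phrases: list[str], lowercase: bool) -> None:
        self.lowercase = lowercase
        # Each pattern is a tuple of (optionally lowercased) whitespace tokens.
        self.patterns = {tuple(self._norm(p).split()) for p in phrases}

    def _norm(self, text: str) -> str:
        return text.lower() if self.lowercase else text

    def __call__(self, tokens: list[str]) -> list[tuple[int, int]]:
        """Return sorted (start, end) token offsets of every pattern occurrence."""
        toks = [self._norm(t) for t in tokens]
        hits = []
        for pattern in self.patterns:
            n = len(pattern)
            for start in range(len(toks) - n + 1):
                if tuple(toks[start:start + n]) == pattern:
                    hits.append((start, start + n))
        return sorted(hits)


def build_matchers(
    strict: list[str], lowercase: list[str]
) -> tuple[Optional[TinyPhraseMatcher], Optional[TinyPhraseMatcher]]:
    """Assumed behaviour: a matcher is None when it has no rules to hold."""
    strict_m = TinyPhraseMatcher(strict, lowercase=False) if strict else None
    lower_m = TinyPhraseMatcher(lowercase, lowercase=True) if lowercase else None
    return strict_m, lower_m


strict_matcher, lowercase_matcher = build_matchers(
    strict=["EGFR"], lowercase=["breast cancer"]
)
tokens = "Egfr and EGFR in Breast Cancer".split()
strict_hits = strict_matcher(tokens)        # only the exact-case token matches
lowercase_hits = lowercase_matcher(tokens)  # case is ignored for these rules
```

Keeping the two rule sets in separate matchers lets case-sensitive synonyms such as short acronyms avoid matching ordinary lowercase words.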

filter_by_contexts(doc, spans)[source]

These filters work best when there is sentence segmentation available.

Parameters:
  • doc (Doc)

  • spans (list[Span])

Return type:

list[Span]

from_disk(path, *, exclude=[])[source]

Load the pipe from disk.

Modifies the object in place and returns it.

Return type:

OntologyMatcher

set_context_matchers()[source]

set_labels(labels)[source]
Parameters:

labels (Iterable[str])

Return type:

None

span_in_FP_context(doc, ent_class)[source]

When an entity type has a false-positive (FP) matcher defined, spans that match it are regarded as false positives.

Parameters:
  • doc (Doc | Span)

  • ent_class (str)

Return type:

bool

span_in_FP_coocc(doc, span, ent_class)[source]

When an entity type has an FP co-occurrence (co-occ) dict defined, a hit defined in the dict is regarded as a false positive when it matches at least one of its co-occurrence terms.

Parameters:
  • doc (Doc | Span)

  • span (Span)

  • ent_class (str)

Return type:

bool

span_in_TP_context(doc, ent_class)[source]

When an entity type has a true-positive (TP) matcher defined, that matcher must match for this span to be regarded as a true hit.

Parameters:
  • doc (Doc | Span)

  • ent_class (str)

Return type:

bool
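
The FP and TP context checks can be read as opposite rules over the same kind of per-entity-class matcher. The sketch below illustrates that reading in plain Python. The context phrases, the function names, and the way keep_hit combines the two checks are all assumptions for illustration; this page does not state how filter_by_contexts combines them.

```python
# Hypothetical per-entity-class context words; not taken from Kazu's resources.
TP_CONTEXT = {"gene": ["expression", "mutation"]}
FP_CONTEXT = {"gene": ["figure", "table"]}


def in_context(sentence: str, ent_class: str, context: dict[str, list[str]]) -> bool:
    """True if any context word for ent_class occurs in the sentence."""
    words = set(sentence.lower().split())
    return any(word in words for word in context.get(ent_class, []))


def keep_hit(sentence: str, ent_class: str) -> bool:
    """Assumed combination: drop on FP context, require TP context if one is defined."""
    if ent_class in FP_CONTEXT and in_context(sentence, ent_class, FP_CONTEXT):
        return False
    if ent_class in TP_CONTEXT:
        return in_context(sentence, ent_class, TP_CONTEXT)
    # No context matcher defined for this entity class: leave the hit alone.
    return True


kept = keep_hit("EGFR mutation was detected", "gene")
dropped_fp = keep_hit("see figure for EGFR mutation", "gene")
dropped_no_tp = keep_hit("EGFR was detected", "gene")
untouched = keep_hit("aspirin was given", "drug")
```

Passing a single sentence rather than a whole document keeps unrelated context words elsewhere in the text from influencing the decision.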

span_in_TP_coocc(doc, span, ent_class)[source]

When an entity type has a TP co-occurrence (co-occ) dict defined, a hit defined in the dict is only regarded as a true hit when it matches at least one of its co-occurrence terms.

Parameters:
  • doc (Doc | Span)

  • span (Span)

  • ent_class (str)

Return type:

bool
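
The co-occurrence checks differ from the context checks in that the terms are attached to a specific hit rather than to the entity class as a whole. A minimal sketch of both co-occurrence rules follows. The dictionaries, their nesting (entity class, then lowercased hit text, then terms), and the function names are hypothetical and are not Kazu's actual data structures.

```python
# Hypothetical co-occurrence dictionaries: entity class -> hit text -> terms.
FP_COOCC = {"gene": {"cat": ["pet", "animal"]}}
TP_COOCC = {"gene": {"cat": ["enzyme", "catalase"]}}


def _cooccurs(tokens: list[str], hit: str, ent_class: str,
              table: dict[str, dict[str, list[str]]]) -> bool:
    """True if the hit is listed in the table and one of its terms is present."""
    terms = table.get(ent_class, {}).get(hit.lower(), [])
    present = {t.lower() for t in tokens}
    return any(term in present for term in terms)


def hit_in_fp_coocc(tokens: list[str], hit: str, ent_class: str) -> bool:
    """A listed hit is treated as a false positive if an FP term co-occurs."""
    return _cooccurs(tokens, hit, ent_class, FP_COOCC)


def hit_in_tp_coocc(tokens: list[str], hit: str, ent_class: str) -> bool:
    """A listed hit is treated as a true hit only if a TP term co-occurs."""
    return _cooccurs(tokens, hit, ent_class, TP_COOCC)


enzyme_sentence = "The CAT enzyme activity was measured".split()
pet_sentence = "My pet cat slept".split()

tp_in_enzyme = hit_in_tp_coocc(enzyme_sentence, "CAT", "gene")
fp_in_enzyme = hit_in_fp_coocc(enzyme_sentence, "CAT", "gene")
tp_in_pet = hit_in_tp_coocc(pet_sentence, "cat", "gene")
fp_in_pet = hit_in_fp_coocc(pet_sentence, "cat", "gene")
```

How hits that are absent from a dict are treated is not stated here; the sketch simply returns False for them in both functions.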

to_disk(path, *, exclude=[])[source]

Serialize the pipe to disk.

Parameters:
  • path (str | Path) – Path to serialize the pipeline to.

  • exclude (Iterable[str]) – String names of serialization fields to exclude.

Return type:

None

property labels: list[str]

The labels currently processed by this component.

property match_id_sep: str
property nr_lowercase_rules: int
property nr_strict_rules: int
property parser_name_to_entity_type: dict[str, str]
property span_key: str

class kazu.ontology_matching.ontology_matcher.OntologyMatcherConfig[source]

Bases: object

OntologyMatcherConfig(span_key: str, match_id_sep: str, labels: list[str], parser_name_to_entity_type: dict[str, str])

__init__(span_key, match_id_sep, labels, parser_name_to_entity_type)[source]
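Parameters:
  • span_key (str)

  • match_id_sep (str)

  • labels (list[str])

  • parser_name_to_entity_type (dict[str, str])
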
Return type:

None

labels: list[str]
match_id_sep: str
parser_name_to_entity_type: dict[str, str]
span_key: str
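
The generated signature suggests that OntologyMatcherConfig is a plain dataclass holding four fields. The sketch below mirrors those fields in a local stand-in class so that it runs without Kazu installed. MatcherConfigSketch, the parser names, and the labels are hypothetical, while 'RAW_HITS' and ':::' are the defaults shown in the OntologyMatcher.__init__ signature.

```python
from dataclasses import asdict, dataclass


@dataclass
class MatcherConfigSketch:
    """Local stand-in mirroring the four documented fields (not the Kazu class)."""

    span_key: str
    match_id_sep: str
    labels: list[str]
    parser_name_to_entity_type: dict[str, str]


config = MatcherConfigSketch(
    span_key="RAW_HITS",   # default span_key of OntologyMatcher.__init__
    match_id_sep=":::",    # default match_id_sep of OntologyMatcher.__init__
    labels=["disease", "gene"],
    # Made-up parser names mapped to their entity classes.
    parser_name_to_entity_type={
        "example_gene_parser": "gene",
        "example_disease_parser": "disease",
    },
)

# A dataclass converts to a plain dict, which is convenient for serialization.
config_dict = asdict(config)
```

In this sketch the labels are the set of entity classes that the parsers map to; whether Kazu derives labels that way is an assumption.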