kazu.steps.linking.rules_based_disambiguation
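Classes
- MatcherResult
- RulesBasedEntityClassDisambiguationFilterStep: Removes instances of Entity from Sections that don't meet rules based disambiguation requirements in at least one location in the document.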
- class kazu.steps.linking.rules_based_disambiguation.MatcherResult
Bases: AutoNameEnum
- HIT = 'HIT'
- MISS = 'MISS'
- NOT_CONFIGURED = 'NOT_CONFIGURED'
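The three members read naturally as the outcome of one rule aspect at one mention location: its rules matched, did not match, or none were configured. The sketch below shows one plausible way to combine a true positive outcome and a false positive outcome into a per-location verdict. It assumes that a true positive aspect is satisfied on HIT, a false positive aspect is satisfied on MISS, and NOT_CONFIGURED never blocks a mention. `location_is_valid` is a hypothetical helper, and plain `enum.Enum` stands in for kazu's AutoNameEnum.

```python
from enum import Enum


class MatcherResult(Enum):
    # Same three members as documented; plain Enum replaces AutoNameEnum here.
    HIT = "HIT"
    MISS = "MISS"
    NOT_CONFIGURED = "NOT_CONFIGURED"


def location_is_valid(tp_result: MatcherResult, fp_result: MatcherResult) -> bool:
    """Hypothetical helper: does a single mention location pass the rules?

    Assumed semantics: a true positive aspect passes when its rules HIT,
    a false positive aspect passes when its rules MISS, and an aspect
    with no rules (NOT_CONFIGURED) never blocks the mention.
    """
    tp_ok = tp_result in (MatcherResult.HIT, MatcherResult.NOT_CONFIGURED)
    fp_ok = fp_result in (MatcherResult.MISS, MatcherResult.NOT_CONFIGURED)
    return tp_ok and fp_ok


print(location_is_valid(MatcherResult.HIT, MatcherResult.NOT_CONFIGURED))  # True
print(location_is_valid(MatcherResult.HIT, MatcherResult.HIT))  # False
```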
- class kazu.steps.linking.rules_based_disambiguation.RulesBasedEntityClassDisambiguationFilterStep
Bases: Step
Removes instances of Entity from Sections that don't meet rules based disambiguation requirements in at least one location in the document.
This step utilises spaCy Matcher rules to determine whether an entity class and/or mention entities are valid. These Matcher rules operate on the sentence in which each mention under consideration is located.
Rules can have both true positive and false positive aspects. If an aspect is defined, it MUST be satisfied at least once in the document for all entities with the same key (composed of the matched string and entity class) to be valid.
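A minimal sketch of this document-level rule, assuming each mention's per-location verdict has already been computed. `Mention` and `filter_by_key` are hypothetical names, not kazu's API: mentions are grouped by the key (matched string, entity class), and every mention with a given key is kept if any one location for that key is valid.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    # Hypothetical stand-in for kazu's Entity, reduced to the fields used here.
    match: str
    entity_class: str
    sentence_index: int
    location_valid: bool  # outcome of the tp/fp rules at this location


def filter_by_key(mentions: list[Mention]) -> list[Mention]:
    """Keep mentions whose (match, entity_class) key is valid at least once."""
    key_is_valid: dict[tuple[str, str], bool] = defaultdict(bool)
    for m in mentions:
        key = (m.match, m.entity_class)
        key_is_valid[key] = key_is_valid[key] or m.location_valid
    return [m for m in mentions if key_is_valid[(m.match, m.entity_class)]]


# Illustrative data: one valid "ALL"/disease location rescues the other one,
# while the "ALL"/gene key never passes and is dropped everywhere.
mentions = [
    Mention("ALL", "disease", sentence_index=0, location_valid=False),
    Mention("ALL", "disease", sentence_index=3, location_valid=True),
    Mention("ALL", "gene", sentence_index=1, location_valid=False),
]
kept = filter_by_key(mentions)
print([(m.entity_class, m.sentence_index) for m in kept])
# [('disease', 0), ('disease', 3)]
```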
Non-contiguous entities are evaluated on the full span of the text they cover, rather than the specific tokens.
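For a non-contiguous entity, the text under evaluation runs from the start of its first piece to the end of its last, gaps included. A sketch with a hypothetical `full_span` helper over (start, end) character offsets, assuming end offsets are exclusive:

```python
def full_span(spans: list[tuple[int, int]]) -> tuple[int, int]:
    """Hypothetical helper: one (start, end) range covering every sub-span.

    `spans` are the (start, end) offsets of the pieces of a non-contiguous
    entity; the returned range also covers any text lying between them.
    """
    if not spans:
        raise ValueError("an entity needs at least one span")
    return min(start for start, _ in spans), max(end for _, end in spans)


text = "breast and ovarian cancer"
# "breast ... cancer" as a non-contiguous mention: pieces [0, 6) and [19, 25)
start, end = full_span([(0, 6), (19, 25)])
print(text[start:end])  # breast and ovarian cancer
```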
- __call__(doc)
Process documents and respond with processed and failed documents.
Note that many steps will be decorated by document_iterating_step() or document_batch_step(), which modify the signature of the 'original' __call__ function to match the signature expected of a step; the decorators handle the exception and failed-document logic for you.
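To illustrate how such a decorator can turn a per-document `__call__(doc)` into a step that takes a list and returns processed and failed documents, here is a sketch. `iterating_step`, `Doc` and `UppercaseStep` are hypothetical names written in the spirit of document_iterating_step(), not its actual implementation; a plain dict stands in for a kazu Document.

```python
import functools
from typing import Callable

# Hypothetical stand-in for a document; kazu's Document is much richer.
Doc = dict


def iterating_step(func: Callable[[object, Doc], None]):
    """Hypothetical decorator: wrap a per-document method as a batch step.

    The wrapped method processes one document in place; the wrapper takes a
    list, records per-document exceptions and returns (processed, failed).
    """

    @functools.wraps(func)
    def wrapper(self, docs: list[Doc]) -> tuple[list[Doc], list[Doc]]:
        failed: list[Doc] = []
        for doc in docs:
            try:
                func(self, doc)
            except Exception as exc:  # note the failure and move on
                doc["error"] = repr(exc)
                failed.append(doc)
        return docs, failed

    return wrapper


class UppercaseStep:
    @iterating_step
    def __call__(self, doc: Doc) -> None:
        doc["text"] = doc["text"].upper()


processed, failed = UppercaseStep()([{"text": "ok"}, {"text": None}])
print(len(processed), len(failed))  # 2 1
```

In this sketch every input document is returned as processed, and the failures are additionally listed (and annotated) so a caller can decide how to handle them.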
- __init__(class_matcher_rules, mention_matcher_rules)
- Parameters:
class_matcher_rules (dict[str, dict[Literal['tp', 'fp'], list[list[dict[str, typing.Any]]] | None]]) –
These should follow the format:
{ "<entity class>": { "<tp or fp (for true positive or false positive rules, respectively)>": [ "<a list of rules>", "<according to the spaCy pattern matcher syntax>", ] } }
mention_matcher_rules (dict[str, dict[str, dict[Literal['tp', 'fp'], list[list[dict[str, typing.Any]]] | None]]]) –
These should follow the format:
{ "<entity class>": { "<mention to disambiguate>": { "<tp or fp>": [ "<a list of rules>", "<according to the spaCy pattern matcher syntax>", ] } } }
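As a concrete illustration of both parameter shapes, the sketch below builds one value of each as plain Python dicts, using standard spaCy token-pattern keys (`LOWER`, `IN`). The entity classes, mentions and patterns are made up for this example, not taken from kazu's shipped configuration, and how kazu anchors a pattern to the mention itself is not shown. `check_tp_fp` is a hypothetical shape check, not part of kazu.

```python
from typing import Any, Literal, Optional

# Aliases mirroring the documented parameter types.
Pattern = list[dict[str, Any]]  # one spaCy Matcher pattern: a list of token dicts
TpFpRules = dict[Literal["tp", "fp"], Optional[list[Pattern]]]

# Illustrative class-level rules: keyed by entity class.
class_matcher_rules: dict[str, TpFpRules] = {
    "cell_line": {
        "tp": [[{"LOWER": "cell"}, {"LOWER": "line"}]],  # sentence says "cell line"
        "fp": None,  # no false positive rules configured
    },
}

# Illustrative mention-level rules: keyed by entity class, then mention.
mention_matcher_rules: dict[str, dict[str, TpFpRules]] = {
    "disease": {
        "ALL": {
            "tp": [[{"LOWER": {"IN": ["leukaemia", "leukemia"]}}]],
            "fp": None,
        },
    },
}


def check_tp_fp(rules: TpFpRules) -> None:
    """Hypothetical shape check for one {"tp": ..., "fp": ...} mapping."""
    for aspect, patterns in rules.items():
        if aspect not in ("tp", "fp"):
            raise ValueError(f"unknown aspect {aspect!r}; expected 'tp' or 'fp'")
        for pattern in patterns or []:  # None means "not configured"
            if not all(isinstance(token, dict) for token in pattern):
                raise TypeError("each pattern must be a list of token dicts")


for rules in class_matcher_rules.values():
    check_tp_fp(rules)
for per_mention in mention_matcher_rules.values():
    for rules in per_mention.values():
        check_tp_fp(rules)
```

These two dicts would then be passed as the class_matcher_rules and mention_matcher_rules arguments of __init__.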