kazu.steps.linking.post_processing.strategy_runner

Functions

Classes

ConfidenceLevelStrategyExecution

The role of this class is to track which entities have had mappings successfully resolved, and which require the application of further strategies.

StrategyRunner

This is a complex class, designed to co-ordinate the running of various strategies over a document, with the end result producing mappings (grounding) for entities.

class kazu.steps.linking.post_processing.strategy_runner.ConfidenceLevelStrategyExecution[source]

Bases: object

The role of this class is to track which entities have had mappings successfully resolved, and which require the application of further strategies.

This is handled via tracking a dictionary of EntityKey to sets of parser names.

See further details in the __call__ docstring.

__call__(entity, strategy_index, document)[source]

Conditionally execute a mapping strategy over an entity.

Parameters:
  • entity (Entity) – entity to process

  • strategy_index (int) – index of strategy to run that is configured for this entity class

  • document (Document) – originating Document

Return type:

Iterable[Mapping]

__init__(ent_class_strategies, default_strategies, stop_on_success=False)[source]
Parameters:
  • ent_class_strategies (dict[str, list[MappingStrategy]]) – per class strategies

  • default_strategies (list[MappingStrategy]) – default strategies

  • stop_on_success (bool) – If True, stop after the first successful strategy, even if some parsers remain unresolved. Otherwise, keep running until all parsers are resolved (or all relevant strategies have been tried).
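
The bookkeeping described above (a dictionary of EntityKey to sets of parser names, plus the stop_on_success switch) can be illustrated with a small stand-alone sketch. This is not the kazu implementation: ToyExecution, should_run and record are hypothetical names, the key is simplified to a (match, entity_class) pair, and the sets are assumed to hold the parsers that still lack a mapping.

```python
from dataclasses import dataclass, field

# Hypothetical simplified key: (match, entity_class).
EntityKey = tuple[str, str]


@dataclass
class ToyExecution:
    stop_on_success: bool = False
    # Assumption: each set holds the parser names that are still unresolved.
    unresolved: dict[EntityKey, set[str]] = field(default_factory=dict)

    def should_run(self, key, parsers):
        """Return True while at least one parser for this key lacks a mapping."""
        if key not in self.unresolved:
            self.unresolved[key] = set(parsers)
        return bool(self.unresolved[key])

    def record(self, key, resolved_parsers):
        """Update the tracker after a strategy has produced mappings."""
        resolved = set(resolved_parsers)
        pending = self.unresolved.setdefault(key, set())
        if not resolved:
            return  # the strategy failed, nothing changes
        if self.stop_on_success:
            pending.clear()  # first success ends processing for this key
        else:
            pending -= resolved  # keep going for the remaining parsers

    def reset(self):
        """Forget everything, e.g. when a new document is processed."""
        self.unresolved.clear()
```

With stop_on_success=False, a key stays eligible for further strategies until every parser has been resolved; with stop_on_success=True, a single success is enough to stop.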

get_strategies_for_entity_class(entity_class)[source]
Parameters:

entity_class (str)

Return type:

list[MappingStrategy]
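
As a rough illustration of how per-class and default strategies might interact, the sketch below assumes that an entity class without its own entry falls back to the default list, and that the longest_mapping_strategy_list_size property listed for this class is the length of the longest configured list. Both behaviours are assumptions, not confirmed by this page; the function names and the strategy labels are hypothetical, with plain strings standing in for MappingStrategy objects.

```python
def strategies_for(ent_class_strategies, default_strategies, entity_class):
    # Assumed behaviour: use the defaults when the class has no entry.
    return ent_class_strategies.get(entity_class, default_strategies)


def longest_strategy_list_size(ent_class_strategies, default_strategies):
    # Assumed behaviour: the largest list length across all configurations.
    sizes = [len(default_strategies)]
    sizes.extend(len(s) for s in ent_class_strategies.values())
    return max(sizes)


# Strings stand in for MappingStrategy instances.
per_class = {"gene": ["exact", "synonym", "embedding"], "disease": ["exact"]}
defaults = ["exact", "fuzzy"]
```

Under these assumptions, strategies_for(per_class, defaults, "drug") returns the default list, and the longest list size bounds how many strategy rounds could ever be attempted.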

reset()[source]

Clear state, ready for another execution.

Should be called when the underlying Document has changed.

Return type:

None

property longest_mapping_strategy_list_size: int

class kazu.steps.linking.post_processing.strategy_runner.StrategyRunner[source]

Bases: object

This is a complex class, designed to co-ordinate the running of various strategies over a document, with the end result producing mappings (grounding) for entities. Strategies that produce mappings may depend on the state of the Document, which changes as other strategies succeed or fail, so the order in which they run matters. Specifically, we want higher-precision strategies to run before lower-precision ones.

Beyond the precision of the strategy itself, the variables to consider are:

  1. the confidence of the NER systems in the match, in that different systems vary in terms of precision and recall for detecting entity spans.

  2. what LinkingCandidates are associated with the entity, and which parser they originated from.

The __call__ method of this class operates as follows:

  1. group entities by order of MentionConfidence.

  2. sub-group these entities again by Entity.match and Entity.entity_class.

  3. divide these entities by whether they are symbolic or not.

  4. identify the maximum number of strategies that ‘could’ run.

  5. get the appropriate ConfidenceLevelStrategyExecution to run against this sub group.

  6. group the entities from 5. by EntityKey (i.e. a hashable representation of the unique information required for mapping).

  7. conditionally execute the next strategy out of the maximum possible (from 4), and attach any resulting mappings to the relevant entity group. Note, the ConfidenceLevelStrategyExecution is responsible for deciding whether a strategy is executed or not.
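
Steps 1 to 3 above can be sketched with plain dictionaries. This is a simplified illustration, not the kazu code: ToyConfidence, ToyEntity and plan_groups are hypothetical stand-ins, the confidence values are invented, and processing the highest confidence level first is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum


class ToyConfidence(IntEnum):
    # Hypothetical levels; a larger value means a more confident mention.
    HIGH = 3
    MEDIUM = 2
    LOW = 1


@dataclass(frozen=True)
class ToyEntity:
    match: str
    entity_class: str
    confidence: ToyConfidence
    is_symbolic: bool


def plan_groups(entities):
    """Yield (confidence, match, entity_class, is_symbolic, members) groups."""
    # Step 1: bucket by confidence, highest first (assumed ordering).
    by_conf = defaultdict(list)
    for e in entities:
        by_conf[e.confidence].append(e)
    for conf in sorted(by_conf, reverse=True):
        # Steps 2 and 3: sub-group by match, entity class and symbolism.
        sub = defaultdict(list)
        for e in by_conf[conf]:
            sub[(e.match, e.entity_class, e.is_symbolic)].append(e)
        for (match, cls, symbolic), members in sub.items():
            yield conf, match, cls, symbolic, members
```

Each yielded group would then be handed to the strategy execution described in steps 4 to 7.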

__call__(doc)[source]

Run relevant strategies to decide what mappings to create.

Generally speaking, noun phrases should be easier to normalise than symbolic mentions, as there is more information to work with. Therefore, we group entities by mention confidence, split by symbolism, then run execute_hit_post_processing_strategies().

Parameters:

doc (Document)

Return type:

None

__init__(symbolic_strategies, non_symbolic_strategies, cross_ref_managers=None)[source]

execute_hit_post_processing_strategies(ents_needing_mappings, document, confidence_strategy_execution)[source]

This method executes steps 5-7 of the class docstring.

Return type:

None
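
The loop this method performs (group by key, then conditionally try each strategy index in turn) might look roughly like the sketch below. It is an illustration under stated assumptions rather than the library's code: run_strategies, key_fn and execution are hypothetical names, the first member of each group is assumed to represent the group, and execution is any callable that returns an iterable of mappings (empty when it decides not to run).

```python
from collections import defaultdict


def run_strategies(entities, key_fn, execution, max_strategies):
    """Group entities by key, then try each strategy index in order."""
    groups = defaultdict(list)
    for e in entities:
        groups[key_fn(e)].append(e)

    mappings_by_key = defaultdict(list)
    for strategy_index in range(max_strategies):
        for key, members in groups.items():
            # The execution object decides whether anything is produced.
            found = list(execution(members[0], strategy_index))
            # Mappings are shared by every entity in the group.
            mappings_by_key[key].extend(found)
    return dict(groups), dict(mappings_by_key)


def toy_execution(entity, strategy_index):
    # Hypothetical execution: only the second strategy ever succeeds.
    if strategy_index == 1:
        return [f"{entity}:mapped-by-{strategy_index}"]
    return []
```

Iterating over strategy indices in the outer loop means every group gets a chance at strategy 0 before any group moves on to strategy 1.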

static group_entities_by_symbolism(entities)[source]

Groups entities into symbolic and non-symbolic lists, so they can be processed separately.

Parameters:

entities (Iterable[Entity])

Return type:

tuple[list[Entity], list[Entity]]
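
A minimal sketch of this kind of partition is shown below. The helper names are hypothetical, strings stand in for Entity objects, looks_symbolic is a crude heuristic invented for illustration (it is not how kazu decides symbolism), and returning the symbolic list first is an assumption based on the description above.

```python
def looks_symbolic(text):
    # Crude, invented heuristic: all upper case or containing a digit.
    return text.isupper() or any(ch.isdigit() for ch in text)


def split_by_symbolism(entities, is_symbolic=looks_symbolic):
    """Return (symbolic, non_symbolic), preserving input order in each list."""
    symbolic, non_symbolic = [], []
    for e in entities:
        (symbolic if is_symbolic(e) else non_symbolic).append(e)
    return symbolic, non_symbolic
```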

kazu.steps.linking.post_processing.strategy_runner.entity_to_entity_key(e)[source]
Parameters:

e (Entity)

Return type:

tuple[str, str, str, frozenset[LinkingCandidate]]
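
The return type suggests a key built from three strings and a frozenset of candidates. The sketch below shows why such a tuple works as a dictionary key. Which three strings the real function uses is not stated here: match and entity_class appear elsewhere on this page, while match_norm, ToyCandidate, ToyEntity and toy_entity_key are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCandidate:
    # Frozen so that instances are hashable and can live in a frozenset.
    parser_name: str
    idx: str


@dataclass
class ToyEntity:
    match: str
    match_norm: str  # hypothetical third string field
    entity_class: str
    candidates: list


def toy_entity_key(e):
    """Build a hashable key; candidate order does not affect equality."""
    return (e.match, e.match_norm, e.entity_class, frozenset(e.candidates))
```

Two entities with the same strings and the same set of candidates produce equal keys, so they can be resolved once and share the resulting mappings.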