kazu.steps.linking.post_processing.strategy_runner

Functions

entity_to_entity_key(e)

Classes

`ConfidenceLevelStrategyExecution`	The role of this class is to track which entities have had mappings successfully resolved, and which require the application of further strategies.
`StrategyRunner`	This is a complex class, designed to co-ordinate the running of various strategies over a document, with the end result producing mappings (grounding) for entities.

class kazu.steps.linking.post_processing.strategy_runner.ConfidenceLevelStrategyExecution

Bases: object

The role of this class is to track which entities have had mappings successfully resolved, and which require the application of further strategies.

This is done by maintaining a dictionary that maps each EntityKey to a set of parser names.

See further details in the __call__ docstring.

__call__(entity, strategy_index, document)

Conditionally execute a mapping strategy over an entity.

Parameters:

entity (Entity) – entity to process
strategy_index (int) – index of the strategy to run, within the list of strategies configured for this entity class
document (Document) – originating Document

Return type:

Iterable[Mapping]

__init__(ent_class_strategies, default_strategies, stop_on_success=False)

Parameters:

ent_class_strategies (dict[str, list[MappingStrategy]]) – per class strategies
default_strategies (list[MappingStrategy]) – default strategies
stop_on_success (bool) – If True, stop after the first successful strategy, even if some parsers remain unresolved. Otherwise, keep running until all parsers are resolved (or all relevant strategies have been tried).
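
The effect of stop_on_success can be illustrated with a toy tracker. This is a minimal sketch, not Kazu's implementation: ToyExecution, its unresolved dictionary, the plain-function strategies and the parser names "chembl" and "mesh" are all hypothetical stand-ins, and the sketch assumes a strategy counts as successful when it returns at least one mapping.

```python
from typing import Callable

# Hypothetical stand-in: a strategy takes a parser name and returns mappings
# (an empty list means the strategy found nothing for that parser).
Strategy = Callable[[str], list[str]]


class ToyExecution:
    """Tracks, per entity key, which parsers still lack a mapping."""

    def __init__(self, strategies: list[Strategy], stop_on_success: bool = False):
        self.strategies = strategies
        self.stop_on_success = stop_on_success
        self.unresolved: dict[str, set[str]] = {}

    def run(self, entity_key: str, parsers: set[str], strategy_index: int) -> list[str]:
        # On first sight of a key, every parser is considered unresolved.
        remaining = self.unresolved.setdefault(entity_key, set(parsers))
        if not remaining or strategy_index >= len(self.strategies):
            return []  # nothing left to resolve, or no strategy at this index
        strategy = self.strategies[strategy_index]
        mappings: list[str] = []
        for parser in sorted(remaining):  # sorted() copies, so discarding is safe
            found = strategy(parser)
            if found:
                mappings.extend(found)
                remaining.discard(parser)
        if mappings and self.stop_on_success:
            remaining.clear()  # treat the key as finished after one success
        return mappings

    def reset(self) -> None:
        self.unresolved.clear()


def exact(parser: str) -> list[str]:
    return [f"{parser}:exact"] if parser == "chembl" else []


def fuzzy(parser: str) -> list[str]:
    return [f"{parser}:fuzzy"]


thorough = ToyExecution([exact, fuzzy], stop_on_success=False)
thorough_first = thorough.run("aspirin", {"chembl", "mesh"}, 0)
thorough_second = thorough.run("aspirin", {"chembl", "mesh"}, 1)

eager = ToyExecution([exact, fuzzy], stop_on_success=True)
eager_first = eager.run("aspirin", {"chembl", "mesh"}, 0)
eager_second = eager.run("aspirin", {"chembl", "mesh"}, 1)
```

With stop_on_success=False the second strategy still runs for the parser the first one missed; with stop_on_success=True the second call returns nothing because the key is already treated as resolved.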

get_strategies_for_entity_class(entity_class)

Parameters:: entity_class (str)
Return type:: list[MappingStrategy]

reset()

Clear state, ready for another execution.

Should be called when the underlying Document has changed.

Return type:: None

property longest_mapping_strategy_list_size: int
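
Neither get_strategies_for_entity_class nor longest_mapping_strategy_list_size carries a description here. One plausible reading, given the constructor arguments, is that the per-class list is used when the entity class has one and the defaults are used otherwise, and that the property reports the length of the longest configured list. The sketch below encodes that assumption as free functions, with plain strings in place of MappingStrategy objects; it is not taken from the Kazu source.

```python
def strategies_for_entity_class(
    ent_class_strategies: dict[str, list[str]],
    default_strategies: list[str],
    entity_class: str,
) -> list[str]:
    # Assumed behaviour: fall back to the defaults for unconfigured classes.
    return ent_class_strategies.get(entity_class, default_strategies)


def longest_strategy_list_size(
    ent_class_strategies: dict[str, list[str]],
    default_strategies: list[str],
) -> int:
    # Assumed behaviour: the upper bound on how many strategies could run.
    all_lists = [default_strategies, *ent_class_strategies.values()]
    return max(len(strategies) for strategies in all_lists)


per_class = {"gene": ["exact", "synonym", "embedding"]}
defaults = ["exact"]

gene_strategies = strategies_for_entity_class(per_class, defaults, "gene")
drug_strategies = strategies_for_entity_class(per_class, defaults, "drug")
longest = longest_strategy_list_size(per_class, defaults)
```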

class kazu.steps.linking.post_processing.strategy_runner.StrategyRunner

Bases: object

This is a complex class, designed to co-ordinate the running of various strategies over a document, with the end result producing mappings (grounding) for entities. Strategies that produce mappings may depend on the changing state of the Document, which in turn depends on whether other strategies have succeeded, so the order in which they run matters. Specifically, we want higher-precision strategies to run before lower-precision ones.

Beyond the precision of the strategy itself, the variables to consider are:

the confidence of the NER systems in the match, in that different systems vary in terms of precision and recall for detecting entity spans.
which LinkingCandidates are associated with the entity, and which parser they originated from.

The __call__ method of this class operates as follows:

1. group entities by order of MentionConfidence.
2. sub-group these entities again by Entity.match and Entity.entity_class.
3. divide these entities by whether they are symbolic or not.
4. identify the maximum number of strategies that ‘could’ run.
5. get the appropriate ConfidenceLevelStrategyExecution to run against this sub group.
6. group the entities from step 5 by EntityKey (i.e. a hashable representation of the unique information required for mapping).
7. conditionally execute the next strategy out of the maximum possible (from step 4), and attach any resulting mappings to the relevant entity group. Note that the ConfidenceLevelStrategyExecution is responsible for deciding whether a strategy is executed or not.
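
The grouping in the first three steps above can be sketched as follows. This is an illustration under stated assumptions, not the Kazu code: Confidence and ToyEntity are hypothetical stand-ins for MentionConfidence and Entity, is_symbolic is an invented attribute, and the sketch assumes that more confident mentions are processed first.

```python
from dataclasses import dataclass
from enum import IntEnum
from itertools import groupby


class Confidence(IntEnum):
    # Hypothetical stand-in for MentionConfidence; lower value sorts first.
    HIGH = 0
    MEDIUM = 1
    LOW = 2


@dataclass(frozen=True)
class ToyEntity:
    match: str
    entity_class: str
    confidence: Confidence
    is_symbolic: bool


def plan_runs(entities):
    """Return (confidence, match, entity_class, is_symbolic, group) in processing order."""
    plan = []
    # groupby needs sorted input, so sort on every grouping key up front.
    ordered = sorted(entities, key=lambda e: (e.confidence, e.match, e.entity_class))
    for conf, conf_group in groupby(ordered, key=lambda e: e.confidence):  # step 1
        by_match_and_class = groupby(conf_group, key=lambda e: (e.match, e.entity_class))
        for (match, cls), sub_group in by_match_and_class:  # step 2
            sub_group = list(sub_group)
            symbolic = [e for e in sub_group if e.is_symbolic]  # step 3
            non_symbolic = [e for e in sub_group if not e.is_symbolic]
            for is_symbolic, group in ((True, symbolic), (False, non_symbolic)):
                if group:
                    plan.append((conf, match, cls, is_symbolic, group))
    return plan


entities = [
    ToyEntity("EGFR", "gene", Confidence.HIGH, True),
    ToyEntity("epidermal growth factor receptor", "gene", Confidence.HIGH, False),
    ToyEntity("EGFR", "gene", Confidence.LOW, True),
    ToyEntity("EGFR", "gene", Confidence.HIGH, True),
]
plan = plan_runs(entities)
summary = [(conf, match, is_symbolic, len(group)) for conf, match, _, is_symbolic, group in plan]
```

Each entry in plan is one sub group that would then be handed, together with the ConfidenceLevelStrategyExecution chosen for its confidence level and symbolism, to the remaining steps.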

__call__(doc)

Run relevant strategies to decide what mappings to create.

Generally speaking, noun phrases should be easier to normalise than symbolic mentions, as there is more information to work with. Therefore, we group entities by mention confidence, split by symbolism, then run execute_hit_post_processing_strategies().

Parameters:: doc (Document)
Return type:: None

__init__(symbolic_strategies, non_symbolic_strategies, cross_ref_managers=None)

Parameters:

symbolic_strategies (dict[str, ConfidenceLevelStrategyExecution]) – mapping of mention confidence to a ConfidenceLevelStrategyExecution for symbolic entities
non_symbolic_strategies (dict[str, ConfidenceLevelStrategyExecution]) – mapping of mention confidence to a ConfidenceLevelStrategyExecution for non-symbolic entities
cross_ref_managers (list[CrossReferenceManager] | None) – list of managers that will be applied to any created mappings, attempting to create cross-references

execute_hit_post_processing_strategies(ents_needing_mappings, document, confidence_strategy_execution)

This method executes steps 5 to 7 of the class docstring.

Parameters:

ents_needing_mappings (list[Entity]) – Expects entities to already be sorted based on entity_to_entity_key().
document (Document)
confidence_strategy_execution (ConfidenceLevelStrategyExecution)

Return type:

None
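
The loop this method performs can be approximated as below. It is a simplified sketch, not the Kazu implementation: SkippingExecution is a hypothetical stand-in for ConfidenceLevelStrategyExecution that tracks resolution per key rather than per parser, entities are plain dictionaries, and strategies are functions from a key to a list of mapping strings.

```python
from collections import defaultdict


class SkippingExecution:
    """Hypothetical stand-in: runs a strategy only for keys not yet resolved."""

    def __init__(self, strategies):
        self.strategies = strategies
        self.resolved = set()
        self.calls = []  # (key, strategy_index) pairs that actually ran

    def __call__(self, key, strategy_index):
        if key in self.resolved or strategy_index >= len(self.strategies):
            return []
        self.calls.append((key, strategy_index))
        mappings = self.strategies[strategy_index](key)
        if mappings:
            self.resolved.add(key)
        return mappings


def run_strategies(entities, key_fn, execution, max_strategies):
    # Group entities that share a key, so each strategy runs once per group.
    groups = defaultdict(list)
    for ent in entities:
        groups[key_fn(ent)].append(ent)
    # Try strategies in order; the execution object decides whether to run.
    for strategy_index in range(max_strategies):
        for key, group in groups.items():
            mappings = execution(key, strategy_index)
            for ent in group:
                ent["mappings"].extend(mappings)


def exact(key):
    return [f"exact:{key}"] if key == "EGFR" else []


def fuzzy(key):
    return [f"fuzzy:{key}"]


entities = [
    {"match": "EGFR", "mappings": []},
    {"match": "p53", "mappings": []},
    {"match": "EGFR", "mappings": []},
]
execution = SkippingExecution([exact, fuzzy])
run_strategies(entities, lambda e: e["match"], execution, max_strategies=2)
```

Both "EGFR" entities receive the mapping from the first strategy, and the second strategy only runs for "p53", which the first one failed to resolve.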

static group_entities_by_symbolism(entities)

Groups entities into symbolic and non-symbolic lists, so they can be processed separately.

Parameters:: entities (Iterable[Entity])
Return type:: tuple[list[Entity], list[Entity]]
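
A partition of this shape might look like the sketch below. ToyEntity and its is_symbolic flag are hypothetical, and the order of the returned tuple (symbolic first, then non-symbolic) is assumed from the wording of the description rather than confirmed.

```python
from typing import Iterable, NamedTuple


class ToyEntity(NamedTuple):
    match: str
    is_symbolic: bool  # hypothetical flag; how symbolism is determined is not shown here


def group_by_symbolism(
    entities: Iterable[ToyEntity],
) -> tuple[list[ToyEntity], list[ToyEntity]]:
    symbolic: list[ToyEntity] = []
    non_symbolic: list[ToyEntity] = []
    # A single pass, so any iterable (including a generator) is accepted.
    for ent in entities:
        (symbolic if ent.is_symbolic else non_symbolic).append(ent)
    return symbolic, non_symbolic


stream = (
    ToyEntity(match, flag)
    for match, flag in [("EGFR", True), ("lung cancer", False), ("TP53", True)]
)
symbolic, non_symbolic = group_by_symbolism(stream)
```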

kazu.steps.linking.post_processing.strategy_runner.entity_to_entity_key(e)

Parameters:: e (Entity)
Return type:: tuple[str, str, str, frozenset[LinkingCandidate]]
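
The return type suggests a key made of three strings plus a frozenset of candidates. The sketch below shows why such a key is useful for grouping; which three string attributes Kazu actually uses is not stated here, so match, match_norm and entity_class are assumptions, and ToyEntity with string candidates is a hypothetical stand-in for Entity and LinkingCandidate.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEntity:
    match: str
    match_norm: str  # assumed field
    entity_class: str
    candidates: list[str] = field(default_factory=list)


def toy_entity_key(e: ToyEntity) -> tuple[str, str, str, frozenset[str]]:
    # A frozenset is hashable and ignores ordering, so the whole tuple can be
    # used as a dictionary key regardless of candidate order.
    return (e.match, e.match_norm, e.entity_class, frozenset(e.candidates))


a = ToyEntity("EGFR", "EGFR", "gene", ["c1", "c2"])
b = ToyEntity("EGFR", "EGFR", "gene", ["c2", "c1"])  # same candidates, different order
c = ToyEntity("EGFR", "EGFR", "drug", ["c1", "c2"])  # different entity class

groups: dict[tuple, list[ToyEntity]] = {}
for ent in (a, b, c):
    groups.setdefault(toy_entity_key(ent), []).append(ent)
```

Entities a and b land in the same group, so a strategy would only need to run once for both; c gets its own group because its entity class differs.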