kazu.steps.linking.post_processing.strategy_runner
Functions
Classes
- ConfidenceLevelStrategyExecution: The role of this class is to track which entities have had mappings successfully resolved, and which require the application of further strategies.
- StrategyRunner: This is a complex class, designed to co-ordinate the running of various strategies over a document, with the end result producing mappings (grounding) for entities.
- class kazu.steps.linking.post_processing.strategy_runner.ConfidenceLevelStrategyExecution
Bases: object
The role of this class is to track which entities have had mappings successfully resolved, and which require the application of further strategies.
This is handled via tracking a dictionary of EntityKey to sets of parser names.
See further details in the __call__ docstring.
- __call__(entity, strategy_index, document)
Conditionally execute a mapping strategy over an entity.
- __init__(ent_class_strategies, default_strategies, stop_on_success=False)
- Parameters:
ent_class_strategies (dict[str, list[MappingStrategy]]) – per class strategies
default_strategies (list[MappingStrategy]) – default strategies
stop_on_success (bool) – If True, stop after the first successful strategy, even if some parsers remain unresolved. Otherwise, keep running until all parsers are resolved (or all relevant strategies have been tried).
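The bookkeeping described for this class (a dictionary from entity key to the set of parser names still needing a mapping, plus the stop_on_success switch) can be sketched as follows. This is a minimal illustration under stated assumptions, not Kazu's implementation: UnresolvedParserTracker, the strategy functions, the parser names and the tuple used as a key are all hypothetical stand-ins, and a "strategy" is simplified to a function returning the parser names it resolved.

```python
from typing import Callable, Hashable

# Hypothetical strategy type: given an entity key, return the parser names it resolved.
Strategy = Callable[[Hashable], set]


class UnresolvedParserTracker:
    """Illustrative stand-in only; not Kazu's ConfidenceLevelStrategyExecution."""

    def __init__(self, strategies, stop_on_success=False):
        self.strategies = strategies
        self.stop_on_success = stop_on_success
        # entity key -> parser names that still lack a mapping
        self.unresolved = {}

    def __call__(self, key, parsers, strategy_index):
        """Conditionally run one strategy; return the parser names it resolved."""
        remaining = self.unresolved.setdefault(key, set(parsers))
        if not remaining or strategy_index >= len(self.strategies):
            return set()  # nothing left to resolve, or no strategy at this index
        resolved = self.strategies[strategy_index](key) & remaining
        if resolved and self.stop_on_success:
            remaining.clear()  # first success ends processing for this key
        else:
            remaining -= resolved  # keep going until every parser is resolved
        return resolved


def exact(key):  # hypothetical high-precision strategy
    return {"parser_a"}


def fuzzy(key):  # hypothetical lower-precision strategy
    return {"parser_a", "parser_b"}


key = ("EGFR", "gene")
parsers = {"parser_a", "parser_b"}

run_all = UnresolvedParserTracker([exact, fuzzy])
first = run_all(key, parsers, 0)
second = run_all(key, parsers, 1)

stop_early = UnresolvedParserTracker([exact, fuzzy], stop_on_success=True)
early_first = stop_early(key, parsers, 0)
early_second = stop_early(key, parsers, 1)
```

With stop_on_success left at False, the second strategy still runs for the parser the first one missed; with stop_on_success=True, the second call is skipped because the first success cleared the key.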
- class kazu.steps.linking.post_processing.strategy_runner.StrategyRunner
Bases: object
This is a complex class, designed to co-ordinate the running of various strategies over a document, with the end result being mappings (grounding) for entities. Strategies that produce mappings may depend on the changing state of the Document, which in turn depends on whether other strategies succeeded, so their precise co-ordination is crucial. Specifically, we want higher-precision strategies to run before lower-precision ones.
Beyond the precision of the strategy itself, the variables to consider are:
- the confidence of the NER systems in the match, since different systems vary in precision and recall when detecting entity spans.
- which LinkingCandidates are associated with the entity, and which parser they originated from.
The __call__ method of this class operates as follows:
1. group entities by order of MentionConfidence.
2. sub-group these entities again by Entity.match and Entity.entity_class.
3. divide these entities by whether they are symbolic or not.
4. identify the maximum number of strategies that 'could' run.
5. get the appropriate ConfidenceLevelStrategyExecution to run against this sub group.
6. group the entities from 5. by EntityKey (i.e. a hashable representation of unique information required for mapping).
7. conditionally execute the next strategy out of the maximum possible (from 4), and attach any resulting mappings to the relevant entity group. Note, the ConfidenceLevelStrategyExecution is responsible for deciding whether a strategy is executed or not.
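The grouping order in the steps above can be sketched with plain Python. Everything here is an assumption made for illustration: Ent is a hypothetical stand-in for an entity, confidence is an integer where a lower value means higher confidence, strategies are represented by their names only, and non-symbolic entities are visited before symbolic ones. The sketch only records which strategy would be offered to which group; the decision on whether a strategy actually runs belongs to ConfidenceLevelStrategyExecution and is left out.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Ent:
    """Hypothetical stand-in for an entity; not Kazu's Entity class."""
    match: str
    entity_class: str
    confidence: int  # assumption: lower value = higher mention confidence
    symbolic: bool


def plan_runs(entities, symbolic_runs, non_symbolic_runs):
    """Return (confidence, symbolic, key, strategy_name) tuples in attempt order."""
    plan = []
    conf_of = lambda e: e.confidence
    key_of = lambda e: (e.match, e.entity_class)
    # Step 1: group by mention confidence, most confident first.
    for conf, conf_group in groupby(sorted(entities, key=conf_of), key=conf_of):
        # Step 2: sub-group by (match, entity_class).
        for key, key_group in groupby(sorted(conf_group, key=key_of), key=key_of):
            ents = list(key_group)
            # Step 3: divide by whether the entities are symbolic.
            for symbolic in (False, True):
                subset = [e for e in ents if e.symbolic == symbolic]
                if not subset:
                    continue
                # Step 5: pick the strategies configured for this confidence level.
                runs = symbolic_runs if symbolic else non_symbolic_runs
                strategies = runs.get(conf, [])
                # Steps 4, 6, 7 (simplified): offer each strategy in order to this group.
                for name in strategies:
                    plan.append((conf, symbolic, key, name))
    return plan


entities = [
    Ent("EGFR", "gene", 0, True),
    Ent("epidermal growth factor receptor", "gene", 0, False),
    Ent("ALK", "gene", 1, True),
]
plan = plan_runs(
    entities,
    symbolic_runs={0: ["exact"], 1: ["exact", "fuzzy"]},
    non_symbolic_runs={0: ["exact", "fuzzy"]},
)
```

In this toy input every confidence-0 group is planned before the confidence-1 group, which mirrors the stated aim of running the more reliable combinations first.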
- __call__(doc)
Run relevant strategies to decide what mappings to create.
Generally speaking, noun phrases should be easier to normalise than symbolic mentions, as there is more information to work with. Therefore, we group entities by mention confidence, split by symbolism, then run execute_hit_post_processing_strategies().
- Parameters:
doc (Document)
- Return type:
None
- __init__(symbolic_strategies, non_symbolic_strategies, cross_ref_managers=None)
- Parameters:
symbolic_strategies (dict[str, ConfidenceLevelStrategyExecution]) – mapping of mention confidence to a ConfidenceLevelStrategyExecution for symbolic entities
non_symbolic_strategies (dict[str, ConfidenceLevelStrategyExecution]) – mapping of mention confidence to a ConfidenceLevelStrategyExecution for non-symbolic entities
cross_ref_managers (list[CrossReferenceManager] | None) – list of managers that will be applied to any created mappings, attempting to create xreferences
- execute_hit_post_processing_strategies(ents_needing_mappings, document, confidence_strategy_execution)
This method executes parts 5 - 7 in the class docstring.
- Parameters:
ents_needing_mappings (list[Entity]) – Expects entities to already be sorted based on entity_to_entity_key().
document (Document)
confidence_strategy_execution (ConfidenceLevelStrategyExecution)
- Return type:
None
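The requirement that ents_needing_mappings is pre-sorted by entity key is typical of code that groups consecutive items, for example with itertools.groupby. Whether this method does so is an assumption here, not something this page confirms. The sketch below uses plain dictionaries and a hypothetical entity_key helper (a stand-in for entity_to_entity_key(), whose real key fields may differ) to show what goes wrong when the input is not sorted.

```python
from itertools import groupby


def entity_key(ent):
    """Hypothetical stand-in for entity_to_entity_key(); real key fields may differ."""
    return (ent["match"], ent["entity_class"])


ents = [
    {"match": "EGFR", "entity_class": "gene", "start": 0},
    {"match": "ALK", "entity_class": "gene", "start": 10},
    {"match": "EGFR", "entity_class": "gene", "start": 25},
]

# groupby only merges adjacent items, so unsorted input splits one key into two groups.
unsorted_groups = [(k, len(list(g))) for k, g in groupby(ents, key=entity_key)]

# Sorting by the same key first yields exactly one group per key.
sorted_groups = [
    (k, len(list(g)))
    for k, g in groupby(sorted(ents, key=entity_key), key=entity_key)
]
```

Under that assumption, passing unsorted entities would cause the same key to be processed as several separate groups rather than once.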