kazu.utils.spacy_object_mapper
Classes
KazuToSpacyObjectMapper: Maps entities and text from a Section to the spaCy data model using basic_spacy_pipeline().
- class kazu.utils.spacy_object_mapper.KazuToSpacyObjectMapper
Bases: object

Maps entities and text from a Section to the spaCy data model using basic_spacy_pipeline().

Attention

Providing incomplete entity_classes for your usage (or leaving it blank) can lead to errors that might only occur infrequently when processing the results, and therefore may be difficult to track down.

Therefore, users should be careful to set entity_classes to all the entity classes corresponding to attributes that they will access on the spaCy Tokens within the Spans of the result of __call__(), whether directly or via spaCy Matcher rules that check these custom attributes.

The specific problem is that if you try to read a spaCy custom attribute that doesn’t exist, you will get an error like:
AttributeError: [E046] Can't retrieve unregistered extension attribute 'drug'. Did you forget to call the `set_extension` method?
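The error above can be reproduced with spaCy alone. The following is a minimal sketch, assuming a blank English pipeline and the example attribute name "drug" from the message; it does not use this class, and only illustrates the underlying spaCy behaviour that registering the extension removes the error.

```python
import spacy
from spacy.tokens import Token

# Start from a clean state in case "drug" was registered earlier in this process.
if Token.has_extension("drug"):
    Token.remove_extension("drug")

nlp = spacy.blank("en")  # tokenizer-only pipeline, no model download needed
doc = nlp("aspirin reduces fever")
token = doc[0]

# Reading an unregistered custom attribute raises AttributeError (E046).
try:
    token._.drug
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Once the extension is registered, the same read returns the default value.
Token.set_extension("drug", default=False)
value_after_registration = token._.drug
```

Because extensions are registered on the Token class rather than on an individual Doc, the read succeeds even for tokens created before the call to set_extension.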
This class uses the provided entity_classes to call set_extension. If the provided entity_classes is incomplete (say, missing "drug") and you then try to access the drug attribute on a token in the result, you will get this error.

- __init__(entity_classes={}, set_attributes_incrementally=False)
- Parameters:
entity_classes (Iterable[str]) – known entity classes whose spaCy extension attributes the caller intends to access on the result of __call__(). See the note above about the need to take care here.

set_attributes_incrementally (bool) – whether to set a spaCy custom extension attribute for ‘new’ entity classes in a Section passed to __call__(). This gives a more consistent result from __call__(), where every Span in the dictionary will have an attribute for the relevant Entity’s entity class set to True for all the tokens in the span. However, it makes subtle bugs much more likely, so False is the default. See the note in the class-level docs if you are thinking about turning this on.
- entity_classes
A set of entity classes known to this class. These will all have a spaCy custom extension attribute set. If set_attributes_incrementally is True, then in addition to the entity_classes passed into __init__, this will include all entity classes encountered so far while processing Sections passed in to __call__().
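How the entity_classes set evolves under the two settings can be sketched with a toy model. ToyMapper below is a hypothetical stand-in written for illustration only; it is not the kazu implementation, it takes a plain list of entity class names instead of a Section, and it tracks the set without touching spaCy.

```python
class ToyMapper:
    """Hypothetical stand-in that only tracks which entity classes are known."""

    def __init__(self, entity_classes=(), set_attributes_incrementally=False):
        self.entity_classes = set(entity_classes)
        self.set_attributes_incrementally = set_attributes_incrementally

    def __call__(self, section_entity_classes):
        # Assumption: new classes are only added when incremental mode is on.
        if self.set_attributes_incrementally:
            self.entity_classes.update(section_entity_classes)
        return self.entity_classes


fixed = ToyMapper(entity_classes={"drug"})
fixed(["drug", "gene"])  # "gene" is not added to the known classes

incremental = ToyMapper(entity_classes={"drug"}, set_attributes_incrementally=True)
incremental(["drug", "gene"])  # "gene" becomes known after this call
```

In this model, whether "gene" is known in incremental mode depends on which inputs have already been processed, which may be one reason the documentation warns that this setting makes subtle bugs more likely.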