kazu.utils.spacy_object_mapper
Classes
- class kazu.utils.spacy_object_mapper.KazuToSpacyObjectMapper
Bases: object
Maps entities and text from a Section to the spaCy data model using basic_spacy_pipeline().
Attention
Providing an incomplete entity_classes for your usage (or leaving it empty) can lead to errors that occur only infrequently when processing the results, and are therefore difficult to track down.
Set entity_classes to every entity class whose attribute you will access on the spaCy Tokens within the Spans returned by __call__(), whether directly or via spaCy Matcher rules that check these custom attributes.
The specific problem is that if you try to read a spaCy custom attribute that doesn't exist, you will get an error like:
AttributeError: [E046] Can't retrieve unregistered extension attribute 'drug'. Did you forget to call the `set_extension` method?
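This error can be reproduced with spaCy alone, without Kazu. The following is a minimal sketch, assuming spaCy is installed; it registers a gene attribute but deliberately leaves out drug. The example sentence and the choice of gene as the registered class are illustrative assumptions, not taken from Kazu.

```python
import spacy
from spacy.tokens import Token

# Extension attributes are global to the process, so make sure "drug"
# is not left over from earlier code before demonstrating the failure.
if Token.has_extension("drug"):
    Token.remove_extension("drug")

# Register only "gene"; this plays the role of an incomplete entity_classes.
Token.set_extension("gene", default=False, force=True)

nlp = spacy.blank("en")  # tokenizer-only pipeline, no model download needed
doc = nlp("aspirin inhibits COX1")

# "gene" is registered, so reading it returns the default value.
gene_flag = doc[2]._.gene

# "drug" was never registered, so reading it raises AttributeError (E046).
try:
    doc[0]._.drug
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(gene_flag)
print(error_message)
```

Because the failure only happens at the moment an unregistered attribute is read, code paths that rarely touch drug will rarely hit it.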
This class uses the provided entity_classes to call set_extension. If the provided entity_classes is incomplete (say, missing "drug") and you then try to access the drug attribute on a token in the result, you will get this error.
- __init__(entity_classes={}, set_attributes_incrementally=False)
- Parameters:
entity_classes (Iterable[str]) – the entity classes whose spaCy extension attributes the caller intends to access on the result of __call__(). See the note above about the need to take care here.
set_attributes_incrementally (bool) – whether to set a spaCy custom extension attribute for 'new' entity classes encountered in a Section passed to __call__(). This gives a more consistent result from __call__(), where every Span in the dictionary has the attribute for the relevant Entity's entity class set to True for all the tokens in the span. However, it makes subtle bugs much more likely, so False is the default. See the note in the class-level docs if you are thinking about turning this on.
- entity_classes
A set of entity classes known to this class. These will all have a spaCy custom extension attribute set. If set_attributes_incrementally is True, this will include, in addition to the entity_classes passed into __init__, all entity classes encountered so far while processing Sections passed in to __call__().
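To see why incremental registration makes failures depend on what has already been processed, here is a toy model using only the standard library. It is not Kazu's or spaCy's implementation: ToyToken, ToyMapper and REGISTERED are hypothetical stand-ins, and the handling of unknown entity classes is an assumption made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for spaCy's process-wide extension registry.
REGISTERED = set()


@dataclass
class ToyToken:
    text: str
    flags: dict = field(default_factory=dict)

    def get(self, name):
        # Mimics the documented behaviour: unregistered names raise.
        if name not in REGISTERED:
            raise AttributeError(f"unregistered extension attribute {name!r}")
        return self.flags.get(name, False)


class ToyMapper:
    def __init__(self, entity_classes=(), set_attributes_incrementally=False):
        self.entity_classes = set(entity_classes)
        self.set_attributes_incrementally = set_attributes_incrementally
        REGISTERED.update(self.entity_classes)

    def __call__(self, entities):
        """entities: (text, entity_class) pairs standing in for a Section."""
        tokens = []
        for text, entity_class in entities:
            token = ToyToken(text)
            is_new = entity_class not in self.entity_classes
            if is_new and self.set_attributes_incrementally:
                # Register classes lazily, on first encounter.
                self.entity_classes.add(entity_class)
                REGISTERED.add(entity_class)
            if entity_class in self.entity_classes:
                token.flags[entity_class] = True
            tokens.append(token)
        return tokens


mapper = ToyMapper(entity_classes={"gene"}, set_attributes_incrementally=True)

# Before any drug entity has been seen, reading "drug" fails.
first = mapper([("BRCA1", "gene")])
try:
    first[0].get("drug")
    drug_readable_before = True
except AttributeError:
    drug_readable_before = False

# After one drug entity has been processed, the same read succeeds.
second = mapper([("aspirin", "drug")])
drug_readable_after = first[0].get("drug") is False and second[0].get("drug") is True
```

In this model the same read of drug fails or succeeds depending on which inputs came first, which is the kind of order-dependent bug the default of False is meant to avoid.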