kazu.steps.joint_ner_and_linking.memory_efficient_string_matching
Classes
MemoryEfficientStringMatchingStep | A wrapper for the ahocorasick algorithm.
- class kazu.steps.joint_ner_and_linking.memory_efficient_string_matching.MemoryEfficientStringMatchingStep
Bases: ParserDependentStep
A wrapper for the ahocorasick algorithm.
In testing, this implementation is comparable in speed to a spaCy PhraseMatcher, and uses a fraction of the memory. Since this implementation is unaware of NLP concepts such as tokenization, we backfill this capability by checking for word boundaries with a custom spaCy tokenizer.
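The idea of pairing a tokenization-unaware matcher with a word-boundary check can be sketched in plain Python. This is a minimal illustration, not the code used by this step: the helper names `build_automaton`, `find_matches` and `is_word_bounded` are hypothetical, the automaton is hand-rolled rather than taken from a library, and a simple alphanumeric check stands in for the custom spaCy tokenizer.

```python
from collections import deque


def build_automaton(terms):
    """Build an Aho-Corasick automaton (trie plus failure links) over terms."""
    goto = [{}]  # goto[state][char] -> next state
    fail = [0]   # fail[state] -> state to fall back to on a mismatch
    out = [[]]   # out[state] -> terms that end at this state
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(term)
    # Breadth-first pass: depth-1 states keep fail == 0, deeper ones are computed.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit terms that end at the fallback state (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def is_word_bounded(text, start, end):
    """Assumed stand-in for a tokenizer: no alphanumeric char touches the span."""
    before_ok = start == 0 or not text[start - 1].isalnum()
    after_ok = end == len(text) or not text[end].isalnum()
    return before_ok and after_ok


def find_matches(text, automaton):
    """Scan text once, keeping only hits that sit on word boundaries."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for term in out[state]:
            start, end = pos - len(term) + 1, pos + 1
            if is_word_bounded(text, start, end):
                matches.append((start, end, term))
    return matches


automaton = build_automaton(["egfr", "gfr", "lung cancer"])
text = "egfr mutations in lung cancer; megfr is not a gene"
matches = find_matches(text, automaton)
# matches == [(0, 4, "egfr"), (18, 29, "lung cancer")]
```

Without the boundary check, the raw automaton would also report "gfr" inside "egfr" and both terms inside "megfr"; the check discards those because an alphanumeric character touches the span.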
- __call__(doc)
Process documents and respond with processed and failed documents.
Note that many steps will be decorated by document_iterating_step() or document_batch_step(), which will modify the ‘original’ __call__ function signature to match the expected signature for a step, as the decorators handle the exception/failed documents logic for you.
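The decorator pattern described above can be sketched as follows. This is a hypothetical illustration only: `iterating_step` and `UppercaseStep` are invented names, documents are plain dicts, and the choice to return every input document as "processed" alongside the failed ones is an assumption, not the behaviour of the real decorators.

```python
from functools import wraps


def iterating_step(per_doc_fn):
    """Hypothetical decorator: turn a per-document method into a step call."""

    @wraps(per_doc_fn)
    def step_call(self, docs):
        failed = []
        for doc in docs:
            try:
                per_doc_fn(self, doc)
            except Exception:
                # The wrapper, not the step author, records the failure.
                failed.append(doc)
        # Assumption: all input docs are returned, plus the subset that failed.
        return docs, failed

    return step_call


class UppercaseStep:
    """Toy step whose undecorated __call__ handles exactly one document."""

    @iterating_step
    def __call__(self, doc):
        doc["text"] = doc["text"].upper()


docs = [{"text": "egfr"}, {"text": None}]
processed, failed = UppercaseStep()(docs)
```

The method is written against a single `doc`, but callers pass a list and receive two lists back, which is why the decorated signature differs from the one in the source.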
- __init__(parsers)
- Parameters:
parsers (Iterable[OntologyParser]) – parsers that this step requires