kazu.steps.step
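Module Attributes
Self: A TypeVar for the type of the class whose method is decorated with document_iterating_step() or document_batch_step().
Functions
document_batch_step(batch_doc_callable): Add error handling to a method that processes batches of Documents.
document_iterating_step(per_doc_callable): Handle a list of Documents and add error handling.
Classes
ParserDependentStep(parsers): A step that depends on ontology parsers in any form.
Step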
- class kazu.steps.step.ParserDependentStep
Bases: Step
A step that depends on ontology parsers in any form.
Steps that need information from parsers should subclass this class, in order for the internal databases to be correctly populated. Generally, these will be steps that have anything to do with Entity Linking.
- __init__(parsers)
- Parameters:
parsers (Iterable[OntologyParser]) – parsers that this step requires
- class kazu.steps.step.Self
A TypeVar for the type of the class whose method is decorated with document_iterating_step() or document_batch_step().
Alias of TypeVar('Self')
- class kazu.steps.step.Step
Bases: Protocol
- __call__(docs)
Process documents and respond with processed and failed documents.
Note that many steps will be decorated by document_iterating_step() or document_batch_step(), which will modify the ‘original’ __call__ function signature to match the expected signature for a step, as the decorators handle the exception/failed documents logic for you.
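To make the expected shape of Step.__call__() concrete, here is a minimal sketch of a step that handles failures by hand, without either decorator. The Document dataclass, the UppercaseStep class, and the string key "PROCESSING_EXCEPTION" are hypothetical stand-ins so that the example runs on its own; they are not kazu's classes. Returning every input document as the first element of the tuple is also an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class Document:
    """Hypothetical stand-in for kazu's Document; only what this sketch needs."""

    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


class Step(Protocol):
    """Mirrors the documented shape: a list of docs in, (processed, failed) out."""

    def __call__(
        self, docs: list[Document]
    ) -> tuple[list[Document], list[Document]]: ...


class UppercaseStep:
    """Hypothetical step that implements the failure bookkeeping itself."""

    def __call__(
        self, docs: list[Document]
    ) -> tuple[list[Document], list[Document]]:
        failed: list[Document] = []
        for doc in docs:
            try:
                if not doc.text:
                    raise ValueError("empty document")
                doc.text = doc.text.upper()
            except Exception as e:
                # Assumed metadata key; record the error and mark the doc as failed.
                doc.metadata["PROCESSING_EXCEPTION"] = repr(e)
                failed.append(doc)
        # Assumption: all input documents are returned first, failures second.
        return docs, failed


step: Step = UppercaseStep()
processed, failed = step([Document("hello"), Document("")])
```

The try/except and the failed list are the boilerplate that the two decorators are described as taking over.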
- kazu.steps.step.document_batch_step(batch_doc_callable)
Add error handling to a method that processes batches of Documents.
Use this to decorate a method that processes a batch of Documents at a time. The resulting method will wrap a call to the decorated function with error handling which will add exceptions to the PROCESSING_EXCEPTION metadata of documents. Failed documents will be returned as the second element of the return value, as expected by Step.__call__().
Generally speaking, it will save effort and repetition to decorate a Step with either document_iterating_step() or document_batch_step(), rather than implementing the error handling in the Step itself.
Normally, document_iterating_step() would be used in preference to document_batch_step(), unless the method involves computation which is more efficient when run in a batch, such as inference with a transformer-based Machine Learning model, or using spaCy’s pipe method.
Note that this will only work for a method of a class, rather than a standalone function, as it expects to have to pass through ‘self’ as a parameter.
- Parameters:
batch_doc_callable (Callable[[Self, list[Document]], Any]) – A function that processes a batch of documents, that you want to use as the __call__ method of a Step. This must do its work by mutating the input documents: the return value is ignored.
- Return type:
Callable[[Self, list[Document]], tuple[list[Document], list[Document]]]
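The behaviour described for document_batch_step() can be illustrated with a self-contained stand-in. This is a sketch under stated assumptions, not kazu's source: batch_step_sketch, Document, TokenCountStep, and the value of PROCESSING_EXCEPTION are hypothetical, and the choice to mark every document in a failed batch is an assumption about how a batch-level failure would be reported.

```python
import functools
import traceback
from dataclasses import dataclass, field
from typing import Any, Callable

PROCESSING_EXCEPTION = "PROCESSING_EXCEPTION"  # assumed metadata key for this sketch


@dataclass
class Document:
    """Hypothetical stand-in for kazu's Document."""

    text: Any
    metadata: dict[str, Any] = field(default_factory=dict)


def batch_step_sketch(batch_doc_callable: Callable[[Any, list[Document]], Any]):
    """Illustrative stand-in for document_batch_step, not the real decorator."""

    @functools.wraps(batch_doc_callable)
    def step_call(self, docs: list[Document]):
        try:
            batch_doc_callable(self, docs)  # return value is deliberately ignored
            return docs, []
        except Exception:
            # Assumption: one failure in the batch marks every document in it.
            message = traceback.format_exc()
            for doc in docs:
                doc.metadata[PROCESSING_EXCEPTION] = message
            return docs, docs

    return step_call


class TokenCountStep:
    """Hypothetical step whose work is done once per batch."""

    @batch_step_sketch
    def __call__(self, docs: list[Document]) -> None:
        # Compute everything for the batch first, as a batched model call would,
        # then write the results back by mutating the documents.
        counts = [len(doc.text.split()) for doc in docs]
        for doc, count in zip(docs, counts):
            doc.metadata["n_tokens"] = count


processed, failed = TokenCountStep()([Document("a b c"), Document("d e")])
```

Because the wrapper receives self as its first argument, the decorated function has to be a method, which matches the note above about standalone functions.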
- kazu.steps.step.document_iterating_step(per_doc_callable)
Handle a list of Documents and add error handling.
Use this to decorate a method that processes a single Document. The resulting method will then iterate over a list of Documents, calling the decorated function for each Document. Errors are handled automatically and added to the PROCESSING_EXCEPTION metadata of documents, with failed docs returned as the second element of the return value, as expected by Step.__call__().
Generally speaking, it will save effort and repetition to decorate a Step with either document_iterating_step() or document_batch_step(), rather than implementing the error handling in the Step itself.
Normally, document_iterating_step() would be used in preference to document_batch_step(), unless the method involves computation which is more efficient when run in a batch, such as inference with a transformer-based Machine Learning model, or using spaCy’s pipe method.
Note that this will only work for a method of a class, rather than a standalone function, as it expects to have to pass through ‘self’ as a parameter.
- Parameters:
per_doc_callable (Callable[[Self, Document], Any]) – A function that processes a single document, that you want to use as the __call__ method of a Step. This must do its work by mutating the input document: the return value is ignored.
- Return type:
Callable[[Self, list[Document]], tuple[list[Document], list[Document]]]
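The per-document variant can be sketched in the same way. Again, this is an illustrative stand-in rather than kazu's implementation: iterating_step_sketch, Document, LowercaseStep, and the value of PROCESSING_EXCEPTION are hypothetical names introduced for the example. The point it shows is that a failure in one document does not stop the others from being processed.

```python
import functools
import traceback
from dataclasses import dataclass, field
from typing import Any, Callable

PROCESSING_EXCEPTION = "PROCESSING_EXCEPTION"  # assumed metadata key for this sketch


@dataclass
class Document:
    """Hypothetical stand-in for kazu's Document."""

    text: Any
    metadata: dict[str, Any] = field(default_factory=dict)


def iterating_step_sketch(per_doc_callable: Callable[[Any, Document], Any]):
    """Illustrative stand-in for document_iterating_step, not the real decorator."""

    @functools.wraps(per_doc_callable)
    def step_call(self, docs: list[Document]):
        failed_docs: list[Document] = []
        for doc in docs:
            try:
                per_doc_callable(self, doc)  # return value is deliberately ignored
            except Exception:
                # Only the document that raised is marked as failed.
                doc.metadata[PROCESSING_EXCEPTION] = traceback.format_exc()
                failed_docs.append(doc)
        return docs, failed_docs

    return step_call


class LowercaseStep:
    """Hypothetical step written against a single document."""

    @iterating_step_sketch
    def __call__(self, doc: Document) -> None:
        doc.text = doc.text.lower()  # mutate in place; nothing is returned


docs = [Document("ABC"), Document(None), Document("Def")]
processed, failed = LowercaseStep()(docs)
```

Here the second document raises (None has no lower() method), so it alone ends up in failed, while the first and third are lowercased as usual.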