kazu.steps.ner.seth

Classes

SethStep

A Step that calls SETH (SNP Extraction Tool for Human Variations) over py4j.

class kazu.steps.ner.seth.SethStep

Bases: Step

A Step that calls SETH (SNP Extraction Tool for Human Variations) over py4j.

Attention

To use this step, you need py4j. The default kazu install does not include it, because the default pipeline does not use this step.

You can either install it directly:

$ pip install py4j

Or you can install required dependencies for all steps included in kazu with:

$ pip install kazu[all-steps]
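Since py4j is an optional dependency, it can be useful to check for it before building a pipeline that includes this step. The following is a minimal sketch using only the standard library; the helper name py4j_available is hypothetical and not part of kazu.

```python
import importlib.util


def py4j_available() -> bool:
    """Return True if the optional py4j package can be imported."""
    # find_spec returns None for a missing top-level package instead of raising.
    return importlib.util.find_spec("py4j") is not None


if not py4j_available():
    print("py4j not found: run `pip install py4j` or `pip install kazu[all-steps]`")
```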

Paper:

Thomas, P., Rocktäschel, T., Hakenberg, J., Lichtblau, Y., and Leser, U. (2016).
SETH detects and normalizes genetic variants in text.
Bioinformatics (2016)
Bibtex Citation Details
@Article{SETH2016,
  Title = {SETH detects and normalizes genetic variants in text.},
  Author = {Thomas, Philippe and Rockt{\"{a}}schel, Tim and Hakenberg, J{\"{o}}rg and Lichtblau, Yvonne and Leser, Ulf},
  Journal = {Bioinformatics},
  Year = {2016},
  Month = {Jun},
  Doi = {10.1093/bioinformatics/btw234},
  Language = {eng},
  Medline-pst = {aheadofprint},
  Pmid = {27256315},
  Url = {http://dx.doi.org/10.1093/bioinformatics/btw234}
}
__call__(doc)

Process documents and respond with processed and failed documents.

Note that many steps will be decorated by document_iterating_step() or document_batch_step() which will modify the ‘original’ __call__ function signature to match the expected signature for a step, as the decorators handle the exception/failed documents logic for you.

Parameters:
Returns:

The first element is all the provided docs (now modified by the processing), the second is the docs that failed to (fully) process correctly.

Return type:

tuple[list[Document], list[Document]]
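The return contract described above, and the way a decorator can absorb the failed-document handling, can be illustrated with a small sketch. This is not kazu's implementation: Doc, iterating_step and tag_variants are hypothetical stand-ins, written only to show how a per-document function can be wrapped into a callable that returns (all docs, failed docs).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    """Hypothetical stand-in for kazu's Document."""
    text: str
    entities: list[str] = field(default_factory=list)


def iterating_step(
    per_doc: Callable[[Doc], None],
) -> Callable[[list[Doc]], tuple[list[Doc], list[Doc]]]:
    """Wrap a per-document function so that it follows the step contract."""

    def step(docs: list[Doc]) -> tuple[list[Doc], list[Doc]]:
        failed: list[Doc] = []
        for doc in docs:
            try:
                per_doc(doc)  # modifies the document in place
            except Exception:
                failed.append(doc)  # record the failure and keep going
        # First element: every provided doc; second: the ones that failed.
        return docs, failed

    return step


@iterating_step
def tag_variants(doc: Doc) -> None:
    """Toy per-document logic that fails on empty text."""
    if not doc.text:
        raise ValueError("empty document")
    doc.entities.append("variant")


docs = [Doc("BRAF V600E"), Doc("")]
processed, failed = tag_variants(docs)
```

Here processed is the same list that was passed in, and failed holds only the empty document.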

__init__(entity_class, seth_fatjar_path, java_home, condition=None)
Parameters:
  • entity_class (str) – the entity_class to assign to any Entities that SETH finds

  • seth_fatjar_path (str) – path to a py4j fatjar, containing SETH dependencies

  • java_home (str) – path to the installed Java runtime

  • condition (Callable[[Document], bool] | None) – since SETH can be slow, an optional callable can be supplied to decide whether a document is processed, for example to skip documents that contain no pre-existing gene/protein entities.
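A condition callable of the kind described above might look like the following sketch. Several things here are assumptions rather than documented behaviour: that documents expose their entities through a get_entities() method, that each entity has an entity_class attribute, that gene entities are labelled "gene", and the "mutation" label passed as entity_class. FakeEntity and FakeDocument are hypothetical stand-ins so the snippet runs without kazu, py4j or Java.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEntity:
    """Hypothetical stand-in for a kazu entity."""
    entity_class: str


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a kazu Document."""
    entities: list = field(default_factory=list)

    def get_entities(self) -> list:
        return self.entities


def has_gene_entity(doc) -> bool:
    """Return True if the document already holds at least one gene entity.

    Assumes doc.get_entities() and entity.entity_class exist; "gene" is an
    assumed label and should match whatever upstream NER steps assign.
    """
    return any(ent.entity_class == "gene" for ent in doc.get_entities())


def build_seth_step(fatjar_path: str, java_home: str):
    """Construct the step; needs kazu, py4j and a Java runtime at call time."""
    from kazu.steps.ner.seth import SethStep

    return SethStep(
        entity_class="mutation",  # assumed label for the emitted entities
        seth_fatjar_path=fatjar_path,
        java_home=java_home,
        condition=has_gene_entity,
    )
```

The import is kept inside build_seth_step so that the condition can be defined and tested on its own, without the optional dependencies installed.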