kazu.steps.linking.entity_class_disambiguation

Classes

  • DisambiguationEntry

  • EntityClassDisambiguationStep

  • EntityClassTfIdfScorer

  • ScoredContext: ScoredContext(entity_class, score, thresh)

  • TfIdfDisambiguationEntry: TfIdfDisambiguationEntry(entity_class, tfidf_document, tfidf_vectorizer, thresh)

class kazu.steps.linking.entity_class_disambiguation.DisambiguationEntry[source]

Bases: TypedDict

entity_class: str
relevant_text: list[str]
thresh: float

class kazu.steps.linking.entity_class_disambiguation.EntityClassDisambiguationStep[source]

Bases: Step

Warning

This step is deprecated and may be removed in a future release.

__call__(doc)[source]

Process documents and respond with processed and failed documents.

Note that many steps will be decorated by document_iterating_step() or document_batch_step() which will modify the ‘original’ __call__ function signature to match the expected signature for a step, as the decorators handle the exception/failed documents logic for you.

Returns:

The first element contains all the provided docs (now modified by the processing); the second contains the docs that failed to (fully) process correctly.

Return type:

tuple[list[Document], list[Document]]

__init__(context)[source]

Optionally disambiguates the entity class (anatomy, drug, etc.) of entities that exactly share a span in a document.

For example, “UCB” could refer to “umbilical cord blood”, an anatomical entity, or to the pharmaceutical company UCB, a corporate entity. An expected context might be “umbilical pregnancy blood baby placenta…” in the former case, or “company business…” in the latter. Multiple expected contexts (disambiguation entries) should be provided so that this step can choose the best-matching entity class for an entity span. A tf-idf model is built to correlate an entity’s actual textual context with each provided expected context, and the provided thresholds determine whether the tf-idf score is high enough for an entity class to be chosen.

Parameters:

context (dict[str, list[DisambiguationEntry]]) – Specifies synonyms to disambiguate along with an expected textual context around those synonyms.
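As an illustration of the shape this parameter takes, the “UCB” example above could be written as the dictionary below. This is only a sketch: DisambiguationEntry is redefined locally as a stand-in that mirrors the documented fields so the snippet runs without kazu installed, and the entity class names ("anatomy", "company") and the threshold values are assumptions chosen for illustration.

```python
from typing import TypedDict


class DisambiguationEntry(TypedDict):
    # Local stand-in mirroring the documented fields (not imported from kazu)
    entity_class: str
    relevant_text: list[str]
    thresh: float


# Keys are the ambiguous spans; each value lists one expected context per
# candidate entity class. Class names and thresholds here are illustrative.
context: dict[str, list[DisambiguationEntry]] = {
    "UCB": [
        {
            "entity_class": "anatomy",
            "relevant_text": ["umbilical pregnancy blood baby placenta"],
            "thresh": 0.2,
        },
        {
            "entity_class": "company",
            "relevant_text": ["company business"],
            "thresh": 0.2,
        },
    ]
}
```

With kazu installed, a dictionary of this shape is what the context argument of __init__ expects.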

static sentence_context_for_entity(entity, section, window=3)[source]
Parameters:
  • entity (Entity)

  • section (Section)

  • window (int)

Return type:

str

spangrouped_ent_section_pairs(doc)[source]
Parameters:

doc (Document)

Return type:

Iterable[list[tuple[Entity, Section]]]
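The idea of grouping entities that exactly share a span can be sketched as follows. The SpanEntity class, the use of a section name in place of a Section object, and the group_by_exact_span helper are all hypothetical stand-ins for illustration; they are not kazu's Entity, Section, or the method documented here.

```python
from collections import defaultdict
from typing import NamedTuple


class SpanEntity(NamedTuple):
    # Hypothetical stand-in for an entity; not kazu's Entity class
    match: str
    start: int
    end: int
    entity_class: str


def group_by_exact_span(
    pairs: list[tuple[SpanEntity, str]],
) -> list[list[tuple[SpanEntity, str]]]:
    """Group (entity, section name) pairs whose entities cover the same
    character span within the same section."""
    groups: dict[tuple[str, int, int], list[tuple[SpanEntity, str]]] = defaultdict(list)
    for ent, section_name in pairs:
        groups[(section_name, ent.start, ent.end)].append((ent, section_name))
    return list(groups.values())


pairs = [
    (SpanEntity("UCB", 0, 3, "anatomy"), "abstract"),
    (SpanEntity("UCB", 0, 3, "company"), "abstract"),
    (SpanEntity("placenta", 20, 28, "anatomy"), "abstract"),
]
groups = group_by_exact_span(pairs)
```

Here the two “UCB” entities fall into one group, which is the kind of group whose entity class would need disambiguating, while “placenta” sits alone in its own group.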

class kazu.steps.linking.entity_class_disambiguation.EntityClassTfIdfScorer[source]

Bases: object

__init__(spans_to_tfidf_disambiguator)[source]
Parameters:

spans_to_tfidf_disambiguator (dict[str, list[TfIdfDisambiguationEntry]])

static build_tfidf_documents(spans_text_disambiguator)[source]
Parameters:

spans_text_disambiguator (dict[str, list[DisambiguationEntry]])

Return type:

dict[str, list[TfIdfDisambiguationEntry]]

static disambiguation_entry_to_tfidf_entry(disamb_entry)[source]
Parameters:

disamb_entry (DisambiguationEntry)

Return type:

TfIdfDisambiguationEntry

static from_spans_to_sentence_disambiguator(spans_text_disambiguator)[source]
Parameters:

spans_text_disambiguator (dict[str, list[DisambiguationEntry]])

Return type:

EntityClassTfIdfScorer

score_entity_context(ent_span, ent_context)[source]

Score the entity context against the TfIdf documents specified for the entity’s span.

Parameters:
  • ent_span (str)

  • ent_context (str)

Return type:

Iterable[ScoredContext]
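Scoring a context against an expected context can be illustrated with a plain-Python tf-idf cosine similarity. This is a simplified sketch, not the library's implementation: the documented fields suggest a TfidfVectorizer and ndarray documents are used internally, whereas the whitespace tokenizer, the smoothed idf formula, and the helper names (tokenize, fit_idf, tfidf_vector, score_context) below are assumptions, and ScoredContext is redefined locally to mirror the documented fields.

```python
import math
from collections import Counter
from typing import NamedTuple


class ScoredContext(NamedTuple):
    # Local stand-in mirroring the documented fields (not imported from kazu)
    entity_class: str
    score: float
    thresh: float


def tokenize(text: str) -> list[str]:
    # Assumption: lower-cased whitespace tokens are enough for a sketch
    return text.lower().split()


def fit_idf(texts: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency over the expected-context strings."""
    n_docs = len(texts)
    doc_freq = Counter(term for text in texts for term in set(tokenize(text)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """L2-normalised tf-idf weights; terms unseen at fit time are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def score_context(
    ent_context: str, entity_class: str, relevant_text: list[str], thresh: float
) -> ScoredContext:
    idf = fit_idf(relevant_text)
    doc_vec = tfidf_vector(" ".join(relevant_text), idf)
    ctx_vec = tfidf_vector(ent_context, idf)
    # Cosine similarity of two unit-length vectors is their dot product
    score = sum((w * doc_vec.get(t, 0.0) for t, w in ctx_vec.items()), 0.0)
    return ScoredContext(entity_class, score, thresh)


sentence = "the company reported strong business growth"
anatomy = score_context(sentence, "anatomy", ["umbilical pregnancy blood baby placenta"], 0.1)
company = score_context(sentence, "company", ["company business"], 0.1)
```

In this toy case the sentence shares no vocabulary with the anatomical context, so that score is zero, while it matches every term of the corporate context.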

static tfidf_score(ent_context, tfidf_disambig_entry)[source]
Parameters:
  • ent_context (str)

  • tfidf_disambig_entry (TfIdfDisambiguationEntry)

Return type:

ScoredContext

class kazu.steps.linking.entity_class_disambiguation.ScoredContext[source]

Bases: NamedTuple

ScoredContext(entity_class, score, thresh)

static __new__(_cls, entity_class, score, thresh)

Create new instance of ScoredContext(entity_class, score, thresh)

Parameters:
  • entity_class (str)

  • score (float)

  • thresh (float)

entity_class: str

Alias for field number 0

score: float

Alias for field number 1

thresh: float

Alias for field number 2
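One way the three fields might be used together is to keep only the candidates whose score reaches their own threshold and then take the highest-scoring one. The sketch below assumes exactly that selection rule (including the use of >= rather than >); the best_entity_class helper is hypothetical, and ScoredContext is redefined locally so the snippet runs without kazu.

```python
from typing import Iterable, NamedTuple, Optional


class ScoredContext(NamedTuple):
    # Local stand-in mirroring the documented fields (not imported from kazu)
    entity_class: str
    score: float
    thresh: float


def best_entity_class(scored: Iterable[ScoredContext]) -> Optional[str]:
    """Return the highest-scoring class whose score meets its threshold,
    or None when no candidate passes (assumed selection rule)."""
    passing = [s for s in scored if s.score >= s.thresh]
    if not passing:
        return None
    return max(passing, key=lambda s: s.score).entity_class


candidates = [
    ScoredContext("anatomy", 0.05, 0.2),
    ScoredContext("company", 0.6, 0.3),
]
chosen = best_entity_class(candidates)
```

Returning None when nothing passes leaves room for a caller to keep the span ambiguous instead of forcing a choice.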

class kazu.steps.linking.entity_class_disambiguation.TfIdfDisambiguationEntry[source]

Bases: NamedTuple

TfIdfDisambiguationEntry(entity_class, tfidf_document, tfidf_vectorizer, thresh)

static __new__(_cls, entity_class, tfidf_document, tfidf_vectorizer, thresh)

Create new instance of TfIdfDisambiguationEntry(entity_class, tfidf_document, tfidf_vectorizer, thresh)

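Parameters:
  • entity_class (str)

  • tfidf_document (ndarray)

  • tfidf_vectorizer (TfidfVectorizer)

  • thresh (float)

entity_class: str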

Alias for field number 0

tfidf_document: ndarray

Alias for field number 1

tfidf_vectorizer: TfidfVectorizer

Alias for field number 2

thresh: float

Alias for field number 3