kazu.steps.linking.entity_class_disambiguation
Classes
ScoredContext(entity_class, score, thresh)
TfIdfDisambiguationEntry(entity_class, tfidf_document, tfidf_vectorizer, thresh)
- class kazu.steps.linking.entity_class_disambiguation.EntityClassDisambiguationStep
Bases:
Step
Warning
This step is deprecated and may be removed in a future release.
- __call__(doc)
Process documents and respond with processed and failed documents.
Note that many steps will be decorated by document_iterating_step() or document_batch_step(), which will modify the ‘original’ __call__ function signature to match the expected signature for a step, as the decorators handle the exception/failed documents logic for you.
- __init__(context)
Optionally disambiguates the entity class (anatomy, drug, etc.) of entities that exactly share a span in a document.
For example, “UCB” could refer to “umbilical cord blood”, an anatomical entity, or to the pharmaceutical company UCB, a corporate entity. An expected context might be “umbilical pregnancy blood baby placenta…” in the former case, or “company business…” in the latter. Multiple expected contexts (disambiguation entries) should be provided so that this step can choose the best-matching entity class for an entity span. A tf-idf model is built to correlate an entity’s actual textual context with the provided expected contexts, and the provided thresholds are used when the tf-idf model chooses the most suitable entity class.
- Parameters:
context (dict[str, list[DisambiguationEntry]]) – Specifies synonyms to disambiguate along with an expected textual context around those synonyms.
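The matching idea described above can be illustrated with a minimal, standard-library-only sketch. It is not kazu's implementation: the documented classes hold a scikit-learn TfidfVectorizer, whereas here a small hand-rolled tf-idf with cosine similarity is swapped in so the example runs without dependencies. The helper names (tokenize, tfidf_vectors, cosine, score_contexts), the class labels, and the sample sentence are all hypothetical, and sharing one idf table across the expected contexts of a span is a simplifying assumption.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer; a real vectorizer does considerably more.
    return text.lower().split()


def tfidf_vectors(documents: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Return one sparse tf-idf vector per document, plus the idf table."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    # Smoothed idf, so every known term receives a positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    vectors = [
        {term: count * idf[term] for term, count in Counter(doc).items()}
        for doc in documents
    ]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_contexts(expected: dict[str, str], actual_context: str) -> dict[str, float]:
    """Score the actual context against each entity class's expected context."""
    classes = list(expected)
    vectors, idf = tfidf_vectors([tokenize(expected[c]) for c in classes])
    # Terms never seen in an expected context have no idf weight and are dropped.
    counts = Counter(tokenize(actual_context))
    actual = {term: n * idf[term] for term, n in counts.items() if term in idf}
    return {c: cosine(actual, vec) for c, vec in zip(classes, vectors)}


# Hypothetical expected contexts for the ambiguous span "UCB".
expected_contexts = {
    "anatomy": "umbilical pregnancy blood baby placenta",
    "company": "company business",
}
scores = score_contexts(
    expected_contexts,
    "UCB was collected from the placenta after the baby was born",
)
best_class = max(scores, key=scores.get)
```

In this sketch the sentence shares terms only with the anatomical context, so that class receives the higher score. A threshold check, as the step's description mentions, would still be needed before accepting the winner.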
- class kazu.steps.linking.entity_class_disambiguation.EntityClassTfIdfScorer
Bases:
object
- __init__(spans_to_tfidf_disambiguator)
- Parameters:
spans_to_tfidf_disambiguator (dict[str, list[TfIdfDisambiguationEntry]])
- static build_tfidf_documents(spans_text_disambiguator)
- Parameters:
spans_text_disambiguator (dict[str, list[DisambiguationEntry]])
- Return type:
- static disambiguation_entry_to_tfidf_entry(disamb_entry)
- Parameters:
disamb_entry (DisambiguationEntry)
- Return type:
- static from_spans_to_sentence_disambiguator(spans_text_disambiguator)
- Parameters:
spans_text_disambiguator (dict[str, list[DisambiguationEntry]])
- Return type:
- score_entity_context(ent_span, ent_context)
Score the entity context against the TfIdf documents specified for the entity’s span.
- Parameters:
- Returns:
- Return type:
- static tfidf_score(ent_context, tfidf_disambig_entry)
- Parameters:
ent_context (str)
tfidf_disambig_entry (TfIdfDisambiguationEntry)
- Return type:
- class kazu.steps.linking.entity_class_disambiguation.ScoredContext
Bases:
NamedTuple
ScoredContext(entity_class, score, thresh)
- static __new__(_cls, entity_class, score, thresh)
Create new instance of ScoredContext(entity_class, score, thresh)
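As a rough illustration of how a tuple of (entity_class, score, thresh) might be consumed, the sketch below defines a local stand-in for ScoredContext and picks the highest-scoring entry that meets its own threshold. The stand-in class, the field types, the pick_entity_class helper, and the use of `>=` for the threshold comparison are all assumptions made for this example; the page above does not specify them.

```python
from typing import NamedTuple, Optional


class ScoredContextSketch(NamedTuple):
    """Local stand-in mirroring the documented field names (types are assumed)."""

    entity_class: str
    score: float
    thresh: float


def pick_entity_class(scored: list[ScoredContextSketch]) -> Optional[str]:
    # Keep only contexts whose score reaches their own threshold (assumed rule).
    passing = [ctx for ctx in scored if ctx.score >= ctx.thresh]
    if not passing:
        # No expected context matched well enough, so make no choice.
        return None
    return max(passing, key=lambda ctx: ctx.score).entity_class


candidates = [
    ScoredContextSketch(entity_class="anatomy", score=0.63, thresh=0.30),
    ScoredContextSketch(entity_class="company", score=0.05, thresh=0.30),
]
chosen = pick_entity_class(candidates)
nothing = pick_entity_class([ScoredContextSketch("company", 0.05, 0.30)])
```

Returning None when nothing passes is one possible design: it leaves the entity's class unchanged rather than forcing a low-confidence choice.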
- class kazu.steps.linking.entity_class_disambiguation.TfIdfDisambiguationEntry
Bases:
NamedTuple
TfIdfDisambiguationEntry(entity_class, tfidf_document, tfidf_vectorizer, thresh)
- static __new__(_cls, entity_class, tfidf_document, tfidf_vectorizer, thresh)
Create new instance of TfIdfDisambiguationEntry(entity_class, tfidf_document, tfidf_vectorizer, thresh)
- Parameters:
entity_class (str)
tfidf_document (ndarray)
tfidf_vectorizer (TfidfVectorizer)
thresh (float)
- tfidf_vectorizer: TfidfVectorizer
Alias for field number 2
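“Alias for field number 2” is the standard NamedTuple wording: the attribute is another name for the element at zero-based tuple index 2. The sketch below shows this with a local stand-in class; the placeholder values stand in for the ndarray and TfidfVectorizer that the documented class holds, and the class name TfIdfEntrySketch is hypothetical.

```python
from typing import Any, NamedTuple


class TfIdfEntrySketch(NamedTuple):
    """Local stand-in with the same field order as the documented class."""

    entity_class: str
    tfidf_document: Any    # an ndarray in the documented class
    tfidf_vectorizer: Any  # a TfidfVectorizer in the documented class
    thresh: float


# Placeholder values only; no real vectorizer is fitted here.
entry = TfIdfEntrySketch("anatomy", [0.1, 0.9], "vectorizer-placeholder", 0.3)

# Attribute access and positional access return the same object.
same_object = entry[2] is entry.tfidf_vectorizer

# Because it is a tuple, the entry can also be unpacked positionally.
entity_class, document, vectorizer, thresh = entry
```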