kazu.ontology_preprocessing.synonym_generation¶
Classes
CombinatorialSynonymGenerator | For every permutation of modifiers, generate a list of synonyms, then aggregate at the end. |
NgramHyphenation | Generate hyphenated variants of ngrams. |
SpellingVariationReplacement | Generate additional synonyms using a mapping of (known) synonyms to a list of variations. |
StopWordRemover | Remove stopwords from a string. |
SuffixReplacement | Interchange all suffixes within a provided set to produce new synonyms. |
TokenListReplacementGenerator | Given lists of tokens, generate an alternative string based upon a query token. |
VerbPhraseVariantGenerator | Generate alternative verb phrases based on a list of tense templates, and lemmas matched in a query. |
- class kazu.ontology_preprocessing.synonym_generation.CombinatorialSynonymGenerator[source]¶
Bases:
object
For every permutation of modifiers, generate a list of synonyms, then aggregate at the end.
- __call__(ontology_resources)[source]¶
Takes a set of OntologyStringResources, and returns a new set of OntologyStringResources with generated synonyms added as alternative_synonyms.
- Parameters:
ontology_resources (set[OntologyStringResource])
- Returns:
- Return type:
set[OntologyStringResource]
- __init__(synonym_generators)[source]¶
- Parameters:
synonym_generators (Iterable[SynonymGenerator])
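The "every permutation, then aggregate" idea can be sketched without Kazu itself. The example below is a minimal illustration under stated assumptions: generators are modelled as plain functions from a string to a set of strings, and `combine_generators`, `strip_of` and `hyphenate` are hypothetical names, not part of the Kazu API. The real class operates on OntologyStringResource objects and may differ in detail.

```python
from itertools import permutations
from typing import Callable, Iterable

# Hypothetical stand-in for a synonym generator: one string in, a set of new strings out.
StringGenerator = Callable[[str], set[str]]


def combine_generators(seeds: set[str], generators: Iterable[StringGenerator]) -> set[str]:
    """Apply the generators in every possible order and pool everything they produce."""
    generated: set[str] = set()
    for ordering in permutations(list(generators)):
        current = set(seeds)  # each ordering starts again from the original strings
        for generate in ordering:
            new: set[str] = set()
            for text in current:
                new |= generate(text)
            generated |= new
            current |= new  # later generators also see the earlier output
    return generated - seeds  # report only strings that were not already present


def strip_of(text: str) -> set[str]:
    # Illustrative generator: drop the word "of".
    return {text.replace(" of ", " ")} if " of " in text else set()


def hyphenate(text: str) -> set[str]:
    # Illustrative generator: join all tokens with hyphens.
    return {text.replace(" ", "-")} if " " in text else set()


result = combine_generators({"loss of weight"}, [strip_of, hyphenate])
```

Running the generators in both orders matters here: "loss-weight" is only reachable when `strip_of` runs before `hyphenate`.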
- class kazu.ontology_preprocessing.synonym_generation.GreekSymbolSubstitution[source]¶
Bases:
object
- ALL_SUBS: dict[str, set[str]] = {'alpha': {'Α', 'α'}, 'beta': {'Β', 'β', 'ϐ'}, 'chi': {'Χ', 'χ'}, 'delta': {'Δ', 'δ'}, 'epsilon': {'Ε', 'ε'}, 'eta': {'Η', 'η'}, 'final sigma': {'ς'}, 'gamma': {'Γ', 'γ'}, 'iota': {'Ι', 'ι'}, 'kappa': {'Κ', 'κ'}, 'lambda': {'Λ', 'λ'}, 'mu': {'Μ', 'μ'}, 'nu': {'Ν', 'ν'}, 'omega': {'Ω', 'ω'}, 'omicron': {'Ο', 'ο'}, 'phi': {'Φ', 'φ', 'ϕ'}, 'pi': {'Π', 'π'}, 'psi': {'Ψ', 'ψ'}, 'rho': {'Ρ', 'ρ'}, 'sigma': {'Σ', 'σ'}, 'tau': {'Τ', 'τ'}, 'theta': {'Θ', 'θ', 'ϴ'}, 'upsilon': {'Υ', 'υ'}, 'xi': {'Ξ', 'ξ'}, 'zeta': {'Ζ', 'ζ'}, 'Α': {'a', 'alpha', 'α'}, 'Β': {'b', 'beta', 'β'}, 'Γ': {'g', 'gamma', 'γ'}, 'Δ': {'d', 'delta', 'δ'}, 'Ε': {'e', 'epsilon', 'ε'}, 'Ζ': {'z', 'zeta', 'ζ'}, 'Η': {'e', 'eta', 'η'}, 'Θ': {'t', 'theta', 'θ'}, 'Ι': {'i', 'iota', 'ι'}, 'Κ': {'k', 'kappa', 'κ'}, 'Λ': {'l', 'lambda', 'λ'}, 'Μ': {'m', 'mu', 'μ'}, 'Ν': {'n', 'nu', 'ν'}, 'Ξ': {'x', 'xi', 'ξ'}, 'Ο': {'o', 'omicron', 'ο'}, 'Π': {'p', 'pi', 'π'}, 'Ρ': {'r', 'rho', 'ρ'}, 'Σ': {'s', 'sigma', 'σ'}, 'Τ': {'t', 'tau', 'τ'}, 'Υ': {'u', 'upsilon', 'υ'}, 'Φ': {'p', 'phi', 'φ'}, 'Χ': {'c', 'chi', 'χ'}, 'Ψ': {'p', 'psi', 'ψ'}, 'Ω': {'o', 'omega', 'ω'}, 'α': {'a', 'alpha', 'Α'}, 'β': {'b', 'beta', 'Β'}, 'γ': {'g', 'gamma', 'Γ'}, 'δ': {'d', 'delta', 'Δ'}, 'ε': {'e', 'epsilon', 'Ε'}, 'ζ': {'z', 'zeta', 'Ζ'}, 'η': {'e', 'eta', 'Η'}, 'θ': {'t', 'theta', 'Θ'}, 'ι': {'i', 'iota', 'Ι'}, 'κ': {'k', 'kappa', 'Κ'}, 'λ': {'l', 'lambda', 'Λ'}, 'μ': {'m', 'mu', 'Μ'}, 'ν': {'n', 'nu', 'Ν'}, 'ξ': {'x', 'xi', 'Ξ'}, 'ο': {'o', 'omicron', 'Ο'}, 'π': {'p', 'pi', 'Π'}, 'ρ': {'r', 'rho', 'Ρ'}, 'ς': {'f', 'final sigma', 'Σ'}, 'σ': {'s', 'sigma', 'Σ'}, 'τ': {'t', 'tau', 'Τ'}, 'υ': {'u', 'upsilon', 'Υ'}, 'φ': {'p', 'phi', 'Φ'}, 'χ': {'c', 'chi', 'Χ'}, 'ψ': {'p', 'psi', 'Ψ'}, 'ω': {'o', 'omega', 'Ω'}, 'ϐ': {'b', 'beta', 'Β'}, 'ϕ': {'p', 'phi', 'Φ'}, 'ϴ': {'t', 'theta', 'θ'}}¶
- greek_letter = 'ω'¶
- lower_greek_letter = 'θ'¶
- spelling = 'omega'¶
- upper_greek_letter = 'Ω'¶
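To show how a table shaped like ALL_SUBS might be used, here is a minimal sketch that works on a two-entry excerpt of it. The function name `greek_variants` is hypothetical, and the plain substring replacement is an assumption made for brevity; it does not reflect how Kazu decides where a substitution is allowed.

```python
# A small excerpt of the ALL_SUBS table shown above; the real table is much larger.
SUBS_EXCERPT: dict[str, set[str]] = {
    "alpha": {"Α", "α"},
    "α": {"a", "alpha", "Α"},
}


def greek_variants(text: str, subs: dict[str, set[str]] = SUBS_EXCERPT) -> set[str]:
    """Return one variant per (matched key, replacement) pair, using naive substring replacement."""
    variants: set[str] = set()
    for key, replacements in subs.items():
        if key in text:
            for replacement in replacements:
                variants.add(text.replace(key, replacement))
    return variants


variants = greek_variants("TNF-α")
```

For the input "TNF-α" this yields three strings, including "TNF-alpha" and "TNF-a".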
- class kazu.ontology_preprocessing.synonym_generation.NgramHyphenation[source]¶
Bases:
SynonymGenerator
Generate hyphenated variants of ngrams.
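The docstring does not say which hyphenation patterns are produced, so the following is only one plausible reading: replace a single space at a time with a hyphen. The function name `hyphenated_variants` and this exact behaviour are assumptions for illustration.

```python
def hyphenated_variants(text: str) -> set[str]:
    """Replace each inter-token space, one at a time, with a hyphen."""
    tokens = text.split()
    variants: set[str] = set()
    for i in range(1, len(tokens)):
        left = " ".join(tokens[:i])
        right = " ".join(tokens[i:])
        variants.add(f"{left}-{right}")
    return variants


variants = hyphenated_variants("non small cell")
```

A single-token input has no spaces to replace, so it produces an empty set.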
- class kazu.ontology_preprocessing.synonym_generation.SeparatorExpansion[source]¶
Bases:
SynonymGenerator
- class kazu.ontology_preprocessing.synonym_generation.SpellingVariationReplacement[source]¶
Bases:
SynonymGenerator
Generate additional synonyms using a mapping of (known) synonyms to a list of variations.
- class kazu.ontology_preprocessing.synonym_generation.StopWordRemover[source]¶
Bases:
SynonymGenerator
Remove stopwords from a string.
- classmethod call(synonym_str)[source]¶
Implementations should override this method to generate new strings from an input string.
- all_stopwords = {'and', 'by', 'caused', 'in', 'involved', 'of', 'the', 'to', 'with'}¶
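As a rough illustration of stopword removal using the set listed above, the sketch below drops matching tokens and returns a new string only when something was actually removed. The function name `remove_stopwords`, the case-insensitive comparison and the `None` return value are assumptions, not the library's documented behaviour.

```python
from typing import Optional

# Copied from the all_stopwords attribute documented above.
ALL_STOPWORDS = {"and", "by", "caused", "in", "involved", "of", "the", "to", "with"}


def remove_stopwords(text: str) -> Optional[str]:
    """Return the string without stopwords, or None if nothing changed or nothing is left."""
    tokens = text.split()
    kept = [token for token in tokens if token.lower() not in ALL_STOPWORDS]
    if len(kept) == len(tokens) or not kept:
        return None
    return " ".join(kept)


shortened = remove_stopwords("cancer of the lung")
```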
- class kazu.ontology_preprocessing.synonym_generation.StringReplacement[source]¶
Bases:
SynonymGenerator
- call(synonym_str)[source]¶
Implementations should override this method to generate new strings from an input string.
- GREEK_VARIANT_PREFIX_SUFFIX = {' ', '-', '‐', '‑', '‒', '–', '—', '―', '−'}¶
- class kazu.ontology_preprocessing.synonym_generation.SuffixReplacement[source]¶
Bases:
SynonymGenerator
Interchange all suffixes within a provided set to produce new synonyms.
Note that this is expected to be noisy, and most of the generated synonyms will not be valid words. This class is present as a generation step for high recall, with curation of synonyms expected later.
In particular, note that this doesn’t check for the longest matching suffix: e.g. for a synonym ‘anaemia’ and the suffixes ‘ia’, ‘a’ and ‘ic’, the new synonyms ‘anaemic’ and ‘anaemiic’ will both be generated.
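The noisy behaviour described for 'anaemia' can be reproduced with a few lines. This is a minimal sketch, assuming every matching suffix is swapped for every other suffix in the set; `interchange_suffixes` is a hypothetical name rather than a Kazu function.

```python
def interchange_suffixes(text: str, suffixes: set[str]) -> set[str]:
    """For each suffix the text ends with, swap in every other suffix from the set."""
    variants: set[str] = set()
    for suffix in suffixes:
        # No longest-match check: every suffix that matches is processed.
        if suffix and text.endswith(suffix):
            stem = text[: -len(suffix)]
            for other in suffixes:
                if other != suffix:
                    variants.add(stem + other)
    return variants


variants = interchange_suffixes("anaemia", {"ia", "a", "ic"})
```

Both 'ia' and 'a' match the end of 'anaemia', which is why the output contains the plausible 'anaemic' alongside non-words such as 'anaemiic'.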
- class kazu.ontology_preprocessing.synonym_generation.SynonymGenerator[source]¶
Bases:
ABC
- class kazu.ontology_preprocessing.synonym_generation.TokenListReplacementGenerator[source]¶
Bases:
SynonymGenerator
Given lists of tokens, generate an alternative string based upon a query token.
Note that this implementation is basic: it replaces only one token at a time. It’s mainly designed for ontologies like MedDRA, which stretch the definition of an entity somewhat by incorporating verbs (e.g. “increase in AST”).
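A minimal sketch of one-token-at-a-time replacement follows. It assumes the input is a list of token groups whose members are interchangeable, and that matching is done on whitespace-separated tokens; both the function name `token_list_variants` and these details are illustrative assumptions.

```python
def token_list_variants(text: str, token_lists: list[list[str]]) -> set[str]:
    """Swap a single matching token for each alternative in its group."""
    tokens = text.split()
    variants: set[str] = set()
    for group in token_lists:
        for i, token in enumerate(tokens):
            if token in group:
                for alternative in group:
                    if alternative != token:
                        # Only position i changes; all other tokens are kept as they are.
                        variants.add(" ".join(tokens[:i] + [alternative] + tokens[i + 1:]))
    return variants


variants = token_list_variants("increase in AST", [["increase", "elevation", "rise"]])
```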
- class kazu.ontology_preprocessing.synonym_generation.VerbPhraseVariantGenerator[source]¶
Bases:
SynonymGenerator
Generate alternative verb phrases based on a list of tense templates, and lemmas matched in a query.
It’s mainly designed for ontologies like MedDRA, which stretch the definition of an entity somewhat by incorporating verbs (e.g. “increase in AST”).
- __init__(tense_templates, lemmas_to_consider, spacy_model_path)[source]¶
- Parameters:
tense_templates (list[str]) –
template expressions to generate, for example:
["{NOUN} {TARGET}", "{TARGET} in {NOUN}"]
lemmas_to_consider (dict[str, list[str]]) –
a dict of verb lemmas to surface forms to generate, for example:
{"increase": ["increasing", "increased"], "decrease": ["decreased", "decreasing"]}
spacy_model_path (str) – path to a serialised spaCy model - must have a lemmatizer component.
- call(synonym_str)[source]¶
Implementations should override this method to generate new strings from an input string.
- NOUN_PLACEHOLDER = 'NOUN'¶
- VERB_PLACEHOLDER = 'TARGET'¶
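The way templates and lemmas interact can be sketched with plain string formatting. This is a deliberately simplified illustration: it matches lemmas by exact token comparison instead of using a spaCy lemmatizer, and it treats everything except the verb and a small hypothetical set of linking words as the noun. The function name `verb_phrase_variants` and `LINKING_WORDS` are assumptions, not part of the class.

```python
# Hypothetical set of words to discard when extracting the noun part.
LINKING_WORDS = {"in", "of"}


def verb_phrase_variants(
    text: str,
    tense_templates: list[str],
    lemmas_to_consider: dict[str, list[str]],
) -> set[str]:
    """Fill each template with the noun part and every surface form of a matched lemma."""
    tokens = text.split()
    variants: set[str] = set()
    for i, token in enumerate(tokens):
        surface_forms = lemmas_to_consider.get(token.lower())
        if not surface_forms:
            continue
        # Crude noun extraction: all remaining tokens that are not linking words.
        noun = " ".join(
            t for j, t in enumerate(tokens) if j != i and t.lower() not in LINKING_WORDS
        )
        if not noun:
            continue
        for template in tense_templates:
            for form in surface_forms:
                # Keyword names mirror the NOUN and TARGET placeholders documented above.
                variants.add(template.format(NOUN=noun, TARGET=form))
    return variants


variants = verb_phrase_variants(
    "increase in AST",
    ["{NOUN} {TARGET}", "{TARGET} in {NOUN}"],
    {"increase": ["increasing", "increased"]},
)
```

With two templates and two surface forms, the example input yields four variants, such as "AST increased" and "increasing in AST".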