kazu.language.string_similarity_scorers

Classes

BooleanStringSimilarityScorer

EntityNounModifierStringSimilarityScorer

Checks that all modifier phrases in reference_term are represented in query_term.

EntitySubtypeStringSimilarityScorer

Checks that all "TYPE x" mentions in the normalised match string are represented in the normalised synonym.

NumberMatchStringSimilarityScorer

Checks that all numbers in reference_term are represented in query_term.

RapidFuzzStringSimilarityScorer

Uses RapidFuzz to calculate string similarity.

SapbertStringSimilarityScorer

Note: this class implements the StringSimilarityScorer Protocol, but because it is a Singleton it cannot inherit from it.

StringSimilarityScorer

Calculates a NumericMetric that compares a string match (or a normalised string match) with a normalised synonym.

class kazu.language.string_similarity_scorers.BooleanStringSimilarityScorer

Bases: StringSimilarityScorer, Protocol

__call__(reference_term, query_term)

Compare query_term with reference_term and return a bool.

Parameters:
  • reference_term (str)

  • query_term (str)

Return type:

bool

class kazu.language.string_similarity_scorers.EntityNounModifierStringSimilarityScorer

Bases: BooleanStringSimilarityScorer

Checks that all modifier phrases in reference_term are represented in query_term.

__call__(reference_term, query_term)

Apply the check described above to reference_term and query_term and return the result.

Parameters:
  • reference_term (str)

  • query_term (str)

Return type:

bool

__init__(noun_modifier_phrases)

Parameters:
  • noun_modifier_phrases (list[str]) – the modifier phrases to check for
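A minimal sketch of this check, for illustration only. NounModifierCheck is a hypothetical stand-in that mirrors the documented __init__ and __call__ signatures; it assumes "represented" means a plain substring test, so every configured phrase found in reference_term must also be found in query_term. kazu's own matching rules may differ.

```python
class NounModifierCheck:
    """Hypothetical stand-in mirroring EntityNounModifierStringSimilarityScorer."""

    def __init__(self, noun_modifier_phrases: list[str]) -> None:
        self.noun_modifier_phrases = noun_modifier_phrases

    def __call__(self, reference_term: str, query_term: str) -> bool:
        # Assumption: plain substring tests. Every configured phrase that occurs
        # in reference_term must also occur in query_term.
        return all(
            phrase in query_term
            for phrase in self.noun_modifier_phrases
            if phrase in reference_term
        )


check = NounModifierCheck(["CHRONIC", "ACUTE"])  # hypothetical phrases
print(check("CHRONIC KIDNEY DISEASE", "KIDNEY DISEASE"))  # False
print(check("KIDNEY DISEASE", "CHRONIC KIDNEY DISEASE"))  # True
```

In this sketch, phrases absent from reference_term impose no constraint, so a query may carry modifiers that the reference lacks.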

class kazu.language.string_similarity_scorers.EntitySubtypeStringSimilarityScorer

Bases: BooleanStringSimilarityScorer

Checks that all "TYPE x" mentions in the normalised match string are represented in the normalised synonym.
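The check can be sketched with the numeric_class_phrases pattern listed for this class. The sketch assumes the mentions are compared as sets, with every mention found in reference_term required in query_term; subtypes_represented is a hypothetical name, not part of kazu.

```python
import re

# The pattern documented as EntitySubtypeStringSimilarityScorer.numeric_class_phrases.
numeric_class_phrases = re.compile("TYPE (?:I|[0-9]+)")


def subtypes_represented(reference_term: str, query_term: str) -> bool:
    """Hypothetical sketch: True if every 'TYPE x' mention found in
    reference_term also occurs in query_term."""
    reference_types = set(numeric_class_phrases.findall(reference_term))
    query_types = set(numeric_class_phrases.findall(query_term))
    return reference_types <= query_types


print(numeric_class_phrases.findall("DIABETES MELLITUS TYPE 2"))  # ['TYPE 2']
print(subtypes_represented("DIABETES TYPE 1", "TYPE 1 DIABETES"))  # True
print(subtypes_represented("DIABETES TYPE 1", "DIABETES TYPE 2"))  # False
```

Because the alternation tries I before digits, "TYPE II" yields the match "TYPE I", so this pattern on its own cannot tell TYPE I from TYPE II.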

classmethod __call__(reference_term, query_term)

Apply the check described above to reference_term and query_term and return the result.

Parameters:
  • reference_term (str)

  • query_term (str)

Return type:

bool

numeric_class_phrases = re.compile('TYPE (?:I|[0-9]+)')

class kazu.language.string_similarity_scorers.NumberMatchStringSimilarityScorer

Bases: BooleanStringSimilarityScorer

Checks that all numbers in reference_term are represented in query_term.
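A comparable sketch for numbers, using the number_finder pattern listed for this class. numbers_represented is hypothetical, and the subset semantics (each number in reference_term must appear in query_term) are an assumption drawn from the class description. Comparing whole regex matches rather than substrings keeps 6 from matching inside 60.

```python
import re

# The pattern documented as NumberMatchStringSimilarityScorer.number_finder.
number_finder = re.compile("[0-9]+")


def numbers_represented(reference_term: str, query_term: str) -> bool:
    """Hypothetical sketch: True if every number in reference_term also
    appears as a whole number in query_term."""
    query_numbers = set(number_finder.findall(query_term))
    return all(number in query_numbers for number in number_finder.findall(reference_term))


print(number_finder.findall("CD 4 AND CD 45"))  # ['4', '45']
print(numbers_represented("IL 6", "INTERLEUKIN 6"))  # True
print(numbers_represented("IL 6", "IL 2"))  # False
print(numbers_represented("IL 6", "IL 60"))  # False
```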

classmethod __call__(reference_term, query_term)

Apply the check described above to reference_term and query_term and return the result.

Parameters:
  • reference_term (str)

  • query_term (str)

Return type:

bool

number_finder = re.compile('[0-9]+')

class kazu.language.string_similarity_scorers.RapidFuzzStringSimilarityScorer

Bases: StringSimilarityScorer

Uses RapidFuzz to calculate string similarity.

Note: if the token count is greater than 4 and reference_term has more than 10 characters, token_sort_ratio is used; otherwise, WRatio is used.
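The selection rule can be sketched as follows. It assumes "token count" means the whitespace-separated tokens of reference_term; pick_rapidfuzz_scorer and token_sort_similarity are hypothetical helpers. To keep the example dependency-free, token_sort_similarity approximates the idea behind token_sort_ratio (sort the tokens, then compare) with difflib.SequenceMatcher; it is not RapidFuzz's algorithm and its scores will not match RapidFuzz's.

```python
from difflib import SequenceMatcher


def pick_rapidfuzz_scorer(reference_term: str) -> str:
    """Hypothetical: name of the RapidFuzz scorer the documented rule selects."""
    # Assumption: "token count" = whitespace-separated tokens of reference_term.
    token_count = len(reference_term.split())
    if token_count > 4 and len(reference_term) > 10:
        return "token_sort_ratio"
    return "WRatio"


def token_sort_similarity(reference_term: str, query_term: str) -> float:
    """Hypothetical difflib approximation of the token-sort idea (not RapidFuzz)."""
    # Sorting tokens makes the comparison insensitive to word order.
    a = " ".join(sorted(reference_term.split()))
    b = " ".join(sorted(query_term.split()))
    return 100.0 * SequenceMatcher(None, a, b).ratio()


print(pick_rapidfuzz_scorer("acute myeloid leukaemia"))  # WRatio
print(pick_rapidfuzz_scorer("tumor necrosis factor receptor superfamily member 1A"))  # token_sort_ratio
print(token_sort_similarity("factor necrosis tumor", "tumor necrosis factor"))  # 100.0
```

With RapidFuzz installed, the selected name corresponds to rapidfuzz.fuzz.token_sort_ratio or rapidfuzz.fuzz.WRatio, both of which take two strings and return a score between 0 and 100.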

static __call__(reference_term, query_term)

Return a RapidFuzz similarity score, on a scale of 0 to 100, comparing reference_term with query_term.

Parameters:
  • reference_term (str)

  • query_term (str)

Return type:

bool | int | float

class kazu.language.string_similarity_scorers.SapbertStringSimilarityScorer

Bases: object

Note: this class implements the StringSimilarityScorer Protocol, but because it is a Singleton it cannot inherit from it.

__call__(reference_term, query_term)

Return a similarity score for reference_term and query_term computed with the sapbert model.

Parameters:
  • reference_term (str)

  • query_term (str)

Return type:

float

__init__(sapbert, cache_size=1000)

Parameters:
  • sapbert (SapBertHelper) – The sapbert model to use

  • cache_size (int) – size of the cache used to avoid repeated calls to sapbert for the same string
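A minimal sketch of the caching idea behind cache_size: wrap the embedding step in functools.lru_cache so each distinct string is embedded once while it remains cached, then compare embeddings with cosine similarity. toy_embedding (letter counts) stands in for the sapbert model; it and CachedEmbeddingSimilarity are hypothetical, and cosine similarity is an assumed scoring choice that this page does not confirm.

```python
import math
from functools import lru_cache
from typing import Callable


def toy_embedding(text: str) -> tuple[float, ...]:
    """Hypothetical stand-in for a SapBERT embedding: counts of letters a-z."""
    lowered = text.lower()
    return tuple(float(lowered.count(chr(c))) for c in range(ord("a"), ord("z") + 1))


class CachedEmbeddingSimilarity:
    """Hypothetical sketch: cosine similarity between cached embeddings."""

    def __init__(self, embed: Callable[[str], tuple[float, ...]], cache_size: int = 1000) -> None:
        # Wrap the embedding function in an LRU cache so repeated strings
        # are not re-embedded while they stay in the cache.
        self._embed = lru_cache(maxsize=cache_size)(embed)

    def __call__(self, reference_term: str, query_term: str) -> float:
        a = self._embed(reference_term)
        b = self._embed(query_term)
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        # Guard against zero vectors (e.g. empty strings).
        return dot / norm if norm else 0.0


scorer = CachedEmbeddingSimilarity(toy_embedding, cache_size=2)
print(scorer("aspirin", "aspirin"))
print(scorer._embed.cache_info())  # one miss for the first call, one hit for the second
```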

class kazu.language.string_similarity_scorers.StringSimilarityScorer

Bases: Protocol

Calculates a NumericMetric that compares a string match (or a normalised string match) with a normalised synonym.

__call__(reference_term, query_term)

Score query_term against reference_term and return the result as a NumericMetric.

Parameters:
  • reference_term (str)

  • query_term (str)

Return type:

bool | int | float

__init__(*args, **kwargs)
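Because StringSimilarityScorer is a typing.Protocol, any object with a matching __call__ signature satisfies it structurally, which is how SapbertStringSimilarityScorer can implement it without inheriting from it. The sketch below re-declares the protocol locally and adds a hypothetical ExactMatchScorer; treating NumericMetric as the union bool | int | float is an assumption based on the return type shown above.

```python
from typing import Protocol, Union

# Assumption: NumericMetric is the union shown as the protocol's return type.
NumericMetric = Union[bool, int, float]


class StringSimilarityScorer(Protocol):
    """Local re-declaration of the documented protocol, for illustration only."""

    def __call__(self, reference_term: str, query_term: str) -> NumericMetric:
        ...


class ExactMatchScorer:
    """Hypothetical scorer: satisfies the protocol structurally, without inheriting it."""

    def __call__(self, reference_term: str, query_term: str) -> bool:
        # Case-insensitive exact comparison.
        return reference_term.casefold() == query_term.casefold()


def score_all(
    scorers: list[StringSimilarityScorer], reference_term: str, query_term: str
) -> list[NumericMetric]:
    # Any protocol-conforming callable can be passed here.
    return [scorer(reference_term, query_term) for scorer in scorers]


print(score_all([ExactMatchScorer()], "Aspirin", "ASPIRIN"))  # [True]
```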