kazu.language.string_similarity_scorers
Classes
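| BooleanStringSimilarityScorer | |
| EntityNounModifierStringSimilarityScorer | Checks that all modifier phrases in reference_term are represented in query_term. |
| EntitySubtypeStringSimilarityScorer | Checks that all "TYPE x" mentions in the match norm are represented in the synonym norm. |
| NumberMatchStringSimilarityScorer | Checks that all numbers in reference_term are represented in query_term. |
| RapidFuzzStringSimilarityScorer | Uses RapidFuzz to calculate string similarity. |
| SapbertStringSimilarityScorer | Note: this class implements the StringSimilarityScorer Protocol, but as a Singleton it cannot inherit from it. |
| StringSimilarityScorer | Calculates a NumericMetric based on a string match or a normalised string match and a normalised synonym. |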
- class kazu.language.string_similarity_scorers.BooleanStringSimilarityScorer
Bases: StringSimilarityScorer, Protocol
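The Protocol relationship between StringSimilarityScorer and BooleanStringSimilarityScorer can be sketched with typing.Protocol. This is a minimal sketch, assuming the call signature (reference_term, query_term) suggested by the class descriptions and assuming NumericMetric is a bool-or-float alias; CaseInsensitiveMatchScorer and run_check are hypothetical names, not part of kazu.

```python
from typing import Protocol, Union

# Assumption: NumericMetric is a bool-or-float alias (the page names the
# type but does not define it).
NumericMetric = Union[bool, float]


class StringSimilarityScorer(Protocol):
    """Sketch of the scorer interface: compare two strings, return a metric."""

    def __call__(self, reference_term: str, query_term: str) -> NumericMetric:
        ...


class BooleanStringSimilarityScorer(StringSimilarityScorer, Protocol):
    """Sub-protocol: scorers that return a plain pass/fail bool."""

    def __call__(self, reference_term: str, query_term: str) -> bool:
        ...


class CaseInsensitiveMatchScorer:
    """Hypothetical scorer; matches the Protocol structurally, no inheritance."""

    def __call__(self, reference_term: str, query_term: str) -> bool:
        return reference_term.casefold() == query_term.casefold()


def run_check(scorer: BooleanStringSimilarityScorer, reference_term: str, query_term: str) -> bool:
    # A static type checker accepts any object with a compatible __call__.
    return scorer(reference_term, query_term)


print(run_check(CaseInsensitiveMatchScorer(), "EGFR", "egfr"))  # True
```

Because Protocols use structural typing, a class only needs a compatible __call__ to be accepted where the Protocol is expected; it does not have to list the Protocol among its bases.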
- class kazu.language.string_similarity_scorers.EntityNounModifierStringSimilarityScorer
Bases: BooleanStringSimilarityScorer
Checks that all modifier phrases in reference_term are represented in query_term.
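A minimal sketch of the modifier check, reading the description as a one-directional test: every modifier phrase present in reference_term must also appear in query_term. The modifier list, the whole-word case-insensitive matching, and the function names are hypothetical; the real phrase list is not documented on this page.

```python
import re

# Hypothetical modifier list, for illustration only; the phrases used by
# EntityNounModifierStringSimilarityScorer are not shown on this page.
MODIFIER_PHRASES = ("LIKE", "RECEPTOR", "SUBUNIT")


def _contains_phrase(text: str, phrase: str) -> bool:
    # Whole-word, case-insensitive search so "LIKE" does not match "LIKELY".
    return re.search(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE) is not None


def modifiers_represented(reference_term: str, query_term: str) -> bool:
    """True if every modifier found in reference_term also appears in query_term."""
    return all(
        _contains_phrase(query_term, phrase)
        for phrase in MODIFIER_PHRASES
        if _contains_phrase(reference_term, phrase)
    )


print(modifiers_represented("insulin receptor", "INSULIN RECEPTOR"))  # True
print(modifiers_represented("insulin receptor", "insulin"))           # False
print(modifiers_represented("insulin", "insulin receptor"))           # True
```

The last call shows the one-directional reading: extra modifiers in query_term do not cause a failure.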
- class kazu.language.string_similarity_scorers.EntitySubtypeStringSimilarityScorer
Bases: BooleanStringSimilarityScorer
Checks that all "TYPE x" mentions in the match norm are represented in the synonym norm.
- numeric_class_phrases = re.compile('TYPE (?:I|[0-9]+)')
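The documented numeric_class_phrases pattern can be exercised directly. This sketch assumes subset semantics (every "TYPE x" mention on the match side must appear on the synonym side) and upper-cased, already normalised inputs; subtypes_represented is a hypothetical helper, and the actual scorer may compare the mentions differently.

```python
import re

# The pattern documented for EntitySubtypeStringSimilarityScorer.
numeric_class_phrases = re.compile('TYPE (?:I|[0-9]+)')


def subtypes_represented(match_norm: str, syn_norm: str) -> bool:
    """Sketch: every 'TYPE x' mention in match_norm must also occur in syn_norm.

    Assumes both inputs are already normalised (upper-cased), because the
    pattern is case-sensitive.
    """
    match_types = set(numeric_class_phrases.findall(match_norm))
    syn_types = set(numeric_class_phrases.findall(syn_norm))
    return match_types <= syn_types


print(subtypes_represented("DIABETES MELLITUS TYPE 2", "TYPE 2 DIABETES"))  # True
print(subtypes_represented("DIABETES MELLITUS TYPE 2", "TYPE 1 DIABETES"))  # False
print(subtypes_represented("COLLAGEN TYPE I", "COLLAGEN"))                  # False
```

As written, the pattern also finds "TYPE I" inside "TYPE II", since nothing anchors the end of the match, so inputs presumably need other roman numerals normalised beforehand.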
- class kazu.language.string_similarity_scorers.NumberMatchStringSimilarityScorer
Bases: BooleanStringSimilarityScorer
Checks that all numbers in reference_term are represented in query_term.
- number_finder = re.compile('[0-9]+')
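A minimal sketch of the number check using the documented number_finder pattern, assuming subset semantics as the description states; numbers_represented is a hypothetical helper, and the real scorer may differ in direction or in how it treats repeated numbers.

```python
import re

# The pattern documented for NumberMatchStringSimilarityScorer.
number_finder = re.compile('[0-9]+')


def numbers_represented(reference_term: str, query_term: str) -> bool:
    """Sketch: every digit run in reference_term must also occur in query_term."""
    reference_numbers = set(number_finder.findall(reference_term))
    query_numbers = set(number_finder.findall(query_term))
    return reference_numbers <= query_numbers


print(numbers_represented("IL-6 receptor", "interleukin 6 receptor"))  # True
print(numbers_represented("IL-6 receptor", "IL-16 receptor"))          # False
```

Because [0-9]+ matches whole digit runs, "6" and "16" are distinct numbers. Comparing sets ignores how often a number occurs; collections.Counter would give a stricter multiset comparison.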
- class kazu.language.string_similarity_scorers.RapidFuzzStringSimilarityScorer
Bases: StringSimilarityScorer
Uses RapidFuzz to calculate string similarity.
Note: if the token count is greater than 4 and reference_term has more than 10 characters, token_sort_ratio is used; otherwise, WRatio is used.
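The selection rule in the note can be sketched as follows. It assumes "token count" means the number of whitespace-separated tokens in reference_term, and the helper names choose_rapidfuzz_scorer and rapidfuzz_similarity are hypothetical. fuzz.token_sort_ratio and fuzz.WRatio are real functions in the rapidfuzz package; the import is deferred so the selection logic runs without rapidfuzz installed.

```python
def choose_rapidfuzz_scorer(reference_term: str) -> str:
    """Return the name of the rapidfuzz.fuzz function the documented rule picks.

    Assumption: "token count" means whitespace-separated tokens of reference_term.
    """
    token_count = len(reference_term.split())
    if token_count > 4 and len(reference_term) > 10:
        return "token_sort_ratio"
    return "WRatio"


def rapidfuzz_similarity(reference_term: str, query_term: str) -> float:
    """Score two strings on RapidFuzz's 0-100 scale (requires `pip install rapidfuzz`)."""
    from rapidfuzz import fuzz  # deferred import: only needed when actually scoring

    scorer = getattr(fuzz, choose_rapidfuzz_scorer(reference_term))
    return scorer(reference_term, query_term)


print(choose_rapidfuzz_scorer("BRCA1"))                                        # WRatio
print(choose_rapidfuzz_scorer("breast cancer type 1 susceptibility protein"))  # token_sort_ratio
```

token_sort_ratio sorts the tokens of each string before comparing them, which makes long multi-word names insensitive to word order.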
- class kazu.language.string_similarity_scorers.SapbertStringSimilarityScorer
Bases: object
Note: this class implements the StringSimilarityScorer Protocol, but as a Singleton it cannot inherit from it.
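The Singleton note can be illustrated under the assumption, not confirmed on this page, that the Singleton is implemented as a metaclass. Singleton, Scorer, BrokenScorer, SingletonScorer and score below are hypothetical names. The sketch shows why combining a Protocol base with a custom metaclass raises a TypeError, and that dropping the base still leaves a structurally compatible class.

```python
from typing import Protocol


class Singleton(type):
    """Hypothetical singleton metaclass: each class gets one shared instance."""

    _instances: dict = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Scorer(Protocol):
    def __call__(self, reference_term: str, query_term: str) -> float:
        ...


# Inheriting the Protocol while using the Singleton metaclass fails: Protocol's
# own metaclass and Singleton are unrelated subclasses of `type`.
metaclass_conflict = False
try:
    class BrokenScorer(Scorer, metaclass=Singleton):
        pass
except TypeError:
    metaclass_conflict = True


# Without inheritance there is no conflict, and the class still satisfies the
# Protocol structurally through its __call__ method.
class SingletonScorer(metaclass=Singleton):
    def __call__(self, reference_term: str, query_term: str) -> float:
        return 1.0 if reference_term == query_term else 0.0


def score(scorer: Scorer, reference_term: str, query_term: str) -> float:
    return scorer(reference_term, query_term)


print(metaclass_conflict)                      # True
print(SingletonScorer() is SingletonScorer())  # True
```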
- __init__(sapbert, cache_size=1000)
- Parameters:
sapbert (SapBertHelper) – the SapBERT model to use
cache_size (int) – cache size, used to avoid repeated calls to SapBERT for the same string
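The cache_size parameter can be illustrated with functools.lru_cache. This is a sketch under stated assumptions: toy_embed is a hypothetical stand-in for a SapBERT embedding (the SapBertHelper API is not shown here), cosine similarity is assumed as the comparison, and CachedEmbeddingSimilarity is a hypothetical class, not kazu's implementation.

```python
import math
from functools import lru_cache
from typing import Callable, Sequence, Tuple


class CachedEmbeddingSimilarity:
    """Hypothetical scorer: cosine similarity of cached per-string embeddings."""

    def __init__(self, embed: Callable[[str], Sequence[float]], cache_size: int = 1000) -> None:
        self._embed = embed
        self.model_calls = 0  # counts how often the expensive model is actually hit
        # Per-instance LRU cache holding at most cache_size strings.
        self._cached_embedding = lru_cache(maxsize=cache_size)(self._embedding)

    def _embedding(self, text: str) -> Tuple[float, ...]:
        self.model_calls += 1
        return tuple(self._embed(text))  # immutable, so safe to hand out from the cache

    def __call__(self, reference_term: str, query_term: str) -> float:
        a = self._cached_embedding(reference_term)
        b = self._cached_embedding(query_term)
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        return dot / norm if norm else 0.0


def toy_embed(text: str) -> Sequence[float]:
    # Hypothetical stand-in for a SapBERT embedding: counts of A, B and C.
    return [float(text.count(c)) for c in "ABC"]


scorer = CachedEmbeddingSimilarity(toy_embed, cache_size=1000)
scorer("ABC", "CBA")
scorer("ABC", "AB")
print(scorer.model_calls)  # 3: "ABC" is embedded only once
```

A bounded LRU cache keeps memory predictable while still skipping the model for strings that recur, which is the trade-off cache_size controls.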