kazu.steps.ner.llm_ner

Classes

AzureOpenAILLMModel

A class to interact with the Azure OpenAI API for LLMs.

FreeFormResultParser

Tries to identify a valid JSON object in the LLM response.

LLMModel

LLMNERStep

A step to perform Named Entity Recognition using a Language Model.

ResultParser

SectionProcessingStrategy

If a document is very long, it may exceed the LLM context length.

StructuredOutputResultParser

If the LLM is configured for structured output, this parser can be used to select the key that contains the entities.

VertexLLMModel

A class to interact with the VertexAI API for LLMs.

class kazu.steps.ner.llm_ner.AzureOpenAILLMModel[source]

Bases: LLMModel

A class to interact with the Azure OpenAI API for LLMs.

__call__(text)[source]

Call the LLM model with the given text and return the raw response.

Parameters:

text (str) – The text to pass to the LLM model.

Returns:

The raw string response.

Return type:

str

__init__(model, deployment, api_version, sys_prompt, temp)[source]

Initialize the AzureOpenAILLMModel.

Parameters:
  • model (str) – The model to use.

  • deployment (str) – The deployment to use.

  • api_version (str) – The API version to use.

  • sys_prompt (str) – The system prompt to use.

  • temp (float) – The temperature to use.

Return type:

None

class kazu.steps.ner.llm_ner.FreeFormResultParser[source]

Bases: ResultParser

Tries to identify a valid JSON object in the LLM response.

parse_result(result)[source]

Parse the raw response from the LLM model into a dictionary of entities.

Parameters:

result (str) – The raw response from the LLM model.

Returns:

A dictionary of entities and their class.

Return type:

dict[str, Any]
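The class's actual extraction logic is not shown here; a minimal sketch of the general technique — locating the first opening brace in a noisy reply and trying progressively shorter spans until one parses as JSON — might look like the following. The function name is hypothetical.

```python
import json
from typing import Any


def parse_free_form(result: str) -> dict[str, Any]:
    """Extract the first valid JSON object embedded in an LLM reply.

    Hypothetical helper: scans from the first '{' to each candidate
    closing '}' (longest span first) and returns the first span that
    parses as a JSON object.  Returns {} if none is found.
    """
    start = result.find("{")
    if start == -1:
        return {}
    for end in range(len(result), start, -1):
        if result[end - 1] != "}":
            continue
        try:
            parsed = json.loads(result[start:end])
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return {}
```

This tolerates conversational filler around the JSON payload ("Sure! {…} hope that helps"), which chat-tuned models frequently emit.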

class kazu.steps.ner.llm_ner.LLMModel[source]

Bases: Protocol

__call__(text)[source]

Call the LLM model with the given text and return the raw response.

Parameters:

text (str) – The text to pass to the LLM model.

Returns:

The raw string response.

Return type:

str
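Because LLMModel is a Protocol, any callable class that maps text to a raw string response satisfies it — no subclassing required. A self-contained sketch (the Protocol is redeclared here so the example runs standalone; CannedLLMModel is a hypothetical test double, not part of the library):

```python
from typing import Protocol


class LLMModel(Protocol):
    """Stand-in redeclaration of kazu.steps.ner.llm_ner.LLMModel."""

    def __call__(self, text: str) -> str:
        ...


class CannedLLMModel:
    """Hypothetical test double that returns a fixed raw response."""

    def __init__(self, response: str) -> None:
        self.response = response

    def __call__(self, text: str) -> str:
        return self.response


def run_model(model: LLMModel, text: str) -> str:
    # Structural typing: CannedLLMModel never inherits from LLMModel,
    # yet type-checks as one because the call signatures match.
    return model(text)
```

Doubles like this are convenient for testing downstream parsing without making real API calls.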

__init__(*args, **kwargs)[source]

class kazu.steps.ner.llm_ner.LLMNERStep[source]

Bases: Step

A step to perform Named Entity Recognition using a Language Model.

The LLM is used to produce a raw JSON response per document section, which is then parsed into entities and their classes; ahocorasick is then used to find matches of those entity strings in the document text. If there are conflicting class assignments, the class of the first match in the document is used.
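The real step matches entity strings with the ahocorasick package; the sketch below substitutes a plain `str.find` scan, but illustrates one plausible reading of the conflict rule: entities not literally present in the text are dropped, and when sections disagree on an entity's class, the earliest assignment wins. All names here are hypothetical.

```python
def merge_entity_classes(
    text: str, parsed_sections: list[dict[str, str]]
) -> dict[str, str]:
    """Merge per-section {entity: class} dicts into one mapping.

    Sketch only: entities that never appear in `text` are discarded
    (the LLM may hallucinate strings), and on a class conflict the
    first assignment encountered is kept.
    """
    merged: dict[str, str] = {}
    for parsed in parsed_sections:
        for ent, cls in parsed.items():
            if ent in merged:
                continue  # earlier assignment wins on conflict
            if text.find(ent) == -1:
                continue  # string not literally present: skip it
            merged[ent] = cls
    return merged
```

The production step uses an Aho-Corasick automaton instead of repeated scans, which finds all entity matches in a single pass over the text.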

__call__(doc)[source]

Process documents and respond with processed and failed documents.

Note that many steps will be decorated by document_iterating_step() or document_batch_step(), which modify the 'original' __call__ function signature to match the expected signature for a step, as the decorators handle the exception/failed-document logic for you.

Parameters:

doc (Document) – the document to process.

Returns:

The first element is all the provided docs (now modified by the processing), the second is the docs that failed to (fully) process correctly.

Return type:

tuple[list[Document], list[Document]]

__init__(model, result_parser, section_processing_strategy=SectionProcessingStrategy.CONCATENATE_AND_PROCESS)[source]

Initialize the LLMNERStep.

Parameters:
  • model (LLMModel) – The LLM model to use.

  • result_parser (ResultParser) – How the raw response should be parsed into entities.

  • section_processing_strategy (SectionProcessingStrategy) – How the sections should be processed.

Return type:

None

class kazu.steps.ner.llm_ner.ResultParser[source]

Bases: Protocol

__init__(*args, **kwargs)[source]

parse_result(result)[source]

Parse the raw response from the LLM model into a dictionary of entities.

Parameters:

result (str) – The raw response from the LLM model.

Returns:

A dictionary of entities and their class.

Return type:

dict[str, Any]

class kazu.steps.ner.llm_ner.SectionProcessingStrategy[source]

Bases: AutoNameEnum

If a document is very long, it may exceed the LLM context length.

This enum provides the means to process document sections individually.

__new__(value)[source]
CONCATENATE_AND_PROCESS = 'CONCATENATE_AND_PROCESS'
PROCESS_INDIVIDUALLY_AND_DROP_FAILED_SECTIONS = 'PROCESS_INDIVIDUALLY_AND_DROP_FAILED_SECTIONS'
PROCESS_INDIVIDUALLY_AND_KEEP_FAILED_SECTIONS = 'PROCESS_INDIVIDUALLY_AND_KEEP_FAILED_SECTIONS'
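A self-contained sketch of how the three strategies might drive section handling (the enum is redeclared so the example runs standalone; `process_sections` and the `None`-signals-failure convention are illustrative, not the step's actual implementation):

```python
from enum import Enum
from typing import Callable, Optional


class SectionProcessingStrategy(Enum):
    CONCATENATE_AND_PROCESS = "CONCATENATE_AND_PROCESS"
    PROCESS_INDIVIDUALLY_AND_DROP_FAILED_SECTIONS = (
        "PROCESS_INDIVIDUALLY_AND_DROP_FAILED_SECTIONS"
    )
    PROCESS_INDIVIDUALLY_AND_KEEP_FAILED_SECTIONS = (
        "PROCESS_INDIVIDUALLY_AND_KEEP_FAILED_SECTIONS"
    )


def process_sections(
    sections: list[str],
    process: Callable[[str], Optional[str]],  # None signals a failure
    strategy: SectionProcessingStrategy,
) -> list[str]:
    """Sketch: apply `process` to sections under the chosen strategy."""
    if strategy is SectionProcessingStrategy.CONCATENATE_AND_PROCESS:
        # One call over the whole document: risks exceeding context length.
        result = process("\n".join(sections))
        return [result] if result is not None else []
    results: list[str] = []
    for section in sections:
        result = process(section)
        if result is not None:
            results.append(result)
        elif (
            strategy
            is SectionProcessingStrategy.PROCESS_INDIVIDUALLY_AND_KEEP_FAILED_SECTIONS
        ):
            results.append(section)  # keep the failed section's original text
    return results
```

Processing sections individually trades more LLM calls for a bounded prompt size per call; the KEEP variant preserves failed sections rather than discarding them.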
class kazu.steps.ner.llm_ner.StructuredOutputResultParser[source]

Bases: ResultParser

If the LLM is configured for structured output, this parser can be used to select the key that contains the entities.

__init__(entity_key)[source]

Initialize the StructuredOutputResultParser.

Parameters:

entity_key (str) – The key in the structured output that contains the entities.

Return type:

None

parse_result(result)[source]

Parse the raw response from the LLM model into a dictionary of entities.

Parameters:

result (str) – The raw response from the LLM model.

Returns:

A dictionary of entities and their class.

Return type:

dict[str, str]
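A minimal sketch of the key-selection idea (this redeclares the class name for a standalone example and assumes the structured reply is a JSON object whose `entity_key` value maps entity strings to classes; it is not the class's actual implementation):

```python
import json


class StructuredOutputResultParser:
    """Sketch: pull the entity mapping out of a structured JSON reply."""

    def __init__(self, entity_key: str) -> None:
        self.entity_key = entity_key

    def parse_result(self, result: str) -> dict[str, str]:
        try:
            payload = json.loads(result)
        except json.JSONDecodeError:
            return {}  # reply was not valid JSON
        entities = payload.get(self.entity_key, {})
        # Ignore anything that is not a mapping under the chosen key.
        return entities if isinstance(entities, dict) else {}
```

Unlike FreeFormResultParser, this assumes the reply is already well-formed JSON (e.g. enforced by the provider's structured-output mode), so the only job is selecting the right key.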

class kazu.steps.ner.llm_ner.VertexLLMModel[source]

Bases: LLMModel

A class to interact with the VertexAI API for LLMs.

__call__(text)[source]

Call the LLM model with the given text and return the raw response.

Parameters:

text (str) – The text to pass to the LLM model.

Returns:

The raw string response.

Return type:

str

__init__(project, prompt, model, generation_config, location, safety_settings=None)[source]

Initialize the VertexLLMModel.

Parameters:
  • project (str) – The project to use.

  • prompt (str) – The prompt to use.

  • model (str) – The model to use.

  • generation_config (dict[str, Any]) – The generation config to use.

  • location (str) – The location to use.

  • safety_settings (List[SafetySetting] | Dict[HarmCategory, HarmBlockThreshold] | None) – The safety settings to use. Optional.

Return type:

None

set_safety_settings(safety_settings=None)[source]

Parameters:

safety_settings (List[SafetySetting] | Dict[HarmCategory, HarmBlockThreshold] | None)

Return type:

None