kazu.steps.ner.llm_ner

Classes

AzureOpenAILLMModel

A class to interact with the Azure OpenAI API for LLMs.

LLMModel

LLMNERStep

A step to perform Named Entity Recognition using a Language Model.

VertexLLMModel

A class to interact with the VertexAI API for LLMs.

class kazu.steps.ner.llm_ner.AzureOpenAILLMModel[source]

Bases: LLMModel

A class to interact with the Azure OpenAI API for LLMs.

__init__(model, deployment, api_version, sys_prompt, temp)[source]

Initialize the AzureOpenAILLMModel.

Parameters:
  • model (str) – The model to use.

  • deployment (str) – The deployment to use.

  • api_version (str) – The API version to use.

  • sys_prompt (str) – The system prompt to use.

  • temp (float) – The temperature to use.

Return type:

None

call_llm(text)[source]

Call the LLM model with the given text and return the raw response.

Parameters:

text (str) – The text to pass to the LLM model.

Returns:

The raw response from the LLM model.

Return type:

str

class kazu.steps.ner.llm_ner.LLMModel[source]

Bases: ABC

__call__(text)[source]

Call the LLM model with the given text and return the raw response and parsed entities.

Parameters:

text (str) – The text to pass to the LLM model.

Returns:

A tuple of the raw response and the found entities as a dict.

Return type:

tuple[str, dict[str, str] | None]
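The relationship between `__call__`, `call_llm` and `parse_result` can be illustrated with a minimal, self-contained sketch. `SketchLLMModel` and `CannedModel` are hypothetical stand-ins that only mirror the interface documented here; they are not kazu's implementation, and the assumption that the response is a JSON object is ours.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod


class SketchLLMModel(ABC):
    """Hypothetical stand-in mirroring the documented LLMModel interface."""

    def __call__(self, text: str) -> tuple[str, dict[str, str] | None]:
        # Fetch the raw response, then return it alongside the parsed entities.
        raw = self.call_llm(text)
        return raw, self.parse_result(raw)

    @abstractmethod
    def call_llm(self, text: str) -> str:
        """Subclasses supply the actual API call."""

    @staticmethod
    def parse_result(result: str) -> dict[str, str] | None:
        # Assumption: the response is a JSON object mapping entity -> class.
        try:
            parsed = json.loads(result)
        except json.JSONDecodeError:
            return None
        return parsed if isinstance(parsed, dict) else None


class CannedModel(SketchLLMModel):
    """Returns a fixed response so the sketch runs without any API access."""

    def call_llm(self, text: str) -> str:
        return '{"aspirin": "drug"}'


raw, entities = CannedModel()("Patients received aspirin.")
```

Under this design a concrete backend only has to implement `call_llm`; the pairing of raw response and parsed entities is shared by all subclasses.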

abstract call_llm(text)[source]

Call the LLM model with the given text and return the raw response.

Parameters:

text (str) – The text to pass to the LLM model.

Returns:

The raw response from the LLM model.

Return type:

str

static parse_result(result)[source]

Parse the raw response from the LLM model into a dictionary of entities.

Parameters:

result (str) – The raw response from the LLM model.

Returns:

A dictionary of entities and their class.

Return type:

dict[str, str] | None
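As an illustration of the `dict[str, str] | None` contract, the sketch below parses a response defensively. It is not kazu's parser: `parse_entities` is a hypothetical helper, and tolerating prose around the JSON object and rejecting non-string classes are both assumptions made for the example.

```python
from __future__ import annotations

import json


def parse_entities(result: str) -> dict[str, str] | None:
    """Return an entity -> class mapping, or None if no usable JSON is found."""
    # Assumption: the model may wrap the JSON object in extra prose,
    # so only the outermost {...} span is handed to the JSON parser.
    start, end = result.find("{"), result.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(result[start : end + 1])
    except json.JSONDecodeError:
        return None
    # Reject anything that is not a flat mapping of strings to strings.
    if not isinstance(parsed, dict):
        return None
    if not all(isinstance(value, str) for value in parsed.values()):
        return None
    return parsed


ok = parse_entities('Here is the result: {"BRCA1": "gene"} Hope this helps.')
bad = parse_entities("Sorry, I cannot answer that.")
```

Returning `None` instead of raising lets the caller decide how to treat a section whose response could not be parsed.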

class kazu.steps.ner.llm_ner.LLMNERStep[source]

Bases: Step

A step to perform Named Entity Recognition using a Language Model.

The LLM produces a raw JSON response for each document section. Each response is parsed into entities and their classes, and ahocorasick is then used to find matches of those entities in the document text. If there are conflicts, the class of the first match in the document is used.
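The flow described above can be sketched in plain Python. This is an illustration, not the step's code: it swaps the Aho-Corasick automaton for a simple `str.find` scan to stay dependency-free, the helper names are hypothetical, and it assumes a "conflict" means the same entity string receiving different classes from different sections.

```python
from __future__ import annotations


def merge_first_wins(section_entities: list[dict[str, str]]) -> dict[str, str]:
    """Merge per-section results; the earliest section's class wins."""
    merged: dict[str, str] = {}
    for entities in section_entities:  # sections in document order
        for term, cls in entities.items():
            merged.setdefault(term, cls)  # keep the first class seen
    return merged


def find_matches(text: str, term_to_class: dict[str, str]) -> list[tuple[int, int, str, str]]:
    """Return (start, end, term, class) for every occurrence of every term."""
    matches = []
    for term, cls in term_to_class.items():
        if not term:
            continue  # an empty string would match everywhere
        start = text.find(term)
        while start != -1:
            matches.append((start, start + len(term), term, cls))
            start = text.find(term, start + 1)
    return sorted(matches)


# Two sections disagree on the class of "EGFR"; the first one is kept.
per_section = [{"EGFR": "gene"}, {"EGFR": "protein", "gefitinib": "drug"}]
merged = merge_first_wins(per_section)
text = "EGFR mutations predict response to gefitinib; EGFR is a kinase."
matches = find_matches(text, merged)
```

A real automaton finds all terms in a single pass over the text, which matters when the entity list is large; the scan above only demonstrates the matching and conflict behaviour.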

__call__(doc)[source]

Process documents and respond with processed and failed documents.

Note that many steps will be decorated by document_iterating_step() or document_batch_step() which will modify the ‘original’ __call__ function signature to match the expected signature for a step, as the decorators handle the exception/failed documents logic for you.

Returns:

The first element is all the provided docs (now modified by the processing), the second is the docs that failed to (fully) process correctly.

Return type:

tuple[list[Document], list[Document]]

__init__(model, drop_failed_sections=False)[source]

Initialize the LLMNERStep.

Parameters:
  • model (LLMModel) – The LLM model to use.

  • drop_failed_sections (bool) – Whether to drop sections that fail to parse. This is useful if you want to generate training data for fine-tuning a smaller model.

Return type:

None
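A rough sketch of what dropping failed sections could look like is shown below. The function `run_sections`, the stub `fake_model`, and the choice to keep unparsed sections (without entities) when the flag is off are all assumptions made for illustration; the documentation above does not specify that behaviour.

```python
from __future__ import annotations

from typing import Callable


def run_sections(
    sections: list[str],
    model: Callable[[str], tuple[str, dict[str, str] | None]],
    drop_failed_sections: bool = False,
) -> tuple[list[str], list[dict[str, str]]]:
    """Return the sections that are kept and the successfully parsed results."""
    kept: list[str] = []
    parsed_results: list[dict[str, str]] = []
    for text in sections:
        _raw, parsed = model(text)
        if parsed is None and drop_failed_sections:
            continue  # discard the section whose response failed to parse
        kept.append(text)
        if parsed is not None:
            parsed_results.append(parsed)
    return kept, parsed_results


def fake_model(text: str) -> tuple[str, dict[str, str] | None]:
    # Stub: pretend parsing fails for any section that mentions "garbled".
    if "garbled" in text:
        return "not json", None
    return '{"aspirin": "drug"}', {"aspirin": "drug"}


sections = ["Patients received aspirin.", "A garbled section."]
kept, parsed = run_sections(sections, fake_model, drop_failed_sections=True)
```

With the flag on, every section that survives has a parsed response, which is the property one would want when the output is used as training data.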

class kazu.steps.ner.llm_ner.VertexLLMModel[source]

Bases: LLMModel

A class to interact with the VertexAI API for LLMs.

__init__(project, prompt, model, generation_config, location, safety_settings=None)[source]

Initialize the VertexLLMModel.

Parameters:
  • project (str) – The project to use.

  • prompt (str) – The prompt to use.

  • model (str) – The model to use.

  • generation_config (dict[str, Any]) – The generation config to use.

  • location (str) – The location to use.

  • safety_settings (List[SafetySetting] | Dict[HarmCategory, HarmBlockThreshold] | None) – The safety settings to use. Optional.

Return type:

None

call_llm(text)[source]

Call the LLM model with the given text and return the raw response.

Parameters:

text (str) – The text to pass to the LLM model.

Returns:

The raw response from the LLM model.

Return type:

str

set_safety_settings(safety_settings=None)[source]

Parameters:

safety_settings (List[SafetySetting] | Dict[HarmCategory, HarmBlockThreshold] | None)

Return type:

None