kazu.steps.ner.llm_ner¶
Classes
- AzureOpenAILLMModel – A class to interact with the Azure OpenAI API for LLMs.
- FreeFormResultParser – Tries to identify a valid JSON object in the LLM response.
- LLMNERStep – A step to perform Named Entity Recognition using a Language Model.
- SectionProcessingStrategy – If a document is very long, it may exceed the LLM context length.
- StructuredOutputResultParser – If the LLM is configured for structured output, this parser can be used to select a key that contains the entities.
- VertexLLMModel – A class to interact with the VertexAI API for LLMs.
- class kazu.steps.ner.llm_ner.AzureOpenAILLMModel[source]¶
Bases:
LLMModel
A class to interact with the Azure OpenAI API for LLMs.
- class kazu.steps.ner.llm_ner.FreeFormResultParser[source]¶
Bases:
ResultParser
Tries to identify a valid JSON object in the LLM response.
- class kazu.steps.ner.llm_ner.LLMModel[source]¶
Bases:
Protocol
- class kazu.steps.ner.llm_ner.LLMNERStep[source]¶
Bases:
Step
A step to perform Named Entity Recognition using a Language Model.
The LLM is used to produce a raw JSON response per document section, which is then parsed into entities and their classes; ahocorasick is then used to find the matches in the document text. If there are conflicts, the class of the first match in the document is used.
- __call__(doc)[source]¶
Process documents and respond with processed and failed documents.
Note that many steps will be decorated by document_iterating_step() or document_batch_step(), which will modify the ‘original’ __call__ function signature to match the expected signature for a step, as the decorators handle the exception/failed documents logic for you.
- __init__(model, result_parser, section_processing_strategy=SectionProcessingStrategy.CONCATENATE_AND_PROCESS)[source]¶
Initialize the LLMNERStep.
- Parameters:
model (LLMModel) – The LLM model to use.
result_parser (ResultParser) – How the raw response should be parsed into entities.
section_processing_strategy (SectionProcessingStrategy) – How the document sections should be processed.
- Return type:
None
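A minimal usage sketch follows. It assumes a kazu version where Document is importable from kazu.data, that FreeFormResultParser takes no constructor arguments, and that the decorated step is called with a list of documents; llm_model stands in for any configured LLMModel (see the VertexLLMModel example further down).
from kazu.data import Document  # import path may vary between kazu versions
from kazu.steps.ner.llm_ner import (
    FreeFormResultParser,
    LLMNERStep,
    SectionProcessingStrategy,
)

# llm_model is any LLMModel implementation, e.g. a configured VertexLLMModel
# or AzureOpenAILLMModel (see the VertexLLMModel example below).
step = LLMNERStep(
    model=llm_model,
    result_parser=FreeFormResultParser(),
    section_processing_strategy=SectionProcessingStrategy.CONCATENATE_AND_PROCESS,
)

doc = Document.create_simple_document("EGFR mutations are common in NSCLC.")
# As with other kazu steps, the decorated __call__ takes a batch of documents
# and returns the successfully processed documents and any failed documents.
processed, failed = step([doc])
for entity in doc.get_entities():
    print(entity.match, entity.entity_class)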
- class kazu.steps.ner.llm_ner.SectionProcessingStrategy[source]¶
Bases:
AutoNameEnum
If a document is very long, it may exceed the LLM context length.
This enum provides the means to process document sections individually.
- CONCATENATE_AND_PROCESS = 'CONCATENATE_AND_PROCESS'¶
- PROCESS_INDIVIDUALLY_AND_DROP_FAILED_SECTIONS = 'PROCESS_INDIVIDUALLY_AND_DROP_FAILED_SECTIONS'¶
- PROCESS_INDIVIDUALLY_AND_KEEP_FAILED_SECTIONS = 'PROCESS_INDIVIDUALLY_AND_KEEP_FAILED_SECTIONS'¶
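For example, a per-section strategy can be selected when constructing the step so that each prompt stays within the context window. A hedged sketch, where llm_model and parser are placeholders configured as in the other examples on this page:
from kazu.steps.ner.llm_ner import LLMNERStep, SectionProcessingStrategy

# Process each section separately and drop any section the LLM fails on,
# rather than concatenating all sections into a single prompt.
step = LLMNERStep(
    model=llm_model,
    result_parser=parser,
    section_processing_strategy=SectionProcessingStrategy.PROCESS_INDIVIDUALLY_AND_DROP_FAILED_SECTIONS,
)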
- class kazu.steps.ner.llm_ner.StructuredOutputResultParser[source]¶
Bases:
ResultParser
If the LLM is configured for structured output, this parser can be used to select a key that contains the entities.
- class kazu.steps.ner.llm_ner.VertexLLMModel[source]¶
Bases:
LLMModel
A class to interact with the VertexAI API for LLMs.
- __init__(project, prompt, model, generation_config, location, safety_settings=None)[source]¶
Initialize the VertexLLMModel.
- Parameters:
project (str) – The Google Cloud project to use.
prompt (str) – The prompt to use.
model (str) – The model to use.
generation_config (dict[str, Any]) – The generation config to use.
location (str) – The Google Cloud region to use.
safety_settings (List[SafetySetting] | Dict[HarmCategory, HarmBlockThreshold] | None) – The safety settings to use. Optional.
- Return type:
None
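A hedged construction sketch: the project id, prompt text, model name, generation_config values and region below are illustrative placeholders, and the safety-settings enums come from the vertexai SDK.
from vertexai.generative_models import HarmBlockThreshold, HarmCategory

from kazu.steps.ner.llm_ner import VertexLLMModel

llm_model = VertexLLMModel(
    project="my-gcp-project",  # placeholder Google Cloud project id
    prompt=(
        "Extract gene, disease and drug entities from the text below and "
        "return a JSON object mapping each entity string to its class:\n"
    ),
    model="gemini-1.5-flash",  # placeholder model name
    generation_config={"temperature": 0.0, "max_output_tokens": 1024},
    location="europe-west2",  # placeholder region
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
)
The resulting llm_model can then be passed as the model argument of LLMNERStep, as in the earlier example.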