Visualising results in Label Studio
Kazu integrates with the popular Label Studio tool, so you can visualise Kazu NER and linking information using the custom View we provide (which supports non-contiguous and nested entities). This is also useful for annotating your own data, benchmarking Kazu against it, and testing custom components.
Our recommended workflow is as follows:
Pre-annotate your documents with Kazu:

    import hydra
    from hydra.utils import instantiate
    from omegaconf import DictConfig

    from kazu.utils.constants import HYDRA_VERSION_BASE
    from kazu.pipeline import Pipeline
    from kazu.data import Document


    @hydra.main(version_base=HYDRA_VERSION_BASE, config_path="conf", config_name="config")
    def run_docs(cfg: DictConfig) -> None:
        pipeline: Pipeline = instantiate(cfg.Pipeline)
        docs = [Document.create_simple_document(x) for x in ["doc 1 text", "doc 2 text etc"]]
        pipeline(docs)


    if __name__ == "__main__":
        run_docs()

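In practice the input texts usually come from files rather than string literals. A minimal sketch, assuming one plain-text file per document in a single directory; the `load_texts` helper and the directory layout are hypothetical and not part of Kazu:

```python
from pathlib import Path


def load_texts(directory: str, pattern: str = "*.txt") -> list[str]:
    """Read every file matching `pattern` in `directory`, in sorted filename order."""
    paths = sorted(Path(directory).glob(pattern))
    return [p.read_text(encoding="utf-8") for p in paths]


# Hypothetical usage inside run_docs, replacing the literal strings:
# docs = [Document.create_simple_document(text) for text in load_texts("my_corpus")]
```

Sorting the paths keeps the document order stable between runs, which makes it easier to match Label Studio tasks back to their source files.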
Load your annotations into Label Studio:

    from kazu.annotation.label_studio import (
        LabelStudioManager,
        LabelStudioAnnotationView,
    )
    from kazu.data import Document

    docs: list[Document]  # e.g. the documents annotated in the previous step

    # create the view
    view = LabelStudioAnnotationView(
        ner_labels={
            "cell_line": "red",
            "cell_type": "darkblue",
            "disease": "orange",
            "drug": "yellow",
            "gene": "green",
            "species": "purple",
            "anatomy": "pink",
            "molecular_function": "grey",
            "cellular_component": "blue",
            "biological_process": "brown",
        }
    )

    # if running locally...
    url_and_port = "http://localhost:8080"
    headers = {
        "Authorization": "Token <your token here>",
        "Content-Type": "application/json",
    }

    manager = LabelStudioManager(project_name="test", headers=headers, url=url_and_port)
    manager.create_linking_project()
    manager.update_tasks(docs)
    manager.update_view(view=view, docs=docs)

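To avoid pasting the API token into source files, the headers can be built from an environment variable. This is only a sketch: the `build_headers` helper and the `LABEL_STUDIO_TOKEN` variable name are assumptions for illustration, not something Kazu or Label Studio defines.

```python
import os


def build_headers(token: str) -> dict[str, str]:
    """Build request headers in the same shape as the example above."""
    if not token:
        raise ValueError("Label Studio API token is empty")
    return {
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    }


# LABEL_STUDIO_TOKEN is a hypothetical variable name; use whatever your setup defines.
token = os.environ.get("LABEL_STUDIO_TOKEN", "")
headers = build_headers(token) if token else None
```

Failing early on an empty token gives a clearer error than an authentication failure returned later by the server.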
View and correct the annotations in Label Studio. Once you’re finished, you can export back to Kazu Documents as follows:

    from kazu.annotation.label_studio import LabelStudioManager
    from kazu.data import Document

    url_and_port = "http://localhost:8080"
    headers = {
        "Authorization": "Token <your token here>",
        "Content-Type": "application/json",
    }

    manager = LabelStudioManager(project_name="test", headers=headers, url=url_and_port)

    docs: list[Document] = manager.export_from_ls()

Your ‘gold standard’ entities will now be accessible on the kazu.data.Section.metadata dictionary under the key ‘gold_entities’.
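Collecting the gold entities from exported documents could look roughly like the sketch below. It assumes only that each document exposes a `sections` list and each section a `metadata` dictionary; the `collect_gold_entities` helper is hypothetical, and the stand-in objects (with plain strings in place of real entity objects) are there so the snippet runs without Kazu installed.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_gold_entities(docs: Iterable[Any]) -> list[Any]:
    """Gather whatever is stored under 'gold_entities' on every section."""
    gold = []
    for doc in docs:
        for section in doc.sections:
            # Sections without gold annotations simply contribute nothing.
            gold.extend(section.metadata.get("gold_entities", []))
    return gold


# Stand-in objects mimicking only the attributes used above.
section = SimpleNamespace(metadata={"gold_entities": ["EGFR", "lung cancer"]})
empty_section = SimpleNamespace(metadata={})
demo_docs = [SimpleNamespace(sections=[section, empty_section])]
print(collect_gold_entities(demo_docs))
```

With real data, `demo_docs` would be replaced by the list returned from `manager.export_from_ls()`.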
For an example of how we integrate Label Studio into the Kazu acceptance tests, take a look at kazu.annotation.acceptance_test.analyse_full_pipeline().
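As a rough idea of what benchmarking against gold annotations involves, the sketch below scores predicted entities against gold ones by exact match on (start, end, class) tuples. It is an illustration only and does not describe how the Kazu acceptance tests are implemented; the tuple representation and the `score` helper are assumptions.

```python
Span = tuple[int, int, str]  # (start offset, end offset, entity class)


def score(predicted: set[Span], gold: set[Span]) -> tuple[float, float]:
    """Return (precision, recall) using exact match on (start, end, class)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


gold = {(0, 4, "gene"), (10, 21, "disease")}
predicted = {(0, 4, "gene"), (30, 35, "drug")}
print(score(predicted, gold))  # (0.5, 0.5)
```

Exact matching is the strictest option; a real evaluation may also want to credit partial overlaps or compare linked identifiers.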