Visualising results in Label Studio

Kazu integrates with the popular Label Studio tool, so you can visualise Kazu NER and linking information using the customised View we provide (including non-contiguous and nested entities). This is also useful for annotating your own data, benchmarking Kazu against it, and testing custom components.

Our recommended workflow is as follows:

  1. Pre-annotate your documents with Kazu:

    import hydra
    from hydra.utils import instantiate
    from omegaconf import DictConfig
    from kazu.utils.constants import HYDRA_VERSION_BASE
    from kazu.pipeline import Pipeline
    from kazu.data import Document
    
    
    @hydra.main(version_base=HYDRA_VERSION_BASE, config_path="conf", config_name="config")
    def run_docs(cfg: DictConfig) -> None:
        pipeline: Pipeline = instantiate(cfg.Pipeline)
        docs = [Document.create_simple_document(x) for x in ["doc 1 text", "doc 2 text etc"]]
        pipeline(docs)
    
    
    if __name__ == "__main__":
        run_docs()
    
  2. Load your annotations into Label Studio:

    from kazu.annotation.label_studio import (
        LabelStudioManager,
        LabelStudioAnnotationView,
    )
    from kazu.data import Document
    
    # placeholder: the documents pre-annotated with Kazu in step 1
    docs: list[Document]
    
    # create the view
    view = LabelStudioAnnotationView(
        ner_labels={
            "cell_line": "red",
            "cell_type": "darkblue",
            "disease": "orange",
            "drug": "yellow",
            "gene": "green",
            "species": "purple",
            "anatomy": "pink",
            "molecular_function": "grey",
            "cellular_component": "blue",
            "biological_process": "brown",
        }
    )
    
    # if running locally...
    url_and_port = "http://localhost:8080"
    headers = {
        "Authorization": "Token <your token here>",
        "Content-Type": "application/json",
    }
    
    manager = LabelStudioManager(project_name="test", headers=headers, url=url_and_port)
    manager.create_linking_project()
    manager.update_tasks(docs)
    manager.update_view(view=view, docs=docs)
    
  3. View and correct the annotations in Label Studio. Once you’re finished, you can export them back to Kazu Documents as follows:

    from kazu.annotation.label_studio import LabelStudioManager
    from kazu.data import Document
    
    url_and_port = "http://localhost:8080"
    headers = {
        "Authorization": "Token <your token here>",
        "Content-Type": "application/json",
    }
    
    manager = LabelStudioManager(project_name="test", headers=headers, url=url_and_port)
    
    docs: list[Document] = manager.export_from_ls()
    
  4. Your ‘gold standard’ entities will now be accessible in the kazu.data.Section.metadata dictionary under the key ‘gold_entities’.
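
As a minimal sketch of reading these entities back, the helper below walks each exported document and gathers whatever is stored under the ‘gold_entities’ key. It assumes only that a document exposes a sections list and that each section carries a metadata dictionary, as described above. The function name collect_gold_entities and the stand-in objects in the usage lines are hypothetical; they are there so the snippet runs without Kazu or a Label Studio server.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_gold_entities(docs: Iterable[Any]) -> list[Any]:
    """Gather the gold entities stored on every section of every document."""
    gold = []
    for doc in docs:
        for section in doc.sections:
            # Fall back to an empty list if a section has no gold entities key.
            gold.extend(section.metadata.get("gold_entities", []))
    return gold


# Stand-in objects (hypothetical, not real Kazu classes) that mimic the
# sections/metadata layout described in the text.
fake_section = SimpleNamespace(metadata={"gold_entities": ["ent_a", "ent_b"]})
empty_section = SimpleNamespace(metadata={})
fake_doc = SimpleNamespace(sections=[fake_section, empty_section])

gold_entities = collect_gold_entities([fake_doc])
```

With real data, you would pass in the list returned by manager.export_from_ls() instead of the stand-in document.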

For an example of how we integrate Label Studio into the Kazu acceptance tests, take a look at kazu.annotation.acceptance_test.analyse_full_pipeline().
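
To give a rough idea of what a benchmarking comparison can involve, here is a minimal sketch that scores predicted entities against gold ones by exact match on start offset, end offset and class. This is not the logic used by the Kazu acceptance tests. The Span tuple and the score_exact_match function are hypothetical, and you would need to map Kazu entities into this form yourself.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical, simplified view of an entity: offsets plus a class label."""

    start: int
    end: int
    entity_class: str


def score_exact_match(predicted: set[Span], gold: set[Span]) -> dict[str, float]:
    """Compute precision and recall, counting only exact span/class matches."""
    true_positives = len(predicted & gold)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}


# Illustrative data only: one predicted span matches a gold span, one does not.
gold = {Span(0, 5, "gene"), Span(10, 18, "disease")}
predicted = {Span(0, 5, "gene"), Span(20, 24, "drug")}
scores = score_exact_match(predicted, gold)
```

Exact matching is the strictest choice; partial-overlap or class-agnostic scoring would need a different comparison than the set intersection used here.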