kazu.ontology_preprocessing.downloads

Utilities for updating the public resources in the kazu model pack.

Classes

ChemblParquetOntologyDownloader

Downloads the ChEMBL database and exports a subset of it as a parquet file.

OBOOntologyDownloader

OntologyDownloader

OpenTargetsOntologyDownloader

OwlOntologyDownloader

Used when version info is contained within the ontology.

SimpleOntologyDownloader

class kazu.ontology_preprocessing.downloads.ChemblParquetOntologyDownloader[source]

Bases: OntologyDownloader

Downloads the ChEMBL database and exports a subset of it as a parquet file.

__init__(chembl_version)[source]

Parameters:

chembl_version (str)

download(local_path, skip_download=False)[source]

Download the ontology to the local path.

Parameters:
  • local_path (Path) – the path to download the ontology to

  • skip_download (bool) – whether to skip the download

Returns:

the path to the downloaded ontology

Return type:

Path

version(local_path=None)[source]

Get the version of the ontology.

Note that this method should be idempotent, i.e. it should not change the state of the ontology. It may be possible to determine the version without querying the ontology directly (e.g. from the file name, or because it is known a priori). If not, an implementation can do something more sophisticated, such as querying the ontology directly via SPARQL.

Alternatively, it may only be known prior to calling the download method. In this case, implementations should store the version as a field, so it can be returned from there.

Parameters:

local_path (Path | None) – the path to the ontology

Returns:

the version of the ontology

Return type:

str

CHEMBL_FILENAME_TEMPLATE = 'chembl_%s_subset.parquet'
CHEMBL_QUERY = "SELECT DISTINCT * FROM (\n                        SELECT chembl_id AS idx, pref_name AS default_label, synonyms AS syn, syn_type AS mapping_type\n                        FROM molecule_dictionary AS md\n                            JOIN molecule_synonyms ms ON md.molregno = ms.molregno\n                        UNION ALL\n                        SELECT chembl_id AS idx, pref_name AS default_label, pref_name AS syn, 'pref_name' AS mapping_type\n                        FROM molecule_dictionary) as t1\n                    WHERE t1.DEFAULT_LABEL is not null;\n                    "
CHEMBL_TEMPLATE = 'https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_%s/chembl_%s_sqlite.tar.gz'
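
As a rough illustration of how these three constants fit together, the sketch below fills in the two %s templates for a hypothetical release number and runs CHEMBL_QUERY against a tiny in-memory SQLite database. The toy tables contain only the columns the query touches and their rows are invented. How the real downloader unpacks the archive and writes the parquet file is not shown here and is not assumed.

```python
import sqlite3

# Constants copied from the class attributes documented above
# (whitespace inside the query is normalised for readability).
CHEMBL_TEMPLATE = (
    "https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/"
    "chembl_%s/chembl_%s_sqlite.tar.gz"
)
CHEMBL_FILENAME_TEMPLATE = "chembl_%s_subset.parquet"
CHEMBL_QUERY = """SELECT DISTINCT * FROM (
    SELECT chembl_id AS idx, pref_name AS default_label, synonyms AS syn, syn_type AS mapping_type
    FROM molecule_dictionary AS md
        JOIN molecule_synonyms ms ON md.molregno = ms.molregno
    UNION ALL
    SELECT chembl_id AS idx, pref_name AS default_label, pref_name AS syn, 'pref_name' AS mapping_type
    FROM molecule_dictionary) as t1
WHERE t1.DEFAULT_LABEL is not null;
"""

version = "33"  # hypothetical release number, for illustration only
archive_url = CHEMBL_TEMPLATE % (version, version)
parquet_name = CHEMBL_FILENAME_TEMPLATE % version

# Toy stand-in for the ChEMBL SQLite dump; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE molecule_dictionary (molregno INTEGER, chembl_id TEXT, pref_name TEXT);
    CREATE TABLE molecule_synonyms (molregno INTEGER, synonyms TEXT, syn_type TEXT);
    INSERT INTO molecule_dictionary VALUES (1, 'CHEMBL25', 'ASPIRIN'), (2, 'CHEMBL0', NULL);
    INSERT INTO molecule_synonyms VALUES (1, 'Acetylsalicylic acid', 'INN'), (2, 'unnamed', 'OTHER');
    """
)
# Each row is (idx, default_label, syn, mapping_type).
rows = set(conn.execute(CHEMBL_QUERY).fetchall())
conn.close()
```

In this toy run the molecule without a pref_name is dropped by the WHERE clause, and the named molecule yields one row per synonym plus one row in which the preferred name acts as its own synonym with mapping_type 'pref_name'.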

class kazu.ontology_preprocessing.downloads.OBOOntologyDownloader[source]

Bases: SimpleOntologyDownloader

version(local_path=None)[source]

Get the version of the ontology.

Note that this method should be idempotent, i.e. it should not change the state of the ontology. It may be possible to determine the version without querying the ontology directly (e.g. from the file name, or because it is known a priori). If not, an implementation can do something more sophisticated, such as querying the ontology directly via SPARQL.

Alternatively, it may only be known prior to calling the download method. In this case, implementations should store the version as a field, so it can be returned from there.

Parameters:

local_path (Path | None) – the path to the ontology

Returns:

the version of the ontology

Return type:

str

class kazu.ontology_preprocessing.downloads.OntologyDownloader[source]

Bases: ABC

delete_previous(local_path)[source]

Delete the previous version of the ontology.

Parameters:

local_path (Path) – the path to the ontology to delete

Return type:

None

abstract download(local_path, skip_download=False)[source]

Download the ontology to the local path.

Parameters:
  • local_path (Path) – the path to download the ontology to

  • skip_download (bool) – whether to skip the download

Returns:

the path to the downloaded ontology

Return type:

Path

abstract version(local_path=None)[source]

Get the version of the ontology.

Note that this method should be idempotent, i.e. it should not change the state of the ontology. It may be possible to determine the version without querying the ontology directly (e.g. from the file name, or because it is known a priori). If not, an implementation can do something more sophisticated, such as querying the ontology directly via SPARQL.

Alternatively, it may only be known prior to calling the download method. In this case, implementations should store the version as a field, so it can be returned from there.

Parameters:

local_path (Path | None) – the path to the ontology

Returns:

the version of the ontology

Return type:

str
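
To make the contract of the two abstract methods concrete, here is a minimal sketch of a subclass that knows its version up front and stores it as a field. OntologyDownloaderSketch is a stand-in that only mirrors the signatures documented above, so the example runs without kazu installed. FixedVersionDownloader and its constructor arguments are hypothetical, and the treatment of skip_download (return the path without writing anything) is an assumption.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional


class OntologyDownloaderSketch(ABC):
    """Stand-in mirroring the documented signatures; not the kazu class."""

    @abstractmethod
    def download(self, local_path: Path, skip_download: bool = False) -> Path:
        ...

    @abstractmethod
    def version(self, local_path: Optional[Path] = None) -> str:
        ...


class FixedVersionDownloader(OntologyDownloaderSketch):
    """Hypothetical subclass whose version is known a priori."""

    def __init__(self, content: str, known_version: str) -> None:
        self.content = content
        # Stored as a field so version() can return it without touching the file.
        self._version = known_version

    def download(self, local_path: Path, skip_download: bool = False) -> Path:
        if not skip_download:
            local_path.write_text(self.content)
        return local_path

    def version(self, local_path: Optional[Path] = None) -> str:
        # Idempotent: reads a field and changes nothing.
        return self._version


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "toy_ontology.txt"
    downloader = FixedVersionDownloader("toy ontology data", "2024-01")
    result_path = downloader.download(target)
    downloaded_text = result_path.read_text()
    first_version = downloader.version(result_path)
    second_version = downloader.version(result_path)
```

Calling version() twice returns the same value and leaves the file untouched, which is the idempotency the docstring asks for.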

class kazu.ontology_preprocessing.downloads.OpenTargetsOntologyDownloader[source]

Bases: OntologyDownloader

__init__(open_targets_version, open_targets_dataset_name)[source]

Parameters:
  • open_targets_version (str)

  • open_targets_dataset_name (str)

delete_previous(local_path)[source]

Deletes only the files known to have been downloaded. Using rmtree on the whole directory would also work, but removing the known files individually is safer.

Parameters:

local_path (Path) – the path to the ontology to delete

Return type:

None
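
The design choice described above, removing known files instead of the whole directory, can be sketched as follows. This is not the library's code: delete_known_files and the file names are hypothetical, and how the real class tracks what it downloaded is not documented here.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def delete_known_files(directory: Path, known_names: Iterable[str]) -> List[str]:
    """Remove only the named files from directory; return the names removed."""
    removed = []
    for name in known_names:
        candidate = directory / name
        if candidate.is_file():
            candidate.unlink()
            removed.append(name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "part-0.parquet").write_text("downloaded")
    (folder / "notes.txt").write_text("unrelated user file")
    # "part-1.parquet" was never created, so it is skipped silently.
    removed = delete_known_files(folder, ["part-0.parquet", "part-1.parquet"])
    remaining = sorted(p.name for p in folder.iterdir())
```

Unlike shutil.rmtree, this leaves unrelated files such as notes.txt in place, so a wrong local_path cannot wipe a directory.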

download(local_path, skip_download=False)[source]

Download the ontology to the local path.

Parameters:
  • local_path (Path) – the path to download the ontology to

  • skip_download (bool) – whether to skip the download

Returns:

the path to the downloaded ontology

Return type:

Path

version(local_path=None)[source]

Get the version of the ontology.

Note that this method should be idempotent, i.e. it should not change the state of the ontology. It may be possible to determine the version without querying the ontology directly (e.g. from the file name, or because it is known a priori). If not, an implementation can do something more sophisticated, such as querying the ontology directly via SPARQL.

Alternatively, it may only be known prior to calling the download method. In this case, implementations should store the version as a field, so it can be returned from there.

Parameters:

local_path (Path | None) – the path to the ontology

Returns:

the version of the ontology

Return type:

str

OT_PREFIX = 'ftp://ftp.ebi.ac.uk/pub/databases/opentargets/platform/'

class kazu.ontology_preprocessing.downloads.OwlOntologyDownloader[source]

Bases: SimpleOntologyDownloader

Used when version info is contained within the ontology.

version(local_path=None)[source]

Queries the ontology for owl:versionInfo. Failing that, queries for owl:versionIRI. If neither is found, falls back to the superclass implementation.

Parameters:

local_path (Path | None) – the path to the ontology

Return type:

str
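
The lookup order described above can be illustrated with a small standard-library sketch. It assumes the ontology is serialised as RDF/XML and scans it with ElementTree, which is a simplification; the actual method may well use an RDF library and a real query. The function name owl_version and the toy documents are hypothetical, and returning None stands in for the point where the superclass implementation would take over.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OWL_NS = "http://www.w3.org/2002/07/owl#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def owl_version(rdf_xml: str) -> Optional[str]:
    """Return owl:versionInfo if present, else owl:versionIRI, else None."""
    root = ET.fromstring(rdf_xml)
    # First choice: the literal carried by owl:versionInfo.
    for elem in root.iter(f"{{{OWL_NS}}}versionInfo"):
        if elem.text and elem.text.strip():
            return elem.text.strip()
    # Second choice: the IRI referenced by owl:versionIRI.
    for elem in root.iter(f"{{{OWL_NS}}}versionIRI"):
        iri = elem.get(f"{{{RDF_NS}}}resource")
        if iri:
            return iri
    # Nothing found: the caller would fall back to another strategy.
    return None


HEADER = (
    f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:owl="{OWL_NS}">'
    '<owl:Ontology rdf:about="http://example.org/toy">'
)
FOOTER = "</owl:Ontology></rdf:RDF>"

with_info = HEADER + "<owl:versionInfo>2024-01-15</owl:versionInfo>" + FOOTER
with_iri = HEADER + '<owl:versionIRI rdf:resource="http://example.org/toy/v2"/>' + FOOTER
with_neither = HEADER + FOOTER
```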

class kazu.ontology_preprocessing.downloads.SimpleOntologyDownloader[source]

Bases: OntologyDownloader

__init__(url)[source]

Parameters:

url (str)

download(local_path, skip_download=False)[source]

Download the ontology to the local path.

Parameters:
  • local_path (Path) – the path to download the ontology to

  • skip_download (bool) – whether to skip the download

Returns:

the path to the downloaded ontology

Return type:

Path
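
A minimal sketch of a URL-based download with a skip_download switch is shown below. It assumes that local_path names the target file and that skipping simply returns the path without fetching; neither detail is confirmed by the documentation above. The helper name fetch is hypothetical, and a file:// URL is used so the example runs offline.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def fetch(url: str, local_path: Path, skip_download: bool = False) -> Path:
    """Download url to local_path unless skip_download is set; return the path."""
    if not skip_download:
        local_path.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, local_path)
    return local_path


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.obo"
    source.write_text("format-version: 1.2\n")
    target = Path(tmp) / "cache" / "copy.obo"

    # With skip_download=True nothing is fetched, so the target is not created.
    skipped = fetch(source.as_uri(), target, skip_download=True)
    existed_after_skip = target.exists()

    # A normal call copies the resource to the target path.
    fetched = fetch(source.as_uri(), target)
    fetched_text = fetched.read_text()
```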

version(local_path=None)[source]

Get the version of the ontology.

Note that this method should be idempotent, i.e. it should not change the state of the ontology. It may be possible to determine the version without querying the ontology directly (e.g. from the file name, or because it is known a priori). If not, an implementation can do something more sophisticated, such as querying the ontology directly via SPARQL.

Alternatively, it may only be known prior to calling the download method. In this case, implementations should store the version as a field, so it can be returned from there.

Parameters:

local_path (Path | None) – the path to the ontology

Returns:

the version of the ontology

Return type:

str