kazu.utils.download_gilda_contexts
This script does the following:
1. Query Ensembl to get gene-to-protein ID maps.
2. Query the Wikidata SPARQL endpoint to get a list of Wikidata IDs mapped to Ensembl gene or Ensembl protein IDs.
3. Query the Wikidata API to get a list of Wikipedia page URLs for the Wikidata IDs from step 2.
4. Query the Wikipedia API to get the page content for each page from step 3.
5. Join the Wikipedia page content to Ensembl gene IDs based on the above relationships.
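The final join step can be illustrated with a minimal sketch. This is not the script's actual implementation: the function name `join_contexts`, the plain-dict inputs, and every ID, URL and text below are hypothetical stand-ins for the outputs of the four query steps, and no network calls are made.

```python
from collections import defaultdict
from typing import Optional

# Toy stand-ins for the outputs of the four query steps. Every ID, URL and
# text below is invented for illustration; none comes from a real query.
gene_to_proteins: dict[str, set[str]] = {
    "GENE_A": {"PROT_A1", "PROT_A2"},
    "GENE_B": {"PROT_B1"},
}
wikidata_to_ensembl: dict[str, str] = {
    "Q_ONE": "GENE_A",     # Wikidata item linked to a gene ID
    "Q_TWO": "PROT_A1",    # Wikidata item linked to a protein ID
    "Q_THREE": "PROT_ZZ",  # protein with no known gene: skipped below
}
wikidata_to_urls: dict[str, list[str]] = {
    "Q_ONE": ["https://example.org/wiki/Gene_A"],
    "Q_TWO": ["https://example.org/wiki/Protein_A1"],
    "Q_THREE": ["https://example.org/wiki/Protein_ZZ"],
}
url_to_text: dict[str, Optional[str]] = {
    "https://example.org/wiki/Gene_A": "Placeholder text about gene A.",
    "https://example.org/wiki/Protein_A1": None,  # e.g. page had no content
}


def join_contexts(gene_to_proteins, wikidata_to_ensembl, wikidata_to_urls, url_to_text):
    """Group page text by gene ID, resolving protein IDs to their gene."""
    # Invert the gene -> proteins map so protein IDs can be traced to a gene.
    protein_to_gene = {
        protein: gene
        for gene, proteins in gene_to_proteins.items()
        for protein in proteins
    }
    contexts = defaultdict(dict)
    for wikidata_id, ensembl_id in wikidata_to_ensembl.items():
        if ensembl_id in gene_to_proteins:
            gene_id = ensembl_id
        else:
            gene_id = protein_to_gene.get(ensembl_id)
        if gene_id is None:
            continue  # no gene known for this Ensembl ID
        for url in wikidata_to_urls.get(wikidata_id, []):
            contexts[gene_id][url] = url_to_text.get(url)
    return dict(contexts)


contexts = join_contexts(
    gene_to_proteins, wikidata_to_ensembl, wikidata_to_urls, url_to_text
)
```

In this sketch, pages reached through a protein ID are filed under the protein's parent gene, so each gene ends up with one dictionary of URL to page text. Genes with no linked Wikidata item simply do not appear in the result.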
Functions
Classes
WikipediaEnsemblMapping(ensembl_gene_id: str, ensembl_protein_ids: set[str] = <factory>, wiki_gene_ids: set[str] = <factory>, wiki_protein_ids: set[str] = <factory>, wiki_gene_urls_to_text: dict[str, typing.Optional[str]] = <factory>, wiki_protein_urls_to_text: dict[str, typing.Optional[str]] = <factory>)
- class kazu.utils.download_gilda_contexts.WikipediaEnsemblMapping
Bases: object
WikipediaEnsemblMapping(ensembl_gene_id: str, ensembl_protein_ids: set[str] = <factory>, wiki_gene_ids: set[str] = <factory>, wiki_protein_ids: set[str] = <factory>, wiki_gene_urls_to_text: dict[str, typing.Optional[str]] = <factory>, wiki_protein_urls_to_text: dict[str, typing.Optional[str]] = <factory>)
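The `<factory>` defaults in this signature are how Python renders dataclass fields declared with `default_factory`. A rough reconstruction of such a class might look like the sketch below. The class name `WikipediaEnsemblMappingSketch` is hypothetical, and the choice of `set` and `dict` as the factories is an assumption inferred from the field types, since the signature does not show them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WikipediaEnsemblMappingSketch:
    """Illustrative stand-in only; not the class shipped in kazu."""

    ensembl_gene_id: str
    # Assumption: each <factory> default is an empty set or dict.
    ensembl_protein_ids: set[str] = field(default_factory=set)
    wiki_gene_ids: set[str] = field(default_factory=set)
    wiki_protein_ids: set[str] = field(default_factory=set)
    wiki_gene_urls_to_text: dict[str, Optional[str]] = field(default_factory=dict)
    wiki_protein_urls_to_text: dict[str, Optional[str]] = field(default_factory=dict)


# Invented IDs, used only to show how the fields are filled in.
first = WikipediaEnsemblMappingSketch(ensembl_gene_id="GENE_A")
second = WikipediaEnsemblMappingSketch(ensembl_gene_id="GENE_B")
first.ensembl_protein_ids.add("PROT_A1")
first.wiki_gene_urls_to_text["https://example.org/wiki/Gene_A"] = None
```

Using a factory rather than a shared mutable default means each instance gets its own containers, so adding a protein ID to `first` leaves `second` untouched.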
- kazu.utils.download_gilda_contexts.create_wiki_mappings(gene_df, protein_df, ensembl_gene_to_protein_mappings, wikidata_id_to_wikipedia_urls, wikipage_to_text)