Datasources
This section documents the datasources used as input for the static data available in VarFish.
The download and precomputation are done by the Snakemake workflow in varfish-db-downloader.
This git repository uses continuous integration with a reduced dataset for automated testing. Some small data, such as a list of curated microdeletion/-duplication regions from the literature, is used directly from the repository.
The reduced dataset is downloaded automatically from URLs in a download_urls.yml file.
Thus, there is full transparency and traceability of the data sources used.
Further, a nightly CI job checks whether the URLs are still available (but not whether the data has changed).
Data in Repository
The following datasources are used directly from the repository.
Name | License | Synopsis | Source |
---|---|---|---|
ACMG SF List v3.1 | Public Domain | Secondary Findings gene list of the ACMG | |
DOMINO | Public Domain | Score for assessing the probability for a gene to harbour dominant changes | Institute of Molecular and Clinical Ophthalmology Basel; PMID:28985496 |
Enrichment Regions | | Target regions of NGS enrichment kits | |
Patho MMS | Public Domain | Curated regions for microdeletion and microduplication scores | |
sHet | N/A (Emailed Author) | Gene haploinsufficiency score | |
Downloaded Data
The following datasources are downloaded from public internet resources.
Name | License | Synopsis | Source |
---|---|---|---|
AlphaMissense | | AlphaMissense score | |
CADD Score | | sequence variant pathogenicity scores | |
ClinGen | | clinical gene and genome annotation | |
Comparative Toxicogenomics Database | | database of biological named entities | |
dbNSFP academic | suitable for academic use | nonsynonymous variant pathogenicity scores | |
dbNSFP commercial | suitable for commercial use | nonsynonymous variant pathogenicity scores | |
dbSNP | | sequence variants from dbSNP | |
dbVar | | structural variants from dbVar | |
Database of Genomic Variants (DGV) | no restrictions | structural variants from DGV | |
DECIPHER HI | N/A (Emailed Author) | DECIPHER haploinsufficiency score | |
ENSEMBL | | ENSEMBL gene/genome annotation and transcripts | |
ExAC CNVs | | copy number variants from ExAC | |
GenomicsEngland PanelApp | | gene panels with disease associations from Genomics England | |
gnomAD exomes and genomes | | sequence and structural variants, gene constraint scores | |
GTEx | | tissue-specific gene expression | |
HelixMtDb | N/A (Emailed Author) | mitochondrial genome frequencies | |
HGNC | | gene information | |
HPO | | Human Phenotype Ontology | |
Human Disease Ontology (DO) | | ontology of human diseases | |
MONDO | | Mondo Disease Ontology | |
NCBI ClinVar | | clinical variant interpretation | |
NCBI Gene | | gene information | |
NCBI mim2gene | | gene-disease associations | |
NCBI RefSeq | | gene/genome annotation and transcripts | |
OMIM titles | restricted | some OMIM disease names are contained in other databases such as HPO | misc. other datasources |
ORDO | | Orphanet Rare Disease Ontology | |
Orphadata | | Orphanet disease-gene associations | |
rCNV Score | | dosage sensitivity score | |
TAD annotation | N/A (Emailed Author) | Topologically Associating Domains annotation | |
1000G SV map | | structural variants from 1000 Genomes Phase 3 | |
UCSC assembly-related tracks | | assembly-related tracks, genomicSuperDups, rmsk, altSeqLiftOverPsl, fixSeqLiftOverPsl, multiz100way | |