Datasources
This section documents the datasources used as input for the static data available in VarFish.
The download and precomputation are done by the Snakemake workflow in varfish-db-downloader.
This Git repository uses continuous integration for automated testing with a reduced dataset, plus some small data taken directly from the repository, such as a list of curated microdeletion/-duplication regions from the literature.
The reduced dataset is downloaded automatically from URLs in a download_urls.yml file.
Thus, there is full transparency and traceability of the data sources used.
Further, a nightly CI job checks whether the URLs are still available (but not whether the data behind them has changed).
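The availability check described above can be illustrated with a minimal sketch. It is not the actual CI job from varfish-db-downloader: the function names `check_url` and `find_unavailable` are hypothetical, and the layout of download_urls.yml is not described here, so the sketch assumes the URLs have already been read into a plain Python list.

```python
import urllib.error
import urllib.request


def check_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request to `url` succeeds with a status below 400.

    `opener` is injectable so the check can be exercised without network access.
    Note: some servers reject HEAD requests; a real job might fall back to GET.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError, HTTPError and timeouts are all subclasses of OSError.
        return False


def find_unavailable(urls, opener=urllib.request.urlopen):
    """Return the URLs that could not be reached, preserving input order."""
    return [url for url in urls if not check_url(url, opener=opener)]


if __name__ == "__main__":
    # Hypothetical URL list; in practice it would come from download_urls.yml.
    urls = ["https://example.org/"]
    for url in find_unavailable(urls):
        print(f"unavailable: {url}")
```

Like the nightly job, such a check only confirms that a URL still responds. It says nothing about whether the content behind it has changed; detecting that would need something like a stored checksum.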
Data in Repository
The following datasources are used directly from the repository.
| Name | License | Synopsis | Source |
|---|---|---|---|
| ACMG SF List v3.1 | public domain | Secondary Findings Gene List of ACMG | |
| DOMINO | N/A | Score for assessing the probability for a gene to harbour dominant changes | Institute of Molecular and Clinical Ophthalmology Basel; PMID:28985496 |
| Enrichment Regions | N/A | Target regions of NGS enrichment kits | |
| Patho MMS | N/A | Curated regions for microdeletion and microduplication scores | |
| sHet | N/A | Gene haploinsufficiency score | |
Downloaded Data
The following datasources are downloaded from public internet resources.
| Name | License | Synopsis | Source |
|---|---|---|---|
| AlphaMissense | | AlphaMissense score | |
| CADD Score | | sequence variant pathogenicity scores | |
| ClinGen | | clinical gene and genome annotation | |
| Comparative Toxicogenomics Database | | database of biological named entities | |
| dbNSFP academic | suitable for academic use | nonsynonymous variant pathogenicity scores | |
| dbNSFP commercial | suitable for commercial use | nonsynonymous variant pathogenicity scores | |
| dbSNP | | Structural variants from dbSNP | |
| dbVar | | Structural variants from dbVar | |
| Database of Genomic Variants (DGV) | no restrictions | Structural variants from DGV | |
| DECIPHER HI | N/A | DECIPHER haploinsufficiency score | |
| ENSEMBL | | ENSEMBL gene/genome annotation and transcripts | |
| ExAC CNVs | | Copy number variants from ExAC | |
| GenomicsEngland PanelApp | | Gene panels with disease associations from Genomics England | |
| gnomAD exomes and genomes | | sequence and structural variants, gene constraint scores | |
| GTEx | | tissue-specific gene expression | |
| HelixMtDb | N/A | mitochondrial genome frequencies | |
| HGNC | | gene information | |
| HPO | | Human Phenotype Ontology | |
| Human Disease Ontology (DO) | | ontology of human diseases | |
| MONDO | | Mondo Disease Ontology | |
| NCBI ClinVar | | clinical variant interpretation | |
| NCBI Gene | | gene information | |
| NCBI mim2gene | | gene-disease associations | |
| NCBI RefSeq | | gene/genome annotation and transcripts | |
| OMIM titles | restricted | some OMIM disease names are contained in other databases such as HPO | misc. other datasources |
| ORDO | | Orphanet Rare Disease Ontology | |
| Orphadata | | Orphanet disease-gene associations | |
| rCNV Score | N/A | dosage sensitivity score | |
| TAD annotation | N/A | Topologically Associated Domains annotation | |
| 1000G SV map | | structural variants from thousand genomes phase 3 | |
| UCSC assembly-related tracks | | assembly-related tracks, genomicSuperDups, rmsk, altSeqLiftOverPsl, fixSeqLiftOverPsl, multiz100way | |