.. _doc_datasources:

===========
Datasources
===========

This section documents the datasources used as input for the static data available in VarFish.
The download and precomputation are done by the Snakemake workflow in ``varfish-db-downloader``.

This git repository uses continuous integration with a reduced dataset for automated testing.
Some small data is used directly from the repository, such as a list of curated microdeletion/-duplication regions from the literature.
The reduced dataset is downloaded automatically from URLs in a ``download_urls.yml`` file.
Thus, there is full transparency and traceability of the data sources used.
Further, a nightly CI job is run to check whether the URLs are still available (but not whether the data has changed).

.. _doc_datasources_repo:

------------------
Data in Repository
------------------

The following datasources are used directly from the repository.

.. list-table::
    :widths: 30 20 30 20
    :class: longtable
    :header-rows: 1

    * - Name
      - License
      - Synopsis
      - Source
    * - ACMG SF List v3.1
      - public domain
      - Secondary Findings Gene List of ACMG
      - PMID:35802134
    * - DOMINO
      - :term:`Public Domain`
      - Score for assessing the probability for a gene to harbour dominant changes
      - Institute of Molecular and Clinical Ophthalmology Basel; PMID:28985496
    * - Enrichment Regions
      - Public Domain
      - Target regions of NGS enrichment kits
      - UCSC Table Browser
    * - Patho MMS
      - Public Domain
      - Curated regions for microdeletion and microduplication scores
      - PMID:36435749
    * - sHet
      - :term:`N/A (Emailed Author)`
      - Gene haploinsufficiency score
      - PMID:31004148

.. _doc_datasources_downloaded_data:

---------------
Downloaded Data
---------------

The following datasources are downloaded from public internet resources.

.. list-table::
    :widths: 30 20 30 20
    :class: longtable
    :header-rows: 1

    * - Name
      - License
      - Synopsis
      - Source
    * - AlphaMissense
      - CC BY-NC-SA 4.0
      - AlphaMissense score
      - AlphaMissense
    * - CADD Score
      - free for non-commercial
      - sequence variant pathogenicity scores
      - CADD
    * - ClinGen
      - CC0
      - clinical gene and genome annotation
      - ClinGen
    * - Comparative Toxicogenomics Database
      - free for non-commercial
      - database of biological named entities
      - CTD
    * - dbNSFP academic
      - suitable for academic use
      - nonsynonymous variant pathogenicity scores
      - dbNSFP
    * - dbNSFP commercial
      - suitable for commercial use
      - nonsynonymous variant pathogenicity scores
      - dbNSFP
    * - dbSNP
      - no restrictions
      - Sequence variants from dbSNP
      - NCBI dbSNP
    * - dbVar
      - no restrictions
      - Structural variants from dbVar
      - NCBI dbVar
    * - Database of Genomic Variants (DGV)
      - no restrictions
      - Structural variants from DGV
      - The Centre for Applied Genomics
    * - DECIPHER HI
      - :term:`N/A (Emailed Author)`
      - DECIPHER haploinsufficiency score
      - PMID:20976243
    * - ENSEMBL
      - no restriction
      - ENSEMBL gene/genome annotation and transcripts
      - ENSEMBL
    * - ExAC CNVs
      - no restrictions
      - Copy number variants from ExAC
      - gnomAD
    * - GenomicsEngland PanelApp
      - non-commercial
      - Gene panels with disease associations from Genomics England
      - GenomicsEngland
    * - gnomAD exomes and genomes
      - no restrictions
      - sequence and structural variants, gene constraint scores
      - gnomAD
    * - GTEx
      - free
      - tissue-specific gene expression
      - GTEx
    * - HelixMtDb
      - :term:`N/A (Emailed Author)`
      - mitochondrial genome frequencies
      - HelixMtDb
    * - HGNC
      - CC0
      - gene information
      - HGNC
    * - HPO
      - free
      - Human Phenotype Ontology
      - HPO
    * - Human Disease Ontology (DO)
      - CC0
      - ontology of human diseases
      - Disease Ontology
    * - MONDO
      - CC BY 4.0
      - Mondo Disease Ontology
      - OBO Foundry
    * - NCBI ClinVar
      - no restrictions
      - clinical variant interpretation
      - NCBI ClinVar
    * - NCBI Gene
      - no restrictions
      - gene information
      - NCBI Gene
    * - NCBI mim2gene
      - no restrictions
      - gene-disease associations
      - NCBI MedGen
    * - NCBI RefSeq
      - no restrictions
      - gene/genome annotation and transcripts
      - NCBI RefSeq
    * - OMIM titles
      - restricted
      - some OMIM disease names are contained in other databases such as HPO
      - misc. other datasources
    * - ORDO
      - CC BY 4.0
      - Orphanet Rare Disease Ontology
      - BioOntology.org
    * - Orphadata
      - CC BY 4.0
      - Orphanet disease-gene associations
      - Orphadata
    * - rCNV Score
      - no restrictions
      - dosage sensitivity score
      - PMID:35917817
    * - TAD annotation
      - :term:`N/A (Emailed Author)`
      - Topologically Associated Domains annotation
      - YUE Lab
    * - 1000G SV map
      - Fort Lauderdale Agreement
      - structural variants from 1000 Genomes phase 3
      - IGSR
    * - UCSC assembly-related tracks
      - no restrictions
      - assembly-related tracks, genomicSuperDups, rmsk, altSeqLiftOverPsl, fixSeqLiftOverPsl, multiz100way
      - UCSC Table Browser
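
The nightly availability check mentioned at the top of this section can be pictured roughly as follows.
This is only a minimal sketch and not the actual CI job of ``varfish-db-downloader``: the function name ``check_urls`` and its parameters are hypothetical, the URLs are assumed to have already been read from ``download_urls.yml`` (whose layout is not described here), and only the Python standard library is used.
Like the real job, it only tests whether a URL still responds, not whether the data behind it has changed.

.. code-block:: python

    import urllib.error
    import urllib.request


    def check_urls(urls, timeout=10.0, opener=urllib.request.urlopen):
        """Map each URL to True if it could be opened without an error status.

        ``opener`` is injectable so the function can be exercised offline.
        """
        results = {}
        for url in urls:
            try:
                # A HEAD request avoids downloading the (possibly large) payload.
                request = urllib.request.Request(url, method="HEAD")
                with opener(request, timeout=timeout) as response:
                    status = getattr(response, "status", None)
                    # Non-HTTP responses carry no status; opening them is enough.
                    results[url] = status is None or status < 400
            except (OSError, ValueError):
                # URLError and HTTPError are OSError subclasses;
                # ValueError covers malformed URLs.
                results[url] = False
        return results

Calling ``check_urls(["https://example.org/data.tsv"])`` would return a dictionary with one boolean per URL; a CI job could then fail if any value is ``False``.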