Developer Installation

The VarFish installation for developers is set up differently from the installation for production use.

The reason is that the production installation runs completely in a Docker environment. All containers are assigned to a Docker network that the host has no access to by default, except for the reverse proxy that exposes the VarFish web interface.

The developer installation is intended not to carry the full VarFish database, so that it is lightweight and fits on a laptop. We advise installing the services directly on the host instead of running them in Docker containers.

Please find the instructions for the Windows installation at the end of the page.

Install Postgres

Follow the instructions for your operating system to install Postgres. Make sure that the version is 12 (11, 13 and 14 also work). Ubuntu 20 already includes postgresql 12. For older Ubuntu versions, the command would be:

sudo apt install postgresql-12

Adapt the Postgres configuration file. For Postgres 14, this would be:

sudo sed -i \
  -e 's/.*max_locks_per_transaction.*/max_locks_per_transaction = 1024 # min 10/' \
  /etc/postgresql/14/main/postgresql.conf
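
The substitution can be tried on a scratch file before touching the real configuration. The following is a minimal sketch, not part of VarFish: the helper name set_max_locks and the sample file contents are assumptions made for illustration. Keep in mind that Postgres only picks up a changed max_locks_per_transaction after a server restart.

```shell
# Hypothetical helper: apply the same substitution to any config file path.
set_max_locks() {
  sed -i \
    -e 's/.*max_locks_per_transaction.*/max_locks_per_transaction = 1024 # min 10/' \
    "$1"
}

# Try it on a scratch file that mimics a commented-out default line.
scratch_conf=$(mktemp)
printf '%s\n' \
  'shared_buffers = 128MB' \
  '#max_locks_per_transaction = 64 # min 10' > "$scratch_conf"
set_max_locks "$scratch_conf"

# Show the resulting setting; unrelated lines are left untouched.
grep max_locks_per_transaction "$scratch_conf"
```

Because the pattern matches the whole line, a commented-out default is replaced by an active setting.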

Install Redis

Redis is the broker that Celery uses to manage its queues. Follow the instructions for your operating system to install Redis. For Ubuntu, this would be:

sudo apt install redis-server

Install Python Pipenv

We use pipenv for managing dependencies. The advantage over pip is that the versions of “dependencies of dependencies” are also tracked, in a Pipfile.lock file. This allows for better reproducibility.

Also, note that VarFish is developed using Python 3.10+ only. To install Python 3.10+, you can use pyenv. If you already have Python 3.10 (check with python --version), you can skip this step.

git clone https://github.com/pyenv/pyenv.git ~/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv init -)"\nfi' >> ~/.bashrc
exec $SHELL
pyenv install 3.10
pyenv global 3.10
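
To confirm that the active interpreter meets the 3.10+ requirement, a check along the following lines may help. It is only a sketch: the helper name python_version_ok is an assumption, and it expects a plain dotted version string such as 3.10.4.

```shell
# Hypothetical helper: succeed if the given version string is 3.10 or newer.
python_version_ok() {
  major=${1%%.*}      # text before the first dot, e.g. "3"
  rest=${1#*.}        # text after the first dot, e.g. "10.4"
  minor=${rest%%.*}   # text before the next dot, e.g. "10"
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

# "python --version" prints e.g. "Python 3.10.4"; keep only the number.
if command -v python >/dev/null 2>&1; then
  found=$(python --version 2>&1 | awk '{print $2}')
  if python_version_ok "$found"; then
    echo "Python $found is new enough"
  else
    echo "Python $found is too old, install 3.10+ via pyenv"
  fi
fi
```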

Now, install the latest version of pip and pipenv:

pip install --upgrade pip pipenv

Clone git repository

Clone the VarFish Server repository and switch into the checkout.

git clone --recursive https://github.com/varfish-org/varfish-server
cd varfish-server

Install Frontend Dependencies

Execute the utils/install_frontend_os_dependencies.sh script to install OS package dependencies of Node/TypeScript packages. Essentially, this installs a current version of NodeJS. The script was written for Ubuntu; you will have to adjust it for other operating systems.

sudo bash utils/install_frontend_os_dependencies.sh

Now, you can install the Node/TypeScript dependencies as follows:

## go into frontend directory
cd frontend
## install Node/TypeScript dependencies
make deps

(Optional) Build Frontend

Execute the following command to build the frontend. This is not required as during development, the Vite server will create the necessary files on the fly.

## go into frontend directory
cd frontend
## setup pipenv environment
make serve

Serve Frontend

You can now start the Vite server to serve the Vite/TypeScript-based frontend. Note that the frontend is not usable on its own because it is embedded into the pages served by the backend.

## go into frontend directory
cd frontend
## start server
make serve

For the remainder of the installation steps, use a new terminal and keep the frontend server running.

Install Backend Dependencies

Execute the utils/install_backend_os_dependencies.sh script to install OS package dependencies of Python packages. The script was written for Ubuntu; you will have to adjust it for other operating systems.

sudo bash utils/install_backend_os_dependencies.sh

Now, you can install the Python dependencies as follows:

## go into backend directory
cd backend
## setup pipenv environment
make deps

Afterwards, you can either enter the Pipenv environment or directly run helper make commands.

## go into backend directory
cd backend
## start pipenv shell
pipenv shell
## OR
make lint
make format
make test

Setup Database

Use the tool provided in utils/ to set up the database. The name for the database should be varfish (create new user: yes, name: varfish, password: varfish).

bash utils/setup_database.sh

Prepare Backend

Next, create a backend/.env file with the following content.

export DATABASE_URL="postgres://varfish:varfish@127.0.0.1/varfish"
export CELERY_BROKER_URL=redis://localhost:6379/0
export PROJECTROLES_ADMIN_OWNER=root
export DJANGO_SETTINGS_MODULE=config.settings.local
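
A quick way to catch a forgotten line is to check the file for the four variables listed above. The sketch below is illustrative: the helper name missing_env_vars is an assumption, and it only recognizes lines written in the export NAME=value form shown here.

```shell
# Hypothetical helper: print each expected variable that the file does not set.
missing_env_vars() {
  for name in DATABASE_URL CELERY_BROKER_URL \
              PROJECTROLES_ADMIN_OWNER DJANGO_SETTINGS_MODULE; do
    grep -q "^export ${name}=" "$1" || echo "$name"
  done
}

# Demonstrate on a scratch file that lacks two of the four variables.
scratch_env=$(mktemp)
printf '%s\n' \
  'export DATABASE_URL="postgres://varfish:varfish@127.0.0.1/varfish"' \
  'export CELERY_BROKER_URL=redis://localhost:6379/0' > "$scratch_env"

# Prints the names of the two variables that are not set in the file.
missing_env_vars "$scratch_env"
```

From the repository root, the same helper could be called as missing_env_vars backend/.env; no output means all four variables are present.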

To create the tables in the VarFish database, run the migrate command. This step can take a few minutes.

## go into backend directory
cd backend
## run migrations
make migrate

Once done, create a superuser for your VarFish instance. By default, the VarFish root user is named root (the setting can be changed in the .env file with the PROJECTROLES_ADMIN_OWNER variable).

cd backend
pipenv run python manage.py createsuperuser

Last, download the icon sets for VarFish and make scripts, stylesheets and icons available.

make geticons
make collectstatic

Init DB for Development

To kickstart development, execute the following command. This will create a category “DevCategory”, a project “DevProject”, and a case with a quatro pedigree in the database. Note that no actual data files are being created. However, this is suitable for frontend development.

cd backend
pipenv run python manage.py initdev

Write down the passwords of the created users so you can log in with their accounts later. To reset the passwords of the root and devuser users when they already exist, use the python manage.py changepassword command or call python manage.py initdev --reset-password.

Database Import (Legacy)

Note

This section explains the data import for the old/legacy way of managing variant queries. Here, large amounts of annotation data were queried in a Postgres database. The “new way” uses annotation Docker services and the Rust-based worker. Please skip this section unless you need the legacy database tables.

First, download the pre-built database files that we provide and unpack them. Please make sure that you have enough space available. The packed file consumes 31 GB. When unpacked, it consumes an additional 188 GB.

cd /plenty/space
wget https://file-public.bihealth.org/transient/varfish/varfish-server-background-db-20201006.tar.gz{,.sha256}
sha256sum -c varfish-server-background-db-20201006.tar.gz.sha256
tar xzvf varfish-server-background-db-20201006.tar.gz
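
Since the packed and unpacked files together need about 219 GB (31 GB + 188 GB), it can be worth checking the free space before starting the download. The following is a rough sketch: the helper name has_free_gb is an assumption, /tmp merely stands in for the real target directory, and the arithmetic rounds down to whole binary gigabytes.

```shell
# Hypothetical helper: succeed if DIR ($1) has at least NEEDED_GB ($2) free.
has_free_gb() {
  # "df -Pk" prints one line per filesystem; column 4 is the free space in KiB.
  avail_kb=$(df -Pk "$1" | awk 'NR == 2 {print $4}')
  [ $(( avail_kb / 1024 / 1024 )) -ge "$2" ]
}

# 31 GB packed + 188 GB unpacked = 219 GB.
if has_free_gb /tmp 219; then
  echo "enough space for download and unpacking"
else
  echo "less than 219 GB free"
fi
```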

We recommend excluding the large databases: frequency tables, extra annotations and dbSNP. Also, keep in mind that importing the whole database takes more than 24 hours, depending on the speed of your disk.

This is a list of the possible imports, sorted by size:

Component            Size   Exclude             Function
gnomAD_genomes       80G    highly recommended  frequency annotation
extra-annos          50G    highly recommended  diverse
dbSNP                32G    highly recommended  SNP annotation
thousand_genomes     6.5G   highly recommended  frequency annotation
gnomAD_exomes        6.0G   highly recommended  frequency annotation
knowngeneaa          4.5G   highly recommended  alignment annotation
clinvar              3.3G   highly recommended  pathogenicity classification
ExAC                 1.9G   highly recommended  frequency annotation
dbVar                573M   recommended         SNP annotation
gnomAD_SV            250M   recommended         SV frequency annotation
ncbi_gene            151M                       gene annotation
ensembl_regulatory   77M                        frequency annotation
DGV                  43M                        SV annotation
hpo                  22M                        phenotype information
hgnc                 15M                        gene annotation
gnomAD_constraints   13M                        frequency annotation
mgi                  10M                        mouse gene annotation
ensembltorefseq      8.3M                       identifier mapping
hgmd_public          5.0M                       gene annotation
ExAC_constraints     4.6M                       frequency annotation
refseqtoensembl      2.0M                       identifier mapping
ensembltogenesymbol  1.6M                       identifier mapping
ensembl_genes        1.2M                       gene annotation
HelixMTdb            1.2M                       MT frequency annotation
refseqtogenesymbol   1.1M                       identifier mapping
refseq_genes         804K                       gene annotation
mim2gene             764K                       phenotype information
MITOMAP              660K                       MT frequency annotation
kegg                 632K                       pathway annotation
mtDB                 336K                       MT frequency annotation
tads_hesc            108K                       domain annotation
tads_imr90           108K                       domain annotation
vista                104K                       orthologous region annotation
acmg                 16K                        disease gene annotation

You can find the import_versions.tsv file in the root folder of the package. This file determines which components (called table_group, each represented as a folder in the package) get imported when the import command is issued. To exclude a table, simply comment out (#) or delete its line. Excluding tables that are not required for development can reduce time and space consumption. Also, the GRCh38 tables can be excluded.

A space-saving version of the file would look like this:

build       table_group     version
GRCh37      acmg    v2.0
#GRCh37     clinvar 20200929
#GRCh37     dbSNP   b151
#GRCh37     dbVar   latest
GRCh37      DGV     2016
GRCh37      ensembl_genes   r96
GRCh37      ensembl_regulatory      latest
GRCh37      ensembltogenesymbol     latest
GRCh37      ensembltorefseq latest
GRCh37      ExAC_constraints        r0.3.1
#GRCh37     ExAC    r1
#GRCh37     extra-annos     20200704
GRCh37      gnomAD_constraints      v2.1.1
#GRCh37     gnomAD_exomes   r2.1
#GRCh37     gnomAD_genomes  r2.1
#GRCh37     gnomAD_SV       v2
GRCh37      HelixMTdb       20190926
GRCh37      hgmd_public     ensembl_r75
GRCh37      hgnc    latest
GRCh37      hpo     latest
GRCh37      kegg    april2011
#GRCh37     knowngeneaa     latest
GRCh37      mgi     latest
GRCh37      mim2gene        latest
GRCh37      MITOMAP 20200116
GRCh37      mtDB    latest
GRCh37      ncbi_gene       latest
GRCh37      refseq_genes    r105
GRCh37      refseqtoensembl latest
GRCh37      refseqtogenesymbol      latest
GRCh37      tads_hesc       dixon2012
GRCh37      tads_imr90      dixon2012
#GRCh37     thousand_genomes        phase3
GRCh37      vista   latest
#GRCh38     clinvar 20200929
#GRCh38     dbVar   latest
#GRCh38     DGV     2016
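
Commenting out a component by hand works fine; for repeated edits, a small helper can do the same. The sketch below is illustrative only: the helper name exclude_table_group is hypothetical, and it assumes that the columns in import_versions.tsv are separated by tabs, as the file extension suggests.

```shell
# Hypothetical helper: comment out every active line of one table_group.
# $1 is the TSV file, $2 is the table_group name (second column).
exclude_table_group() {
  awk -v group="$2" 'BEGIN { FS = OFS = "\t" }
    $2 == group && $1 !~ /^#/ { $1 = "#" $1 }
    { print }' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demonstrate on a scratch file with the same three columns.
scratch_tsv=$(mktemp)
printf 'build\ttable_group\tversion\n' > "$scratch_tsv"
printf 'GRCh37\tclinvar\t20200929\n' >> "$scratch_tsv"
printf 'GRCh37\thgnc\tlatest\n' >> "$scratch_tsv"
exclude_table_group "$scratch_tsv" clinvar

# The clinvar line now starts with "#"; the other lines are unchanged.
cat "$scratch_tsv"
```

Matching on the second column rather than on a substring keeps, for example, ExAC and ExAC_constraints apart. Lines that are already commented out are left as they are.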

To perform the import, issue:

cd backend
pipenv run python manage.py import_tables \
  --tables-path /plenty/space/varfish-server-background-db-20201006

Performing the import twice will automatically skip tables that are already imported. To re-import tables, add the --force parameter to the command:

cd backend
pipenv run python manage.py import_tables \
  --tables-path varfish-db-downloader --force

Run Server and Celery

Now, open two terminals and start the VarFish server and the celery server.

## in terminal 1
make serve
## in a separate terminal 2
make celery

Continue the tutorial in a new terminal.

Install Annotation Services

VarFish uses a number of internal annotation services that you need to install as well. The instructions below will provide you with a development subset that contains information on all genes but variant information on genes BRCA1 and TGDS only.

First, install Docker and docker compose following the official manual.

Then, install the s5cmd tool for downloading data later on.

wget -O /tmp/s5cmd_2.1.0_Linux-64bit.tar.gz \
  https://github.com/peak/s5cmd/releases/download/v2.1.0/s5cmd_2.1.0_Linux-64bit.tar.gz
tar -C /tmp -xf /tmp/s5cmd_2.1.0_Linux-64bit.tar.gz
sudo cp /tmp/s5cmd /usr/local/bin/

Next, follow the instructions on the varfish-docker-compose-ng README.

## clone
git clone https://github.com/varfish-org/varfish-docker-compose-ng.git

## go into directory
cd varfish-docker-compose-ng

## create volumes directories
mkdir -p .dev/volumes/{minio,varfish-static}/data
## create secrets
mkdir -p .dev/secrets
echo password >.dev/secrets/db-password
echo postgresql://varfish:password@postgres/varfish >.dev/secrets/db-url
echo minio-root-password >.dev/secrets/minio-root-password
echo minio-varfish-password >.dev/secrets/minio-varfish-password
## ensure that pwgen is installed first
pwgen
## generate a 100 character secret
pwgen 100 1 >.dev/secrets/varfish-server-django-secret-key
## copy environment file
cp env.tpl .env
## copy docker-compose override file
cp docker-compose.override.yml-dev docker-compose.override.yml

## setup some configuration
mkdir -p .dev/config/nginx
cp utils/nginx/nginx.conf .dev/config/nginx

## download dev data
bash download-data.sh
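
Before starting the containers, it may help to confirm that every secret file written above exists and is not empty. This is a minimal sketch with a hypothetical helper name, missing_secrets; the file names are taken from the commands above, and the directory to check is passed as an argument.

```shell
# Hypothetical helper: list expected secret files that are missing or empty.
missing_secrets() {
  for name in db-password db-url minio-root-password minio-varfish-password \
              varfish-server-django-secret-key; do
    [ -s "$1/$name" ] || echo "$name"
  done
}

# Demonstrate on a scratch directory holding only one of the five files.
scratch_secrets=$(mktemp -d)
echo password > "$scratch_secrets/db-password"

# Prints the four file names that are still missing.
missing_secrets "$scratch_secrets"
```

Run it against the secrets directory of the checkout; no output means all files are in place.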

Now you can bring up the backing services using:

docker compose up

Try It Out

You now have the system services Postgres and Redis running. You also have the frontend Vite development server, the backend Django server, and the Celery worker running. You can now try out VarFish by going to localhost:8080 and logging in with the superuser account you created above.

Installation (Windows)

The setup was done on a recent version of Windows 10 with Windows Subsystem for Linux Version 2 (WSL2).

Installation WSL2

Follow [this tutorial](https://www.omgubuntu.co.uk/how-to-install-wsl2-on-windows-10) to install WSL2.

  • Note that the whole process is a bit convoluted; you start out with wsl.exe --install

  • Then you can install the latest LTS release, Ubuntu 22.04, from the Microsoft Store

  • Once complete, you will probably end up with WSL 1 (one!), which you can convert to version 2 (two!) with wsl --set-version Ubuntu-22.04 2 or similar.

  • WSL2 has some advantages including running a full Linux kernel but is even slower in I/O to the NTFS Windows mount.

  • Everything that you do will be inside the WSL image.
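
From the Windows side, wsl.exe -l -v lists each distribution together with its WSL version. From inside the Linux shell, the kernel version string gives a hint as well. The sketch below is a heuristic only: the helper name wsl_flavor is hypothetical, and it assumes the usual naming in which WSL2 kernels carry "WSL2" in their version string and WSL1 reports "Microsoft"; custom kernels may not follow this.

```shell
# Hypothetical helper: classify a kernel version string as wsl2, wsl1, or other.
wsl_flavor() {
  case "$1" in
    *WSL2*)                    echo wsl2 ;;
    *Microsoft* | *microsoft*) echo wsl1 ;;
    *)                         echo other ;;
  esac
}

# /proc/version holds the running kernel's version string on Linux.
if [ -r /proc/version ]; then
  wsl_flavor "$(cat /proc/version)"
fi
```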

Installation Docker Desktop

Follow the Install Docker Desktop instructions. Then, ensure that the Docker Engine is running.

Install OS Dependencies

## install dependencies
sudo apt install libsasl2-dev python3-dev libldap2-dev libssl-dev gcc make rsync
## install postgres and redis
sudo apt install postgresql postgresql-server-dev-14 postgresql-client redis
## start postgres, must be done after each WSL2 start
sudo service postgresql start
sudo service postgresql status
## start redis, must be done after each WSL2 start
sudo service redis-server start
sudo service redis-server status
## update postgres configuration and restart, only do this once
sudo sed -i -e 's/.*max_locks_per_transaction.*/max_locks_per_transaction = 1024 # min 10/' /etc/postgresql/14/main/postgresql.conf
sudo service postgresql restart

Create a postgres user varfish with password varfish and a database.

sudo -u postgres createuser -s -r -d varfish -P
[enter varfish as password]
sudo -u postgres createdb --owner=varfish varfish

From here on, you can follow the instructions for the Linux installation, starting at the section Install Python Pipenv.