Developer Installation¶

The VarFish installation for developers should be set up differently from the installation for production use.

The reason is that the production installation runs completely in a Docker environment. All containers are assigned to a Docker network that the host cannot access by default, except for the reverse proxy that exposes the VarFish web interface.

The developer installation is intended not to carry the full VarFish database, so that it stays lightweight and fits on a laptop. We advise installing the services directly on the host rather than running them in Docker containers.

Please find the instructions for the Windows installation at the end of the page.

Install Postgres¶

Follow the instructions for your operating system to install Postgres. Make sure that the version is 12 (11, 13 and 14 also work). Ubuntu 20 already includes postgresql 12. On older Ubuntu versions, install version 12 explicitly:

sudo apt install postgresql-12

Adapt the Postgres configuration file. For Postgres 14, this would be:

sudo sed -i \
  -e 's/.*max_locks_per_transaction.*/max_locks_per_transaction = 1024 # min 10/' \
  /etc/postgresql/14/main/postgresql.conf
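
The substitution above rewrites every line that mentions max_locks_per_transaction, including the commented-out default. The following is a minimal sketch for confirming the edit, assuming GNU sed and grep. The helper names set_max_locks and show_max_locks are hypothetical and not part of VarFish. Postgres reads this setting only at server start, so a restart is needed afterwards.

```shell
# Hypothetical helpers (not part of VarFish) wrapping the edit shown above.

# set_max_locks FILE: apply the same substitution as above to FILE.
set_max_locks() {
  sed -i \
    -e 's/.*max_locks_per_transaction.*/max_locks_per_transaction = 1024 # min 10/' \
    "$1"
}

# show_max_locks FILE: print the active (uncommented) setting.
# Fails with a non-zero exit status if no such line exists.
show_max_locks() {
  grep -E '^max_locks_per_transaction[[:space:]]*=' "$1"
}

# Example, assuming the Postgres 14 path used above:
#   show_max_locks /etc/postgresql/14/main/postgresql.conf
#   sudo service postgresql restart
```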

Install Redis¶

Redis is the broker that celery uses to manage the queues. Follow the instructions for your operating system to install Redis. For Ubuntu, this would be:

sudo apt install redis-server

Install Python Pipenv¶

We use pipenv for managing dependencies. The advantage over pip is that the versions of “dependencies of dependencies” are also tracked in a Pipfile.lock file. This allows for better reproducibility.

Also, note that VarFish is developed using Python 3.10+ only. To install Python 3.10+, you can use pyenv. If you already have Python 3.10 or newer (check with python --version), you can skip this step.

git clone https://github.com/pyenv/pyenv.git ~/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv init -)"\nfi' >> ~/.bashrc
exec $SHELL
pyenv install 3.10
pyenv global 3.10
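
Whether the pyenv step is needed, or whether it took effect, can be checked by comparing the version reported by python --version against 3.10. The following is a small sketch, assuming GNU sort with the -V option. The helper name version_ge is hypothetical.

```shell
# version_ge A B: succeed if version string A is greater than or equal to B.
# Relies on GNU sort's -V (version sort); the helper name is illustrative only.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: check the interpreter currently on the PATH, if there is one.
if command -v python >/dev/null 2>&1; then
  found="$(python --version 2>&1 | awk '{print $2}')"
  if version_ge "$found" "3.10"; then
    echo "Python $found is recent enough"
  else
    echo "Python $found is too old; install 3.10+ with pyenv"
  fi
fi
```

A plain string comparison would rank 3.9 above 3.10, which is why the sketch uses a version-aware sort.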

Now, install the latest version of pip and pipenv:

pip install --upgrade pip pipenv

Clone git repository¶

Clone the VarFish Server repository and switch into the checkout.

git clone --recursive https://github.com/varfish-org/varfish-server
cd varfish-server

Install Frontend Dependencies¶

Execute the utils/install_frontend_os_dependencies.sh script to install OS package dependencies of Node/TypeScript packages. Essentially, this installs a current version of NodeJS. The script was written for Ubuntu; you will have to adjust it for other operating systems.

sudo bash utils/install_frontend_os_dependencies.sh

Now, you can install the Node/TypeScript dependencies as follows:

## go into frontend directory
cd frontend
## install Node/TypeScript dependencies
make deps

(Optional) Build Frontend¶

Execute the following command to build the frontend. This is not required as during development, the Vite server will create the necessary files on the fly.

## go into frontend directory
cd frontend
## build frontend
make build

Serve Frontend¶

You can now start the Vite server to serve the Vite/TypeScript-based frontend. Note that the frontend is not accessible on its own because it is embedded into pages served by the backend.

## go into frontend directory
cd frontend
## start server
make serve

For the remainder of the installation steps, use a new terminal and keep the frontend server running.

Install Backend Dependencies¶

Execute the utils/install_backend_os_dependencies.sh script to install OS package dependencies of Python packages. The script was written for Ubuntu; you will have to adjust it for other operating systems.

sudo bash utils/install_backend_os_dependencies.sh

Now, you can install the Python dependencies as follows:

## go into backend directory
cd backend
## setup pipenv environment
make deps

Afterwards, you can either enter the Pipenv environment or directly run helper make commands.

## go into backend directory
cd backend
## start pipenv shell
pipenv shell
## OR
make lint
make format
make test

Setup Database¶

Use the tool provided in utils/ to set up the database. The name for the database should be varfish (create new user: yes, name: varfish, password: varfish).

bash utils/setup_database.sh

Prepare Backend¶

Next, create a backend/.env file with the following content.

export DATABASE_URL="postgres://varfish:varfish@127.0.0.1/varfish"
export CELERY_BROKER_URL=redis://localhost:6379/0
export PROJECTROLES_ADMIN_OWNER=root
export DJANGO_SETTINGS_MODULE=config.settings.local
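
A missing or misspelled variable in this file tends to surface only later, for example during migration. The following sketch checks that all four variables are defined. It assumes a POSIX shell, and the helper name check_env is hypothetical. Pass a path that contains a slash, such as backend/.env, so that the shell does not search the PATH for the file.

```shell
# check_env FILE: source FILE in a subshell and report any required variable
# that is unset or empty. The helper name is hypothetical; the variable names
# are the four from the .env file above.
check_env() {
  (
    # Start from a clean slate so values inherited from the caller do not
    # mask a variable that is missing in FILE.
    unset DATABASE_URL CELERY_BROKER_URL PROJECTROLES_ADMIN_OWNER DJANGO_SETTINGS_MODULE
    . "$1" || exit 1
    status=0
    for name in DATABASE_URL CELERY_BROKER_URL PROJECTROLES_ADMIN_OWNER DJANGO_SETTINGS_MODULE; do
      eval "value=\${$name:-}"
      if [ -z "$value" ]; then
        echo "missing: $name" >&2
        status=1
      fi
    done
    exit "$status"
  )
}

# Example:
#   check_env backend/.env && echo "backend/.env looks complete"
```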

To create the tables in the VarFish database, run the migrate command. This step can take a few minutes.

## go into backend directory
cd backend
## run migrations
make migrate

Once done, create a superuser for your VarFish instance. By default, the VarFish root user is named root (the setting can be changed in the .env file with the PROJECTROLES_ADMIN_OWNER variable).

cd backend
pipenv run python manage.py createsuperuser

Last, download the icon sets for VarFish and make scripts, stylesheets and icons available.

make geticons
make collectstatic

Init DB for Development¶

To kickstart development, execute the following command. This will create a category “DevCategory”, a project “DevProject”, and a case with a quartet pedigree in the database. Note that no actual data files are created. However, this is sufficient for frontend development.

cd backend
pipenv run python manage.py initdev

Write down the passwords of the created users (such as devuser) so you can later log in with their accounts. To reset the passwords of root and devuser when the users already exist, use the python manage.py changepassword command or call python manage.py initdev --reset-password.

Database Import (Legacy)¶

Note

This section explains the data import for the old/legacy way of managing variant queries. Here, large amounts of annotation data were queried in a Postgres database. The “new way” uses annotation Docker services and the Rust-based worker. Please skip this section unless you need the legacy database tables.

First, download the pre-built database files that we provide and unpack them. Please make sure that you have enough space available. The packed file consumes 31 GB. When unpacked, it consumes an additional 188 GB.

cd /plenty/space
wget https://file-public.bihealth.org/transient/varfish/varfish-server-background-db-20201006.tar.gz{,.sha256}
sha256sum -c varfish-server-background-db-20201006.tar.gz.sha256
tar xzvf varfish-server-background-db-20201006.tar.gz
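
Since the archive and its unpacked contents together need roughly 219 GB, it can help to check the free space on the target filesystem before starting the download. The following is a minimal sketch, assuming a POSIX df that supports -P and -k. The helper name has_free_gb is hypothetical, and sizes are counted in 1024-based units.

```shell
# has_free_gb DIR GB: succeed if the filesystem holding DIR has at least GB
# gigabytes (1024-based) available. Hypothetical helper built on `df -Pk`.
has_free_gb() {
  avail_kb="$(df -Pk "$1" | awk 'NR == 2 { print $4 }')"
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# 31 GB for the archive plus 188 GB unpacked, as stated above.
required_gb=$(( 31 + 188 ))
target="."
if has_free_gb "$target" "$required_gb"; then
  echo "enough space in $target"
else
  echo "less than ${required_gb} GB free in $target" >&2
fi
```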

We recommend excluding the large databases: frequency tables, extra annotations and dbSNP. Also, keep in mind that importing the whole database takes more than 24 hours, depending on the speed of your disk.

This is a list of the possible imports, sorted by size:

Component	Size	Exclude	Function
gnomAD_genomes	80G	highly recommended	frequency annotation
extra-annos	50G	highly recommended	diverse
dbSNP	32G	highly recommended	SNP annotation
thousand_genomes	6.5G	highly recommended	frequency annotation
gnomAD_exomes	6.0G	highly recommended	frequency annotation
knowngeneaa	4.5G	highly recommended	alignment annotation
clinvar	3.3G	highly recommended	pathogenicity classification
ExAC	1.9G	highly recommended	frequency annotation
dbVar	573M	recommended	SNP annotation
gnomAD_SV	250M	recommended	SV frequency annotation
ncbi_gene	151M		gene annotation
ensembl_regulatory	77M		frequency annotation
DGV	43M		SV annotation
hpo	22M		phenotype information
hgnc	15M		gene annotation
gnomAD_constraints	13M		frequency annotation
mgi	10M		mouse gene annotation
ensembltorefseq	8.3M		identifier mapping
hgmd_public	5.0M		gene annotation
ExAC_constraints	4.6M		frequency annotation
refseqtoensembl	2.0M		identifier mapping
ensembltogenesymbol	1.6M		identifier mapping
ensembl_genes	1.2M		gene annotation
HelixMTdb	1.2M		MT frequency annotation
refseqtogenesymbol	1.1M		identifier mapping
refseq_genes	804K		gene annotation
mim2gene	764K		phenotype information
MITOMAP	660K		MT frequency annotation
kegg	632K		pathway annotation
mtDB	336K		MT frequency annotation
tads_hesc	108K		domain annotation
tads_imr90	108K		domain annotation
vista	104K		orthologous region annotation
acmg	16K		disease gene annotation

You can find the import_versions.tsv file in the root folder of the package. This file determines which components (called table_group and represented as a folder in the package) get imported when the import command is issued. To exclude a table group, comment out (#) or delete its line. Excluding tables that are not required for development can reduce time and space consumption. The GRCh38 tables can also be excluded.

A space-saving version of the file would look like this:

build       table_group     version
GRCh37      acmg    v2.0
#GRCh37     clinvar 20200929
#GRCh37     dbSNP   b151
#GRCh37     dbVar   latest
GRCh37      DGV     2016
GRCh37      ensembl_genes   r96
GRCh37      ensembl_regulatory      latest
GRCh37      ensembltogenesymbol     latest
GRCh37      ensembltorefseq latest
GRCh37      ExAC_constraints        r0.3.1
#GRCh37     ExAC    r1
#GRCh37     extra-annos     20200704
GRCh37      gnomAD_constraints      v2.1.1
#GRCh37     gnomAD_exomes   r2.1
#GRCh37     gnomAD_genomes  r2.1
#GRCh37     gnomAD_SV       v2
GRCh37      HelixMTdb       20190926
GRCh37      hgmd_public     ensembl_r75
GRCh37      hgnc    latest
GRCh37      hpo     latest
GRCh37      kegg    april2011
#GRCh37     knowngeneaa     latest
GRCh37      mgi     latest
GRCh37      mim2gene        latest
GRCh37      MITOMAP 20200116
GRCh37      mtDB    latest
GRCh37      ncbi_gene       latest
GRCh37      refseq_genes    r105
GRCh37      refseqtoensembl latest
GRCh37      refseqtogenesymbol      latest
GRCh37      tads_hesc       dixon2012
GRCh37      tads_imr90      dixon2012
#GRCh37     thousand_genomes        phase3
GRCh37      vista   latest
#GRCh38     clinvar 20200929
#GRCh38     dbVar   latest
#GRCh38     DGV     2016
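
Editing the file by hand works fine. As an alternative, the following sketch comments out selected table groups with awk. It assumes whitespace-separated columns with the group name in the second column, as in the listing above. The helper name exclude_table_groups is hypothetical. The result goes to standard output, so the original file is left untouched.

```shell
# exclude_table_groups FILE GROUP...: print FILE with a "#" prepended to every
# line whose second column is one of the given table groups.
# Hypothetical helper; the header line and already commented lines pass through.
exclude_table_groups() {
  file="$1"
  shift
  awk -v groups="$*" '
    BEGIN { n = split(groups, g, " "); for (i = 1; i <= n; i++) skip[g[i]] = 1 }
    NR == 1 || /^#/ { print; next }
    ($2 in skip)    { print "#" $0; next }
    { print }
  ' "$file"
}

# Example: write a reduced copy, review it, then replace the original.
#   exclude_table_groups import_versions.tsv gnomAD_genomes extra-annos dbSNP \
#     > import_versions.tsv.new
```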

To perform the import, issue:

cd backend
pipenv run python manage.py import_tables \
  --tables-path /plenty/space/varfish-server-background-db-20201006

Performing the import twice will automatically skip tables that are already imported. To re-import tables, add the --force parameter to the command:

cd backend
pipenv run python manage.py import_tables \
  --tables-path varfish-db-downloader --force

Run Server and Celery¶

Now, open two terminals and start the VarFish server and the Celery worker.

## in terminal 1
make serve
## in a separate terminal 2
make celery

Continue the tutorial in a new terminal.

Install Annotation Services¶

VarFish uses a number of internal annotation services that you need to install as well. The instructions below will provide you with a development subset that contains information on all genes but variant information on genes BRCA1 and TGDS only.

First, install Docker and docker compose following the official manual.

Then, install the s5cmd tool for downloading data later on.

wget -O /tmp/s5cmd_2.1.0_Linux-64bit.tar.gz \
  https://github.com/peak/s5cmd/releases/download/v2.1.0/s5cmd_2.1.0_Linux-64bit.tar.gz
tar -C /tmp -xf /tmp/s5cmd_2.1.0_Linux-64bit.tar.gz
sudo cp /tmp/s5cmd /usr/local/bin/

Next, follow the instructions on the varfish-docker-compose-ng README.

## clone
git clone https://github.com/varfish-org/varfish-docker-compose-ng.git

## go into directory
cd varfish-docker-compose-ng

## create volumes directories
mkdir -p .dev/volumes/{minio,varfish-static}/data
## create secrets
mkdir -p .dev/secrets
echo password >.dev/secrets/db-password
echo postgresql://varfish:password@postgres/varfish >.dev/secrets/db-url
echo minio-root-password >.dev/secrets/minio-root-password
echo minio-varfish-password >.dev/secrets/minio-varfish-password
## ensure that pwgen is installed first
pwgen
## generate a 100 character secret
pwgen 100 1 >.dev/secrets/varfish-server-django-secret-key
## copy environment file
cp env.tpl .env
## copy docker-compose override file
cp docker-compose.override.yml-dev docker-compose.override.yml

## setup some configuration
mkdir -p .dev/config/nginx
cp utils/nginx/nginx.conf .dev/config/nginx

## download dev data
bash download-data.sh
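
Before starting the containers, it can be worth confirming that every secret file exists and is non-empty. The following sketch assumes that all five secrets are meant to live in a single directory such as .dev/secrets. The helper name check_secrets is hypothetical, and the file names are taken from the commands above.

```shell
# check_secrets DIR: succeed only if every secret file written above exists
# in DIR and is non-empty. The helper name is hypothetical.
check_secrets() {
  status=0
  for name in db-password db-url minio-root-password minio-varfish-password \
      varfish-server-django-secret-key; do
    if [ ! -s "$1/$name" ]; then
      echo "missing or empty: $1/$name" >&2
      status=1
    fi
  done
  return "$status"
}

# Example, run from the varfish-docker-compose-ng checkout:
#   check_secrets .dev/secrets && echo "all secrets present"
```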

Now you can bring up the backing services using:

docker compose up

Try It Out¶

You now have the system services Postgres and Redis running. You also have the frontend Vite development server, the backend Django server, and the Celery worker running. You can now try out VarFish by going to localhost:8080 and logging in with the superuser account you created above.

Installation (Windows)¶

The setup was done on a recent version of Windows 10 with Windows Subsystem for Linux Version 2 (WSL2).

Installation WSL2¶

Follow [this tutorial](https://www.omgubuntu.co.uk/how-to-install-wsl2-on-windows-10) to install WSL2.

Note that the whole procedure is a bit convoluted. You start out with wsl.exe --install.
Then you can install the latest LTS Ubuntu 22.04 from the Microsoft Store.
Once complete, you will probably end up with WSL 1 (one!), which you can convert to version 2 (two!) with wsl --set-version Ubuntu-22.04 2 or similar.
WSL2 has some advantages, including running a full Linux kernel, but is even slower in I/O to the NTFS Windows mount.
Everything that you do will be inside the WSL image.

Installation Docker Desktop¶

Follow the Install Docker Desktop instructions. Then, ensure that the Docker Engine is running.

Install OS Dependencies¶

## install dependencies
sudo apt install libsasl2-dev python3-dev libldap2-dev libssl-dev gcc make rsync
## install postgres and redis
sudo apt install postgresql postgresql-server-dev-14 postgresql-client redis
## start postgres, must be done after each WSL2 start
sudo service postgresql start
sudo service postgresql status
## start redis, must be done after each WSL2 start
sudo service redis-server start
sudo service redis-server status
## update postgres configuration and restart, only do this once
sudo sed -i -e 's/.*max_locks_per_transaction.*/max_locks_per_transaction = 1024 # min 10/' /etc/postgresql/14/main/postgresql.conf
sudo service postgresql restart

Create a postgres user varfish with password varfish and a database.

## enter varfish as the password when prompted
sudo -u postgres createuser -s -r -d varfish -P
sudo -u postgres createdb --owner=varfish varfish

From here on, you can follow the instructions for the Linux installation, starting at the “Install Python Pipenv” section.