Installation

TissueMAPS uses a distributed client-server model. As a consequence, client and server components are typically installed on different machines:

  • Client code runs on the user’s local machine and interacts with the server over HTTP. No local installation is required for the web user interface, since the JavaScript code is served by the web server and runs in the browser. Other client interfaces need to be installed locally, but they have very few dependencies and are easy to deploy on various platforms (Linux, macOS, Windows).
  • Server code has many more dependencies and is designed for UNIX platforms. The server may also run on the user’s local machine during development and testing, but is typically deployed on a remote virtual machine (or multiple virtual machines) in the cloud.

If you don’t use a dedicated machine, we recommend using a virtual environment for TissueMAPS-related Python dependencies:

Install virtualenv and virtualenvwrapper:

pip install virtualenvwrapper

Add the following lines to your ~/.bash_profile file:

export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh
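
Reload the profile so the changes take effect in the current shell:

source ~/.bash_profile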

Create a tissuemaps virtual environment:

mkvirtualenv tissuemaps

You can deactivate the environment:

deactivate

and later re-activate it when needed:

workon tissuemaps

Warning

A coexisting Anaconda installation doesn’t play nicely with virtual environments and will create problems; see this potential solution. Therefore, we prefer working with good old virtualenvs.

Clients

Users can interact with the TissueMAPS server via a standard web browser (tested with Chrome, Firefox and Safari) without the need to install any additional software locally.

Additional HTTP client implementations are available through the TmClient repository.

TmClient

The tmclient Python package provides a programming interface as well as a command line interface.

Requirements

  • Python: Many platforms ship with Python pre-installed. If not, it can be downloaded from python.org. Using version 2.7.9 or higher is recommended.
  • Pip: The Python package manager is automatically installed with Python distributions obtained from python.org. Otherwise, it can be installed with the get-pip.py script.
  • GCC or similar compiler

Installation

pip install tmclient
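
To verify the installation, the command line program (assuming the package installs an executable named tm_client) can be asked for help:

tm_client --help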

Server

The server backend consists of the following core components:

  • Web server: NGINX
  • Application server: uWSGI
  • Database server: PostgreSQL

and the following optional components, which are only required for a larger multi-machine cluster setup:

  • Compute servers: Slurm (job scheduler and workload manager)
  • File system servers: GlusterFS (scalable network file system)
  • Monitoring server: Ganglia (monitoring system)

All components are open-source and deployable on various public and private clouds.

Public cloud images

TissueMAPS provides publicly accessible images on Amazon Web Services (AWS) in the form of shared Amazon Machine Images (AMIs).

Based on these images, one can simply launch a virtual machine in the cloud via the Elastic Compute Cloud (EC2) console. To find the TissueMAPS images, filter public AMIs in the Frankfurt region for AMI Name: TissueMAPS server (see AWS documentation).
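
The same search can be done from a terminal with the AWS command line interface, assuming the AWS CLI is installed and configured (the name pattern below is an assumption based on the AMI name above):

aws ec2 describe-images \
    --region eu-central-1 \
    --filters "Name=name,Values=TissueMAPS server*" \
    --query "Images[].{Id:ImageId,Name:Name}"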

Docker

TissueMAPS further provides pre-built container images, which are publicly available on Docker Hub. The easiest way to install the containerized application is via Docker Compose.

Installation

Clone the repository, which contains the docker-compose.yml file, and bring up the containers:

git clone https://github.com/tissuemaps/tissuemaps ~/tissuemaps
cd ~/tissuemaps
docker-compose up -d

The first call will take a while because the container images need to be downloaded. Subsequent calls will be much faster.
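
To check that the services came up correctly, use the standard Docker Compose commands:

docker-compose ps
docker-compose logs -f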

TmDeploy

The tmdeploy Python package provides the tm_deploy program, which uses Ansible for

  • provisioning and deployment of remote virtual machines (VMs) in the cloud
  • building, running and shipping Linux containers

The program uses Ansible playbooks and Ansible Container to install and configure server components in virtual Linux environments (supported distributions: Ubuntu 16.04 Xenial and CentOS 7). The same Ansible roles are used for setting up cloud VMs and containers.

For more information on individual roles, please refer to the TmDeploy repository.

Requirements

  • Python: Many platforms ship with Python pre-installed. If not, it can be downloaded from python.org. Using version 2.7.9 or higher is recommended.
  • Pip: The Python package manager is automatically installed with Python distributions obtained from python.org. Otherwise, it can be installed with the get-pip.py script.
  • OpenSSH: Using version 7.2 or higher is recommended.
  • OpenSSL (including the development libraries)
  • GCC or similar compiler
  • Docker CE (optional, required for building containers): Download the community edition for your operating system from the Docker Store. Also make sure that your operating system (OS) user has permissions to run the docker daemon. This can be achieved by adding the user to the docker group: sudo usermod -aG docker $(whoami).

Installation

pip install tmdeploy

Build & Run Containers

Containers are used for local development and testing. Setup is straightforward and doesn’t require any additional configuration.

Usage

Build container images:

tm_deploy -vv container build

Create and run containers:

tm_deploy -vv container start
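
Once started, the running containers can be inspected with the standard Docker commands:

docker ps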

Launch & Deploy VMs

Dedicated virtual machines are used for production deployment in the cloud. This requires a setup configuration file in YAML format (the default location of the file is ~/.tmaps/setup/setup.yml).

The setup configuration has two main sections:

  • cloud (CloudSection): Information about the cloud provider on which machines should be set up. Currently, three providers are supported: ec2 (Amazon Elastic Compute Cloud), gce (Google Compute Engine) and os (OpenStack).
  • architecture (ArchitectureSection): Computational resources that should be set up and how they should be configured. The different server components (web server, application server, database servers, …) may all be hosted on a single machine or be distributed across several machines. For consistency, clusters (ClusterSection) refers to sets of machines that get configured the same way, even if a set contains only a single machine. Each cluster is composed of one or more node_types (ClusterNodeTypeSection). Nodes belonging to a particular node type get assigned to one or more groups (AnsibleGroupSection), which determine how these nodes will be named and configured.

Tip

Copy one of the setup templates and modify it according to your needs.

Standalone (single-node) setup

The following Ansible groups are supported:

  • tissuemaps_server
  • tissuemaps_db_master
  • tissuemaps_db_worker

Example setup for the Elastic Compute Cloud (EC2) provider based on a CentOS 7 image:
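
A minimal sketch, assuming the same AMI and naming conventions as the cluster example below; the ip_range, key_name, flavor and volume_size values are placeholders that you should adapt:

cloud:
  provider: ec2
  region: eu-central-1
  key_name: tissuemaps
  network: &architecture tissuemaps
  ip_range: 10.65.4.0/24

architecture:
  name: *architecture
  clusters:

    - name: standalone
      node_types:
        - name: server
          count: 1
          groups:
            - name: tissuemaps_server
            - name: tissuemaps_db_master
            - name: tissuemaps_db_worker
          instance:
            image: ami-7cbc6e13
            ssh_user: centos
            flavor: m4.xlarge
            volume_size: 500
            tags:
              - web
              - compute
              - storage
            assign_public_ip: yes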

This configuration will set up a single machine with 4 CPU cores and 3.75 GB of RAM per virtual CPU and create a separate storage volume of 500 GB. Depending on your needs, you may want to choose a different machine type and/or volume size. Note that if you omit the volume_size variable, no additional volume will be used and only the boot disk will be available.

Note

The resulting virtual machine instance will have the name tissuemaps-standalone-server-001. This naming convention is a bit of overkill for a single server. However, it becomes useful when building multiple clusters with different types of nodes. For consistency, we stick to this naming convention also for the simple standalone use case.

Cluster (multi-node) setup

Additional components can be configured using playbooks provided by Elasticluster. The following Ansible groups are supported:

  • tissuemaps_server
  • tissuemaps_db_master
  • tissuemaps_db_worker
  • tissuemaps_compute
  • glusterfs_server
  • glusterfs_client
  • slurm_master
  • slurm_worker
  • ganglia_master
  • ganglia_monitor

Example setup for the Elastic Compute Cloud (EC2) provider based on a CentOS 7 image:

cloud:
  provider: ec2
  region: eu-central-1
  network: &architecture tm-cluster
  ip_range: 10.65.8.0/24

architecture:
  name: *architecture
  clusters:

    - name: slurm
      node_types:
        - name: frontend
          count: 1
          groups:
            - name: tissuemaps_server
              vars:
                tm_version: v0.3.2
                nginx_processes: 4
                uwsgi_processes: 4
                tm_compute_cores: 4
                tm_compute_memory: 14500
                storage_directory: /glusterfs
            - name: slurm_master
            - name: glusterfs_client
            - name: ganglia_monitor
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: m4.2xlarge
            tags:
              - web
              - compute
            assign_public_ip: yes

        - name: worker
          count: 8
          groups:
            - name: tissuemaps_compute
              vars:
                storage_directory: /glusterfs
            - name: slurm_worker
            - name: glusterfs_client
            - name: ganglia_monitor
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: c4.xlarge
            tags:
              - compute

    - name: postgresql
      node_types:
        - name: coordinator
          count: 1
          groups:
            - name: tissuemaps_db_master
              vars:
                db_master_cores: 4
                db_master_memory: 29000
            - name: ganglia_monitor
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: i3.xlarge
            volume_size: 950
            tags:
              - storage

        - name: worker
          count: 2
          groups:
            - name: tissuemaps_db_worker
              vars:
                db_worker_cores: 4
                db_worker_memory: 29000
            - name: ganglia_monitor
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: i3.xlarge
            volume_size: 950
            tags:
              - storage

    - name: glusterfs
      node_types:
        - name: server
          count: 2
          groups:
            - name: glusterfs_server
            - name: ganglia_monitor
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: m4.xlarge
            volume_mountpoint: /srv/glusterfs
            volume_size: 2000
            tags:
              - storage

    - name: ganglia
      node_types:
        - name: server
          count: 1
          groups:
            - name: ganglia_master
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: t2.medium
            tags:
              - web
              - compute
            assign_public_ip: yes

This configuration will set up one TissueMAPS server instance, one database coordinator server instance, two database worker server instances, two file system server instances, one monitoring server instance and eight compute instances. Depending on your needs, you may want to choose a different number of nodes, machine types or volume sizes.

Note

TissueMAPS implements fair scheduling based on Slurm accounts. To enable this functionality, create TissueMAPS user accounts via the tm_add command line tool.

Tip

When deploying hundreds of compute nodes, it can be beneficial to use a pre-built image to speed up the cluster deployment process. To this end, configure a dedicated machine with only the tissuemaps_compute group and create a snapshot of the configured instance. The resulting image can then be reused to quickly boot additional machines for a large cluster setup.

Credentials

To connect to the configured cloud, credentials are required, which must be provided via the following provider-specific environment variables:

  • os provider:
    • OS_PROJECT_NAME: name of the project
    • OS_AUTH_URL: URL of the identity endpoint
    • OS_USERNAME: username
    • OS_PASSWORD: password
  • gce provider:
    • GCE_PROJECT: name of the project
    • GCE_EMAIL: email associated with the project
    • GCE_CREDENTIALS_FILE_PATH: path to JSON credentials file
  • ec2 provider:
    • AWS_ACCESS_KEY_ID: access key
    • AWS_SECRET_ACCESS_KEY: corresponding secret access key

Tip

It can be convenient to place these variables into a ~/.credentials.sh file:

export AWS_ACCESS_KEY_ID=XXX
export AWS_SECRET_ACCESS_KEY=XXX

and source the file prior to running tm_deploy:

source ~/.credentials.sh

Usage

Launch virtual machine instances in the cloud:

tm_deploy -vv vm launch

The launch command calls the instance.yml playbook.

Note

The tm_deploy program will by default look for a setup file at the following location: ~/.tmaps/setup/setup.yml. Alternatively, the path to the file can be specified via the --setup-file or -s flag.
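
For example, to launch machines using a setup file at a custom location (the path is a placeholder, and the flag placement is an assumption; consult tm_deploy --help):

tm_deploy -vv vm launch --setup-file /path/to/setup.yml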

Note

An SSH key pair will be automatically created on the local machine and uploaded to the cloud. The generated key files will be placed into ~/.ssh. The name of the key pair is determined by key_name.

Note

A private network and subnetwork will get automatically created. In addition, each node gets assigned to one or more security groups (firewall rules) based on the configured tags. Only machines tagged with web will get a public IP and can be directly accessed via SSH, HTTP and HTTPS on ports 22, 80 and 443, respectively. The other machines are only accessible from within the private network. tm_deploy uses an SSH ProxyCommand to connect to machines within the private network using a web tagged machine as a bastion host.
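
tm_deploy handles this transparently; if you want to connect to a private node manually, an equivalent ~/.ssh/config entry could look like the following sketch (the host names, key file and bastion IP are hypothetical placeholders; the private address pattern matches the ip_range from the cluster example above):

Host bastion
    HostName 203.0.113.10
    User centos
    IdentityFile ~/.ssh/tissuemaps

Host 10.65.8.*
    User centos
    IdentityFile ~/.ssh/tissuemaps
    ProxyCommand ssh -W %h:%p bastion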

Deploy TissueMAPS on virtual machine instances:

tm_deploy -vv vm deploy

The deploy command runs the Ansible playbooks that install and configure the TissueMAPS server components on the launched instances.

Terminate virtual machine instances:

tm_deploy -vv vm terminate

Note

The terminate command will remove virtual machine instances and storage volumes, but networks and security groups won’t get deleted.