Installation¶
TissueMAPS uses a distributed client-server model. As a consequence, client and server components typically need to be installed on different machines:
- Client code runs on the user’s local machine and interacts with the server via HTTP. No local installation is required for the web user interface, since the Javascript code is served by the web server and runs in the browser. Other client interfaces need to be installed locally, but they have very few dependencies and are easy to deploy on various platforms (Linux, macOS, Windows).
- Server code has many more dependencies and is designed for UNIX platforms. The server may also run on the user’s local machine during development and testing, but is typically deployed on a remote virtual machine (or multiple virtual machines) in the cloud.
If you don’t use a dedicated machine, we recommend using a virtual environment for TissueMAPS-related Python dependencies:
Install virtualenv and virtualenvwrapper:
pip install virtualenvwrapper
Add the following lines to your ~/.bash_profile file:
export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh
Create a tissuemaps virtual environment:
mkvirtualenv tissuemaps
You can deactivate the environment:
deactivate
and later re-activate it when needed:
workon tissuemaps
Warning
A coexisting Anaconda installation doesn’t play nice with virtual environments and will create problems; see potential solution. Therefore, we prefer working with good old virtualenvs.
Clients¶
Users can interact with the TissueMAPS server via a standard web browser (tested with Chrome, Firefox and Safari) without the need to install any additional software locally.
Additional HTTP client implementations are available through the TmClient repository.
TmClient¶
The tmclient Python package provides a programming and command line interface.
Requirements¶
- Python: Many platforms ship with Python pre-installed. If not, it can be downloaded from python.org. Version 2.7.9 or higher is recommended.
- Pip: The Python package manager is automatically installed with Python distributions obtained from python.org. Otherwise, it can be installed with the get-pip.py script.
- GCC or similar compiler
Installation¶
pip install tmclient
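To verify the installation, invoke the command line interface (assuming the package installs a console script named tm_client):
tm_client --help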
Server¶
The server backend consists of the following core components:
- Web server: NGINX
- Dynamic web page: TmUI
- Application server: uWSGI
- Application: TmServer and TmLibrary
- Database servers: PostgreSQL with Citus and PostGIS extensions
and the following optional components, which are only required for a larger multi-machine cluster setup:
- Compute servers: Slurm (job scheduler and workload manager)
- File system servers: GlusterFS (scalable network file system)
- Monitoring server: Ganglia (monitoring system)
All components are open-source and deployable on various public and private clouds.
Public cloud images¶
TissueMAPS provides publicly accessible images on Amazon Web Services (AWS) in the form of shared Amazon Machine Images (AMIs).
Based on these images, one can simply launch a virtual machine in the cloud via the Elastic Compute Cloud (EC2) console. To find the TissueMAPS images, filter public AMIs in the Frankfurt region for AMI Name: TissueMAPS server (see AWS documentation).
Docker¶
TissueMAPS further provides pre-built container images, which are publicly available on Docker Hub. The easiest way to install the containerized application is via Docker Compose.
Requirements¶
- Docker CE
- Docker Compose
Installation¶
Clone the repository, which contains the docker-compose.yml file, and bring up the containers:
git clone https://github.com/tissuemaps/tissuemaps
cd tissuemaps
docker-compose up -d
docker-compose up -d
The first call will take a while, because the container images need to be downloaded. Subsequent calls will be fast, since the images are cached locally.
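The standard Docker Compose commands can be used to inspect or stop the running application:
# list the containers of this project
docker-compose ps
# stop and remove the containers (downloaded images remain cached)
docker-compose down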
TmDeploy¶
The tmdeploy Python package provides the tm_deploy program, which uses Ansible for
- provisioning and deployment of remote virtual machines (VMs) in the cloud
- building, running and shipping Linux containers
The program uses Ansible playbooks and Ansible Container to install and configure server components in virtual Linux environments (supported distributions: Ubuntu 16.04 Xenial and CentOS 7). The same Ansible roles are used for setting up cloud VMs and containers.
For more information on individual roles, please refer to the TmDeploy repository.
Requirements¶
- Python: Many platforms ship with Python pre-installed. If not, it can be downloaded from python.org. Version 2.7.9 or higher is recommended.
- Pip: The Python package manager is automatically installed with Python distributions obtained from python.org. Otherwise, it can be installed with the get-pip.py script.
- OpenSSH: Using version 7.2 or higher is recommended.
- OpenSSL (including the development libraries)
- GCC or similar compiler
- Docker CE (optional, required for building containers): Download the community edition for your operating system from the Docker Store. Also make sure that your operating system (OS) user has permission to run the docker daemon. This can be achieved by adding the user to the docker group:
sudo usermod -aG docker $(whoami)
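Note that the group change only takes effect after logging out and back in (or running newgrp docker). Afterwards you can verify that the daemon is reachable without sudo:
docker run hello-world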
Installation¶
pip install tmdeploy
Build & Run Containers¶
Containers are used for local development and testing. The setup is straightforward and doesn’t require any additional configuration.
Usage¶
Build container images:
tm_deploy -vv container build
Create and run containers:
tm_deploy -vv container start
Launch & Deploy VMs¶
Dedicated virtual machines are used for production deployment in the cloud. This requires a setup configuration file in YAML format (the default location of the file is ~/.tmaps/setup/setup.yml).
The setup configuration has two main sections:
- cloud (CloudSection): Information about the cloud provider on which machines should be set up. Currently, three providers are supported: os (OpenStack), gce (Google Compute Engine) and ec2 (Amazon Elastic Compute Cloud).
- architecture (ArchitectureSection): Computational resources that should be set up and how they should be configured. The different server components (web server, application server, database servers, …) may all be hosted on a single machine or be distributed across several machines. For consistency, clusters (ClusterSection) refers to sets of machines that get configured the same way, even if a set contains only a single machine. Each cluster is composed of one or more node_types (ClusterNodeTypeSection). Nodes belonging to a particular node type get assigned to one or more groups (AnsibleGroupSection), which determine how these nodes will be named and configured.
Tip
Copy one of the setup templates and modify it according to your needs.
Standalone (single-node) setup¶
The following Ansible groups are supported:
tissuemaps_server
tissuemaps_db_master
tissuemaps_db_worker
Example setup for the Elastic Compute Cloud (EC2) provider based on a CentOS 7 image:
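Since the exact values depend on your account, the following file is only a sketch: the AMI ID is a placeholder, the network settings are illustrative, and the flavor is an assumption matching the 4 cores with 3.75 GB of RAM per virtual CPU described next:
cloud:
  provider: ec2   # assumed key, as in the cluster example below
  region: eu-central-1
  network: &architecture tm-standalone
  ip_range: 10.65.4.0/24
architecture:
  name: *architecture
  clusters:
    - name: standalone
      node_types:
        - name: server
          count: 1
          groups:
            - name: tissuemaps_server
            - name: tissuemaps_db_master
            - name: tissuemaps_db_worker
          instance:
            image: ami-XXXXXXXX   # placeholder: a CentOS 7 AMI
            ssh_user: centos
            flavor: m3.xlarge     # assumption: 4 vCPUs with 3.75 GB RAM each
            volume_size: 500
            tags:
              - web
            assign_public_ip: yes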
This configuration will set up a single machine with 4 CPU cores and 3.75 GB of RAM per virtual CPU and create a separate storage volume of 500 GB. Depending on your needs, you may want to choose a different machine type and/or volume size. Note that when you omit the volume_size variable, no additional volume will be used and only the boot disk will be available.
Note
The resulting virtual machine instance will have the name tissuemaps-standalone-server-001. This naming convention is a bit of an overkill for a single server, but it becomes useful when building multiple clusters with different types of nodes. For consistency, we stick to this naming convention also for the simple standalone use case.
Cluster (multi-node) setup¶
Additional components can be configured using playbooks provided by Elasticluster. The following Ansible groups are supported:
tissuemaps_server
tissuemaps_db_master
tissuemaps_db_worker
tissuemaps_compute
glusterfs_server
glusterfs_client
slurm_master
slurm_worker
ganglia_master
ganglia_monitor
Example setup for the Elastic Compute Cloud (EC2) provider based on a CentOS 7 image:
cloud:
  provider: ec2   # assumed key for this EC2 example
  region: eu-central-1
  network: &architecture tm-cluster
  ip_range: 10.65.8.0/24
architecture:
  name: *architecture
  clusters:
    - name: slurm
      node_types:
        - name: frontend
          count: 1
          groups:
            - name: tissuemaps_server
              vars:
                tm_version: v0.3.2
                nginx_processes: 4
                uwsgi_processes: 4
                tm_compute_cores: 4
                tm_compute_memory: 14500
                storage_directory: /glusterfs
            - name: slurm_master
            - name: glusterfs_client
            - name: ganglia_monitor
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: m4.2xlarge
            tags:
              - web
              - compute
            assign_public_ip: yes
        - name: worker
          count: 8
          groups:
            - name: tissuemaps_compute
              vars:
                storage_directory: /glusterfs
            - name: slurm_worker
            - name: glusterfs_client
            - name: ganglia_monitor
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: c4.xlarge
            tags:
              - compute
    - name: postgresql
      node_types:
        - name: coordinator
          count: 1
          groups:
            - name: tissuemaps_db_master
              vars:
                db_master_cores: 4
                db_master_memory: 29000
            - name: ganglia_monitor
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: i3.xlarge
            volume_size: 950
            tags:
              - storage
        - name: worker
          count: 2
          groups:
            - name: tissuemaps_db_worker
              vars:
                db_worker_cores: 4
                db_worker_memory: 29000
            - name: ganglia_monitor
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: i3.xlarge
            volume_size: 950
            tags:
              - storage
    - name: glusterfs
      node_types:
        - name: server
          count: 2
          groups:
            - name: glusterfs_server
            - name: ganglia_monitor
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: m4.xlarge
            volume_mountpoint: /srv/glusterfs
            volume_size: 2000
            tags:
              - storage
    - name: ganglia
      node_types:
        - name: server
          count: 1
          groups:
            - name: ganglia_master
          instance:
            image: ami-7cbc6e13 #ami-fa2df395
            ssh_user: centos
            flavor: t2.medium
            tags:
              - web
              - compute
            assign_public_ip: yes
This configuration will set up one TissueMAPS server instance, one database coordinator instance, two database worker instances, two file system server instances, one monitoring server instance and eight compute instances. Depending on your needs, you may want to choose a different number of nodes, machine types or volume sizes.
Note
TissueMAPS implements fair scheduling, based on SLURM accounts. To enable this functionality, create TissueMAPS user accounts via the tm_add command line tool.
Tip
When deploying hundreds of compute nodes, it can be beneficial to use a pre-built image to speed up the cluster deployment process. To this end, configure a dedicated machine with only the tissuemaps_compute group and create a snapshot of the configured instance. The resulting image can then be reused to quickly boot additional machines for a large cluster setup.
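For illustration, such a dedicated machine could be described by an extra cluster entry in the setup file; the cluster and node type names below are hypothetical and the flavor is a placeholder:
    - name: snapshot
      node_types:
        - name: compute-image
          count: 1
          groups:
            - name: tissuemaps_compute
              vars:
                storage_directory: /glusterfs
          instance:
            image: ami-7cbc6e13
            ssh_user: centos
            flavor: c4.xlarge   # placeholder flavor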
Credentials¶
To connect to the configured cloud, credentials are required, which must be provided via the following provider-specific environment variables:
- os provider:
  - OS_PROJECT_NAME: name of the project
  - OS_AUTH_URL: URL of the identity endpoint
  - OS_USERNAME: username
  - OS_PASSWORD: password
- gce provider:
  - GCE_PROJECT: name of the project
  - GCE_EMAIL: email associated with the project
  - GCE_CREDENTIALS_FILE_PATH: path to JSON credentials file
- ec2 provider:
  - AWS_ACCESS_KEY_ID: access key
  - AWS_SECRET_ACCESS_KEY: corresponding secret access key
Tip
It can be convenient to place these variables into a ~/.credentials.sh file:
export AWS_ACCESS_KEY_ID=XXX
export AWS_SECRET_ACCESS_KEY=XXX
and source the file prior to running tm_deploy:
source ~/.credentials.sh
Usage¶
Launch virtual machine instances in the cloud:
tm_deploy -vv vm launch
The launch command calls the instance.yml playbook.
Note
The tm_deploy program will by default look for a setup file at the following location: ~/.tmaps/setup/setup.yml. Alternatively, the path to the file can be specified via the --setup-file or -s flag.
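For example (assuming the flag is given before the subcommand, like the verbosity flag):
tm_deploy -vv --setup-file /path/to/setup.yml vm launch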
Note
An SSH key pair will be automatically created on the local machine and uploaded to the cloud. The generated key files will be placed into ~/.ssh. The name of the key pair is determined by key_name.
Note
A private network and subnetwork will get automatically created. In addition, each node gets assigned one or more security group (firewall) rules based on the configured tags. Only machines tagged with web will get a public IP and can be directly accessed via SSH, HTTP and HTTPS on ports 22, 80 and 443, respectively. The other machines are only accessible from within the private network. tm_deploy uses an SSH ProxyCommand to connect to machines within the private network, using a web-tagged machine as a bastion host.
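For reference, an equivalent manual SSH connection through such a bastion host could look as follows (the IP addresses are placeholders; tm_deploy sets this up automatically):
ssh -o ProxyCommand="ssh -W %h:%p centos@<public-ip-of-web-node>" centos@<private-ip-of-worker>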
Deploy TissueMAPS on virtual machine instances:
tm_deploy -vv vm deploy
The deploy command runs the following playbooks:
- site.yml for roles provided via the elasticluster package, in case additional, non-core groups are specified in the setup configuration
- site.yml of the tmdeploy package
Terminate virtual machine instances:
tm_deploy -vv vm terminate
Note
The terminate command will remove virtual machine instances and storage volumes, but networks and security groups won’t get deleted.