Developer guide

Frontend

Architecture

The web frontend of TissueMAPS is largely based on the AngularJS framework.

Classes and functions that encapsulate core application logic, and are therefore not UI-specific, are separated from the UI-specific code. Code comprising core application logic is located in the subdirectory core; the other directories are reserved for code handling views, user input, and other AngularJS-related concerns. Several server-side resources like the Experiment also have a client-side representation. However, it is important to note that these concepts are not exactly the same: a client-side object is constructed from a serialized server-side object, but it may have additional properties that are only of interest to code dealing with the user interface.

Data access objects (DAO)

Whenever a class in TissueMAPS wants to access a resource from the server, the call has to go through a model-specific data access object (DAO). These objects issue HTTP requests and handle the deserialization process when constructing actual model class instances from JSON objects.

Dialogs

To display messages to the user by means of dialogs (popup windows), TissueMAPS provides a service called dialogService. For example, to inform the user that some request has been performed successfully, this service can be used like this:

dialogService.info('Task XY has been performed successfully')

Similar utility methods exist for error or warning messages.

Errors that result from a server-side exception (such as authorization failures or missing resources) are caught by an errorInterceptor and automatically displayed to the user. Manually handling such errors is therefore not necessary.

Viewer

The main class of the TissueMAPS interface is the Viewer class. The viewer is in charge of visualizing an experiment and of handling related resources, such as mapobjects and the tools with which the experiment can be analyzed.

The actual visualization of the microscopy images is done with an extended version of OpenLayers that allows WebGL-powered rendering of tiled images and vector data, as well as additive blending of images. The interface to OpenLayers is hidden within several wrapper classes such as ImageTileLayer and VectorTileLayer. Ultimately, whenever the user scrolls the map, OpenLayers will prompt the underlying layer objects to perform a GET request to the tile server to get a slice of the pyramid image or a collection of vector data (for example cell outlines). Analogously to these layer classes, the Viewport class is a wrapper around an OpenLayers Map object and is used within TissueMAPS to add and remove layer objects.

Data analysis tools

The TissueMAPS interface uses a plugin mechanism for data analysis tools. This mechanism ensures that each implemented Tool can be selected from the toolbar and that results returned from the server are interpreted correctly. To make use of this plugin mechanism, tool-specific code must be located under src/tools/<ToolName> and provide an AngularJS controller named <ToolName>Ctrl.ts and an HTML template named <ToolName>Template.html. When the user clicks on a tool button in the toolbar, TissueMAPS will create an instance of this controller and link it to a tool window-specific $scope. The tool window will then be populated with the template content.

Templates can make use of several pre-defined widgets. For example, the following tag will insert a widget with which the desired MapobjectType can be selected:

<tm-mapobject-type-widget></tm-mapobject-type-widget>

Another widget can be used to select a specific Feature:

<tm-feature-selection-widget
  selected-mapobject-type="mapobjectTypeWidget.selectedType">
</tm-feature-selection-widget>

Implementations of existing tools provide a good idea of how to implement a new tool.

HTTP client interfaces

Programmatic clients that wrap the server's REST API in different languages are provided by the TmClient repository (see REST API below).

Backend

The TissueMAPS backend is implemented to a large extent in Python 2.7.

The code is distributed across different repositories (among them tmserver, tmlibrary, jtlibrary and jtmodules), each of them hosting a Python package.

There are several reasons for splitting code across different repositories:

  • Independent usage: The tmlibrary and jtlibrary packages can be used independently; likewise, the jtmodules and jtlibrary packages can be used independently of jterator. Separating these packages keeps the requirements for each of them to a minimum.
  • Independent installation: Python packages can be pip-installed directly from GitHub. To this end, each repository should only contain a single package (setup.py file).
  • Code locality: In a distributed multi-node setup, not all packages are required on the same machine. For example, the tmserver package only gets deployed on the machine hosting the web server, while the tmlibrary package gets deployed on all compute servers.

Documentation

TissueMAPS uses numpydoc for code documentation. Please familiarize yourself with the NumPy style and reStructuredText and follow the PEP 257 docstring conventions to ensure that your documentation will be built correctly. Since Python is a dynamically typed language, we put emphasis on rigorously documenting the types of parameters and return values. To this end, we use type hints as specified by PEP 484 (see the typing module).
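
For example, a function documented in this style might look as follows (an illustrative sketch; the function and its parameters are made up for demonstration, and the type hint uses the Python 2 comment form):

from typing import List

def mean_intensity(values):
    # type: (List[float]) -> float
    '''Computes the mean intensity of pixel values.

    Parameters
    ----------
    values: List[float]
        intensity values of individual pixels

    Returns
    -------
    float
        mean intensity
    '''
    return sum(values) / len(values)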

Coding style

Please take time to read through PEP 8 - the official style guide for Python code - and stick to it!

Tests

TissueMAPS uses pytest together with tox and runs integration tests with Jenkins. Tests should be placed in a separate folder called tests that lives outside of the package, as a sibling of the package root folder.
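
A minimal sketch of such a test module (the file name and the function under test are hypothetical; in practice the tested function would be imported from the package):

# tests/test_utils.py
import pytest

def add(a, b):
    # stand-in for a function that would normally be imported
    # from the package under test
    return a + b

def test_add():
    assert add(1, 2) == 3

def test_add_rejects_incompatible_types():
    with pytest.raises(TypeError):
        add(1, None)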

Server development

REST API

The RESTful API is implemented in tmserver.api in the form of a Flask Blueprint. All HTTP requests whose path starts with /api will thereby automatically get routed to view functions in the api package. Modules in which API view functions are defined must be imported at the package level to make them available to the blueprint. Care must be taken with respect to code structure to prevent circular imports, which can easily occur due to the way blueprints are implemented in Flask (see the note in the Flask docs).
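
A hypothetical view function might look roughly like the following sketch; the name of the blueprint instance and the route are assumptions for illustration:

from flask import jsonify

from tmserver.api import api  # the Blueprint instance (name assumed)

@api.route('/experiments/<experiment_id>', methods=['GET'])
def get_experiment(experiment_id):
    '''Served under /api/experiments/<experiment_id>, since the
    blueprint is registered with the /api prefix.
    '''
    # look up the resource and serialize it ...
    return jsonify(data={'id': experiment_id})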

All custom exceptions that derive from HTTPException and are defined in tmserver.error will automatically be handled via error handlers. The client intercepts the resulting error messages and displays them to the user.

View functions should be documented using the HTTPDomain Sphinx extension.

The TmClient repository provides client REST API wrappers in different languages. When you add support for additional routes, please implement the interfaces such that they are as similar as possible between languages.

Library development

Data models

TissueMAPS uses PostgreSQL via the SQLAlchemy Object Relational Mapper (ORM). The respective model classes are implemented in the tmlib.models package.

The main tissuemaps database schema manages user credentials and permissions and holds a reference to each Experiment. Experiment-related data reside in separate schemas. These experiment-specific schemas are called tissuemaps_experiment_<id>, where id is the ID of the respective experiment, assigned by ExperimentReference in the main database. Therefore, models mapping to data that belong to an experiment must implement ExperimentModel, while models representing global data must implement MainModel. All derived model classes should be imported such that they are available at the level of the tmlib.models package namespace.
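
A minimal sketch of an experiment-specific model (the class, table name and exact import path are assumptions; consult tmlib.models for the actual base classes):

from sqlalchemy import Column, Integer, String

from tmlib.models.base import ExperimentModel  # import path assumed

class Annotation(ExperimentModel):
    '''Hypothetical model whose table is created in each
    tissuemaps_experiment_<id> schema rather than in the main schema.
    '''
    __tablename__ = 'annotations'

    id = Column(Integer, primary_key=True)
    name = Column(String, index=True)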

Microscope types

TissueMAPS uses the Bio-Formats library to read heterogeneous microscope image and metadata file formats.

Since TissueMAPS is written in Python, we cannot use the Java library directly. Instead, we make use of the python-bioformats package, which interacts with the library via a Java bridge. Unfortunately, there are several issues with this approach, ranging from incomplete metadata parsing to high memory consumption.

Although Bio-Formats supports a large number of file formats, many of the vendor-specific formats are not fully supported, in particular when it comes to reading additional microscope-specific metadata files, and crucial information is sometimes missing. Therefore, TissueMAPS does not fully rely on Bio-Formats in terms of reading and interpreting image metadata, but uses the following multi-step approach instead (implemented as the “image conversion” workflow stage):

  • The metaextract step extracts metadata in the form of OMEXML from each image file using the showinf Bio-Formats command line tool (see the sketch after this list).
  • The metaconfig step then combines metadata extracted from each image with metadata provided by other sources, for example via custom microscope-type specific MetadataReader classes or (in the worst case) user input, and saves the configured metadata in the database. The TissueMAPS database schema uses terminology similar to the OME schema (e.g. Plate, Well, Channel), but puts less emphasis on microscope details and more on the multi-scale map representation of images and segmented objects (e.g. ChannelLayer, LabelLayer or MapobjectSegmentation).
  • The imextract step finally extracts the pixel data from image files and stores them in a standardized format. Currently we use HDF5, but this may be subject to change. Developers are therefore advised to access image data via the respective FileModel classes.
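
Conceptually, the metaextract step performs the equivalent of the following call (a simplified sketch; the actual step wraps this in job management and error handling):

import subprocess

# Extract metadata as OMEXML without reading pixel data; the flags
# are documented for the Bio-Formats command line tools.
omexml = subprocess.check_output(
    ['showinf', '-omexml-only', '-nopix', 'image.tif']
)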

The most critical step in this stage is metaconfig; in fact, it is crucial for the entire subsequent workflow. Because it is so important that images are handled correctly, TissueMAPS requires users to specify the microscope_type for each experiment. To register a new microscope, developers must implement a microscope-specific MetadataHandler and MetadataReader. Please refer to the documentation of the metaconfig step for more details.
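
The following skeleton indicates the rough shape of such a registration; the class names and import path are assumptions, and the abstract methods that actually need to be implemented are defined by the base classes of the metaconfig step:

from tmlib.workflow.metaconfig.base import (  # import path assumed
    MetadataHandler, MetadataReader
)

class MymicroscopeMetadataReader(MetadataReader):
    '''Reads additional metadata from microscope-specific files.'''
    # implement the reading logic required by the base class

class MymicroscopeMetadataHandler(MetadataHandler):
    '''Configures metadata for the hypothetical "mymicroscope" type.'''
    # implement the configuration logic required by the base class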

Workflows

Workflows can be dynamically assembled from steps, which are implemented as subpackages of tmlib.workflow. To this end, TissueMAPS builds on top of GC3Pie - a high-level API for building and managing large, inter-dependent task collections with support for different cluster backends.

Steps automatically get equipped with an application programming interface for distributed computing (by implementing WorkflowStepAPI) as well as a command line interface (by implementing WorkflowStepCLI). By subclassing these two base classes, you essentially create a new step. This design makes it easy to develop a new step and plug it into an existing workflow. Workflows can further be customized easily by subclassing WorkflowDependencies or any other already implemented workflow type, such as CanonicalWorkflowDependencies.
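
Schematically, a new step therefore consists of two subclasses, as in the sketch below (import paths and names are assumptions; the abstract members to implement are dictated by the base classes):

from tmlib.workflow.api import WorkflowStepAPI  # import paths assumed
from tmlib.workflow.cli import WorkflowStepCLI

class MystepAPI(WorkflowStepAPI):
    '''Creates, submits and collects the batch jobs of the step.'''

class Mystep(WorkflowStepCLI):
    '''Command line interface of the step.'''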

The main entry point for a step is the __main__() method of WorkflowStepCLI, which is accessed by a WorkflowStepJob via the step-specific command line interface autogenerated from the CLI class derived from WorkflowStepCLI.

For more information on how to develop new steps and combine them into workflows, please refer to the documentation of the tmlib.workflow package. Already implemented steps should further serve as good examples.

Note

Steps and workflows developed server-side are fully functional and don’t require any client-side modifications. They will automatically integrate into the UI workflow manager.

Data analysis tools

Data analysis tools allow users to interactively analyze image data in a visually assisted way in the viewer. They are implemented in the tmlib.tools package and make use of the Pandas library. The server submits client tool requests to the available computational resources for asynchronous processing, mainly to remove load from the web server.

The main entry point for tool functionality is the __main__() method of ToolRequestManager. It is accessed by a ToolJob via the command line using the tm_tool script, which gets autogenerated from the parser provided by ToolRequestManager.

For more information on how to develop new tools and make them available to the UI, please refer to the documentation of the tmlib.tools package. Already implemented tools, such as Clustering or Heatmap, should also serve as examples and good starting points for developing a new tool.
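
Schematically, a new tool is a subclass of the Tool base class, as sketched below (class name and import path are assumptions; the actual interface is defined in tmlib.tools):

from tmlib.tools.base import Tool  # import path assumed

class MyTool(Tool):
    '''Hypothetical tool; implement the request-processing method
    required by the base class so the server can dispatch client
    tool requests to it.
    '''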

Note

In contrast to a workflow step, a tool also requires some frontend development. This design decision was made to give developers as much flexibility as possible when it comes to the design of new tools; the potential use cases are too broad to be handled entirely server-side. Please refer to data analysis tools in the frontend section for more details on how to develop a tool client-side.

A WorkflowStep and a Tool both represent a distributed computational task, but from a conceptual point of view they are two different things. The former is used in the workflow manager for general image processing tasks, while the latter is used in the viewer for machine learning tasks. This doesn’t mean that image processing and machine learning should be handled separately per se. For example, pixel-based image segmentation would be an excellent use case for a tool.

Jterator module development

With jterator, TissueMAPS provides a cross-language pipeline engine for scientific computing and image analysis. The program uses Python as a glue language, but can plug in “modules” written in different languages. It makes use of easily human-readable and modifiable YAML files to define pipeline logic and module input/output.

Python was chosen as the programming language because it represents a good trade-off between development time and performance. The language is relatively easy to learn, and its interpreted nature facilitates scripting and testing. The powerful NumPy package provides a great framework for n-dimensional array operations. In addition, there are numerous established C/C++ image processing libraries with Python bindings that use NumPy arrays as data containers.

This makes it easy to combine algorithms implemented in different libraries into an image analysis workflow. In addition to Python, pipelines can integrate modules written in other programming languages frequently used for scientific computing, such as Matlab and R.

Main ideas

  • Simple development and testing: A module is simply a file that defines a function for the main entry point and creates a namespace.
  • Short list of dependencies: A module only requires the NumPy package.
  • Independence of processing steps: Module arguments are either NumPy arrays, scalars (integer/floating point numbers, strings or booleans), or a sequence of scalars. Modules don’t produce any side effects. They are unit testable.
  • Strict separation of GUI handling and actual processing: Modules don’t interact with a GUI or file system. Their main function receives images in form of arrays as input arguments and returns images in form of arrays. They can optionally generate and return a JSON representation of a figure which can be embedded in a website for interactive visualization.
  • Cross-language compatibility: Restricting module input/output to NumPy arrays and built-in Python types facilitates interfaces to other languages.

Pipeline descriptor file

A pipeline is a sequence of connected modules that collectively represents a computational task (somewhat similar to a UNIX-world pipeline), i.e. a unit of execution that runs in memory on a single compute unit. Order of modules and pipeline input are defined in a .pipe YAML pipeline descriptor file. Input/output settings for each module are provided by additional .handles YAML module I/O descriptor files.

Here is an example of a .pipe.yaml YAML descriptor file:

description: An example pipeline that does nothing.

version: '0.0.1'

input:

    channels:
      - name: channel1
        correct: true
      - name: channel2
        correct: true

pipeline:

    -   source: python_module.py
        handles: handles/my_python_module.handles.yaml
        active: true

    -   source: r_module.r
        handles: handles/my_r_module.handles.yaml
        active: true

    -   source: matlab_module.m
        handles: handles/my_m_module.handles.yaml
        active: true

The pipeline section is an array of the included modules. Module handles files can in principle reside at any location, and the path to the files has to be provided. This path can either be absolute or relative to the project directory (as in the example above). Module source files must reside within the language-specific jtmodules package, since they need to be importable; only the file basename must be provided. Modules are run or skipped depending on the value of active. Alternatively, modules can of course also be deactivated by commenting them out; however, this is incompatible with the user interface.

All channels specified in the input section will be loaded by the program and the corresponding images made available to modules in the pipeline. Images will optionally be corrected for illumination artifacts depending on the value of correct.

Modules

Modules are the actual executable code in the pipeline. A module is a file that defines a main() function, which serves as the main entry point for the program. Modules must be free of side effects; in particular, they must not read from or write to disk. This is enforced by jterator, which calls the module function in a sandbox. Special modules are available for storing data generated within a pipeline, such as segmentation results and features extracted for the segmented objects.

Python modules encapsulate code and provide a separate scope and namespace. Conceptually, they are classes with attributes (constants) and static methods (functions). For compatibility, we use a similar implementation for non-Python languages to provide the user with a similar interface across different languages (Matlab, R, …).

To this end, each module must define a VERSION constant and a main() function. The main function serves as the main entry point and will be called by jterator when the module is executed as part of a pipeline. You can add additional “private” functions/methods to the module. Note, however, that code intended for reuse across modules should rather be imported from a separate library, such as jtlibrary or any other installable package.

Shown here are minimalistic examples of modules in different languages. They don’t do much, except returning one of the input arguments.

Python example
import collections

import jtlib.plotting

VERSION = '0.0.1'

# Container for the module's return value; fields are accessible by name.
Output = collections.namedtuple('Output', ['output_image', 'figure'])

def main(input_image, plot=False):

    if plot:
        figure = jtlib.plotting.create_figure()
    else:
        figure = ''

    return Output(input_image, figure)

The module named python_module (residing in a file called python_module.py) can be imported and called as follows:

import numpy as np
import jtmodules.python_module

img = np.zeros((10, 10))
jtmodules.python_module.main(img)

Note

The return type of main() must be a namedtuple. Instances of this type behave like tuple objects, which can be indexed and are iterable, but in addition their fields are accessible via attribute lookup.

Matlab example

To get the same interface and namespace in Matlab, we need to implement the main() function as a static method of class matlab_module.

import jtlib.*;

classdef matlab_module

    properties (Constant)

        VERSION = '0.0.1'

    end

    methods (Static)

        function [output_image, figure] = main(input_image, plot)

            if nargin < 2
                plot = false;
            end

            if plot
                figure = jtlib.plotting.create_figure();
            else
                figure = '';
            end

            output_image = input_image;

        end

    end
end

The module named matlab_module (residing in a file called matlab_module.m) can thereby be imported and called in the same way as Python modules:

import jtmodules.matlab_module;

img = zeros(10, 10);
jtmodules.matlab_module.main(img)

Note

The Matlab main() function must return output arguments using the [] notation.

Warning

Matlab class struct is not supported for input arguments or return values!

R example

To implement the same interface in R, we have to get a bit more creative, since R is not a proper programming language (Oops! Did I just say that?).

library(jtlib)

r_module <- new.env()

r_module$VERSION <- '0.0.1'

r_module$main <- function(input_image, plot=FALSE){

    output <- list()
    output[['output_image']] <- input_image

    if (plot) {
        output[['figure']] <- jtlib::plotting.create_figure()
    } else {
        output[['figure']] <- ''
    }

    return(output)
}

The module named r_module (residing in a file called r_module.r) can now be imported and called using $ as namespace separator:

library(jtmodules)

img <- matrix(0, 10, 10)
jtmodules::r_module$main(img)

Note

The return value of main() in R must be a list with named members.

Module descriptor files

The input and output of modules are described in module-specific handles files:

version: 0.0.1

input:

    - name: string_example
      type: Character
      value: mystring

    - name: integer_example
      type: Numeric
      value: 1
      options:
        - 1
        - 2

    - name: piped_image_input_example
      type: IntensityImage
      key: a.unique.string

    - name: array_example
      type: Sequence
      value:
        - 2.3
        - 1.7
        - 4.6

    - name: boolean_example
      type: Boolean
      value: true

    - name: plot
      type: Plot
      value: false

output:

    - name: piped_image_output_example
      type: LabelImage
      key: another.unique.string

    - name: figure
      type: Figure

Each handle item in the input section describes an argument that is passed to the main() function of the module. Each item in the output section describes an element of the module-specific output object (return value), which should be returned by the main() function.

The handle type described in the YAML file is mirrored by a Python class, which asserts data types and handles input/output. Constant input arguments have a value key, which represents the actual argument. Images can be piped between modules, and the corresponding input arguments have a key key. It serves as a lookup for the actual value, i.e. the pixel array, which is stored in an in-memory key-value store. The value of key in the YAML description must be unique across the entire pipeline. Since the names of .handles files are unique, best practice is to use the handles filename as a namespace and combine it with the name of the output handle to create a unique hashable identifier (for the above Python example the key would resolve to “my_python_module.output_image”).
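
For illustration, such a key can be composed in Python as follows:

# Compose a pipeline-wide unique key for a piped value from the
# basename of the .handles file and the name of the output handle.
handles_basename = 'my_python_module'
handle_name = 'output_image'
key = '{0}.{1}'.format(handles_basename, handle_name)
# key == 'my_python_module.output_image'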

The following handle types are implemented:

  • Constant input handle types: parameters that specify the actual argument value (derived from InputHandle)
    • Numeric: number (Union[int, float])
    • Character: string (basestring)
    • Boolean: boolean (bool)
    • Sequence: atomic array (List[Union[int, float, basestring, bool]])
    • Plot: boolean (bool)
  • Pipe input and output handle types: parameters that specify a “key” to retrieve the actual argument value (derived from PipeHandle)
    • IntensityImage: grayscale image with 8-bit or 16-bit unsigned integer data type (Union[numpy.uint8, numpy.uint16])
    • LabelImage: labeled image with 32-bit integer data type (numpy.int32)
    • BinaryImage: binary image with boolean data type (numpy.bool)
    • SegmentedObjects: subtype of LabelImage, with additional methods for registering connected components in the image as objects, which can subsequently be used by measurement modules to extract features for the objects
  • Measurement output handle type: parameters that specify object and object_ref to reference instances of SegmentedObjects and optionally channel_ref to reference an instance of IntensityImage (derived from OutputHandle)
    • Measurement: array of multidimensional matrices (one per time point), where columns are features and rows are segmented objects (List[pandas.DataFrame])
  • Figure output handle type: parameters that register the provided value as a figure (derived from OutputHandle)

Values of SegmentedObjects, Measurement, and Figure handles are automatically persisted on disk. To this end, segmented objects need to be registered via the register_objects module.

Note

Values of SegmentedObjects and Measurement will become available in the viewer as objects and features, respectively, and can be used by data analysis tools.

The Plot input and Figure output handle types are used to implement plotting functionality. The program will automatically set plot to false for running in headless mode on the cluster.

Warning

To implement plotting, use the provided Plot and Figure handle types. Otherwise, headless mode can’t be enforced.

Code structure

Modules should be lightweight wrappers and mainly concerned with handling input and returning output in the expected format (and optionally the creation of a figure). Ideally, the actual image processing is delegated to libraries to facilitate reuse of code by other modules. Importing modules in other modules is discouraged. You can use external libraries or implement custom solutions in the provided jtlibrary package (available for each of the implemented languages).

Naming conventions

Since jterator is written in Python, we recommend following the PEP 8 style guide for module and function names. Therefore, we use short, all-lowercase names for modules, with underscores separating words if necessary, e.g. modulename or long_module_name. See naming conventions.

Coding style

For Python, we encourage following the PEP 8 style guide. For Matlab and R, we recommend following Google’s style guidelines; see the Matlab style guide (based on Richard Johnson’s MATLAB Programming Style Guidelines) and the R style guide.

Figures

The plotting library plotly is used to generate interactive plots for the visualization of module results in the web-based user interface. The advantage of this library is that it has a uniform API and generates identical outputs across different languages (Python, Matlab, R, Julia). Each module creates only one figure. If you have the feeling that you need more than one figure, it’s an indication that you should break down your code into multiple modules.
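
For illustration, a figure can be serialized to the JSON representation that modules return with plotly directly (a minimal sketch, independent of the jtlib.plotting helpers used in the module examples above):

import json

import plotly.graph_objs as go
import plotly.utils

# Build a simple heatmap figure and serialize it to JSON, the format
# in which modules return figures for the web-based user interface.
fig = go.Figure(data=[go.Heatmap(z=[[1, 2], [3, 4]])])
figure_json = json.dumps(fig, cls=plotly.utils.PlotlyJSONEncoder)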

Documentation

We use Sphinx with the numpydoc extension to auto-generate module documentation (see also documentation).

For Matlab code, use the Sphinx Matlab domain.

Each module must have a docstring that describes its functionality and purpose. In addition, a docstring must be provided for the main() function that describes input parameters and return values.