tmlib.tools package

Module contents

Data analysis tools.

This package provides tools for interactive data analysis and machine learning. A tool is an implementation of Tool that can process client requests and persist the result of the analysis in the form of an instance of ToolResult in the database. The client can stream the result provided via LabelLayer and Plot and visualize it on the map in an interactive and responsive manner.

Custom tools can be added by implementing Tool and importing the derived class in tmlib.tools.

Consider the following example for a new tool named Foo. It implements the abstract method process_request and an additional method bar (which does nothing):

from tmlib.tools.base import Tool


class Foo(Tool):

    __icon__ = 'FOO'

    __description__ = 'Does nothing.'

    def __init__(self, experiment_id):
        super(Foo, self).__init__(experiment_id)

    def bar(self, values):
        return values

    def process_request(self, submission_id, payload):
        mapobject_type_name = payload['chosen_object_type']
        feature_name = payload['selected_feature']

        values = self.load_feature_values(mapobject_type_name, [feature_name])
        labels = self.bar(values)

        result_id = self.register_result(
            submission_id, mapobject_type_name,
            label_type='ContinuousLabelLayer'
        )

        self.save_result_values(mapobject_type_name, result_id, labels)

Note

Each tool also requires a client-side representation.

tmlib.tools.get_available_tools()

Lists available tools.

Returns:

List[str]

names of available tools

tmlib.tools.get_tool_class(name)

Gets the tool-specific implementation of Tool.

Parameters:

name: str

name of the tool

Returns:

type

tool class
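
The name-to-class lookup described by get_available_tools() and get_tool_class() can be pictured with a plain dictionary. The following is a minimal sketch that does not import tmlib: the stand-in Tool, Heatmap and Clustering classes and the _REGISTRY name are assumptions made for illustration, and the package itself may resolve names differently.

```python
class Tool(object):
    """Stand-in for tmlib.tools.base.Tool (hypothetical, for illustration)."""

    def __init__(self, experiment_id):
        self.experiment_id = experiment_id


class Heatmap(Tool):
    pass


class Clustering(Tool):
    pass


# Hypothetical registry that maps tool names to their classes.
_REGISTRY = {cls.__name__: cls for cls in (Heatmap, Clustering)}


def get_available_tools():
    """Lists the names of the registered tools in alphabetical order."""
    return sorted(_REGISTRY)


def get_tool_class(name):
    """Returns the class registered under `name`."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError('Unknown tool: "%s"' % name)


# Look up a tool class by name and instantiate it for an experiment.
tool_cls = get_tool_class('Heatmap')
tool = tool_cls(experiment_id=1)
```

Raising an error for unknown names, rather than returning None, is a design choice of this sketch.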

Submodules

tmlib.tools.base module

Base classes for data analysis tools.

class tmlib.tools.base.Classifier(experiment_id)

Bases: tmlib.tools.base.Tool

Abstract base class for classification tools.

Parameters:

experiment_id: int

ID of the experiment for which the tool request is made

predict(feature_data, model, scaler=None)

Predicts class labels for mapobjects based on feature_data using a pre-trained model.

Parameters:

feature_data: pandas.DataFrame

feature values based on which labels should be predicted

model: sklearn.base.BaseEstimator

model fitted on training data

scaler: sklearn.preprocessing.data.RobustScaler, optional

scaler fitted on training data to rescale feature_data the same way

Returns:

pandas.Series

predicted labels for each mapobject

train_supervised(feature_data, labels, method, n_fold_cv)

Trains a classifier for mapobjects based on feature_data and known labels.

Parameters:

feature_data: pandas.DataFrame

feature values that should be used to train the classifier

labels: Dict[int, int]

mapping of Mapobject ID to assigned label

method: str

method to use for classification

n_fold_cv: int

number of cross-validation iterations (n-fold)

Returns:

Tuple[sklearn.base.BaseEstimator]

trained supervised classifier and scaler

train_unsupervised(feature_data, k, method)

Trains a classifier that groups mapobjects into k classes based on feature_data.

Parameters:

feature_data: pandas.DataFrame

feature values that should be used to train the classifier

k: int

number of classes

method: str

model to use for clustering

Returns:

Tuple[sklearn.base.BaseEstimator]

trained unsupervised classifier and scaler

class tmlib.tools.base.Tool(experiment_id)

Bases: object

Abstract base class for data analysis tools.

Tools use the Pandas DataFrame data container. This is compatible with standard machine learning libraries, such as Scikit-Learn, Caffe or Keras.

Parameters:

experiment_id: int

ID of the experiment for which the tool request is made

calculate_extrema(mapobject_type_name, feature_name)

Calculates minimum and maximum values of a given feature and mapobject type.

Parameters:

mapobject_type_name: str

name of the selected MapobjectType

feature_name: str

name of the selected Feature

Returns:

Tuple[float]

min and max

get_random_mapobject_subset(mapobject_type_name, n)

Selects a random subset of mapobjects.

Parameters:

mapobject_type_name: str

name of the selected MapobjectType

n: int

number of mapobjects that should be selected at random

Returns:

Tuple[int]

IDs of selected mapobjects
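
The sampling step can be sketched with the standard library alone. This is an illustration, not the method's actual implementation: the helper name get_random_subset, the seed parameter and the in-memory list of IDs are assumptions (the real method presumably queries the database for the given mapobject type).

```python
import random


def get_random_subset(mapobject_ids, n, seed=None):
    """Draws `n` distinct IDs at random, without replacement."""
    if n > len(mapobject_ids):
        raise ValueError('Cannot select more mapobjects than available.')
    rng = random.Random(seed)  # a seed makes the selection reproducible
    return tuple(rng.sample(list(mapobject_ids), n))


ids = list(range(1, 101))
subset = get_random_subset(ids, 10, seed=0)
```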

identify_features_with_null_values(feature_data)

Identifies features with NULL values (including NaNs).

Parameters:

feature_data: pandas.DataFrame

data frame where columns are feature names and rows are objects

Returns:

Tuple[Union[str, int]]

name of the feature and the number of objects with NULL values
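
To make the return value concrete, here is a minimal sketch of the counting logic. It uses a plain dictionary of columns instead of a pandas.DataFrame so that it runs without third-party packages, and it returns a list of (name, count) pairs, which is one possible reading of the documented return type. With a real data frame, feature_data.isnull().sum() yields comparable per-column counts.

```python
import math


def is_null(value):
    """Treats both None and NaN as NULL."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def identify_features_with_null_values(feature_data):
    """Returns (feature name, number of NULL values) for affected features.

    `feature_data` is assumed to map feature names to lists of values.
    """
    null_features = []
    for name, values in feature_data.items():
        count = sum(1 for v in values if is_null(v))
        if count > 0:
            null_features.append((name, count))
    return null_features


feature_data = {
    'Area': [10.0, float('nan'), 12.5],
    'Intensity': [0.1, 0.2, 0.3],
    'Eccentricity': [None, None, 0.5],
}
null_features = identify_features_with_null_values(feature_data)
```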

load_feature_values(mapobject_type_name, feature_names, mapobject_ids=None)

Loads values for each given feature of the given mapobject type.

Parameters:

mapobject_type_name: str

name of the selected MapobjectType

feature_names: List[str]

name of each selected Feature

mapobject_ids: List[int], optional

ID of each Mapobject for which values should be selected; if None, values for all objects will be loaded (default: None)

Returns:

pandas.DataFrame

dataframe where columns are features and rows are mapobjects indexable by their ID

partition_mapobjects(mapobject_type_name, n)

Splits mapobjects into partitions of size n.

Parameters:

mapobject_type_name: str

name of the selected MapobjectType

n: int

number of mapobjects per partition

Returns:

List[List[int]]

mapobject IDs
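
Partitioning into chunks of size n can be sketched as follows. The helper name partition and the in-memory list of IDs are assumptions made for illustration; the method itself takes a mapobject type name and presumably looks up the IDs in the database.

```python
def partition(mapobject_ids, n):
    """Splits a list of IDs into consecutive partitions of at most `n` IDs."""
    if n < 1:
        raise ValueError('Partition size must be a positive integer.')
    return [mapobject_ids[i:i + n] for i in range(0, len(mapobject_ids), n)]


partitions = partition([1, 2, 3, 4, 5, 6, 7], 3)
```

The last partition holds the remainder and may therefore contain fewer than n IDs.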

process_request(submission_id, payload)

Processes a tool request sent by the client.

Parameters:

submission_id: int

ID of the corresponding job submission

payload: dict

an arbitrary mapping provided by the client that describes the job

register_result(submission_id, mapobject_type_name, result_type, **result_attributes)

Registers a result for the given tool request.

Parameters:

submission_id: int

ID of the corresponding job submission

mapobject_type_name: str

name of the selected MapobjectType

result_type: str

name of a class derived from ToolResult

**result_attributes: dict, optional

result-specific attributes as key-value pairs that get passed to the constructor of the implemented result_type

Returns:

int

ID of the tool result

save_result_values(mapobject_type_name, result_id, data)

Saves generated label values.

Parameters:

mapobject_type_name: str

name of the selected MapobjectType

result_id: int

ID of a registered ToolResult

data: pandas.Series

series with multi-level index for “mapobject_id” and “tpoint”

See also

tmlib.models.result.LabelValues

tmlib.tools.classification module

class tmlib.tools.classification.Classification(experiment_id)

Bases: tmlib.tools.base.Classifier

Tool for supervised classification.

Parameters:

experiment_id: int

ID of the experiment for which the tool request is made

process_request(submission_id, payload)

Processes a client tool request and inserts the generated result into the database. The payload is expected to have the following form:

{
    "chosen_object_type": str,
    "selected_features": [str, ...],
    "training_classes": [
        {
            "name": str,
            "object_ids": [int, ...],
            "color": str
        },
        ...
    ],
    "options": {
        "method": str,
        "n_fold_cv": int
    }
}

Parameters:

submission_id: int

ID of the corresponding job submission

payload: dict

description of the tool job
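
The "training_classes" entries of the payload have to be turned into the labels mapping (Mapobject ID to assigned label) that train_supervised() expects. The following sketch shows one way to do that. The helper name build_label_mapping, the use of the class position as integer label, and the example class names, IDs and colors are all assumptions; the tool may encode labels differently.

```python
def build_label_mapping(training_classes):
    """Maps each mapobject ID to the index of its training class."""
    labels = {}
    for label, training_class in enumerate(training_classes):
        for object_id in training_class['object_ids']:
            if object_id in labels:
                raise ValueError(
                    'Mapobject %d was assigned to more than one class.'
                    % object_id
                )
            labels[object_id] = label
    return labels


# Hypothetical training classes as they might be sent by the client.
training_classes = [
    {'name': 'mitotic', 'object_ids': [1, 5, 9], 'color': '#ff0000'},
    {'name': 'interphase', 'object_ids': [2, 3], 'color': '#0000ff'},
]
labels = build_label_mapping(training_classes)
```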

tmlib.tools.clustering module

class tmlib.tools.clustering.Clustering(experiment_id)

Bases: tmlib.tools.base.Classifier

Parameters:

experiment_id: int

ID of the experiment for which the tool request is made

process_request(submission_id, payload)

Processes a client tool request and inserts the generated result into the database. The payload is expected to have the following form:

{
    "chosen_object_type": str,
    "selected_features": [str, ...],
    "options": {
        "method": str,
        "k": int
    }
}

Parameters:

submission_id: int

ID of the corresponding job submission

payload: dict

description of the tool job

tmlib.tools.heatmap module

class tmlib.tools.heatmap.Heatmap(experiment_id)

Bases: tmlib.tools.base.Tool

Parameters:

experiment_id: int

ID of the experiment for which the tool request is made

process_request(submission_id, payload)

Processes a client tool request, where the payload is expected to have the following form:

{
    "chosen_object_type": str,
    "selected_feature": str
}

Parameters:

submission_id: int

ID of the corresponding job submission

payload: dict

description of the tool job

tmlib.tools.jobs module

class tmlib.tools.jobs.ToolJob(tool_name, arguments, output_dir, submission_id, user_name)

Bases: tmlib.jobs.Job

Class for a tool job, which can be submitted to a cluster for asynchronous processing of a client request.

Parameters:

tool_name: str

name of the respective tool

arguments: List[str]

command line arguments

output_dir: str

absolute path to the output directory, where log reports will be stored

submission_id: int

ID of the corresponding submission

user_name: str

name of the submitting user

name

str: name of the job

tmlib.tools.manager module

class tmlib.tools.manager.ToolRequestManager(experiment_id, tool_name, verbosity)

Bases: tmlib.submission.SubmissionManager

Command line interface for handling TissueMAPS tool requests.

Parameters:

experiment_id: int

ID of the processed experiment

tool_name: str

name of the corresponding tool

verbosity: int

logging verbosity level

create_job(submission_id, user_name, duration='06:00:00', memory=4294.967296, cores=2)

Creates a job for asynchronous processing of a client tool request.

Parameters:

submission_id: int

ID of the corresponding submission

user_name: str

name of the submitting user

duration: str, optional

computational time that should be allocated for the job in HH:MM:SS format (default: "06:00:00")

memory: int, optional

amount of memory in megabytes that should be allocated for the job (defaults to resource.max_cores_per_job x resource.max_memory_per_core)

cores: int, optional

number of CPU cores that should be allocated for the job (defaults to resource.max_cores_per_job)

Returns:

tmlib.tools.jobs.ToolJob

tool job
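
The duration argument is a string in HH:MM:SS format. A caller that wants to validate such a value before submitting a job could use something like the sketch below. The helper parse_duration is hypothetical and not part of tmlib; it only illustrates the expected format.

```python
import datetime


def parse_duration(duration):
    """Converts an "HH:MM:SS" string into a datetime.timedelta."""
    parts = duration.split(':')
    if len(parts) != 3:
        raise ValueError('Duration must have the format "HH:MM:SS".')
    hours, minutes, seconds = (int(p) for p in parts)
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError('Duration is out of range: "%s"' % duration)
    return datetime.timedelta(hours=hours, minutes=minutes, seconds=seconds)


default_duration = parse_duration('06:00:00')
```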

get_log_output(submission_id)

Gets log output (standard output and error).

Parameters:

submission_id: int

ID of the tool job Submission

Returns:

Dict[str, str]

“stdout” and “stderr” for the given job

get_payload(submission_id)

Gets the payload for a tool request.

Parameters:

submission_id: int

ID of the respective Submission

Returns:

dict

payload

store_payload(payload, submission_id)

Persists payload for tool request on disk.

Parameters:

payload: dict

tool job description in form of a mapping of key-value pairs

submission_id: int

ID of the corresponding Submission

Raises:

TypeError

when payload is not a mapping
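
The round trip between store_payload() and get_payload() can be illustrated with a small stand-alone sketch. The JSON serialization, the file naming scheme and the extra directory argument are assumptions made for this example; the manager class determines the actual storage location and format itself.

```python
import collections.abc
import json
import os
import tempfile


def _payload_path(directory, submission_id):
    # Hypothetical naming scheme: one JSON file per submission.
    return os.path.join(directory, 'payload_%d.json' % submission_id)


def store_payload(payload, submission_id, directory):
    """Persists the payload on disk; raises TypeError for non-mappings."""
    if not isinstance(payload, collections.abc.Mapping):
        raise TypeError('Argument "payload" must be a mapping.')
    with open(_payload_path(directory, submission_id), 'w') as f:
        json.dump(dict(payload), f)


def get_payload(submission_id, directory):
    """Loads the payload that was stored for the given submission."""
    with open(_payload_path(directory, submission_id)) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    store_payload({'selected_feature': 'Area'}, 42, tmp_dir)
    restored = get_payload(42, tmp_dir)
```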

tm_tool.py

TissueMAPS command line interface for processing client tool requests.

experiment_id

ID of the experiment that should be processed

-h, --help

show this help message and exit

--verbosity, -v

increase logging verbosity

--name {Clustering,Heatmap,Classification}, -n {Clustering,Heatmap,Classification}

name of the tool

--submission_id <submission_id>, -s <submission_id>

ID of the corresponding submission
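
For orientation, the documented arguments correspond roughly to the following argparse definition. This is a sketch reconstructed from the help text above, not the script's source: the integer types, the required flags and the counting behaviour of --verbosity are assumptions.

```python
import argparse


def build_parser():
    """Builds a parser that mirrors the documented tm_tool.py arguments."""
    parser = argparse.ArgumentParser(
        prog='tm_tool.py',
        description=(
            'TissueMAPS command line interface for processing client '
            'tool requests.'
        )
    )
    parser.add_argument(
        'experiment_id', type=int,
        help='ID of the experiment that should be processed'
    )
    parser.add_argument(
        '--verbosity', '-v', action='count', default=0,
        help='increase logging verbosity'
    )
    parser.add_argument(
        '--name', '-n', required=True,
        choices=['Clustering', 'Heatmap', 'Classification'],
        help='name of the tool'
    )
    parser.add_argument(
        '--submission_id', '-s', type=int, required=True,
        metavar='<submission_id>',
        help='ID of the corresponding submission'
    )
    return parser


# Equivalent to: tm_tool.py 1 -vv --name Heatmap --submission_id 42
args = build_parser().parse_args(
    ['1', '-vv', '--name', 'Heatmap', '--submission_id', '42']
)
```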