tmlib.tools package
Module contents
Data analysis tools.
This package provides tools for interactive data analysis and machine learning.
A tool is an implementation of Tool that can process client requests and persist the result of the analysis in the database in the form of an instance of ToolResult. The client can stream the result provided via LabelLayer and Plot and visualize it on the map in an interactive and responsive manner.
Custom tools can be added by implementing Tool and importing the derived class in tmlib.tools.
Consider the following example for a new tool named Foo. It implements the abstract method process_request and an additional method bar (which does nothing):
from tmlib.tools.base import Tool


class Foo(Tool):

    __icon__ = 'FOO'
    __description__ = 'Does nothing.'

    def __init__(self, experiment_id):
        super(Foo, self).__init__(experiment_id)

    def bar(self, values):
        # Returns the values unchanged.
        return values

    def process_request(self, submission_id, payload):
        mapobject_type_name = payload['chosen_object_type']
        feature_name = payload['selected_feature']
        # Load the selected feature and select its column to get a pandas.Series
        values = self.load_feature_values(mapobject_type_name, [feature_name])
        labels = self.bar(values[feature_name])
        # Register the result; result_type is the name of the result class
        result_id = self.register_result(
            submission_id, mapobject_type_name,
            result_type='ContinuousLabelLayer'
        )
        # Persist the label values for the registered result
        self.save_result_values(mapobject_type_name, result_id, labels)
Note
Each tool also requires a client-side representation.
-
tmlib.tools.get_available_tools()
Lists available tools.
Returns: List[str]
names of available tools
-
tmlib.tools.get_tool_class(name)
Gets the tool-specific implementation of Tool.
Parameters: name: str
name of the tool
Returns: type
tool class
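This reference does not show how get_available_tools and get_tool_class discover tools. One plausible mechanism, sketched here under that assumption, is to walk the subclasses of an abstract Tool base class and keep only concrete classes that implement process_request. The stand-in classes and the functions list_tools and lookup_tool_class are hypothetical, not tmlib code. The sketch also illustrates why a custom tool must be imported in tmlib.tools: a subclass only exists once its defining module has been imported.

```python
import abc
import inspect


class Tool(abc.ABC):
    """Hypothetical stand-in for tmlib.tools.base.Tool."""

    @abc.abstractmethod
    def process_request(self, submission_id, payload):
        pass


class Classifier(Tool):
    """Abstract intermediate class: does not implement process_request."""


class Classification(Classifier):
    def process_request(self, submission_id, payload):
        pass


class Heatmap(Tool):
    def process_request(self, submission_id, payload):
        pass


def _all_subclasses(cls):
    # Walk the inheritance tree recursively (direct and indirect subclasses).
    for sub in cls.__subclasses__():
        yield sub
        yield from _all_subclasses(sub)


def list_tools():
    """Returns the names of all concrete (non-abstract) Tool subclasses."""
    return sorted(
        c.__name__ for c in _all_subclasses(Tool) if not inspect.isabstract(c)
    )


def lookup_tool_class(name):
    """Returns the concrete Tool subclass called `name`."""
    for c in _all_subclasses(Tool):
        if c.__name__ == name and not inspect.isabstract(c):
            return c
    raise ValueError('unknown tool: "%s"' % name)


print(list_tools())  # ['Classification', 'Heatmap']
```

Filtering with inspect.isabstract keeps intermediate base classes such as Classifier out of the list, which matches the tool names accepted by tm_tool.py.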
Submodules
tmlib.tools.base module
Base classes for data analysis tools.
-
class tmlib.tools.base.Classifier(experiment_id)
Bases: tmlib.tools.base.Tool
Abstract base class for classification tools.
Parameters: experiment_id: int
ID of the experiment for which the tool request is made
-
predict(feature_data, model, scaler=None)
Predicts class labels for mapobjects based on feature_data using a pre-trained model.
Parameters: feature_data: pandas.DataFrame
feature values based on which labels should be predicted
model: sklearn.base.BaseEstimator
model fitted on training data
scaler: sklearn.preprocessing.data.RobustScaler, optional
scaler fitted on training data to rescale feature_data the same way
Returns: pandas.Series
predicted labels for each mapobject
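The scaler argument ensures that new data are rescaled with statistics learned from the training data rather than recomputed on the data being classified. The following stdlib sketch shows what RobustScaler does with its default settings: it centers on the median and divides by the interquartile range (25th to 75th percentile), using linear interpolation between data points. The function names are hypothetical; this is neither tmlib nor scikit-learn code.

```python
def _percentile(sorted_values, q):
    """Linearly interpolated percentile of sorted data, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def fit_robust_scaler(training_values):
    """Learns center (median) and scale (interquartile range) from training data."""
    s = sorted(training_values)
    center = _percentile(s, 0.5)
    scale = _percentile(s, 0.75) - _percentile(s, 0.25)
    # Avoid division by zero for constant features.
    if scale == 0:
        scale = 1.0
    return center, scale


def transform(values, center, scale):
    """Rescales values with the statistics fitted on the training data."""
    return [(v - center) / scale for v in values]


center, scale = fit_robust_scaler([1, 2, 3, 4, 5])
print(transform([3, 5, 7], center, scale))  # [0.0, 1.0, 2.0]
```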
-
train_supervised(feature_data, labels, method, n_fold_cv)
Trains a classifier for mapobjects based on feature_data and known labels.
Parameters: feature_data: pandas.DataFrame
feature values that should be used to train the classifier
labels: Dict[int, int]
mapping of Mapobject ID to assigned label
method: str
method to use for classification
n_fold_cv: int
number of cross-validation iterations (n-fold)
Returns: Tuple[sklearn.base.BaseEstimator]
trained supervised classifier and scaler
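In n-fold cross-validation, the labeled mapobjects are split into n folds, and each fold is held out once for validation while the classifier is trained on the remaining folds. The following hypothetical sketch shows only the split, without shuffling. The fold sizes follow the same rule as scikit-learn's KFold: the first len(ids) % n folds get one extra element. How tmlib uses the folds (for example, for parameter search) is not documented here.

```python
def n_fold_splits(object_ids, n_fold_cv):
    """Yields (train_ids, test_ids) pairs for n-fold cross-validation.

    Folds are contiguous blocks of near-equal size; the first
    len(object_ids) % n_fold_cv folds get one extra element.
    """
    ids = list(object_ids)
    base, extra = divmod(len(ids), n_fold_cv)
    start = 0
    for fold in range(n_fold_cv):
        size = base + (1 if fold < extra else 0)
        test = ids[start:start + size]
        train = ids[:start] + ids[start + size:]
        yield train, test
        start += size


for train, test in n_fold_splits([11, 12, 13, 14, 15], 2):
    print(train, test)
# [14, 15] [11, 12, 13]
# [11, 12, 13] [14, 15]
```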
-
train_unsupervised(feature_data, k, method)
Trains a classifier that groups mapobjects into k classes based on feature_data.
Parameters: feature_data: pandas.DataFrame
feature values that should be used to train the classifier
k: int
number of classes
method: str
model to use for clustering
Returns: Tuple[sklearn.base.BaseEstimator]
trained unsupervised classifier and scaler
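For unsupervised classification, k is the number of groups the clustering model should produce. This reference does not list which values method accepts. As an illustration of one common clustering method, the following is a minimal k-means (Lloyd's algorithm) sketch in plain Python. Initializing the centroids with the first k points is a simplification chosen for determinism, and the function is hypothetical, not tmlib's implementation.

```python
import math


def kmeans(points, k, n_iter=100):
    """Groups points (tuples of numbers) into k clusters with Lloyd's algorithm.

    Returns one cluster label (0..k-1) per point. Assumes k <= len(points).
    """
    # Simplified initialization: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
print(kmeans(points, 2))  # [0, 1, 0, 1, 0, 1]
```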
-
class tmlib.tools.base.Tool(experiment_id)
Bases: object
Abstract base class for data analysis tools.
Tools use the pandas DataFrame as data container, which is compatible with standard machine learning libraries such as scikit-learn, Caffe, or Keras.
Parameters: experiment_id: int
ID of the experiment for which the tool request is made
-
calculate_extrema(mapobject_type_name, feature_name)
Calculates minimum and maximum values of a given feature and mapobject type.
Parameters: mapobject_type_name: str
name of the selected MapobjectType
feature_name: str
name of the selected Feature
Returns: Tuple[float]
minimum and maximum value
-
get_random_mapobject_subset(mapobject_type_name, n)
Selects a random subset of mapobjects.
Parameters: mapobject_type_name: str
name of the selected MapobjectType
n: int
number of mapobjects that should be selected at random
Returns: Tuple[int]
IDs of the selected mapobjects
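A random subset is useful when a task only needs a manageable sample of objects. The following sketch shows the selection step over a list of IDs using the standard random module. The database query that would supply the IDs is omitted. The function random_subset and its seed parameter are hypothetical additions for reproducibility and are not part of the documented signature.

```python
import random


def random_subset(ids, n, seed=None):
    """Returns a tuple of n distinct IDs drawn uniformly at random from ids.

    Raises ValueError if n exceeds the number of available IDs.
    """
    rng = random.Random(seed)
    return tuple(rng.sample(list(ids), n))


subset = random_subset(range(1, 101), 5, seed=0)
print(len(subset), len(set(subset)))  # 5 5
```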
-
identify_features_with_null_values(feature_data)
Identifies features with NULL values (including NaNs).
Parameters: feature_data: pandas.DataFrame
data frame where columns are feature names and rows are objects
Returns: Tuple[Union[str, int]]
name of the feature and the number of objects with NULL values
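Database NULL values typically show up as None or NaN after loading, so both need to be counted. The following stdlib sketch works on a hypothetical column mapping (feature name to list of values) instead of a DataFrame. It returns one (feature name, count) pair per affected feature; this return shape is an assumption. With pandas, the per-column counts are available as feature_data.isnull().sum().

```python
import math


def count_null_values(columns):
    """Returns (feature_name, n_null) for each feature containing NULL values.

    `columns` maps feature names to lists of values; both None and
    float NaN are treated as NULL.
    """
    def is_null(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    result = []
    for name, values in columns.items():
        n_null = sum(1 for v in values if is_null(v))
        if n_null > 0:
            result.append((name, n_null))
    return result


data = {
    'area': [10.0, 12.5, None],
    'intensity': [0.3, float('nan'), float('nan')],
    'eccentricity': [0.1, 0.2, 0.3],
}
print(count_null_values(data))  # [('area', 1), ('intensity', 2)]
```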
-
load_feature_values(mapobject_type_name, feature_names, mapobject_ids=None)
Loads values for each given feature of the given mapobject type.
Parameters: mapobject_type_name: str
name of the selected MapobjectType
feature_names: List[str]
name of each selected Feature
mapobject_ids: List[int], optional
ID of each Mapobject for which values should be selected; if None, values for all objects will be loaded (default: None)
Returns: pandas.DataFrame
dataframe where columns are features and rows are mapobjects indexable by their ID
-
partition_mapobjects(mapobject_type_name, n)
Splits mapobjects into partitions of size n.
Parameters: mapobject_type_name: str
name of the selected MapobjectType
n: int
number of mapobjects per partition
Returns: List[List[int]]
mapobject IDs
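Partitioning lets a tool process large numbers of mapobjects in batches. The following sketch shows the splitting logic over a list of IDs. The query for the IDs of the given mapobject type is omitted, and the function name partition is hypothetical. The last partition holds the remainder and can be smaller than n.

```python
def partition(ids, n):
    """Splits ids into consecutive lists of at most n elements."""
    if n < 1:
        raise ValueError('partition size must be positive')
    return [ids[i:i + n] for i in range(0, len(ids), n)]


print(partition([1, 2, 3, 4, 5, 6, 7], 3))  # [[1, 2, 3], [4, 5, 6], [7]]
```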
-
process_request(submission_id, payload)
Processes a tool request sent by the client.
Parameters: submission_id: int
ID of the corresponding job submission
payload: dict
an arbitrary mapping provided by the client that describes the job
-
register_result(submission_id, mapobject_type_name, result_type, **result_attributes)
Registers a result for the given tool request.
Parameters: submission_id: int
ID of the corresponding job submission
mapobject_type_name: str
name of the selected MapobjectType
result_type: str
name of a class derived from ToolResult
**result_attributes: dict, optional
result-specific attributes as key-value pairs that get passed to the constructor of the implemented result_type
Returns: int
ID of the tool result
-
save_result_values(mapobject_type_name, result_id, data)
Saves generated label values.
Parameters: mapobject_type_name: str
name of the selected MapobjectType
result_id: int
ID of a registered ToolResult
data: pandas.Series
series with multi-level index for “mapobject_id” and “tpoint”
See also
tmlib.models.result.LabelValues
-
tmlib.tools.classification module
-
class tmlib.tools.classification.Classification(experiment_id)
Bases: tmlib.tools.base.Classifier
Tool for supervised classification.
Parameters: experiment_id: int
ID of the experiment for which the tool request is made
-
process_request(submission_id, payload)
Processes a client tool request and inserts the generated result into the database. The payload is expected to have the following form:
{
    "choosen_object_type": str,
    "selected_features": [str, ...],
    "training_classes": [
        {
            "name": str,
            "object_ids": [int, ...],
            "color": str
        },
        ...
    ],
    "options": {
        "method": str,
        "n_fold_cv": int
    }
}
Parameters: submission_id: int
ID of the corresponding job submission
payload: dict
description of the tool job
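train_supervised expects labels as a mapping of Mapobject ID to label, whereas the client sends training_classes as a list of named classes with object IDs. The following hypothetical sketch shows such a conversion, assuming each class is numbered by its position in training_classes. The encoding tmlib actually uses is not documented here, and all payload values below are made up.

```python
def training_labels(payload):
    """Maps each mapobject ID to the index of its training class.

    Returns (labels, class_names); raises ValueError if an object is
    assigned to more than one class.
    """
    labels = {}
    class_names = []
    for label, training_class in enumerate(payload['training_classes']):
        class_names.append(training_class['name'])
        for object_id in training_class['object_ids']:
            if object_id in labels:
                raise ValueError('object %d has multiple labels' % object_id)
            labels[object_id] = label
    return labels, class_names


# Hypothetical example payload following the documented form.
payload = {
    'choosen_object_type': 'Cells',
    'selected_features': ['Feature_A'],
    'training_classes': [
        {'name': 'small', 'object_ids': [1, 4], 'color': '#ff0000'},
        {'name': 'large', 'object_ids': [2, 3], 'color': '#0000ff'},
    ],
    'options': {'method': 'some_method', 'n_fold_cv': 5},
}
print(training_labels(payload))
# ({1: 0, 4: 0, 2: 1, 3: 1}, ['small', 'large'])
```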
-
tmlib.tools.clustering module
-
class tmlib.tools.clustering.Clustering(experiment_id)
Bases: tmlib.tools.base.Classifier
Parameters: experiment_id: int
ID of the experiment for which the tool request is made
-
process_request(submission_id, payload)
Processes a client tool request and inserts the generated result into the database. The payload is expected to have the following form:
{
    "choosen_object_type": str,
    "selected_features": [str, ...],
    "options": {
        "method": str,
        "k": int
    }
}
Parameters: submission_id: int
ID of the corresponding job submission
payload: dict
description of the tool job
-
tmlib.tools.heatmap module
-
class tmlib.tools.heatmap.Heatmap(experiment_id)
Bases: tmlib.tools.base.Tool
Parameters: experiment_id: int
ID of the experiment for which the tool request is made
-
process_request(submission_id, payload)
Processes a client tool request, where the payload is expected to have the following form:
{
    "choosen_object_type": str,
    "selected_feature": str
}
Parameters: submission_id: int
ID of the corresponding job submission
payload: dict
description of the tool job
-
tmlib.tools.jobs module
-
class tmlib.tools.jobs.ToolJob(tool_name, arguments, output_dir, submission_id, user_name)
Bases: tmlib.jobs.Job
Class for a tool job, which can be submitted to a cluster for asynchronous processing of a client request.
Parameters: tool_name: str
name of the respective tool
arguments: List[str]
command line arguments
output_dir: str
absolute path to the output directory, where log reports will be stored
submission_id: int
ID of the corresponding submission
user_name: str
name of the submitting user
-
name
str: name of the job
-
tmlib.tools.manager module
-
class tmlib.tools.manager.ToolRequestManager(experiment_id, tool_name, verbosity)
Bases: tmlib.submission.SubmissionManager
Command line interface for handling TissueMAPS tool requests.
Parameters: experiment_id: int
ID of the processed experiment
tool_name: str
name of the corresponding tool
verbosity: int
logging verbosity level
-
create_job(submission_id, user_name, duration='06:00:00', memory=4294.967296, cores=2)
Creates a job for asynchronous processing of a client tool request.
Parameters: submission_id: int
ID of the corresponding submission
user_name: str
name of the submitting user
duration: str, optional
computational time that should be allocated for the job in HH:MM:SS format (default: "06:00:00")
memory: int, optional
amount of memory in megabytes that should be allocated for the job (defaults to resource.max_cores_per_job x resource.max_memory_per_core)
cores: int, optional
number of CPU cores that should be allocated for the job (defaults to resource.max_cores_per_job)
Returns: tmlib.tools.jobs.ToolJob
tool job
-
get_log_output(submission_id)
Gets log output (standard output and error).
Parameters: submission_id: int
ID of the tool job Submission
Returns: Dict[str, str]
“stdout” and “stderr” for the given job
-
get_payload(submission_id)
Gets the payload for a tool request.
Parameters: submission_id: int
ID of the respective Submission
Returns: dict
payload
-
store_payload(payload, submission_id)
Persists the payload for a tool request on disk.
Parameters: payload: dict
tool job description in the form of a mapping of key-value pairs
submission_id: int
ID of the corresponding Submission
Raises: TypeError
when payload is not a mapping
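store_payload and get_payload persist the client payload on disk, keyed by submission ID, so that the job submitted to the cluster can read it back. The following minimal sketch uses the standard json module and assumes a hypothetical layout of one JSON file per submission in a given directory. The file format and location tmlib actually uses are not documented here, and save_payload_file and load_payload_file are illustrative names only.

```python
import collections.abc
import json
import os
import tempfile


def save_payload_file(directory, payload, submission_id):
    """Writes payload to <directory>/payload_<submission_id>.json."""
    if not isinstance(payload, collections.abc.Mapping):
        raise TypeError('Argument "payload" must be a mapping.')
    filename = os.path.join(directory, 'payload_%d.json' % submission_id)
    with open(filename, 'w') as f:
        json.dump(dict(payload), f)


def load_payload_file(directory, submission_id):
    """Reads the payload stored for submission_id."""
    filename = os.path.join(directory, 'payload_%d.json' % submission_id)
    with open(filename) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    save_payload_file(tmp, {'selected_feature': 'area'}, 42)
    print(load_payload_file(tmp, 42))  # {'selected_feature': 'area'}
```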
-
tm_tool.py
TissueMAPS command line interface for processing client tool requests.
-
experiment_id
ID of the experiment that should be processed
-
-h, --help
show this help message and exit
-
--verbosity, -v
increase logging verbosity
-
--name {Clustering,Heatmap,Classification}, -n {Clustering,Heatmap,Classification}
name of the tool
-
--submission_id <submission_id>, -s <submission_id>
ID of the corresponding submission
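The options above map onto a standard argparse parser. The following sketch rebuilds an equivalent parser for illustration only and is not the tm_tool.py source. Several details are assumptions: that --name and --submission_id are required, that the IDs are parsed as integers, and that each -v increments the verbosity by one.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog='tm_tool.py',
        description='TissueMAPS command line interface for processing '
                    'client tool requests.'
    )
    parser.add_argument(
        'experiment_id', type=int,
        help='ID of the experiment that should be processed'
    )
    # Assumption: each -v increases the verbosity level by one.
    parser.add_argument(
        '--verbosity', '-v', action='count', default=0,
        help='increase logging verbosity'
    )
    parser.add_argument(
        '--name', '-n', required=True,
        choices=['Clustering', 'Heatmap', 'Classification'],
        help='name of the tool'
    )
    parser.add_argument(
        '--submission_id', '-s', type=int, required=True,
        help='ID of the corresponding submission'
    )
    return parser


args = build_parser().parse_args(['1', '-vv', '--name', 'Heatmap', '-s', '7'])
print(args.experiment_id, args.verbosity, args.name, args.submission_id)
# 1 2 Heatmap 7
```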