tmlib package

Module contents

TissueMAPS library.

The package provides application programming and command line interfaces for distributed image analysis.

It has three subpackages that serve different tasks:

  • tmlib.models: object-relational mapper classes for interaction with the database
  • tmlib.workflow: modular routines for distributed image processing
  • tmlib.tools: plugins for interactive machine learning

Subpackages

Submodules

tmlib.config module

class tmlib.config.LibraryConfig

Bases: tmlib.config.TmapsConfig

TissueMAPS configuration specific to the tmlib package.

formats_home

str: absolute path to the root directory of the local copy of the TissueMAPS/TmFormats repository (default: "~/tmformats")

modules_home

str: absolute path to the directory that contains the jterator module source code files (default: "~/jtlibrary/modules")

resource

gc3libs.utils.Struct: information about the enabled GC3Pie resource

storage_home

str: absolute path to root directory of file system storage

class tmlib.config.TmapsConfig

Bases: object

Abstract base class for TissueMAPS configuration settings.

TissueMAPS code is distributed across multiple Python packages, but configuration settings are bundled in one global config file (CONFIG_FILE) with an INI-like file format.

The environment variable TMAPS_CONFIG_FILE can be used to override the default location of the file.

Properties defined on this base class are written into the DEFAULT section of the file. Each package that requires configuration should implement this base class and create a separate package-specific section in the config file for the derived class.
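
The layout described above can be sketched with the standard library's configparser: options written to the DEFAULT section act as fallbacks visible from every package-specific section. The section name "tmlib" and the option names below are illustrative assumptions, not the actual config schema.

```python
from configparser import ConfigParser

# Hypothetical config file content: base-class settings in [DEFAULT],
# one package-specific section per derived class.
CONFIG_TEXT = """
[DEFAULT]
db_user = tissuemaps
db_host = localhost
db_port = 5432

[tmlib]
storage_home = /data/tissuemaps
"""

config = ConfigParser()
config.read_string(CONFIG_TEXT)
# Options from [DEFAULT] are visible from every package-specific section.
print(config.get('tmlib', 'db_host'))       # localhost
print(config.get('tmlib', 'storage_home'))  # /data/tissuemaps
```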

build_db_worker_uri(host, port)

Builds URI for a “worker” database server.

Parameters:

host: str

IP address or domain name of the server

port: int

port to which the server listens

Returns:

str

URI for the “worker” database server
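
A minimal sketch of what such a URI builder could look like, assuming a PostgreSQL-style connection URI (the scheme, the extra user/password/database parameters, and their defaults are assumptions for illustration, not the actual implementation):

```python
def build_db_worker_uri(host, port, user='tissuemaps', password='secret',
                        database='tissuemaps'):
    # Hypothetical sketch: assembles a PostgreSQL-style connection URI for
    # a "worker" database server from its host and port.
    return 'postgresql://{user}:{password}@{host}:{port}/{database}'.format(
        user=user, password=password, host=host, port=port, database=database)

print(build_db_worker_uri('10.0.0.5', 9700))
# postgresql://tissuemaps:secret@10.0.0.5:9700/tissuemaps
```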

db_host

str: IP address or DNS name of master database (default: "localhost")

db_master_uri

str: URI for the “master” database server

db_nodes

int: number of database worker nodes (default: 2)

db_password

str: database password

db_port

str: port of the master database (default: 5432)

db_user

str: database user (default: "tissuemaps")

items

List[Tuple[str, str or int or bool]]: (name, value) pairs for each configuration parameter in the implemented section

read()

Reads the configuration from file.

See also

tmlib.config.CONFIG_FILE

write()

Writes the configuration to file.

tmlib.errors module

exception tmlib.errors.CliArgError

Bases: exceptions.Exception

Error class that is raised when the value of a command line argument is invalid.

exception tmlib.errors.DataError

Bases: exceptions.Exception

Error class that is raised when data is not available.

exception tmlib.errors.DataIntegrityError

Bases: exceptions.Exception

Error class that is raised when a dataset doesn’t fulfill certain requirements.

exception tmlib.errors.DataModelError

Bases: exceptions.Exception

Error class that is raised when a model class has attributes that are not supported.

exception tmlib.errors.JobDescriptionError

Bases: exceptions.OSError

Error class that is raised when no job descriptor files are found.

exception tmlib.errors.MetadataError

Bases: exceptions.Exception

Error class that is raised when a metadata element cannot be retrieved.

exception tmlib.errors.NotSupportedError

Bases: exceptions.Exception

Error class that is raised when a feature is not supported by the program.

exception tmlib.errors.PipelineDescriptionError

Bases: tmlib.errors.PipelineError

Error class that is raised when information in pipeline description is missing or incorrect.

exception tmlib.errors.PipelineError

Bases: exceptions.Exception

Base class for jterator pipeline errors.

exception tmlib.errors.PipelineOSError

Bases: tmlib.errors.PipelineError

Error class that is raised when pipeline related files do not exist on disk.

exception tmlib.errors.PipelineRunError

Bases: tmlib.errors.PipelineError

Error class that is raised when an error occurs upon running a jterator pipeline.

exception tmlib.errors.PyramidCreationError

Bases: exceptions.Exception

Error class that is raised when an error occurs upon creation of a pyramid image, i.e. stitching of images together to a large overview image.

exception tmlib.errors.RegexError

Bases: exceptions.Exception

Error class that is raised when a regular expression pattern doesn’t match.

exception tmlib.errors.RegistryError

Bases: exceptions.Exception

Error class that is raised when a class is not registered.

exception tmlib.errors.StitchError

Bases: exceptions.Exception

Error class that is raised when an error occurs upon stitching of images for the generation of a mosaic.

exception tmlib.errors.SubmissionError

Bases: exceptions.Exception

Error class that is raised when submitted jobs failed.

exception tmlib.errors.WorkflowDescriptionError

Bases: tmlib.errors.WorkflowError

Error class that is raised when the workflow is not correctly described.

exception tmlib.errors.WorkflowError

Bases: exceptions.Exception

Base class for workflow errors.

exception tmlib.errors.WorkflowTransitionError

Bases: tmlib.errors.WorkflowError

Error class that is raised when requirements for transition to the next workflow stage or step are not fulfilled.

tmlib.formats module

class tmlib.formats.Formats

Bases: object

Class for providing information on supported file formats.

TissueMAPS supports most file formats supported by Bio-Formats.

SUPPORT_FOR_ADDITIONAL_FILES = set(['cellvoyager', 'visiview'])

Some file formats require additional metadata files, which are not directly supported by Bio-Formats. For more information, please refer to tmlib.metaconfig.default.configure_ome_metadata_from_additional_files()

extract_supported_formats(input_filename, support_level=0)

Extract names and extensions of supported formats from XML or HTML file and save them as key-value pairs in a JSON file.

The XML file can be generated via the Bio-Formats command line tool formatlist:

formatlist -xml > supported-formats.xml

The HTML file can be downloaded from the Bio-Formats website, which lists supported formats together with the level of support for each format:

wget http://www.openmicroscopy.org/site/support/bio-formats5.1/supported-formats.html
Parameters:

input_filename: str

absolute path to the XML or HTML file that specifies the supported formats

support_level: uint, optional

minimum level of support for reading pixel and metadata, where 0 is no support, 1 is “poor” and 5 is “outstanding” support (Note: support information is only available for the HTML file)

Raises:

OSError

when filename does not exist

supported_extensions
Returns:

Set[str]

file extensions of supported formats

supported_formats
Returns:

Dict[str, List[str]]

names and file extensions of supported formats as key-value pairs

tmlib.image module

class tmlib.image.ChannelImage(array, metadata=None)

Bases: tmlib.image.Image

Class for a channel image: a grayscale image.

Parameters:

array: numpy.ndarray[uint16]

2D pixels array

metadata: tmlib.metadata.ChannelImageMetadata, optional

image metadata (note that some methods need to access metadata)

array

numpy.ndarray[numpy.uint16]: 2D pixels array

clip(lower, upper, inplace=True)

Clips intensity values below lower and above upper, i.e. sets all pixel values below lower to lower and all above upper to upper.

Parameters:

lower: int

value below which pixel values should be clipped

upper: int

value above which pixel values should be clipped

inplace: bool, optional

whether values should be clipped in place rather than creating a new image object (default: True)

Returns:

tmlib.image.ChannelImage

image with clipped pixels
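
The clipping semantics can be illustrated on a plain Python list standing in for the pixels array (a sketch of the documented behaviour, not the actual implementation, which operates on numpy arrays):

```python
def clip(values, lower, upper):
    # Every value below `lower` becomes `lower`; every value above
    # `upper` becomes `upper`; values in between are unchanged.
    return [min(max(v, lower), upper) for v in values]

print(clip([0, 100, 5000, 70000], lower=10, upper=65535))
# [10, 100, 5000, 65535]
```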

correct(stats, inplace=True)

Corrects the image for illumination artifacts.

Parameters:

stats: tmlib.image.IllumstatsContainer

mean and standard deviation statistics at each pixel position calculated over all images of the same channel

inplace: bool, optional

whether values should be corrected in place rather than creating a new image object (default: True)

Returns:

tmlib.image.ChannelImage

image with pixels corrected for illumination

Raises:

ValueError

when channel doesn’t match between illumination statistics and image

png_encode()

Encodes pixels of the image in PNG format.

Returns:

numpy.ndarray[numpy.uint8]

encoded pixels array

scale(lower, upper, inplace=True)

Scales values to 8-bit such that the range [lower, upper] will be mapped to the range [0, 255].

Parameters:

lower: int

value below which pixel values will be set to 0

upper: int

value above which pixel values will be set to 255

inplace: bool, optional

whether values should be rescaled in place rather than creating a new image object (default: True)

Returns:

tmlib.image.Image

image with rescaled pixels
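
The documented mapping of [lower, upper] onto [0, 255] can be sketched on a plain list (illustrative only; the actual method operates on numpy arrays and returns an 8-bit image):

```python
def scale(values, lower, upper):
    # Linearly map [lower, upper] onto [0, 255]; values outside the
    # range saturate at 0 or 255.
    span = float(upper - lower)
    return [int(round(255 * min(max(v - lower, 0), span) / span))
            for v in values]

print(scale([100, 300, 350, 500, 900], lower=300, upper=500))
# [0, 0, 64, 255, 255]
```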

tiff_encode()

Encodes pixels of the image in TIFF format.

Returns:

numpy.ndarray[numpy.uint8]

encoded pixels array

class tmlib.image.IllumstatsContainer(mean, std, percentiles)

Bases: object

Container for illumination statistics images.

Provides the mean and standard deviation matrices for a given channel. The statistics are calculated at each pixel position over all sites acquired in the same channel [R6].

References

[R6](1, 2) Stoeger T, Battich N, Herrmann MD, Yakimovich Y, Pelkmans L. 2015. Computer vision for image-based transcriptomics. Methods.
Parameters:

mean: tmlib.image.IllumstatsImage

mean values at each pixel coordinate calculated over all sites

std: tmlib.image.IllumstatsImage

standard deviation values at each pixel coordinate calculated over all sites

percentiles: Dict[float, int]

intensity percentiles calculated over all sites

get_closest_percentile(value)

Obtains the value for the percentile closest to a given value.

Parameters:

value: int or float

approximate percentile value

Returns:

int

smooth(sigma=5)

Smooths the mean and standard deviation statistics images with a Gaussian filter. This is useful to prevent the introduction of artifacts upon correction due to individual outlier pixels with extreme values.

Parameters:

sigma: int, optional

size of the standard deviation of the Gaussian kernel (default: 5)

class tmlib.image.IllumstatsImage(array, metadata=None)

Bases: tmlib.image.Image

Class for a statistics image: a 2D greyscale image with a single band and data type float.

Parameters:

array: numpy.ndarray[numpy.float]

2D pixels array

metadata: tmlib.metadata.IllumstatsImageMetadata

metadata (default: None)

array

numpy.ndarray[numpy.float]: 2D pixels array

class tmlib.image.Image(array, metadata=None)

Bases: object

Base class for an image that holds a 2D pixels array.

Parameters:

array: numpy.ndarray

2D pixels array

metadata: tmlib.metadata.ImageMetadata, optional

image metadata (default: None)

align(crop=True, inplace=True)

Aligns, i.e. shifts and optionally crops, an image based on pre-calculated shift and residue values.

Parameters:

crop: bool, optional

whether image should be cropped or rather padded with zero values (default: True)

inplace: bool, optional

whether the array of the existing image should be replaced instead of creating a copy (default: True)

Returns:

tmlib.image.Image

aligned image

array

numpy.ndarray: 2D pixels array

dimensions

Tuple[int]: y, x, z dimensions of the pixels array

dtype

str: data type of pixels array elements

extract(y_offset, height, x_offset, width)

Extracts a continuous, rectangular plane of pixels from the image.

Parameters:

y_offset: int

index of the top, left point of the hyperrectangle on the y axis

height: int

height of the hyperrectangle, i.e. length of the hyperrectangle along the y axis

x_offset: int

index of the top, left point of the hyperrectangle on the x axis

width: int

width of the hyperrectangle, i.e. length of the hyperrectangle along the x axis

Returns:

tmlib.image.Image

extracted image with dimensions height x width
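
The extraction semantics amount to a 2D slice; sketched here on a nested list standing in for the pixels array (illustrative only, the actual method operates on numpy arrays):

```python
def extract(pixels, y_offset, height, x_offset, width):
    # Take `height` rows starting at `y_offset` and `width` columns
    # starting at `x_offset`.
    return [row[x_offset:x_offset + width]
            for row in pixels[y_offset:y_offset + height]]

grid = [[10 * y + x for x in range(4)] for y in range(4)]
print(extract(grid, y_offset=1, height=2, x_offset=2, width=2))
# [[12, 13], [22, 23]]
```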

insert(image, y_offset, x_offset, inplace=True)

Inserts a continuous, hyperrectangular volume of pixels into an image.

Parameters:

image: tmlib.image.Image

image whose pixels should be inserted

y_offset: int

index of the top, left point of the hyperrectangle on the y axis

x_offset: int

index of the top, left point of the hyperrectangle on the x axis

inplace: bool, optional

insert pixels into the existing image rather than into a copy (default: True)

Returns:

tmlib.image.Image

modified image

is_binary

bool: whether pixels array has boolean data type

is_float

bool: whether pixels array has float data type

is_int

bool: whether pixels array has integer data type

is_int32

bool: whether pixels array has 32-bit integer data type

is_uint

bool: whether pixels array has unsigned integer data type

is_uint16

bool: whether pixels array has 16-bit unsigned integer data type

is_uint8

bool: whether pixels array has 8-bit unsigned integer data type

join(image, axis)

Joins two pixels arrays.

Parameters:

image: tmlib.image.Image

image object whose values should be joined

axis: str

axis along which the two images should be merged (options: {"x", "y"})

Returns:

tmlib.image.Image

joined image

merge(image, axis, offset, inplace=True)

Merges pixels arrays of two images into one.

Parameters:

image: tmlib.image.Image

image object whose pixels should be merged

axis: str

axis along which the two images should be merged (options: {"x", "y"})

offset: int

offset at which image should be merged along the given axis

inplace: bool, optional

merge pixels into the existing image rather than into a copy (default: True)

Returns:

tmlib.image.Image

merged image

metadata

tmlib.metadata.ImageMetadata: image metadata

pad_with_background(n, side)

Pads one side of the pixels array with zero values.

Parameters:

n: int

number of pixels that should be added along the given axis

side: str

side of the array that should be padded relative to the y, x axis of an individual plane (options: {"top", "bottom", "left", "right"})

Returns:

tmlib.image.Image

padded image
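
The padding semantics can be sketched on a nested list standing in for the pixels array (illustrative only; the actual method operates on numpy arrays):

```python
def pad_with_background(pixels, n, side):
    # Add n rows or columns of zero-valued (background) pixels on the
    # requested side of the 2D array.
    height = len(pixels)
    width = len(pixels[0])
    if side == 'top':
        return [[0] * width for _ in range(n)] + pixels
    if side == 'bottom':
        return pixels + [[0] * width for _ in range(n)]
    if side == 'left':
        return [[0] * n + row for row in pixels]
    if side == 'right':
        return [row + [0] * n for row in pixels]
    raise ValueError('Unknown side: %r' % side)

print(pad_with_background([[1, 2], [3, 4]], n=1, side='left'))
# [[0, 1, 2], [0, 3, 4]]
```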

shrink(factor, inplace=True)

Shrinks the first two dimensions of the pixels array by factor. Pixel values of the aggregated array are the mean of the neighbouring pixels, where the neighbourhood is defined by factor.

Parameters:

factor: int

factor by which the size of the image should be reduced along the y and x axis

inplace: bool, optional

shrink the array inplace instead of returning a copy (default: True)

Returns:

tmlib.image.Image

shrunken image
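
The aggregation described above is a block mean; sketched here on a nested list (illustrative only, the actual method operates on numpy arrays):

```python
def shrink(pixels, factor):
    # Replace each non-overlapping factor x factor neighbourhood by the
    # mean of its pixel values.
    h = len(pixels) // factor
    w = len(pixels[0]) // factor
    return [[sum(pixels[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor))
             / float(factor ** 2)
             for x in range(w)]
            for y in range(h)]

print(shrink([[1, 3], [5, 7]], factor=2))  # [[4.0]]
```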

smooth(sigma, inplace=True)

Applies a Gaussian smoothing filter to the pixels array.

Parameters:

sigma: int

size of the standard deviation of the Gaussian kernel

inplace: bool, optional

smooth the array inplace instead of returning a copy (default: True)

Returns:

tmlib.image.Image

smoothed image

class tmlib.image.PyramidTile(array, metadata=None)

Bases: tmlib.image.Image

Class for a pyramid tile: an image with a single z-level and y, x dimensions of 256 x 256 pixels.

Parameters:

array: numpy.ndarray[uint8]

pixels array

metadata: tmlib.metadata.PyramidTileMetadata, optional

image metadata (default: None)

TILE_SIZE = 256
array

numpy.ndarray[numpy.uint8]: 2D pixels array

classmethod create_as_background(add_noise=False, mu=None, sigma=None, metadata=None)

Creates an image with background pixels. By default, background pixels are zero-valued. Optionally, Gaussian noise can be added to simulate camera background.

Parameters:

add_noise: bool, optional

add Gaussian noise (default: False)

mu: int, optional

mean of background noise (default: None)

sigma: int, optional

standard deviation of background noise (default: None)

metadata: tmlib.metadata.ImageMetadata, optional

image metadata (default: None)

Returns:

tmlib.image.PyramidTile

image with background pixel values

classmethod create_from_binary(string, metadata=None)

Creates an image from a JPEG encoded binary string.

Parameters:

string: str

binary string

metadata: tmlib.metadata.ImageMetadata, optional

image metadata (default: None)

Returns:

tmlib.image.PyramidTile

classmethod create_from_buffer(buf, metadata=None)

Creates an image from a JPEG encoded buffer object.

Parameters:

buf:

buffer

metadata: tmlib.metadata.ImageMetadata, optional

image metadata (default: None)

Returns:

tmlib.image.PyramidTile

jpeg_encode(quality=95)

Encodes the image as a JPEG buffer object.

Parameters:

quality: int, optional

JPEG quality from 0 to 100 (default: 95)

Returns:

numpy.ndarray

Examples

>>> img = PyramidTile.create_as_background()
>>> buf = img.jpeg_encode()
>>> with open('myfile.jpeg', 'wb') as f:
...     f.write(buf)
metadata

tmlib.metadata.ImageMetadata: image metadata

class tmlib.image.SegmentationImage(array, metadata=None)

Bases: tmlib.image.Image

Class for a segmentation image: a labeled image where each segmented object is encoded by a unique one-based identifier value.

Parameters:

array: numpy.ndarray[numpy.int32]

pixels array

metadata: tmlib.metadata.SegmentationImageMetadata, optional

image metadata (default: None)

array

numpy.ndarray[numpy.int32]: 2D pixels array

classmethod create_from_polygons(polygons, y_offset, x_offset, dimensions, metadata=None)

Creates an object of class tmlib.image.SegmentationImage based on coordinates of object contours.

Parameters:

polygons: Tuple[Union[int, geoalchemy2.elements.WKBElement]]

label and geometry for each segmented object

y_offset: int

global vertical offset that needs to be subtracted from y-coordinates

x_offset: int

global horizontal offset that needs to be subtracted from x-coordinates

dimensions: Tuple[int]

x, y dimensions of image z-planes that should be created

metadata: tmlib.metadata.SegmentationImageMetadata, optional

image metadata (default: None)

Returns:

tmlib.image.SegmentationImage

created image

extract_polygons(y_offset, x_offset)

Creates a polygon representation for each segmented object. The coordinates of the polygon contours are relative to the global map, i.e. the given offsets of the Site are applied.

Parameters:

y_offset: int

global vertical offset that needs to be subtracted from y-coordinates (y-axis is inverted)

x_offset: int

global horizontal offset that needs to be added to x-coordinates

Returns:

Generator[Tuple[Union[int, shapely.geometry.polygon.Polygon]]]

label and geometry for each segmented object

tmlib.import_utils module

Utility functions for dynamic import.

tmlib.import_utils.load_method_args(method_name)

Load general arguments that can be passed to a method of an implemented subclass of a tmlib.cli.WorkflowStepCLI base class.

Parameters:

method_name: str

name of the method

Returns:

tmlib.args.Args

argument container

Raises:

AttributeError

when the “args” module doesn’t contain a method-specific implementation of the Args base class

tmlib.import_utils.load_var_method_args(step_name, method_name)

Load variable step-specific arguments that can be passed to a method of an implemented subclass of a tmlib.cli.WorkflowStepCLI base class.

Parameters:

step_name: str

name of the program

method_name: str

name of the method

Returns:

tmlib.args.Args

argument container

Raises:

ImportError

when subpackage with name step_name doesn’t have a module named “args”

tmlib.jobs module

class tmlib.jobs.Job(arguments, output_dir, submission_id, user_name, parent_id=None)

Bases: gc3libs.Application

Abstract base class for a job, which can be submitted for processing on different cluster backends.

Parameters:

arguments: List[str]

command line arguments

output_dir: str

absolute path to the output directory, where log reports will be stored

submission_id: int

ID of the corresponding submission

user_name: str

name of the submitting user

parent_id: int, optional

ID of the parent job collection

is_new

bool: whether the job is in state NEW

is_running

bool: whether the job is in state RUNNING

is_stopped

bool: whether the job is in state STOPPED

is_submitted

bool: whether the job is in state SUBMITTED

is_terminated

bool: whether the job is in state TERMINATED

name

str: name of the job

retry()

Decides whether the job should be retried.

Returns:

bool

whether job should be resubmitted

sbatch(resource, **kwargs)

Overrides the original sbatch method to enable fair-share scheduling on SLURM backends.

See also

gc3libs.Application.sbatch()

tmlib.log module

class tmlib.log.InfoFilter(name='')

Bases: logging.Filter

Initialize a filter.

Initialize with the name of the logger which, together with its children, will have its events allowed through the filter. If no name is specified, allow every event.

filter(rec)
tmlib.log.LEVELS_TO_VERBOSITY = {0: 0, 40: 1, 10: 3, 50: 1, 20: 2, 30: 1}

dict[int, int]: Mapping of logging level to logging verbosity

tmlib.log.VERBOSITY_TO_LEVELS = {0: 0, 1: 30, 2: 20, 3: 10}

dict[int, int]: Mapping of logging verbosity to logging level

class tmlib.log.Whitelist(*whitelist)

Bases: logging.Filter

filter(record)
tmlib.log.configure_logging()

Configures the root logger for command line applications.

Two stream handlers will be added to the logger:

  • “out” that will direct INFO & DEBUG messages to the standard output stream
  • “err” that will direct WARN, WARNING, ERROR, & CRITICAL messages to the standard error stream

tmlib.log.map_logging_verbosity(verbosity)

Maps logging verbosity to a level as expected by the logging module.

Parameters:

verbosity: int

logging verbosity

Returns:

int

logging level

Raises:

TypeError

when verbosity doesn’t have type int

ValueError

when verbosity is negative
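
Using the VERBOSITY_TO_LEVELS mapping documented above, a minimal sketch of how such a function could translate a verbosity count into a logging level (the saturation of high verbosities at DEBUG is an assumption, not the confirmed implementation):

```python
import logging

# Mapping of logging verbosity to logging level, as documented above.
VERBOSITY_TO_LEVELS = {0: 0, 1: 30, 2: 20, 3: 10}

def map_logging_verbosity(verbosity):
    if not isinstance(verbosity, int):
        raise TypeError('Argument "verbosity" must have type int.')
    if verbosity < 0:
        raise ValueError('Argument "verbosity" must not be negative.')
    # Verbosities beyond the highest defined value map to the most
    # detailed level (DEBUG).
    verbosity = min(verbosity, max(VERBOSITY_TO_LEVELS))
    return VERBOSITY_TO_LEVELS[verbosity]

print(map_logging_verbosity(2) == logging.INFO)  # True
```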

tmlib.metadata module

class tmlib.metadata.ChannelImageMetadata(channel_id, site_id, cycle_id, tpoint, zplane)

Bases: tmlib.metadata.SiteImageMetadata

Metadata for ChannelImage.

Parameters:

channel_id: int

ID of the parent Channel

site_id: int

ID of the parent Site

cycle_id: int

ID of the parent Cycle

tpoint: int

zero-based time point index

zplane: int

zero-based z-level index

bottom_residue

int: excess pixels at the bottom

channel_id

int: ID of the corresponding Channel

cycle_id

int: ID of the corresponding Cycle

is_clipped

bool: whether the image is clipped

is_corrected

bool: whether the image is corrected for illumination artifacts

is_rescaled

bool: whether the image is rescaled

left_residue

int: excess pixels at the left side

right_residue

int: excess pixels at the right side

top_residue

int: excess pixels at the top

x_shift

int: shift of the image in pixels in x direction relative to the site in the reference cycle

y_shift

int: shift of the image in pixels in y direction relative to the site in the reference cycle

class tmlib.metadata.IllumstatsImageMetadata(channel_id)

Bases: tmlib.metadata.ImageMetadata

Metadata for an IllumstatsImage.

Parameters:

channel_id: int

ID of the parent Channel

channel_id

int: ID of the corresponding Channel

is_smoothed

bool: whether the illumination statistics image has been smoothed

class tmlib.metadata.ImageFileMapping(**kwargs)

Bases: object

Mapping of 2D pixel planes to original microscope image file(s) and the location of individual planes within these files.

Parameters:

kwargs: dict, optional

file mapping as key-value pairs

files

List[str]: absolute path to the microscope image files

planes

List[int]: zero-based position index of the required planes in the source file

ref_index

int: index of the image in the OMEXML Series

series

List[int]: zero-based position index of the required series in the source file

to_dict()

Attributes of the class as key-value pairs.

Returns:

dict

Examples

>>> ifm = ImageFileMapping()
>>> ifm.series = [0, 0]
>>> ifm.planes = [0, 1]
>>> ifm.files = ["a", "b"]
>>> ifm.to_dict()
{'series': [0, 0], 'planes': [0, 1], 'files': ['a', 'b']}
>>> ifm = ImageFileMapping(series=[0, 0], planes=[0, 1], files=["a", "b"])
>>> ifm.to_dict()
{'series': [0, 0], 'planes': [0, 1], 'files': ['a', 'b']}
class tmlib.metadata.ImageMetadata

Bases: object

Base class for image metadata.

is_aligned

bool: whether the image has been aligned between cycles

is_omitted

bool: whether the image should be omitted from further analysis

class tmlib.metadata.PyramidTileMetadata(z, y, x, channel_layer_id)

Bases: object

Metadata for a PyramidTile.

Parameters:

z: int

zero-based zoom level index

y: int

zero-based row index

x: int

zero-based column index

channel_layer_id: int

ID of the parent ChannelLayer

class tmlib.metadata.SegmentationImageMetadata(mapobject_type_id, site_id, tpoint, zplane)

Bases: tmlib.metadata.SiteImageMetadata

Metadata for SegmentationImage.

Parameters:

mapobject_type_id: int

ID of the parent MapobjectType

site_id: int

ID of the parent Site

tpoint: int

zero-based time point index

zplane: int

zero-based z-level index

mapobject_type_id

int: ID of the corresponding MapobjectType

class tmlib.metadata.SiteImageMetadata(site_id, tpoint, zplane)

Bases: tmlib.metadata.ImageMetadata

Base class for metadata of images that map to an individual Site.

Parameters:

site_id: int

ID of the parent Site

tpoint: int

zero-based time point index

zplane: int

zero-based z-level index

site_id

int: ID of the corresponding Site

tpoint

int: zero-based time point index

zplane

int: zero-based z-level index

tmlib.readers module

class tmlib.readers.BFImageReader(filename)

Bases: object

Class for reading data from vendor-specific image file formats as numpy.ndarray objects using the Bio-Formats library.

Parameters:

filename: str

absolute path to a file

Raises:

OSError

when filename does not exist

read()

Reads an image from a file.

For details on reading images via Bio-Formats from Python, see load_image().

Returns:

numpy.ndarray

pixel array

Raises:

OSError

when filename does not exist

NotSupportedError

when the file format is not supported by the reader

read_subset(series=None, plane=None)

Reads a subset of images from a file.

Parameters:

series: int, optional

zero-based series index (only relevant if the file contains more than one Image element)

plane: int, optional

zero-based plane index within a series (only relevant if Image elements within the file contain more than one Plane element)

Returns:

numpy.ndarray

2D pixel array

Raises:

OSError

when filename does not exist

NotSupportedError

when the file format is not supported by the reader

class tmlib.readers.BFOmeXmlReader(filename)

Bases: object

Parameters:

filename: str

absolute path to a file

Raises:

OSError

when filename does not exist

read()
class tmlib.readers.DatasetReader(filename)

Bases: tmlib.readers.Reader

Class for reading data from a HDF5 file using the h5py library.

Parameters:

filename: str

absolute path to a file

Raises:

OSError

when filename does not exist

exists(path)

Checks whether path exists within the file.

Parameters:

path: str

absolute path to a group or dataset in the file

Returns:

bool

True if path exists and False otherwise

get_attribute(path, name)

Get an attribute attached to a dataset.

Parameters:

path: str

absolute path to the dataset within the file

name: str

name of the attribute

Returns:

???

Raises:

KeyError

when path does not exist

AttributeError

when dataset does not have an attribute called name

get_dims(path)

Get the dimensions of a dataset.

Parameters:

path: str

absolute path to the dataset within the file

Returns:

Tuple[int]

number of rows and columns of the dataset

Raises:

KeyError

when path does not exist

get_type(path)

Get the data type of a dataset.

Parameters:

path: str

absolute path to the dataset within the file

Returns:

type

data type of the dataset

Raises:

KeyError

when path does not exist

list_datasets(path='/', pattern='.*')

Lists datasets within a given group.

Parameters:

path: str, optional

absolute path to a group in the file (default: "/")

pattern: str, optional

regular expression pattern to filter datasets (default: ".*")

Returns:

List[str]

names of the datasets in path

Raises:

KeyError

when path does not exist

list_groups(path, pattern='.*')

Lists groups within a given group.

Parameters:

path: str

absolute path to a group in the file

pattern: str, optional

regular expression pattern to filter groups (default: ".*")

Returns:

List[str]

names of the groups in path

Raises:

KeyError

when path does not exist

read(path)

Reads a dataset.

Parameters:

path: str

absolute path to the dataset within the file

Returns:

numpy.ndarray

dataset

Raises:

KeyError

when path does not exist

read_subset(path, index=None, row_index=None, column_index=None)

Reads a subset of a dataset. For fancy-indexing see h5py docs.

Parameters:

path: str

absolute path to the dataset within the file

index: int or List[int], optional

zero-based index

row_index: int or List[int], optional

zero-based row index

column_index: int or List[int], optional

zero-based column index

Returns:

numpy.ndarray

dataset

Raises:

KeyError

when path does not exist

class tmlib.readers.DatatableReader(filename)

Bases: tmlib.readers.Reader

Class for reading data from a HDF5 file using the pytables library.

Parameters:

filename: str

absolute path to a file

Raises:

OSError

when filename does not exist

exists(path)

Check whether a path exists within the file.

Parameters:

path: str

absolute path to a group or dataset in the file

Returns:

bool

True if path exists and False otherwise

read(path)

Reads a dataset.

Parameters:

path: str

absolute path to the dataset within the file

Returns:

pandas.DataFrame

dataset

Raises:

IOError

when path does not exist

class tmlib.readers.ImageReader(filename)

Bases: tmlib.readers.Reader

Class for reading pixel data from standard image file formats as numpy.ndarray objects.

Parameters:

filename: str

absolute path to a file

Raises:

OSError

when filename does not exist

read(dtype=<type 'numpy.uint16'>)

Reads pixels data from image file.

Parameters:

dtype: type, optional

numpy data type (default: numpy.uint16)

Returns:

numpy.ndarray

pixels data

class tmlib.readers.JavaBridge(active=True)

Bases: object

Class for using a Java Virtual Machine for javabridge.

class tmlib.readers.JsonReader(filename)

Bases: tmlib.readers.Reader

Class for reading data from files in JSON format.

Parameters:

filename: str

absolute path to a file

Raises:

OSError

when filename does not exist

read()

Reads data from JSON file.

Returns:

dict or list

file content

class tmlib.readers.Reader(filename)

Bases: object

Abstract base class for reading data from files.

Readers make use of the with statement context manager and thus follow a similar syntax:

with Reader('/path/to/file') as f:
    data = f.read()
Parameters:

filename: str

absolute path to a file

Raises:

OSError

when filename does not exist

read()
class tmlib.readers.TextReader(filename)

Bases: tmlib.readers.Reader

Class for reading data from text files.

Parameters:

filename: str

absolute path to a file

Raises:

OSError

when filename does not exist

read()

Reads data from text file.

Returns:

str

file content

class tmlib.readers.XmlReader(filename)

Bases: tmlib.readers.Reader

Class for reading data from files in XML format.

Parameters:

filename: str

absolute path to a file

Raises:

OSError

when filename does not exist

read()

Reads data from XML file.

Returns:

lxml.etree._Element

xml

class tmlib.readers.YamlReader(filename)

Bases: tmlib.readers.Reader

Class for reading data from files in YAML 1.2 format.

Parameters:

filename: str

absolute path to a file

Raises:

OSError

when filename does not exist

read()

Reads YAML file.

Returns:

dict or list

file content

tmlib.readers.load_json(string)

Convert JSON string to Python object.

Parameters:

string: str

JSON string

Returns:

dict or list
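
For illustration, the standard library's json.loads shows the documented conversion of a JSON string to a Python object (a dict or list, depending on the top-level element):

```python
import json

# Deserialize a JSON string into the corresponding Python object.
mapping = json.loads('{"series": [0, 0], "files": ["a", "b"]}')
print(mapping['series'])  # [0, 0]
print(mapping['files'])   # ['a', 'b']
```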

tmlib.readers.load_yaml(string)

Converts YAML string to Python object.

Parameters:

string: str

YAML string

Returns:

dict or list

tmlib.submission module

class tmlib.submission.SubmissionManager(experiment_id, program_name)

Bases: object

Mixin class for submission and monitoring of computational tasks.

Parameters:

experiment_id: int

ID of the processed experiment

program_name: str

name of the submitting program

register_submission(user_id=None)

Creates a database entry in the “submissions” table.

Parameters:

user_id: int, optional

ID of submitting user (if not the user who owns the experiment)

Returns:

Tuple[int, str]

ID of the submission and the name of the submitting user

tmlib.utils module

Decorators and other utility functions.

tmlib.utils.assert_path_exists(*expected)

Decorator function that asserts that a path to a file or directory on disk specified by a function argument exists.

Parameters:

expected: List[str], optional

names of arguments that should be checked

Raises:

ValueError

when a name is provided that is not an argument of the function

OSError

when the path specified by the function argument doesn't exist on disk
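A sketch of how such a decorator might work, resolving argument names via inspect.signature (the actual tmlib implementation targets Python 2 and may differ in details):

```python
import functools
import inspect
import os


def assert_path_exists(*expected):
    # Hypothetical sketch of the documented behavior.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = inspect.signature(func).bind(*args, **kwargs)
            bound.apply_defaults()
            for name in expected:
                if name not in bound.arguments:
                    raise ValueError(
                        '"%s" is not an argument of "%s"'
                        % (name, func.__name__)
                    )
                if not os.path.exists(bound.arguments[name]):
                    raise OSError(
                        'Path does not exist: %s' % bound.arguments[name]
                    )
            return func(*args, **kwargs)
        return wrapper
    return decorator
```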

tmlib.utils.assert_type(**expected)

Decorator function that asserts the type of function arguments.

Parameters:

expected: Dict[str, str or List[str]], optional

key-value pairs of names and expected types of each argument that should be checked

Raises:

ValueError

when a name is provided that is not an argument of the function

TypeError

when type of the function argument doesn’t match the expected type or when expected is not provided in the correct type

Examples

from tmlib.utils import assert_type

class TypeCheckExample(object):

    @assert_type(value1='str', value2=['int', 'float', 'types.NoneType'])
    def test(self, value1, value2=None):
        print 'value1: "%s"' % value1
        if value2:
            print 'value2: %d' % value2

example = TypeCheckExample()
example.test('blabla', 2)
example.test('blabla', 2.0)
example.test('blabla', None)
example.test('blabla')
example.test('blabla', '2')  # raises TypeError
class tmlib.utils.autocreate_directory_property(func)

Bases: object

Decorator class that acts like a property. The value represents a path to a directory on disk. The directory is automatically created when it doesn’t exist. Once created, the value is cached, so that there is no reattempt to create the directory.

Raises:

TypeError

when the value of the property doesn’t have type basestring

ValueError

when the value of the property is empty

OSError

when the parent directory does not exist

Examples

from tmlib.utils import autocreate_directory_property

class Foo(object):

    @autocreate_directory_property
    def my_new_directory(self):
        return '/tmp/blabla'

foo = Foo()
foo.my_new_directory
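The documented behavior can be sketched as a non-data descriptor that creates the directory and then caches the path on the instance (str stands in here for Python 2's basestring; the real implementation may differ):

```python
import os


class autocreate_directory_property(object):
    '''Sketch: property-like descriptor that auto-creates a directory.'''

    def __init__(self, func):
        self.func = func
        self.__doc__ = getattr(func, '__doc__', None)

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        if not isinstance(value, str):
            raise TypeError('Property value must be a string.')
        if not value:
            raise ValueError('Property value must not be empty.')
        parent = os.path.dirname(value)
        if not os.path.isdir(parent):
            raise OSError('Parent directory does not exist: %s' % parent)
        if not os.path.isdir(value):
            os.mkdir(value)
        # Cache the value in the instance dict; since this class defines
        # no __set__, it is a non-data descriptor and later attribute
        # lookups bypass __get__ entirely.
        obj.__dict__[self.func.__name__] = value
        return value
```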
tmlib.utils.common_substring(data)

Finds longest common substring across a collection of strings.

Parameters:

data: List[str]

Returns:

str
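One straightforward way to realize this (a sketch, not necessarily tmlib's algorithm) is to scan candidate substrings of the shortest input, longest first:

```python
def common_substring(data):
    # Return the longest substring contained in every string;
    # empty string if nothing is shared or data is empty.
    if not data:
        return ''
    shortest = min(data, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in data):
                return candidate
    return ''
```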
tmlib.utils.create_datetimestamp()

Creates a datetimestamp in the form “year-month-day_hour-minute-second”.

Returns:

str

datetimestamp

tmlib.utils.create_directory(location)

Creates a directory on disk in a safe way.

Parameters:

location: str

absolute path to the directory that should be created
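"Safe" is not spelled out above; one plausible reading, sketched here, is race-tolerant creation, where a directory created concurrently by another process is not an error:

```python
import errno
import os


def create_directory(location):
    # Create location (including intermediate directories); swallow
    # the error if it already exists, re-raise anything else.
    try:
        os.makedirs(location)
    except OSError as err:
        if err.errno != errno.EEXIST:
            raise
```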

tmlib.utils.create_partitions(li, n)

Creates a list of sublists from a list, where each sublist has length n.

Parameters:

li: list

list that should be partitioned

n: int

number of items per sublist

Returns:

List[list]
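A minimal sketch of this partitioning (assuming, as is conventional, that the last sublist may be shorter when len(li) is not a multiple of n):

```python
def create_partitions(li, n):
    # Consecutive slices of length n.
    return [li[i:i + n] for i in range(0, len(li), n)]
```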

tmlib.utils.create_timestamp()

Creates a timestamp in the form “hour-minute-second”.

Returns:

str

timestamp
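Both create_datetimestamp and create_timestamp are presumably thin strftime wrappers over the formats quoted above; a sketch:

```python
import datetime


def create_datetimestamp():
    # e.g. "2016-03-01_14-05-59"
    return datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')


def create_timestamp():
    # e.g. "14-05-59"
    return datetime.datetime.now().strftime('%H-%M-%S')
```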

tmlib.utils.flatten(data)

Transforms a list of lists into a flat list.

Parameters:

data: List[list]

Returns:

list
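A one-line sketch, assuming a single level of flattening as the signature suggests:

```python
def flatten(data):
    # One level only: [[1, 2], [3]] -> [1, 2, 3]
    return [item for sublist in data for item in sublist]
```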
tmlib.utils.indices(data, item)

Determines all indices of an item in a list.

Parameters:

data: list

item:

the element whose index positions should be determined

Returns:

List[int]

all indices of item in data
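Sketched in one comprehension:

```python
def indices(data, item):
    # All zero-based positions at which item occurs in data.
    return [i for i, element in enumerate(data) if element == item]
```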

tmlib.utils.is_number(s)

Checks whether a string can be interpreted as a number.

Parameters:

s: str

Returns:

bool

Examples

>>> is_number('blabla')
False
>>> is_number('007')
True
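One plausible implementation consistent with the examples above: a string counts as a number when float() accepts it.

```python
def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False
```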
tmlib.utils.list_directory_tree(start_dir)

Captures the whole directory tree downstream of start_dir.

Parameters:

start_dir: str

absolute path to the directory whose content should be listed

tmlib.utils.map_letter_to_number(letter)

Maps capital letter to number.

Parameters:

letter: str

capital letter

Returns:

int

one-based index number

Examples

>>> map_letter_to_number("A")
1
tmlib.utils.map_number_to_letter(number)

Maps number to capital letter.

Parameters:

number: int

one-based index number

Returns:

str

capital letter

Examples

>>> map_number_to_letter(1)
"A"
tmlib.utils.missing_elements(data, start=None, end=None)

Determines missing elements in a sequence of integers.

Parameters:

data: List[int]

sequence with potentially missing elements

start: int, optional

lower limit of the range (defaults to 0)

end: int, optional

upper limit of the range (defaults to len(data)-1)

Examples

>>> data = [10, 12, 13, 15, 16, 19, 20]
>>> list(missing_elements(data))
[11, 14, 17, 18]
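The doctest implies that start and end act as indices into data (so the scan runs from data[start] to data[end]). A generator sketch consistent with that reading, though the real implementation may differ:

```python
def missing_elements(data, start=None, end=None):
    # Yield integers absent from data between data[start] and data[end].
    if start is None:
        start = 0
    if end is None:
        end = len(data) - 1
    present = set(data)
    for value in range(data[start], data[end] + 1):
        if value not in present:
            yield value
```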
tmlib.utils.notimplemented(func)

Decorator function for abstract methods that are not implemented in the derived class.

Raises:

NotImplementedError

when decorated function (method) is called
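A sketch of such a decorator (functools.wraps preserves the method's name for the error message):

```python
import functools


def notimplemented(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        raise NotImplementedError(
            'Method "%s" is not implemented.' % func.__name__
        )
    return wrapper
```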

tmlib.utils.regex_from_format_string(format_string)

Converts a format string with keywords into a named regular expression.

Parameters:

format_string: str

Python format string

Returns:

_sre.SRE_Pattern

compiled named regular expression pattern

Examples

>>> r = regex_from_format_string("{directory}/{filename}")
>>> r.search("foo/bar.txt").groupdict()
{'directory': 'foo', 'filename': 'bar.txt'}
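A sketch of the conversion: escape the literal parts, then rewrite each "{key}" placeholder as a named group. The choice of [^/]+ (match anything but the path separator) is an assumption that reproduces the example above; tmlib's actual pattern may differ.

```python
import re


def regex_from_format_string(format_string):
    # re.escape turns "{key}" into "\{key\}"; rewrite that into a
    # named capturing group.
    escaped = re.escape(format_string)
    pattern = re.sub(r'\\\{(\w+)\\\}', r'(?P<\1>[^/]+)', escaped)
    return re.compile(pattern)
```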
tmlib.utils.same_docstring_as(ref_func)

Decorator function that sets the docstring of the decorated function to that of ref_func. This is helpful for methods of derived classes that should “inherit” the docstring of the abstract method in the base class.

Parameters:

ref_func: function

reference function from which the docstring should be copied
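The essence of the decorator is a one-line docstring copy; a sketch:

```python
def same_docstring_as(ref_func):
    def decorator(func):
        # Copy the reference function's docstring onto the decorated one.
        func.__doc__ = ref_func.__doc__
        return func
    return decorator
```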

tmlib.version module

tmlib.writers module

class tmlib.writers.DataTableWriter(filename, truncate=False)

Bases: tmlib.writers.Writer

Class for writing data to an HDF5 file using the pytables library.

Parameters:

filename: str

absolute path to a file

truncate: bool, optional

truncate the file if it already exists (default: False)

append(path, data)

Appends data to an existing data table.

Parameters:

path: str

absolute path to the dataset within the file

data: pandas.DataFrame

data table

exists(path)

Check whether a path exists within the file.

Parameters:

path: str

absolute path to a group or dataset in the file

Returns:

bool

True if path exists and False otherwise

write(path, data)

Write a data table.

Parameters:

path: str

absolute path to the dataset within the file

data: pandas.DataFrame

data table

class tmlib.writers.DatasetWriter(filename, truncate=False)

Bases: tmlib.writers.Writer

Class for writing data to an HDF5 file using the h5py library.

Parameters:

filename: str

absolute path to the HDF5 file

truncate: bool, optional

truncate the file if it already exists (default: False)

append(path, data)

Appends data to an existing one-dimensional dataset. The dataset needs to be created first using the tmlib.writers.DatasetWriter.create() method and the max_dims entry for the vertical dimension needs to be set to None.

Parameters:

path: str

absolute path to the dataset within the file

data:

dataset; will be put through numpy.array(data)

Raises:

ValueError

when the dataset is not one-dimensional or when vertical dimensions of data and the dataset don’t match

TypeError

when data types of data and the dataset don’t match

create(path, dims, dtype, max_dims=None)

Creates a dataset with a given size and data type without actually writing data to it.

Parameters:

path: str

absolute path to the dataset within the file

dims: Tuple[int]

dimensions of the dataset (number of rows and columns)

dtype: type

datatype of the dataset

max_dims: Tuple[int]

maximal dimensions of the dataset, useful if the dataset should be extendable along one or more dimensions (defaults to dims); (None, None) would mean extendable infinitely along both dimensions

Returns:

h5py._hl.dataset.Dataset

Raises:

IOError

when path already exists

create_group(path)

Creates a group.

Parameters:

path: str

absolute path to the group within the file

exists(path)

Checks whether path exists within the file.

Parameters:

path: str

absolute path to a group or dataset in the file

Returns:

bool

True if path exists and False otherwise

hstack(path, data)

Horizontally appends data to an existing multi-dimensional dataset. The dataset needs to be created first using the tmlib.writers.DatasetWriter.create() method and the max_dims entry for the horizontal dimension needs to be set to None.

Parameters:

path: str

absolute path to the dataset within the file

data:

dataset; will be put through numpy.array(data)

Raises:

IOError

when path doesn’t exist

ValueError

when the dataset is one-dimensional or when horizontal dimensions of data and the dataset don’t match

TypeError

when data types of data and the dataset don’t match

list_datasets(path='/', pattern='.*')

Lists datasets within a given group.

Parameters:

path: str, optional

absolute path to a group in the file (default: "/")

pattern: str, optional

regular expression pattern to filter datasets (default: ".*")

Returns:

List[str]

names of the datasets in path

Raises:

KeyError

when path does not exist

list_groups(path, pattern='.*')

Lists groups within a given group.

Parameters:

path: str

absolute path to a group in the file

pattern: str, optional

regular expression pattern to filter groups (default: ".*")

Returns:

List[str]

names of the groups in path

Raises:

KeyError

when path does not exist

set_attribute(path, name, data)

Attaches an attribute to a dataset.

Parameters:

path: str

absolute path to the dataset within the file

name: str

name of the attribute

data:

value of the attribute; will be put through numpy.array(data)

vstack(path, data)

Vertically appends data to an existing multi-dimensional dataset. The dataset needs to be created first using the tmlib.writers.DatasetWriter.create() method and the max_dims entry for the vertical dimension needs to be set to None.

Parameters:

path: str

absolute path to the dataset within the file

data:

dataset; will be put through numpy.array(data)

Raises:

ValueError

when the dataset is one-dimensional or when vertical dimensions of data and the dataset don’t match

TypeError

when data types of data and the dataset don’t match

write(path, data, compression=False)

Creates a dataset and writes data to it.

Parameters:

path: str

absolute path to the dataset within the file

data:

dataset; will be put through numpy.array(data)

compression: bool, optional

whether zip compression filter should be applied (default: False)

Raises:

IOError

when path already exists

write_subset(path, data, index=None, row_index=None, column_index=None)

Writes data to a subset of an existing dataset.

Parameters:

path: str

absolute path to the dataset within the file

data:

dataset; will be put through numpy.array(data)

index: int or List[int], optional

zero-based index

row_index: int or List[int], optional

zero-based row index

column_index: int or List[int], optional

zero-based column index

Raises:

TypeError

when data has a different data type than an existing dataset

IndexError

when a provided index exceeds dimensions of an existing dataset

KeyError

when a subset of the dataset should be written, i.e. an index is provided, but the dataset does not yet exist

class tmlib.writers.ImageWriter(filename)

Bases: tmlib.writers.Writer

Class for writing numpy.ndarray objects to image files using the OpenCV library.

Parameters:

filename: str

absolute path to a file

write(data)

Writes pixels array data to image file.

The format depends on the file extension:
  • *.png for PNG (8-bit and 16-bit)
  • *.tiff or *.tif for TIFF (8-bit and 16-bit)
  • *.jpeg or *.jpg for JPEG (only supports 8-bit)
Parameters:

data: numpy.ndarray

2D pixels plane that should be saved

Raises:

TypeError

when data is not of type numpy.ndarray

ValueError

when data has more than 2 dimensions

class tmlib.writers.JsonWriter(filename)

Bases: tmlib.writers.Writer

Class for writing data to file on disk in JSON format.

Parameters:

filename: str

absolute path to a file

write(data)

Writes data to JSON file.

Parameters:

data: list or dict

data that should be serialized and written to the file in JSON format
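A hypothetical stand-alone sketch of a JSON writer (the real class inherits the context-manager machinery from tmlib.writers.Writer, shown below):

```python
import json


class JsonWriter(object):
    '''Sketch: context-manager writer that serializes data as JSON.'''

    def __init__(self, filename):
        self.filename = filename

    def __enter__(self):
        self._stream = open(self.filename, 'w')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._stream.close()

    def write(self, data):
        json.dump(data, self._stream)
```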

class tmlib.writers.TextWriter(filename)

Bases: tmlib.writers.Writer

Class for writing text data to a file.

Parameters:

filename: str

absolute path to a file

write(data)

Writes data to file.

Parameters:

data: str

text that should be written to the file

class tmlib.writers.Writer(filename)

Bases: object

Abstract base class for writing data to files.

Writers make use of the with statement context manager and follow a similar syntax:

with Writer('/path/to/file') as f:
    f.write(data)
Parameters:

filename: str

absolute path to a file

write(data)
class tmlib.writers.XmlWriter(filename)

Bases: tmlib.writers.Writer

Class for writing data to a file in XML format.

Parameters:

filename: str

absolute path to a file

write(data)

Writes data to XML file.

Parameters:

data: lxml.etree._Element

xml element that should be written to the file

class tmlib.writers.YamlWriter(filename)

Bases: tmlib.writers.Writer

Class for writing data to file on disk in YAML 1.2 format.

Parameters:

filename: str

absolute path to a file

write(data)

Writes data to YAML file.

Parameters:

data: list or dict

data that should be serialized and written to the file in YAML format