tmlib.workflow.imextract package

Module contents

Workflow step for extraction of pixel data from microscopy image files.

Microscopes usually store pixel data together with related acquisition metadata in vendor-specific formats. Image files may contain more than one pixel plane, and some microscopes even store all planes in a single file. This is impractical and can become a bottleneck, depending on file access patterns and the storage backend. These file formats are also often not understood by standard readers and are generally not optimized for scalable storage in a distributed computing environment. In addition, microscopes typically store images uncompressed, whereas compression is often desirable to reduce storage requirements. To address these issues, the imextract step extracts each pixel plane from the microscope files and stores the planes in a consistent way, which facilitates downstream processing.

Note that implementation details of the storage backend are subject to change and that files may not be accessible via a POSIX-compliant file system. Users are therefore advised to use ChannelImageFile.get to retrieve the extracted images.


This step approximately doubles storage requirements, at least temporarily: pixel data are extracted in parallel during the run phase, but the microscope image files are deleted only afterwards, during the collect phase, and only if the delete argument is set to True.
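The storage arithmetic can be sketched as follows. This is only an illustration: the function names and the extracted_ratio parameter (size of the extracted data relative to the microscope files, for example below 1.0 when compression is effective) are hypothetical and not part of tmlib.

```python
def peak_storage_bytes(raw_bytes, extracted_ratio=1.0):
    """Storage needed while microscope files and extracted planes coexist.

    This is the situation at the end of the run phase, before collect.
    """
    return raw_bytes + int(raw_bytes * extracted_ratio)


def final_storage_bytes(raw_bytes, extracted_ratio=1.0, delete=False):
    """Storage needed after the collect phase.

    The microscope files are only removed if ``delete`` is True.
    """
    extracted = int(raw_bytes * extracted_ratio)
    return extracted if delete else raw_bytes + extracted


# Hypothetical experiment with 500 GiB of microscope files.
raw = 500 * 1024 ** 3
print(peak_storage_bytes(raw) / 1024 ** 3, "GiB at peak")
print(final_storage_bytes(raw, delete=True) / 1024 ** 3, "GiB after collect")
```

With the default ratio of 1.0, the peak is twice the size of the microscope files, which corresponds to the "approximately doubles" statement above.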


tmlib.workflow.imextract.api module

class tmlib.workflow.imextract.api.ImageExtractor(experiment_id)

Bases: tmlib.workflow.api.WorkflowStepAPI

Class for extraction of pixel arrays (planes) stored in image files using python-bioformats.


experiment_id: int

ID of the processed experiment


Deletes all instances of MicroscopeImageFile if delete is set to True.


batch: dict

job description


Creates a job description for the collect phase.


args: tmlib.workflow.imextract.args.ImextractBatchArguments

step-specific arguments



job description


Creates job descriptions for parallel processing.


args: tmlib.workflow.imextract.args.ImextractBatchArguments

step-specific arguments



job descriptions
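How sites are grouped into jobs can be illustrated with a minimal sketch. It assumes that each job description is a dict holding a job ID and the IDs of the acquisition sites it covers; the keys "id" and "site_ids" and both function names are hypothetical, since the actual structure of the job descriptions is not documented here.

```python
def create_batches(site_ids, batch_size):
    """Split a list of site IDs into consecutive chunks of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        site_ids[i:i + batch_size]
        for i in range(0, len(site_ids), batch_size)
    ]


def create_job_descriptions(site_ids, batch_size=100):
    """Build one (hypothetical) job description per batch of sites."""
    return [
        {"id": job_id, "site_ids": batch}
        for job_id, batch in enumerate(
            create_batches(site_ids, batch_size), start=1
        )
    ]


# 250 sites with a batch size of 100 yield three jobs (100, 100, 50 sites).
jobs = create_job_descriptions(list(range(1, 251)), batch_size=100)
print([len(job["site_ids"]) for job in jobs])
```

A smaller batch size produces more, shorter jobs; a larger one produces fewer jobs that each need more walltime.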


Deletes all instances of class ChannelImageFile as well as all children for the processed experiment.

has_collect_phase = True
run_job(batch, assume_clean_state=False)

Extracts individual planes from microscope image files and writes them into HDF5 files.


batch: dict

job description

assume_clean_state: bool, optional

assume that output of previous runs has already been cleaned up
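The role of assume_clean_state can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the batch layout (a "files" key mapping file names to lists of planes), the in-memory dict standing in for the HDF5 storage, and the idea that a job removes its own stale output first unless told that the state is already clean. It does not reproduce the actual implementation.

```python
def run_job_sketch(batch, store, assume_clean_state=False):
    """Extract every plane of every file in the batch into ``store``.

    ``store`` maps (file name, plane index) to a single plane and stands
    in for the real storage backend.
    """
    if not assume_clean_state:
        # Remove output that a previous run left behind for these files.
        for name in batch["files"]:
            stale = [key for key in store if key[0] == name]
            for key in stale:
                del store[key]
    for name, planes in batch["files"].items():
        for index, plane in enumerate(planes):
            # One entry per plane, regardless of how the source file
            # grouped them.
            store[(name, index)] = plane
    return store


# Hypothetical batch: one file containing two 2x2 planes.
batch = {"files": {"well_A01.tif": [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]}}
store = {("well_A01.tif", 5): "stale plane from an earlier run"}
run_job_sketch(batch, store)
print(sorted(store))
```

Skipping the cleanup saves work when a prior cleanup has already been performed, but leaves stale entries in place if that assumption is wrong.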

tmlib.workflow.imextract.args module

class tmlib.workflow.imextract.args.ImextractBatchArguments(**kwargs)

Bases: tmlib.workflow.args.BatchArguments


**kwargs: dict, optional

keyword arguments to overwrite


int: number of image acquisition sites to process per job


bool: delete microscope files after pixel data have been extracted (Warning: You won’t be able to rerun jobs afterwards!)

class tmlib.workflow.imextract.args.ImextractSubmissionArguments(**kwargs)

Bases: tmlib.workflow.args.SubmissionArguments


**kwargs: dict, optional

keyword arguments to overwrite

tmlib.workflow.imextract.cli module

class tmlib.workflow.imextract.cli.Imextract(api_instance, verbosity)

Bases: tmlib.workflow.cli.WorkflowStepCLI

Extracts pixel elements from heterogeneous microscopy image file formats based on the configured image metadata and stores them in a standardized file format.


api_instance: tmlib.workflow.imextract.api.ImageExtractor

instance of API class to which processing is delegated

verbosity: int

logging level


Extracts pixel elements from heterogeneous microscopy image file formats based on the configured image metadata and stores them in a standardized file format.


ID of the experiment that should be processed

-h, --help

show this help message and exit

--verbosity, -v

increase logging verbosity

imextract experiment_id cleanup

cleans up the output of a previous submission, i.e. removes files and database entries created by previously submitted jobs

-h, --help

show this help message and exit

imextract experiment_id collect

collects the output of run jobs, i.e. performs a post-processing operation that either cannot be parallelized or needs to be performed afterwards

-h, --help

show this help message and exit

imextract experiment_id info

prints the description of a given batch job to the console

-h, --help

show this help message and exit

--job, -j

ID of the job for which information should be displayed

--phase {collect,run}, -p {collect,run}

phase of the workflow step to which the job belongs

imextract experiment_id init

creates batches for parallel processing and thereby defines how the computational task should be distributed over the cluster (also cleans up the output of previous submissions)

-h, --help

show this help message and exit

--batch-size <batch_size>, -b <batch_size>

number of image acquisition sites to process per job (default: 100)


delete microscope files after pixel data have been extracted (Warning: You won’t be able to rerun jobs afterwards!)

imextract experiment_id log

prints the log output of a given batch job to the console

-h, --help

show this help message and exit

--job, -j

ID of the job for which log output should be shown

--phase {collect,init,run}, -p {collect,init,run}

phase of the workflow step to which the job belongs

imextract experiment_id resubmit

resubmits previously created jobs for “run” and “collect” phases to the cluster and monitors their status upon processing

-h, --help

show this help message and exit

--depth <index>, -d <index>

number of child tasks that should be monitored (default: 1)

--interval <seconds>, -i <seconds>

seconds to wait between monitoring iterations (default: 10)

imextract experiment_id run

runs an individual batch job on the local machine

-h, --help

show this help message and exit


assume that previous outputs have been cleaned up

--job <job_id>, -j <job_id>

ID of the job that should be run
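As an illustration of the command syntax documented so far, the following sketch assembles argument lists for the init and run subcommands. The helper build_command and the experiment ID 1 are hypothetical; the program name and subcommand names are taken from the headings in this section. The sketch only builds and prints the commands, it does not execute them.

```python
import shlex

# Subcommands as listed in this section.
SUBCOMMANDS = {
    "cleanup", "collect", "info", "init", "log", "resubmit", "run", "submit",
}


def build_command(experiment_id, subcommand, *options):
    """Return an argument list of the form: imextract <id> <subcommand> ..."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError("unknown subcommand: %s" % subcommand)
    return ["imextract", str(experiment_id), subcommand, *map(str, options)]


# Create batches of 50 sites each, then run the first job locally.
init_cmd = build_command(1, "init", "--batch-size", 50)
run_cmd = build_command(1, "run", "--job", 1)
print(shlex.join(init_cmd))
print(shlex.join(run_cmd))
```

Such a list could be passed to subprocess.run on a machine where the imextract program is installed.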

imextract experiment_id submit

creates batch jobs for the “run” and “collect” phases, submits them to the cluster and monitors their status upon processing (requires a prior “init”)

-h, --help

show this help message and exit

--depth <index>, -d <index>

number of child tasks that should be monitored (default: 1)

--interval <seconds>, -i <seconds>

seconds to wait between monitoring iterations (default: 10)

--cores <number>

number of cores that should be allocated to each “run” job (may be increased in case memory requirements of a job exceed resources of a single core) (default: 1)

--duration <hh:mm:ss>

walltime that should be allocated to each “run” job in the format “HH:MM:SS” (may need to be adapted depending on the choice of batch size) (default: 06:00:00)

--memory <mb>

amount of memory in megabytes that should be allocated to each “run” job (default: 2147)
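The submit options and their defaults can be mirrored in a small standalone parser, for example to validate resource requests before building a command. This is a sketch using only the standard argparse module; it is not the parser that imextract itself uses, and the function names and the conversion of the walltime to seconds are assumptions made for illustration.

```python
import argparse


def parse_walltime(value):
    """Convert a walltime string of the form HH:MM:SS into seconds."""
    parts = value.split(":")
    if len(parts) != 3:
        raise argparse.ArgumentTypeError("expected HH:MM:SS, got %r" % value)
    try:
        hours, minutes, seconds = (int(part) for part in parts)
    except ValueError:
        raise argparse.ArgumentTypeError("expected HH:MM:SS, got %r" % value)
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise argparse.ArgumentTypeError("walltime out of range: %r" % value)
    return hours * 3600 + minutes * 60 + seconds


def build_submit_parser():
    """Parser mirroring the documented submit options and defaults."""
    parser = argparse.ArgumentParser(prog="submit-sketch")
    parser.add_argument("--depth", "-d", type=int, default=1)
    parser.add_argument("--interval", "-i", type=int, default=10)
    parser.add_argument("--cores", type=int, default=1)
    # argparse applies ``type`` to string defaults, so this becomes 21600.
    parser.add_argument("--duration", type=parse_walltime, default="06:00:00")
    parser.add_argument("--memory", type=int, default=2147)
    return parser


args = build_submit_parser().parse_args(["--duration", "01:30:00", "--cores", "2"])
print(args.duration, args.cores, args.memory)
```

Larger batch sizes generally call for a longer duration, since each "run" job then processes more acquisition sites.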