tmlib.workflow.imextract package

Module contents

Workflow step for extraction of pixel data from microscopy image files.

Microscopes usually store pixel data together with related acquisition metadata in vendor-specific formats. Image files may contain more than one pixel plane, and some microscopes even store all planes in a single file. This is impractical and may become a bottleneck, depending on file access patterns and the implemented storage backend. These file formats are often not understood by standard readers and are generally not optimized for scalable storage in a distributed computing environment. In addition, microscopes typically store images uncompressed, while it is often desirable to apply compression to reduce storage requirements. To this end, the imextract step extracts each pixel plane from the microscope files and stores it in a consistent way, which facilitates downstream processing.

Note that implementation details of the storage backend may be subject to change and files may not necessarily be accessible via a POSIX-compliant file system! Users are therefore advised to use ChannelImageFile.get to retrieve the extracted images.

Warning

This step (at least temporarily) approximately doubles storage requirements, because pixel data are extracted in parallel during the run phase, but the microscope image files are only deleted afterwards during the collect phase (and only if the delete argument is set to True).

Submodules

tmlib.workflow.imextract.api module

class tmlib.workflow.imextract.api.ImageExtractor(experiment_id)

Bases: tmlib.workflow.api.WorkflowStepAPI

Class for extraction of pixel arrays (planes) stored in image files using python-bioformats.

Parameters:

experiment_id: int

ID of the processed experiment

collect_job_output(batch)

Deletes all instances of MicroscopeImageFile if the delete argument is set to True.

Parameters:

batch: dict

job description

create_collect_batch(args)

Creates a job description for the collect phase.

Parameters:

args: tmlib.workflow.imextract.args.ImextractBatchArguments

step-specific arguments

Returns:

dict

job description

create_run_batches(args)

Creates job descriptions for parallel processing.

Parameters:

args: tmlib.workflow.imextract.args.ImextractBatchArguments

step-specific arguments

Returns:

generator

job descriptions
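The batching performed here can be illustrated with a standalone sketch. This is not tmlib's actual implementation, and the dictionary keys shown are hypothetical; it only demonstrates the grouping semantics: acquisition sites are split into groups of batch_size, and each group becomes one "run" job description.

```python
def sketch_run_batches(site_ids, batch_size):
    """Yield one job description per group of `batch_size` acquisition sites.

    Illustrative only: the real method derives site IDs from the experiment
    database, and the dict keys used here are invented for the example.
    """
    for job_id, start in enumerate(range(0, len(site_ids), batch_size), start=1):
        yield {
            'id': job_id,
            'site_ids': site_ids[start:start + batch_size],
        }


# 250 sites with a batch size of 100 yield 3 jobs (100 + 100 + 50 sites).
batches = list(sketch_run_batches(list(range(250)), batch_size=100))
print(len(batches))  # 3
```

Because the method returns a generator, job descriptions can be consumed lazily during submission rather than materialized all at once.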

delete_previous_job_output()

Deletes all instances of class ChannelImageFile as well as all children for the processed experiment.

has_collect_phase = True
run_job(batch, assume_clean_state=False)

Extracts individual planes from microscope image files and writes them into HDF5 files.

Parameters:

batch: dict

job description

assume_clean_state: bool, optional

assume that output of previous runs has already been cleaned up

tmlib.workflow.imextract.args module

class tmlib.workflow.imextract.args.ImextractBatchArguments(**kwargs)

Bases: tmlib.workflow.args.BatchArguments

Parameters:

**kwargs: dict, optional

keyword arguments to overwrite

batch_size

int: number of image acquisition sites to process per job

delete

bool: delete microscope files after pixel data have been extracted (Warning: you won’t be able to rerun jobs afterwards!)
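The **kwargs-overwrite pattern of these argument classes can be sketched as follows. This is a simplified stand-in, not the real tmlib.workflow.args.BatchArguments base class; the batch_size default of 100 matches the CLI help below, while the delete default of False is an assumption.

```python
class SketchBatchArguments:
    """Simplified stand-in: defaults that keyword arguments may overwrite."""

    def __init__(self, **kwargs):
        self.batch_size = 100   # sites per job (default per CLI help)
        self.delete = False     # assumed default: keep microscope files
        for name, value in kwargs.items():
            if not hasattr(self, name):
                raise AttributeError('unknown argument: %s' % name)
            setattr(self, name, value)


args = SketchBatchArguments(batch_size=50)
print(args.batch_size, args.delete)  # 50 False
```

Rejecting unknown keyword names guards against silently misspelled arguments, which is the usual rationale for this overwrite pattern.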

class tmlib.workflow.imextract.args.ImextractSubmissionArguments(**kwargs)

Bases: tmlib.workflow.args.SubmissionArguments

Parameters:

**kwargs: dict, optional

keyword arguments to overwrite

tmlib.workflow.imextract.cli module

class tmlib.workflow.imextract.cli.Imextract(api_instance, verbosity)

Bases: tmlib.workflow.cli.WorkflowStepCLI

Extracts pixel elements from heterogeneous microscopy image file formats based on the configured image metadata and stores them in a standardized file format.

Parameters:

api_instance: tmlib.workflow.imextract.api.ImageExtractor

instance of API class to which processing is delegated

verbosity: int

logging level

imextract

Extracts pixel elements from heterogeneous microscopy image file formats based on the configured image metadata and stores them in a standardized file format.


experiment_id

ID of the experiment that should be processed

-h, --help

show this help message and exit

--verbosity, -v

increase logging verbosity

imextract experiment_id cleanup

cleans up the output of a previous submission, i.e. removes files and database entries created by previously submitted jobs


-h, --help

show this help message and exit

imextract experiment_id collect

collects the output of run jobs, i.e. performs a post-processing operation that either cannot be parallelized or needs to be performed afterwards


-h, --help

show this help message and exit

imextract experiment_id info

prints the description of a given batch job to the console


-h, --help

show this help message and exit

--job, -j

ID of the job for which information should be displayed

--phase {collect,run}, -p {collect,run}

phase of the workflow step to which the job belongs

imextract experiment_id init

creates batches for parallel processing and thereby defines how the computational task should be distributed over the cluster (also cleans up the output of previous submissions)


-h, --help

show this help message and exit

--batch-size <batch_size>, -b <batch_size>

number of image acquisition sites to process per job (default: 100)

--delete

delete microscope files after pixel data have been extracted (Warning: you won’t be able to rerun jobs afterwards!)

imextract experiment_id log

prints the log output of a given batch job to the console


-h, --help

show this help message and exit

--job, -j

ID of the job for which log output should be shown

--phase {collect,init,run}, -p {collect,init,run}

phase of the workflow step to which the job belongs

imextract experiment_id resubmit

resubmits previously created jobs for “run” and “collect” phases to the cluster and monitors their status upon processing


-h, --help

show this help message and exit

--depth <index>, -d <index>

number of child tasks that should be monitored (default: 1)

--interval <seconds>, -i <seconds>

seconds to wait between monitoring iterations (default: 10)

imextract experiment_id run

runs an individual batch job on the local machine


-h, --help

show this help message and exit

--assume-clean-state

assume that previous outputs have been cleaned up

--job <job_id>, -j <job_id>

ID of the job that should be run

imextract experiment_id submit

creates batch jobs for the “run” and “collect” phases, submits them to the cluster and monitors their status upon processing (requires a prior “init”)


-h, --help

show this help message and exit

--depth <index>, -d <index>

number of child tasks that should be monitored (default: 1)

--interval <seconds>, -i <seconds>

seconds to wait between monitoring iterations (default: 10)

--cores <number>

number of cores that should be allocated to each “run” job (may be increased in case memory requirements of a job exceed resources of a single core) (default: 1)

--duration <hh:mm:ss>

walltime that should be allocated to each “run” job in the format “HH:MM:SS” (may need to be adapted depending on the choice of batch size) (default: 06:00:00)

--memory <mb>

amount of memory in megabytes that should be allocated to each “run” job (default: 2147)
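Taken together, a typical invocation sequence for this step might look as follows. The experiment ID of 1 and the chosen option values are placeholders; the subcommands and flags are those documented above.

```shell
# Create batches of 50 acquisition sites each
# (also cleans up the output of previous submissions).
imextract 1 init --batch-size 50

# Submit "run" and "collect" jobs to the cluster and monitor them,
# allocating 4096 MB of memory per "run" job.
imextract 1 submit --memory 4096

# Inspect the log output of run job 2 if something goes wrong.
imextract 1 log --phase run --job 2
```

Note that passing --delete to init removes the microscope files during the collect phase, after which the run jobs can no longer be resubmitted.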