tmlib.workflow.imextract package¶
Module contents¶
Workflow step for extraction of pixel data from microscopy image files.
Microscopes usually store pixel data together with related acquisition metadata in vendor-specific formats. Image files may contain more than one pixel plane, and some microscopes even store all planes in a single file. This is impractical and may become a bottleneck, depending on file access patterns and the implemented storage backend. These vendor-specific formats are also often not understood by standard readers and are generally not optimized for scalable storage in a distributed computing environment. In addition, microscopes typically store images uncompressed, while it is often desirable to apply compression to reduce storage requirements. To this end, the imextract step extracts each pixel plane from the microscope files and stores them in a consistent way, which facilitates downstream processing.
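To illustrate why compressing extracted planes matters, the following stdlib-only sketch compresses a synthetic 16-bit plane with zlib. The plane dimensions and fill value are invented for illustration; real microscopy images contain noise and compress less dramatically than this constant-valued example.

```python
import zlib

# Synthetic 16-bit grayscale plane (2048 x 2048 pixels) stored uncompressed
# as raw bytes -- here filled with a constant value, so it compresses extremely well.
width, height = 2048, 2048
plane = bytes(2) * (width * height)  # 8 MiB of constant pixel data

compressed = zlib.compress(plane, level=6)
print(f"uncompressed: {len(plane)} bytes, compressed: {len(compressed)} bytes")
print(f"compression ratio: {len(plane) / len(compressed):.0f}x")
```

Real planes will not reach such ratios, but flat background regions, which dominate many microscopy images, are exactly where compression pays off.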
Note that implementation details of the storage backend may be subject to
change and files may not necessarily be accessible via a POSIX-compliant file
system! Users are therefore advised to use
ChannelImageFile.get
to retrieve the extracted images.
Warning¶
This step (at least temporarily) approximately doubles storage requirements,
because pixel data are extracted in parallel during the run phase,
but the microscope image files are only deleted afterwards during the
collect phase (and only if the
delete
argument is set to True).
Submodules¶
tmlib.workflow.imextract.api module¶
-
class
tmlib.workflow.imextract.api.
ImageExtractor
(experiment_id)¶ Bases:
tmlib.workflow.api.WorkflowStepAPI
Class for extraction of pixel arrays (planes) stored in image files using python-bioformats.
Parameters: experiment_id: int
ID of the processed experiment
-
collect_job_output
(batch)¶ Deletes all instances of
MicroscopeImageFile
in case delete
is set to True.
Parameters: batch: dict
job description
-
create_collect_batch
(args)¶ Creates a job description for the collect phase.
Parameters: args: tmlib.workflow.imextract.args.ImextractBatchArguments
step-specific arguments
Returns: dict
job description
-
create_run_batches
(args)¶ Creates job descriptions for parallel processing.
Parameters: args: tmlib.workflow.imextract.args.ImextractBatchArguments
step-specific arguments
Returns: generator
job descriptions
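The batching logic of create_run_batches can be sketched generically. The dictionary keys below ("id", "site_ids") are hypothetical stand-ins, not the actual job-description layout produced by tmlib; the sketch only shows how acquisition sites are grouped into one job description per batch_size sites.

```python
def create_run_batches(site_ids, batch_size):
    """Yield one hypothetical job description per group of acquisition sites."""
    for job_id, start in enumerate(range(0, len(site_ids), batch_size), start=1):
        yield {
            "id": job_id,
            "site_ids": site_ids[start:start + batch_size],
        }

# 250 sites at the default batch size of 100 yields 3 jobs;
# the last job receives the remaining 50 sites.
batches = list(create_run_batches(list(range(250)), batch_size=100))
print(len(batches))                   # 3
print(len(batches[-1]["site_ids"]))   # 50
```

Returning a generator rather than a list means job descriptions can be consumed and submitted one at a time, which keeps memory usage flat for experiments with many sites.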
-
delete_previous_job_output
()¶ Deletes all instances of class
ChannelImageFile
as well as all children for the processed experiment.
-
has_collect_phase
= True¶
-
run_job
(batch, assume_clean_state=False)¶ Extracts individual planes from microscope image files and writes them into HDF5 files.
Parameters: batch: dict
job description
assume_clean_state: bool, optional
assume that output of previous runs has already been cleaned up
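The extraction pattern performed by run_job can be sketched in stdlib-only form. This is not the real implementation (which reads planes via python-bioformats and writes HDF5); the gzip-per-plane layout and file names below are invented for illustration of the "one container in, one compressed file per plane out" idea.

```python
import gzip
import tempfile
from pathlib import Path

def extract_planes(planes, out_dir):
    """Write each pixel plane to its own gzip-compressed file and return
    the created paths (a stand-in for the real HDF5 storage backend)."""
    paths = []
    for index, pixels in enumerate(planes):
        path = Path(out_dir) / f"plane_{index:03d}.gz"
        with gzip.open(path, "wb") as f:
            f.write(pixels)
        paths.append(path)
    return paths

# A fake "microscope file" holding three planes in a single container.
planes = [bytes([i]) * 1024 for i in range(3)]
with tempfile.TemporaryDirectory() as tmp:
    written = extract_planes(planes, tmp)
    restored = []
    for p in written:
        with gzip.open(p, "rb") as f:
            restored.append(f.read())

print([p.name for p in written])
print(restored == planes)  # round-trip preserves the pixel data
```

Because each plane lands in its own file, downstream jobs can read exactly the planes they need without touching the original multi-plane container.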
-
tmlib.workflow.imextract.args module¶
-
class
tmlib.workflow.imextract.args.
ImextractBatchArguments
(**kwargs)¶ Bases:
tmlib.workflow.args.BatchArguments
Parameters: **kwargs: dict, optional
keyword arguments to overwrite
-
batch_size
¶ int: number of image acquisition sites to process per job
-
delete
¶ bool: delete microscope files after pixel data have been extracted (Warning: You won’t be able to rerun jobs afterwards!)
-
-
class
tmlib.workflow.imextract.args.
ImextractSubmissionArguments
(**kwargs)¶ Bases:
tmlib.workflow.args.SubmissionArguments
Parameters: **kwargs: dict, optional
keyword arguments to overwrite
tmlib.workflow.imextract.cli module¶
-
class
tmlib.workflow.imextract.cli.
Imextract
(api_instance, verbosity)¶ Bases:
tmlib.workflow.cli.WorkflowStepCLI
Extracts pixel elements from heterogeneous microscopy image file formats based on the configured image metadata and stores them in a standardized file format.
Parameters: api_instance: tmlib.workflow.imextract.api.ImageExtractor
instance of API class to which processing is delegated
verbosity: int
logging level
imextract¶
Extracts pixel elements from heterogeneous microscopy image file formats based on the configured image metadata and stores them in a standardized file format.
-
experiment_id
¶
ID of the experiment that should be processed
-
-h
,
--help
¶
show this help message and exit
-
--verbosity
,
-v
¶
increase logging verbosity
imextract experiment_id cleanup¶
cleans up the output of a previous submission, i.e. removes files and database entries created by previously submitted jobs
-
-h
,
--help
¶
show this help message and exit
imextract experiment_id collect¶
collects the output of run jobs, i.e. performs a post-processing operation that either cannot be parallelized or needs to be performed afterwards
-
-h
,
--help
¶
show this help message and exit
imextract experiment_id info¶
prints the description of a given batch job to the console
-
-h
,
--help
¶
show this help message and exit
-
--job
,
-j
¶
ID of the job for which information should be displayed
-
--phase
{collect,run}
,
-p
{collect,run}
¶ phase of the workflow step to which the job belongs
imextract experiment_id init¶
creates batches for parallel processing and thereby defines how the computational task should be distributed over the cluster (also cleans up the output of previous submissions)
-
-h
,
--help
¶
show this help message and exit
-
--batch-size
<batch_size>
,
-b
<batch_size>
¶ number of image acquisition sites to process per job (default: 100)
-
--delete
¶
delete microscope files after pixel data have been extracted (Warning: You won’t be able to rerun jobs afterwards!)
imextract experiment_id log¶
prints the log output of a given batch job to the console
-
-h
,
--help
¶
show this help message and exit
-
--job
,
-j
¶
ID of the job for which log output should be shown
-
--phase
{collect,init,run}
,
-p
{collect,init,run}
¶ phase of the workflow step to which the job belongs
imextract experiment_id resubmit¶
resubmits previously created jobs for “run” and “collect” phases to the cluster and monitors their status upon processing
-
-h
,
--help
¶
show this help message and exit
-
--depth
<index>
,
-d
<index>
¶ number of child tasks that should be monitored (default: 1)
-
--interval
<seconds>
,
-i
<seconds>
¶ seconds to wait between monitoring iterations (default: 10)
imextract experiment_id run¶
runs an individual batch job on the local machine
-
-h
,
--help
¶
show this help message and exit
-
--assume-clean-state
¶
assume that previous outputs have been cleaned up
-
--job
<job_id>
,
-j
<job_id>
¶ ID of the job that should be run
imextract experiment_id submit¶
creates batch jobs for the “run” and “collect” phases, submits them to the cluster and monitors their status upon processing (requires a prior “init”)
-
-h
,
--help
¶
show this help message and exit
-
--depth
<index>
,
-d
<index>
¶ number of child tasks that should be monitored (default: 1)
-
--interval
<seconds>
,
-i
<seconds>
¶ seconds to wait between monitoring iterations (default: 10)
-
--cores
<number>
¶ number of cores that should be allocated to each “run” job (may be increased in case memory requirements of a job exceed resources of a single core) (default: 1)
-
--duration
<hh:mm:ss>
¶ walltime that should be allocated to each “run” job in the format “HH:MM:SS” (may need to be adapted depending on the choice of batch size) (default: 06:00:00)
-
--memory
<mb>
¶ amount of memory in megabytes that should be allocated to each “run” job (default: 2147)