tmlib.workflow.metaextract package¶
Module contents¶
Workflow step for extraction of microscope image metadata.
Microscopes typically write metadata about the image acquisition process into the header of the generated image files. Unfortunately, almost every vendor uses a custom file format. The metaextract step uses the Bio-Formats library to extract metadata from these heterogeneous image file formats in the form of OMEXML, according to the standardized OME data model.
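To illustrate the kind of output this step produces, the following is a minimal sketch that reads OMEXML metadata with the python-bioformats bindings. It is only an illustration under that assumption; the metaextract step itself may invoke Bio-Formats differently (for example through its command-line tools), and the file path is hypothetical.

    # Illustration only: assumes the python-bioformats and javabridge packages;
    # this is not necessarily how the metaextract step calls Bio-Formats.
    import javabridge
    import bioformats

    javabridge.start_vm(class_path=bioformats.JARS)
    try:
        # Read metadata from a hypothetical microscope file as an OMEXML string.
        omexml = bioformats.get_omexml_metadata(path='/path/to/image.nd2')
        print(omexml[:200])  # beginning of the standardized OME metadata
    finally:
        javabridge.kill_vm()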
Submodules¶
tmlib.workflow.metaextract.api module¶
class tmlib.workflow.metaextract.api.MetadataExtractor(experiment_id)¶
Bases: tmlib.workflow.api.WorkflowStepAPI
Class for extraction of metadata from microscope image files.
Extracted metadata is formatted according to the Open Microscopy Environment (OME) schema.
Parameters: experiment_id: int
ID of the processed experiment
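A minimal usage sketch, assuming an experiment with ID 1 already exists:

    from tmlib.workflow.metaextract.api import MetadataExtractor

    # Hypothetical experiment ID; any existing experiment would do.
    api = MetadataExtractor(experiment_id=1)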
collect_job_output(obj, *args, **kwargs)¶
Not implemented.
create_run_batches(args)¶
Creates job descriptions for parallel computing.
Parameters: args: tmlib.workflow.metaextract.args.MetaextractBatchArguments
step-specific arguments
Returns: generator
job descriptions
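For example, batches could be generated as sketched below, reusing the api instance from above; the batch_size value is an arbitrary illustration (see MetaextractBatchArguments in the args module).

    from tmlib.workflow.metaextract.args import MetaextractBatchArguments

    args = MetaextractBatchArguments(batch_size=10)  # arbitrary example value
    for batch in api.create_run_batches(args):
        print(batch)  # each item describes one "run" job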
delete_previous_job_output()¶
Deletes the output of a previous submission.
has_collect_phase = False¶
run_job(batch, assume_clean_state=False)¶
Extracts OMEXML from microscope image or metadata files.
Parameters: batch: dict
description of the run job
assume_clean_state: bool, optional
assume that output of previous runs has already been cleaned up
Raises: subprocess.CalledProcessError
when extraction failed
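A sketch of running a single batch and handling a failed extraction, where batch stands for one of the descriptions produced by create_run_batches above:

    import subprocess

    try:
        api.run_job(batch, assume_clean_state=False)
    except subprocess.CalledProcessError as err:
        # Extraction failed for at least one file of this batch.
        print('metadata extraction failed: {0}'.format(err))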
tmlib.workflow.metaextract.args module¶
class tmlib.workflow.metaextract.args.MetaextractBatchArguments(**kwargs)¶
Bases: tmlib.workflow.args.BatchArguments
Parameters: **kwargs: dict, optional
keyword arguments to overwrite
batch_size¶
int: number of images that should be processed per job
class tmlib.workflow.metaextract.args.MetaextractSubmissionArguments(**kwargs)¶
Bases: tmlib.workflow.args.SubmissionArguments
Parameters: **kwargs: dict, optional
keyword arguments to overwrite
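The keyword arguments presumably mirror the submission options of the command line interface (see the submit command below); the attribute names used in this sketch (cores, duration, memory) are assumptions and should be verified against tmlib.workflow.args.SubmissionArguments.

    from tmlib.workflow.metaextract.args import MetaextractSubmissionArguments

    # Assumed keyword names matching the CLI defaults shown below.
    submission_args = MetaextractSubmissionArguments(
        cores=1, duration='06:00:00', memory=2147
    )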
tmlib.workflow.metaextract.cli module¶
class tmlib.workflow.metaextract.cli.Metaextract(api_instance, verbosity)¶
Bases: tmlib.workflow.cli.WorkflowStepCLI
Extraction of OMEXML metadata from heterogeneous microscopy image file formats.
Parameters: api_instance: tmlib.workflow.metaextract.api.MetadataExtractor
instance of API class to which processing is delegated
verbosity: int
logging verbosity
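A minimal sketch of wiring the CLI class to the API class, reusing the hypothetical experiment ID from the earlier examples:

    from tmlib.workflow.metaextract.api import MetadataExtractor
    from tmlib.workflow.metaextract.cli import Metaextract

    api = MetadataExtractor(experiment_id=1)  # hypothetical experiment ID
    cli = Metaextract(api, verbosity=1)       # logging verbosity level 1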
collect(obj, *args, **kwargs)¶
Not implemented.
metaextract¶
Extraction of OMEXML metadata from heterogeneous microscopy image file formats.
experiment_id¶
ID of the experiment that should be processed
-h, --help¶
show this help message and exit
--verbosity, -v¶
increase logging verbosity
metaextract experiment_id cleanup¶
cleans up the output of a previous submission, i.e. removes files and database entries created by previously submitted jobs
-h, --help¶
show this help message and exit
metaextract experiment_id info¶
prints the description of a given batch job to the console
-h, --help¶
show this help message and exit
--job, -j¶
ID of the job for which information should be displayed
--phase {collect,run}, -p {collect,run}¶
phase of the workflow step to which the job belongs
metaextract experiment_id init¶
creates batches for parallel processing and thereby defines how the computational task should be distributed over the cluster (also cleans up the output of previous submissions)
-h, --help¶
show this help message and exit
--batch-size <batch_size>, -b <batch_size>¶
number of images that should be processed per job (default: 100)
metaextract experiment_id log¶
prints the log output of a given batch job to the console
-h, --help¶
show this help message and exit
--job, -j¶
ID of the job for which log output should be shown
--phase {collect,init,run}, -p {collect,init,run}¶
phase of the workflow step to which the job belongs
metaextract experiment_id resubmit¶
resubmits previously created jobs for “run” and “collect” phases to the cluster and monitors their status upon processing
-h, --help¶
show this help message and exit
--depth <index>, -d <index>¶
number of child tasks that should be monitored (default: 1)
--interval <seconds>, -i <seconds>¶
seconds to wait between monitoring iterations (default: 10)
metaextract experiment_id run¶
runs an individual batch job on the local machine
-h, --help¶
show this help message and exit
--assume-clean-state¶
assume that previous outputs have been cleaned up
--job <job_id>, -j <job_id>¶
ID of the job that should be run
metaextract experiment_id submit¶
creates batch jobs for the “run” and “collect” phases, submits them to the cluster and monitors their status upon processing (requires a prior “init”)
-h, --help¶
show this help message and exit
--depth <index>, -d <index>¶
number of child tasks that should be monitored (default: 1)
--interval <seconds>, -i <seconds>¶
seconds to wait between monitoring iterations (default: 10)
--cores <number>¶
number of cores that should be allocated to each “run” job (may be increased in case memory requirements of a job exceed resources of a single core) (default: 1)
--duration <hh:mm:ss>¶
walltime that should be allocated to each “run” job in the format “HH:MM:SS” (may need to be adapted depending on the choice of batch size) (default: 06:00:00)
--memory <mb>¶
amount of memory in megabytes that should be allocated to each “run” job (default: 2147)