tmlib.workflow.metaextract package¶
Module contents¶
Workflow step for extraction of microscope image metadata.
Microscopes typically write metadata about the image acquisition process into the header of the generated image files. Unfortunately, almost every vendor uses a custom file format. The metaextract step uses the Bio-Formats library to extract metadata from these heterogeneous image file formats in the form of OMEXML, according to the standardized OME data model.
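To illustrate the kind of output this step produces, the following is a minimal sketch that reads OMEXML metadata with the python-bioformats bindings. It is only an illustration under that assumption; the metaextract step itself may invoke Bio-Formats differently (for example through its command-line tools), and the file path is hypothetical.

    # Illustration only: assumes the python-bioformats and javabridge packages;
    # this is not necessarily how the metaextract step calls Bio-Formats.
    import javabridge
    import bioformats

    javabridge.start_vm(class_path=bioformats.JARS)
    try:
        # Read metadata from a hypothetical microscope file as an OMEXML string.
        omexml = bioformats.get_omexml_metadata(path='/path/to/image.nd2')
        print(omexml[:200])  # beginning of the standardized OME metadata
    finally:
        javabridge.kill_vm()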
Submodules¶
tmlib.workflow.metaextract.api module¶
class tmlib.workflow.metaextract.api.MetadataExtractor(experiment_id)¶
Bases: tmlib.workflow.api.WorkflowStepAPI
Class for extraction of metadata from microscope image files.
Extracted metadata is formatted according to the Open Microscopy Environment (OME) schema.
Parameters: experiment_id: int
ID of the processed experiment
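A minimal usage sketch, assuming an experiment with ID 1 already exists:

    from tmlib.workflow.metaextract.api import MetadataExtractor

    # Hypothetical experiment ID; any existing experiment would do.
    api = MetadataExtractor(experiment_id=1)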
collect_job_output(obj, *args, **kwargs)¶
Not implemented.
create_run_batches(args)¶
Creates job descriptions for parallel computing.
Parameters: args: tmlib.workflow.metaextract.args.MetaextractBatchArguments
step-specific arguments
Returns: generator
job descriptions
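For example, batches could be generated as sketched below, reusing the api instance from above; the batch_size value is an arbitrary illustration (see MetaextractBatchArguments in the args module).

    from tmlib.workflow.metaextract.args import MetaextractBatchArguments

    args = MetaextractBatchArguments(batch_size=10)  # arbitrary example value
    for batch in api.create_run_batches(args):
        print(batch)  # each item describes one "run" job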
delete_previous_job_output()¶
Deletes the output of a previous submission.
has_collect_phase = False¶
run_job(batch, assume_clean_state=False)¶
Extracts OMEXML from microscope image or metadata files.
Parameters: batch: dict
description of the run job
assume_clean_state: bool, optional
assume that output of previous runs has already been cleaned up
Raises: subprocess.CalledProcessError
when extraction failed
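A sketch of running a single batch and handling a failed extraction, where batch stands for one of the descriptions produced by create_run_batches above:

    import subprocess

    try:
        api.run_job(batch, assume_clean_state=False)
    except subprocess.CalledProcessError as err:
        # Extraction failed for at least one file of this batch.
        print('metadata extraction failed: {0}'.format(err))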
tmlib.workflow.metaextract.args module¶
class tmlib.workflow.metaextract.args.MetaextractBatchArguments(**kwargs)¶
Bases: tmlib.workflow.args.BatchArguments
Parameters: **kwargs: dict, optional
keyword arguments to overwrite
batch_size¶
int: number of images that should be processed per job
class tmlib.workflow.metaextract.args.MetaextractSubmissionArguments(**kwargs)¶
Bases: tmlib.workflow.args.SubmissionArguments
Parameters: **kwargs: dict, optional
keyword arguments to overwrite
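The keyword arguments presumably mirror the submission options of the command line interface (see the submit command below); the attribute names used in this sketch (cores, duration, memory) are assumptions and should be verified against tmlib.workflow.args.SubmissionArguments.

    from tmlib.workflow.metaextract.args import MetaextractSubmissionArguments

    # Assumed keyword names matching the CLI defaults shown below.
    submission_args = MetaextractSubmissionArguments(
        cores=1, duration='06:00:00', memory=2147
    )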
tmlib.workflow.metaextract.cli module¶
class tmlib.workflow.metaextract.cli.Metaextract(api_instance, verbosity)¶
Bases: tmlib.workflow.cli.WorkflowStepCLI
Extraction of OMEXML metadata from heterogeneous microscopy image file formats.
Parameters: api_instance: tmlib.workflow.metaextract.api.MetadataExtractor
instance of API class to which processing is delegated
verbosity: int
logging verbosity
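A minimal sketch of wiring the CLI class to the API class, reusing the hypothetical experiment ID from the earlier examples:

    from tmlib.workflow.metaextract.api import MetadataExtractor
    from tmlib.workflow.metaextract.cli import Metaextract

    api = MetadataExtractor(experiment_id=1)  # hypothetical experiment ID
    cli = Metaextract(api, verbosity=1)       # logging verbosity level 1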
collect(obj, *args, **kwargs)¶
Not implemented.
metaextract¶
Extraction of OMEXML metadata from heterogeneous microscopy image file formats.
experiment_id¶
ID of the experiment that should be processed
-h, --help¶
show this help message and exit
--verbosity, -v¶
increase logging verbosity
metaextract experiment_id cleanup¶
cleans up the output of a previous submission, i.e. removes files and database entries created by previously submitted jobs
-h, --help¶
show this help message and exit
metaextract experiment_id info¶
prints the description of a given batch job to the console
-h, --help¶
show this help message and exit
--job, -j¶
ID of the job for which information should be displayed
--phase {collect,run}, -p {collect,run}¶
phase of the workflow step to which the job belongs
metaextract experiment_id init¶
creates batches for parallel processing and thereby defines how the computational task should be distributed over the cluster (also cleans up the output of previous submissions)
-h, --help¶
show this help message and exit
--batch-size <batch_size>, -b <batch_size>¶
number of images that should be processed per job (default: 100)
metaextract experiment_id log¶
prints the log output of a given batch job to the console
-h, --help¶
show this help message and exit
--job, -j¶
ID of the job for which log output should be shown
--phase {collect,init,run}, -p {collect,init,run}¶
phase of the workflow step to which the job belongs
metaextract experiment_id resubmit¶
resubmits previously created jobs for “run” and “collect” phases to the cluster and monitors their status upon processing
-h, --help¶
show this help message and exit
--depth <index>, -d <index>¶
number of child tasks that should be monitored (default: 1)
--interval <seconds>, -i <seconds>¶
seconds to wait between monitoring iterations (default: 10)
metaextract experiment_id run¶
runs an individual batch job on the local machine
-h, --help¶
show this help message and exit
--assume-clean-state¶
assume that previous outputs have been cleaned up
--job <job_id>, -j <job_id>¶
ID of the job that should be run
metaextract experiment_id submit¶
creates batch jobs for the “run” and “collect” phases, submits them to the cluster and monitors their status upon processing (requires a prior “init”)
-h, --help¶
show this help message and exit
--depth <index>, -d <index>¶
number of child tasks that should be monitored (default: 1)
--interval <seconds>, -i <seconds>¶
seconds to wait between monitoring iterations (default: 10)
--cores <number>¶
number of cores that should be allocated to each “run” job (may be increased in case memory requirements of a job exceed resources of a single core) (default: 1)
--duration <hh:mm:ss>¶
walltime that should be allocated to each “run” job in the format “HH:MM:SS” (may need to be adapted depending on the choice of batch size) (default: 06:00:00)
--memory <mb>¶
amount of memory in megabytes that should be allocated to each “run” job (default: 2147)