tmlib.workflow.metaextract package

Module contents

Workflow step for extraction of microscope image metadata.

Microscopes typically write metadata about the image acquisition process into the header of the generated image files. Unfortunately, almost every vendor uses a custom file format. The metaextract step uses the Bio-Formats library to extract metadata from these heterogeneous image file formats in the form of OMEXML, according to the standardized OME data model.

Submodules

tmlib.workflow.metaextract.api module

class tmlib.workflow.metaextract.api.MetadataExtractor(experiment_id)

Bases: tmlib.workflow.api.WorkflowStepAPI

Class for extraction of metadata from microscope image files.

Extracted metadata is formatted according to the Open Microscopy Environment (OME) schema.

Parameters:

experiment_id: int

ID of the processed experiment

collect_job_output(obj, *args, **kwargs)

Not implemented.

create_run_batches(args)

Creates job descriptions for parallel computing.

Parameters:

args: tmlib.workflow.metaextract.args.MetaextractBatchArguments

step-specific arguments

Returns:

generator

job descriptions
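The batching logic can be sketched as follows. This is a hypothetical simplification (the real method reads the experiment's image files from the database), but it illustrates how a batch size controls the partitioning of files into job descriptions:

```python
def create_run_batches_sketch(image_files, batch_size=100):
    """Yield one job description per batch of image files.

    Hypothetical sketch: the dictionary layout is illustrative, not
    the exact format produced by tmlib.
    """
    for job_id, start in enumerate(range(0, len(image_files), batch_size), 1):
        yield {
            "id": job_id,
            "inputs": {"image_files": image_files[start:start + batch_size]},
        }
```

With 250 files and the default batch size of 100, this yields three job descriptions, the last covering the remaining 50 files.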

delete_previous_job_output()

Deletes the output of a previous submission.

has_collect_phase = False

run_job(batch, assume_clean_state=False)

Extracts OMEXML from microscope image or metadata files.

Parameters:

batch: dict

description of the run job

assume_clean_state: bool, optional

assume that output of previous runs has already been cleaned up

Raises:

subprocess.CalledProcessError

when extraction failed
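The documented failure mode can be illustrated with a minimal sketch: extraction is delegated to an external tool (in TissueMAPS this builds on Bio-Formats; the commands below are placeholders), and a non-zero exit status surfaces as subprocess.CalledProcessError:

```python
import subprocess

def extract_omexml(command):
    """Run an external extraction command and return its stdout.

    Raises subprocess.CalledProcessError when the command exits with
    a non-zero status, i.e. when extraction failed.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Callers such as `run_job` therefore only need to check for this one exception type to detect a failed extraction.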

tmlib.workflow.metaextract.args module

class tmlib.workflow.metaextract.args.MetaextractBatchArguments(**kwargs)

Bases: tmlib.workflow.args.BatchArguments

Parameters:

**kwargs: dict, optional

keyword arguments to overwrite

batch_size

int: number of images that should be processed per job

class tmlib.workflow.metaextract.args.MetaextractSubmissionArguments(**kwargs)

Bases: tmlib.workflow.args.SubmissionArguments

Parameters:

**kwargs: dict, optional

keyword arguments to overwrite
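The `**kwargs` contract shared by these argument classes can be sketched generically. The stand-in class below is hypothetical, but it mirrors the documented behaviour: keyword arguments overwrite defaults such as batch_size:

```python
class BatchArgumentsSketch:
    """Hypothetical stand-in for tmlib.workflow.args.BatchArguments."""

    batch_size = 100  # default documented for the "init" phase

    def __init__(self, **kwargs):
        # Keyword arguments overwrite class-level defaults; unknown
        # names are rejected rather than silently accepted.
        for name, value in kwargs.items():
            if not hasattr(self, name):
                raise AttributeError("unknown argument: %s" % name)
            setattr(self, name, value)
```

For example, `BatchArgumentsSketch(batch_size=50)` overrides the default of 100.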

tmlib.workflow.metaextract.cli module

class tmlib.workflow.metaextract.cli.Metaextract(api_instance, verbosity)

Bases: tmlib.workflow.cli.WorkflowStepCLI

Extraction of OMEXML metadata from heterogeneous microscopy image file formats.

Parameters:

api_instance: tmlib.workflow.metaextract.api.MetadataExtractor

instance of API class to which processing is delegated

verbosity: int

logging verbosity

collect(obj, *args, **kwargs)

Not implemented.

metaextract

Extraction of OMEXML metadata from heterogeneous microscopy image file formats.


experiment_id

ID of the experiment that should be processed

-h, --help

show this help message and exit

--verbosity, -v

increase logging verbosity

metaextract experiment_id cleanup

cleans up the output of a previous submission, i.e. removes files and database entries created by previously submitted jobs


-h, --help

show this help message and exit

metaextract experiment_id info

prints the description of a given batch job to the console


-h, --help

show this help message and exit

--job, -j

ID of the job for which information should be displayed

--phase {collect,run}, -p {collect,run}

phase of the workflow step to which the job belongs

metaextract experiment_id init

creates batches for parallel processing and thereby defines how the computational task should be distributed over the cluster (also cleans up the output of previous submissions)


-h, --help

show this help message and exit

--batch-size <batch_size>, -b <batch_size>

number of images that should be processed per job (default: 100)

metaextract experiment_id log

prints the log output of a given batch job to the console


-h, --help

show this help message and exit

--job, -j

ID of the job for which log output should be shown

--phase {collect,init,run}, -p {collect,init,run}

phase of the workflow step to which the job belongs

metaextract experiment_id resubmit

resubmits previously created jobs for the “run” and “collect” phases to the cluster and monitors their processing status


-h, --help

show this help message and exit

--depth <index>, -d <index>

number of child tasks that should be monitored (default: 1)

--interval <seconds>, -i <seconds>

seconds to wait between monitoring iterations (default: 10)

metaextract experiment_id run

runs an individual batch job on the local machine


-h, --help

show this help message and exit

--assume-clean-state

assume that previous outputs have been cleaned up

--job <job_id>, -j <job_id>

ID of the job that should be run

metaextract experiment_id submit

creates batch jobs for the “run” and “collect” phases, submits them to the cluster and monitors their processing status (requires a prior “init”)


-h, --help

show this help message and exit

--depth <index>, -d <index>

number of child tasks that should be monitored (default: 1)

--interval <seconds>, -i <seconds>

seconds to wait between monitoring iterations (default: 10)

--cores <number>

number of cores that should be allocated to each “run” job (may need to be increased when the memory requirements of a job exceed the resources of a single core) (default: 1)

--duration <hh:mm:ss>

walltime that should be allocated to each “run” job in the format “HH:MM:SS” (may need to be adapted depending on the chosen batch size) (default: 06:00:00)

--memory <mb>

amount of memory in megabytes that should be allocated to each “run” job (default: 2147)
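Taken together, a typical session for this step might look as follows. This is an illustrative transcript; the experiment ID `1` and the option values are placeholders:

```shell
# Create batches of 50 images per job (also cleans up previous output).
metaextract 1 init --batch-size 50

# Submit the jobs to the cluster, checking status every 30 seconds
# and allocating 4096 MB of memory per "run" job.
metaextract 1 submit --interval 30 --memory 4096

# Inspect the log output of the first "run" job.
metaextract 1 log --job 1 --phase run
```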