tmlib.workflow.metaconfig package

Module contents

Workflow step for configuration of microscopy image metadata.

Metadata available from the microscope image files is often incomplete, either because the format is not fully supported by the Bio-Formats library or simply because the microscope provides insufficient information in the files. In particular, the relative position of images, required for overview creation, is typically not available from individual image files. The metaconfig step configures metadata extracted from image files in the tmlib.workflow.metaextract step and tries to obtain any missing information from microscope-specific metadata files or user input.

This is achieved via microscope-specific implementations of MetadataReader and MetadataHandler in a separate module of tmlib.workflow.metaconfig. The name given to the module determines the microscope type, e.g. tmlib.workflow.metaconfig.cellvoyager is the module for the cellvoyager microscope type.

To make the microscope type available for use, add its name (the name of the implemented module) to SUPPORTED_MICROSCOPE_TYPES.

tmlib.workflow.metaconfig.get_microscope_type_regex(microscope_type, as_string=False)

Gets regular expression patterns for the identification of microscope image files and microscope metadata files for a given microscope_type.

Parameters:

microscope_type: str

microscope type

as_string: bool, optional

whether regex patterns should be returned as strings (default: False)

Returns:

Tuple[_sre.SRE_Pattern or str]

regex pattern for image and metadata files

tmlib.workflow.metaconfig.import_microscope_type_module(microscope_type)

Imports the module for an implemented microscope_type.

Parameters:

microscope_type: str

microscope type

Returns:

module

loaded module instance

Raises:

ValueError

when microscope_type is not known

tmlib.workflow.metaconfig.metadata_handler_factory(microscope_type)

Gets the microscope_type-specific implementation of tmlib.workflow.metaconfig.base.MetadataHandler.

Parameters:

microscope_type: str

microscope type

Returns:

classobj

Raises:

AttributeError

when the microscope_type-specific module does not implement a handler class

tmlib.workflow.metaconfig.metadata_reader_factory(microscope_type)

Gets the microscope_type-specific implementation of MetadataReader.

Parameters:

microscope_type: str

microscope type

Returns:

Union[classobj, None]

metadata reader class in case one is implemented
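
The difference between the two factories (a missing handler is an error, a missing reader yields None) can be illustrated with a minimal sketch. The class-name convention used here (capitalized microscope type plus a suffix) is inferred from the class names listed in this document; the helper functions and the stand-in module are hypothetical and are not the tmlib implementation.

```python
import types


def class_name_for(microscope_type, suffix):
    """Build a class name such as 'CellvoyagerMetadataHandler'."""
    return microscope_type.capitalize() + suffix


def handler_factory(module, microscope_type):
    # A missing handler is an error: getattr raises AttributeError.
    return getattr(module, class_name_for(microscope_type, "MetadataHandler"))


def reader_factory(module, microscope_type):
    # A missing reader is allowed: fall back to None.
    return getattr(module, class_name_for(microscope_type, "MetadataReader"), None)


# Stand-in for a microscope-specific module that implements only a handler,
# like the metamorph module documented here (hypothetical, for illustration).
metamorph = types.ModuleType("metamorph")


class MetamorphMetadataHandler(object):
    pass


metamorph.MetamorphMetadataHandler = MetamorphMetadataHandler

handler_cls = handler_factory(metamorph, "metamorph")
reader_cls = reader_factory(metamorph, "metamorph")
```

Callers of the reader factory would therefore need to check for None before instantiating the returned class.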

Submodules

tmlib.workflow.metaconfig.api module

class tmlib.workflow.metaconfig.api.MetadataConfigurator(experiment_id)

Bases: tmlib.workflow.api.WorkflowStepAPI

Class for configuration of microscope image metadata.

It provides methods for conversion of metadata extracted from heterogeneous microscope file formats into a TissueMAPS-specific schema. The original metadata has to be available in OMEXML format according to the OME schema.

The class further provides methods to complement the metadata retrieved via Bio-Formats with information available from additional microscope-specific metadata files and/or user input.

Parameters:

experiment_id: int

ID of the processed experiment

collect_job_output(batch)

Assigns registered image files from different acquisitions to separate cycles. If an acquisition includes multiple time points, a separate cycle is created for each time point. The mapping from acquisitions to cycles is consequently 1 -> n, where n is the number of time points per acquisition (n >= 1).

Whether acquisition time points will be interpreted as actual time points in a time series depends on the value of tm.Experiment.plate_acquisition_mode.
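
The 1 -> n mapping from acquisitions to cycles can be sketched as follows. This is only an illustration of the counting rule described above; the function name and the record layout are assumptions and do not reflect the tm.Cycle database model.

```python
def assign_cycles(acquisitions):
    """Map (acquisition, time point) pairs to consecutive cycle indices.

    `acquisitions` is a list of (acquisition_name, n_time_points) pairs.
    """
    cycles = []
    for name, n_tpoints in acquisitions:
        for tpoint in range(n_tpoints):
            # One cycle per time point of each acquisition.
            cycles.append(
                {"cycle": len(cycles), "acquisition": name, "tpoint": tpoint}
            )
    return cycles


# One acquisition with a single time point and one with three time points
# result in four cycles in total.
cycles = assign_cycles([("acq_0", 1), ("acq_1", 3)])
```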

Parameters:

batch: dict

description of the collect job

create_run_batches(args)

Creates job descriptions for parallel processing.

Parameters:

args: tmlib.workflow.metaconfig.args.MetaconfigBatchArguments

step-specific batch arguments

Returns:

generator

job descriptions

delete_previous_job_output()

Deletes all instances of class tm.Cycle, tm.Well, and tm.Channel as well as all children for the processed experiment.

has_collect_phase = True

run_job(batch, assume_clean_state=False)

Configures OMEXML metadata extracted from microscope image files and complements it with metadata retrieved from additional microscope metadata files and/or user input.

The actual processing is delegated to a format-specific implementation of MetadataHandler.

Parameters:

batch: dict

job description

assume_clean_state: bool, optional

assume that output of previous runs has already been cleaned up

tmlib.workflow.metaconfig.args module

class tmlib.workflow.metaconfig.args.MetaconfigBatchArguments(**kwargs)

Bases: tmlib.workflow.args.BatchArguments

Parameters:

**kwargs: dict, optional

keyword arguments to overwrite

mip

bool: perform maximum intensity projection along z axis

n_horizontal

int: number of images along the horizontal axis of the stitched well overview mosaic image

n_vertical

int: number of images along the vertical axis of the stitched well overview mosaic image

regex

str: named regular expression that defines group names for retrieval of metadata from image filenames

stitch_layout

str: layout of the stitched well overview mosaic image, i.e. the order in which images were acquired along the grid
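
The mip argument listed above requests a maximum intensity projection along the z axis, i.e. a z-stack is reduced to a single plane by keeping the brightest value at each pixel position. A minimal pure-Python sketch of that operation, with nested lists standing in for image arrays and a hypothetical function name:

```python
def max_intensity_projection(zstack):
    """Collapse a z-stack (list of 2D planes) into a single 2D plane."""
    return [
        # zip(*rows) pairs up the pixels at the same column across planes.
        [max(pixels) for pixels in zip(*rows)]
        # zip(*zstack) pairs up the rows at the same height across planes.
        for rows in zip(*zstack)
    ]


zstack = [
    [[1, 5], [0, 2]],  # plane at z = 0
    [[3, 4], [7, 1]],  # plane at z = 1
]
projection = max_intensity_projection(zstack)
```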

class tmlib.workflow.metaconfig.args.MetaconfigSubmissionArguments(**kwargs)

Bases: tmlib.workflow.args.SubmissionArguments

Parameters:

**kwargs: dict, optional

keyword arguments to overwrite

tmlib.workflow.metaconfig.base module

class tmlib.workflow.metaconfig.base.MetadataFields

Bases: tuple

Create new instance of MetadataFields(s, c, z, t, w)

c

Alias for field number 1

s

Alias for field number 0

t

Alias for field number 3

w

Alias for field number 4

z

Alias for field number 2
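
The field numbers above correspond to a named tuple with the field order (s, c, z, t, w). An equivalent definition can be sketched with the standard library; the example values are made up.

```python
import collections

# Same field order as documented: s=0, c=1, z=2, t=3, w=4.
MetadataFields = collections.namedtuple(
    "MetadataFields", ["s", "c", "z", "t", "w"]
)

fields = MetadataFields(s=1, c=0, z=2, t=0, w="B02")
```

Fields can then be accessed either by name (fields.w) or by position (fields[4]).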

class tmlib.workflow.metaconfig.base.MetadataHandler(omexml_images, omexml_metadata=None)

Bases: object

Abstract base class for handling metadata from heterogeneous microscope file formats as provided by the Bio-Formats library.

Metadata has to be provided as OMEXML according to the OME schema.

Attributes

metadata: pandas.DataFrame

configured metadata

Parameters:

omexml_images: Dict[str, bioformats.omexml.OMEXML]

name and extracted metadata for each MicroscopeImageFile

omexml_metadata: bioformats.omexml.OMEXML, optional

additional metadata obtained from additional MicroscopeMetadataFile via a microscope type-specific implementation of MetadataReader

assign_acquisition_site_indices()

Gives each acquisition site a globally (plate-wide) unique index.

Returns:

pandas.DataFrame

metadata for each 2D Plane element

classmethod check_regular_expression(regex)

Checks whether a named regular expression has all required fields.

Parameters:

regex: str

regular expression

Raises:

tmlib.errors.RegexError

when a provided field is not supported
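
A check of this kind can be sketched with the groupindex attribute of a compiled pattern, which lists the named groups of a regular expression. The set of supported field names is taken from the fields documented for configure_from_filenames(); the function name is hypothetical and the sketch returns the offending names instead of raising RegexError.

```python
import re

# Field names documented for configure_from_filenames().
SUPPORTED_FIELDS = {"w", "t", "s", "z", "c"}


def unsupported_fields(regex):
    """Return the named groups in `regex` that are not supported fields."""
    return set(re.compile(regex).groupindex) - SUPPORTED_FIELDS


ok = unsupported_fields(r"(?P<w>[A-Z]\d{2})_s(?P<s>\d+)_c(?P<c>\d+)")
bad = unsupported_fields(r"(?P<w>[A-Z]\d{2})_(?P<plate>\d+)")
```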

configure_from_filenames(plate_dimensions, regex)

Configures metadata based on information encoded in image filenames using a regular expression with the following fields:

  • w: well
  • t: time point
  • s: acquisition site
  • z: focal plane (z dimension)
  • c: channel

Parameters:

plate_dimensions: Tuple[int]

number of rows and columns in the well plate

regex: str

named regular expression

Returns:

pandas.DataFrame

metadata for each 2D Plane element

Raises:

tmlib.errors.MetadataError

when image files contain more than one plane, since this case wouldn’t allow a 1-to-1 mapping of information from filename to image plane

configure_from_omexml()

Collects image metadata from OMEXML elements extracted from image files and an additional optional OMEXML element provided by a microscope-specific implementation of MetadataReader. All available metadata gets combined into a table, where each row represents a single 2D pixels plane.

Returns:

pandas.DataFrame

metadata for each 2D Plane element

create_image_file_mappings()

Creates a file map for the extraction of individual planes from the microscopy image files.

Returns:

Dict[int, Dict[str, List[str or int]]]

a mapping of configured images hashable by their index in metadata to planes in the corresponding microscope image files (files key) and their location within the files (series and plane keys)

determine_grid_coordinates_from_layout(stitch_layout, stitch_dimensions)

Determines the coordinates of each image acquisition site within the continuous acquisition grid (slide or well in a plate) based on a provided layout.

Parameters:

stitch_layout: str

layout of the acquisition grid (options: "horizontal", "zigzag_horizontal", "vertical", or "zigzag_vertical")

stitch_dimensions: Tuple[int]

dimensions of the acquisition grid, i.e. number of images along the vertical and horizontal axis of the acquired area

Returns:

pandas.DataFrame

metadata for each 2D Plane element

See also

illuminati.stitch.calc_grid_coordinates_from_layout()
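
How a layout name could translate into grid coordinates is sketched below. The interpretation of the four layout options is an assumption: "horizontal" is taken to mean row by row from left to right, "vertical" column by column from top to bottom, and the "zigzag_" variants reverse the direction on every second line. The function name is hypothetical and the sketch returns plain (row, column) tuples rather than a pandas.DataFrame.

```python
def grid_coordinates(stitch_layout, stitch_dimensions):
    """Return a zero-based (row, column) tuple per site, in acquisition order."""
    n_rows, n_cols = stitch_dimensions  # vertical axis first, then horizontal
    zigzag = stitch_layout.startswith("zigzag_")
    direction = stitch_layout.replace("zigzag_", "")
    if direction not in ("horizontal", "vertical"):
        raise ValueError("unknown layout: %s" % stitch_layout)
    if direction == "horizontal":
        n_outer, n_inner = n_rows, n_cols
    else:
        n_outer, n_inner = n_cols, n_rows
    coordinates = []
    for outer in range(n_outer):
        inner_range = list(range(n_inner))
        if zigzag and outer % 2 == 1:
            inner_range.reverse()  # every second line is traversed backwards
        for inner in inner_range:
            if direction == "horizontal":
                coordinates.append((outer, inner))
            else:
                coordinates.append((inner, outer))
    return coordinates


# Grid with 2 rows and 3 columns.
zigzag = grid_coordinates("zigzag_horizontal", (2, 3))
vertical = grid_coordinates("vertical", (2, 3))
```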

determine_grid_coordinates_from_stage_positions()

Determines the coordinates of each image acquisition site within the continuous acquisition grid (slide or well in a plate) based on the absolute microscope stage positions.

Returns:

pandas.DataFrame

metadata for each 2D Plane element

Raises:

MetadataError

when stage position information is not available from metadata

See also

illuminati.stitch.calc_grid_coordinates_from_positions()
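
The idea of deriving grid coordinates from stage positions can be sketched by ranking the distinct y and x positions. This is a simplified illustration with a hypothetical function name: it assumes that positions repeat exactly along rows and columns and that stage coordinates increase with the row and column index. Real stage positions may jitter or use an inverted axis, which an actual implementation would have to handle.

```python
def grid_coordinates_from_positions(stage_positions):
    """Convert (y, x) stage positions to zero-based (row, column) indices."""
    unique_y = sorted({y for y, _ in stage_positions})
    unique_x = sorted({x for _, x in stage_positions})
    # The rank of a position among the distinct values is its grid index.
    return [(unique_y.index(y), unique_x.index(x)) for y, x in stage_positions]


positions = [(100.0, 50.0), (100.0, 75.0), (120.0, 50.0), (120.0, 75.0)]
coords = grid_coordinates_from_positions(positions)
```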

determine_missing_metadata()

Determines whether required basic metadata information, such as channel names or time point identifiers, could not yet be configured.

Returns:

Set[str]

names of missing basic metadata attributes

classmethod extract_fields_from_filename(regex, filename, defaults=True)

Extracts fields from image filenames using a regular expression.

Parameters:

regex: str

regular expression

filename: str

name of a microscope image file

defaults: bool, optional

whether default values should be used

Returns:

tmlib.workflow.metaconfig.base.MetadataFields

named tuple with extracted values
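
The extraction can be sketched with re.search and groupdict(). The fallback values used when a field is absent from the regular expression are hypothetical, since the defaults applied by tmlib are not stated here; the function name is likewise an assumption.

```python
import collections
import re

MetadataFields = collections.namedtuple(
    "MetadataFields", ["s", "c", "z", "t", "w"]
)

# Hypothetical fallback values; the actual defaults are not documented here.
ASSUMED_DEFAULTS = {"s": None, "c": None, "z": "0", "t": "0", "w": None}


def extract_fields(regex, filename, defaults=True):
    """Extract the named groups of `regex` from `filename`."""
    match = re.search(regex, filename)
    if match is None:
        raise ValueError("filename does not match regex: %s" % filename)
    # Keep only groups that are defined in the regex and actually matched.
    captured = {k: v for k, v in match.groupdict().items() if v is not None}
    values = {}
    for name in MetadataFields._fields:
        fallback = ASSUMED_DEFAULTS[name] if defaults else None
        values[name] = captured.get(name, fallback)
    return MetadataFields(**values)


regex = r"(?P<w>[A-Z]\d{2})_s(?P<s>\d+)_c(?P<c>\d+)"
fields = extract_fields(regex, "exp_B02_s3_c1.tif")
```

Note that the extracted values are strings; any conversion to integers would be a separate step.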

group_metadata_per_zstack()

Groups all focal planes belonging to one z-stack (i.e. acquired at different z positions but at the same microscope stage position, time point and channel) together.

Returns:

pandas.DataFrame

metadata for each 2D Plane element
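
The grouping can be sketched in plain Python, using a list of dictionaries as a stand-in for the metadata table. The record keys (site, tpoint, channel, zplane, file) and the function name are assumptions chosen for illustration; "site" stands in for the microscope stage position.

```python
import collections


def group_per_zstack(planes):
    """Group plane records by (site, tpoint, channel) and sort each by z."""
    stacks = collections.defaultdict(list)
    for plane in planes:
        key = (plane["site"], plane["tpoint"], plane["channel"])
        stacks[key].append(plane)
    for key in stacks:
        stacks[key].sort(key=lambda p: p["zplane"])
    return dict(stacks)


planes = [
    {"site": 0, "tpoint": 0, "channel": "DAPI", "zplane": 1, "file": "a_z1.tif"},
    {"site": 0, "tpoint": 0, "channel": "DAPI", "zplane": 0, "file": "a_z0.tif"},
    {"site": 1, "tpoint": 0, "channel": "DAPI", "zplane": 0, "file": "b_z0.tif"},
]
stacks = group_per_zstack(planes)
```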

remove_redundant_columns()

Cleans up metadata, i.e. removes information that is no longer required.

Returns:

pandas.DataFrame

metadata for each 2D Plane element

update_indices()

Creates a zero-based unique identifier number for each channel, time point and z-plane.

Returns:

pandas.DataFrame

metadata for each 2D Plane element
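
One way to derive such zero-based identifiers is to rank the distinct values of a column. The sketch below assumes that identifiers follow the sort order of the original values, which is not confirmed by this document; the function name is hypothetical.

```python
def zero_based_index(values):
    """Map each distinct value to its rank among the sorted distinct values."""
    lookup = {value: ix for ix, value in enumerate(sorted(set(values)))}
    return [lookup[value] for value in values]


channel_ix = zero_based_index(["GFP", "DAPI", "GFP"])
tpoint_ix = zero_based_index([5, 5, 9])
```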

class tmlib.workflow.metaconfig.base.MetadataReader

Bases: object

Abstract base class for reading metadata from additional non-image files, which are either generated by the microscope or provided by users.

The read() method of derived classes must return a single OMEXML object, according to the OME data model, see python-bioformats <http://pythonhosted.org/python-bioformats/#metadata>. The value of the image_count attribute in the OMEXML element provided by the implemented reader must equal the number of image files * the number of Image elements per file * number of Plane elements per Image element. In addition, the number of Plane elements and the values of SizeT, SizeC and SizeZ elements of the Image element must match those in OMEXML elements obtained from the corresponding image files. For example, if an image file contains one series with planes for one time point, one channel and 10 focal planes, then SizeT = 1, SizeC = 1 and SizeZ = 10.

The OME schema doesn’t provide information about wells at the individual Image level: see OME data model. Instead, it provides a Plate element, which contains Well elements. Well elements contain the positional information, such as row and column index of each well within the plate. WellSample elements map to individual Image elements and hold information about the position of images within the Well. In addition, there is an ImageRef element, which can be used to map the WellSample to its corresponding Image element.

Derived classes should provide information for the SPW Plate element. For consistency, a slide should be represented as a Plate with a single Well element. Custom readers should further specify the ImageRef element for each WellSample element. The value of ImageRef must be an unsigned integer in the range [0, n], where n is the total number of Image elements in the OMEXML element provided by the class. Individual Image elements may be distributed across several OMEXML elements (one OMEXML element for each MicroscopeImageFile). The ImageRef value for a particular Image can be calculated as follows: one-based index in the naturally sorted list of image filenames * number of Image elements per image file.

See also

tmlib.workflow.metaconfig.base.MetadataHandler.configure_omexml_from_image_files(), tmlib.workflow.metaconfig.base.MetadataHandler.configure_omexml_from_metadata_files()

read(microscope_metadata_files, microscope_image_files)

Reads metadata from arbitrary files.

Parameters:

microscope_metadata_files: List[str]

absolute path to the microscope metadata files

microscope_image_files: List[str]

absolute path to the microscope image files

Returns:

bioformats.omexml.OMEXML

OMEXML metadata

tmlib.workflow.metaconfig.cellvoyager module

Implementation of classes for reading microscope image and metadata files provided in a format specific to the Yokogawa CellVoyager 7000 microscope.

class tmlib.workflow.metaconfig.cellvoyager.CellvoyagerMetadataHandler(omexml_images, omexml_metadata=None)

Bases: tmlib.workflow.metaconfig.base.MetadataHandler

Class for handling metadata specific to the Yokogawa CellVoyager 7000 microscope.

Parameters:

omexml_images: Dict[str, bioformats.omexml.OMEXML]

metadata extracted from microscope image files

omexml_metadata: bioformats.omexml.OMEXML

metadata extracted from microscope metadata files

class tmlib.workflow.metaconfig.cellvoyager.CellvoyagerMetadataReader

Bases: tmlib.workflow.metaconfig.base.MetadataReader

Class for reading metadata from file formats specific to the Yokogawa CellVoyager 7000 microscope.

Yokogawa doesn’t store the position and channel information at the level of individual images, but in additional metadata files with .mlf and .mrf extensions. Unfortunately, these file formats are not supported by Bio-Formats, see CellVoyagerReader. For compatibility with the OME data model, the CellvoyagerMetadataReader reads the XML from “MeasurementDetail.mrf” and “MeasurementData.mlf” files, extracts the relevant data and stores them in an OMEXML object according to the Bio-Formats convention.

read(microscope_metadata_files, microscope_image_filenames)

Reads metadata from “mlf” and “mrf” metadata files in case they are provided.

Parameters:

microscope_metadata_files: List[str]

absolute path to microscope metadata files

microscope_image_filenames: List[str]

names of the corresponding microscope image files

Returns:

bioformats.omexml.OMEXML

OMEXML image metadata

tmlib.workflow.metaconfig.cellvoyager.IMAGE_FILE_REGEX_PATTERN = '[^_]+_(?P<w>[A-Z]\\d{2})_T(?P<t>\\d+)F(?P<s>\\d+)L\\d+A\\d+Z(?P<z>\\d+)C0(?P<c>\\d)\\.'

Regular expression pattern to identify image files
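
The named groups of this pattern can be tried out on a filename with the standard re module. The filename below is made up to fit the pattern, and the use of re.search (rather than re.match) is an assumption about how the pattern is applied.

```python
import re

# Same pattern as above, written as a raw string.
IMAGE_FILE_REGEX_PATTERN = (
    r"[^_]+_(?P<w>[A-Z]\d{2})_T(?P<t>\d+)F(?P<s>\d+)"
    r"L\d+A\d+Z(?P<z>\d+)C0(?P<c>\d)\."
)

# Hypothetical filename that follows the naming scheme.
filename = "plate1_B02_T0001F003L01A01Z02C01.png"
match = re.search(IMAGE_FILE_REGEX_PATTERN, filename)
fields = match.groupdict()
```

Here, fields maps w to "B02", t to "0001", s to "003", z to "02" and c to "1"; all values are strings.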

tmlib.workflow.metaconfig.cellvoyager.METADATA_FILE_REGEX_PATTERN = '.*\\.(mlf|mrf)$'

Supported extensions for metadata files

tmlib.workflow.metaconfig.cli module

class tmlib.workflow.metaconfig.cli.Metaconfig(api_instance, verbosity)

Bases: tmlib.workflow.cli.WorkflowStepCLI

Configuration of extracted OMEXML metadata and integration with additional microscope-specific information about the image acquisition process.

Parameters:

api_instance: tmlib.workflow.metaconfig.api.MetadataConfigurator

instance of API class to which processing is delegated

verbosity: int

logging verbosity

metaconfig

Configuration of extracted OMEXML metadata and integration with additional microscope-specific information about the image acquisition process.


experiment_id

ID of the experiment that should be processed

-h, --help

show this help message and exit

--verbosity, -v

increase logging verbosity

metaconfig experiment_id cleanup

cleans up the output of a previous submission, i.e. removes files and database entries created by previously submitted jobs


-h, --help

show this help message and exit

metaconfig experiment_id collect

collects the output of run jobs, i.e. performs a post-processing operation that either cannot be parallelized or needs to be performed afterwards


-h, --help

show this help message and exit

metaconfig experiment_id info

prints the description of a given batch job to the console


-h, --help

show this help message and exit

--job, -j

ID of the job for which information should be displayed

--phase {collect,run}, -p {collect,run}

phase of the workflow step to which the job belongs

metaconfig experiment_id init

creates batches for parallel processing and thereby defines how the computational task should be distributed over the cluster (also cleans up the output of previous submissions)


-h, --help

show this help message and exit

--mip

perform maximum intensity projection along z axis

--n_horizontal <n_horizontal>

number of images along the horizontal axis of the stitched well overview mosaic image

--n_vertical <n_vertical>

number of images along the vertical axis of the stitched well overview mosaic image

--regex <regex>

named regular expression that defines group names for retrieval of metadata from image filenames

--stitch_layout {vertical,horizontal,zigzag_horizontal,zigzag_vertical}

layout of the stitched well overview mosaic image, i.e. the order in which images were acquired along the grid (default: horizontal)

metaconfig experiment_id log

prints the log output of a given batch job to the console


-h, --help

show this help message and exit

--job, -j

ID of the job for which log output should be shown

--phase {collect,init,run}, -p {collect,init,run}

phase of the workflow step to which the job belongs

metaconfig experiment_id resubmit

resubmits previously created jobs for “run” and “collect” phases to the cluster and monitors their status upon processing


-h, --help

show this help message and exit

--depth <index>, -d <index>

number of child tasks that should be monitored (default: 1)

--interval <seconds>, -i <seconds>

seconds to wait between monitoring iterations (default: 10)

metaconfig experiment_id run

runs an individual batch job on the local machine


-h, --help

show this help message and exit

--assume-clean-state

assume that previous outputs have been cleaned up

--job <job_id>, -j <job_id>

ID of the job that should be run

metaconfig experiment_id submit

creates batch jobs for the “run” and “collect” phases, submits them to the cluster and monitors their status upon processing (requires a prior “init”)


-h, --help

show this help message and exit

--depth <index>, -d <index>

number of child tasks that should be monitored (default: 1)

--interval <seconds>, -i <seconds>

seconds to wait between monitoring iterations (default: 10)

--cores <number>

number of cores that should be allocated to each “run” job (may be increased in case memory requirements of a job exceed resources of a single core) (default: 1)

--duration <hh:mm:ss>

walltime that should be allocated to each “run” job in the format “HH:MM:SS” (may need to be adapted depending on the choice of batch size) (default: 06:00:00)

--memory <mb>

amount of memory in megabytes that should be allocated to each “run” job (default: 2147)

tmlib.workflow.metaconfig.metamorph module

Implementation of classes for reading microscope image and metadata files provided in a format specific to microscopes equipped with Metamorph software.

tmlib.workflow.metaconfig.metamorph.IMAGE_FILE_REGEX_PATTERN = '.+_?(?P<w>[A-Z]\\d{2})_s(?P<s>\\d+)(_w(?P<c>\\d{1}))?[^_thumb]'

Regular expression pattern to identify image files

tmlib.workflow.metaconfig.metamorph.METADATA_FILE_REGEX_PATTERN = '(?!.*)'

Supported extensions for metadata files

class tmlib.workflow.metaconfig.metamorph.MetamorphMetadataHandler(omexml_images, omexml_metadata=None)

Bases: tmlib.workflow.metaconfig.base.MetadataHandler

Class for handling metadata specific to microscopes equipped with Metamorph software.

Parameters:

omexml_images: Dict[str, bioformats.omexml.OMEXML]

metadata extracted from microscope image files

omexml_metadata: bioformats.omexml.OMEXML

metadata extracted from microscope metadata files

tmlib.workflow.metaconfig.omexml module

tmlib.workflow.metaconfig.omexml.get_image_ix(image_id)

Get the index of an image within the OMEXML metadata object given the ID of the image.

Parameters:

image_id: str

image identifier in the format Image:\d+

Returns:

int

zero-based index number
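
A minimal sketch of such a lookup, assuming the index is simply the number embedded in the identifier; the function name is hypothetical and this is not necessarily how tmlib implements it.

```python
import re


def image_index(image_id):
    """Parse the number from an identifier of the form 'Image:<digits>'."""
    match = re.search(r"^Image:(\d+)$", image_id)
    if match is None:
        raise ValueError("unexpected image identifier: %s" % image_id)
    return int(match.group(1))


ix = image_index("Image:12")
```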

tmlib.workflow.metaconfig.visiview module

Implementation of classes for reading microscope image and metadata files provided in a format specific to microscopes equipped with Visitron VisiView software.

tmlib.workflow.metaconfig.visiview.IMAGE_FILE_REGEX_PATTERN = '.+_?(?P<w>[A-Z]\\d{2})?_w\\d+(?P<c>\\w+)_s(?P<s>\\d+)_?t?(?P<t>\\d+)?\\.'

Regular expression pattern to identify image files

tmlib.workflow.metaconfig.visiview.METADATA_FILE_REGEX_PATTERN = '\\.nd$'

Supported extensions for metadata files

class tmlib.workflow.metaconfig.visiview.VisiviewMetadataHandler(omexml_images, omexml_metadata=None)

Bases: tmlib.workflow.metaconfig.base.MetadataHandler

Class for handling metadata specific to microscopes equipped with Visitron software.

Parameters:

omexml_images: Dict[str, bioformats.omexml.OMEXML]

metadata extracted from microscope image files

omexml_metadata: bioformats.omexml.OMEXML

metadata extracted from microscope metadata files

class tmlib.workflow.metaconfig.visiview.VisiviewMetadataReader

Bases: tmlib.workflow.metaconfig.base.MetadataReader

Class for reading metadata from file formats specific to microscopes equipped with the VisiView software.

read(microscope_metadata_files, microscope_image_filenames)

Read metadata from “nd” metadata file.

Parameters:

microscope_metadata_files: List[str]

absolute path to the microscope metadata files

microscope_image_filenames: List[str]

names of the corresponding microscope image files

Returns:

bioformats.omexml.OMEXML

OMEXML image metadata

Raises:

ValueError

when microscope_metadata_files doesn’t have length one

tmlib.workflow.metaconfig.visiview.read_nd_file(filename)

Read the lines of the .nd file as key-value pairs, and format the values, i.e. translate them into Python syntax.

The formatted content will have the following layout (shown for a small example dataset):

{
    "Description": "File recreated from images.",
    "StartTime1": "20150820 19:43:31.545",
    "DoTimelapse": False,
    "DoStage": True,
    "NStagePositions": 15,
    "Stage1": {'column': 1, 'row': 'A', 'site': 1},
    "Stage2": {'column': 1, 'row': 'A', 'site': 2},
    "Stage3": {'column': 1, 'row': 'A', 'site': 3},
    "Stage4": {'column': 1, 'row': 'A', 'site': 4},
    "Stage5": {'column': 1, 'row': 'A', 'site': 5},
    "Stage6": {'column': 1, 'row': 'A', 'site': 6},
    "Stage7": {'column': 1, 'row': 'A', 'site': 7},
    "Stage8": {'column': 1, 'row': 'A', 'site': 8},
    "Stage9": {'column': 1, 'row': 'A', 'site': 9},
    "Stage10": {'column': 1, 'row': 'A', 'site': 10},
    "Stage11": {'column': 1, 'row': 'A', 'site': 11},
    "Stage12": {'column': 1, 'row': 'A', 'site': 12},
    "Stage13": {'column': 1, 'row': 'A', 'site': 13},
    "Stage14": {'column': 1, 'row': 'A', 'site': 14},
    "Stage15": {'column': 1, 'row': 'A', 'site': 15},
    "DoWave": True,
    "NWavelengths": 2,
    "WaveName1": "sdcGFP",
    "WaveDoZ1": True,
    "WaveName2": "sdcDAPImRFPxm",
    "WaveDoZ2": True,
    "DoZSeries": True,
    "NZSteps": 8,
    "ZStepSize": 1.0,
    "WaveInFileName": True,
}

Parameters:

filename: str

absolute path to the .nd file

Returns:

Dict[str, str or List[str]]

formatted .nd file content
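
The translation of values into Python syntax can be sketched as follows. The line syntax assumed here (a quoted key, a comma, then the raw value) is an assumption about the .nd format, the function names are hypothetical, and the parsing of "Stage" entries into row, column and site is left out.

```python
import re


def format_nd_value(raw):
    """Translate a raw .nd value into a Python bool, int, float or str."""
    value = raw.strip().strip('"')
    if value.upper() in ("TRUE", "FALSE"):
        return value.upper() == "TRUE"
    if re.match(r"^-?\d+$", value):
        return int(value)
    if re.match(r"^-?\d+\.\d+$", value):
        return float(value)
    return value


def parse_nd_lines(lines):
    """Parse lines of the assumed form '"Key", value' into a dictionary."""
    content = {}
    for line in lines:
        if "," not in line:
            continue  # skip lines without a key-value separator
        key, raw = line.split(",", 1)
        content[key.strip().strip('"')] = format_nd_value(raw)
    return content


lines = [
    '"Description", File recreated from images.',
    '"DoTimelapse", FALSE',
    '"NStagePositions", 15',
    '"ZStepSize", 1.0',
    '"WaveName1", "sdcGFP"',
]
content = parse_nd_lines(lines)
```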