tmlib.workflow.metaconfig package¶
Module contents¶
Workflow step for configuration of microscopy image metadata.
Metadata available from the microscope image files is often incomplete, either
because the format is not fully supported by the Bio-Formats library or
simply because the microscope provides insufficient information in the files.
In particular, the relative position of images, required for overview creation,
is typically not available from individual image files. The metaconfig step
configures metadata extracted from image files in the
tmlib.workflow.metaextract
step and tries to obtain any missing
information from microscope-specific metadata files or user input.
This is achieved via microscope-specific implementations of
MetadataReader
and
MetadataHandler
in
a separate module of tmlib.workflow.metaconfig
.
The name given to the module determines the microscope type, e.g.
tmlib.workflow.metaconfig.cellvoyager
is the module for the
cellvoyager
microscope type.
To make the microscope type available for use, add its name (the name of the
implemented module) to
SUPPORTED_MICROSCOPE_TYPES
.
-
tmlib.workflow.metaconfig.
get_microscope_type_regex
(microscope_type, as_string=False)¶ Gets regular expression patterns for the identification of microscope image files and microscope metadata files for a given microscope_type.
Parameters: microscope_type: str
microscope type
as_string: bool, optional
whether regex patterns should be returned as strings (default:
False
)Returns: Tuple[_sre.SRE_Pattern or str]
regex pattern for image and metadata files
-
tmlib.workflow.metaconfig.
import_microscope_type_module
(microscope_type)¶ Imports the module for an implemented microscope_type.
Parameters: microscope_type: str
microscope type
Returns: module
loaded module instance
Raises: ValueError
when microscope_type is not known
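A minimal sketch of the lookup described above, assuming that each supported type maps to a submodule of the same name. The contents of the set and the helper name `resolve_microscope_module_name` are illustrative assumptions, not the tmlib implementation; the documented function additionally imports and returns the module.

```python
# Hypothetical stand-in for SUPPORTED_MICROSCOPE_TYPES (contents assumed
# from the submodules listed on this page).
SUPPORTED_MICROSCOPE_TYPES = {'cellvoyager', 'metamorph', 'visiview'}


def resolve_microscope_module_name(microscope_type):
    """Return the dotted module name for a supported microscope type."""
    if microscope_type not in SUPPORTED_MICROSCOPE_TYPES:
        raise ValueError('Unknown microscope type: "%s"' % microscope_type)
    return 'tmlib.workflow.metaconfig.%s' % microscope_type
```

The returned name could then be passed to `importlib.import_module()` to load the module.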
-
tmlib.workflow.metaconfig.
metadata_handler_factory
(microscope_type)¶ Gets the microscope_type-specific implementation of
tmlib.workflow.metaconfig.base.MetadataHandler
.Parameters: microscope_type: str
microscope type
Returns: classobj
Raises: AttributeError
when the microscope_type-specific module does not implement a handler class
-
tmlib.workflow.metaconfig.
metadata_reader_factory
(microscope_type)¶ Gets the microscope_type-specific implementation of
MetadataReader
.Parameters: microscope_type: str
microscope type
Returns: Union[classobj, None]
metadata reader class in case one is implemented
Submodules¶
tmlib.workflow.metaconfig.api module¶
-
class
tmlib.workflow.metaconfig.api.
MetadataConfigurator
(experiment_id)¶ Bases:
tmlib.workflow.api.WorkflowStepAPI
Class for configuration of microscope image metadata.
It provides methods for conversion of metadata extracted from heterogeneous microscope file formats into a TissueMAPS-specific schema. The original metadata has to be available in OMEXML format according to the OME schema.
The class further provides methods to complement the metadata retrieved via Bio-Formats with information available from additional microscope-specific metadata files and/or user input.
Parameters: experiment_id: int
ID of the processed experiment
-
collect_job_output
(batch)¶ Assigns registered image files from different acquisitions to separate cycles. If an acquisition includes multiple time points, a separate cycle is created for each time point. The mapping from acquisitions to cycles is consequently 1 -> n, where n is the number of time points per acquisition (n >= 1).
Whether acquisition time points will be interpreted as actual time points in a time series depends on the value of
tm.Experiment.plate_acquisition_mode
.Parameters: batch: dict
description of the collect job
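The 1 -> n mapping from acquisitions to cycles can be illustrated with a small sketch. The function name `assign_cycles` and the dictionary keys are hypothetical; the documented method operates on database objects rather than plain lists.

```python
def assign_cycles(n_tpoints_per_acquisition):
    """Create one cycle per time point of each acquisition.

    n_tpoints_per_acquisition: list with the number of time points
    (n >= 1) of each acquisition.
    """
    cycles = []
    for acquisition_index, n_tpoints in enumerate(n_tpoints_per_acquisition):
        for tpoint in range(n_tpoints):
            cycles.append({
                'cycle': len(cycles),  # running zero-based cycle index
                'acquisition': acquisition_index,
                'tpoint': tpoint,
            })
    return cycles
```

For two acquisitions with one and three time points, respectively, this sketch yields four cycles.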
-
create_run_batches
(args)¶ Creates job descriptions for parallel processing.
Parameters: args: tmlib.workflow.metaconfig.args.MetaconfigBatchArguments
step-specific batch arguments
Returns: generator
job descriptions
-
delete_previous_job_output
()¶ Deletes all instances of class
tm.Cycle
, tm.Well
, and tm.Channel
as well as all children for the processed experiment.
-
has_collect_phase
= True¶
-
run_job
(batch, assume_clean_state=False)¶ Configures OMEXML metadata extracted from microscope image files and complements it with metadata retrieved from additional microscope metadata files and/or user input.
The actual processing is delegated to a format-specific implementation of
MetadataHandler
.Parameters: batch: dict
job description
assume_clean_state: bool, optional
assume that output of previous runs has already been cleaned up
-
tmlib.workflow.metaconfig.args module¶
-
class
tmlib.workflow.metaconfig.args.
MetaconfigBatchArguments
(**kwargs)¶ Bases:
tmlib.workflow.args.BatchArguments
Parameters: **kwargs: dict, optional
keyword arguments to overwrite
-
mip
¶ bool: perform maximum intensity projection along z axis
-
n_horizontal
¶ int: number of images along the horizontal axis of the stitched well overview mosaic image
-
n_vertical
¶ int: number of images along the vertical axis of the stitched well overview mosaic image
-
regex
¶ str: named regular expression that defines group names for retrieval of metadata from image filenames
-
stitch_layout
¶ str: layout of the stitched well overview mosaic image, i.e. the order in which images were acquired along the grid
-
-
class
tmlib.workflow.metaconfig.args.
MetaconfigSubmissionArguments
(**kwargs)¶ Bases:
tmlib.workflow.args.SubmissionArguments
Parameters: **kwargs: dict, optional
keyword arguments to overwrite
tmlib.workflow.metaconfig.base module¶
-
class
tmlib.workflow.metaconfig.base.
MetadataFields
¶ Bases:
tuple
Create new instance of MetadataFields(s, c, z, t, w)
-
c
¶ Alias for field number 1
-
s
¶ Alias for field number 0
-
t
¶ Alias for field number 3
-
w
¶ Alias for field number 4
-
z
¶ Alias for field number 2
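The field order listed above corresponds to a named tuple that could be declared as in the following sketch. The example values are made up for illustration.

```python
from collections import namedtuple

# Field order matches the documented aliases: s=0, c=1, z=2, t=3, w=4.
MetadataFields = namedtuple('MetadataFields', ['s', 'c', 'z', 't', 'w'])

# Hypothetical values for a single image file.
fields = MetadataFields(s='3', c='1', z='2', t='1', w='B02')
```

Fields can be accessed either by name (`fields.w`) or by position (`fields[4]`).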
-
-
class
tmlib.workflow.metaconfig.base.
MetadataHandler
(omexml_images, omexml_metadata=None)¶ Bases:
object
Abstract base class for handling metadata from heterogeneous microscope file formats as provided by the Bio-Formats library.
Metadata has to be provided as OMEXML according to the OME schema.
Attributes
metadata: pandas.DataFrame configured metadata Parameters: omexml_images: Dict[str, bioformats.omexml.OMEXML]
name and extracted metadata for each
MicroscopeImageFile
omexml_metadata: bioformats.omexml.OMEXML, optional
additional metadata obtained from additional
MicroscopeMetadataFile
via a microscope-type-specific implementation of MetadataReader
-
assign_acquisition_site_indices
()¶ Gives each acquisition site a globally (plate-wide) unique index.
Returns: pandas.DataFrame
metadata for each 2D Plane element
-
classmethod
check_regular_expression
(regex)¶ Checks whether a named regular expression has all required fields.
Parameters: regex: str
regular expression
Raises: tmlib.errors.RegexError
when a provided field is not supported
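One way to inspect the group names of a named regular expression is via the `groupindex` attribute of the compiled pattern. The sketch below assumes the supported fields are w, t, s, z and c, as listed for configure_from_filenames(); the helper name `find_unsupported_fields` is hypothetical, and the documented method raises an error instead of returning a set.

```python
import re

# Field names as listed in the documentation of configure_from_filenames().
SUPPORTED_FIELDS = {'w', 't', 's', 'z', 'c'}


def find_unsupported_fields(regex):
    """Return the set of named groups in regex that are not supported."""
    provided_fields = set(re.compile(regex).groupindex)
    return provided_fields - SUPPORTED_FIELDS
```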
-
configure_from_filenames
(plate_dimensions, regex)¶ Configures metadata based on information encoded in image filenames using a regular expression with the following fields:
- w: well
- t: time point
- s: acquisition site
- z: focal plane (z dimension)
- c: channel
Parameters: plate_dimensions: Tuple[int]
number of rows and columns in the well plate
regex: str
named regular expression
Returns: pandas.DataFrame
metadata for each 2D Plane element
Raises: tmlib.errors.MetadataError
when image files contain more than one plane, since this case wouldn’t allow a 1-to-1 mapping of information from filename to image plane
-
configure_from_omexml
()¶ Collects image metadata from OMEXML elements extracted from image files and an additional optional OMEXML element provided by a microscope-specific implementation of
MetadataReader
. All available metadata is combined into a table, where each row represents a single 2D pixel plane.Returns: pandas.DataFrame
metadata for each 2D Plane element
-
create_image_file_mappings
()¶ Creates a file map for the extraction of individual planes from the microscopy image files.
Returns: Dict[int, Dict[str, List[str or int]]]
a mapping of configured images hashable by their index in metadata to planes in the corresponding microscope image files (files key) and their location within the files (series and plane keys)
-
determine_grid_coordinates_from_layout
(stitch_layout, stitch_dimensions)¶ Determines the coordinates of each image acquisition site within the continuous acquisition grid (slide or well in a plate) based on a provided layout.
Parameters: stitch_layout: str
layout of the acquisition grid (options:
"horizontal"
,"zigzag_horizontal"
,"vertical"
, or"zigzag_vertical"
)stitch_dimensions: Tuple[int]
dimensions of the acquisition grid, i.e. number of images along the vertical and horizontal axis of the acquired area
Returns: pandas.DataFrame
metadata for each 2D Plane element
See also
illuminati.stitch.calc_grid_coordinates_from_layout()
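The four layout options can be illustrated with a short sketch that lists grid coordinates in acquisition order. The traversal orders are an interpretation of the layout names (row-wise or column-wise, with every second line reversed for the zigzag variants) and are not taken from the tmlib or illuminati source; the function name `grid_coordinates` is hypothetical.

```python
def grid_coordinates(stitch_layout, stitch_dimensions):
    """List (y, x) grid coordinates in the assumed order of acquisition.

    stitch_dimensions: (number of images along the vertical axis,
                        number of images along the horizontal axis)
    """
    n_rows, n_cols = stitch_dimensions
    coordinates = []
    if stitch_layout in ('horizontal', 'zigzag_horizontal'):
        for y in range(n_rows):
            xs = list(range(n_cols))
            # Assumption: zigzag reverses the direction of every second row.
            if stitch_layout == 'zigzag_horizontal' and y % 2 == 1:
                xs.reverse()
            coordinates.extend((y, x) for x in xs)
    elif stitch_layout in ('vertical', 'zigzag_vertical'):
        for x in range(n_cols):
            ys = list(range(n_rows))
            # Assumption: zigzag reverses the direction of every second column.
            if stitch_layout == 'zigzag_vertical' and x % 2 == 1:
                ys.reverse()
            coordinates.extend((y, x) for y in ys)
    else:
        raise ValueError('Unknown stitch layout: "%s"' % stitch_layout)
    return coordinates
```

The position of a coordinate in the returned list corresponds to the zero-based acquisition order of the site.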
-
determine_grid_coordinates_from_stage_positions
()¶ Determines the coordinates of each image acquisition site within the continuous acquisition grid (slide or well in a plate) based on the absolute microscope stage positions.
Returns: pandas.DataFrame
metadata for each 2D Plane element
Raises: MetadataError
when stage position information is not available from metadata
See also
illuminati.stitch.calc_grid_coordinates_from_positions()
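As a rough illustration, stage positions can be turned into grid coordinates by ranking the unique positions along each axis. The sketch assumes that sites in the same row or column report exactly identical stage values and that larger values correspond to higher grid indices; real stage data may need tolerance-based grouping, and the illuminati function referenced above may work differently. The function name is hypothetical.

```python
def grid_coordinates_from_positions(stage_positions):
    """Map absolute (y, x) stage positions to zero-based grid coordinates."""
    unique_y = sorted(set(y for y, _ in stage_positions))
    unique_x = sorted(set(x for _, x in stage_positions))
    # The rank of a position among the unique values is its grid index.
    return [
        (unique_y.index(y), unique_x.index(x))
        for y, x in stage_positions
    ]
```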
-
determine_missing_metadata
()¶ Determines whether required basic metadata, such as channel names or time point identifiers, has not yet been configured.
Returns: Set[str]
names of missing basic metadata attributes
-
classmethod
extract_fields_from_filename
(regex, filename, defaults=True)¶ Extracts fields from image filenames using a regular expression.
Parameters: regex: str
regular expression
filename: str
name of a microscope image file
defaults: bool, optional
whether default values should be used
Returns: tmlib.workflow.metaconfig.base.MetadataFields
named tuple with extracted values
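A minimal sketch of such an extraction, assuming that fields missing from the regular expression are filled with fallback values when defaults is enabled. The fallback values, the function name `extract_fields` and the example filename are assumptions; the defaults used by tmlib are not documented here.

```python
import re
from collections import namedtuple

MetadataFields = namedtuple('MetadataFields', ['s', 'c', 'z', 't', 'w'])

# Assumed fallback values, for illustration only.
ASSUMED_DEFAULTS = {'s': '0', 'c': '0', 'z': '0', 't': '0', 'w': None}


def extract_fields(regex, filename, defaults=True):
    """Extract metadata fields from a filename via named groups."""
    match = re.search(regex, filename)
    if match is None:
        raise ValueError('Filename "%s" does not match the pattern.' % filename)
    captured = match.groupdict()
    values = {}
    for name in MetadataFields._fields:
        value = captured.get(name)
        if value is None and defaults:
            value = ASSUMED_DEFAULTS[name]
        values[name] = value
    return MetadataFields(**values)
```

With a pattern that only defines w, s and c, the z and t fields fall back to the assumed defaults, or stay None when defaults is False.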
-
group_metadata_per_zstack
()¶ Group all focal planes belonging to one z-stack (i.e. acquired at different z resolutions but at the same microscope stage position, time point and channel) together.
Returns: pandas.DataFrame
metadata for each 2D Plane element
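The grouping can be pictured as collecting all z-planes that share the same site, time point and channel. The sketch below uses plain dictionaries with hypothetical keys instead of the pandas.DataFrame used by the documented method.

```python
from collections import defaultdict


def group_planes_per_zstack(planes):
    """Group z-plane indices by (site, tpoint, channel)."""
    stacks = defaultdict(list)
    for plane in planes:
        key = (plane['site'], plane['tpoint'], plane['channel'])
        stacks[key].append(plane['zplane'])
    # Sort the planes of each stack by their z index.
    return {key: sorted(zplanes) for key, zplanes in stacks.items()}
```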
-
remove_redundant_columns
()¶ Cleans up metadata, i.e. removes information that is no longer required.
Returns: pandas.DataFrame
metadata for each 2D Plane element
-
update_indices
()¶ Creates a zero-based unique identifier number for each channel, time point and z-plane.
Returns: pandas.DataFrame
metadata for each 2D Plane element
-
-
class
tmlib.workflow.metaconfig.base.
MetadataReader
¶ Bases:
object
Abstract base class for reading metadata from additional non-image files, which are either generated by the microscope or provided by users.
The
read()
method of derived classes must return a single OMEXML object, according to the OME data model, see python-bioformats <http://pythonhosted.org/python-bioformats/#metadata>. The value of the image_count attribute in the OMEXML element provided by the implemented reader must equal the number of image files * the number of Image elements per file * number of Plane elements per Image element. In addition, the number of Plane elements and the values of SizeT, SizeC and SizeZ elements of the Image element must match those in OMEXML elements obtained from the corresponding image files. For example, if an image file contains one series with planes for one time point, one channel and 10 focal planes, then SizeT = 1, SizeC = 1 and SizeZ = 10.
The OME schema doesn’t provide information about wells at the individual Image level: see OME data model. Instead, it provides a Plate element, which contains Well elements. Well elements contain the positional information, such as row and column index of each well within the plate. WellSample elements map to individual Image elements and hold information about the position of images within the Well. In addition, there is an ImageRef element, which can be used to map the WellSample to its corresponding Image element.
Derived classes should provide information for the SPW Plate element. For consistency, a slide should be represented as a Plate with a single Well element. Custom readers should further specify the ImageRef element for each WellSample element. The value of ImageRef must be an unsigned integer in the range [0, n], where n is the total number of Image elements in the OMEXML element provided by the class. Individual Image elements may be distributed across several OMEXML elements (one OMEXML element for each
MicroscopeImageFile
). The ImageRef value for a particular Image can be calculated as follows: one-based index in the naturally sorted list of image filenames * number of Image elements per image file.See also
tmlib.workflow.metaconfig.base.MetadataHandler.configure_omexml_from_image_files()
,tmlib.workflow.metaconfig.base.MetadataHandler.configure_omexml_from_metadata_files()
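Two of the rules above lend themselves to a small sketch: the natural sort order of image filenames and the expected value of image_count. Both helper names are hypothetical, and the natural sort shown here is one common implementation that may differ in detail from the one used by tmlib.

```python
import re


def natural_sort_key(filename):
    """Split a filename into text and integer parts for natural sorting."""
    return [
        int(part) if part.isdigit() else part
        for part in re.split(r'(\d+)', filename)
    ]


def expected_image_count(n_files, n_images_per_file, n_planes_per_image):
    """Compute image_count as the product described in the text above."""
    return n_files * n_images_per_file * n_planes_per_image


# Hypothetical filenames: natural sorting places "img2" before "img10".
filenames = sorted(['img10.tif', 'img2.tif', 'img1.tif'], key=natural_sort_key)
```

For the example given above (one file with one series and 10 focal planes), `expected_image_count(1, 1, 10)` evaluates to 10.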
-
read
(microscope_metadata_files, microscope_image_files)¶ Reads metadata from arbitrary files.
Parameters: microscope_metadata_files: List[str]
absolute path to the microscope metadata files
microscope_image_files: List[str]
absolute path to the microscope image files
Returns: bioformats.omexml.OMEXML
OMEXML metadata
-
tmlib.workflow.metaconfig.cellvoyager module¶
Implementation of classes for reading microscope image and metadata files provided in a format specific to the Yokogawa CellVoyager 7000 microscope.
-
class
tmlib.workflow.metaconfig.cellvoyager.
CellvoyagerMetadataHandler
(omexml_images, omexml_metadata=None)¶ Bases:
tmlib.workflow.metaconfig.base.MetadataHandler
Class for handling metadata specific to the Yokogawa Cellvoyager 7000 microscope.
Parameters: omexml_images: Dict[str, bioformats.omexml.OMEXML]
metadata extracted from microscope image files
omexml_metadata: bioformats.omexml.OMEXML
metadata extracted from microscope metadata files
-
class
tmlib.workflow.metaconfig.cellvoyager.
CellvoyagerMetadataReader
¶ Bases:
tmlib.workflow.metaconfig.base.MetadataReader
Class for reading metadata from file formats specific to the Yokogawa CellVoyager 7000 microscope.
Yokogawa doesn’t store the position and channel information at the level of individual images, but in additional metadata files with .mlf and .mrf extensions. Unfortunately, these file formats are not supported by Bio-Formats, see CellVoyagerReader. For compatibility with the OME data model, the CellvoyagerMetadataReader reads the XML from “MeasurementDetail.mrf” and “MeasurementData.mlf” files, extracts the relevant data and stores them in an OMEXML object according to the Bio-Formats convention.
-
read
(microscope_metadata_files, microscope_image_filenames)¶ Reads metadata from “mlf” and “mrf” metadata files in case they are provided.
Parameters: microscope_metadata_files: List[str]
absolute path to microscope metadata files
microscope_image_filenames: List[str]
names of the corresponding microscope image files
Returns: bioformats.omexml.OMEXML
OMEXML image metadata
-
-
tmlib.workflow.metaconfig.cellvoyager.
IMAGE_FILE_REGEX_PATTERN
= '[^_]+_(?P<w>[A-Z]\\d{2})_T(?P<t>\\d+)F(?P<s>\\d+)L\\d+A\\d+Z(?P<z>\\d+)C0(?P<c>\\d)\\.'¶ Regular expression pattern to identify image files
-
tmlib.workflow.metaconfig.cellvoyager.
METADATA_FILE_REGEX_PATTERN
= '.*\\.(mlf|mrf)$'¶ Supported extensions for metadata files
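The two patterns above can be tried out with the standard `re` module. The image filename in this sketch is made up to fit the pattern and is not taken from a real dataset.

```python
import re

# Patterns copied from the module constants documented above.
IMAGE_FILE_REGEX_PATTERN = (
    r'[^_]+_(?P<w>[A-Z]\d{2})_T(?P<t>\d+)F(?P<s>\d+)'
    r'L\d+A\d+Z(?P<z>\d+)C0(?P<c>\d)\.'
)
METADATA_FILE_REGEX_PATTERN = r'.*\.(mlf|mrf)$'

# Hypothetical filename, constructed to fit the image pattern.
image_filename = 'assay_B02_T0001F003L01A01Z02C01.tif'
fields = re.search(IMAGE_FILE_REGEX_PATTERN, image_filename).groupdict()

# Metadata files are recognized by their extension only.
is_metadata_file = re.match(METADATA_FILE_REGEX_PATTERN, 'MeasurementData.mlf') is not None
```

Here `fields` holds the well, time point, site, z-plane and channel as strings, e.g. `fields['w']` is `'B02'`.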
tmlib.workflow.metaconfig.cli module¶
-
class
tmlib.workflow.metaconfig.cli.
Metaconfig
(api_instance, verbosity)¶ Bases:
tmlib.workflow.cli.WorkflowStepCLI
Configuration of extracted OMEXML metadata and integration with additional microscope-specific information about the image acquisition process.
Parameters: api_instance: tmlib.workflow.metaconfig.api.MetadataConfigurator
instance of API class to which processing is delegated
verbosity: int
logging verbosity
metaconfig¶
Configuration of extracted OMEXML metadata and integration with additional microscope-specific information about the image acquisition process.
-
experiment_id
¶
ID of the experiment that should be processed
-
-h
,
--help
¶
show this help message and exit
-
--verbosity
,
-v
¶
increase logging verbosity
metaconfig experiment_id cleanup¶
cleans up the output of a previous submission, i.e. removes files and database entries created by previously submitted jobs
-
-h
,
--help
¶
show this help message and exit
metaconfig experiment_id collect¶
collects the output of run jobs, i.e. performs a post-processing operation that either cannot be parallelized or needs to be performed afterwards
-
-h
,
--help
¶
show this help message and exit
metaconfig experiment_id info¶
prints the description of a given batch job to the console
-
-h
,
--help
¶
show this help message and exit
-
--job
,
-j
¶
ID of the job for which information should be displayed
-
--phase
{collect,run}
,
-p
{collect,run}
¶ phase of the workflow step to which the job belongs
metaconfig experiment_id init¶
creates batches for parallel processing and thereby defines how the computational task should be distributed over the cluster (also cleans up the output of previous submissions)
-
-h
,
--help
¶
show this help message and exit
-
--mip
¶
perform maximum intensity projection along z axis
-
--n_horizontal
<n_horizontal>
¶ number of images along the horizontal axis of the stitched well overview mosaic image
-
--n_vertical
<n_vertical>
¶ number of images along the vertical axis of the stitched well overview mosaic image
-
--regex
<regex>
¶ named regular expression that defines group names for retrieval of metadata from image filenames
-
--stitch_layout
{vertical,horizontal,zigzag_horizontal,zigzag_vertical}
¶ layout of the stitched well overview mosaic image, i.e. the order in which images were acquired along the grid (default: horizontal)
metaconfig experiment_id log¶
prints the log output of a given batch job to the console
-
-h
,
--help
¶
show this help message and exit
-
--job
,
-j
¶
ID of the job for which log output should be shown
-
--phase
{collect,init,run}
,
-p
{collect,init,run}
¶ phase of the workflow step to which the job belongs
metaconfig experiment_id resubmit¶
resubmits previously created jobs for “run” and “collect” phases to the cluster and monitors their status upon processing
-
-h
,
--help
¶
show this help message and exit
-
--depth
<index>
,
-d
<index>
¶ number of child tasks that should be monitored (default: 1)
-
--interval
<seconds>
,
-i
<seconds>
¶ seconds to wait between monitoring iterations (default: 10)
metaconfig experiment_id run¶
runs an individual batch job on the local machine
-
-h
,
--help
¶
show this help message and exit
-
--assume-clean-state
¶
assume that previous outputs have been cleaned up
-
--job
<job_id>
,
-j
<job_id>
¶ ID of the job that should be run
metaconfig experiment_id submit¶
creates batch jobs for the “run” and “collect” phases, submits them to the cluster and monitors their status upon processing (requires a prior “init”)
-
-h
,
--help
¶
show this help message and exit
-
--depth
<index>
,
-d
<index>
¶ number of child tasks that should be monitored (default: 1)
-
--interval
<seconds>
,
-i
<seconds>
¶ seconds to wait between monitoring iterations (default: 10)
-
--cores
<number>
¶ number of cores that should be allocated to each “run” job (may be increased in case memory requirements of a job exceed resources of a single core) (default: 1)
-
--duration
<hh:mm:ss>
¶ walltime that should be allocated to each “run” job in the format “HH:MM:SS” (may need to be adapted depending on the choice of batch size) (default: 06:00:00)
-
--memory
<mb>
¶ amount of memory in megabytes that should be allocated to each “run” job (default: 2147)
tmlib.workflow.metaconfig.metamorph module¶
Implementation of classes for reading microscope image and metadata files provided in a format specific to microscopes equipped with Metamorph software.
-
tmlib.workflow.metaconfig.metamorph.
IMAGE_FILE_REGEX_PATTERN
= '.+_?(?P<w>[A-Z]\\d{2})_s(?P<s>\\d+)(_w(?P<c>\\d{1}))?[^_thumb]'¶ Regular expression pattern to identify image files
-
tmlib.workflow.metaconfig.metamorph.
METADATA_FILE_REGEX_PATTERN
= '(?!.*)'¶ Supported extensions for metadata files
-
class
tmlib.workflow.metaconfig.metamorph.
MetamorphMetadataHandler
(omexml_images, omexml_metadata=None)¶ Bases:
tmlib.workflow.metaconfig.base.MetadataHandler
Class for handling metadata specific to microscopes equipped with Metamorph software.
Parameters: omexml_images: Dict[str, bioformats.omexml.OMEXML]
metadata extracted from microscope image files
omexml_metadata: bioformats.omexml.OMEXML
metadata extracted from microscope metadata files
tmlib.workflow.metaconfig.omexml module¶
-
tmlib.workflow.metaconfig.omexml.
get_image_ix
(image_id)¶ Get the index of an image within the OMEXML metadata object given the ID of the image.
Parameters: image_id: str
image identifier in the format
Image:\d+
Returns: int
zero-based index number
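A minimal sketch of how such an identifier could be parsed, assuming the numeric part of the ID is the zero-based index. The function name `parse_image_index` is hypothetical and the error handling is an assumption.

```python
import re


def parse_image_index(image_id):
    """Extract the index from an identifier in the format Image:\\d+."""
    match = re.match(r'^Image:(\d+)$', image_id)
    if match is None:
        raise ValueError('Invalid image identifier: "%s"' % image_id)
    return int(match.group(1))
```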
tmlib.workflow.metaconfig.visiview module¶
Implementation of classes for reading microscope image and metadata files provided in a format specific to microscopes equipped with Visitron VisiView software.
-
tmlib.workflow.metaconfig.visiview.
IMAGE_FILE_REGEX_PATTERN
= '.+_?(?P<w>[A-Z]\\d{2})?_w\\d+(?P<c>\\w+)_s(?P<s>\\d+)_?t?(?P<t>\\d+)?\\.'¶ Regular expression pattern to identify image files
-
tmlib.workflow.metaconfig.visiview.
METADATA_FILE_REGEX_PATTERN
= '\\.nd$'¶ Supported extensions for metadata files
-
class
tmlib.workflow.metaconfig.visiview.
VisiviewMetadataHandler
(omexml_images, omexml_metadata=None)¶ Bases:
tmlib.workflow.metaconfig.base.MetadataHandler
Class for handling metadata specific to microscopes equipped with Visitron software.
Parameters: omexml_images: Dict[str, bioformats.omexml.OMEXML]
metadata extracted from microscope image files
omexml_metadata: bioformats.omexml.OMEXML
metadata extracted from microscope metadata files
-
class
tmlib.workflow.metaconfig.visiview.
VisiviewMetadataReader
¶ Bases:
tmlib.workflow.metaconfig.base.MetadataReader
Class for reading metadata from file formats specific to microscopes equipped with the VisiView software.
-
read
(microscope_metadata_files, microscope_image_filenames)¶ Read metadata from “nd” metadata file.
Parameters: microscope_metadata_files: List[str]
absolute path to the microscope metadata files
microscope_image_filenames: List[str]
names of the corresponding microscope image files
Returns: bioformats.omexml.OMEXML
OMEXML image metadata
Raises: ValueError
when microscope_metadata_files doesn’t have length one
-
-
tmlib.workflow.metaconfig.visiview.
read_nd_file
(filename)¶ Read the lines of the .nd file as key-value pairs, and format the values, i.e. translate them into Python syntax.
The formatted content will have the following layout (shown for a small example dataset):
{
    "Description": "File recreated from images.",
    "StartTime1": "20150820 19:43:31.545",
    "DoTimelapse": False,
    "DoStage": True,
    "NStagePositions": 15,
    "Stage1": {'column': 1, 'row': 'A', 'site': 1},
    "Stage2": {'column': 1, 'row': 'A', 'site': 2},
    "Stage3": {'column': 1, 'row': 'A', 'site': 3},
    "Stage4": {'column': 1, 'row': 'A', 'site': 4},
    "Stage5": {'column': 1, 'row': 'A', 'site': 5},
    "Stage6": {'column': 1, 'row': 'A', 'site': 6},
    "Stage7": {'column': 1, 'row': 'A', 'site': 7},
    "Stage8": {'column': 1, 'row': 'A', 'site': 8},
    "Stage9": {'column': 1, 'row': 'A', 'site': 9},
    "Stage10": {'column': 1, 'row': 'A', 'site': 10},
    "Stage11": {'column': 1, 'row': 'A', 'site': 11},
    "Stage12": {'column': 1, 'row': 'A', 'site': 12},
    "Stage13": {'column': 1, 'row': 'A', 'site': 13},
    "Stage14": {'column': 1, 'row': 'A', 'site': 14},
    "Stage15": {'column': 1, 'row': 'A', 'site': 15},
    "DoWave": True,
    "NWavelengths": 2,
    "WaveName1": "sdcGFP",
    "WaveDoZ1": True,
    "WaveName2": "sdcDAPImRFPxm",
    "WaveDoZ2": True,
    "DoZSeries": True,
    "NZSteps": 8,
    "ZStepSize": 1.0,
    "WaveInFileName": True,
}
Parameters: filename: str
absolute path to the .nd file
Returns: Dict[str, str or List[str]]
formatted .nd file content
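The translation of raw values into Python types could look roughly like the sketch below. It assumes that each line of the .nd file has the form `"Key", value` and handles only booleans, numbers and strings; the conversion of stage entries into row, column and site is not covered. The function names are hypothetical and this is not the tmlib implementation.

```python
import re


def format_value(raw):
    """Translate a raw .nd value into a Python bool, int, float or str."""
    value = raw.strip().strip('"')
    if value.upper() == 'TRUE':
        return True
    if value.upper() == 'FALSE':
        return False
    if re.fullmatch(r'-?\d+', value):
        return int(value)
    if re.fullmatch(r'-?\d+\.\d+', value):
        return float(value)
    return value


def parse_nd_lines(lines):
    """Parse lines of the assumed form '"Key", value' into a dictionary."""
    content = {}
    for line in lines:
        if ',' not in line:
            continue  # skip lines without a key-value separator
        key, raw = line.split(',', 1)
        content[key.strip().strip('"')] = format_value(raw)
    return content
```

Values that are neither boolean nor purely numeric, such as the start time, are kept as strings.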