VirtualMicrobes.post_analysis package

Submodules

VirtualMicrobes.post_analysis.lod module

class VirtualMicrobes.post_analysis.lod.LOD(lod, name, stride, time_interval, lod_range, save_dir=None)[source]

Bases: object

Container object for a single line of descent (LOD).

standardized_production(test_params)[source]
strided_lod(stride, time_interval, lod_range)[source]

Sample individuals within a range of the LOD at regular intervals.

Either use a stride or a time interval to sample individuals from the LOD. If a time interval is provided, ancestors are sampled whose times of birth are approximately time_interval apart in the evolutionary simulation.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
Returns: list of ancestor VirtualMicrobes.virtual_cell.Cell.Cell objects
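
As an illustration of stride-based sampling within a fractional range, here is a minimal standalone sketch. It does not reproduce the library's implementation: the function name stride_sample, the oldest-to-youngest ordering of the list, and the rounding of the range bounds are all assumptions made for the example.

```python
def stride_sample(lod, stride, lod_range=(0.0, 1.0)):
    """Return every `stride`-th item of `lod` inside a fractional window.

    `lod` is assumed to be a list ordered from oldest to youngest ancestor.
    `lod_range` gives the lower and upper bound as fractions of len(lod).
    """
    lo, hi = lod_range
    if not 0.0 <= lo <= hi <= 1.0:
        raise ValueError("lod_range must satisfy 0 <= lo <= hi <= 1")
    # Convert fractional bounds to list indices (truncating; an assumption).
    start = int(lo * len(lod))
    stop = int(hi * len(lod))
    return lod[start:stop:stride]


# Sample every 10th ancestor in the younger half of a 100-generation LOD.
print(stride_sample(list(range(100)), stride=10, lod_range=(0.5, 1.0)))
```

With lod_range=(0.5, 1.0) only the second half of the list is considered, so the stride is counted from the midpoint rather than from the root.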

t_interval_iter(time_interval)[source]

Iterate ancestors that are approximately ‘time_interval’ timesteps apart in their time of birth.
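
One way to picture "approximately time_interval apart" is a generator that yields an ancestor whenever its time of birth has reached the next sampling point. The sketch below is a standalone illustration under that assumption; the name time_interval_iter and the birth_time accessor are hypothetical and not taken from the library.

```python
def time_interval_iter(ancestors, time_interval, birth_time=lambda anc: anc):
    """Yield ancestors whose birth times are at least `time_interval` apart.

    `ancestors` is assumed to be ordered by increasing time of birth.
    `birth_time` is a hypothetical accessor returning the birth time.
    """
    next_time = None
    for anc in ancestors:
        t = birth_time(anc)
        # The first ancestor is always yielded; later ones only once the
        # sampling point has been reached or passed.
        if next_time is None or t >= next_time:
            yield anc
            next_time = t + time_interval


births = [0, 3, 9, 10, 14, 21, 22, 35]
print(list(time_interval_iter(births, 10)))
```

Because births rarely fall exactly on a sampling point, the spacing between yielded ancestors is at least time_interval but can be larger, which is why the separation is only approximate.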

class VirtualMicrobes.post_analysis.lod.LOD_Analyser(args)[source]

Bases: object

Analyses the evolutionary history of a population by tracing ancestors in the line of descent.

Loads a simulation save from a file, keeping a reference in ref_sim. From this, initialise ref_pop_hist as a PopulationHistory object that analyses the phylogenetic tree of the population.

The PopulationHistory generates a LOD for 1 or more individuals in the saved population. For each LOD, evolutionary data and network and genome plots can be produced.

It is possible to load additional simulation snapshots that precede the ref_pop_hist and compare individuals to their contemporaries present in the preceding populations. compare_saves contains a list of file names of population saves that should be compared.

anc_cells(runtime=None, tcs=False)[source]

Dump all cells in the fossil record (e.g. to map onto the newick trees)

args = None

config and command line arguments used for initialisation

compare_saves = []

names of snapshot files to compare to ref_sim

compare_to_pops()[source]

Compare reference simulation to a set of previous population snapshots.

Compares each of the simulation snapshot saves in compare_saves to the ref_pop_hist. A PopulationHistory is constructed for each of the compare snapshots. Within each compare snapshot, individuals that are part of (any of) the LOD(s) of the ref_pop_hist are identified. Properties of these ancestors are then compared with the corresponding statistical values for the whole population.

draw_ref_trees()[source]

Draw a reference phylogenetic tree, with individual, selected LODs marked

init_compare_saves(compare_saves)[source]

Parse and check compare saves parameter.

Compare saves can be either a list of file names or a list of generation times (or None). In the latter case, the file names are constructed using the time point and the file name of the reference simulation. Checks are made to ensure that the files exist and that no compare save point comes after the reference simulation save point, as this would not make sense in the comparison functions.
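
The checks described above can be sketched for the generation-time case as follows. This is a hedged illustration only: the function resolve_compare_times, the "<base>_<time><ext>" naming scheme, and the injectable exists callback are all assumptions for the example and do not describe how the library actually names or locates its save files.

```python
import os


def resolve_compare_times(times, ref_time, ref_file, exists=os.path.exists):
    """Turn generation times into save-file paths and validate them.

    The file naming pattern used here is hypothetical.
    """
    base, ext = os.path.splitext(ref_file)
    paths = []
    for t in sorted(times or []):
        # A compare point after the reference snapshot cannot be an ancestor
        # population, so it is rejected.
        if t > ref_time:
            raise ValueError("compare time %s is after reference time %s"
                             % (t, ref_time))
        path = "{}_{}{}".format(base, t, ext)  # hypothetical naming scheme
        if not exists(path):
            raise IOError("compare save not found: %s" % path)
        paths.append(path)
    return paths


# `exists` is stubbed out so the example runs without real files.
print(resolve_compare_times([200, 100], 500, "run.sav", exists=lambda p: True))
```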

init_ref_history(ref_sim=None, nr_lods=None, prune_depth=0, pop_hist_dir='population_history')[source]

Create a PopulationHistory from the ref_sim VirtualMicrobes.simulation.Simulation.Simulation object.

Constructs the phylogenetic tree of the PopulationHistory object and prunes it back to a maximum depth of (max_depth - prune_depth), counted from the root. Then creates LOD objects representing the lines of descent of the nr_lods most diverged branches in the tree.

Parameters:
  • ref_sim (VirtualMicrobes.simulation.Simulation.Simulation object) – simulation snapshot that is the basis for LOD analysis
  • nr_lods (int) – number of separate (most distant) LODs to initialize
  • prune_depth (int) – prune back the phylogenetic tree with this many timesteps
  • pop_hist_dir (str) – name of directory to store lod analysis output
lod_binding_conservation(stride=None, time_interval=None, lod_range=None)[source]

Write time series for TF binding conservation for LODs.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
lod_cells(stride=None, time_interval=None, lod_range=None, runtime=None)[source]

Write time series of evolutionary changes along all LODs.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
lod_graphs(stride=None, time_interval=None, lod_range=None, formats=None)[source]

Draw network and genome graphs for LODs

It is possible to set an interval and a range to sample individuals in the LOD.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD

Note

Either use a stride or a time interval to sample individuals from the lod.

lod_network_stats(stride=None, time_interval=None, lod_range=None)[source]

Write time series for evolutionary network property changes along all LODs.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
lod_stats(stride=None, time_interval=None, lod_range=None)[source]

Write time series of evolutionary changes along all LODs.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
lod_time_course_plots(stride=None, time_interval=None, lod_range=None, formats=None)[source]

Draw time course diagrams for individuals in the LODs.

It is possible to set an interval and a range to sample individuals in the LOD.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD

Note

Either use a stride or a time interval to sample individuals from the lod.

lod_time_courses(lod_range=None, chunk_size=None)[source]

Write time series of molecule concentrations within the LOD

It is possible to set a range to sample individuals in the LOD.

Parameters:
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
  • chunk_size (int) – number of generations in LOD to concatenate per chunk
pop_cells(runtime=None, tcs=False)[source]

Dump all cells in this save file

ref_pop_hist = None

PopulationHistory for the reference simulation (ref_sim) snapshot

ref_sim = None

VirtualMicrobes.simulation.Simulation snapshot to analyse

write_newick_trees()[source]

Write Newick trees for all phylogenies in ref_pop_hist.

class VirtualMicrobes.post_analysis.lod.PopulationHistory(sim, params, save_dir=None, prune_depth=None)[source]

Bases: object

Performs and stores evolutionary history analysis of VirtualMicrobes.simulation.Simulation.Simulation snapshots.

Generates LODs for 1 or more individuals in the population. Reconstruct the evolutionary events along the line of descent.

A reference PopulationHistory can also be compared to population history at earlier simulation time points. In this case the ancestors of individuals in the reference population history will be identified and compared to the rest of the population at that point in time. In this way, evolutionary biases on the line of descent can be brought to light.

anc_cells(pop, time)[source]

Write cell files for all cells in the ancestry, which can be mapped onto the Newick tree.

Parameters:
  • pop – current population that contains the current_ancestry list
  • time – run_time

draw_ref_trees(rescale=False)[source]

Output reference trees for phylogenetic trees with lods labeled.

Uses phylogenetic tree drawing methods to annotate the leaf nodes of lods. Reference trees give a visual overview of the position of the lods that are analysed in the tree.

dump_anc_cells(time)[source]

Dump all ancestors (perfect fossil record) to files, and also save the newick tree. Should be all in there?

dump_lod_cells(time)[source]

Dump all cells used in LOD analysis to files (in other words, a single lineage, which is a subset of anc_cells).

dump_pop_cells(time, prunegens)[source]

Output current population cells as cellfiles

environment = None

Short cut to VirtualMicrobes.environment.Environment of sim.

identify_lod_ancestor(ete_tree_struct, lod)[source]

Identify the individual in the population that is on the line of descent (lod) under consideration.

The nodes in the ete tree corresponding to the lod will be annotated with a tag.

Parameters:
  • ete_tree_struct (VirtualMicrobes.my_tools.utility.ETEtreeStruct) – container structure for phylogenetic tree representations
  • lod (LOD) – line of descent
Returns:

init_lods(nr_lods, save_dir=None, stride=None, time_interval=None, lod_range=None)[source]

Initialize the line of descent (LOD) container objects.

Iterate over the phylogenetic trees of the population and for each tree select nr_lods leaf nodes that are at maximum phylogenetic distance.

For each of the selected leaves, construct a line of descent object (LOD).

Parameters:
  • nr_lods (int) – number of LOD objects per phylogenetic tree
  • save_dir (str) –
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
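
To make "leaf nodes that are at maximum phylogenetic distance" concrete, the sketch below uses a greedy farthest-point heuristic: start from the most distant pair of leaves, then repeatedly add the leaf whose nearest already-chosen leaf is farthest away. This is one plausible selection strategy shown for illustration; the source does not state which procedure the library uses, and the names most_distant_leaves and dist are hypothetical.

```python
from itertools import combinations


def most_distant_leaves(leaves, dist, nr_lods):
    """Greedily pick up to `nr_lods` mutually distant leaves.

    `dist(a, b)` is a caller-supplied phylogenetic distance function.
    """
    if nr_lods <= 0 or not leaves:
        return []
    if nr_lods == 1 or len(leaves) == 1:
        return [leaves[0]]
    # Seed with the pair of leaves that are farthest apart.
    first, second = max(combinations(leaves, 2), key=lambda pair: dist(*pair))
    chosen = [first, second]
    while len(chosen) < min(nr_lods, len(leaves)):
        rest = [leaf for leaf in leaves if leaf not in chosen]
        # Add the leaf that maximises its distance to the closest chosen leaf.
        chosen.append(max(rest, key=lambda leaf: min(dist(leaf, c) for c in chosen)))
    return chosen


# Toy example: leaves are positions on a line, distance is their separation.
print(most_distant_leaves([0, 1, 5, 10], lambda a, b: abs(a - b), 3))
```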
init_phylo_tree(prune_depth=None)[source]

Update the phylogenetic tree of the population.

Clears the change in the population of the final regular simulation step. Prunes back the tree to a maximum depth.

Parameters:
  • prune_depth (int) – number of generations to prune from the leaves of the phylogenetic tree
lod_binding_conservation(stride, time_interval, lod_range)[source]

Write time series for line of descent properties such as network connectivity, protein expression etc.

Either use a stride or a time interval to sample individuals from the lod.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
lod_cells(stride, time_interval, lod_range, runtime)[source]

Write cell files for line of descent

The leaf of the tree is saved as CellLeaf<LOD_ID>, and all its ancestors are saved as CellNode<BIRTHTIME>_<LOD_ID>.cell

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
lod_network_stats(stride, time_interval, lod_range)[source]

Write time series for line of descent properties such as network connectivity, protein expression etc.

Either use a stride or a time interval to sample individuals from the lod.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
lod_stats(stride, time_interval, lod_range)[source]

Write time series for line of descent properties such as network connectivity, protein expression etc.

Either use a stride or a time interval to sample individuals from the lod.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
lods_time_course_data(lod_range, chunk_size)[source]

Write time series data in the line of descent to files.

Concatenates time courses of individuals along a LOD. Concatenations are done in chunks of a chosen chunk_size. For each chunk, .csv files are stored in a directory named part<n>, where n is the chunk number.

Parameters:
  • ancestors (list of VirtualMicrobes.virtual_cell.Cell.Cells) –
  • base_save_dir (str) –
  • viewer_path (str) – path to utility files for html data viewer
  • chunk_size (int) – length of chunks of concatenated data
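
The chunking scheme can be sketched as a small generator that pairs each chunk of ancestors with its directory name. The helper name chunked_lod and the choice to start numbering at 0 are assumptions for illustration; the source only states that directories are named after the chunk number.

```python
def chunked_lod(ancestors, chunk_size):
    """Yield (directory_name, chunk) pairs for consecutive chunks of a LOD."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for n, start in enumerate(range(0, len(ancestors), chunk_size)):
        # Numbering from 0 is an assumption; the last chunk may be shorter.
        yield "part{}".format(n), ancestors[start:start + chunk_size]


for directory, chunk in chunked_lod(list(range(5)), 2):
    print(directory, chunk)
```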
lods_time_course_plots(stride, time_interval, lod_range, formats)[source]

Output time course graphs for the line of descent.

Either use a stride or a time interval to sample individuals from the lod.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
params = None

The (updated) simulation parameters.

plot_lod_graphs(stride, time_interval, lod_range, formats)[source]

Output metabolic, GRN and genome graphs for the line of descent.

Either use a stride or a time interval to sample individuals from the lod.

Parameters:
  • stride (int) – stride in generations for sampling individuals along the LOD
  • time_interval (int) – interval in simulation time for sampling individuals along the LOD
  • lod_range ((float,float)) – bounds in fractions of the total range of the LOD
population = None

Short cut to VirtualMicrobes.virtual_cell.Population.Population of sim.

prune_depth = 0

Number of generations from leaves to prune the phylogenetic tree of the pophist.

sim = None

The VirtualMicrobes.simulation.Simulation.Simulation snapshot for which this pophist was made.

time_point = None

Last simulation time of the sim.

tree_lods = []

List of lists of LODs. One list for each independent phylogenetic tree within the population.

write_newick_trees()[source]

Write newick representation of phylogenetic trees to files.

VirtualMicrobes.post_analysis.network_funcs module

VirtualMicrobes.post_analysis.network_funcs.prune_GRN(grn, log_dif_effect=0.5, rescue_regulated=True, iterative=True)[source]

VirtualMicrobes.post_analysis.network_properties module

class VirtualMicrobes.post_analysis.network_properties.PhyloGeneticAnalysis[source]

Bases: object

Analyze biological networks

VirtualMicrobes.post_analysis.network_properties.calculate_overlap(tf_connections, connections_of_homologous_tfs, closest_bound_homologs_dict)[source]

Calculate the overlap in bound genes between tf homologs.

Parameters:
Returns: Tuple of fractions:
  [0]: fraction of downstream genes whose homologs are bound by a homolog of the reference TF.
  [1]: fraction of new connections (averaged over TF homologs) per original connection of the reference TF.
Return type: float, float
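
The two returned fractions can be illustrated with plain Python sets. The sketch below is not the library's code, and the data layout is assumed: tf_connections is taken to be the set of genes bound by the reference TF, homolog_connections a list with one set of bound genes per TF homolog, and homologs_of a mapping from each reference target gene to its homologs in the other genome.

```python
def overlap_fractions(tf_connections, homolog_connections, homologs_of):
    """Return (conserved fraction, new connections per original connection)."""
    if not tf_connections or not homolog_connections:
        return 0.0, 0.0
    # Every gene bound by at least one homolog of the reference TF.
    bound_by_homologs = set().union(*homolog_connections)
    # Reference targets with at least one homolog that is still bound.
    conserved = sum(1 for gene in tf_connections
                    if homologs_of.get(gene, set()) & bound_by_homologs)
    # Homologs of the original targets; anything else bound counts as new.
    known = set().union(*(homologs_of.get(g, set()) for g in tf_connections))
    new_counts = [len(conns - known) for conns in homolog_connections]
    mean_new = sum(new_counts) / float(len(new_counts))
    n_original = float(len(tf_connections))
    return conserved / n_original, mean_new / n_original


targets = {"a", "b", "c", "d"}
homologs = {"a": {"a1"}, "b": {"b1"}, "c": {"c1"}}
bound = [{"a1", "b1", "x"}, {"a1", "y", "z"}]
print(overlap_fractions(targets, bound, homologs))
```

In this toy case two of the four original targets keep a bound homolog, and the two TF homologs gain one and two new targets respectively, giving an average of 1.5 new connections across four original connections.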

VirtualMicrobes.post_analysis.network_properties.find_homolog_distances(gene, genome, closest_homolog=False)[source]

Find homologs and their distance for a gene in a target genome.

Parameters:
VirtualMicrobes.post_analysis.network_properties.find_homologs(gene, genome)[source]

For a gene, find all its homologs in a given genome.

This is a naive approach that uses a combination of the gene’s type and its VirtualMicrobes.virtual_cell.Identifier.Identifier attribute to detect common descent.

Parameters:
Returns: The set of homologs of gene in the genome.

VirtualMicrobes.post_analysis.network_properties.tf_binding_overlap(cell1, cell2, closest_homolog=False, no_phylogeny=False, verbose=False)[source]

Measure the overlap in target genes for tf homologs in phylogenetically related individuals.

Parameters:
  • cell1 (VirtualMicrobes.virtual_cell.Cell.Cell) – reference individual for which to find homologs
  • cell2 (VirtualMicrobes.virtual_cell.Cell.Cell) – individual in which homologs of TFs and downstream targets will be detected
  • closest_homolog (bool) – flag to filter found homologs to those that have the shortest phylogenetic distance to the gene
  • verbose (bool) – print messages about homologs found
Returns: Mapping from VirtualMicrobes.virtual_cell.Gene.TranscriptionFactor to (maximum) binding overlap score.
Return type: dict

Module contents