VirtualMicrobes.post_analysis package¶
Submodules¶
VirtualMicrobes.post_analysis.lod module¶
-
class
VirtualMicrobes.post_analysis.lod.
LOD
(lod, name, stride, time_interval, lod_range, save_dir=None)[source]¶ Bases:
object
classdocs
-
strided_lod
(stride, time_interval, lod_range)[source]¶ Sample individuals within a range of the LOD at regular intervals.
Either use a stride or a time interval to sample individuals from the lod. If a time interval is provided, ancestors are sampled that have a time of birth that is approximately separated by time_interval in the evolutionary simulation.
Return type: list of ancestor VirtualMicrobes.virtual_cell.Cell.Cell objects
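The two sampling modes described for strided_lod can be illustrated with a minimal sketch. It is not the VirtualMicrobes implementation: the function name sample_lod, the use of plain birth times instead of Cell objects, and the choice to raise an error when both or neither of stride and time_interval are given are all assumptions made for illustration.

```python
def sample_lod(birth_times, stride=None, time_interval=None, lod_range=(0.0, 1.0)):
    """Return indices of sampled ancestors (hypothetical helper).

    birth_times must be sorted from the oldest ancestor to the leaf.
    lod_range gives the bounds as fractions of the total LOD length.
    """
    if (stride is None) == (time_interval is None):
        raise ValueError("provide exactly one of stride or time_interval")
    n = len(birth_times)
    lo = int(lod_range[0] * n)
    hi = int(lod_range[1] * n)
    window = range(lo, hi)
    if stride is not None:
        # Every stride-th individual within the selected range.
        return list(window[::stride])
    # Time-interval mode: pick the first ancestor born at or after each
    # target time, so picks are separated by at least time_interval.
    picked = []
    if lo >= hi:
        return picked
    next_time = birth_times[lo]
    for i in window:
        if birth_times[i] >= next_time:
            picked.append(i)
            next_time = birth_times[i] + time_interval
    return picked
```

With a stride the spacing is counted in generations along the line of descent; with a time interval it is counted in simulation time, so the number of skipped ancestors can vary between picks.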
-
-
class
VirtualMicrobes.post_analysis.lod.
LOD_Analyser
(args)[source]¶ Bases:
object
Analyses the evolutionary history of a population by tracing ancestors in the line of descent.
Loads a simulation save from a file, keeping a reference in ref_sim. From this, initialises ref_pop_hist as a PopulationHistory object that analyses the phylogenetic tree of the population.
The PopulationHistory generates a LOD for one or more individuals in the saved population. For each LOD, evolutionary data and network and genome plots can be produced.
It is possible to load additional simulation snapshots that precede the ref_pop_hist and compare individuals to their contemporaries present in the preceding populations. compare_saves contains a list of file names of population saves that should be compared.
-
anc_cells
(runtime=None, tcs=False)[source]¶ Dump all cells in the fossil record (e.g. to map onto the newick trees)
-
args
= None¶ config and command line arguments used for initialisation
-
compare_saves
= []¶ names of snapshot files to compare to ref_sim
-
compare_to_pops
()[source]¶ Compare reference simulation to a set of previous population snapshots.
Compares each of the simulation snapshot saves in compare_saves to the ref_pop_hist. A PopulationHistory is constructed for each of the compare snapshots. Within each compare snapshot, individuals that are part of (any of) the LOD(s) of the ref_pop_hist will be identified. Properties of these ancestors will then be compared with the corresponding statistical values for the whole population.
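As a rough illustration of comparing an ancestor's property with the statistics of its contemporaries, the sketch below reports the population mean, standard deviation and a z-score. The function name and the choice of summary statistics are assumptions; the source does not state which statistics VirtualMicrobes computes.

```python
from statistics import mean, pstdev


def compare_to_population(ancestor_value, population_values):
    """Place one ancestor's property value within its population (sketch)."""
    mu = mean(population_values)
    sigma = pstdev(population_values)
    # A z-score is undefined for a population without variation.
    z = (ancestor_value - mu) / sigma if sigma > 0 else 0.0
    return {"ancestor": ancestor_value, "mean": mu, "std": sigma, "z_score": z}
```

A z-score that is consistently positive (or negative) for ancestors across snapshots would be one way to spot a bias on the line of descent.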
-
init_compare_saves
(compare_saves)[source]¶ Parse and check compare saves parameter.
Compare saves can be either a list of file names or a list of generation times (or None). In the latter case, the file names are constructed from the time point and the file name of the reference simulation. Checks are made to ensure that the files exist and that no compare save point comes after the reference simulation save point, as this would not make sense in the comparison functions.
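The parsing and checking steps can be sketched as follows. This is an illustration only: the function name, the `<stem>_<time><ext>` naming scheme and the injectable exists callable are hypothetical, and the time check is applied only to numeric entries because a bare file name carries no time information in this sketch.

```python
import os


def resolve_compare_saves(compare_saves, ref_file, ref_time, exists=os.path.isfile):
    """Turn a list of file names or time points into checked file names."""
    if not compare_saves:
        return []
    resolved = []
    for entry in compare_saves:
        if isinstance(entry, (int, float)):
            if entry > ref_time:
                raise ValueError(
                    "compare save at time {} is later than the reference "
                    "save at time {}".format(entry, ref_time)
                )
            stem, ext = os.path.splitext(ref_file)
            # Hypothetical naming scheme, not taken from VirtualMicrobes.
            path = "{}_{}{}".format(stem, int(entry), ext)
        else:
            path = entry
        if not exists(path):
            raise IOError("compare save not found: {}".format(path))
        resolved.append(path)
    return resolved
```

Passing the existence check as a parameter keeps the sketch testable without touching the file system.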
-
init_ref_history
(ref_sim=None, nr_lods=None, prune_depth=0, pop_hist_dir='population_history')[source]¶ Create a PopulationHistory from the ref_sim VirtualMicrobes.simulation.Simulation.Simulation object.
For the PopulationHistory object, construct its phylogenetic tree and prune back the tree to a maximum depth of (max_depth - prune_depth) counted from the root. Then create LOD objects representing the line of descent of the nr_lods most diverged branches in the tree.
Parameters:
- ref_sim (VirtualMicrobes.simulation.Simulation.Simulation object) – simulation snapshot that is the basis for LOD analysis
- nr_lods (int) – number of separate (most distant) LODs to initialize
- prune_depth (int) – prune back the phylogenetic tree by this many timesteps
- pop_hist_dir (str) – name of directory to store LOD analysis output
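The pruning rule, a maximum depth of (max_depth - prune_depth) counted from the root, can be shown on a toy tree. The nested-dict tree format and the function names are assumptions for illustration, and depth is counted here in edges from the root, not in simulation time.

```python
def max_depth(node):
    """Depth of the deepest node below (and including) this node."""
    if not node["children"]:
        return 0
    return 1 + max(max_depth(child) for child in node["children"])


def prune(node, keep_depth, depth=0):
    """Return a copy of the tree with everything below keep_depth removed."""
    if depth >= keep_depth:
        children = []
    else:
        children = [prune(c, keep_depth, depth + 1) for c in node["children"]]
    return {"name": node["name"], "children": children}


def prune_back(tree, prune_depth):
    """Prune the tree to a depth of (max_depth - prune_depth) from the root."""
    return prune(tree, max(max_depth(tree) - prune_depth, 0))
```

Pruning the youngest part of the tree removes recent individuals whose lineages may not yet have been filtered by selection.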
-
lod_binding_conservation
(stride=None, time_interval=None, lod_range=None)[source]¶ Write time series for TF binding conservation for LODs.
-
lod_cells
(stride=None, time_interval=None, lod_range=None, runtime=None)[source]¶ Write time series of evolutionary changes along all LODs.
-
lod_graphs
(stride=None, time_interval=None, lod_range=None, formats=None)[source]¶ Draw network and genome graphs for LODs.
It is possible to set an interval and a range to sample individuals in the LOD.
Note: Either use a stride or a time interval to sample individuals from the LOD.
-
lod_network_stats
(stride=None, time_interval=None, lod_range=None)[source]¶ Write time series for evolutionary network property changes along all LODs.
-
lod_stats
(stride=None, time_interval=None, lod_range=None)[source]¶ Write time series of evolutionary changes along all LODs.
-
lod_time_course_plots
(stride=None, time_interval=None, lod_range=None, formats=None)[source]¶ Draw time course diagrams for individuals in the LODs.
It is possible to set an interval and a range to sample individuals in the LOD.
Note: Either use a stride or a time interval to sample individuals from the LOD.
-
lod_time_courses
(lod_range=None, chunk_size=None)[source]¶ Write time series of molecule concentrations within the LOD.
It is possible to set a range to sample individuals in the LOD.
Parameters:
- lod_range ((float,float)) – bounds in fractions of the total range of the LOD
- chunk_size (int) – number of generations in LOD to concatenate per chunk
-
ref_pop_hist
= None¶ PopulationHistory
for the reference simulation (ref_sim) snapshot
-
ref_sim
= None¶ VirtualMicrobes.simulation.Simulation
snapshot to analyse
-
-
class
VirtualMicrobes.post_analysis.lod.
PopulationHistory
(sim, params, save_dir=None, prune_depth=None)[source]¶ Bases:
object
Performs and stores evolutionary history analysis of VirtualMicrobes.simulation.Simulation.Simulation snapshots.
Generates LODs for one or more individuals in the population and reconstructs the evolutionary events along the line of descent.
A reference PopulationHistory can also be compared to the population history at earlier simulation time points. In this case the ancestors of individuals in the reference population history will be identified and compared to the rest of the population at that point in time. In this way, evolutionary biases on the line of descent can be brought to light.
-
anc_cells
(pop, time)[source]¶ Write cell files for all cells in the ancestry, which can be mapped onto the newick tree.
Parameters:
- pop – current population that contains the current_ancestry list
- time – run_time
-
draw_ref_trees
(rescale=False)[source]¶ Output reference trees for phylogenetic trees with lods labeled.
Uses phylogenetic tree drawing methods to annotate the leaf nodes of lods. Reference trees give a visual overview of the position of the lods that are analysed in the tree.
-
dump_anc_cells
(time)[source]¶ Dump all ancestors (perfect fossil record) to files, and also save the newick tree. Should be all in there?
-
dump_lod_cells
(time)[source]¶ Dump all cells used in LOD analysis to files (in other words, a single lineage, which is a subset of anc_cells)
-
environment
= None¶ Short cut to
VirtualMicrobes.environment.Environment
of sim.
-
identify_lod_ancestor
(ete_tree_struct, lod)[source]¶ Identify the individual in the population that is on the line of descent (lod) under consideration.
The nodes in the ete tree corresponding to the lod will be annotated with a tag.
Parameters:
- ete_tree_struct (VirtualMicrobes.my_tools.utility.ETEtreeStruct) – container structure for phylogenetic tree representations
- lod (LOD) – line of descent
Returns:
- (VirtualMicrobes.virtual_cell.Cell.Cell, ete3.TreeNode)
- (oldest ancestor cell, its tree node representation)
-
init_lods
(nr_lods, save_dir=None, stride=None, time_interval=None, lod_range=None)[source]¶ Initialize the line of descent (LOD) container objects.
Iterate over the phylogenetic trees of the population and for each tree select nr_lods leaf nodes that are at maximum phylogenetic distance.
For each of the selected leaves, construct a line of descent object (LOD).
Parameters:
- nr_lods (int) – number of LOD objects per phylogenetic tree
- save_dir (str) –
- stride (int) – stride in generations for sampling individuals along the LOD
- time_interval (int) – interval in simulation time for sampling individuals along the LOD
- lod_range ((float,float)) – bounds in fractions of the total range of the LOD
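One way to pick nr_lods leaves that are far apart in a tree is a greedy farthest-point selection, sketched below. This is a stand-in technique chosen for illustration; the source does not say how VirtualMicrobes selects the leaves. The function name and the dist callable (any symmetric pairwise phylogenetic distance) are assumptions.

```python
from itertools import combinations


def most_distant_leaves(leaves, dist, nr_lods):
    """Greedily choose up to nr_lods leaves that are mutually far apart."""
    if nr_lods <= 0 or not leaves:
        return []
    if nr_lods == 1 or len(leaves) == 1:
        return [leaves[0]]
    # Seed the selection with the most distant pair of leaves.
    first, second = max(combinations(leaves, 2), key=lambda pair: dist(*pair))
    chosen = [first, second]
    while len(chosen) < min(nr_lods, len(leaves)):
        rest = [leaf for leaf in leaves if leaf not in chosen]
        # Add the leaf whose nearest already-chosen leaf is farthest away.
        chosen.append(max(rest, key=lambda l: min(dist(l, c) for c in chosen)))
    return chosen
```

The greedy rule does not guarantee a globally optimal set, but it is simple and spreads the selected lines of descent over the most diverged branches.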
-
init_phylo_tree
(prune_depth=None)[source]¶ Update the phylogenetic tree of the population.
Clears the change in the population of the final regular simulation step. Prunes back the tree to a maximum depth.
Parameters: prune_depth (int) – number of generations to prune from the leaves of the phylogenetic tree
-
lod_binding_conservation
(stride, time_interval, lod_range)[source]¶ Write time series for line of descent properties such as network connectivity, protein expression etc.
Either use a stride or a time interval to sample individuals from the lod.
-
lod_cells
(stride, time_interval, lod_range, runtime)[source]¶ Write cell files for line of descent
The leaf of the tree is saved as CellLeaf<LOD_ID>, and all its ancestors are saved as CellNode<BIRTHTIME>_<LOD_ID>.cell
-
lod_network_stats
(stride, time_interval, lod_range)[source]¶ Write time series for line of descent properties such as network connectivity, protein expression etc.
Either use a stride or a time interval to sample individuals from the lod.
-
lod_stats
(stride, time_interval, lod_range)[source]¶ Write time series for line of descent properties such as network connectivity, protein expression etc.
Either use a stride or a time interval to sample individuals from the lod.
-
lods_time_course_data
(lod_range, chunk_size)[source]¶ Write time series data in the line of descent to files.
Concatenates time courses of individuals along a LOD. Concatenations are done in chunks of a chosen chunk_size. For each chunk, .csv files are stored in a directory named part<n>, where n is the chunk number.
Parameters:
- ancestors (list of VirtualMicrobes.virtual_cell.Cell.Cell objects) –
- base_save_dir (str) –
- viewer_path (str) – path to utility files for html data viewer
- chunk_size (int) – length of chunks of concatenated data
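The chunking scheme can be sketched as below. The function name is hypothetical, and numbering the chunk directories from 0 is an assumption; the source only states that the directory name contains the chunk number.

```python
import os


def chunk_directories(ancestors, chunk_size, base_save_dir):
    """Split ancestors into consecutive chunks, each with its own directory."""
    chunks = []
    for n, start in enumerate(range(0, len(ancestors), chunk_size)):
        chunk = ancestors[start:start + chunk_size]
        # Chunk numbering from 0 is an assumption of this sketch.
        chunks.append((os.path.join(base_save_dir, "part{}".format(n)), chunk))
    return chunks
```

The last chunk may be shorter than chunk_size when the number of ancestors is not a multiple of it.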
-
lods_time_course_plots
(stride, time_interval, lod_range, formats)[source]¶ Output time course graphs for the line of descent.
Either use a stride or a time interval to sample individuals from the lod.
-
params
= None¶ The (updated) simulation parameters.
-
plot_lod_graphs
(stride, time_interval, lod_range, formats)[source]¶ Output metabolic, GRN and genome graphs for the line of descent.
Either use a stride or a time interval to sample individuals from the lod.
-
population
= None¶ Short cut to
VirtualMicrobes.virtual_cell.Population.Population
of sim.
-
prune_depth
= 0¶ Number of generations from leaves to prune the phylogenetic tree of the pophist.
-
sim
= None¶ The
VirtualMicrobes.simulation.Simulation.Simulation
snapshot for which this pophist was made.
-
time_point
= None¶ Last simulation time of the sim.
-
VirtualMicrobes.post_analysis.network_funcs module¶
VirtualMicrobes.post_analysis.network_properties module¶
-
class
VirtualMicrobes.post_analysis.network_properties.
PhyloGeneticAnalysis
[source]¶ Bases:
object
Analyze biological networks
-
VirtualMicrobes.post_analysis.network_properties.
calculate_overlap
(tf_connections, connections_of_homologous_tfs, closest_bound_homologs_dict)[source]¶ Calculate the overlap in bound genes between tf homologs.
Parameters:
- tf_connections (list of VirtualMicrobes.virtual_cell.Gene.Gene objects) – Downstream connections of the reference TF
- connections_of_homologous_tfs (list of sets of VirtualMicrobes.virtual_cell.Gene.Gene objects) – List of sets of downstream genes for each homolog of the reference TF
- closest_bound_homologs_dict (dict of sets of VirtualMicrobes.virtual_cell.Gene.Gene objects) – Mapping of each original downstream gene of the reference TF to sets of homologs of these downstream genes.
Returns: Tuple of fractions: [0]: Fraction of downstream genes whose homologs are bound by a homolog of the reference TF. [1]: Fraction of new connections (averaged over TF homologs) per original connection of the reference TF.
Return type: float,float
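The two returned fractions can be made concrete with a small sketch that works on plain hashable gene labels. It is one reading of the description above and not the library's code: the function name, the handling of empty input, and the definition of a "new" connection as a bound gene that is not a homolog of any original target are assumptions.

```python
def overlap_fractions(tf_connections, connections_of_homologous_tfs,
                      closest_bound_homologs_dict):
    """Return (conserved fraction, new connections per original connection)."""
    if not tf_connections or not connections_of_homologous_tfs:
        return 0.0, 0.0
    # Every gene bound by at least one homolog of the reference TF.
    bound_anywhere = set().union(*connections_of_homologous_tfs)
    # Original targets that have at least one homolog among the bound genes.
    conserved = sum(
        1 for gene in tf_connections
        if closest_bound_homologs_dict.get(gene, set()) & bound_anywhere
    )
    # Homologs of any original target; anything else counts as new.
    target_homologs = set().union(*closest_bound_homologs_dict.values())
    new_counts = [len(conns - target_homologs)
                  for conns in connections_of_homologous_tfs]
    avg_new = sum(new_counts) / float(len(new_counts))
    n = float(len(tf_connections))
    return conserved / n, avg_new / n
```

For a reference TF with four targets, of which two have a bound homolog, and two TF homologs that gained one and two new targets respectively, the sketch returns (0.5, 0.375).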
-
VirtualMicrobes.post_analysis.network_properties.
find_homolog_distances
(gene, genome, closest_homolog=False)[source]¶ Find homologs and their distance for a gene in a target genome.
Parameters:
- gene (VirtualMicrobes.virtual_cell.GenomicElement.GenomicElement) – Reference gene for homology search.
- genome (VirtualMicrobes.virtual_cell.Genome.Genome) – Genome in which to search for homologs.
- closest_homolog (bool) – Flag to filter found homologs to those that have the shortest phylogenetic distance to the gene.
-
VirtualMicrobes.post_analysis.network_properties.
find_homologs
(gene, genome)[source]¶ For a gene, find all its homologs in a given genome.
This is a naive approach that uses a combination of the gene’s type and its VirtualMicrobes.virtual_cell.Identifier.Identifier attribute to detect common descent.
Parameters:
- gene (VirtualMicrobes.virtual_cell.GenomicElement.GenomicElement) – Reference gene for homology search.
- genome (VirtualMicrobes.virtual_cell.Genome.Genome) – Genome in which to search for homologs.
Returns: The set of homologs of gene in the genome.
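A toy version of this type-plus-identifier test is sketched below. ToyGene and its fields kind and family_id are hypothetical stand-ins; they do not reproduce the GenomicElement or Identifier classes, and the genome is reduced to a plain iterable of genes.

```python
from collections import namedtuple

# Hypothetical stand-in for a gene: its type, a shared ancestry id, a label.
ToyGene = namedtuple("ToyGene", ["kind", "family_id", "name"])


def find_homologs_sketch(gene, genome):
    """Return genes in genome with the same type and ancestry id as gene."""
    return {
        candidate for candidate in genome
        if candidate.kind == gene.kind and candidate.family_id == gene.family_id
    }
```

Requiring the same type as well as a shared identifier excludes genes that happen to share an id across different gene classes in this toy model.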
-
VirtualMicrobes.post_analysis.network_properties.
tf_binding_overlap
(cell1, cell2, closest_homolog=False, no_phylogeny=False, verbose=False)[source]¶ Measure the overlap in target genes for tf homologs in phylogenetically related individuals.
Parameters:
- cell1 (VirtualMicrobes.virtual_cell.Cell.Cell) – Reference individual for which to find homologs
- cell2 (VirtualMicrobes.virtual_cell.Cell.Cell) – Homologs of TFs and downstream targets will be detected in this individual.
- closest_homolog (bool) – Flag to filter found homologs to those that have the shortest phylogenetic distance to the gene.
- verbose (bool) – Print messages about homologs found.
Returns: Mapping from VirtualMicrobes.virtual_cell.Gene.TranscriptionFactor to (maximum) binding overlap score.
Return type: dict