VirtualMicrobes.data_tools package

Submodules

VirtualMicrobes.data_tools.store module

class VirtualMicrobes.data_tools.store.DataCollection(save_dir='', name='dummy', filename=None)[source]

Bases: object

change_save_location(new_save_dir=None, copy_orig=True, current_save_path=None)[source]
get_data_point(index)[source]

Get a data point at index.

Exceptions are caught to handle the case where, due to a code update, a loaded form of a DataStore does not yet hold a particular DataCollection. In that case a dummy dict producing empty numpy arrays is returned.

Parameters: index – key to a dict-shaped data point
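A minimal sketch of the fallback behaviour described here (the data_points attribute is assumed for illustration; the real internals may differ):

    import numpy as np
    from collections import defaultdict

    def get_data_point_sketch(collection, index):
        # Try the normal lookup; if an older, freshly unpickled collection
        # lacks the data, return a dummy dict yielding empty numpy arrays.
        try:
            return collection.data_points[index]
        except (AttributeError, KeyError):
            return defaultdict(lambda: np.array([]))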
init_file(name=None, labels=[], suffix='.csv')[source]
prune_data_file_to_time(min_tp, max_tp)[source]

Prune a data file by dropping lines whose time points fall outside the range [min_tp, max_tp].

This function assumes that the first line contains column names and that subsequent lines are time point data, each starting with a comma-separated (integer) time point.

Parameters:
  • min_tp – time point before which lines are dropped
  • max_tp – time point after which lines are dropped
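A self-contained sketch of this pruning logic (an illustration, not the method itself): keep the header and every data line whose leading integer time point lies within [min_tp, max_tp]:

    def prune_csv_to_time(path, min_tp, max_tp):
        # Keep the column-name header plus every data line whose first,
        # comma-separated field is an integer time point in [min_tp, max_tp].
        with open(path) as f:
            header, *lines = f.readlines()
        kept = [line for line in lines
                if min_tp <= int(line.split(',', 1)[0]) <= max_tp]
        with open(path, 'w') as f:
            f.writelines([header] + kept)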
update_data_points(index, dat)[source]
write_data()[source]
class VirtualMicrobes.data_tools.store.DataStore(base_save_dir, name, utility_path, n_most_frequent_metabolic_types, n_most_frequent_genomic_counts, species_markers, reactions_dict, small_mols, clean=True, create=True)[source]

Bases: object

Storage object for simulation data

Keep a store of simulation data that can be written to disk and retrieved for online plotting. Typically, the store holds only the most recent data points before appending them to the relevant on-disk storage files.
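The buffering scheme described above can be pictured with a toy stand-in (a minimal sketch of the "hold recent points in memory, append to disk on write" idea; it is not the DataStore implementation, and the file layout is assumed):

    import csv, os

    class ToyStore:
        """Hold the most recent data points in memory; append them on write."""
        def __init__(self, path, columns):
            self.path, self.columns, self.buffer = path, columns, []
            if not os.path.exists(path):                  # write header once
                with open(path, 'w', newline='') as f:
                    csv.writer(f).writerow(['time_point'] + columns)

        def add_dp(self, time_point, dat):                # buffer a data point
            self.buffer.append([time_point] + [dat[c] for c in self.columns])

        def write_data(self):                             # append buffer to disk
            with open(self.path, 'a', newline='') as f:
                csv.writer(f).writerows(self.buffer)
            self.buffer = []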

add_ancestry_data_point(comp_dat, ref_lods, time_point, leaf_samples=100)[source]

Compare lines of descent in a reference tree to a population snapshot at a previous time point.

The comp_dat is a po

add_best_data_point(best, attribute_mapper, time_point, affix='')[source]
add_cell_tc(cell, path, attribute_mapper, time_point, affix='')[source]
add_collection(dc)[source]
add_count_stats_dp(dc_name, dat, time_point)[source]
add_dp(dc_name, dat, time_point)[source]
add_eco_data_point(system, time_point)[source]
add_expression_data_point(system, time_point)[source]
add_frequency_stats_dp(dc_name, dat, column_names, time_point)[source]
add_gain_loss_dp(dc_name, cur_dat, prev_dat, time_point)[source]
add_list_dp(dc_name, dat, time_point)[source]
add_lod_binding_conservation(lod, stride, time_interval, lod_range)[source]
add_lod_data(lod, pop, env, stride, time_interval, lod_range)[source]
add_lod_network_data(lod, stride, time_interval, lod_range, save_dir=None)[source]
add_metabolic_stats_dp(dc_name, dat, time_point, cutoff=0.05)[source]
add_pop_data_point(system, time_point)[source]
add_raw_values_dp(dc_name, dat, time_point)[source]
add_simple_stats_dp(dc_name, dat, time_point)[source]
best_stats_dir = 'best_dat'
change_save_location(base_save_dir=None, name=None, clean=False, copy_orig=True, create=True, current_save_path=None)[source]
class_version = '1.7'
copy_utility_files()[source]
crossfeed_stats = ['crossfeeding', 'strict crossfeeding', 'exploitive crossfeeding']
eco_diversity_stats_dict = {'consumer type': <function <lambda>>, 'export type': <function <lambda>>, 'genotype': <function <lambda>>, 'import type': <function <lambda>>, 'metabolic type': <function <lambda>>, 'producer type': <function <lambda>>, 'reaction genotype': <function <lambda>>}
eco_stats_dir = 'ecology_dat'
eco_type_stats = ['metabolic_type_vector', 'genotype_vector']
fit_stats_names = ['toxicity', 'toxicity_change_rate', 'raw_production', 'raw_production_change_rate']
frequency_stats(dat, column_names)[source]
functional_stats_names = ['conversions_type', 'genotype', 'reaction_genotype', 'metabolic_type', 'import_type', 'export_type', 'tf_sensed', 'consumes', 'produces']
gain_loss(cur_dat, prev_dat)[source]
gain_loss_columns = ['gain', 'loss']
genome_dist_stats = ['copy_numbers', 'copy_numbers_tfs', 'copy_numbers_enzymes', 'copy_numbers_inf_pumps', 'copy_numbers_eff_pumps']
genome_simple_stats_names = ['tf_promoter_strengths', 'enz_promoter_strengths', 'pump_promoter_strengths', 'tf_ligand_differential_ks', 'enz_subs_differential_ks', 'pump_subs_differential_ks', 'pump_ene_differential_ks', 'tf_differential_reg', 'tf_k_bind_ops', 'enz_vmaxs', 'pump_vmaxs', 'tf_ligand_ks', 'enz_subs_ks', 'pump_ene_ks', 'pump_subs_ks']
genome_simple_val_stats_names = ['genome_size', 'chromosome_count', 'tf_count', 'enzyme_count', 'eff_pump_count', 'inf_pump_count', 'tf_avrg_promoter_strengths', 'enz_avrg_promoter_strengths', 'pump_avrg_promoter_strengths', 'tf_sum_promoter_strengths', 'enz_sum_promoter_strengths', 'pump_sum_promoter_strengths']
grid_stats = ['neighbor crossfeeding', 'strict neighbor crossfeeding', 'exploitive neighbor crossfeeding', 'grid production values', 'grid production rates', 'grid death rates', 'grid cell sizes', 'lineages', 'grid ages', 'grid genomesizes', 'grid divided', 'metabolic_type_grid']
grn_edits = {'': <function <lambda>>, '_pruned_1._cT_iT': <function <lambda>>}
init_ancestry_compare_stores(pop_hist)[source]
init_dict_stats_store(save_dir, stats_name, column_names, index_name='time_point', **kwargs)[source]
init_eco_data_stores(save_dir=None)[source]
init_expression_data_stores(save_dir=None)[source]
init_gain_loss_store(save_dir, stats_name, index_name='time_point', filename=None)[source]
init_list_stats_store(save_dir, stats_name)[source]
init_lod_stores(lod, met_classes, conversions, transports, first_anc, last_anc)[source]
init_phylo_hist_stores(phylo_hist)[source]
init_pop_data_stores(save_dir=None)[source]
init_save_dirs(clean=False, create=True)[source]

Create the paths used to store the various data types.

Parameters:
  • clean (bool) – remove existing files in the path
  • create (bool) – create the path within the file system
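A hedged sketch of what the clean and create options amount to for a single directory (illustrative only):

    import os, shutil

    def init_save_dir(path, clean=False, create=True):
        # clean: remove existing files by deleting the directory first
        if clean and os.path.isdir(path):
            shutil.rmtree(path)
        # create: make the directory (and any parents) in the file system
        if create:
            os.makedirs(path, exist_ok=True)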
init_simple_stats_plus_store(save_dir, stats_name, index_name='time_point')[source]
init_simple_stats_store(save_dir, stats_name, index_name='time_point', filename=None)[source]
init_simple_value_store(save_dir, stats_name, index_name='time_point', filename=None)[source]
init_stats_column_names(n_most_freq_met_types, n_most_freq_genomic_counts, species_markers, reactions_dict, small_mols)[source]

Initialize column names for the various data collections.

Parameters:
  • n_most_freq_met_types
  • n_most_freq_genomic_counts

lod_stats_dir = 'lod_dat'
meta_stats_names = ['providing_count', 'strict_providing_count', 'exploiting_count', 'strict_exploiting_count', 'producing_count', 'consuming_count', 'importing_count', 'exporting_count']
metabolic_categories = ['producer', 'consumer', 'import', 'export']
mut_stats_names = ['point_mut_count', 'chromosomal_mut_count', 'stretch_mut_count', 'chromosome_dup_count', 'chromosome_del_count', 'chromosome_fuse_count', 'chromosome_fiss_count', 'sequence_mut_count', 'tandem_dup_count', 'stretch_del_count', 'stretch_invert_count', 'translocate_count', 'internal_hgt_count', 'external_hgt_count']
network_stats_funcs = {'all_node_connectivities': <function <lambda>>, 'degree': <function <lambda>>, 'in_degree': <function <lambda>>, 'out_degree': <function <lambda>>}
phy_dir = 'phylogeny_dat'
pop_genomic_stats_dict = {'chromosome counts': <function <lambda>>, 'genome sizes': <function <lambda>>}
pop_simple_stats_dict = {'ages': <function <lambda>>, 'cell sizes': <function <lambda>>, 'death rates': <function <lambda>>, 'differential regulation': <function <lambda>>, 'enz_subs_ks': <function <lambda>>, 'enzyme counts': <function <lambda>>, 'enzyme promoter strengths': <function <lambda>>, 'enzyme vmaxs': <function <lambda>>, 'exporter counts': <function <lambda>>, 'hgt neigh': <function <lambda>>, 'importer counts': <function <lambda>>, 'iterages': <function <lambda>>, 'offspring counts': <function <lambda>>, 'pos production': <function <lambda>>, 'production rates': <function <lambda>>, 'production values': <function <lambda>>, 'pump promoter strengths': <function <lambda>>, 'pump vmaxs': <function <lambda>>, 'pump_ene_ks': <function <lambda>>, 'pump_subs_ks': <function <lambda>>, 'regulator_score': <function <lambda>>, 'tf counts': <function <lambda>>, 'tf promoter strengths': <function <lambda>>, 'tf_k_bind_ops': <function <lambda>>, 'tf_ligand_ks': <function <lambda>>, 'times divided': <function <lambda>>, 'toxicity rates': <function <lambda>>, 'uptake rates': <function <lambda>>}
pop_stats_dir = 'population_dat'
prune_data_files_to_time(min_tp, max_tp)[source]
save_anc_cells(pop, time)[source]
save_dir
save_lod_cells(lod, stride, time_interval, lod_range, runtime)[source]
save_phylo_tree(pop, time)[source]
simple_stats(dat)[source]
simple_stats_columns = ['avrg', 'min', 'max', 'median', 'std', 'total']
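Judging by these column names, simple_stats summarises a distribution of values; a plausible numpy sketch (an assumption, not the actual implementation) is:

    import numpy as np

    def simple_stats_sketch(dat):
        # Summarise a 1D collection of values into the columns listed above.
        a = np.asarray(list(dat), dtype=float)
        return {'avrg': a.mean(), 'min': a.min(), 'max': a.max(),
                'median': float(np.median(a)), 'std': a.std(), 'total': a.sum()}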
simple_value(dat)[source]
simple_value_column = ['value']
snapshot_stats_names = ['historic_production_max']
trophic_type_columns = ['fac-mixotroph', 'autotroph', 'heterotroph', 'obl-mixotroph']
type_differences_stats(types, column_names)[source]
type_totals_stats(types, column_name)[source]
upgrade(odict)[source]

Upgrade an older pickled version of the class to the latest version. Version information is saved as a class variable and should be updated when class invariants (e.g. fields) are added (see also __setstate__).

Adapted from recipe at http://code.activestate.com/recipes/521901-upgradable-pickles/
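The linked recipe boils down to the following pattern (a sketch of the recipe, not the DataStore.upgrade implementation, which additionally receives the unpickled odict): on unpickling, __setstate__ compares the stored version against class_version and calls upgrade() to fill in fields added since the object was saved:

    class Versioned(object):
        class_version = '1.7'

        def __init__(self):
            self.version = self.class_version
            self.new_field = []                # field added in a later version

        def upgrade(self):
            # Fill in attributes introduced after the pickled object was saved,
            # then stamp the object with the current class version.
            if not hasattr(self, 'new_field'):
                self.new_field = []
            self.version = self.class_version

        def __setstate__(self, odict):
            self.__dict__.update(odict)
            if getattr(self, 'version', '0.0') != self.class_version:
                self.upgrade()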

write_data()[source]
write_genome_json(save_dir, name, genome, attribute_mapper, labels, suffix='.json')[source]
class VirtualMicrobes.data_tools.store.DictDataCollection(column_names, index_name, to_string=None, **kwargs)[source]

Bases: VirtualMicrobes.data_tools.store.DataCollection

Entry point for storing and retrieving structured data

update_data_points(index, data_dict)[source]
write_column_names(columns=None, index_name=None, to_string=None)[source]
write_data(sep=', ')[source]
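For orientation, the comma-separated output such a collection appends can be mimicked in a few lines of plain Python (hypothetical file name, columns and values; this is not the class's own code):

    columns = ['avrg', 'min', 'max']                          # hypothetical columns
    with open('stats.csv', 'w') as f:
        f.write(', '.join(['time_point'] + columns) + '\n')   # column names
        for tp, row in [(0, {'avrg': 1.0, 'min': 0.5, 'max': 2.0})]:
            f.write(', '.join([str(tp)] + [str(row[c]) for c in columns]) + '\n')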
class VirtualMicrobes.data_tools.store.ListDataCollection(**kwargs)[source]

Bases: VirtualMicrobes.data_tools.store.DataCollection

update_data_points(index, data_vector)[source]
write_data(sep=', ')[source]
VirtualMicrobes.data_tools.store.create_gene_type_time_course_dfs(cell)[source]
VirtualMicrobes.data_tools.store.create_tc_df(tc_dict)[source]
VirtualMicrobes.data_tools.store.eco_type_vector_to_dict(vect_dict)[source]
VirtualMicrobes.data_tools.store.tf_conservation_to_dict(tf_conservation_dict)[source]
VirtualMicrobes.data_tools.store.tf_name(tf)[source]

Module contents