bulum.stats.stochastic_data_check module

class StochasticDataComparison(dfs_dict: dict, wy_month=7, allow_part_years=False, show_bands=False)

Bases: object

Set of outputs for comparison of baseline dataset with additional dataset(s).

Internal attributes provide raw outputs and charts for comparing timeseries cross-correlations, distributions and general summary statistics. Can be applied to stochastically generated data, e.g. some subset (mean, percentile) of climate data replicates or climate-factor-adjusted datasets.

Legend in most charts is selectable to reduce the number of displayed datasets.

Parameters:

dfs_dict (dict of pd.DataFrames) – Dict mapping each dataset name (key) to its input DataFrame (value). Outputs are calculated for each column in the first DataFrame (the first entry in the dict). Subsequent DataFrames are assumed to contain at least the columns of the first DataFrame, with identical column headers.
wy_month (int, default 7) – Water Year start month for annual aggregation.
allow_part_years (bool, default False) – If True, include part water years; if False, use only complete water years.
show_bands (bool, default False) – Whether to show value ranges as a grey band in statistic charts.
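
The interaction of wy_month and allow_part_years can be illustrated with a minimal plain-Python sketch. It is not the class's implementation: the helper names `water_year` and `annual_totals` are hypothetical, and labelling a water year by the calendar year in which it starts is an assumption (the library may label it differently).

```python
from collections import defaultdict
from datetime import date, timedelta


def water_year(d: date, wy_month: int = 7) -> int:
    """Label a date by the calendar year in which its water year starts.

    Assumed convention for illustration only.
    """
    return d.year if d.month >= wy_month else d.year - 1


def annual_totals(series, wy_month=7, allow_part_years=False):
    """Sum daily (date, value) pairs by water year.

    Assumes one value per day with no duplicate dates. When
    allow_part_years is False, water years not fully covered by the
    data are dropped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for d, value in series:
        wy = water_year(d, wy_month)
        totals[wy] += value
        counts[wy] += 1
    if allow_part_years:
        return dict(totals)
    complete = {}
    for wy, total in totals.items():
        # Number of days a complete water year must contain.
        expected = (date(wy + 1, wy_month, 1) - date(wy, wy_month, 1)).days
        if counts[wy] == expected:
            complete[wy] = total
    return complete


# Daily data of 1.0 per day from 1 Jan 2020 to 31 Dec 2021 (731 days).
start = date(2020, 1, 1)
daily = [(start + timedelta(days=i), 1.0) for i in range(731)]

full_only = annual_totals(daily)                         # complete water years only
with_parts = annual_totals(daily, allow_part_years=True)  # includes part years
```

With a July start, only the water year beginning 1 July 2020 is fully covered by this sample, so `full_only` holds a single entry while `with_parts` also keeps the two partial years at either end.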

Correlations

outputs: Multi-index df with Lag-0 and Lag-1 cross correlations, grouped by Period (annual vs. months vs. daily), Lag-type, Timeseries, Dataset.
chts: Multi-level dictionary with charts, stored by Period (annual vs. months vs. daily), Lag-type, Timeseries.
heatmaps: Multi-level dictionary with charts, stored by Period (annual vs. months vs. daily), Lag-type, Dataset.

Type: {outputs, chts, heatmaps}
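
For readers unfamiliar with the terms, the following plain-Python sketch shows one common way to compute Lag-0 and Lag-1 cross correlations between two series. The function names are hypothetical, and the lag convention (pairing x[t] with y[t - lag]) is an assumption; the class may define or compute the lag differently.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def cross_correlation(x, y, lag=0):
    """Correlate x[t] with y[t - lag] (assumed lag convention, lag >= 0)."""
    if lag == 0:
        return pearson(x, y)
    # Drop the first `lag` values of x and the last `lag` values of y
    # so that x[t] lines up with y[t - lag].
    return pearson(x[lag:], y[:-lag])


a = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
# b leads a by one step: b[t - 1] == a[t] for t = 1..5.
b = [3.0, 2.0, 5.0, 4.0, 6.0, 1.0]

lag0 = cross_correlation(a, b, lag=0)  # weak relationship at the same time step
lag1 = cross_correlation(a, b, lag=1)  # near-perfect once b is shifted by one step
```

Calling `cross_correlation(a, a, lag=1)` gives the Lag-1 autocorrelation of a single series under the same convention.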

Distributions

outputs: Multi-index df with distribution of each timeseries, grouped by Period (annual vs. months), Timeseries, Dataset.
chts: Multi-level dictionary with charts, stored by Period (annual vs. months), Timeseries.

Type: {outputs, chts}

Stats

outputs: Multi-index df with stats, grouped by Period (annual vs. months), Statistic, Dataset, Timeseries.
chts: Multi-level dictionary with charts, stored by Period (annual vs. months vs. monthly), Statistic, Timeseries (if monthly).

Type: {outputs, chts}

Examples

Constructing StochasticDataComparison.

>>> Comparison = StochasticDataComparison(dfs_dict={'Dataset1': df_1, 'Dataset2': df_2})
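
Because outputs are calculated for the columns of the first DataFrame, it may be worth checking beforehand that every later dataset contains those columns. The helper below is a hypothetical pre-check, not part of bulum; it works on plain column-name lists so that it runs without any dependencies.

```python
def missing_columns(columns_by_dataset):
    """Return {dataset_name: [baseline columns it lacks]}.

    The first entry is treated as the baseline, mirroring the
    description of dfs_dict. Datasets with no missing columns are
    omitted from the result.
    """
    names = list(columns_by_dataset)
    baseline = list(columns_by_dataset[names[0]])
    problems = {}
    for name in names[1:]:
        present = set(columns_by_dataset[name])
        missing = [c for c in baseline if c not in present]
        if missing:
            problems[name] = missing
    return problems


columns = {
    "Dataset1": ["col1", "col2"],          # baseline
    "Dataset2": ["col1", "col2", "col3"],  # extra columns are fine
    "Dataset3": ["col1"],                  # lacks "col2"
}
report = missing_columns(columns)
```

With real inputs, the mapping could be built as `{name: df.columns for name, df in dfs_dict.items()}`; an empty result means every dataset covers the baseline columns.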

Output annual distribution for all timeseries and datasets.

>>> Comparison.Distributions["outputs"]["annual"]

Output July (month 7) distribution comparison for given timeseries (“col1”).

>>> Comparison.Distributions["outputs"]["07"]["col1"]

Output July (month 7) distribution chart for given timeseries (“col1”).

>>> Comparison.Distributions["chts"]["07"]["col1"]

Output annual Lag-0 and Lag-1 cross correlations for all timeseries and datasets.

>>> Comparison.Correlations["outputs"]["annual"]

Output annual Lag-0 cross correlation chart comparison for given timeseries (“col1”).

>>> Comparison.Correlations["chts"]["annual"]["lag0"]["col1"]

Output annual Lag-0 cross correlation heatmap for given dataset (“Dataset1”).

>>> Comparison.Correlations["heatmaps"]["annual"]["lag0"]["Dataset1"]

Output annual statistic summary for all timeseries and datasets.

>>> Comparison.Stats["outputs"]["annual"]

Output July mean total comparison for all timeseries and datasets.

>>> Comparison.Stats["outputs"]["07"]["mean"]

Output July mean total chart for all timeseries and datasets.

>>> Comparison.Stats["chts"]["07"]["mean"]

Output July (month 7) distribution chart for all timeseries.

>>> alt.vconcat(*Comparison.Distributions["chts"]["07"].values())

Output July (month 7) distribution chart for all timeseries and adjust properties.

>>> alt.vconcat(*[x.properties(width=800).interactive() for x in Comparison.Distributions["chts"]["07"].values()])

Output annual distribution chart for given timeseries (“col1”), convert to log-scale.

>>> Comparison.Distributions["chts"]["annual"]["col1"].layer[0].encoding.y.scale = {'type': 'log'}