bulum.utils.dataframe_functions module

assert_df_format_standards(df: DataFrame) → None: See check_df_format_standards() for the standards that are checked.

assert_df_has_one_column(df: DataFrame) → None

Assert that a dataframe has exactly one column.

Parameters: df (DataFrame) – The dataframe to check.
Raises: ValueError – If the dataframe does not have exactly one column.
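
The contract above can be illustrated with a minimal sketch. The helper name assert_one_column_sketch is hypothetical and this is not bulum's implementation; it only shows the documented behaviour (silent on success, ValueError otherwise).

```python
import pandas as pd


def assert_one_column_sketch(df: pd.DataFrame) -> None:
    """Hypothetical stand-in for the documented behaviour (not bulum's code)."""
    n_cols = len(df.columns)
    if n_cols != 1:
        raise ValueError(f"Expected exactly one column, found {n_cols}.")


single = pd.DataFrame({"flow": [1.0, 2.0]})
double = pd.DataFrame({"flow": [1.0], "rain": [0.0]})

assert_one_column_sketch(single)  # one column: returns None without raising

try:
    assert_one_column_sketch(double)
    raised = False
except ValueError:
    raised = True  # two columns: ValueError is raised
```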

check_data_equivalence(df1: DataFrame, df2: DataFrame, check_col_order: bool = True, threshold: float = 1e-06, details: dict | None = None) → bool

Checks whether two numeric dataframes are equivalent. It compares the names and order of the columns, the values of the index, and summary statistics of every data column.

Parameters:

df1 (DataFrame)
df2 (DataFrame)
check_col_order (bool, default True) – Specifies whether column order is important or not.
threshold (float, default 1e-6) – Numerical threshold for checking if stats are the same.
details (dict, optional) – Dictionary for returning detailed results to the caller. Defaults to None. When provided, result messages are appended to it.
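
The comparison described above might look roughly like the following sketch. It is not bulum's implementation: the function name data_equivalent_sketch, the choice of summary statistics (mean, min, max, sum), and the "messages" key used in details are all assumptions made for illustration.

```python
import pandas as pd


def data_equivalent_sketch(df1, df2, check_col_order=True,
                           threshold=1e-6, details=None):
    """Hypothetical illustration of a stats-based equivalence check."""
    messages = []
    cols1, cols2 = list(df1.columns), list(df2.columns)

    # Column names, and optionally their order, must agree.
    if check_col_order:
        if cols1 != cols2:
            messages.append("Column names or order differ.")
    elif sorted(cols1) != sorted(cols2):
        messages.append("Column names differ.")

    # Index values must agree exactly.
    if not df1.index.equals(df2.index):
        messages.append("Index values differ.")

    # Summary statistics are compared only when the structure matches.
    if not messages:
        for col in cols1:
            for stat in ("mean", "min", "max", "sum"):  # assumed statistics
                a = getattr(df1[col], stat)()
                b = getattr(df2[col], stat)()
                if abs(a - b) > threshold:
                    messages.append(f"{col}: {stat} differs by {abs(a - b)}.")

    if details is not None:
        details["messages"] = messages  # assumed key name
    return not messages


dates = ["2000-01-01", "2000-01-02"]
a = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]}, index=dates)
b = a[["y", "x"]]  # same data, different column order

report = {}
same_strict = data_equivalent_sketch(a, b)
same_relaxed = data_equivalent_sketch(a, b, check_col_order=False)
same_shifted = data_equivalent_sketch(a, a + 1.0, details=report)
```

Comparing summary statistics is cheaper than an element-wise comparison, but two different columns can share the same statistics, so a check of this kind is a screening test rather than a proof of equality.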

check_df_format_standards(df: DataFrame) → list[str]

Checks whether a given dataframe meets the standards generally required by bulum functions. These standards include:

- Dataframe is not None
- Dataframe is not empty
- Dataframe index name is “Date”
- Dataframe index values are daily sequential strings with the format “%Y-%m-%d”
- Data columns all have the datatype double
- Missing values are nan (not na, not -nan)
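
As a rough illustration of most of the standards listed above, the sketch below collects one message per violation. The name check_format_sketch and the message wording are hypothetical, and treating the returned list as a list of violations is an assumption; the missing-value rule is not covered.

```python
from datetime import datetime, timedelta

import pandas as pd


def check_format_sketch(df) -> list[str]:
    """Hypothetical illustration; returns one message per violated standard."""
    if df is None:
        return ["Dataframe is None."]

    violations = []
    if df.empty:
        violations.append("Dataframe is empty.")
    if df.index.name != "Date":
        violations.append('Index name is not "Date".')

    # Index values should be "%Y-%m-%d" strings on consecutive days.
    # Note: strptime tolerates unpadded fields such as "2000-1-1";
    # a stricter check is omitted to keep the sketch short.
    try:
        dates = [datetime.strptime(s, "%Y-%m-%d") for s in df.index]
    except (TypeError, ValueError):
        violations.append('Index values are not "%Y-%m-%d" strings.')
    else:
        gaps = [b - a != timedelta(days=1) for a, b in zip(dates, dates[1:])]
        if any(gaps):
            violations.append("Index dates are not daily sequential.")

    # Data columns should be double precision floats.
    for col in df.columns:
        if df[col].dtype != "float64":
            violations.append(f"Column {col!r} is not float64.")
    return violations


good = pd.DataFrame(
    {"flow": [1.0, 2.0, 3.0]},
    index=pd.Index(["2000-01-01", "2000-01-02", "2000-01-03"], name="Date"),
)
# Unnamed index, a missing day, and an integer column.
bad = pd.DataFrame({"flow": [1, 2]}, index=["2000-01-01", "2000-01-03"])

good_result = check_format_sketch(good)
bad_result = check_format_sketch(bad)
```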

convert_index_to_datetime(df: DataFrame, **kwargs) → DataFrame

Converts the index to pandas datetime. Accepts a dataframe whose index contains datetimes or strings.

Parameters:

df (DataFrame)
**kwargs – Passed to strings_to_datetimes()

Raises:

ValueError – Empty or null dataframe passed.

convert_index_to_string(df: DataFrame, str_format: str = '%Y-%m-%d') → DataFrame: Converts the index of df to strings. Accepts a dataframe whose index contains datetimes or strings.
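
The two index conversions above amount to a round trip between string dates and pandas datetimes. The sketch below shows that round trip with plain pandas calls rather than the bulum functions themselves, so it only approximates what they do.

```python
import pandas as pd

df = pd.DataFrame(
    {"flow": [1.0, 2.0]},
    index=pd.Index(["2000-01-01", "2000-01-02"], name="Date"),
)

# String index -> DatetimeIndex (roughly what convert_index_to_datetime yields).
as_dt = df.copy()
as_dt.index = pd.to_datetime(as_dt.index, format="%Y-%m-%d")

# DatetimeIndex -> string index (roughly what convert_index_to_string yields).
as_str = as_dt.copy()
as_str.index = as_dt.index.strftime("%Y-%m-%d")
```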

crop_to_wy(df: DataFrame, wy_month: int = 7) → DataFrame

crop_to_wy(df: Series, wy_month: int = 7) → Series

Crop dataframe or series to complete water years only.

This function removes partial water years from the beginning and end of the dataframe or series, keeping only complete water years based on the specified water year start month.

Parameters:

df (DataFrame or Series) – Input dataframe or series with date index.
wy_month (int, optional) – Water year start month (1=January, 7=July, etc.). Defaults to 7.

Returns:

Cropped dataframe or series containing only complete water years. Returns the same type as the input.

Return type:

DataFrame or Series
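
To make the cropping rule concrete, here is a minimal sketch assuming a gap-free daily index. The name crop_to_wy_sketch is hypothetical and the logic is an illustration, not bulum's implementation: it keeps data from the first day that starts a water year through the last day that ends one.

```python
import numpy as np
import pandas as pd


def crop_to_wy_sketch(data, wy_month=7):
    """Hypothetical illustration; assumes a gap-free daily date index."""
    idx = pd.DatetimeIndex(data.index)

    # A water year starts on day 1 of wy_month ...
    is_start = np.asarray((idx.month == wy_month) & (idx.day == 1))
    # ... and ends on the day before the next such date.
    next_day = idx + pd.Timedelta(days=1)
    is_end = np.asarray((next_day.month == wy_month) & (next_day.day == 1))

    if not is_start.any() or not is_end.any():
        return data.iloc[0:0]  # no complete water year available

    first = is_start.argmax()                    # first water-year start
    last = len(idx) - 1 - is_end[::-1].argmax()  # last water-year end
    if last < first:
        return data.iloc[0:0]
    return data.iloc[first:last + 1]


series = pd.Series(
    1.0, index=pd.date_range("2000-03-15", "2002-09-10", freq="D")
)
cropped = crop_to_wy_sketch(series, wy_month=7)
```

With these inputs the partial years before 1 July 2000 and after 30 June 2002 are dropped, leaving two complete water years. Slicing with iloc returns the same type as the input, matching the documented return type.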

datetimes_to_strings(v: Iterable[datetime | Timestamp], str_format: str = '%Y-%m-%d') → list[str]: Converts a list of datetimes to strings using the given format.

find_col(df: DataFrame, string_pattern: str, unique_match: bool = True) → Series | DataFrame

Find columns in dataframe that match a string pattern.

Parameters:

df (DataFrame) – The dataframe to search.
string_pattern (str) – The string pattern to match against column names.
unique_match (bool, optional) – If True, ensures exactly one column matches the pattern. If False, returns all matching columns. Defaults to True.

Returns:

If unique_match=True, returns a Series (single column). If unique_match=False, returns a DataFrame with all matching columns.

Return type:

Series or DataFrame

Raises:

ValueError – If unique_match=True and no columns or multiple columns match the pattern.
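
The unique_match behaviour can be sketched as follows. The documentation does not say how string_pattern is interpreted, so plain substring matching is an assumption here, and find_col_sketch is a hypothetical name rather than bulum's code.

```python
import pandas as pd


def find_col_sketch(df, string_pattern, unique_match=True):
    """Hypothetical illustration; assumes plain substring matching."""
    matches = [c for c in df.columns if string_pattern in str(c)]
    if not unique_match:
        return df[matches]  # DataFrame with every matching column
    if len(matches) != 1:
        raise ValueError(
            f"Expected one column matching {string_pattern!r}, "
            f"found {len(matches)}."
        )
    return df[matches[0]]  # Series for the single match


df = pd.DataFrame(
    {"gauge_1_flow": [1.0], "gauge_2_flow": [2.0], "rain": [0.5]}
)

rain = find_col_sketch(df, "rain")                       # Series
flows = find_col_sketch(df, "flow", unique_match=False)  # DataFrame

try:
    find_col_sketch(df, "flow")  # two matches with unique_match=True
    ambiguous_raised = False
except ValueError:
    ambiguous_raised = True
```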

set_index_dt(df: DataFrame, dt_values: list | None = None, start_dt: datetime | None = None, **kwargs) → DataFrame

Returns a dataframe with datetime index. Useful for converting bulum dataframes to datetime as needed.

Warning

The returned dataframe will be inconsistent with bulum standards, which use string dates.

If no optional arguments are provided, the function will look for a column named “date” (not case-sensitive) within the input dataframe. Otherwise dt_values or start_dt (assumes daily) may be provided.

Parameters:

df (pd.DataFrame)
dt_values (list, optional) – Datetime values to use as the index.
start_dt (datetime, optional) – Start date of the index; a daily timestep is assumed.
**kwargs – Passed to pandas.to_datetime() to convert df.index to datetime.
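
The start_dt option can be pictured with plain pandas: a daily index is generated from the start date, one entry per row. This is a sketch of the idea using pandas.date_range, not the code inside set_index_dt.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({"flow": [1.0, 2.0, 3.0]})
start_dt = datetime(1990, 1, 1)

# Build a daily DatetimeIndex with one entry per row, starting at start_dt.
out = df.copy()
out.index = pd.date_range(start=start_dt, periods=len(out), freq="D")
```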

strings_to_datetimes(v: list[str], engine: Literal['pandas'], date_format: str, **kwargs) → DatetimeIndex

strings_to_datetimes(v: list[str], engine: Literal['numpy', 'np'], date_format: str, **kwargs) → ndarray[tuple[Any, ...], dtype[datetime64]]

Converts a list of strings to datetimes.

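Pandas uses nanosecond-precision timestamps, which restricts representable dates to roughly the years 1677 to 2262, so it is not suitable for stochastic data that extends beyond that range. It remains the default engine for backwards compatibility.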
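
The range limit follows from storing nanoseconds in a signed 64-bit integer, and the arithmetic can be checked directly. The sketch below also shows NumPy datetimes at day resolution holding far-future dates; the choice of the "D" unit is an assumption for illustration and may differ from what the numpy engine produces.

```python
import numpy as np

# A signed 64-bit count of nanoseconds covers about 292 years
# on either side of the 1970 epoch.
span_years = 2**63 / 1e9 / 86400 / 365.25

# Day-resolution NumPy datetimes hold dates far outside that window.
far = np.array(["3000-01-01", "9999-12-31"], dtype="datetime64[D]")
days_between = (far[1] - far[0]).astype(int)
```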

Parameters: target (Literal[pandas, numpy, np]) – Specifies whether to output