bulum.utils.dataframe_functions module

assert_df_format_standards(df: DataFrame) None

cf. check_df_format_standards()

assert_df_has_one_column(df: DataFrame) None

Assert that a dataframe has exactly one column.

Parameters:

df (DataFrame) – The dataframe to check.

Raises:

ValueError – If the dataframe does not have exactly one column.

check_data_equivalence(df1: DataFrame, df2: DataFrame, check_col_order: bool = True, threshold: float = 1e-06, details: dict | None = None) bool

Checks whether two numeric dataframes are the same by comparing the names and order of the columns, the values of the index, and summary statistics for all data columns.

Parameters:
  • df1 (DataFrame)

  • df2 (DataFrame)

  • check_col_order (bool, default True) – Specifies whether column order is important or not.

  • threshold (float, default 1e-6) – Numerical threshold for checking if stats are the same.

  • details (dict, optional) – Dictionary used to return detailed results to the caller; messages are added to it. Defaults to None.
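As a rough illustration of the comparison described above, the sketch below checks column names, index values, and a few summary statistics within a tolerance. The helper name sketch_data_equivalence and the choice of statistics (min, max, mean, sum) are assumptions made for illustration; bulum's implementation may use different statistics and also fills the details dictionary, which is not modelled here.

```python
import pandas as pd


def sketch_data_equivalence(df1, df2, check_col_order=True, threshold=1e-6):
    """Hypothetical stand-in, not bulum's implementation."""
    cols1, cols2 = list(df1.columns), list(df2.columns)
    if check_col_order:
        if cols1 != cols2:
            return False
    elif sorted(cols1) != sorted(cols2):
        return False
    # Index values must match exactly.
    if not df1.index.equals(df2.index):
        return False
    # Compare summary statistics column by column (assumed set of stats).
    for col in cols1:
        for stat in ("min", "max", "mean", "sum"):
            a = getattr(df1[col], stat)()
            b = getattr(df2[col], stat)()
            if abs(a - b) > threshold:
                return False
    return True


a = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]},
                 index=["2000-01-01", "2000-01-02"])
b = a[["y", "x"]]  # same data, different column order
print(sketch_data_equivalence(a, b))
print(sketch_data_equivalence(a, b, check_col_order=False))
```

Comparing summary statistics rather than every cell is cheap, but two dataframes with different values can still share the same statistics, so a match is a strong hint rather than proof of equality.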

check_df_format_standards(df: DataFrame) list[str]

Checks if a given dataframe meets standards generally required by bulum functions. These standards include:

  • Dataframe is not None

  • Dataframe is not empty

  • Dataframe index name is “Date”

  • Dataframe index values are daily sequential strings with the format “%Y-%m-%d”

  • Data columns all have datatype of double

  • Missing values are nan (not na, not -nan)
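To make the standards above concrete, here is a minimal sketch of how such checks could be written with pandas. The function name sketch_format_violations and its messages are hypothetical, and the sketch does not attempt the final check on how missing values are represented; it is not bulum's implementation.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


def sketch_format_violations(df):
    """Hypothetical checker (not bulum's code); returns a list of messages."""
    if df is None:
        return ["Dataframe is None"]
    if df.empty:
        return ["Dataframe is empty"]
    problems = []
    if df.index.name != "Date":
        problems.append("Index name is not 'Date'")
    try:
        dates = [datetime.strptime(s, "%Y-%m-%d") for s in df.index]
        # Round-trip to reject unpadded strings such as "2000-1-1".
        well_formed = all(
            d.strftime("%Y-%m-%d") == s for d, s in zip(dates, df.index)
        )
    except (TypeError, ValueError):
        dates, well_formed = [], False
    if not well_formed:
        problems.append("Index values are not '%Y-%m-%d' strings")
    elif any(b - a != timedelta(days=1) for a, b in zip(dates, dates[1:])):
        problems.append("Index dates are not daily sequential")
    for col in df.columns:
        if df[col].dtype != np.float64:
            problems.append(f"Column {col!r} is not float64")
    return problems


good = pd.DataFrame(
    {"flow": [1.0, np.nan, 3.0]},
    index=pd.Index(["2000-01-01", "2000-01-02", "2000-01-03"], name="Date"),
)
bad = pd.DataFrame({"flow": [1, 2]}, index=["2000-01-01", "2000-01-03"])
print(sketch_format_violations(good))
print(sketch_format_violations(bad))
```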

convert_index_to_datetime(df: DataFrame, **kwargs) DataFrame

Converts the index to pandas datetime. Accepts a dataframe whose index holds datetimes or strings.

Parameters:
Raises:

ValueError – Empty or null dataframe passed.

convert_index_to_string(df: DataFrame, str_format: str = '%Y-%m-%d') DataFrame

Converts the index of df to strings. Accepts a dataframe whose index holds datetimes or strings.
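A minimal sketch of this kind of conversion using plain pandas is shown below. The helper name sketch_index_to_string is hypothetical and the approach (parse with pandas.to_datetime, then format with strftime) is an assumption about one reasonable way to do it, not a description of bulum's code.

```python
import pandas as pd


def sketch_index_to_string(df, str_format="%Y-%m-%d"):
    """Hypothetical helper: return a copy of df with a string index."""
    out = df.copy()
    # to_datetime accepts both datetime and string indexes.
    out.index = pd.to_datetime(out.index).strftime(str_format)
    out.index.name = df.index.name  # keep the original index name
    return out


df_dt = pd.DataFrame(
    {"flow": [1.0, 2.0]},
    index=pd.date_range("2000-01-01", periods=2, freq="D", name="Date"),
)
print(list(sketch_index_to_string(df_dt).index))
```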

crop_to_wy(df: DataFrame, wy_month: int = 7) DataFrame

Crop dataframe to complete water years only.

This function removes partial water years from the beginning and end of the dataframe, keeping only complete water years based on the specified water year start month.

Parameters:
  • df (DataFrame) – Input dataframe with date index.

  • wy_month (int, optional) – Water year start month (1=January, 7=July, etc.). Defaults to 7.

Returns:

Cropped dataframe containing only complete water years.

Return type:

DataFrame
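The cropping logic can be sketched as follows, assuming daily data and a water year that starts on the first day of wy_month. The helper name sketch_crop_to_wy is hypothetical; this is one way to implement the idea, not necessarily how bulum does it.

```python
import pandas as pd


def sketch_crop_to_wy(df, wy_month=7):
    """Hypothetical helper: keep only complete water years (daily data)."""
    dates = pd.to_datetime(df.index)
    first, last = dates.min(), dates.max()
    # First water-year start on or after the first date.
    start = pd.Timestamp(year=first.year, month=wy_month, day=1)
    if start < first:
        start = pd.Timestamp(year=first.year + 1, month=wy_month, day=1)
    # Latest water-year start that is no later than the day after the last
    # date; the day before it is the end of the last complete water year.
    day_after = last + pd.Timedelta(days=1)
    next_start = pd.Timestamp(year=day_after.year, month=wy_month, day=1)
    if next_start > day_after:
        next_start = pd.Timestamp(year=day_after.year - 1, month=wy_month, day=1)
    end = next_start - pd.Timedelta(days=1)
    mask = (dates >= start) & (dates <= end)
    return df.loc[mask]


daily = pd.DataFrame(
    {"flow": 1.0},
    index=pd.date_range("2000-03-15", "2003-09-20", freq="D"),
)
cropped = sketch_crop_to_wy(daily, wy_month=7)
print(cropped.index[0], cropped.index[-1])
```

With data running from 2000-03-15 to 2003-09-20 and wy_month=7, the partial years at each end are dropped and the result spans 2000-07-01 to 2003-06-30.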

datetimes_to_strings(v: Iterable[datetime | Timestamp], str_format: str = '%Y-%m-%d') list[str]

Converts a list of datetimes to strings using the given format.
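For illustration, an equivalent conversion can be written with the standard library alone. The helper name sketch_datetimes_to_strings is hypothetical; it relies only on strftime, which both datetime.datetime and pandas.Timestamp provide.

```python
from datetime import datetime


def sketch_datetimes_to_strings(values, str_format="%Y-%m-%d"):
    """Hypothetical helper: format each datetime-like value as a string."""
    return [v.strftime(str_format) for v in values]


stamps = [datetime(2000, 1, 2), datetime(2000, 12, 31)]
print(sketch_datetimes_to_strings(stamps))
print(sketch_datetimes_to_strings(stamps, "%d/%m/%Y"))
```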

find_col(df: DataFrame, string_pattern: str, unique_match: bool = True) Series | DataFrame

Find columns in dataframe that match a string pattern.

Parameters:
  • df (DataFrame) – The dataframe to search.

  • string_pattern (str) – The string pattern to match against column names.

  • unique_match (bool, optional) – If True, ensures exactly one column matches the pattern. If False, returns all matching columns. Defaults to True.

Returns:

If unique_match=True, returns a Series (single column). If unique_match=False, returns a DataFrame with all matching columns.

Return type:

Series or DataFrame

Raises:

ValueError – If unique_match=True and no columns or multiple columns match the pattern.
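The documented behaviour can be sketched as below. The helper name sketch_find_col is hypothetical, and the sketch assumes plain substring matching on column names; the documentation above does not say whether bulum matches by substring, wildcard, or regular expression.

```python
import pandas as pd


def sketch_find_col(df, string_pattern, unique_match=True):
    """Hypothetical helper; assumes substring matching on column names."""
    matches = [c for c in df.columns if string_pattern in str(c)]
    if unique_match:
        if len(matches) != 1:
            raise ValueError(
                f"Expected exactly one column matching {string_pattern!r}, "
                f"found {len(matches)}"
            )
        return df[matches[0]]  # a single column, i.e. a Series
    return df[matches]  # all matching columns as a DataFrame


table = pd.DataFrame({"Inflow_A": [1.0], "Inflow_B": [2.0], "Rain": [3.0]})
rain = sketch_find_col(table, "Rain")
inflows = sketch_find_col(table, "Inflow", unique_match=False)
print(type(rain).__name__, list(inflows.columns))
```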

set_index_dt(df: DataFrame, dt_values: list | None = None, start_dt: datetime | None = None, **kwargs) DataFrame

Returns a dataframe with datetime index. Useful for converting bulum dataframes to datetime as needed.

Warning

The returned dataframe will be inconsistent with bulum standards, which use string dates.

If no optional arguments are provided, the function will look for a column named “date” (not case-sensitive) within the input dataframe. Otherwise dt_values or start_dt (assumes daily) may be provided.

Parameters:
  • df (pd.DataFrame)

  • dt_values (list, optional)

  • start_dt (datetime, optional)

  • **kwargs – Passed to pandas.to_datetime() to convert df.index to datetime.
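The two explicit options can be sketched as follows. The helper name sketch_set_index_dt is hypothetical, and the sketch covers only the dt_values and start_dt paths (it does not attempt the default lookup of a “date” column); treat it as an assumption about the general approach rather than bulum's code.

```python
from datetime import datetime

import pandas as pd


def sketch_set_index_dt(df, dt_values=None, start_dt=None, **kwargs):
    """Hypothetical helper covering only the two explicit options."""
    out = df.copy()
    if dt_values is not None:
        # Explicit values: parse them and use them as the index.
        out.index = pd.to_datetime(dt_values, **kwargs)
    elif start_dt is not None:
        # Only a start date: assume one row per day.
        out.index = pd.date_range(start=start_dt, periods=len(out), freq="D")
    else:
        raise ValueError("This sketch needs dt_values or start_dt")
    return out


raw = pd.DataFrame({"flow": [1.0, 2.0, 3.0]})
by_start = sketch_set_index_dt(raw, start_dt=datetime(2000, 1, 1))
by_values = sketch_set_index_dt(
    raw, dt_values=["2000-01-01", "2000-01-05", "2000-01-09"]
)
print(by_start.index[-1], by_values.index[1])
```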

strings_to_datetimes(v: list[str], engine: Literal['pandas'], date_format: str, **kwargs) DatetimeIndex
strings_to_datetimes(v: list[str], engine: Literal['numpy', 'np'], date_format: str, **kwargs) ndarray[tuple[Any, ...], dtype[datetime64]]

Converts a list of strings to datetimes.

The pandas engine uses nanosecond-precision timestamps and is not suitable for stochastic data. It remains the default engine for backwards compatibility.
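One likely reason for this caveat, stated here as an assumption, is range: a 64-bit count of nanoseconds spans only about 292 years either side of 1970, whereas long synthetic sequences can run well past that. The sketch below uses NumPy directly with day precision; the helper name sketch_strings_to_datetimes_np is hypothetical and is not bulum's numpy engine.

```python
import numpy as np


def sketch_strings_to_datetimes_np(values):
    """Hypothetical helper: parse ISO date strings at day precision."""
    return np.array(values, dtype="datetime64[D]")


# Years representable either side of 1970 with int64 nanoseconds.
ns_span_years = np.iinfo(np.int64).max / 1e9 / 86400 / 365.25

# Day precision handles dates far beyond that span.
far = sketch_strings_to_datetimes_np(["3000-01-01", "3000-01-02"])
print(round(ns_span_years, 1), far.dtype, far[1] - far[0])
```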

Parameters:

target (Literal[pandas, numpy, np]) – Specifies whether to output