bulum.utils.dataframe_functions module
- assert_df_has_one_column(df: DataFrame) None
Assert that a dataframe has exactly one column.
- Parameters:
df (DataFrame) – The dataframe to check.
- Raises:
ValueError – If the dataframe does not have exactly one column.
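The contract above can be illustrated with a minimal stand-in. This is a sketch only, not bulum's implementation; the name `assert_one_column` and the error message are hypothetical.

```python
import pandas as pd


def assert_one_column(df: pd.DataFrame) -> None:
    """Hypothetical stand-in: raise ValueError unless df has exactly one column."""
    n_cols = len(df.columns)
    if n_cols != 1:
        raise ValueError(f"Expected exactly one column, found {n_cols}.")


single = pd.DataFrame({"flow": [1.0, 2.0]})
double = pd.DataFrame({"flow": [1.0], "rain": [0.5]})

assert_one_column(single)  # one column: returns None without raising

try:
    assert_one_column(double)  # two columns: raises
    raised = False
except ValueError:
    raised = True
```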
- check_data_equivalence(df1: DataFrame, df2: DataFrame, check_col_order: bool = True, threshold: float = 1e-06, details: dict | None = None) bool
Checks whether two numeric dataframes are equivalent. It compares the names and order of the columns, the values of the index, and summary statistics for every data column.
- Parameters:
df1 (DataFrame)
df2 (DataFrame)
check_col_order (bool, default True) – Specifies whether column order is important or not.
threshold (float, default 1e-6) – Numerical threshold for checking if stats are the same.
details (dict, optional) – This is a dictionary for returning detailed results to the user. Defaults to None. Results are returned by appending messages to the dictionary.
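The comparison described above might look roughly like the following. This is a sketch under stated assumptions, not bulum's code: the function name `roughly_equivalent`, the choice of mean, min and max as the summary statistics, and the `"messages"` key used in `details` are all hypothetical.

```python
import pandas as pd


def roughly_equivalent(df1, df2, check_col_order=True, threshold=1e-6, details=None):
    """Hypothetical stand-in for the checks described above."""
    messages = []
    cols1, cols2 = list(df1.columns), list(df2.columns)
    if not check_col_order:
        # Ignore ordering by comparing sorted column names.
        cols1, cols2 = sorted(cols1), sorted(cols2)
    if cols1 != cols2:
        messages.append("column names differ")
    if not df1.index.equals(df2.index):
        messages.append("index values differ")
    if not messages:
        # Compare summary statistics column by column (assumed: mean, min, max).
        for stat in ("mean", "min", "max"):
            s1 = df1.agg(stat)
            s2 = df2[df1.columns].agg(stat)
            if ((s1 - s2).abs() > threshold).any():
                messages.append(f"{stat} differs")
    if details is not None:
        details["messages"] = messages  # assumed key name
    return not messages


idx = ["2000-01-01", "2000-01-02"]
a = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]}, index=idx)
b = a[["y", "x"]]  # same data, different column order

report = {}
same_unordered = roughly_equivalent(a, b, check_col_order=False)
same_ordered = roughly_equivalent(a, b, check_col_order=True, details=report)
```

Passing a dictionary as `details` lets the caller inspect why a comparison failed without changing the boolean return value.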
- check_df_format_standards(df: DataFrame) list[str]
Checks if a given dataframe meets standards generally required by bulum functions. These standards include:
- Dataframe is not None
- Dataframe is not empty
- Dataframe index name is “Date”
- Dataframe index values are daily sequential strings with the format “%Y-%m-%d”
- Data columns all have datatype of double
- Missing values are nan (not na, not -nan)
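A rough, standalone illustration of these checks is given below. It is a sketch, not bulum's implementation: the function name `format_violations` and the message wording are hypothetical, and the missing-value rule is not covered.

```python
import numpy as np
import pandas as pd


def format_violations(df):
    """Hypothetical checker; returns a list of human-readable violations."""
    if df is None:
        return ["dataframe is None"]
    if df.empty:
        return ["dataframe is empty"]
    problems = []
    if df.index.name != "Date":
        problems.append("index name is not 'Date'")
    labels = list(df.index)
    if not all(isinstance(v, str) for v in labels):
        problems.append("index values are not strings")
    else:
        try:
            # Rebuild the expected daily sequence from the first label.
            start = pd.to_datetime(labels[0], format="%Y-%m-%d")
            expected = pd.date_range(start, periods=len(labels), freq="D")
            if labels != list(expected.strftime("%Y-%m-%d")):
                problems.append("index is not a daily sequence of %Y-%m-%d strings")
        except ValueError:
            problems.append("index does not start with a %Y-%m-%d string")
    non_double = [str(name) for name, dtype in df.dtypes.items() if dtype != np.float64]
    if non_double:
        problems.append("non-double columns: " + ", ".join(non_double))
    return problems


good = pd.DataFrame(
    {"flow": [1.0, np.nan, 3.0]},
    index=pd.Index(["2000-01-01", "2000-01-02", "2000-01-03"], name="Date"),
)
# Unnamed index, a skipped day, and an integer column.
bad = pd.DataFrame({"flow": [1, 2]}, index=["2000-01-01", "2000-01-03"])

good_report = format_violations(good)
bad_report = format_violations(bad)
```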
- convert_index_to_datetime(df: DataFrame, **kwargs) DataFrame
Converts the index to pandas datetime. Accepts a dataframe whose index holds datetimes or strings.
- Parameters:
df (DataFrame)
**kwargs – Passed to strings_to_datetimes().
- Raises:
ValueError – Empty or null dataframe passed.
- convert_index_to_string(df: DataFrame, str_format: str = '%Y-%m-%d') DataFrame
Converts the index of df to strings. Accepts a dataframe whose index holds datetimes or strings.
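The round trip between a string index and a datetime index can be reproduced with plain pandas, as sketched below. This shows the underlying idea only and does not call bulum; the sample dataframe is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"flow": [1.0, 2.0]},
    index=pd.Index(["2000-01-01", "2000-01-02"], name="Date"),
)

# String index -> datetime index.
as_dt = df.copy()
as_dt.index = pd.to_datetime(as_dt.index, format="%Y-%m-%d")

# Datetime index -> string index, using the same format.
back = as_dt.copy()
back.index = back.index.strftime("%Y-%m-%d")
```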
- crop_to_wy(df: DataFrame, wy_month: int = 7) DataFrame
Crop dataframe to complete water years only.
This function removes partial water years from the beginning and end of the dataframe, keeping only complete water years based on the specified water year start month.
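The cropping logic can be sketched as follows. This is an illustration under assumptions rather than bulum's implementation: the helper name `crop_complete_water_years` is hypothetical, and it expects a sorted daily DatetimeIndex instead of bulum's string dates.

```python
import pandas as pd


def crop_complete_water_years(df, wy_month=7):
    """Hypothetical sketch; expects a sorted daily DatetimeIndex."""
    first, last = df.index[0], df.index[-1]
    # First water-year start on or after the first record.
    start = pd.Timestamp(year=first.year, month=wy_month, day=1)
    if start < first:
        start = start.replace(year=first.year + 1)
    # Last water-year start on or before the day after the final record.
    day_after = last + pd.Timedelta(days=1)
    end_start = pd.Timestamp(year=day_after.year, month=wy_month, day=1)
    if end_start > day_after:
        end_start = end_start.replace(year=day_after.year - 1)
    # The final complete water year ends the day before that start.
    end = end_start - pd.Timedelta(days=1)
    return df.loc[start:end]


index = pd.date_range("2000-03-15", "2003-02-10", freq="D")
daily = pd.DataFrame({"flow": range(len(index))}, index=index)
cropped = crop_complete_water_years(daily, wy_month=7)
```

With a July start month, the partial periods before 2000-07-01 and after 2002-06-30 are dropped, leaving two complete water years.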
- datetimes_to_strings(v: Iterable[datetime | Timestamp], str_format: str = '%Y-%m-%d') list[str]
Converts a list of datetimes to strings using the given format.
- find_col(df: DataFrame, string_pattern: str, unique_match: bool = True) Series | DataFrame
Find columns in dataframe that match a string pattern.
- Parameters:
df (DataFrame)
string_pattern (str)
unique_match (bool, default True)
- Returns:
If unique_match=True, returns a Series (single column). If unique_match=False, returns a DataFrame with all matching columns.
- Return type:
Series | DataFrame
- Raises:
ValueError – If unique_match=True and no columns or multiple columns match the pattern.
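The return and error behaviour described above can be mimicked with plain pandas. The sketch below is not bulum's code: the name `find_matching_columns` is hypothetical, and it assumes the pattern is treated as a regular expression, which the documentation above does not state.

```python
import pandas as pd


def find_matching_columns(df, pattern, unique_match=True):
    """Hypothetical stand-in; assumes `pattern` is a regular expression."""
    mask = df.columns.astype(str).str.contains(pattern)
    matches = df.loc[:, mask]
    if unique_match:
        if matches.shape[1] != 1:
            raise ValueError(
                f"Expected one column matching {pattern!r}, found {matches.shape[1]}."
            )
        return matches.iloc[:, 0]  # single column as a Series
    return matches  # all matching columns as a DataFrame


df = pd.DataFrame({"gauge_A_flow": [1.0], "gauge_B_flow": [2.0], "rain": [0.5]})

rain = find_matching_columns(df, "rain")
flows = find_matching_columns(df, "flow", unique_match=False)

try:
    find_matching_columns(df, "flow")  # two matches with unique_match=True
    ambiguous_raised = False
except ValueError:
    ambiguous_raised = True
```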
- set_index_dt(df: DataFrame, dt_values: list | None = None, start_dt: datetime | None = None, **kwargs) DataFrame
Returns a dataframe with datetime index. Useful for converting bulum dataframes to datetime as needed.
Warning
The returned dataframe will be inconsistent with bulum standards, which use string dates.
If no optional arguments are provided, the function looks for a column named “date” (not case-sensitive) within the input dataframe. Alternatively, provide dt_values, or provide start_dt, in which case daily data is assumed.
- Parameters:
df (DataFrame)
dt_values (list, optional)
start_dt (datetime, optional)
**kwargs – Passed to pandas.to_datetime() to convert df.index to datetime.
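The three ways of supplying dates can be sketched with plain pandas as below. This is an illustration only: the name `with_datetime_index` is hypothetical, and the precedence between dt_values and start_dt, as well as dropping the date column after use, are assumptions not confirmed by the documentation above.

```python
from datetime import datetime

import pandas as pd


def with_datetime_index(df, dt_values=None, start_dt=None):
    """Hypothetical sketch of the three input modes."""
    out = df.copy()
    if dt_values is not None:
        # Explicit values, one per row.
        out.index = pd.to_datetime(dt_values)
    elif start_dt is not None:
        # Daily sequence starting at start_dt.
        out.index = pd.date_range(start=start_dt, periods=len(out), freq="D")
    else:
        # Fall back to a column named "date", ignoring case.
        date_cols = [c for c in out.columns if str(c).lower() == "date"]
        if not date_cols:
            raise ValueError("No 'date' column found.")
        col = date_cols[0]
        out = out.set_index(pd.to_datetime(out[col])).drop(columns=col)  # assumed: column dropped
    return out


raw = pd.DataFrame({"DATE": ["2000-01-01", "2000-01-02"], "flow": [1.0, 2.0]})

by_column = with_datetime_index(raw)
by_start = with_datetime_index(raw[["flow"]], start_dt=datetime(2000, 1, 1))
by_values = with_datetime_index(raw[["flow"]], dt_values=["2001-05-01", "2001-05-02"])
```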
- strings_to_datetimes(v: list[str], engine: Literal['pandas'], date_format: str, **kwargs) DatetimeIndex
- strings_to_datetimes(v: list[str], engine: Literal['numpy', 'np'], date_format: str, **kwargs) ndarray[tuple[Any, ...], dtype[datetime64]]
Converts a list of strings to datetimes.
Pandas uses nanosecond-precision timestamps, which restricts dates to roughly the years 1677 to 2262, so it is not suitable for stochastic data. It is the default engine for backwards compatibility.
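The range limitation can be seen with NumPy alone. The sketch below does not call bulum; it compares the last year reachable at nanosecond precision with dates stored at day precision, using made-up sample dates.

```python
import numpy as np

# Largest instant representable as int64 nanoseconds since 1970-01-01.
ns_max = np.datetime64(np.iinfo(np.int64).max, "ns")
ns_max_year = int(ns_max.astype("datetime64[Y]").astype(int)) + 1970

# Day-precision datetimes cover a far wider range of years.
dates = ["1000-01-01", "3000-12-31"]
as_days = np.array(dates, dtype="datetime64[D]")
years = (as_days.astype("datetime64[Y]").astype(int) + 1970).tolist()
```

Here `ns_max_year` is 2262, while the day-precision array holds the years 1000 and 3000 without overflow.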
- Parameters:
target (Literal[pandas, numpy, np]) – Specifies whether to output