bulum.utils.datetime_functions module

get_date_format(date_str: str) → str

Determine the date format of a date string using trial and error.

This function tries several common date formats and returns the first one that successfully parses the input string.

Parameters:

date_str (str) – A date string to analyze for format detection.

Returns:

The date format string (e.g., '%Y-%m-%d', '%d/%m/%Y') that matches the input string.

Return type:

str

Raises:

ValueError – If none of the supported date formats can parse the input string.

Examples

>>> get_date_format("2023-12-25")
'%Y-%m-%d'
>>> get_date_format("25/12/2023")
'%d/%m/%Y'
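
The trial-and-error approach described above can be sketched with the standard library alone. This is a minimal illustration, not bulum's implementation: the function name detect_date_format and the candidate list CANDIDATE_FORMATS are assumptions, and the formats the library actually tries may differ.

```python
from datetime import datetime

# Hypothetical candidate list (assumption); order decides ambiguous inputs.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d", "%d-%m-%Y")


def detect_date_format(date_str: str) -> str:
    """Return the first candidate format that parses date_str."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(date_str, fmt)
        except ValueError:
            continue  # this format does not match; try the next one
        return fmt
    raise ValueError(f"No supported date format matches {date_str!r}")


print(detect_date_format("2023-12-25"))  # %Y-%m-%d
print(detect_date_format("25/12/2023"))  # %d/%m/%Y
```

Because the first matching format wins, an ambiguous string such as "01/02/2023" resolves to whichever day-first or month-first format appears earlier in the candidate list.
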
get_dates(start_date: str, end_date: str | None = None, days: int = 0, years: int = 1, include_end_date: bool = False, str_format: str | None = None) → list[str]
get_dates(start_date: datetime, end_date: datetime | None = None, days: int = 0, years: int = 1, include_end_date: bool = False, str_format: str | None = None) → list[str] | list[datetime]

Generates a list of daily datetime values from a given start date.

The length may be defined by an end_date, or a number of days, or a number of years. This function is useful for working with daily datasets and models. Defaults to 1 year after start_date if end_date, days, and years are not specified.

Parameters:
  • start_date (Union[datetime, str]) – The starting date for the sequence.

  • end_date (Optional[Union[datetime, str]], default None) – The ending date for the sequence. If provided, takes precedence over days and years.

  • days (int, default 0) – Number of days to generate. If > 0, takes precedence over years parameter.

  • years (int, default 1) – Number of years to generate if neither end_date nor days are specified.

  • include_end_date (bool, default False) – Whether to include the end_date in the generated sequence.

  • str_format (Optional[str], default None) – If provided, returns string dates in this format instead of datetime objects.

Returns:

A list of datetime objects or formatted date strings covering the specified range.

Return type:

Union[list[str], list[datetime]]

Raises:

ValueError – If years <= 0 when using years parameter for date generation.

Examples

>>> get_dates(datetime(2023, 1, 1), days=3)
[datetime.datetime(2023, 1, 1, 0, 0), datetime.datetime(2023, 1, 2, 0, 0), datetime.datetime(2023, 1, 3, 0, 0)]
>>> get_dates('2023-01-01', '2023-01-03', str_format='%Y-%m-%d')
['2023-01-01', '2023-01-02']
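
The precedence among end_date, days, and years can be sketched as follows. This is a hedged illustration using only the standard library: daily_dates is a hypothetical stand-in, it handles datetime input only, and it assumes include_end_date applies only when an explicit end date is given. The 29 February fallback is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def daily_dates(
    start: datetime,
    end: Optional[datetime] = None,
    days: int = 0,
    years: int = 1,
    include_end: bool = False,
) -> list[datetime]:
    """Hypothetical stand-in for get_dates that shows only the precedence rules."""
    if end is not None:
        # An explicit end date wins over days and years.
        n = (end - start).days + (1 if include_end else 0)
    elif days > 0:
        # A positive day count wins over years.
        n = days
    else:
        if years <= 0:
            raise ValueError("years must be positive")
        try:
            end = start.replace(year=start.year + years)
        except ValueError:
            # start is 29 Feb and the target year is not a leap year
            # (assumption: fall back to 28 Feb).
            end = start.replace(year=start.year + years, day=28)
        n = (end - start).days
    return [start + timedelta(days=i) for i in range(n)]


one_year = daily_dates(datetime(2023, 1, 1))
print(len(one_year), one_year[0].date(), one_year[-1].date())
# 365 2023-01-01 2023-12-31
```
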
get_month(dates: Iterable[str]) → list[int]

Extract month numbers from a list of date strings.

Parameters:

dates (Iterable[str]) – Iterable of date strings in YYYY-MM-DD format. Assumes consecutive dates.

Returns:

List of month numbers (1-12) corresponding to the input dates.

Return type:

list[int]

Examples

>>> get_month(['2023-01-15', '2023-01-16'])
[1, 1]
>>> get_month(['2023-12-31'])
[12]
get_next_month_start(stringdate: str) → str

Get the first day of the next month for a given date.

Parameters:

stringdate (str) – Date string in YYYY-MM-DD format.

Returns:

Date string in YYYY-MM-DD format representing the first day of the next month.

Return type:

str

Examples

>>> get_next_month_start("2023-02-15")
'2023-03-01'
>>> get_next_month_start("2023-12-15")
'2024-01-01'
get_prev_month_end(stringdate: str) → str

Get the last day of the previous month for a given date.

Parameters:

stringdate (str) – Date string in YYYY-MM-DD format.

Returns:

Date string in YYYY-MM-DD format representing the last day of the previous month.

Return type:

str

Examples

>>> get_prev_month_end("2023-03-15")
'2023-02-28'
>>> get_prev_month_end("2024-03-15")  # Leap year
'2024-02-29'
get_this_month_end(stringdate: str) → str

Get the last day of the current month for a given date.

Parameters:

stringdate (str) – Date string in YYYY-MM-DD format.

Returns:

Date string in YYYY-MM-DD format representing the last day of the current month.

Return type:

str

Examples

>>> get_this_month_end("2023-02-15")
'2023-02-28'
>>> get_this_month_end("2024-02-15")  # Leap year
'2024-02-29'
>>> get_this_month_end("2023-04-15")
'2023-04-30'
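
The month-boundary arithmetic behind get_next_month_start, get_prev_month_end, and get_this_month_end can be sketched with the standard library. The function names below are hypothetical and the logic is one plausible approach, not necessarily the one bulum uses: subtracting one day from the first of a month lets the calendar handle month lengths and leap years.

```python
from datetime import date, timedelta


def next_month_start(s: str) -> str:
    """First day of the month after the given YYYY-MM-DD date."""
    d = date.fromisoformat(s)
    # December rolls over into January of the following year.
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return date(year, month, 1).isoformat()


def prev_month_end(s: str) -> str:
    """Last day of the month before the given YYYY-MM-DD date."""
    d = date.fromisoformat(s)
    # One day before the first of this month is the end of the previous month.
    return (d.replace(day=1) - timedelta(days=1)).isoformat()


def this_month_end(s: str) -> str:
    """Last day of the month containing the given YYYY-MM-DD date."""
    first_of_next = date.fromisoformat(next_month_start(s))
    return (first_of_next - timedelta(days=1)).isoformat()


print(next_month_start("2023-12-15"))  # 2024-01-01
print(prev_month_end("2024-03-15"))    # 2024-02-29
print(this_month_end("2023-04-15"))    # 2023-04-30
```
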
get_wy(dates: str, wy_month: int = 7, *, using_end_year: bool = False, as_list: bool = True) → int
get_wy(dates: Index | list[str] | list[datetime64], wy_month: int = 7, *, using_end_year: bool = False, as_list: Literal[True]) → list[int]
get_wy(dates: Index | list[str] | list[datetime64], wy_month: int = 7, *, using_end_year: bool = False, as_list: Literal[False]) → ndarray[tuple[Any, ...], dtype[int64]]

Returns water years for a given array of dates.

Use this function to add water year information into a pandas DataFrame. Assumes consecutive dates for efficiency.

Parameters:
  • dates (str or pd.Index or list[str] or list[np.datetime64]) – Date or array of dates. Assumes consecutive dates.

  • wy_month (int, default 7) – Water year start month (1=January, 7=July, etc.).

  • using_end_year (bool, default False) –

    Water year labeling convention:

    • False : Aligns water years with the primary water allocation at the start of the water year.

    • True : Follows the fiscal year convention whereby water years are labeled based on their end dates. Using the fiscal convention, the 2022 water year is from 2021-07-01 to 2022-06-30 inclusive.

  • as_list (bool, default True) – For array-like input, return a list[int] if True or a NumPy int64 array if False. A single date string returns an int.

Returns:

The water years corresponding to the given dates.

Return type:

int, list[int], or np.ndarray of int64, depending on the input type and as_list

Examples

Basic usage with default July start:

>>> get_wy(['2023-06-30', '2023-07-01'])
[2022, 2023]

Using fiscal year convention:

>>> get_wy(['2023-06-30', '2023-07-01'], using_end_year=True)
[2023, 2024]

Integration with pandas for aggregation:

>>> df.groupby(get_wy(df.index, wy_month=7)).sum().median()
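
The two labeling conventions can be sketched for a single date as follows. This is a hedged illustration, not bulum's implementation: water_year is a hypothetical helper, and its treatment of wy_month=1 (a water year that equals the calendar year, so both conventions give the same label) is an assumption.

```python
from datetime import date


def water_year(d: date, wy_month: int = 7, using_end_year: bool = False) -> int:
    """Label a date with its water year (hypothetical single-date helper)."""
    # Dates before the start month belong to the water year that began
    # in the previous calendar year.
    start_year = d.year if d.month >= wy_month else d.year - 1
    # Assumption: when the water year starts in January it ends in the same
    # calendar year, so the end-year label equals the start-year label.
    if using_end_year and wy_month > 1:
        return start_year + 1
    return start_year


days = [date(2023, 6, 30), date(2023, 7, 1)]
print([water_year(d) for d in days])                       # [2022, 2023]
print([water_year(d, using_end_year=True) for d in days])  # [2023, 2024]
```
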
get_wy_end_date(df: DataFrame, wy_month: int = 7) → datetime

Returns an appropriate water year end date based on data frame dates and the water year start month.

Parameters:
  • df (pd.DataFrame) – DataFrame with dates as its index.

  • wy_month (int, optional) – Water year start month. Defaults to 7.

Returns:

Water year end date.

Return type:

datetime

get_wy_start_date(df: Series | DataFrame, wy_month: int = 7) → datetime

Returns an appropriate water year start date based on data frame dates and the water year start month.

Parameters:
  • df (pd.Series or pd.DataFrame) – Series or DataFrame with dates as its index.

  • wy_month (int, optional) – Water year start month. Defaults to 7.

Returns:

Water year start date.

Return type:

datetime

get_year_and_month(v: list[str] | list[datetime]) → list[str]

Extract year and month strings from a list of dates.

Returns year and month strings in YYYY-MM format for aggregation by month.

Parameters:

v (Union[list[str], list[datetime]]) – List of date strings in YYYY-MM-DD format or datetime objects.

Returns:

List of year-month strings in YYYY-MM format.

Return type:

list[str]

Examples

>>> get_year_and_month(['2023-01-15', '2023-02-20'])
['2023-01', '2023-02']
>>> from datetime import datetime
>>> get_year_and_month([datetime(2023, 1, 15), datetime(2023, 2, 20)])
['2023-01', '2023-02']
standardise_datestring_format(*args, **kwargs)

Australian spelling version of standardize_datestring_format().

standardize_datestring_format(values: Index | list[str], *, as_index: Literal[True]) → Index
standardize_datestring_format(values: Index | list[str], *, as_index: Literal[False] = False) → list[str]

Convert date strings to YYYY-MM-DD format.

Automatically detects the input date format and converts all dates to ISO 8601 (YYYY-MM-DD). Uses numpy datetime64 for efficient processing. Tested over the range 0001-01-01 to 9999-12-31.

Parameters:
  • values (pd.Index or list[str]) – Date strings in any supported format.

  • as_index (bool, optional) – If True, return a pandas.Index instead of a list. Useful when assigning directly to df.index. Default is False.

Returns:

Date strings in YYYY-MM-DD format. Type depends on as_index.

Return type:

list[str] or pd.Index

Examples

>>> standardize_datestring_format(["25/12/2023", "26/12/2023"])
['2023-12-25', '2023-12-26']
>>> standardize_datestring_format(["25/12/2023"], as_index=True)
Index(['2023-12-25'], dtype='object')
to_np_datetimes64d(values: list[str], date_fmt: str = '%Y-%m-%d', *, mode: Literal['generate', 'parse'] = 'generate', check_dates: bool | Literal['warn', 'strict'] = 'warn') → ndarray[tuple[Any, ...], dtype[datetime64]]

Convert a list of date strings to numpy datetime64[D] array.

This function converts date strings to numpy datetime64 arrays with day precision. Two modes are available: “generate” efficiently creates all dates in a range, while “parse” individually converts each date string preserving gaps.

Parameters:
  • values (list[str]) – List of date strings to convert. Can also accept pandas Series.

  • date_fmt (str, default '%Y-%m-%d') – The date format string for parsing the input dates.

  • mode ({"generate", "parse"}, default "generate") –

    Conversion mode:

    • "generate" : Generate all dates between first and last date (inclusive). Efficient for consecutive or near-consecutive dates. Uses numpy.arange.

    • "parse" : Parse each date string individually, preserving non-consecutive dates and gaps. Iterates over all values.

  • check_dates (bool or {"warn", "strict"}, default "warn") –

    Controls validation for “generate” mode only (ignored in “parse” mode):

    • False : No validation (suppress all warnings/errors)

    • True or "warn" : Issue UserWarning if lengths don’t match

    • "strict" : Raise ValueError if lengths don’t match

Returns:

Numpy array of datetime64[D] values:

  • In “generate” mode: all dates from first to last (inclusive).

  • In “parse” mode: exactly the dates provided (same length as input).

Return type:

np.typing.NDArray[np.datetime64]

Raises:

ValueError – If mode="generate" and check_dates="strict", raised when the number of generated dates does not match the input length (indicating non-consecutive dates or gaps).

Warns:

UserWarning – If mode="generate" and check_dates is True or "warn", issued when the number of generated dates does not match the input length (indicating non-consecutive dates or gaps).

Examples

Generate mode (default) - fills in gaps:

>>> dates = to_np_datetimes64d(['2023-01-01', '2023-01-02', '2023-01-03'])
>>> dates.dtype
dtype('<M8[D]')
>>> # The gap is filled, so all 3 dates (Jan 1, 2, 3) are generated
>>> len(to_np_datetimes64d(['2023-01-01', '2023-01-03'], check_dates=False))
3

Parse mode - preserves gaps:

>>> dates = to_np_datetimes64d(['2023-01-01', '2023-01-03'], mode="parse")
>>> len(dates)  # Only Jan 1 and 3, no filling
2

Generate mode validation - warn vs. strict:

>>> # Issues a UserWarning but returns all dates
>>> len(to_np_datetimes64d(['2023-01-01', '2023-01-03'], mode="generate", check_dates="warn"))
3
>>> to_np_datetimes64d(['2023-01-01', '2023-01-03'], mode="generate", check_dates="strict")
Traceback (most recent call last):
    ...
ValueError: Date sequence validation failed...
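
The difference between the two modes, and the length validation in generate mode, can be sketched in pure Python. This is an illustrative analogue under stated assumptions, not the library's code: to_dates and its check argument are hypothetical names, and it returns a list of datetime.date objects rather than a numpy datetime64[D] array.

```python
import warnings
from datetime import datetime, timedelta


def to_dates(values, date_fmt="%Y-%m-%d", mode="generate", check="warn"):
    """Hypothetical pure-Python analogue of the two conversion modes."""
    if not values:
        return []
    if mode == "parse":
        # Parse every string individually, so gaps in the input are preserved.
        return [datetime.strptime(v, date_fmt).date() for v in values]
    # Generate mode: only the first and last strings are parsed, and every
    # day between them (inclusive) is produced.
    first = datetime.strptime(values[0], date_fmt).date()
    last = datetime.strptime(values[-1], date_fmt).date()
    out = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    if check and len(out) != len(values):
        msg = f"generated {len(out)} dates from {len(values)} input values"
        if check == "strict":
            raise ValueError(msg)
        warnings.warn(msg, UserWarning)  # check is True or "warn"
    return out


gappy = ["2023-01-01", "2023-01-03"]
print(len(to_dates(gappy, check=False)))   # 3
print(len(to_dates(gappy, mode="parse")))  # 2
```

Generate mode parses only two strings regardless of input length, which is why it suits long consecutive daily series; the length check is the inexpensive way to notice that the input was not consecutive after all.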