bulum.utils.datetime_functions module
- get_date_format(date_str: str) -> str
Determine the date format of a date string using trial and error.
This function tries several common date formats and returns the first one that successfully parses the input string.
- Parameters:
date_str (str) – A date string to analyze for format detection.
- Returns:
The date format string (e.g., '%Y-%m-%d', '%d/%m/%Y') that matches the input string.
- Return type:
str
- Raises:
ValueError – If none of the supported date formats can parse the input string.
Examples
>>> get_date_format("2023-12-25")
'%Y-%m-%d'
>>> get_date_format("25/12/2023")
'%d/%m/%Y'
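The trial-and-error approach described above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: the name `detect_date_format` and the `CANDIDATE_FORMATS` tuple are hypothetical, and the formats the library actually tries are not listed on this page.

```python
from datetime import datetime

# Hypothetical candidate list (an assumption for this sketch).
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d", "%d-%m-%Y")


def detect_date_format(date_str, candidates=CANDIDATE_FORMATS):
    """Return the first candidate format that parses date_str."""
    for fmt in candidates:
        try:
            datetime.strptime(date_str, fmt)
        except ValueError:
            continue  # this format does not fit, try the next one
        return fmt
    raise ValueError(f"No supported date format matches {date_str!r}")
```

In a sketch like this, an ambiguous string such as "01/02/2023" resolves to whichever matching format comes first, so the order of the candidates encodes the preferred interpretation.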
- get_dates(start_date: str, end_date: str | None = None, days: int = 0, years: int = 1, include_end_date: bool = False, str_format: str | None = None) -> list[str]
- get_dates(start_date: datetime, end_date: datetime | None = None, days: int = 0, years: int = 1, include_end_date: bool = False, str_format: str | None = None) -> list[str] | list[datetime]
Generates a list of daily datetime values from a given start date.
The length may be defined by an end_date, or a number of days, or a number of years. This function is useful for working with daily datasets and models. Defaults to 1 year after start_date if end_date, days, and years are not specified.
- Parameters:
start_date (Union[datetime, str]) – The starting date for the sequence.
end_date (Optional[Union[datetime, str]], default None) – The ending date for the sequence. If provided, takes precedence over days and years.
days (int, default 0) – Number of days to generate. If > 0, takes precedence over years parameter.
years (int, default 1) – Number of years to generate if neither end_date nor days are specified.
include_end_date (bool, default False) – Whether to include the end_date in the generated sequence.
str_format (Optional[str], default None) – If provided, returns string dates in this format instead of datetime objects.
- Returns:
A list of datetime objects or formatted date strings covering the specified range.
- Return type:
list[str] | list[datetime]
- Raises:
ValueError – If years <= 0 when using years parameter for date generation.
Examples
>>> from datetime import datetime
>>> get_dates(datetime(2023, 1, 1), days=3)
[datetime.datetime(2023, 1, 1, 0, 0), datetime.datetime(2023, 1, 2, 0, 0), datetime.datetime(2023, 1, 3, 0, 0)]
>>> get_dates('2023-01-01', '2023-01-03', str_format='%Y-%m-%d')
['2023-01-01', '2023-01-02']
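The precedence rules documented above (end_date first, then days, then years) can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `daily_dates` is a hypothetical name, it accepts only datetime inputs, and it applies the end-date inclusion flag only when an explicit end date is given.

```python
from datetime import datetime, timedelta


def daily_dates(start, end=None, days=0, years=1, include_end=False):
    """Build a list of consecutive daily datetimes starting at start."""
    if end is not None:
        # An explicit end date takes precedence over days and years.
        count = (end - start).days + (1 if include_end else 0)
    elif days > 0:
        # A positive day count takes precedence over years.
        count = days
    else:
        if years <= 0:
            raise ValueError("years must be positive")
        # Assumption: "N years" means the same calendar day N years later.
        # A 29 February start fails here when the target year is not a leap year.
        count = (start.replace(year=start.year + years) - start).days
    return [start + timedelta(days=i) for i in range(count)]
```

With the defaults, a start of 1 January 2023 yields 365 values, ending on 31 December 2023.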
- get_month(dates: Iterable[str]) -> list[int]
Extract month numbers from a list of date strings.
- Parameters:
dates (Iterable[str]) – Iterable of date strings in YYYY-MM-DD format. Assumes consecutive dates.
- Returns:
List of month numbers (1-12) corresponding to the input dates.
- Return type:
list[int]
Examples
>>> get_month(['2023-01-15', '2023-01-16'])
[1, 1]
>>> get_month(['2023-12-31'])
[12]
- get_next_month_start(stringdate: str) -> str
Get the first day of the next month for a given date.
- Parameters:
stringdate (str) – Date string in YYYY-MM-DD format.
- Returns:
Date string in YYYY-MM-DD format representing the first day of the next month.
- Return type:
str
Examples
>>> get_next_month_start("2023-02-15")
'2023-03-01'
>>> get_next_month_start("2023-12-15")
'2024-01-01'
- get_prev_month_end(stringdate: str) -> str
Get the last day of the previous month for a given date.
- Parameters:
stringdate (str) – Date string in YYYY-MM-DD format.
- Returns:
Date string in YYYY-MM-DD format representing the last day of the previous month.
- Return type:
str
Examples
>>> get_prev_month_end("2023-03-15")
'2023-02-28'
>>> get_prev_month_end("2024-03-15")  # Leap year
'2024-02-29'
- get_this_month_end(stringdate: str) -> str
Get the last day of the current month for a given date.
- Parameters:
stringdate (str) – Date string in YYYY-MM-DD format.
- Returns:
Date string in YYYY-MM-DD format representing the last day of the current month.
- Return type:
str
Examples
>>> get_this_month_end("2023-02-15")
'2023-02-28'
>>> get_this_month_end("2024-02-15")  # Leap year
'2024-02-29'
>>> get_this_month_end("2023-04-15")
'2023-04-30'
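For readers who want to see how month-boundary arithmetic of this kind can be done, here is a small standard-library sketch covering the three month helpers documented above. The function names are hypothetical stand-ins and the approach is only one plausible way to get the documented results; it may differ from the module's own code.

```python
import calendar
from datetime import date, timedelta


def this_month_end(iso_date):
    """Last day of the month containing iso_date (YYYY-MM-DD)."""
    d = date.fromisoformat(iso_date)
    last_day = calendar.monthrange(d.year, d.month)[1]  # handles leap years
    return d.replace(day=last_day).isoformat()


def prev_month_end(iso_date):
    """Last day of the month before iso_date: one day before the 1st."""
    d = date.fromisoformat(iso_date)
    return (d.replace(day=1) - timedelta(days=1)).isoformat()


def next_month_start(iso_date):
    """First day of the month after iso_date, rolling over in December."""
    d = date.fromisoformat(iso_date)
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return date(year, month, 1).isoformat()
```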
- get_wy(dates: str, wy_month: int = 7, *, using_end_year: bool = False, as_list: bool = True) -> int
- get_wy(dates: Index | list[str] | list[datetime64], wy_month: int = 7, *, using_end_year: bool = False, as_list: Literal[True]) -> list[int]
- get_wy(dates: Index | list[str] | list[datetime64], wy_month: int = 7, *, using_end_year: bool = False, as_list: Literal[False]) -> ndarray[tuple[Any, ...], dtype[int64]]
Returns water years for a given array of dates.
Use this function to add water year information into a pandas DataFrame. Assumes consecutive dates for efficiency.
- Parameters:
dates (str or pd.Index or list[str] or list[np.datetime64]) – Date or array of dates. Assumes consecutive dates.
wy_month (int, default 7) – Water year start month (1=January, 7=July, etc.).
using_end_year (bool, default False) –
Water year labeling convention:
- False: Labels each water year by the year in which it starts, which aligns water years with the primary water allocation at the start of the water year.
- True: Follows the fiscal year convention, whereby water years are labeled by the year in which they end. Under this convention, the 2022 water year runs from 2021-07-01 to 2022-06-30 inclusive.
- Returns:
The water years corresponding to the given dates.
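- Return type:
int for a single date string; otherwise list[int] if as_list is True, or a NumPy array of int64 if as_list is False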
Examples
Basic usage with default July start:
>>> get_wy(['2023-06-30', '2023-07-01'])
[2022, 2023]
Using fiscal year convention:
>>> get_wy(['2023-06-30', '2023-07-01'], using_end_year=True)
[2023, 2024]
Integration with pandas for aggregation:
>>> df.groupby(get_wy(df.index, wy_month=7)).sum().median()
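The two labeling conventions can be made concrete with a per-date sketch. The helper below is hypothetical and handles one ISO date string at a time, so it does not reflect the consecutive-date optimization mentioned above. The treatment of wy_month=1 (water year equals calendar year under both conventions) is an assumption of this sketch, not something this page states.

```python
from datetime import date


def water_year(iso_date, wy_month=7, using_end_year=False):
    """Return the water year label for a single YYYY-MM-DD date."""
    d = date.fromisoformat(iso_date)
    # Year in which the water year containing this date starts.
    start_year = d.year if d.month >= wy_month else d.year - 1
    # Assumption: a water year starting in January ends in the same
    # calendar year, so the end-year label only differs when wy_month > 1.
    if using_end_year and wy_month > 1:
        return start_year + 1
    return start_year
```

With the default July start, 2023-06-30 falls in the water year that began on 2022-07-01, so it is labeled 2022 by start year and 2023 by end year.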
- get_wy_end_date(df: DataFrame, wy_month: int = 7) -> datetime
Returns an appropriate water year end date based on data frame dates and the water year start month.
- Parameters:
df (pd.DataFrame) – DataFrame with dates as its index.
wy_month (int, optional) – Water year start month. Defaults to 7.
- Returns:
Water year end date.
- Return type:
datetime
- get_wy_start_date(df: Series | DataFrame, wy_month: int = 7) -> datetime
Returns an appropriate water year start date based on data frame dates and the water year start month.
- Parameters:
df (pd.Series or pd.DataFrame) – Series or DataFrame with dates as its index.
wy_month (int, optional) – Water year start month. Defaults to 7.
- Returns:
Water year start date.
- Return type:
datetime
- get_year_and_month(v: list[str] | list[datetime]) -> list[str]
Extract year and month strings from a list of dates.
Returns year and month strings in YYYY-MM format for aggregation by month.
- Parameters:
v (Union[list[str], list[datetime]]) – List of date strings in YYYY-MM-DD format or datetime objects.
- Returns:
List of year-month strings in YYYY-MM format.
- Return type:
list[str]
Examples
>>> get_year_and_month(['2023-01-15', '2023-02-20'])
['2023-01', '2023-02']
>>> from datetime import datetime
>>> get_year_and_month([datetime(2023, 1, 15), datetime(2023, 2, 20)])
['2023-01', '2023-02']
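To show why YYYY-MM keys are convenient for monthly aggregation, the sketch below groups daily values by such a key using only the standard library. Both `year_month_key` and `monthly_totals` are hypothetical names introduced for this illustration, and the sample values are made up.

```python
from collections import defaultdict
from datetime import datetime


def year_month_key(value):
    """Return a 'YYYY-MM' key for a YYYY-MM-DD string or a datetime."""
    if isinstance(value, datetime):
        return f"{value.year:04d}-{value.month:02d}"
    return value[:7]  # assumes the string is already in YYYY-MM-DD form


def monthly_totals(dates, amounts):
    """Sum amounts per calendar month, keyed by 'YYYY-MM'."""
    totals = defaultdict(float)
    for day, amount in zip(dates, amounts):
        totals[year_month_key(day)] += amount
    return dict(totals)


# Illustrative data only.
totals = monthly_totals(["2023-01-15", "2023-01-20", "2023-02-20"], [1.0, 2.5, 4.0])
```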
- standardise_datestring_format(*args, **kwargs)
Australian spelling version of standardize_datestring_format().
- standardize_datestring_format(values: Index | list[str], *, as_index: Literal[True]) -> Index
- standardize_datestring_format(values: Index | list[str], *, as_index: Literal[False] = False) -> list[str]
Convert date strings to YYYY-MM-DD format.
Automatically detects the input date format and converts all dates to ISO 8601 (YYYY-MM-DD). Uses numpy datetime64 for efficient processing. Tested over the range 0001-01-01 to 9999-12-31.
- Parameters:
values (pd.Index or list[str]) – Date strings in any supported format.
as_index (bool, optional) – If True, return a pandas.Index instead of a list. Useful when assigning directly to df.index. Default is False.
- Returns:
Date strings in YYYY-MM-DD format. Type depends on as_index.
- Return type:
pd.Index or list[str]
Examples
>>> standardize_datestring_format(["25/12/2023", "26/12/2023"])
['2023-12-25', '2023-12-26']
>>> standardize_datestring_format(["25/12/2023"], as_index=True)
Index(['2023-12-25'], dtype='object')
- to_np_datetimes64d(values: list[str], date_fmt: str = '%Y-%m-%d', *, mode: Literal['generate', 'parse'] = 'generate', check_dates: bool | Literal['warn', 'strict'] = 'warn') ndarray[tuple[Any, ...], dtype[datetime64]]
Convert a list of date strings to numpy datetime64[D] array.
This function converts date strings to numpy datetime64 arrays with day precision. Two modes are available: “generate” efficiently creates all dates in a range, while “parse” individually converts each date string preserving gaps.
- Parameters:
values (list[str]) – List of date strings to convert. Can also accept pandas Series.
date_fmt (str, default '%Y-%m-%d') – The date format string for parsing the input dates.
mode ({"generate", "parse"}, default "generate") –
Conversion mode:
- "generate": Generate all dates between first and last date (inclusive). Efficient for consecutive or near-consecutive dates. Uses numpy.arange.
- "parse": Parse each date string individually, preserving non-consecutive dates and gaps. Iterates over all values.
check_length (bool or {"warn", "strict"}, default "warn") –
Controls validation for “generate” mode only (ignored in “parse” mode):
- False: No validation (suppress all warnings/errors)
- True or "warn": Issue UserWarning if lengths don’t match
- "strict": Raise ValueError if lengths don’t match
- Returns:
NumPy array of datetime64[D] values.
- In “generate” mode: All dates from first to last (inclusive)
- In “parse” mode: Exactly the dates provided (same length as input)
- Return type:
np.typing.NDArray[np.datetime64]
- Raises:
ValueError – If mode="generate" and check_length="strict", raised when the number of generated dates does not match the input length (indicating non-consecutive dates or gaps).
- Warns:
UserWarning – If mode="generate" and check_length is True or "warn", issued when the number of generated dates does not match the input length (indicating non-consecutive dates or gaps).
Examples
Generate mode (default) - fills in gaps:
>>> dates = to_np_datetimes64d(['2023-01-01', '2023-01-02', '2023-01-03'])
>>> dates.dtype
dtype('<M8[D]')
>>> # Generates all 3 dates: Jan 1, 2, 3
>>> len(to_np_datetimes64d(['2023-01-01', '2023-01-03'], check_length=False))
3
Parse mode - preserves gaps:
>>> dates = to_np_datetimes64d(['2023-01-01', '2023-01-03'], mode="parse")
>>> len(dates)  # Only Jan 1 and 3, no filling
2
>>> # Issues a warning but returns all dates
>>> len(to_np_datetimes64d(['2023-01-01', '2023-01-03'], mode="generate", check_length="warn"))
3
>>> to_np_datetimes64d(['2023-01-01', '2023-01-03'], mode="generate", check_length="strict")
Traceback (most recent call last):
...
ValueError: Date sequence validation failed...
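The difference between the two modes, and the length check that guards generate mode, can be sketched with NumPy as follows. This is an illustrative reconstruction under assumptions, not the library's implementation: `to_days` and its `check` argument are hypothetical names, and the warning text is invented for the sketch.

```python
import warnings
from datetime import datetime

import numpy as np


def to_days(values, date_fmt="%Y-%m-%d", mode="generate", check="warn"):
    """Convert date strings to a datetime64[D] array (hypothetical helper)."""

    def parse_one(text):
        return np.datetime64(datetime.strptime(text, date_fmt).date(), "D")

    if mode == "parse":
        # One parse per input value, so gaps in the input are preserved.
        return np.array([parse_one(v) for v in values], dtype="datetime64[D]")

    # Generate mode: parse only the endpoints and fill every day between them.
    first, last = parse_one(values[0]), parse_one(values[-1])
    generated = np.arange(first, last + np.timedelta64(1, "D"), dtype="datetime64[D]")
    if check and len(generated) != len(values):
        message = f"generated {len(generated)} dates from {len(values)} inputs"
        if check == "strict":
            raise ValueError(message)
        warnings.warn(message, UserWarning)
    return generated
```

Generate mode parses only two strings regardless of input length, which is where the efficiency comes from. The length comparison is a cheap way to notice that the input was not actually consecutive.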