
DB rp empirical #4

Merged 8 commits on Dec 10, 2024

Conversation

@zackarno (Collaborator) commented Dec 5, 2024

Add empirical RP calculation functions.

  • I've added functions to calculate empirical RPs to src/utils/return_periods.py
  • An example of the implementation is shown in exploration/04_add_return_periods_example.py

I've kept the loading functions and processing functions separate for flexibility in how the pipeline will be configured/parameterized.

@zackarno zackarno requested a review from t-downing December 5, 2024 19:16
@zackarno zackarno mentioned this pull request Dec 5, 2024
@t-downing (Collaborator) left a comment

Nice! I believe the empirical RP calculation is correct, but may just want to include more NA handling.

Also, just a heads up that a lot of this can also be done straight in pandas instead of numpy, which I've found to be a bit more legible and easier to debug (e.g., you don't need to worry about matching the ordering of the arrays since it's all in the same dataframe).

Just as an example, this is how I've done it to calculate the RPs by admin1, but it could be done with any groupby:

```python
def calculate_rp(group, col_name):
    group[f"rank_{col_name}"] = group[col_name].rank(ascending=False)
    group[f"rp_{col_name}"] = (len(group) + 1) / group[f"rank_{col_name}"]
    return group

era5_adm1_jas_df = (
    era5_adm1_jas_df.groupby("ADM1_PCODE")
    .apply(calculate_rp, col_name="mean", include_groups=False)
    .reset_index()
    .drop(columns=["level_1"])
)
```

In this example it was for flooding, so high value = bad. But for drought (i.e. low value = bad), you can just set ascending=True in .rank() (which is the default behaviour anyways).
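To make the formula concrete, here is a self-contained toy run of the same rank-based approach (the data and admin codes below are made up for illustration, not from this PR):

```python
import pandas as pd

def calculate_rp(group, col_name):
    # rank 1 = largest value, so the most extreme year gets the longest RP
    group[f"rank_{col_name}"] = group[col_name].rank(ascending=False)
    # empirical return period: RP = (n + 1) / rank
    group[f"rp_{col_name}"] = (len(group) + 1) / group[f"rank_{col_name}"]
    return group

# toy seasonal means for two made-up admin1 units, 5 "years" each
df = pd.DataFrame({
    "ADM1_PCODE": ["A"] * 5 + ["B"] * 5,
    "mean": [10.0, 30.0, 20.0, 50.0, 40.0, 1.0, 3.0, 2.0, 5.0, 4.0],
})
out = df.groupby("ADM1_PCODE", group_keys=False).apply(calculate_rp, col_name="mean")
# with n = 5, the largest value in each group gets RP (5 + 1) / 1 = 6
```

So within each group the record year comes out at roughly a 1-in-6-"year" event, and the smallest value at (5 + 1) / 5 = 1.2.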

Resolved review threads:

  • requirements.txt (outdated)
  • src/utils/pg.py
  • src/utils/return_periods.py (outdated)
  • src/utils/return_periods.py
@zackarno (Collaborator, Author) commented Dec 9, 2024

> Nice! I believe the empirical RP calculation is correct, but may just want to include more NA handling.
>
> Also, just a heads up that a lot of this can also be done straight in pandas instead of numpy, which I've found to be a bit more legible and easier to debug (e.g., you don't need to worry about matching the ordering of the arrays since it's all in the same dataframe).
>
> Just as an example, this is how I've done it to calculate the RPs by admin1, but it could be done with any groupby:
>
> ```python
> def calculate_rp(group, col_name):
>     group[f"rank_{col_name}"] = group[col_name].rank(ascending=False)
>     group[f"rp_{col_name}"] = (len(group) + 1) / group[f"rank_{col_name}"]
>     return group
>
> era5_adm1_jas_df = (
>     era5_adm1_jas_df.groupby("ADM1_PCODE")
>     .apply(calculate_rp, col_name="mean", include_groups=False)
>     .reset_index()
>     .drop(columns=["level_1"])
> )
> ```
>
> In this example it was for flooding, so high value = bad. But for drought (i.e. low value = bad), you can just set ascending=True in .rank() (which is the default behaviour anyways).

Yeah, this code makes sense, but I don't really get how the `group` arg is being applied in the function implementation. I'm not sure of the details/trade-offs in Python, but in R I always think it's slick when you can vectorize a function so that all it takes is the x vector, letting dplyr::group_by() / pd.groupby handle the grouping aspect independently. There are a lot of advantages to doing this in R, but I'll DM you to chat further, as I'm afraid I might start a circular conversation here due to my lack of Python/pandas knowledge.
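For illustration, the vectorized style described above is also possible in pandas: a function that takes only the value Series, with `groupby().transform()` handling the grouping. This is my own sketch, not code from the PR; the `empirical_rp` name and the NA-aware sample size (`notna().sum()`) are assumptions:

```python
import pandas as pd

def empirical_rp(x: pd.Series, ascending: bool = False) -> pd.Series:
    # Takes just the value vector; the caller's groupby handles grouping.
    # NaNs get NaN ranks by default, so they yield NaN RPs rather than false ones.
    rank = x.rank(ascending=ascending)
    n = x.notna().sum()  # NA-aware sample size (an assumption about NA handling)
    return (n + 1) / rank

df = pd.DataFrame({
    "ADM1_PCODE": ["A", "A", "A", "B", "B", "B"],
    "mean": [10.0, 30.0, 20.0, 1.0, None, 3.0],
})
df["rp_mean"] = df.groupby("ADM1_PCODE")["mean"].transform(empirical_rp)
```

Because `transform` returns a Series aligned to the original index, there is no separate rank column to merge back, which is close to the `dplyr::mutate()`-style workflow described above.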

@zackarno (Collaborator, Author) commented Dec 9, 2024

Yeah, I switched to your suggestion of using pandas instead of numpy - seems a bit more user-friendly for this use case.

@t-downing (Collaborator) left a comment

Nice 😎 Looks good to me!

I didn't go into as much detail checking the values, but confirmed the basics:

  • higher value means higher RP
  • RPs look plausible, and are inf when RP > 10
  • NAs don't generate false RPs
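The checks in that list can be sketched as quick assertions on a toy frame (my own illustration using the rank-based formula from the earlier comment, not the PR's actual test code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"mean": [10.0, 40.0, np.nan, 20.0, 30.0]})
n = df["mean"].notna().sum()
df["rp_mean"] = (n + 1) / df["mean"].rank(ascending=False)

# higher value -> higher RP (monotonic over the non-NaN rows)
valid = df.dropna(subset=["mean"]).sort_values("mean")
assert valid["rp_mean"].is_monotonic_increasing

# NAs don't generate false RPs
assert df.loc[df["mean"].isna(), "rp_mean"].isna().all()
```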

Comment on lines 15 to 22
```python
df_filt = df[
    ~df[by].apply(tuple, axis=1).isin(df_nans.apply(tuple, axis=1))
]
df_maxima_filt = df_maxima[
    ~df_maxima[by]
    .apply(tuple, axis=1)
    .isin(df_maxima_nans.apply(tuple, axis=1))
]
```
@t-downing (Collaborator) commented:

I found the source of the SettingWithCopyWarning 👀

To avoid those coming up, can just add .copy() to each of these filtering steps like:

```python
df_filt = df[
    ~df[by].apply(tuple, axis=1).isin(df_nans.apply(tuple, axis=1))
].copy()
df_maxima_filt = df_maxima[
    ~df_maxima[by]
    .apply(tuple, axis=1)
    .isin(df_maxima_nans.apply(tuple, axis=1))
].copy()
```

Anyways not necessary, up to you if you want to switch it.
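For anyone unfamiliar with the warning, here is a minimal sketch of the pattern on a toy frame (column names are made up; not the PR's actual data):

```python
import pandas as pd

df = pd.DataFrame({"ADM1_PCODE": ["A", "A", "B"], "mean": [1.0, 2.0, 3.0]})

# A boolean-mask slice can be view-like; assigning a new column on it may
# raise SettingWithCopyWarning. .copy() makes df_filt an independent frame.
df_filt = df[df["ADM1_PCODE"] == "A"].copy()
df_filt["rp_mean"] = (len(df_filt) + 1) / df_filt["mean"].rank(ascending=False)

# the original frame is untouched by the assignment
assert "rp_mean" not in df.columns
```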

@zackarno (Collaborator, Author) commented Dec 10, 2024

Yeah, I had gotten rid of a bunch of them by using .loc, but I'm not sure how I could do that for these operations. Anyways, I added the .copy() for now. Good call, thanks!

@zackarno zackarno merged commit b48964e into main Dec 10, 2024
2 participants